No description

Find a file

Tiara Rodney 157bb4955d bugfix(11): operate_rate must not sum a None sum(t.steps > 0 and t.meta.get("clean") for t in rows) yields the right operand of `and` when steps>0, so a trajectory whose meta lacks the "clean" key contributes None and sum() raises TypeError. Wrap the predicate in bool() so it counts trajectories that operated and are clean. Surfaced by mypy once posix-sdc began shipping py.typed (Trajectory is now typed).		2026-06-17 23:49:26 +02:00
src/tiararodney/sekft	bugfix(11): operate_rate must not sum a None	2026-06-17 23:49:26 +02:00
tests	test: annotate the sft test helpers	2026-06-17 14:03:52 +02:00
.gitignore	chore: initial commit	2026-06-16 20:13:14 +02:00
LICENSE	chore: add GPL-2.0 license	2026-06-16 20:28:17 +02:00
Pipfile	chore: set up mypy strict checking and ship py.typed	2026-06-17 14:03:46 +02:00
pyproject.toml	chore: set up mypy strict checking and ship py.typed	2026-06-17 14:03:46 +02:00
README.md	docs: rewrite README for the packaged trainer	2026-06-16 23:49:01 +02:00
TODO	todo(11): in-progress	2026-06-17 23:48:29 +02:00
tox.ini	chore: add tox lint, format and test environments	2026-06-16 20:13:49 +02:00

README.md

sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen: land with no imperative, discover where directives live, learn the provider from its own self-documentation, do the work, and terminate (exit on success, panic when genuinely blocked).

sekft is the training half. The dataset and the synthetic-data factory live in posix-sdc (tiararodney.posix-sdc), which this package depends on. Here live the trainer, the behavioural evaluator, and the resident-base harness.

Components

sekft.sft (sekft-train) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an assistant-only loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest.
sekft.eval (sekft-eval) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes.
sekft.resident (sekft-resident) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run).

The render contract

The render the model trains on MUST equal what it is served with. The serving harness (ccpty) sends structured {role, content} messages over the OpenAI chat-completions protocol, so the endpoint applies the model's own chat template. sekft therefore renders with apply_chat_template, after normalize_for_template canonicalises each session: a leading system turn is folded into the first user turn and consecutive same-role turns are merged, because instruct templates such as Mistral's have no system role and require strict user/assistant alternation. The same canonicalisation must run serve-side, or train and serve diverge.

Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

pipenv install              # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]'  # torch / transformers / peft / datasets, on the box

pyproject.toml declares tiararodney.posix-sdc abstractly; the Pipfile overrides it with the local editable ../posix-sdc for side-by-side development.

Use (on the GPU box)

# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
            --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit

The eval consumes held-out scenario bundles from posix-sdc (it stands up and verifies each in a fresh container), not trajectories.

Result

Fine-tuning mistralai/Mistral-7B-Instruct-v0.2 on the posix-sdc data lifted clean termination on archetype-level held-out scenarios from 0/16 (base) to 9/16 (tuned): the operate-and-terminate mechanism generalised to unseen task types, while task competence stayed archetype-local. See the experiment From seed to weights.