No description
Find a file
Tiara Rodney 157bb4955d
bugfix(11): operate_rate must not sum a None
sum(t.steps > 0 and t.meta.get("clean") for t in rows) yields the right operand
of `and` when steps>0, so a trajectory whose meta lacks the "clean" key
contributes None and sum() raises TypeError. Wrap the predicate in bool() so it
counts trajectories that operated and are clean. Surfaced by mypy once posix-sdc
began shipping py.typed (Trajectory is now typed).
2026-06-17 23:49:26 +02:00
src/tiararodney/sekft bugfix(11): operate_rate must not sum a None 2026-06-17 23:49:26 +02:00
tests test: annotate the sft test helpers 2026-06-17 14:03:52 +02:00
.gitignore chore: initial commit 2026-06-16 20:13:14 +02:00
LICENSE chore: add GPL-2.0 license 2026-06-16 20:28:17 +02:00
Pipfile chore: set up mypy strict checking and ship py.typed 2026-06-17 14:03:46 +02:00
pyproject.toml chore: set up mypy strict checking and ship py.typed 2026-06-17 14:03:46 +02:00
README.md docs: rewrite README for the packaged trainer 2026-06-16 23:49:01 +02:00
TODO todo(11): in-progress 2026-06-17 23:48:29 +02:00
tox.ini chore: add tox lint, format and test environments 2026-06-16 20:13:49 +02:00

sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen: land with no imperative, discover where directives live, learn the provider from its own self-documentation, do the work, and terminate (exit on success, panic when genuinely blocked).

sekft is the training half. The dataset and the synthetic-data factory live in posix-sdc (tiararodney.posix-sdc), which this package depends on. Here live the trainer, the behavioural evaluator, and the resident-base harness.

Components

  • sekft.sft (sekft-train) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an assistant-only loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest.
  • sekft.eval (sekft-eval) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes.
  • sekft.resident (sekft-resident) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run).

The render contract

The render the model trains on MUST equal what it is served with. The serving harness (ccpty) sends structured {role, content} messages over the OpenAI chat-completions protocol, so the endpoint applies the model's own chat template. sekft therefore renders with apply_chat_template, after normalize_for_template canonicalises each session: a leading system turn is folded into the first user turn and consecutive same-role turns are merged, because instruct templates such as Mistral's have no system role and require strict user/assistant alternation. The same canonicalisation must run serve-side, or train and serve diverge.

Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

pipenv install              # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]'  # torch / transformers / peft / datasets, on the box

pyproject.toml declares tiararodney.posix-sdc abstractly; the Pipfile overrides it with the local editable ../posix-sdc for side-by-side development.

Use (on the GPU box)

# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
            --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit

The eval consumes held-out scenario bundles from posix-sdc (it stands up and verifies each in a fresh container), not trajectories.

Result

Fine-tuning mistralai/Mistral-7B-Instruct-v0.2 on the posix-sdc data lifted clean termination on archetype-level held-out scenarios from 0/16 (base) to 9/16 (tuned): the operate-and-terminate mechanism generalised to unseen task types, while task competence stayed archetype-local. See the experiment From seed to weights.