sum(t.steps > 0 and t.meta.get("clean") for t in rows) yields the right operand
of `and` when steps>0, so a trajectory whose meta lacks the "clean" key
contributes None and sum() raises TypeError. Wrap the predicate in bool() so it
counts trajectories that operated and are clean. Surfaced by mypy once posix-sdc
began shipping py.typed (Trajectory is now typed).
|
||
|---|---|---|
| src/tiararodney/sekft | ||
| tests | ||
| .gitignore | ||
| LICENSE | ||
| Pipfile | ||
| pyproject.toml | ||
| README.md | ||
| TODO | ||
| tox.ini | ||
sekft
Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
land with no imperative, discover where directives live, learn the provider
from its own self-documentation, do the work, and terminate (exit on success,
panic when genuinely blocked).
sekft is the training half. The dataset and the synthetic-data factory live
in posix-sdc (tiararodney.posix-sdc), which this package
depends on. Here live the trainer, the behavioural evaluator, and the
resident-base harness.
Components
sekft.sft(sekft-train) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an assistant-only loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest.sekft.eval(sekft-eval) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes.sekft.resident(sekft-resident) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run).
The render contract
The render the model trains on MUST equal what it is served with. The serving
harness (ccpty) sends structured {role, content} messages over the OpenAI
chat-completions protocol, so the endpoint applies the model's own chat
template. sekft therefore renders with apply_chat_template, after
normalize_for_template canonicalises each session: a leading system turn is
folded into the first user turn and consecutive same-role turns are merged,
because instruct templates such as Mistral's have no system role and require
strict user/assistant alternation. The same canonicalisation must run
serve-side, or train and serve diverge.
Install
The training paths only run on a CUDA host, so the GPU stack is an extra:
pipenv install # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box
pyproject.toml declares tiararodney.posix-sdc abstractly; the Pipfile
overrides it with the local editable ../posix-sdc for side-by-side development.
Use (on the GPU box)
# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
--out ./ckpt --load-4bit
# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect
# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16
# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit
The eval consumes held-out scenario bundles from posix-sdc (it stands up and verifies each in a fresh container), not trajectories.
Result
Fine-tuning mistralai/Mistral-7B-Instruct-v0.2 on the posix-sdc data lifted
clean termination on archetype-level held-out scenarios from 0/16 (base) to
9/16 (tuned): the operate-and-terminate mechanism generalised to unseen task
types, while task competence stayed archetype-local. See the experiment
From seed to weights.