|
|
||
|---|---|---|
| src/tiararodney/sekft | ||
| tests | ||
| .gitignore | ||
| CHANGELOG.md | ||
| LICENSE | ||
| Pipfile | ||
| Pipfile.lock | ||
| pyproject.toml | ||
| README.md | ||
| TODO | ||
| tox.ini | ||
sekft
Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
land with no imperative, discover where directives live, learn the provider
from its own self-documentation, do the work, and terminate (exit on success,
panic when genuinely blocked).
sekft is the training half. The dataset and the synthetic-data factory live
in posix-sdc (tiararodney.posix-sdc), which this package
depends on. Here live the trainer, the behavioural evaluator, and the
resident-base harness.
Components
sekft.sft(sekft-train) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an assistant-only loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest.sekft.eval(sekft-eval) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes.sekft.resident(sekft-resident) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run).
The render contract
The render the model trains on MUST equal what it is served with. The serving
harness (ccpty) sends structured {role, content} messages over the OpenAI
chat-completions protocol, so the endpoint applies the model's own chat
template. sekft therefore renders with apply_chat_template, after
normalize_for_template canonicalises each session: a leading system turn is
folded into the first user turn and consecutive same-role turns are merged,
because instruct templates such as Mistral's have no system role and require
strict user/assistant alternation. The same canonicalisation must run
serve-side, or train and serve diverge.
Install
The training paths only run on a CUDA host, so the GPU stack is an extra:
pipenv install # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box
pyproject.toml declares tiararodney.posix-sdc abstractly; the Pipfile
overrides it with the local editable ../posix-sdc for side-by-side development.
Use (on the GPU box)
# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
--out ./ckpt --load-4bit
# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect
# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16
# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit
The eval consumes held-out scenario bundles from posix-sdc (it stands up and verifies each in a fresh container), not trajectories.
Result
Fine-tuning mistralai/Mistral-7B-Instruct-v0.2 on the posix-sdc data lifted
clean termination on archetype-level held-out scenarios from 0/16 (base) to
9/16 (tuned): the operate-and-terminate mechanism generalised to unseen task
types, while task competence stayed archetype-local. See the experiment
From seed to weights.