No description

Find a file

Tiara Rodney 814261dc56 todo(10): open		2026-06-17 23:43:08 +02:00
src/tiararodney/sekft	refactor: annotate the trainer modules under mypy strict	2026-06-17 14:03:52 +02:00
tests	test: annotate the sft test helpers	2026-06-17 14:03:52 +02:00
.gitignore	chore: initial commit	2026-06-16 20:13:14 +02:00
LICENSE	chore: add GPL-2.0 license	2026-06-16 20:28:17 +02:00
Pipfile	chore: set up mypy strict checking and ship py.typed	2026-06-17 14:03:46 +02:00
pyproject.toml	chore: set up mypy strict checking and ship py.typed	2026-06-17 14:03:46 +02:00
README.md	docs: rewrite README for the packaged trainer	2026-06-16 23:49:01 +02:00
TODO	todo(10): open	2026-06-17 23:43:08 +02:00
tox.ini	chore: add tox lint, format and test environments	2026-06-16 20:13:49 +02:00

README.md

sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen: land with no imperative, discover where directives live, learn the provider from its own self-documentation, do the work, and terminate (exit on success, panic when genuinely blocked).

sekft is the training half. The dataset and the synthetic-data factory live in posix-sdc (tiararodney.posix-sdc), which this package depends on. Here live the trainer, the behavioural evaluator, and the resident-base harness.

Components

sekft.sft (sekft-train) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an assistant-only loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest.
sekft.eval (sekft-eval) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes.
sekft.resident (sekft-resident) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run).

The render contract

The render the model trains on MUST equal what it is served with. The serving harness (ccpty) sends structured {role, content} messages over the OpenAI chat-completions protocol, so the endpoint applies the model's own chat template. sekft therefore renders with apply_chat_template, after normalize_for_template canonicalises each session: a leading system turn is folded into the first user turn and consecutive same-role turns are merged, because instruct templates such as Mistral's have no system role and require strict user/assistant alternation. The same canonicalisation must run serve-side, or train and serve diverge.

Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

pipenv install              # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]'  # torch / transformers / peft / datasets, on the box

pyproject.toml declares tiararodney.posix-sdc abstractly; the Pipfile overrides it with the local editable ../posix-sdc for side-by-side development.

Use (on the GPU box)

# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
            --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit

The eval consumes held-out scenario bundles from posix-sdc (it stands up and verifies each in a fresh container), not trajectories.

Result

Fine-tuning mistralai/Mistral-7B-Instruct-v0.2 on the posix-sdc data lifted clean termination on archetype-level held-out scenarios from 0/16 (base) to 9/16 (tuned): the operate-and-terminate mechanism generalised to unseen task types, while task competence stayed archetype-local. See the experiment From seed to weights.