85 lines
4 KiB
Markdown
85 lines
4 KiB
Markdown
# sekft
|
|
|
|
Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
|
|
land with **no imperative**, discover where directives live, learn the provider
|
|
from its own self-documentation, do the work, and terminate (`exit` on success,
|
|
`panic` when genuinely blocked).
|
|
|
|
> **Not tool-calling.** sekft trains shell operation, not function-calling. The
|
|
> model is given no typed tool API and no JSON-schema action list; it writes
|
|
> plain-text commands at a real prompt, with the whole system as its action
|
|
> space, discovered like a person would (`--help`, `man`, `ls`) rather than
|
|
> enumerated up front.
|
|
|
|
sekft is the **training half**. The dataset and the synthetic-data factory live
|
|
in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
|
|
depends on. Here live the trainer, the behavioural evaluator, and the
|
|
resident-base harness.
|
|
|
|
## Components
|
|
|
|
- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
|
|
with the tokenizer's own chat template and trains an **assistant-only** loss
|
|
mask (the commands plus the terminal token; environment turns masked to -100)
|
|
into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a
|
|
shell-operator SFT, so it is the part tested hardest.
|
|
- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
|
|
about whether the model operates the shell and leaves. This drops base +
|
|
adapter into held-out scenarios with no scaffold and reports the rates that
|
|
count: reach command-mode, terminate, checker passes.
|
|
- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
|
|
14 GB base once and keeps it hot, training and evaluating adapters without
|
|
reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
|
|
run).
|
|
|
|
## The render contract
|
|
|
|
The render the model trains on MUST equal what it is served with. The serving
|
|
harness (ccpty) sends structured `{role, content}` messages over the OpenAI
|
|
chat-completions protocol, so the endpoint applies the **model's own chat
|
|
template**. sekft therefore renders with `apply_chat_template`, after
|
|
`normalize_for_template` canonicalises each session: a leading `system` turn is
|
|
folded into the first `user` turn and consecutive same-role turns are merged,
|
|
because instruct templates such as Mistral's have no system role and require
|
|
strict user/assistant alternation. The same canonicalisation must run
|
|
serve-side, or train and serve diverge.
|
|
|
|
## Install
|
|
|
|
The training paths only run on a CUDA host, so the GPU stack is an extra:
|
|
|
|
```sh
|
|
pipenv install # editable sekft + the local editable posix-sdc
|
|
pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box
|
|
```
|
|
|
|
`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile`
|
|
overrides it with the local editable `../posix-sdc` for side-by-side development.
|
|
|
|
## Use (on the GPU box)
|
|
|
|
```sh
|
|
# fine-tune an adapter on the posix-sdc trajectories
|
|
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
|
|
--out ./ckpt --load-4bit
|
|
|
|
# inspect the assistant-only loss mask without training (runs anywhere)
|
|
sekft-train --data ./trajectories --base <dir> --inspect
|
|
|
|
# behavioural eval on held-out scenario bundles (worlds, not trajectories)
|
|
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16
|
|
|
|
# resident loop: load the base once, cycle adapters without reloading it
|
|
sekft-resident --base <dir> --load-4bit
|
|
```
|
|
|
|
The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
|
|
verifies each in a fresh container), not trajectories.
|
|
|
|
## Result
|
|
|
|
Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
|
|
clean termination on archetype-level held-out scenarios from **0/16 (base) to
|
|
9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
|
|
types, while task competence stayed archetype-local. See the experiment
|
|
[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).
|