sekft/README.md

# sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
land with **no imperative**, discover where directives live, learn the provider
from its own self-documentation, do the work, and terminate (`exit` on success,
`panic` when genuinely blocked).

> **Not tool-calling.** sekft trains shell operation, not function-calling. The
> model is given no typed tool API and no JSON-schema action list; it writes
> plain-text commands at a real prompt, with the whole system as its action
> space, discovered like a person would (`--help`, `man`, `ls`) rather than
> enumerated up front.

sekft is the **training half**. The dataset and the synthetic-data factory live
in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
depends on. Here live the trainer, the behavioural evaluator, and the
resident-base harness.

## Components

- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
  with the tokenizer's own chat template and trains an **assistant-only** loss
  mask (the commands plus the terminal token; environment turns masked to -100)
  into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a
  shell-operator SFT, so it is the part tested hardest.
- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
  about whether the model operates the shell and leaves. This drops base +
  adapter into held-out scenarios with no scaffold and reports the rates that
  count: reach command-mode, terminate, checker passes.
- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
  14 GB base once and keeps it hot, training and evaluating adapters without
  reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
  run).

## The render contract

The render the model trains on MUST equal what it is served with. The serving
harness (ccpty) sends structured `{role, content}` messages over the OpenAI
chat-completions protocol, so the endpoint applies the **model's own chat
template**. sekft therefore renders with `apply_chat_template`, after
`normalize_for_template` canonicalises each session: a leading `system` turn is
folded into the first `user` turn and consecutive same-role turns are merged,
because instruct templates such as Mistral's have no system role and require
strict user/assistant alternation. The same canonicalisation must run
serve-side, or train and serve diverge.

## Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

```sh
pipenv install              # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]'  # torch / transformers / peft / datasets, on the box
```

`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile`
overrides it with the local editable `../posix-sdc` for side-by-side development.

## Use (on the GPU box)

```sh
# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
            --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit
```

The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
verifies each in a fresh container), not trajectories.

## Result

Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
clean termination on archetype-level held-out scenarios from **0/16 (base) to
9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
types, while task competence stayed archetype-local. See the experiment
[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).