# sekft Fine-tune small open models to operate a POSIX shell as a self-directed citizen: land with **no imperative**, discover where directives live, learn the provider from its own self-documentation, do the work, and terminate (`exit` on success, `panic` when genuinely blocked). > **Not tool-calling.** sekft trains shell operation, not function-calling. The > model is given no typed tool API and no JSON-schema action list; it writes > plain-text commands at a real prompt, with the whole system as its action > space, discovered like a person would (`--help`, `man`, `ls`) rather than > enumerated up front. sekft is the **training half**. The dataset and the synthetic-data factory live in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package depends on. Here live the trainer, the behavioural evaluator, and the resident-base harness. ## Components - **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories with the tokenizer's own chat template and trains an **assistant-only** loss mask (the commands plus the terminal token; environment turns masked to -100) into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a shell-operator SFT, so it is the part tested hardest. - **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing about whether the model operates the shell and leaves. This drops base + adapter into held-out scenarios with no scaffold and reports the rates that count: reach command-mode, terminate, checker passes. - **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the 14 GB base once and keeps it hot, training and evaluating adapters without reloading it (over OcuLink/PCIe the base transfer otherwise dominates every run). ## The render contract The render the model trains on MUST equal what it is served with. The serving harness (ccpty) sends structured `{role, content}` messages over the OpenAI chat-completions protocol, so the endpoint applies the **model's own chat template**. sekft therefore renders with `apply_chat_template`, after `normalize_for_template` canonicalises each session: a leading `system` turn is folded into the first `user` turn and consecutive same-role turns are merged, because instruct templates such as Mistral's have no system role and require strict user/assistant alternation. The same canonicalisation must run serve-side, or train and serve diverge. ## Install The training paths only run on a CUDA host, so the GPU stack is an extra: ```sh pipenv install # editable sekft + the local editable posix-sdc pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box ``` `pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile` overrides it with the local editable `../posix-sdc` for side-by-side development. ## Use (on the GPU box) ```sh # fine-tune an adapter on the posix-sdc trajectories sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \ --out ./ckpt --load-4bit # inspect the assistant-only loss mask without training (runs anywhere) sekft-train --data ./trajectories --base