Merge branch 'feature/8'
This commit is contained in:
commit
e46e12c70b
5 changed files with 77 additions and 86 deletions
136
README.md
136
README.md
|
|
@ -1,89 +1,79 @@
|
||||||
# sekft
|
# sekft
|
||||||
|
|
||||||
Synthetic-trajectory generation for fine-tuning a model to operate a shell
|
Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
|
||||||
as a self-directed citizen: land with **no imperative**, discover where
|
land with **no imperative**, discover where directives live, learn the provider
|
||||||
directives live, learn the provider from its own self-documentation, retrieve
|
from its own self-documentation, do the work, and terminate (`exit` on success,
|
||||||
the directives, execute them, and terminate (`exit` on success, `panic` when
|
`panic` when genuinely blocked).
|
||||||
genuinely blocked).
|
|
||||||
|
|
||||||
The dataset teaches a **mechanism, not a program**. Every axis of a scenario
|
sekft is the **training half**. The dataset and the synthetic-data factory live
|
||||||
is varied; only the four-step routine is held invariant:
|
in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
|
||||||
|
depends on. Here live the trainer, the behavioural evaluator, and the
|
||||||
|
resident-base harness.
|
||||||
|
|
||||||
1. **expect an announcement** of where directives are (motd / banner / env / file)
|
## Components
|
||||||
2. **understand the provider** via its self-documentation (`--help` / `man` / usage)
|
|
||||||
3. **retrieve** the directives
|
|
||||||
4. **execute**, then terminate
|
|
||||||
|
|
||||||
Bind the *convention* (there is an announcement at entry; tools are
|
- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
|
||||||
self-documenting), free everything else. The model that learns this tolerates
|
with the tokenizer's own chat template and trains an **assistant-only** loss
|
||||||
an unstable userland because it re-learns the interface every session.
|
mask (the commands plus the terminal token; environment turns masked to -100)
|
||||||
|
into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a
|
||||||
|
shell-operator SFT, so it is the part tested hardest.
|
||||||
|
- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
|
||||||
|
about whether the model operates the shell and leaves. This drops base +
|
||||||
|
adapter into held-out scenarios with no scaffold and reports the rates that
|
||||||
|
count: reach command-mode, terminate, checker passes.
|
||||||
|
- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
|
||||||
|
14 GB base once and keeps it hot, training and evaluating adapters without
|
||||||
|
reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
|
||||||
|
run).
|
||||||
|
|
||||||
## Pipeline
|
## The render contract
|
||||||
|
|
||||||
```
|
The render the model trains on MUST equal what it is served with. The serving
|
||||||
A. author generate.py model writes scenario bundles from the taxonomy
|
harness (ccpty) sends structured `{role, content}` messages over the OpenAI
|
||||||
+ ref-gate dashdocker.py run the bundle's own reference solution; admit only if its checker passes
|
chat-completions protocol, so the endpoint applies the **model's own chat
|
||||||
B. rollout rollout.py scaffolded operator model acts in a fresh dash-in-docker container
|
template**. sekft therefore renders with `apply_chat_template`, after
|
||||||
C. verify rollout.py run the checker against container STATE (effect, not transcript)
|
`normalize_for_template` canonicalises each session: a leading `system` turn is
|
||||||
D. record rollout.py strip the operator scaffold; save env<->action turns in deploy format
|
folded into the first `user` turn and consecutive same-role turns are merged,
|
||||||
E. pairs [seam] rejects from B/C become DPO negatives against keepers from the same scenario
|
because instruct templates such as Mistral's have no system role and require
|
||||||
```
|
strict user/assistant alternation. The same canonicalisation must run
|
||||||
|
serve-side, or train and serve diverge.
|
||||||
|
|
||||||
This repo implements **A-D** plus the execution backend (`dashdocker.py`).
|
## Install
|
||||||
Stage E (preference-pair assembly from the kept/rejected trajectories) is the
|
|
||||||
remaining seam; the rejects are already labelled by `outcome`/`keep`.
|
|
||||||
|
|
||||||
## Files
|
The training paths only run on a CUDA host, so the GPU stack is an extra:
|
||||||
|
|
||||||
- `taxonomy.py` - the axes of variation (task / provider / announcement /
|
|
||||||
doc-depth / difficulty) as pure data. No model, no container.
|
|
||||||
- `schema.py` - the `Scenario` bundle dataclasses + JSON (de)serialisation.
|
|
||||||
- `generate.py` - sample a combo, prompt a teacher model to author the bundle,
|
|
||||||
gate on the reference solution, write validated bundles to disk.
|
|
||||||
- `dashdocker.py` - the dash-in-Docker backend. `run(fixtures, script)` for the
|
|
||||||
one-shot reference gate; `session(fixtures)` for stateful rollouts, with
|
|
||||||
`Session.exec` (state-replayed), `.cwd()` (prompt building), `.check()` (Stage
|
|
||||||
C). Each command runs as its own `docker exec` (no tty buffering); cwd +
|
|
||||||
exported env are replayed between commands; `exit`/`panic` are intercepted as
|
|
||||||
terminals.
|
|
||||||
- `rollout.py` - Stage D. Rolls an operator model through a scenario in a fresh
|
|
||||||
container with only the disposable `SCAFFOLD`, records the turns
|
|
||||||
imperative-free (orientation + login + prompt/command/output, ending in the
|
|
||||||
terminal), verifies against final state, and classifies the outcome into a
|
|
||||||
`keep` decision. Multiple `--samples` per scenario for rejection sampling.
|
|
||||||
- `Dockerfile` - `sekft-dash`: alpine + dash, `/bin/sh` -> dash.
|
|
||||||
|
|
||||||
## Run
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
docker build -t sekft-dash . # the execution sandbox (once)
|
pipenv install # editable sekft + the local editable posix-sdc
|
||||||
|
pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box
|
||||||
SEKFT_MODEL=qwen2.5:32b \ # strong teacher via the litellm proxy
|
|
||||||
SEKFT_URL=http://localhost:4000/v1 \
|
|
||||||
SEKFT_KEY=sk-litellm-dev \
|
|
||||||
python generate.py --n 50 --out ./scenarios
|
|
||||||
|
|
||||||
SEKFT_OP_MODEL=qwen2.5:32b \ # operator (teacher in round 1, student in STaR)
|
|
||||||
python rollout.py --scenarios ./scenarios --out ./trajectories --samples 3
|
|
||||||
```
|
```
|
||||||
|
|
||||||
`rollout.py` writes one JSON per (scenario, sample) with the recorded turns and
|
`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile`
|
||||||
a `keep` flag. The keepers are the SFT set; the rejects (labelled by `outcome`)
|
overrides it with the local editable `../posix-sdc` for side-by-side development.
|
||||||
are Stage E's DPO negatives. Both stages run the model through the litellm
|
|
||||||
proxy; the rollout's container work is CPU/disk only.
|
|
||||||
|
|
||||||
When the `sekft-dash` image is present, `generate.py` runs each bundle's
|
## Use (on the GPU box)
|
||||||
reference solution in a fresh container and admits it only if its checker then
|
|
||||||
passes (real solvability gate). Without the image it falls back to a
|
|
||||||
**structural** dry-run that proves consistency, not solvability (`--no-docker`
|
|
||||||
forces this). The backend is verified end-to-end: `python dashdocker.py` runs a
|
|
||||||
self-test (fixtures, cwd/env replay, terminals).
|
|
||||||
|
|
||||||
## Non-negotiables (or the data rots)
|
```sh
|
||||||
|
# fine-tune an adapter on the posix-sdc trajectories
|
||||||
|
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
|
||||||
|
--out ./ckpt --load-4bit
|
||||||
|
|
||||||
- **Reference-solution gate is mandatory** once the runner exists: never admit
|
# inspect the assistant-only loss mask without training (runs anywhere)
|
||||||
a scenario whose own checker its reference solution cannot pass.
|
sekft-train --data ./trajectories --base <dir> --inspect
|
||||||
- **Verify effect, not claim**: the checker inspects container state.
|
|
||||||
- **Strip teacher prose** from recorded assistant turns (Stage D).
|
# behavioural eval on held-out scenario bundles (worlds, not trajectories)
|
||||||
- **Balance terminals**: enough `empty-queue` and `blocked -> panic` scenarios
|
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16
|
||||||
or the student learns "always exit success".
|
|
||||||
|
# resident loop: load the base once, cycle adapters without reloading it
|
||||||
|
sekft-resident --base <dir> --load-4bit
|
||||||
|
```
|
||||||
|
|
||||||
|
The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
|
||||||
|
verifies each in a fresh container), not trajectories.
|
||||||
|
|
||||||
|
## Result
|
||||||
|
|
||||||
|
Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
|
||||||
|
clean termination on archetype-level held-out scenarios from **0/16 (base) to
|
||||||
|
9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
|
||||||
|
types, while task competence stayed archetype-local. See the experiment
|
||||||
|
[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).
|
||||||
|
|
|
||||||
2
TODO
2
TODO
|
|
@ -124,7 +124,7 @@ Content-Type: application/issue
|
||||||
ID: 8
|
ID: 8
|
||||||
Type: feature
|
Type: feature
|
||||||
Title: Refresh docs for the packaged trainer
|
Title: Refresh docs for the packaged trainer
|
||||||
Status: in-progress
|
Status: done
|
||||||
Priority: medium
|
Priority: medium
|
||||||
Created: 2026-06-16
|
Created: 2026-06-16
|
||||||
Module: sekft
|
Module: sekft
|
||||||
|
|
|
||||||
|
|
@ -6,14 +6,15 @@ scenarios with NO scaffold (the trained behaviour must stand on its own), and
|
||||||
reports the rates that count: does it reach command-mode, does it terminate,
|
reports the rates that count: does it reach command-mode, does it terminate,
|
||||||
does the checker pass.
|
does the checker pass.
|
||||||
|
|
||||||
python eval.py --base <hf-dir> --adapter ./ckpt-mistral-r16 \
|
sekft-eval --base <hf-dir> --adapter ./ckpt-mistral-r16 \
|
||||||
--scenarios ./holdout-scenarios --n 10
|
--scenarios ./holdout-scenarios --n 10
|
||||||
|
|
||||||
Reuses the rollout loop with a *local* operator: the model formats and
|
Reuses the posix-sdc rollout loop with a *local* operator: the model renders and
|
||||||
generates in the same role-delimited render it was trained on (train == eval ==
|
generates with the same chat template it was trained on (train == eval == serve,
|
||||||
deploy, or the prompts go out of distribution). Prerequisites on the box: torch
|
via ``apply_chat_template`` + ``normalize_for_template``, or the prompts go out
|
||||||
+ transformers + peft, the ``sekft-dash`` image, and held-out SCENARIO bundles
|
of distribution). Prerequisites on the box: torch + transformers + peft, the
|
||||||
(from ``generate.py`` -- not trajectories; the eval stands up and verifies each).
|
``sekft-dash`` image, and held-out SCENARIO bundles from the posix-sdc factory
|
||||||
|
(not trajectories; the eval stands up and verifies each).
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -8,14 +8,14 @@ fresh LoRA adapter on the resident base and ``unload``s it back to clean; each
|
||||||
|
|
||||||
Interactive (IPython on the GPU box) is the intended use:
|
Interactive (IPython on the GPU box) is the intended use:
|
||||||
|
|
||||||
from resident import Resident
|
from tiararodney.sekft.resident import Resident
|
||||||
r = Resident("~/llm-models/mistral-7b-instruct-v0.2", load_4bit=True)
|
r = Resident("~/llm-models/mistral-7b-instruct-v0.2", load_4bit=True)
|
||||||
r.fit("~/sekft/trajectories", "~/sekft/ckpt-a", lora_r=16, lr=2e-4, epochs=3)
|
r.fit("~/sekft/trajectories", "~/sekft/ckpt-a", lora_r=16, lr=2e-4, epochs=3)
|
||||||
r.evaluate("~/sekft/ckpt-a", "~/sekft/holdout", n=10)
|
r.evaluate("~/sekft/ckpt-a", "~/sekft/holdout", n=10)
|
||||||
r.fit("~/sekft/trajectories", "~/sekft/ckpt-b", lora_r=32) # NO base reload
|
r.fit("~/sekft/trajectories", "~/sekft/ckpt-b", lora_r=32) # NO base reload
|
||||||
|
|
||||||
Or `python resident.py --base <dir> --selftest-data <stub_dir>` to prove the
|
Or `sekft-resident --base <dir> --selftest-data <stub_dir>` to prove the base
|
||||||
base loads once and two adapters train against it.
|
loads once and two adapters train against it.
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -17,8 +17,8 @@ system role and require strict user/assistant alternation. That same
|
||||||
canonicalisation must run on the serving side. Everything else is standard
|
canonicalisation must run on the serving side. Everything else is standard
|
||||||
causal-LM SFT with an assistant-only loss mask.
|
causal-LM SFT with an assistant-only loss mask.
|
||||||
|
|
||||||
python sft.py --data ./trajectories --base <hf-model-dir> --out ./ckpt
|
sekft-train --data ./trajectories --base <hf-model-dir> --out ./ckpt
|
||||||
python sft.py --data ./trajectories --base <dir> --inspect # mask stats, no training
|
sekft-train --data ./trajectories --base <dir> --inspect # mask stats, no training
|
||||||
|
|
||||||
Training needs torch + transformers + peft (a GPU box). ``--inspect`` and the
|
Training needs torch + transformers + peft (a GPU box). ``--inspect`` and the
|
||||||
normalize/mask helpers run anywhere a tokenizer with a chat template is
|
normalize/mask helpers run anywhere a tokenizer with a chat template is
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue