diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f5cfbc5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,89 @@
+# sekft
+
+Synthetic-trajectory generation for fine-tuning a model to operate a shell
+as a self-directed citizen: land with **no imperative**, discover where
+directives live, learn the provider from its own self-documentation, retrieve
+the directives, execute them, and terminate (`exit` on success, `panic` when
+genuinely blocked).
+
+The dataset teaches a **mechanism, not a program**. Every axis of a scenario
+is varied; only the four-step routine is held invariant:
+
+1. **expect an announcement** of where directives are (motd / banner / env / file)
+2. **understand the provider** via its self-documentation (`--help` / `man` / usage)
+3. **retrieve** the directives
+4. **execute**, then terminate
+
+Bind the *convention* (there is an announcement at entry; tools are
+self-documenting) and free everything else. A model that learns this tolerates
+an unstable userland because it re-learns the interface every session.
+
+## Pipeline
+
+```
+A. author     generate.py    model writes scenario bundles from the taxonomy
+   + ref-gate dashdocker.py  run the bundle's own reference solution; admit only if its checker passes
+B. rollout    rollout.py     scaffolded operator model acts in a fresh dash-in-docker container
+C. verify     rollout.py     run the checker against container STATE (effect, not transcript)
+D. record     rollout.py     strip the operator scaffold; save env<->action turns in deploy format
+E. pairs      [seam]         rejects from B/C become DPO negatives against keepers from the same scenario
+```
+
+This repo implements **A-D** plus the execution backend (`dashdocker.py`).
+Stage E (preference-pair assembly from the kept/rejected trajectories) is the
+remaining seam; the rejects are already labelled by `outcome`/`keep`.
+
+## Files
+
+- `taxonomy.py` - the axes of variation (task / provider / announcement /
+  doc-depth / difficulty) as pure data. No model, no container.
+- `schema.py` - the `Scenario` bundle dataclasses + JSON (de)serialisation.
+- `generate.py` - sample a combo, prompt a teacher model to author the bundle,
+  gate on the reference solution, write validated bundles to disk.
+- `dashdocker.py` - the dash-in-Docker backend. `run(fixtures, script)` for the
+  one-shot reference gate; `session(fixtures)` for stateful rollouts, with
+  `Session.exec` (state-replayed), `.cwd()` (prompt building), `.check()` (Stage
+  C). Each command runs as its own `docker exec` (no tty buffering); cwd +
+  exported env are replayed between commands; `exit`/`panic` are intercepted as
+  terminals.
+- `rollout.py` - Stages B-D. Rolls an operator model through a scenario in a
+  fresh container with only the disposable `SCAFFOLD`, records the turns
+  imperative-free (orientation + login + prompt/command/output, ending in the
+  terminal), verifies against final state, and classifies the outcome into a
+  `keep` decision. Multiple `--samples` per scenario for rejection sampling.
+- `Dockerfile` - `sekft-dash`: alpine + dash, `/bin/sh` -> dash.
+
+## Run
+
+```sh
+docker build -t sekft-dash .          # the execution sandbox (once)
+
+# strong teacher via the litellm proxy
+SEKFT_MODEL=qwen2.5:32b SEKFT_URL=http://localhost:4000/v1 \
+SEKFT_KEY=sk-litellm-dev \
+  python generate.py --n 50 --out ./scenarios
+
+# operator (teacher in round 1, student in STaR)
+SEKFT_OP_MODEL=qwen2.5:32b python rollout.py --scenarios ./scenarios --out ./trajectories --samples 3
+```
+
+`rollout.py` writes one JSON per (scenario, sample) with the recorded turns and
+a `keep` flag. The keepers are the SFT set; the rejects (labelled by `outcome`)
+are Stage E's DPO negatives. Both stages run the model through the litellm
+proxy; the rollout's container work is CPU/disk only.
+
+When the `sekft-dash` image is present, `generate.py` runs each bundle's
+reference solution in a fresh container and admits it only if its checker then
+passes (a real solvability gate). Without the image it falls back to a
+**structural** dry-run that proves consistency, not solvability (`--no-docker`
+forces this). The backend is verified end-to-end: `python dashdocker.py` runs a
+self-test (fixtures, cwd/env replay, terminals).
+
+## Non-negotiables (or the data rots)
+
+- **Reference-solution gate is mandatory** once the runner exists: never admit
+  a scenario whose reference solution fails its own checker.
+- **Verify effect, not claim**: the checker inspects container state.
+- **Strip teacher prose** from recorded assistant turns (Stage D).
+- **Balance terminals**: include enough `empty-queue` and `blocked -> panic`
+  scenarios, or the student learns "always exit success".
diff --git a/TODO b/TODO
index a1406b9..4bef28b 100644
--- a/TODO
+++ b/TODO
@@ -81,7 +81,7 @@ Content-Type: application/issue
 ID: 5
 Type: feature
 Title: Pipeline overview README
-Status: in-progress
+Status: done
 Priority: medium
 Created: 2026-06-16
 Module: sekft
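
Note on the Stage E seam described in the README: since `rollout.py` writes one JSON per (scenario, sample) with a `keep` flag and an `outcome` label, pair assembly could be sketched roughly as below. This is a minimal illustration and not part of this commit. The `scenario_id` field, the function names, the `chosen`/`rejected` output keys, and the sample `outcome` values are all assumptions; only `keep` and `outcome` come from the README.

```python
import json
from collections import defaultdict
from itertools import product
from pathlib import Path


def load_trajectories(directory):
    """Read every *.json file in `directory` into a list of dicts."""
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def pair_trajectories(records, scenario_key="scenario_id"):
    """Pair each rejected trajectory with each keeper from the same scenario.

    `keep` is the flag named in the README; `scenario_key` is an assumed
    field name that this commit does not confirm.
    """
    by_scenario = defaultdict(lambda: {"kept": [], "rejected": []})
    for rec in records:
        bucket = "kept" if rec.get("keep") else "rejected"
        by_scenario[rec[scenario_key]][bucket].append(rec)

    pairs = []
    for scenario, groups in by_scenario.items():
        # Scenarios with only keepers or only rejects yield no pairs.
        for chosen, rejected in product(groups["kept"], groups["rejected"]):
            pairs.append({"scenario": scenario, "chosen": chosen, "rejected": rejected})
    return pairs


if __name__ == "__main__":
    # Hypothetical records; the outcome strings are placeholders.
    demo = [
        {"scenario_id": "s1", "keep": True, "outcome": "success"},
        {"scenario_id": "s1", "keep": False, "outcome": "wrong-terminal"},
        {"scenario_id": "s2", "keep": False, "outcome": "timeout"},
    ]
    for pair in pair_trajectories(demo):
        print(pair["scenario"], pair["chosen"]["outcome"], "vs", pair["rejected"]["outcome"])
```

Pairing every keeper with every reject is the simplest policy; with `--samples 3` it yields at most two pairs per scenario, and scenarios with no keeper (such as `s2` in the demo) contribute nothing. A real Stage E might instead cap or weight pairs by `outcome`.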