docs: add pipeline overview
parent 12f10854c3
commit ea31b7fceb
1 changed file with 89 additions and 0 deletions
README.md (Normal file, 89 lines)
@@ -0,0 +1,89 @@

# sekft

Synthetic-trajectory generation for fine-tuning a model to operate a shell
as a self-directed citizen: land with **no imperative**, discover where
directives live, learn the provider from its own self-documentation, retrieve
the directives, execute them, and terminate (`exit` on success, `panic` when
genuinely blocked).

The dataset teaches a **mechanism, not a program**. Every axis of a scenario
is varied; only the four-step routine is held invariant:

1. **expect an announcement** of where directives are (motd / banner / env / file)
2. **understand the provider** via its self-documentation (`--help` / `man` / usage)
3. **retrieve** the directives
4. **execute**, then terminate

Bind the *convention* (there is an announcement at entry; tools are
self-documenting), free everything else. The model that learns this tolerates
an unstable userland because it re-learns the interface every session.

## Pipeline

```
A. author      generate.py    model writes scenario bundles from the taxonomy
   + ref-gate  dashdocker.py  run the bundle's own reference solution; admit only if its checker passes
B. rollout     rollout.py     scaffolded operator model acts in a fresh dash-in-docker container
C. verify      rollout.py     run the checker against container STATE (effect, not transcript)
D. record      rollout.py     strip the operator scaffold; save env<->action turns in deploy format
E. pairs       [seam]         rejects from B/C become DPO negatives against keepers from the same scenario
```

This repo implements **A-D** plus the execution backend (`dashdocker.py`).
Stage E (preference-pair assembly from the kept/rejected trajectories) is the
remaining seam; the rejects are already labelled by `outcome`/`keep`.
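
Since Stage E is not implemented here, the following is only a sketch of what pair assembly could look like. The `keep` and `outcome` fields come from the description above; the `scenario_id` and `turns` field names, the outcome labels in the example, and the function name `build_pairs` are assumptions made for illustration, not the repo's actual schema.

```python
from collections import defaultdict
from itertools import product


def build_pairs(trajectories):
    """Pair every keeper with every reject from the same scenario.

    Each trajectory is a dict with (assumed) keys:
    scenario_id, keep, outcome, turns.
    """
    by_scenario = defaultdict(lambda: {"chosen": [], "rejected": []})
    for traj in trajectories:
        side = "chosen" if traj["keep"] else "rejected"
        by_scenario[traj["scenario_id"]][side].append(traj)

    pairs = []
    for scenario_id, group in by_scenario.items():
        # Scenarios with only keepers or only rejects yield no pairs.
        for chosen, rejected in product(group["chosen"], group["rejected"]):
            pairs.append({
                "scenario_id": scenario_id,
                "chosen": chosen["turns"],
                "rejected": rejected["turns"],
                "rejected_outcome": rejected["outcome"],
            })
    return pairs
```

Pairing only within a scenario keeps the environment identical on both sides, so the preference signal reflects the operator's behaviour and not scenario difficulty.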

## Files

- `taxonomy.py` - the axes of variation (task / provider / announcement /
  doc-depth / difficulty) as pure data. No model, no container.
- `schema.py` - the `Scenario` bundle dataclasses + JSON (de)serialisation.
- `generate.py` - sample a combo, prompt a teacher model to author the bundle,
  gate on the reference solution, write validated bundles to disk.
- `dashdocker.py` - the dash-in-Docker backend. `run(fixtures, script)` for the
  one-shot reference gate; `session(fixtures)` for stateful rollouts, with
  `Session.exec` (state-replayed), `.cwd()` (prompt building), `.check()` (Stage
  C). Each command runs as its own `docker exec` (no tty buffering); cwd +
  exported env are replayed between commands; `exit`/`panic` are intercepted as
  terminals.
- `rollout.py` - Stages B-D. Rolls an operator model through a scenario in a
  fresh container with only the disposable `SCAFFOLD`, records the turns
  imperative-free (orientation + login + prompt/command/output, ending in the
  terminal), verifies against final state, and classifies the outcome into a
  `keep` decision. Multiple `--samples` per scenario for rejection sampling.
- `Dockerfile` - `sekft-dash`: alpine + dash, `/bin/sh` -> dash.
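
The state replay described for `dashdocker.py` can be pictured with a small sketch. This is not the repo's code: `exec_argv` and `terminal_of` are hypothetical helpers, and the sketch assumes the replay is done through the `--workdir` and `--env` options of `docker exec`, which is only one possible way to do it. How the real backend captures cwd and exported variables after each command is not shown here.

```python
TERMINALS = ("exit", "panic")


def terminal_of(command):
    """Return 'exit' or 'panic' if the command is a terminal, else None.

    Naive assumption: a terminal is recognised by its first word only.
    """
    words = command.split()
    if words and words[0] in TERMINALS:
        return words[0]
    return None


def exec_argv(container, command, cwd, env):
    """Build the argv for one stateless `docker exec`, replaying cwd and env."""
    argv = ["docker", "exec", "--workdir", cwd]
    for key in sorted(env):
        argv += ["--env", f"{key}={env[key]}"]
    # The command goes through /bin/sh (dash in the sekft-dash image).
    argv += [container, "/bin/sh", "-c", command]
    return argv
```

A caller would check `terminal_of(command)` first and end the session without touching the container when it returns a value; otherwise it would pass the result of `exec_argv(...)` to something like `subprocess.run`.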

## Run

```sh
docker build -t sekft-dash .   # the execution sandbox (once)

# strong teacher via the litellm proxy
SEKFT_MODEL=qwen2.5:32b \
SEKFT_URL=http://localhost:4000/v1 \
SEKFT_KEY=sk-litellm-dev \
python generate.py --n 50 --out ./scenarios

# operator (teacher in round 1, student in STaR)
SEKFT_OP_MODEL=qwen2.5:32b \
python rollout.py --scenarios ./scenarios --out ./trajectories --samples 3
```

`rollout.py` writes one JSON per (scenario, sample) with the recorded turns and
a `keep` flag. The keepers are the SFT set; the rejects (labelled by `outcome`)
are Stage E's DPO negatives. Both stages run the model through the litellm
proxy; the rollout's container work is CPU/disk only.
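
As a rough illustration of consuming that output, the sketch below splits a trajectory directory into keepers and rejects. It assumes the JSON files sit directly in the output directory with a `.json` suffix and carry a top-level `keep` key; the function name `partition` is hypothetical.

```python
import json
from pathlib import Path


def partition(trajectory_dir):
    """Split recorded trajectories into (keepers, rejects) by their keep flag."""
    keepers, rejects = [], []
    for path in sorted(Path(trajectory_dir).glob("*.json")):
        record = json.loads(path.read_text())
        # A missing keep flag is treated as a reject (assumption).
        (keepers if record.get("keep") else rejects).append(record)
    return keepers, rejects
```

For example, `keepers, rejects = partition("./trajectories")` would give the SFT set and the pool of candidate negatives for Stage E.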

When the `sekft-dash` image is present, `generate.py` runs each bundle's
reference solution in a fresh container and admits it only if its checker then
passes (real solvability gate). Without the image it falls back to a
**structural** dry-run that proves consistency, not solvability (`--no-docker`
forces this). The backend is verified end-to-end: `python dashdocker.py` runs a
self-test (fixtures, cwd/env replay, terminals).
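
The two admission modes can be sketched as a single decision function. Everything named here is an assumption for illustration: the bundle keys (`fixtures`, `reference_solution`, `checker`), the `run_reference` callable standing in for the container run, and the function `admit` itself. The real structural dry-run presumably checks more than key presence.

```python
from typing import Callable, Optional, Tuple

REQUIRED_KEYS = ("fixtures", "reference_solution", "checker")  # hypothetical


def admit(
    bundle: dict,
    run_reference: Optional[Callable[[dict], bool]] = None,
) -> Tuple[bool, str]:
    """Return (admitted, gate_kind) for one scenario bundle.

    run_reference runs the reference solution in a fresh container and
    returns whether the checker passed; None means no container backend.
    """
    if run_reference is not None:
        # Real gate: the bundle must be solvable by its own reference.
        return bool(run_reference(bundle)), "solvability"
    # Fallback: structural consistency only, which says nothing about solvability.
    complete = all(bundle.get(key) for key in REQUIRED_KEYS)
    return complete, "structural"
```

Returning the gate kind alongside the decision lets downstream code tell bundles that passed the real gate from those that only passed the dry-run.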

## Non-negotiables (or the data rots)

- **Reference-solution gate is mandatory** once the runner exists: never admit
  a scenario whose own checker its reference solution cannot pass.
- **Verify effect, not claim**: the checker inspects container state.
- **Strip teacher prose** from recorded assistant turns (Stage D).
- **Balance terminals**: enough `empty-queue` and `blocked -> panic` scenarios
  or the student learns "always exit success".
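
The terminal-balance rule lends itself to an automated check. The sketch below is one possible form, assuming each scenario record carries a `terminal` label; the label values, the `min_share` threshold of 10%, and the function names are illustrative choices, not values taken from the repo.

```python
from collections import Counter


def terminal_shares(scenarios):
    """Map each terminal label to its fraction of the scenario set."""
    counts = Counter(s["terminal"] for s in scenarios)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(scenarios, expected, min_share=0.1):
    """List the expected terminal labels whose share falls below min_share."""
    shares = terminal_shares(scenarios) if scenarios else {}
    return sorted(label for label in expected if shares.get(label, 0.0) < min_share)
```

Running `underrepresented(scenarios, ["success", "empty-queue", "blocked"])` before a rollout batch would flag a set that risks teaching "always exit success".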