sekft
Synthetic-trajectory generation for fine-tuning a model to operate a shell
as a self-directed citizen: land with no imperative, discover where
directives live, learn the provider from its own self-documentation, retrieve
the directives, execute them, and terminate (exit on success, panic when
genuinely blocked).
The dataset teaches a mechanism, not a program. Every axis of a scenario is varied; only the four-step routine is held invariant:
- expect an announcement of where directives are (motd / banner / env / file)
- understand the provider via its self-documentation (--help / man / usage)
- retrieve the directives
- execute, then terminate
Bind the convention (there is an announcement at entry; tools are self-documenting), free everything else. The model that learns this tolerates an unstable userland because it re-learns the interface every session.
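As an illustration of "vary every axis, hold the routine invariant", here is a minimal sketch of sampling one scenario combo from axes held as pure data. It is not the contents of taxonomy.py: only the announcement channels and doc depths come from this README, while the task, provider and difficulty values and the name sample_combo are hypothetical placeholders.

```python
import random

# Axes as pure data. Only "announcement" and "doc_depth" values appear in
# this README; every other value is a hypothetical placeholder.
AXES = {
    "task": ["rotate-logs", "fix-permissions", "drain-queue"],
    "provider": ["cli-tool", "script", "shell-function"],
    "announcement": ["motd", "banner", "env", "file"],
    "doc_depth": ["--help", "man", "usage"],
    "difficulty": ["easy", "medium", "hard"],
}

# The invariant routine: never sampled, identical in every scenario.
ROUTINE = (
    "expect the announcement",
    "read the provider's self-documentation",
    "retrieve the directives",
    "execute, then terminate",
)


def sample_combo(rng):
    """Pick one value per axis, each axis independently of the others."""
    return {axis: rng.choice(values) for axis, values in AXES.items()}


combo = sample_combo(random.Random(0))  # seeded for reproducibility
```

Passing a seeded random.Random keeps a generation run reproducible while still covering the axes independently.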
Pipeline
A. author      generate.py    model writes scenario bundles from the taxonomy
   + ref-gate  dashdocker.py  run the bundle's own reference solution; admit only if its checker passes
B. rollout     rollout.py     scaffolded operator model acts in a fresh dash-in-docker container
C. verify      rollout.py     run the checker against container STATE (effect, not transcript)
D. record      rollout.py     strip the operator scaffold; save env<->action turns in deploy format
E. pairs       [seam]         rejects from B/C become DPO negatives against keepers from the same scenario
This repo implements A-D plus the execution backend (dashdocker.py).
Stage E (preference-pair assembly from the kept/rejected trajectories) is the
remaining seam; the rejects are already labelled by outcome/keep.
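Since Stage E is not implemented here, the following is only a sketch of how the seam could be closed: pair every keeper with every reject from the same scenario. The record keys (scenario, keep, outcome, turns) and the function name build_pairs are assumptions for illustration, not the format rollout.py actually writes.

```python
from collections import defaultdict
from itertools import product


def build_pairs(trajectories):
    """Pair each kept trajectory with each rejected one from the same scenario.

    Assumes each record is a dict with hypothetical keys:
    "scenario", "keep" (bool), "outcome" (str) and "turns" (list).
    """
    by_scenario = defaultdict(lambda: {"kept": [], "rejected": []})
    for record in trajectories:
        bucket = "kept" if record["keep"] else "rejected"
        by_scenario[record["scenario"]][bucket].append(record)

    pairs = []
    for scenario, group in by_scenario.items():
        # A scenario with no keeper (or no reject) yields no pairs.
        for chosen, rejected in product(group["kept"], group["rejected"]):
            pairs.append({
                "scenario": scenario,
                "chosen": chosen["turns"],
                "rejected": rejected["turns"],
                "rejected_outcome": rejected["outcome"],
            })
    return pairs
```

Restricting pairs to a single scenario keeps the entry state identical on both sides, so the preference signal comes from the actions alone.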
Files
- taxonomy.py - the axes of variation (task / provider / announcement / doc-depth / difficulty) as pure data. No model, no container.
- schema.py - the Scenario bundle dataclasses + JSON (de)serialisation.
- generate.py - sample a combo, prompt a teacher model to author the bundle, gate on the reference solution, write validated bundles to disk.
- dashdocker.py - the dash-in-Docker backend. run(fixtures, script) for the one-shot reference gate; session(fixtures) for stateful rollouts, with Session.exec (state-replayed), .cwd() (prompt building), .check() (Stage C). Each command runs as its own docker exec (no tty buffering); cwd + exported env are replayed between commands; exit/panic are intercepted as terminals.
- rollout.py - Stages B-D. Rolls an operator model through a scenario in a fresh container with only the disposable SCAFFOLD, records the turns imperative-free (orientation + login + prompt/command/output, ending in the terminal), verifies against final state, and classifies the outcome into a keep decision. Multiple --samples per scenario for rejection sampling.
- Dockerfile - sekft-dash: alpine + dash, /bin/sh -> dash.
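To make the state-replay idea concrete: because every command is a separate docker exec, shell state such as the working directory and exported variables is lost unless it is re-established on each call. The sketch below shows one way that could look. It is not the code in dashdocker.py; replayed_argv and is_terminal are hypothetical names, and environment variable names are assumed to be valid shell identifiers.

```python
import shlex


def replayed_argv(container, cwd, env, command):
    """Build a `docker exec` argv that restores cwd and exported env first.

    Hypothetical helper: the real backend may replay state differently.
    Values are quoted with shlex.quote; variable names are assumed to be
    valid shell identifiers and are not validated here.
    """
    exports = "".join(
        f"export {name}={shlex.quote(value)}; "
        for name, value in sorted(env.items())
    )
    script = f"{exports}cd {shlex.quote(cwd)} && {command}"
    # No -t flag: without a tty there is no terminal buffering to fight.
    return ["docker", "exec", container, "/bin/sh", "-c", script]


def is_terminal(command):
    """True if the command's first word is `exit` or `panic`, which the
    session intercepts instead of forwarding to the container."""
    words = command.strip().split()
    return bool(words) and words[0] in ("exit", "panic")
```

For example, replayed_argv("c1", "/srv/app", {"QUEUE": "a b"}, "ls") ends in the script `export QUEUE='a b'; cd /srv/app && ls`, so the command sees the same directory and environment the previous one left behind.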
Run
# the execution sandbox (once)
docker build -t sekft-dash .
# strong teacher via the litellm proxy
SEKFT_MODEL=qwen2.5:32b \
SEKFT_URL=http://localhost:4000/v1 \
SEKFT_KEY=sk-litellm-dev \
python generate.py --n 50 --out ./scenarios
# operator (teacher in round 1, student in STaR)
SEKFT_OP_MODEL=qwen2.5:32b \
python rollout.py --scenarios ./scenarios --out ./trajectories --samples 3
rollout.py writes one JSON per (scenario, sample) with the recorded turns and
a keep flag. The keepers are the SFT set; the rejects (labelled by outcome)
are Stage E's DPO negatives. Both stages run the model through the litellm
proxy; the rollout's container work is CPU/disk only.
When the sekft-dash image is present, generate.py runs each bundle's
reference solution in a fresh container and admits it only if its checker then
passes (real solvability gate). Without the image it falls back to a
structural dry-run that proves consistency, not solvability (--no-docker
forces this). The backend is verified end-to-end: python dashdocker.py runs a
self-test (fixtures, cwd/env replay, terminals).
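The gate-or-fallback decision described above can be sketched as follows. This is an assumption-laden outline, not generate.py itself: image_present, gate, run_reference and structural_check are hypothetical names, and the two checks are passed in as callables so the decision logic stays independent of Docker.

```python
import shutil
import subprocess


def image_present(image="sekft-dash"):
    """True if the docker CLI exists and the image is available locally."""
    if shutil.which("docker") is None:
        return False
    probe = subprocess.run(
        ["docker", "image", "inspect", image], capture_output=True
    )
    return probe.returncode == 0


def gate(bundle, have_image, no_docker, run_reference, structural_check):
    """Admit a bundle via the real gate when possible, else the dry-run.

    run_reference(bundle) should run the reference solution in a fresh
    container and return whether the checker passed (solvability).
    structural_check(bundle) only proves internal consistency.
    """
    if have_image and not no_docker:
        return {"mode": "reference", "admitted": bool(run_reference(bundle))}
    return {"mode": "dry-run", "admitted": bool(structural_check(bundle))}
```

Recording the mode next to the verdict makes it possible to tell later which bundles were proven solvable and which were only checked for consistency.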
Non-negotiables (or the data rots)
- Reference-solution gate is mandatory once the runner exists: never admit a scenario whose own checker its reference solution cannot pass.
- Verify effect, not claim: the checker inspects container state.
- Strip teacher prose from recorded assistant turns (Stage D).
- Balance terminals: include enough empty-queue and blocked -> panic scenarios, or the student learns "always exit success".
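A quick way to watch the terminal balance is to count scenario kinds before training. The sketch below assumes each scenario carries a hypothetical "terminal" label with one of three illustrative values; neither the labels nor the 15% floor come from this repo.

```python
from collections import Counter

# Hypothetical labels for the three kinds of ending named in this README.
EXPECTED = ("success-exit", "empty-queue", "blocked-panic")


def terminal_shares(scenarios):
    """Fraction of scenarios per terminal kind (empty dict if no scenarios)."""
    counts = Counter(s["terminal"] for s in scenarios)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


def underrepresented(scenarios, floor=0.15):
    """Expected kinds whose share is below `floor` (an assumed threshold)."""
    shares = terminal_shares(scenarios)
    return sorted(k for k in EXPECTED if shares.get(k, 0.0) < floor)
```

With 8 success-exit, 1 empty-queue and 1 blocked-panic scenario, both minority kinds sit at 10% and are flagged, which is the point at which more of them should be generated.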