2026-06-16 20:15:14 +02:00
src/tiararodney/sekft feat: add resident-base train and eval harness 2026-06-16 20:14:47 +02:00
.gitignore chore: initial commit 2026-06-16 20:13:14 +02:00
Dockerfile chore: initial commit 2026-06-16 20:13:14 +02:00
Pipfile chore: pin posix-sdc as a local editable dependency 2026-06-16 20:13:49 +02:00
pyproject.toml feat: scaffold installable namespace package 2026-06-16 20:13:48 +02:00
README.md docs: add pipeline overview 2026-06-16 20:15:12 +02:00
TODO todo(5): done 2026-06-16 20:15:13 +02:00
tox.ini chore: add tox lint, format and test environments 2026-06-16 20:13:49 +02:00

sekft

Synthetic-trajectory generation for fine-tuning a model to operate a shell as a self-directed citizen: land with no imperative, discover where directives live, learn the provider from its own self-documentation, retrieve the directives, execute them, and terminate (exit on success, panic when genuinely blocked).

The dataset teaches a mechanism, not a program. Every axis of a scenario is varied; only the four-step routine is held invariant:

  1. expect an announcement of where directives are (motd / banner / env / file)
  2. understand the provider via its self-documentation (--help / man / usage)
  3. retrieve the directives
  4. execute, then terminate

Bind the convention (there is an announcement at entry; tools are self-documenting), free everything else. The model that learns this tolerates an unstable userland because it re-learns the interface every session.
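
As a rough illustration of "vary every axis, hold the routine fixed", the sketch below samples one value per axis while the four-step routine stays a constant. The axis names and most values are assumptions made for the example (only the announcement and documentation channels are listed above); the real taxonomy.py may be organised differently.

```python
import random

# Hypothetical axes. Only the announcement and documentation channels are
# named in this README; the difficulty values are placeholders.
AXES = {
    "announcement": ["motd", "banner", "env", "file"],
    "doc_channel": ["--help", "man", "usage"],
    "difficulty": ["easy", "medium", "hard"],
}

# The invariant routine: never sampled, identical for every scenario.
ROUTINE = (
    "expect announcement",
    "read self-documentation",
    "retrieve directives",
    "execute and terminate",
)


def sample_combo(rng):
    """Pick one value per axis; the routine itself is not part of the draw."""
    return {axis: rng.choice(values) for axis, values in AXES.items()}


combo = sample_combo(random.Random(0))
scenario = {"axes": combo, "routine": ROUTINE}
```

Passing a seeded random.Random keeps a sampled batch reproducible, which helps when a rejected bundle has to be regenerated.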

Pipeline

A. author       generate.py     model writes scenario bundles from the taxonomy
   + ref-gate   dashdocker.py   run the bundle's own reference solution; admit only if its checker passes
B. rollout      rollout.py      scaffolded operator model acts in a fresh dash-in-docker container
C. verify       rollout.py      run the checker against container STATE (effect, not transcript)
D. record       rollout.py      strip the operator scaffold; save env<->action turns in deploy format
E. pairs        [seam]          rejects from B/C become DPO negatives against keepers from the same scenario

This repo implements A-D plus the execution backend (dashdocker.py). Stage E (preference-pair assembly from the kept/rejected trajectories) is the remaining seam; the rejects are already labelled by outcome/keep.
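
Stage E is not implemented here, so the following is only a sketch of what pair assembly could look like. It assumes each trajectory record carries a `scenario` identifier, a boolean `keep` flag and a `turns` list; those field names are hypothetical and may not match what rollout.py actually writes.

```python
from collections import defaultdict
from itertools import product


def assemble_pairs(records):
    """Pair every kept trajectory with every rejected one from the same scenario."""
    by_scenario = defaultdict(lambda: {"keep": [], "reject": []})
    for rec in records:
        bucket = "keep" if rec["keep"] else "reject"
        by_scenario[rec["scenario"]][bucket].append(rec)

    pairs = []
    for scenario, group in by_scenario.items():
        # A scenario with no keeper (or no reject) contributes no pairs.
        for chosen, rejected in product(group["keep"], group["reject"]):
            pairs.append(
                {
                    "scenario": scenario,
                    "chosen": chosen["turns"],
                    "rejected": rejected["turns"],
                }
            )
    return pairs


# Hypothetical records: field names and outcome labels are illustrative.
records = [
    {"scenario": "s1", "keep": True, "outcome": "success", "turns": ["a"]},
    {"scenario": "s1", "keep": False, "outcome": "wrong-state", "turns": ["b"]},
    {"scenario": "s2", "keep": False, "outcome": "timeout", "turns": ["c"]},
]
pairs = assemble_pairs(records)
```

Restricting pairs to a single scenario keeps the environment identical on both sides, so the preference signal comes from the operator's behaviour rather than from scenario difficulty.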

Files

  • taxonomy.py - the axes of variation (task / provider / announcement / doc-depth / difficulty) as pure data. No model, no container.
  • schema.py - the Scenario bundle dataclasses + JSON (de)serialisation.
  • generate.py - sample a combo, prompt a teacher model to author the bundle, gate on the reference solution, write validated bundles to disk.
  • dashdocker.py - the dash-in-Docker backend. run(fixtures, script) serves the one-shot reference gate; session(fixtures) serves stateful rollouts and exposes Session.exec (state-replayed), .cwd() (for prompt building) and .check() (Stage C). Each command runs as its own docker exec, so there is no tty buffering; cwd and exported env are replayed between commands, and exit/panic are intercepted as terminals.
  • rollout.py - Stages B-D. Rolls an operator model through a scenario in a fresh container with only the disposable SCAFFOLD, records the turns imperative-free (orientation + login + prompt/command/output, ending in the terminal), verifies against final state, and classifies the outcome into a keep decision. Accepts multiple --samples per scenario for rejection sampling.
  • Dockerfile - sekft-dash: alpine + dash, /bin/sh -> dash.
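
To make the "one docker exec per command, with state replayed" idea concrete, here is a minimal sketch that only builds the argument vector and detects terminals; it does not start a container. It assumes state is replayed through docker exec's --workdir and --env options, whereas the real dashdocker.py may instead prefix cd/export lines to the script. The function names are hypothetical.

```python
TERMINALS = {"exit", "panic"}


def classify(command):
    """Return 'exit' or 'panic' if the command is a terminal, else None."""
    words = command.split()
    return words[0] if words and words[0] in TERMINALS else None


def build_exec_argv(container, cwd, env, command):
    """Build one `docker exec` call that replays the tracked cwd and exported env."""
    argv = ["docker", "exec", "--workdir", cwd]
    for name, value in sorted(env.items()):
        argv += ["--env", f"{name}={value}"]
    # Each command gets a fresh /bin/sh (dash in the sekft-dash image).
    argv += [container, "/bin/sh", "-c", command]
    return argv


argv = build_exec_argv("c1", "/tmp", {"B": "2", "A": "1"}, "ls")
```

A caller would check classify(command) first and end the episode on a terminal, instead of passing it to the container.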

Run

docker build -t sekft-dash .              # the execution sandbox (once)

# strong teacher via the litellm proxy
SEKFT_MODEL=qwen2.5:32b \
SEKFT_URL=http://localhost:4000/v1 \
SEKFT_KEY=sk-litellm-dev \
  python generate.py --n 50 --out ./scenarios

# operator (teacher in round 1, student in STaR)
SEKFT_OP_MODEL=qwen2.5:32b \
  python rollout.py --scenarios ./scenarios --out ./trajectories --samples 3

rollout.py writes one JSON per (scenario, sample) with the recorded turns and a keep flag. The keepers are the SFT set; the rejects (labelled by outcome) are Stage E's DPO negatives. Both stages run the model through the litellm proxy; the rollout's container work is CPU/disk only.

When the sekft-dash image is present, generate.py runs each bundle's reference solution in a fresh container and admits it only if its checker then passes (real solvability gate). Without the image it falls back to a structural dry-run that proves consistency, not solvability (--no-docker forces this). The backend is verified end-to-end: python dashdocker.py runs a self-test (fixtures, cwd/env replay, terminals).
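
The two admission modes can be sketched as follows. This is an assumption-laden illustration, not the logic in generate.py: the bundle keys (`fixtures`, `reference`, `checker`) and the `run` callable are hypothetical, with `run(bundle)` standing in for "execute the reference solution in a fresh container, then report whether the checker passed".

```python
REQUIRED_KEYS = ("fixtures", "reference", "checker")


def gate(bundle, run=None):
    """Return (admitted, mode) for a scenario bundle.

    With a runner, admission means the reference solution passed the checker.
    Without one, only structural consistency is established.
    """
    if any(key not in bundle for key in REQUIRED_KEYS):
        return False, "structural"
    if run is None:
        # Dry run: the bundle is well-formed, but solvability is unproven.
        return True, "structural"
    return bool(run(bundle)), "solvability"


bundle = {
    "fixtures": {},
    "reference": "touch /done",
    "checker": "test -f /done",
}
dry = gate(bundle)
passed = gate(bundle, run=lambda b: True)
failed = gate(bundle, run=lambda b: False)
```

Returning the mode alongside the decision lets downstream stages tell a structurally admitted bundle apart from one whose solvability was actually proven.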

Non-negotiables (or the data rots)

  • Reference-solution gate is mandatory once the runner exists: never admit a scenario whose reference solution cannot pass its own checker.
  • Verify effect, not claim: the checker inspects container state.
  • Strip teacher prose from recorded assistant turns (Stage D).
  • Balance terminals: enough empty-queue and blocked -> panic scenarios or the student learns "always exit success".
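
A balance check on the terminals could be as simple as the sketch below. It assumes each record has a `terminal` field holding "exit" or "panic", and the 20% floor is an arbitrary placeholder rather than a value taken from this project.

```python
from collections import Counter


def terminal_shares(records):
    """Fraction of records ending in each terminal."""
    counts = Counter(rec["terminal"] for rec in records)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def is_balanced(records, min_panic_share=0.2):
    """True if panic terminals make up at least `min_panic_share` of the set."""
    return terminal_shares(records).get("panic", 0.0) >= min_panic_share


# Hypothetical sample: three successful exits and one blocked -> panic.
sample = [{"terminal": "exit"}] * 3 + [{"terminal": "panic"}]
shares = terminal_shares(sample)
```

The same counting approach extends to empty-queue scenarios if records also carry a label for them.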