Author SHA1 Message Date
Tiara Rodney
414e963825
feat(12): load training data from a raw dir, a curated jsonl, or the Hub
iter_keepers read only raw per-trajectory .json files, just one of three input shapes.
Add load_turns(data, hub, revision) yielding assistant-bearing turns from a raw
rollout dir (keep-filtered), a curated .jsonl corpus (one record per line), or
the published corpus via posix-sdc's load_trajectories (the in-repo data/ of a
checkout, else the Hugging Face Hub). sekft-train gains --hub and --revision and
dispatches --data by directory vs. .jsonl file; train() and inspect() both use load_turns.
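
A minimal sketch of what the dir-vs-.jsonl dispatch could look like. The record layout (a "keep" flag, a "turns" list, a "role" key per turn) and the name load_turns_sketch are assumptions for illustration, not the actual sekft code, and the Hub branch is left out here.

```python
import json
from pathlib import Path
from typing import Iterator


def _assistant_turns(record: dict) -> Iterator[dict]:
    # Assumed layout: record["turns"] is a list of {"role": ..., "content": ...}.
    for turn in record.get("turns", []):
        if turn.get("role") == "assistant":
            yield turn


def load_turns_sketch(data: str) -> Iterator[dict]:
    """Yield assistant turns from a raw rollout dir or a curated .jsonl file."""
    path = Path(data)
    if path.is_dir():
        # Raw rollout dir: one .json file per trajectory, keep-filtered.
        for file in sorted(path.glob("*.json")):
            record = json.loads(file.read_text(encoding="utf-8"))
            if record.get("keep", False):  # hypothetical keep flag
                yield from _assistant_turns(record)
    elif path.suffix == ".jsonl":
        # Curated corpus: one record per line, blank lines skipped.
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield from _assistant_turns(json.loads(line))
    else:
        raise ValueError(f"--data must be a directory or a .jsonl file: {data}")
```

Because the function is a generator, the ValueError for an unsupported path only surfaces once the caller starts iterating.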

Raw-rollout reading stays sekft-local; curated + Hub reuse posix-sdc's loader,
imported lazily so the raw/jsonl paths need neither posix-sdc nor huggingface_hub
installed. Unit tests cover the raw-dir and jsonl dispatch.
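
The lazy import mentioned above could be sketched roughly as follows. The module name "posix_sdc" and the revision keyword passed to load_trajectories are assumptions; the commit message names the loader but not its import path or signature.

```python
import importlib
from typing import Any, Optional


def load_published_sketch(revision: Optional[str] = None,
                          module_name: str = "posix_sdc") -> Any:
    """Import the published-corpus loader only when this path is taken."""
    try:
        # Deferred import: the raw-dir and .jsonl paths never reach this line,
        # so they work without the optional dependency installed.
        module = importlib.import_module(module_name)  # assumed module name
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for the Hub/published-corpus path"
        ) from exc
    # Assumed signature: load_trajectories(revision=...).
    return module.load_trajectories(revision=revision)
```

Keeping the import inside the function means a missing optional package shows up as a clear error on the one path that needs it, rather than as an import failure at module load.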
2026-06-18 00:05:27 +02:00