todo(12): open

Tiara Rodney 2026-06-18 00:03:10 +02:00
parent 2209ade52c
commit d47ba8a56e
Signed by: tiara
GPG key ID: 5CD8EC1D46106723

TODO

@@ -191,3 +191,26 @@ Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
and resident.py:157. Wrap the predicate in bool() so it counts
trajectories that operated and are clean, fixing both the type
error and the latent crash.
--ISSUE
Content-Type: application/issue
ID: 12
Type: feature
Title: load training data from a raw dir, a curated jsonl, or the Hub
Status: open
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: iter_keepers reads only raw per-trajectory .json - one of three
input shapes the trainer should accept. Add load_turns(data, hub,
revision) that yields assistant-bearing turns from: a directory of
raw rollout .json (keep-filtered, today's iter_keepers); a curated
.jsonl corpus file (already keep-filtered, yield turns per line);
or the published corpus via posix-sdc's load_trajectories (local
data/ in a checkout, else the Hub). sekft-train gains --hub and
--revision; --data dispatches by dir-vs-.jsonl. Raw-rollout reading
stays sekft-local; curated+Hub reuse posix-sdc's loader (imported
lazily so the trainer needs neither posix-sdc nor huggingface_hub
for the raw/jsonl paths). Unit tests for the raw-dir and jsonl
dispatch.
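
A minimal sketch of the dispatch described in issue 12, not the actual
implementation. Assumptions, none confirmed by the issue: a raw rollout is
a JSON object with a boolean "keep" field and a "turns" list; a turn is a
dict with a "role" key, and "assistant-bearing" means role == "assistant";
each curated .jsonl line is a JSON object with the same "turns" list; hub
is a repo id string; posix-sdc is importable as posix_sdc and
load_trajectories accepts (hub, revision=...). The helper
_assistant_turns is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Optional

Turn = Dict[str, Any]


def _assistant_turns(record: Dict[str, Any]) -> Iterator[Turn]:
    # Hypothetical helper. Assumed schema: record["turns"] is a list of
    # dicts, each with a "role" key.
    for turn in record.get("turns", []):
        if turn.get("role") == "assistant":
            yield turn


def load_turns(data: Optional[str] = None,
               hub: Optional[str] = None,
               revision: Optional[str] = None) -> Iterator[Turn]:
    """Yield assistant turns from a raw dir, a curated .jsonl, or the Hub."""
    if data is None:
        # Published corpus. The import is lazy so the raw and jsonl paths
        # need neither posix-sdc nor huggingface_hub installed.
        # Assumed module name and call signature.
        from posix_sdc import load_trajectories
        for record in load_trajectories(hub, revision=revision):
            yield from _assistant_turns(record)
        return

    path = Path(data)
    if path.is_dir():
        # Raw rollouts: one .json per trajectory, keep-filtered here.
        for file in sorted(path.glob("*.json")):
            record = json.loads(file.read_text(encoding="utf-8"))
            if record.get("keep"):  # assumed name of the keep flag
                yield from _assistant_turns(record)
    elif path.suffix == ".jsonl":
        # Curated corpus: already keep-filtered, one record per line.
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield from _assistant_turns(json.loads(line))
    else:
        raise ValueError(f"--data must be a directory or a .jsonl file: {data}")
```

Because load_turns is a generator, a bad --data value raises only once
iteration starts; a CLI wrapper that wants to fail fast could validate the
path before calling it.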