todo(12): open

Tiara Rodney 2026-06-18 00:03:10 +02:00
parent 2209ade52c
commit d47ba8a56e
Signed by: tiara
GPG key ID: 5CD8EC1D46106723

23
TODO

@@ -191,3 +191,26 @@ Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
and resident.py:157. Wrap the predicate in bool() so it counts
trajectories that operated and are clean, fixing both the type
error and the latent crash.
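The following minimal Python sketch shows why the bool() wrap works. It reproduces only the sum() expression quoted in the hunk header. Traj and the two counting functions are hypothetical stand-ins, and the sketch assumes meta.get('clean') returns None when the key is missing. The expression a and b evaluates to b itself when a is truthy, so a trajectory with steps but no 'clean' key contributes None, and sum() raises TypeError. bool() turns every term into True or False, which sum() counts as 1 or 0.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Traj:
    # Hypothetical stand-in: only the two attributes the predicate reads.
    steps: int
    meta: Dict[str, Any] = field(default_factory=dict)


def count_operated_clean_buggy(trajs: List[Traj]) -> int:
    # `a and b` yields b itself when a is truthy, so a trajectory with
    # steps > 0 but no 'clean' key contributes None, and sum() raises
    # TypeError (int + NoneType).
    return sum(t.steps > 0 and t.meta.get("clean") for t in trajs)


def count_operated_clean_fixed(trajs: List[Traj]) -> int:
    # bool() coerces each term to True/False, which sum() adds as 1/0.
    return sum(bool(t.steps > 0 and t.meta.get("clean")) for t in trajs)
```

Given [Traj(3, {"clean": True}), Traj(0, {"clean": True}), Traj(5, {})], the fixed version counts 1. The buggy version raises TypeError on the third element.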
--ISSUE
Content-Type: application/issue
ID: 12
Type: feature
Title: load training data from a raw dir, a curated jsonl, or the Hub
Status: open
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: iter_keepers reads only raw per-trajectory .json files,
which is one of the three input shapes the trainer should accept. Add
load_turns(data, hub, revision), which yields assistant-bearing turns
from: a directory of raw rollout .json files (keep-filtered, as
iter_keepers does today); a curated .jsonl corpus file (already
keep-filtered, so yield each line's turns); or the published corpus
via posix-sdc's load_trajectories (local data/ in a checkout, else the
Hub). sekft-train gains --hub and --revision; --data dispatches on
whether its path is a directory or a .jsonl file. Raw-rollout reading
stays sekft-local; the published-corpus path (local data/ or the Hub)
reuses posix-sdc's loader, imported lazily so the trainer needs
neither posix-sdc nor huggingface_hub for the raw/jsonl paths. Add
unit tests for the raw-dir and .jsonl dispatch.
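The following minimal Python sketch illustrates the requested dispatch under stated assumptions. Several details are hypothetical and do not come from the issue:

- the record shape (a "turns" list of dicts with a "role" key);
- a "keep" field standing in for iter_keepers' real predicate;
- hub modeled as a boolean flag;
- the posix_sdc import name and a revision= keyword on load_trajectories.

The sketch shows the directory-vs-.jsonl dispatch and keeps the posix-sdc import inside the hub branch, so the local paths run without that package installed.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Optional

Turn = Dict[str, Any]


def _keep(record: Dict[str, Any]) -> bool:
    # HYPOTHETICAL keep-filter; stands in for whatever iter_keepers checks.
    return bool(record.get("keep"))


def _assistant_turns(record: Dict[str, Any]) -> Iterator[Turn]:
    # HYPOTHETICAL schema: each record carries a "turns" list of dicts,
    # and an assistant-bearing turn is one whose role is "assistant".
    for turn in record.get("turns", []):
        if turn.get("role") == "assistant":
            yield turn


def load_turns(data: Optional[str] = None, hub: bool = False,
               revision: Optional[str] = None) -> Iterator[Turn]:
    """Yield assistant-bearing turns from a raw dir, a .jsonl file, or the corpus."""
    if hub:
        # Lazy import: only this branch needs posix-sdc (and, through it,
        # huggingface_hub). Import name and keyword are ASSUMPTIONS.
        from posix_sdc import load_trajectories
        for record in load_trajectories(revision=revision):
            yield from _assistant_turns(record)
        return
    if data is None:
        raise ValueError("need a --data path or --hub")
    path = Path(data)
    if path.is_dir():
        # Raw rollouts: one .json per trajectory, keep-filtered here.
        for f in sorted(path.glob("*.json")):
            record = json.loads(f.read_text(encoding="utf-8"))
            if _keep(record):
                yield from _assistant_turns(record)
    elif path.suffix == ".jsonl":
        # Curated corpus: already keep-filtered, one record per line.
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield from _assistant_turns(json.loads(line))
    else:
        raise ValueError(f"--data must be a directory or a .jsonl file: {data}")


def build_parser() -> argparse.ArgumentParser:
    # HYPOTHETICAL flag shapes for sekft-train; only the names come from the issue.
    p = argparse.ArgumentParser(prog="sekft-train")
    p.add_argument("--data", help="raw rollout dir or curated .jsonl file")
    p.add_argument("--hub", action="store_true", help="load the published corpus")
    p.add_argument("--revision", default=None, help="corpus revision to load")
    return p
```

Because load_turns is a generator, a bad --data path raises ValueError on first iteration rather than at call time. A trainer that wants to fail fast could check the path before iterating.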