todo(12): open
parent 2209ade52c
commit d47ba8a56e
1 changed file with 23 additions and 0 deletions
TODO (23 additions)
@@ -191,3 +191,26 @@ Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
and resident.py:157. Wrap the predicate in bool() so it counts
trajectories that operated and are clean, fixing both the type
error and the latent crash.
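The context lines above describe wrapping the predicate in bool(). A minimal sketch of why that helps, under one plausible reading of the issue: in Python, `a and b` evaluates to one of its operands rather than to a bool, so a missing 'clean' key lets None reach sum(). The Trajectory class and count_clean_operated name below are hypothetical stand-ins, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    # Hypothetical stand-in for the project's trajectory type.
    steps: int
    meta: dict = field(default_factory=dict)


def count_clean_operated(trajectories):
    # bool() forces each term to True/False, so sum() only ever adds 0 or 1,
    # even when meta.get("clean") returns None or some other non-bool value.
    return sum(bool(t.steps > 0 and t.meta.get("clean")) for t in trajectories)


trajs = [
    Trajectory(3, {"clean": True}),  # operated and clean: counted
    Trajectory(2, {}),               # operated, no 'clean' key: predicate is None
    Trajectory(0, {"clean": True}),  # never operated: predicate is False
]

try:
    # Without bool(), the second term is None and sum() cannot add it.
    sum(t.steps > 0 and t.meta.get("clean") for t in trajs)
except TypeError as exc:
    print("unwrapped predicate failed:", exc)

print(count_clean_operated(trajs))  # -> 1
```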
--ISSUE
Content-Type: application/issue
ID: 12
Type: feature
Title: load training data from a raw dir, a curated jsonl, or the Hub
Status: open
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: iter_keepers reads only raw per-trajectory .json - one of three
input shapes the trainer should accept. Add load_turns(data, hub,
revision) that yields assistant-bearing turns from: a directory of
raw rollout .json (keep-filtered, today's iter_keepers); a curated
.jsonl corpus file (already keep-filtered, yield turns per line);
or the published corpus via posix-sdc's load_trajectories (local
data/ in a checkout, else the Hub). sekft-train gains --hub and
--revision; --data dispatches by dir-vs-.jsonl. Raw-rollout reading
stays sekft-local; curated+Hub reuse posix-sdc's loader (imported
lazily so the trainer needs neither posix-sdc nor huggingface_hub
for the raw/jsonl paths). Unit tests for the raw-dir and jsonl
dispatch.
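The dispatch described in issue 12 could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: the record shape ({"keep": ..., "turns": [{"role": ...}]}), the `keep` flag, the `posix_sdc` import path, and the arguments passed to load_trajectories are all hypothetical. Only the names load_turns, data, hub, revision, and load_trajectories come from the issue text. For brevity the curated .jsonl branch is read locally here, whereas the issue routes it through posix-sdc's loader.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def _assistant_turns(record: dict) -> Iterator[dict]:
    # Assumed record shape: {"turns": [{"role": "...", ...}, ...]}.
    for turn in record.get("turns", []):
        if turn.get("role") == "assistant":
            yield turn


def load_turns(data: Optional[str] = None,
               hub: Optional[str] = None,
               revision: Optional[str] = None) -> Iterator[dict]:
    if data is not None:
        path = Path(data)
        if path.is_dir():
            # Raw rollouts: one .json per trajectory, keep-filtered here.
            for file in sorted(path.glob("*.json")):
                record = json.loads(file.read_text(encoding="utf-8"))
                if record.get("keep"):  # assumed keep flag
                    yield from _assistant_turns(record)
        elif path.suffix == ".jsonl":
            # Curated corpus: already keep-filtered, one trajectory per line.
            with path.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        yield from _assistant_turns(json.loads(line))
        else:
            raise ValueError(f"--data must be a directory or a .jsonl file: {data}")
        return

    # Published corpus: imported lazily so the raw/jsonl paths above need
    # neither posix-sdc nor huggingface_hub installed.
    from posix_sdc import load_trajectories  # assumed import path

    # Assumed call signature; the real loader's arguments may differ.
    for record in load_trajectories(hub, revision=revision):
        yield from _assistant_turns(record)
```

Because load_turns is a generator, the lazy import runs only when the Hub branch is actually iterated, which keeps the raw-dir and jsonl unit tests free of the optional dependencies.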