apply_chat_template returns a BatchEncoding ({input_ids: [...]}) on transformers
>= 5 where 4.x returned a bare list[int]. build_masked_example treated the render
as a dict, so len/slicing were wrong and the prefix-differencing spuriously
raised "chat template is not additive" on every real model. Extract the id
sequence via a _render_ids helper; verified the assistant-only mask against
mistralai/Mistral-7B-Instruct-v0.2. The fake tokenizer returned a bare list and
missed this, so a BatchEncoding-returning variant now guards it.
2.9 KiB
2.9 KiB
Changelog
All notable changes to sekft, the shell-operator SFT trainer behind the posix-sdc experiment, are documented in this file.
The format is based on Keep a Changelog, and the project follows Semantic Versioning.
1.0.1 - 2026-06-18
Fixed
build_masked_examplecould not derive the assistant mask on transformers ≥ 5:apply_chat_templatenow returns aBatchEncoding({input_ids: [...]}) where 4.x returned a barelist[int], so the render was treated as a dict and the prefix-differencing spuriously raised "chat template is not additive" on every real model. The id sequence is now extracted either way; verified the assistant-only mask againstmistralai/Mistral-7B-Instruct-v0.2. The fake-tokenizer test gained aBatchEncoding-returning variant so this can't regress.
1.0.0 - 2026-06-18
First release: the training and evaluation pipeline that turns posix-sdc trajectories into a fine-tuned shell operator.
Added
sekft-train: LoRA / QLoRA supervised fine-tuning of a base model on shell-operation trajectories, with an assistant-only loss mask derived by token-prefix differencing — the commands and the terminalexit/panictoken are trained; the environment turns (orientation, prompts, command output) are masked to-100. The render uses the tokenizer's ownapply_chat_template, so training matches what the serving harness sends (train = serve), withnormalize_for_templatecanonicalising trajectories for instruct templates that have no system role and require strict user/assistant alternation.- Three sources of training data: a directory of raw rollout
.json(keep-filtered), a curated.jsonlcorpus, or the published posix-sdc corpus over the Hugging Face Hub (--hub). --inspectfor mask and token statistics without training, and structured stderr logging across every phase (-v/-q): per-trajectory and progress lines while the corpus is tokenized, dataset accounting that warns on dropped (over-length / empty-mask) trajectories, and the per-step training curve.sekft-eval: behavioural evaluation that drops the tuned model into held-out scenarios with no scaffold and scores whether it operates and terminates.sekft-resident: a resident-base harness that loads the base model once and fits several adapters without reloading, for paired / STaR-style runs.- Packaging: the
tiararodney.sekftnamespace package withsekft-train,sekft-eval, andsekft-residentconsole scripts; a typed (py.typed), mypy-strict codebase; an optional[gpu]extra (torch / transformers / peft); and a dependency onposix-sdc[hub]. Released under GPL-2.0.