bugfix(16): operators must not feed a BatchEncoding to model.generate

The transformers 5.x return-type change behind #15 also breaks generation:
apply_chat_template(add_generation_prompt=True, return_tensors="pt") returns a
BatchEncoding, and eval.py + resident.py passed it to model.generate, which does
inputs.shape[0] -> AttributeError (the holdout eval crashed on scenario 1). #15
fixed only the trainer. Factor a shared _input_ids helper and a render_prompt_ids
function; both operators use it. Tests cover _input_ids for both shapes and
render_prompt_ids.

2026-06-18 16:49:30 +02:00

3.6 KiB

Raw Permalink Blame History

Changelog

All notable changes to sekft, the shell-operator SFT trainer behind the posix-sdc experiment, are documented in this file.

The format is based on Keep a Changelog, and the project follows Semantic Versioning.

1.0.2 - 2026-06-18

Fixed

The generation operators (sekft-eval, sekft-resident) passed the BatchEncoding from apply_chat_template(..., return_tensors="pt") straight to model.generate, which does inputs.shape[0] and raised AttributeError on transformers ≥ 5 — the holdout eval crashed on its first scenario. 1.0.1 fixed only the trainer's masking; this sweeps the generation path too. A shared _input_ids helper and a render_prompt_ids function now extract the id tensor for both operators, with unit tests for the BatchEncoding and bare shapes.

1.0.1 - 2026-06-18

Fixed

build_masked_example could not derive the assistant mask on transformers ≥ 5: apply_chat_template now returns a BatchEncoding ({input_ids: [...]}) where 4.x returned a bare list[int], so the render was treated as a dict and the prefix-differencing spuriously raised "chat template is not additive" on every real model. The id sequence is now extracted either way; verified the assistant-only mask against mistralai/Mistral-7B-Instruct-v0.2. The fake-tokenizer test gained a BatchEncoding-returning variant so this can't regress.

1.0.0 - 2026-06-18

First release: the training and evaluation pipeline that turns posix-sdc trajectories into a fine-tuned shell operator.

Added

sekft-train: LoRA / QLoRA supervised fine-tuning of a base model on shell-operation trajectories, with an assistant-only loss mask derived by token-prefix differencing — the commands and the terminal exit / panic token are trained; the environment turns (orientation, prompts, command output) are masked to -100. The render uses the tokenizer's own apply_chat_template, so training matches what the serving harness sends (train = serve), with normalize_for_template canonicalising trajectories for instruct templates that have no system role and require strict user/assistant alternation.
Three sources of training data: a directory of raw rollout .json (keep-filtered), a curated .jsonl corpus, or the published posix-sdc corpus over the Hugging Face Hub (--hub).
--inspect for mask and token statistics without training, and structured stderr logging across every phase (-v / -q): per-trajectory and progress lines while the corpus is tokenized, dataset accounting that warns on dropped (over-length / empty-mask) trajectories, and the per-step training curve.
sekft-eval: behavioural evaluation that drops the tuned model into held-out scenarios with no scaffold and scores whether it operates and terminates.
sekft-resident: a resident-base harness that loads the base model once and fits several adapters without reloading, for paired / STaR-style runs.
Packaging: the tiararodney.sekft namespace package with sekft-train, sekft-eval, and sekft-resident console scripts; a typed (py.typed), mypy-strict codebase; an optional [gpu] extra (torch / transformers / peft); and a dependency on posix-sdc[hub]. Released under GPL-2.0.

3.6 KiB Raw Permalink Blame History

Changelog

1.0.2 - 2026-06-18

Fixed

1.0.1 - 2026-06-18

Fixed

1.0.0 - 2026-06-18

Added

3.6 KiB

Raw Permalink Blame History