apply_chat_template returns a BatchEncoding ({input_ids: [...]}) on transformers
>= 5 where 4.x returned a bare list[int]. build_masked_example treated the render
as a dict, so len/slicing were wrong and the prefix-differencing spuriously
raised "chat template is not additive" on every real model. Extract the id
sequence via a _render_ids helper; verified the assistant-only mask against
mistralai/Mistral-7B-Instruct-v0.2. The fake tokenizer returned a bare list and
missed this, so a BatchEncoding-returning variant now guards it.
54 lines
2.9 KiB
Markdown
54 lines
2.9 KiB
Markdown
# Changelog
|
|
|
|
All notable changes to sekft, the shell-operator SFT trainer behind the
|
|
[posix-sdc](https://huggingface.co/datasets/tiararodney/posix-sdc) experiment,
|
|
are documented in this file.
|
|
|
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
|
|
## [1.0.1] - 2026-06-18
|
|
|
|
### Fixed
|
|
- `build_masked_example` could not derive the assistant mask on transformers
|
|
≥ 5: `apply_chat_template` now returns a `BatchEncoding` (`{input_ids: [...]}`)
|
|
where 4.x returned a bare `list[int]`, so the render was treated as a dict and
|
|
the prefix-differencing spuriously raised "chat template is not additive" on
|
|
every real model. The id sequence is now extracted either way; verified the
|
|
assistant-only mask against `mistralai/Mistral-7B-Instruct-v0.2`. The
|
|
fake-tokenizer test gained a `BatchEncoding`-returning variant so this can't
|
|
regress.
|
|
|
|
## [1.0.0] - 2026-06-18
|
|
|
|
First release: the training and evaluation pipeline that turns posix-sdc
|
|
trajectories into a fine-tuned shell operator.
|
|
|
|
### Added
|
|
- `sekft-train`: LoRA / QLoRA supervised fine-tuning of a base model on
|
|
shell-operation trajectories, with an **assistant-only loss mask** derived by
|
|
token-prefix differencing — the commands and the terminal `exit` / `panic`
|
|
token are trained; the environment turns (orientation, prompts, command
|
|
output) are masked to `-100`. The render uses the tokenizer's own
|
|
`apply_chat_template`, so training matches what the serving harness sends
|
|
(train = serve), with `normalize_for_template` canonicalising trajectories for
|
|
instruct templates that have no system role and require strict user/assistant
|
|
alternation.
|
|
- Three sources of training data: a directory of raw rollout `.json`
|
|
(keep-filtered), a curated `.jsonl` corpus, or the published posix-sdc corpus
|
|
over the Hugging Face Hub (`--hub`).
|
|
- `--inspect` for mask and token statistics without training, and structured
|
|
stderr logging across every phase (`-v` / `-q`): per-trajectory and progress
|
|
lines while the corpus is tokenized, dataset accounting that warns on dropped
|
|
(over-length / empty-mask) trajectories, and the per-step training curve.
|
|
- `sekft-eval`: behavioural evaluation that drops the tuned model into held-out
|
|
scenarios with no scaffold and scores whether it operates and terminates.
|
|
- `sekft-resident`: a resident-base harness that loads the base model once and
|
|
fits several adapters without reloading, for paired / STaR-style runs.
|
|
- Packaging: the `tiararodney.sekft` namespace package with `sekft-train`,
|
|
`sekft-eval`, and `sekft-resident` console scripts; a typed (`py.typed`),
|
|
mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
|
|
and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.
|
|
|
|
[1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
|
|
[1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
|