Commit graph

3 commits

Author SHA1 Message Date
Tiara Rodney
4987d951ce
bugfix(15): normalise apply_chat_template's BatchEncoding (transformers 5.x)
apply_chat_template returns a BatchEncoding ({input_ids: [...]}) on transformers
>= 5 where 4.x returned a bare list[int]. build_masked_example treated the render
as a dict, so len/slicing were wrong and the prefix-differencing spuriously
raised "chat template is not additive" on every real model. Extract the id
sequence via a _render_ids helper; verified the assistant-only mask against
mistralai/Mistral-7B-Instruct-v0.2. The fake tokenizer returned a bare list and
missed this, so a BatchEncoding-returning variant now guards it.
2026-06-18 12:37:01 +02:00
Tiara Rodney
64020c321d
test: annotate the sft test helpers 2026-06-17 14:03:52 +02:00
Tiara Rodney
4d1ec9e7fa
test: add SFT render and mask unit tests 2026-06-16 20:15:19 +02:00