todo(15): open

Tiara Rodney 2026-06-18 12:34:37 +02:00
parent 705b4a028b
commit 01e4d75237
Signed by: tiara
GPG key ID: 5CD8EC1D46106723

TODO

@@ -249,3 +249,25 @@ Description: The lock committed with the triplet (#13) predated the published
and its transitive deps into the lock. Commit the refreshed
Pipfile.lock so the next machine installs the published wheel with
the Hub path available.
--ISSUE
Content-Type: application/issue
ID: 15
Type: bugfix
Title: apply_chat_template returns BatchEncoding on transformers 5.x
Status: open
Priority: high
Created: 2026-06-18
Module: sekft
Relationships:
Description: build_masked_example assumed apply_chat_template returns a flat
list[int] (transformers 4.x). On transformers 5.x it returns a
BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
the key count, and the prefix-differencing spuriously raised 'chat
template is not additive' on every real model (verified against
mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
the Mistral template is additive; only the return type needs
normalising. Add a _render_ids helper that extracts input_ids when
the result is dict-like, and use it for both renders. The
fake-tokenizer test returned a bare list and missed this, so add a
BatchEncoding-returning fake and assert the mask matches.