todo(15): open

Tiara Rodney 2026-06-18 12:34:37 +02:00
parent 705b4a028b
commit 01e4d75237
Signed by: tiara
GPG key ID: 5CD8EC1D46106723

TODO

@@ -249,3 +249,25 @@ Description: The lock committed with the triplet (#13) predated the published
and its transitive deps into the lock. Commit the refreshed
Pipfile.lock so the next machine installs the published wheel with
the Hub path available.
--ISSUE
Content-Type: application/issue
ID: 15
Type: bugfix
Title: apply_chat_template returns BatchEncoding on transformers 5.x
Status: open
Priority: high
Created: 2026-06-18
Module: sekft
Relationships:
Description: build_masked_example assumed apply_chat_template returns a flat
list[int] (transformers 4.x). On transformers 5.x it returns a
BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
the key count, and the prefix-differencing spuriously raised 'chat
template is not additive' on every real model (verified against
mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
the Mistral template is additive; only the return type needs
normalising. Add a _render_ids helper that extracts input_ids when
the result is dict-like, and use it for both renders. The
fake-tokenizer test returned a bare list and missed this, so add a
BatchEncoding-returning fake and assert the mask matches.