todo(15): open

2026-06-18 12:34:37 +02:00 · 2026-06-18 12:34:37 +02:00 · 01e4d75237
commit 01e4d75237
parent 705b4a028b
1 changed files with 22 additions and 0 deletions
--- a/22
+++ b/22
@@ -249,3 +249,25 @@ Description: The lock committed with the triplet (#13) predated the published
             and its transitive deps into the lock. Commit the refreshed
             Pipfile.lock so the next machine installs the published wheel with
             the Hub path available.
+
+--ISSUE
+Content-Type: application/issue
+ID: 15
+Type: bugfix
+Title: apply_chat_template returns BatchEncoding on transformers 5.x
+Status: open
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships: 
+Description: build_masked_example assumed apply_chat_template returns a flat
+             list[int] (transformers 4.x). On transformers 5.x it returns a
+             BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
+             the key count, and the prefix-differencing spuriously raised 'chat
+             template is not additive' on every real model (verified against
+             mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
+             the Mistral template is additive; only the return type needs
+             normalising. Add a _render_ids helper that extracts input_ids when
+             the result is dict-like, and use it for both renders. The
+             fake-tokenizer test returned a bare list and missed this, so add a
+             BatchEncoding-returning fake and assert the mask matches.