todo(15): open
parent 705b4a028b
commit 01e4d75237
1 changed files with 22 additions and 0 deletions
TODO | 22
@@ -249,3 +249,25 @@ Description: The lock committed with the triplet (#13) predated the published
and its transitive deps into the lock. Commit the refreshed
Pipfile.lock so the next machine installs the published wheel with
the Hub path available.

--ISSUE
Content-Type: application/issue
ID: 15
Type: bugfix
Title: apply_chat_template returns BatchEncoding on transformers 5.x
Status: open
Priority: high
Created: 2026-06-18
Module: sekft
Relationships:
Description: build_masked_example assumed apply_chat_template returns a flat
list[int] (transformers 4.x). On transformers 5.x it returns a
BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
the key count, and the prefix-differencing spuriously raised 'chat
template is not additive' on every real model (verified against
mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
the Mistral template is additive; only the return type needs
normalising. Add a _render_ids helper that extracts input_ids when
the result is dict-like, and use it for both renders. The
fake-tokenizer test returned a bare list and missed this, so add a
BatchEncoding-returning fake and assert the mask matches.