Merge branch 'develop'

Merge branch 'feature/18'
todo(18): done
2026-06-18 23:28:32 +02:00 · 2026-06-18 23:28:32 +02:00 · 2026-06-18 23:28:31 +02:00 · 2026-06-18 23:28:31 +02:00 · 2026-06-18 23:28:30 +02:00 · 2026-06-18 23:28:29 +02:00
7 changed files with 189 additions and 10 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,38 @@ are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.0.3] - 2026-06-18
+
+### Changed
+- The README intro now states up front that this is **not tool-calling**: sekft
+  trains shell operation, not function-calling; the model is given no typed tool
+  API or JSON-schema action list, and writes plain-text commands at a real prompt
+  with the whole system as its action space.
+
+## [1.0.2] - 2026-06-18
+
+### Fixed
+- The generation operators (`sekft-eval`, `sekft-resident`) passed the
+  `BatchEncoding` from `apply_chat_template(..., return_tensors="pt")` straight
+  to `model.generate`, which reads `inputs.shape[0]` and therefore raised
+  `AttributeError` on transformers ≥ 5; the holdout eval crashed on its first
+  scenario. 1.0.1 fixed only the trainer's masking; this release sweeps the
+  generation path too. A shared `_input_ids` helper and a `render_prompt_ids`
+  function now extract the id tensor for both operators, with unit tests for
+  the BatchEncoding and bare-sequence shapes.
+
+## [1.0.1] - 2026-06-18
+
+### Fixed
+- `build_masked_example` could not derive the assistant mask on transformers
+  ≥ 5: `apply_chat_template` now returns a `BatchEncoding` (`{input_ids: [...]}`)
+  where 4.x returned a bare `list[int]`, so `len` and slicing operated on a
+  dict rather than the id sequence, and the prefix-differencing spuriously
+  raised "chat template is not additive" on every real model. The id sequence
+  is now extracted either way, and the assistant-only mask was verified against
+  `mistralai/Mistral-7B-Instruct-v0.2`. The fake-tokenizer test gained a
+  `BatchEncoding`-returning variant so this cannot regress.
+
 ## [1.0.0] - 2026-06-18

 First release: the training and evaluation pipeline that turns posix-sdc
@@ -38,4 +70,7 @@ trajectories into a fine-tuned shell operator.
  mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
  and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.

+[1.0.3]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.2...v1.0.3
+[1.0.2]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.1...v1.0.2
+[1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
 [1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
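Both the 1.0.1 and 1.0.2 fixes hinge on one return-type change. A minimal pure-Python sketch of why unnormalised code measured the wrong thing, and how the `_input_ids` rule recovers the ids; a plain `dict` stands in for `BatchEncoding` (the real class has more behaviour), and the token ids are made up:

```python
from typing import Any


def input_ids(enc: Any) -> Any:
    # Same rule as sekft's _input_ids: dict-like results (BatchEncoding on
    # transformers >= 5) carry the ids under "input_ids"; bare lists pass through.
    return enc["input_ids"] if hasattr(enc, "keys") else enc


# Hypothetical renders of the same conversation (ids invented for illustration).
v4_result = [1, 733, 16289, 28793, 2]                  # 4.x: bare list[int]
v5_result = {"input_ids": [1, 733, 16289, 28793, 2],   # 5.x: dict-like
             "attention_mask": [1, 1, 1, 1, 1]}

# Unnormalised, len() counts keys on 5.x, not tokens.
print(len(v4_result), len(v5_result))                  # 5 2

# Normalised, both shapes yield the same id sequence.
assert input_ids(v4_result) == input_ids(v5_result) == v4_result
```

Checking `hasattr(enc, "keys")` rather than `isinstance(enc, dict)` keeps the rule duck-typed, so it accepts any mapping-like encoding without importing transformers.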
--- a/README.md
+++ b/README.md
@@ -5,6 +5,12 @@ land with **no imperative**, discover where directives live, learn the provider
 from its own self-documentation, do the work, and terminate (`exit` on success,
 `panic` when genuinely blocked).

+> **Not tool-calling.** sekft trains shell operation, not function-calling. The
+> model is given no typed tool API and no JSON-schema action list; it writes
+> plain-text commands at a real prompt, with the whole system as its action
+> space, discovered like a person would (`--help`, `man`, `ls`) rather than
+> enumerated up front.
+
 sekft is the **training half**. The dataset and the synthetic-data factory live
 in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
 depends on. Here live the trainer, the behavioural evaluator, and the
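The not-tool-calling distinction is easiest to see at the interface. The eval and resident operators in this commit have the shape `operator(messages) -> str`; the hypothetical scripted stand-in below (its replay logic and command list are invented for illustration) has that same shape and shows that an action is just the line of text typed at the prompt, with no schema to validate it against:

```python
from typing import Callable

Message = dict[str, str]
Operator = Callable[[list[Message]], str]


def scripted_operator(commands: list[str]) -> Operator:
    """Hypothetical operator: replay one shell line per assistant turn so far."""

    def operator(messages: list[Message]) -> str:
        turn = sum(1 for m in messages if m["role"] == "assistant")
        # A real session ends at `exit`; this fallback only matters if the
        # script runs out early, in which case the operator is blocked.
        return commands[turn] if turn < len(commands) else "panic"

    return operator


# Discovery happens in-band, as a person would do it at a prompt: the action
# is the literal text typed, not a call against a declared tool schema.
op = scripted_operator(["ls", "cat --help", "cat README", "exit"])
transcript: list[Message] = [{"role": "user", "content": "$ "}]
print(op(transcript))   # ls
```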
--- a/75
+++ b/75
@@ -249,3 +249,78 @@ Description: The lock committed with the triplet (#13) predated the published
             and its transitive deps into the lock. Commit the refreshed
             Pipfile.lock so the next machine installs the published wheel with
             the Hub path available.
+
+--ISSUE
+Content-Type: application/issue
+ID: 15
+Type: bugfix
+Title: apply_chat_template returns BatchEncoding on transformers 5.x
+Status: done
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships: 
+Description: build_masked_example assumed apply_chat_template returns a flat
+             list[int] (transformers 4.x). On transformers 5.x it returns a
+             BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
+             the key count, and the prefix-differencing spuriously raised 'chat
+             template is not additive' on every real model (verified against
+             mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
+             the Mistral template is additive; only the return type needs
+             normalising. Add a _render_ids helper that extracts input_ids when
+             the result is dict-like, and use it for both renders. The
+             fake-tokenizer test returned a bare list and missed this, so add a
+             BatchEncoding-returning fake and assert the mask matches.
+
+--ISSUE
+Content-Type: application/issue
+ID: 16
+Type: bugfix
+Title: generation operators pass BatchEncoding to generate (transformers 5.x)
+Status: done
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships: 
+Description: The same transformers 5.x return-type change that broke
+             build_masked_example (#15) also breaks the generation path:
+             apply_chat_template(add_generation_prompt=True,
+             return_tensors='pt') returns a BatchEncoding, and eval.py and
+             resident.py pass it straight to model.generate(), which does
+             inputs_tensor.shape[0] -> AttributeError (the holdout eval crashed
+             here on scenario 1). #15 only fixed the trainer. Factor the id
+             extraction into a shared _input_ids helper, add
+             render_prompt_ids(tokenizer, messages, device) in sft.py, and use
+             it in both operators. Add a unit test for _input_ids covering the
+             BatchEncoding and bare-sequence cases. This is the sweep I should
+             have done at #15.
+
+--ISSUE
+Content-Type: application/issue
+ID: 17
+Type: feature
+Title: docs: state up front that this is not tool-calling
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships: 
+Description: Add a prominent clarification to the README intro that sekft trains
+             shell operation, not function-calling: the model is given no typed
+             tool API or JSON-schema action list; it writes plain-text commands
+             at a real prompt with the whole system as its action space,
+             discovered like a person does.
+
+--ISSUE
+Content-Type: application/issue
+ID: 18
+Type: feature
+Title: docs: deliver the not-tool-calling intro clarification (1.0.3)
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships: 
+Description: Deliver the not-tool-calling clarification to the README intro and
+             add the 1.0.3 changelog entry. The prior issue's merge carried only
+             the todo status; the step-4 work commit was skipped.
--- a/src/tiararodney/sekft/eval.py
+++ b/src/tiararodney/sekft/eval.py
@@ -28,7 +28,7 @@ from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
 from tiararodney.posix_sdc.factory.rollout import rollout
 from tiararodney.posix_sdc.schema import Scenario

-from .sft import normalize_for_template
+from .sft import render_prompt_ids


 def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
@@ -49,9 +49,7 @@ def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
    model.eval()

    def operator(messages: list[dict[str, str]]) -> str:
-        msgs = normalize_for_template(messages)
-        ids = tok.apply_chat_template(
-            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
+        ids = render_prompt_ids(tok, messages, model.device)
        with torch.no_grad():
            out = model.generate(
                ids, max_new_tokens=max_new_tokens,
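The operator change above avoids one specific failure. A minimal sketch of it without torch: `FakeTensor`, `fake_generate` and `extract_and_move` are hypothetical stand-ins, a plain `dict` plays the `BatchEncoding`, and `fake_generate` models only the `inputs.shape[0]` access that, per issue #16, is where generation broke:

```python
from typing import Any


class FakeTensor:
    """Stand-in for a torch tensor: a batch of id rows with .shape, .device, .to()."""

    def __init__(self, rows: list[list[int]], device: str = "cpu") -> None:
        self.rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(self.rows, device)


def fake_generate(inputs: Any) -> int:
    # The one step that broke: generation reads the batch size from
    # inputs.shape[0], and a dict has no .shape attribute.
    return inputs.shape[0]


def extract_and_move(enc: Any, device: Any = None) -> Any:
    # Hypothetical helper mirroring the last two steps of render_prompt_ids:
    # pull input_ids out of a dict-like result, then move it if asked.
    ids = enc["input_ids"] if hasattr(enc, "keys") else enc
    return ids.to(device) if device is not None else ids


enc = {"input_ids": FakeTensor([[1, 2, 3]]), "attention_mask": FakeTensor([[1, 1, 1]])}

try:
    fake_generate(enc)  # pre-1.0.2 path: the whole encoding reaches generate
except AttributeError as err:
    print("AttributeError:", err)

ids = extract_and_move(enc, device="cuda:0")
print(fake_generate(ids), ids.device)  # 1 cuda:0
```

Routing both operators through one function means the extraction cannot drift between `eval.py` and `resident.py` again, which is how #15 fixed one call site and missed the other.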
--- a/src/tiararodney/sekft/resident.py
+++ b/src/tiararodney/sekft/resident.py
@@ -32,7 +32,7 @@ from peft import (LoraConfig, PeftModel, get_peft_model,
 from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

-from .sft import build_masked_example, iter_keepers, normalize_for_template
+from .sft import build_masked_example, iter_keepers, render_prompt_ids

 LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]

@@ -132,9 +132,7 @@ class Resident:
        pm.eval()

        def operator(messages: list[dict[str, str]]) -> str:
-            msgs = normalize_for_template(messages)
-            ids = self.tok.apply_chat_template(
-                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
+            ids = render_prompt_ids(self.tok, messages, pm.device)
            with torch.no_grad():
                o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                temperature=max(temperature, 1e-2),
--- a/src/tiararodney/sekft/sft.py
+++ b/src/tiararodney/sekft/sft.py
@@ -62,6 +62,35 @@ def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str
    return out


+def _input_ids(enc: Any) -> Any:
+    """The id sequence from an ``apply_chat_template`` result. transformers >= 5
+    returns a ``BatchEncoding`` (``{input_ids: ...}``) where 4.x returned the
+    bare ``list[int]`` / tensor; return the ids either way. Passing the dict
+    through unchanged breaks everything downstream: the trainer's
+    prefix-differencing sees ``len`` as the key count, and ``model.generate``
+    does ``inputs.shape[0]`` on a dict and raises ``AttributeError``."""
+    return enc["input_ids"] if hasattr(enc, "keys") else enc
+
+
+def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
+    """Token ids for a rendered conversation (no generation prompt), as a flat
+    sequence — see :func:`_input_ids` for the BatchEncoding normalisation."""
+    return _input_ids(tokenizer.apply_chat_template(msgs, add_generation_prompt=False))
+
+
+def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
+                      device: Any = None) -> Any:
+    """The tokenized generation prompt for an operator: canonicalise the turns,
+    append the assistant generation prompt, and return the ``input_ids`` tensor
+    (extracted from the BatchEncoding on transformers >= 5), moved to ``device``
+    if given. Shared by the eval and resident operators so neither feeds a
+    BatchEncoding to ``model.generate``."""
+    enc = tokenizer.apply_chat_template(
+        normalize_for_template(messages), add_generation_prompt=True, return_tensors="pt")
+    ids = _input_ids(enc)
+    return ids.to(device) if device is not None else ids
+
+
 def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
    """Tokenize a trajectory with the tokenizer's OWN chat template and build an
    assistant-only loss mask.
@@ -76,11 +105,11 @@ def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict
    non-additive one raises rather than silently mis-mask.
    """
    msgs = normalize_for_template(messages)
-    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
+    ids = _render_ids(tokenizer, msgs)
    labels = [-100] * len(ids)
    prev: list[int] = []
    for i, m in enumerate(msgs):
-        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
+        upto = _render_ids(tokenizer, msgs[:i + 1])
        if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
            raise ValueError("chat template is not additive; cannot derive an "
                             "assistant loss mask by token-prefix differencing")
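The additivity requirement behind "chat template is not additive" can be sketched end to end. The diff shows the check but cuts off the loop body, so the labelling step below is an assumption: it trains every token an assistant turn adds, including its role marker, which may differ from what sekft actually does. `toy_render` is a hypothetical additive template over string tokens, in the spirit of the unit tests' `FakeTok`:

```python
from typing import Callable

Msgs = list[dict[str, str]]
Render = Callable[[Msgs], list[str]]


def toy_render(msgs: Msgs) -> list[str]:
    # Hypothetical additive template: each turn renders to a role marker plus
    # its content, so rendering a prefix of the turns yields a prefix of tokens.
    toks: list[str] = []
    for m in msgs:
        toks += [f"<{m['role']}>", m["content"]]
    return toks


def assistant_mask(msgs: Msgs, render: Render) -> list[object]:
    ids = render(msgs)
    labels: list[object] = [-100] * len(ids)  # -100 = ignored by the loss
    prev: list[str] = []
    for i, m in enumerate(msgs):
        upto = render(msgs[:i + 1])
        # Additivity: each partial render must extend the previous one and be
        # a prefix of the full render, or differencing would mis-mask.
        if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
            raise ValueError("chat template is not additive")
        if m["role"] == "assistant":
            # The tokens this turn contributed are exactly upto[len(prev):].
            for j in range(len(prev), len(upto)):
                labels[j] = ids[j]
        prev = upto
    return labels


convo = [
    {"role": "user", "content": "login"},
    {"role": "assistant", "content": "cat f"},
    {"role": "user", "content": "out"},
    {"role": "assistant", "content": "exit"},
]
print(assistant_mask(convo, toy_render))
# [-100, -100, '<assistant>', 'cat f', -100, -100, '<assistant>', 'exit']
```

The same loop also shows why the 5.x bug surfaced as a spurious "not additive" error: the prefix comparisons are the first place a dict-shaped render gets compared against another render.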
--- a/tests/unit/test_sft.py
+++ b/tests/unit/test_sft.py
@@ -27,6 +27,15 @@ class FakeTok:
        return toks


+class FakeTokBatchEncoding(FakeTok):
+    """Like FakeTok, but returns a dict as transformers >= 5's
+    ``apply_chat_template`` does (a BatchEncoding), to exercise the id-extraction."""
+
+    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
+                            return_tensors: Any = None) -> dict[str, list[str]]:
+        return {"input_ids": super().apply_chat_template(msgs, add_generation_prompt, return_tensors)}
+
+
 def test_normalize_folds_system_and_merges_consecutive() -> None:
    raw = [
        {"role": "system", "content": "orient"},
@@ -63,6 +72,35 @@ def test_mask_trains_assistant_turns_only() -> None:
    assert {"orient", "login", "out"} <= set(masked)       # environment masked


+def test_mask_handles_batchencoding_return() -> None:
+    # transformers >= 5 returns a BatchEncoding ({input_ids: [...]}) rather than a
+    # bare list[int]; the mask must come out identical. Regression for the 5.x bug
+    # that made every real template look "not additive".
+    raw = [
+        {"role": "user", "content": "login"},
+        {"role": "assistant", "content": "cat f"},
+        {"role": "user", "content": "out"},
+        {"role": "assistant", "content": "exit"},
+    ]
+    assert (sft.build_masked_example(raw, FakeTokBatchEncoding())
+            == sft.build_masked_example(raw, FakeTok()))
+
+
+def test_input_ids_extracts_from_batchencoding_or_passthrough() -> None:
+    # BatchEncoding (transformers 5.x) -> its input_ids; bare list/tensor (4.x) -> itself
+    assert sft._input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}) == [1, 2, 3]
+    assert sft._input_ids([4, 5, 6]) == [4, 5, 6]
+
+
+def test_render_prompt_ids_normalises_and_appends_generation_prompt() -> None:
+    # the generation operators rely on this: fold + append <assistant>, return ids
+    # (not a BatchEncoding) so model.generate doesn't choke on a dict.
+    raw = [{"role": "system", "content": "orient"}, {"role": "user", "content": "go"}]
+    ids = sft.render_prompt_ids(FakeTok(), raw)
+    assert ids[-1] == "<assistant>"                 # generation prompt appended
+    assert {"orient", "go"} <= set(ids)             # system folded into the user turn
+
+
 def test_mask_raises_on_non_additive_template() -> None:
    class BadTok:
        def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
Author	SHA1	Message	Date
Tiara Rodney	d8d9202e77	Merge branch 'develop'	2026-06-18 23:28:32 +02:00
Tiara Rodney	bcf19c1bfe	Merge branch 'feature/18'	2026-06-18 23:28:32 +02:00
Tiara Rodney	136d84279d	todo(18): done Intro blockquote + 1.0.3 changelog committed on feature/18 (step 4), then released.	2026-06-18 23:28:31 +02:00
Tiara Rodney	03610709b7	docs(18): clarify the intro is not tool-calling; changelog 1.0.3	2026-06-18 23:28:31 +02:00
Tiara Rodney	c1d9947e9b	todo(18): in-progress README intro carries the not-tool-calling blockquote and CHANGELOG has [1.0.3], committed on the issue branch, then released as v1.0.3.	2026-06-18 23:28:30 +02:00
Tiara Rodney	1a813b5e1d	todo(18): open	2026-06-18 23:28:29 +02:00
Tiara Rodney	847d3dac10	Merge branch 'feature/17'	2026-06-18 23:11:03 +02:00
Tiara Rodney	15201302b2	todo(17): done Intro blockquote added: 'not tool-calling'. Stays on develop for the next sekft release.	2026-06-18 23:11:03 +02:00
Tiara Rodney	b87578d0b0	todo(17): in-progress The README intro carries a clear 'not tool-calling' callout before the 'training half' paragraph.	2026-06-18 23:10:27 +02:00
Tiara Rodney	7edfb0640c	todo(17): open	2026-06-18 23:10:26 +02:00
Tiara Rodney	0a4adbdc5f	Merge branch 'develop'	2026-06-18 16:49:49 +02:00
Tiara Rodney	bd04c02b41	Merge branch 'bugfix/16' bugfix(16): operators must not feed a BatchEncoding to model.generate	2026-06-18 16:49:33 +02:00
Tiara Rodney	1fb35e8e10	todo(16): done _input_ids extracts ids from a BatchEncoding (5.x) or bare list/tensor (4.x); render_prompt_ids builds the generation prompt and extracts the tensor; eval.py + resident.py operators use it (no more BatchEncoding to generate); 12 tests pass (2 new); mypy strict clean. Box eval verification follows this release. No submodule changes.	2026-06-18 16:49:32 +02:00
Tiara Rodney	1279bc8965	bugfix(16): operators must not feed a BatchEncoding to model.generate The transformers 5.x return-type change behind #15 also breaks generation: apply_chat_template(add_generation_prompt=True, return_tensors="pt") returns a BatchEncoding, and eval.py + resident.py passed it to model.generate, which does inputs.shape[0] -> AttributeError (the holdout eval crashed on scenario 1). #15 fixed only the trainer. Factor a shared _input_ids helper and a render_prompt_ids function; both operators use it. Tests cover _input_ids for both shapes and render_prompt_ids.	2026-06-18 16:49:30 +02:00
Tiara Rodney	d261919404	todo(16): in-progress A shared _input_ids helper extracts the id sequence from a BatchEncoding (5.x) or bare list/tensor (4.x); _render_ids uses it; a new render_prompt_ids(tokenizer, messages, device) builds the generation prompt and extracts input_ids; eval.py and resident.py operators use render_prompt_ids instead of passing a BatchEncoding to generate; unit test covers _input_ids for both shapes; existing tests pass; mypy strict clean; holdout eval runs on the box without the AttributeError.	2026-06-18 16:46:57 +02:00
Tiara Rodney	87cfccd54e	todo(16): open	2026-06-18 16:46:10 +02:00
Tiara Rodney	a76470e55d	Merge branch 'develop'	2026-06-18 12:37:18 +02:00
Tiara Rodney	e1f8ef8d1a	Merge branch 'bugfix/15' bugfix(15): normalise apply_chat_template's BatchEncoding (transformers 5.x)	2026-06-18 12:37:04 +02:00
Tiara Rodney	f9913b45c3	todo(15): done _render_ids extracts input_ids from a BatchEncoding (5.x) or passes a list through (4.x); regression test asserts the BatchEncoding path yields the same mask; 10 tests pass; mypy strict clean. End-to-end box verification of the correct mask against Mistral done before this release. No submodule changes.	2026-06-18 12:37:03 +02:00
Tiara Rodney	4987d951ce	bugfix(15): normalise apply_chat_template's BatchEncoding (transformers 5.x) apply_chat_template returns a BatchEncoding ({input_ids: [...]}) on transformers >= 5 where 4.x returned a bare list[int]. build_masked_example treated the render as a dict, so len/slicing were wrong and the prefix-differencing spuriously raised "chat template is not additive" on every real model. Extract the id sequence via a _render_ids helper; verified the assistant-only mask against mistralai/Mistral-7B-Instruct-v0.2. The fake tokenizer returned a bare list and missed this, so a BatchEncoding-returning variant now guards it.	2026-06-18 12:37:01 +02:00
Tiara Rodney	7853224796	todo(15): in-progress build_masked_example normalises apply_chat_template's BatchEncoding (transformers 5.x) and list[int] (4.x) returns via a _render_ids helper; a BatchEncoding-returning fake tokenizer produces the same mask as the list-returning one (regression test added); existing tests pass; mypy strict clean; verified end-to-end on the box that sekft-train --inspect produces a correct assistant-only mask against Mistral.	2026-06-18 12:34:55 +02:00
Tiara Rodney	01e4d75237	todo(15): open	2026-06-18 12:34:37 +02:00