Compare commits

22 commits

Author SHA1 Message Date
Tiara Rodney
d8d9202e77
Merge branch 'develop' 2026-06-18 23:28:32 +02:00
Tiara Rodney
bcf19c1bfe
Merge branch 'feature/18' 2026-06-18 23:28:32 +02:00
Tiara Rodney
136d84279d
todo(18): done
Intro blockquote + 1.0.3 changelog committed on feature/18 (step 4), then released.
2026-06-18 23:28:31 +02:00
Tiara Rodney
03610709b7
docs(18): clarify the intro is not tool-calling; changelog 1.0.3 2026-06-18 23:28:31 +02:00
Tiara Rodney
c1d9947e9b
todo(18): in-progress
README intro carries the not-tool-calling blockquote and CHANGELOG has [1.0.3], committed on the issue branch, then released as v1.0.3.
2026-06-18 23:28:30 +02:00
Tiara Rodney
1a813b5e1d
todo(18): open 2026-06-18 23:28:29 +02:00
Tiara Rodney
847d3dac10
Merge branch 'feature/17' 2026-06-18 23:11:03 +02:00
Tiara Rodney
15201302b2
todo(17): done
Intro blockquote added: 'not tool-calling'. Stays on develop for the next sekft release.
2026-06-18 23:11:03 +02:00
Tiara Rodney
b87578d0b0
todo(17): in-progress
The README intro carries a clear 'not tool-calling' callout before the 'training half' paragraph.
2026-06-18 23:10:27 +02:00
Tiara Rodney
7edfb0640c
todo(17): open 2026-06-18 23:10:26 +02:00
Tiara Rodney
0a4adbdc5f
Merge branch 'develop' 2026-06-18 16:49:49 +02:00
Tiara Rodney
bd04c02b41
Merge branch 'bugfix/16'
bugfix(16): operators must not feed a BatchEncoding to model.generate
2026-06-18 16:49:33 +02:00
Tiara Rodney
1fb35e8e10
todo(16): done
_input_ids extracts ids from a BatchEncoding (5.x) or bare list/tensor (4.x); render_prompt_ids builds the generation prompt and extracts the tensor; eval.py + resident.py operators use it (no more BatchEncoding to generate); 12 tests pass (2 new); mypy strict clean. Box eval verification follows this release. No submodule changes.
2026-06-18 16:49:32 +02:00
Tiara Rodney
1279bc8965
bugfix(16): operators must not feed a BatchEncoding to model.generate
The transformers 5.x return-type change behind #15 also breaks generation:
apply_chat_template(add_generation_prompt=True, return_tensors="pt") returns a
BatchEncoding, and eval.py + resident.py passed it to model.generate, which does
inputs.shape[0] -> AttributeError (the holdout eval crashed on scenario 1). #15
fixed only the trainer. Factor a shared _input_ids helper and a render_prompt_ids
function; both operators use it. Tests cover _input_ids for both shapes and
render_prompt_ids.
2026-06-18 16:49:30 +02:00
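
The crash described in this commit can be illustrated without transformers or torch. The sketch below is a stand-in under stated assumptions: `FakeTensor` and `fake_generate` are hypothetical names that only mimic the `inputs.shape[0]` access the message attributes to `model.generate`, a plain dict plays the role of the BatchEncoding, and `input_ids_of` copies the `hasattr(enc, "keys")` check from the `_input_ids` helper shown later in this diff.

```python
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for a 2-D tensor: it only carries a shape."""

    def __init__(self, rows: int, cols: int) -> None:
        self.shape = (rows, cols)


def fake_generate(inputs: Any) -> int:
    """Mimics the single access the commit message blames: inputs.shape[0]."""
    return inputs.shape[0]


def input_ids_of(enc: Any) -> Any:
    """Same check as the diff's _input_ids: unwrap dict-likes, pass the rest through."""
    return enc["input_ids"] if hasattr(enc, "keys") else enc


# Assumption: a plain dict is enough to model the mapping-style return value.
encoding = {"input_ids": FakeTensor(1, 7), "attention_mask": FakeTensor(1, 7)}

try:
    fake_generate(encoding)  # a dict has no .shape attribute
    crashed = False
except AttributeError:
    crashed = True

# Unwrapping first hands the tensor-like object to the generate stand-in.
batch_size = fake_generate(input_ids_of(encoding))
```

Extracting the ids before the call keeps the operator code unaware of which return type the installed transformers version produces.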
Tiara Rodney
d261919404
todo(16): in-progress
A shared _input_ids helper extracts the id sequence from a BatchEncoding (5.x) or bare list/tensor (4.x); _render_ids uses it; a new render_prompt_ids(tokenizer, messages, device) builds the generation prompt and extracts input_ids; eval.py and resident.py operators use render_prompt_ids instead of passing a BatchEncoding to generate; unit test covers _input_ids for both shapes; existing tests pass; mypy strict clean; holdout eval runs on the box without the AttributeError.
2026-06-18 16:46:57 +02:00
Tiara Rodney
87cfccd54e
todo(16): open 2026-06-18 16:46:10 +02:00
Tiara Rodney
a76470e55d
Merge branch 'develop' 2026-06-18 12:37:18 +02:00
Tiara Rodney
e1f8ef8d1a
Merge branch 'bugfix/15'
bugfix(15): normalise apply_chat_template's BatchEncoding (transformers 5.x)
2026-06-18 12:37:04 +02:00
Tiara Rodney
f9913b45c3
todo(15): done
_render_ids extracts input_ids from a BatchEncoding (5.x) or passes a list through (4.x); regression test asserts the BatchEncoding path yields the same mask; 10 tests pass; mypy strict clean. End-to-end box verification of the correct mask against Mistral done before this release. No submodule changes.
2026-06-18 12:37:03 +02:00
Tiara Rodney
4987d951ce
bugfix(15): normalise apply_chat_template's BatchEncoding (transformers 5.x)
apply_chat_template returns a BatchEncoding ({input_ids: [...]}) on transformers
>= 5 where 4.x returned a bare list[int]. build_masked_example treated the render
as a dict, so len/slicing were wrong and the prefix-differencing spuriously
raised "chat template is not additive" on every real model. Extract the id
sequence via a _render_ids helper; verified the assistant-only mask against
mistralai/Mistral-7B-Instruct-v0.2. The fake tokenizer returned a bare list and
missed this, so a BatchEncoding-returning variant now guards it.
2026-06-18 12:37:01 +02:00
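
A minimal sketch of why the length came out wrong, assuming a plain dict is an adequate stand-in for the mapping-style return value. `render_ids` is a hypothetical mirror of the normalisation step this commit describes, and the token values are made up for illustration.

```python
from typing import Any


def render_ids(enc: Any) -> Any:
    """Hypothetical mirror of the normalisation: unwrap input_ids from dict-likes."""
    return enc["input_ids"] if hasattr(enc, "keys") else enc


bare = [101, 7, 8, 9, 102]  # 4.x-style return: a bare list of ids (made-up values)
wrapped = {"input_ids": bare, "attention_mask": [1] * len(bare)}  # 5.x-style shape

naive_len = len(wrapped)              # counts the keys, not the tokens
fixed_len = len(render_ids(wrapped))  # counts the tokens
same_object = render_ids(bare) is bare  # bare sequences pass through untouched
```

With the wrapped value, any label list sized by `len(...)` ends up two entries long regardless of the conversation, which is the mismatch the message describes.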
Tiara Rodney
7853224796
todo(15): in-progress
build_masked_example normalises apply_chat_template's BatchEncoding (transformers 5.x) and list[int] (4.x) returns via a _render_ids helper; a BatchEncoding-returning fake tokenizer produces the same mask as the list-returning one (regression test added); existing tests pass; mypy strict clean; verified end-to-end on the box that sekft-train --inspect produces a correct assistant-only mask against Mistral.
2026-06-18 12:34:55 +02:00
Tiara Rodney
01e4d75237
todo(15): open 2026-06-18 12:34:37 +02:00
7 changed files with 189 additions and 10 deletions

@ -7,6 +7,38 @@ are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.0.3] - 2026-06-18
+
+### Changed
+- The README intro now states up front that this is **not tool-calling**: sekft
+trains shell operation, not function-calling; the model is given no typed tool
+API or JSON-schema action list, and writes plain-text commands at a real prompt
+with the whole system as its action space.
+
+## [1.0.2] - 2026-06-18
+
+### Fixed
+- The generation operators (`sekft-eval`, `sekft-resident`) passed the
+`BatchEncoding` from `apply_chat_template(..., return_tensors="pt")` straight
+to `model.generate`, which does `inputs.shape[0]` and raised `AttributeError`
+on transformers ≥ 5; the holdout eval crashed on its first scenario. 1.0.1
+fixed only the trainer's masking; this sweeps the generation path too. A shared
+`_input_ids` helper and a `render_prompt_ids` function now extract the id
+tensor for both operators, with unit tests for the BatchEncoding and bare
+shapes.
+
+## [1.0.1] - 2026-06-18
+
+### Fixed
+- `build_masked_example` could not derive the assistant mask on transformers
+≥ 5: `apply_chat_template` now returns a `BatchEncoding` (`{input_ids: [...]}`)
+where 4.x returned a bare `list[int]`, so the render was treated as a dict and
+the prefix-differencing spuriously raised "chat template is not additive" on
+every real model. The id sequence is now extracted either way; verified the
+assistant-only mask against `mistralai/Mistral-7B-Instruct-v0.2`. The
+fake-tokenizer test gained a `BatchEncoding`-returning variant so this can't
+regress.
+
 ## [1.0.0] - 2026-06-18

 First release: the training and evaluation pipeline that turns posix-sdc
@ -38,4 +70,7 @@ trajectories into a fine-tuned shell operator.
 mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
 and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.

+[1.0.3]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.2...v1.0.3
+[1.0.2]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.1...v1.0.2
+[1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
 [1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0

@ -5,6 +5,12 @@ land with **no imperative**, discover where directives live, learn the provider
 from its own self-documentation, do the work, and terminate (`exit` on success,
 `panic` when genuinely blocked).

+> **Not tool-calling.** sekft trains shell operation, not function-calling. The
+> model is given no typed tool API and no JSON-schema action list; it writes
+> plain-text commands at a real prompt, with the whole system as its action
+> space, discovered like a person would (`--help`, `man`, `ls`) rather than
+> enumerated up front.
+
 sekft is the **training half**. The dataset and the synthetic-data factory live
 in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
 depends on. Here live the trainer, the behavioural evaluator, and the

TODO (75 changes)

@ -249,3 +249,78 @@ Description: The lock committed with the triplet (#13) predated the published
 and its transitive deps into the lock. Commit the refreshed
 Pipfile.lock so the next machine installs the published wheel with
 the Hub path available.
+
+--ISSUE
+Content-Type: application/issue
+ID: 15
+Type: bugfix
+Title: apply_chat_template returns BatchEncoding on transformers 5.x
+Status: done
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: build_masked_example assumed apply_chat_template returns a flat
+list[int] (transformers 4.x). On transformers 5.x it returns a
+BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
+the key count, and the prefix-differencing spuriously raised 'chat
+template is not additive' on every real model (verified against
+mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
+the Mistral template is additive; only the return type needs
+normalising. Add a _render_ids helper that extracts input_ids when
+the result is dict-like, and use it for both renders. The
+fake-tokenizer test returned a bare list and missed this, so add a
+BatchEncoding-returning fake and assert the mask matches.
+
+--ISSUE
+Content-Type: application/issue
+ID: 16
+Type: bugfix
+Title: generation operators pass BatchEncoding to generate (transformers 5.x)
+Status: done
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: The same transformers 5.x return-type change that broke
+build_masked_example (#15) also breaks the generation path:
+apply_chat_template(add_generation_prompt=True,
+return_tensors='pt') returns a BatchEncoding, and eval.py and
+resident.py pass it straight to model.generate(), which does
+inputs_tensor.shape[0] -> AttributeError (the holdout eval crashed
+here on scenario 1). #15 only fixed the trainer. Factor the id
+extraction into a shared _input_ids helper, add
+render_prompt_ids(tokenizer, messages, device) in sft.py, and use
+it in both operators. Add a unit test for _input_ids covering the
+BatchEncoding and bare-sequence cases. This is the sweep I should
+have done at #15.
+
+--ISSUE
+Content-Type: application/issue
+ID: 17
+Type: feature
+Title: docs: state up front that this is not tool-calling
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: Add a prominent clarification to the README intro that sekft trains
+shell operation, not function-calling: the model is given no typed
+tool API or JSON-schema action list; it writes plain-text commands
+at a real prompt with the whole system as its action space,
+discovered like a person does.
+
+--ISSUE
+Content-Type: application/issue
+ID: 18
+Type: feature
+Title: docs: deliver the not-tool-calling intro clarification (1.0.3)
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: Deliver the not-tool-calling clarification to the README intro and
+add the 1.0.3 changelog entry. The prior issue's merge carried only
+the todo status; the step-4 work commit was skipped.

@ -28,7 +28,7 @@ from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
 from tiararodney.posix_sdc.factory.rollout import rollout
 from tiararodney.posix_sdc.schema import Scenario

-from .sft import normalize_for_template
+from .sft import render_prompt_ids


 def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
@ -49,9 +49,7 @@ def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
     model.eval()

     def operator(messages: list[dict[str, str]]) -> str:
-        msgs = normalize_for_template(messages)
-        ids = tok.apply_chat_template(
-            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
+        ids = render_prompt_ids(tok, messages, model.device)
         with torch.no_grad():
             out = model.generate(
                 ids, max_new_tokens=max_new_tokens,

@ -32,7 +32,7 @@ from peft import (LoraConfig, PeftModel, get_peft_model,
 from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                           DataCollatorForSeq2Seq, Trainer, TrainingArguments)

-from .sft import build_masked_example, iter_keepers, normalize_for_template
+from .sft import build_masked_example, iter_keepers, render_prompt_ids


 LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
@ -132,9 +132,7 @@ class Resident:
         pm.eval()

         def operator(messages: list[dict[str, str]]) -> str:
-            msgs = normalize_for_template(messages)
-            ids = self.tok.apply_chat_template(
-                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
+            ids = render_prompt_ids(self.tok, messages, pm.device)
             with torch.no_grad():
                 o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                 temperature=max(temperature, 1e-2),

@ -62,6 +62,35 @@ def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str
     return out


+def _input_ids(enc: Any) -> Any:
+    """The id sequence from an ``apply_chat_template`` result. transformers >= 5
+    returns a ``BatchEncoding`` (``{input_ids: ...}``) where 4.x returned the
+    bare ``list[int]`` / tensor; return the ids either way. Passing the dict on,
+    unfixed, breaks everything downstream: the trainer's prefix-differencing sees
+    ``len`` as the key count, and ``model.generate`` does ``inputs.shape[0]`` on
+    a dict and raises ``AttributeError``."""
+    return enc["input_ids"] if hasattr(enc, "keys") else enc
+
+
+def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
+    """Token ids for a rendered conversation (no generation prompt), as a flat
+    sequence; see :func:`_input_ids` for the BatchEncoding normalisation."""
+    return _input_ids(tokenizer.apply_chat_template(msgs, add_generation_prompt=False))
+
+
+def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
+                      device: Any = None) -> Any:
+    """The tokenized generation prompt for an operator: canonicalise the turns,
+    append the assistant generation prompt, and return the ``input_ids`` tensor
+    (extracted from the BatchEncoding on transformers >= 5), moved to ``device``
+    if given. Shared by the eval and resident operators so neither feeds a
+    BatchEncoding to ``model.generate``."""
+    enc = tokenizer.apply_chat_template(
+        normalize_for_template(messages), add_generation_prompt=True, return_tensors="pt")
+    ids = _input_ids(enc)
+    return ids.to(device) if device is not None else ids
+
+
 def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
     """Tokenize a trajectory with the tokenizer's OWN chat template and build an
     assistant-only loss mask.
@ -76,11 +105,11 @@ def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict
     non-additive one raises rather than silently mis-mask.
     """
     msgs = normalize_for_template(messages)
-    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
+    ids = _render_ids(tokenizer, msgs)
     labels = [-100] * len(ids)
     prev: list[int] = []
     for i, m in enumerate(msgs):
-        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
+        upto = _render_ids(tokenizer, msgs[:i + 1])
         if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
             raise ValueError("chat template is not additive; cannot derive an "
                              "assistant loss mask by token-prefix differencing")

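The prefix-differencing that this hunk touches can be sketched end to end with a toy tokenizer. Everything below is illustrative rather than the project's code: `ToyTok`, `ToyTokWrapped`, `ids_of`, and `mask_assistant` are hypothetical names. The part of `build_masked_example` that assigns labels is not shown in the hunk, so the assistant branch here (training on every token a turn adds, role tag included) is an assumption.

```python
from typing import Any

IGNORE = -100  # label value the diff uses for masked positions


class ToyTok:
    """Hypothetical additive template: a role tag, then whitespace-split words."""

    def apply_chat_template(self, msgs: list[dict[str, str]],
                            add_generation_prompt: bool = False) -> Any:
        toks: list[str] = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks.extend(m["content"].split())
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


class ToyTokWrapped(ToyTok):
    """Same tokens, returned dict-style as the diff says transformers >= 5 does."""

    def apply_chat_template(self, msgs: list[dict[str, str]],
                            add_generation_prompt: bool = False) -> Any:
        return {"input_ids": super().apply_chat_template(msgs, add_generation_prompt)}


def ids_of(enc: Any) -> Any:
    """Unwrap dict-like results; pass bare sequences through."""
    return enc["input_ids"] if hasattr(enc, "keys") else enc


def mask_assistant(msgs: list[dict[str, str]], tok: Any) -> list[Any]:
    """Label only the tokens each assistant turn adds to the rendered prefix."""
    ids = ids_of(tok.apply_chat_template(msgs))
    labels: list[Any] = [IGNORE] * len(ids)
    prev: list[Any] = []
    for i, m in enumerate(msgs):
        upto = ids_of(tok.apply_chat_template(msgs[: i + 1]))
        # Each partial render must extend the previous one and prefix the full one.
        if ids[: len(upto)] != upto or upto[: len(prev)] != prev:
            raise ValueError("chat template is not additive")
        if m["role"] == "assistant":  # assumption: train on this turn's new tokens
            labels[len(prev):len(upto)] = upto[len(prev):]
        prev = upto
    return labels


conversation = [
    {"role": "user", "content": "login"},
    {"role": "assistant", "content": "cat f"},
    {"role": "user", "content": "out"},
    {"role": "assistant", "content": "exit"},
]
labels = mask_assistant(conversation, ToyTok())
wrapped_labels = mask_assistant(conversation, ToyTokWrapped())
```

Both tokenizers yield the same labels because the ids are unwrapped before any `len` or slice is taken; dropping the `ids_of` calls makes the wrapped variant size the label list by key count instead.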
@ -27,6 +27,15 @@ class FakeTok:
         return toks


+class FakeTokBatchEncoding(FakeTok):
+    """Like FakeTok, but returns a dict as transformers >= 5's
+    ``apply_chat_template`` does (a BatchEncoding), to exercise the id-extraction."""
+
+    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
+                            return_tensors: Any = None) -> dict[str, list[str]]:
+        return {"input_ids": super().apply_chat_template(msgs, add_generation_prompt, return_tensors)}
+
+
 def test_normalize_folds_system_and_merges_consecutive() -> None:
     raw = [
         {"role": "system", "content": "orient"},
@ -63,6 +72,35 @@ def test_mask_trains_assistant_turns_only() -> None:
     assert {"orient", "login", "out"} <= set(masked)  # environment masked


+def test_mask_handles_batchencoding_return() -> None:
+    # transformers >= 5 returns a BatchEncoding ({input_ids: [...]}) rather than a
+    # bare list[int]; the mask must come out identical. Regression for the 5.x bug
+    # that made every real template look "not additive".
+    raw = [
+        {"role": "user", "content": "login"},
+        {"role": "assistant", "content": "cat f"},
+        {"role": "user", "content": "out"},
+        {"role": "assistant", "content": "exit"},
+    ]
+    assert (sft.build_masked_example(raw, FakeTokBatchEncoding())
+            == sft.build_masked_example(raw, FakeTok()))
+
+
+def test_input_ids_extracts_from_batchencoding_or_passthrough() -> None:
+    # BatchEncoding (transformers 5.x) -> its input_ids; bare list/tensor (4.x) -> itself
+    assert sft._input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}) == [1, 2, 3]
+    assert sft._input_ids([4, 5, 6]) == [4, 5, 6]
+
+
+def test_render_prompt_ids_normalises_and_appends_generation_prompt() -> None:
+    # the generation operators rely on this: fold + append <assistant>, return ids
+    # (not a BatchEncoding) so model.generate doesn't choke on a dict.
+    raw = [{"role": "system", "content": "orient"}, {"role": "user", "content": "go"}]
+    ids = sft.render_prompt_ids(FakeTok(), raw)
+    assert ids[-1] == "<assistant>"  # generation prompt appended
+    assert {"orient", "go"} <= set(ids)  # system folded into the user turn
+
+
 def test_mask_raises_on_non_additive_template() -> None:
     class BadTok:
         def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,