Compare commits


16 commits

Author        SHA1        Date                        Message
Tiara Rodney  d8d9202e77  2026-06-18 23:28:32 +02:00  Merge branch 'develop'
Tiara Rodney  bcf19c1bfe  2026-06-18 23:28:32 +02:00  Merge branch 'feature/18'
Tiara Rodney  136d84279d  2026-06-18 23:28:31 +02:00  todo(18): done
    Intro blockquote + 1.0.3 changelog committed on feature/18 (step 4), then released.
Tiara Rodney  03610709b7  2026-06-18 23:28:31 +02:00  docs(18): clarify the intro is not tool-calling; changelog 1.0.3
Tiara Rodney  c1d9947e9b  2026-06-18 23:28:30 +02:00  todo(18): in-progress
    README intro carries the not-tool-calling blockquote and CHANGELOG has [1.0.3], committed on the issue branch, then released as v1.0.3.
Tiara Rodney  1a813b5e1d  2026-06-18 23:28:29 +02:00  todo(18): open
Tiara Rodney  847d3dac10  2026-06-18 23:11:03 +02:00  Merge branch 'feature/17'
Tiara Rodney  15201302b2  2026-06-18 23:11:03 +02:00  todo(17): done
    Intro blockquote added: 'not tool-calling'. Stays on develop for the next sekft release.
Tiara Rodney  b87578d0b0  2026-06-18 23:10:27 +02:00  todo(17): in-progress
    The README intro carries a clear 'not tool-calling' callout before the 'training half' paragraph.
Tiara Rodney  7edfb0640c  2026-06-18 23:10:26 +02:00  todo(17): open
Tiara Rodney  0a4adbdc5f  2026-06-18 16:49:49 +02:00  Merge branch 'develop'
Tiara Rodney  bd04c02b41  2026-06-18 16:49:33 +02:00  Merge branch 'bugfix/16'
    bugfix(16): operators must not feed a BatchEncoding to model.generate
Tiara Rodney  1fb35e8e10  2026-06-18 16:49:32 +02:00  todo(16): done
    _input_ids extracts ids from a BatchEncoding (5.x) or bare list/tensor (4.x); render_prompt_ids builds the generation prompt and extracts the tensor; eval.py + resident.py operators use it (no more BatchEncoding to generate); 12 tests pass (2 new); mypy strict clean. Box eval verification follows this release. No submodule changes.
Tiara Rodney  1279bc8965  2026-06-18 16:49:30 +02:00  bugfix(16): operators must not feed a BatchEncoding to model.generate
    The transformers 5.x return-type change behind #15 also breaks generation:
    apply_chat_template(add_generation_prompt=True, return_tensors="pt") returns a
    BatchEncoding, and eval.py + resident.py passed it to model.generate, which does
    inputs.shape[0] -> AttributeError (the holdout eval crashed on scenario 1). #15
    fixed only the trainer. Factor a shared _input_ids helper and a render_prompt_ids
    function; both operators use it. Tests cover _input_ids for both shapes and
    render_prompt_ids.
Tiara Rodney  d261919404  2026-06-18 16:46:57 +02:00  todo(16): in-progress
    A shared _input_ids helper extracts the id sequence from a BatchEncoding (5.x) or bare list/tensor (4.x); _render_ids uses it; a new render_prompt_ids(tokenizer, messages, device) builds the generation prompt and extracts input_ids; eval.py and resident.py operators use render_prompt_ids instead of passing a BatchEncoding to generate; unit test covers _input_ids for both shapes; existing tests pass; mypy strict clean; holdout eval runs on the box without the AttributeError.
Tiara Rodney  87cfccd54e  2026-06-18 16:46:10 +02:00  todo(16): open
7 changed files with 126 additions and 18 deletions


@@ -7,6 +7,26 @@ are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.0.3] - 2026-06-18
+
+### Changed
+- The README intro now states up front that this is **not tool-calling**: sekft
+trains shell operation, not function-calling; the model is given no typed tool
+API or JSON-schema action list, and writes plain-text commands at a real prompt
+with the whole system as its action space.
+
+## [1.0.2] - 2026-06-18
+
+### Fixed
+- The generation operators (`sekft-eval`, `sekft-resident`) passed the
+`BatchEncoding` from `apply_chat_template(..., return_tensors="pt")` straight
+to `model.generate`, which does `inputs.shape[0]` and raised `AttributeError`
+on transformers ≥ 5 — the holdout eval crashed on its first scenario. 1.0.1
+fixed only the trainer's masking; this sweeps the generation path too. A shared
+`_input_ids` helper and a `render_prompt_ids` function now extract the id
+tensor for both operators, with unit tests for the BatchEncoding and bare
+shapes.
+
 ## [1.0.1] - 2026-06-18

 ### Fixed
@@ -50,5 +70,7 @@ trajectories into a fine-tuned shell operator.
 mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
 and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.

+[1.0.3]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.2...v1.0.3
+[1.0.2]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.1...v1.0.2
 [1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
 [1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
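The two added link definitions follow the pattern the changelog already uses: each version links to a compare view against its predecessor, and the oldest links to its release tag. A hypothetical helper (not part of sekft) that regenerates that block from a newest-first version list, assuming the same URL layout:

```python
def compare_links(base_url: str, versions: list[str]) -> list[str]:
    """Keep a Changelog link references for ``versions`` (newest first).

    Hypothetical helper: each version compares against the next-older one,
    and the oldest version points at its release tag.
    """
    lines: list[str] = []
    for newer, older in zip(versions, versions[1:]):
        lines.append(f"[{newer}]: {base_url}/compare/v{older}...v{newer}")
    lines.append(f"[{versions[-1]}]: {base_url}/releases/tag/v{versions[-1]}")
    return lines


for line in compare_links("https://git.code.tiararodney.com/tiara/sekft",
                          ["1.0.3", "1.0.2", "1.0.1", "1.0.0"]):
    print(line)
```

Generating the block from one list keeps each release's compare range in step with the `## [x.y.z]` headings.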


@@ -5,6 +5,12 @@ land with **no imperative**, discover where directives live, learn the provider
 from its own self-documentation, do the work, and terminate (`exit` on success,
 `panic` when genuinely blocked).

+> **Not tool-calling.** sekft trains shell operation, not function-calling. The
+> model is given no typed tool API and no JSON-schema action list; it writes
+> plain-text commands at a real prompt, with the whole system as its action
+> space, discovered like a person would (`--help`, `man`, `ls`) rather than
+> enumerated up front.
+
 sekft is the **training half**. The dataset and the synthetic-data factory live
 in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
 depends on. Here live the trainer, the behavioural evaluator, and the
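To make the "not tool-calling" distinction concrete: a tool-calling model emits a structured call against a declared schema, whereas the operator here returns one plain-text command that a real shell runs. The loop below is a hypothetical sketch, not sekft's or posix-sdc's rollout: `shell_episode`, `run_in_shell` and the `max_turns` cap are illustrative names, and treating `exit`/`panic` as bare termination words is an assumption based on the README. Only the operator signature matches the diff's `operator(messages) -> str`.

```python
import subprocess
from typing import Callable

Message = dict[str, str]
Operator = Callable[[list[Message]], str]  # same shape as the operators in the diff


def run_in_shell(command: str) -> str:
    # The command is plain text handed to a real shell; nothing is schema-checked.
    proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def shell_episode(operator: Operator, orientation: str,
                  execute: Callable[[str], str] = run_in_shell,
                  max_turns: int = 20) -> tuple[str, list[Message]]:
    """Hypothetical rollout loop: each model turn is exactly one shell command."""
    messages: list[Message] = [{"role": "system", "content": orientation}]
    for _ in range(max_turns):
        command = operator(messages).strip()
        messages.append({"role": "assistant", "content": command})
        if command in ("exit", "panic"):  # assumed termination words
            return command, messages
        # The shell's output, not a typed tool result, becomes the next turn.
        messages.append({"role": "user", "content": execute(command)})
    return "timeout", messages


# Usage with a scripted operator and a fake shell, so nothing is executed here.
script = iter(["ls", "exit"])
outcome, transcript = shell_episode(lambda msgs: next(script), "orient",
                                    execute=lambda cmd: "README.md\n")
print(outcome, [m["role"] for m in transcript])
```

The action space is whatever `sh -c` accepts, so discovery (`--help`, `man`, `ls`) happens through the same channel as the work itself.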

TODO

@@ -271,3 +271,56 @@ Description: build_masked_example assumed apply_chat_template returns a flat
 the result is dict-like, and use it for both renders. The
 fake-tokenizer test returned a bare list and missed this, so add a
 BatchEncoding-returning fake and assert the mask matches.
+
+--ISSUE
+Content-Type: application/issue
+ID: 16
+Type: bugfix
+Title: generation operators pass BatchEncoding to generate (transformers 5.x)
+Status: done
+Priority: high
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: The same transformers 5.x return-type change that broke
+build_masked_example (#15) also breaks the generation path:
+apply_chat_template(add_generation_prompt=True,
+return_tensors='pt') returns a BatchEncoding, and eval.py and
+resident.py pass it straight to model.generate(), which does
+inputs_tensor.shape[0] -> AttributeError (the holdout eval crashed
+here on scenario 1). #15 only fixed the trainer. Factor the id
+extraction into a shared _input_ids helper, add
+render_prompt_ids(tokenizer, messages, device) in sft.py, and use
+it in both operators. Add a unit test for _input_ids covering the
+BatchEncoding and bare-sequence cases. This is the sweep I should
+have done at #15.
+
+--ISSUE
+Content-Type: application/issue
+ID: 17
+Type: feature
+Title: docs: state up front that this is not tool-calling
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: Add a prominent clarification to the README intro that sekft trains
+shell operation, not function-calling: the model is given no typed
+tool API or JSON-schema action list; it writes plain-text commands
+at a real prompt with the whole system as its action space,
+discovered like a person does.
+
+--ISSUE
+Content-Type: application/issue
+ID: 18
+Type: feature
+Title: docs: deliver the not-tool-calling intro clarification (1.0.3)
+Status: done
+Priority: medium
+Created: 2026-06-18
+Module: sekft
+Relationships:
+Description: Deliver the not-tool-calling clarification to the README intro and
+add the 1.0.3 changelog entry. The prior issue's merge carried only
+the todo status; the step-4 work commit was skipped.
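Issue #15 (the hunk's context lines) concerns the trainer's prefix-differencing mask, and the `_input_ids` docstring explains why a dict breaks it: `len` of a `BatchEncoding`-like dict is its key count (2 for `input_ids` plus `attention_mask`), not its token count. The sketch below illustrates the prefix-differencing idea only; it is hypothetical, not sekft's `build_masked_example`, and `toy_render` is an invented template that emits `[role code, content length]` per turn.

```python
from typing import Callable

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

Message = dict[str, str]
Render = Callable[[list[Message]], list[int]]


def masked_example(messages: list[Message], render: Render) -> dict[str, list[int]]:
    """Hypothetical prefix-differencing mask; not sekft's build_masked_example.

    Each step renders one more turn; the tokens past the previous render belong
    to the new turn. Assistant tokens (plus whatever header the template emits
    for them) are learned, everything else is masked.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for i, msg in enumerate(messages):
        full = render(messages[: i + 1])
        # "Not additive": the template rewrote tokens it had already emitted.
        if full[: len(input_ids)] != input_ids:
            raise ValueError("chat template is not additive")
        new = full[len(input_ids):]
        input_ids.extend(new)
        labels.extend(new if msg["role"] == "assistant" else [IGNORE_INDEX] * len(new))
    return {"input_ids": input_ids, "labels": labels}


ROLE_CODE = {"system": 0, "user": 1, "assistant": 2}


def toy_render(msgs: list[Message]) -> list[int]:
    # Toy template: each turn becomes [role code, content length]; a flat list, like 4.x.
    return [tok for m in msgs for tok in (ROLE_CODE[m["role"]], len(m["content"]))]


conv = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "ls -la"}]
print(masked_example(conv, toy_render))
```

Every comparison here assumes `render` returns a flat sequence, which is exactly the assumption transformers 5.x broke and `_input_ids` restores.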


@@ -28,7 +28,7 @@ from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
 from tiararodney.posix_sdc.factory.rollout import rollout
 from tiararodney.posix_sdc.schema import Scenario

-from .sft import normalize_for_template
+from .sft import render_prompt_ids


 def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
@@ -49,9 +49,7 @@ def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
     model.eval()

     def operator(messages: list[dict[str, str]]) -> str:
-        msgs = normalize_for_template(messages)
-        ids = tok.apply_chat_template(
-            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
+        ids = render_prompt_ids(tok, messages, model.device)
         with torch.no_grad():
             out = model.generate(
                 ids, max_new_tokens=max_new_tokens,


@@ -32,7 +32,7 @@ from peft import (LoraConfig, PeftModel, get_peft_model,
 from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                           DataCollatorForSeq2Seq, Trainer, TrainingArguments)

-from .sft import build_masked_example, iter_keepers, normalize_for_template
+from .sft import build_masked_example, iter_keepers, render_prompt_ids


 LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
@@ -132,9 +132,7 @@ class Resident:
         pm.eval()

         def operator(messages: list[dict[str, str]]) -> str:
-            msgs = normalize_for_template(messages)
-            ids = self.tok.apply_chat_template(
-                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
+            ids = render_prompt_ids(self.tok, messages, pm.device)
             with torch.no_grad():
                 o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                 temperature=max(temperature, 1e-2),


@@ -62,17 +62,33 @@ def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str
     return out


-def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
-    """Token ids for a rendered conversation, as a flat sequence.
-
-    ``apply_chat_template`` returns a ``BatchEncoding`` (``{input_ids: [...]}``)
-    on transformers >= 5, where 4.x returned a bare ``list[int]``. Normalise to
-    the id sequence either way, so the prefix-differencing below diffs tokens and
-    not a dict (a dict makes ``len`` the key count and spuriously trips the
-    not-additive guard).
-    """
-    out = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
-    return out["input_ids"] if hasattr(out, "keys") else out
+def _input_ids(enc: Any) -> Any:
+    """The id sequence from an ``apply_chat_template`` result. transformers >= 5
+    returns a ``BatchEncoding`` (``{input_ids: ...}``) where 4.x returned the
+    bare ``list[int]`` / tensor; return the ids either way. Passing the dict on
+    unfixed breaks everything downstream: the trainer's prefix-differencing sees
+    ``len`` as the key count, and ``model.generate`` does ``inputs.shape[0]`` on
+    a dict and raises ``AttributeError``."""
+    return enc["input_ids"] if hasattr(enc, "keys") else enc
+
+
+def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
+    """Token ids for a rendered conversation (no generation prompt), as a flat
+    sequence; see :func:`_input_ids` for the BatchEncoding normalisation."""
+    return _input_ids(tokenizer.apply_chat_template(msgs, add_generation_prompt=False))
+
+
+def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
+                      device: Any = None) -> Any:
+    """The tokenized generation prompt for an operator: canonicalise the turns,
+    append the assistant generation prompt, and return the ``input_ids`` tensor
+    (extracted from the BatchEncoding on transformers >= 5), moved to ``device``
+    if given. Shared by the eval and resident operators so neither feeds a
+    BatchEncoding to ``model.generate``."""
+    enc = tokenizer.apply_chat_template(
+        normalize_for_template(messages), add_generation_prompt=True, return_tensors="pt")
+    ids = _input_ids(enc)
+    return ids.to(device) if device is not None else ids


 def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
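The new `render_prompt_ids` can be exercised against both tokenizer generations with fakes. In this sketch `FakeIds` (a list with a tensor-like `.to(device)`) and `FakeChatTokenizer` (one token per whitespace-separated word) are hypothetical stand-ins, and the `normalize_for_template` step is deliberately omitted; `_input_ids` and the extraction/device logic are copied from the diff.

```python
from typing import Any


class FakeIds(list):
    """List of tokens that mimics a tensor's ``.to(device)``."""

    device: Any = None

    def to(self, device: Any) -> "FakeIds":
        moved = FakeIds(self)
        moved.device = device
        return moved


class FakeChatTokenizer:
    """Fake tokenizer: ``batch_encoding=True`` mimics transformers >= 5
    (dict-like result), ``False`` mimics 4.x (bare ids)."""

    def __init__(self, batch_encoding: bool) -> None:
        self.batch_encoding = batch_encoding

    def apply_chat_template(self, msgs: list[dict[str, str]],
                            add_generation_prompt: bool = False,
                            return_tensors: Any = None) -> Any:
        ids = FakeIds(word for m in msgs for word in m["content"].split())
        if add_generation_prompt:
            ids.append("<assistant>")
        if self.batch_encoding:
            return {"input_ids": ids, "attention_mask": [1] * len(ids)}
        return ids


def _input_ids(enc: Any) -> Any:
    return enc["input_ids"] if hasattr(enc, "keys") else enc


def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
                      device: Any = None) -> Any:
    # normalize_for_template(messages) is omitted in this sketch.
    enc = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt")
    ids = _input_ids(enc)
    return ids.to(device) if device is not None else ids


msgs = [{"role": "user", "content": "list the home directory"}]
for tok in (FakeChatTokenizer(batch_encoding=True), FakeChatTokenizer(batch_encoding=False)):
    ids = render_prompt_ids(tok, msgs, device="cuda:0")
    print(type(ids).__name__, list(ids), ids.device)
```

Both tokenizer shapes yield the same id object on the same device, which is what lets `eval.py` and `resident.py` drop their version-specific code.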


@@ -86,6 +86,21 @@ def test_mask_handles_batchencoding_return() -> None:
             == sft.build_masked_example(raw, FakeTok()))


+def test_input_ids_extracts_from_batchencoding_or_passthrough() -> None:
+    # BatchEncoding (transformers 5.x) -> its input_ids; bare list/tensor (4.x) -> itself
+    assert sft._input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}) == [1, 2, 3]
+    assert sft._input_ids([4, 5, 6]) == [4, 5, 6]
+
+
+def test_render_prompt_ids_normalises_and_appends_generation_prompt() -> None:
+    # the generation operators rely on this: fold + append <assistant>, return ids
+    # (not a BatchEncoding) so model.generate doesn't choke on a dict.
+    raw = [{"role": "system", "content": "orient"}, {"role": "user", "content": "go"}]
+    ids = sft.render_prompt_ids(FakeTok(), raw)
+    assert ids[-1] == "<assistant>"  # generation prompt appended
+    assert {"orient", "go"} <= set(ids)  # system folded into the user turn
+
+
 def test_mask_raises_on_non_additive_template() -> None:
     class BadTok:
         def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,