Compare commits

No commits in common. "master" and "v1.0.0" have entirely different histories.

7 changed files with 10 additions and 189 deletions

@@ -7,38 +7,6 @@ are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## [1.0.3] - 2026-06-18
-### Changed
-- The README intro now states up front that this is **not tool-calling**: sekft
-trains shell operation, not function-calling; the model is given no typed tool
-API or JSON-schema action list, and writes plain-text commands at a real prompt
-with the whole system as its action space.
-## [1.0.2] - 2026-06-18
-### Fixed
-- The generation operators (`sekft-eval`, `sekft-resident`) passed the
-`BatchEncoding` from `apply_chat_template(..., return_tensors="pt")` straight
-to `model.generate`, which does `inputs.shape[0]` and raised `AttributeError`
-on transformers ≥ 5; the holdout eval crashed on its first scenario. 1.0.1
-fixed only the trainer's masking; this sweeps the generation path too. A shared
-`_input_ids` helper and a `render_prompt_ids` function now extract the id
-tensor for both operators, with unit tests for the BatchEncoding and bare
-shapes.
-## [1.0.1] - 2026-06-18
-### Fixed
-- `build_masked_example` could not derive the assistant mask on transformers
-≥ 5: `apply_chat_template` now returns a `BatchEncoding` (`{input_ids: [...]}`)
-where 4.x returned a bare `list[int]`, so the render was treated as a dict and
-the prefix-differencing spuriously raised "chat template is not additive" on
-every real model. The id sequence is now extracted either way; verified the
-assistant-only mask against `mistralai/Mistral-7B-Instruct-v0.2`. The
-fake-tokenizer test gained a `BatchEncoding`-returning variant so this can't
-regress.
 ## [1.0.0] - 2026-06-18
 First release: the training and evaluation pipeline that turns posix-sdc
@@ -70,7 +38,4 @@ trajectories into a fine-tuned shell operator.
 mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
 and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.
-[1.0.3]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.2...v1.0.3
-[1.0.2]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.1...v1.0.2
-[1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
 [1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0

@@ -5,12 +5,6 @@ land with **no imperative**, discover where directives live, learn the provider
 from its own self-documentation, do the work, and terminate (`exit` on success,
 `panic` when genuinely blocked).
-> **Not tool-calling.** sekft trains shell operation, not function-calling. The
-> model is given no typed tool API and no JSON-schema action list; it writes
-> plain-text commands at a real prompt, with the whole system as its action
-> space, discovered like a person would (`--help`, `man`, `ls`) rather than
-> enumerated up front.
 sekft is the **training half**. The dataset and the synthetic-data factory live
 in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
 depends on. Here live the trainer, the behavioural evaluator, and the

TODO

@@ -249,78 +249,3 @@ Description: The lock committed with the triplet (#13) predated the published
 and its transitive deps into the lock. Commit the refreshed
 Pipfile.lock so the next machine installs the published wheel with
 the Hub path available.
---ISSUE
-Content-Type: application/issue
-ID: 15
-Type: bugfix
-Title: apply_chat_template returns BatchEncoding on transformers 5.x
-Status: done
-Priority: high
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: build_masked_example assumed apply_chat_template returns a flat
-list[int] (transformers 4.x). On transformers 5.x it returns a
-BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
-the key count, and the prefix-differencing spuriously raised 'chat
-template is not additive' on every real model (verified against
-mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
-the Mistral template is additive; only the return type needs
-normalising. Add a _render_ids helper that extracts input_ids when
-the result is dict-like, and use it for both renders. The
-fake-tokenizer test returned a bare list and missed this, so add a
-BatchEncoding-returning fake and assert the mask matches.
---ISSUE
-Content-Type: application/issue
-ID: 16
-Type: bugfix
-Title: generation operators pass BatchEncoding to generate (transformers 5.x)
-Status: done
-Priority: high
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: The same transformers 5.x return-type change that broke
-build_masked_example (#15) also breaks the generation path:
-apply_chat_template(add_generation_prompt=True,
-return_tensors='pt') returns a BatchEncoding, and eval.py and
-resident.py pass it straight to model.generate(), which does
-inputs_tensor.shape[0] -> AttributeError (the holdout eval crashed
-here on scenario 1). #15 only fixed the trainer. Factor the id
-extraction into a shared _input_ids helper, add
-render_prompt_ids(tokenizer, messages, device) in sft.py, and use
-it in both operators. Add a unit test for _input_ids covering the
-BatchEncoding and bare-sequence cases. This is the sweep I should
-have done at #15.
---ISSUE
-Content-Type: application/issue
-ID: 17
-Type: feature
-Title: docs: state up front that this is not tool-calling
-Status: done
-Priority: medium
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: Add a prominent clarification to the README intro that sekft trains
-shell operation, not function-calling: the model is given no typed
-tool API or JSON-schema action list; it writes plain-text commands
-at a real prompt with the whole system as its action space,
-discovered like a person does.
---ISSUE
-Content-Type: application/issue
-ID: 18
-Type: feature
-Title: docs: deliver the not-tool-calling intro clarification (1.0.3)
-Status: done
-Priority: medium
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: Deliver the not-tool-calling clarification to the README intro and
-add the 1.0.3 changelog entry. The prior issue's merge carried only
-the todo status; the step-4 work commit was skipped.
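
The return-type mismatch described in issue #15 can be reproduced without installing transformers. The following is a minimal sketch, not the project's code: a plain `dict` stands in for a `BatchEncoding`, and `extract_ids` is a hypothetical name that mirrors the dict-or-bare normalisation the issue asks for.

```python
from typing import Any


def extract_ids(enc: Any) -> Any:
    """Return the token ids whether `enc` is dict-like or already a bare sequence.

    Hypothetical helper for illustration; it only checks for a `keys` attribute.
    """
    return enc["input_ids"] if hasattr(enc, "keys") else enc


# Stand-ins for the two return shapes (assumption: a dict behaves like a
# BatchEncoding for the purposes of len() and key lookup).
bare = [101, 7, 8, 9, 102]
wrapped = {"input_ids": [101, 7, 8, 9, 102], "attention_mask": [1, 1, 1, 1, 1]}

# Without normalisation, len() counts keys rather than tokens, which is what
# makes a length-based prefix comparison go wrong.
key_count = len(wrapped)
token_count = len(bare)

# After normalisation both shapes yield the same id sequence.
same = extract_ids(wrapped) == extract_ids(bare)
```

Checking for `keys` rather than importing `BatchEncoding` keeps the sketch free of a transformers dependency; whether that duck-typing is the right trade-off for a real codebase is a separate design question.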

@@ -28,7 +28,7 @@ from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
 from tiararodney.posix_sdc.factory.rollout import rollout
 from tiararodney.posix_sdc.schema import Scenario
-from .sft import render_prompt_ids
+from .sft import normalize_for_template
 def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
@@ -49,7 +49,9 @@ def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
     model.eval()
     def operator(messages: list[dict[str, str]]) -> str:
-        ids = render_prompt_ids(tok, messages, model.device)
+        msgs = normalize_for_template(messages)
+        ids = tok.apply_chat_template(
+            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
         with torch.no_grad():
             out = model.generate(
                 ids, max_new_tokens=max_new_tokens,

@@ -32,7 +32,7 @@ from peft import (LoraConfig, PeftModel, get_peft_model,
 from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                           DataCollatorForSeq2Seq, Trainer, TrainingArguments)
-from .sft import build_masked_example, iter_keepers, render_prompt_ids
+from .sft import build_masked_example, iter_keepers, normalize_for_template
 LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
@@ -132,7 +132,9 @@ class Resident:
         pm.eval()
         def operator(messages: list[dict[str, str]]) -> str:
-            ids = render_prompt_ids(self.tok, messages, pm.device)
+            msgs = normalize_for_template(messages)
+            ids = self.tok.apply_chat_template(
+                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
             with torch.no_grad():
                 o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                 temperature=max(temperature, 1e-2),

@@ -62,35 +62,6 @@ def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str
     return out
-def _input_ids(enc: Any) -> Any:
-    """The id sequence from an ``apply_chat_template`` result. transformers >= 5
-    returns a ``BatchEncoding`` (``{input_ids: ...}``) where 4.x returned the
-    bare ``list[int]`` / tensor; return the ids either way. Passing the dict on
-    unfixed breaks everything downstream: the trainer's prefix-differencing sees
-    ``len`` as the key count, and ``model.generate`` does ``inputs.shape[0]`` on
-    a dict and raises ``AttributeError``."""
-    return enc["input_ids"] if hasattr(enc, "keys") else enc
-def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
-    """Token ids for a rendered conversation (no generation prompt), as a flat
-    sequence; see :func:`_input_ids` for the BatchEncoding normalisation."""
-    return _input_ids(tokenizer.apply_chat_template(msgs, add_generation_prompt=False))
-def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
-                      device: Any = None) -> Any:
-    """The tokenized generation prompt for an operator: canonicalise the turns,
-    append the assistant generation prompt, and return the ``input_ids`` tensor
-    (extracted from the BatchEncoding on transformers >= 5), moved to ``device``
-    if given. Shared by the eval and resident operators so neither feeds a
-    BatchEncoding to ``model.generate``."""
-    enc = tokenizer.apply_chat_template(
-        normalize_for_template(messages), add_generation_prompt=True, return_tensors="pt")
-    ids = _input_ids(enc)
-    return ids.to(device) if device is not None else ids
 def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
     """Tokenize a trajectory with the tokenizer's OWN chat template and build an
     assistant-only loss mask.
@@ -105,11 +76,11 @@ def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict
     non-additive one raises rather than silently mis-mask.
     """
     msgs = normalize_for_template(messages)
-    ids = _render_ids(tokenizer, msgs)
+    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
     labels = [-100] * len(ids)
     prev: list[int] = []
     for i, m in enumerate(msgs):
-        upto = _render_ids(tokenizer, msgs[:i + 1])
+        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
         if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
             raise ValueError("chat template is not additive; cannot derive an "
                              "assistant loss mask by token-prefix differencing")

@@ -27,15 +27,6 @@ class FakeTok:
         return toks
-class FakeTokBatchEncoding(FakeTok):
-    """Like FakeTok, but returns a dict as transformers >= 5's
-    ``apply_chat_template`` does (a BatchEncoding), to exercise the id-extraction."""
-    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
-                            return_tensors: Any = None) -> dict[str, list[str]]:
-        return {"input_ids": super().apply_chat_template(msgs, add_generation_prompt, return_tensors)}
 def test_normalize_folds_system_and_merges_consecutive() -> None:
     raw = [
         {"role": "system", "content": "orient"},
@@ -72,35 +63,6 @@ def test_mask_trains_assistant_turns_only() -> None:
     assert {"orient", "login", "out"} <= set(masked)  # environment masked
-def test_mask_handles_batchencoding_return() -> None:
-    # transformers >= 5 returns a BatchEncoding ({input_ids: [...]}) rather than a
-    # bare list[int]; the mask must come out identical. Regression for the 5.x bug
-    # that made every real template look "not additive".
-    raw = [
-        {"role": "user", "content": "login"},
-        {"role": "assistant", "content": "cat f"},
-        {"role": "user", "content": "out"},
-        {"role": "assistant", "content": "exit"},
-    ]
-    assert (sft.build_masked_example(raw, FakeTokBatchEncoding())
-            == sft.build_masked_example(raw, FakeTok()))
-def test_input_ids_extracts_from_batchencoding_or_passthrough() -> None:
-    # BatchEncoding (transformers 5.x) -> its input_ids; bare list/tensor (4.x) -> itself
-    assert sft._input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}) == [1, 2, 3]
-    assert sft._input_ids([4, 5, 6]) == [4, 5, 6]
-def test_render_prompt_ids_normalises_and_appends_generation_prompt() -> None:
-    # the generation operators rely on this: fold + append <assistant>, return ids
-    # (not a BatchEncoding) so model.generate doesn't choke on a dict.
-    raw = [{"role": "system", "content": "orient"}, {"role": "user", "content": "go"}]
-    ids = sft.render_prompt_ids(FakeTok(), raw)
-    assert ids[-1] == "<assistant>"  # generation prompt appended
-    assert {"orient", "go"} <= set(ids)  # system folded into the user turn
 def test_mask_raises_on_non_additive_template() -> None:
     class BadTok:
         def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,