Compare commits
No commits in common. "master" and "v1.0.0" have entirely different histories.
7 changed files with 10 additions and 189 deletions
CHANGELOG.md (35 changes)
@@ -7,38 +7,6 @@ are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [1.0.3] - 2026-06-18
-
-### Changed
-- The README intro now states up front that this is **not tool-calling**: sekft
-  trains shell operation, not function-calling; the model is given no typed tool
-  API or JSON-schema action list, and writes plain-text commands at a real prompt
-  with the whole system as its action space.
-
-## [1.0.2] - 2026-06-18
-
-### Fixed
-- The generation operators (`sekft-eval`, `sekft-resident`) passed the
-  `BatchEncoding` from `apply_chat_template(..., return_tensors="pt")` straight
-  to `model.generate`, which does `inputs.shape[0]` and raised `AttributeError`
-  on transformers ≥ 5 — the holdout eval crashed on its first scenario. 1.0.1
-  fixed only the trainer's masking; this sweeps the generation path too. A shared
-  `_input_ids` helper and a `render_prompt_ids` function now extract the id
-  tensor for both operators, with unit tests for the BatchEncoding and bare
-  shapes.
-
-## [1.0.1] - 2026-06-18
-
-### Fixed
-- `build_masked_example` could not derive the assistant mask on transformers
-  ≥ 5: `apply_chat_template` now returns a `BatchEncoding` (`{input_ids: [...]}`)
-  where 4.x returned a bare `list[int]`, so the render was treated as a dict and
-  the prefix-differencing spuriously raised "chat template is not additive" on
-  every real model. The id sequence is now extracted either way; verified the
-  assistant-only mask against `mistralai/Mistral-7B-Instruct-v0.2`. The
-  fake-tokenizer test gained a `BatchEncoding`-returning variant so this can't
-  regress.
-
 ## [1.0.0] - 2026-06-18
 
 First release: the training and evaluation pipeline that turns posix-sdc
@@ -70,7 +38,4 @@ trajectories into a fine-tuned shell operator.
 mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
 and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.
 
-[1.0.3]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.2...v1.0.3
-[1.0.2]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.1...v1.0.2
-[1.0.1]: https://git.code.tiararodney.com/tiara/sekft/compare/v1.0.0...v1.0.1
 [1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
|
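The 1.0.1 and 1.0.2 changelog entries trace both crashes to one return-type change. A minimal sketch of that failure mode and of the normalisation follows. It mirrors the `_input_ids` helper that appears in this comparison, but it assumes a plain `dict` is an adequate stand-in for a `BatchEncoding`, and `FakeTensor` and `fake_generate` are hypothetical stand-ins for a torch tensor and `model.generate`, not the real APIs.

```python
from typing import Any


def input_ids(enc: Any) -> Any:
    """Return the id sequence whether `enc` is dict-like or already bare."""
    return enc["input_ids"] if hasattr(enc, "keys") else enc


class FakeTensor:
    """Hypothetical stand-in for a 2-D tensor: it only exposes `.shape`."""

    def __init__(self, rows: list[list[int]]) -> None:
        self.rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)


def fake_generate(inputs: Any) -> int:
    """Hypothetical: performs only the `inputs.shape[0]` access the changelog cites."""
    return inputs.shape[0]


bare = [11, 12, 13, 14]                                        # 4.x-style return
wrapped = {"input_ids": bare, "attention_mask": [1, 1, 1, 1]}  # 5.x-style return

# len() of the dict counts keys, so prefix comparisons see the wrong length.
key_count = len(wrapped)
id_count = len(input_ids(wrapped))

# Handing the dict to the generate stand-in fails on `.shape`.
tensor_enc = {"input_ids": FakeTensor([[11, 12, 13, 14]])}
failure = None
try:
    fake_generate(tensor_enc)
except AttributeError as exc:
    failure = type(exc).__name__

# Extracting the ids first restores the expected batch dimension.
batch_size = fake_generate(input_ids(tensor_enc))
```

Checking for `keys` rather than for a concrete class keeps the sketch free of a transformers import, at the cost of also accepting any other mapping.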
@@ -5,12 +5,6 @@ land with **no imperative**, discover where directives live, learn the provider
 from its own self-documentation, do the work, and terminate (`exit` on success,
 `panic` when genuinely blocked).
 
-> **Not tool-calling.** sekft trains shell operation, not function-calling. The
-> model is given no typed tool API and no JSON-schema action list; it writes
-> plain-text commands at a real prompt, with the whole system as its action
-> space, discovered like a person would (`--help`, `man`, `ls`) rather than
-> enumerated up front.
-
 sekft is the **training half**. The dataset and the synthetic-data factory live
 in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
 depends on. Here live the trainer, the behavioural evaluator, and the
TODO (75 changes)
@@ -249,78 +249,3 @@ Description: The lock committed with the triplet (#13) predated the published
              and its transitive deps into the lock. Commit the refreshed
              Pipfile.lock so the next machine installs the published wheel with
              the Hub path available.
-
---ISSUE
-Content-Type: application/issue
-ID: 15
-Type: bugfix
-Title: apply_chat_template returns BatchEncoding on transformers 5.x
-Status: done
-Priority: high
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: build_masked_example assumed apply_chat_template returns a flat
-             list[int] (transformers 4.x). On transformers 5.x it returns a
-             BatchEncoding ({input_ids: [...]}), so ids was a dict, len(ids) was
-             the key count, and the prefix-differencing spuriously raised 'chat
-             template is not additive' on every real model (verified against
-             mistralai/Mistral-7B-Instruct-v0.2). The masking logic is sound and
-             the Mistral template is additive; only the return type needs
-             normalising. Add a _render_ids helper that extracts input_ids when
-             the result is dict-like, and use it for both renders. The
-             fake-tokenizer test returned a bare list and missed this, so add a
-             BatchEncoding-returning fake and assert the mask matches.
-
---ISSUE
-Content-Type: application/issue
-ID: 16
-Type: bugfix
-Title: generation operators pass BatchEncoding to generate (transformers 5.x)
-Status: done
-Priority: high
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: The same transformers 5.x return-type change that broke
-             build_masked_example (#15) also breaks the generation path:
-             apply_chat_template(add_generation_prompt=True,
-             return_tensors='pt') returns a BatchEncoding, and eval.py and
-             resident.py pass it straight to model.generate(), which does
-             inputs_tensor.shape[0] -> AttributeError (the holdout eval crashed
-             here on scenario 1). #15 only fixed the trainer. Factor the id
-             extraction into a shared _input_ids helper, add
-             render_prompt_ids(tokenizer, messages, device) in sft.py, and use
-             it in both operators. Add a unit test for _input_ids covering the
-             BatchEncoding and bare-sequence cases. This is the sweep I should
-             have done at #15.
-
---ISSUE
-Content-Type: application/issue
-ID: 17
-Type: feature
-Title: docs: state up front that this is not tool-calling
-Status: done
-Priority: medium
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: Add a prominent clarification to the README intro that sekft trains
-             shell operation, not function-calling: the model is given no typed
-             tool API or JSON-schema action list; it writes plain-text commands
-             at a real prompt with the whole system as its action space,
-             discovered like a person does.
-
---ISSUE
-Content-Type: application/issue
-ID: 18
-Type: feature
-Title: docs: deliver the not-tool-calling intro clarification (1.0.3)
-Status: done
-Priority: medium
-Created: 2026-06-18
-Module: sekft
-Relationships:
-Description: Deliver the not-tool-calling clarification to the README intro and
-             add the 1.0.3 changelog entry. The prior issue's merge carried only
-             the todo status; the step-4 work commit was skipped.
@@ -28,7 +28,7 @@ from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
 from tiararodney.posix_sdc.factory.rollout import rollout
 from tiararodney.posix_sdc.schema import Scenario
 
-from .sft import render_prompt_ids
+from .sft import normalize_for_template
 
 
 def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
@@ -49,7 +49,9 @@ def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
     model.eval()
 
     def operator(messages: list[dict[str, str]]) -> str:
-        ids = render_prompt_ids(tok, messages, model.device)
+        msgs = normalize_for_template(messages)
+        ids = tok.apply_chat_template(
+            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
         with torch.no_grad():
             out = model.generate(
                 ids, max_new_tokens=max_new_tokens,
@@ -32,7 +32,7 @@ from peft import (LoraConfig, PeftModel, get_peft_model,
 from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                           DataCollatorForSeq2Seq, Trainer, TrainingArguments)
 
-from .sft import build_masked_example, iter_keepers, render_prompt_ids
+from .sft import build_masked_example, iter_keepers, normalize_for_template
 
 LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
 
@@ -132,7 +132,9 @@ class Resident:
         pm.eval()
 
         def operator(messages: list[dict[str, str]]) -> str:
-            ids = render_prompt_ids(self.tok, messages, pm.device)
+            msgs = normalize_for_template(messages)
+            ids = self.tok.apply_chat_template(
+                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
             with torch.no_grad():
                 o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                 temperature=max(temperature, 1e-2),
@@ -62,35 +62,6 @@ def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str
     return out
 
 
-def _input_ids(enc: Any) -> Any:
-    """The id sequence from an ``apply_chat_template`` result. transformers >= 5
-    returns a ``BatchEncoding`` (``{input_ids: ...}``) where 4.x returned the
-    bare ``list[int]`` / tensor; return the ids either way. Passing the dict on
-    unfixed breaks everything downstream: the trainer's prefix-differencing sees
-    ``len`` as the key count, and ``model.generate`` does ``inputs.shape[0]`` on
-    a dict and raises ``AttributeError``."""
-    return enc["input_ids"] if hasattr(enc, "keys") else enc
-
-
-def _render_ids(tokenizer: Any, msgs: list[dict[str, str]]) -> Any:
-    """Token ids for a rendered conversation (no generation prompt), as a flat
-    sequence — see :func:`_input_ids` for the BatchEncoding normalisation."""
-    return _input_ids(tokenizer.apply_chat_template(msgs, add_generation_prompt=False))
-
-
-def render_prompt_ids(tokenizer: Any, messages: list[dict[str, str]],
-                      device: Any = None) -> Any:
-    """The tokenized generation prompt for an operator: canonicalise the turns,
-    append the assistant generation prompt, and return the ``input_ids`` tensor
-    (extracted from the BatchEncoding on transformers >= 5), moved to ``device``
-    if given. Shared by the eval and resident operators so neither feeds a
-    BatchEncoding to ``model.generate``."""
-    enc = tokenizer.apply_chat_template(
-        normalize_for_template(messages), add_generation_prompt=True, return_tensors="pt")
-    ids = _input_ids(enc)
-    return ids.to(device) if device is not None else ids
-
-
 def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
     """Tokenize a trajectory with the tokenizer's OWN chat template and build an
     assistant-only loss mask.
@@ -105,11 +76,11 @@ def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict
     non-additive one raises rather than silently mis-mask.
     """
     msgs = normalize_for_template(messages)
-    ids = _render_ids(tokenizer, msgs)
+    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
     labels = [-100] * len(ids)
     prev: list[int] = []
     for i, m in enumerate(msgs):
-        upto = _render_ids(tokenizer, msgs[:i + 1])
+        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
         if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
             raise ValueError("chat template is not additive; cannot derive an "
                              "assistant loss mask by token-prefix differencing")
|
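The `build_masked_example` hunk is cut off right after the additivity check, so the labelling step is not visible in this comparison. The sketch below shows how token-prefix differencing can yield an assistant-only mask. It is an illustration under assumptions, not the project's code: `ToyTok` is a hypothetical additive template, `assistant_mask` is a hypothetical name, and labelling the whole assistant turn (role marker included) is a guess at the behaviour.

```python
from typing import Any

IGNORE = -100  # same ignore value as `labels = [-100] * len(ids)` in the hunk


class ToyTok:
    """Hypothetical additive template: each turn renders to a fixed token run."""

    def apply_chat_template(self, msgs: list[dict[str, str]],
                            add_generation_prompt: bool = False) -> list[str]:
        toks: list[str] = []
        for m in msgs:
            toks += [f"<{m['role']}>", *m["content"].split(), "</s>"]
        return toks


def assistant_mask(msgs: list[dict[str, str]], tok: Any) -> list[Any]:
    """Label only the tokens contributed by assistant turns; ignore the rest."""
    ids = tok.apply_chat_template(msgs, add_generation_prompt=False)
    labels: list[Any] = [IGNORE] * len(ids)
    prev: list[Any] = []
    for i, m in enumerate(msgs):
        upto = tok.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
        # Additivity: every shorter render must be a prefix of the longer one.
        if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
            raise ValueError("chat template is not additive")
        if m["role"] == "assistant":
            # This turn contributed exactly the tokens upto[len(prev):].
            labels[len(prev):len(upto)] = upto[len(prev):]
        prev = upto
    return labels


convo = [
    {"role": "user", "content": "login"},
    {"role": "assistant", "content": "cat f"},
    {"role": "user", "content": "out"},
    {"role": "assistant", "content": "exit"},
]
labels = assistant_mask(convo, ToyTok())
```

With a dict-like render in place of the list, `len(ids)` and the slice comparisons would no longer operate on token positions, which is consistent with the spurious "not additive" error the changelog describes.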
@@ -27,15 +27,6 @@ class FakeTok:
         return toks
 
 
-class FakeTokBatchEncoding(FakeTok):
-    """Like FakeTok, but returns a dict as transformers >= 5's
-    ``apply_chat_template`` does (a BatchEncoding), to exercise the id-extraction."""
-
-    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
-                            return_tensors: Any = None) -> dict[str, list[str]]:
-        return {"input_ids": super().apply_chat_template(msgs, add_generation_prompt, return_tensors)}
-
-
 def test_normalize_folds_system_and_merges_consecutive() -> None:
     raw = [
         {"role": "system", "content": "orient"},
@@ -72,35 +63,6 @@ def test_mask_trains_assistant_turns_only() -> None:
     assert {"orient", "login", "out"} <= set(masked)  # environment masked
 
 
-def test_mask_handles_batchencoding_return() -> None:
-    # transformers >= 5 returns a BatchEncoding ({input_ids: [...]}) rather than a
-    # bare list[int]; the mask must come out identical. Regression for the 5.x bug
-    # that made every real template look "not additive".
-    raw = [
-        {"role": "user", "content": "login"},
-        {"role": "assistant", "content": "cat f"},
-        {"role": "user", "content": "out"},
-        {"role": "assistant", "content": "exit"},
-    ]
-    assert (sft.build_masked_example(raw, FakeTokBatchEncoding())
-            == sft.build_masked_example(raw, FakeTok()))
-
-
-def test_input_ids_extracts_from_batchencoding_or_passthrough() -> None:
-    # BatchEncoding (transformers 5.x) -> its input_ids; bare list/tensor (4.x) -> itself
-    assert sft._input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}) == [1, 2, 3]
-    assert sft._input_ids([4, 5, 6]) == [4, 5, 6]
-
-
-def test_render_prompt_ids_normalises_and_appends_generation_prompt() -> None:
-    # the generation operators rely on this: fold + append <assistant>, return ids
-    # (not a BatchEncoding) so model.generate doesn't choke on a dict.
-    raw = [{"role": "system", "content": "orient"}, {"role": "user", "content": "go"}]
-    ids = sft.render_prompt_ids(FakeTok(), raw)
-    assert ids[-1] == "<assistant>"  # generation prompt appended
-    assert {"orient", "go"} <= set(ids)  # system folded into the user turn
-
-
 def test_mask_raises_on_non_additive_template() -> None:
     class BadTok:
         def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,