Merge branch 'develop'
commit 705b4a028b
18 changed files with 3276 additions and 14 deletions
41
CHANGELOG.md
Normal file
@@ -0,0 +1,41 @@
# Changelog

All notable changes to sekft, the shell-operator SFT trainer behind the
[posix-sdc](https://huggingface.co/datasets/tiararodney/posix-sdc) experiment,
are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2026-06-18

First release: the training and evaluation pipeline that turns posix-sdc
trajectories into a fine-tuned shell operator.

### Added
- `sekft-train`: LoRA / QLoRA supervised fine-tuning of a base model on
  shell-operation trajectories, with an **assistant-only loss mask** derived by
  token-prefix differencing — the commands and the terminal `exit` / `panic`
  token are trained; the environment turns (orientation, prompts, command
  output) are masked to `-100`. The render uses the tokenizer's own
  `apply_chat_template`, so training matches what the serving harness sends
  (train = serve), with `normalize_for_template` canonicalising trajectories for
  instruct templates that have no system role and require strict user/assistant
  alternation.
- Three sources of training data: a directory of raw rollout `.json`
  (keep-filtered), a curated `.jsonl` corpus, or the published posix-sdc corpus
  over the Hugging Face Hub (`--hub`).
- `--inspect` for mask and token statistics without training, and structured
  stderr logging across every phase (`-v` / `-q`): per-trajectory and progress
  lines while the corpus is tokenized, dataset accounting that warns on dropped
  (over-length / empty-mask) trajectories, and the per-step training curve.
- `sekft-eval`: behavioural evaluation that drops the tuned model into held-out
  scenarios with no scaffold and scores whether it operates and terminates.
- `sekft-resident`: a resident-base harness that loads the base model once and
  fits several adapters without reloading, for paired / STaR-style runs.
- Packaging: the `tiararodney.sekft` namespace package with `sekft-train`,
  `sekft-eval`, and `sekft-resident` console scripts; a typed (`py.typed`),
  mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
  and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.

[1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
14
Dockerfile
@@ -1,14 +0,0 @@
# Minimal dash-in-a-box for sekft trajectory generation.
# docker build -t sekft-dash .
#
# dash as the operated shell (strict POSIX, no bashisms), busybox applets for
# the coreutils. busybox is intentionally close to minimal POSIX so trajectories
# transfer toward sek rather than encoding GNU-isms. Add `coreutils findutils
# grep sed` here if you want GNU semantics instead.
FROM alpine:3.19
RUN apk add --no-cache dash \
    && ln -sf /usr/bin/dash /bin/dash \
    && ln -sf /usr/bin/dash /bin/sh
# /work is the default arena; provider files land at their absolute paths.
RUN mkdir -p /work
WORKDIR /work
338
LICENSE
Normal file
@@ -0,0 +1,338 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
<https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

<signature of Moe Ghoul>, 1 April 1989
Moe Ghoul, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
37
Pipfile
Normal file
@@ -0,0 +1,37 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[[source]]
url = "https://pypi.code.tiararodney.com/root/byteb4rb1e/+simple/"
verify_ssl = true
name = "pypicodetiararodney"

[packages]
"tiararodney.sekft" = {file = ".", editable = true}
"tiararodney.posix-sdc" = {version = "*", index = "pypicodetiararodney", extras= ["hub"]}

[dev-packages]
tox = "*"
pytest = "*"
mypy = "*"
build = "*"
twine = "*"
setuptools-scm = "~=8.2.0"
pypi-attestations = "*"
autopep8 = "*"
"tiararodney.posix-sdc" = {ref = "develop", git = "https://git.code.tiararodney.com/tiara/posix-sdc.git", extras = ["hub"]}

[requires]
python_version = "3"

[scripts]
"dist" = "python3 -m build"
"dist:attestations" = "python3 -m pypi_attestations sign dist/*"
"dist:publish:tiararodney" = "python3 -m twine upload --sign --repository tiararodney dist/*"
"test" = "tox"
"test:static" = "tox run -m static"
"test:unit" = "tox run -m unit"
"test:integration" = "tox run -m integration"
"test:smoke" = "tox run -m smoke"
1660
Pipfile.lock
generated
Normal file
File diff suppressed because it is too large
79
README.md
Normal file
@@ -0,0 +1,79 @@
# sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
land with **no imperative**, discover where directives live, learn the provider
from its own self-documentation, do the work, and terminate (`exit` on success,
`panic` when genuinely blocked).

sekft is the **training half**. The dataset and the synthetic-data factory live
in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
depends on. Here live the trainer, the behavioural evaluator, and the
resident-base harness.

## Components

- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
  with the tokenizer's own chat template and trains an **assistant-only** loss
  mask (the commands plus the terminal token; environment turns masked to -100)
  into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a
  shell-operator SFT, so it is the part tested hardest.
- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
  about whether the model operates the shell and leaves. This drops base +
  adapter into held-out scenarios with no scaffold and reports the rates that
  count: reach command-mode, terminate, checker passes.
- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
  14 GB base once and keeps it hot, training and evaluating adapters without
  reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
  run).
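
To make the assistant-only mask concrete, here is a minimal, torch-free sketch of token-prefix differencing. It is an illustration under stated assumptions, not sekft's actual code: `fake_render` is a toy stand-in for the tokenizer's chat-template render, `build_labels` is a hypothetical helper name, and the example path `/etc/directives` is made up. The idea is that the tokens an assistant turn contributes are whatever the render through that turn adds over the render through the previous turn; only those positions keep their token ids as labels, and everything else is set to -100.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

_VOCAB: Dict[str, int] = {}  # toy vocabulary, filled on first sight of each piece


def fake_render(messages: List[Message]) -> List[int]:
    """Toy stand-in for a chat-template render (an assumption, not a real tokenizer).

    User turns become "[INST] ... [/INST]" and assistant turns "... </s>";
    each whitespace-separated piece gets a stable integer id.
    """
    pieces: List[str] = []
    for m in messages:
        words = m["content"].split()
        if m["role"] == "user":
            pieces += ["[INST]", *words, "[/INST]"]
        else:
            pieces += [*words, "</s>"]
    return [_VOCAB.setdefault(p, len(_VOCAB)) for p in pieces]


def build_labels(
    messages: List[Message],
    render: Callable[[List[Message]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) with only assistant tokens left unmasked."""
    input_ids = render(messages)
    labels = [IGNORE_INDEX] * len(input_ids)
    prev_len = 0
    for i, m in enumerate(messages):
        prefix_ids = render(messages[: i + 1])
        # Differencing is only valid if each shorter render is a true prefix.
        if input_ids[: len(prefix_ids)] != prefix_ids:
            raise ValueError("render is not prefix-stable at turn %d" % i)
        if m["role"] == "assistant":
            # Tokens added by this turn: render through turn i minus render through i-1.
            labels[prev_len:len(prefix_ids)] = prefix_ids[prev_len:]
        prev_len = len(prefix_ids)
    return input_ids, labels


EXAMPLE: List[Message] = [
    {"role": "user", "content": "list the directives"},
    {"role": "assistant", "content": "ls /etc/directives"},
    {"role": "user", "content": "a.txt b.txt"},
    {"role": "assistant", "content": "exit"},
]
input_ids, labels = build_labels(EXAMPLE, fake_render)
```

In this toy render the two assistant turns contribute five tokens in total (the command words, `exit`, and the two end-of-turn markers); the nine environment tokens stay masked. The prefix-stability check matters because a template that renders a prefix differently from the full conversation would silently shift the mask.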

## The render contract

The render the model trains on MUST equal what it is served with. The serving
harness (ccpty) sends structured `{role, content}` messages over the OpenAI
chat-completions protocol, so the endpoint applies the **model's own chat
template**. sekft therefore renders with `apply_chat_template`, after
`normalize_for_template` canonicalises each session: a leading `system` turn is
folded into the first `user` turn and consecutive same-role turns are merged,
because instruct templates such as Mistral's have no system role and require
strict user/assistant alternation. The same canonicalisation must run
serve-side, or train and serve diverge.
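
The canonicalisation described above can be sketched in a few lines. This is a hedged illustration, not the package's `normalize_for_template`: the function name `normalize_sketch`, its signature, and the `"\n\n"` separator used when joining turns are all assumptions.

```python
from typing import Dict, List

Message = Dict[str, str]


def normalize_sketch(messages: List[Message], sep: str = "\n\n") -> List[Message]:
    """Fold a leading system turn into the first user turn and merge
    consecutive same-role turns (separator choice is an assumption)."""
    out: List[Message] = []
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "system" and not out:
            # Retag the leading system turn as user; the merge step below
            # then joins it with the first real user turn that follows.
            role = "user"
        if out and out[-1]["role"] == role:
            out[-1]["content"] += sep + content
        else:
            out.append({"role": role, "content": content})
    return out


SESSION: List[Message] = [
    {"role": "system", "content": "You are in a shell."},
    {"role": "user", "content": "orientation"},
    {"role": "user", "content": "prompt$"},
    {"role": "assistant", "content": "ls"},
    {"role": "assistant", "content": "exit"},
]
normalized = normalize_sketch(SESSION)
```

Here the five input turns collapse to one `user` turn followed by one `assistant` turn, which satisfies strict alternation. The sketch builds new dictionaries rather than editing the input, so the same session can be normalised on both the training and the serving side without one call affecting the other.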

## Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

```sh
pipenv install # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]' # torch / transformers / peft / datasets, on the box
```

`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile`
overrides it with the local editable `../posix-sdc` for side-by-side development.

## Use (on the GPU box)

```sh
# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
    --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit
```

The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
verifies each in a fresh container), not trajectories.
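
As a rough picture of how per-scenario outcomes could roll up into the three reported rates (reach command-mode, terminate, checker passes), here is a small sketch. The `Episode` record, its field names, and the `rates` helper are hypothetical, and the demo episodes are invented purely to exercise the arithmetic; they are not results.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Episode:
    """Outcome of one held-out scenario run (hypothetical field names)."""
    reached_command_mode: bool
    terminated: bool
    checker_passed: bool


def rates(episodes: Iterable[Episode]) -> Dict[str, float]:
    """Fraction of episodes that reached command-mode, terminated, and passed."""
    eps: List[Episode] = list(episodes)
    if not eps:
        raise ValueError("no episodes to score")
    n = len(eps)
    return {
        "reach": sum(e.reached_command_mode for e in eps) / n,
        "terminate": sum(e.terminated for e in eps) / n,
        "checker": sum(e.checker_passed for e in eps) / n,
    }


# Invented outcomes, only to show the aggregation.
DEMO = [
    Episode(True, True, True),
    Episode(True, True, False),
    Episode(True, False, False),
    Episode(False, False, False),
]
demo_rates = rates(DEMO)
```

Keeping the three rates separate rather than collapsing them into one score preserves the distinction the README draws between operating the shell, leaving it cleanly, and actually completing the task.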

## Result

Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
clean termination on archetype-level held-out scenarios from **0/16 (base) to
9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
types, while task competence stayed archetype-local. See the experiment
[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).
234
TODO
|
|
@ -15,3 +15,237 @@ Mappings:
|
|||
- Module: sekft
|
||||
Product: sek
|
||||
Component: sekft
|
||||
|
||||
--ISSUE
|
||||
Content-Type: application/issue
|
||||
ID: 1
|
||||
Type: feature
|
||||
Title: Package sekft as an installable namespace package
|
||||
Status: done
|
||||
Priority: medium
|
||||
Created: 2026-06-16
|
||||
Module: sekft
|
||||
Relationships:
|
||||
Description: Turn the flat trainer scripts into an installable tiararodney.sekft
|
||||
namespace package: src layout, pyproject with the abstract
|
||||
posix-sdc dependency and an optional gpu extra, console scripts, a
|
||||
Pipfile pinning posix-sdc as a local editable override, and tox
|
||||
environments.
|
||||
|
||||
--ISSUE
|
||||
Content-Type: application/issue
|
||||
ID: 2
|
||||
Type: feature
|
||||
Title: SFT trainer with chat-template render and assistant-only mask
|
||||
Status: done
|
||||
Priority: medium
|
||||
Created: 2026-06-16
|
||||
Module: sekft
|
||||
Relationships:
|
||||
Description: Add the supervised fine-tuner: render trajectories through the
|
||||
tokenizer's own chat template (matching serving), canonicalise
|
||||
turns (fold system, merge consecutive), derive an assistant-only
|
||||
loss mask by token-prefix differencing, and train a QLoRA adapter.
|
||||
|
||||
--ISSUE
|
||||
Content-Type: application/issue
|
||||
ID: 3
|
||||
Type: feature
|
||||
Title: Behavioural evaluator
|
||||
Status: done
|
||||
Priority: medium
|
||||
Created: 2026-06-16
|
||||
Module: sekft
|
||||
Relationships:
|
||||
Description: Add the behavioural eval: load base plus LoRA adapter, drop it into
|
||||
held-out scenarios with no scaffold, drive them through a local
|
||||
operator that renders with the model's chat template, and report
|
||||
reach/terminate/checker rates.
|
||||
|
||||
--ISSUE
|
||||
Content-Type: application/issue
|
||||
ID: 4
|
||||
Type: feature
|
||||
Title: Resident-base train/eval harness
|
||||
Status: done
|
||||
Priority: medium
|
||||
Created: 2026-06-16
|
||||
Module: sekft
|
||||
Relationships:
|
||||
Description: Add the resident harness that loads the 14GB base once and keeps it
|
||||
hot, training fresh LoRA adapters and evaluating them without
|
||||
reloading the base, for the slow-OcuLink iterate loop.
|
||||
|
||||
--ISSUE
|
||||
Content-Type: application/issue
|
||||
ID: 5
|
||||
Type: feature
|
||||
Title: Pipeline overview README
|
||||
Status: done
|
||||
Priority: medium
|
||||
Created: 2026-06-16
|
||||
Module: sekft
|
||||
Relationships:
|
||||
Description: Document the sekft pipeline: the trainer, evaluator, and resident
|
||||
harness; how they consume the posix-sdc dataset; the render
|
||||
contract; and how to run on the GPU box.

--ISSUE
Content-Type: application/issue
ID: 6
Type: feature
Title: Test suite: unit and smoke
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Add a pytest suite: torch-free unit tests for the render
canonicalisation and assistant-only mask (fake tokenizer), and
smoke tests that the console entry points respond to --help without
the GPU stack.
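A sketch of what a torch-free canonicalisation test could look like. The `canonicalise` helper restates the fold-system and merge-consecutive rule so the block runs on its own; its name and the test names are illustrative assumptions, not the suite's actual contents.

```python
from typing import Dict, List

Message = Dict[str, str]


def canonicalise(messages: List[Message]) -> List[Message]:
    """Treat ``system`` as ``user`` and merge consecutive same-role turns."""
    out: List[Message] = []
    for m in messages:
        role = "user" if m["role"] == "system" else m["role"]
        if out and out[-1]["role"] == role:
            out[-1] = {"role": role,
                       "content": out[-1]["content"] + "\n" + m["content"]}
        else:
            out.append({"role": role, "content": m["content"]})
    return out


def test_system_turn_folds_into_first_user_turn() -> None:
    turns = [
        {"role": "system", "content": "You operate a shell."},
        {"role": "user", "content": "$ "},
        {"role": "assistant", "content": "ls"},
    ]
    assert canonicalise(turns) == [
        {"role": "user", "content": "You operate a shell.\n$ "},
        {"role": "assistant", "content": "ls"},
    ]


def test_roles_alternate_after_canonicalisation() -> None:
    turns = [
        {"role": "system", "content": "s"},
        {"role": "user", "content": "u1"},
        {"role": "user", "content": "u2"},
        {"role": "assistant", "content": "a"},
        {"role": "user", "content": "u3"},
    ]
    roles = [m["role"] for m in canonicalise(turns)]
    assert roles == ["user", "assistant", "user"]
```

Neither test imports torch or loads a model, so pytest can collect and run them on a machine without the GPU stack.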

--ISSUE
Content-Type: application/issue
ID: 7
Type: feature
Title: Add GPL-2.0 license and drop the relocated Dockerfile
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: License sekft under GPL-2.0 (canonical text plus pyproject
metadata) and remove the dash Dockerfile, which now lives in
posix-sdc under docker/alpine-dash.

--ISSUE
Content-Type: application/issue
ID: 8
Type: feature
Title: Refresh docs for the packaged trainer
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: The README still describes sekft as the data factory
(generate/rollout/dashdocker/taxonomy/schema), which all moved to
posix-sdc. Rewrite it as the trainer (sft/eval/resident) that
consumes posix-sdc, and update the module docstrings to
console-script invocations and the chat-template render contract.

--ISSUE
Content-Type: application/issue
ID: 9
Type: feature
Title: Type-check the package under mypy strict
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: Make the lint env honestly pass: add mypy as a dev dependency,
ignore_missing_imports for the ML libs, fully annotate
eval/resident/sft (including the inner operator callables), and
ship a py.typed marker so the "Typing :: Typed" claim is real.

--ISSUE
Content-Type: application/issue
ID: 10
Type: feature
Title: structured logging for the trainer (sft)
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: The trainer is nearly silent: outside an example count and a save
line it prints nothing through tokenizer load, the ~14GB base-model
load, example building, and the whole training loop, and
trajectories dropped for exceeding --max-len or having an empty
loss mask vanish without a trace. Add a small shared logging setup
(_log.py, stderr so stdout stays clean for results) and a module
logger; give sekft-train -v/--verbose and -q/--quiet. Log the run
config and each phase, report dataset accounting (keepers ->
usable, with counts dropped for length / empty-mask and a warning
when any are dropped), and raise transformers' verbosity during
training so the per-step curve shows. Apply to train() and
inspect().
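A minimal sketch of the logging shape described above, using only the standard library. The logger name `sekft.sketch`, the `report_accounting` helper, and the sample counts are assumptions for illustration; the point is that log lines go to stderr while stdout carries only the result.

```python
import logging
import sys

log = logging.getLogger("sekft.sketch")  # hypothetical logger name


def setup(verbose: bool = False, quiet: bool = False) -> None:
    """Root logging to stderr: quiet=WARNING, verbose=DEBUG, default INFO."""
    level = logging.WARNING if quiet else logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        stream=sys.stderr,
        format="%(asctime)s %(levelname)-5s %(name)s %(message)s",
        datefmt="%H:%M:%S",
    )


def report_accounting(n_seen: int, n_usable: int, n_long: int, n_empty: int,
                      max_len: int) -> None:
    """Log keepers -> usable, and warn when any trajectory was dropped."""
    log.info("dataset: %d keepers -> %d usable", n_seen, n_usable)
    if n_long or n_empty:
        log.warning("dropped %d trajectories: %d over --max-len %d, %d empty-mask",
                    n_long + n_empty, n_long, max_len, n_empty)


setup()
report_accounting(n_seen=120, n_usable=117, n_long=2, n_empty=1, max_len=4096)
print('{"usable": 117}')  # stdout stays clean for the result itself
```

Redirecting stdout to a file (for example `python sketch.py > result.json`) would capture only the JSON line, with the accounting still visible on the terminal.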

--ISSUE
Content-Type: application/issue
ID: 11
Type: bugfix
Title: operate_rate can sum a None (eval + resident)
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
in rows). The 'and' yields the right operand when steps>0, so if
meta lacks the 'clean' key it yields None and sum() raises
TypeError at runtime; mypy (now that posix-sdc ships py.typed and
Trajectory is typed) flags the generator item type in eval.py:83
and resident.py:157. Wrap the predicate in bool() so it counts
trajectories that operated and are clean, fixing both the type
error and the latent crash.
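The failure mode can be reproduced without the trainer. In this sketch `Row` is a hypothetical stand-in for a trajectory with only the two fields the predicate reads; the second row has no `clean` key, so the unwrapped predicate yields `None`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Row:
    """Hypothetical stand-in for a trajectory: just ``steps`` and ``meta``."""
    steps: int
    meta: Dict[str, Any] = field(default_factory=dict)


def operate_count_unwrapped(rows: List[Row]):
    # `a and b` evaluates to b when a is truthy, so a missing key leaks None.
    return sum(t.steps > 0 and t.meta.get("clean") for t in rows)


def operate_count_wrapped(rows: List[Row]) -> int:
    # bool() collapses None / False / True to a plain countable boolean.
    return sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows)


rows = [Row(3, {"clean": True}), Row(2, {}), Row(0, {"clean": True})]

try:
    operate_count_unwrapped(rows)
except TypeError as exc:
    print("unwrapped predicate failed:", exc)

print("wrapped count:", operate_count_wrapped(rows))  # wrapped count: 1
```

Only the first row both operated and is marked clean, so the wrapped count is 1; the row with zero steps short-circuits to `False` before its `meta` is read.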

--ISSUE
Content-Type: application/issue
ID: 12
Type: feature
Title: load training data from a raw dir, a curated jsonl, or the Hub
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: iter_keepers reads only raw per-trajectory .json - one of three
input shapes the trainer should accept. Add load_turns(data, hub,
revision) that yields assistant-bearing turns from: a directory of
raw rollout .json (keep-filtered, today's iter_keepers); a curated
.jsonl corpus file (already keep-filtered, yield turns per line);
or the published corpus via posix-sdc's load_trajectories (local
data/ in a checkout, else the Hub). sekft-train gains --hub and
--revision; --data dispatches by dir-vs-.jsonl. Raw-rollout reading
stays sekft-local; curated+Hub reuse posix-sdc's loader (imported
lazily so the trainer needs neither posix-sdc nor huggingface_hub
for the raw/jsonl paths). Unit tests for the raw-dir and jsonl
dispatch.
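A sketch of the dir-vs-`.jsonl` dispatch using only the standard library. `iter_turns` and `demo` are hypothetical names, the Hub branch is left out because it depends on posix-sdc, and the record shape (`keep`, `turns`) follows the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator, List

Message = Dict[str, str]


def iter_turns(data: Path) -> Iterator[List[Message]]:
    """Yield turns from a raw rollout directory or a curated .jsonl corpus."""
    if data.is_dir():
        # Raw rollouts: one .json per trajectory, filtered on the keep flag.
        for f in sorted(data.glob("*.json")):
            record = json.loads(f.read_text())
            if record.get("keep"):
                yield record["turns"]
    elif data.suffix == ".jsonl":
        # Curated corpus: already keep-filtered, one record per line.
        with open(data) as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)["turns"]
    else:
        raise ValueError(f"expected a rollout directory or a .jsonl corpus, got {data}")


def demo() -> Dict[str, int]:
    turns = [{"role": "user", "content": "$ "},
             {"role": "assistant", "content": "exit"}]
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "rollouts"
        raw.mkdir()
        (raw / "a.json").write_text(json.dumps({"keep": True, "turns": turns}))
        (raw / "b.json").write_text(json.dumps({"keep": False, "turns": turns}))
        corpus = Path(tmp) / "corpus.jsonl"
        corpus.write_text(
            "\n".join(json.dumps({"turns": turns}) for _ in range(3)) + "\n")
        return {"raw_dir": len(list(iter_turns(raw))),
                "jsonl": len(list(iter_turns(corpus)))}


print(demo())  # {'raw_dir': 1, 'jsonl': 3}
```

The raw directory yields one trajectory because `b.json` is not marked keep, while every line of the curated corpus is yielded as-is.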

--ISSUE
Content-Type: application/issue
ID: 13
Type: feature
Title: reference posix-sdc three ways for seamless multi-machine dev
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: Wire the posix-sdc dependency as a triplet: the abstract
posix-sdc[hub] in pyproject (so the trainer's --hub path can reach
the Hub via huggingface_hub); the published wheel from the private
index in Pipfile [packages]; the git develop branch in Pipfile
[dev-packages] for develop-time. Commit Pipfile.lock so the
dependency surface and lock land together.

--ISSUE
Content-Type: application/issue
ID: 14
Type: bugfix
Title: refresh Pipfile.lock against published posix-sdc 1.2.2
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: The lock committed with the triplet (#13) predated the published
posix-sdc 1.2.2 wheel, so it could not pin the real [hub] closure.
Now that 1.2.2 is on the private index, re-lock: posix-sdc resolves
to ==1.2.2 from the index and the [hub] extra pulls huggingface_hub
and its transitive deps into the lock. Commit the refreshed
Pipfile.lock so the next machine installs the published wheel with
the Hub path available.
92
pyproject.toml
Normal file
@@ -0,0 +1,92 @@
[build-system]
requires = [
    "setuptools",
    "wheel",
    "setuptools-scm[toml]"
]
build-backend = "setuptools.build_meta"

[project]
name = "tiararodney.sekft"
description = "Fine-tune small open models to operate a POSIX shell (sek)"
authors = [
    { name = "Tiara Rodney", email = "tiara.rodney@byteb4rb1e.me" }
]
license-files = ["LICENSE"]
readme = "README.md"
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
    "Natural Language :: English",
    "Operating System :: POSIX :: Linux",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: System :: Shells",
    "Typing :: Typed",
]
dependencies = [
    "tiararodney.posix-sdc[hub]",
]
dynamic = ["version"]
requires-python = ">=3.9"

[project.optional-dependencies]
gpu = [
    "torch",
    "transformers",
    "peft",
    "datasets",
    "accelerate",
    "bitsandbytes",
    "tensorboard",
]

[project.scripts]
sekft-train = "tiararodney.sekft.sft:main"
sekft-eval = "tiararodney.sekft.eval:main"
sekft-resident = "tiararodney.sekft.resident:main"

[project.urls]
Git = "https://git.code.tiararodney.com/tiararodney/sekft"

[tool.setuptools.packages.find]
where = ["src"]
namespaces = true

[tool.setuptools.package-data]
"tiararodney.sekft" = ["py.typed"]

[tool.pytest.ini_options]
pythonpath = ["src", "../posix-sdc/src"]
testpaths = ["tests"]
markers = [
    "pytest: integration tests runnable without external services",
    "gpu: requires torch and a GPU",
    "docker: requires Docker and the sekft-dash image",
]

[tool.mypy]
strict = true
mypy_path = "src"
explicit_package_bases = true
namespace_packages = true

[[tool.mypy.overrides]]
module = [
    "torch.*", "transformers.*", "peft.*", "datasets.*", "bitsandbytes.*",
    "tiararodney.posix_sdc.*",
]
ignore_missing_imports = true

[tool.autopep8]
max_line_length = 80
aggressive = 3
recursive = true

[tool.setuptools_scm]
5
src/tiararodney/sekft/__init__.py
Normal file
@@ -0,0 +1,5 @@
"""sekft: fine-tune small open models to operate a POSIX shell (sek).

Consumes the posix-sdc dataset; the trainer, behavioural evaluator, and the
resident-base harness live here.
"""
20
src/tiararodney/sekft/_log.py
Normal file
@@ -0,0 +1,20 @@
"""Console logging setup shared by the sekft entry points.

Logs go to stderr so stdout stays clean for a command's actual output (metrics
JSON, a path a caller might capture). Call :func:`setup` once at the top of a
``main()``; modules then log through ``logging.getLogger("sekft.<area>")``.
"""
from __future__ import annotations

import logging


def setup(verbose: bool = False, quiet: bool = False) -> None:
    """Configure root logging to stderr. ``quiet`` shows warnings and worse,
    ``verbose`` adds debug; the default is info."""
    level = logging.WARNING if quiet else logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)-5s %(name)s %(message)s",
        datefmt="%H:%M:%S",
    )
105
src/tiararodney/sekft/eval.py
Normal file
@@ -0,0 +1,105 @@
"""Behavioural eval: the metric that matters.

Train loss says nothing about whether the model operates the shell and leaves.
This loads a fine-tuned model (base + LoRA adapter), drops it into held-out
scenarios with NO scaffold (the trained behaviour must stand on its own), and
reports the rates that count: does it reach command-mode, does it terminate,
does the checker pass.

    sekft-eval --base <hf-dir> --adapter ./ckpt-mistral-r16 \
        --scenarios ./holdout-scenarios --n 10

Reuses the posix-sdc rollout loop with a *local* operator: the model renders and
generates with the same chat template it was trained on (train == eval == serve,
via ``apply_chat_template`` + ``normalize_for_template``, or the prompts go out
of distribution). Prerequisites on the box: torch + transformers + peft, the
``sekft-dash`` image, and held-out SCENARIO bundles from the posix-sdc factory
(not trajectories; the eval stands up and verifies each).
"""
from __future__ import annotations

import argparse
import json
from collections.abc import Callable
from pathlib import Path
from typing import Any

from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
from tiararodney.posix_sdc.factory.rollout import rollout
from tiararodney.posix_sdc.schema import Scenario

from .sft import normalize_for_template


def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
                        temperature: float = 0.7) -> Callable[[list[dict[str, str]]], str]:
    """A ``messages -> command`` callable backed by base + LoRA adapter.

    Renders the conversation exactly as the model was trained, appends the
    assistant header, generates one turn, and stops at the first EOS token.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(adapter)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()

    def operator(messages: list[dict[str, str]]) -> str:
        msgs = normalize_for_template(messages)
        ids = tok.apply_chat_template(
            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(
                ids, max_new_tokens=max_new_tokens,
                do_sample=temperature > 0, temperature=max(temperature, 1e-2),
                eos_token_id=tok.eos_token_id, pad_token_id=tok.eos_token_id)
        text: str = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip()
        return text

    return operator


def evaluate(base: str, adapter: str, scenarios_dir: Path, n: int,
             max_steps: int, temperature: float) -> dict[str, Any]:
    if not available():
        raise SystemExit("sekft-dash image unavailable; build it from posix-sdc docker/alpine-dash")
    operator = make_local_operator(base, adapter, temperature=temperature)
    backend = DashDocker()
    rows = []
    for f in sorted(scenarios_dir.glob("*.json"))[:n]:
        sc = Scenario.from_dict(json.loads(f.read_text()))
        tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
                     operator=operator, use_scaffold=False)
        rows.append(tj)
        print(f" {sc.id}: {tj.outcome} (terminal={tj.terminal} "
              f"verified={tj.verified} steps={tj.steps})")
    d = len(rows) or 1
    return {
        "n": len(rows),
        "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
        "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
        "verified_rate": round(sum(t.verified for t in rows) / d, 3),
        "clean_rate": round(sum(t.keep for t in rows) / d, 3),
    }


def main() -> None:
    ap = argparse.ArgumentParser(description="Behavioural eval of a tuned model.")
    ap.add_argument("--base", required=True)
    ap.add_argument("--adapter", required=True)
    ap.add_argument("--scenarios", type=Path, required=True)
    ap.add_argument("--n", type=int, default=10)
    ap.add_argument("--max-steps", type=int, default=30)
    ap.add_argument("--temperature", type=float, default=0.7)
    ns = ap.parse_args()
    m = evaluate(ns.base, ns.adapter, ns.scenarios, ns.n, ns.max_steps, ns.temperature)
    print("\n=== behavioural metrics ===")
    print(json.dumps(m, indent=2))


if __name__ == "__main__":
    main()
0
src/tiararodney/sekft/py.typed
Normal file
189
src/tiararodney/sekft/resident.py
Normal file
@@ -0,0 +1,189 @@
"""Resident harness: load the base ONCE, cycle adapters.

On a slow link (OcuLink / PCIe 3.0 x4) the 14 GB base transfer dominates every
process start. This loads the base once and keeps it hot, so the
iterate-train-eval loop pays the transfer only at startup. Each ``fit`` trains a
fresh LoRA adapter on the resident base, then ``unload``s it to restore the clean
base; each ``evaluate`` attaches a saved adapter for inference, then unloads it.

Interactive (IPython on the GPU box) is the intended use:

    from tiararodney.sekft.resident import Resident
    r = Resident("~/llm-models/mistral-7b-instruct-v0.2", load_4bit=True)
    r.fit("~/sekft/trajectories", "~/sekft/ckpt-a", lora_r=16, lr=2e-4, epochs=3)
    r.evaluate("~/sekft/ckpt-a", "~/sekft/holdout", n=10)
    r.fit("~/sekft/trajectories", "~/sekft/ckpt-b", lora_r=32)   # NO base reload

Or `sekft-resident --base <dir> --selftest-data <stub_dir>` to prove the base
loads once and two adapters train against it.
"""
from __future__ import annotations

import argparse
import gc
import json
from pathlib import Path
from typing import Any

import torch
from datasets import Dataset
from peft import (LoraConfig, PeftModel, get_peft_model,
                  prepare_model_for_kbit_training)
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

from .sft import build_masked_example, iter_keepers, normalize_for_template

LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]


def _free() -> None:
    gc.collect()
    torch.cuda.empty_cache()


class Resident:
    """A base model held resident on the GPU; adapters cycle through it."""

    def __init__(self, base: str, load_4bit: bool = False) -> None:
        self.base_path = str(Path(base).expanduser())
        self.load_4bit = load_4bit
        self.tok = AutoTokenizer.from_pretrained(self.base_path)
        if self.tok.pad_token is None:
            self.tok.pad_token = self.tok.eos_token
        quant = None
        if load_4bit:
            quant = BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True)
        print(f"[resident] loading base ONCE: {self.base_path} (4bit={load_4bit}) ...")
        self.base = AutoModelForCausalLM.from_pretrained(
            self.base_path, dtype=torch.float16, quantization_config=quant)
        self.base = (prepare_model_for_kbit_training(self.base) if load_4bit
                     else self.base)
        if not load_4bit:
            self.base.enable_input_require_grads()
        dev = next(self.base.parameters()).device
        mem = torch.cuda.memory_allocated() / 1e9
        print(f"[resident] base resident on {dev}; {mem:.1f} GB VRAM")

    # -- build masked rows from kept trajectories --------------------------

    def _rows(self, data_dir: Path, max_len: int) -> list[dict[str, list[Any]]]:
        rows = []
        for turns in iter_keepers(data_dir):
            ex = build_masked_example(turns, self.tok)
            if len(ex["input_ids"]) <= max_len and any(l != -100 for l in ex["labels"]):
                rows.append(ex)
        if not rows:
            raise SystemExit(f"no usable keeper trajectories in {data_dir}")
        return rows

    # -- train a fresh adapter on the resident base ------------------------

    def fit(self, data_dir: str, out: str, lora_r: int = 16, lr: float = 2e-4,
            epochs: float = 3.0, batch: int = 1, accum: int = 8,
            max_len: int = 4096) -> Path:
        ddir, odir = Path(data_dir).expanduser(), Path(out).expanduser()
        ds = Dataset.from_list(self._rows(ddir, max_len))
        if not self.load_4bit:
            self.base.gradient_checkpointing_enable()
        model = get_peft_model(self.base, LoraConfig(
            r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05,
            task_type="CAUSAL_LM", target_modules=LORA_TARGETS))
        model.print_trainable_parameters()
        args = TrainingArguments(
            output_dir=str(odir), per_device_train_batch_size=batch,
            gradient_accumulation_steps=accum, num_train_epochs=epochs,
            learning_rate=lr, fp16=True, logging_steps=1, save_strategy="no",
            report_to=["tensorboard"], logging_dir=str(odir / "runs"),
            remove_unused_columns=False, warmup_ratio=0.03)
        tr = Trainer(model=model, args=args, train_dataset=ds,
                     data_collator=DataCollatorForSeq2Seq(
                         self.tok, padding=True, label_pad_token_id=-100))
        tr.train()
        odir.mkdir(parents=True, exist_ok=True)
        model.save_pretrained(str(odir))
        self.tok.save_pretrained(str(odir))
        (odir / "log_history.jsonl").write_text(
            "\n".join(json.dumps(r) for r in tr.state.log_history))
        losses = [h["loss"] for h in tr.state.log_history if "loss" in h]
        print(f"[resident] fit -> {odir} final loss {losses[-1] if losses else '?'}")
        self.base = model.unload()  # strip LoRA, restore resident base
        del model, tr, ds
        _free()
        return odir

    # -- behavioural eval of a saved adapter -------------------------------

    def evaluate(self, adapter: str | None, scenarios_dir: str, n: int = 10,
                 max_steps: int = 30, temperature: float = 0.7) -> dict[str, Any]:
        from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
        from tiararodney.posix_sdc.factory.rollout import rollout
        from tiararodney.posix_sdc.schema import Scenario
        if not available():
            raise SystemExit("sekft-dash image unavailable on this box")
        # adapter=None -> evaluate the BASE model (the within-holdout baseline).
        if adapter:
            adapter = str(Path(adapter).expanduser())
            pm = PeftModel.from_pretrained(self.base, adapter)
        else:
            pm = self.base
        pm.eval()

        def operator(messages: list[dict[str, str]]) -> str:
            msgs = normalize_for_template(messages)
            ids = self.tok.apply_chat_template(
                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
            with torch.no_grad():
                o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                temperature=max(temperature, 1e-2),
                                eos_token_id=self.tok.eos_token_id,
                                pad_token_id=self.tok.eos_token_id)
            text: str = self.tok.decode(o[0][ids.shape[1]:], skip_special_tokens=True).strip()
            return text

        backend = DashDocker()
        rows = []
        for f in sorted(Path(scenarios_dir).expanduser().glob("*.json"))[:n]:
            sc = Scenario.from_dict(json.loads(f.read_text()))
            tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
                         operator=operator, use_scaffold=False)
            rows.append(tj)
            print(f" {sc.id}: {tj.outcome} terminal={tj.terminal} verified={tj.verified}")
        d = len(rows) or 1
        m = {
            "n": len(rows),
            "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
            "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
            "verified_rate": round(sum(t.verified for t in rows) / d, 3),
            "clean_rate": round(sum(t.keep for t in rows) / d, 3),
        }
        if adapter:  # base is unwrapped only if we wrapped it
            self.base = pm.unload()
        del pm
        _free()
        print("[resident] eval:", json.dumps(m))
        return m


def main() -> None:
    ap = argparse.ArgumentParser(description="Resident base; cycle adapters.")
    ap.add_argument("--base", required=True)
    ap.add_argument("--load-4bit", action="store_true")
    ap.add_argument("--selftest-data",
                    help="fit two adapters on this data to prove resident multi-fit")
    ns = ap.parse_args()
    r = Resident(ns.base, ns.load_4bit)
    if ns.selftest_data:
        print("=== selftest: two fits on the SAME resident base (no reload) ===")
        r.fit(ns.selftest_data, "/tmp/res-a", epochs=1, lora_r=8)
        r.fit(ns.selftest_data, "/tmp/res-b", epochs=1, lora_r=8)
        print("=== selftest OK: base loaded once, two adapters trained ===")
    else:
        print("Resident ready. Import and use r.fit() / r.evaluate(), "
              "or pass --selftest-data <dir>.")


if __name__ == "__main__":
    main()
289
src/tiararodney/sekft/sft.py
Normal file
@@ -0,0 +1,289 @@
"""sekft trainer: SFT a base model on kept shell-operation trajectories.

Trains assistant turns ONLY -- the commands and the terminal ``exit`` / ``panic``.
The environment turns (system orientation, prompts, command output) are masked
to ``-100`` so the model learns to *produce* commands, not to predict the
environment's replies. Getting this mask wrong is the classic way to ruin a
shell-operator SFT (the model starts hallucinating output), so it is the part
worth testing hardest -- and it is framework-independent.

Render uses the tokenizer's OWN chat template (``apply_chat_template``), so the
training render is identical to what the serving harness produces (ccpty sends
structured messages and the inference endpoint applies the model's default
template). Trajectories are canonicalised first (``normalize_for_template``):
a leading ``system`` turn is folded into the first ``user`` turn and consecutive
same-role turns are merged, because instruct templates such as Mistral's have no
system role and require strict user/assistant alternation. That same
canonicalisation must run on the serving side. Everything else is standard
causal-LM SFT with an assistant-only loss mask.

    sekft-train --data ./trajectories --base <hf-model-dir> --out ./ckpt
    sekft-train --data corpus.jsonl --base <dir>          # a curated .jsonl corpus
    sekft-train --hub --base <dir>                        # the published corpus (Hub)
    sekft-train --data ./trajectories --base <dir> --inspect   # mask stats, no training

Training needs torch + transformers + peft (a GPU box). ``--inspect`` and the
normalize/mask helpers run anywhere a tokenizer with a chat template is
available.
"""
from __future__ import annotations

import argparse
import json
import logging
from collections.abc import Iterator
from pathlib import Path
from typing import Any

from ._log import setup as _setup_logging

log = logging.getLogger("sekft.train")


def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str]]:
    """Canonicalise a trajectory for instruct chat templates that have no system
    role and require strict user/assistant alternation (Mistral and friends):
    treat ``system`` as ``user``, then merge consecutive same-role turns by
    joining their content with a newline.

    This is loss-neutral for the assistant mask (only environment/user turns
    ever merge; the assistant commands are never adjacent in this data) and it
    is what lets ``apply_chat_template`` render the multi-turn shell dialogue.
    The serving side MUST apply the same canonicalisation, or train and serve
    diverge again.
    """
    out: list[dict[str, str]] = []
    for m in messages:
        role = "user" if m["role"] == "system" else m["role"]
        if out and out[-1]["role"] == role:
            out[-1] = {"role": role, "content": out[-1]["content"] + "\n" + m["content"]}
        else:
            out.append({"role": role, "content": m["content"]})
    return out


def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
    """Tokenize a trajectory with the tokenizer's OWN chat template and build an
    assistant-only loss mask.

    The render is ``tokenizer.apply_chat_template`` on the canonicalised turns,
    so it is byte-identical to what the serving harness sends. The mask is
    derived by token-prefix differencing: the tokens an assistant turn
    contributes are exactly those that appear when it extends the rendered
    prefix, which trains the commands plus the template's end-of-turn token (so
    the model learns to stop) and masks every environment turn to ``-100``. This
    assumes an additive template (each turn extends the previous render); a
    non-additive one raises rather than silently mis-masking.
    """
    msgs = normalize_for_template(messages)
    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
    labels = [-100] * len(ids)
    prev: list[int] = []
    for i, m in enumerate(msgs):
        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
        if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
            raise ValueError("chat template is not additive; cannot derive an "
                             "assistant loss mask by token-prefix differencing")
        if m["role"] == "assistant":
            for j in range(len(prev), len(upto)):
                labels[j] = ids[j]
        prev = upto
    return {"input_ids": ids, "attention_mask": [1] * len(ids), "labels": labels}


def iter_keepers(data_dir: Path) -> Iterator[list[dict[str, str]]]:
    """Yield ``turns`` (message lists) from raw rollout JSONs marked keep."""
    for f in sorted(data_dir.glob("*.json")):
        d = json.loads(f.read_text())
        if d.get("keep"):
            yield d["turns"]


def load_turns(data: Path, hub: bool = False,
               revision: str | None = None) -> Iterator[list[dict[str, str]]]:
    """Yield assistant-bearing ``turns`` from one of three sources:

    - ``--hub``: the published corpus via posix-sdc's ``load_trajectories`` (the
      in-repo ``data/`` of a posix-sdc checkout, else the Hugging Face Hub);
    - ``data`` a ``.jsonl`` file: a curated corpus, already keep-filtered, one
      record per line;
    - ``data`` a directory: raw rollout ``.json`` (keep-filtered here).

    posix-sdc is imported lazily, so the raw-dir and ``.jsonl`` paths need
    neither posix-sdc nor huggingface_hub installed.
    """
    if hub:
        from tiararodney.posix_sdc import load_trajectories
        for r in load_trajectories(revision=revision):
            yield r["turns"]
    elif data.is_dir():
        yield from iter_keepers(data)
    elif data.suffix == ".jsonl":
        with open(data) as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)["turns"]
    else:
        raise SystemExit(
            f"--data must be a rollout directory or a .jsonl corpus (got {data})")


def mask_stats(example: dict[str, list[Any]]) -> tuple[int, int]:
    """(trained tokens, total tokens) for an example."""
    trained = sum(1 for x in example["labels"] if x != -100)
    return trained, len(example["labels"])
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------
|
||||
# Training (GPU box: torch + transformers + peft)
|
||||
# --------------------------------------------------------------------------
|
||||
|
||||
def train(data_dir: Path, base: str, out: Path, epochs: float, lr: float,
          batch: int, accum: int, max_len: int, lora_r: int,
          load_4bit: bool = False, hub: bool = False,
          revision: str | None = None) -> None:
    import torch
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Trainer, TrainingArguments)
    from transformers.utils import logging as hf_logging

    # Surface the Trainer's own per-step curve (loss/lr/grad_norm); it is at
    # WARNING by default, which is most of why training looks silent.
    hf_logging.set_verbosity_info()

    source = "hub" if hub else data_dir
    log.info("base=%s data=%s out=%s", base, source, out)
    log.info("loading tokenizer: %s", base)
    tok = AutoTokenizer.from_pretrained(base)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token

    log.info("building masked examples from %s ...", source)
    rows: list[dict[str, list[Any]]] = []
    n_seen = n_long = n_empty = 0
    for turns in load_turns(data_dir, hub=hub, revision=revision):
        n_seen += 1
        ex = build_masked_example(turns, tok)
        log.debug(" trajectory %d: %d turns -> %d tokens, %d trained",
                  n_seen, len(turns), len(ex["input_ids"]), mask_stats(ex)[0])
        if n_seen % 100 == 0:
            log.info(" ... %d trajectories processed, %d usable", n_seen, len(rows))
        # Drop trajectories that exceed --max-len or have nothing to train on.
        if len(ex["input_ids"]) > max_len:
            n_long += 1
            continue
        if not any(lab != -100 for lab in ex["labels"]):
            n_empty += 1
            continue
        rows.append(ex)
    if not rows:
        # Report the actual source: with --hub, data_dir is ignored.
        raise SystemExit(f"no usable keeper trajectories in {source}")
    trained = sum(mask_stats(r)[0] for r in rows)
    total = sum(mask_stats(r)[1] for r in rows)
    log.info("dataset: %d keepers -> %d usable; %d trained / %d tokens (%.1f%% assistant)",
             n_seen, len(rows), trained, total, 100 * trained / total)
    if n_long or n_empty:
        log.warning("dropped %d trajectories: %d over --max-len %d, %d empty-mask",
                    n_long + n_empty, n_long, max_len, n_empty)
    ds = Dataset.from_list(rows)

    # 4-bit (QLoRA) shrinks the base from ~14 GB to ~4 GB to move across the
    # OcuLink/PCIe link and to hold in VRAM; nf4 + fp16 compute works on the
    # V100 (sm_70). Without it, plain fp16 weights.
    quant = None
    if load_4bit:
        from transformers import BitsAndBytesConfig
        quant = BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True,
        )
    log.info("loading base model: %s (%s)", base,
             "4-bit QLoRA" if load_4bit else "fp16")
    model = AutoModelForCausalLM.from_pretrained(
        base, dtype=torch.float16, quantization_config=quant)
    if load_4bit:
        from peft import prepare_model_for_kbit_training
        model = prepare_model_for_kbit_training(model)  # handles ckpt + input grads
    else:
        model.enable_input_require_grads()
        model.gradient_checkpointing_enable()
    model = get_peft_model(model, LoraConfig(
        r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05, task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ))
    n_train, n_all = model.get_nb_trainable_parameters()
    log.info("LoRA r=%d: %d trainable / %d params (%.3f%%)",
             lora_r, n_train, n_all, 100 * n_train / n_all)

    args = TrainingArguments(
        output_dir=str(out), per_device_train_batch_size=batch,
        gradient_accumulation_steps=accum, num_train_epochs=epochs,
        learning_rate=lr, fp16=True, logging_steps=1, save_strategy="epoch",
        report_to=["tensorboard"], logging_dir=str(out / "runs"),
        remove_unused_columns=False, warmup_ratio=0.03,
    )
    trainer = Trainer(
        model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForSeq2Seq(tok, padding=True, label_pad_token_id=-100),
    )
    log.info("training: %g epochs, lr=%g, batch=%d x accum=%d (effective %d), max_len=%d",
             epochs, lr, batch, accum, batch * accum, max_len)
    trainer.train()
    model.save_pretrained(str(out))
    tok.save_pretrained(str(out))
    # Durable, greppable record of the curve (loss/lr/grad_norm per step):
    # one JSON record per line, each line newline-terminated.
    (out / "log_history.jsonl").write_text(
        "".join(json.dumps(r) + "\n" for r in trainer.state.log_history))
    log.info("saved LoRA adapter + log_history.jsonl -> %s (tensorboard: --logdir %s)",
             out, out / "runs")


def inspect(data_dir: Path, base: str, hub: bool = False,
            revision: str | None = None) -> None:
    from transformers import AutoTokenizer
    log.info("loading tokenizer: %s", base)
    tok = AutoTokenizer.from_pretrained(base)
    source = "hub" if hub else data_dir
    n = tt = tr = 0
    for turns in load_turns(data_dir, hub=hub, revision=revision):
        ex = build_masked_example(turns, tok)
        t, total = mask_stats(ex)
        tr += t
        tt += total
        n += 1
    if not n:
        # Report the actual source: with --hub, data_dir is ignored.
        raise SystemExit(f"no keeper trajectories in {source}")
    log.info("%d keeper trajectories; %d/%d tokens trained (%.1f%% assistant, rest masked)",
             n, tr, tt, 100 * tr / tt)


def main() -> None:
    ap = argparse.ArgumentParser(description="SFT a model on shell trajectories.")
    ap.add_argument("--data", type=Path, default=Path("./trajectories"),
                    help="a raw rollout dir or a curated .jsonl corpus")
    ap.add_argument("--hub", action="store_true",
                    help="load the published corpus via posix-sdc (Hub); ignores --data")
    ap.add_argument("--revision", default=None,
                    help="dataset revision/tag to pin when using --hub")
    ap.add_argument("--base", required=True, help="HF model id or local dir")
    ap.add_argument("--out", type=Path, default=Path("./ckpt"))
    ap.add_argument("--inspect", action="store_true", help="mask stats only, no training")
    ap.add_argument("--epochs", type=float, default=3.0)
    ap.add_argument("--lr", type=float, default=2e-4)
    ap.add_argument("--batch", type=int, default=1)
    ap.add_argument("--accum", type=int, default=8)
    ap.add_argument("--max-len", type=int, default=4096)
    ap.add_argument("--lora-r", type=int, default=16)
    ap.add_argument("--load-4bit", action="store_true",
                    help="QLoRA: load base in 4-bit (less to move over the link, less VRAM)")
    ap.add_argument("-v", "--verbose", action="store_true", help="debug-level logging")
    ap.add_argument("-q", "--quiet", action="store_true", help="warnings and errors only")
    ns = ap.parse_args()
    _setup_logging(verbose=ns.verbose, quiet=ns.quiet)
    if ns.inspect:
        inspect(ns.data, ns.base, hub=ns.hub, revision=ns.revision)
    else:
        train(ns.data, ns.base, ns.out, ns.epochs, ns.lr, ns.batch, ns.accum,
              ns.max_len, ns.lora_r, ns.load_4bit, hub=ns.hub, revision=ns.revision)


if __name__ == "__main__":
    main()
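The `log_history.jsonl` file that `train` writes can be read back with the standard library alone. The following is a minimal sketch, not part of the commit: `read_loss_curve` is a hypothetical helper name, and it assumes each record is one JSON object per line carrying the `step` and `loss` keys that the Hugging Face `Trainer` logs. The sample records are made up for illustration and are not results from a real run.

```python
import json
import tempfile
from pathlib import Path


def read_loss_curve(path: Path) -> list[tuple[int, float]]:
    """Return (step, loss) pairs from a log_history.jsonl file."""
    curve: list[tuple[int, float]] = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        # The final summary record (train_runtime, ...) has no "loss" key.
        if "loss" in rec and "step" in rec:
            curve.append((int(rec["step"]), float(rec["loss"])))
    return curve


# Illustrative records only (assumption: not taken from any real run).
_sample = [
    {"loss": 1.5, "learning_rate": 2e-4, "step": 1},
    {"loss": 1.25, "learning_rate": 1.9e-4, "step": 2},
    {"train_runtime": 10.0, "step": 2},
]
with tempfile.TemporaryDirectory() as _d:
    _p = Path(_d) / "log_history.jsonl"
    _p.write_text("\n".join(json.dumps(r) for r in _sample))
    print(read_loss_curve(_p))
```

Skipping records without a `loss` key keeps the end-of-training summary out of the curve, so the result can be plotted or compared between runs directly.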
30
tests/smoke/test_entrypoints.py
Normal file
@@ -0,0 +1,30 @@
"""Smoke tests: the console entry points load and respond to --help without the
GPU stack (torch is imported lazily inside the training/eval code paths)."""
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parents[2]
SRC = ROOT / "src"
POSIX_SRC = ROOT.parent / "posix-sdc" / "src"


def _help(module: str) -> "subprocess.CompletedProcess[str]":
    env = dict(os.environ, PYTHONPATH=os.pathsep.join([str(SRC), str(POSIX_SRC)]))
    return subprocess.run([sys.executable, "-m", module, "--help"],
                          capture_output=True, text=True, env=env)


def test_train_help() -> None:
    cp = _help("tiararodney.sekft.sft")
    assert cp.returncode == 0, cp.stderr
    assert "--data" in cp.stdout


def test_eval_help() -> None:
    cp = _help("tiararodney.sekft.eval")
    assert cp.returncode == 0, cp.stderr
    assert "--adapter" in cp.stdout
35
tests/unit/test_load.py
Normal file
@@ -0,0 +1,35 @@
"""Unit tests for the trainer's three-source data loader (raw dir / curated
jsonl). The Hub path delegates to posix-sdc and is covered there."""
from __future__ import annotations

import json
from pathlib import Path

import pytest

from tiararodney.sekft import sft


def test_load_turns_from_raw_dir(tmp_path: Path) -> None:
    (tmp_path / "a.json").write_text(json.dumps(
        {"keep": True, "turns": [{"role": "assistant", "content": "ls"}]}))
    (tmp_path / "b.json").write_text(json.dumps(  # not kept -> excluded
        {"keep": False, "turns": [{"role": "assistant", "content": "rm -rf /"}]}))
    got = list(sft.load_turns(tmp_path))
    assert len(got) == 1
    assert got[0][0]["content"] == "ls"


def test_load_turns_from_jsonl(tmp_path: Path) -> None:
    f = tmp_path / "corpus.jsonl"
    f.write_text("\n".join(json.dumps({"turns": [{"role": "assistant", "content": c}]})
                           for c in ("ls", "cat x")) + "\n")
    got = list(sft.load_turns(f))
    assert [t[0]["content"] for t in got] == ["ls", "cat x"]


def test_load_turns_rejects_other_paths(tmp_path: Path) -> None:
    bad = tmp_path / "notes.txt"
    bad.write_text("hi")
    with pytest.raises(SystemExit):
        list(sft.load_turns(bad))
75
tests/unit/test_sft.py
Normal file
@@ -0,0 +1,75 @@
"""Unit tests for the SFT render canonicalisation and assistant-only mask.

These run anywhere: a fake additive tokenizer stands in for a real chat
template, so no torch/transformers is needed."""
from __future__ import annotations

from typing import Any

import pytest

from tiararodney.sekft import sft


class FakeTok:
    """Additive chat template: each turn renders to ``<role> tokens... </e>``;
    the generation prompt appends ``<assistant>``."""

    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
                            return_tensors: Any = None) -> list[str]:
        toks: list[str] = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks += m["content"].split()
            toks.append("</e>")
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


def test_normalize_folds_system_and_merges_consecutive() -> None:
    raw = [
        {"role": "system", "content": "orient"},
        {"role": "user", "content": "login"},
        {"role": "user", "content": "prompt"},
        {"role": "assistant", "content": "cat f"},
        {"role": "user", "content": "out"},
        {"role": "user", "content": "prompt"},
        {"role": "assistant", "content": "exit"},
    ]
    norm = sft.normalize_for_template(raw)
    assert [m["role"] for m in norm] == ["user", "assistant", "user", "assistant"]
    assert norm[0]["content"] == "orient\nlogin\nprompt"


def test_normalize_leaves_clean_alternation_untouched() -> None:
    raw = [{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}]
    assert sft.normalize_for_template(raw) == raw


def test_mask_trains_assistant_turns_only() -> None:
    raw = [
        {"role": "system", "content": "orient"},
        {"role": "user", "content": "login"},
        {"role": "assistant", "content": "cat f"},
        {"role": "user", "content": "out"},
        {"role": "assistant", "content": "exit"},
    ]
    ex = sft.build_masked_example(raw, FakeTok())
    trained = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab != -100]
    masked = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab == -100]
    assert set(trained) <= {"<assistant>", "cat", "f", "exit", "</e>"}
    assert "cat" in trained and "exit" in trained  # both commands present
    assert {"orient", "login", "out"} <= set(masked)  # environment masked


def test_mask_raises_on_non_additive_template() -> None:
    class BadTok:
        def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
                                return_tensors: Any = None) -> list[int]:
            return list(range(len(msgs), 0, -1))  # reversed: prefixes do not nest

    with pytest.raises(ValueError):
        sft.build_masked_example(
            [{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}],
            BadTok())
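The tests above exercise an assistant-only mask built by token-prefix differencing. The real `build_masked_example` is not shown in this part of the diff, so the following is only a sketch of the idea under stated assumptions: `sketch_masked_example` and `DemoTok` are hypothetical names, the tokenizer is a toy additive template modelled on `FakeTok`, and the system-role folding done by `normalize_for_template` is left out. Each turn is rendered as a growing prefix; the tokens a turn adds are trained if the turn is the assistant's and set to `-100` otherwise. A render that does not extend the previous prefix raises `ValueError`.

```python
from typing import Any

IGNORE = -100  # label value the loss skips


class DemoTok:
    """Toy additive template (assumption): ``<role> tokens... </e>`` per turn."""

    def apply_chat_template(self, msgs: list[dict[str, str]],
                            add_generation_prompt: bool = False) -> list[str]:
        toks: list[str] = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks += m["content"].split()
            toks.append("</e>")
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


def sketch_masked_example(turns: list[dict[str, str]], tok: Any) -> dict[str, list[Any]]:
    """Label only the tokens that assistant turns add to the rendered prefix."""
    ids: list[Any] = []
    labels: list[Any] = []
    for i, turn in enumerate(turns):
        upto = tok.apply_chat_template(turns[:i + 1])
        # Differencing is only valid if each render extends the previous one.
        if upto[:len(ids)] != ids:
            raise ValueError("chat template is not additive: prefixes do not nest")
        new = upto[len(ids):]
        ids = upto
        if turn["role"] == "assistant":
            labels += new                   # trained
        else:
            labels += [IGNORE] * len(new)   # masked environment turn
    return {"input_ids": ids, "labels": labels}


example = sketch_masked_example(
    [{"role": "user", "content": "login"},
     {"role": "assistant", "content": "cat f"},
     {"role": "user", "content": "out"},
     {"role": "assistant", "content": "exit"}],
    DemoTok())
print([t for t, lab in zip(example["input_ids"], example["labels"]) if lab != IGNORE])
```

In this sketch the assistant role header is trained along with the command, which is within what `test_mask_trains_assistant_turns_only` permits; whether the real implementation masks that header is not visible here.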
47
tox.ini
Normal file
@@ -0,0 +1,47 @@
[tox]
requires =
    tox>=4.19
env_list =
    unit-py3{9-13}
    smoke-py3{9-13}
    lint
    format

[testenv]
deps =
    ../posix-sdc
    .

[testenv:lint]
description = run type check on code base
labels = static
deps =
    mypy
commands =
    mypy src tests --junit-xml test-reports/{env_name}.xml

[testenv:format]
description = check formatting
labels = static
deps =
    autopep8
commands =
    autopep8 --diff --exit-code src tests

[testenv:unit-py3{9-13}]
description = run unit tests
labels = unit
deps =
    {[testenv]deps}
    pytest
commands =
    pytest tests/unit --junitxml=test-reports/{env_name}.xml

[testenv:smoke-py3{9-13}]
description = run smoke tests against the console entry points
labels = smoke
deps =
    {[testenv]deps}
    pytest
commands =
    pytest tests/smoke --junitxml=test-reports/{env_name}.xml