Merge branch 'develop'
commit 705b4a028b
18 changed files with 3276 additions and 14 deletions
41
CHANGELOG.md
Normal file
@@ -0,0 +1,41 @@
# Changelog

All notable changes to sekft, the shell-operator SFT trainer behind the
[posix-sdc](https://huggingface.co/datasets/tiararodney/posix-sdc) experiment,
are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2026-06-18

First release: the training and evaluation pipeline that turns posix-sdc
trajectories into a fine-tuned shell operator.

### Added
- `sekft-train`: LoRA / QLoRA supervised fine-tuning of a base model on
  shell-operation trajectories, with an **assistant-only loss mask** derived by
  token-prefix differencing — the commands and the terminal `exit` / `panic`
  token are trained; the environment turns (orientation, prompts, command
  output) are masked to `-100`. The render uses the tokenizer's own
  `apply_chat_template`, so training matches what the serving harness sends
  (train = serve), with `normalize_for_template` canonicalising trajectories for
  instruct templates that have no system role and require strict user/assistant
  alternation.
- Three sources of training data: a directory of raw rollout `.json`
  (keep-filtered), a curated `.jsonl` corpus, or the published posix-sdc corpus
  over the Hugging Face Hub (`--hub`).
- `--inspect` for mask and token statistics without training, and structured
  stderr logging across every phase (`-v` / `-q`): per-trajectory and progress
  lines while the corpus is tokenized, dataset accounting that warns on dropped
  (over-length / empty-mask) trajectories, and the per-step training curve.
- `sekft-eval`: behavioural evaluation that drops the tuned model into held-out
  scenarios with no scaffold and scores whether it operates and terminates.
- `sekft-resident`: a resident-base harness that loads the base model once and
  fits several adapters without reloading, for paired / STaR-style runs.
- Packaging: the `tiararodney.sekft` namespace package with `sekft-train`,
  `sekft-eval`, and `sekft-resident` console scripts; a typed (`py.typed`),
  mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
  and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.

[1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
14
Dockerfile
@@ -1,14 +0,0 @@
# Minimal dash-in-a-box for sekft trajectory generation.
# docker build -t sekft-dash .
#
# dash as the operated shell (strict POSIX, no bashisms), busybox applets for
# the coreutils. busybox is intentionally close to minimal POSIX so trajectories
# transfer toward sek rather than encoding GNU-isms. Add `coreutils findutils
# grep sed` here if you want GNU semantics instead.
FROM alpine:3.19
RUN apk add --no-cache dash \
    && ln -sf /usr/bin/dash /bin/dash \
    && ln -sf /usr/bin/dash /bin/sh
# /work is the default arena; provider files land at their absolute paths.
RUN mkdir -p /work
WORKDIR /work
338
LICENSE
Normal file
@@ -0,0 +1,338 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
<https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.

c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

<signature of Moe Ghoul>, 1 April 1989
Moe Ghoul, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
37
Pipfile
Normal file
@@ -0,0 +1,37 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[[source]]
url = "https://pypi.code.tiararodney.com/root/byteb4rb1e/+simple/"
verify_ssl = true
name = "pypicodetiararodney"

[packages]
"tiararodney.sekft" = {file = ".", editable = true}
"tiararodney.posix-sdc" = {version = "*", index = "pypicodetiararodney", extras= ["hub"]}

[dev-packages]
tox = "*"
pytest = "*"
mypy = "*"
build = "*"
twine = "*"
setuptools-scm = "~=8.2.0"
pypi-attestations = "*"
autopep8 = "*"
"tiararodney.posix-sdc" = {ref = "develop", git = "https://git.code.tiararodney.com/tiara/posix-sdc.git", extras = ["hub"]}

[requires]
python_version = "3"

[scripts]
"dist" = "python3 -m build"
"dist:attestations" = "python3 -m pypi_attestations sign dist/*"
"dist:publish:tiararodney" = "python3 -m twine upload --sign --repository tiararodney dist/*"
"test" = "tox"
"test:static" = "tox run -m static"
"test:unit" = "tox run -m unit"
"test:integration" = "tox run -m integration"
"test:smoke" = "tox run -m smoke"
1660
Pipfile.lock
generated
Normal file
File diff suppressed because it is too large
79
README.md
Normal file
@@ -0,0 +1,79 @@
# sekft

Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
land with **no imperative**, discover where directives live, learn the provider
from its own self-documentation, do the work, and terminate (`exit` on success,
`panic` when genuinely blocked).

sekft is the **training half**. The dataset and the synthetic-data factory live
in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
depends on. Here live the trainer, the behavioural evaluator, and the
resident-base harness.

## Components

- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
  with the tokenizer's own chat template and trains an **assistant-only** loss
  mask (the commands plus the terminal token; environment turns masked to -100)
  into a QLoRA adapter. Getting the mask wrong is the classic way to ruin a
  shell-operator SFT, so it is the part tested hardest.
- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
  about whether the model operates the shell and leaves. This drops base +
  adapter into held-out scenarios with no scaffold and reports the rates that
  count: reach command-mode, terminate, checker passes.
- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
  14 GB base once and keeps it hot, training and evaluating adapters without
  reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
  run).
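
As a rough illustration of the assistant-only mask described for `sekft.sft`, the sketch below derives labels by token-prefix differencing. It is not sekft's actual implementation: `assistant_only_labels`, `IGNORE_INDEX`, and the `render` callable are names assumed here for illustration. `render` stands in for whatever turns a message list into token ids (for instance a thin wrapper around the tokenizer's `apply_chat_template`), and the sketch assumes that rendering a longer prefix of the session only appends tokens.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Message = Dict[str, str]
IGNORE_INDEX = -100  # label value the loss is expected to skip


def assistant_only_labels(
    messages: Sequence[Message],
    render: Callable[[Sequence[Message]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) with only assistant tokens left trainable."""
    input_ids: List[int] = []
    labels: List[int] = []
    for end in range(1, len(messages) + 1):
        ids = render(messages[:end])
        # Differencing only works if each render extends the previous one.
        if ids[: len(input_ids)] != input_ids:
            raise ValueError("render is not prefix-stable at turn %d" % (end - 1))
        new_tokens = ids[len(input_ids):]
        if messages[end - 1]["role"] == "assistant":
            labels.extend(new_tokens)  # train on what the operator typed
        else:
            labels.extend([IGNORE_INDEX] * len(new_tokens))  # mask the environment
        input_ids = ids
    return input_ids, labels
```

A template that rewrites earlier tokens when a turn is appended breaks the prefix assumption, which is why the sketch raises instead of guessing a boundary.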

## The render contract

The render the model trains on MUST equal what it is served with. The serving
harness (ccpty) sends structured `{role, content}` messages over the OpenAI
chat-completions protocol, so the endpoint applies the **model's own chat
template**. sekft therefore renders with `apply_chat_template`, after
`normalize_for_template` canonicalises each session: a leading `system` turn is
folded into the first `user` turn and consecutive same-role turns are merged,
because instruct templates such as Mistral's have no system role and require
strict user/assistant alternation. The same canonicalisation must run
serve-side, or train and serve diverge.
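
The canonicalisation described above can be sketched roughly as follows. This is an illustration under assumptions, not the packaged `normalize_for_template`: the function name `normalize_for_template_sketch` and the separators used when joining contents (a blank line after the folded system text, a single newline between merged turns) are choices made here for the example.

```python
from typing import Dict, List

Message = Dict[str, str]


def normalize_for_template_sketch(messages: List[Message]) -> List[Message]:
    """Fold a leading system turn into the first user turn, then merge runs."""
    turns = [dict(m) for m in messages]  # copy so the caller's list is untouched

    # Fold a leading system turn into the first user turn (or turn it into one).
    if turns and turns[0]["role"] == "system":
        system = turns.pop(0)
        if turns and turns[0]["role"] == "user":
            turns[0]["content"] = system["content"] + "\n\n" + turns[0]["content"]
        else:
            turns.insert(0, {"role": "user", "content": system["content"]})

    # Merge consecutive turns that share a role so the roles strictly alternate.
    merged: List[Message] = []
    for turn in turns:
        if merged and merged[-1]["role"] == turn["role"]:
            merged[-1]["content"] += "\n" + turn["content"]
        else:
            merged.append(turn)
    return merged
```

Whatever separators the real function uses, the point of the contract is that the serving side joins turns in exactly the same way, so the tokens seen at inference match the tokens seen in training.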

## Install

The training paths only run on a CUDA host, so the GPU stack is an extra:

```sh
pipenv install                # editable sekft + the local editable posix-sdc
pipenv install -e '.[gpu]'    # torch / transformers / peft / datasets, on the box
```

`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile`
overrides it with the local editable `../posix-sdc` for side-by-side development.

## Use (on the GPU box)

```sh
# fine-tune an adapter on the posix-sdc trajectories
sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
    --out ./ckpt --load-4bit

# inspect the assistant-only loss mask without training (runs anywhere)
sekft-train --data ./trajectories --base <dir> --inspect

# behavioural eval on held-out scenario bundles (worlds, not trajectories)
sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16

# resident loop: load the base once, cycle adapters without reloading it
sekft-resident --base <dir> --load-4bit
```

The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
verifies each in a fresh container), not trajectories.
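
To make the three reported rates concrete (reach command-mode, terminate, checker passes), here is a minimal sketch of how per-scenario outcomes could be aggregated. The `EpisodeOutcome` record, its field names, and `summarize` are hypothetical and not part of `sekft-eval`'s documented output.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical per-scenario record; the field names are illustrative."""
    reached_command_mode: bool
    terminated: bool
    checker_passed: bool


def summarize(outcomes: Iterable[EpisodeOutcome]) -> Dict[str, float]:
    """Return the fraction of episodes that hit each milestone."""
    rows = list(outcomes)
    if not rows:
        raise ValueError("no outcomes to summarize")
    n = len(rows)
    return {
        "reach": sum(r.reached_command_mode for r in rows) / n,
        "terminate": sum(r.terminated for r in rows) / n,
        "checker": sum(r.checker_passed for r in rows) / n,
    }
```

Keeping the three rates separate matters because a model can reach command-mode without ever terminating, or terminate without satisfying the scenario's checker.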

## Result

Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
clean termination on archetype-level held-out scenarios from **0/16 (base) to
9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
types, while task competence stayed archetype-local. See the experiment
[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).
234
TODO
@@ -15,3 +15,237 @@ Mappings:
- Module: sekft
  Product: sek
  Component: sekft

--ISSUE
Content-Type: application/issue
ID: 1
Type: feature
Title: Package sekft as an installable namespace package
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Turn the flat trainer scripts into an installable tiararodney.sekft
             namespace package: src layout, pyproject with the abstract
             posix-sdc dependency and an optional gpu extra, console scripts, a
             Pipfile pinning posix-sdc as a local editable override, and tox
             environments.

--ISSUE
Content-Type: application/issue
ID: 2
Type: feature
Title: SFT trainer with chat-template render and assistant-only mask
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Add the supervised fine-tuner: render trajectories through the
             tokenizer's own chat template (matching serving), canonicalise
             turns (fold system, merge consecutive), derive an assistant-only
             loss mask by token-prefix differencing, and train a QLoRA adapter.

--ISSUE
Content-Type: application/issue
ID: 3
Type: feature
Title: Behavioural evaluator
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Add the behavioural eval: load base plus LoRA adapter, drop it into
             held-out scenarios with no scaffold, drive them through a local
             operator that renders with the model's chat template, and report
             reach/terminate/checker rates.

--ISSUE
Content-Type: application/issue
ID: 4
Type: feature
Title: Resident-base train/eval harness
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Add the resident harness that loads the 14GB base once and keeps it
             hot, training fresh LoRA adapters and evaluating them without
             reloading the base, for the slow-OcuLink iterate loop.

--ISSUE
Content-Type: application/issue
ID: 5
Type: feature
Title: Pipeline overview README
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Document the sekft pipeline: the trainer, evaluator, and resident
             harness; how they consume the posix-sdc dataset; the render
             contract; and how to run on the GPU box.

--ISSUE
Content-Type: application/issue
ID: 6
Type: feature
Title: Test suite: unit and smoke
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: Add a pytest suite: torch-free unit tests for the render
             canonicalisation and assistant-only mask (fake tokenizer), and
             smoke tests that the console entry points respond to --help without
             the GPU stack.

--ISSUE
Content-Type: application/issue
ID: 7
Type: feature
Title: Add GPL-2.0 license and drop the relocated Dockerfile
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: License sekft under GPL-2.0 (canonical text plus pyproject
             metadata) and remove the dash Dockerfile, which now lives in
             posix-sdc under docker/alpine-dash.

--ISSUE
Content-Type: application/issue
ID: 8
Type: feature
Title: Refresh docs for the packaged trainer
Status: done
Priority: medium
Created: 2026-06-16
Module: sekft
Relationships:
Description: The README still describes sekft as the data factory
             (generate/rollout/dashdocker/taxonomy/schema), which all moved to
             posix-sdc. Rewrite it as the trainer (sft/eval/resident) that
             consumes posix-sdc, and update the module docstrings to
             console-script invocations and the chat-template render contract.

--ISSUE
Content-Type: application/issue
ID: 9
Type: feature
Title: Type-check the package under mypy strict
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: Make the lint env honestly pass: add mypy as a dev dependency,
             ignore_missing_imports for the ML libs, fully annotate
             eval/resident/sft (including the inner operator callables), and
             ship a py.typed marker so the Typing::Typed claim is real.

--ISSUE
Content-Type: application/issue
ID: 10
Type: feature
Title: structured logging for the trainer (sft)
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: The trainer is nearly silent: outside an example count and a save
             line it prints nothing through tokenizer load, the ~14GB base-model
             load, example building, and the whole training loop, and
             trajectories dropped for exceeding --max-len or having an empty
             loss mask vanish without a trace. Add a small shared logging setup
             (_log.py, stderr so stdout stays clean for results) and a module
             logger; give sekft-train -v/--verbose and -q/--quiet. Log the run
             config and each phase, report dataset accounting (keepers ->
             usable, with counts dropped for length / empty-mask and a warning
             when any are dropped), and raise transformers' verbosity during
             training so the per-step curve shows. Apply to train() and
             inspect().

--ISSUE
Content-Type: application/issue
ID: 11
Type: bugfix
Title: operate_rate can sum a None (eval + resident)
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
             in rows). The 'and' yields the right operand when steps>0, so if
             meta lacks the 'clean' key it yields None and sum() raises
             TypeError at runtime; mypy (now that posix-sdc ships py.typed and
             Trajectory is typed) flags the generator item type in eval.py:83
             and resident.py:157. Wrap the predicate in bool() so it counts
             trajectories that operated and are clean, fixing both the type
             error and the latent crash.

--ISSUE
Content-Type: application/issue
ID: 12
Type: feature
Title: load training data from a raw dir, a curated jsonl, or the Hub
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: iter_keepers reads only raw per-trajectory .json - one of three
             input shapes the trainer should accept. Add load_turns(data, hub,
             revision) that yields assistant-bearing turns from: a directory of
             raw rollout .json (keep-filtered, today's iter_keepers); a curated
             .jsonl corpus file (already keep-filtered, yield turns per line);
             or the published corpus via posix-sdc's load_trajectories (local
             data/ in a checkout, else the Hub). sekft-train gains --hub and
             --revision; --data dispatches by dir-vs-.jsonl. Raw-rollout reading
             stays sekft-local; curated+Hub reuse posix-sdc's loader (imported
             lazily so the trainer needs neither posix-sdc nor huggingface_hub
             for the raw/jsonl paths). Unit tests for the raw-dir and jsonl
             dispatch.

--ISSUE
Content-Type: application/issue
ID: 13
Type: feature
Title: reference posix-sdc three ways for seamless multi-machine dev
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: Wire the posix-sdc dependency as a triplet: the abstract
             posix-sdc[hub] in pyproject (so the trainer's --hub path can reach
             the Hub via huggingface_hub); the published wheel from the private
             index in Pipfile [packages]; the git develop branch in Pipfile
             [dev-packages] for develop-time. Commit Pipfile.lock so the
             dependency surface and lock land together.

--ISSUE
Content-Type: application/issue
ID: 14
Type: bugfix
Title: refresh Pipfile.lock against published posix-sdc 1.2.2
Status: done
Priority: medium
Created: 2026-06-17
Module: sekft
Relationships:
Description: The lock committed with the triplet (#13) predated the published
             posix-sdc 1.2.2 wheel, so it could not pin the real [hub] closure.
             Now that 1.2.2 is on the private index, re-lock: posix-sdc resolves
             to ==1.2.2 from the index and the [hub] extra pulls huggingface_hub
             and its transitive deps into the lock. Commit the refreshed
             Pipfile.lock so the next machine installs the published wheel with
             the Hub path available.
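
Issue 11 in the TODO file describes a crash that is easy to miss when reading the predicate. A minimal reproduction, assuming a stand-in `Row` class that carries only the two fields the predicate reads (the real objects are posix-sdc trajectories):

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Stand-in for a trajectory; only the fields the predicate reads."""
    steps: int
    meta: dict = field(default_factory=dict)


rows = [
    Row(steps=3, meta={"clean": True}),   # operated and clean
    Row(steps=2, meta={}),                # operated, 'clean' key missing
    Row(steps=0, meta={"clean": True}),   # never issued a command
]

# Original predicate: `and` returns its right operand, so the second row yields None.
try:
    sum(r.steps > 0 and r.meta.get("clean") for r in rows)
    crashed = False
except TypeError:
    crashed = True

# Fixed predicate: bool() maps None to False, so sum() only ever adds booleans.
operated = sum(bool(r.steps > 0 and r.meta.get("clean")) for r in rows)
print(crashed, operated)
```

Only the first row counts after the fix: the second has no `clean` flag and the third never operated.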
92
pyproject.toml
Normal file
@@ -0,0 +1,92 @@
[build-system]
requires = [
    "setuptools",
    "wheel",
    "setuptools-scm[toml]"
]
build-backend = "setuptools.build_meta"

[project]
name = "tiararodney.sekft"
description = "Fine-tune small open models to operate a POSIX shell (sek)"
authors = [
    { name = "Tiara Rodney", email = "tiara.rodney@byteb4rb1e.me" }
]
license-files = ["LICENSE"]
readme = "README.md"
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
    "Natural Language :: English",
    "Operating System :: POSIX :: Linux",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: System :: Shells",
    "Typing :: Typed",
]
dependencies = [
    "tiararodney.posix-sdc[hub]",
]
dynamic = ["version"]
requires-python = ">=3.9"

[project.optional-dependencies]
gpu = [
    "torch",
    "transformers",
    "peft",
    "datasets",
    "accelerate",
    "bitsandbytes",
    "tensorboard",
]

[project.scripts]
sekft-train = "tiararodney.sekft.sft:main"
sekft-eval = "tiararodney.sekft.eval:main"
sekft-resident = "tiararodney.sekft.resident:main"

[project.urls]
Git = "https://git.code.tiararodney.com/tiararodney/sekft"

[tool.setuptools.packages.find]
where = ["src"]
namespaces = true

[tool.setuptools.package-data]
"tiararodney.sekft" = ["py.typed"]

[tool.pytest.ini_options]
pythonpath = ["src", "../posix-sdc/src"]
testpaths = ["tests"]
markers = [
    "pytest: integration tests runnable without external services",
    "gpu: requires torch and a GPU",
    "docker: requires Docker and the sekft-dash image",
]

[tool.mypy]
strict = true
mypy_path = "src"
explicit_package_bases = true
namespace_packages = true

[[tool.mypy.overrides]]
module = [
    "torch.*", "transformers.*", "peft.*", "datasets.*", "bitsandbytes.*",
    "tiararodney.posix_sdc.*",
]
ignore_missing_imports = true

[tool.autopep8]
max_line_length = 80
aggressive = 3
recursive = true

[tool.setuptools_scm]
5
src/tiararodney/sekft/__init__.py
Normal file
@@ -0,0 +1,5 @@
"""sekft: fine-tune small open models to operate a POSIX shell (sek).

Consumes the posix-sdc dataset; the trainer, behavioural evaluator, and the
resident-base harness live here.
"""
20
src/tiararodney/sekft/_log.py
Normal file
@@ -0,0 +1,20 @@
"""Console logging setup shared by the sekft entry points.

Logs go to stderr so stdout stays clean for a command's actual output (metrics
JSON, a path a caller might capture). Call :func:`setup` once at the top of a
``main()``; modules then log through ``logging.getLogger("sekft.<area>")``.
"""
from __future__ import annotations

import logging


def setup(verbose: bool = False, quiet: bool = False) -> None:
    """Configure root logging to stderr. ``quiet`` shows warnings and worse,
    ``verbose`` adds debug; the default is info."""
    level = logging.WARNING if quiet else logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)-5s %(name)s %(message)s",
        datefmt="%H:%M:%S",
    )
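
The chained conditional in `setup()` gives `quiet` precedence when both flags are passed, which the docstring does not spell out. A small sketch of just that level choice, using a hypothetical `pick_level` helper that copies the expression (the helper name is an assumption, not part of `_log.py`):

```python
import logging


def pick_level(verbose: bool = False, quiet: bool = False) -> int:
    """Same conditional chain as setup(): quiet is tested first, so it wins."""
    return logging.WARNING if quiet else logging.DEBUG if verbose else logging.INFO


levels = {
    "default": pick_level(),
    "verbose": pick_level(verbose=True),
    "quiet": pick_level(quiet=True),
    "both": pick_level(verbose=True, quiet=True),
}
print({name: logging.getLevelName(level) for name, level in levels.items()})
```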
105
src/tiararodney/sekft/eval.py
Normal file
@@ -0,0 +1,105 @@
"""Behavioural eval: the metric that matters.

Train loss says nothing about whether the model operates the shell and leaves.
This loads a fine-tuned model (base + LoRA adapter), drops it into held-out
scenarios with NO scaffold (the trained behaviour must stand on its own), and
reports the rates that count: does it reach command-mode, does it terminate,
does the checker pass.

    sekft-eval --base <hf-dir> --adapter ./ckpt-mistral-r16 \
        --scenarios ./holdout-scenarios --n 10

Reuses the posix-sdc rollout loop with a *local* operator: the model renders and
generates with the same chat template it was trained on (train == eval == serve,
via ``apply_chat_template`` + ``normalize_for_template``, or the prompts go out
of distribution). Prerequisites on the box: torch + transformers + peft, the
``sekft-dash`` image, and held-out SCENARIO bundles from the posix-sdc factory
(not trajectories; the eval stands up and verifies each).
"""
from __future__ import annotations

import argparse
import json
from collections.abc import Callable
from pathlib import Path
from typing import Any

from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
from tiararodney.posix_sdc.factory.rollout import rollout
from tiararodney.posix_sdc.schema import Scenario

from .sft import normalize_for_template


def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
                        temperature: float = 0.7) -> Callable[[list[dict[str, str]]], str]:
    """A ``messages -> command`` callable backed by base + LoRA adapter.

    Renders the conversation exactly as the model was trained, appends the
    assistant header, generates one turn, and cuts at the first stop marker.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(adapter)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.float16, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()

    def operator(messages: list[dict[str, str]]) -> str:
        msgs = normalize_for_template(messages)
        ids = tok.apply_chat_template(
            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(
                ids, max_new_tokens=max_new_tokens,
                do_sample=temperature > 0, temperature=max(temperature, 1e-2),
                eos_token_id=tok.eos_token_id, pad_token_id=tok.eos_token_id)
        text: str = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip()
        return text

    return operator


def evaluate(base: str, adapter: str, scenarios_dir: Path, n: int,
             max_steps: int, temperature: float) -> dict[str, Any]:
    if not available():
        raise SystemExit("sekft-dash image unavailable; `docker build -t sekft-dash .`")
    operator = make_local_operator(base, adapter, temperature=temperature)
    backend = DashDocker()
    rows = []
    for f in sorted(scenarios_dir.glob("*.json"))[:n]:
        sc = Scenario.from_dict(json.loads(f.read_text()))
        tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
                     operator=operator, use_scaffold=False)
        rows.append(tj)
        print(f" {sc.id}: {tj.outcome} (terminal={tj.terminal} "
              f"verified={tj.verified} steps={tj.steps})")
    d = len(rows) or 1
    return {
        "n": len(rows),
        "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
        "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
        "verified_rate": round(sum(t.verified for t in rows) / d, 3),
        "clean_rate": round(sum(t.keep for t in rows) / d, 3),
    }


def main() -> None:
    ap = argparse.ArgumentParser(description="Behavioural eval of a tuned model.")
    ap.add_argument("--base", required=True)
    ap.add_argument("--adapter", required=True)
    ap.add_argument("--scenarios", type=Path, required=True)
    ap.add_argument("--n", type=int, default=10)
    ap.add_argument("--max-steps", type=int, default=30)
    ap.add_argument("--temperature", type=float, default=0.7)
    ns = ap.parse_args()
    m = evaluate(ns.base, ns.adapter, ns.scenarios, ns.n, ns.max_steps, ns.temperature)
    print("\n=== behavioural metrics ===")
    print(json.dumps(m, indent=2))


if __name__ == "__main__":
    main()
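
To make the four rates returned by `evaluate()` concrete, here is a sketch that applies the same expressions to hand-made rows. The `Row` class, the `summarise` name, and the use of `None` for a rollout that never terminated are assumptions for illustration; the real rows are posix-sdc trajectories produced by `rollout`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    """Stand-in trajectory with only the fields evaluate() reads."""
    steps: int
    terminal: Optional[str]
    verified: bool
    keep: bool
    meta: dict = field(default_factory=dict)


def summarise(rows: list) -> dict:
    d = len(rows) or 1  # avoid dividing by zero when no bundle was found
    return {
        "n": len(rows),
        "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
        "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
        "verified_rate": round(sum(t.verified for t in rows) / d, 3),
        "clean_rate": round(sum(t.keep for t in rows) / d, 3),
    }


rows = [
    Row(steps=4, terminal="exit", verified=True, keep=True, meta={"clean": True}),
    Row(steps=6, terminal="panic", verified=False, keep=False, meta={"clean": True}),
    Row(steps=30, terminal=None, verified=False, keep=False, meta={"clean": False}),
    Row(steps=0, terminal=None, verified=False, keep=False),
]
metrics = summarise(rows)
empty = summarise([])
print(metrics)
```

An empty scenario directory yields `n == 0` with every rate at `0.0` rather than an exception, so it is worth checking `n` before reading anything into the rates.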
0
src/tiararodney/sekft/py.typed
Normal file
189
src/tiararodney/sekft/resident.py
Normal file
@@ -0,0 +1,189 @@
"""Resident harness: load the base ONCE, cycle adapters.

On a slow link (OcuLink / PCIe 3.0 x4) the 14 GB base transfer dominates every
process start. This loads the base once and keeps it hot, so the
iterate-train-eval loop pays the transfer only at startup. Each ``fit`` trains a
fresh LoRA adapter on the resident base and ``unload``s it back to clean; each
``evaluate`` attaches a saved adapter for inference and unloads.

Interactive (IPython on the GPU box) is the intended use:

    from tiararodney.sekft.resident import Resident
    r = Resident("~/llm-models/mistral-7b-instruct-v0.2", load_4bit=True)
    r.fit("~/sekft/trajectories", "~/sekft/ckpt-a", lora_r=16, lr=2e-4, epochs=3)
    r.evaluate("~/sekft/ckpt-a", "~/sekft/holdout", n=10)
    r.fit("~/sekft/trajectories", "~/sekft/ckpt-b", lora_r=32)  # NO base reload

Or `sekft-resident --base <dir> --selftest-data <stub_dir>` to prove the base
loads once and two adapters train against it.
"""
from __future__ import annotations

import argparse
import gc
import json
from pathlib import Path
from typing import Any

import torch
from datasets import Dataset
from peft import (LoraConfig, PeftModel, get_peft_model,
                  prepare_model_for_kbit_training)
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

from .sft import build_masked_example, iter_keepers, normalize_for_template

LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]


def _free() -> None:
    gc.collect()
    torch.cuda.empty_cache()


class Resident:
    """A base model held resident on the GPU; adapters cycle through it."""

    def __init__(self, base: str, load_4bit: bool = False) -> None:
        self.base_path = str(Path(base).expanduser())
        self.load_4bit = load_4bit
        self.tok = AutoTokenizer.from_pretrained(self.base_path)
        if self.tok.pad_token is None:
            self.tok.pad_token = self.tok.eos_token
        quant = None
        if load_4bit:
            quant = BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True)
        print(f"[resident] loading base ONCE: {self.base_path} (4bit={load_4bit}) ...")
        self.base = AutoModelForCausalLM.from_pretrained(
            self.base_path, dtype=torch.float16, quantization_config=quant)
        self.base = (prepare_model_for_kbit_training(self.base) if load_4bit
                     else self.base)
        if not load_4bit:
            self.base.enable_input_require_grads()
        dev = next(self.base.parameters()).device
        mem = torch.cuda.memory_allocated() / 1e9
        print(f"[resident] base resident on {dev}; {mem:.1f} GB VRAM")

    # -- build masked rows from kept trajectories --------------------------

    def _rows(self, data_dir: Path, max_len: int) -> list[dict[str, list[Any]]]:
        rows = []
        for turns in iter_keepers(data_dir):
            ex = build_masked_example(turns, self.tok)
            if len(ex["input_ids"]) <= max_len and any(l != -100 for l in ex["labels"]):
                rows.append(ex)
        if not rows:
            raise SystemExit(f"no usable keeper trajectories in {data_dir}")
        return rows

    # -- train a fresh adapter on the resident base ------------------------

    def fit(self, data_dir: str, out: str, lora_r: int = 16, lr: float = 2e-4,
            epochs: float = 3.0, batch: int = 1, accum: int = 8,
            max_len: int = 4096) -> Path:
        ddir, odir = Path(data_dir).expanduser(), Path(out).expanduser()
        ds = Dataset.from_list(self._rows(ddir, max_len))
        if not self.load_4bit:
            self.base.gradient_checkpointing_enable()
        model = get_peft_model(self.base, LoraConfig(
            r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05,
            task_type="CAUSAL_LM", target_modules=LORA_TARGETS))
        model.print_trainable_parameters()
        args = TrainingArguments(
            output_dir=str(odir), per_device_train_batch_size=batch,
            gradient_accumulation_steps=accum, num_train_epochs=epochs,
            learning_rate=lr, fp16=True, logging_steps=1, save_strategy="no",
            report_to=["tensorboard"], logging_dir=str(odir / "runs"),
            remove_unused_columns=False, warmup_ratio=0.03)
        tr = Trainer(model=model, args=args, train_dataset=ds,
                     data_collator=DataCollatorForSeq2Seq(
                         self.tok, padding=True, label_pad_token_id=-100))
        tr.train()
        odir.mkdir(parents=True, exist_ok=True)
        model.save_pretrained(str(odir))
        self.tok.save_pretrained(str(odir))
        (odir / "log_history.jsonl").write_text(
            "\n".join(json.dumps(r) for r in tr.state.log_history))
        losses = [h["loss"] for h in tr.state.log_history if "loss" in h]
        print(f"[resident] fit -> {odir} final loss {losses[-1] if losses else '?'}")
        self.base = model.unload()  # strip LoRA, restore resident base
        del model, tr, ds
        _free()
        return odir

    # -- behavioural eval of a saved adapter -------------------------------

    def evaluate(self, adapter: str | None, scenarios_dir: str, n: int = 10,
                 max_steps: int = 30, temperature: float = 0.7) -> dict[str, Any]:
        from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
        from tiararodney.posix_sdc.factory.rollout import rollout
        from tiararodney.posix_sdc.schema import Scenario
        if not available():
            raise SystemExit("sekft-dash image unavailable on this box")
        # adapter=None -> evaluate the BASE model (the within-holdout baseline).
        if adapter:
            adapter = str(Path(adapter).expanduser())
            pm = PeftModel.from_pretrained(self.base, adapter)
        else:
            pm = self.base
        pm.eval()

        def operator(messages: list[dict[str, str]]) -> str:
            msgs = normalize_for_template(messages)
            ids = self.tok.apply_chat_template(
                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
            with torch.no_grad():
                o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
                                temperature=max(temperature, 1e-2),
                                eos_token_id=self.tok.eos_token_id,
                                pad_token_id=self.tok.eos_token_id)
            text: str = self.tok.decode(o[0][ids.shape[1]:], skip_special_tokens=True).strip()
            return text

        backend = DashDocker()
        rows = []
        for f in sorted(Path(scenarios_dir).expanduser().glob("*.json"))[:n]:
            sc = Scenario.from_dict(json.loads(f.read_text()))
            tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
                         operator=operator, use_scaffold=False)
            rows.append(tj)
            print(f" {sc.id}: {tj.outcome} terminal={tj.terminal} verified={tj.verified}")
        d = len(rows) or 1
        m = {
            "n": len(rows),
            "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
            "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
            "verified_rate": round(sum(t.verified for t in rows) / d, 3),
            "clean_rate": round(sum(t.keep for t in rows) / d, 3),
        }
        if adapter:  # base is unwrapped only if we wrapped it
            self.base = pm.unload()
        del pm
        _free()
        print("[resident] eval:", json.dumps(m))
        return m


def main() -> None:
    ap = argparse.ArgumentParser(description="Resident base; cycle adapters.")
    ap.add_argument("--base", required=True)
    ap.add_argument("--load-4bit", action="store_true")
    ap.add_argument("--selftest-data",
                    help="fit two adapters on this data to prove resident multi-fit")
    ns = ap.parse_args()
    r = Resident(ns.base, ns.load_4bit)
    if ns.selftest_data:
        print("=== selftest: two fits on the SAME resident base (no reload) ===")
        r.fit(ns.selftest_data, "/tmp/res-a", epochs=1, lora_r=8)
        r.fit(ns.selftest_data, "/tmp/res-b", epochs=1, lora_r=8)
        print("=== selftest OK: base loaded once, two adapters trained ===")
    else:
        print("Resident ready. Import and use r.fit() / r.evaluate(), "
              "or pass --selftest-data <dir>.")


if __name__ == "__main__":
    main()
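
`Resident._rows` keeps an example only if it fits `max_len` and has at least one trainable label. A torch-free sketch of that filter on hand-written examples follows; the `usable` helper name and the token values are illustrative assumptions.

```python
IGNORE = -100  # label value the loss ignores


def usable(example: dict, max_len: int) -> bool:
    """Mirror of the filter in Resident._rows: fits the window and trains something."""
    fits = len(example["input_ids"]) <= max_len
    has_target = any(label != IGNORE for label in example["labels"])
    return fits and has_target


examples = [
    {"input_ids": [1, 2, 3, 4], "labels": [-100, -100, 3, 4]},                    # kept
    {"input_ids": [1, 2, 3, 4, 5, 6], "labels": [-100, -100, -100, -100, 5, 6]},  # too long
    {"input_ids": [1, 2, 3], "labels": [-100, -100, -100]},                       # empty mask
]
kept = [usable(ex, max_len=5) for ex in examples]
print(kept)
```

As written in this commit, the resident path drops such rows without logging them, unlike the dataset accounting that issue 10 describes for `sekft-train`.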
289
src/tiararodney/sekft/sft.py
Normal file
@@ -0,0 +1,289 @@
"""sekft trainer: SFT a base model on kept shell-operation trajectories.

Trains assistant turns ONLY -- the commands and the terminal ``exit`` / ``panic``.
The environment turns (system orientation, prompts, command output) are masked
to ``-100`` so the model learns to *produce* commands, not to predict the
environment's replies. Getting this mask wrong is the classic way to ruin a
shell-operator SFT (the model starts hallucinating output), so it is the part
worth testing hardest -- and it is framework-independent.

Render uses the tokenizer's OWN chat template (``apply_chat_template``), so the
training render is identical to what the serving harness produces (ccpty sends
structured messages and the inference endpoint applies the model's default
template). Trajectories are canonicalised first (``normalize_for_template``):
a leading ``system`` turn is folded into the first ``user`` turn and consecutive
same-role turns are merged, because instruct templates such as Mistral's have no
system role and require strict user/assistant alternation. That same
canonicalisation must run on the serving side. Everything else is standard
causal-LM SFT with an assistant-only loss mask.

    sekft-train --data ./trajectories --base <hf-model-dir> --out ./ckpt
    sekft-train --data corpus.jsonl --base <dir>  # a curated .jsonl corpus
    sekft-train --hub --base <dir>  # the published corpus (Hub)
    sekft-train --data ./trajectories --base <dir> --inspect  # mask stats, no training

Training needs torch + transformers + peft (a GPU box). ``--inspect`` and the
normalize/mask helpers run anywhere a tokenizer with a chat template is
available.
"""
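
The module docstring above describes the assistant-only mask in words. A toy sketch of token-prefix differencing, assuming a fake tokenizer whose template is additive (user turns wrapped in open/close markers, assistant turns followed by an end-of-sequence token, loosely modelled on Mistral's layout). `FakeTokenizer`, `mask_by_prefix_diff`, the token ids, and the `ValueError` are all illustrative assumptions; the shipped `build_masked_example` may differ in its details.

```python
INST_OPEN, INST_CLOSE, EOS = 1, 2, 0


class FakeTokenizer:
    """Toy additive chat template over whitespace-split words."""

    def __init__(self):
        self.vocab = {}

    def _word(self, w):
        # Assign ids from 10 upward in first-seen order.
        return self.vocab.setdefault(w, 10 + len(self.vocab))

    def apply_chat_template(self, msgs, add_generation_prompt=False):
        ids = []
        for m in msgs:
            words = [self._word(w) for w in m["content"].split()]
            if m["role"] == "user":
                ids += [INST_OPEN, *words, INST_CLOSE]
            else:
                ids += [*words, EOS]
        return ids


def mask_by_prefix_diff(msgs, tok):
    ids = tok.apply_chat_template(msgs, add_generation_prompt=False)
    labels = [-100] * len(ids)
    prev = []
    for i, m in enumerate(msgs):
        upto = tok.apply_chat_template(msgs[: i + 1], add_generation_prompt=False)
        if ids[: len(upto)] != upto or upto[: len(prev)] != prev:
            raise ValueError("chat template is not additive")
        if m["role"] == "assistant":
            # Only the tokens this assistant turn added are trained.
            labels[len(prev):len(upto)] = upto[len(prev):]
        prev = upto
    return ids, labels


msgs = [
    {"role": "user", "content": "$ prompt"},
    {"role": "assistant", "content": "ls -la"},
    {"role": "user", "content": "total 0"},
    {"role": "assistant", "content": "exit"},
]
ids, labels = mask_by_prefix_diff(msgs, FakeTokenizer())
trained = [t for t in labels if t != -100]
print(ids)
print(labels)
```

In this toy run the trained positions are the two commands plus the end-of-sequence token after each, and every environment token stays at `-100`.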
from __future__ import annotations

import argparse
import json
import logging
from collections.abc import Iterator
from pathlib import Path
from typing import Any

from ._log import setup as _setup_logging

log = logging.getLogger("sekft.train")


def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str]]:
    """Canonicalise a trajectory for instruct chat templates that have no system
    role and require strict user/assistant alternation (Mistral and friends):
    treat ``system`` as ``user``, then merge consecutive same-role turns by
    joining their content with a newline.

    This is loss-neutral for the assistant mask (only environment/user turns
    ever merge; the assistant commands are never adjacent in this data) and it
    is what lets ``apply_chat_template`` render the multi-turn shell dialogue.
    The serving side MUST apply the same canonicalisation, or train and serve
    diverge again.
    """
    out: list[dict[str, str]] = []
    for m in messages:
        role = "user" if m["role"] == "system" else m["role"]
        if out and out[-1]["role"] == role:
            out[-1] = {"role": role, "content": out[-1]["content"] + "\n" + m["content"]}
        else:
            out.append({"role": role, "content": m["content"]})
    return out
|
||||||
|
|
||||||
|
|
||||||
|
def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
|
||||||
|
"""Tokenize a trajectory with the tokenizer's OWN chat template and build an
|
||||||
|
assistant-only loss mask.
|
||||||
|
|
||||||
|
The render is ``tokenizer.apply_chat_template`` on the canonicalised turns,
|
||||||
|
so it is byte-identical to what the serving harness sends. The mask is
|
||||||
|
derived by token-prefix differencing: the tokens an assistant turn
|
||||||
|
contributes are exactly those that appear when it extends the rendered
|
||||||
|
prefix, which trains the commands plus the template's end-of-turn token (so
|
||||||
|
the model learns to stop) and masks every environment turn to ``-100``. This
|
||||||
|
assumes an additive template (each turn extends the previous render); a
|
||||||
|
    non-additive one raises rather than silently mis-masking.
|
||||||
|
"""
|
||||||
|
msgs = normalize_for_template(messages)
|
||||||
|
ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
|
||||||
|
labels = [-100] * len(ids)
|
||||||
|
prev: list[int] = []
|
||||||
|
for i, m in enumerate(msgs):
|
||||||
|
upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
|
||||||
|
if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
|
||||||
|
raise ValueError("chat template is not additive; cannot derive an "
|
||||||
|
"assistant loss mask by token-prefix differencing")
|
||||||
|
if m["role"] == "assistant":
|
||||||
|
for j in range(len(prev), len(upto)):
|
||||||
|
labels[j] = ids[j]
|
||||||
|
prev = upto
|
||||||
|
return {"input_ids": ids, "attention_mask": [1] * len(ids), "labels": labels}
|
||||||
|
|
||||||
|
|
||||||
|
def iter_keepers(data_dir: Path) -> Iterator[list[dict[str, str]]]:
|
||||||
|
"""Yield ``turns`` (message lists) from raw rollout JSONs marked keep."""
|
||||||
|
for f in sorted(data_dir.glob("*.json")):
|
||||||
|
d = json.loads(f.read_text())
|
||||||
|
if d.get("keep"):
|
||||||
|
yield d["turns"]
|
||||||
|
|
||||||
|
|
||||||
|
def load_turns(data: Path, hub: bool = False,
|
||||||
|
revision: str | None = None) -> Iterator[list[dict[str, str]]]:
|
||||||
|
"""Yield assistant-bearing ``turns`` from one of three sources:
|
||||||
|
|
||||||
|
- ``--hub``: the published corpus via posix-sdc's ``load_trajectories`` (the
|
||||||
|
in-repo ``data/`` of a posix-sdc checkout, else the Hugging Face Hub);
|
||||||
|
- ``data`` a ``.jsonl`` file: a curated corpus, already keep-filtered, one
|
||||||
|
record per line;
|
||||||
|
- ``data`` a directory: raw rollout ``.json`` (keep-filtered here).
|
||||||
|
|
||||||
|
posix-sdc is imported lazily, so the raw-dir and ``.jsonl`` paths need
|
||||||
|
neither posix-sdc nor huggingface_hub installed.
|
||||||
|
"""
|
||||||
|
if hub:
|
||||||
|
from tiararodney.posix_sdc import load_trajectories
|
||||||
|
for r in load_trajectories(revision=revision):
|
||||||
|
yield r["turns"]
|
||||||
|
elif data.is_dir():
|
||||||
|
yield from iter_keepers(data)
|
||||||
|
elif data.suffix == ".jsonl":
|
||||||
|
with open(data) as fh:
|
||||||
|
for line in fh:
|
||||||
|
if line.strip():
|
||||||
|
yield json.loads(line)["turns"]
|
||||||
|
else:
|
||||||
|
raise SystemExit(
|
||||||
|
f"--data must be a rollout directory or a .jsonl corpus (got {data})")
|
||||||
|
|
||||||
|
|
||||||
|
def mask_stats(example: dict[str, list[Any]]) -> tuple[int, int]:
|
||||||
|
"""(trained tokens, total tokens) for an example."""
|
||||||
|
trained = sum(1 for x in example["labels"] if x != -100)
|
||||||
|
return trained, len(example["labels"])
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------
|
||||||
|
# Training (GPU box: torch + transformers + peft)
|
||||||
|
# --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def train(data_dir: Path, base: str, out: Path, epochs: float, lr: float,
|
||||||
|
batch: int, accum: int, max_len: int, lora_r: int,
|
||||||
|
load_4bit: bool = False, hub: bool = False,
|
||||||
|
revision: str | None = None) -> None:
|
||||||
|
import torch
|
||||||
|
from datasets import Dataset
|
||||||
|
from peft import LoraConfig, get_peft_model
|
||||||
|
from transformers import (AutoModelForCausalLM, AutoTokenizer,
|
||||||
|
DataCollatorForSeq2Seq, Trainer, TrainingArguments)
|
||||||
|
from transformers.utils import logging as hf_logging
|
||||||
|
|
||||||
|
# Surface the Trainer's own per-step curve (loss/lr/grad_norm); it is at
|
||||||
|
# WARNING by default, which is most of why training looks silent.
|
||||||
|
hf_logging.set_verbosity_info()
|
||||||
|
|
||||||
|
source = "hub" if hub else data_dir
|
||||||
|
log.info("base=%s data=%s out=%s", base, source, out)
|
||||||
|
log.info("loading tokenizer: %s", base)
|
||||||
|
tok = AutoTokenizer.from_pretrained(base)
|
||||||
|
if tok.pad_token is None:
|
||||||
|
tok.pad_token = tok.eos_token
|
||||||
|
|
||||||
|
log.info("building masked examples from %s ...", source)
|
||||||
|
rows: list[dict[str, list[Any]]] = []
|
||||||
|
n_seen = n_long = n_empty = 0
|
||||||
|
for turns in load_turns(data_dir, hub=hub, revision=revision):
|
||||||
|
n_seen += 1
|
||||||
|
ex = build_masked_example(turns, tok)
|
||||||
|
log.debug(" trajectory %d: %d turns -> %d tokens, %d trained",
|
||||||
|
n_seen, len(turns), len(ex["input_ids"]), mask_stats(ex)[0])
|
||||||
|
if n_seen % 100 == 0:
|
||||||
|
log.info(" ... %d trajectories processed, %d usable", n_seen, len(rows))
|
||||||
|
if len(ex["input_ids"]) > max_len:
|
||||||
|
n_long += 1
|
||||||
|
continue
|
||||||
|
        if not any(lab != -100 for lab in ex["labels"]):
|
||||||
|
n_empty += 1
|
||||||
|
continue
|
||||||
|
rows.append(ex)
|
||||||
|
if not rows:
|
||||||
|
        raise SystemExit(f"no usable keeper trajectories in {source}")
|
||||||
|
trained = sum(mask_stats(r)[0] for r in rows)
|
||||||
|
total = sum(mask_stats(r)[1] for r in rows)
|
||||||
|
log.info("dataset: %d keepers -> %d usable; %d trained / %d tokens (%.1f%% assistant)",
|
||||||
|
n_seen, len(rows), trained, total, 100 * trained / total)
|
||||||
|
if n_long or n_empty:
|
||||||
|
log.warning("dropped %d trajectories: %d over --max-len %d, %d empty-mask",
|
||||||
|
n_long + n_empty, n_long, max_len, n_empty)
|
||||||
|
ds = Dataset.from_list(rows)
|
||||||
|
|
||||||
|
# 4-bit (QLoRA) shrinks the base from ~14 GB to ~4 GB to move across the
|
||||||
|
# OcuLink/PCIe link and to hold in VRAM; nf4 + fp16 compute works on the
|
||||||
|
# V100 (sm_70). Without it, plain fp16 weights.
|
||||||
|
quant = None
|
||||||
|
if load_4bit:
|
||||||
|
from transformers import BitsAndBytesConfig
|
||||||
|
quant = BitsAndBytesConfig(
|
||||||
|
load_in_4bit=True, bnb_4bit_quant_type="nf4",
|
||||||
|
bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True,
|
||||||
|
)
|
||||||
|
log.info("loading base model: %s (%s)", base,
|
||||||
|
"4-bit QLoRA" if load_4bit else "fp16")
|
||||||
|
model = AutoModelForCausalLM.from_pretrained(
|
||||||
|
base, dtype=torch.float16, quantization_config=quant)
|
||||||
|
if load_4bit:
|
||||||
|
from peft import prepare_model_for_kbit_training
|
||||||
|
model = prepare_model_for_kbit_training(model) # handles ckpt + input grads
|
||||||
|
else:
|
||||||
|
model.enable_input_require_grads()
|
||||||
|
model.gradient_checkpointing_enable()
|
||||||
|
model = get_peft_model(model, LoraConfig(
|
||||||
|
r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05, task_type="CAUSAL_LM",
|
||||||
|
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
|
||||||
|
))
|
||||||
|
n_train, n_all = model.get_nb_trainable_parameters()
|
||||||
|
log.info("LoRA r=%d: %d trainable / %d params (%.3f%%)",
|
||||||
|
lora_r, n_train, n_all, 100 * n_train / n_all)
|
||||||
|
|
||||||
|
args = TrainingArguments(
|
||||||
|
output_dir=str(out), per_device_train_batch_size=batch,
|
||||||
|
gradient_accumulation_steps=accum, num_train_epochs=epochs,
|
||||||
|
learning_rate=lr, fp16=True, logging_steps=1, save_strategy="epoch",
|
||||||
|
report_to=["tensorboard"], logging_dir=str(out / "runs"),
|
||||||
|
remove_unused_columns=False, warmup_ratio=0.03,
|
||||||
|
)
|
||||||
|
trainer = Trainer(
|
||||||
|
model=model, args=args, train_dataset=ds,
|
||||||
|
data_collator=DataCollatorForSeq2Seq(tok, padding=True, label_pad_token_id=-100),
|
||||||
|
)
|
||||||
|
log.info("training: %g epochs, lr=%g, batch=%d x accum=%d (effective %d), max_len=%d",
|
||||||
|
epochs, lr, batch, accum, batch * accum, max_len)
|
||||||
|
trainer.train()
|
||||||
|
model.save_pretrained(str(out))
|
||||||
|
tok.save_pretrained(str(out))
|
||||||
|
# durable, greppable record of the curve (loss/lr/grad_norm per step).
|
||||||
|
(out / "log_history.jsonl").write_text(
|
||||||
|
"\n".join(json.dumps(r) for r in trainer.state.log_history))
|
||||||
|
log.info("saved LoRA adapter + log_history.jsonl -> %s (tensorboard: --logdir %s)",
|
||||||
|
out, out / "runs")
|
||||||
|
|
||||||
|
|
||||||
|
def inspect(data_dir: Path, base: str, hub: bool = False,
|
||||||
|
revision: str | None = None) -> None:
|
||||||
|
from transformers import AutoTokenizer
|
||||||
|
log.info("loading tokenizer: %s", base)
|
||||||
|
tok = AutoTokenizer.from_pretrained(base)
|
||||||
|
n = tt = tr = 0
|
||||||
|
for turns in load_turns(data_dir, hub=hub, revision=revision):
|
||||||
|
ex = build_masked_example(turns, tok)
|
||||||
|
t, total = mask_stats(ex)
|
||||||
|
tr += t; tt += total; n += 1
|
||||||
|
if not n:
|
||||||
|
        raise SystemExit(f"no keeper trajectories in {'hub' if hub else data_dir}")
|
||||||
|
log.info("%d keeper trajectories; %d/%d tokens trained (%.1f%% assistant, rest masked)",
|
||||||
|
n, tr, tt, 100 * tr / tt)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
ap = argparse.ArgumentParser(description="SFT a model on shell trajectories.")
|
||||||
|
ap.add_argument("--data", type=Path, default=Path("./trajectories"),
|
||||||
|
help="a raw rollout dir or a curated .jsonl corpus")
|
||||||
|
ap.add_argument("--hub", action="store_true",
|
||||||
|
help="load the published corpus via posix-sdc (Hub); ignores --data")
|
||||||
|
ap.add_argument("--revision", default=None,
|
||||||
|
help="dataset revision/tag to pin when using --hub")
|
||||||
|
ap.add_argument("--base", required=True, help="HF model id or local dir")
|
||||||
|
ap.add_argument("--out", type=Path, default=Path("./ckpt"))
|
||||||
|
ap.add_argument("--inspect", action="store_true", help="mask stats only, no training")
|
||||||
|
ap.add_argument("--epochs", type=float, default=3.0)
|
||||||
|
ap.add_argument("--lr", type=float, default=2e-4)
|
||||||
|
ap.add_argument("--batch", type=int, default=1)
|
||||||
|
ap.add_argument("--accum", type=int, default=8)
|
||||||
|
ap.add_argument("--max-len", type=int, default=4096)
|
||||||
|
ap.add_argument("--lora-r", type=int, default=16)
|
||||||
|
ap.add_argument("--load-4bit", action="store_true",
|
||||||
|
help="QLoRA: load base in 4-bit (less to move over the link, less VRAM)")
|
||||||
|
ap.add_argument("-v", "--verbose", action="store_true", help="debug-level logging")
|
||||||
|
ap.add_argument("-q", "--quiet", action="store_true", help="warnings and errors only")
|
||||||
|
ns = ap.parse_args()
|
||||||
|
_setup_logging(verbose=ns.verbose, quiet=ns.quiet)
|
||||||
|
if ns.inspect:
|
||||||
|
inspect(ns.data, ns.base, hub=ns.hub, revision=ns.revision)
|
||||||
|
else:
|
||||||
|
train(ns.data, ns.base, ns.out, ns.epochs, ns.lr, ns.batch, ns.accum,
|
||||||
|
ns.max_len, ns.lora_r, ns.load_4bit, hub=ns.hub, revision=ns.revision)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
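The token-prefix differencing described in `build_masked_example` can be illustrated without the GPU stack. The following is a minimal sketch, not the module itself: `ToyTokenizer`, `normalize` and `assistant_spans` are hypothetical names introduced here, and the toy template (whitespace tokens, one `</e>` per turn) is an assumption standing in for a real chat template.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: additive, whitespace tokens."""

    def apply_chat_template(self, msgs: List[Message],
                            add_generation_prompt: bool = False) -> List[str]:
        toks: List[str] = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks.extend(m["content"].split())
            toks.append("</e>")
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


def normalize(messages: List[Message]) -> List[Message]:
    """Fold ``system`` into ``user`` and merge consecutive same-role turns."""
    out: List[Message] = []
    for m in messages:
        role = "user" if m["role"] == "system" else m["role"]
        if out and out[-1]["role"] == role:
            out[-1] = {"role": role,
                       "content": out[-1]["content"] + "\n" + m["content"]}
        else:
            out.append({"role": role, "content": m["content"]})
    return out


def assistant_spans(messages: List[Message],
                    tokenizer: ToyTokenizer) -> List[Tuple[int, int]]:
    """Token index ranges that each assistant turn adds to the rendered prefix."""
    msgs = normalize(messages)
    spans: List[Tuple[int, int]] = []
    prev_len = 0
    for i, m in enumerate(msgs):
        upto = tokenizer.apply_chat_template(msgs[:i + 1])
        if m["role"] == "assistant":
            spans.append((prev_len, len(upto)))
        prev_len = len(upto)
    return spans


trajectory = [
    {"role": "system", "content": "orient"},
    {"role": "user", "content": "login"},
    {"role": "assistant", "content": "cat f"},
    {"role": "user", "content": "out"},
    {"role": "assistant", "content": "exit"},
]
tok = ToyTokenizer()
full = tok.apply_chat_template(normalize(trajectory))
spans = assistant_spans(trajectory, tok)
for start, end in spans:
    print(full[start:end])  # the tokens that would keep their labels
```

Every token outside these spans would be masked to `-100`. The sketch skips the additivity check that the real function performs before trusting the spans.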
30
tests/smoke/test_entrypoints.py
Normal file
|
|
@ -0,0 +1,30 @@
|
||||||
|
"""Smoke tests: the console entry points load and respond to --help without the
|
||||||
|
GPU stack (torch is imported lazily inside the training/eval code paths)."""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
ROOT = Path(__file__).resolve().parents[2]
|
||||||
|
SRC = ROOT / "src"
|
||||||
|
POSIX_SRC = ROOT.parent / "posix-sdc" / "src"
|
||||||
|
|
||||||
|
|
||||||
|
def _help(module: str) -> "subprocess.CompletedProcess[str]":
|
||||||
|
env = dict(os.environ, PYTHONPATH=os.pathsep.join([str(SRC), str(POSIX_SRC)]))
|
||||||
|
return subprocess.run([sys.executable, "-m", module, "--help"],
|
||||||
|
capture_output=True, text=True, env=env)
|
||||||
|
|
||||||
|
|
||||||
|
def test_train_help() -> None:
|
||||||
|
cp = _help("tiararodney.sekft.sft")
|
||||||
|
assert cp.returncode == 0, cp.stderr
|
||||||
|
assert "--data" in cp.stdout
|
||||||
|
|
||||||
|
|
||||||
|
def test_eval_help() -> None:
|
||||||
|
cp = _help("tiararodney.sekft.eval")
|
||||||
|
assert cp.returncode == 0, cp.stderr
|
||||||
|
assert "--adapter" in cp.stdout
|
||||||
35
tests/unit/test_load.py
Normal file
|
|
@ -0,0 +1,35 @@
|
||||||
|
"""Unit tests for the trainer's three-source data loader (raw dir / curated
|
||||||
|
jsonl). The Hub path delegates to posix-sdc and is covered there."""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from tiararodney.sekft import sft
|
||||||
|
|
||||||
|
|
||||||
|
def test_load_turns_from_raw_dir(tmp_path: Path) -> None:
|
||||||
|
(tmp_path / "a.json").write_text(json.dumps(
|
||||||
|
{"keep": True, "turns": [{"role": "assistant", "content": "ls"}]}))
|
||||||
|
(tmp_path / "b.json").write_text(json.dumps( # not kept -> excluded
|
||||||
|
{"keep": False, "turns": [{"role": "assistant", "content": "rm -rf /"}]}))
|
||||||
|
got = list(sft.load_turns(tmp_path))
|
||||||
|
assert len(got) == 1
|
||||||
|
assert got[0][0]["content"] == "ls"
|
||||||
|
|
||||||
|
|
||||||
|
def test_load_turns_from_jsonl(tmp_path: Path) -> None:
|
||||||
|
f = tmp_path / "corpus.jsonl"
|
||||||
|
f.write_text("\n".join(json.dumps({"turns": [{"role": "assistant", "content": c}]})
|
||||||
|
for c in ("ls", "cat x")) + "\n")
|
||||||
|
got = list(sft.load_turns(f))
|
||||||
|
assert [t[0]["content"] for t in got] == ["ls", "cat x"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_load_turns_rejects_other_paths(tmp_path: Path) -> None:
|
||||||
|
bad = tmp_path / "notes.txt"
|
||||||
|
bad.write_text("hi")
|
||||||
|
with pytest.raises(SystemExit):
|
||||||
|
list(sft.load_turns(bad))
|
||||||
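The two local sources tested above differ only in where the keep filter runs: a raw rollout directory is filtered at load time, while a `.jsonl` corpus is assumed to be filtered already. The sketch below shows one way a rollout directory could be turned into such a corpus. `write_corpus` is a hypothetical helper, not part of sekft, and the record shape (`keep`, `turns`) is taken from the tests.

```python
import json
import tempfile
from pathlib import Path


def write_corpus(rollout_dir: Path, corpus: Path) -> int:
    """Hypothetical helper: copy kept rollouts into a one-record-per-line corpus."""
    kept = 0
    with open(corpus, "w") as fh:
        for f in sorted(rollout_dir.glob("*.json")):
            record = json.loads(f.read_text())
            if record.get("keep"):
                # Only ``turns`` is written; the keep flag is implied by presence.
                fh.write(json.dumps({"turns": record["turns"]}) + "\n")
                kept += 1
    return kept


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    rollouts = root / "rollouts"
    rollouts.mkdir()
    (rollouts / "a.json").write_text(json.dumps(
        {"keep": True, "turns": [{"role": "assistant", "content": "ls"}]}))
    (rollouts / "b.json").write_text(json.dumps(
        {"keep": False, "turns": [{"role": "assistant", "content": "pwd"}]}))
    corpus = root / "corpus.jsonl"
    n_kept = write_corpus(rollouts, corpus)
    lines = corpus.read_text().splitlines()
    first_turns = json.loads(lines[0])["turns"]
```

A corpus written this way should load through the `.jsonl` branch of `load_turns` and yield the same trajectories as the directory branch.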
75
tests/unit/test_sft.py
Normal file
|
|
@ -0,0 +1,75 @@
|
||||||
|
"""Unit tests for the SFT render canonicalisation and assistant-only mask.
|
||||||
|
|
||||||
|
These run anywhere: a fake additive tokenizer stands in for a real chat
|
||||||
|
template, so no torch/transformers is needed."""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from tiararodney.sekft import sft
|
||||||
|
|
||||||
|
|
||||||
|
class FakeTok:
|
||||||
|
"""Additive chat template: each turn renders to ``<role> tokens... </e>``;
|
||||||
|
the generation prompt appends ``<assistant>``."""
|
||||||
|
|
||||||
|
def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
|
||||||
|
return_tensors: Any = None) -> list[str]:
|
||||||
|
toks: list[str] = []
|
||||||
|
for m in msgs:
|
||||||
|
toks.append(f"<{m['role']}>")
|
||||||
|
toks += m["content"].split()
|
||||||
|
toks.append("</e>")
|
||||||
|
if add_generation_prompt:
|
||||||
|
toks.append("<assistant>")
|
||||||
|
return toks
|
||||||
|
|
||||||
|
|
||||||
|
def test_normalize_folds_system_and_merges_consecutive() -> None:
|
||||||
|
raw = [
|
||||||
|
{"role": "system", "content": "orient"},
|
||||||
|
{"role": "user", "content": "login"},
|
||||||
|
{"role": "user", "content": "prompt"},
|
||||||
|
{"role": "assistant", "content": "cat f"},
|
||||||
|
{"role": "user", "content": "out"},
|
||||||
|
{"role": "user", "content": "prompt"},
|
||||||
|
{"role": "assistant", "content": "exit"},
|
||||||
|
]
|
||||||
|
norm = sft.normalize_for_template(raw)
|
||||||
|
assert [m["role"] for m in norm] == ["user", "assistant", "user", "assistant"]
|
||||||
|
assert norm[0]["content"] == "orient\nlogin\nprompt"
|
||||||
|
|
||||||
|
|
||||||
|
def test_normalize_leaves_clean_alternation_untouched() -> None:
|
||||||
|
raw = [{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}]
|
||||||
|
assert sft.normalize_for_template(raw) == raw
|
||||||
|
|
||||||
|
|
||||||
|
def test_mask_trains_assistant_turns_only() -> None:
|
||||||
|
raw = [
|
||||||
|
{"role": "system", "content": "orient"},
|
||||||
|
{"role": "user", "content": "login"},
|
||||||
|
{"role": "assistant", "content": "cat f"},
|
||||||
|
{"role": "user", "content": "out"},
|
||||||
|
{"role": "assistant", "content": "exit"},
|
||||||
|
]
|
||||||
|
ex = sft.build_masked_example(raw, FakeTok())
|
||||||
|
trained = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab != -100]
|
||||||
|
masked = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab == -100]
|
||||||
|
assert set(trained) <= {"<assistant>", "cat", "f", "exit", "</e>"}
|
||||||
|
assert "cat" in trained and "exit" in trained # both commands present
|
||||||
|
assert {"orient", "login", "out"} <= set(masked) # environment masked
|
||||||
|
|
||||||
|
|
||||||
|
def test_mask_raises_on_non_additive_template() -> None:
|
||||||
|
class BadTok:
|
||||||
|
def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
|
||||||
|
return_tensors: Any = None) -> list[int]:
|
||||||
|
return list(range(len(msgs), 0, -1)) # reversed: prefixes do not nest
|
||||||
|
|
||||||
|
with pytest.raises(ValueError):
|
||||||
|
sft.build_masked_example(
|
||||||
|
[{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}],
|
||||||
|
BadTok())
|
||||||
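The requirement that the serving side apply the same canonicalisation can be checked with the same kind of fake tokenizer. This is a minimal sketch under stated assumptions: `ToyTokenizer` and `normalize` are hypothetical stand-ins defined here, and the toy template's generation prompt happens to equal the first token of an assistant turn, which may not hold for every real template.

```python
from typing import Dict, List

Message = Dict[str, str]


class ToyTokenizer:
    """Hypothetical additive template: ``<role> tokens... </e>`` per turn."""

    def apply_chat_template(self, msgs: List[Message],
                            add_generation_prompt: bool = False) -> List[str]:
        toks: List[str] = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks.extend(m["content"].split())
            toks.append("</e>")
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


def normalize(messages: List[Message]) -> List[Message]:
    """Fold ``system`` into ``user`` and merge consecutive same-role turns."""
    out: List[Message] = []
    for m in messages:
        role = "user" if m["role"] == "system" else m["role"]
        if out and out[-1]["role"] == role:
            out[-1] = {"role": role,
                       "content": out[-1]["content"] + "\n" + m["content"]}
        else:
            out.append({"role": role, "content": m["content"]})
    return out


trajectory = [
    {"role": "system", "content": "orient"},
    {"role": "user", "content": "login"},
    {"role": "assistant", "content": "cat f"},
    {"role": "user", "content": "out"},
    {"role": "assistant", "content": "exit"},
]
tok = ToyTokenizer()

# What training sees: the whole canonicalised trajectory.
full = tok.apply_chat_template(normalize(trajectory))

# What serving sends before the first command, with and without normalising.
history = trajectory[:2]
prompt = tok.apply_chat_template(normalize(history), add_generation_prompt=True)
raw_prompt = tok.apply_chat_template(history, add_generation_prompt=True)

print(prompt == full[:len(prompt)])          # canonicalised prompt is a training prefix
print(raw_prompt == full[:len(raw_prompt)])  # raw prompt diverges from training
```

With canonicalisation the serving prompt is a prefix of the training render. Without it, the model is prompted with a `<system>` turn it never saw in training.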
47
tox.ini
Normal file
|
|
@ -0,0 +1,47 @@
|
||||||
|
[tox]
|
||||||
|
requires =
|
||||||
|
tox>=4.19
|
||||||
|
env_list =
|
||||||
|
unit-py3{9-13}
|
||||||
|
smoke-py3{9-13}
|
||||||
|
lint
|
||||||
|
format
|
||||||
|
|
||||||
|
[testenv]
|
||||||
|
deps =
|
||||||
|
../posix-sdc
|
||||||
|
.
|
||||||
|
|
||||||
|
[testenv:lint]
|
||||||
|
description = run type check on code base
|
||||||
|
labels = static
|
||||||
|
deps =
|
||||||
|
mypy
|
||||||
|
commands =
|
||||||
|
mypy src tests --junit-xml test-reports/{env_name}.xml
|
||||||
|
|
||||||
|
[testenv:format]
|
||||||
|
description = check formatting
|
||||||
|
labels = static
|
||||||
|
deps =
|
||||||
|
autopep8
|
||||||
|
commands =
|
||||||
|
autopep8 --diff --exit-code src tests
|
||||||
|
|
||||||
|
[testenv:unit-py3{9-13}]
|
||||||
|
description = run unit tests
|
||||||
|
labels = unit
|
||||||
|
deps =
|
||||||
|
{[testenv]deps}
|
||||||
|
pytest
|
||||||
|
commands =
|
||||||
|
pytest tests/unit --junitxml=test-reports/{env_name}.xml
|
||||||
|
|
||||||
|
[testenv:smoke-py3{9-13}]
|
||||||
|
description = run smoke tests against the console entry points
|
||||||
|
labels = smoke
|
||||||
|
deps =
|
||||||
|
{[testenv]deps}
|
||||||
|
pytest
|
||||||
|
commands =
|
||||||
|
pytest tests/smoke --junitxml=test-reports/{env_name}.xml
|
||||||