Merge branch 'develop'

2026-06-18 00:59:08 +02:00
commit 705b4a028b
parent ad027a7d52 7856ef6032
18 changed files with 3276 additions and 14 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,41 @@
+# Changelog
+
+All notable changes to sekft, the shell-operator SFT trainer behind the
+[posix-sdc](https://huggingface.co/datasets/tiararodney/posix-sdc) experiment,
+are documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.0.0] - 2026-06-18
+
+First release: the training and evaluation pipeline that turns posix-sdc
+trajectories into a fine-tuned shell operator.
+
+### Added
+- `sekft-train`: LoRA / QLoRA supervised fine-tuning of a base model on
+  shell-operation trajectories, with an **assistant-only loss mask** derived by
+  token-prefix differencing — the commands and the terminal `exit` / `panic`
+  token are trained; the environment turns (orientation, prompts, command
+  output) are masked to `-100`. The render uses the tokenizer's own
+  `apply_chat_template`, so training matches what the serving harness sends
+  (train = serve), with `normalize_for_template` canonicalising trajectories for
+  instruct templates that have no system role and require strict user/assistant
+  alternation.
+- Three sources of training data: a directory of raw rollout `.json` files
+  (keep-filtered), a curated `.jsonl` corpus, or the published posix-sdc corpus
+  from the Hugging Face Hub (`--hub`).
+- `--inspect` for mask and token statistics without training, and structured
+  stderr logging across every phase (`-v` / `-q`): per-trajectory and progress
+  lines while the corpus is tokenized, dataset accounting that warns on dropped
+  (over-length / empty-mask) trajectories, and the per-step training curve.
+- `sekft-eval`: behavioural evaluation that drops the tuned model into held-out
+  scenarios with no scaffold and scores whether it operates and terminates.
+- `sekft-resident`: a resident-base harness that loads the base model once and
+  fits several adapters without reloading, for paired / STaR-style runs.
+- Packaging: the `tiararodney.sekft` namespace package with `sekft-train`,
+  `sekft-eval`, and `sekft-resident` console scripts; a typed (`py.typed`),
+  mypy-strict codebase; an optional `[gpu]` extra (torch / transformers / peft);
+  and a dependency on `posix-sdc[hub]`. Released under GPL-2.0.
+
+[1.0.0]: https://git.code.tiararodney.com/tiara/sekft/releases/tag/v1.0.0
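
Review note on the assistant-only loss mask: the changelog names token-prefix differencing but does not show it. The following is a minimal sketch of the idea, not the code in `sekft.sft`. `fake_render` and `assistant_only_labels` are hypothetical names; in a real run `render` would presumably wrap the tokenizer's `apply_chat_template`. For simplicity the sketch also leaves the assistant's role marker unmasked, which a real implementation may prefer to mask.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
IGNORE = -100  # label value that the loss function skips


def fake_render(messages: List[Message]) -> List[int]:
    """Hypothetical stand-in for a chat template: a role marker, then one id per character."""
    ids: List[int] = []
    for m in messages:
        ids.append(1 if m["role"] == "user" else 2)
        ids.extend(ord(c) for c in m["content"])
    return ids


def assistant_only_labels(
    messages: List[Message],
    render: Callable[[List[Message]], List[int]],
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) where only assistant-turn tokens keep their ids."""
    input_ids: List[int] = []
    labels: List[int] = []
    for i, msg in enumerate(messages):
        ids = render(messages[: i + 1])
        # Differencing is only valid if each render extends the previous one.
        if ids[: len(input_ids)] != input_ids:
            raise ValueError(f"render of turn {i} does not extend the previous prefix")
        new = ids[len(input_ids):]  # tokens contributed by this turn alone
        labels.extend(new if msg["role"] == "assistant" else [IGNORE] * len(new))
        input_ids = ids
    return input_ids, labels


msgs = [
    {"role": "user", "content": "ls"},
    {"role": "assistant", "content": "exit"},
]
input_ids, labels = assistant_only_labels(msgs, fake_render)
```

The prefix check matters: a template that rewrites earlier tokens when a turn is appended would silently shift the mask, so the sketch fails loudly instead.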
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,14 +0,0 @@
-# Minimal dash-in-a-box for sekft trajectory generation.
-#   docker build -t sekft-dash .
-#
-# dash as the operated shell (strict POSIX, no bashisms), busybox applets for
-# the coreutils.  busybox is intentionally close to minimal POSIX so trajectories
-# transfer toward sek rather than encoding GNU-isms.  Add `coreutils findutils
-# grep sed` here if you want GNU semantics instead.
-FROM alpine:3.19
-RUN apk add --no-cache dash \
- && ln -sf /usr/bin/dash /bin/dash \
- && ln -sf /usr/bin/dash /bin/sh
-# /work is the default arena; provider files land at their absolute paths.
-RUN mkdir -p /work
-WORKDIR /work
--- a/338
+++ b/338
@@ -0,0 +1,338 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+                            NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Moe Ghoul>, 1 April 1989
+  Moe Ghoul, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
--- a/Pipfile
+++ b/Pipfile
@@ -0,0 +1,37 @@
+[[source]]
+url = "https://pypi.org/simple"
+verify_ssl = true
+name = "pypi"
+
+[[source]]
+url = "https://pypi.code.tiararodney.com/root/byteb4rb1e/+simple/"
+verify_ssl = true
+name = "pypicodetiararodney"
+
+[packages]
+"tiararodney.sekft" = {file = ".", editable = true}
+"tiararodney.posix-sdc" = {version = "*", index = "pypicodetiararodney", extras = ["hub"]}
+
+[dev-packages]
+tox = "*"
+pytest = "*"
+mypy = "*"
+build = "*"
+twine = "*"
+setuptools-scm = "~=8.2.0"
+pypi-attestations = "*"
+autopep8 = "*"
+"tiararodney.posix-sdc" = {ref = "develop", git = "https://git.code.tiararodney.com/tiara/posix-sdc.git", extras = ["hub"]}
+
+[requires]
+python_version = "3"
+
+[scripts]
+"dist" = "python3 -m build"
+"dist:attestations" = "python3 -m pypi_attestations sign dist/*"
+"dist:publish:tiararodney" = "python3 -m twine upload --sign --repository tiararodney dist/*"
+"test" = "tox"
+"test:static" = "tox run -m static"
+"test:unit" = "tox run -m unit"
+"test:integration" = "tox run -m integration"
+"test:smoke" = "tox run -m smoke"
--- a/Pipfile.lock
+++ b/Pipfile.lock
--- a/README.md
+++ b/README.md
@@ -0,0 +1,79 @@
+# sekft
+
+Fine-tune small open models to operate a POSIX shell as a self-directed citizen:
+land with **no imperative**, discover where directives live, learn the provider
+from its own self-documentation, do the work, and terminate (`exit` on success,
+`panic` when genuinely blocked).
+
+sekft is the **training half**. The dataset and the synthetic-data factory live
+in [`posix-sdc`](../posix-sdc) (`tiararodney.posix-sdc`), which this package
+depends on. Here live the trainer, the behavioural evaluator, and the
+resident-base harness.
+
+## Components
+
+- **`sekft.sft`** (`sekft-train`) — supervised fine-tuner. Renders trajectories
+  with the tokenizer's own chat template and trains a QLoRA adapter under an
+  **assistant-only** loss mask (the commands plus the terminal token; environment
+  turns masked to -100). Getting the mask wrong is the classic way to ruin a
+  shell-operator SFT, so it is the part tested hardest.
+- **`sekft.eval`** (`sekft-eval`) — behavioural eval. Train loss says nothing
+  about whether the model operates the shell and leaves. This drops base +
+  adapter into held-out scenarios with no scaffold and reports the rates that
+  count: reach command-mode, terminate, checker passes.
+- **`sekft.resident`** (`sekft-resident`) — resident-base harness. Loads the
+  14 GB base once and keeps it hot, training and evaluating adapters without
+  reloading it (over OcuLink/PCIe the base transfer otherwise dominates every
+  run).
+
+## The render contract
+
+The render the model trains on MUST equal what it is served with. The serving
+harness (ccpty) sends structured `{role, content}` messages over the OpenAI
+chat-completions protocol, so the endpoint applies the **model's own chat
+template**. sekft therefore renders with `apply_chat_template`, after
+`normalize_for_template` canonicalises each session: a leading `system` turn is
+folded into the first `user` turn and consecutive same-role turns are merged,
+because instruct templates such as Mistral's have no system role and require
+strict user/assistant alternation. The same canonicalisation must run
+serve-side, or train and serve diverge.
+
+## Install
+
+The training paths only run on a CUDA host, so the GPU stack is an extra:
+
+```sh
+pipenv install              # editable sekft + posix-sdc[hub] from the package index
+pipenv install -e '.[gpu]'  # torch / transformers / peft / datasets, on the box
+```
+
+`pyproject.toml` declares `tiararodney.posix-sdc` abstractly; the `Pipfile` pins
+it to the `pypicodetiararodney` index, or to the `develop` Git branch for dev.
+
+## Use (on the GPU box)
+
+```sh
+# fine-tune an adapter on the posix-sdc trajectories
+sekft-train --data ./trajectories --base mistralai/Mistral-7B-Instruct-v0.2 \
+            --out ./ckpt --load-4bit
+
+# inspect the assistant-only loss mask without training (runs anywhere)
+sekft-train --data ./trajectories --base <dir> --inspect
+
+# behavioural eval on held-out scenario bundles (worlds, not trajectories)
+sekft-eval --base <dir> --adapter ./ckpt --scenarios ./holdout --n 16
+
+# resident loop: load the base once, cycle adapters without reloading it
+sekft-resident --base <dir> --load-4bit
+```
+
+The eval consumes held-out **scenario bundles** from posix-sdc (it stands up and
+verifies each in a fresh container), not trajectories.
+
+## Result
+
+Fine-tuning `mistralai/Mistral-7B-Instruct-v0.2` on the posix-sdc data lifted
+clean termination on archetype-level held-out scenarios from **0/16 (base) to
+9/16 (tuned)**: the operate-and-terminate mechanism generalised to unseen task
+types, while task competence stayed archetype-local. See the experiment
+[*From seed to weights*](https://blog.tiararodney.com/projects/2026/semantic-execution-kernel/experiments/from-seed-to-weights/).
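
Review note on the render contract: the README describes what `normalize_for_template` does (fold a leading `system` turn into the first `user` turn, merge consecutive same-role turns) without showing it. A minimal sketch of that canonicalisation follows. `normalize_sketch` is a hypothetical name, and the blank-line separator used when merging is an assumption; the real function may join turns differently.

```python
from typing import Dict, List

Message = Dict[str, str]


def normalize_sketch(messages: List[Message], sep: str = "\n\n") -> List[Message]:
    """Canonicalise a session for templates without a system role."""
    out: List[Message] = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if i == 0 and role == "system":
            # Fold: relabel the leading system turn as user so that the merge
            # step below joins it with the first real user turn.
            role = "user"
        if out and out[-1]["role"] == role:
            # Merge consecutive same-role turns (separator is an assumption).
            out[-1]["content"] += sep + content
        else:
            out.append({"role": role, "content": content})
    return out


session = [
    {"role": "system", "content": "S"},
    {"role": "user", "content": "U1"},
    {"role": "user", "content": "U2"},
    {"role": "assistant", "content": "A"},
]
canonical = normalize_sketch(session)
```

Because the output alternates strictly between `user` and `assistant`, it can be passed to a template such as Mistral's, which rejects a `system` role. The same function would have to run on the serving side for train and serve to stay identical.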
--- a/234
+++ b/234
@@ -15,3 +15,237 @@ Mappings:
        - Module: sekft
          Product: sek
          Component: sekft
+
+--ISSUE
+Content-Type: application/issue
+ID: 1
+Type: feature
+Title: Package sekft as an installable namespace package
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Turn the flat trainer scripts into an installable tiararodney.sekft
+             namespace package: src layout, pyproject with the abstract
+             posix-sdc dependency and an optional gpu extra, console scripts, a
+             Pipfile pinning posix-sdc as a local editable override, and tox
+             environments.
+
+--ISSUE
+Content-Type: application/issue
+ID: 2
+Type: feature
+Title: SFT trainer with chat-template render and assistant-only mask
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Add the supervised fine-tuner: render trajectories through the
+             tokenizer's own chat template (matching serving), canonicalise
+             turns (fold system, merge consecutive), derive an assistant-only
+             loss mask by token-prefix differencing, and train a QLoRA adapter.
+
+--ISSUE
+Content-Type: application/issue
+ID: 3
+Type: feature
+Title: Behavioural evaluator
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Add the behavioural eval: load base plus LoRA adapter, drop it into
+             held-out scenarios with no scaffold, drive them through a local
+             operator that renders with the model's chat template, and report
+             reach/terminate/checker rates.
+
+--ISSUE
+Content-Type: application/issue
+ID: 4
+Type: feature
+Title: Resident-base train/eval harness
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Add the resident harness that loads the 14GB base once and keeps it
+             hot, training fresh LoRA adapters and evaluating them without
+             reloading the base, for the slow-OcuLink iterate loop.
+
+--ISSUE
+Content-Type: application/issue
+ID: 5
+Type: feature
+Title: Pipeline overview README
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Document the sekft pipeline: the trainer, evaluator, and resident
+             harness; how they consume the posix-sdc dataset; the render
+             contract; and how to run on the GPU box.
+
+--ISSUE
+Content-Type: application/issue
+ID: 6
+Type: feature
+Title: Test suite: unit and smoke
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: Add a pytest suite: torch-free unit tests for the render
+             canonicalisation and assistant-only mask (fake tokenizer), and
+             smoke tests that the console entry points respond to --help without
+             the GPU stack.
+
+--ISSUE
+Content-Type: application/issue
+ID: 7
+Type: feature
+Title: Add GPL-2.0 license and drop the relocated Dockerfile
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: License sekft under GPL-2.0 (canonical text plus pyproject
+             metadata) and remove the dash Dockerfile, which now lives in
+             posix-sdc under docker/alpine-dash.
+
+--ISSUE
+Content-Type: application/issue
+ID: 8
+Type: feature
+Title: Refresh docs for the packaged trainer
+Status: done
+Priority: medium
+Created: 2026-06-16
+Module: sekft
+Relationships: 
+Description: The README still describes sekft as the data factory
+             (generate/rollout/dashdocker/taxonomy/schema), which all moved to
+             posix-sdc. Rewrite it as the trainer (sft/eval/resident) that
+             consumes posix-sdc, and update the module docstrings to
+             console-script invocations and the chat-template render contract.
+
+--ISSUE
+Content-Type: application/issue
+ID: 9
+Type: feature
+Title: Type-check the package under mypy strict
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: Make the lint env honestly pass: add mypy as a dev dependency,
+             ignore_missing_imports for the ML libs, fully annotate
+             eval/resident/sft (including the inner operator callables), and
+             ship a py.typed marker so the Typing::Typed claim is real.
+
+--ISSUE
+Content-Type: application/issue
+ID: 10
+Type: feature
+Title: Structured logging for the trainer (sft)
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: The trainer is nearly silent: outside an example count and a save
+             line it prints nothing through tokenizer load, the ~14GB base-model
+             load, example building, and the whole training loop, and
+             trajectories dropped for exceeding --max-len or having an empty
+             loss mask vanish without a trace. Add a small shared logging setup
+             (_log.py, stderr so stdout stays clean for results) and a module
+             logger; give sekft-train -v/--verbose and -q/--quiet. Log the run
+             config and each phase, report dataset accounting (keepers ->
+             usable, with counts dropped for length / empty-mask and a warning
+             when any are dropped), and raise transformers' verbosity during
+             training so the per-step curve shows. Apply to train() and
+             inspect().
+
+--ISSUE
+Content-Type: application/issue
+ID: 11
+Type: bugfix
+Title: operate_rate can sum a None (eval + resident)
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: operate_rate computes sum(t.steps > 0 and t.meta.get('clean') for t
+             in rows). The 'and' yields the right operand when steps>0, so if
+             meta lacks the 'clean' key it yields None and sum() raises
+             TypeError at runtime; mypy (now that posix-sdc ships py.typed and
+             Trajectory is typed) flags the generator item type in eval.py:83
+             and resident.py:157. Wrap the predicate in bool() so it counts
+             trajectories that operated and are clean, fixing both the type
+             error and the latent crash.
+
+--ISSUE
+Content-Type: application/issue
+ID: 12
+Type: feature
+Title: load training data from a raw dir, a curated jsonl, or the Hub
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: iter_keepers reads only raw per-trajectory .json - one of three
+             input shapes the trainer should accept. Add load_turns(data, hub,
+             revision) that yields assistant-bearing turns from: a directory of
+             raw rollout .json (keep-filtered, today's iter_keepers); a curated
+             .jsonl corpus file (already keep-filtered, yield turns per line);
+             or the published corpus via posix-sdc's load_trajectories (local
+             data/ in a checkout, else the Hub). sekft-train gains --hub and
+             --revision; --data dispatches by dir-vs-.jsonl. Raw-rollout reading
+             stays sekft-local; curated+Hub reuse posix-sdc's loader (imported
+             lazily so the trainer needs neither posix-sdc nor huggingface_hub
+             for the raw/jsonl paths). Unit tests for the raw-dir and jsonl
+             dispatch.
+
+--ISSUE
+Content-Type: application/issue
+ID: 13
+Type: feature
+Title: reference posix-sdc three ways for seamless multi-machine dev
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: Wire the posix-sdc dependency as a triplet: the abstract
+             posix-sdc[hub] in pyproject (so the trainer's --hub path can reach
+             the Hub via huggingface_hub); the published wheel from the private
+             index in Pipfile [packages]; the git develop branch in Pipfile
+             [dev-packages] for develop-time. Commit Pipfile.lock so the
+             dependency surface and lock land together.
+
+--ISSUE
+Content-Type: application/issue
+ID: 14
+Type: bugfix
+Title: refresh Pipfile.lock against published posix-sdc 1.2.2
+Status: done
+Priority: medium
+Created: 2026-06-17
+Module: sekft
+Relationships: 
+Description: The lock committed with the triplet (#13) predated the published
+             posix-sdc 1.2.2 wheel, so it could not pin the real [hub] closure.
+             Now that 1.2.2 is on the private index, re-lock: posix-sdc resolves
+             to ==1.2.2 from the index and the [hub] extra pulls huggingface_hub
+             and its transitive deps into the lock. Commit the refreshed
+             Pipfile.lock so the next machine installs the published wheel with
+             the Hub path available.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,92 @@
+[build-system]
+requires = [
+    "setuptools",
+    "wheel",
+    "setuptools-scm[toml]"
+]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "tiararodney.sekft"
+description = "Fine-tune small open models to operate a POSIX shell (sek)"
+authors = [
+    { name = "Tiara Rodney", email = "tiara.rodney@byteb4rb1e.me" }
+]
+license-files = ["LICENSE"]
+readme = "README.md"
+classifiers = [
+  "Development Status :: 3 - Alpha",
+  "Intended Audience :: Developers",
+  "Intended Audience :: Science/Research",
+  "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
+  "Natural Language :: English",
+  "Operating System :: POSIX :: Linux",
+  "Programming Language :: Python :: 3",
+  "Programming Language :: Python :: 3.9",
+  "Programming Language :: Python :: 3.10",
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: 3.12",
+  "Topic :: Scientific/Engineering :: Artificial Intelligence",
+  "Topic :: System :: Shells",
+  "Typing :: Typed",
+]
+dependencies = [
+  "tiararodney.posix-sdc[hub]",
+]
+dynamic = ["version"]
+requires-python = ">=3.9"
+
+[project.optional-dependencies]
+gpu = [
+  "torch",
+  "transformers",
+  "peft",
+  "datasets",
+  "accelerate",
+  "bitsandbytes",
+  "tensorboard",
+]
+
+[project.scripts]
+sekft-train = "tiararodney.sekft.sft:main"
+sekft-eval = "tiararodney.sekft.eval:main"
+sekft-resident = "tiararodney.sekft.resident:main"
+
+[project.urls]
+Git = "https://git.code.tiararodney.com/tiararodney/sekft"
+
+[tool.setuptools.packages.find]
+where = ["src"]
+namespaces = true
+
+[tool.setuptools.package-data]
+"tiararodney.sekft" = ["py.typed"]
+
+[tool.pytest.ini_options]
+pythonpath = ["src", "../posix-sdc/src"]
+testpaths = ["tests"]
+markers = [
+    "pytest: integration tests runnable without external services",
+    "gpu: requires torch and a GPU",
+    "docker: requires Docker and the sekft-dash image",
+]
+
+[tool.mypy]
+strict = true
+mypy_path = "src"
+explicit_package_bases = true
+namespace_packages = true
+
+[[tool.mypy.overrides]]
+module = [
+  "torch.*", "transformers.*", "peft.*", "datasets.*", "bitsandbytes.*",
+  "tiararodney.posix_sdc.*",
+]
+ignore_missing_imports = true
+
+[tool.autopep8]
+max_line_length = 80
+aggressive = 3
+recursive = true
+
+[tool.setuptools_scm]
--- a/src/tiararodney/sekft/__init__.py
+++ b/src/tiararodney/sekft/__init__.py
@@ -0,0 +1,5 @@
+"""sekft: fine-tune small open models to operate a POSIX shell (sek).
+
+Consumes the posix-sdc dataset; the trainer, behavioural evaluator, and the
+resident-base harness live here.
+"""
--- a/src/tiararodney/sekft/_log.py
+++ b/src/tiararodney/sekft/_log.py
@@ -0,0 +1,20 @@
+"""Console logging setup shared by the sekft entry points.
+
+Logs go to stderr so stdout stays clean for a command's actual output (metrics
+JSON, a path a caller might capture). Call :func:`setup` once at the top of a
+``main()``; modules then log through ``logging.getLogger("sekft.<area>")``.
+"""
+from __future__ import annotations
+
+import logging
+
+
+def setup(verbose: bool = False, quiet: bool = False) -> None:
+    """Configure root logging to stderr. ``quiet`` shows warnings and worse,
+    ``verbose`` adds debug; the default is info."""
+    level = logging.WARNING if quiet else logging.DEBUG if verbose else logging.INFO
+    logging.basicConfig(
+        level=level,
+        format="%(asctime)s %(levelname)-5s %(name)s  %(message)s",
+        datefmt="%H:%M:%S",
+    )
--- a/src/tiararodney/sekft/eval.py
+++ b/src/tiararodney/sekft/eval.py
@@ -0,0 +1,105 @@
+"""Behavioural eval: the metric that matters.
+
+Train loss says nothing about whether the model operates the shell and leaves.
+This loads a fine-tuned model (base + LoRA adapter), drops it into held-out
+scenarios with NO scaffold (the trained behaviour must stand on its own), and
+reports the rates that count: does it reach command-mode, does it terminate,
+does the checker pass.
+
+    sekft-eval --base <hf-dir> --adapter ./ckpt-mistral-r16 \
+               --scenarios ./holdout-scenarios --n 10
+
+Reuses the posix-sdc rollout loop with a *local* operator: the model renders and
+generates with the same chat template it was trained on (train == eval == serve,
+via ``apply_chat_template`` + ``normalize_for_template``; otherwise the prompts
+go out of distribution). Prerequisites on the box: torch + transformers + peft,
+the ``sekft-dash`` image, and held-out SCENARIO bundles from the posix-sdc
+factory (not trajectories; the eval stands up and verifies each).
+"""
+from __future__ import annotations
+
+import argparse
+import json
+from collections.abc import Callable
+from pathlib import Path
+from typing import Any
+
+from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
+from tiararodney.posix_sdc.factory.rollout import rollout
+from tiararodney.posix_sdc.schema import Scenario
+
+from .sft import normalize_for_template
+
+
+def make_local_operator(base: str, adapter: str, max_new_tokens: int = 64,
+                        temperature: float = 0.7) -> Callable[[list[dict[str, str]]], str]:
+    """A ``messages -> command`` callable backed by base + LoRA adapter.
+
+    Renders the conversation exactly as the model was trained, appends the
+    assistant header, generates one turn up to EOS, and returns the new text.
+    """
+    import torch
+    from peft import PeftModel
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+
+    tok = AutoTokenizer.from_pretrained(adapter)
+    model = AutoModelForCausalLM.from_pretrained(
+        base, torch_dtype=torch.float16, device_map="auto")
+    model = PeftModel.from_pretrained(model, adapter)
+    model.eval()
+
+    def operator(messages: list[dict[str, str]]) -> str:
+        msgs = normalize_for_template(messages)
+        ids = tok.apply_chat_template(
+            msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
+        with torch.no_grad():
+            out = model.generate(
+                ids, max_new_tokens=max_new_tokens,
+                do_sample=temperature > 0, temperature=max(temperature, 1e-2),
+                eos_token_id=tok.eos_token_id, pad_token_id=tok.eos_token_id)
+        text: str = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip()
+        return text
+
+    return operator
+
+
+def evaluate(base: str, adapter: str, scenarios_dir: Path, n: int,
+             max_steps: int, temperature: float) -> dict[str, Any]:
+    if not available():
+        raise SystemExit("sekft-dash image unavailable; `docker build -t sekft-dash .`")
+    operator = make_local_operator(base, adapter, temperature=temperature)
+    backend = DashDocker()
+    rows = []
+    for f in sorted(scenarios_dir.glob("*.json"))[:n]:
+        sc = Scenario.from_dict(json.loads(f.read_text()))
+        tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
+                     operator=operator, use_scaffold=False)
+        rows.append(tj)
+        print(f"  {sc.id}: {tj.outcome} (terminal={tj.terminal} "
+              f"verified={tj.verified} steps={tj.steps})")
+    d = len(rows) or 1
+    return {
+        "n": len(rows),
+        "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
+        "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
+        "verified_rate": round(sum(t.verified for t in rows) / d, 3),
+        "clean_rate": round(sum(t.keep for t in rows) / d, 3),
+    }
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description="Behavioural eval of a tuned model.")
+    ap.add_argument("--base", required=True)
+    ap.add_argument("--adapter", required=True)
+    ap.add_argument("--scenarios", type=Path, required=True)
+    ap.add_argument("--n", type=int, default=10)
+    ap.add_argument("--max-steps", type=int, default=30)
+    ap.add_argument("--temperature", type=float, default=0.7)
+    ns = ap.parse_args()
+    m = evaluate(ns.base, ns.adapter, ns.scenarios, ns.n, ns.max_steps, ns.temperature)
+    print("\n=== behavioural metrics ===")
+    print(json.dumps(m, indent=2))
+
+
+if __name__ == "__main__":
+    main()
--- a/src/tiararodney/sekft/py.typed
+++ b/src/tiararodney/sekft/py.typed
--- a/src/tiararodney/sekft/resident.py
+++ b/src/tiararodney/sekft/resident.py
@@ -0,0 +1,189 @@
+"""Resident harness: load the base ONCE, cycle adapters.
+
+On a slow link (OcuLink / PCIe 3.0 x4) the 14 GB base transfer dominates every
+process start. This loads the base once and keeps it hot, so the
+iterate-train-eval loop pays the transfer only at startup. Each ``fit`` trains a
+fresh LoRA adapter on the resident base and ``unload``s it back to clean; each
+``evaluate`` attaches a saved adapter for inference and unloads.
+
+Interactive (IPython on the GPU box) is the intended use:
+
+    from tiararodney.sekft.resident import Resident
+    r = Resident("~/llm-models/mistral-7b-instruct-v0.2", load_4bit=True)
+    r.fit("~/sekft/trajectories", "~/sekft/ckpt-a", lora_r=16, lr=2e-4, epochs=3)
+    r.evaluate("~/sekft/ckpt-a", "~/sekft/holdout", n=10)
+    r.fit("~/sekft/trajectories", "~/sekft/ckpt-b", lora_r=32)   # NO base reload
+
+Or `sekft-resident --base <dir> --selftest-data <stub_dir>` to prove the base
+loads once and two adapters train against it.
+"""
+from __future__ import annotations
+
+import argparse
+import gc
+import json
+from pathlib import Path
+from typing import Any
+
+import torch
+from datasets import Dataset
+from peft import (LoraConfig, PeftModel, get_peft_model,
+                  prepare_model_for_kbit_training)
+from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
+                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)
+
+from .sft import build_masked_example, iter_keepers, normalize_for_template
+
+LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
+
+
+def _free() -> None:
+    gc.collect()
+    torch.cuda.empty_cache()
+
+
+class Resident:
+    """A base model held resident on the GPU; adapters cycle through it."""
+
+    def __init__(self, base: str, load_4bit: bool = False) -> None:
+        self.base_path = str(Path(base).expanduser())
+        self.load_4bit = load_4bit
+        self.tok = AutoTokenizer.from_pretrained(self.base_path)
+        if self.tok.pad_token is None:
+            self.tok.pad_token = self.tok.eos_token
+        quant = None
+        if load_4bit:
+            quant = BitsAndBytesConfig(
+                load_in_4bit=True, bnb_4bit_quant_type="nf4",
+                bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True)
+        print(f"[resident] loading base ONCE: {self.base_path} (4bit={load_4bit}) ...")
+        self.base = AutoModelForCausalLM.from_pretrained(
+            self.base_path, dtype=torch.float16, quantization_config=quant)
+        self.base = (prepare_model_for_kbit_training(self.base) if load_4bit
+                     else self.base)
+        if not load_4bit:
+            self.base.enable_input_require_grads()
+        dev = next(self.base.parameters()).device
+        mem = torch.cuda.memory_allocated() / 1e9
+        print(f"[resident] base resident on {dev}; {mem:.1f} GB VRAM")
+
+    # -- build masked rows from kept trajectories --------------------------
+
+    def _rows(self, data_dir: Path, max_len: int) -> list[dict[str, list[Any]]]:
+        rows = []
+        for turns in iter_keepers(data_dir):
+            ex = build_masked_example(turns, self.tok)
+            if len(ex["input_ids"]) <= max_len and any(l != -100 for l in ex["labels"]):
+                rows.append(ex)
+        if not rows:
+            raise SystemExit(f"no usable keeper trajectories in {data_dir}")
+        return rows
+
+    # -- train a fresh adapter on the resident base ------------------------
+
+    def fit(self, data_dir: str, out: str, lora_r: int = 16, lr: float = 2e-4,
+            epochs: float = 3.0, batch: int = 1, accum: int = 8,
+            max_len: int = 4096) -> Path:
+        ddir, odir = Path(data_dir).expanduser(), Path(out).expanduser()
+        ds = Dataset.from_list(self._rows(ddir, max_len))
+        if not self.load_4bit:
+            self.base.gradient_checkpointing_enable()
+        model = get_peft_model(self.base, LoraConfig(
+            r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05,
+            task_type="CAUSAL_LM", target_modules=LORA_TARGETS))
+        model.print_trainable_parameters()
+        args = TrainingArguments(
+            output_dir=str(odir), per_device_train_batch_size=batch,
+            gradient_accumulation_steps=accum, num_train_epochs=epochs,
+            learning_rate=lr, fp16=True, logging_steps=1, save_strategy="no",
+            report_to=["tensorboard"], logging_dir=str(odir / "runs"),
+            remove_unused_columns=False, warmup_ratio=0.03)
+        tr = Trainer(model=model, args=args, train_dataset=ds,
+                     data_collator=DataCollatorForSeq2Seq(
+                         self.tok, padding=True, label_pad_token_id=-100))
+        tr.train()
+        odir.mkdir(parents=True, exist_ok=True)
+        model.save_pretrained(str(odir))
+        self.tok.save_pretrained(str(odir))
+        (odir / "log_history.jsonl").write_text(
+            "\n".join(json.dumps(r) for r in tr.state.log_history))
+        losses = [h["loss"] for h in tr.state.log_history if "loss" in h]
+        print(f"[resident] fit -> {odir}  final loss {losses[-1] if losses else '?'}")
+        self.base = model.unload()        # strip LoRA, restore resident base
+        del model, tr, ds
+        _free()
+        return odir
+
+    # -- behavioural eval of a saved adapter -------------------------------
+
+    def evaluate(self, adapter: str | None, scenarios_dir: str, n: int = 10,
+                 max_steps: int = 30, temperature: float = 0.7) -> dict[str, Any]:
+        from tiararodney.posix_sdc.factory.dashdocker import DashDocker, available
+        from tiararodney.posix_sdc.factory.rollout import rollout
+        from tiararodney.posix_sdc.schema import Scenario
+        if not available():
+            raise SystemExit("sekft-dash image unavailable on this box")
+        # adapter=None -> evaluate the BASE model (the within-holdout baseline).
+        if adapter:
+            adapter = str(Path(adapter).expanduser())
+            pm = PeftModel.from_pretrained(self.base, adapter)
+        else:
+            pm = self.base
+        pm.eval()
+
+        def operator(messages: list[dict[str, str]]) -> str:
+            msgs = normalize_for_template(messages)
+            ids = self.tok.apply_chat_template(
+                msgs, add_generation_prompt=True, return_tensors="pt").to(pm.device)
+            with torch.no_grad():
+                o = pm.generate(ids, max_new_tokens=64, do_sample=temperature > 0,
+                                temperature=max(temperature, 1e-2),
+                                eos_token_id=self.tok.eos_token_id,
+                                pad_token_id=self.tok.eos_token_id)
+            text: str = self.tok.decode(o[0][ids.shape[1]:], skip_special_tokens=True).strip()
+            return text
+
+        backend = DashDocker()
+        rows = []
+        for f in sorted(Path(scenarios_dir).expanduser().glob("*.json"))[:n]:
+            sc = Scenario.from_dict(json.loads(f.read_text()))
+            tj = rollout(sc, backend, max_steps=max_steps, temperature=temperature,
+                         operator=operator, use_scaffold=False)
+            rows.append(tj)
+            print(f"  {sc.id}: {tj.outcome} terminal={tj.terminal} verified={tj.verified}")
+        d = len(rows) or 1
+        m = {
+            "n": len(rows),
+            "operate_rate": round(sum(bool(t.steps > 0 and t.meta.get("clean")) for t in rows) / d, 3),
+            "terminate_rate": round(sum(t.terminal in ("exit", "panic") for t in rows) / d, 3),
+            "verified_rate": round(sum(t.verified for t in rows) / d, 3),
+            "clean_rate": round(sum(t.keep for t in rows) / d, 3),
+        }
+        if adapter:                 # base is unwrapped only if we wrapped it
+            self.base = pm.unload()
+            del pm
+        _free()
+        print("[resident] eval:", json.dumps(m))
+        return m
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description="Resident base; cycle adapters.")
+    ap.add_argument("--base", required=True)
+    ap.add_argument("--load-4bit", action="store_true")
+    ap.add_argument("--selftest-data",
+                    help="fit two adapters on this data to prove resident multi-fit")
+    ns = ap.parse_args()
+    r = Resident(ns.base, ns.load_4bit)
+    if ns.selftest_data:
+        print("=== selftest: two fits on the SAME resident base (no reload) ===")
+        r.fit(ns.selftest_data, "/tmp/res-a", epochs=1, lora_r=8)
+        r.fit(ns.selftest_data, "/tmp/res-b", epochs=1, lora_r=8)
+        print("=== selftest OK: base loaded once, two adapters trained ===")
+    else:
+        print("Resident ready. Import and use r.fit() / r.evaluate(), "
+              "or pass --selftest-data <dir>.")
+
+
+if __name__ == "__main__":
+    main()
--- a/src/tiararodney/sekft/sft.py
+++ b/src/tiararodney/sekft/sft.py
@@ -0,0 +1,289 @@
+"""sekft trainer: SFT a base model on kept shell-operation trajectories.
+
+Trains assistant turns ONLY -- the commands and the terminal ``exit`` / ``panic``.
+The environment turns (system orientation, prompts, command output) are masked
+to ``-100`` so the model learns to *produce* commands, not to predict the
+environment's replies. Getting this mask wrong is the classic way to ruin a
+shell-operator SFT (the model starts hallucinating output), so it is the part
+worth testing hardest -- and it is framework-independent.
+
+Render uses the tokenizer's OWN chat template (``apply_chat_template``), so the
+training render is identical to what the serving harness produces (ccpty sends
+structured messages and the inference endpoint applies the model's default
+template). Trajectories are canonicalised first (``normalize_for_template``):
+a leading ``system`` turn is folded into the first ``user`` turn and consecutive
+same-role turns are merged, because instruct templates such as Mistral's have no
+system role and require strict user/assistant alternation. That same
+canonicalisation must run on the serving side. Everything else is standard
+causal-LM SFT with an assistant-only loss mask.
+
+    sekft-train --data ./trajectories --base <hf-model-dir> --out ./ckpt
+    sekft-train --data corpus.jsonl --base <dir>          # a curated .jsonl corpus
+    sekft-train --hub --base <dir>                        # the published corpus (Hub)
+    sekft-train --data ./trajectories --base <dir> --inspect   # mask stats, no training
+
+Training needs torch + transformers + peft (a GPU box). ``--inspect`` and the
+normalize/mask helpers run anywhere a tokenizer with a chat template is
+available.
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+from collections.abc import Iterator
+from pathlib import Path
+from typing import Any
+
+from ._log import setup as _setup_logging
+
+log = logging.getLogger("sekft.train")
+
+
+def normalize_for_template(messages: list[dict[str, str]]) -> list[dict[str, str]]:
+    """Canonicalise a trajectory for instruct chat templates that have no system
+    role and require strict user/assistant alternation (Mistral and friends):
+    treat ``system`` as ``user``, then merge consecutive same-role turns by
+    joining their content with a newline.
+
+    This is loss-neutral for the assistant mask (only environment/user turns
+    ever merge; the assistant commands are never adjacent in this data) and it
+    is what lets ``apply_chat_template`` render the multi-turn shell dialogue.
+    The serving side MUST apply the same canonicalisation, or train and serve
+    diverge again.
+    """
+    out: list[dict[str, str]] = []
+    for m in messages:
+        role = "user" if m["role"] == "system" else m["role"]
+        if out and out[-1]["role"] == role:
+            out[-1] = {"role": role, "content": out[-1]["content"] + "\n" + m["content"]}
+        else:
+            out.append({"role": role, "content": m["content"]})
+    return out
+
+
+def build_masked_example(messages: list[dict[str, str]], tokenizer: Any) -> dict[str, list[Any]]:
+    """Tokenize a trajectory with the tokenizer's OWN chat template and build an
+    assistant-only loss mask.
+
+    The render is ``tokenizer.apply_chat_template`` on the canonicalised turns,
+    so it is byte-identical to what the serving harness sends. The mask is
+    derived by token-prefix differencing: the tokens an assistant turn
+    contributes are exactly those that appear when it extends the rendered
+    prefix, which trains the commands plus the template's end-of-turn token (so
+    the model learns to stop) and masks every environment turn to ``-100``. This
+    assumes an additive template (each turn extends the previous render); a
+    non-additive one raises rather than silently mis-masking.
+    """
+    msgs = normalize_for_template(messages)
+    ids = tokenizer.apply_chat_template(msgs, add_generation_prompt=False)
+    labels = [-100] * len(ids)
+    prev: list[int] = []
+    for i, m in enumerate(msgs):
+        upto = tokenizer.apply_chat_template(msgs[:i + 1], add_generation_prompt=False)
+        if ids[:len(upto)] != upto or upto[:len(prev)] != prev:
+            raise ValueError("chat template is not additive; cannot derive an "
+                             "assistant loss mask by token-prefix differencing")
+        if m["role"] == "assistant":
+            for j in range(len(prev), len(upto)):
+                labels[j] = ids[j]
+        prev = upto
+    return {"input_ids": ids, "attention_mask": [1] * len(ids), "labels": labels}
+
+
+def iter_keepers(data_dir: Path) -> Iterator[list[dict[str, str]]]:
+    """Yield ``turns`` (message lists) from raw rollout JSONs marked keep."""
+    for f in sorted(data_dir.glob("*.json")):
+        d = json.loads(f.read_text())
+        if d.get("keep"):
+            yield d["turns"]
+
+
+def load_turns(data: Path, hub: bool = False,
+               revision: str | None = None) -> Iterator[list[dict[str, str]]]:
+    """Yield assistant-bearing ``turns`` from one of three sources:
+
+    - ``--hub``: the published corpus via posix-sdc's ``load_trajectories`` (the
+      in-repo ``data/`` of a posix-sdc checkout, else the Hugging Face Hub);
+    - ``data`` a ``.jsonl`` file: a curated corpus, already keep-filtered, one
+      record per line;
+    - ``data`` a directory: raw rollout ``.json`` (keep-filtered here).
+
+    posix-sdc is imported lazily, so the raw-dir and ``.jsonl`` paths need
+    neither posix-sdc nor huggingface_hub installed.
+    """
+    if hub:
+        from tiararodney.posix_sdc import load_trajectories
+        for r in load_trajectories(revision=revision):
+            yield r["turns"]
+    elif data.is_dir():
+        yield from iter_keepers(data)
+    elif data.suffix == ".jsonl":
+        with open(data) as fh:
+            for line in fh:
+                if line.strip():
+                    yield json.loads(line)["turns"]
+    else:
+        raise SystemExit(
+            f"--data must be a rollout directory or a .jsonl corpus (got {data})")
+
+
+def mask_stats(example: dict[str, list[Any]]) -> tuple[int, int]:
+    """(trained tokens, total tokens) for an example."""
+    trained = sum(1 for x in example["labels"] if x != -100)
+    return trained, len(example["labels"])
+
+
+# --------------------------------------------------------------------------
+# Training (GPU box: torch + transformers + peft)
+# --------------------------------------------------------------------------
+
+def train(data_dir: Path, base: str, out: Path, epochs: float, lr: float,
+          batch: int, accum: int, max_len: int, lora_r: int,
+          load_4bit: bool = False, hub: bool = False,
+          revision: str | None = None) -> None:
+    import torch
+    from datasets import Dataset
+    from peft import LoraConfig, get_peft_model
+    from transformers import (AutoModelForCausalLM, AutoTokenizer,
+                              DataCollatorForSeq2Seq, Trainer, TrainingArguments)
+    from transformers.utils import logging as hf_logging
+
+    # Surface the Trainer's own per-step curve (loss/lr/grad_norm); it is at
+    # WARNING by default, which is most of why training looks silent.
+    hf_logging.set_verbosity_info()
+
+    source = "hub" if hub else data_dir
+    log.info("base=%s  data=%s  out=%s", base, source, out)
+    log.info("loading tokenizer: %s", base)
+    tok = AutoTokenizer.from_pretrained(base)
+    if tok.pad_token is None:
+        tok.pad_token = tok.eos_token
+
+    log.info("building masked examples from %s ...", source)
+    rows: list[dict[str, list[Any]]] = []
+    n_seen = n_long = n_empty = 0
+    for turns in load_turns(data_dir, hub=hub, revision=revision):
+        n_seen += 1
+        ex = build_masked_example(turns, tok)
+        log.debug("  trajectory %d: %d turns -> %d tokens, %d trained",
+                  n_seen, len(turns), len(ex["input_ids"]), mask_stats(ex)[0])
+        if n_seen % 100 == 0:
+            log.info("  ... %d trajectories processed, %d usable", n_seen, len(rows))
+        if len(ex["input_ids"]) > max_len:
+            n_long += 1
+            continue
+        if not any(l != -100 for l in ex["labels"]):
+            n_empty += 1
+            continue
+        rows.append(ex)
+    if not rows:
+        raise SystemExit(f"no usable keeper trajectories in {source}")
+    trained = sum(mask_stats(r)[0] for r in rows)
+    total = sum(mask_stats(r)[1] for r in rows)
+    log.info("dataset: %d keepers -> %d usable; %d trained / %d tokens (%.1f%% assistant)",
+             n_seen, len(rows), trained, total, 100 * trained / total)
+    if n_long or n_empty:
+        log.warning("dropped %d trajectories: %d over --max-len %d, %d empty-mask",
+                    n_long + n_empty, n_long, max_len, n_empty)
+    ds = Dataset.from_list(rows)
+
+    # 4-bit (QLoRA) shrinks the base from ~14 GB to ~4 GB to move across the
+    # OcuLink/PCIe link and to hold in VRAM; nf4 + fp16 compute works on the
+    # V100 (sm_70). Without it, plain fp16 weights.
+    quant = None
+    if load_4bit:
+        from transformers import BitsAndBytesConfig
+        quant = BitsAndBytesConfig(
+            load_in_4bit=True, bnb_4bit_quant_type="nf4",
+            bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True,
+        )
+    log.info("loading base model: %s (%s)", base,
+             "4-bit QLoRA" if load_4bit else "fp16")
+    model = AutoModelForCausalLM.from_pretrained(
+        base, dtype=torch.float16, quantization_config=quant)
+    if load_4bit:
+        from peft import prepare_model_for_kbit_training
+        model = prepare_model_for_kbit_training(model)  # handles ckpt + input grads
+    else:
+        model.enable_input_require_grads()
+        model.gradient_checkpointing_enable()
+    model = get_peft_model(model, LoraConfig(
+        r=lora_r, lora_alpha=lora_r * 2, lora_dropout=0.05, task_type="CAUSAL_LM",
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+    ))
+    n_train, n_all = model.get_nb_trainable_parameters()
+    log.info("LoRA r=%d: %d trainable / %d params (%.3f%%)",
+             lora_r, n_train, n_all, 100 * n_train / n_all)
+
+    args = TrainingArguments(
+        output_dir=str(out), per_device_train_batch_size=batch,
+        gradient_accumulation_steps=accum, num_train_epochs=epochs,
+        learning_rate=lr, fp16=True, logging_steps=1, save_strategy="epoch",
+        report_to=["tensorboard"], logging_dir=str(out / "runs"),
+        remove_unused_columns=False, warmup_ratio=0.03,
+    )
+    trainer = Trainer(
+        model=model, args=args, train_dataset=ds,
+        data_collator=DataCollatorForSeq2Seq(tok, padding=True, label_pad_token_id=-100),
+    )
+    log.info("training: %g epochs, lr=%g, batch=%d x accum=%d (effective %d), max_len=%d",
+             epochs, lr, batch, accum, batch * accum, max_len)
+    trainer.train()
+    model.save_pretrained(str(out))
+    tok.save_pretrained(str(out))
+    # durable, greppable record of the curve (loss/lr/grad_norm per step).
+    (out / "log_history.jsonl").write_text(
+        "\n".join(json.dumps(r) for r in trainer.state.log_history))
+    log.info("saved LoRA adapter + log_history.jsonl -> %s  (tensorboard: --logdir %s)",
+             out, out / "runs")
+
+
+def inspect(data_dir: Path, base: str, hub: bool = False,
+            revision: str | None = None) -> None:
+    from transformers import AutoTokenizer
+    log.info("loading tokenizer: %s", base)
+    tok = AutoTokenizer.from_pretrained(base)
+    n = tt = tr = 0
+    for turns in load_turns(data_dir, hub=hub, revision=revision):
+        ex = build_masked_example(turns, tok)
+        t, total = mask_stats(ex)
+        tr += t
+        tt += total
+        n += 1
+    if not n:
+        src = "the Hub corpus" if hub else str(data_dir)
+        raise SystemExit(f"no keeper trajectories in {src}")
+    log.info("%d keeper trajectories; %d/%d tokens trained (%.1f%% assistant, rest masked)",
+             n, tr, tt, 100 * tr / tt)
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description="SFT a model on shell trajectories.")
+    ap.add_argument("--data", type=Path, default=Path("./trajectories"),
+                    help="a raw rollout dir or a curated .jsonl corpus")
+    ap.add_argument("--hub", action="store_true",
+                    help="load the published corpus via posix-sdc (Hub); ignores --data")
+    ap.add_argument("--revision", default=None,
+                    help="dataset revision/tag to pin when using --hub")
+    ap.add_argument("--base", required=True, help="HF model id or local dir")
+    ap.add_argument("--out", type=Path, default=Path("./ckpt"))
+    ap.add_argument("--inspect", action="store_true", help="mask stats only, no training")
+    ap.add_argument("--epochs", type=float, default=3.0)
+    ap.add_argument("--lr", type=float, default=2e-4)
+    ap.add_argument("--batch", type=int, default=1)
+    ap.add_argument("--accum", type=int, default=8)
+    ap.add_argument("--max-len", type=int, default=4096)
+    ap.add_argument("--lora-r", type=int, default=16)
+    ap.add_argument("--load-4bit", action="store_true",
+                    help="QLoRA: load base in 4-bit (less to move over the link, less VRAM)")
+    ap.add_argument("-v", "--verbose", action="store_true", help="debug-level logging")
+    ap.add_argument("-q", "--quiet", action="store_true", help="warnings and errors only")
+    ns = ap.parse_args()
+    _setup_logging(verbose=ns.verbose, quiet=ns.quiet)
+    if ns.inspect:
+        inspect(ns.data, ns.base, hub=ns.hub, revision=ns.revision)
+    else:
+        train(ns.data, ns.base, ns.out, ns.epochs, ns.lr, ns.batch, ns.accum,
+              ns.max_len, ns.lora_r, ns.load_4bit, hub=ns.hub, revision=ns.revision)
+
+
+if __name__ == "__main__":
+    main()
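
Note on `log_history.jsonl`: the trainer writes one JSON object per logged step, so the loss curve can be recovered without TensorBoard. The following is a minimal sketch of a reader, not part of this commit. The helper name `read_loss_curve` is hypothetical, and it assumes that per-step records carry `loss` and `step` keys while summary records (for example the final runtime entry) lack `loss`.

```python
import json
from pathlib import Path


def read_loss_curve(path: Path) -> list[tuple[int, float]]:
    """Return (step, loss) pairs from a JSON-lines log.

    Records without a "loss" key (assumed to be summary entries) are skipped,
    as are blank lines.
    """
    curve = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if "loss" in rec and "step" in rec:
            curve.append((rec["step"], rec["loss"]))
    return curve
```

Pointing it at `<out>/log_history.jsonl` after a run should yield the per-step training curve in step order.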
--- a/tests/smoke/test_entrypoints.py
+++ b/tests/smoke/test_entrypoints.py
@ -0,0 +1,30 @@
+"""Smoke tests: the console entry points load and respond to --help without the
+GPU stack (torch is imported lazily inside the training/eval code paths)."""
+from __future__ import annotations
+
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[2]
+SRC = ROOT / "src"
+POSIX_SRC = ROOT.parent / "posix-sdc" / "src"
+
+
+def _help(module: str) -> "subprocess.CompletedProcess[str]":
+    env = dict(os.environ, PYTHONPATH=os.pathsep.join([str(SRC), str(POSIX_SRC)]))
+    return subprocess.run([sys.executable, "-m", module, "--help"],
+                          capture_output=True, text=True, env=env)
+
+
+def test_train_help() -> None:
+    cp = _help("tiararodney.sekft.sft")
+    assert cp.returncode == 0, cp.stderr
+    assert "--data" in cp.stdout
+
+
+def test_eval_help() -> None:
+    cp = _help("tiararodney.sekft.eval")
+    assert cp.returncode == 0, cp.stderr
+    assert "--adapter" in cp.stdout
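
Note on the smoke tests: they rely on the heavy GPU stack being imported lazily, so that `--help` works on a machine without it. A minimal sketch of that pattern follows. The parser, the `train` body, and the `main` signature are illustrative assumptions, not the project's actual entry point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the standard library is needed to describe the CLI.
    ap = argparse.ArgumentParser(description="toy trainer (illustrative)")
    ap.add_argument("--base", required=True, help="model id or local dir")
    return ap


def train(base: str) -> None:
    # Deferred import: the heavy dependency is resolved only when training
    # actually runs, so --help and argument errors never touch it.
    import torch
    print(f"would train {base} with torch {torch.__version__}")


def main(argv=None) -> None:
    ns = build_parser().parse_args(argv)
    train(ns.base)
```

Because `import torch` sits inside `train`, building the parser and printing help never requires torch to be installed.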
--- a/tests/unit/test_load.py
+++ b/tests/unit/test_load.py
@ -0,0 +1,35 @@
+"""Unit tests for the two local sources of the trainer's three-source data
+loader (raw dir / curated jsonl). The Hub path delegates to posix-sdc and is
+covered there."""
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tiararodney.sekft import sft
+
+
+def test_load_turns_from_raw_dir(tmp_path: Path) -> None:
+    (tmp_path / "a.json").write_text(json.dumps(
+        {"keep": True, "turns": [{"role": "assistant", "content": "ls"}]}))
+    (tmp_path / "b.json").write_text(json.dumps(   # not kept -> excluded
+        {"keep": False, "turns": [{"role": "assistant", "content": "rm -rf /"}]}))
+    got = list(sft.load_turns(tmp_path))
+    assert len(got) == 1
+    assert got[0][0]["content"] == "ls"
+
+
+def test_load_turns_from_jsonl(tmp_path: Path) -> None:
+    f = tmp_path / "corpus.jsonl"
+    f.write_text("\n".join(json.dumps({"turns": [{"role": "assistant", "content": c}]})
+                           for c in ("ls", "cat x")) + "\n")
+    got = list(sft.load_turns(f))
+    assert [t[0]["content"] for t in got] == ["ls", "cat x"]
+
+
+def test_load_turns_rejects_other_paths(tmp_path: Path) -> None:
+    bad = tmp_path / "notes.txt"
+    bad.write_text("hi")
+    with pytest.raises(SystemExit):
+        list(sft.load_turns(bad))
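
Note on the loader contract: the three tests above pin down the behaviour of the two local sources. A minimal sketch that satisfies the same contract is shown below. It is an assumption about the shape of the logic, not the implementation of `sft.load_turns`, and the name `load_local_turns` is hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def load_local_turns(path: Path) -> Iterator[list]:
    """Yield one list of turns per trajectory from a local source."""
    if path.is_dir():
        # Raw rollout directory: one .json per trajectory, keep-filtered.
        for f in sorted(path.glob("*.json")):
            traj = json.loads(f.read_text())
            if traj.get("keep"):
                yield traj["turns"]
    elif path.is_file() and path.suffix == ".jsonl":
        # Curated corpus: one trajectory per line, already filtered.
        for line in path.read_text().splitlines():
            if line.strip():
                yield json.loads(line)["turns"]
    else:
        raise SystemExit(f"unsupported data source: {path}")
```

Since this is a generator, the `SystemExit` for an unsupported path is raised on first iteration, which is why the test wraps the call in `list(...)`.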
--- a/tests/unit/test_sft.py
+++ b/tests/unit/test_sft.py
@ -0,0 +1,75 @@
+"""Unit tests for the SFT render canonicalisation and assistant-only mask.
+
+These run anywhere: a fake additive tokenizer stands in for a real chat
+template, so no torch/transformers is needed."""
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+
+from tiararodney.sekft import sft
+
+
+class FakeTok:
+    """Additive chat template: each turn renders to ``<role> tokens... </e>``;
+    the generation prompt appends ``<assistant>``."""
+
+    def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
+                            return_tensors: Any = None) -> list[str]:
+        toks: list[str] = []
+        for m in msgs:
+            toks.append(f"<{m['role']}>")
+            toks += m["content"].split()
+            toks.append("</e>")
+        if add_generation_prompt:
+            toks.append("<assistant>")
+        return toks
+
+
+def test_normalize_folds_system_and_merges_consecutive() -> None:
+    raw = [
+        {"role": "system", "content": "orient"},
+        {"role": "user", "content": "login"},
+        {"role": "user", "content": "prompt"},
+        {"role": "assistant", "content": "cat f"},
+        {"role": "user", "content": "out"},
+        {"role": "user", "content": "prompt"},
+        {"role": "assistant", "content": "exit"},
+    ]
+    norm = sft.normalize_for_template(raw)
+    assert [m["role"] for m in norm] == ["user", "assistant", "user", "assistant"]
+    assert norm[0]["content"] == "orient\nlogin\nprompt"
+
+
+def test_normalize_leaves_clean_alternation_untouched() -> None:
+    raw = [{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}]
+    assert sft.normalize_for_template(raw) == raw
+
+
+def test_mask_trains_assistant_turns_only() -> None:
+    raw = [
+        {"role": "system", "content": "orient"},
+        {"role": "user", "content": "login"},
+        {"role": "assistant", "content": "cat f"},
+        {"role": "user", "content": "out"},
+        {"role": "assistant", "content": "exit"},
+    ]
+    ex = sft.build_masked_example(raw, FakeTok())
+    trained = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab != -100]
+    masked = [t for t, lab in zip(ex["input_ids"], ex["labels"]) if lab == -100]
+    assert set(trained) <= {"<assistant>", "cat", "f", "exit", "</e>"}
+    assert "cat" in trained and "exit" in trained          # both commands present
+    assert {"orient", "login", "out"} <= set(masked)       # environment masked
+
+
+def test_mask_raises_on_non_additive_template() -> None:
+    class BadTok:
+        def apply_chat_template(self, msgs: list[dict[str, str]], add_generation_prompt: bool = False,
+                                return_tensors: Any = None) -> list[int]:
+            return list(range(len(msgs), 0, -1))   # reversed: prefixes do not nest
+
+    with pytest.raises(ValueError):
+        sft.build_masked_example(
+            [{"role": "user", "content": "a"}, {"role": "assistant", "content": "b"}],
+            BadTok())
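
Note on the mask: the tests above treat `normalize_for_template` and `build_masked_example` as black boxes. The sketch below shows one way the two ideas could work, under the assumption that the chat template is additive (each render of a prefix of the conversation is a prefix of the full render). The names `normalize`, `prefix_diff_mask`, and `ToyTok` are hypothetical, and this is not the code under test.

```python
IGNORE = -100  # label value that the loss ignores


class ToyTok:
    """Additive toy template, mirroring FakeTok in the tests above."""

    def apply_chat_template(self, msgs, add_generation_prompt=False):
        toks = []
        for m in msgs:
            toks.append(f"<{m['role']}>")
            toks += m["content"].split()
            toks.append("</e>")
        if add_generation_prompt:
            toks.append("<assistant>")
        return toks


def normalize(turns):
    """Fold system turns into user turns and merge consecutive same-role turns."""
    out = []
    for t in turns:
        role = "user" if t["role"] == "system" else t["role"]
        if out and out[-1]["role"] == role:
            out[-1]["content"] += "\n" + t["content"]
        else:
            out.append({"role": role, "content": t["content"]})
    return out


def prefix_diff_mask(turns, tok):
    """Label only the tokens that each assistant turn adds to the render."""
    full = tok.apply_chat_template(turns)
    labels = [IGNORE] * len(full)
    for i, turn in enumerate(turns):
        if turn["role"] != "assistant":
            continue
        # Render the conversation just before this turn (with the generation
        # prompt) and just after it; the difference is what the model emits.
        before = tok.apply_chat_template(turns[:i], add_generation_prompt=True)
        after = tok.apply_chat_template(turns[:i + 1])
        if full[:len(after)] != after or after[:len(before)] != before:
            raise ValueError("chat template is not additive; prefixes do not nest")
        labels[len(before):len(after)] = full[len(before):len(after)]
    return {"input_ids": full, "labels": labels}
```

In this variant the generation-prompt token belongs to `before`, so it stays masked and only the command tokens and the end-of-turn marker are trained. A template whose prefixes do not nest is rejected rather than silently mislabelled.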
--- a/tox.ini
+++ b/tox.ini
@ -0,0 +1,47 @@
+[tox]
+requires =
+    tox>=4.25
+env_list =
+    unit-py3{9-13}
+    smoke-py3{9-13}
+    lint
+    format
+
+[testenv]
+deps =
+    ../posix-sdc
+    .
+
+[testenv:lint]
+description = run type checks on the code base
+labels = static
+deps =
+    mypy
+commands =
+    mypy src tests --junit-xml test-reports/{env_name}.xml
+
+[testenv:format]
+description = check formatting
+labels = static
+deps =
+    autopep8
+commands =
+    autopep8 --diff --exit-code src tests
+
+[testenv:unit-py3{9-13}]
+description = run unit tests
+labels = unit
+deps =
+    {[testenv]deps}
+    pytest
+commands =
+    pytest tests/unit --junitxml=test-reports/{env_name}.xml
+
+[testenv:smoke-py3{9-13}]
+description = run smoke tests against the console entry points
+labels = smoke
+deps =
+    {[testenv]deps}
+    pytest
+commands =
+    pytest tests/smoke --junitxml=test-reports/{env_name}.xml