RFC — A blazing-fast Rust `package.py` parser
Forward-looking design for a Rust parser that extracts the four solver-relevant fields from a rez package.py without invoking Python. Built to compose with the load_family lazy-discovery hook on cold-cache integrations.
Status: Stages 1–4 shipped. Survey tool at scripts/survey_package_py.py; Rust parser crate at crates/rer-package/; PyO3 bindings (pyrer.parse_static_package_py and the batched parse_static_packages_py) at crates/rer-python/src/lib.rs; differential safety net at scripts/diff_against_rez.py; perf benches at scripts/bench_package_py_parser.py and scripts/bench_batched_parser.py. Stage 1 numbers, Stage 3 per-file timing, Stage 2 differential (0 mismatches on 5,979 files), and Stage 4 batched speedup (2.81× on 2,000 files) are all inline below.
Stage 1 result — Fortiche, May 2026
Run on /thierry/rez/pkg (the Fortiche-on-CIFS rez repo):
| | Count | % |
|---|---|---|
| package.py files surveyed | 6,439 | 100% |
| Fast-parseable | 5,982 | 92.9% |
| Not fast-parseable | 457 | 7.1% |
Non-fast-parseable breakdown (files can match multiple buckets):
| Pattern | Count | % of total |
|---|---|---|
| dynamic-requires (@early / @late on requires) | 352 | 5.5% |
| imports (load-bearing import statements) | 96 | 1.5% |
| missing-version (mostly rez's own test fixtures) | 85 | 1.3% |
| missing-name (mostly rez's own test fixtures) | 75 | 1.2% |
| top-level-classdef | 54 | 0.8% |
| unrecognised-raise | 2 | 0.0% |
Decisive finding: the dominant seeming failure pattern from a
naive survey, top-level-with (2,245 files, 34.9% of the corpus), is
100% rez's declarative with scope("config") DSL — every one of
the 2,245 files matched. That body only writes attributes of the
as-name (config object) and never touches solver fields, so it is
solver-irrelevant — the parser treats it the same way it treats
def commands(...). Including this single extension lifts the
accept rate from a marginal 58.6% to a green-light 92.9%.
Well past the 70% PROCEED threshold from the original RFC.
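The with scope("config") rule can be sketched with Python's standard ast module. This is an illustrative model only, not the survey tool's or the Rust parser's actual code; the helper names is_config_scope_with and _attr_root, and the setting name in the usage lines, are hypothetical.

```python
import ast


def _attr_root(target: ast.expr):
    """Return the root name of an attribute chain like a.b.c, else None."""
    depth = 0
    while isinstance(target, ast.Attribute):
        target = target.value
        depth += 1
    if depth and isinstance(target, ast.Name):
        return target.id
    return None


def is_config_scope_with(node: ast.stmt) -> bool:
    """True for `with scope("config") as <name>:` whose body only assigns
    to attributes of <name>, so no solver field can be touched."""
    if not isinstance(node, ast.With) or len(node.items) != 1:
        return False
    item = node.items[0]
    call, alias = item.context_expr, item.optional_vars
    is_scope_call = (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name) and call.func.id == "scope"
        and len(call.args) == 1 and not call.keywords
        and isinstance(call.args[0], ast.Constant)
        and call.args[0].value == "config"
    )
    if not is_scope_call or not isinstance(alias, ast.Name):
        return False
    # Every body statement must be a plain assignment to an attribute
    # of the as-name; anything else is treated as not recognised.
    return all(
        isinstance(stmt, ast.Assign)
        and all(_attr_root(t) == alias.id for t in stmt.targets)
        for stmt in node.body
    )


# Usage: the setting name below is made up for illustration.
ok = ast.parse('with scope("config") as config:\n    config.some_setting = "/x"\n').body[0]
bad = ast.parse('with scope("config") as config:\n    requires = ["a"]\n').body[0]
other = ast.parse('with open("f") as config:\n    config.x = 1\n').body[0]
```

A block that assigns a bare name such as requires inside the with body fails the check, which keeps the extension on the conservative side of the bail policy.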
Motivation
pyrer's solver is already roughly 34× faster than rez's on the
188-case differential benchmark. Real rez env invocations are no
longer bottlenecked on the solve — they are bottlenecked on what
surrounds it:
- Python interpreter startup (~200–300 ms per process).
- Package discovery: opening, reading, and AST-evaluating each package.py rez decides to inspect. This is the big remaining cost on cold-cache invocations.
- The solve itself (~tens of ms on pyrer).
- Environment construction (Rex evaluation, PATH munging, shell hooks).
Issue #86 added the load_family callback so the solver only asks
the host to load families it actually needs — that addresses the
"how many" axis. This RFC addresses the "how fast each one" axis:
loading a package.py currently means a full Python compile + exec,
which can run into the milliseconds per file on a warm cache and
adds up fast across a wide BFS or a CI batch.
A Rust parser that extracts the four solver-relevant fields without
invoking the Python interpreter has the potential to drop per-file
parse cost from milliseconds to tens of microseconds for the static
majority of package.py files. Combined with load_family, it
attacks both the count and the cost of the discovery phase.
What package.py actually is
A rez package.py is arbitrary Python. The solver only reads
four fields:
- name: string
- version: string
- requires: list of rez-requirement strings
- variants: list of lists of rez-requirement strings
But a real-world package.py can also carry, with varying
frequency:
- description, authors, tools, tests, help: irrelevant to the solve.
- commands(), pre_commands(), post_commands(): function bodies that affect runtime environment, not the solve.
- build_command, build_system: irrelevant to the solve.
- @early() / @late() decorated functions on requires / variants: these are dynamic and do affect the solve.
- Top-level if/else chains on env vars (if config.studio_mode: requires = […]): also dynamic relative to the solve.
- Top-level import statements with load-bearing side effects.
The fast parser only needs to handle the case where the four solver
fields are literal assignments. Everything else falls back to
rez's evaluator, which already exists and is correct.
Scope
In scope (fast path)
| Statement | Action |
|---|---|
| name = "..." (string literal) | Extract |
| version = "..." (string literal) | Extract |
| requires = ["str", "str", …] (list of string literals) | Extract |
| variants = [["str", …], …] (list of lists of string literals) | Extract |
| def commands(...), def pre_commands(...), def post_commands(...), def tools(...) (function body) | Ignore (not solver-relevant) |
| Top-level assignments to non-solver fields (description, authors, tools, tests, help, build_command, build_system, etc.) | Ignore |
| Top-level docstring | Ignore |
Out of scope (bail to rez)
| Pattern | Why we bail |
|---|---|
| def requires(...) / def variants(...) with @early / @late | Solver-relevant value is dynamic |
| Top-level if/else, try/except, for | Can't statically know which branch wins |
| import / from ... import | May have load-bearing side effects |
| Function calls assigning to a solver field (requires = make_requires(...)) | Not statically resolvable |
| Any other expression we don't recognise | Conservative bail |
The bias is hard toward bailing. A false positive (parsing a file
the fast path shouldn't have handled) produces a different requires
than rez, which means different resolves, which is a silent
correctness regression. The slow path through rez always exists; the
fast path is opt-in coverage. We accept low coverage with zero
divergence over high coverage with any divergence.
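The two scope tables and the bail policy can be modelled in a few lines of Python with the standard ast module. This is a reference sketch, not the Rust implementation: the function name parse_static, the dict return shape, and the choice to default requires and variants to empty lists are assumptions, and the with scope("config") extension is left out for brevity.

```python
import ast
from typing import Optional

SOLVER_FIELDS = {"name", "version", "requires", "variants"}


def _str_list(node):
    """Return the strings of a list-of-string-literals node, else None."""
    if isinstance(node, ast.List) and all(
        isinstance(e, ast.Constant) and isinstance(e.value, str)
        for e in node.elts
    ):
        return [e.value for e in node.elts]
    return None


def _literal(field, node):
    """Extract a solver field if it is a literal of the expected shape."""
    if field in ("name", "version"):
        ok = isinstance(node, ast.Constant) and isinstance(node.value, str)
        return node.value if ok else None
    if field == "requires":
        return _str_list(node)
    if isinstance(node, ast.List):  # variants: list of lists of strings
        rows = [_str_list(e) for e in node.elts]
        if all(r is not None for r in rows):
            return rows
    return None


def parse_static(source: str) -> Optional[dict]:
    try:
        module = ast.parse(source)
    except SyntaxError:
        return None
    out = {"requires": [], "variants": []}  # assumed defaults
    for stmt in module.body:
        if isinstance(stmt, ast.FunctionDef):
            if stmt.name in SOLVER_FIELDS:
                return None  # e.g. @early / @late def requires()
            continue  # commands() and friends are ignored
        if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
            continue  # docstring
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            field = stmt.targets[0].id
            if field in SOLVER_FIELDS:
                value = _literal(field, stmt.value)
                if value is None:
                    return None  # non-literal solver field
                out[field] = value
            continue
        return None  # import, if, try, for, class, ...: conservative bail
    if "name" not in out or "version" not in out:
        return None
    return out
```

The final return None inside the loop is the important line: any statement the sketch does not explicitly recognise sends the file to the slow path.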
Architecture
New crate: rer-package
crates/rer-package/
├── Cargo.toml
├── src/
│ ├── lib.rs
│ ├── parser.rs # AST walk + literal extraction
│ └── classify.rs # bail-or-extract decisions
└── tests/
├── static_fixtures/ # known-fast-parseable .py files
├── dynamic_fixtures/ # known-bail .py files
└── corpus/ # large real-world sample, diff against rez
Depends on rustpython-parser (well-maintained, parses to AST, no
runtime dependency on a Python install). Returns:
pub fn parse_static_package_py(source: &str) -> Option<PackageData>
None means the file is not statically parseable; the caller falls
back to rez. Some(data) means the four fields were all extracted
as literals; the caller can skip rez.Package evaluation entirely.
PyO3 binding
Single function on pyrer:
pyrer.parse_static_package_py(source: str | bytes) -> Optional[pyrer.PackageData]
About twenty lines of glue. The integration site is the
load_family callback in the rez shim:
def load_family(name):
    out = []
    for pkg_path in _find_package_files(name, paths=PACKAGE_PATHS):
        with open(pkg_path) as f:
            source = f.read()
        # Fast path: try the Rust parser first.
        pd = pyrer.parse_static_package_py(source)
        if pd is None:
            # Bail to rez's evaluator for the dynamic case.
            pkg = _rez_package_from_file(pkg_path)
            pd = pyrer.PackageData.from_rez(pkg)
        out.append(pd)
    return out
The shim composes the two — load_family decides which files to
read; parse_static_package_py decides how fast to read each one.
Build order
Stage 1 — Corpus survey (~2 days)
Build rer-stat-package-py: a tool that walks a directory tree of
package.py files and classifies each one into:
| Category | Meaning |
|---|---|
| fast-parseable | All four solver fields are literal assignments; no disqualifying top-level statements. |
| dynamic-requires | requires is @early/@late or assigned conditionally. |
| dynamic-variants | Same, for variants. |
| top-level-if | A top-level if/else we'd have to bail on. |
| imports | Has import statements. |
| other | Anything else that disqualifies the fast path. |
Reports counts, percentages, and example file paths per bucket.
Run this against Fortiche's actual studio repo. The output is the go/no-go signal for Stage 2:
- ≥ 70% fast-parseable: proceed. The fast path covers the typical case; the engineering ROI is real.
- 40–70%: marginal. Worth a discussion about which patterns to expand coverage to and whether the complexity is worth it.
- < 40%: don't build it. The slow-path fallback would dominate; the fast path saves work in the minority case. The memcache alternative below is the smarter bet.
Stage 1 is the cheapest experiment that produces the number this project needs.
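The report and the go/no-go rule can be sketched as a single summary function. The name summarise, its input shape (a path plus a set of bucket names, with an empty set meaning fast-parseable), and the verdict strings are assumptions for illustration, not the survey tool's actual interface.

```python
from collections import Counter, defaultdict


def summarise(results, max_examples=3):
    """results: iterable of (path, set_of_bucket_names).

    A file may land in several buckets; an empty set means fast-parseable.
    Returns (counts, examples, pct_fast, verdict).
    """
    counts = Counter()
    examples = defaultdict(list)
    total = 0
    for path, buckets in results:
        total += 1
        for bucket in (buckets or {"fast-parseable"}):
            counts[bucket] += 1
            if len(examples[bucket]) < max_examples:
                examples[bucket].append(path)
    pct_fast = 100.0 * counts["fast-parseable"] / total if total else 0.0
    # Thresholds as stated in the text: >= 70% proceed, 40-70% marginal.
    if pct_fast >= 70.0:
        verdict = "proceed"
    elif pct_fast >= 40.0:
        verdict = "marginal"
    else:
        verdict = "don't build"
    return counts, examples, pct_fast, verdict


# Usage with four made-up paths.
results = [
    ("a/package.py", set()),
    ("b/package.py", {"imports"}),
    ("c/package.py", set()),
    ("d/package.py", {"imports", "top-level-if"}),
]
counts, examples, pct_fast, verdict = summarise(results)
```

Because a file can match more than one disqualifying bucket, bucket counts can sum to more than the number of files; only the fast-parseable percentage feeds the verdict.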
Stage 2 — Parser + binding + differential test (~1–2 weeks)
- Implement rer_package::parse_static_package_py against rustpython-parser.
- Hand-curate ~30 fixture files (≥ 15 static, ≥ 15 dynamic across every disqualifying pattern). Unit-test both arms.
- PyO3 binding on pyrer.parse_static_package_py.
- Differential test harness: for every file in the corpus where the fast parser returns Some(data), also load the file through rez's Package and compare the four fields. Any mismatch is a release blocker, exactly like the 188-case rez solver differential. This is the safety net for the "bias toward bail" policy.
Stage 3 — Shim wiring + end-to-end benchmark (~3–5 days)
- Document the load_family integration pattern (above).
- End-to-end benchmark: a real rez env invocation against a representative repo, three configurations:
  - Eager BFS (today's shim baseline)
  - load_family only (issue #86 today)
  - load_family + parse_static_package_py (this RFC)

  Report wall-clock for cold and warm cache.
Honest forecast — superseded by Stage 3 measurement
The original RFC predicted 2–50× wins by replacing rez's "compile + exec" with Rust AST parsing. Stage 3's measurement against real Fortiche files showed something smaller and worth naming clearly.
Stage 3 result — Fortiche, May 2026
Two iterations, both run with scripts/bench_package_py_parser.py --corpus /thierry/rez/pkg --samples 100 --iters 30 --with-rez
against the live Fortiche-on-CIFS repo and rez 3.3.0.
V1: AST-based parser (rustpython-parser)
| Path | μs / file | Speedup |
|---|---|---|
| open + read + parse_static_package_py | 1,533 | - |
| DeveloperPackage.from_path + from_rez (rez) | 2,632 | - |
| Result | | 1.7× |
The parse step alone was 1,990 μs — rustpython-parser builds the
full module AST when we only need four top-level fields. It also
adds ~30 MB of crate deps to the build. Diagnosis: wrong tool.
V2: hand-rolled lexer
A 700-line module-level scanner that walks the four patterns
(name = "...", version = "...", requires = [...],
variants = [[...]]) directly, with bracket / string / comment /
indent tracking but no AST allocation. Same public API as V1.
| Path | μs / file | Speedup |
|---|---|---|
| open + read + parse_static_package_py | 75.24 | - |
| DeveloperPackage.from_path + from_rez (rez) | 2,615.54 | - |
| Result | | 34.8× |
In-memory breakdown:
| Path | μs / file |
|---|---|
| parse_static_package_py(source) (hand-rolled scan) | 59.23 |
| from_rez(fake_pkg) (attribute walk, lower bound) | 11.17 |
| from_rez(real rez Package) (post-load, just walk) | 13.50 |
| file read alone (open + read, warm cache) | 14.81 |
The parse step alone dropped from 1,990 μs (V1) to 59 μs (V2), ~33× on that layer, lifting the full-load comparison vs rez from 1.7× to 34.8×. Per-file savings: ~2.54 ms. Over a 50-family resolve, that's ~127 ms saved per resolve: real, artist-perceptible latency.
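The headline figures follow directly from the two full-load timings in the V2 table. The short calculation below reproduces them; the one assumption not stated in the text is that a 50-family resolve loads exactly one package.py per family.

```python
# Full-load timings from the V2 table, in microseconds per file.
rez_us = 2615.54   # DeveloperPackage.from_path + from_rez
v2_us = 75.24      # open + read + parse_static_package_py

speedup = rez_us / v2_us                  # ratio of the two full-load paths
saving_ms = (rez_us - v2_us) / 1000.0     # absolute saving per file

# Assumption: one package.py load per family in a 50-family resolve.
families = 50
per_resolve_ms = saving_ms * families

print(f"speedup {speedup:.1f}x, {saving_ms:.2f} ms/file, {per_resolve_ms:.0f} ms/resolve")
```

A resolve that loads several versions per family would scale the per-resolve saving accordingly.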
The V2 rewrite held 92.9% acceptance on the Fortiche corpus (5,979 / 6,439 vs V1's 5,985; 6 files of drift, well within rounding). Two non-obvious bug categories surfaced during the rewrite, both Windows-specific:
- CRLF line endings: Samba-served package.py files end lines with \r\n. The scanner needed \r treated transparently in inline whitespace.
- \<CRLF> line continuations on non-solver assignments (e.g. changelog = \ followed by a CRLF then a multi-line triple-quoted string). The continuation handler only recognised \<LF>.
Both have dedicated unit tests in the V2 implementation.
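Both fixes amount to widening what the scanner counts as skippable inline whitespace. A minimal Python sketch of that idea follows; the function name skip_inline_ws is hypothetical and the real scanner is Rust, so treat this as a model of the behaviour rather than the shipped code.

```python
def skip_inline_ws(src: str, i: int) -> int:
    """Return the index of the first character at or after `i` that is
    not inline whitespace.

    Inline whitespace here means space, tab, a carriage return, or a
    backslash line continuation ending in either LF or CRLF.
    """
    n = len(src)
    while i < n:
        ch = src[i]
        if ch in " \t\r":
            i += 1                      # plain whitespace, including CR
        elif ch == "\\" and src.startswith("\n", i + 1):
            i += 2                      # backslash + LF
        elif ch == "\\" and src.startswith("\r\n", i + 1):
            i += 3                      # backslash + CRLF
        else:
            break
    return i


# Usage: a CRLF continuation before a triple-quoted string.
crlf_src = 'changelog = \\\r\n    """x"""'
lf_src = "x = \\\n  1"
crlf_end = skip_inline_ws(crlf_src, crlf_src.index("=") + 1)
lf_end = skip_inline_ws(lf_src, lf_src.index("=") + 1)
```

In both usage lines the returned index points at the first character of the value, regardless of which line ending the file uses.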
What the bench numbers say about the next ceiling
V2's 75 μs/file full-load splits as:
- ~15 μs file I/O (warm-cache CIFS)
- ~60 μs parsing CPU
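On a warm cache parsing CPU is still the larger share, but at ~60 μs it is small in absolute terms, and on a cold CIFS read I/O dominates outright. Further parser optimisations have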
diminishing returns. The next 10× lives in either avoiding more I/O
(load_family from #86 already does most of this — files the solver
never asks for never get loaded) or a parsed-package cache layered
on top of Fortiche's existing shared memcache.
Caveats on the 34.8× headline
- Cold-cache CIFS: the 14.81 μs file-read is a warm-cache number. On a truly cold network read, file I/O can be 1–100 ms, swamping the 60 μs parse cost. Because rez pays the same I/O, the absolute saving per file stays near 2.5 ms, but the speedup ratio shrinks toward 1× as the network round trip grows.
- Sample bias — 100 files sampled deterministically from the corpus. The full 6,439-file corpus could behave differently in pathological cases (very large files, unusual structure). The unit tests + the 6-file drift between V1 and V2 acceptance count is the existing safety net.
- No production A/B yet: the 127 ms / resolve number is from micro-bench arithmetic, not a real rez env measurement. Production wall time may show less if other phases dominate.
Risks and mitigations
| Risk | Mitigation |
|---|---|
| Silent correctness regression — the fast parser accepts a file it shouldn't | Bias hard toward bailing in classify.rs. Run the differential test on every file the fast parser claims. Treat any mismatch as a release blocker. |
| Maintenance burden: rustpython-parser is a Python AST library; track upstream Python syntax changes | Pin to a known-good version. Studio package.py files don't typically use bleeding-edge syntax. |
| Coverage drift — over time studios add patterns the parser doesn't handle | The fast path is opt-in. Coverage drift means the slow path runs more often, not that correctness breaks. We can extend coverage when patterns become common. |
| Stage 1 says "don't build it" | We've spent two days on a survey and now know the workload. Pivot to the memcache route below or accept the status quo. Cheap pivot point. |
Stage 2 safety net — differential against rez
The bias-toward-bailing policy is only safe if "V2 accepts a file"
also means "V2 produces the same (name, version, requires, variants)
that rez does". A divergence here is a silent correctness regression
in any rez integration shim that uses the fast path.
scripts/diff_against_rez.py is the test harness: for every file V2
accepts, it also loads via rez's DeveloperPackage.from_path and
stringifies the four fields with str(req), then compares
byte-for-byte. Any divergence is a release blocker, exactly like
the 188-case rez solver differential.
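The comparison step can be sketched as below. This is not the code in scripts/diff_against_rez.py; the function names and the assumption that both objects expose name, version, requires and variants attributes are illustrative, with SimpleNamespace standing in for real package objects.

```python
from types import SimpleNamespace

FIELDS = ("name", "version", "requires", "variants")


def normalise(pkg):
    """Stringify the four solver fields so both sides compare as text."""
    return (
        str(pkg.name),
        str(pkg.version),
        [str(r) for r in (pkg.requires or [])],
        [[str(r) for r in variant] for variant in (pkg.variants or [])],
    )


def diff_fields(fast_pkg, rez_pkg):
    """Return the names of fields that differ; an empty list is a match."""
    pairs = zip(FIELDS, normalise(fast_pkg), normalise(rez_pkg))
    return [field for field, a, b in pairs if a != b]


# Usage with stand-in objects (hypothetical data).
fast = SimpleNamespace(name="foo", version="1.0",
                       requires=["bar-2+"], variants=[["python-3.9"]])
same = SimpleNamespace(name="foo", version="1.0",
                       requires=["bar-2+"], variants=[["python-3.9"]])
drifted = SimpleNamespace(name="foo", version="1.0",
                          requires=["bar-3+"], variants=None)
```

Returning the list of differing field names, rather than a bare boolean, makes a mismatch report point straight at the field that needs inspecting.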
Result on /thierry/rez/pkg + rez 3.3.0
| | Count | % of V2-accepted |
|---|---|---|
| Total files surveyed | 6,439 | — |
| V2 accepted | 5,979 | — |
| V2 bailed (slow path) | 460 | — |
| Match (all four fields agree with rez) | 5,813 | 97.22% |
| Mismatch (correctness regression) | 0 | 0.00% |
| rez evaluation error | 166 | 2.78% |
Wall-clock for the full run: 74 seconds (CIFS warm). Zero mismatches over the entire Fortiche corpus — the safety net is green.
What about the 166 rez-eval-error files?
These are files V2 accepts but rez 3.3.0 in this dev venv can't load. Sampled five of them — all five are the same error:
InvalidPackageError: Package … uses @include decorator, but no include path has been configured with the 'package_definition_python_path' setting.
That's a rez environment-config issue, not a content issue.
Production rez at Fortiche has package_definition_python_path
set; rez would load these files fine. V2 correctly accepts them
because @include def some_func(): is a non-solver decorator (the
function being decorated isn't requires / variants), and the
four solver fields are static.
So the "166 rez-eval-error" bucket is really "files this dev venv can't evaluate because of a missing config knob" — not a correctness signal. In production they'd match.
If the differential test ever needs to be tightened, the path is to
also configure package_definition_python_path in the dev venv —
but that's CI infrastructure, not a parser change.
Stage 4 — Batched parallel parse (issue #94)
After Stages 1–3 landed, cProfile of a real Fortiche resolve
showed the static parser itself was no longer in the top of the
flamegraph. The cost had moved one layer up: the shim's serial
Python loop of open() calls feeding the parser. On a 132-package
resolve that was 3.20 s of pure I/O (35% of total wall time), one
file at a time while seven cores idled.
parse_static_packages_py(paths) is the response: open + parse
every path in one Rust call across a Rayon thread pool, with the
GIL released for the whole batch. Same per-file semantics as
parse_static_package_py — accept rate, output shape, differential
correctness all carry over.
Result on Fortiche
scripts/bench_batched_parser.py against /thierry/rez/pkg over
CIFS, best-of-3:
| Sample | Serial open + parse | Batched | Speedup |
|---|---|---|---|
| 500 files (warm cache) | 56.71 ms | 40.76 ms | 1.39× |
| 2,000 files | 4,234 ms | 1,508 ms | 2.81× |
Per-file saving on the 2,000-file run: ~1.36 ms. Extrapolated
to the issue's target workload (132-package resolve, ~2,600
package.py files): ~3.5 s saved per resolve.
Both paths produce identical accepts (1,864/2,000 → static-parseable fraction matches the per-file parser) — zero correctness drift.
The 500-file bench is bottlenecked on warm-page-cache parsing CPU; the Rayon dispatch overhead amortises less on small batches. On cold-cache or larger batches the parallel-I/O overlap shows through. The 2.81× is a lower bound on warm hardware; on cold CIFS (Windows production) it should grow.
Design choices
- Output is positionally aligned with input. Missing files, unreadable bytes, and parser bails all become None at the matching index. The shim's zip(pkgs, result) is then trivially correct.
- No exception escapes. Per-file failures map to None. The function only raises if the input type is wrong.
- Pool size = Rayon default (RAYON_NUM_THREADS env var or logical core count). No per-call knob initially; capacity control is environmental.
- Pure addition. The single-file API stays. Shims feature-detect with hasattr(pyrer, "parse_static_packages_py") and fall back to the per-file loop on older pyrer.
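The positional-alignment contract and the feature-detect fallback can be modelled in pure Python. In this sketch parse_many and parse_paths are hypothetical shim helpers, the batched call is assumed to return a list aligned with its input, and a thread pool stands in for Rayon; Python threads illustrate the contract only, not the Rust call's performance.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_many(paths, parse_one, max_workers=None):
    """Per-file fallback with the same contract as the batched call:
    result[i] belongs to paths[i], and any per-file failure becomes None."""
    def load(path):
        try:
            with open(path, "rb") as f:
                return parse_one(f.read().decode("utf-8"))
        except Exception:
            return None  # missing file, undecodable bytes, parser error
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(load, paths))


def parse_paths(pyrer_mod, paths):
    """Use the batched API when this pyrer build has it, else fall back."""
    if hasattr(pyrer_mod, "parse_static_packages_py"):
        return pyrer_mod.parse_static_packages_py(paths)
    return parse_many(paths, pyrer_mod.parse_static_package_py)
```

Passing the module in as an argument keeps the helper testable with a stand-in object; a real shim would more likely reference the imported pyrer module directly.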
Safety net
Reused from Stage 2. The same from_rez(pkg) comparison can be
shadow-checked at production runtime, gated on
REZ_PYRER_VALIDATE_BATCHED. The integration page in the
rez integration docs
has the recipe.
The offline Stage 2 differential — 5,813 / 5,813 matched on the
Fortiche corpus — covers the per-file semantics. Stage 4's batched
call uses the exact same parse_static_package_py per file, so the
existing safety net carries over byte-for-byte; the only Stage 4
specific risk is around ordering / completion which the
positional-alignment contract handles explicitly.
What's next after this
Stage 4 takes us to:
- ~93 % of package.py files served by the Rust fast path
- ~75 µs per file via the static parser (Stage 3)
- ~1/3 the wall-time on the open+parse phase via the batched call (Stage 4)
The remaining cost on _load_family is now real I/O (CIFS round-trips
for files Rayon's pool can't overlap further) plus the
dynamic-7 % rez evaluator path. Both are architectural —
addressing them needs a layer outside this RFC (memcache caching
of parsed PackageData across invocations is the obvious next
move, as called out in the "Considered alternatives" section
below).
Considered alternatives
Parsed-package cache on top of the shared memcache
Instead of parsing package.py fast, parse once via rez and cache
the four-field result in Fortiche's existing shared memcache, keyed
by (repo_id, family, version, mtime). Subsequent reads across the
studio are sub-millisecond regardless of whether the file is
static or dynamic.
Tradeoffs:
- Pro: no novel parser, no AST-classification edge cases, reuses infrastructure already in place. Lower risk, faster to ship.
- Pro: works for dynamic packages too (cache stores the evaluated fields).
- Con: first hit anywhere in the studio still pays full rez evaluation. The Rust parser is "always fast"; the cache is "fast after first studio-wide hit".
- Con: requires invalidation on package.py change (mtime check is cheap but adds complexity).
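The keying scheme can be sketched with a plain dict standing in for the shared memcache. The function name cached_load and the load_via_rez callback are hypothetical, and a real memcache client would need the tuple key serialised to a string; the point of the sketch is only that folding mtime into the key makes invalidation implicit.

```python
import os


def cached_load(cache, repo_id, family, version, path, load_via_rez):
    """Return the four-field result for `path`, evaluating via rez only on
    a cache miss. `cache` is any dict-like store (a stand-in for memcache)."""
    mtime = os.stat(path).st_mtime_ns
    key = (repo_id, family, version, mtime)
    hit = cache.get(key)
    if hit is not None:
        return hit
    fields = load_via_rez(path)   # slow path: full rez evaluation
    cache[key] = fields
    return fields
```

An edited package.py gets a new mtime and therefore a new key, so stale entries are never read; they are simply left for the cache's own eviction policy to reclaim. The stat call is still one filesystem round trip per lookup, which is the added cost the last tradeoff refers to.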
The user's preference is to explore the Rust parser. The cache is worth re-evaluating after Stage 1 — if the static-parseable fraction is lower than expected, the cache pays off more reliably with less risk.
Concrete next step
Open a draft PR with rer-stat-package-py only: the corpus
classifier walker. No parser yet. The deliverable is a small CLI
that takes a path and prints a histogram of categories. Two days
of work; the output decides whether Stages 2–3 happen.
A reasonable acceptance criterion for the survey tool itself:
- Recognises every pattern listed in the Scope section.
- Reports per-category counts and percentages.
- Outputs example file paths for each bucket (for hand-inspection of edge cases).
- Falls back gracefully on files it can't even parse as Python (rare but possible: broken package.py files exist in the wild).
See also
- Wiring pyrer into rez: the load_family callback this parser composes with.
- Issue #86: the lazy-discovery hook that motivated this RFC.
- rustpython-parser on crates.io: the AST library this RFC depends on.