Evolution Finds Its Limits: What ALPS Learned About Search and Fitting
On measuring the contribution of evolutionary mechanisms to compression-driven program synthesis, and finding that most of them contribute nothing.
ALPS evolves Lisp-like programs to predict binary sequences using compression as fitness. Across 88 experiments, ablations showed that crossover, tournament selection, and a module system each had zero measurable impact. The engine's effectiveness comes from two sources: Lamarckian MLE parameter refit and operator injection. Everything else was removed.
ALPS (Adaptive Lamarckian Program Search) is a program synthesis engine that evolves Lisp-like abstract syntax trees to predict binary sequences. Fitness is measured in bits saved: how much better a program compresses a stream than a memoryless baseline. The system was built to answer a specific question: which mechanisms in evolutionary program synthesis are actually doing the work?
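The bits-saved fitness can be sketched as follows. This is a minimal illustration, not ALPS's implementation: `bits_saved` and `predict` are hypothetical names, and the predictor is assumed to emit P(next bit = 1) given the prefix seen so far.

```python
import math

def bits_saved(stream, predict):
    """Fitness sketch: bits saved per symbol by `predict` over a
    memoryless baseline that codes every bit at the stream's
    empirical marginal rate."""
    # Baseline code length: entropy coding with the marginal P(1).
    p1 = sum(stream) / len(stream)
    p1 = min(max(p1, 1e-9), 1 - 1e-9)              # guard log(0)
    baseline = -sum(math.log2(p1 if b else 1 - p1) for b in stream)

    # Model code length: negative log-likelihood of the predictions.
    model = 0.0
    for i, b in enumerate(stream):
        p = min(max(predict(stream[:i]), 1e-9), 1 - 1e-9)
        model += -math.log2(p if b else 1 - p)

    return (baseline - model) / len(stream)        # bits saved per symbol
```

On an alternating stream, a predictor that flips the previous bit with 95% confidence saves roughly 0.9 bits per symbol, while a predictor stuck at P(1) = 0.5 saves exactly zero.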
The engine included the standard toolkit. Seven mutation operators for topology perturbation. Crossover recombination for combining subtrees from two parents. Tournament selection for biasing reproduction toward higher-fitness individuals. A module system for mining common subexpressions and inserting them into new programs. And a Lamarckian step: after each mutation, parametric components of the program are fit analytically by maximum likelihood estimation against the training stream.
Across nine environments spanning periodic patterns, Markov chains, arithmetic sequences, and noisy variants, the system passed fitness targets in 8 of 9 cases. On second-order Markov chains, it achieved F = 0.331 bits against a theoretical limit of 0.289. Robustness was strong: coefficient of variation below 0.07 across 5-seed checks. The engine works.
The ablation study measured each mechanism individually against a controlled baseline. Removing crossover: delta below 0.002. Removing tournament selection and replacing it with random parent choice: delta below 0.002. Removing the module system, roughly 600 lines of code including subtree mining, library management, and insertion mutation: delta below 0.002. Removing the context forcing phase: delta below 0.002. No mechanism reached statistical significance.
The Lamarckian MLE refit step accounts for the entire fitness signal on parameter-fitting tasks. Given any program topology that exposes a markov_table node to the stream, the MLE step finds globally optimal parameters in closed form. Evolution's contribution on these tasks reduces to placing a parametric node in the program tree and wiring context variables to it. Once that topology exists, the analytical step handles everything else.
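The closed-form refit for a markov_table node can be illustrated like this. It is a sketch under the assumption that the node holds one Bernoulli parameter per context; the function name and the choice of Laplace smoothing are illustrative, not taken from ALPS.

```python
from collections import defaultdict

def refit_markov_table(stream, order):
    """Closed-form MLE for an order-k Markov table over a binary stream.

    Each context (the previous `order` bits) gets the parameter
    P(next = 1 | context) = (ones + 1) / (count + 2): the Laplace-smoothed
    conditional frequency, which maximizes the per-context Bernoulli
    likelihood with no search at all.
    """
    counts = defaultdict(lambda: [0, 0])           # context -> [zeros, ones]
    for i in range(order, len(stream)):
        ctx = tuple(stream[i - order:i])
        counts[ctx][stream[i]] += 1
    return {ctx: (c[1] + 1) / (c[0] + c[1] + 2) for ctx, c in counts.items()}
```

This is why evolution has so little left to do on parameter-fitting tasks: one pass of counting recovers the optimal table, so the only thing mutation must achieve is wiring the right context bits into the node.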
Structural tasks tell a different story. When the correct prediction rule requires discovering a boolean operator rather than fitting continuous parameters, Lamarckian refit cannot help. Three additional environments tested this: streams governed by XOR, parity, and AND relationships between recent bits. On these tasks, the system discovers the correct operators through operator injection, a guided mutation that introduces structural expressions like (xor prev prev2) or (parity3 ctx) directly into the program tree.
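Operator injection can be sketched as a mutation that grafts a structural expression over a random subtree. The sketch below is an assumption-laden illustration: programs are modeled as nested tuples, and the expression pool and function names are hypothetical.

```python
import random

# Hypothetical pool of structural expressions the mutation may inject.
STRUCTURAL_POOL = [
    ("xor", "prev", "prev2"),
    ("parity3", "ctx"),
    ("and", "prev", "prev2"),
]

def subtrees(tree, path=()):
    """Yield (path, node) for every node in a nested-tuple program."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):   # skip the operator symbol
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def inject_operator(tree, rng=random):
    """Guided mutation: overwrite a random subtree with a structural expression."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, rng.choice(STRUCTURAL_POOL))
```

The point of the guidance is reach: a blind point mutation must assemble (xor prev prev2) node by node, while injection drops the whole expression into the tree in one step, after which the MLE refit can evaluate it immediately.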
On XOR prediction, evolved programs express the relationship as (xor prev prev2) or equivalently (!= prev prev2). On parity, the system discovers (parity3 ctx). Without operator injection, fitness on XOR drops by 0.10 bits: the difference between discovering the correct structure and remaining in a parameter-fitting local optimum. Operator injection is the one evolutionary mechanism with a measurable contribution beyond what analytical fitting provides.
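Why no amount of parameter fitting closes that gap is easy to check directly. A small self-contained demo, assuming a stream generated by b[i] = b[i-1] XOR b[i-2]:

```python
import random

rng = random.Random(42)
# Stream governed by an XOR relationship between the two most recent bits.
bits = [rng.randint(0, 1), rng.randint(0, 1)]
for _ in range(198):
    bits.append(bits[-1] ^ bits[-2])

# The structural rule (xor prev prev2) predicts every subsequent bit exactly.
# A memoryless model sees only the marginal bit frequency and cannot express this.
hits = sum((bits[i - 1] ^ bits[i - 2]) == bits[i] for i in range(2, len(bits)))
print(hits, "/", len(bits) - 2)   # 198 / 198
```

The rule is deterministic given the structure, so once injection places the xor node, the remaining residual is zero; no continuous parameter adjustment can substitute for discovering the operator.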
After the ablations, we stripped the engine to what the data supported. Crossover, tournament selection, and the module system were removed entirely: approximately 800 lines of code. The stripped engine was rerun on all structural benchmarks. Performance matched or exceeded the original on every task, with lower variance on XOR (std 0.072 vs 0.264).
The result is a specific finding about the division of labor in program synthesis. For tasks where the correct model has learnable parameters, Lamarckian refit dominates and evolutionary mechanisms are redundant. For tasks requiring structural discovery, operator injection, a form of guided topology search, is the mechanism that matters. The role of evolution in ALPS is narrower than the architecture assumed: it is a topology search, and the rest is fitting.