Minimax Search Algorithm
At its core, Gillespie uses the minimax algorithm to search the game tree. The algorithm explores possible future positions by alternating between maximizing White's advantage and minimizing Black's advantage, assuming both players make optimal moves.
Core Concept
MAX vs MIN: At each level, the algorithm alternates between maximizing (White's turn) and minimizing (Black's turn) the evaluation score.
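As a minimal sketch (not Gillespie's actual implementation), this alternation can be written as a recursive function over a toy game tree, where a leaf is a centipawn score and an interior node is a list of children:

```python
def minimax(node, maximizing):
    # Leaf: return its static evaluation (centipawns, positive = good for White).
    if isinstance(node, int):
        return node
    # Interior node: White picks the best child, Black the worst (for White).
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# White to move at the root; Black replies at the next level, and so on.
tree = [[3, 5], [2, [9, 1]]]
best = minimax(tree, maximizing=True)   # max(min(3, 5), min(2, max(9, 1))) = 3
```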
Search Tree Visualization
Alpha-Beta Pruning
Alpha-beta pruning is an optimization that dramatically reduces the number of nodes searched. It eliminates branches that cannot possibly affect the final decision, achieving the same result as minimax but orders of magnitude faster.
How It Works
Alpha (α): the best value the maximizer can guarantee so far. Initialized to -∞; it only increases during the search.
Beta (β): the best value the minimizer can guarantee so far. Initialized to +∞; it only decreases during the search.
Cutoff: when β ≤ α, the remaining moves at this node cannot affect the final decision and are pruned.
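Putting α, β, and the cutoff together, a minimal alpha-beta over the same kind of toy tree (leaves are scores, interior nodes are lists of children) might look like:

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)   # raise the maximizer's guarantee
            if beta <= alpha:          # cutoff: the minimizer already has a better option elsewhere
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)         # lower the minimizer's guarantee
        if beta <= alpha:
            break
    return best

tree = [[3, 5], [2, [9, 1]]]
score = alphabeta(tree, -math.inf, math.inf, maximizing=True)   # 3, same as plain minimax
```

On this tree the subtree `[9, 1]` is never visited: once the second branch's first reply scores 2, no reply there can beat the 3 already guaranteed on the first branch.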
Efficiency Gains
With perfect move ordering, alpha-beta examines roughly b^(d/2) leaf nodes instead of minimax's b^d, where b = branching factor (~35 for chess) and d = search depth.
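A back-of-the-envelope comparison with those numbers (the b^(d/2) best case assumes perfect move ordering):

```python
b, d = 35, 6                       # branching factor, search depth
minimax_leaves = b ** d            # ~1.84 billion leaf nodes
alphabeta_leaves = b ** (d // 2)   # 42,875 in the perfectly ordered best case
speedup = minimax_leaves // alphabeta_leaves
```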
Move Ordering Impact
The effectiveness of alpha-beta pruning depends heavily on move ordering, so Gillespie employs several techniques to ensure the most promising moves are searched first.
Advanced Search Techniques
Quiescence Search
Extends the search beyond the nominal depth in tactical positions (captures, checks, promotions) to avoid the horizon effect: unstable positions are never evaluated statically.
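A structural sketch of stand-pat quiescence, in negamax convention (the position format here is hypothetical: a dict holding a static eval from the side to move, plus the positions reachable by captures):

```python
def quiescence(pos, alpha, beta):
    stand_pat = pos["eval"]            # static eval, from the side to move
    if stand_pat >= beta:              # already good enough: fail-hard cutoff
        return beta
    alpha = max(alpha, stand_pat)      # "stand pat": we may simply decline to capture
    for child in pos["captures"]:      # only tactical moves extend the search
        score = -quiescence(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# A hanging pawn: after White captures, Black's static eval is -100 from Black's view,
# so the quiet "eval 0" at the root was hiding a tactic worth +100 for White.
root = {"eval": 0, "captures": [{"eval": -100, "captures": []}]}
```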
Transposition Table
Hash table storing previously evaluated positions. Different move orders can reach identical positions; the table prevents redundant evaluation.
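A minimal depth-preferred table sketch (in practice the key would be a 64-bit Zobrist hash; here any hashable key works):

```python
tt = {}  # position key -> (search_depth, score)

def probe(key, depth):
    hit = tt.get(key)
    # A stored score is only reusable if it came from a search at least as deep.
    if hit is not None and hit[0] >= depth:
        return hit[1]
    return None

def store(key, depth, score):
    prev = tt.get(key)
    if prev is None or depth >= prev[0]:   # depth-preferred replacement policy
        tt[key] = (depth, score)
```

Real engine entries also record a bound type (exact / lower / upper) and the best move, which doubles as a move-ordering hint; this sketch keeps only the score.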
Null Move Pruning
If passing the turn (illegal in chess) still yields a good position, the current position is likely strong. This allows aggressive pruning in non-zugzwang positions.
Late Move Reductions
Later moves in well-ordered search are less likely to be best. Search them at reduced depth initially, re-search if they exceed expectations.
Search Statistics
MoE-NNUE Architecture
Traditional chess engines employ a single neural network for position evaluation. Gillespie pioneers a Mixture of Experts (MoE) approach, utilizing multiple specialized NNUE models that dynamically adapt to different game phases and position types.
NNUE (Efficiently Updatable Neural Network): A neural network architecture designed for chess evaluation that can be incrementally updated as moves are made, enabling real-time position assessment at millions of nodes per second.
System Overview
Position Analysis
Extract features: material balance, king safety, pawn structure, piece mobility
Expert Selection
Gating network routes position to specialized expert models based on game phase
Evaluation Output
Weighted combination of expert outputs produces final centipawn evaluation
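The blend can be sketched as a weighted sum; `experts` and `gate` below are hypothetical callables standing in for the NNUE models and the gating network:

```python
def moe_evaluate(features, experts, gate):
    weights = gate(features)                  # one probability per expert, sums to 1.0
    evals = [expert(features) for expert in experts]
    # Final centipawn score: convex combination of the expert opinions.
    return sum(w * e for w, e in zip(weights, evals))

# Toy stand-ins: one expert says +100 cp, another says -40 cp.
experts = [lambda f: 100.0, lambda f: -40.0]
gate = lambda f: [0.75, 0.25]
score = moe_evaluate(None, experts, gate)     # 0.75*100 + 0.25*(-40) = 65.0
```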
Architecture Flow
Expert Model Specialization
Each expert model is trained on specific game phases and position types, developing deep understanding of contextual evaluation:
Opening Expert
Specializes in moves 1-15. Focus: development, center control, king safety preparation.
Middlegame Expert
Specializes in moves 16-40. Focus: tactical combinations, piece coordination, attack patterns.
Endgame Expert
Specializes in moves 41+. Focus: king activity, pawn promotion, zugzwang positions.
Tactical Expert
Activates on sharp positions. Focus: sacrifices, forcing sequences, mate threats.
Positional Expert
Handles quiet, closed positions. Focus: space advantage, piece placement, long-term planning.
Gating Network Mechanism
The gating network is a lightweight neural network that analyzes position features and determines which expert models to activate. It outputs a probability distribution over all experts.
Input Features (64-dim vector)
SOFTMAX OUTPUT: The gating network applies softmax activation to produce weights that sum to 1.0, enabling smooth blending between experts during transitional positions.
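A sketch of that softmax step, combined with the top-K sparse activation (K=2) described later under Implementation & Optimization; the expert logits here are made-up numbers, not real gating outputs:

```python
import math

def gate_weights(logits, k=2):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the top-k experts and renormalize so weights still sum to 1.0.
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}

# Five experts; the first two logits dominate, so only they are evaluated.
weights = gate_weights([2.0, 1.0, 0.0, 0.0, 0.0], k=2)
```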
Example Gate Activations
Position Context
Move 8, Italian Game opening, material equal, low tactical complexity
Network Architecture Details
Gating Network
Expert NNUE Models
Complete Forward Pass
Performance Metrics
MoE-NNUE vs Single-Model NNUE
Computational Overhead
EFFICIENCY TRADE-OFF: The 27% computational overhead yields 8-11% accuracy improvements across all game phases, resulting in +180 Elo gain against baseline single-model engine.
Training Methodology
Two-Stage Training Pipeline
Expert Pre-Training
- Each expert trained independently on phase-specific positions
- Self-play games filtered by move number and complexity
- Loss function: Mean Squared Error on centipawn values
- Training time: ~120 hours per expert on 8× A100 GPUs
Gating Network Training
- Expert weights frozen, only gating network trainable
- Reinforcement learning via self-play tournaments
- Reward signal: game outcome + evaluation accuracy
- Training time: ~80 hours on a single machine
Training Data Statistics
Bullet Trainer: High-Velocity Training System
The Bullet Trainer is Gillespie's specialized training infrastructure designed for rapid iteration and high-throughput position generation. It generates training data through ultra-fast self-play at drastically reduced time controls.
System Architecture
Generation Rate
Training Loop
Initial Bootstrap
EPOCH 0: Train on human GM games (Lichess Elite Database) to establish baseline chess knowledge.
Self-Play Generation
LOOP: The current model plays against itself; exploration noise is added to move selection for diversity.
Model Update
TRAIN: Train on the new positions, backpropagating game outcomes; weights are updated via the Adam optimizer.
Validation
TEST: Play 1,000 games vs. the previous checkpoint; keep the new model only if the Elo gain exceeds +10.
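The four steps above can be sketched as one loop; every callable here is a hypothetical stand-in for the corresponding pipeline stage:

```python
def training_loop(model, generate_games, train, elo_gain, epochs):
    # `model` is assumed to already be bootstrapped on human GM games (epoch 0).
    best = model
    for _ in range(epochs):
        positions = generate_games(best)      # self-play with exploration noise
        candidate = train(best, positions)    # e.g. gradient steps with Adam
        if elo_gain(candidate, best) > 10:    # validation match vs. checkpoint
            best = candidate                  # promote only on a > +10 Elo gain
    return best
```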
Position Quality Metrics
Complexity Score
Weighted sum of: piece count, mobility, tactical opportunities, pawn structure variety
Evaluation Variance
Standard deviation of evaluations across multiple search depths; high variance indicates an uncertain position.
Uniqueness Hash
Zobrist hash with piece-square feature encoding. Prevents near-duplicate positions within a training batch.
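A sketch of the Zobrist scheme (12 piece types × 64 squares; the seed and encoding details here are illustrative, not Gillespie's actual tables):

```python
import random

rng = random.Random(2024)                    # fixed seed: reproducible key table
ZOBRIST = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(pieces):
    """pieces: iterable of (piece_type_index, square_index) pairs."""
    h = 0
    for piece, square in pieces:
        h ^= ZOBRIST[piece][square]          # XOR: order-independent, self-inverse
    return h
```

Because XOR is self-inverse, moving a piece updates the hash incrementally: `h ^= ZOBRIST[p][from_sq] ^ ZOBRIST[p][to_sq]`, with no full recomputation.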
Data Augmentation
Hardware Infrastructure
Implementation & Optimization
CPU Optimizations
SIMD Vectorization
AVX2/AVX-512 instructions for parallel expert evaluation. Process 8 positions simultaneously with 256-bit registers.
Incremental Updates
NNUE accumulator caching reduces recomputation. Only affected neurons updated when pieces move.
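The idea in miniature: the accumulator holds the first layer's pre-activation sums, and a move only touches the columns of the input features it toggles (a pure-Python stand-in, not the real SIMD code):

```python
def apply_move(accumulator, weights, removed, added):
    # accumulator[j] == sum of weights[f][j] over all active input features f.
    # A move toggles a handful of features; update only their contributions.
    for f in removed:
        for j in range(len(accumulator)):
            accumulator[j] -= weights[f][j]
    for f in added:
        for j in range(len(accumulator)):
            accumulator[j] += weights[f][j]
    return accumulator

# 3 input features, 2 first-layer neurons; features {0, 1} start active.
weights = [[1, 2], [3, 4], [5, 6]]
acc = [weights[0][j] + weights[1][j] for j in range(2)]   # [4, 6]
apply_move(acc, weights, removed=[1], added=[2])          # feature 1 off, 2 on
```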
Sparse Activation
Top-K expert selection (K=2). Only 40% of experts evaluated per position, reducing inference cost.
Memory Layout
Quantization
INT8 quantization reduces model size by 75% with <2% accuracy loss.
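A minimal symmetric per-tensor sketch of the idea (real NNUE quantization uses per-layer scales and saturating integer arithmetic; this only shows the basic size/precision trade):

```python
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0   # map the largest weight to ±127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale                                # int8 values plus one float scale

def dequantize(q, scale):
    return [v * scale for v in q]

# float32 -> int8 is 4x fewer bytes per weight, i.e. the 75% size saving above.
q, s = quantize_int8([0.5, -1.27, 0.02])
```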