Minimax Search Algorithm
At its core, Gillespie uses the minimax algorithm to search the game tree. The algorithm explores possible future positions by alternating between maximizing White's advantage and minimizing Black's advantage, assuming both players make optimal moves.
Core Concept
MAX vs MIN: At each level, the algorithm alternates between maximizing (White's turn) and minimizing (Black's turn) the evaluation score.
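As a minimal sketch (not Gillespie's actual implementation), this alternation can be written as a recursive function over a toy game tree, where a leaf is a centipawn score and an interior node is a list of children:

```python
def minimax(node, maximizing):
    # Leaf: return its static evaluation (centipawns, positive = good for White).
    if isinstance(node, int):
        return node
    # Interior node: White picks the best child, Black the worst (for White).
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# White to move at the root; Black replies at the next level, and so on.
tree = [[3, 5], [2, [9, 1]]]
best = minimax(tree, maximizing=True)   # max(min(3, 5), min(2, max(9, 1))) = 3
```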
Search Tree Visualization
Alpha-Beta Pruning
Alpha-beta pruning is an optimization that dramatically reduces the number of nodes searched. It eliminates branches that cannot possibly affect the final decision, achieving the same result as minimax but orders of magnitude faster.
How It Works
Alpha (α): the best value the maximizer can guarantee so far. Initialized to -∞; it only increases during the search.
Beta (β): the best value the minimizer can guarantee so far. Initialized to +∞; it only decreases during the search.
Cutoff: when β ≤ α, the remaining moves at this node cannot affect the final decision and are pruned.
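Putting α, β, and the cutoff together, a minimal alpha-beta over the same kind of toy tree (leaves are scores, interior nodes are lists of children) might look like:

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)   # raise the maximizer's guarantee
            if beta <= alpha:          # cutoff: the minimizer already has a better option elsewhere
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)         # lower the minimizer's guarantee
        if beta <= alpha:
            break
    return best

tree = [[3, 5], [2, [9, 1]]]
score = alphabeta(tree, -math.inf, math.inf, maximizing=True)   # 3, same as plain minimax
```

On this tree the subtree `[9, 1]` is never visited: once the second branch's first reply scores 2, no reply there can beat the 3 already guaranteed on the first branch.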
Efficiency Gains
With perfect move ordering, alpha-beta examines roughly b^(d/2) leaf nodes instead of minimax's b^d, where b = branching factor (~35 for chess) and d = search depth.
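A back-of-the-envelope comparison with those numbers (the b^(d/2) best case assumes perfect move ordering):

```python
b, d = 35, 6                       # branching factor, search depth
minimax_leaves = b ** d            # ~1.84 billion leaf nodes
alphabeta_leaves = b ** (d // 2)   # 42,875 in the perfectly ordered best case
speedup = minimax_leaves // alphabeta_leaves
```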
Move Ordering Impact
The effectiveness of alpha-beta pruning depends heavily on move ordering, so Gillespie employs several techniques to ensure the most promising moves are searched first.
Advanced Search Techniques
Quiescence Search
Extends the search beyond the nominal depth in tactical positions (captures, checks, promotions) to avoid the horizon effect: unstable positions are never evaluated statically.
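A structural sketch of stand-pat quiescence, in negamax convention (the position format here is hypothetical: a dict holding a static eval from the side to move, plus the positions reachable by captures):

```python
def quiescence(pos, alpha, beta):
    stand_pat = pos["eval"]            # static eval, from the side to move
    if stand_pat >= beta:              # already good enough: fail-hard cutoff
        return beta
    alpha = max(alpha, stand_pat)      # "stand pat": we may simply decline to capture
    for child in pos["captures"]:      # only tactical moves extend the search
        score = -quiescence(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# A hanging pawn: after White captures, Black's static eval is -100 from Black's view,
# so the quiet "eval 0" at the root was hiding a tactic worth +100 for White.
root = {"eval": 0, "captures": [{"eval": -100, "captures": []}]}
```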
Transposition Table
Hash table storing previously evaluated positions. Different move orders can reach identical positions; the table prevents redundant evaluation.
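A minimal depth-preferred table sketch (in practice the key would be a 64-bit Zobrist hash; here any hashable key works):

```python
tt = {}  # position key -> (search_depth, score)

def probe(key, depth):
    hit = tt.get(key)
    # A stored score is only reusable if it came from a search at least as deep.
    if hit is not None and hit[0] >= depth:
        return hit[1]
    return None

def store(key, depth, score):
    prev = tt.get(key)
    if prev is None or depth >= prev[0]:   # depth-preferred replacement policy
        tt[key] = (depth, score)
```

Real engine entries also record a bound type (exact / lower / upper) and the best move, which doubles as a move-ordering hint; this sketch keeps only the score.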
Null Move Pruning
If passing the turn (illegal in chess) still yields a good position, the current position is likely strong. This allows aggressive pruning in non-zugzwang positions.
Late Move Reductions
Later moves in well-ordered search are less likely to be best. Search them at reduced depth initially, re-search if they exceed expectations.
Search Statistics
MoE-NNUE Architecture
Traditional chess engines employ a single neural network for position evaluation. Gillespie pioneers a Mixture of Experts (MoE) approach, utilizing multiple specialized NNUE models that dynamically adapt to different game phases and position types.
NNUE (Efficiently Updatable Neural Network): A neural network architecture designed for chess evaluation that can be incrementally updated as moves are made, enabling real-time position assessment at millions of nodes per second.
System Overview
Position Analysis
Extract features: material balance, king safety, pawn structure, piece mobility
Expert Selection
Gating network routes position to specialized expert models based on game phase
Evaluation Output
Weighted combination of expert outputs produces final centipawn evaluation
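The blend can be sketched as a weighted sum; `experts` and `gate` below are hypothetical callables standing in for the NNUE models and the gating network:

```python
def moe_evaluate(features, experts, gate):
    weights = gate(features)                  # one probability per expert, sums to 1.0
    evals = [expert(features) for expert in experts]
    # Final centipawn score: convex combination of the expert opinions.
    return sum(w * e for w, e in zip(weights, evals))

# Toy stand-ins: one expert says +100 cp, another says -40 cp.
experts = [lambda f: 100.0, lambda f: -40.0]
gate = lambda f: [0.75, 0.25]
score = moe_evaluate(None, experts, gate)     # 0.75*100 + 0.25*(-40) = 65.0
```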
Architecture Flow
Expert Model Specialization
Each expert model is trained on specific game phases and position types, developing deep understanding of contextual evaluation:
Opening Expert
Specializes in moves 1-15. Focus: development, center control, king safety preparation.
Middlegame Expert
Specializes in moves 16-40. Focus: tactical combinations, piece coordination, attack patterns.
Endgame Expert
Specializes in moves 41+. Focus: king activity, pawn promotion, zugzwang positions.
Tactical Expert
Activates on sharp positions. Focus: sacrifices, forcing sequences, mate threats.
Positional Expert
Handles quiet, closed positions. Focus: space advantage, piece placement, long-term planning.
Gating Network Mechanism
The gating network is a lightweight neural network that analyzes position features and determines which expert models to activate. It outputs a probability distribution over all experts.
Input Features (64-dim vector)
SOFTMAX OUTPUT: The gating network applies softmax activation to produce weights that sum to 1.0, enabling smooth blending between experts during transitional positions.
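A sketch of that softmax step, combined with the top-K sparse activation (K=2) described later under Implementation & Optimization; the expert logits here are made-up numbers, not real gating outputs:

```python
import math

def gate_weights(logits, k=2):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the top-k experts and renormalize so weights still sum to 1.0.
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return {i: probs[i] / kept for i in top}

# Five experts; the first two logits dominate, so only they are evaluated.
weights = gate_weights([2.0, 1.0, 0.0, 0.0, 0.0], k=2)
```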
Example Gate Activations
Position Context
Move 8, Italian Game opening, material equal, low tactical complexity
Network Architecture Details
Gating Network
Expert NNUE Models
Complete Forward Pass
Performance Metrics
MoE-NNUE vs Single-Model NNUE
Computational Overhead
EFFICIENCY TRADE-OFF: The 27% computational overhead yields 8-11% accuracy improvements across all game phases, resulting in +180 Elo gain against baseline single-model engine.
Training Methodology
Two-Stage Training Pipeline
Expert Pre-Training
- Each expert trained independently on phase-specific positions
- Self-play games filtered by move number and complexity
- Loss function: Mean Squared Error on centipawn values
- Training time: ~120 hours per expert on 8× A100 GPUs
Gating Network Training
- Expert weights frozen, only gating network trainable
- Reinforcement learning via self-play tournaments
- Reward signal: game outcome + evaluation accuracy
- Training time: ~80 hours on a single machine
Training Data Statistics
Bullet Trainer: High-Velocity Training System
The Bullet Trainer is Gillespie's specialized training infrastructure designed for rapid iteration and high-throughput position generation. It generates training data through ultra-fast self-play at drastically reduced time controls.
System Architecture
Generation Rate
Training Loop
Initial Bootstrap
EPOCH 0: Train on human GM games (Lichess Elite Database) to establish baseline chess knowledge.
Self-Play Generation
LOOP: The current model plays against itself; exploration noise is added to move selection for diversity.
Model Update
TRAIN: Train on the new positions, backpropagating game outcomes; weights are updated via the Adam optimizer.
Validation
TEST: Play 1,000 games vs. the previous checkpoint; keep the new model only if the Elo gain exceeds +10.
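The four steps above can be sketched as one loop; every callable here is a hypothetical stand-in for the corresponding pipeline stage:

```python
def training_loop(model, generate_games, train, elo_gain, epochs):
    # `model` is assumed to already be bootstrapped on human GM games (epoch 0).
    best = model
    for _ in range(epochs):
        positions = generate_games(best)      # self-play with exploration noise
        candidate = train(best, positions)    # e.g. gradient steps with Adam
        if elo_gain(candidate, best) > 10:    # validation match vs. checkpoint
            best = candidate                  # promote only on a > +10 Elo gain
    return best
```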
Position Quality Metrics
Complexity Score
Weighted sum of: piece count, mobility, tactical opportunities, pawn structure variety
Evaluation Variance
Standard deviation of evaluations across multiple search depths; high variance indicates an uncertain position.
Uniqueness Hash
Zobrist hash with piece-square feature encoding. Prevents near-duplicate positions within a training batch.
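A sketch of the Zobrist scheme (12 piece types × 64 squares; the seed and encoding details here are illustrative, not Gillespie's actual tables):

```python
import random

rng = random.Random(2024)                    # fixed seed: reproducible key table
ZOBRIST = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(pieces):
    """pieces: iterable of (piece_type_index, square_index) pairs."""
    h = 0
    for piece, square in pieces:
        h ^= ZOBRIST[piece][square]          # XOR: order-independent, self-inverse
    return h
```

Because XOR is self-inverse, moving a piece updates the hash incrementally: `h ^= ZOBRIST[p][from_sq] ^ ZOBRIST[p][to_sq]`, with no full recomputation.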
Data Augmentation
Hardware Infrastructure
Implementation & Optimization
CPU Optimizations
SIMD Vectorization
AVX2/AVX-512 instructions for parallel expert evaluation. Process 8 positions simultaneously with 256-bit registers.
Incremental Updates
NNUE accumulator caching reduces recomputation. Only affected neurons updated when pieces move.
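The idea in miniature: the accumulator holds the first layer's pre-activation sums, and a move only touches the columns of the input features it toggles (a pure-Python stand-in, not the real SIMD code):

```python
def apply_move(accumulator, weights, removed, added):
    # accumulator[j] == sum of weights[f][j] over all active input features f.
    # A move toggles a handful of features; update only their contributions.
    for f in removed:
        for j in range(len(accumulator)):
            accumulator[j] -= weights[f][j]
    for f in added:
        for j in range(len(accumulator)):
            accumulator[j] += weights[f][j]
    return accumulator

# 3 input features, 2 first-layer neurons; features {0, 1} start active.
weights = [[1, 2], [3, 4], [5, 6]]
acc = [weights[0][j] + weights[1][j] for j in range(2)]   # [4, 6]
apply_move(acc, weights, removed=[1], added=[2])          # feature 1 off, 2 on
```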
Sparse Activation
Top-K expert selection (K=2). Only 40% of experts evaluated per position, reducing inference cost.
Memory Layout
Quantization
INT8 quantization reduces model size by 75% with <2% accuracy loss.
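A minimal symmetric per-tensor sketch of the idea (real NNUE quantization uses per-layer scales and saturating integer arithmetic; this only shows the basic size/precision trade):

```python
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0   # map the largest weight to ±127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale                                # int8 values plus one float scale

def dequantize(q, scale):
    return [v * scale for v in q]

# float32 -> int8 is 4x fewer bytes per weight, i.e. the 75% size saving above.
q, s = quantize_int8([0.5, -1.27, 0.02])
```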