Python pipeline for extracting and analyzing the dependency graph of Lean 4's Mathlib as a multi-layer network (declaration, module, namespace).
The pipeline:
- Extracts the dependency graph from a local Mathlib build using
lean4exportandlean-training-data(seesrc/scripts/extract.sh), producing node and edge CSVs. - Parses raw output into a canonical graph (
src/parser/). - Analyzes the graph at three layers — declaration, module, namespace — computing degree distributions, PageRank/HITS, betweenness, communities, cascades, and robustness (
src/analysis/). - Generates figures and summary tables (
src/plots/,src/scripts/).
The extracted graph (308,129 declarations, 8.4M edges, 7,563 modules) is published on HuggingFace: MathNetwork/MathlibGraph. You don't need to re-run extraction to use the analysis scripts — they can load the graph directly from HuggingFace.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txtRun the full analysis pipeline on the published dataset:
python -m src.mainIndividual analyses live under src/scripts/ and can be run directly, e.g.:
python -m src.scripts.analyze_module_depth
python -m src.scripts.run_pagerank_communityTo re-extract the graph from a local Mathlib build:
bash src/scripts/extract.sh --local --output ./outputsrc/parser/— extract graph from Lean 4 sourcessrc/analysis/structure/— degree distribution, PageRank/HITS, betweenness, community detectionsrc/analysis/dynamics/— cascade and robustness analysissrc/plots/— figure generationsrc/scripts/— standalone analyses and pipeline entry pointssrc/tests/— pytest suite
MIT