Documentation
PhyslibSearch — semantic search for the PhysLib formal physics library.
Overview
PhyslibSearch is a semantic search engine for PhysLib, a formal Lean 4 physics library. It lets you find theorems, definitions, and instances using natural language — e.g. “Schrödinger equation” or “conservation of angular momentum” — without knowing the exact Lean name.
The index covers 9,663 visible declarations across all PhysLib modules, each with a human-readable informal description generated by an LLM and embedded into a high-dimensional vector space for similarity search.
How it works
The pipeline has three stages:
- Informalization. Each Lean 4 declaration is translated into an informal mathematical description (name + explanation) by Gemini. Context such as neighboring declarations in the same module and type dependencies is included in the prompt to improve quality.
- Embedding. The informal description is embedded using
gemini-embedding-2-preview(3072-dimensional cosine space) and stored in ChromaDB. - Search.At query time the user's query is optionally augmented with HyDE (Hypothetical Document Embeddings): an LLM generates a plausible Lean 4 declaration that would answer the query, and that hypothetical declaration is embedded instead of the raw query. This aligns the query embedding with the document embedding space.
API Reference
Base URL: http://physlibsearch.net. All endpoints accept and return JSON. Rate limits apply: 1 req/s on search, 15 req/min on expand.
/searchSearch declarations using natural language. Accepts a batch of queries and returns a ranked list of results per query.
Returns list[list[QueryResult]] — one inner list per input query. Sets a session cookie for feedback tracking.
Request body
{
"query": ["Newton's second law"],
"num_results": 10 // 1–150, default 10
}Response
[
[
{
"result": {
"name": ["UnitExamples", "newtonsSecondWithDim_isDimensionallyCorrect"],
"kind": "theorem",
"module_name": ["UnitExamples"],
"signature": "theorem UnitExamples.newtonsSecondWithDim_isDimensionallyCorrect ...",
"type": "IsDimensionallyCorrect UnitExamples.NewtonsSecondWithDim",
"informal_name": "Newton's second law F = ma is dimensionally correct",
"informal_description": "The proposition representing Newton's second law ...",
"docstring": null,
"value": null
},
"distance": 0.276
}
]
]/expandExpand a natural language query using HyDE (Hypothetical Document Embeddings). Gemini generates a plausible Lean 4 declaration that would answer the query. Pass the result to /search for better embedding alignment.
Request body is a raw JSON string (quoted). Response is a raw JSON string.
Request body
"conservation of angular momentum"
Response
"theorem angularMomentumConservation {J : AngularMomentum} (h : NoExternalTorque J) : IsConserved J := ..."/modulesList all PhysLib modules with their declaration counts.
Response
[
{ "name": ["Physlib", "ClassicalMechanics", "EulerLagrange"], "count": 4 },
{ "name": ["Physlib", "Mathematics", "LinearMaps"], "count": 38 },
...
]/modules/declarationsFetch all visible declarations in a given module, in source order.
Request body
["Physlib", "ClassicalMechanics", "EulerLagrange"]
Response
[
{
"name": ["Physlib", "ClassicalMechanics", "EulerLagrange", "eulerLagrangeEq"],
"kind": "theorem",
"module_name": ["Physlib", "ClassicalMechanics", "EulerLagrange"],
"signature": "theorem Physlib.ClassicalMechanics.EulerLagrange.eulerLagrangeEq ...",
"informal_name": "Euler–Lagrange equation",
"informal_description": "...",
...
}
]/fetchFetch full declaration records by Lean name. Useful for retrieving specific declarations you already know the name of.
Request body
[ ["UnitExamples", "newtonsSecondWithDim_isDimensionallyCorrect"], ["Physlib", "ClassicalMechanics", "EulerLagrange", "eulerLagrangeEq"] ]
Response
[ { "name": [...], "kind": "theorem", ... }, ... ]Example with curl:
# Search
curl -s -X POST http://physlibsearch.net/search \
-H "Content-Type: application/json" \
-d '{"query": ["Schrödinger equation"], "num_results": 5}' | jq .
# Expand query with HyDE, then search
AUGMENTED=$(curl -s -X POST http://physlibsearch.net/expand \
-H "Content-Type: application/json" \
-d '"conservation of energy"')
curl -s -X POST http://physlibsearch.net/search \
-H "Content-Type: application/json" \
-d "{"query": [$AUGMENTED], "num_results": 5}" | jq .Self-hosting
PhyslibSearch runs on three services:
- PostgreSQL — stores declarations, informal descriptions, and query logs
- ChromaDB — local file-based vector store (no separate server needed)
- FastAPI — REST API server (
server.py)
Required environment variables (see .env.example):
GEMINI_API_KEY=... # Google AI Studio key for embeddings & LLM CONNECTION_STRING=... # PostgreSQL connection string CHROMA_PATH=chroma # Path to ChromaDB files GEMINI_MODEL=... # LLM model for informalization (one-time) GEMINI_FAST_MODEL=... # LLM model for HyDE augmentation (per query) GEMINI_EMBEDDING_MODEL=gemini-embedding-2-preview
# 1. Build the index (one-time) python -m database schema # create PostgreSQL schema python -m database jixia # extract declarations with jixia python -m database informal # informalize with LLM python -m database vector-db # compute embeddings # 2. Run the API uvicorn server:app --host 0.0.0.0 --port 8000
Next steps
- Continuous integration. As PhysLib grows, the database should update automatically when new declarations are merged. The indexing pipeline is designed to be incremental — each SQL step uses
ON CONFLICT DO NOTHING, and the informalization and embedding steps skip already-processed entries — but this needs to be verified end-to-end for every stage before wiring up a CI workflow that triggers on new PhysLib commits. - Index QuantumInfo. The
QuantumInfomodule has not been indexed yet due to integration issues between PhysLib's QuantumInfo dependencies and jixia. Once resolved, re-runningpython -m database jixiashould pick it up incrementally without reprocessing the rest of the library.
Open source
PhyslibSearch is open source, built by Kernel Science. View on GitHub ↗
The repository contains the full indexing pipeline, API server, and this frontend.
database/— indexing pipeline (extraction, informalization, embedding)server.py— FastAPI REST APIengine.py— ChromaDB + PostgreSQL retrievalquery_expansion.py— HyDE query expansionfrontend/— this Next.js interface
