How to run autonomous AI research.
Give Archimedes a hypothesis, paper, dataset, or benchmark. It produces a complete, reproducible research trace: literature map, plan, code, experiments, metrics, failures, final paper, and a replayable session — start to finish, with no human in the loop.
Overview
AI Research Engineer (codename Archimedes) is an open-source, multi-agent framework that automates the entire lifecycle of machine learning research: from novel hypothesis generation, through literature review and experiment design, to working code, empirical validation, a reproducible manuscript, and a full audit trail of how it got there.
It is built on Google's Agent Development Kit (ADK) for orchestration and the Claude Agent SDK for surgical code implementation. Every run produces an auditable, replayable trace of exactly how the agent thought, what it tried, what failed, and why it made the decisions it did.
Install
You need the Claude Code CLI and a few API keys before your first run.
npm install -g @anthropic-ai/claude-code
git clone https://github.com/archimedes-run/ai-research-engineer.git
cd ai-research-engineer
uv sync --extra dev

ANTHROPIC_API_KEY="your_key"
OPENROUTER_API_KEY="your_key"
SEMANTIC_SCHOLAR_API_KEY="your_key" # optional, raises literature search rate limits
Quickstart
Run the full multi-agent research lifecycle against a single natural-language prompt:
uv run ai-research-engineer "Investigate Kolmogorov-Arnold Networks \
for weather forecasting" --mode orchestrated
By default, output lands in ./agentic_output/ and is preserved after the run. Use --temp-dir for an auto-cleaned scratch run, or --working-dir <path> to pin a custom location.
Execution modes
--mode is required and controls how much autonomy the agent is given:
- orchestrated — the full pipeline: ideation, planning, stage-by-stage implementation, reflection, and manuscript synthesis.
- simple — direct Claude Code execution with no planning overhead. Faster and cheaper for narrow coding tasks.
- evolve — an autonomous Darwinian optimization loop that samples a FAISS database of past attempts, mutates the highest-scoring candidate, and keeps what improves the metric.
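The evolve loop described above can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `Attempt` record, the `mutate` and `score` callables, and the plain Python list standing in for the FAISS database. It shows only the sample, mutate, keep-if-better shape, not how Archimedes implements it.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    """Hypothetical record of one past attempt and its metric."""
    params: float
    score: float


def evolve(
    archive: List[Attempt],
    mutate: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    steps: int,
    sample_size: int = 3,
    seed: int = 0,
) -> Attempt:
    """Sample past attempts, mutate the best of the sample, keep improvements."""
    rng = random.Random(seed)
    for _ in range(steps):
        # A plain list stands in for the vector database of past attempts.
        candidates = rng.sample(archive, min(sample_size, len(archive)))
        parent = max(candidates, key=lambda a: a.score)
        child_params = mutate(parent.params, rng)
        child = Attempt(child_params, score(child_params))
        # Keep the child only if it improves on its parent's metric.
        if child.score > parent.score:
            archive.append(child)
    return max(archive, key=lambda a: a.score)


# Toy problem: maximize -(x - 3)^2, whose optimum is at x = 3.
def toy_score(x: float) -> float:
    return -(x - 3.0) ** 2


def toy_mutate(x: float, rng: random.Random) -> float:
    return x + rng.gauss(0.0, 0.5)


start = [Attempt(0.0, toy_score(0.0))]
best = evolve(start, toy_mutate, toy_score, steps=200)
print(best)
```

Because rejected children are never stored, the archive only ever gains candidates that beat their parent, so the best score in it cannot decrease.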
Two more flags shape the run: --research-mode novelty|replication toggles between inventing new architectures and strict paper replication, and --template picks the LaTeX template used for the final manuscript.
Research domains
--domain injects domain-specific planning and review heuristics into every agent in the pipeline. Supported domains: aiml, finance, bioinformatics, algorithms, and physics.
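If you drive the CLI from a script, a small wrapper can reject typos before a long run starts. The sketch below is a hypothetical helper, not part of the project. It uses only the flags and values named above, and it assumes they can all be combined in a single invocation.

```python
import shlex
from typing import List, Optional

# Allowed values copied from the text above.
MODES = {"orchestrated", "simple", "evolve"}
RESEARCH_MODES = {"novelty", "replication"}
DOMAINS = {"aiml", "finance", "bioinformatics", "algorithms", "physics"}


def build_command(
    prompt: str,
    mode: str,
    research_mode: Optional[str] = None,
    domain: Optional[str] = None,
    working_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the CLI, rejecting undocumented values early."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["uv", "run", "ai-research-engineer", prompt, "--mode", mode]
    if research_mode is not None:
        if research_mode not in RESEARCH_MODES:
            raise ValueError(f"unknown research mode: {research_mode!r}")
        argv += ["--research-mode", research_mode]
    if domain is not None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        argv += ["--domain", domain]
    if working_dir is not None:
        argv += ["--working-dir", working_dir]
    return argv


cmd = build_command(
    "Investigate Kolmogorov-Arnold Networks for weather forecasting",
    mode="orchestrated",
    research_mode="replication",
    domain="physics",
)
# Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the prompt as a single argument, so quoting is never an issue when the list is handed to `subprocess.run(cmd, check=True)`.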
The research vault
Every run produces a structured workspace designed to survive long sessions and context resets — the agent never has to re-derive what it already learned:
- knowledge_base/: synthesized literature notes and architecture blueprints.
- literature/: raw full-text sources pulled from ArXiv and Semantic Scholar.
- workflow/: implementation code, training loops, and neural network modules.
- results/: metric logs, model checkpoints, and comparison plots.
- manuscript/: the final, compiled LaTeX/PDF paper.
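To check what a run left behind, you can walk the vault with a few lines of Python. This is a minimal sketch: `summarize_vault` is a hypothetical helper, and it assumes the five directories sit directly under the run's output directory, which the text above does not state explicitly.

```python
from pathlib import Path
from typing import Dict, Optional

# Directory names as listed above. Their location directly under the
# output directory is an assumption.
VAULT_DIRS = ("knowledge_base", "literature", "workflow", "results", "manuscript")


def summarize_vault(root: Path) -> Dict[str, Optional[int]]:
    """Count files under each vault directory; None marks a missing directory."""
    summary: Dict[str, Optional[int]] = {}
    for name in VAULT_DIRS:
        folder = root / name
        if folder.is_dir():
            # Count regular files at any depth below the directory.
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    # ./agentic_output is the default output location mentioned in the Quickstart.
    for name, count in summarize_vault(Path("./agentic_output")).items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{name}/: {status}")
```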
Python API
Prefer code to a CLI? The same engine is a plain async-first Python class with a streaming event interface — this is exactly what powers the CLI and any backend you wire up around it.
from ai_research_engineer import AIEngineer

engineer = AIEngineer(
    agent_type="adk",         # "adk" | "claude_code" | "evolve"
    research_mode="novelty",  # "novelty" | "replication"
    domain="ai_ml",
    working_dir="./my_run",
)
result = engineer.run("Investigate sparse mixture-of-experts routing")
print(result.response)
print(result.files_created)
For streaming token-by-token and tool-call events (e.g. to drive a UI), call await engineer.run_async(prompt, stream=True) and iterate the returned async generator.
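The streaming pattern can be sketched as follows. To keep the example runnable without API keys, a `FakeEngineer` stand-in replaces `AIEngineer`. The event dictionaries and their `type` and `text` fields are hypothetical, since the real event shape is not described here. Only the await-then-iterate flow mirrors the description above.

```python
import asyncio
from typing import AsyncIterator, Dict, List


class FakeEngineer:
    """Stand-in for AIEngineer so the sketch runs without API keys."""

    async def run_async(
        self, prompt: str, stream: bool = True
    ) -> AsyncIterator[Dict[str, str]]:
        async def events() -> AsyncIterator[Dict[str, str]]:
            # Hypothetical event shapes; the real ones may differ.
            yield {"type": "token", "text": "Planning "}
            yield {"type": "tool_call", "text": "search_literature"}
            yield {"type": "token", "text": "done."}

        return events()


async def collect(engineer, prompt: str) -> List[str]:
    """Await run_async, then iterate the async generator it returns."""
    lines: List[str] = []
    stream = await engineer.run_async(prompt, stream=True)
    async for event in stream:
        lines.append(f"[{event['type']}] {event['text']}")
    return lines


output = asyncio.run(
    collect(FakeEngineer(), "Investigate sparse mixture-of-experts routing")
)
print("\n".join(output))
```

In a UI backend, the body of the `async for` loop is where each event would be forwarded to the client, for example over a WebSocket, instead of being appended to a list.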
Architecture
For a full technical breakdown of the agent graph — the ideation loop, the planning loop, the stage orchestrator and reflector, the evolution loop, and the paper-writing loop — read the architecture deep dive.