Documentation

How to run autonomous AI research.

Give Archimedes a hypothesis, paper, dataset, or benchmark. It produces a complete, reproducible research trace: literature map, plan, code, experiments, metrics, failures, final paper, and a replayable session — start to finish, with no human in the loop.

01 // What it is

Overview

AI Research Engineer (codename Archimedes) is an open-source, multi-agent framework that automates the entire lifecycle of machine learning research: hypothesis generation, literature review, experiment design, working code, empirical validation, and a reproducible manuscript with a full audit trail of how it got there.

It is built on Google's Agent Development Kit (ADK) for orchestration and the Claude Agent SDK for surgical code implementation. Every run produces an auditable, replayable trace of exactly how the agent thought, what it tried, what failed, and why it made the decisions it did.

02 // Setup

Install

You need the Claude Code CLI and a few API keys before your first run.

1. install the Claude Code CLI

npm install -g @anthropic-ai/claude-code

2. clone and install

git clone https://github.com/archimedes-run/ai-research-engineer.git
cd ai-research-engineer
uv sync --extra dev

3. configure .env

ANTHROPIC_API_KEY="your_key"
OPENROUTER_API_KEY="your_key"
SEMANTIC_SCHOLAR_API_KEY="your_key"   # optional, raises literature search rate limits
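Before a long run it can help to confirm the keys are actually visible to the process. The following is a minimal sketch, not part of the project: the key names come from the .env example above, while the helper name missing_keys and the split into required and optional keys are assumptions made for illustration. It checks the process environment, so it assumes the .env values have already been loaded or exported.

```python
import os

# Key names are copied from the .env example above. Treating the first two
# as required and the third as optional is an assumption of this sketch.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENROUTER_API_KEY")
OPTIONAL_KEYS = ("SEMANTIC_SCHOLAR_API_KEY",)


def missing_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def unset_optional_keys(env=None):
    """Return the optional key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in OPTIONAL_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing required keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
    for name in unset_optional_keys():
        print("Optional key not set:", name)
```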

03 // First run

Quickstart

Run the full multi-agent research lifecycle against a single natural-language prompt:

terminal

uv run ai-research-engineer "Investigate Kolmogorov-Arnold Networks \
  for weather forecasting" --mode orchestrated

By default, output lands in ./agentic_output/ and is preserved after the run. Use --temp-dir for an auto-cleaned scratch run, or --working-dir <path> to pin a custom location.

04 // Configuration

Execution modes

--mode is required and controls how much autonomy the agent is given:

orchestrated — the full pipeline: ideation, planning, stage-by-stage implementation, reflection, and manuscript synthesis.
simple — direct Claude Code execution with no planning overhead. Faster and cheaper for narrow coding tasks.
evolve — an autonomous Darwinian optimization loop that samples a FAISS database of past attempts, mutates the highest-scoring candidate, and keeps what improves the metric.
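To make the evolve description concrete, here is a toy sketch of a sample, mutate, keep-if-better loop. It is not the project's implementation: a plain Python list stands in for the FAISS database, the function names (evolve, score, mutate) are hypothetical, and it assumes a higher score is better.

```python
import random


def evolve(score, mutate, seed_candidate, generations=200, sample_size=4, rng=None):
    """Toy keep-what-improves loop over an archive of past attempts."""
    rng = rng or random.Random(0)
    archive = [(score(seed_candidate), seed_candidate)]
    for _ in range(generations):
        # Sample a few past attempts. A plain list stands in for the
        # FAISS database named in the mode description.
        sampled = rng.sample(archive, min(sample_size, len(archive)))
        # Pick the highest-scoring candidate among the sampled attempts.
        parent_score, parent = max(sampled, key=lambda item: item[0])
        child = mutate(parent, rng)
        child_score = score(child)
        # Keep the child only if it improves on its parent's metric.
        if child_score > parent_score:
            archive.append((child_score, child))
    return max(archive, key=lambda item: item[0])


def score(x):
    # Hypothetical metric with its maximum at x = 3.
    return -(x - 3.0) ** 2


def mutate(x, rng):
    # Hypothetical mutation: a small Gaussian perturbation.
    return x + rng.gauss(0.0, 0.5)


best_score, best = evolve(score, mutate, seed_candidate=0.0)
print(best_score, best)
```

Because rejected children never enter the archive, the best score in the archive can only go up from one generation to the next.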

Two more flags shape the run: --research-mode novelty|replication switches between inventing new architectures and strict paper replication, and --template picks the LaTeX template used for the final manuscript.

05 // Context

Research domains

--domain injects domain-specific planning and review heuristics into every agent in the pipeline. Supported domains: aiml, finance, bioinformatics, algorithms, and physics.
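If runs are launched from scripts, assembling the command line in one place catches typos in flag values early. The sketch below is an illustration only: the flags and allowed values are copied from this page, while the helper name build_command and its validation behavior are assumptions. The domain spellings are taken from the list above as written.

```python
import shlex

# Allowed values copied from this page.
MODES = ("orchestrated", "simple", "evolve")
RESEARCH_MODES = ("novelty", "replication")
DOMAINS = ("aiml", "finance", "bioinformatics", "algorithms", "physics")


def build_command(prompt, mode, research_mode=None, domain=None, working_dir=None):
    """Return an argv list for the CLI, validating values against this page."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["uv", "run", "ai-research-engineer", prompt, "--mode", mode]
    if research_mode is not None:
        if research_mode not in RESEARCH_MODES:
            raise ValueError(f"unknown research mode: {research_mode!r}")
        argv += ["--research-mode", research_mode]
    if domain is not None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        argv += ["--domain", domain]
    if working_dir is not None:
        argv += ["--working-dir", working_dir]
    return argv


command = build_command(
    "Investigate Kolmogorov-Arnold Networks for weather forecasting",
    mode="orchestrated",
    research_mode="novelty",
    domain="physics",
)
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with long prompts.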

06 // Output

The research vault

Every run produces a structured workspace designed to survive long sessions and context resets — the agent never has to re-derive what it already learned:

knowledge_base/ — synthesized literature notes and architecture blueprints.
literature/ — raw full-text sources pulled from ArXiv and Semantic Scholar.
workflow/ — implementation code, training loops, and neural network modules.
results/ — metric logs, model checkpoints, and comparison plots.
manuscript/ — the final, compiled LaTeX/PDF paper.
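After a run, a quick way to see what was produced is to count the files in each vault folder. This is a minimal sketch under two assumptions: the five folders sit directly under the run's working directory (for example ./agentic_output/), and the helper name vault_summary is hypothetical.

```python
from pathlib import Path

# Folder names copied from the list above.
VAULT_DIRS = ("knowledge_base", "literature", "workflow", "results", "manuscript")


def vault_summary(root):
    """Map each expected vault folder to its file count, or None if absent."""
    root = Path(root)
    summary = {}
    for name in VAULT_DIRS:
        folder = root / name
        if folder.is_dir():
            # Count regular files at any depth below the folder.
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    for name, count in vault_summary("./agentic_output").items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{name}/: {status}")
```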

07 // Embedding

Python API

Prefer code to a CLI? The same engine is a plain async-first Python class with a streaming event interface — this is exactly what powers the CLI and any backend you wire up around it.

python

from ai_research_engineer import AIEngineer

engineer = AIEngineer(
    agent_type="adk",          # "adk" | "claude_code" | "evolve"
    research_mode="novelty",   # "novelty" | "replication"
    domain="ai_ml",
    working_dir="./my_run",
)

result = engineer.run("Investigate sparse mixture-of-experts routing")
print(result.response)
print(result.files_created)

To stream token-by-token and tool-call events (e.g. to drive a UI), call await engineer.run_async(prompt, stream=True) and iterate the returned async generator.
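The await-then-iterate pattern described above can be sketched as follows. So that the example runs without the package installed, a stand-in class named FakeEngineer replaces AIEngineer. The event dictionaries, their "type", "text", and "name" fields, and the tool name are all hypothetical; the real event objects may look different.

```python
import asyncio


class FakeEngineer:
    """Stand-in for AIEngineer so the sketch runs without the package."""

    async def run_async(self, prompt, stream=False):
        async def events():
            # Hypothetical event shape, for illustration only.
            yield {"type": "token", "text": "Plan"}
            yield {"type": "tool_call", "name": "search_literature"}
            yield {"type": "token", "text": " ready"}

        # Calling an async generator function returns the async generator.
        return events()


async def collect(engineer, prompt):
    """Await run_async, then iterate the async generator it returns."""
    stream = await engineer.run_async(prompt, stream=True)
    tokens, tool_calls = [], []
    async for event in stream:
        if event["type"] == "token":
            tokens.append(event["text"])
        elif event["type"] == "tool_call":
            tool_calls.append(event["name"])
    return "".join(tokens), tool_calls


text, tools = asyncio.run(
    collect(FakeEngineer(), "Investigate sparse mixture-of-experts routing")
)
print(text)
print(tools)
```

In a UI backend the loop body would forward each event to the client as it arrives rather than collecting everything first.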

08 // Deep dive

Architecture

For a full technical breakdown of the agent graph — the ideation loop, the planning loop, the stage orchestrator and reflector, the evolution loop, and the paper-writing loop — read the architecture deep dive.
