Long-running Claude for Scientific Computing
- Leverage autonomous AI agents to accelerate complex scientific coding tasks.
- Implement multi-day workflows with persistent memory and test oracles for reliability.
- Utilize Claude Code to build differentiable numerical solvers for cosmology research.
- Optimize scientific computing projects with HPC clusters and Git-based coordination.
Scientific computing projects often involve intricate, long-horizon tasks that can take weeks or months to complete. With the advent of advanced AI agents like Claude, researchers can now delegate high-level objectives to autonomous workflows that manage themselves over multiple days. This approach is especially valuable for tasks such as reimplementing numerical solvers or converting legacy scientific software, where clear success criteria and well-scoped goals exist.
In this article, we explore how to apply a long-running Claude agent for scientific computing, focusing on a practical example: developing a differentiable cosmological Boltzmann solver. By combining autonomous AI workflows, persistent memory, and test oracles, this method enables efficient, reliable progress even when the task lies outside the operator’s core expertise. The integration with high-performance computing clusters and Git-based coordination further streamlines the process, making it accessible for academic labs and research groups.
What Is Long-Running Claude and Why Does It Matter for Scientific Computing?
Long-running Claude refers to the use of the Claude AI agent to autonomously manage and execute complex scientific computing projects over extended periods, often spanning multiple days or weeks. Unlike traditional conversational AI interactions that require constant human input, this approach allows researchers to specify high-level objectives and let the agent handle detailed implementation, testing, and debugging with minimal supervision.
This shift matters because many scientific computing tasks are time-consuming and need iterative refinement. Delegating them to a capable AI agent can shorten development cycles, reduce human error, and leave researchers free to focus on higher-level scientific questions.
How Does Claude Enable Multi-Day Agentic Coding Workflows?
Claude supports multi-day workflows by maintaining persistent memory across sessions through files like CLAUDE.md and CHANGELOG.md. The CLAUDE.md file contains the project plan, goals, and instructions, which Claude can update as it progresses. Meanwhile, CHANGELOG.md acts as a portable lab notebook, recording completed tasks, failed attempts, and accuracy checkpoints.
This memory persistence prevents redundant work and enables the agent to build upon prior progress effectively. Additionally, Claude uses test oracles—automated test suites and reference implementations—to verify correctness continuously and avoid regressions.
Case Study: Building a Differentiable Cosmological Boltzmann Solver
One compelling example of long-running Claude in action is the implementation of a differentiable cosmological Boltzmann solver. These solvers predict the statistical properties of the Cosmic Microwave Background (CMB) by evolving coupled equations for photons, baryons, neutrinos, and dark matter through the early universe.
Traditional solvers like CLASS and CAMB are foundational in cosmology, but a differentiable version enables gradient-based inference methods, significantly speeding up parameter estimation. Writing this solver in JAX leverages automatic differentiation and GPU acceleration, but the complexity and domain expertise required make it a challenging project.
By using Claude, a non-expert researcher was able to guide the agent to implement a solver with feature parity to CLASS and an accuracy target of 0.1%, a level consistent with the agreement between CLASS and CAMB themselves. This demonstrates how AI-assisted scientific computing can bridge expertise gaps and accelerate research.
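To make "differentiable" concrete, here is a toy sketch of forward-mode automatic differentiation in pure Python, used to fit one parameter by gradient descent. It is only a stand-in for what JAX provides automatically and is not the solver's code: the `Dual` class, the linear model `theta * x`, and the data points are all assumptions chosen for illustration.

```python
class Dual:
    """A value paired with its derivative with respect to one parameter."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def loss(theta, xs, ys):
    """Sum of squared residuals of the toy model y = theta * x."""
    total = Dual(0.0)
    for x, y in zip(xs, ys):
        residual = theta * x - y
        total = total + residual * residual
    return total


def fit(xs, ys, theta=0.0, lr=0.01, steps=200):
    """Gradient descent using the derivative carried by the Dual number."""
    for _ in range(steps):
        out = loss(Dual(theta, 1.0), xs, ys)  # seed d(theta)/d(theta) = 1
        theta -= lr * out.deriv
    return theta
```

Because the derivative travels with the value through every arithmetic step, the fitting loop never needs a hand-derived gradient. That is the property a differentiable solver offers for parameter estimation, at a much larger scale.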
Key Components of the Long-Running Claude Workflow
1. Drafting a Clear Project Plan
Success begins with a well-crafted CLAUDE.md file that outlines the project’s goals, deliverables, and constraints. This plan should be iterated locally with Claude until it is comprehensive and clear. For the Boltzmann solver, this included specifying feature parity with CLASS, accuracy targets, and the use of JAX for differentiability.
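As a rough illustration, a plan of this kind could be rendered from a few structured fields. The article does not specify how CLAUDE.md is laid out, so the section headings and the helper names `render_plan` and `write_plan` below are assumptions. Only the goal, the CLASS feature parity, the 0.1% accuracy target, and the test-before-commit rule come from the text.

```python
from pathlib import Path
from typing import Sequence


def render_plan(goal: str, deliverables: Sequence[str], constraints: Sequence[str]) -> str:
    """Render a plain-text project plan with one section per topic."""
    lines = ["# Project plan", "", "## Goal", goal, "", "## Deliverables"]
    lines += [f"- {item}" for item in deliverables]
    lines += ["", "## Constraints"]
    lines += [f"- {item}" for item in constraints]
    return "\n".join(lines) + "\n"


def write_plan(path: Path, text: str) -> None:
    """Write the rendered plan to disk, e.g. write_plan(Path('CLAUDE.md'), plan)."""
    path.write_text(text, encoding="utf-8")


# Hypothetical layout; the values are taken from the case study above.
plan = render_plan(
    goal="Differentiable cosmological Boltzmann solver written in JAX.",
    deliverables=["Feature parity with CLASS", "Accuracy within 0.1% of CLASS"],
    constraints=["Run the test suite before every commit"],
)
```

Keeping the plan as structured fields makes it easy to revise one item at a time while iterating on the plan.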
2. Maintaining Progress with a Changelog
The CHANGELOG.md file acts as the agent’s long-term memory, tracking progress, failed approaches, and accuracy metrics. This prevents the agent from repeating mistakes and provides transparency for human collaborators.
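A minimal sketch of such a log, assuming one line per entry with a date and a status tag. The entry format, the status labels, and the helper names `append_entry` and `failed_attempts` are illustrative assumptions, not a format the article prescribes.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def append_entry(path: Path, status: str, summary: str, day: Optional[date] = None) -> str:
    """Append one dated line to the changelog and return the line written."""
    day = day or date.today()
    entry = f"- {day.isoformat()} [{status}] {summary}\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


def failed_attempts(path: Path) -> List[str]:
    """Return logged failures so a new session can avoid retrying them."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if "[failed]" in line]
```

A new session could call `failed_attempts(Path("CHANGELOG.md"))` before choosing its next approach, which is how the log keeps mistakes from being repeated.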
3. Using Test Oracles for Continuous Validation
Claude runs unit tests continuously against a reference implementation (e.g., CLASS C source code) to verify correctness. This test oracle guides the agent’s debugging and development, ensuring scientific rigor.
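One way such an oracle check might look is a comparison of the new solver's output against reference values at the 0.1% accuracy target mentioned above. The function names are hypothetical, and the article does not say how the comparison against CLASS is actually implemented.

```python
from typing import Sequence


def max_relative_error(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Largest |candidate - reference| / |reference| over all points."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")
    worst = 0.0
    for c, r in zip(candidate, reference):
        if r == 0.0:
            raise ValueError("reference value is zero; relative error is undefined")
        worst = max(worst, abs(c - r) / abs(r))
    return worst


def passes_oracle(candidate: Sequence[float], reference: Sequence[float],
                  tolerance: float = 1e-3) -> bool:
    """True when every point agrees with the reference to within the tolerance (0.1% by default)."""
    return max_relative_error(candidate, reference) <= tolerance
```

In a unit test, `reference` would hold values produced by the reference implementation and `candidate` the output of the code under development. A failing check tells the agent which output to debug.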
4. Coordinating Work via Git
Git repositories serve as a coordination and version control mechanism. Claude commits and pushes changes after meaningful work units, runs tests before commits, and maintains a recoverable history. This setup supports hands-off monitoring and rollback if needed.
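The "test, then commit, then push" rule can be sketched as a small gate around standard Git commands. The function name `commit_if_tests_pass` and the injectable `run` parameter are assumptions made for illustration. The test command is supplied by the caller, since the article does not name a test runner.

```python
import subprocess
from typing import Callable, Sequence


def commit_if_tests_pass(test_cmd: Sequence[str], message: str, repo_dir: str = ".",
                         run: Callable = subprocess.run) -> bool:
    """Run the tests; stage, commit, and push only if they succeed."""
    result = run(test_cmd, cwd=repo_dir)
    if result.returncode != 0:
        # Leave the working tree untouched so the failure can be debugged.
        return False
    run(["git", "add", "-A"], cwd=repo_dir, check=True)
    run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    run(["git", "push"], cwd=repo_dir, check=True)
    return True
```

Because nothing is committed when the tests fail, every commit in the history is a state that passed the suite, which is what makes rollback to an earlier commit safe. Note that `git commit` exits with an error when there is nothing to commit, so with `check=True` this sketch raises in that case.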
5. Executing on HPC Clusters
For compute-intensive tasks, Claude operates on HPC clusters using job schedulers like SLURM. Sessions run inside terminal multiplexers (e.g., tmux), so the operator can detach, leave the session running, and reattach later to check progress. The scheduler allocates compute only for the duration of each job, which keeps resource use efficient as the workload scales.
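As an illustration of the scheduler side, the sketch below renders a SLURM batch script from a few parameters. The helper name `render_sbatch`, the job name, and the command are placeholders. The article does not describe the actual job scripts or how tmux is combined with the scheduler, so that part is left out here.

```python
def render_sbatch(job_name: str, command: str, hours: int, gpus: int = 1) -> str:
    """Return the text of a SLURM batch script that runs one command."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={hours:02d}:00:00",        # wall-clock limit, HH:MM:SS
        f"#SBATCH --gres=gpu:{gpus}",               # number of GPUs requested
        f"#SBATCH --output={job_name}-%j.log",      # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be saved to a file and submitted with `sbatch`. The resource flags a given cluster accepts vary, so the values here should be checked against local documentation.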
Benefits of Using Long-Running Claude for Scientific Projects
- Accelerated research cycles by automating tedious coding and debugging tasks.
- Reduced dependency on continuous human oversight, freeing researchers to focus on interpretation and design.
- Improved code quality through continuous testing and systematic progress tracking.
- Ability to tackle complex, multi-step scientific workflows that require domain knowledge and careful error tracing.
- Enhanced reproducibility and transparency via Git-based version control and detailed changelogs.
Challenges and Considerations When Deploying Long-Running Claude
While promising, this approach also presents challenges:
- Defining clear and measurable success criteria is essential to guide the agent effectively.
- Some scientific tasks require deep domain expertise that may limit the agent’s autonomous capabilities.
- Managing compute resources and job scheduling on HPC clusters requires infrastructure knowledge.
- Ensuring robust test oracles and reference implementations is critical for reliable validation.
- Human oversight remains important for steering, updating instructions, and interpreting results.
Practical Tips for Implementing Long-Running Claude Workflows
- Start with a well-defined project scope and success metrics documented in CLAUDE.md.
- Set up comprehensive unit tests and integrate them into the agent’s workflow as a test oracle.
- Use Git for version control and allow commits only after all tests pass.
- Leverage HPC clusters and terminal multiplexers to run sessions asynchronously and at scale.
- Regularly review changelogs and update instructions to refine the agent’s approach.
- Combine autonomous agent work with periodic human intervention for quality assurance.
Future Directions for AI-Driven Scientific Computing
Long-running Claude workflows represent a step toward more autonomous scientific discovery. As AI models improve, we anticipate:
- Greater integration of domain-specific knowledge bases to enhance agent expertise.
- More sophisticated orchestration patterns allowing multiple agents to collaborate in parallel.
- Expanded use of differentiable programming and gradient-based inference across scientific domains.
- Improved interfaces for human-agent collaboration, enabling seamless steering and feedback.
- Broader adoption within academic and industrial research environments to accelerate innovation.
Frequently Asked Questions
Claude relies on CLAUDE.md for project instructions and CHANGELOG.md as a progress log, enabling it to remember past work, update plans, and avoid repeating failed approaches across sessions.
Call To Action
Unlock the potential of autonomous AI agents like Claude to accelerate your scientific computing projects. Implement long-running workflows with persistent memory, rigorous testing, and scalable infrastructure to transform complex research tasks into manageable, efficient processes.

