SciMLAgents: Write the Solver, Not the Solution [Paper]

SciML-Agent is an evaluation framework for assessing the ability of large language models (LLMs) to generate scientifically grounded ODE-solving code. It introduces dedicated benchmarks and uses guided prompting and fine-tuning to produce executable, numerically valid solutions. With proper prompting and fine-tuning, current LLMs, even small models like Qwen3 (0.6B), can write code that is both executable and numerically correct. Our results show that having LLMs write the solver code, rather than directly predicting the solution, leads to more accurate and interpretable outcomes.

TL;DR: The reasoning capabilities of LLMs have enabled impressive progress in general coding tasks, but it remains unclear whether they can handle scientific computing problems that demand domain-aware numerical decisions, such as selecting appropriate ODE solvers based on stiffness or setting stable tolerances. To explore this, we investigate whether LLMs can serve as SciML agents that translate natural-language ODE specifications into scientifically valid code. We introduce two new benchmarks: a diagnostic set of adversarial misleading problems requiring algebraic reasoning to detect non-stiffness, and a large-scale dataset of 1,000 diverse ODE tasks spanning stiff and non-stiff regimes. We evaluate both open- and closed-source LLMs under unguided and guided prompting settings, and with off-the-shelf versus fine-tuned variants, measuring executability and numerical validity. Our results show that with sufficient context and guided prompts, recent instruction-following models can generate executable and numerically correct solvers, with some open-source systems (e.g., Qwen3) performing strongly without fine-tuning, while older or smaller models still benefit substantially from fine-tuning. These findings demonstrate that careful prompting and fine-tuning can yield specialized SciML agents for ODE solving, establishing a foundation for assessing scientific code generation beyond mere syntactic correctness.
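For intuition, the kind of domain-aware decision the benchmarks probe is concrete: a stiff system calls for an implicit solver and carefully set tolerances, while an explicit method would need impractically small steps. The sketch below is illustrative only; the Robertson-style stiff system, the solver choice, and the tolerances are our own example, not drawn from the benchmarks.

from scipy.integrate import solve_ivp

# Illustrative stiff system (Robertson chemical kinetics); not from the benchmarks.
def robertson(t, y):
    y1, y2, y3 = y
    return [
        -0.04 * y1 + 1e4 * y2 * y3,
        0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2 ** 2,
        3e7 * y2 ** 2,
    ]

# A domain-aware choice: the implicit Radau method with tight tolerances handles
# the stiffness; an explicit method such as RK45 would need far more steps.
sol = solve_ivp(robertson, (0.0, 1e3), [1.0, 0.0, 0.0],
                method="Radau", rtol=1e-6, atol=1e-9)
print(sol.y[:, -1])  # solution state at t = 1e3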

For more details, please check out our paper.


Quick Start

  1. Clone the repository:
git clone https://github.com/SqueezeAILab/sciml-agent
cd sciml-agent
  2. Load the datasets in Python:
import json

# Load ODE-1000
with open("./ODE-1000/ode_1000.json", "r") as f:
    ode_data = json.load(f)

# Load Symbolic Equivalence Dataset
with open("./Symbolic_Equivalence_Dataset/dataset/trigonometric.json", "r") as f:
    sym_data = json.load(f)

print("ODE-1000 samples:", len(ode_data))
print("Symbolic Equivalence samples:", len(sym_data))

Citation

SciML-Agent was introduced in the following paper. If you find this evaluation framework useful in your work, please cite:

@article{gaonkar2025scimlagent,
  title={SciMLAgents: Write the Solver, Not the Solution},
  author={Gaonkar, Saarth and Zheng, Xiang and Xi, Haocheng and Tiwari, Rishabh and Keutzer, Kurt and Morozov, Dmitriy and Mahoney, Michael and Gholami, Amir},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2509.09936}
}
