SciMLAgents: Write the Solver, Not the Solution [Paper]

SciML-Agent is an evaluation framework for assessing the ability of large language models (LLMs) to generate scientifically grounded ODE-solving code. It introduces dedicated benchmarks and uses guided prompting and fine-tuning to produce executable, numerically valid solutions. With proper prompting and fine-tuning, current LLMs, even small models like Qwen3 (0.6B), can write code that is both executable and numerically correct. Our results show that having LLMs write the solver code, rather than directly predicting the solution, leads to more accurate and interpretable outcomes.

TL;DR: The reasoning capabilities of LLMs have enabled impressive progress in general coding tasks, but it remains unclear whether they can handle scientific computing problems that demand domain-aware numerical decisions, such as selecting appropriate ODE solvers based on stiffness or setting stable tolerances. To explore this, we investigate whether LLMs can serve as SciML agents that translate natural-language ODE specifications into scientifically valid code. We introduce two new benchmarks: a diagnostic set of adversarial misleading problems requiring algebraic reasoning to detect non-stiffness, and a large-scale dataset of 1,000 diverse ODE tasks spanning stiff and non-stiff regimes. We evaluate both open- and closed-source LLMs under unguided and guided prompting settings, and with off-the-shelf versus fine-tuned variants, measuring executability and numerical validity. Our results show that with sufficient context and guided prompts, recent instruction-following models can generate executable and numerically correct solvers, with some open-source systems (e.g., Qwen3) performing strongly without fine-tuning, while older or smaller models still benefit substantially from fine-tuning. These findings demonstrate that careful prompting and fine-tuning can yield specialized SciML agents for ODE solving, establishing a foundation for assessing scientific code generation beyond mere syntactic correctness.
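For intuition, the kind of domain-aware decision the benchmarks probe is concrete: a stiff system calls for an implicit solver and carefully set tolerances, while an explicit method would need impractically small steps. The sketch below is illustrative only; the Robertson-style stiff system, the solver choice, and the tolerances are our own example, not drawn from the benchmarks.

from scipy.integrate import solve_ivp

# Illustrative stiff system (Robertson chemical kinetics); not from the benchmarks.
def robertson(t, y):
    y1, y2, y3 = y
    return [
        -0.04 * y1 + 1e4 * y2 * y3,
        0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2 ** 2,
        3e7 * y2 ** 2,
    ]

# A domain-aware choice: the implicit Radau method with tight tolerances handles
# the stiffness; an explicit method such as RK45 would need far more steps.
sol = solve_ivp(robertson, (0.0, 1e3), [1.0, 0.0, 0.0],
                method="Radau", rtol=1e-6, atol=1e-9)
print(sol.y[:, -1])  # solution state at t = 1e3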

For more details, please check out our paper.


Quick Start

  1. Clone the repository:
git clone https://github.com/SqueezeAILab/sciml-agent
cd sciml-agent
  2. Load the datasets in Python:
import json

# Load ODE-1000
with open("./ODE-1000/ode_1000.json", "r") as f:
    ode_data = json.load(f)

# Load Symbolic Equivalence Dataset
with open("./Symbolic_Equivalence_Dataset/dataset/trigonometric.json", "r") as f:
    sym_data = json.load(f)

print("ODE-1000 samples:", len(ode_data))
print("Symbolic Equivalence samples:", len(sym_data))

Citation

SciML-Agent was introduced in the following paper. If you find this evaluation framework useful in your work, please cite:

@article{gaonkar2025scimlagent,
  title={SciMLAgents: Write the Solver, Not the Solution},
  author={Gaonkar, Saarth and Zheng, Xiang and Xi, Haocheng and Tiwari, Rishabh and Keutzer, Kurt and Morozov, Dmitriy and Mahoney, Michael and Gholami, Amir},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2509.09936}
}
