Google’s AI System Automates Scientific Software Evolution, Outperforming Human-Written Code

AI-Driven Software Evolution

Google researchers have reportedly developed a novel workflow that uses artificial intelligence to automatically improve scientific software, according to sources familiar with the work. The system builds evolutionary “trees” of software tools where each “node” represents an individual program whose performance is evaluated against standard benchmarks. Analysts suggest this approach represents a significant advancement in automated software development for scientific applications.

Methodology and Implementation

The system operates by prompting large language models to iteratively improve existing programs, with researchers providing supplementary information including research paper summaries and specialized knowledge. According to reports, the team refined their code-mutation system using tasks from the data-science competition platform Kaggle before applying the method across six scientific domains. For each domain, researchers reportedly grew multiple evolutionary trees containing up to 2,000 nodes each.

Initial nodes were created by asking the LLM to write programs from scratch, whether by implementing existing methods, combining approaches, or creating entirely new solutions. The mutation process allowed the system to duplicate and modify any node within the tree, not just the best-performing ones, creating what sources describe as an “open-ended discovery process” in which evolution could follow unconventional paths to success.
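The tree-growing loop described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the names `Node`, `grow_tree`, `toy_mutate`, and `toy_evaluate` are hypothetical, a list of numbers stands in for a program's source code, random perturbation stands in for the LLM-driven rewrite, and uniform random parent selection is an assumption chosen only to show that any node, not just the best one, can be extended.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Program = List[float]  # toy stand-in for a program's source code


@dataclass(eq=False)
class Node:
    """One candidate program in the tree and its benchmark score."""
    program: Program
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def grow_tree(
    seeds: List[Program],
    mutate: Callable[[Program, random.Random], Program],
    evaluate: Callable[[Program], float],
    max_nodes: int,
    rng: random.Random,
) -> List[Node]:
    """Grow a tree by repeatedly duplicating and modifying an existing node."""
    nodes = [Node(program=list(s), score=evaluate(s)) for s in seeds]
    while len(nodes) < max_nodes:
        # Assumption: parents are drawn uniformly at random, so any node
        # can be extended, not only the current best.
        parent = rng.choice(nodes)
        child_program = mutate(parent.program, rng)
        child = Node(child_program, evaluate(child_program), parent=parent)
        parent.children.append(child)
        nodes.append(child)
    return nodes


# Hypothetical stand-ins: a real system would call an LLM inside `mutate`
# and run a domain benchmark inside `evaluate`.
TARGET = [1.0, -2.0, 0.5]


def toy_evaluate(program: Program) -> float:
    """Higher is better: negative squared distance to TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(program, TARGET))


def toy_mutate(program: Program, rng: random.Random) -> Program:
    """Copy the parent and perturb one randomly chosen entry."""
    child = list(program)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child


tree = grow_tree([[0.0, 0.0, 0.0]], toy_mutate, toy_evaluate,
                 max_nodes=200, rng=random.Random(0))
best = max(tree, key=lambda n: n.score)
print(f"{len(tree)} nodes, best score {best.score:.3f}")
```

Because every node stays in the pool of possible parents, a weak branch can still be extended later, which is one simple way to model the “unconventional paths” the article mentions. A real system would likely bias selection toward promising nodes rather than sample uniformly.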

Performance Breakthroughs

The system’s first application involved batch integration of single-cell RNA-sequencing data, where it reportedly generated 40 programs that outperformed ComBat, the best available human-written program in this domain. The top-performing evolved program demonstrated a 14% improvement over the human benchmark.

Subsequent applications included predicting COVID-19 hospitalizations across U.S. states, where the evolved predictors reportedly outperformed all models in the COVID-19 Forecast Hub repository. Additional successful applications included satellite image labeling, predicting neural activity in zebrafish, and time-series forecasting across various domains. In all cases, analysts indicate the evolved programs surpassed existing solutions.

Perhaps most impressively, when applied to calculus problem-solving, the system created variations of a common function that successfully solved 17 of 19 problems that had previously stumped the original program.

Expert Reactions and Potential Impact

Jenny Zhang, a computer scientist at the University of British Columbia who was not involved in the research, commented: “It’s really cool to see big companies like Google using evolutionary approaches to make breakthroughs in other scientific fields. It gives me hope that the research direction that I’m doing, when scaled up, can make a big impact.”

Evan Johnson, a biostatistician at Rutgers University who developed the ComBat software that was outperformed in the genomics task, said: “I think it’s exciting that they could potentially outperform humans without even thinking about it.” According to the preprint, the system reduces “exploration of a set of ideas from weeks or months to hours or days.”

Cautions and Considerations

Despite the promising results, Johnson offered two notes of caution that apply to any automated code-generation system. First, AI might violate software licenses through plagiarism. Second, if users don’t understand the code or don’t sufficiently oversee its generation, the resulting software could be fragile or untrustworthy.

Xutao Wang, a computational biologist working with Johnson at Rutgers, added the perspective that researchers should “let AI help you make a better solution instead of creating one for you.”

Zhang countered these concerns by drawing a parallel to AlphaGo’s development, noting that while early versions learned by imitating human players, the superior AlphaGo Zero learned exclusively through self-play. Similarly, she suggested that with sufficient computing resources, the software evolution system might eventually move beyond the constraints of human guidance and achieve even greater performance.

Beyond Iteration to Discovery

The system has already demonstrated capabilities beyond simple iteration and recombination. According to the paper, some evolved programs for pandemic prediction showed “significant conceptual leaps” beyond existing models. The researchers conclude that this ultimately demonstrates “the power of evolutionary search as a scientific discovery engine.”

Google researchers have answered questions about the work but declined to comment on the record as the manuscript has not yet undergone peer review. The team is reportedly working to make the system available to scientists, and many of the optimized tools can already be found online.