BioEvolve Bench

Can LLMs autonomously evolve faster, better bioinformatics algorithms? Track the race across tasks, datasets, and evolution harnesses.

3
Tasks
Evolved Algorithms
5
Harnesses

Harness Leaderboard

Which evolution harness produces the best results across all tasks? Ranked by tasks won (first place on the primary metric, ties broken by other directional metrics).

#  Harness                     Type          Tasks  Results  Best Results
1  Claude Code                 agent-loop    -      -        -
2  SkyDiscover (AdaEvolve)     evolutionary  -      -        -
3  SkyDiscover (Beam Search)   evolutionary  -      -        -
4  SkyDiscover (Best-of-N)     evolutionary  -      -        -
5  SkyDiscover (Top-K)         evolutionary  -      -        -
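
The ranking rule described above (count first places on the primary metric, break ties with the remaining directional metrics) can be sketched as follows. This is only an illustration under stated assumptions: the harness names "A", "B", "C", the task names, and every score are hypothetical placeholders, and each metric is assumed to be oriented so that larger is better, with the primary metric first in each tuple.

```python
from collections import Counter

def task_winner(scores):
    """Return the harness with the best score tuple for one task.

    scores maps harness name -> tuple of metrics, primary metric first,
    all oriented so that larger is better (an assumption of this sketch).
    Tuple comparison checks the primary metric first and falls back to
    the later metrics only when earlier ones tie.
    """
    return max(scores, key=lambda harness: scores[harness])

def rank_harnesses(results):
    """Rank harnesses by number of tasks won (descending), then by name."""
    wins = Counter({h: 0 for task in results.values() for h in task})
    for scores in results.values():
        wins[task_winner(scores)] += 1
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data: task -> harness -> (primary metric, tie-break metric).
example_results = {
    "task1": {"A": (2.0, 0.95), "B": (2.0, 0.97), "C": (1.5, 0.99)},
    "task2": {"A": (3.0, 0.90), "B": (1.0, 0.90), "C": (2.0, 0.90)},
    "task3": {"A": (1.0, 0.90), "B": (4.0, 0.90), "C": (2.0, 0.90)},
}

ranking = rank_harnesses(example_results)
print(ranking)
```

In the hypothetical data, "A" and "B" tie on the primary metric for task1, so the second metric decides that task.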

Evolution Tasks

kNN Graph Construction Speed

Build the cell-cell kNN graph faster than scanpy.pp.neighbors while keeping the edge-set Jaccard similarity to the reference graph >= 0.9. Train on PBMC 3K (~2.6K cells); evaluate on a held-out 50K-cell synthetic dataset to test scaling.

Problem: k-Nearest Neighbor Graph Construction
Algorithm: scanpy.pp.neighbors
Harness
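
To make the edge-set Jaccard constraint concrete, here is a minimal sketch of how such a check could be computed. It assumes each graph is given as a list of neighbor indices per cell and that edges are treated as undirected with self-loops dropped; the benchmark's actual edge definition is not stated on this page, and the function names and toy graphs are hypothetical.

```python
def knn_edges(neighbors):
    """Convert per-cell neighbor lists into a set of undirected edges.

    neighbors[i] is an iterable of the neighbor indices of cell i.
    Treating edges as undirected is an assumption of this sketch.
    """
    return {
        (min(i, j), max(i, j))
        for i, nbrs in enumerate(neighbors)
        for j in nbrs
        if i != j
    }

def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity |A & B| / |A | B| of two edge sets."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

# Toy 4-cell graphs with k = 2 (hypothetical data).
reference = knn_edges([[1, 2], [0, 2], [0, 1], [2, 0]])
candidate = knn_edges([[1, 2], [0, 2], [0, 1], [2, 1]])

score = edge_jaccard(reference, candidate)
passes = score >= 0.9  # threshold taken from the task description
print(score, passes)
```

The toy candidate differs from the reference in a single neighbor of the last cell, which is already enough to fall below the 0.9 threshold on a graph this small.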

Leiden Clustering Speed

Evolve Leiden graph clustering to run faster while maintaining clustering quality. Train on PBMC 3K (~2.6K cells, fast iteration); evaluate on a held-out 50K-cell synthetic PBMC dataset that preserves the original cluster structure.

Problem: Graph Clustering
Algorithm: Leiden Algorithm
Harness
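
This page does not say how clustering quality is measured. One common way to compare an evolved clustering against a reference labeling is the adjusted Rand index (ARI), which is invariant to how clusters are numbered. The sketch below assumes ARI purely for illustration; the function name and toy labels are hypothetical and not taken from the benchmark.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells co-clustered in both labelings, in A only, in B only.
    sum_ab = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_ab - expected) / (max_index - expected)

# Same partition with swapped cluster ids (hypothetical labels).
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the reference clusters.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, crossed_partition)
```

A quality gate could then require the ARI against the reference Leiden labels to stay above a chosen threshold while runtime decreases; any such threshold would be a design choice rather than something stated here.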

MACS3 Peak Calling

Build a peak-calling implementation that matches or outperforms MACS3 on ATAC-seq data. Train on GM12878 scATAC-seq (111M reads).

Problem: Peak Calling
Algorithm: MACS3
Harness