BioEvolve Bench

Can LLMs autonomously evolve faster, better bioinformatics algorithms? Track the race across tasks, datasets, and evolution harnesses.

Harness Leaderboard

Which evolution harness produces the best results across all tasks? Harnesses are ranked by the number of tasks won. A harness wins a task by placing first on that task's primary metric, with ties broken by the task's other directional metrics.

#	Harness	Type	Tasks	Results	Best Results
1	Claude Code	agent-loop	-	-	-
2	SkyDiscover (AdaEvolve)	evolutionary	-	-	-
3	SkyDiscover (Beam Search)	evolutionary	-	-	-
4	SkyDiscover (Best-of-N)	evolutionary	-	-	-
5	SkyDiscover (Top-K)	evolutionary	-	-	-
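
The ranking rule above can be sketched as follows. This is a minimal illustration and not the site's actual scoring code: the data layout, the function name `rank_harnesses`, and the reading that tie-breaking applies to first place within a task are all assumptions. Each metric carries a direction ("min" for lower-is-better, "max" for higher-is-better).

```python
from collections import Counter


def rank_harnesses(results, metrics):
    """Rank harnesses by number of tasks won.

    results: {task: {harness: {metric_name: value}}}   (hypothetical layout)
    metrics: {task: [(metric_name, "min" | "max"), ...]}
             The first entry is the primary metric; the rest are tie-breakers.
    Returns a list of (harness, tasks_won), best first.
    """
    wins = Counter()
    harnesses = set()
    for task, per_harness in results.items():
        harnesses.update(per_harness)

        def sort_key(harness, task=task, per_harness=per_harness):
            # Negate "max" metrics so that smaller keys are always better.
            return tuple(
                per_harness[harness][name] if direction == "min"
                else -per_harness[harness][name]
                for name, direction in metrics[task]
            )

        wins[min(per_harness, key=sort_key)] += 1

    # Sort by wins (descending), then by name for a stable order.
    return sorted(((h, wins[h]) for h in harnesses), key=lambda x: (-x[1], x[0]))


# Made-up numbers, for illustration only.
example_results = {
    "task_1": {"A": {"runtime": 1.0, "jaccard": 0.92}, "B": {"runtime": 2.0, "jaccard": 0.99}},
    "task_2": {"A": {"runtime": 1.0, "jaccard": 0.90}, "B": {"runtime": 1.0, "jaccard": 0.95}},
    "task_3": {"A": {"runtime": 0.5, "jaccard": 0.91}, "B": {"runtime": 0.7, "jaccard": 0.91}},
}
example_metrics = {t: [("runtime", "min"), ("jaccard", "max")] for t in example_results}
ranking = rank_harnesses(example_results, example_metrics)
```

In this toy input, harness "B" wins task_2 only through the tie-breaker, because both harnesses have the same runtime.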

Evolution Tasks

kNN Graph Construction Speed

Build the cell-cell kNN graph faster than scanpy.pp.neighbors while keeping edge-set Jaccard >= 0.9 vs the reference. Train on PBMC 3K (~2.6K cells); evaluate on a held-out 50K-cell synthetic dataset to test scaling.

Problem: k-Nearest Neighbor Graph Construction

Algorithm: scanpy.pp.neighbors

Harness
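
One way to compute the edge-set Jaccard constraint for this task is sketched below. It assumes each graph is given as a list of neighbor indices per cell and that edges are treated as undirected; the benchmark's exact edge definition is not stated here, and the function names are hypothetical.

```python
def knn_edge_set(neighbors):
    """Collect undirected edges from per-cell neighbor lists.

    neighbors[i] holds the indices of the nearest neighbors of cell i.
    Self-loops are dropped and (i, j) is treated the same as (j, i);
    both choices are assumptions of this sketch.
    """
    edges = set()
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if i != j:
                edges.add((min(i, j), max(i, j)))
    return edges


def edge_jaccard(candidate, reference):
    """Jaccard index |A & B| / |A | B| of the two graphs' edge sets."""
    a, b = knn_edge_set(candidate), knn_edge_set(reference)
    if not a and not b:
        return 1.0  # two empty graphs are considered identical
    return len(a & b) / len(a | b)


# Toy graphs with k = 2 on four cells; only the last cell's neighbors differ.
reference = [[1, 2], [0, 2], [0, 1], [2, 1]]
candidate = [[1, 2], [0, 2], [0, 1], [2, 0]]
score = edge_jaccard(candidate, reference)
passes = score >= 0.9  # threshold taken from the task description
```

Here the two graphs share 4 of 6 distinct edges, so the candidate falls below the 0.9 threshold.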

Leiden Clustering Speed

Evolve Leiden graph clustering to run faster while maintaining clustering quality. Train on PBMC 3K (~2.6K cells, for fast iteration); evaluate on a held-out 50K-cell synthetic PBMC dataset that preserves the original cluster structure.

Problem: Graph Clustering

Algorithm: Leiden Algorithm

Harness
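
The task description does not say how clustering quality is measured. As an illustration only, the sketch below uses the adjusted Rand index (ARI), a common agreement score between two labelings, to compare an evolved clustering against a reference clustering. The choice of ARI and the function name are assumptions, not the benchmark's stated metric.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to relabeling); values near 0
    mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs of cells that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (both - expected) / (maximum - expected)


# Same partition under different label names, then a crossed partition.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A harness could report such a score alongside wall-clock speedup, so that a faster variant is only accepted when agreement with the reference stays high.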

MACS3 Peak Calling

Build a peak-calling implementation that matches or outperforms MACS3 on ATAC-seq data. Train on GM12878 scATAC-seq (111M reads).

Problem: Peak Calling

Algorithm: MACS3

Harness
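
The description does not define what "matches" MACS3 means. One plausible proxy, sketched here under that assumption, is the base-pair Jaccard index between two peak sets: the number of bases covered by both sets divided by the number covered by either. Peaks are assumed to be (chrom, start, end) tuples with half-open coordinates, as in BED files; all function names are hypothetical.

```python
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def _overlap(a, b):
    """Total overlap in bases between two merged, sorted interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def peak_jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index between two peak sets."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))

    shared = covered = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a, b = _merge(by_chrom_a[chrom]), _merge(by_chrom_b[chrom])
        inter = _overlap(a, b)
        shared += inter
        covered += sum(e - s for s, e in a) + sum(e - s for s, e in b) - inter
    return shared / covered if covered else 1.0


# Toy peak sets with made-up coordinates.
evolved = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
baseline = [("chr1", 250, 400), ("chr2", 0, 50)]
overlap_score = peak_jaccard(evolved, baseline)
```

In this toy example the two sets share 100 bases out of 350 covered in total.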