1+ <!DOCTYPE html>
2+ < html >
3+
4+ < head >
5+ < meta charset ="utf-8 ">
6+ < meta name ="description " content ="🏎️: Evaluating Agentic Superoptimization on Large Codebases ">
7+ <meta name="keywords"
8+ content="FormulaCode, Visual Programming, Computer Vision, Context bottleneck Models, Scientific Discovery, Neurosymbolic Learning, Program Synthesis">
9+ < meta name ="viewport " content ="width=device-width, initial-scale=1 ">
10+ < title > FormulaCode: Evaluating Agentic Superoptimization on Large Codebases</ title >
11+
12+ < script >
13+ window . dataLayer = window . dataLayer || [ ] ;
14+
15+ function gtag ( ) {
16+ dataLayer . push ( arguments ) ;
17+ }
18+
19+ gtag ( 'js' , new Date ( ) ) ;
20+ gtag ( 'config' , 'G-PYVRSFMDRL' ) ;
21+ </ script >
22+
23+ < link href ="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro " rel ="stylesheet ">
24+
25+ < link rel ="stylesheet " href ="./static/css/bulma.min.css ">
26+ < link rel ="stylesheet " href ="./static/css/bulma-carousel.min.css ">
27+ < link rel ="stylesheet " href ="./static/css/bulma-slider.min.css ">
28+ < link rel ="stylesheet " href ="./static/css/fontawesome.all.min.css ">
29+ < link rel ="stylesheet " href ="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css ">
30+ < link rel ="stylesheet " href ="./static/css/index.css ">
31+ < link rel ="stylesheet " href ="./static/css/scrollytelling.css ">
32+ < link rel ="icon " href ="https://fav.farm/🌀 " type ="image/x-icon ">
33+
34+ < script src ="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js "> </ script >
35+ < script src ="https://polyfill.io/v3/polyfill.min.js?features=es6 "> </ script >
36+ < script id ="MathJax-script " async src ="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js "> </ script >
37+
38+ < script defer src ="./static/js/fontawesome.all.min.js "> </ script >
39+ < script src ="./static/js/bulma-carousel.min.js "> </ script >
40+ < script src ="./static/js/bulma-slider.min.js "> </ script >
41+ < script src ="./static/js/index.js "> </ script >
42+ </ head >
43+
44+ < body >
45+
46+ < section class ="hero ">
47+ < div class ="hero-body ">
48+ < div class ="container is-max-desktop ">
49+ < div class ="columns is-centered ">
50+ < div class ="column has-text-centered ">
51+ < h1 class ="title is-1 publication-title "> < span class ="formulacode "> FormulaCode</ span > :
52+ Evaluating Agentic Superoptimization on Large Codebases </ h1 >
53+ < div class ="is-size-5 publication-authors ">
54+ < span class ="author-block ">
55+ < a href ="https://atharvas.net "> Atharva Sehgal</ a > < sup > 1*</ sup > ,</ span >
56+ < span class ="author-block ">
57+ < a href ="https://www.linkedin.com/in/jamesahou/ "> James Hou</ a > < sup > 3*</ sup > ,</ span >
58+ < span class ="author-block ">
59+ < a href ="https://www.cs.utexas.edu/~swarat "> Swarat Chaudhuri</ a > < sup > 1</ sup > ,
60+ </ span >
61+ < span class ="author-block ">
62+ < a href ="https://www.jenjsun.com/ "> Jennifer Sun</ a > < sup > 2</ sup > ,</ span >
63+ < span class ="author-block ">
64+ < a href ="https://www.cms.caltech.edu/people/yyue/ "> Yisong Yue</ a > < sup > 3</ sup > </ span >
65+ </ div >
66+ < div class ="is-size-5 publication-authors ">
67+ < span class ="author-block "> < sup > 1</ sup > UT Austin,</ span >
68+ <span class="author-block"><sup>2</sup>Cornell,</span>
69+ < span class ="author-block "> < sup > 3</ sup > Caltech</ span >
70+ < span class ="author-block "> < sup > *</ sup > Equal Contribution</ span >
71+ </ div >
72+
73+ < div class ="column has-text-centered ">
74+ < div class ="publication-links ">
75+ < span class ="link-block ">
76+ < a href ="static/paper.pdf "
77+ class ="external-link button is-normal is-rounded is-dark ">
78+ < span class ="icon ">
79+ < i class ="fas fa-file-pdf "> </ i >
80+ </ span >
81+ < span > Paper</ span >
82+ </ a >
83+ </ span >
84+ <!-- Code Link. -->
85+ < span class ="link-block ">
86+ < a href ="https://github.com/formula-code "
87+ class ="external-link button is-normal is-rounded is-dark ">
88+ < span class ="icon ">
89+ < i class ="fab fa-github "> </ i >
90+ </ span >
91+ < span > Code</ span >
92+ </ a >
93+ </ span >
94+ <!-- <span class="link-block">
95+ <a href="https://example.com"
96+ class="external-link button is-normal is-rounded is-dark">
97+ <span class="icon">
98+ <i class="fas fa-external-link-alt"></i>
99+ </span>
100+ <span>Short Slide Deck</span>
101+ </a>
102+ </span> -->
103+ < span class ="link-block ">
104+ < a href ="./static/icmlpral-poster.pdf "
105+ class ="external-link button is-normal is-rounded is-dark ">
106+ < span class ="icon ">
107+ < i class ="fas fa-external-link-alt "> </ i >
108+ </ span >
109+ < span > ICML-PRAL Poster</ span >
110+ </ a >
111+ </ span >
112+ </ div >
113+
114+ </ div >
115+ </ div >
116+ </ div >
117+ </ div >
118+ </ div >
119+ </ section >
120+
121+ < section class ="hero teaser ">
122+ < div class ="container is-max-desktop ">
123+ < div class ="hero-body ">
124+ < img src ="./static/images/teaser.svg " style ="max-width: 100%; height: auto; " loading ="eager ">
125+ < div class ="subtitle has-text-centered is-size-6 ">
126+ Test cases streamline performance evaluation but constrain coding agents (e.g., < a
127+ href ="https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ "> AlphaEvolve</ a > )
128+ to a pass/fail reward, a signal too sparse to guide iterative optimization. <span
129+ class="formulacode">FormulaCode</span> introduces a live,
130+ repository-level benchmark that complements existing work such as <a
131+ href="https://www.swebench.com/">SWE-Bench</a> (in gray) by challenging agents to
132+ optimize 451 real-world performance bottlenecks against human solutions drawn from
133+ community-maintained benchmarks
134+ (in light blue). These benchmarks provide evaluation functions that capture fine-grained performance
135+ insights, are less
136+ susceptible to data leakage, and expose a larger optimization surface to coding agents.
137+ </ div >
138+
139+ </ div >
140+ </ div >
141+ </ section >
142+
143+
144+ < section class ="section ">
145+ < div class ="container is-max-desktop ">
146+ <!-- Abstract. -->
147+ < div class ="columns is-centered has-text-centered ">
148+ < div class ="column is-four-fifths ">
149+ < h2 class ="title is-3 "> Abstract</ h2 >
150+ < div class ="content has-text-justified ">
151+ < p >
152+ LLM agents have advanced rapidly and can now optimize code against continuous
153+ objective functions, a significant leap beyond traditional code generation techniques.
154+ However, there is an urgent need for novel benchmarks that can effectively measure this
155+ capability and translate it into real-world impact. Current code benchmarks, which often
156+ rely on binary pass/fail outcomes, offer a limited evaluation framework that falls short of
157+ capturing the full potential of these emerging capabilities.
158+ </ p >
159+ < p >
160+ To bridge this gap, we introduce < span class ="formulacode "> FormulaCode</ span > , a novel
161+ benchmark designed for evaluating agentic superoptimization on large codebases, with a focus
162+ on real-world performance optimization. Constructed from a dataset of 451 real-world
163+ performance bottlenecks automatically mined from GitHub, FormulaCode enables comprehensive
164+ testing of an agent's ability to triage, diagnose, and resolve inefficiencies in realistic
165+ software environments.
166+ </ p >
167+ < p >
168+ FormulaCode proves to be a challenging benchmark for frontier LLMs and agentic frameworks,
169+ with unrestricted repository exploration emerging as a key factor in finding
170+ performance inefficiencies. By introducing FormulaCode, our goal is to drive the development
171+ of next-generation optimization algorithms that meet the rigorous demands of real-world
172+ software projects.
173+ </ p >
174+ </ div >
175+ </ div >
176+ </ div >
177+ <!--/ Abstract. -->
178+ </ div >
179+ </ section >
180+
181+
182+ < section class ="section ">
183+ < div class ="container is-max-desktop ">
184+ < div class ="columns is-centered has-text-centered ">
185+ < div class ="column is-four-fifths ">
186+ < h2 class ="title is-3 "> ⚠️ Work in progress. Check back in a few days for updates!</ h2 >
187+ < div class ="content has-text-justified ">
188+ </ div >
189+ </ div >
190+ </ div >
191+ </ div >
192+ </ section >
193+
194+ < section class ="section ">
195+ < div class ="container is-max-desktop ">
196+ < div class ="columns is-centered ">
197+ < div class ="column is-full-width ">
198+ < h2 class ="title is-3 "> Related Links</ h2 >
199+
200+ < div class ="content has-text-left ">
201+ < p >
202+ This project would not be possible without the excellent work of the community. These are
203+ some relevant papers to better understand the
204+ premise of our work:
205+ </ p >
206+ < ul >
207+ < li > < a href ="https://arxiv.org/abs/2310.06770 "> SWE-bench: Can Language Models Resolve Real-World GitHub Issues?</ a > </ li >
208+ < li > < a href ="https://arxiv.org/abs/2401.03065 "> CRUXEval: Code Reasoning, Understanding, and Execution Evaluation</ a > </ li >
209+ < li > < a href ="https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ "> AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms</ a > </ li >
210+ < li > < a href ="https://arxiv.org/abs/2210.05050 "> Neurosymbolic Programming for Science</ a >
211+ </ li >
212+ </ ul >
213+
214+ </ div >
215+ </ div >
216+ </ div >
217+
218+ </ div >
219+ </ section >
220+
221+
222+ < section class ="section " id ="BibTeX ">
223+ < div class ="container is-max-desktop content ">
224+ < h2 class ="title "> BibTeX</ h2 >
225+ < p >
226+ If you found this post interesting, please read < a href ="static/paper.pdf "> our
227+ paper</ a > for mathematical details and
228+ experimental results. You can cite our paper as follows:
229+ </ p >
230+ < pre > < code > @misc{sehgal2025selfevolvingvisualconceptlibrary,
231+ title={FormulaCode: Evaluating Agentic Superoptimization on Large Codebases},
232+ author={Atharva Sehgal and James Hou and Swarat Chaudhuri and Jennifer J. Sun and Yisong Yue},
233+ year={2025},
234+ eprint={????.?????},
235+ archivePrefix={arXiv},
236+ primaryClass={cs.CV},
237+ url={https://arxiv.org/abs/????.?????},
238+ }</ code > </ pre >
239+ </ div >
240+ </ section >
241+
242+ < footer class ="footer ">
243+ < div class ="container ">
244+ < div class ="content has-text-centered ">
245+ < a class ="icon-link " href ="static/paper.pdf ">
246+ < i class ="fas fa-file-pdf "> </ i >
247+ </ a >
248+ <a class="icon-link external-link" href="https://github.com/formula-code">
249+ < i class ="fab fa-github "> </ i >
250+ </ a >
251+ </ div >
252+ < div class ="columns is-centered ">
253+ < div class ="column is-8 ">
254+ < div class ="content ">
255+ < p >
256+ This template is based on the <a href="https://nerfies.github.io/">Nerfies</a> project
257+ page.
258+ The source code is available < a href ="https://github.com/nerfies/nerfies.github.io "> here</ a >
259+ and is
260+ licensed under a < a rel ="license "
261+ href ="http://creativecommons.org/licenses/by-sa/4.0/ "> Creative
262+ Commons Attribution-ShareAlike 4.0 International License</ a > . I also make heavy use of
263+ the
264+ < a href ="https://github.com/russellsamora/scrollama "> Scrollama.js</ a > package. Please
265+ remember
266+ to cite either the <a href="https://nerfies.github.io/">Nerfies</a> website or
267+ < a href ="https://github.com/trishullab/FormulaCode-web "> this website</ a > if you use this
268+ template!
269+ </ p >
270+ </ div >
271+ </ div >
272+ </ div >
273+ </ div >
274+ </ footer >
275+
276+ < script src ="./static/css/d3.min.js "> </ script >
277+ < script src ="./static/scrollama.js "> </ script >
278+ < script src ="./static/js/scrollytelling.js "> </ script >
279+ < script >
280+ // Init scrollable sections.
281+ mobileCorrections ( ) ;
282+ // init("#scientific-discovery");
283+ // init("#cbd");
284+ // init("#FormulaCode-iterations-loop");
285+ // init("#FormulaCode-results");
286+ </ script >
287+ </ body >
288+
289+ </ html >