ripopt was developed over about six weeks using Claude Code with a Max subscription ($100/$200 per month). I mostly used Opus 4.6 with a 200K context window. About a month in, a 1M token context window was released. Initially this seemed exciting, but I am not sure it improved anything. It is hard to tell, because this coincided with the project hitting some challenging problems that slowed down development. It is a telling reminder that we do not control what the LLM companies do, and they can change things underneath us at any time.
I used planning mode pretty extensively to break the project into documented phases, and frequently used teams of agents to work in parallel. In this mode I would instruct Claude to read the literature, sometimes review open-source codebases, and create a detailed plan to be executed. I allowed for some stochasticity in this process, and did not require a faithful translation of anything. In places Claude suggested new ideas that were improvements, although in talking to people I have learned they are nonstandard and may make other things more difficult.
Making this was hard work, and a lot of it (maybe 60-70 hours over the 6-7 weeks). Not the same kind of hard work as learning the methods and writing the code myself, but hard work to supervise, guide, and decide whether an approach was working. The first few weeks were very productive and got to a working prototype pretty fast. Then I hit a plateau that didn’t advance until I found some new problems to work on that exposed bugs and gaps in the code logic. Solving these led to a new plateau that I stayed on until I got yet another set of problems to work on. These plateaus were basically development stages where fixing one issue created new ones somewhere else, or where Claude would try to fix things by changing one thing at a time, which would not work. It appeared Claude was just trying things and not “reasoning” about the real problem.
I also used benchmarks and testing heavily to make sure we made progress. I started with the HS test suite, which is small and fast, and later added the CUTEst benchmark, which is much slower to run. At times I observed that Claude had sort of “fit to the benchmark” and was modifying tests to be more lenient so progress looked better. There were also phases where Claude would make changes specific to individual problems that did not generalize, or used complex logic that was not robust. This is something you have to watch for, and develop a nose for, so you can catch and stop it.
Early on I got very good performance on small problems with dense linear algebra. It was substantially more difficult to get good performance with sparse linear algebra. I partially implemented a solution inspired by MUMPS. This still needs a lot of work, but developing it inside ripopt is slow, and I need to learn more about these solvers to make progress. I think this is the main reason for the performance plateau.
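The dense-vs-sparse tradeoff is easy to see even outside the solver. ripopt itself is written in Rust; the sketch below is a Python illustration only, using a random diagonally dominant matrix as a stand-in (it is not an actual KKT system from ripopt). Both routes give the same answer, but the dense solve has to touch all n² entries, which is what becomes prohibitive at scale:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Random sparse symmetric system as a stand-in for a solver's linear system.
# Made diagonally dominant so it is nonsingular and well conditioned.
n = 2000
rng = np.random.default_rng(0)
A = sp.random(n, n, density=0.002, random_state=rng, format="csr")
K = (A + A.T + sp.eye(n) * n).tocsc()
b = rng.standard_normal(n)

# Sparse LU factorization (what MUMPS-style solvers do, with far more
# sophistication: ordering, pivoting, parallelism, out-of-core, etc.)
x_sparse = spla.splu(K).solve(b)

# Dense solve: fine for small n, but materializes and works on all n^2 entries.
x_dense = np.linalg.solve(K.toarray(), b)

# Same solution either way; only the cost differs.
assert np.allclose(x_sparse, x_dense)
```

The hard part is not calling a factorization; it is everything the comment glosses over (fill-reducing orderings, pivoting for indefinite systems, and so on), which is where the MUMPS-inspired work comes in.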
I used git branches and worktrees (via Claude), which allowed some parallel development. I occasionally used Claude Code web to develop on a remote branch. Mostly I developed on my local machine.
I didn’t do a great job documenting the process. There is a git history of the commits, but this doesn’t capture the planning, the decisions, the failures, or the successes. I wish I had done a better job of this, but it was hard to do while also trying to make progress on the code. I am not sure there is a great way to do this. The development took place over many sessions, and some of those sessions would compact the conversation, or I would clear the context to start on new tasks.
This wasn’t all dedicated work. I frequently did this in parallel with other activities. A common pattern is you ask Claude to do something, then wait for it to finish, which can take minutes or longer, before you tell it what to do next. It was common to have 3-6 other development projects going at the same time, or to be working on something else while this ran in the background. It takes some skill to do a reasonable job at this. Some things were dedicated efforts, primarily the planning work that required frequent iterations and deep thinking about the directions to choose.
After the initial plans were implemented I had a basic implementation that solved some problems. There isn’t really a debugging cycle to speak of; Claude automatically fixes bugs as you run tests. To continue making progress, I would ask Claude to identify benchmark problems that ipopt solved but that ripopt either solved much more slowly or failed to solve. Then I would ask Claude to diagnose the problems and propose fixes. I would review these suggestions and choose the directions to implement. This was an iterative process that sometimes took hours or days to reach solutions.
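That triage step amounts to a simple comparison over per-problem results. The sketch below is illustrative only: the inline CSV data, the column names, and the 10x slowdown threshold are my assumptions, not the actual scripts or numbers from the project (the problem names are real HS/CUTEst problems, but the timings are invented):

```python
import csv
import io

# Hypothetical per-problem results; real runs would come from benchmark logs.
ipopt_csv = """problem,status,seconds
HS071,solved,0.01
LUKVLE1,solved,2.0
DTOC2,solved,5.0
"""
ripopt_csv = """problem,status,seconds
HS071,solved,0.02
LUKVLE1,failed,10.0
DTOC2,solved,80.0
"""

def load(text):
    """Index rows by problem name."""
    return {row["problem"]: row for row in csv.DictReader(io.StringIO(text))}

ipopt, ripopt = load(ipopt_csv), load(ripopt_csv)

# Flag problems ipopt solves but ripopt fails on, or is >10x slower on.
worklist = []
for name, ref in ipopt.items():
    ours = ripopt.get(name)
    if ref["status"] != "solved" or ours is None:
        continue
    if ours["status"] != "solved":
        worklist.append((name, "failed"))
    elif float(ours["seconds"]) > 10 * float(ref["seconds"]):
        worklist.append((name, "slow"))

print(worklist)  # [('LUKVLE1', 'failed'), ('DTOC2', 'slow')]
```

Each flagged problem then becomes a prompt for Claude to diagnose: why did this one fail, or where is the time going?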
For small problems there are compelling reasons to be excited about ripopt. For larger problems, there is still a lot of work to do to make it competitive.
The 200K token window gets filled pretty fast, and then things tend to degrade. Every session basically starts from scratch. I managed long-term memory by writing planning files (basically telling Claude to do this) that would get read at the beginning of each session. This was pretty effective.
In other sessions, I would task Claude with reviewing the code to identify the missing pieces or functionality to make it competitive with ipopt. There were lots of sessions like this, with interactive back and forth to identify the implementation path.
The process was not that automatic. I still don’t use --dangerously-skip-permissions; that is just too scary. I occasionally found it helpful to intervene and redirect mid-session. My Claude environment is pretty simple, but I have defined a lot of SKILL files that help with brainstorming, planning, etc. I doubt I could reproduce this project exactly. I am not the same person I was 6 weeks ago, though, and the environment is not the same. This has been true of many software projects in the past.
I think there is some value in this project. First, I learned a lot about creating a project like this: what the limitations are, the timescales, the challenges, etc. Second, there may be code in the repo that smells bad to some, but it is code that works, passes tests, and performs pretty well on benchmarks. In the end we don’t know how all the code we run was written; we mostly care that it works.
I did a lot of work introspecting the code with teams of agents that performed red-team reviews, set up as Rust experts, optimization experts, code reviewers, and educators. Some of these teams were focused on code quality, e.g. eliminating code smells, duplicate code, and dead code, and making performance enhancements. I created examples and tutorials, and benchmarked the code against ipopt for comparison. For these things, I know it works. I know it doesn’t work well on some medium to large problems, and there is a path to improve this that relies on me learning some new things.
For a while I was running tests on GitHub Actions, but I accidentally used up 3000 minutes of free CI across a set of projects I was working on, and disabled it. I run the tests locally a lot, and Claude runs them even more often than I’d prefer.
Originally I wanted this to be MIT licensed. However, the initial plan was to base it heavily on the Ipopt codebase, and I instructed Claude to review that codebase, which is EPL licensed. Some research I did suggested this would be considered a derivative work, and that this would require the code to be EPL licensed. I decided an EPL license would be appropriate in this scenario. Similarly, MUMPS is CeCILL-C licensed, and I used that license in rmumps for the same reason.
This is a hot topic right now (https://simonwillison.net/2026/Mar/5/chardet/). It is a murky area that likely can only be resolved in the courts. It is difficult to defend a “clean room” approach to development here, but when code is written in a new language, with new features, it is not hard to argue that it is different.
Maybe. It is not as mature as established solvers, but if enough people use it and provide feedback, maybe it will be (or some future version of it might be). The timeline to get here was only 6 weeks, and I think with 20/20 hindsight an even better version could be made in another 6 weeks. That makes it easy to try new ideas quickly. Don’t like the interface? Write a new one. Don’t like the solver? Write a new one (this is pretty hard, but less hard than writing one from scratch).
My biggest concern is probably copyright infringement. Although the code is inspired by open-source projects, and licensed consistently with them, I don’t know what other codebases could have been in the training data for Claude, or accessed via web search during development. I have taken some care to include references to the literature we drew inspiration from, but this may not be comprehensive.
I don’t know if I should publish something about the project, maybe something on arxiv at some point. It is something I don’t mind talking about, but maybe it is not yet at a stage where it is appropriate to publish. Certainly I think there is another stage necessary to get this to be competitive with existing, trusted solvers.
I only trust the performance on benchmarks I have run. New problems almost always expose new bugs or performance issues. These are opportunities to learn and improve, not something to be afraid of in my opinion.
I don’t know if the optimization community will take this seriously. I am an outsider, and this is a very different way to develop software. I hope there will be synergy at some point, where people with much more expertise help me implement new ideas.
I have not read all the generated code (I don’t even really know how to write Rust, but it is readable). It was generated faster than I could read or understand it. That is a feature of LLMs, and I rely heavily on testing and benchmarking to identify problems and verify features. The development process is slowing down, so now there is more time to go back and review it. The six-week development pace was not that sustainable; I still have a day job that isn’t this!
I am mildly concerned about the changes in LLM pricing / access. I use the subscription model, but I think I cost Anthropic much more than the monthly fee. This access is subsidized by other users somehow. If I had to pay the true cost of tokens, which I guess would be in the $500-2000/month range, this would be harder to sustain. It is not that easy to get money for software development, so it would have to be bootstrapped by other work.
I am not too worried about this. In other projects, if there are issues reported on GitHub, I can get Claude to review them, propose solutions, and implement and test fixes. As long as I have access to LLMs, I think this will be manageable. Without it, like other projects I have had, issues are likely to accumulate unless I find a way to spend more time on them.
ripopt currently lacks community. I have been the only developer and have only recently shared it. It is part of a broader ecosystem I am developing, and as long as that ecosystem is active, I will have motivation to maintain and continue improving it. It is hard work to develop a community to maintain momentum on these projects, and if one doesn’t nucleate around it, it is likely to languish when I inevitably get busy with other things. I hope this is just the beginning, but life has a way of being unpredictable. For example, just three months ago I would not have predicted I would even try making this!
I have written a lot of software in my life. This was very different. It was more creative than it might seem; I had to conceive the idea, steer it, think through the best path, etc. I could not have written this myself. I really think the result of this work is better than anything I could have done without Claude Code. The code is not perfect, but it is well documented, tested, and does well on benchmarks. I will learn from this, and continue on this path while LLMs are available.
Is it my code? I am not sure. It is not even clear if the code can be copyrighted (the Supreme Court recently declined to overrule a lower court decision that AI generated art cannot be copyrighted). The effort to make this was sufficiently low that I don’t think the original goals of copyright even apply.
I certainly think the last 25 years I spent writing code, learning to debug, understanding algorithms, benchmarking, etc., were essential to making this possible. The last year of using LLMs for a very broad range of activities (YouTube videos, fiction, a variety of other software projects, etc.) was also crucial in preparing me to do this. I continue to push the boundaries of what I can do with LLMs, and I think this is a very exciting time to be doing this. I am not sure how long this will last, but I am going to enjoy it while it does.