Project-wide LLM policy #3959
* Neurodivergent authors tend to replicate the "terseness" of many LLMs, and often show up as false positives in LLM detection
* Kenyan authors, many of whom helped filter the data for LLMs, often show up as false positives in LLM detection
FWIW, while I've tried to cite everything across the RFC, this section was added after the first draft and I knew both of these had citations, but had a lot of trouble finding explicit articles about it. So, if you happen to find sources for these, I'd be happy to update the RFC to include them.
When reading that section, I thought back to this article: https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt.
(To be clear, I do think this blog post is sufficient evidence for the second point, although I'm mostly leaving this open to remind myself to look for other sources too.)
1. If the LLM usage is *trivial*, it is completely ignored by the policy and always allowed. Generally, this means that changes made by LLMs are indistinguishable from those made by humans, where the LLM didn't have any creative input into the change.
2. If the LLM usage is *slop*, it is considered spam and moderated accordingly. Generally, this means submitting changes made by LLMs with minimal human intervention.
3. *Nontrivial* LLM usage must be *disclosed*, ideally in as detailed a manner as possible. This may necessitate additional tooling to notify new contributors about the policy and explain how disclosure works.
In PRs I've submitted to rust and cargo, I include a single sentence at the end, usually of the form:
> No AI tooling of any kind was used during the creation of this PR.
I think including an AI disclosure prompt/template in the GitHub PR template is a simple way to cover off disclosure while allowing the author the freedom to go into as much or as little detail as they feel is appropriate.
Yeah, having a section on a PR template just to fill in LLM disclosure is a good idea, and that's one of the multiple options that could be used for tooling.
This is actually the one difference between the summary for the RFC and the one proposed for inclusion in the actual policy: for the RFC, the tooling is an important point since it doesn't exist yet, but for the final policy, the goal is to already have that.
I mostly leave the details out of the RFC since they're almost certainly going to be done via experimentation, but this is a pretty easy one to start with.
Co-authored-by: Arhan Chaudhary <arhan.ch@gmail.com>
This change is big enough to be worth noting in a separate thread: in short, non-trivial usage is now disallowed. Disclosure now acts as a way for contributors to justify that LLM usage did not affect the output, i.e., was trivial.
The main motivation for this was the recent news that the Colossus data centre in Memphis, explicitly cited by this policy as an outlier because the industry has not been so overtly evil, now powers Claude Code, the model that basically everyone on GitHub is using. Now, it's clear that most LLM users on GitHub are directly doing harm harder and faster than before.
Sure, there are other models, but I doubt many people will switch. The arguments laid out here do explicitly support a more restrictive model regardless of this news, and this is only additional justification for going all-in.
It's impossible to look at all the things the AI industry does in a good light. The only possible way is to argue that the ends justify the means, and even then, I would say that anyone who isn't a climate denialist and a racist would at least consider switching away from Claude Code toward a different model.
Overall, the reception of this RFC has been overwhelmingly positive, and I'd like to thank the numerous people who've personally reached out to thank me for what I've done so far. You're the folks who make this worth it.
There may still be a few parts that feel weird to read in light of these revisions. As always, please feel free to point them out on the diff.
Co-authored-by: +merlan #flirora <flirora@flirora.xyz>
GitHub is being weird. I already merged these changes, so, it's weird it's showing them as unaddressed.
Whatever it means to you to hear this from the outside: I sympathize greatly with your frustrations here and I appreciate your work more than you can imagine. Please take care, please feel better.
Thank you for all the work you've done. I also really appreciate it.
Take a break if you need to, take care of yourself. If you wish to vent, feel free to reach out.
Wishing you the best.
So, it appears that somehow the root of this thread got hidden because of a direct report to GitHub, which is extremely concerning. For now, you can read it on rustbot's view, since it appears that GitHub implemented this (likely new) feature very poorly. But note that not even the moderators can unhide this comment, and I'm not even sure org admins can either. Will have to investigate, but, since Rust Week is this week, might take a while.
So I understand that my voice does not have very much weight here, since I have only made fairly minor contributions to rust. I do not currently intend to make any future contributions due to the project's external appearance of support of LLMs. I am very strongly in support of this policy, especially now that it has been changed to ban non-trivial LLM use entirely. If it were accepted I would consider contributing to the project again. This is both because of the requirements it would impose but also, more importantly, because it takes the extensive ethical issues seriously rather than pretending that they do not exist or are somehow irrelevant to the practice of software-making. Seeing this gives me hope.
I also have been thinking of contributing for a while, and having a policy like this would be very encouraging for me. I want to contribute to projects that care about ethical issues.
As another contributor who hasn't done that much¹, I am in the same boat.
¹ a couple refactors towards const generics during libs blitz, some std API improvements / additions
thirded, I find pro LLM stances extremely offputting, have made minor contributions to Rust, rust-analyzer etc. in the past, and am absolutely not motivated to further engage with an ecosystem that is welcoming towards this technology.
Also just clarifying, I explicitly mentioned I appreciated this being posted despite it potentially falling under:
- Simply stating your viewpoint on LLMs, even if you provide reasons. While these arguments can be useful for the RFC, they are better worded as explicit suggestions to specific areas of the RFC, rather than as just general comments.
I think that pointing out the sheer number of people who have been outright turned away from this project because of its inability to reject LLMs and their harm is important. So, as long as people keep their comments on this specific point confined to this thread, I think it's fine.
I'm in much the same position. I won't contribute until a policy at least as strict as this one is adopted, and honestly even then, I'd still be hesitant to so long as pro-LLM voices are in positions of power in the project.
I'd like to voice a similar sentiment: I started learning Rust seven(?)-ish months ago, and I've rapidly fallen in love. while I haven't contributed yet, I became a core contributor to/maintainer of the last language I fell in love with, and have already contemplated contributing to Rust once I have a bit more experience reading and writing it. if this policy or another that is similarly strict against the use of LLMs is adopted, and I otherwise feel safe in the community, there is a good chance I will contribute. if Rust adopts a policy that is permissive of LLMs, I assuredly will not contribute
I haven't contributed to the language myself, but Rust is a language that I really like. It's been my go-to language for personal projects for the last 6 years, it's a language I advocate for at my workplace. And seeing rust-lang/rust-forge@14956c3 made me rather disappointed.
Anyway, just wanted to say thanks to those of you who have contributed and are taking a stand right now. You are appreciated. ❤️
Procedural update: I updated the summary to better match the updated policy, and made some of the small changes I thought were easy to make for now.
Right now, all outstanding requests have been resolved, although so far, most have involved the motivation sections. I would love to see more feedback on the policy specifically, and a clarification of what the folks on the LC would like to see changed before its adoption.
I continue to receive a lot of feedback that this RFC is extremely long, and well, I agree. You're more than welcome to skip the motivation sections if you're already convinced. But from my understanding, many people are not convinced by default, and thus, I have decided to provide overwhelming evidence to support the policy.
If major stakeholders in the project, particularly those who do not or did not think that an LLM ban would be well supported, would like to comment on either:
- if the RFC did convince you to adopt this policy, what was the most convincing motivation?
- if the RFC did not convince you, why not?
I would appreciate it. As it stands, the main reason I've been hesitant to trim anything here is that a majority of the sections have had at least one person reach out to say that a point they wanted to make was addressed by those sections. The only one that doesn't make this cut is the "Limits of LLMs" section which, well, I've already said I'm thinking about trimming, but haven't yet because it's going to require a bit more work to make sure side points are moved to the appropriate other sections.
I’ve been favoring a restrictive policy for reputational and possible legal reasons, but the part of the motivation that most convinced me was about the Colossus data center showing how the environmental effects of AI disproportionately harm minorities.
Preface
A lot of discussion has occurred in private about the topic of LLM policy, and while some of that context has been included in the prior art, most of it is intentionally omitted here.
To keep things focused on policy, there are two broad categories of comments we'd like to request you avoid:
In general, defer to the code of conduct.
This RFC is long, and most of it consists of the sections surrounding and justifying the actual policy, rather than the policy itself. You are both welcome and encouraged to skip around using the outline feature on GitHub. (In the rendered view, this is the bulleted list button on the top-right of the file view.)
While I have received a lot of feedback that the RFC is perhaps too long, a lot of the sections that might be considered for removal constitute important arguments that have been mentioned. In terms of discussion on length alone, I would appreciate it if these suggestions were directed at simplifying the text of the policy itself, rather than just removing sections of motivation. Similarly, if you feel a particular argument could be expanded or added, feel free to mention that as well.
In general, I know that this RFC is going to be exhausting, and the discussion before it has been too. Removing details is not really going to help, although I'd be happy to accept any feedback on revising the text of these sections to be shorter without losing any meaning.
Important
Since RFCs involve many conversations at once that can be difficult to follow, please use review comment threads on the text changes instead of direct comments on the RFC.
If you don't have a particular section of the RFC to comment on, you can click on the "Comment on this file" button on the top-right corner of the diff, to the right of the "Viewed" checkbox. This will create a separate thread even if others have commented on the file too.
Existing policies
Right now, this policy is also proposed concurrently with multiple policies which are scoped to specific teams/repos:
- rust-lang/rust repo policy: Add an LLM policy for rust-lang/rust (rust-forge#1040)

This RFC intentionally does not supersede any scoped policies; those policies are free to be merged before this one. In fact, even after an RFC is accepted, they can still be merged, since updating the policies everywhere takes time, and getting a policy out immediately is still a net benefit.
Once an RFC is accepted, things can be adjusted for consistency.
Summary
This RFC proposes a strict policy regarding generative Artificial Intelligence (AI) models, specifically Large Language Models (LLMs), and their use within the rust-lang organization. It aims to minimize the harm done by LLMs by reducing both the extent to which they're used and the control they're given over the Rust project. The policy can be summarized in the following checklist with terms that will be defined throughout the RFC:
In terms of additional tooling for disclosure, this RFC encourages the creation of a bot that automatically replies to contributions from new users, informing them of the LLM policy and what constitutes sufficient disclosure. As mentioned, in general, going into as much detail as possible (e.g. prompts used) is preferred, but not always required. The RFC leaves the exact details of such an implementation unspecified and up for revision later.