AI Model Licensing: Legal Rules for Open-Source Attribution

AI model licensing determines who can use, modify, and distribute the neural network weights behind today's most powerful AI systems. As open-weight models from organizations like DeepSeek, Moonshot AI, and Meta become the backbone of commercial products, the legal rules around attribution have moved from a niche developer concern to a boardroom issue.
In March 2026, two major incidents put AI licensing in the spotlight. Cursor (developed by Anysphere, valued at $29.3 billion) shipped its Composer 2 feature using Moonshot AI's Kimi K2.5 model without crediting the source. Days later, Rakuten launched "Rakuten AI 3.0" built on DeepSeek V3 after deleting the original MIT license file from the codebase.
These cases illustrate a growing pattern. Companies treat open-weight models as raw material, strip the license notices, and present the result as proprietary technology. The legal consequences of this practice are becoming clearer with each new court ruling.
What Does "Open-Weight" Actually Mean?
The term "open-source AI" is widely used but often misleading. Traditional open-source software licenses (like the GNU GPL or MIT License) were written for source code. AI model weights are not source code. They are mathematical parameters learned during training.
When a company releases model weights under an open license, it grants permission to use, modify, and redistribute those weights. But that permission comes with conditions. The specific conditions depend on the license chosen.
Most "open-weight" releases use one of four license types: standard MIT, Apache 2.0, GPL/LGPL, or a custom modified license. Each carries different obligations for anyone who builds on top of the model.
Understanding these differences is not optional for companies shipping AI-powered products. Getting it wrong can result in lawsuits, forced code releases, or injunctions that shut down a product entirely.
Common Open-Source License Types and Their Requirements
The table below compares the four most common license types used for AI model releases. Pay close attention to the attribution and distribution requirements, as these are where most violations occur.
| License Type | Attribution Required? | Must State Changes? | Copyleft? | Commercial Use? | Key Obligation |
|---|---|---|---|---|---|
| MIT | Yes, retain copyright notice | No | No | Yes | Include the original copyright notice and license text in all copies or substantial portions |
| Apache 2.0 | Yes, retain notice + NOTICE file | Yes, must state changes | No | Yes | Preserve NOTICE file; include license text; state modifications; patent grant included |
| GPL/LGPL | Yes | Yes | Yes, derivative works must use same license | Yes, but derivative works must also be GPL | Release source code of derivative works under the same GPL license |
| Modified MIT (e.g., Kimi K2.5) | Yes, plus additional display rules | Varies | No | Yes, with conditions | Standard MIT requirements plus "prominent display" of model name above revenue/user thresholds |
MIT License: Simple but Binding
The MIT License is the most permissive license in wide use. DeepSeek released its V3 model under standard MIT terms. The core obligation is straightforward: include the original copyright notice and license text in any copy or substantial portion of the software.
"Substantial portion" is the key phrase. When a company fine-tunes DeepSeek V3 and ships the resulting model in a commercial product, the fine-tuned weights are derived from the original. The MIT license notice must travel with them.
Deleting the license file, as Rakuten did with DeepSeek V3 in March 2026, directly violates this requirement. The MIT License is only 170 words long. Compliance takes seconds. Violation invites litigation.
Apache 2.0: More Structure, More Protection
Apache 2.0 adds several requirements beyond MIT. Users must preserve any NOTICE file included with the original distribution. They must state what changes they made. And the license includes an explicit patent grant, which protects downstream users from patent claims by the original developer.
For AI models, the "state changes" requirement is significant. If a company fine-tunes an Apache 2.0 model, it must document that fine-tuning occurred. Simply shipping the model as if it were built from scratch violates the license.
GPL/LGPL: The Copyleft Challenge
GPL and LGPL licenses require that derivative works be released under the same license terms. This creates a "viral" effect. If a company builds a proprietary product on a GPL-licensed model, the entire derivative work may need to be open-sourced.
Few commercial AI models use GPL precisely because of this requirement. But some research models and tools do. Companies must check carefully before incorporating any GPL-licensed components into their AI stack.
Modified MIT: Custom Commercial Terms
Some organizations add custom clauses on top of standard MIT terms. Moonshot AI's Kimi K2.5 uses a Modified MIT License with a notable addition: products that exceed 100 million monthly active users or $20 million in monthly revenue must "prominently display" the Kimi model name.
This type of threshold-based requirement creates a compliance trap. A startup might launch with a Kimi-based product, grow past the threshold, and suddenly find itself in violation. The obligation existed from day one in the license text, but only becomes practically relevant at scale.
Recent AI Licensing Incidents
Several high-profile cases in 2025 and 2026 have tested the boundaries of AI model licensing. These incidents show that license violations are not theoretical risks. They are happening now, at the highest levels of the industry.
| Incident | Date | Details | Status |
|---|---|---|---|
| Cursor (Anysphere) / Kimi K2.5 | March 2026 | Used Kimi K2.5 as base model for Composer 2 feature without attribution. Anysphere valued at $29.3 billion. Competitor Windsurf acknowledged its own fine-tuning of GLM-4.6. | Under scrutiny; no lawsuit filed as of March 2026 |
| Rakuten / DeepSeek V3 | March 2026 | Launched "Rakuten AI 3.0" built on DeepSeek V3 after deleting the MIT license file from the repository. | Public backlash; compliance status unclear |
| Doe v. GitHub (Copilot) | Filed November 2022 | Class action alleging GitHub Copilot reproduces open-source code without required license notices. Two claims survived dismissal: license violation and breach of contract. | Ongoing as of March 2026 |
| Six Japanese AI Models / DeepSeek & Qwen | 2025-2026 | Reporting revealed six of ten major Japanese AI models are based on DeepSeek or Qwen architectures, raising questions about attribution practices across an entire national AI ecosystem. | Industry-wide review |
The Cursor/Kimi K2.5 Incident
Cursor, the AI-powered code editor developed by Anysphere, is one of the most prominent cases. Independent researchers identified that Cursor's Composer 2 feature was built on Moonshot AI's Kimi K2.5 model. Anysphere did not disclose this.
The Kimi K2.5 Modified MIT License requires prominent display of the model name for products above certain revenue and user thresholds. Given Cursor's rapid growth and Anysphere's $29.3 billion valuation, the product likely exceeds both thresholds.
For a deeper analysis of this specific incident, see "What Model Is Cursor 2.0?".
Notably, Cursor's competitor Windsurf took a different approach. Windsurf publicly acknowledged that it fine-tuned the GLM-4.6 model, providing the transparency that open-source licenses require.
The Rakuten/DeepSeek Incident
Rakuten's case is more straightforward. The company used DeepSeek V3 (MIT-licensed) as the foundation for its "Rakuten AI 3.0" product. Instead of retaining the MIT license file, Rakuten deleted it entirely.
The MIT License has one core requirement: keep the copyright notice and license text. Rakuten's deletion was a direct, unambiguous violation. This case demonstrates that even large, sophisticated companies with legal teams sometimes treat AI model licenses as optional.
Are AI Model Weights Copyrightable?
The legal status of AI model weights is one of the most important unresolved questions in technology law. If weights are not copyrightable, then license terms governing their use may be harder to enforce. If they are copyrightable, open-source AI licenses carry the full force of copyright law.
The U.S. Copyright Office weighed in on this question in May 2025. In a formal report, the Office stated that the copyrightability of model weights is "highly disputed" but found a "strong argument" that weights can implicate reproduction and derivative-work rights under existing copyright law.
The Office stopped short of a definitive ruling. But the direction is clear. Most legal practitioners and organizations now operate on the assumption that model weights are copyrightable. This assumption strengthens the enforceability of every open-source AI license.
If a future court definitively rules that weights are copyrightable, companies that violated open-source license terms could face statutory copyright damages. Those damages can reach $150,000 per work for willful infringement under 17 U.S.C. Section 504.
Courts Are Treating Licenses as Contracts
A critical legal development is the trend toward treating open-source licenses as enforceable contracts. This matters because contract claims give license holders additional legal tools beyond copyright infringement.
In the ongoing Doe v. GitHub case (filed November 2022), two claims survived the motion to dismiss: a license violation claim and a breach of contract claim. The court's willingness to let the contract claim proceed signals that open-source licenses create binding legal obligations, not just permissions that can be ignored without consequence.
In Bartz v. Anthropic, Judge William Alsup of the Northern District of California described AI training as "transformative, spectacularly so." Despite this finding, Anthropic settled the case for $1.5 billion, suggesting that even companies with strong legal arguments prefer certainty to the risks of litigation.
In Kadrey v. Meta, the court found that Meta's use of copyrighted works for AI training qualified as fair use. But in Thomson Reuters v. ROSS Intelligence, the court reached the opposite conclusion. The inconsistency across cases means no company can rely on fair use as a guaranteed defense.
These cases collectively establish that AI companies must take license terms seriously. The legal environment is unpredictable, and the safest path is compliance.
Copyright Laundering: A Growing Concern
Developer Jamie Tanna coined the term "copyright laundering" to describe a specific pattern in AI development. The process works like this: an AI system ingests code or model weights released under copyleft licenses (like GPL), strips the provenance information during training or fine-tuning, and produces output that appears to be unencumbered by any license.
This pattern is at the heart of the Doe v. GitHub lawsuit. The plaintiffs allege that GitHub Copilot takes code released under licenses that require attribution, processes it through a neural network, and outputs code snippets without the required license notices.
The concept extends directly to AI model weights. When a company takes a GPL-licensed or attribution-required model, fine-tunes it, and ships the result without any license notices, it is performing the same stripping of provenance. The original license obligations do not disappear just because the weights passed through additional training.
Courts have not yet ruled definitively on whether fine-tuning creates a "derivative work" under copyright law. But the trend in case law suggests that courts will look at the substance of the relationship, not just the technical transformation.
How to Stay Compliant
Companies using open-weight AI models in commercial products should follow these practices to avoid legal exposure.
Read the License Before You Ship
This sounds obvious, but the Rakuten and Cursor incidents prove it is not happening consistently. Before incorporating any open-weight model into a product, read the full license text. Pay attention to attribution requirements, modification disclosure rules, and any threshold-based obligations.
Maintain a License Inventory
Track every open-source AI model and component used in your product. For each entry, record the license type, the specific attribution requirements, and whether your product currently complies. Update this inventory whenever you add or change AI model dependencies.
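A license inventory can be as simple as one structured record per dependency. The sketch below is a minimal, hypothetical illustration of that idea; the model name, obligation strings, and field names are placeholders for whatever your actual licenses require, not part of any real compliance tool:

```python
from dataclasses import dataclass, field

@dataclass
class ModelDependency:
    """One open-weight model or component used in the product."""
    name: str
    license_type: str                 # e.g. "MIT", "Apache-2.0", "Modified-MIT"
    attribution_requirements: list    # obligations copied from the license text
    obligations_met: dict = field(default_factory=dict)  # requirement -> bool

    def is_compliant(self) -> bool:
        # Compliant only if every recorded requirement is marked satisfied.
        return all(self.obligations_met.get(req, False)
                   for req in self.attribution_requirements)

# Hypothetical inventory entry; the obligations listed are illustrative.
dep = ModelDependency(
    name="DeepSeek V3",
    license_type="MIT",
    attribution_requirements=["retain copyright notice", "include license text"],
    obligations_met={"retain copyright notice": True,
                     "include license text": True},
)
print(dep.is_compliant())
```

Reviewing this inventory before each release turns "did we keep the notices?" from a guess into a checklist.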
Preserve All License Files
Never delete, rename, or relocate license files from open-source AI model distributions. Include them in your product's distribution, whether that is a software package, a container image, or a cloud deployment.
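A pre-release check can catch a missing license file before it ships. The function below is a minimal sketch, assuming your distributions keep `LICENSE` and `NOTICE` at the root; it only confirms the files exist, not that their contents are intact, so it supplements rather than replaces legal review:

```python
from pathlib import Path

def missing_license_files(dist_root: str,
                          expected: tuple = ("LICENSE", "NOTICE")) -> list:
    """Return the expected license files absent from a distribution directory."""
    root = Path(dist_root)
    return [name for name in expected if not (root / name).is_file()]

# Wire this into CI so a build fails when the returned list is non-empty.
```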
Document Your Modifications
If you fine-tune, quantize, prune, or otherwise modify an open-weight model, document what you did. Apache 2.0 explicitly requires this. Other licenses may not, but documentation protects you in any future dispute.
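Apache 2.0 requires modified files to carry prominent notices stating that changes were made. A hypothetical changes statement shipped alongside a fine-tuned model might look like the following (the base model name, dates, and modification details are illustrative, not drawn from any real release):

```text
MODIFICATIONS
This distribution contains model weights derived from ExampleBase-7B
(Apache License 2.0; the original NOTICE file is retained at /NOTICE).
Changes from the original release:
  - Fine-tuned on an internal instruction-following dataset (2026-01).
  - Quantized from FP16 to INT8 for deployment.
```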
Check for Copyleft Contamination
If any component in your AI stack uses a GPL or LGPL license, consult legal counsel before shipping. Copyleft obligations can extend to your entire derivative work, potentially requiring you to open-source your proprietary modifications.
Monitor Threshold Triggers
For licenses with commercial thresholds (like Kimi K2.5's Modified MIT), set up internal alerts when your product approaches the relevant user or revenue milestones. Do not wait until you are past the threshold to start complying.
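A threshold check like this can feed those internal alerts. The sketch below uses the 100 million MAU and $20 million monthly revenue figures described above for Kimi K2.5's Modified MIT License; the 80% warning margin is an arbitrary internal policy choice, not a license term:

```python
def check_display_thresholds(monthly_active_users: int,
                             monthly_revenue_usd: float,
                             warn_margin: float = 0.8) -> str:
    """Classify a product against the Kimi K2.5 prominent-display thresholds."""
    MAU_LIMIT = 100_000_000       # 100M monthly active users
    REVENUE_LIMIT = 20_000_000    # $20M monthly revenue
    if monthly_active_users >= MAU_LIMIT or monthly_revenue_usd >= REVENUE_LIMIT:
        return "OVER_THRESHOLD"   # the display obligation applies now
    if (monthly_active_users >= warn_margin * MAU_LIMIT
            or monthly_revenue_usd >= warn_margin * REVENUE_LIMIT):
        return "APPROACHING"      # start compliance work before crossing
    return "BELOW"
```

Running this on each month's metrics gives the compliance team lead time instead of a surprise.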
What Happens Next
The legal landscape for AI model licensing is evolving rapidly. Several developments will shape the next 12 to 24 months.
The Doe v. GitHub case is the most important pending litigation. If the court rules that AI systems must preserve open-source license notices when generating code, the precedent will apply broadly to AI model weight licensing as well.
The U.S. Copyright Office may issue further guidance on the copyrightability of model weights. A definitive statement would resolve the current ambiguity and either strengthen or weaken the enforceability of open-source AI licenses.
Legislatures in the United States and the European Union are considering AI-specific regulations that may include licensing and attribution requirements. The EU AI Act already imposes transparency obligations on certain AI systems, and future amendments could address open-source model attribution directly.
For now, the safest course is to treat every open-source AI license as a binding legal contract. Read it, follow it, and document your compliance. The companies that do this will avoid the legal and reputational consequences that Cursor, Rakuten, and others are now facing.
Sources and References
- 17 U.S.C. Section 504 - Remedies for Infringement (copyright.gov)
- U.S. Copyright Office - Copyright and Artificial Intelligence Report (2025) (copyright.gov)
- MIT License Full Text (opensource.org)
- Apache License 2.0 Full Text (apache.org)
- GNU General Public License v3.0 (gnu.org)
- EU AI Act Official Text (artificialintelligenceact.eu)
- Doe v. GitHub Copilot Litigation - Court Records (courtlistener.com)
- DeepSeek V3 MIT License (github.com)