Robust licence normalisation with a three-level hierarchy for common licences.
licence-normaliser maps common licence representations (SPDX tokens,
URLs, prose descriptions) to a canonical three-level hierarchy.
- Three-level hierarchy - LicenceFamily → LicenceName → LicenceVersion.
- Wide format support - SPDX tokens, URLs, and prose descriptions for supported licences.
- Creative Commons support - Full CC family with versions and IGO variants.
- Publisher-specific licences - Springer, Nature, Elsevier, Wiley, ACS, and more.
- File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.
- Pluggable parsers - Drop in a new parser class to ingest
any external licence registry. Parsers implement plugin interfaces
(
RegistryPlugin,URLPlugin, etc.). - Strict mode - Raise
LicenceNotFoundErrorinstead of silently returning"unknown". - Caching - LRU caching for performance.
- CLI - Command-line interface with
--strictand--tracesupport.
The library uses a three-level hierarchy:
- LicenceFamily - broad bucket:
"cc","osi","copyleft","publisher-tdm", ... - LicenceName - version-free:
"cc-by","cc-by-nc-nd","mit","wiley-tdm" - LicenceVersion - fully resolved:
"cc-by-3.0","cc-by-nc-nd-4.0"
LicenceVersion also has optional jurisdiction (e.g., "uk",
"au") and scope (e.g., "igo") fields for CC licences.
With uv:
uv pip install licence-normaliserOr with pip:
pip install licence-normaliserfrom licence_normaliser import normalise_licence
v = normalise_licence("CC BY-NC-ND 4.0")
assert str(v) == "cc-by-nc-nd-4.0" # ← LicenceVersion
assert str(v.licence) == "cc-by-nc-nd" # ← LicenceName
assert str(v.licence.family) == "cc" # ← LicenceFamily
# With jurisdiction and scope
v = normalise_licence("http://creativecommons.org/licenses/by-nc/2.0/uk")
assert v.jurisdiction == "uk"
assert v.scope is None
v = normalise_licence("http://creativecommons.org/licenses/by-nc/3.0/igo")
assert v.jurisdiction is None
assert v.scope == "igo"By default, unresolvable inputs return an "unknown" result. Pass
strict=True to raise LicenceNotFoundError instead:
from licence_normaliser import normalise_licence
from licence_normaliser.exceptions import LicenceNotFoundError
# Silent fallback (default)
v = normalise_licence("some-unknown-string")
assert v.family.key == "unknown"
# Strict: raises on unresolvable input
try:
v = normalise_licence("some-unknown-string", strict=True)
except LicenceNotFoundError as exc:
print(exc.raw) # original input
print(exc.cleaned) # cleaned form that failed lookupSet ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get
resolution traces showing how the licence was matched:
from licence_normaliser import normalise_licence
# Via function
v = normalise_licence("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())
# Via class
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(trace=True)
v = ln.normalise_licence("MIT")
print(v.explain())Output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file + line matched:
Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
[✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)
Result:
version_key: 'cc-by-nc-nd-3.0-igo'
name_key: 'cc-by-nc-nd'
family_key: 'cc'
The trace can also be accessed via v._trace for programmatic use.
from licence_normaliser import normalise_licences
results = normalise_licences(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
print(r.key)
# Strict batch - raises on first unresolvable
results = normalise_licences(["MIT", "Apache-2.0"], strict=True)The LicenceNormaliser class lets you inject custom plugin classes for
specialised use cases:
from licence_normaliser import LicenceNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser
# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenceNormaliser(
registry=[SPDXParser],
alias=[AliasParser],
family=[AliasParser],
name=[AliasParser],
cache=True,
cache_maxsize=8192,
)
# MIT resolves via SPDX parser
assert str(ln.normalise_licence("MIT")) == "mit"
# CC BY resolves via Alias
assert str(ln.normalise_licence("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"Note
LicenceNormaliser() automatically loads the full default set of
parsers. To use a reduced set you must explicitly pass all six
plugin lists (registry, url, alias, family, name, prose).
For caching, LicenceNormaliser wraps the resolution method
with lru_cache.
Disable it by passing cache=False for debugging:
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(cache=False)
result = ln.normalise_licence("MIT")licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONsAll integration tests live in
src/licence_normaliser/tests/test_integration.py
and only import the public API.
Normalise a single licence:
licence-normaliser normalise "MIT"
# Output: mit
licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# Licence: cc-by
# Family: cc
licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an errorBatch normalise:
licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"from licence_normaliser.exceptions import (
DataSourceError, # data source loading errors
LicenceNormaliserError, # base class
LicenceNotFoundError, # raised by strict mode
LicenceNormalisationError, # kept for backwards compatibility
)
from licence_normaliser import (
LicenceTrace, # resolution trace object
LicenceTraceStage, # resolution stage enum
)All tests run inside Docker:
make testTo test a specific Python version:
make test-env ENV=py312MIT
Artur Barseghyan <artur.barseghyan@gmail.com>
