feat: per-page canonical via mathlib3→mathlib4 map #181

Open
FordUniver wants to merge 1 commit into leanprover-community:deprecate-banner-and-canonical from FordUniver:canonical-map-for-pr180

@FordUniver

No description provided.

Replaces 'every mathlib3 page canonicalises to the mathlib4 docs root' with
a per-page lookup against `mathlib4_canonical_map.yaml`. Pointing every
page at the mathlib4 root is the many-to-one canonical pattern that Google
explicitly rejects (see "5 common mistakes with rel=canonical"), so the
signal is silently dropped.

The map (~3000 entries, ~93% of ported mathlib3 modules) was built from
the wiki-maintained mathlib4-port-status YAML with mathlib4 git rename
chasing, Defs/Basic split handling, deprecated_module shim filtering,
and a curated dictionary of directory renames git similarity detection
cannot follow (GroupCat → Grp, IsROrC → RCLike, etc.). mathlib3 is
frozen, so a one-off snapshot is sufficient; the generator script is
kept separately and not included here.

Modules without a mapping fall back to the existing self-canonical
behavior so course-hosted mirror copies are still de-duplicated.
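
A minimal sketch of the lookup-with-fallback described above, not the code in this PR. The function name, the dotted-module key format, the example entry, and the URL layout are all assumptions made for illustration; a plain dict stands in for the parsed `mathlib4_canonical_map.yaml`.

```python
from typing import Mapping

# Assumed base URLs and page layout; verify against the deployed sites.
MATHLIB3_DOCS = "https://leanprover-community.github.io/mathlib_docs"
MATHLIB4_DOCS = "https://leanprover-community.github.io/mathlib4_docs"


def canonical_url(module: str, canonical_map: Mapping[str, str]) -> str:
    """Return the canonical URL for one mathlib3 doc page.

    `module` is a dotted mathlib3 module name (hypothetical key format).
    """
    target = canonical_map.get(module)
    if target is not None:
        # Mapped module: point at the corresponding mathlib4 page.
        return f"{MATHLIB4_DOCS}/{target.replace('.', '/')}.html"
    # Unmapped module: stay self-canonical so mirror copies still de-duplicate.
    return f"{MATHLIB3_DOCS}/{module.replace('.', '/')}.html"


# Hypothetical entry standing in for the real map contents.
example_map = {"algebra.group.basic": "Mathlib.Algebra.Group.Basic"}
```

With this shape, `canonical_url("algebra.group.basic", example_map)` resolves to a mathlib4 page, while any module missing from the map keeps pointing at its own mathlib3 page.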
@bryangingechen
Collaborator

Thanks! You may have answered this somewhere else, but what happens if some of the mathlib4 files get moved / renamed? I guess then the canonical link will be broken. Is that going to be a problem?

@FordUniver
Author

> Thanks! You may have answered this somewhere else, but what happens if some of the mathlib4 files get moved / renamed? I guess then the canonical link will be broken. Is that going to be a problem?

I am not an SEO expert, and I assume this also depends on the search engine, but I am pretty certain that a broken canonical link will just be ignored. Here is an example of a Django docs page with a broken canonical link. If you search Google for some of that page's content verbatim, it lists the deprecated 1.11 page and not the broken 6.0 link, which is what I would expect. So a reasonable practice seems to be a "best effort mapping", on the assumption that by the time the links go stale the newer version has already established itself in the ranking enough for it not to matter.

I also checked if noindex in combination with a canonical link makes sense, but that seems to not be the case. So it is either try to transfer page rank through canonical links or just noindex the deprecated version. If mathlib3 docs were still continuously re-deployed we could try and be clever about this, updating the map and noindexing when no target exists, but that is probably more hassle than it's worth.

Collaborator

@bryangingechen left a comment

OK, makes sense! @kim-em: what do you think?

@FordUniver
Author

We should check a deployed version of this, btw, to double-check that the canonical links are set as expected for the right pages but not for the root page.
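
One way to do that check could look like the sketch below, which pulls the `rel="canonical"` target out of a page's HTML using only the standard library. The class and function names are made up for illustration, and fetching the deployed pages is left out; the HTML is assumed to be available as a string already.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Running `find_canonical` over a handful of mapped pages, a few unmapped ones, and the root page would show whether each declares the canonical target we expect.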
