canvas-exporter takes Canvas courses and writes them out as portable,
offline files you keep yourself — for long-term archival, migration to
another platform, or keeping a readable copy when Canvas is not accessible.
It reads from Canvas's APIs and a Canvas Data 2 Postgres replica; it never
writes anything back to Canvas.
It has two subcommands.
canvas-exporter pipeline exports a course's content and structure — no
student data — into any combination of these formats:
- Moodle backup (.mbz): Moodle's own native course-backup format, the same kind of file Moodle produces from Backup course and accepts in Restore course (under the hood, a gzipped tar of Moodle's backup XML). Restoring one rebuilds the course inside Moodle as real Moodle activities: pages, assignments, quizzes with a populated question bank, discussions, modules, and rubrics. This is the format to use when migrating a Canvas course to Moodle.
- Static HTML site (-html.zip): a self-contained folder of linked HTML pages that opens in any browser, no server needed (e.g. drop it on SharePoint/Teams or use it as a migration staging copy).
- WordPress export (.xml.zip): a WordPress WXR file you import through WordPress's built-in Tools → Import.
- Files-only zip (-files.zip): just the course's uploaded files/attachments, with no surrounding content.

canvas-exporter submissions exports the student-side of a course —
grades, comments, rubric scores, submission files, classic-quiz responses,
discussions, peer reviews, and group membership — as per-course CSVs + binary
attachments via the Canvas REST API. No Postgres needed. See the
Submissions section near the end.
The two subcommands have separate flag surfaces and write different output layouts; they intentionally do not share an output root.
- Python 3.14 + uv: see Setup.
- A synced Canvas Data 2 (CD2) Postgres replica: required by pipeline (the submissions subcommand does not need it). This tool does not sync CD2 itself; it reads from a replica you maintain separately. UBC keeps its replica in sync with ubc/canvas-data-2, which is based on Harvard's Harvard-University-iCommons/canvas-data-2-aws, a serverless app that downloads CD2 and maintains an Aurora PostgreSQL replica. Point pipeline at your replica with --postgres-url / --postgres-secret-id (see Postgres source).
- A Canvas API token: needed by submissions, and by the pipeline Canvas API step (the .imscc + New Quizzes fetch). See Canvas API source.

The canvas-exporter pipeline command chains Canvas API + CD2 emitters
in one run and writes each course directly to
<output-dir>/<course_identifier>/. Per-course steps are auto-skipped
when their staging artefact is already on disk, so the same command
serves as fetcher, exporter, or fully-offline replay tool — see the
Recipes section.
uv run canvas-exporter pipeline \
--course-ids-file course_ids.csv \
--canvas-base-url https://canvas.example.com \
--output-dir ./exports

--course-ids-file accepts either one Canvas course id per line or a CSV
(the first column is used, or a column named course_id if a header row is
present). Blank lines and # comments are ignored, and duplicate ids are
deduped. For a single course, use --course-id <id> instead. A minimal file:
# course_ids.csv — one id per line
123456
123457
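
The parsing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the exporter's actual parser: `read_course_ids` is a hypothetical name, and the sketch assumes that a header row, when present, contains a `course_id` column (a header row without one is not detected).

```python
import csv
import io


def read_course_ids(text: str) -> list[str]:
    """Parse course ids from plain-line or CSV text, keeping first-seen order."""
    # Drop blank lines and '#' comments before handing rows to the CSV reader.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    rows = list(csv.reader(io.StringIO("\n".join(lines))))
    if not rows:
        return []

    # If the first row names a 'course_id' column, read that column;
    # otherwise fall back to the first column of every row.
    header = [cell.strip().lower() for cell in rows[0]]
    if "course_id" in header:
        column = header.index("course_id")
        rows = rows[1:]
    else:
        column = 0

    # A dict preserves insertion order, so it doubles as an ordered set.
    seen: dict[str, None] = {}
    for row in rows:
        if column < len(row) and row[column].strip():
            seen.setdefault(row[column].strip(), None)
    return list(seen)
```

Calling it on the minimal file above (with a duplicate id appended) returns each id once, in the order it first appeared.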
Select output formats with --format (repeatable; defaults to
mbz html wxr fileszip). Each format and how to import it:

| --format | File | Import target / use |
|---|---|---|
| mbz | <cid>.mbz | Moodle 4.5 native backup → Site administration → Courses → Restore course (native rubric grading, news-forum announcements, Moodle 4.x question bank). |
| html | <cid>-html.zip | Navigable static HTML site: open locally or host on SharePoint/Teams. |
| wxr | <cid>.xml.zip | WordPress WXR 1.2 → Tools → Import → WordPress (shipped zipped; the XML compresses ~10×). |
| fileszip | <cid>-files.zip | The matching .imscc's web_resources/ repacked as a standalone files archive. |

A separate top-level subcommand, canvas-exporter submissions, pulls
student-side data (submissions, grades, comments, attachments) from
Canvas's REST API for offline retention. It is disjoint from pipeline
— different concurrency model, different output layout, no Postgres or
CD2 access required. Student data is sensitive; confirm your
institution's privacy policy and retention rules before running it.
--brand controls the look of the static HTML site only (--format html). It has no effect on the Moodle .mbz or WordPress WXR outputs,
and it never changes course content — only the surrounding presentation.
A theme sets exactly three things on every generated page:
- Stylesheet: the style.css inlined into each page (colours, typography, layout).
- Wordmark: a short line rendered above the course title (e.g. the institution name). Omitted when empty.
- Footer: the text shown at the bottom of every page (defaults to Exported from Canvas).

The emitter ships two built-in themes:
- generic (default): neutral slate palette, no institutional wordmark.
- ubc: University of British Columbia visual identity (the theme this tool was originally built around).

Pick a built-in name or pass a path to a custom theme file:
uv run canvas-exporter pipeline --format html --brand generic ... # default
uv run canvas-exporter pipeline --format html --brand ubc ...
uv run canvas-exporter pipeline --format html --brand path/to/my-theme.toml ...

To make your own, write a theme.toml with wordmark and footer
strings and place a sibling style.css next to it (or embed the CSS
inline via a css = key). The built-in themes live under
src/canvas_export/emitters/html_site/themes/ and are a useful
starting point for a fork.
Each worker runs one course end-to-end through every applicable step,
so a slow Canvas export only blocks its own course while other workers
stream through Canvas → bundle → emit in parallel. Caches in
<staging>/ are the source of truth: each step checks whether its
artefact is already present and only runs when it is missing (or the
matching --refresh kind is set).
- Canvas API step writes <staging>/<cid>.imscc and the New Quizzes sidecar <staging>/<cid>-new-quizzes.json.gz. Auto-skipped when both files are present; force a refetch with --refresh (bare, or --refresh=imscc,nq).
- CD2 bundle step reads CD2 rows from Postgres into a CourseBundle and caches it as <staging>/<cid>.json.gz. On subsequent runs the cached bundle is replayed unless missing/stale or --refresh=bundle is set. The Postgres connection is borrowed for the fetch only and released before emit.
- Emit step writes <cid>.mbz / -html.zip / .xml.zip / <cid>-files.zip directly into <output-dir>/<course_identifier>/ in a single per-course pass. Output is rebuilt only when a requested-format output is missing or older than a staging file, or when --regen-output is passed. The matching .imscc (if present) is automatically picked up so blobs are embedded in the .mbz and the same imscc is repacked as -files.zip in the same traversal.

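
The emit step's rebuild rule can be pictured as a modification-time comparison. The sketch below is one plausible reading of that rule, not the exporter's code: `needs_rebuild` is a hypothetical helper, and comparing file mtimes is an assumption about how "newer" is determined.

```python
from pathlib import Path


def needs_rebuild(staging_files: list[Path], outputs: list[Path],
                  regen_output: bool = False) -> bool:
    """Decide whether a course's requested outputs must be re-emitted."""
    if regen_output:
        return True  # --regen-output: rebuild unconditionally
    if any(not out.exists() for out in outputs):
        return True  # a requested-format output is missing
    staged = [p for p in staging_files if p.exists()]
    if not staged or not outputs:
        return False  # nothing to compare against
    # Rebuild when any staged input is newer than the oldest output.
    newest_input = max(p.stat().st_mtime for p in staged)
    oldest_output = min(out.stat().st_mtime for out in outputs)
    return newest_input > oldest_output
```

Comparing against the oldest output means a single out-of-date format triggers a rebuild of the course folder, which fits the single per-course pass described above.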
Each per-course folder is published via a tmp-dir + atomic rename, so a
killed worker leaves no half-populated public folder. Staging always
holds intermediates (<cid>.imscc, NQ sidecar, bundle cache) and is
never cleaned up automatically.
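
The tmp-dir + rename pattern can be sketched as follows. `publish_atomically` and its `build` callback are hypothetical names, and the swap used when the target folder already exists (move the old folder aside, rename the new one in, delete the old) is an assumption about one reasonable way to do it.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def publish_atomically(build: Callable[[Path], None], final_dir: Path) -> None:
    """Build into a hidden sibling tmp dir, then rename it into place."""
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    tmp_dir = final_dir.with_name(f".{final_dir.name}.tmp-{os.getpid()}")
    shutil.rmtree(tmp_dir, ignore_errors=True)  # clear leftovers from a crashed run
    tmp_dir.mkdir()
    try:
        build(tmp_dir)  # all writes happen outside the public folder
        if final_dir.exists():
            # A plain rename cannot replace a non-empty directory, so swap:
            # move the old folder aside, rename the new one in, drop the old.
            old_dir = final_dir.with_name(f".{final_dir.name}.old-{os.getpid()}")
            os.rename(final_dir, old_dir)
            os.rename(tmp_dir, final_dir)
            shutil.rmtree(old_dir)
        else:
            os.rename(tmp_dir, final_dir)
    except BaseException:
        # A failed or interrupted build never touches the published folder.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
```

Keeping the tmp dir next to the target keeps both on the same filesystem, which is what makes the rename cheap. The swap branch has a brief window between its two renames, so it is a close approximation of atomic replacement rather than a strict guarantee.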
For fast offline iteration on emitter bug fixes, run with everything
already staged and add --regen-output to force a rebuild from the
existing caches:
uv run canvas-exporter pipeline \
--course-ids-file course_ids.csv \
--staging-dir ./exports/_staging \
--output-dir ./exports \
--regen-output

With all imscc / NQ / bundle files staged and no --refresh, the run
never contacts Postgres, AWS, or Canvas — credentials are not required.
This project requires Python 3.14 and uses uv to manage the virtual environment and dependencies.
# install uv (macOS)
brew install uv
# or, cross-platform
curl -LsSf https://astral.sh/uv/install.sh | sh
# install canvas-exporter and dev tools
uv sync --extra dev
# verify
uv run canvas-exporter --help

uv run executes the command inside the project's venv without needing to
activate it. To activate the venv for a shell session, source .venv/bin/activate
(macOS/Linux) or .venv\Scripts\Activate.ps1 (Windows PowerShell).
The CLI exposes two subcommands: canvas-exporter pipeline (course-content
backups, covered here) and canvas-exporter submissions (student-side data,
covered below). For pipeline, use
--course-ids-file for batches or --course-id for a single course.
Per-course steps auto-skip based on staging state; use --refresh to force a
refetch and --regen-output to rebuild output unconditionally. See
Recipes.
Point the exporter at a CD2 Postgres replica with --postgres-url, or set
POSTGRES_URL instead so the password doesn't end up in shell history.
The exporter auto-loads a .env file from the current directory:
# .env (in your working directory; do not commit)
POSTGRES_URL=postgresql://canvas_ro:****@canvas-replica.example.internal:5432/canvas?sslmode=require
POSTGRES_SCHEMA=canvas

Then run the default pipeline without any Postgres flags:
uv run canvas-exporter pipeline --course-ids-file course_ids.csv --canvas-base-url https://canvas.example.com --output-dir ./exports

For a one-off course, use --course-id instead:
uv run canvas-exporter pipeline --course-id 123456 --canvas-base-url https://canvas.example.com --output-dir ./exports

If your Postgres credentials live in AWS Secrets Manager (standard RDS
JSON shape: {username, password, host, port, dbname}), reference the secret
id/ARN instead — the exporter resolves it at startup using your --aws-profile
/ --aws-region:
uv run canvas-exporter pipeline \
--course-ids-file course_ids.csv \
--canvas-base-url https://canvas.example.com \
--postgres-secret-id arn:aws:secretsmanager:us-east-1:123:secret:canvas-replica-AbCd \
--aws-profile staging

If your AWS account is fronted by SSO/SAML (Okta, ADFS, etc.), get short-lived
credentials with saml2aws before using
--postgres-secret-id. The exporter calls Secrets Manager via boto3, which
reads those credentials from your profile.
# one-time setup
brew install saml2aws
saml2aws configure # walks you through IDP type, URL, role ARN, etc.
# every working session — credentials usually expire in 1 hour
saml2aws login --profile <your-profile>
# verify
aws sts get-caller-identity --profile <your-profile>

Pass the same profile name as --aws-profile (or set AWS_PROFILE). If a
run fails with ExpiredToken against Secrets Manager at startup, run
saml2aws login again and rerun. Only the secret resolution touches AWS;
the actual data fetch is over Postgres and isn't affected by AWS credential
expiry mid-run.
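
To make the secret resolution concrete, here is a minimal sketch of turning an RDS-format secret (the {username, password, host, port, dbname} JSON described above) into a postgresql:// DSN. `dsn_from_rds_secret` is a hypothetical helper, and the `sslmode=require` default is an assumption borrowed from the .env example; the exporter's own resolution logic is not shown here.

```python
import json
from urllib.parse import quote


def dsn_from_rds_secret(secret_string: str, sslmode: str = "require") -> str:
    """Build a postgresql:// DSN from an RDS-format Secrets Manager JSON string."""
    secret = json.loads(secret_string)
    # Percent-encode credentials so characters like '@', ':' or '/' in a
    # password cannot be mistaken for URL delimiters.
    user = quote(str(secret["username"]), safe="")
    password = quote(str(secret["password"]), safe="")
    return (
        f"postgresql://{user}:{password}@{secret['host']}:{secret['port']}"
        f"/{secret['dbname']}?sslmode={sslmode}"
    )


# In a real run the JSON string would come from Secrets Manager, e.g. via
# boto3's get_secret_value(SecretId=...)["SecretString"]; that call is omitted
# here so the sketch runs without AWS credentials.
```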

| Flag | Env var | Purpose |
|---|---|---|
| --postgres-url | POSTGRES_URL | Full postgresql:// DSN. |
| --postgres-secret-id | POSTGRES_SECRET_ID | Secrets Manager secret id/ARN (RDS-format JSON). |
| --postgres-schema | POSTGRES_SCHEMA | Schema holding Canvas tables (default canvas). |

psycopg 3 pipeline mode sends the per-course queries as one batch without waiting for each result, so each course is one round-trip block instead of N serial queries.
Generate a Canvas access token (Account → Settings → New Access Token) and
export it before running (or put CANVAS_API_TOKEN / CANVAS_API_URL in the
same auto-loaded .env as your Postgres settings):
export CANVAS_API_TOKEN=...
uv run canvas-exporter pipeline \
--canvas-base-url https://canvas.example.com \
--course-ids-file course_ids.csv \
--output-dir ./exports

The token must have permission to read each course you want to export.
Phase 1 stages <staging>/<course_id>.imscc — Canvas builds the cartridge
server-side; we just poll and download. For single-course iteration use
--course-id <id> instead of --course-ids-file.
Canvas's IMSCC content export does not carry New Quiz (Quiz LTI /
Quizzes.Next) question content — at best you get an LTI launch shell.
The Canvas API step always follows up each IMSCC build with a fetch
against Canvas's /api/quiz/v1/ REST API and writes
<staging>/<course_id>-new-quizzes.json.gz next to the .imscc. The
emit step picks the sidecar up automatically and embeds the NQ items
into the .mbz. See New Quizzes in the .mbz
for the rendering rules.
The fetch is best-effort: if it fails (rate limits, permissions), the
IMSCC is still produced and the course is reported as PARTIAL. If
Canvas hangs in waiting_for_external_tool while building the
cartridge, the IMSCC build is retried with Quiz LTI excluded so the
cartridge still completes; the sidecar then backfills the dropped NQs.
Both behaviours are always on — there are no CLI knobs to turn them
off.
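
The polling behaviour, including the per-state cap on waiting_for_external_tool, might look roughly like the loop below. This is a sketch under stated assumptions: `wait_for_export`, `fetch_state`, and the exception names are hypothetical, `exported` and `failed` are assumed to be the terminal states, and the defaults mirror the documented --canvas-poll-interval, --canvas-poll-timeout, and --canvas-lti-wait-timeout values.

```python
import time
from typing import Callable


class ExportTimeout(Exception):
    """The export did not finish within the overall poll timeout."""


class LtiWaitTimeout(ExportTimeout):
    """The export sat in waiting_for_external_tool for too long."""


def wait_for_export(
    fetch_state: Callable[[], str],
    poll_interval: float = 5.0,
    poll_timeout: float = 1800.0,
    lti_wait_timeout: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the export reaches a terminal state or a timeout fires."""
    started = clock()
    lti_since: float | None = None
    while True:
        state = fetch_state()
        now = clock()
        if state in ("exported", "failed"):
            return state
        if state == "waiting_for_external_tool":
            # Track how long we have been stuck in this one state.
            if lti_since is None:
                lti_since = now
            if now - lti_since >= lti_wait_timeout:
                raise LtiWaitTimeout("stuck waiting for external tool")
        else:
            lti_since = None  # left the LTI wait state; reset the per-state cap
        if now - started >= poll_timeout:
            raise ExportTimeout("export did not finish in time")
        sleep(poll_interval)
```

A caller could catch `LtiWaitTimeout` and restart the build with Quiz LTI excluded, which corresponds to the retry described above. Injecting `clock` and `sleep` keeps the loop testable without real waiting.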
Iterating on emitter logic against the warehouse is slow — every loop
pays for a per-course fetch. The pipeline caches each course's
CourseBundle to <staging>/<cid>.json.gz automatically. On
subsequent runs the cached bundle is replayed instead; Postgres is
only contacted when the cache is missing, stale, or
--refresh=bundle is set:
# 1. First run, online — fetches and stages bundles + IMSCCs.
uv run canvas-exporter pipeline \
--course-ids-file course_ids.csv \
--canvas-base-url https://canvas.example.com \
--output-dir ./exports
# 2. Iterate on emitters with no warehouse / Canvas access (offline).
# Every cache is already staged; --regen-output forces a rebuild from
# the existing files. No POSTGRES_URL / CANVAS_API_TOKEN needed.
uv run canvas-exporter pipeline \
--course-ids-file course_ids.csv \
--staging-dir ./exports/_staging \
--output-dir ./exports \
--regen-output
# 3. Inspect a bundle when chasing a fidelity bug.
gunzip -c ./exports/_staging/12345.json.gz | jq '.bundle.assignment_overrides'
diff <(gunzip -c ./exports/_staging/a.json.gz | jq -S .bundle.modules) \
<(gunzip -c ./exports/_staging/b.json.gz | jq -S .bundle.modules)
# 4. Refresh a stale cached bundle (e.g. after adding a CD2 field).
uv run canvas-exporter pipeline ... --refresh=bundle

Bundle files are gzipped JSON (<course_id>.json.gz) with a small
wrapper holding bundle_version, fetched_at, and the source.
Non-JSON-native scalars (datetime, Decimal, bytes, UUID) are
wrapped in a 2-key {"__t__": "<tag>", "v": "..."} sentinel so the
round-trip is lossless. JSON-encoded blob fields
(e.g. quiz_questions.question_data) are kept as strings — never
re-parsed at write time, since that would hide the encoding bugs the
bundle is meant to help debug.
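
A lossless round-trip of this kind can be sketched with the standard `json` hooks. The tag strings (`datetime`, `decimal`, `bytes`, `uuid`), the base64 encoding of bytes, and the function names below are assumptions for illustration; only the two-key `{"__t__": ..., "v": ...}` shape comes from the description above.

```python
import base64
import json
import uuid
from datetime import datetime
from decimal import Decimal


def _encode(value):
    """json.dumps 'default' hook: wrap non-JSON scalars in a tagged sentinel."""
    if isinstance(value, datetime):
        return {"__t__": "datetime", "v": value.isoformat()}
    if isinstance(value, Decimal):
        return {"__t__": "decimal", "v": str(value)}
    if isinstance(value, bytes):
        return {"__t__": "bytes", "v": base64.b64encode(value).decode("ascii")}
    if isinstance(value, uuid.UUID):
        return {"__t__": "uuid", "v": str(value)}
    raise TypeError(f"cannot serialise {type(value).__name__}")


_DECODERS = {
    "datetime": datetime.fromisoformat,
    "decimal": Decimal,
    "bytes": base64.b64decode,
    "uuid": uuid.UUID,
}


def _decode(obj: dict):
    """json.loads 'object_hook': unwrap exactly-two-key sentinels, pass others through."""
    if set(obj) == {"__t__", "v"} and obj["__t__"] in _DECODERS:
        return _DECODERS[obj["__t__"]](obj["v"])
    return obj


def dumps(data) -> str:
    return json.dumps(data, default=_encode)


def loads(text: str):
    return json.loads(text, object_hook=_decode)
```

Strings that happen to contain JSON, such as a quiz question's question_data, pass through untouched because the hooks only fire on non-serialisable scalars and on two-key sentinel objects.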
Bundle files carry a bundle_version integer (BUNDLE_VERSION in
src/canvas_export/bundle/io.py). Bump this whenever a CD2/postgres
schema change alters what's written into the bundle (a new field, a
shape change, a renamed key). The pipeline enforces the bump:
- Cached bundle is current → loaded as-is, no Postgres connection.
- Cached bundle is older than BUNDLE_VERSION → the course falls through to the Postgres branch and the tee overwrites the stale file with a current-version bundle. A warning is logged so you see it happen. If no Postgres URL is configured, the run fails fast at startup with a "rerun with --refresh=bundle and creds" hint.
- Cached bundle is newer than BUNDLE_VERSION → BundleVersionMismatch (FAILED). You're running an older canvas-exporter than the one that wrote the bundle.

A corrupt or future-version bundle is reported as FAILED with
diagnostics in exports/FAILED/.
Each per-course step auto-skips when its staging artefact is present, so the recipes below are mostly about how you arrange staging beforehand:
# Canvas-only refresh (re-fetch imscc + NQ sidecar; reuse cached bundle).
uv run canvas-exporter pipeline \
--course-ids-file ids.csv --output-dir ./out \
--canvas-base-url https://canvas.example.com \
--refresh=imscc,nq
# CD2-only refresh (re-fetch bundle from Postgres; reuse cached imscc).
uv run canvas-exporter pipeline \
--course-ids-file ids.csv --output-dir ./out \
--refresh=bundle
# Imscc-only files.zip (staging has imscc; emit only <cid>-files.zip per course).
uv run canvas-exporter pipeline \
--course-ids-file ids.csv --output-dir ./out \
--staging-dir ./out/_staging \
--format fileszip
# Fast offline iteration on emitter fixes (no DB, no network, rebuild from caches).
uv run canvas-exporter pipeline \
--course-ids-file ids.csv --output-dir ./out \
--staging-dir ./out/_staging \
--regen-output --format mbz

| Flag | Env var | Purpose |
|---|---|---|
| --course-ids-file | - | One course id per line, or a CSV (first column, or a course_id header column). Blank lines and # comments ignored; duplicates deduped. Mutually exclusive with --course-id; one of the two is required. |
| --course-id | - | Single Canvas course id (shortcut for --course-ids-file when iterating on one course). |
| --output-dir | - | Where per-course folders land (default ./exports). |
| --staging-dir | - | Where intermediates live (default <output-dir>/_staging). Always kept across runs. |
| --format | - | Output formats. Repeatable / comma-separated. Defaults to mbz html wxr fileszip. |
| --workers | - | Parallel courses (default 1). |
| --refresh[=KIND,...] | - | Force a refetch of one or more staged caches. Bare --refresh refetches everything (imscc, NQ sidecar, bundle); pass --refresh=bundle or --refresh=imscc,nq to refetch a subset. Valid kinds: imscc, nq, bundle. |
| --regen-output | - | Rebuild output unconditionally, even when no staging file is newer than the existing outputs. Default behaviour rebuilds only when an output is missing or older than its staging inputs. |
| --dry-run | - | Run every fetch but write no output files. |
| --audit-log | - | Path to JSONL file: one record per course (status, counts, duration, error). |
| --log-level | - | DEBUG / INFO / WARNING / ERROR. |


| Flag | Env var | Purpose |
|---|---|---|
| --canvas-base-url | CANVAS_API_URL (preferred), CANVAS_BASE_URL (legacy alias) | Canvas instance URL, e.g. https://canvas.example.com. Required only when at least one course needs an imscc or NQ fetch. Shares CANVAS_API_URL with the submissions subcommand. |
| - | CANVAS_API_TOKEN | Canvas personal access token. Same conditional requirement as above. |
| --canvas-poll-interval | - | Seconds between Content Export status polls (default 5). |
| --canvas-poll-timeout | - | Max seconds to wait for one course's export (default 1800). |
| --canvas-lti-wait-timeout | - | Max seconds a course may stay in waiting_for_external_tool before we abandon it (default 30). Per-state cap under --canvas-poll-timeout. A stuck NQ build then triggers the retry-without-NQ fallback; the always-on /api/quiz/v1/ sidecar backfills the dropped NQs. |


| Flag | Env var | Purpose |
|---|---|---|
| --postgres-url | POSTGRES_URL | Full postgresql:// DSN. Required only when at least one course needs a bundle fetch. |
| --postgres-secret-id | POSTGRES_SECRET_ID | Secrets Manager secret id/ARN (RDS-format JSON). |
| --postgres-schema | POSTGRES_SCHEMA | Schema holding Canvas tables (default canvas). |
| --aws-profile | AWS_PROFILE | boto3 profile (used by --postgres-secret-id). |
| --aws-region | AWS_REGION | AWS region (used by --postgres-secret-id). |

The emit step writes <cid>.mbz (when mbz in --format),
<cid>-html.zip (for html), <cid>.xml.zip (for wxr), and
<cid>-files.zip (for fileszip) directly into
<output-dir>/<course_identifier>/. The matching .imscc from the
Canvas step is auto-detected: when present, the .mbz embeds
web_resources/ blobs and rewrites inline file URLs to
@@PLUGINFILE@@, and the same imscc is repacked as <cid>-files.zip
in the same per-course traversal.
CD2 (Phase 2) only stores file metadata, not blobs. Canvas's Content
Exports API (Phase 1) produces a <course_id>.imscc whose
web_resources/ folder does carry the blobs. The pipeline brings the
two together automatically — there's nothing extra to opt into:
- The .mbz embeds blobs. When Phase 1 has staged a <cid>.imscc, Phase 2's Moodle emitter embeds web_resources/ blobs as mod_resource content and rewrites inline Canvas file URLs in pages, assignments, forums, quizzes, questions, and the course summary to @@PLUGINFILE@@ tokens. Output drops straight into Moodle's "Restore course" UI.
- A -files.zip is packed in the same pass alongside the .mbz when fileszip is in --format (it's in the default set). The imscc is opened once per course and shared between the .mbz blob embedder and the files.zip packer.

Per-course output lands in <output-dir>/<course_identifier>/, where
<course_identifier> is derived from canvas.courses.id and
course_code (the same SQL pattern downstream systems use). Courses
not present in canvas.courses fall back to a bare <course_id>/
folder.
When you just need to re-emit artefacts after a CD2 fix, refresh the bundle and let the imscc cache short-circuit the Canvas step:
uv run canvas-exporter pipeline \
--course-ids-file ids.csv --output-dir ./exports \
--refresh=bundle --regen-output \
--format mbz --format html

Per-course folders are republished atomically (tmp-dir + rename). With
--regen-output, the existing folder is replaced wholesale —
incremental per-format reuse is intentionally not supported, since the
atomic-swap guarantee is more valuable than the I/O savings.
The Postgres source (and bundle replay) pulls the following CD2 tables per course. ADHOC assignment overrides and any tables tied to per-student state (submissions, enrollments, conversations) are intentionally excluded — these exports target courses with no student data yet.
- Course settings + syllabus
- Pages (wiki)
- Assignments + assignment groups + assignment overrides (Section / Group only)
- Quizzes (Classic) + quiz questions + quiz groups + reusable question banks
- New Quizzes (Quiz LTI / Quizzes.Next) + items + item banks: fetched via Canvas's /api/quiz/v1/ REST API, not CD2 (the data isn't in DAP). The Canvas API step always writes the sidecar; the emit step picks it up automatically when it exists.
- Discussions + announcements
- Modules + module items (as the LearningModules organization tree)
- Rubrics + rubric associations
- File metadata (binaries are unreachable when Canvas is offline; pair the CD2 outputs with a Canvas API run, which pipeline does automatically whenever .imscc files are present alongside the Phase 2 run)
- Course sections + enrollment term
- Custom grading standards
- Learning outcomes + outcome links + outcome groups
- LTI tool configurations (context_external_tools, course-scoped only)

- Content file blobs are not available via CD2 alone. Pair the run with a Canvas Content Export .imscc to embed them; pipeline does this automatically whenever an .imscc is staged alongside the CD2 step (see Combining cd2 + Canvas .imscc).
- Pages: images are not available without the matching .imscc.

A gzipped tar — Moodle 4.5 native course backup format. Targets <backup_release>4.5</backup_release>
with build constants from MOODLE_405_STABLE. Two runs of the same course produce
byte-identical output.
moodle_backup.xml — manifest: version, contents, settings
files.xml — file catalog (empty stub when no matching .imscc is staged)
questions.xml — course-level question bank, one category per quiz
gradebook.xml outcomes.xml scales.xml roles.xml groups.xml users.xml completion.xml
course/
course.xml inforef.xml enrolments.xml roles.xml filters.xml calendar.xml comments.xml
sections/section_<id>/
section.xml inforef.xml
activities/<modname>_<cmid>/
<modname>.xml module.xml inforef.xml grades.xml
roles.xml filters.xml calendar.xml comments.xml completion.xml
grading.xml — assignments only; rubric definition (gradingform_rubric)
Canvas content type → Moodle activity:

| Canvas | Moodle | Notes |
|---|---|---|
| Page | mod_page | Body in <content>; intro stays empty |
| Assignment | mod_assign | Submission types → plugin_configs; due dates preserved |
| Discussion | mod_forum (general) | assessed=0; graded discussions log a warning |
| Announcement | mod_forum (news, forcesubscribe=1) | All announcements live in section 0 |
| Module | course section (format=topics) | <sequence> lists cmids in order |
| Module item: SubHeader | mod_label | Preserves module structure |
| Module item: ExternalUrl | mod_url | |
| Module item: Attachment | mod_resource | Binary embedded from web_resources/ when the matching .imscc is staged; metadata-only otherwise |
| Quiz (Classic) | mod_quiz + question bank category | Moodle 4.x <questionbankentryid> shape; questions from CD2 quiz_questions |
| Quiz (New Quizzes / Quiz LTI) | mod_quiz + question bank category | When the NQ sidecar is loaded; otherwise renders as empty mod_assign placeholder. See New Quizzes in the .mbz. |
| Rubric | gradingform_rubric in grading.xml | One criterion → one criterion, ratings → levels |
| Syllabus | course/<summary> AND a mod_page in section 0 | |

Question type mapping (Canvas → Moodle qtype):
multiple_choice/multiple_answers → multichoice; true_false → truefalse;
short_answer → shortanswer; essay → essay; numerical → numerical;
matching → match; text_only → description;
fill_in_multiple_blanks → multianswer (cloze with SHORTANSWER subquestions
per blank); multiple_dropdowns → multianswer (cloze with MULTICHOICE
subquestions per blank); file_upload → essay configured with required
attachments. Only calculated_question remains downgraded to Moodle's
description (no Moodle equivalent for Canvas's formula engine).
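
The mapping above is essentially a lookup table. The sketch below restates it as a Python dict; `moodle_qtype` is a hypothetical helper, and both the `_question` suffix stripping and the fallback to description for unrecognised types are assumptions, not documented behaviour.

```python
# Canvas question type -> Moodle qtype, as listed in the mapping above.
CANVAS_TO_MOODLE_QTYPE = {
    "multiple_choice": "multichoice",
    "multiple_answers": "multichoice",
    "true_false": "truefalse",
    "short_answer": "shortanswer",
    "essay": "essay",
    "numerical": "numerical",
    "matching": "match",
    "text_only": "description",
    "fill_in_multiple_blanks": "multianswer",  # cloze, SHORTANSWER per blank
    "multiple_dropdowns": "multianswer",       # cloze, MULTICHOICE per blank
    "file_upload": "essay",                    # essay with required attachments
    "calculated": "description",               # no Moodle formula-engine equivalent
}


def moodle_qtype(canvas_type: str) -> str:
    """Look up the Moodle qtype for a Canvas question type."""
    # Assumption: accept names with or without a trailing '_question'.
    key = canvas_type.removesuffix("_question")
    # Assumption: unknown types degrade to a non-graded description.
    return CANVAS_TO_MOODLE_QTYPE.get(key, "description")
```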
Canvas reusable question banks (assessment_question_banks) are exported
as a "Question Banks" question-category container with one sub-category per
bank; quiz questions sourced from a bank via assessment_question_id
reference the bank's question entry directly (no duplication). Random
quiz_group pulls from banks are NOT yet emitted as Moodle random qtype
slots — bank questions are reachable as a question bank but the quiz won't
auto-pick them at attempt time (v1 limitation).
New Quizzes (Quiz LTI / Quizzes.Next) live in a separate Canvas data
store from Classic Quizzes — CD2/DAP doesn't carry them and Canvas's
IMSCC content export only carries an LTI launch shell. The only source
of NQ items is Canvas's /api/quiz/v1/ REST API, which the Canvas API
step always writes to a per-course sidecar
(<staging>/<course_id>-new-quizzes.json.gz). When the matching
.imscc + NQ sidecar are both staged, the Moodle emitter picks up
the sidecar automatically and renders each NQ as a real mod_quiz
activity (the gradebook-shadow assignment that Canvas auto-creates is
suppressed so the quiz replaces, not duplicates, the empty
placeholder).
NQ interaction → Moodle qtype:

| NQ interaction_type_slug | Moodle qtype |
|---|---|
| choice (single-answer) | multichoice (single) |
| choice (multi-answer) | multichoice (multi) |
| true-false | truefalse |
| essay | essay |
| matching | match |
| numeric | numerical |
| rich-fill-blank (open-entry) | shortanswer (best-effort, v1) |
| hot-spot, ordering, categorization, formula, stimulus, file-upload, BankEntry | description (prompt preserved, 0 pts) |

Each NQ gets its own "Default for <quiz title>" category in the Moodle
question bank, parallel to Classic quizzes. NQ item banks land under a
"New Quiz Item Banks" container with one category per bank.
Without the sidecar, the .mbz still builds — every NQ just renders as the
empty mod_assign placeholder Canvas's shadow assignment row alone produces
(pre-NQ-support behaviour). This is the symptom that prompted the feature:
a course with three Respondus LockDown quizzes restored with three empty
assignments and no questions.
File embedding happens automatically when Phase 1 has staged a
<course_id>.imscc. Without that, files.xml is empty and HTML bodies
keep Canvas absolute URLs — restored courses have working text but
broken image references. With it, web_resources/ blobs are embedded
as mod_resource content and inline Canvas file URLs in pages,
assignments, forums, quizzes, questions, and the course summary are
rewritten to @@PLUGINFILE@@ tokens so Moodle resolves them against
the embedded files on restore.
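
To illustrate the URL rewrite, the sketch below swaps Canvas file links for @@PLUGINFILE@@ tokens using a regular expression. Everything specific here is an assumption made for the example: the URL shape (/courses/<id>/files/<file_id> with optional /preview or /download and query string), the `embedded` mapping from file id to embedded path, and the `rewrite_file_urls` name. The emitter's real matching rules may be broader.

```python
import re
from urllib.parse import quote

# Assumed shape of an inline Canvas file URL inside HTML bodies.
_CANVAS_FILE_URL = re.compile(
    r"https?://[^/\"'\s]+/courses/\d+/files/(?P<file_id>\d+)"
    r"(?:/(?:preview|download))?(?:\?[^\"'\s]*)?"
)


def rewrite_file_urls(html: str, embedded: dict[str, str]) -> str:
    """Replace Canvas file URLs with @@PLUGINFILE@@ tokens for embedded files."""

    def _swap(match: re.Match) -> str:
        path = embedded.get(match.group("file_id"))
        if path is None:
            return match.group(0)  # file not embedded: keep the absolute URL
        return "@@PLUGINFILE@@/" + quote(path)  # percent-encode spaces etc.

    return _CANVAS_FILE_URL.sub(_swap, html)
```

URLs whose file was not embedded are left untouched, which matches the described fallback where bodies keep Canvas absolute URLs.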
index.html — module table of contents with links to all content
syllabus.html
pages/<slug>.html
assignments/<slug>.html
discussions/<slug>.html
quizzes/<slug>.html — questions rendered as readable HTML by type
The index is built from the Canvas module structure, so content appears in the same order as students saw it. Items with no corresponding file (e.g. external URLs, file attachments) appear as plain text in the TOC.
Quiz questions are rendered by type: multiple-choice and true/false questions list answer options; matching questions show a left→right table; fill-in-blank and dropdown questions list accepted answers per blank; essay and file-upload questions show a type label only.
A single RSS 2.0 / WXR 1.2 file importable via Tools → Import → WordPress.
Content is mapped to WordPress posts and pages:

| Canvas content | WordPress post type | Category |
|---|---|---|
| Syllabus | page | Pages |
| Pages (wiki) | page | Pages |
| Assignments | post | Assignments |
| Discussions | post | Discussions |
| Announcements | post | Announcements |
| Quizzes | post | Quizzes |

Module names are registered as additional WordPress categories and tagged on every item that belongs to them, preserving the course structure. Quiz questions are rendered as readable HTML (multiple-choice answers listed A/B/C/D, matching questions as a left→right table).
If your CD2 / DAP layout uses different table names (e.g. modules rather than
context_modules), edit the SQL constants in src/canvas_export/queries.py
— every query lives in that one file.
uv run pytest

Tests use in-memory fixtures; no live Postgres required.
The project's "build" is a trio of fast checks that should be clean before pushing:
uv run ruff check # lint
uv run ty check # type check (Astral's ty)
uv run pytest # tests

ty is configured in pyproject.toml (under [tool.ty]) to type-check
src/, tests/, and scripts/ against Python 3.14. Tests relax a couple
of rules where lxml .find(...).findtext(...) chains and dict isinstance
narrowing produce noise without indicating real bugs — see the [[tool.ty.overrides]]
block.
The repo ships a .pre-commit-config.yaml that runs ruff + ty on every
commit. Tests are not in the hooks — run uv run pytest locally or rely on CI.
uvx pre-commit install
uvx pre-commit run --all-files # one-off: run against the whole repo

A separate subcommand for the student-data half of a course export. Where
pipeline builds offline course backups for restore (no student data), this
command pulls everything a student submitted and everything an instructor
gave back, as per-course CSVs + binary attachments via the Canvas REST API.
No Postgres / CD2 needed.
# Single course → ./submissions/<safe-name>_<id>/
export CANVAS_API_URL=https://canvas.example.com
export CANVAS_API_TOKEN=...
uv run canvas-exporter submissions --course-id 12345
# Many courses + an instructor-supplied folder name per course
uv run canvas-exporter submissions \
--course-mapping ./mapping.csv \
--out-dir ./submissions \
--course-parallelism 4 --download-workers 16
# All courses in a term
uv run canvas-exporter submissions --account-id 1 --term-id 17
# Pull only what you need; preview first
uv run canvas-exporter submissions --course-id 12345 \
--no-classic-quizzes --no-new-quizzes --dry-run -v

submissions/
summary.json
<course-name>_<id>/
_course.json
manifest.jsonl # resume bookkeeping
failed.log # one line per download failure
README.txt # explains every file/folder here
grades.csv # one row per submission (assignment | classic_quiz | new_quiz)
gradebook.csv # wide-format instructor gradebook (one row per student)
comments.csv # one row per instructor / peer / self comment
rubric_scores.csv # one row per (student × rubric criterion)
attachments.csv # one row per downloaded file (+ annotated copy if any)
classic_quiz_responses.csv # one row per (student × classic-quiz question × attempt)
discussion_topics.csv
discussion_entries.csv
discussion_attachments.csv
peer_reviews.csv
group_membership.csv
assignments/<name>_<id>/
classic_quizzes/<name>_<id>/
new_quizzes/<name>_<id>/
discussions/<topic>_<id>/
Re-running with the same --out-dir and --resume (default on) honours
manifest.jsonl and skips already-downloaded files. --force wipes any
existing per-course folder before re-downloading.
| Flag | Default | Purpose |
|---|---|---|
| `--course-id N` | — | Repeatable; comma-separated allowed. |
| `--course-ids-file FILE` | — | One id per line; `#` comments OK; CSV with `course_id` header also accepted. |
| `--course-mapping FILE` | — | CSV with `id,course_identifier` columns. Each course's output folder becomes `<out-dir>/<course_identifier>/`. |
| `--account-id N --term-id N` | — | Every course in a term. Admin scope required. |
| `--all-my-courses` | off | `GET /api/v1/courses`. |
| `--assignments` / `--no-assignments` | on | |
| `--discussions` / `--no-discussions` | on | |
| `--classic-quizzes` / `--no-classic-quizzes` | on | |
| `--new-quizzes` / `--no-new-quizzes` | on | Submission-level NQ data always exported; per-question student analysis is best-effort via the New Quizzes service. |
| `--peer-reviews` / `--no-peer-reviews` | on | Drives `comments.csv`'s `author_role=peer` classification. |
| `--groups` / `--no-groups` | on | `group_membership.csv`. |
| `--gradebook` / `--no-gradebook` | on | Wide-format instructor `gradebook.csv`. Requires `--assignments`. |
| `--quiz-reports` / `--no-quiz-reports` | on | Per-classic-quiz Student Analysis CSV from Canvas Reports API. |
| `--nq-lti-analysis` / `--no-nq-lti-analysis` | on | New Quizzes Student Analysis CSV via a headless LTI launch. |
| `--annotated-pdfs` / `--no-annotated-pdfs` | on | SpeedGrader/DocViewer annotated PDFs alongside originals. |
| `--include-announcements` | off | Include announcement discussion topics. |
| `--latest-only` | off | Skip older submission versions per student. |
| `--exclude-ext .mp4,.mov` | — | Skip downloads with these extensions. |
| `--since YYYY-MM-DD` | — | Skip submissions older than this. |
| `--out-dir DIR` | `./submissions` | Per-course folders land here. |
| `--resume` / `--no-resume` | on | Honour `manifest.jsonl`. |
| `--force` / `--no-force` | off | Wipe and re-download courses whose output folder already exists; otherwise the course is skipped. |
| `--course-parallelism N` | 4 | How many courses run concurrently. |
| `--download-workers N` | 16 | File-download thread count per course. |
| `--max-retries N` | 5 | Retries per HTTP request (429 / 5xx / connection errors). |
| `--dry-run` | off | List what would be downloaded; write nothing. |
| `-v` / `-q` | — | Verbose / quiet. |
| `--api-url` | `CANVAS_API_URL` env | Canvas instance URL. |
| `--api-token` | `CANVAS_API_TOKEN` env (also `CANVAS_API_KEY`) | Canvas access token. |
| `--audit-log FILE` | — | Optional JSONL: one CourseResult per course (matches `pipeline`'s `--audit-log`). |
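
To make the two accepted shapes of a `--course-ids-file` concrete, here is a minimal parsing sketch. The function name `parse_course_ids` is hypothetical, and the handling of trailing `#` comments and blank lines is an assumption about reasonable behaviour rather than a description of the exporter's own parser.

```python
import csv
import io


def parse_course_ids(text: str) -> list[int]:
    """Parse course ids from plain lines or a CSV with a course_id column."""
    # Strip "# ..." comments (whole-line or trailing) and drop blank lines.
    lines = [ln.split("#", 1)[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        return []
    header = [h.strip() for h in lines[0].split(",")]
    if "course_id" in header:
        # CSV form: read only the course_id column, ignore any others.
        reader = csv.DictReader(io.StringIO("\n".join(lines)), skipinitialspace=True)
        return [int(row["course_id"]) for row in reader if row["course_id"]]
    # Plain form: one integer id per remaining line.
    return [int(ln) for ln in lines]
```

For example, a file containing `12345` and `678 # pilot` on separate lines and a CSV with a `course_id,name` header both reduce to a plain list of integer ids.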
| Capability | Endpoint(s) | Token role needed |
|---|---|---|
| List a course's submissions, grades, comments, rubric assessments | `GET /courses/:id/students/submissions?student_ids[]=all` | Teacher or higher on the course |
| List all courses in an account/term | `GET /accounts/:id/courses` | Account admin |
| Classic Quiz attempts + per-question answers | `GET /quizzes/:id/submissions`, `GET /quiz_submissions/:id/questions` | Teacher or higher |
| Classic Quiz Student Analysis report | `POST/GET /courses/:id/quizzes/:qid/reports` | Teacher or higher |
| New Quiz submission-level data | `GET /courses/:id/students/submissions` (already covered above) | Teacher or higher |
| New Quiz student analysis report (best-effort) | `POST/GET /api/quiz/v1/courses/:id/quizzes/:aid/reports` and/or LTI-launch quiz-api | Depends on institution |
| SpeedGrader annotated submission PDFs | submission attachment `preview_url` → DocViewer `POST /v2/documents/:id/merged-documents` | Teacher or higher with DocViewer enabled |
Concluded terms: the bulk submissions endpoint can return 0 records on some
concluded courses; the exporter requests enrollment_state[]=active,concluded
explicitly and, if the bulk endpoint still yields nothing, falls back to a
per-assignment loop.
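
The fallback described above amounts to "try bulk, then walk assignments". A minimal sketch follows, with the three callables as hypothetical stand-ins for the real REST calls so the control flow can be read (and tested) without a Canvas instance.

```python
from collections.abc import Callable, Iterable

Submission = dict  # one decoded submission object from the REST API


def collect_submissions(
    fetch_bulk: Callable[[], Iterable[Submission]],
    list_assignment_ids: Callable[[], Iterable[int]],
    fetch_for_assignment: Callable[[int], Iterable[Submission]],
) -> list[Submission]:
    """Try the bulk listing first; fall back to one request per assignment."""
    records = list(fetch_bulk())
    if records:
        return records
    # Bulk returned nothing (seen on some concluded courses): walk assignments.
    for assignment_id in list_assignment_ids():
        records.extend(fetch_for_assignment(assignment_id))
    return records
```

Because the fallback only runs when the bulk call yields nothing, courses where the bulk endpoint works pay no extra requests.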
The two subcommands have disjoint inputs (student-side REST vs. course-content
Content Export + CD2) and disjoint outputs (CSV+attachments tree vs. binary
backup artefacts), and the flags that select which content types to pull
(--assignments/--discussions/...) are specific to submissions. Folding
them into one command would mix two different vocabularies, so they stay
siblings under the same CLI binary, sharing .env loading, the
parallel course runner, signal-driven shutdown, and the optional
--audit-log writer.
The scripts/ directory holds dev/QA utilities. They are not CLI entry
points (run them with uv run python scripts/<name>.py); the only supported
command is canvas-exporter.
| Script | What it does |
|---|---|
| `account_tree.py` | Print a Canvas sub-account tree with per-account course counts. |
| `copy_imscc_to_output.py` | Copy staged .imscc files from staging into the output tree. |
| `fetch_canvas_rest_api_docs.py` | Scrape Canvas's REST API docs into the markdown under `docs/canvas_rest_api/`. |
| `imscc_to_mbz_smoke.py` | Smoke test: load an .imscc and convert it to a Moodle .mbz. |
| `organize_by_identifier.py` | Reorganize exported course folders by course identifier. |
| `qtype_coverage_audit.py` | Audit quiz question-type coverage across staged bundles. |
| `quiz_fix_audit.py` | Validate quiz structure after fixes; surface coverage gaps. |
| `rename_to_cpe_naming.py` | Rename course folders to a CPE naming convention. |
- The Canvas Data 2 sync layer this tool reads from is maintained at UBC with ubc/canvas-data-2, based on Harvard's MIT-licensed Harvard-University-iCommons/canvas-data-2-aws.
- The submissions subcommand's REST client and per-content-type exporters were ported from the standalone canvas-assignment-submission-download project.

See NOTICE for attribution details.