ubc/canvas-exporter

canvas-exporter

canvas-exporter takes Canvas courses and writes them out as portable, offline files you keep yourself — for long-term archival, migration to another platform, or keeping a readable copy when Canvas is not accessible. It reads from Canvas's APIs and a Canvas Data 2 Postgres replica; it never writes anything back to Canvas.

It has two subcommands.

canvas-exporter pipeline exports a course's content and structure — no student data — into any combination of these formats:

  • Moodle backup (.mbz) — Moodle's own native course-backup format: the same kind of file Moodle produces from Backup course and accepts in Restore course (under the hood, a gzipped tar of Moodle's backup XML). Restoring one rebuilds the course inside Moodle as real Moodle activities — pages, assignments, quizzes with a populated question bank, discussions, modules, and rubrics. This is the format to use when migrating a Canvas course to Moodle.
  • Static HTML site (-html.zip) — a self-contained folder of linked HTML pages that opens in any browser, no server needed (e.g. drop it on SharePoint/Teams or use it as a migration staging copy).
  • WordPress export (.xml.zip) — a WordPress WXR file you import through WordPress's built-in Tools → Import.
  • Files-only zip (-files.zip) — just the course's uploaded files/attachments, with no surrounding content.

canvas-exporter submissions exports the student-side of a course — grades, comments, rubric scores, submission files, classic-quiz responses, discussions, peer reviews, and group membership — as per-course CSVs + binary attachments via the Canvas REST API. No Postgres needed. See the Submissions section near the end.

The two subcommands have separate flag surfaces and write different output layouts; they intentionally do not share an output root.

Prerequisites

  • Python 3.14 + uv — see Setup.
  • A synced Canvas Data 2 (CD2) Postgres replica, required by pipeline (the submissions subcommand does not need it). This tool does not sync CD2 itself; it reads from a replica you maintain separately. UBC keeps its replica in sync with ubc/canvas-data-2, which is based on Harvard's Harvard-University-iCommons/canvas-data-2-aws, a serverless app that downloads CD2 and maintains an Aurora PostgreSQL replica. Point pipeline at your replica with --postgres-url / --postgres-secret-id (see Postgres source).
  • A Canvas API token — needed by submissions, and by the pipeline Canvas API step (the .imscc + New Quizzes fetch). See Canvas API source.

canvas-exporter pipeline

The canvas-exporter pipeline command chains Canvas API + CD2 emitters in one run and writes each course directly to <output-dir>/<course_identifier>/. Per-course steps are auto-skipped when their staging artefact is already on disk, so the same command serves as fetcher, exporter, or fully-offline replay tool — see the Recipes section.

uv run canvas-exporter pipeline \
  --course-ids-file course_ids.csv \
  --canvas-base-url https://canvas.example.com \
  --output-dir ./exports

--course-ids-file accepts either one Canvas course id per line or a CSV (the first column is used, or a column named course_id if a header row is present). Blank lines and # comments are ignored, and duplicate ids are deduped. For a single course, use --course-id <id> instead. A minimal file:

# course_ids.csv — one id per line
123456
123457
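
The accepted file shapes can be illustrated with a small parser. This is a minimal sketch of the rules described above, not the exporter's actual code; parse_course_ids is a hypothetical helper name.

```python
import csv
import io


def parse_course_ids(text: str) -> list[str]:
    """Return unique course ids in first-seen order (hypothetical helper)."""
    # Drop blank lines and '#' comment lines before handing the rest to csv.
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    rows = list(csv.reader(io.StringIO("\n".join(lines))))
    if not rows:
        return []

    column = 0  # default: first column
    header = [cell.strip().lower() for cell in rows[0]]
    if "course_id" in header:
        # A header row naming course_id selects that column; skip the header.
        column = header.index("course_id")
        rows = rows[1:]

    # A dict preserves insertion order, so it dedupes without reordering.
    seen: dict[str, None] = {}
    for row in rows:
        if len(row) > column and row[column].strip():
            seen.setdefault(row[column].strip(), None)
    return list(seen)
```

A plain one-id-per-line file is simply the single-column case of the CSV path, which is why one reader can serve both shapes.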

Select output formats with --format (repeatable; defaults to mbz html wxr fileszip). Each format and how to import it:

| --format | File | Import target / use |
| --- | --- | --- |
| mbz | <cid>.mbz | Moodle 4.5 native backup → Site administration → Courses → Restore course (native rubric grading, news-forum announcements, Moodle 4.x question bank). |
| html | <cid>-html.zip | Navigable static HTML site; open locally or host on SharePoint/Teams. |
| wxr | <cid>.xml.zip | WordPress WXR 1.2 → Tools → Import → WordPress (shipped zipped; the XML compresses ~10×). |
| fileszip | <cid>-files.zip | The matching .imscc's web_resources/ repacked as a standalone files archive. |

A separate top-level subcommand, canvas-exporter submissions, pulls student-side data (submissions, grades, comments, attachments) from Canvas's REST API for offline retention. It is disjoint from pipeline — different concurrency model, different output layout, no Postgres or CD2 access required. Student data is sensitive; confirm your institution's privacy policy and retention rules before running it.

Branding

--brand controls the look of the static HTML site only (--format html). It has no effect on the Moodle .mbz or WordPress WXR outputs, and it never changes course content — only the surrounding presentation. A theme sets exactly three things on every generated page:

  • Stylesheet — the style.css inlined into each page (colours, typography, layout).
  • Wordmark — a short line rendered above the course title (e.g. the institution name). Omitted when empty.
  • Footer — the text shown at the bottom of every page (defaults to Exported from Canvas).

The emitter ships two built-in themes:

  • generic (default) — neutral slate palette, no institutional wordmark.
  • ubc — University of British Columbia visual identity (the theme this tool was originally built around).

Pick a built-in name or pass a path to a custom theme file:

uv run canvas-exporter pipeline --format html --brand generic ...   # default
uv run canvas-exporter pipeline --format html --brand ubc ...
uv run canvas-exporter pipeline --format html --brand path/to/my-theme.toml ...

To make your own, write a theme.toml with wordmark and footer strings and place a sibling style.css next to it (or embed the CSS inline via a css = key). The built-in themes live under src/canvas_export/emitters/html_site/themes/ and are a useful starting point for a fork.

Per-course steps

Each worker runs one course end-to-end through every applicable step, so a slow Canvas export only blocks its own course while other workers stream through Canvas → bundle → emit in parallel. Caches in <staging>/ are the source of truth: each step checks whether its artefact is already present and only runs when it is missing (or the matching --refresh kind is set).

  1. Canvas API step writes <staging>/<cid>.imscc and the New Quizzes sidecar <staging>/<cid>-new-quizzes.json.gz. Auto-skipped when both files are present; force a refetch with --refresh (bare, or --refresh=imscc,nq).
  2. CD2 bundle step reads CD2 rows from Postgres into a CourseBundle and caches it as <staging>/<cid>.json.gz. On subsequent runs the cached bundle is replayed unless missing/stale or --refresh=bundle is set. The Postgres connection is borrowed for the fetch only and released before emit.
  3. Emit step writes <cid>.mbz / -html.zip / .xml.zip / <cid>-files.zip directly into <output-dir>/<course_identifier>/ in a single per-course pass. Output is rebuilt only when any staging file is newer than the requested-format outputs, or when --regen-output is passed. The matching .imscc (if present) is automatically picked up so blobs are embedded in the .mbz and the same imscc is repacked as -files.zip in the same traversal.
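
The rebuild rule in step 3 amounts to a timestamp comparison. The following is a minimal sketch under the assumption that file modification times are the freshness signal; needs_rebuild is a hypothetical name and the exporter may track staleness differently.

```python
from pathlib import Path


def needs_rebuild(
    staging_files: list[Path],
    outputs: list[Path],
    regen_output: bool = False,
) -> bool:
    """Decide whether the emit step should run (hypothetical helper)."""
    if regen_output:
        return True  # --regen-output rebuilds unconditionally
    if not outputs or any(not out.exists() for out in outputs):
        return True  # a requested output is missing
    existing_inputs = [p for p in staging_files if p.exists()]
    if not existing_inputs:
        return False  # nothing staged that could be newer
    newest_input = max(p.stat().st_mtime_ns for p in existing_inputs)
    oldest_output = min(out.stat().st_mtime_ns for out in outputs)
    # Rebuild when any staging file is newer than any requested output.
    return newest_input > oldest_output
```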

Each per-course folder is published via a tmp-dir + atomic rename, so a killed worker leaves no half-populated public folder. Staging always holds intermediates (<cid>.imscc, NQ sidecar, bundle cache) and is never cleaned up automatically.
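
The tmp-dir + rename pattern can be sketched as follows. This illustrates the general technique only; publish_atomically is a hypothetical name, and the sketch makes the simplifying assumption that an existing folder is removed just before the rename.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_atomically(final_dir: Path, files: dict[str, bytes]) -> None:
    """Write files into a temp folder, then rename it into place."""
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp folder beside the target so the rename stays on one
    # filesystem (a cross-device rename would not be atomic).
    tmp_dir = Path(tempfile.mkdtemp(prefix=f".{final_dir.name}.", dir=final_dir.parent))
    try:
        for name, payload in files.items():
            (tmp_dir / name).write_bytes(payload)
        if final_dir.exists():
            # Simplification: there is a short window here with no folder at all.
            shutil.rmtree(final_dir)
        os.rename(tmp_dir, final_dir)
    except BaseException:
        # A failed or interrupted run leaves no half-populated public folder.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
```

Readers of the output folder see either the previous complete folder, no folder, or the new complete folder, never a partially written one.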

For fast offline iteration on emitter bug fixes, run with everything already staged and add --regen-output to force a rebuild from the existing caches:

uv run canvas-exporter pipeline \
  --course-ids-file course_ids.csv \
  --staging-dir ./exports/_staging \
  --output-dir ./exports \
  --regen-output

With all imscc / NQ / bundle files staged and no --refresh, the run never contacts Postgres, AWS, or Canvas — credentials are not required.

Setup

This project requires Python 3.14 and uses uv to manage the virtual environment and dependencies.

# install uv (macOS)
brew install uv
# or, cross-platform
curl -LsSf https://astral.sh/uv/install.sh | sh

# install canvas-exporter and dev tools
uv sync --extra dev

# verify
uv run canvas-exporter --help

uv run executes the command inside the project's venv without needing to activate it. To activate the venv for a shell session, source .venv/bin/activate (macOS/Linux) or .venv\Scripts\Activate.ps1 (Windows PowerShell).

The CLI exposes two subcommands: canvas-exporter pipeline (course-content backups, covered here) and canvas-exporter submissions (student-side data, covered below). For pipeline, use --course-ids-file for batches or --course-id for a single course. Per-course steps auto-skip based on staging state; use --refresh to force a refetch and --regen-output to rebuild output unconditionally. See Recipes.

Postgres source (default)

Point the exporter at a CD2 Postgres replica with --postgres-url, or set the DSN in the POSTGRES_URL environment variable so passwords don't end up in shell history. The exporter auto-loads a .env file from the current directory:

# .env (in your working directory; do not commit)
POSTGRES_URL=postgresql://canvas_ro:****@canvas-replica.example.internal:5432/canvas?sslmode=require
POSTGRES_SCHEMA=canvas

Then run the default pipeline without any Postgres flags:

uv run canvas-exporter pipeline --course-ids-file course_ids.csv --canvas-base-url https://canvas.example.com --output-dir ./exports

For a one-off course, use --course-id instead:

uv run canvas-exporter pipeline --course-id 123456 --canvas-base-url https://canvas.example.com --output-dir ./exports

If your Postgres credentials live in AWS Secrets Manager (standard RDS JSON shape: {username, password, host, port, dbname}), reference the secret id/ARN instead — the exporter resolves it at startup using your --aws-profile / --aws-region:

uv run canvas-exporter pipeline \
  --course-ids-file course_ids.csv \
  --canvas-base-url https://canvas.example.com \
  --postgres-secret-id arn:aws:secretsmanager:us-east-1:123:secret:canvas-replica-AbCd \
  --aws-profile staging
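
For orientation, the standard RDS secret shape maps onto a postgresql:// DSN roughly as below. This is a sketch, not the exporter's resolver: dsn_from_rds_secret is a hypothetical name, and the input is assumed to be the JSON text that boto3's Secrets Manager get_secret_value call returns in its SecretString field.

```python
import json
from urllib.parse import quote


def dsn_from_rds_secret(secret_string: str) -> str:
    """Build a DSN from RDS-format secret JSON (hypothetical helper)."""
    secret = json.loads(secret_string)
    # Percent-encode credentials so characters like '@' or '/' cannot
    # be mistaken for URL delimiters.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{secret['host']}:{secret['port']}/{secret['dbname']}"
    )
```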

Authenticating to AWS with saml2aws

If your AWS account is fronted by SSO/SAML (Okta, ADFS, etc.), get short-lived credentials with saml2aws before using --postgres-secret-id. The exporter calls Secrets Manager via boto3, which reads those credentials from your profile.

# one-time setup
brew install saml2aws
saml2aws configure        # walks you through IDP type, URL, role ARN, etc.

# every working session — credentials usually expire in 1 hour
saml2aws login --profile <your-profile>

# verify
aws sts get-caller-identity --profile <your-profile>

Pass the same profile name as --aws-profile (or set AWS_PROFILE). If a run fails with ExpiredToken against Secrets Manager at startup, run saml2aws login again and rerun. Only the secret resolution touches AWS; the actual data fetch is over Postgres and isn't affected by AWS credential expiry mid-run.

| Flag | Env var | Purpose |
| --- | --- | --- |
| --postgres-url | POSTGRES_URL | Full postgresql:// DSN. |
| --postgres-secret-id | POSTGRES_SECRET_ID | Secrets Manager secret id/ARN (RDS-format JSON). |
| --postgres-schema | POSTGRES_SCHEMA | Schema holding Canvas tables (default canvas). |

psycopg 3's pipeline mode sends a course's queries without waiting for each result in turn, so each course costs one round-trip block instead of N serial queries.

Canvas API source (Phase 1)

Generate a Canvas access token (Account → Settings → New Access Token) and export it before running (or put CANVAS_API_TOKEN / CANVAS_API_URL in the same auto-loaded .env as your Postgres settings):

export CANVAS_API_TOKEN=...
uv run canvas-exporter pipeline \
  --canvas-base-url https://canvas.example.com \
  --course-ids-file course_ids.csv \
  --output-dir ./exports

The token must have permission to read each course you want to export. Phase 1 (the Canvas API step) stages <staging>/<course_id>.imscc: Canvas builds the cartridge server-side, and the exporter only polls and downloads it. For single-course iteration use --course-id <id> instead of --course-ids-file.

New Quizzes sidecar

Canvas's IMSCC content export does not carry New Quiz (Quiz LTI / Quizzes.Next) question content — at best you get an LTI launch shell. The Canvas API step always follows up each IMSCC build with a fetch against Canvas's /api/quiz/v1/ REST API and writes <staging>/<course_id>-new-quizzes.json.gz next to the .imscc. The emit step picks the sidecar up automatically and embeds the NQ items into the .mbz. See New Quizzes in the .mbz for the rendering rules.

The fetch is best-effort: if it fails (rate limits, permissions), the IMSCC is still produced and the course is reported as PARTIAL. If Canvas hangs in waiting_for_external_tool while building the cartridge, the IMSCC build is retried with Quiz LTI excluded so the cartridge still completes; the sidecar then backfills the dropped NQs. Both behaviours are always on — there are no CLI knobs to turn them off.
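
The stuck-build fallback is essentially a poll loop with two timers: an overall cap and a per-state cap for waiting_for_external_tool. The sketch below shows that control flow under stated assumptions: wait_for_export and its return labels are hypothetical, the terminal state names "exported" and "failed" are assumed, and the defaults mirror --canvas-poll-interval, --canvas-poll-timeout, and --canvas-lti-wait-timeout.

```python
import time
from collections.abc import Callable


def wait_for_export(
    get_state: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
    poll_interval: float = 5.0,
    poll_timeout: float = 1800.0,
    lti_wait_timeout: float = 30.0,
) -> str:
    """Poll until the export finishes, times out, or sticks on the LTI wait."""
    start = now()
    lti_since: float | None = None
    while True:
        state = get_state()
        if state in ("exported", "failed"):  # assumed terminal states
            return state
        current = now()
        if state == "waiting_for_external_tool":
            if lti_since is None:
                lti_since = current  # start the per-state clock
            if current - lti_since >= lti_wait_timeout:
                return "retry_without_new_quizzes"
        else:
            lti_since = None  # left the LTI wait; reset the per-state clock
        if current - start >= poll_timeout:
            return "timed_out"
        sleep(poll_interval)
```

Injecting sleep and now keeps the loop testable with a fake clock instead of real waiting.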

Bundle files (fast iteration on emitters)

Iterating on emitter logic against the warehouse is slow — every loop pays for a per-course fetch. The pipeline caches each course's CourseBundle to <staging>/<cid>.json.gz automatically. On subsequent runs the cached bundle is replayed instead; Postgres is only contacted when the cache is missing, stale, or --refresh=bundle is set:

# 1. First run, online — fetches and stages bundles + IMSCCs.
uv run canvas-exporter pipeline \
  --course-ids-file course_ids.csv \
  --canvas-base-url https://canvas.example.com \
  --output-dir ./exports

# 2. Iterate on emitters with no warehouse / Canvas access (offline).
#    Every cache is already staged; --regen-output forces a rebuild from
#    the existing files. No POSTGRES_URL / CANVAS_API_TOKEN needed.
uv run canvas-exporter pipeline \
  --course-ids-file course_ids.csv \
  --staging-dir ./exports/_staging \
  --output-dir ./exports \
  --regen-output

# 3. Inspect a bundle when chasing a fidelity bug.
gunzip -c ./exports/_staging/12345.json.gz | jq '.bundle.assignment_overrides'
diff <(gunzip -c ./exports/_staging/a.json.gz | jq -S .bundle.modules) \
     <(gunzip -c ./exports/_staging/b.json.gz | jq -S .bundle.modules)

# 4. Refresh a stale cached bundle (e.g. after adding a CD2 field).
uv run canvas-exporter pipeline ... --refresh=bundle

Bundle files are gzipped JSON (<course_id>.json.gz) with a small wrapper holding bundle_version, fetched_at, and the source. Non-JSON-native scalars (datetime, Decimal, bytes, UUID) are wrapped in a 2-key {"__t__": "<tag>", "v": "..."} sentinel so the round-trip is lossless. JSON-encoded blob fields (e.g. quiz_questions.question_data) are kept as strings — never re-parsed at write time, since that would hide the encoding bugs the bundle is meant to help debug.
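
The sentinel wrapping can be illustrated with the standard json hooks. This is a sketch of the idea only: the tag strings ("datetime", "decimal", "bytes", "uuid") and the base64 encoding for bytes are assumptions, and the actual bundle writer may choose differently.

```python
import base64
import json
import uuid
from datetime import datetime, timezone
from decimal import Decimal


def encode_scalar(value):
    """json.dumps default hook: wrap non-JSON-native scalars in a sentinel."""
    if isinstance(value, datetime):
        return {"__t__": "datetime", "v": value.isoformat()}
    if isinstance(value, Decimal):
        return {"__t__": "decimal", "v": str(value)}
    if isinstance(value, bytes):
        return {"__t__": "bytes", "v": base64.b64encode(value).decode("ascii")}
    if isinstance(value, uuid.UUID):
        return {"__t__": "uuid", "v": str(value)}
    raise TypeError(f"not JSON serialisable: {type(value).__name__}")


_DECODERS = {
    "datetime": datetime.fromisoformat,
    "decimal": Decimal,
    "bytes": base64.b64decode,
    "uuid": uuid.UUID,
}


def decode_sentinel(obj: dict):
    """json.loads object_hook: unwrap exactly-two-key sentinel dicts."""
    if set(obj) == {"__t__", "v"} and obj["__t__"] in _DECODERS:
        return _DECODERS[obj["__t__"]](obj["v"])
    return obj


def dumps(data) -> str:
    return json.dumps(data, default=encode_scalar)


def loads(text: str):
    return json.loads(text, object_hook=decode_sentinel)
```

Strings pass through untouched, so a JSON-encoded blob such as question_data stays a string on both sides of the round-trip.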

Bundle versioning

Bundle files carry a bundle_version integer (BUNDLE_VERSION in src/canvas_export/bundle/io.py). Bump this whenever a CD2/postgres schema change alters what's written into the bundle (a new field, a shape change, a renamed key). The pipeline enforces the bump:

  • Cached bundle is current → loaded as-is, no Postgres connection.
  • Cached bundle is older than BUNDLE_VERSION → the course falls through to the Postgres branch and the tee overwrites the stale file with a current-version bundle. A warning is logged so you see it happen. If no Postgres URL is configured, the run fails fast at startup with a "rerun with --refresh=bundle and creds" hint.
  • Cached bundle is newer than BUNDLE_VERSION → BundleVersionMismatch (FAILED). You're running an older canvas-exporter than wrote the bundle.

A corrupt or future-version bundle is reported as FAILED with diagnostics in exports/FAILED/.
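
The three version cases reduce to a small decision function. The sketch below assumes a placeholder BUNDLE_VERSION value and a hypothetical CacheAction enum; only the BundleVersionMismatch name is taken from the text above.

```python
from enum import Enum

BUNDLE_VERSION = 3  # placeholder; the real constant lives in bundle/io.py


class BundleVersionMismatch(Exception):
    """Raised when a cached bundle was written by a newer exporter."""


class CacheAction(Enum):
    REPLAY = "replay"    # load the cached bundle, no Postgres connection
    REFETCH = "refetch"  # fall through to Postgres and overwrite the cache


def decide_cache_action(cached_version: int, current: int = BUNDLE_VERSION) -> CacheAction:
    if cached_version == current:
        return CacheAction.REPLAY
    if cached_version < current:
        return CacheAction.REFETCH
    raise BundleVersionMismatch(
        f"bundle version {cached_version} is newer than supported {current}"
    )
```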

Recipes

Each per-course step auto-skips when its staging artefact is present, so the recipes below are mostly about how you arrange staging beforehand:

# Canvas-only refresh (re-fetch imscc + NQ sidecar; reuse cached bundle).
uv run canvas-exporter pipeline \
  --course-ids-file ids.csv --output-dir ./out \
  --canvas-base-url https://canvas.example.com \
  --refresh=imscc,nq

# CD2-only refresh (re-fetch bundle from Postgres; reuse cached imscc).
uv run canvas-exporter pipeline \
  --course-ids-file ids.csv --output-dir ./out \
  --refresh=bundle

# Imscc-only files.zip (staging has imscc; emit only <cid>-files.zip per course).
uv run canvas-exporter pipeline \
  --course-ids-file ids.csv --output-dir ./out \
  --staging-dir ./out/_staging \
  --format fileszip

# Fast offline iteration on emitter fixes (no DB, no network, rebuild from caches).
uv run canvas-exporter pipeline \
  --course-ids-file ids.csv --output-dir ./out \
  --staging-dir ./out/_staging \
  --regen-output --format mbz

Flags

| Flag | Env var | Purpose |
| --- | --- | --- |
| --course-ids-file | | One course id per line, or a CSV (first column, or a course_id header column). Blank lines and # comments ignored; duplicates deduped. Mutually exclusive with --course-id; one of the two is required. |
| --course-id | | Single Canvas course id (shortcut for --course-ids-file when iterating on one course). |
| --output-dir | | Where per-course folders land (default ./exports). |
| --staging-dir | | Where intermediates live (default <output-dir>/_staging). Always kept across runs. |
| --format | | Output formats. Repeatable / comma-separated. Defaults to mbz html wxr fileszip. |
| --workers | | Parallel courses (default 1). |
| --refresh[=KIND,...] | | Force a refetch of one or more staged caches. Bare --refresh refetches everything (imscc, NQ sidecar, bundle); pass --refresh=bundle or --refresh=imscc,nq to refetch a subset. Valid kinds: imscc, nq, bundle. |
| --regen-output | | Rebuild output unconditionally, even when no staging file is newer than the existing outputs. Default behaviour rebuilds only when an output is missing or older than its staging inputs. |
| --dry-run | | Run every fetch but write no output files. |
| --audit-log | | Path to JSONL file: one record per course (status, counts, duration, error). |
| --log-level | | DEBUG / INFO / WARNING / ERROR. |

Canvas API

| Flag | Env var | Purpose |
| --- | --- | --- |
| --canvas-base-url | CANVAS_API_URL (preferred), CANVAS_BASE_URL (legacy alias) | Canvas instance URL, e.g. https://canvas.example.com. Required only when at least one course needs an imscc or NQ fetch. Shares CANVAS_API_URL with the submissions subcommand. |
| | CANVAS_API_TOKEN | Canvas personal access token. Same conditional requirement as above. |
| --canvas-poll-interval | | Seconds between Content Export status polls (default 5). |
| --canvas-poll-timeout | | Max seconds to wait for one course's export (default 1800). |
| --canvas-lti-wait-timeout | | Max seconds a course may stay in waiting_for_external_tool before we abandon it (default 30). Per-state cap under --canvas-poll-timeout. A stuck NQ build then triggers the retry-without-NQ fallback; the always-on /api/quiz/v1/ sidecar backfills the dropped NQs. |

CD2 (Postgres)

| Flag | Env var | Purpose |
| --- | --- | --- |
| --postgres-url | POSTGRES_URL | Full postgresql:// DSN. Required only when at least one course needs a bundle fetch. |
| --postgres-secret-id | POSTGRES_SECRET_ID | Secrets Manager secret id/ARN (RDS-format JSON). |
| --postgres-schema | POSTGRES_SCHEMA | Schema holding Canvas tables (default canvas). |
| --aws-profile | AWS_PROFILE | boto3 profile (used by --postgres-secret-id). |
| --aws-region | AWS_REGION | AWS region (used by --postgres-secret-id). |

The emit step writes <cid>.mbz (when mbz in --format), <cid>-html.zip (for html), <cid>.xml.zip (for wxr), and <cid>-files.zip (for fileszip) directly into <output-dir>/<course_identifier>/. The matching .imscc from the Canvas step is auto-detected: when present, the .mbz embeds web_resources/ blobs and rewrites inline file URLs to @@PLUGINFILE@@, and the same imscc is repacked as <cid>-files.zip in the same per-course traversal.

Combining CD2 + Canvas .imscc

CD2 (Phase 2) only stores file metadata, not blobs. Canvas's Content Exports API (Phase 1) produces a <course_id>.imscc whose web_resources/ folder does carry the blobs. The pipeline brings the two together automatically — there's nothing extra to opt into:

  • The .mbz embeds blobs. When Phase 1 has staged a <cid>.imscc, Phase 2's Moodle emitter embeds web_resources/ blobs as mod_resource content and rewrites inline Canvas file URLs in pages, assignments, forums, quizzes, questions, and the course summary to @@PLUGINFILE@@ tokens. Output drops straight into Moodle's "Restore course" UI.
  • A -files.zip is packed in the same pass alongside the .mbz when fileszip is in --format (it's in the default set). The imscc is opened once per course and shared between the .mbz blob embedder and the files.zip packer.

Per-course output lands in <output-dir>/<course_identifier>/, where <course_identifier> is derived from canvas.courses.id and course_code (the same SQL pattern downstream systems use). Courses not present in canvas.courses fall back to a bare <course_id>/ folder.

Pushing CD2 fix-ups without re-packing the .imscc

When you just need to re-emit artefacts after a CD2 fix, refresh the bundle and let the imscc cache short-circuit the Canvas step:

uv run canvas-exporter pipeline \
  --course-ids-file ids.csv --output-dir ./exports \
  --refresh=bundle --regen-output \
  --format mbz --format html

Per-course folders are republished atomically (tmp-dir + rename). With --regen-output, the existing folder is replaced wholesale — incremental per-format reuse is intentionally not supported, since the atomic-swap guarantee is more valuable than the I/O savings.

What's exported

The Postgres source (and bundle replay) pulls the following CD2 tables per course. ADHOC assignment overrides and any tables tied to per-student state (submissions, enrollments, conversations) are intentionally excluded — these exports target courses with no student data yet.

  • Course settings + syllabus
  • Pages (wiki)
  • Assignments + assignment groups + assignment overrides (Section / Group only)
  • Quizzes (Classic) + quiz questions + quiz groups + reusable question banks
  • New Quizzes (Quiz LTI / Quizzes.Next) + items + item banks — fetched via Canvas's /api/quiz/v1/ REST API, not CD2 (the data isn't in DAP). The Canvas API step always writes the sidecar; the emit step picks it up automatically when it exists.
  • Discussions + announcements
  • Modules + module items (as the LearningModules organization tree)
  • Rubrics + rubric associations
  • File metadata (binaries are unreachable when Canvas is offline; pair the CD2 outputs with a Canvas API run — pipeline does this automatically whenever .imscc files are present alongside the Phase 2 run)
  • Course sections + enrollment term
  • Custom grading standards
  • Learning outcomes + outcome links + outcome groups
  • LTI tool configurations (context_external_tools, course-scoped only)

Known Limitations

  • Content file blobs are not available via CD2 alone. Pair the run with a Canvas Content Export .imscc to embed them; pipeline does this automatically whenever an .imscc is staged alongside the CD2 step (see Combining CD2 + Canvas .imscc).
  • Pages: images are not available without the matching .imscc.

Layout inside the .mbz

A gzipped tar — Moodle 4.5 native course backup format. Targets <backup_release>4.5</backup_release> with build constants from MOODLE_405_STABLE. Two runs of the same course produce byte-identical output.

moodle_backup.xml             — manifest: version, contents, settings
files.xml                     — file catalog (empty stub when no matching .imscc is staged)
questions.xml                 — course-level question bank, one category per quiz
gradebook.xml outcomes.xml scales.xml roles.xml groups.xml users.xml completion.xml
course/
  course.xml inforef.xml enrolments.xml roles.xml filters.xml calendar.xml comments.xml
sections/section_<id>/
  section.xml inforef.xml
activities/<modname>_<cmid>/
  <modname>.xml module.xml inforef.xml grades.xml
  roles.xml filters.xml calendar.xml comments.xml completion.xml
  grading.xml                 — assignments only; rubric definition (gradingform_rubric)
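
Byte-identical output from a gzipped tar generally requires pinning everything that varies between runs: member order, tar timestamps and ownership, and the gzip header's mtime and filename. The sketch below shows one way to do that with the standard library; build_archive is a hypothetical name and this is not necessarily how the emitter achieves it.

```python
import gzip
import io
import tarfile


def build_archive(members: dict[str, bytes]) -> bytes:
    """Pack members into a reproducible .tar.gz (hypothetical helper)."""
    buffer = io.BytesIO()
    # mtime=0 and an empty filename keep run-specific data out of the gzip header.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buffer, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(members):  # fixed member order
                info = tarfile.TarInfo(name)
                info.size = len(members[name])
                info.mtime = 0      # no wall-clock timestamps
                info.mode = 0o644   # fixed permissions
                # TarInfo defaults uid/gid to 0 and uname/gname to "".
                tar.addfile(info, io.BytesIO(members[name]))
    return buffer.getvalue()
```

With these fields pinned, the same input mapping yields the same bytes regardless of dictionary order or when the build runs.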

Canvas content type → Moodle activity:

| Canvas | Moodle | Notes |
| --- | --- | --- |
| Page | mod_page | Body in <content>; intro stays empty |
| Assignment | mod_assign | Submission types → plugin_configs; due dates preserved |
| Discussion | mod_forum (general) | assessed=0; graded discussions log a warning |
| Announcement | mod_forum (news, forcesubscribe=1) | All announcements live in section 0 |
| Module | course section (format=topics) | <sequence> lists cmids in order |
| Module item: SubHeader | mod_label | Preserves module structure |
| Module item: ExternalUrl | mod_url | |
| Module item: Attachment | mod_resource | Binary embedded from web_resources/ when the matching .imscc is staged; metadata-only otherwise |
| Quiz (Classic) | mod_quiz + question bank category | Moodle 4.x <questionbankentryid> shape; questions from CD2 quiz_questions |
| Quiz (New Quizzes / Quiz LTI) | mod_quiz + question bank category | When the NQ sidecar is loaded; otherwise renders as empty mod_assign placeholder. See New Quizzes in the .mbz. |
| Rubric | gradingform_rubric in grading.xml | One criterion → one criterion, ratings → levels |
| Syllabus | course/<summary> AND a mod_page in section 0 | |

Question type mapping (Canvas → Moodle qtype): multiple_choice/multiple_answers → multichoice; true_false → truefalse; short_answer → shortanswer; essay → essay; numerical → numerical; matching → match; text_only → description; fill_in_multiple_blanks → multianswer (cloze with SHORTANSWER subquestions per blank); multiple_dropdowns → multianswer (cloze with MULTICHOICE subquestions per blank); file_upload → essay configured with required attachments. Only calculated_question remains downgraded to Moodle's description (no Moodle equivalent for Canvas's formula engine).
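
Expressed as data, the mapping above is a plain lookup table. The keys below are copied from that list; the moodle_qtype helper and its tolerance for a trailing _question suffix are assumptions for illustration.

```python
# Canvas question type -> Moodle qtype, as listed in the paragraph above.
CANVAS_TO_MOODLE_QTYPE = {
    "multiple_choice": "multichoice",
    "multiple_answers": "multichoice",
    "true_false": "truefalse",
    "short_answer": "shortanswer",
    "essay": "essay",
    "numerical": "numerical",
    "matching": "match",
    "text_only": "description",
    "fill_in_multiple_blanks": "multianswer",  # cloze, SHORTANSWER per blank
    "multiple_dropdowns": "multianswer",       # cloze, MULTICHOICE per blank
    "file_upload": "essay",                    # with required attachments
    "calculated_question": "description",      # downgraded: no formula engine
}


def moodle_qtype(canvas_type: str) -> str:
    """Look up the Moodle qtype; raises KeyError for unlisted types."""
    if canvas_type in CANVAS_TO_MOODLE_QTYPE:
        return CANVAS_TO_MOODLE_QTYPE[canvas_type]
    # Assumption: also accept names carrying a "_question" suffix.
    return CANVAS_TO_MOODLE_QTYPE[canvas_type.removesuffix("_question")]
```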

Canvas reusable question banks (assessment_question_banks) are exported as a "Question Banks" question-category container with one sub-category per bank; quiz questions sourced from a bank via assessment_question_id reference the bank's question entry directly (no duplication). Random quiz_group pulls from banks are NOT yet emitted as Moodle random qtype slots — bank questions are reachable as a question bank but the quiz won't auto-pick them at attempt time (v1 limitation).

New Quizzes in the .mbz

New Quizzes (Quiz LTI / Quizzes.Next) live in a separate Canvas data store from Classic Quizzes — CD2/DAP doesn't carry them and Canvas's IMSCC content export only carries an LTI launch shell. The only source of NQ items is Canvas's /api/quiz/v1/ REST API, which the Canvas API step always writes to a per-course sidecar (<staging>/<course_id>-new-quizzes.json.gz). When the matching .imscc + NQ sidecar are both staged, the Moodle emitter picks up the sidecar automatically and renders each NQ as a real mod_quiz activity (the gradebook-shadow assignment that Canvas auto-creates is suppressed so the quiz replaces, not duplicates, the empty placeholder).

NQ interaction → Moodle qtype:

| NQ interaction_type_slug | Moodle qtype |
| --- | --- |
| choice (single-answer) | multichoice (single) |
| choice (multi-answer) | multichoice (multi) |
| true-false | truefalse |
| essay | essay |
| matching | match |
| numeric | numerical |
| rich-fill-blank (open-entry) | shortanswer (best-effort, v1) |
| hot-spot, ordering, categorization, formula, stimulus, file-upload, BankEntry | description (prompt preserved, 0 pts) |

Each NQ gets its own "Default for <quiz title>" category in the Moodle question bank, parallel to Classic quizzes. NQ item banks land under a "New Quiz Item Banks" container with one category per bank.

Without the sidecar, the .mbz still builds — every NQ just renders as the empty mod_assign placeholder Canvas's shadow assignment row alone produces (pre-NQ-support behaviour). This is the symptom that prompted the feature: a course with three Respondus LockDown quizzes restored with three empty assignments and no questions.

File embedding happens automatically when Phase 1 has staged a <course_id>.imscc. Without that, files.xml is empty and HTML bodies keep Canvas absolute URLs — restored courses have working text but broken image references. With it, web_resources/ blobs are embedded as mod_resource content and inline Canvas file URLs in pages, assignments, forums, quizzes, questions, and the course summary are rewritten to @@PLUGINFILE@@ tokens so Moodle resolves them against the embedded files on restore.
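
To make the URL rewrite concrete, here is a minimal regex-based sketch. The Canvas URL shape it matches (/courses/<id>/files/<id> with an optional /preview or /download and query string) and the filenames_by_id lookup are assumptions for illustration; the emitter's real matching rules are not shown in this README.

```python
import re
from urllib.parse import quote


def rewrite_file_urls(html: str, base_url: str, filenames_by_id: dict[str, str]) -> str:
    """Replace absolute Canvas file URLs with @@PLUGINFILE@@ tokens."""
    pattern = re.compile(
        re.escape(base_url.rstrip("/"))
        + r"/courses/\d+/files/(\d+)"      # capture the Canvas file id
        + r"(?:/(?:preview|download))?"    # optional action segment
        + r"(?:\?[^\"'\s>]*)?"             # optional query string
    )

    def replace(match: re.Match) -> str:
        name = filenames_by_id.get(match.group(1))
        if name is None:
            return match.group(0)  # unknown file: leave the URL untouched
        return "@@PLUGINFILE@@/" + quote(name)

    return pattern.sub(replace, html)
```

Leaving unknown ids untouched means a missing blob degrades to the original Canvas link rather than a dangling token.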

Layout inside the HTML .zip

index.html               — module table of contents with links to all content
syllabus.html
pages/<slug>.html
assignments/<slug>.html
discussions/<slug>.html
quizzes/<slug>.html      — questions rendered as readable HTML by type

The index is built from the Canvas module structure, so content appears in the same order as students saw it. Items with no corresponding file (e.g. external URLs, file attachments) appear as plain text in the TOC.

Quiz questions are rendered by type: multiple-choice and true/false questions list answer options; matching questions show a left→right table; fill-in-blank and dropdown questions list accepted answers per blank; essay and file-upload questions show a type label only.

Layout inside the WXR .xml

A single RSS 2.0 / WXR 1.2 file importable via Tools → Import → WordPress.

Content is mapped to WordPress posts and pages:

| Canvas content | WordPress post type | Category |
| --- | --- | --- |
| Syllabus | page | Pages |
| Pages (wiki) | page | Pages |
| Assignments | post | Assignments |
| Discussions | post | Discussions |
| Announcements | post | Announcements |
| Quizzes | post | Quizzes |

Module names are registered as additional WordPress categories and tagged on every item that belongs to them, preserving the course structure. Quiz questions are rendered as readable HTML (multiple-choice answers listed A/B/C/D, matching questions as a left→right table).

Customising table names

If your CD2 / DAP layout uses different table names (e.g. modules rather than context_modules), edit the SQL constants in src/canvas_export/queries.py — every query lives in that one file.

Tests

uv run pytest

Tests use in-memory fixtures; no live Postgres required.

Build / pre-commit checks

The project's "build" is a trio of fast checks that should be clean before pushing:

uv run ruff check          # lint
uv run ty check            # type check (Astral's ty)
uv run pytest              # tests

ty is configured in pyproject.toml (under [tool.ty]) to type-check src/, tests/, and scripts/ against Python 3.14. Tests relax a couple of rules where lxml .find(...).findtext(...) chains and dict isinstance narrowing produce noise without indicating real bugs — see the [[tool.ty.overrides]] block.

Pre-commit hooks

The repo ships a .pre-commit-config.yaml that runs ruff + ty on every commit. Tests are not in the hooks — run uv run pytest locally or rely on CI.

uvx pre-commit install
uvx pre-commit run --all-files   # one-off: run against the whole repo

Submissions (canvas-exporter submissions)

A separate subcommand for the student-data half of a course export. Where pipeline builds offline course backups for restore (no student data), this command pulls everything a student submitted and everything an instructor gave back, as per-course CSVs + binary attachments via the Canvas REST API. No Postgres / CD2 needed.

# Single course → ./submissions/<safe-name>_<id>/
export CANVAS_API_URL=https://canvas.example.com
export CANVAS_API_TOKEN=...
uv run canvas-exporter submissions --course-id 12345

# Many courses + an instructor-supplied folder name per course
uv run canvas-exporter submissions \
  --course-mapping ./mapping.csv \
  --out-dir ./submissions \
  --course-parallelism 4 --download-workers 16

# All courses in a term
uv run canvas-exporter submissions --account-id 1 --term-id 17

# Pull only what you need; preview first
uv run canvas-exporter submissions --course-id 12345 \
  --no-classic-quizzes --no-new-quizzes --dry-run -v

What lands on disk

submissions/
  summary.json
  <course-name>_<id>/
    _course.json
    manifest.jsonl              # resume bookkeeping
    failed.log                  # one line per download failure
    README.txt                  # explains every file/folder here
    grades.csv                  # one row per submission (assignment | classic_quiz | new_quiz)
    gradebook.csv               # wide-format instructor gradebook (one row per student)
    comments.csv                # one row per instructor / peer / self comment
    rubric_scores.csv           # one row per (student × rubric criterion)
    attachments.csv             # one row per downloaded file (+ annotated copy if any)
    classic_quiz_responses.csv  # one row per (student × classic-quiz question × attempt)
    discussion_topics.csv
    discussion_entries.csv
    discussion_attachments.csv
    peer_reviews.csv
    group_membership.csv
    assignments/<name>_<id>/
    classic_quizzes/<name>_<id>/
    new_quizzes/<name>_<id>/
    discussions/<topic>_<id>/
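
A quick sanity check after an export is to count the data rows in each top-level CSV of a course folder. The sketch below makes no assumption about column names; it does assume the CSVs are UTF-8 with a single header row, and the folder name in the usage block is hypothetical.

```python
import csv
from pathlib import Path


def csv_row_counts(course_dir: Path) -> dict[str, int]:
    """Return {filename: data-row count} for each top-level CSV in a course folder."""
    counts: dict[str, int] = {}
    for csv_path in sorted(course_dir.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            next(reader, None)  # skip the header row (assumed present)
            counts[csv_path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    # Hypothetical folder following the <course-name>_<id> pattern.
    for name, n in csv_row_counts(Path("submissions/Intro_Biology_12345")).items():
        print(f"{name}: {n}")
```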

Re-running with the same --out-dir and --resume (default on) honours manifest.jsonl and skips already-downloaded files. --force wipes any existing per-course folder before re-downloading.
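
The resume behaviour can be pictured as a set lookup against a JSONL manifest. This is only an illustrative sketch: the actual schema of manifest.jsonl is not documented here, so the "path" key and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_completed(manifest_path: Path) -> set[str]:
    """Collect the paths recorded in a JSONL manifest (one JSON object per line)."""
    done: set[str] = set()
    if not manifest_path.exists():
        return done
    with manifest_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a half-written last line after an interrupted run
            # "path" is a hypothetical key; the real manifest schema may differ.
            if isinstance(record, dict) and isinstance(record.get("path"), str):
                done.add(record["path"])
    return done


def pending(wanted: list[str], manifest_path: Path) -> list[str]:
    """Return the entries of `wanted` that the manifest does not list yet."""
    done = load_completed(manifest_path)
    return [p for p in wanted if p not in done]
```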

Flags

| Flag | Default | Purpose |
| --- | --- | --- |
| --course-id N | | Repeatable; comma-separated allowed. |
| --course-ids-file FILE | | One id per line; # comments OK; CSV with course_id header also accepted. |
| --course-mapping FILE | | CSV with id,course_identifier columns. Each course's output folder becomes <out-dir>/<course_identifier>/. |
| --account-id N --term-id N | | Every course in a term. Admin scope required. |
| --all-my-courses | off | GET /api/v1/courses. |
| --assignments / --no-assignments | on | |
| --discussions / --no-discussions | on | |
| --classic-quizzes / --no-classic-quizzes | on | |
| --new-quizzes / --no-new-quizzes | on | Submission-level NQ data always exported; per-question student analysis is best-effort via the New Quizzes service. |
| --peer-reviews / --no-peer-reviews | on | Drives comments.csv's author_role=peer classification. |
| --groups / --no-groups | on | group_membership.csv. |
| --gradebook / --no-gradebook | on | Wide-format instructor gradebook.csv. Requires --assignments. |
| --quiz-reports / --no-quiz-reports | on | Per-classic-quiz Student Analysis CSV from Canvas Reports API. |
| --nq-lti-analysis / --no-nq-lti-analysis | on | New Quizzes Student Analysis CSV via a headless LTI launch. |
| --annotated-pdfs / --no-annotated-pdfs | on | SpeedGrader/DocViewer annotated PDFs alongside originals. |
| --include-announcements | off | Include announcement discussion topics. |
| --latest-only | off | Skip older submission versions per student. |
| --exclude-ext .mp4,.mov | | Skip downloads with these extensions. |
| --since YYYY-MM-DD | | Skip submissions older than this. |
| --out-dir DIR | ./submissions | Per-course folders land here. |
| --resume / --no-resume | on | Honour manifest.jsonl. |
| --force / --no-force | off | Wipe and re-download courses whose output folder already exists; otherwise the course is skipped. |
| --course-parallelism N | 4 | How many courses run concurrently. |
| --download-workers N | 16 | File-download thread count per course. |
| --max-retries N | 5 | Retries per HTTP request (429 / 5xx / connection errors). |
| --dry-run | off | List what would be downloaded; write nothing. |
| -v / -q | | Verbose / quiet. |
| --api-url | CANVAS_API_URL env | Canvas instance URL. |
| --api-token | CANVAS_API_TOKEN env (also CANVAS_API_KEY) | Canvas access token. |
| --audit-log FILE | | Optional JSONL: one CourseResult per course (matches pipeline's --audit-log). |
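
To make the --course-ids-file format concrete, here is a minimal sketch of a parser for the two shapes described above: one id per line with # comments, or a CSV with a course_id header. It is not the exporter's own parser. The name parse_course_ids is hypothetical, and accepting trailing inline comments is an assumption that goes beyond what the table states.

```python
import csv


def parse_course_ids(text: str) -> list[int]:
    """Parse course ids from one-per-line text or from a CSV with a course_id column."""
    # Drop blank lines and whole-line comments.
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    if not lines:
        return []
    header = [h.strip() for h in next(csv.reader([lines[0]]))]
    if "course_id" in header:
        col = header.index("course_id")
        return [
            int(row[col])
            for row in csv.reader(lines[1:])
            if len(row) > col and row[col].strip()
        ]
    # Plain format; trailing "# ..." comments are tolerated (an assumption).
    return [int(ln.split("#", 1)[0].strip()) for ln in lines]
```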

Required Canvas permissions

| Capability | Endpoint(s) | Token role needed |
| --- | --- | --- |
| List a course's submissions, grades, comments, rubric assessments | GET /courses/:id/students/submissions?student_ids[]=all | Teacher or higher on the course |
| List all courses in an account/term | GET /accounts/:id/courses | Account admin |
| Classic Quiz attempts + per-question answers | GET /quizzes/:id/submissions, GET /quiz_submissions/:id/questions | Teacher or higher |
| Classic Quiz Student Analysis report | POST/GET /courses/:id/quizzes/:qid/reports | Teacher or higher |
| New Quiz submission-level data | GET /courses/:id/students/submissions (already covered above) | Teacher or higher |
| New Quiz student analysis report (best-effort) | POST/GET /api/quiz/v1/courses/:id/quizzes/:aid/reports and/or LTI-launch quiz-api | Depends on institution |
| SpeedGrader annotated submission PDFs | submission attachment preview_url → DocViewer POST /v2/documents/:id/merged-documents | Teacher or higher with DocViewer enabled |

Concluded terms: the bulk submissions endpoint can return 0 records on some concluded courses; the exporter requests enrollment_state[]=active,concluded explicitly and, if the bulk endpoint still yields nothing, falls back to a per-assignment loop.
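
The fallback described above amounts to "bulk first, per-assignment loop if the bulk result is empty". The sketch below shows only that control flow. The three callables are hypothetical stand-ins for whatever performs the real HTTP requests, and none of these names come from the exporter's code.

```python
from collections.abc import Callable, Iterable

Submission = dict  # a decoded JSON submission record


def collect_submissions(
    fetch_bulk: Callable[[], Iterable[Submission]],
    list_assignment_ids: Callable[[], Iterable[int]],
    fetch_for_assignment: Callable[[int], Iterable[Submission]],
) -> list[Submission]:
    """Try the bulk endpoint first; fall back to fetching per assignment."""
    submissions = list(fetch_bulk())
    if submissions:
        return submissions
    # The bulk endpoint returned nothing (seen on some concluded courses),
    # so walk the assignments one by one instead.
    for assignment_id in list_assignment_ids():
        submissions.extend(fetch_for_assignment(assignment_id))
    return submissions
```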

Why a separate subcommand (not a pipeline --format)?

The two subcommands have disjoint inputs (student-side REST vs. course-content Content Export + CD2) and disjoint outputs (CSV+attachments tree vs. binary backup artefacts), and the flags that select which content types to pull (--assignments/--discussions/...) are specific to submissions. Folding them into one command would mix two different vocabularies. They are kept as siblings under the same CLI binary instead, sharing .env loading, the parallel course runner, signal-driven shutdown, and the optional --audit-log writer.

Developer scripts

The scripts/ directory holds dev/QA utilities. They are not CLI entry points (run them with uv run python scripts/<name>.py); the only supported command is canvas-exporter.

| Script | What it does |
| --- | --- |
| account_tree.py | Print a Canvas sub-account tree with per-account course counts. |
| copy_imscc_to_output.py | Copy staged .imscc files from staging into the output tree. |
| fetch_canvas_rest_api_docs.py | Scrape Canvas's REST API docs into the markdown under docs/canvas_rest_api/. |
| imscc_to_mbz_smoke.py | Smoke test: load an .imscc and convert it to a Moodle .mbz. |
| organize_by_identifier.py | Reorganize exported course folders by course identifier. |
| qtype_coverage_audit.py | Audit quiz question-type coverage across staged bundles. |
| quiz_fix_audit.py | Validate quiz structure after fixes; surface coverage gaps. |
| rename_to_cpe_naming.py | Rename course folders to a CPE naming convention. |

Credits

  • The Canvas Data 2 sync layer this tool reads from is maintained at UBC with ubc/canvas-data-2, based on Harvard's MIT-licensed Harvard-University-iCommons/canvas-data-2-aws.
  • The submissions subcommand's REST client and per-content-type exporters were ported from the standalone canvas-assignment-submission-download project.

See NOTICE for attribution details.
