Skip to content

feat: Implement SCORM package ingestion, playback, and integrity validation#1240

Open
romitshah02 wants to merge 9 commits into
Sunbird-Knowlg:developfrom
romitshah02:scorm-multi-sco
Open

feat: Implement SCORM package ingestion, playback, and integrity validation#1240
romitshah02 wants to merge 9 commits into
Sunbird-Knowlg:developfrom
romitshah02:scorm-multi-sco

Conversation

@romitshah02
Copy link
Copy Markdown

Summary of Change
Implemented end-to-end SCORM package ingestion and playback support for both single and multi sco. This includes backend ingestion logic (ScormMimeTypeMgrImpl), editor support for MIME detection and binary upload, and a SCORM API shim in the player. Included an integrity check to verify the existence of the launchFile specified in imsmanifest.xml.

Motivation
To enable compatibility with standard SCORM learning modules within the platform, ensuring robust package ingestion and reliable playback.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Verified E2E flow (Authoring -> Ingestion -> Playback) locally.
  • Verified invalid SCORM packages now throw ClientException during ingestion.
  • Verified valid SCORM packages ingest and playback correctly.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • SCORM support: package upload, extraction, artifact storage, and automatic launch-file detection; review and publish flows.
  • Bug Fixes / Validation

    • Rejects invalid SCORM packages and missing/invalid launch files with clear errors.
  • Security

    • Extraction hardened to prevent path traversal/Zip-Slip and ensure safe launch-file resolution.
  • Tests

    • Added unit tests covering SCORM upload validation and failure cases.
  • Schema

    • Content schema now allows the SCORM MIME type.
  • Chores / Dev

    • Local seed-data tooling and README updates.
  • Infrastructure

    • Compose/VM and CI updates: OpenSearch migration and simplified build workflows.

Review Change Stack

@romitshah02 romitshah02 changed the title Scorm multi sco feat: Implement SCORM package ingestion, playback, and integrity validation May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23971986-9795-4d8e-8a55-eb15f76939f7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCORM Ingestion PR Review

This review covers the core Scala implementation files. The SCORM feature is a valuable addition; below are correctness/security/design issues that should be addressed before merge.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseMimeTypeManager.scala — Dead code and misleading indentation in extractPackage (lines ~143-156)

The PR adds a second path-traversal check using baseDirPath:

val baseDirPath = Paths.get(basePath).toAbsolutePath.normalize()
for (entry <- ...) {
    val path = resolvedBaseDir.resolve(entry.getName).normalize()
    if (!path.startsWith(resolvedBaseDir))   // existing check
        throw new ClientException(...)
    if (!path.startsWith(baseDirPath)) {     // new PR check — dead code
        throw new ClientException(...)
    }

resolvedBaseDir is baseDir.toRealPath() (canonical, symlink-resolved). baseDirPath is Paths.get(basePath).toAbsolutePath.normalize() — these resolve to the same directory. Because any path that passes the first check (startsWith(resolvedBaseDir)) will also satisfy the second (startsWith(baseDirPath)), the second block is unreachable dead code.

Additionally the indentation in the PR diff is broken — the for loop is at an incorrect nesting level, making the structure misleading.

Recommendation: Remove the baseDirPath variable and its check entirely. The existing resolvedBaseDir check is sufficient and correct.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScormMimeTypeMgrImpl.scala — Mutable val mapper on a class instance instead of a companion object

class ScormMimeTypeMgrImpl(implicit ss: StorageService) extends ... {
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)

ObjectMapper is thread-safe after configuration but the registerModule call happens in the constructor body. A new ScormMimeTypeMgrImpl is instantiated once and shared (see MimeTypeManagerFactory), so while this is not a concurrency bug in practice, it is an anti-pattern. More importantly ObjectMapper is an expensive object to construct.

Recommendation: Move mapper to a companion object so it is created exactly once for the JVM lifetime:

object ScormMimeTypeMgrImpl {
  private val mapper: ObjectMapper = new ObjectMapper()
    .registerModule(DefaultScalaModule)
}

This also removes the need for a separate registerModule call on a val.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScormMimeTypeMgrImpl.scala — getScoList uses an unsafe .asInstanceOf cast that hides type errors at compile time

}).toList.asInstanceOf[List[Map[String, String]]]

The .map(...) already returns List[Map[String, String]] at the call site — Scala infers this from the literal Map("identifier" -> ..., "title" -> ..., "href" -> ...) expressions whose values are all String. The explicit .asInstanceOf cast suppresses that inference, making compiler type-checking blind to future regressions where a value might not be String.

Recommendation: Remove the cast entirely and let the type be inferred:

private def getScoList(xml: Elem): List[Map[String, String]] = {
    (xml \\ "item").filter(item => (item \@ "identifierref").nonEmpty).map { item =>
        val ref   = item \@ "identifierref"
        val title = (item \ "title").text
        val href  = (xml \\ "resource").find(res => (res \@ "identifier") == ref)
                                        .map(_ \@ "href").getOrElse("")
        Map("identifier" -> (item \@ "identifier"), "title" -> title, "href" -> href)
    }.toList
}

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScormMimeTypeMgrImpl.scala — upload throws a synchronous exception in the else branch, breaking the Future-returning contract

override def upload(...): Future[Map[String, AnyRef]] = {
    validateUploadRequest(...)
    if (isValidPackageStructure(...)) {
        ...
    } else {
        TelemetryManager.error(...)
        throw new ClientException(...)  // synchronous throw inside a Future-returning method
    }
}

When a method is declared to return Future[T], callers expect all failures to be communicated through a failed Future, not through a synchronous exception. If a caller wraps this in a Future { scormMgr.upload(...) }.flatten, the synchronous throw is caught by that outer Future. But if any caller calls upload(...) directly and chains .map/.flatMap on the result, the synchronous exception bypasses all recover/recoverWith handlers on the future chain — the same issue as validateUploadRequest.

Recommendation: Wrap the failure in a failed Future:

} else {
    TelemetryManager.error(...)
    Future.failed(new ClientException("ERR_INVALID_FILE", "Invalid SCORM package: imsmanifest.xml is missing!"))
}

Note: validateUploadRequest also throws synchronously, so wrapping that call too would make this fully consistent.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScormMimeTypeMgrImpl.scala — getSecureXml overrides parser in an anonymous class but the field is def parser, not a stable val

val xmlLoader = new XMLLoader[Elem] {
    override def parser = saxParser
}
xmlLoader.loadFile(manifestFile)

XMLLoader.parser is a def that creates a new SAXParser on every call in the default implementation. The override here correctly returns the pre-configured saxParser, but because parser is a def (not a val), there is a subtle contract issue: if loadFile calls parser more than once internally (e.g., for error-recovery paths), a different SAXParser instance would normally be returned each time — but here the same instance is reused. SAXParser is not thread-safe and is not designed to be called twice without reset.

More importantly, the Scala standard library XML.loadFile method is not hardened against XXE and should not be used without this pattern — which the author correctly implements. However, the approach is fragile because it relies on internal behaviour of XMLLoader.

Recommendation: Use saxParser.getXMLReader and configure it directly, then parse via scala.xml.XML.withSAXParser(saxParser).loadFile(manifestFile) which is the idiomatic and stable API for supplying a custom SAX parser to the Scala XML library.

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemas/content/1.0/schema.json — Inconsistent indentation for the new SCORM MIME type entry

        "application/vnd.ekstep.plugin-archive",
+          "application/vnd.ekstep.scorm-archive",
        "application/vnd.ekstep.h5p-archive",

The added line uses extra leading spaces compared to its neighbors. This is a minor cosmetic issue but can cause unnecessary noise in future diffs and may indicate an accidental copy-paste. Please align indentation to match the surrounding array entries (two extra spaces are being inserted).

Copy link
Copy Markdown
Collaborator

@sntiwari1 sntiwari1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScormMimeTypeMgrImplTest.scala — Test for multi-SCO package parses scoList with raw classOf[List[Map[String, String]]] which silently produces wrong type at runtime

val scoList = mapper.readValue(scoListJson, classOf[List[Map[String, String]]])

Due to JVM type erasure, classOf[List[Map[String, String]]] is the same as classOf[List[_]]. Jackson deserializes it as List[java.util.LinkedHashMap[String, Object]], not List[Map[String, String]]. The test assertions sco("identifier") == "item1" happen to work only because LinkedHashMap is also a java.util.Map and Scala can call .apply on it, but the Scala type annotation List[Map[String, String]] is a lie — at runtime this is a List[java.util.LinkedHashMap].

Use a TypeReference to deserialize correctly:

import com.fasterxml.jackson.core.`type`.TypeReference
val scoList = mapper.readValue(scoListJson,
    new TypeReference[java.util.List[java.util.Map[String, String]]]() {})
scoList.size shouldBe 2

Or alternatively use mapper.readTree(scoListJson) for assertion purposes in tests.

@romitshah02 romitshah02 requested a review from sntiwari1 May 28, 2026 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants