
fix: implement actual data collection for backup security manager#333

Open
daggerstuff wants to merge 5 commits into staging from fix/pix-44-backup-data-collection-7376318658770257015

Conversation

@daggerstuff (Owner) commented Mar 31, 2026

Replaced the placeholder dummy data implementation in getDataForBackup (PIX-44) with an actual implementation that uses Mongoose to collect application data for backups based on the backup type.

Changes include:

  • getDataForBackup now fetches data from all registered mongoose collections, parsing them sequentially for memory safety using .lean().
  • Implemented getLastBackupTime to search through previous backup metadata files (.meta.json) using standard directory prefixes to establish valid baseline timestamps for DIFFERENTIAL and INCREMENTAL backups.
  • Improved error handling structure by using error.message.
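
A minimal sketch of the sequential per-model collection the first bullet describes; `ModelLike` and `collectBackupData` are illustrative names, not identifiers from the PR:

```typescript
// Mocks only the slice of the Mongoose model API the loop needs.
interface ModelLike {
  find(query: object): { lean(): { exec(): Promise<object[]> } }
}

async function collectBackupData(
  models: Map<string, ModelLike>,
  query: object,
): Promise<Record<string, object[]>> {
  const data: Record<string, object[]> = {}
  // Awaiting each model in turn keeps at most one collection's
  // result set in memory at a time.
  for (const [name, model] of models) {
    const docs = await model.find(query).lean().exec()
    if (docs.length > 0) {
      data[name] = docs
    }
  }
  return data
}
```
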

PR created automatically by Jules for task 7376318658770257015 started by @daggerstuff

Summary by Sourcery

Replace the placeholder backup data implementation with a real Mongoose-based data collection mechanism and introduce backup metadata scanning to support differential and incremental backups.

New Features:

  • Collect backup payloads from all registered Mongoose models, optionally filtered by last update time based on backup type.
  • Determine the most recent eligible backup timestamp by scanning stored backup metadata files across all storage providers.

Enhancements:

  • Improve error logging when reading backup metadata and during backup data collection to use cleaned-up error messages.

Summary by cubic

Implemented real backup data collection using mongoose and metadata scanning to set baselines for differential and incremental backups (PIX-44). This enables accurate backups across all storage providers.

  • New Features

    • getDataForBackup: builds one JSON payload with timestamp/type and per-model arrays; streams all mongoose models via .lean().cursor(); for differential/incremental, filters by updatedAt (if present) using the last valid backup time or a 24h fallback.
    • getLastBackupTime: scans backups/*.meta.json across providers for the latest COMPLETED backup; requires a FULL baseline for differential, otherwise allows FULL/DIFFERENTIAL/INCREMENTAL; ignores malformed files.
  • Bug Fixes

    • Cleaner error logs using error.message; fail loudly on data collection errors.
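
The eligibility rules in the getLastBackupTime bullet reduce to a small predicate. This sketch substitutes assumed string literals for the project's enums:

```typescript
// Hypothetical standalone version of the baseline-eligibility check.
type BackupKind = "FULL" | "DIFFERENTIAL" | "INCREMENTAL"
interface MetaLike {
  status: string
  type: BackupKind
  timestamp: string
}

function isEligibleBaseline(meta: MetaLike, requireFull: boolean): boolean {
  // Only finished backups can anchor a differential/incremental chain.
  if (meta.status !== "COMPLETED") return false
  // A differential backup must be measured against a FULL baseline;
  // otherwise any completed backup type qualifies.
  return requireFull ? meta.type === "FULL" : true
}
```
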

Written for commit cac6640. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes
    • Backups now correctly capture real data instead of placeholder content.
    • Enhanced error handling and reporting during backup operations.
    • Improved support for full and incremental backups with automatic change tracking based on modification timestamps.

Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

sourcery-ai bot commented Mar 31, 2026

Reviewer's Guide

Implements real backup data collection in BackupSecurityManager by querying all Mongoose models (optionally filtered by last backup time), introduces a helper to derive the last backup timestamp from stored backup metadata, and tightens error logging to use error.message.

Sequence diagram for backup data collection using Mongoose and metadata lookup

sequenceDiagram
  actor Admin
  participant BackupSecurityManager
  participant StorageProvider as StorageProvider(s)
  participant Mongoose

  Admin->>BackupSecurityManager: getDataForBackup(type)
  activate BackupSecurityManager

  alt type is DIFFERENTIAL or INCREMENTAL
    BackupSecurityManager->>BackupSecurityManager: getLastBackupTime(requireFull)
    activate BackupSecurityManager
    loop each storageProvider
      BackupSecurityManager->>StorageProvider: listFiles(backups/)
      StorageProvider-->>BackupSecurityManager: files
      BackupSecurityManager->>StorageProvider: getFile(metaFile)
      StorageProvider-->>BackupSecurityManager: metadataBuffer
      BackupSecurityManager->>BackupSecurityManager: parse BackupMetadata
      BackupSecurityManager->>BackupSecurityManager: update latestTimestamp if eligible
    end
    deactivate BackupSecurityManager
    BackupSecurityManager->>BackupSecurityManager: build query with updatedAt >= baselineTime
  else type is FULL
    BackupSecurityManager->>BackupSecurityManager: query = {}
  end

  BackupSecurityManager->>Mongoose: modelNames()
  Mongoose-->>BackupSecurityManager: [modelName]

  loop each modelName
    BackupSecurityManager->>Mongoose: model(modelName)
    Mongoose-->>BackupSecurityManager: Model
    BackupSecurityManager->>Mongoose: Model.find(query).lean().exec()
    Mongoose-->>BackupSecurityManager: docs
    BackupSecurityManager->>BackupSecurityManager: add docs to backupData.data[modelName]
  end

  BackupSecurityManager-->>Admin: Uint8Array(JSON.stringify(backupData))
  deactivate BackupSecurityManager
Loading

Updated class diagram for BackupSecurityManager backup data collection

classDiagram
  class BackupSecurityManager {
    - storageProviders Map~StorageLocation, StorageProvider~
    - logger Logger
    + getLastBackupTime(requireFull boolean) Promise~Date|null~
    + getDataForBackup(type BackupType) Promise~Uint8Array~
  }

  class BackupType {
    <<enumeration>>
    FULL
    DIFFERENTIAL
    INCREMENTAL
  }

  class BackupStatus {
    <<enumeration>>
    COMPLETED
    FAILED
    IN_PROGRESS
  }

  class BackupMetadata {
    + timestamp string
    + type BackupType
    + status BackupStatus
  }

  class StorageProvider {
    <<interface>>
    + listFiles(prefix string) Promise~string[]~
    + getFile(path string) Promise~Uint8Array~
  }

  class Mongoose {
    + modelNames() string[]
    + model(name string) MongooseModel
  }

  class MongooseModel {
    + find(query any) MongooseQuery
  }

  class MongooseQuery {
    + lean() MongooseQuery
    + exec() Promise~any[]~
  }

  BackupSecurityManager --> StorageProvider : uses
  BackupSecurityManager --> BackupMetadata : reads
  BackupMetadata --> BackupType
  BackupMetadata --> BackupStatus
  BackupSecurityManager --> Mongoose : queries
  Mongoose --> MongooseModel
  MongooseModel --> MongooseQuery
Loading

File-Level Changes

Change Details Files
Implement real data collection for backups using Mongoose models, with optional incremental/differential filtering.
  • Replace dummy backup payload with a structured object containing timestamp, backup type, and per-model data map.
  • Build a query with updatedAt >= baseline for DIFFERENTIAL and INCREMENTAL backups using a computed baseline time.
  • Iterate over all registered Mongoose model names, execute lean() queries, and store non-empty results under their model name in the backup payload.
  • Encode the assembled backup payload using TextEncoder for transport/storage.
src/lib/security/backup/index.ts
Add helper to determine the last backup time from backup metadata files to support incremental and differential backups.
  • Iterate over storage providers and list files under the backups/ prefix, filtering for .meta.json metadata files.
  • For each metadata file, fetch and parse it, considering only COMPLETED backups with acceptable types depending on requireFull.
  • Track the latest backup timestamp across all providers and return it as a Date or null if none are valid.
  • Log provider-level errors while ignoring individual file parsing failures to keep scanning robust.
src/lib/security/backup/index.ts
Improve backup metadata error logging to use error.message instead of coercing the full Error object to string.
  • Update logger.error call when searching for backup metadata to prefer error.message when the thrown value is an Error instance.
src/lib/security/backup/index.ts
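
The encoding step in the first change is a plain JSON to UTF-8 round trip; a minimal sketch with a payload shape simplified from the PR (timestamp, type, per-model data map):

```typescript
// Assemble a simplified backup payload and encode it for storage.
const payload = {
  timestamp: new Date().toISOString(),
  type: "FULL",
  data: {} as Record<string, object[]>,
}

const encoded: Uint8Array = new TextEncoder().encode(JSON.stringify(payload))
// Decoding reverses the transformation, e.g. during restore.
const decoded = JSON.parse(new TextDecoder().decode(encoded))
```
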


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough


The backup module now collects actual database data instead of generating dummy payloads. A new private method retrieves the latest backup timestamp from storage providers by reading metadata files. Differential backups filter documents based on the previous backup's timestamp to capture only recent changes.

Changes

Cohort / File(s) Summary
Backup Data Collection
src/lib/security/backup/index.ts
Added getLastBackupTime() to scan storage providers and retrieve the most recent completed backup timestamp. Enhanced getDataForBackup() to dynamically import mongoose, enumerate models, and stream actual documents from the database. For differential/incremental backups, filters documents by updatedAt using baseline time from the last backup. Improved error logging in getBackupMetadata() to extract error.message when available.

Sequence Diagram

sequenceDiagram
    participant BackupManager as BackupSecurityManager
    participant StorageProvider as Storage Provider
    participant MetadataFile as Metadata File
    participant Database as Database/Models
    participant Cursor as Document Cursor

    BackupManager->>BackupManager: getDataForBackup(type)
    activate BackupManager
    
    alt if differential/incremental backup
        BackupManager->>BackupManager: getLastBackupTime(requireFull=false)
        activate BackupManager
        loop for each storage provider
            BackupManager->>StorageProvider: list(backups/)
            StorageProvider-->>BackupManager: file list
            loop for each *.meta.json file
                BackupManager->>MetadataFile: read & parse
                MetadataFile-->>BackupManager: BackupMetadata
                alt status === COMPLETED
                    BackupManager->>BackupManager: track timestamp
                end
            end
        end
        BackupManager-->>BackupManager: return latest timestamp
        deactivate BackupManager
    else full backup
        BackupManager->>BackupManager: baselineTime = null
    end
    
    BackupManager->>Database: import mongoose & enumerate models
    loop for each model
        BackupManager->>Database: Model.find(query).lean().cursor()
        activate Database
        Database-->>Cursor: document cursor
        loop for each document
            Cursor-->>BackupManager: document
            BackupManager->>BackupManager: JSON.stringify & append
        end
        deactivate Database
    end
    
    BackupManager-->>BackupManager: return JSON payload
    deactivate BackupManager
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hoppy changes in the backup warren!
Where once sat dummy data, now real records flow,
From databases deep, each document we know,
Storage providers scanned with careful care,
Differential backups capture change in the air! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title accurately describes the main change: replacing placeholder dummy data collection with real Mongoose-based data collection in the backup security manager, which is the primary focus of the changeset.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


vercel bot commented Mar 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: pixelated | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Apr 1, 2026 4:35pm


@sourcery-ai sourcery-ai bot left a comment
Hey, I've found 1 issue and left some high-level feedback:

  • In getDataForBackup, Promise.all over all Mongoose models can still create a large in-memory spike; consider iterating models sequentially or in small batches to better control memory usage during backup.
  • The query that filters by updatedAt assumes all models have an updatedAt field; it may be safer to either restrict to known backup-eligible models or detect whether a model supports that field before applying the filter.
  • In getLastBackupTime, individual file parse errors are silently swallowed; adding at least a debug-level log with the filename would make it easier to diagnose corrupted or malformed metadata files.
## Individual Comments

### Comment 1
<location path="src/lib/security/backup/index.ts" line_range="870-877" />
<code_context>
+        query = { updatedAt: { $gte: baselineTime } }
+      }
+
+      const models = mongoose.modelNames()
+      const dataPromises = models.map(async (modelName) => {
+        const Model = mongoose.model(modelName)
+        const docs = await Model.find(query).lean().exec()
+        return { modelName, docs }
+      })
+
+      const results = await Promise.all(dataPromises)
+
+      for (const result of results) {
</code_context>
<issue_to_address>
**issue (performance):** Fetching all documents from all Mongoose models at once can be very expensive and memory-intensive.

For full backups this builds `{}` and runs `Model.find({}).lean().exec()` for every registered model in parallel, which can load a very large number of documents into memory at once. Consider batching per model using pagination/streaming (e.g., cursors) instead of a single `find()`, and/or limiting backups to an explicit subset of models rather than all `mongoose.modelNames()` to reduce memory usage and long-running queries.
</issue_to_address>
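
One cursor-based shape the comment suggests, sketched against a mock of the Mongoose query surface (`streamModelDocs` is an illustrative name; a real Mongoose `lean()` query exposes an async-iterable `.cursor()`):

```typescript
// CursorModel mocks just the query surface the sketch needs.
interface CursorModel {
  find(query: object): { lean(): { cursor(): AsyncIterable<object> } }
}

async function streamModelDocs(
  model: CursorModel,
  query: object,
  onDoc: (doc: object) => void,
): Promise<number> {
  let count = 0
  // Documents arrive one at a time instead of materializing the whole
  // result set with a single exec().
  for await (const doc of model.find(query).lean().cursor()) {
    onDoc(doc)
    count++
  }
  return count
}
```

Feeding `onDoc` into an incremental serializer, rather than accumulating an array, is what actually bounds memory here.
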




@charliecreates charliecreates bot left a comment
The new backup collection logic has several correctness and scalability problems: it runs all model queries concurrently (high memory/DB load), uses a potentially invalid now - 24h baseline when no prior backup exists (can produce incomplete restore chains), and assumes updatedAt exists/reliably updates for every collection. getLastBackupTime() is also potentially very expensive because it reads/parses every *.meta.json on each run, and it silently ignores individual metadata file errors. There’s also an avoidable Record<string, any> that reduces type-safety.

Summary of changes

What changed

  • Added a Mongoose-based implementation for getDataForBackup() that:
    • Builds a backupData payload with timestamp, type, and per-model documents.
    • Uses mongoose.modelNames() and Model.find(query).lean().exec() to collect documents.
    • Applies an updatedAt >= baselineTime filter for DIFFERENTIAL/INCREMENTAL backups.
  • Introduced getLastBackupTime(requireFull?: boolean) to scan backups/ for *.meta.json files and compute the latest completed backup timestamp.
  • Tweaked error logging to prefer error.message when available.
  • Added import mongoose from "mongoose".

Comment on lines +813 to +851
private async getLastBackupTime(requireFull: boolean = false): Promise<Date | null> {
  let latestTimestamp = 0;

  const storageEntries = Array.from(this.storageProviders.entries())
  for (let i = 0; i < storageEntries.length; i++) {
    const entry = storageEntries[i]
    if (!entry || !Array.isArray(entry) || entry.length < 2) continue

    const [location, provider] = entry
    if (!provider) continue

    try {
      const files = await provider.listFiles("backups/") // Fallback to list all and filter if pattern not supported
      const metaFiles = files.filter(f => f.endsWith(".meta.json"))
      for (const file of metaFiles) {
        try {
          const metadataBuffer = await provider.getFile(file)
          const metadata = JSON.parse(new TextDecoder().decode(metadataBuffer)) as BackupMetadata

          if (metadata.status === BackupStatus.COMPLETED) {
            if (requireFull && metadata.type !== BackupType.FULL) continue
            if (!requireFull && metadata.type !== BackupType.FULL && metadata.type !== BackupType.DIFFERENTIAL && metadata.type !== BackupType.INCREMENTAL) continue

            const timestamp = new Date(metadata.timestamp).getTime()
            if (timestamp > latestTimestamp) {
              latestTimestamp = timestamp
            }
          }
        } catch (e) {
          // Ignore individual file parsing errors
        }
      }
    } catch (error: unknown) {
      logger.error(`Error searching for latest backup metadata: ${error instanceof Error ? error.message : String(error)}`)
    }
  }

  return latestTimestamp > 0 ? new Date(latestTimestamp) : null;
}

getLastBackupTime() lists all files under backups/ and then fetches/parses every *.meta.json to find the newest timestamp. This can become extremely expensive on large repositories/backends (many metadata files, network round-trips per file) and will run on every incremental/differential backup.

Suggestion

Avoid reading/parsing every metadata file on each run. Options:

  • If filenames contain sortable timestamps/IDs, derive the latest candidate from the name and only fetch that file.
  • Maintain a small index/"latest" pointer file per storage location (e.g. backups/latest.meta.json) written at backup completion.
  • If your provider supports server-side filtering/sorting, add/extend an API to query only the latest *.meta.json.

Reply with "@CharlieHelps yes please" if you want me to add a commit that introduces a latest.meta.json pointer approach.
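
A rough shape of the pointer-file idea. The write method name (`putFile`) and the pointer path are assumptions for illustration, not part of the PR's StorageProvider interface:

```typescript
// Hypothetical latest-pointer helpers: a provider that can write updates
// one small file at backup completion, so baseline lookup becomes a
// single read instead of a full listFiles + parse scan.
interface PointerProvider {
  getFile(path: string): Promise<Uint8Array>
  putFile(path: string, data: Uint8Array): Promise<void>
}

const LATEST_POINTER = "backups/latest.meta.json"

async function writeLatestPointer(
  p: PointerProvider,
  meta: { timestamp: string },
): Promise<void> {
  await p.putFile(LATEST_POINTER, new TextEncoder().encode(JSON.stringify(meta)))
}

async function readLatestTimestamp(p: PointerProvider): Promise<Date | null> {
  try {
    const raw = await p.getFile(LATEST_POINTER)
    const meta = JSON.parse(new TextDecoder().decode(raw)) as { timestamp: string }
    const t = new Date(meta.timestamp)
    return Number.isNaN(t.getTime()) ? null : t
  } catch {
    // No pointer yet: the caller can fall back to a scan or force a FULL backup.
    return null
  }
}
```
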

Comment on lines +827 to +844
for (const file of metaFiles) {
  try {
    const metadataBuffer = await provider.getFile(file)
    const metadata = JSON.parse(new TextDecoder().decode(metadataBuffer)) as BackupMetadata

    if (metadata.status === BackupStatus.COMPLETED) {
      if (requireFull && metadata.type !== BackupType.FULL) continue
      if (!requireFull && metadata.type !== BackupType.FULL && metadata.type !== BackupType.DIFFERENTIAL && metadata.type !== BackupType.INCREMENTAL) continue

      const timestamp = new Date(metadata.timestamp).getTime()
      if (timestamp > latestTimestamp) {
        latestTimestamp = timestamp
      }
    }
  } catch (e) {
    // Ignore individual file parsing errors
  }
}

The inner catch (e) silently ignores metadata parse/read errors for individual files. That can mask corruption/partial uploads and lead to choosing an incorrect baseline timestamp without any observability.

Suggestion

At minimum, log (ideally at debug/warn) which metadata file failed to parse and why, so operators can detect corruption.

Example:

} catch (e: unknown) {
  logger.warn(
    `Skipping invalid backup metadata file ${file}: ${e instanceof Error ? e.message : String(e)}`,
  )
}

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this logging change.

Comment on lines +870 to +884
const models = mongoose.modelNames()
const dataPromises = models.map(async (modelName) => {
  const Model = mongoose.model(modelName)
  const docs = await Model.find(query).lean().exec()
  return { modelName, docs }
})

const results = await Promise.all(dataPromises)

for (const result of results) {
  if (result.docs.length > 0) {
    backupData.data[result.modelName] = result.docs
  }
}
} catch (error: unknown) {

getDataForBackup() uses Promise.all() over all models, which runs all collection queries concurrently and materializes all results at once. This contradicts the PR description (“parsing them sequentially for memory safety”) and can spike DB load and memory usage (especially combined with JSON.stringify of a large in-memory object).

Suggestion

Collect model data sequentially (or with a small concurrency limit) and consider streaming/pagination for large collections.

Sequential example:

for (const modelName of mongoose.modelNames()) {
  const Model = mongoose.model(modelName)
  const docs = await Model.find(query).lean().exec()
  if (docs.length) backupData.data[modelName] = docs
}

If collections can be large, prefer cursors/pagination rather than loading everything into memory.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit that switches to sequential collection (and optionally adds a simple concurrency limiter).

Comment on lines +863 to +868
if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
  const requireFull = type === BackupType.DIFFERENTIAL
  const lastBackupTime = await this.getLastBackupTime(requireFull)
  const baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000)
  query = { updatedAt: { $gte: baselineTime } }
}

Differential/incremental backups fall back to baselineTime = now - 24h when no prior baseline is found. That can produce invalid backup chains (missing older documents), making restores incomplete. For DIFFERENTIAL, if there’s no prior FULL, the correct behavior is usually to fail or force a FULL backup.

Suggestion

Treat “no baseline found” as a hard error (or automatically behave as FULL) to avoid silently producing incomplete backups.

Example:

const lastBackupTime = await this.getLastBackupTime(requireFull)
if (!lastBackupTime) {
  if (type === BackupType.DIFFERENTIAL) {
    throw new Error('No FULL backup found; cannot create DIFFERENTIAL backup')
  }
  // INCREMENTAL: either throw or treat as FULL
  query = {}
} else {
  query = { updatedAt: { $gte: lastBackupTime } }
}

Reply with "@CharlieHelps yes please" if you want me to add a commit implementing this baseline behavior.

Comment on lines +854 to +858
const backupData: Record<string, any> = {
  timestamp: new Date().toISOString(),
  type,
  data: {}
}

backupData is typed as Record<string, any>, which weakens type-safety and makes it easier to accidentally serialize unsupported shapes (and can hide mistakes in backupData.data[...]). This is a concrete use of any, which is a long-term maintainability footgun.

Suggestion

Replace any with a minimal explicit type for the backup payload.

Example:

type BackupPayload = {
  timestamp: string
  type: BackupType
  data: Record<string, unknown[]> // or a stricter document type if available
}

const backupData: BackupPayload = {
  timestamp: new Date().toISOString(),
  type,
  data: {},
}

Reply with "@CharlieHelps yes please" if you’d like me to add a commit that removes the any here.

Comment on lines +861 to +868
let query = {}

if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
  const requireFull = type === BackupType.DIFFERENTIAL
  const lastBackupTime = await this.getLastBackupTime(requireFull)
  const baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000)
  query = { updatedAt: { $gte: baselineTime } }
}

Incremental/differential logic assumes every collection supports an updatedAt field. For models without timestamps (or where updatedAt isn’t reliably updated), this will silently drop records from non-full backups, breaking restore correctness for those collections.

Suggestion

Handle models without updatedAt deterministically (e.g., treat those models as full-copy even during incremental/differential, or enforce timestamps across all schemas and fail fast if missing).

One pragmatic approach:

  • Detect timestamp support per model (schema timestamps option or presence of updatedAt path).
  • If missing, either:
    • query without the updatedAt filter for that model, or
    • skip with a warning and document that such collections are not incrementally backed up.

Reply with "@CharlieHelps yes please" if you want me to add a commit that detects missing updatedAt and falls back to full-copy per model.
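
The per-model detection could be a small helper. The `schema.path` check mirrors Mongoose's API (where `Model.schema.path("updatedAt")` is undefined when the schema has no such path), while `queryForModel` is an illustrative name:

```typescript
// SchemaModel mocks the slice of a Mongoose model needed for the check.
interface SchemaModel {
  schema: { path(name: string): unknown }
}

function queryForModel(model: SchemaModel, baseline: Date | null): object {
  const hasUpdatedAt = model.schema.path("updatedAt") !== undefined
  // Models without updatedAt are copied in full even during
  // incremental/differential runs, rather than silently dropped.
  if (baseline && hasUpdatedAt) {
    return { updatedAt: { $gte: baseline } }
  }
  return {}
}
```
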

Comment on lines +813 to +851
(same getLastBackupTime snippet as quoted above)

getLastBackupTime() hard-codes the prefix "backups/". If other parts of this module write metadata under a different base path (or per-StorageLocation root), this will silently fail to find baselines and cause incorrect differential/incremental behavior.

Suggestion

Reuse the same directory/prefix constants or path-building helpers used by the code that writes backups/metadata (and/or pass the base prefix in) instead of hard-coding "backups/".

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this refactor.

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/lib/security/backup/index.ts">

<violation number="1" location="src/lib/security/backup/index.ts:825">
P1: `listFiles("backups/")` uses a non-matching pattern, so previous metadata files are not found.</violation>

<violation number="2" location="src/lib/security/backup/index.ts:832">
P1: Incremental/differential baseline selection only considers COMPLETED backups, but created backups remain PENDING in normal flow, causing repeated fallback windows and potential data omission.</violation>

<violation number="3" location="src/lib/security/backup/index.ts:866">
P1: Using a 24-hour fallback baseline can omit older records and produce incomplete differential/incremental backups.</violation>
</file>


const metadataBuffer = await provider.getFile(file)
const metadata = JSON.parse(new TextDecoder().decode(metadataBuffer)) as BackupMetadata

if (metadata.status === BackupStatus.COMPLETED) {

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026


P1: Incremental/differential baseline selection only considers COMPLETED backups, but created backups remain PENDING in normal flow, causing repeated fallback windows and potential data omission.

<file context>
@@ -809,16 +810,83 @@ export class BackupSecurityManager {
+            const metadataBuffer = await provider.getFile(file)
+            const metadata = JSON.parse(new TextDecoder().decode(metadataBuffer)) as BackupMetadata
+
+            if (metadata.status === BackupStatus.COMPLETED) {
+              if (requireFull && metadata.type !== BackupType.FULL) continue
+              if (!requireFull && metadata.type !== BackupType.FULL && metadata.type !== BackupType.DIFFERENTIAL && metadata.type !== BackupType.INCREMENTAL) continue
</file context>

if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
const requireFull = type === BackupType.DIFFERENTIAL
const lastBackupTime = await this.getLastBackupTime(requireFull)
const baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000)

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026


P1: Using a 24-hour fallback baseline can omit older records and produce incomplete differential/incremental backups.

<file context>
@@ -809,16 +810,83 @@ export class BackupSecurityManager {
+      if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
+        const requireFull = type === BackupType.DIFFERENTIAL
+        const lastBackupTime = await this.getLastBackupTime(requireFull)
+        const baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000)
+        query = { updatedAt: { $gte: baselineTime } }
+      }
</file context>
Suggested change
const baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000)
const baselineTime = lastBackupTime || new Date(0)

if (!provider) continue

try {
const files = await provider.listFiles("backups/") // Fallback to list all and filter if pattern not supported

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026


P1: listFiles("backups/") uses a non-matching pattern, so previous metadata files are not found.

<file context>
@@ -809,16 +810,83 @@ export class BackupSecurityManager {
+      if (!provider) continue
+
+      try {
+        const files = await provider.listFiles("backups/") // Fallback to list all and filter if pattern not supported
+        const metaFiles = files.filter(f => f.endsWith(".meta.json"))
+        for (const file of metaFiles) {
</file context>

Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/lib/security/backup/index.ts (2)

859-864: ⚠️ Potential issue | 🟠 Major

24-hour fallback for DIFFERENTIAL backups risks incomplete backup chains.

When no prior FULL backup is found for a DIFFERENTIAL backup (line 862-863), the code falls back to Date.now() - 24h. This silently produces an incomplete backup—documents older than 24 hours will be missing, making restore incomplete. A DIFFERENTIAL backup by definition requires a FULL baseline.

🐛 Proposed fix to fail fast when no FULL baseline exists
       let baselineTime: Date | null = null;
       if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
         const requireFull = type === BackupType.DIFFERENTIAL;
         const lastBackupTime = await this.getLastBackupTime(requireFull);
-        baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000);
+        if (!lastBackupTime) {
+          if (type === BackupType.DIFFERENTIAL) {
+            throw new Error('Cannot create DIFFERENTIAL backup: no prior FULL backup found');
+          }
+          // INCREMENTAL without any prior backup: treat as FULL (no filter)
+          baselineTime = null;
+        } else {
+          baselineTime = lastBackupTime;
+        }
       }
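The proposed fail-fast rule can be exercised in isolation. A standalone sketch — the enum and names mirror the snippets quoted in this review but are reconstructed, not taken from the file:

```typescript
enum BackupType {
  FULL = "FULL",
  DIFFERENTIAL = "DIFFERENTIAL",
  INCREMENTAL = "INCREMENTAL",
}

// Returns the baseline timestamp to filter documents by, or null for an
// unfiltered (full) sweep. Throws when a DIFFERENTIAL backup has no FULL baseline.
function resolveBaseline(type: BackupType, lastBackupTime: Date | null): Date | null {
  if (type !== BackupType.DIFFERENTIAL && type !== BackupType.INCREMENTAL) return null;
  if (!lastBackupTime) {
    if (type === BackupType.DIFFERENTIAL) {
      throw new Error("Cannot create DIFFERENTIAL backup: no prior FULL backup found");
    }
    return null; // INCREMENTAL with no history behaves like FULL
  }
  return lastBackupTime;
}
```

This keeps the decision testable without touching storage providers: callers pass the result of `getLastBackupTime` and handle the thrown error.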

837-839: ⚠️ Potential issue | 🟡 Minor

Silent catch masks metadata corruption and hinders debugging.

The inner catch block silently discards parse/read errors. Per coding guidelines, failures should be logged rather than silently ignored. This can mask corrupted or partially uploaded metadata files.

🔧 Proposed fix to add warning log
-          } catch {
-            // Ignore individual file parsing errors
-          }
+          } catch (e: unknown) {
+            logger.warn(
+              `Skipping invalid backup metadata file ${metaFile}: ${e instanceof Error ? e.message : String(e)}`,
+            );
+          }

As per coding guidelines: "Always log failures rather than silently failing; implement proper error handling and logging".

🧹 Nitpick comments (1)
src/lib/security/backup/index.ts (1)

816-818: Unused variable location from destructuring.

The location variable is destructured but never used. This was flagged by static analysis.

♻️ Proposed fix
-      const [, provider] = storageEntries[i];
+      const entry = storageEntries[i];
+      if (!entry) continue;
+      const provider = entry[1];

Or simply use _ convention if your linter allows it:

-      const [, provider] = storageEntries[i];
+      const [_location, provider] = storageEntries[i];

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d0f3fcd-ecf8-4c73-b2f1-a6117637a010

📥 Commits

Reviewing files that changed from the base of the PR and between 919a5fd and 41e59c7.

📒 Files selected for processing (1)
  • src/lib/security/backup/index.ts


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/lib/security/backup/index.ts">

<violation number="1" location="src/lib/security/backup/index.ts:821">
P2: `getLastBackupTime` uses `listFiles("backups/")`, but storage providers treat the argument as a full glob pattern (not a recursive prefix). For providers like `LocalFileSystemProvider`, the pattern is anchored, so `backups/` never matches nested metadata files and the last-backup scan always returns null.</violation>
</file>


if (!provider) continue;

try {
const files = await provider.listFiles("backups/");

@cubic-dev-ai cubic-dev-ai bot Apr 1, 2026


P2: getLastBackupTime uses listFiles("backups/"), but storage providers treat the argument as a full glob pattern (not a recursive prefix). For providers like LocalFileSystemProvider, the pattern is anchored, so backups/ never matches nested metadata files and the last-backup scan always returns null.


<file context>
@@ -810,83 +809,95 @@ export class BackupSecurityManager {
-        const files = await provider.listFiles("backups/") // Fallback to list all and filter if pattern not supported
-        const metaFiles = files.filter(f => f.endsWith(".meta.json"))
-        for (const file of metaFiles) {
+        const files = await provider.listFiles("backups/");
+        const metaFiles = files.filter((f) => f.endsWith(".meta.json"));
+        for (const metaFile of metaFiles) {
</file context>
Suggested change
const files = await provider.listFiles("backups/");
const files = await provider.listFiles("backups/*/*/*/*.meta.json");


Copilot AI left a comment


Pull request overview

Implements real backup payload collection by querying Mongoose models and adds metadata scanning to establish a baseline timestamp for differential/incremental backups.

Changes:

  • Added getLastBackupTime() to scan backups/*.meta.json across storage providers and compute a baseline timestamp.
  • Replaced dummy getDataForBackup() data with a Mongoose-backed collection sweep using .lean().cursor() and optional updatedAt filtering.
  • Standardized some error logging to prefer error.message.



try {
const mongooseModule = "mongoose";
const mongoose = (await import(/* @vite-ignore */ mongooseModule)).default || await import(/* @vite-ignore */ mongooseModule);

Copilot AI Apr 1, 2026


This does two dynamic imports and can yield inconsistent types: if .default is missing, mongoose becomes the module namespace object, so mongoose.modelNames() may not exist. Import once, then resolve default vs namespace from that single result (e.g., store the import result in a variable and use mod.default ?? mod). This also avoids the extra module load.

Suggested change
const mongoose = (await import(/* @vite-ignore */ mongooseModule)).default || await import(/* @vite-ignore */ mongooseModule);
const mongooseImport = await import(/* @vite-ignore */ mongooseModule);
const mongoose = (mongooseImport as any).default ?? mongooseImport;

// [PIX-44] TODO: No more fucking cop-outs
const dummyData = {
message: `This is a ${type} backup created at ${new Date().toISOString()}`,
let appDataJson = '{"timestamp":"' + new Date().toISOString() + '","type":"' + type + '","data":{';

Copilot AI Apr 1, 2026


Building a large JSON payload via repeated string += ... inside nested loops can become very costly (frequent reallocations/copies) as the backup grows. Prefer accumulating into an array of string chunks and join('') once, or write into a dedicated buffer/stream abstraction; it keeps the same sequential read pattern without the quadratic-ish string concatenation overhead.

Comment on lines +874 to +877
if (!isFirstModel) {
appDataJson += ',';
}
appDataJson += '"' + modelName + '":[';
Copilot AI Apr 1, 2026

(Same string-concatenation concern as the comment above.)
Comment on lines +883 to +889
for await (const doc of cursor) {
if (!isFirstDoc) {
appDataJson += ',';
}
appDataJson += JSON.stringify(doc);
isFirstDoc = false;
}
Copilot AI Apr 1, 2026

(Same string-concatenation concern as the comment above.)
appDataJson += ']';
}

appDataJson += '}}';
Copilot AI Apr 1, 2026

(Same string-concatenation concern as the comment above.)

// Use TextEncoder for cross-environment compatibility
return new TextEncoder().encode(JSON.stringify(dummyData))
return new TextEncoder().encode(appDataJson);
Copilot AI Apr 1, 2026

(Same string-concatenation concern as the comment above.)
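The chunk-accumulation pattern these comments describe can be sketched as follows. This is a simplified stand-in for the PR's cursor loop (in-memory arrays instead of a Mongoose cursor; names are illustrative):

```typescript
// Accumulate string chunks and join once, instead of repeated `+=`
// on one ever-growing string.
function serializeCollections(collections: Record<string, object[]>): string {
  const chunks: string[] = ["{"];
  let firstModel = true;
  for (const [name, docs] of Object.entries(collections)) {
    if (!firstModel) chunks.push(",");
    firstModel = false;
    chunks.push(JSON.stringify(name), ":[");
    docs.forEach((doc, i) => {
      if (i > 0) chunks.push(",");
      chunks.push(JSON.stringify(doc)); // one chunk per document, as with a cursor
    });
    chunks.push("]");
  }
  chunks.push("}");
  return chunks.join("");
}
```

The sequential read pattern is unchanged; only the accumulation strategy differs, avoiding repeated reallocation of the full payload string.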
if (type === BackupType.DIFFERENTIAL || type === BackupType.INCREMENTAL) {
const requireFull = type === BackupType.DIFFERENTIAL;
const lastBackupTime = await this.getLastBackupTime(requireFull);
baselineTime = lastBackupTime || new Date(Date.now() - 24 * 60 * 60 * 1000);

Copilot AI Apr 1, 2026


The 24 * 60 * 60 * 1000 fallback is a magic number and the rationale isn’t encoded in the code. Extract this to a named constant (e.g., DEFAULT_BASELINE_LOOKBACK_MS) and add a short comment explaining why 24h is the chosen fallback.

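A named constant along those lines might look like this (the name is illustrative; whether a 24h fallback is even appropriate is questioned elsewhere in this review):

```typescript
// Fallback lookback window used when no prior backup metadata can be found.
// 24h mirrors the inline literal in the PR; callers may prefer failing fast instead.
const DEFAULT_BASELINE_LOOKBACK_MS = 24 * 60 * 60 * 1000;

// Compute the fallback baseline relative to a supplied clock for testability.
function fallbackBaseline(now: number = Date.now()): Date {
  return new Date(now - DEFAULT_BASELINE_LOOKBACK_MS);
}
```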
Comment on lines +837 to +838
} catch {
// Ignore individual file parsing errors

Copilot AI Apr 1, 2026


Swallowing all per-file parsing errors makes it hard to diagnose corrupted/partial metadata and can silently degrade baseline discovery (e.g., always falling back). Consider at least logging at debug/warn with the metaFile name (and the caught error message), or explicitly handling expected parse failures vs unexpected ones.

Suggested change
} catch {
// Ignore individual file parsing errors
} catch (error: unknown) {
logger.warn(
`Failed to parse backup metadata file "${metaFile}": ${
error instanceof Error ? error.message : String(error)
}`,
);

Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
