
feat(sync): add sync command with incremental detection#62

Merged
QuakeWang merged 4 commits into main from feat/sync-command
Mar 12, 2026

Conversation

@QuakeWang
Owner

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

as title

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary behavior tests.
  • This PR requires documentation updates.

Implement single-direction sync between local filesystem and remote
object storage, with upload and download modes.

Core features:
- ETag/MD5-based incremental detection (lazy MD5: computed only when
  sizes match, avoiding unnecessary I/O for large directories)
- Streaming I/O for files > 8 MiB via chunked read/write
- Configurable concurrency via --concurrency flag
- --dry-run mode for previewing changes without transfer
- --delete flag to remove target-only files
- Symlink-safe local directory traversal
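The lazy detection described in the first bullet can be sketched roughly as follows. This is an illustrative, std-only sketch: the names and the digest function are assumptions (the real code compares a local MD5 against the remote ETag), but the control flow shows why large same-size-free directories never pay the hashing cost.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in digest; the actual implementation compares MD5 against the remote ETag.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Lazy incremental detection: the size comparison is free, so the
/// (potentially expensive) content digest is computed only when sizes
/// already match. `*_content` stands in for lazily reading the file.
fn needs_transfer(src_size: u64, src_content: &[u8], dst_size: u64, dst_content: &[u8]) -> bool {
    if src_size != dst_size {
        return true; // differing sizes: transfer without hashing anything
    }
    digest(src_content) != digest(dst_content)
}

fn main() {
    assert!(!needs_transfer(3, b"abc", 3, b"abc")); // identical: skip
    assert!(needs_transfer(3, b"abc", 3, b"abd")); // same size, changed bytes
    assert!(needs_transfer(3, b"abc", 4, b"abcd")); // size differs: no hash needed
    println!("ok");
}
```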

Architecture:
- Syncer trait + OpenDalSyncer implementation (operations/sync.rs)
- Generic execute_plan<TF, DF> eliminates upload/download code duplication
- Shared retry utility (utils/retry.rs) with exponential backoff,
  reused by both sync download and existing download command
- SyncArgs CLI struct with direction aliases (up/push/down/pull)
- StorageClient::sync_files facade method
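The shared retry utility with exponential backoff could look roughly like this. A synchronous, std-only sketch: the function name, parameters, and delay values are assumptions, not the signature of utils/retry.rs (which is async and retries only transient errors).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry `op` up to `max_attempts` times, doubling the delay between
/// attempts: base, 2*base, 4*base, ... Returns the last error on exhaustion.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay_ms: u64,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e);
                }
                // exponential backoff: delay doubles on each failed attempt
                sleep(Duration::from_millis(base_delay_ms << (attempt - 1)));
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Fails twice, then succeeds: the retry loop absorbs the transient errors.
    let result: Result<u32, &str> = retry_with_backoff(5, 1, || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    assert_eq!(calls, 3);
    println!("ok");
}
```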

Tech debt cleanup:
- Merge duplicate Human/Raw branches in stat_metadata
- Split grep_path into grep_recursive + grep_single helpers
- Replace 3 production unwrap() calls with expect("reason")

Verified: cargo check, clippy (0 warnings), fmt, 43/43 tests pass
against MinIO including manual sync e2e scenarios.

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bdc551296c


Comment on lines +98 to +100
let mut offset = match fs::metadata(local_file_path).await {
    Ok(meta) if meta.is_file() => meta.len().min(file_size),
    _ => 0u64,


P1: Avoid unsafe resume by file length in single-file download

This uses the existing local file length as a resume offset and then appends from that point without validating that the existing bytes match the remote object prefix. If the destination file already contains unrelated or stale data but is shorter than the remote file, the command will silently produce a corrupted file while reporting success.


        let fut = transfer_fn(self.operator.clone(), rel_path.clone());
        async move { (rel_path, action, fut.await) }
    })
    .buffer_unordered(concurrency)


P2: Reject zero concurrency before buffering transfers

Transfer concurrency is passed directly to buffer_unordered(concurrency) with no guard, so --concurrency 0 can crash the sync command at runtime instead of returning a validation error. This is user-triggerable from CLI input and should be rejected before execution.
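A minimal sketch of such a guard, clamping the CLI value to at least 1 before it reaches buffer_unordered (returning a validation error, as the reviewer suggests, is the stricter alternative); the function name is hypothetical:

```rust
/// Guard sketch: clamp the user-supplied --concurrency value to at least 1
/// so buffer_unordered never receives 0.
fn effective_concurrency(requested: usize) -> usize {
    requested.max(1)
}

fn main() {
    assert_eq!(effective_concurrency(0), 1); // zero is clamped, not a crash
    assert_eq!(effective_concurrency(8), 8); // normal values pass through
    println!("ok");
}
```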


@gemini-code-assist

Summary of Changes


This pull request significantly enhances Storify's capabilities by introducing a robust file synchronization command that supports incremental transfers and flexible configuration. It also improves the user experience with a temporary configuration cache and expands the range of supported storage providers. These changes aim to provide more efficient and versatile data management options for users interacting with various storage backends.

Highlights

  • New sync command: Introduced a powerful sync command for efficient, incremental synchronization of files between local and remote storage, supporting both upload and download directions, dry-run capabilities, and optional deletion of target files not present in the source.
  • Temporary Config Cache: Added a new feature for managing temporary, encrypted configuration profiles with a Time-To-Live (TTL), allowing users to quickly switch storage providers without creating permanent named profiles.
  • Expanded Storage Provider Support: Integrated Azure Blob Storage (azblob) as a new supported storage provider, enhancing compatibility and flexibility for users.
  • Improved Download Functionality: Enhanced the download command to support incremental downloads for single files and improved path resolution, making it more robust and efficient.
  • Logging and Dependency Updates: Integrated tracing-log and tracing-subscriber for improved logging, and updated various dependencies including md5, socket2, and windows-sys.
Changelog
  • .env.example
    • Added azblob to the list of supported storage providers.
  • Cargo.lock
    • Added md5 and tracing-log dependencies.
    • Updated socket2 from 0.5.9 to 0.6.0.
    • Updated windows-sys from 0.52.0 to 0.59.0.
  • Cargo.toml
    • Added md5 and tracing-log as dependencies.
    • Moved tracing-subscriber to main dependencies.
  • README.md
    • Documented the new temporary config cache feature with usage examples.
  • docs/config-providers.md
    • Updated documentation to include the temporary config cache.
    • Revised the configuration resolution order.
    • Updated encryption cipher from AES-256-GCM to ChaCha20Poly1305.
  • docs/usage.md
    • Added detailed usage instructions for the temporary config cache commands.
  • src/cli/config.rs
    • Implemented temp subcommand for config management.
    • Added format_expires_hint function for displaying TTL expiry.
    • Modified create_profile to support temporary profiles with TTL.
    • Removed environment variable checks from set_default_profile and build_source_hint.
  • src/cli/entry.rs
    • Added Sync command to the main CLI entry point.
    • Introduced TempCommand enum for temporary config cache subcommands.
    • Changed CreateArgs to be boxed.
  • src/cli/storage.rs
    • Added SyncArgs struct to define arguments for the new sync command.
    • Integrated sync_files into the main execute function for storage commands.
  • src/config/loader.rs
    • Added TempCache variant to ConfigSource enum.
    • Included temp_expires_at_unix field in ResolvedConfig.
    • Implemented try_load_temp function to load temporary profiles.
    • Adjusted configuration resolution order to prioritize temporary cache.
  • src/config/profile_store.rs
    • Added TempProfileEntry struct for temporary profile metadata.
    • Included temp field in ProfileStoreFile to store temporary profiles.
    • Implemented methods for managing temporary profiles: temp_profile, temp_expires_at_unix, set_temp_profile, and clear_temp_profile.
    • Updated persist and load_from_file to handle temporary profiles and new encryption cipher.
    • Added now_unix helper function.
    • Added unit tests for temporary profile functionality.
  • src/error.rs
    • Added SyncFailed error variant for synchronization operations.
  • src/main.rs
    • Initialized logging using tracing-log and tracing-subscriber.
  • src/storage.rs
    • Integrated OpenDalSyncer and SyncOptions for synchronization.
    • Added sync_files method to StorageClient.
    • Refactored grep command logic into grep_recursive and grep_single for clarity and robustness.
    • Simplified stat output format.
  • src/storage/constants.rs
    • Added SYNC_CHUNK_SIZE constant for synchronization operations.
  • src/storage/operations/copy.rs
    • Changed reporter to be mutable for accurate progress updates.
  • src/storage/operations/download.rs
    • Implemented incremental download for single files.
    • Added is_directory, resolve_local_file_path, and download_single_file helper functions.
    • Modified download to handle single files and directories more robustly, including partial downloads and retries.
  • src/storage/operations/grep.rs
    • Updated comment for ascii_contains_case_insensitive.
  • src/storage/operations/mod.rs
    • Added sync module and Syncer trait to the operations.
  • src/storage/operations/mv.rs
    • Changed reporter to be mutable for accurate progress updates.
  • src/storage/operations/sync.rs
    • Added new file implementing the Syncer trait with OpenDalSyncer for incremental file synchronization.
    • Defined SyncDirection, SyncOptions, SyncAction, EntryMeta, and SyncReport structs.
    • Included logic for sync_upload, sync_download, execute_plan, list_remote_entries, and print_dry_run.
    • Implemented diffing logic with lazy MD5 computation and ETag normalization.
    • Provided local filesystem helpers like walk_local_dir and compute_file_md5.
    • Implemented streaming upload_file and download_file helpers.
  • src/storage/operations/upload.rs
    • Changed reporter to be mutable for accurate progress updates.
  • src/storage/utils/mod.rs
    • Added retry module for robust operations.
  • src/storage/utils/progress.rs
    • Modified ConsoleProgressReporter to improve progress reporting accuracy and frequency.
  • src/storage/utils/retry.rs
    • Added new file implementing read_range_with_retry with exponential backoff for transient errors.
  • src/tests/behavior/operations/download.rs
    • Added test_download_existing_file_to_file_path behavior test.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new sync command for incremental file synchronization and a temporary, TTL-based configuration cache, alongside enhancements to file downloads and progress reporting. However, a high-severity security vulnerability was identified in both the download and sync operations: the application follows symbolic links when writing to the local filesystem, which could allow an attacker to overwrite sensitive system files. It is recommended to implement a check to ensure that the target path is not a symbolic link before opening it for writing. Additionally, a platform-specific path handling issue in the new sync functionality needs to be addressed to ensure correct operation on Windows.

Comment on lines +111 to +124
let mut file = if offset > 0 {
    OpenOptions::new()
        .create(true)
        .append(true)
        .open(local_file_path)
        .await?
} else {
    OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(local_file_path)
        .await?
};


Severity: high (security)

The download_single_file function follows symbolic links when opening a local file for writing. If a target file path is a symbolic link to a sensitive file (e.g., /etc/passwd), the application will follow the link and overwrite the target file with the downloaded content. This is because OpenOptions in Rust follows symbolic links by default. An attacker who can create symbolic links in the destination directory could exploit this to achieve unauthorized file modification or even system compromise if they can overwrite critical system files.

Additionally, fs::metadata(local_file_path) on line 98 also follows symbolic links, which could lead to an incorrect offset calculation if the symbolic link points to an existing file.

        if let Ok(meta) = fs::symlink_metadata(local_file_path).await {
            if meta.file_type().is_symlink() {
                return Err(Error::InvalidArgument {
                    message: format!("Target path is a symbolic link: {}", local_file_path.display()),
                });
            }
        }

        let mut file = if offset > 0 {
            OpenOptions::new()
                .create(true)
                .append(true)
                .open(local_file_path)
                .await?
        } else {
            OpenOptions::new()
                .create(true)
                .truncate(true)
                .write(true)
                .open(local_file_path)
                .await?
        };

Comment on lines +597 to +602
let mut file = tokio::fs::OpenOptions::new()
    .create(true)
    .truncate(true)
    .write(true)
    .open(local_path)
    .await?;


Severity: high (security)

The download_file function in the new sync operation follows symbolic links when opening a local file for writing. Similar to the vulnerability in download.rs, if a file in the destination directory is a symbolic link, the application will follow it and overwrite the link's target. While walk_local_dir skips symbolic links during the initial directory scan, it does not prevent download_file from following a symbolic link that already exists (and was skipped by the walk, thus appearing as a "new" file to be created) or one that is created between the scan and the download.

Suggested change

- let mut file = tokio::fs::OpenOptions::new()
-     .create(true)
-     .truncate(true)
-     .write(true)
-     .open(local_path)
-     .await?;
+ if let Ok(meta) = fs::symlink_metadata(local_path).await {
+     if meta.file_type().is_symlink() {
+         return Err(Error::InvalidArgument {
+             message: format!("Target path is a symbolic link: {}", local_path.display()),
+         });
+     }
+ }
+ let mut file = tokio::fs::OpenOptions::new()
+     .create(true)
+     .truncate(true)
+     .write(true)
+     .open(local_path)
+     .await?;

- Fix download sync missing same-size content detection: pass
  local_root to diff_entries for both directions, generalize
  determine_action to compute local MD5 when either side has an
  ETag and sizes match. Previously download sync used None for
  local_root, causing size-only fallback that missed same-size
  content changes.

- Add resume download ETag validation: persist remote ETag in a
  .etag sidecar file alongside partial downloads, validate on
  resume that the remote object hasn't changed. If ETag mismatches,
  discard partial data and restart. Clean up sidecar on completion.

- Guard against zero concurrency: clamp concurrency to min 1 in
  execute_plan to prevent buffer_unordered(0) panic.
- Add symlink guard in download_single_file: check symlink_metadata
  before opening local file for writing, reject if target is a
  symbolic link. Prevents attacker-placed symlinks from causing
  arbitrary file overwrites (e.g. /etc/passwd).

- Add same symlink guard in sync download_file for consistency.

- Normalize path separator in walk_local_dir: use replace(MAIN_SEPARATOR, '/')
  instead of to_string() to ensure correct remote paths on Windows.

Addresses gemini-code-assist review on PR #62.
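The separator normalization from the last bullet can be sketched as below. The separator is passed as a parameter here so the behavior is visible on any platform; the real code uses std::path::MAIN_SEPARATOR, and the function name is hypothetical.

```rust
/// Build a remote key from a relative local path by replacing the platform
/// path separator with '/'. On Unix this is a no-op; on Windows it turns
/// "dir\file.txt" into "dir/file.txt".
fn to_remote_key(rel: &str, sep: char) -> String {
    rel.replace(sep, "/")
}

fn main() {
    assert_eq!(to_remote_key(r"dir\file.txt", '\\'), "dir/file.txt"); // Windows-style input
    assert_eq!(to_remote_key("dir/file.txt", '/'), "dir/file.txt");   // Unix: already normalized
    println!("ok");
}
```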
@QuakeWang
Owner Author

@codex review

@QuakeWang QuakeWang merged commit 645acab into main Mar 12, 2026
3 checks passed
@QuakeWang QuakeWang deleted the feat/sync-command branch March 12, 2026 06:07