feat(sync): add sync command with incremental detection#62
Conversation
Implement single-direction sync between local filesystem and remote
object storage, with upload and download modes.
Core features:
- ETag/MD5-based incremental detection (lazy MD5: computed only when
sizes match, avoiding unnecessary I/O for large directories)
- Streaming I/O for files > 8 MiB via chunked read/write
- Configurable concurrency via --concurrency flag
- --dry-run mode for previewing changes without transfer
- --delete flag to remove target-only files
- Symlink-safe local directory traversal
Architecture:
- Syncer trait + OpenDalSyncer implementation (operations/sync.rs)
- Generic execute_plan<TF, DF> eliminates upload/download code duplication
- Shared retry utility (utils/retry.rs) with exponential backoff,
reused by both sync download and existing download command
- SyncArgs CLI struct with direction aliases (up/push/down/pull)
- StorageClient::sync_files facade method
Tech debt cleanup:
- Merge duplicate Human/Raw branches in stat_metadata
- Split grep_path into grep_recursive + grep_single helpers
- Replace 3 production unwrap() calls with expect("reason")
Verified: cargo check, clippy (0 warnings), fmt, 43/43 tests pass
against MinIO including manual sync e2e scenarios.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bdc551296c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let mut offset = match fs::metadata(local_file_path).await { | ||
| Ok(meta) if meta.is_file() => meta.len().min(file_size), | ||
| _ => 0u64, |
There was a problem hiding this comment.
Avoid unsafe resume by file length in single-file download
This uses the existing local file length as a resume offset and then appends from that point without validating that the existing bytes match the remote object prefix. If the destination file already contains unrelated or stale data but is shorter than the remote file, the command will silently produce a corrupted file while reporting success.
Useful? React with 👍 / 👎.
| let fut = transfer_fn(self.operator.clone(), rel_path.clone()); | ||
| async move { (rel_path, action, fut.await) } | ||
| }) | ||
| .buffer_unordered(concurrency) |
There was a problem hiding this comment.
Reject zero concurrency before buffering transfers
Transfer concurrency is passed directly to buffer_unordered(concurrency) with no guard, so --concurrency 0 can crash the sync command at runtime instead of returning a validation error. This is user-triggerable from CLI input and should be rejected before execution.
Useful? React with 👍 / 👎.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances Storify's capabilities by introducing a robust file synchronization command that supports incremental transfers and flexible configuration. It also improves the user experience with a temporary configuration cache and expands the range of supported storage providers. These changes aim to provide more efficient and versatile data management options for users interacting with various storage backends. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new sync command for incremental file synchronization and a temporary, TTL-based configuration cache, alongside enhancements to file downloads and progress reporting. However, a high-severity security vulnerability was identified in both the download and sync operations: the application follows symbolic links when writing to the local filesystem, which could allow an attacker to overwrite sensitive system files. It is recommended to implement a check to ensure that the target path is not a symbolic link before opening it for writing. Additionally, a platform-specific path handling issue in the new sync functionality needs to be addressed to ensure correct operation on Windows.
| let mut file = if offset > 0 { | ||
| OpenOptions::new() | ||
| .create(true) | ||
| .append(true) | ||
| .open(local_file_path) | ||
| .await? | ||
| } else { | ||
| OpenOptions::new() | ||
| .create(true) | ||
| .truncate(true) | ||
| .write(true) | ||
| .open(local_file_path) | ||
| .await? | ||
| }; |
There was a problem hiding this comment.
The download_single_file function follows symbolic links when opening a local file for writing. If a target file path is a symbolic link to a sensitive file (e.g., /etc/passwd), the application will follow the link and overwrite the target file with the downloaded content. This is because OpenOptions in Rust follows symbolic links by default. An attacker who can create symbolic links in the destination directory could exploit this to achieve unauthorized file modification or even system compromise if they can overwrite critical system files.
Additionally, fs::metadata(local_file_path) on line 98 also follows symbolic links, which could lead to an incorrect offset calculation if the symbolic link points to an existing file.
if let Ok(meta) = fs::symlink_metadata(local_file_path).await {
if meta.file_type().is_symlink() {
return Err(Error::InvalidArgument {
message: format!("Target path is a symbolic link: {}", local_file_path.display()),
});
}
}
let mut file = if offset > 0 {
OpenOptions::new()
.create(true)
.append(true)
.open(local_file_path)
.await?
} else {
OpenOptions::new()
.create(true)
.truncate(true)
.write(true)
.open(local_file_path)
.await?
};| let mut file = tokio::fs::OpenOptions::new() | ||
| .create(true) | ||
| .truncate(true) | ||
| .write(true) | ||
| .open(local_path) | ||
| .await?; |
There was a problem hiding this comment.
The download_file function in the new sync operation follows symbolic links when opening a local file for writing. Similar to the vulnerability in download.rs, if a file in the destination directory is a symbolic link, the application will follow it and overwrite the link's target. While walk_local_dir skips symbolic links during the initial directory scan, it does not prevent download_file from following a symbolic link that already exists (and was skipped by the walk, thus appearing as a "new" file to be created) or one that is created between the scan and the download.
| let mut file = tokio::fs::OpenOptions::new() | |
| .create(true) | |
| .truncate(true) | |
| .write(true) | |
| .open(local_path) | |
| .await?; | |
| if let Ok(meta) = fs::symlink_metadata(local_path).await { | |
| if meta.file_type().is_symlink() { | |
| return Err(Error::InvalidArgument { | |
| message: format!("Target path is a symbolic link: {}", local_path.display()), | |
| }); | |
| } | |
| } | |
| let mut file = tokio::fs::OpenOptions::new() | |
| .create(true) | |
| .truncate(true) | |
| .write(true) | |
| .open(local_path) | |
| .await?; |
- Fix download sync missing same-size content detection: pass local_root to diff_entries for both directions, generalize determine_action to compute local MD5 when either side has an ETag and sizes match. Previously download sync used None for local_root, causing size-only fallback that missed same-size content changes. - Add resume download ETag validation: persist remote ETag in a .etag sidecar file alongside partial downloads, validate on resume that the remote object hasn't changed. If ETag mismatches, discard partial data and restart. Clean up sidecar on completion. - Guard against zero concurrency: clamp concurrency to min 1 in execute_plan to prevent buffer_unordered(0) panic.
- Add symlink guard in download_single_file: check symlink_metadata before opening local file for writing, reject if target is a symbolic link. Prevents attacker-placed symlinks from causing arbitrary file overwrites (e.g. /etc/passwd). - Add same symlink guard in sync download_file for consistency. - Normalize path separator in walk_local_dir: use replace(MAIN_SEPARATOR, '/') instead of to_string() to ensure correct remote paths on Windows. Addresses gemini-code-assist review on PR #62.
|
@codex review |
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
as title
PR Checklist
Please convert it to a draft if some of the following conditions are not met.