13 changes: 10 additions & 3 deletions Cargo.lock

73 changes: 73 additions & 0 deletions README.md
@@ -71,6 +71,10 @@ pkgin install xcp
(although sparse-files are not yet supported in this case).
* Optionally understands `.gitignore` files to limit the copied directories.
* Optional native file-globbing.
* Optional checksum verification to detect copy errors caused by storage or
memory issues. The checksum is calculated during the copy and verified by
re-reading only the destination file. Uses xxHash3 to keep the hashing cost
low; the re-read itself adds I/O.

### (Possible) future features

@@ -132,3 +136,72 @@ large files this can be a significant win:
* Single 4.1GB file on NFSv4 mount
* `cp`: 6m18s
* `xcp`: 0m37s

## Usage Examples

### Basic Copy

```bash
# Simple file copy
xcp source.txt dest.txt

# Recursive directory copy
xcp -r source_dir/ dest_dir/

# Copy with progress bar disabled
xcp --no-progress large_file.bin /mnt/backup/
```

### Checksum Verification

Use `--verify-checksum` to detect copy errors caused by hardware issues:

```bash
# Copy with checksum verification
xcp --verify-checksum important_file.bin backup.bin

# Recursive copy with verification
xcp -r --verify-checksum project/ /backup/project/

# Works with both drivers
xcp --driver=parblock --verify-checksum large_file.bin dest.bin
```

**How it works:**
- Checksum calculated during copy (using xxHash3 for speed)
- Destination file re-read to verify integrity
- Error returned immediately on mismatch (no retry)
- Works with both `parfile` and `parblock` drivers
- Sparse file optimization is disabled when checksum verification is enabled to
ensure consistent hashing
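The steps above can be sketched in a few lines of Rust. This is a minimal, std-only illustration, not the PR's implementation: it uses 64-bit FNV-1a as a dependency-free stand-in for xxHash3, and `Fnv1a`, `copy_with_hash`, `hash_reader`, and `verify` are hypothetical names. It hashes each chunk as it is copied, re-reads the destination independently, and reports a mismatch immediately without retrying.

```rust
use std::io::{self, Read, Write};

/// 64-bit FNV-1a: a dependency-free stand-in for xxHash3 (not xcp's hash).
/// It works byte by byte, so hashing in chunks gives the same digest as
/// hashing everything at once.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    fn digest(&self) -> u64 {
        self.0
    }
}

/// Step 1: copy `src` to `dst`, hashing every chunk as it passes through.
fn copy_with_hash<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.update(&buf[..n]);
        dst.write_all(&buf[..n])?;
    }
    dst.flush()?;
    Ok(hasher.digest())
}

/// Step 2: re-read the destination and hash it independently.
fn hash_reader<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            return Ok(hasher.digest());
        }
        hasher.update(&buf[..n]);
    }
}

/// Step 3: compare once; a mismatch is an error straight away (no retry).
fn verify(expected: u64, actual: u64) -> Result<(), String> {
    if expected == actual {
        Ok(())
    } else {
        Err(format!("checksum mismatch: expected {expected:016x}, got {actual:016x}"))
    }
}
```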

**Performance:** Expect roughly 2x overhead from the destination re-read (e.g.
34ms → 70ms for a 50MB file). This is worthwhile for critical data where
integrity matters.

**For mechanical hard drives (HDD):** The read-after-write pattern of checksum
verification can make copies noticeably slower on HDDs. If copies stall or slow
down, verification should still produce correct results, only more slowly.
For critical data, add `--fsync` to flush the destination to disk before it is
re-read for verification (slower, but ensures the data has reached the device
before it is checked):

Owner: This will also be very slow on NFS mounts; copy_file_range() allows the copy to happen server-side; checksumming will require that data to be copied back to the client for verification.

```bash
# Maximum integrity for critical data (slower on HDD)
xcp --verify-checksum --fsync critical_data.db backup.db
```
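A std-only sketch of the write, fsync, re-read, compare order that `--verify-checksum --fsync` implies. `write_sync_verify` is a hypothetical helper and FNV-1a stands in for xxHash3. `File::sync_all` flushes file data and metadata to the device before the check, although the re-read may still be served from the page cache.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// 64-bit FNV-1a: a dependency-free stand-in for xxHash3 (not xcp's hash).
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical helper: hash the source bytes, write them, fsync the
/// destination, then re-read it and compare digests.
fn write_sync_verify(dest: &Path, data: &[u8]) -> io::Result<()> {
    let expected = fnv1a(data);

    let mut out = File::create(dest)?;
    out.write_all(data)?;
    // Like `--fsync`: push data and metadata to the storage device first.
    // The re-read below may still come from the page cache.
    out.sync_all()?;
    drop(out);

    let mut reread = Vec::new();
    File::open(dest)?.read_to_end(&mut reread)?;
    let actual = fnv1a(&reread);

    if actual != expected {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("checksum mismatch: expected {expected:016x}, got {actual:016x}"),
        ));
    }
    Ok(())
}
```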

### Other Options

```bash
# Copy with specific number of workers
xcp --workers 8 -r large_dir/ backup/

# Use block-level parallelism
xcp --driver=parblock --block-size=4MB huge_file.bin dest.bin

# Respect .gitignore files
xcp -r --gitignore project/ backup/

# Sync to disk after each file
xcp --fsync critical_file.db backup.db
```
1 change: 1 addition & 0 deletions libxcp/Cargo.toml
@@ -31,6 +31,7 @@ num_cpus = "1.17.0"
regex = "1.11.2"
thiserror = "2.0.16"
walkdir = "2.5.0"
xxhash-rust = { version = "0.8", features = ["xxh3"] }

[dev-dependencies]
tempfile = "3.21.0"
8 changes: 8 additions & 0 deletions libxcp/src/config.rs
@@ -144,6 +144,13 @@ pub struct Config {
/// semantics of `cp` numbered backups
/// (e.g. `file.txt.~123~`). Default is `None`.
pub backup: Backup,

/// Verify checksums after copying.
///
/// Calculates a checksum during the copy operation and verifies
/// it by reading back the destination file. If the checksums
/// don't match, an error is returned. Default is `false`.
pub verify_checksum: bool,
}

impl Config {
@@ -171,6 +178,7 @@ impl Default for Config {
fsync: false,
reflink: Reflink::Auto,
backup: Backup::None,
verify_checksum: false,
}
}
}
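Because `verify_checksum` is a public field with a `false` default, library callers can opt in with struct-update syntax and keep every other default. The sketch uses a hypothetical, cut-down `Config` with three fields; the `block_size` default is illustrative, not libxcp's value.

```rust
/// Hypothetical stand-in for libxcp's `Config`, reduced to three fields.
#[derive(Debug, Clone)]
struct Config {
    block_size: u64,
    fsync: bool,
    verify_checksum: bool,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            block_size: 1024 * 1024, // illustrative value only
            fsync: false,
            verify_checksum: false, // opt-in, matching the PR's default
        }
    }
}

/// Enable verification (and fsync) while keeping all other defaults.
fn config_for_critical_data() -> Config {
    Config {
        fsync: true,
        verify_checksum: true,
        ..Config::default()
    }
}
```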
4 changes: 3 additions & 1 deletion libxcp/src/drivers/parblock.rs
@@ -172,7 +172,9 @@ fn queue_file_blocks(
queue_file_range(&harc, 0..len, pool, status_channel)
};

if probably_sparse(&harc.infd)? {
// Disable sparse file optimization when checksum verification is enabled
// to ensure consistent hashing of all file content including holes
if !harc.config.verify_checksum && probably_sparse(&harc.infd)? {
if let Some(extents) = map_extents(&harc.infd)? {
let sparse_map = merge_extents(extents)?;
let mut queued = 0;
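Why the sparse path is disabled: holes in the destination read back as zeros, so a hash fed only the data extents cannot match a hash over the re-read file. The sketch models a sparse file in memory; `materialize`, `hash_extents_only`, and the FNV-1a stand-in for xxHash3 are hypothetical helpers.

```rust
/// 64-bit FNV-1a: a dependency-free stand-in for xxHash3 (not xcp's hash).
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Model a sparse file of `len` bytes: `(offset, data)` extents hold data,
/// everything else is a hole that reads back as zeros.
fn materialize(len: usize, extents: &[(usize, &[u8])]) -> Vec<u8> {
    let mut buf = vec![0u8; len];
    for &(off, data) in extents {
        buf[off..off + data.len()].copy_from_slice(data);
    }
    buf
}

/// What a sparse-aware copy would hash if only data extents reached the hasher.
fn hash_extents_only(extents: &[(usize, &[u8])]) -> u64 {
    let mut all = Vec::new();
    for &(_, data) in extents {
        all.extend_from_slice(data);
    }
    fnv1a(&all)
}
```

Feeding the hasher zeros for each hole would be an alternative that keeps the sparse optimization; the PR takes the simpler route of disabling it.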
7 changes: 7 additions & 0 deletions libxcp/src/errors.rs
@@ -51,4 +51,11 @@ pub enum XcpError {

#[error("Unsupported OS")]
UnsupportedOS(&'static str),

#[error("Checksum verification failed for {path}: expected {expected:016x}, got {actual:016x}")]
ChecksumMismatch {
path: PathBuf,
expected: u64,
actual: u64,
},
}
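For reference, the `ChecksumMismatch` message pads both digests to 16 hex digits. The sketch reproduces that message with a hand-written `Display` impl instead of `thiserror`; `VerifyError` is a hypothetical stand-in for `XcpError`.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical std-only stand-in for the `ChecksumMismatch` variant.
#[derive(Debug)]
enum VerifyError {
    ChecksumMismatch { path: PathBuf, expected: u64, actual: u64 },
}

impl fmt::Display for VerifyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // `{:016x}` zero-pads each u64 digest to 16 lowercase hex digits.
            VerifyError::ChecksumMismatch { path, expected, actual } => write!(
                f,
                "Checksum verification failed for {}: expected {expected:016x}, got {actual:016x}",
                path.display()
            ),
        }
    }
}

impl std::error::Error for VerifyError {}
```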
90 changes: 87 additions & 3 deletions libxcp/src/operations.rs
@@ -17,15 +17,17 @@
use std::os::unix::fs::{chown, MetadataExt};
use std::{cmp, thread};
use std::fs::{self, canonicalize, create_dir_all, read_link, File, Metadata};
use std::io::Read;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::sync::{Arc, Mutex};

use crossbeam_channel as cbc;
use libfs::{
allocate_file, copy_file_bytes, copy_owner, copy_permissions, copy_timestamps, next_sparse_segments, probably_sparse, reflink, sync, FileType
};
use log::{debug, error, info, warn};
use walkdir::WalkDir;
use xxhash_rust::xxh3::Xxh3;

use crate::backup::{get_backup_path, needs_backup};
use crate::config::{Config, Reflink};
@@ -39,6 +41,8 @@ pub struct CopyHandle {
pub outfd: File,
pub metadata: Metadata,
pub config: Arc<Config>,
pub to: PathBuf,
src_checksum: Mutex<Option<u64>>,
}

impl CopyHandle {
@@ -60,6 +64,8 @@ impl CopyHandle {
outfd,
metadata,
config: config.clone(),
to: to.to_path_buf(),
src_checksum: Mutex::new(None),
};

Ok(handle)
@@ -68,13 +74,27 @@ impl CopyHandle {
/// Copy len bytes from wherever the descriptor cursors are set.
fn copy_bytes(&self, len: u64, updates: &Arc<dyn StatusUpdater>) -> Result<u64> {
let mut written = 0;
let mut hasher = if self.config.verify_checksum {
Some(Xxh3::new())
} else {
None
};

while written < len {
let bytes_to_copy = cmp::min(len - written, self.config.block_size);
let bytes = copy_file_bytes(&self.infd, &self.outfd, bytes_to_copy)? as u64;
let bytes = if let Some(ref mut h) = hasher {
copy_file_bytes_with_hash(&self.infd, &self.outfd, bytes_to_copy, h)?
} else {
copy_file_bytes(&self.infd, &self.outfd, bytes_to_copy)? as u64
};
written += bytes;
updates.send(StatusUpdate::Copied(bytes))?;
}

if let Some(h) = hasher {
*self.src_checksum.lock().unwrap() = Some(h.digest());
Owner: This shouldn't be unwrapped; it can be mapped to an error type.
}

Ok(written)
}

@@ -119,7 +139,9 @@ impl CopyHandle {
if self.try_reflink()? {
Owner: This setting will be silently ignored if on a reflink-capable filesystem (btrfs, XFS). Is this deliberate?
return Ok(self.metadata.len());
}
let total = if probably_sparse(&self.infd)? {
// Disable sparse file optimization when checksum verification is enabled
// to ensure consistent hashing of all file content including holes
let total = if !self.config.verify_checksum && probably_sparse(&self.infd)? {
self.copy_sparse(updates)?
} else {
self.copy_bytes(self.metadata.len(), updates)?
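On the reflink question above: as far as the diff shows, a reflink returns early without streaming data through the hasher, so `src_checksum` stays `None` and the final check is skipped. One hypothetical way to make the interaction explicit is to decide up front. `Reflink`, `Plan`, and `plan_copy` are illustrative names, and the `Always`/`Auto`/`Never` variants are an assumption about the modes, not libxcp's API.

```rust
/// Hypothetical mirror of the reflink modes; variant names are an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reflink {
    Always,
    Auto,
    Never,
}

/// What the copy routine should do for one file (illustrative only).
#[derive(Debug, PartialEq)]
enum Plan {
    TryReflink,
    ByteCopy,
    Reject(&'static str),
}

fn plan_copy(reflink: Reflink, verify_checksum: bool) -> Plan {
    match (reflink, verify_checksum) {
        (Reflink::Never, _) => Plan::ByteCopy,
        // Reflinks are optional: fall back to a byte copy so the data
        // actually passes through the hasher.
        (Reflink::Auto, true) => Plan::ByteCopy,
        // A mandatory reflink cannot be verified this way: fail loudly
        // instead of silently skipping the check.
        (Reflink::Always, true) => {
            Plan::Reject("reflink required, but checksum verification needs a byte copy")
        }
        // No verification requested: keep the existing reflink-first behavior.
        (_, false) => Plan::TryReflink,
    }
}
```

Whether `Auto` should fall back silently or warn is a design decision for the PR; the sketch only shows that the choice can be made explicit.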
@@ -138,10 +160,27 @@ impl CopyHandle {
if self.config.ownership && copy_owner(&self.infd, &self.outfd).is_err() {
warn!("Failed to copy file ownership: {:?}", self.infd);
}

if self.config.fsync {
debug!("Syncing file {:?}", self.outfd);
sync(&self.outfd)?;
}

if self.config.verify_checksum {
if let Some(expected) = *self.src_checksum.lock().unwrap() {
debug!("Verifying checksum for {:?}", self.to);
let actual = compute_file_checksum(&self.to)?;
if expected != actual {
return Err(XcpError::ChecksumMismatch {
path: self.to.clone(),
expected,
actual,
}.into());
}
debug!("Checksum verified: {:016x}", expected);
}
}

Ok(())
}
}
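Following the review comment on `copy_bytes`, a poisoned lock can be mapped to an error instead of panicking. A minimal sketch with hypothetical names (`ChecksumError::LockPoisoned`, `store_checksum`, `load_checksum`); in libxcp this would presumably become a new `XcpError` variant, and the same pattern would apply to the `lock().unwrap()` in the verification step.

```rust
use std::fmt;
use std::sync::Mutex;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ChecksumError {
    LockPoisoned,
}

impl fmt::Display for ChecksumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChecksumError::LockPoisoned => write!(f, "checksum state lock was poisoned"),
        }
    }
}

impl std::error::Error for ChecksumError {}

/// Store a digest, turning a poisoned lock into an error instead of a panic.
fn store_checksum(slot: &Mutex<Option<u64>>, digest: u64) -> Result<(), ChecksumError> {
    *slot.lock().map_err(|_| ChecksumError::LockPoisoned)? = Some(digest);
    Ok(())
}

/// Read the stored digest (if any) with the same error mapping.
fn load_checksum(slot: &Mutex<Option<u64>>) -> Result<Option<u64>, ChecksumError> {
    Ok(*slot.lock().map_err(|_| ChecksumError::LockPoisoned)?)
}
```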
@@ -265,3 +304,48 @@ pub fn tree_walker(
fn empty_path(path: &Path) -> bool {
*path == PathBuf::new()
}

fn copy_file_bytes_with_hash(infd: &File, outfd: &File, bytes: u64, hasher: &mut Xxh3) -> Result<u64> {
use std::io::BufReader;

const BUFFER_SIZE: usize = 64 * 1024;
let mut reader = BufReader::with_capacity(BUFFER_SIZE, infd);
let mut writer = std::io::BufWriter::with_capacity(BUFFER_SIZE, outfd);
let mut buffer = vec![0u8; BUFFER_SIZE];
let mut total_copied = 0u64;

while total_copied < bytes {
let to_read = cmp::min(bytes - total_copied, BUFFER_SIZE as u64) as usize;
let n = reader.read(&mut buffer[..to_read])?;
if n == 0 {
break;
}

hasher.update(&buffer[..n]);
std::io::Write::write_all(&mut writer, &buffer[..n])?;
total_copied += n as u64;
}

std::io::Write::flush(&mut writer)?;
Ok(total_copied)
}

fn compute_file_checksum(path: &Path) -> Result<u64> {
use std::io::BufReader;

const BUFFER_SIZE: usize = 64 * 1024;
let file = File::open(path)?;
let mut reader = BufReader::with_capacity(BUFFER_SIZE, file);
let mut hasher = Xxh3::new();
let mut buffer = vec![0u8; BUFFER_SIZE];

loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
hasher.update(&buffer[..n]);
}

Ok(hasher.digest())
}
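One detail worth checking in `copy_file_bytes_with_hash`: a `BufReader` is created per call, and when a requested chunk is smaller than its 64 KiB capacity (for example with a block size that is not a multiple of 64 KiB), `BufReader` may fill its buffer past the requested range, advancing the file cursor beyond what was written; those buffered bytes are dropped with the reader. A hypothetical variant that caps reads with `Read::take` and reads directly from the source avoids this. Since `&File` implements `Read`, `&self.infd` could be passed directly. FNV-1a again stands in for xxHash3.

```rust
use std::io::{self, Read, Write};

/// 64-bit FNV-1a: a dependency-free stand-in for xxHash3 (not xcp's hash).
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    fn digest(&self) -> u64 {
        self.0
    }
}

/// Hypothetical variant: copy at most `bytes` bytes from `src` to `dst`,
/// hashing each chunk. `take` caps the reads and nothing reads ahead, so
/// `src` ends up positioned exactly after the bytes that were copied.
fn copy_exact_with_hash<R: Read, W: Write>(
    src: R,
    dst: &mut W,
    bytes: u64,
    hasher: &mut Fnv1a,
) -> io::Result<u64> {
    let mut limited = src.take(bytes);
    let mut buf = vec![0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = limited.read(&mut buf)?;
        if n == 0 {
            break; // limit reached or source exhausted
        }
        hasher.update(&buf[..n]);
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}
```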
11 changes: 11 additions & 0 deletions src/options.rs
@@ -157,6 +157,16 @@ pub struct Opts {
#[arg(long, default_value = "none")]
pub backup: Backup,

/// Verify checksums after copying.
///
/// Calculates a checksum during the copy operation and verifies
/// it by reading back the destination file. If the checksums
/// don't match, an error is returned. This detects storage or
/// memory errors during copy. Note: This will re-read the
/// destination file after copying, which may impact performance.
#[arg(long)]
pub verify_checksum: bool,

/// Path list.
///
/// Source and destination files, or multiple source(s) to a directory.
@@ -201,6 +211,7 @@ impl From<&Opts> for Config {
fsync: opts.fsync,
reflink: opts.reflink,
backup: opts.backup,
verify_checksum: opts.verify_checksum,
}
}
}