-
Notifications
You must be signed in to change notification settings - Fork 0
MinChunkSize: 10MB #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR increases the minimum chunk size for multipart uploads from 5 MB to 10 MB, addressing issue #18 in the data-client repository.
Changes:
- Updated MinChunkSize constant from 5 MB to 10 MB
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func init() { | ||
| v, err := GetLfsCustomTransferInt("lfs.customtransfer.drs.multipart-min-chunk-size", 10) | ||
| if err != nil { | ||
| MinChunkSize = int64(10) * MB | ||
| return | ||
| } |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init function silently ignores errors from GetLfsCustomTransferInt and falls back to the default value without any logging or warning. This means if a user has misconfigured their git config (e.g., set an invalid value), they won't know that their configuration is being ignored. Consider adding a log statement when the error occurs so users are aware their configuration is not being applied.
client/upload/utils_test.go
Outdated
| name string | ||
| fileSize int64 | ||
| wantChunkSize int64 | ||
| wantChunks int64 |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name wantChunks is misleading. Based on the calculation (fileSize + chunkSize - 1) / chunkSize, this represents the number of parts in the multipart upload, not chunks. Consider renaming to wantParts to match the terminology used in the multipart upload code and be consistent with the PR description which refers to "number of parts".
client/common/constants.go
Outdated
| MaxConcurrentUploads = 10 | ||
| MaxRetries = 5 | ||
| MinChunkSize = 5 * 1024 * 1024 | ||
| // MinChunkSize = 10 * 1024 * 1024 |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old constant MinChunkSize is commented out but not removed. This commented-out code should be removed entirely to maintain code cleanliness, especially since MinChunkSize has been moved to a variable declaration below.
| // MinChunkSize = 10 * 1024 * 1024 |
client/common/constants.go
Outdated
|
|
||
| parsed, err := strconv.ParseInt(value, 10, 64) | ||
| if err != nil { | ||
| return defaultValue, fmt.Errorf("invalid int value for %s: >%q<", key, value) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message format uses angle brackets around the quoted value (e.g., >%q<), which is unusual and inconsistent with standard Go error formatting. Consider using a more conventional format like %q without the angle brackets, or use a clearer message format such as invalid int value for %s: %q.
| return defaultValue, fmt.Errorf("invalid int value for %s: >%q<", key, value) | |
| return defaultValue, fmt.Errorf("invalid int value for %s: %q", key, value) |
client/upload/utils_test.go
Outdated
| chunks := (tt.fileSize + chunkSize - 1) / chunkSize | ||
| if chunks != tt.wantChunks { |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name chunks is misleading. This represents the number of parts, not chunks. The calculation (fileSize + chunkSize - 1) / chunkSize is computing how many parts the file will be divided into. Consider renaming to parts for consistency with the multipart upload terminology used throughout the codebase.
| func TestOptimalChunkSize(t *testing.T) { | ||
| tests := []struct { | ||
| name string | ||
| fileSize int64 | ||
| wantChunkSize int64 | ||
| wantChunks int64 | ||
| }{ | ||
| { | ||
| name: "100MB", | ||
| fileSize: 100 * common.MB, | ||
| wantChunkSize: 100 * common.MB, | ||
| wantChunks: 1, | ||
| }, | ||
| { | ||
| name: "1GB", | ||
| fileSize: 1 * common.GB, | ||
| wantChunkSize: 10 * common.MB, | ||
| wantChunks: 103, | ||
| }, | ||
| { | ||
| name: "5GB", | ||
| fileSize: 5 * common.GB, | ||
| wantChunkSize: 70 * common.MB, | ||
| wantChunks: 74, | ||
| }, | ||
| { | ||
| name: "10GB", | ||
| fileSize: 10 * common.GB, | ||
| wantChunkSize: 128 * common.MB, | ||
| wantChunks: 80, | ||
| }, | ||
| { | ||
| name: "50GB", | ||
| fileSize: 50 * common.GB, | ||
| wantChunkSize: 256 * common.MB, | ||
| wantChunks: 200, | ||
| }, | ||
| { | ||
| name: "100GB", | ||
| fileSize: 100 * common.GB, | ||
| wantChunkSize: 256 * common.MB, | ||
| wantChunks: 400, | ||
| }, | ||
| { | ||
| name: "1TB", | ||
| fileSize: 1 * common.TB, | ||
| wantChunkSize: 1 * common.GB, | ||
| wantChunks: 1024, | ||
| }, | ||
| } | ||
|
|
||
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| chunkSize := OptimalChunkSize(tt.fileSize) | ||
| if chunkSize != tt.wantChunkSize { | ||
| t.Fatalf("chunk size = %d, want %d", chunkSize, tt.wantChunkSize) | ||
| } | ||
|
|
||
| chunks := (tt.fileSize + chunkSize - 1) / chunkSize | ||
| if chunks != tt.wantChunks { | ||
| t.Fatalf("chunks = %d, want %d", chunks, tt.wantChunks) | ||
| } | ||
| }) | ||
| } | ||
| } |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test coverage is incomplete compared to the documented requirements in the PR description. The documentation specifies 15 test cases including boundary cases (exact value, -1 byte, +1 byte), zero bytes, negative values, and specific thresholds. However, the implementation only includes 7 test cases and is missing critical boundary tests such as: 0 bytes, 1 MB, 100 MB + 1 byte, 1 GB + 1 byte, 10 GB + 1 byte, 100 GB + 1 byte, 500 MB, and 500 GB. These boundary cases are important to verify the correct behavior of the switch statement and scaling logic.
client/common/constants.go
Outdated
| } | ||
|
|
||
| MinChunkSize = int64(v) * MB | ||
|
|
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init function has an unnecessary blank line at the end before the closing brace. This should be removed for consistency with Go style guidelines.
client/upload/utils.go
Outdated
| switch { | ||
| case fileSize <= 100*common.MB: | ||
| // Single PUT: return whole file size | ||
|
|
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OptimalChunkSize function has an unnecessary blank line between the comment and the return statement in the first case. This blank line should be removed for consistency with the other cases in the switch statement.
| cmd := exec.Command("git", "config", "--get", "--default", defaultText, key) | ||
| output, err := cmd.Output() | ||
| if err != nil { | ||
| return defaultValue, fmt.Errorf("error reading git config %s: %v", key, err) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The git config command includes the --default flag on line 114, but git config --default was only introduced in Git 2.18 (June 2018). While this is likely not an issue for most modern environments, the code should verify that this feature is available or document the minimum required Git version. Additionally, the error handling silently falls back to the default value when git config fails, which could mask configuration issues. Consider logging a warning when the git config command fails so users know their configuration is not being applied.
| ## NEW | ||
|
|
||
|
|
||
| ```bash | ||
| go test ./client/upload -run '^TestOptimalChunkSize$' -v | ||
|
|
||
| ``` |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation structure is confusing. The "NEW" section (line 31) doesn't describe the new algorithm or behavior - it only contains test commands and test case specifications. The documentation should include a clear description of the new chunk size calculation algorithm (similar to the "OLD" section) before diving into testing details. Consider adding a subsection that describes the new behavior: <= 100 MB uses fileSize, > 100 MB - <= 1 GB uses 10 MB, > 1 GB - <= 10 GB scales 25-128 MB, > 10 GB - <= 100 GB uses 256 MB, and > 100 GB scales 512-1024 MB.
|
@copilot open a new pull request to apply changes based on the comments in this thread |
Co-authored-by: bwalsh <47808+bwalsh@users.noreply.github.com>
Co-authored-by: bwalsh <47808+bwalsh@users.noreply.github.com>
Co-authored-by: bwalsh <47808+bwalsh@users.noreply.github.com>
Address PR #19 review comments: naming, test coverage, and documentation
- Added nil check for req.Progress in multipart.go to prevent panic - Removed duplicate progress event in singleFile.go (Finalize already emits) - Fixed concurrent BytesSoFar calculation using atomic counter for monotonicity Co-authored-by: bwalsh <47808+bwalsh@users.noreply.github.com>
…progress tracking Fix nil pointer dereferences and duplicate progress events in upload progress tracking
adds tests
feature/progress-event
See #18
Engineering note — Optimal Chunk Size Calculation for Multipart Uploads
OLD:
optimalChunkSize determines the ideal chunk/part size for multipart upload based on file size.
The chunk size (also known as "message size" or "part size") affects upload performance and
must comply with S3 constraints.
Calculation logic:
This results in:
The ~49 GB threshold (10,000 parts × 5 MB) is where files exceed S3's
10,000 part limit when using the minimum chunk size
Examples:
NEW
Purpose
OptimalChunkSizebehavior and return values (chunk size and number of parts) across thresholds, boundaries and scaled ranges.Key behavior to assert
int64bytes; tests should usecommon.MB/common.GBconstants.parts = ceil(fileSize / chunk);fileSize == 0returnsparts == 0.1 MB.<=and some ranges start atX + 1— include exact, -1 and +1 byte checks.Parameterized test cases (file size ⇒ expected chunk ⇒ expected parts)
0bytes1 MB(fallback)01 MB1 MB(<= 100 MB)1100 MB100 MB(<= 100 MB)1100 MB + 1 B10 MB(> 100 MB - <= 1 GB)11500 MB10 MB501 GB(1024 MB)10 MB(<= 1 GB)1031 GB + 1 B25 MB(start of 1 GB - 10 GB scaled range)415 GB(5120 MB)25 MBand128 MB→ ≈70 MB(rounded down)7410 GB(10240 MB)128 MB(end of 1 GB - 10 GB scaled range)8010 GB + 1 B256 MB(> 10 GB - <= 100 GB fixed)4150 GB(51200 MB)256 MB200100 GB(102400 MB)256 MB400100 GB + 1 B512 MB(start of > 100 GB scaled range)201500 GB(512000 MB)512 MBand1024 MB→ ≈739 MB(rounded down)6931 TB(1024 GB = 1,048,576 MB) — note: use project units consistently1024 MB(max of scaled range)1024Test design notes (concise)
client/upload/utils_test.go. Include fields: name,fileSize int64,wantChunk int64,wantParts int64.wantParts. Use integer arithmetic for parts.Execution
go test ./client/upload -vgo test ./client/upload -run '^TestOptimalChunkSize$' -v