Question about lack of boto3 client threading when downloading from S3. #961

tshimoga · 2025-11-04T17:50:17Z


The official S3 recommendation for maximizing bandwidth utilization is to use multiple threads (concurrent requests) when downloading objects from S3:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html

You can use the AWS SDKs to issue GET and PUT requests directly rather than employing the management of transfers in the AWS SDK. This approach lets you tune your workload more directly, while still benefiting from the SDK’s support for retries and its handling of any HTTP 503 responses that might occur. As a general rule, when you download large objects within a Region from Amazon S3 to Amazon EC2, we suggest making concurrent requests for byte ranges of an object at the granularity of 8–16 MB. Make one concurrent request for each 85–90 MB/s of desired network throughput. To saturate a 10 Gb/s network interface card (NIC), you might use about 15 concurrent requests over separate connections. You can scale up the concurrent requests over more connections to saturate faster NICs, such as 25 Gb/s or 100 Gb/s NICs.
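To make the quoted guidance concrete, here is a minimal sketch of concurrent byte-range downloads. It is an illustration under stated assumptions, not the library's implementation: the helper names (`byte_ranges`, `requests_needed`, `fetch_ranges`, `s3_range_fetcher`) are hypothetical, and the only boto3 call used is `get_object` with a `Range` argument.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MB = 1024 * 1024


def byte_ranges(size, part_size=8 * MB):
    """Split [0, size) into inclusive (start, end) pairs for HTTP Range headers."""
    return [
        (start, min(start + part_size, size) - 1)
        for start in range(0, size, part_size)
    ]


def requests_needed(nic_gbps, per_request_mb_s=85):
    """Estimate concurrent requests from the 85-90 MB/s-per-request rule of thumb."""
    nic_mb_s = nic_gbps * 1000 / 8  # Gb/s -> MB/s
    return math.ceil(nic_mb_s / per_request_mb_s)


def fetch_ranges(fetch, size, part_size=8 * MB, max_workers=15):
    """Call fetch(start, end) -> bytes from worker threads and reassemble in order."""
    ranges = byte_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the parts join correctly.
        return b"".join(pool.map(lambda r: fetch(*r), ranges))


def s3_range_fetcher(bucket, key):
    """Hypothetical helper: build a fetch(start, end) backed by S3 ranged GETs."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    # Note: botocore's default connection pool holds 10 connections; a larger
    # pool can be requested via botocore.config.Config(max_pool_connections=...).
    client = boto3.client("s3")

    def fetch(start, end):
        resp = client.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()

    return fetch
```

With these helpers, `requests_needed(10)` gives 15, which matches the "about 15 concurrent requests" figure in the quote, and a download would look like `fetch_ranges(s3_range_fetcher(bucket, key), object_size)`, where `object_size` would come from something like a `head_object` call.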

However, in the code, threads appear to be explicitly disabled:
https://github.com/mosaicml/streaming/blob/main/streaming/base/storage/download.py#L222
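For context, a rough sketch of the boto3 setting involved, assuming the linked line passes `TransferConfig(use_threads=False)` to `download_file` (that is my reading of the line, not something confirmed by the maintainers). The helper names `threads_on` and `make_config` are hypothetical; `TransferConfig` and its `use_threads`, `max_concurrency`, and `multipart_chunksize` parameters are part of boto3.

```python
MB = 1024 * 1024

# Settings that run the whole transfer in the calling thread.
# With use_threads=False, boto3 ignores max_concurrency.
THREADS_OFF = {"use_threads": False}


def threads_on(max_concurrency=15, chunk_mb=8):
    """Settings for a threaded, chunked transfer (values taken from the S3 guidance)."""
    return {
        "use_threads": True,
        "max_concurrency": max_concurrency,
        "multipart_chunksize": chunk_mb * MB,
    }


def make_config(settings):
    """Hypothetical helper: turn a settings dict into a boto3 TransferConfig."""
    from boto3.s3.transfer import TransferConfig  # lazy import; requires boto3

    return TransferConfig(**settings)


# Usage sketch (bucket, key, and local_path are placeholders):
#   client.download_file(bucket, key, local_path, Config=make_config(threads_on()))
```

One possible reason to disable threads inside a single call is that the caller already parallelizes across many files, but that is speculation on my part.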

Am I misunderstanding the code or is there a specific reason for that choice?
