Throttle requests to a domain by total bandwidth in a specified period of time #19

@ScottMansfield

from @truthpickle via livecoding.tv:

The crawler should be able to keep track of the total bandwidth used per domain and cap it at a specified amount over a specified period of time, e.g. 1 GB / month or 400 MB / week. Once a domain's limit has been reached, the fetch stage can simply stop retrieving pages for that domain. At the parse stage a little softness is acceptable, but if a page pushes the domain too far past its limit, the page should be dropped from the pipeline.
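A rough sketch of how this accounting might look, in Python purely for illustration. Nothing here comes from the crawler's code: the class `DomainBandwidthBudget`, its methods `may_fetch` and `accept_page`, and the `soft_overshoot_bytes` tolerance are all hypothetical names, and a simple fixed window per domain is assumed (a rolling window would be another option).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DomainBandwidthBudget:
    """Hypothetical per-domain bandwidth budget using a fixed time window."""

    limit_bytes: int               # budget per window, e.g. 400 MB
    period_seconds: float          # window length, e.g. one week
    soft_overshoot_bytes: int = 0  # assumed tolerance at the parse stage
    # domain -> (window start time, bytes used in that window)
    _usage: Dict[str, Tuple[float, int]] = field(
        default_factory=dict, init=False, repr=False
    )

    def _used(self, domain: str, now: float) -> int:
        """Return bytes used in the current window, resetting an expired one."""
        start, used = self._usage.get(domain, (now, 0))
        if now - start >= self.period_seconds:
            start, used = now, 0
        self._usage[domain] = (start, used)
        return used

    def may_fetch(self, domain: str, now: float) -> bool:
        """Fetch stage: retrieve only while the domain is under its limit."""
        return self._used(domain, now) < self.limit_bytes

    def accept_page(self, domain: str, size_bytes: int, now: float) -> bool:
        """Parse stage: record the bytes, then decide whether to keep the page.

        The bytes are counted even when the page is dropped, because the
        bandwidth has already been spent by the time the size is known.
        """
        used = self._used(domain, now) + size_bytes
        start, _ = self._usage[domain]
        self._usage[domain] = (start, used)
        return used <= self.limit_bytes + self.soft_overshoot_bytes


# Example: 400 MB per week, with 5 MB of softness at the parse stage.
week = 7 * 24 * 3600
budget = DomainBandwidthBudget(
    limit_bytes=400 * 1024**2,
    period_seconds=week,
    soft_overshoot_bytes=5 * 1024**2,
)
if budget.may_fetch("example.com", now=0.0):
    keep = budget.accept_page("example.com", size_bytes=2 * 1024**2, now=1.0)
```

Passing `now` explicitly keeps the sketch easy to test; a real pipeline would more likely read a clock and persist the counters so that a restart does not reset the budget.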
