Architecture Design
The following diagram shows an overview of how Pider interacts with different components and outlines the data flow that takes place inside the framework.

The Spiders component is responsible for providing a Spider interface that lets programmers customize their crawlers for different scenarios.
The Kernel is responsible for controlling the data flow between all components and modules, and for triggering events when certain actions occur.
The ActivedCarbon component is similar to Item, ItemLoader, and ItemPipeline in Scrapy, but it supplies more functionality. Its ultimate purpose is to offer a complete mechanism for data cleaning.

As shown in the prototype above, ActivedCarbon has many Pores, which supply a variety of ETL operations.
Pore
A Pore can be regarded as a container that holds a collection of ETL handlers, which perform data transformation (Reaction), data filtering (Filter), and data assimilation (Absorber).
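To make the container idea concrete, here is a minimal sketch of a Pore applying its three kinds of handlers in order. The class name, handler hooks, and processing order are assumptions for illustration, not the actual Pider API.

```python
# Hypothetical sketch of a Pore: filters run first, then reactions
# transform the item, then absorbers observe it. Names are assumed.
class Pore:
    def __init__(self, reactions=None, filters=None, absorbers=None):
        self.reactions = reactions or []   # transform handlers
        self.filters = filters or []       # predicates that drop items
        self.absorbers = absorbers or []   # side-channel collectors

    def process(self, item):
        # Drop the item if any filter rejects it.
        if any(not keep(item) for keep in self.filters):
            return None
        # Apply each transformation in sequence.
        for reaction in self.reactions:
            item = reaction(item)
        # Let absorbers record information without altering the item.
        for absorber in self.absorbers:
            absorber(item)
        return item
```

Keeping the three handler kinds in separate lists means each stage can be extended independently, which matches the plug-in style described below.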
Reaction
Reaction performs transformation operations on data.
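As a sketch of what a Reaction might look like, the following plain callable transforms one field of an item. The field names and cleaning logic are illustrative assumptions, not part of Pider.

```python
# Hypothetical Reaction: takes an item dict and returns a transformed
# copy, leaving the original untouched. Field names are illustrative.
def normalize_price(item):
    cleaned = dict(item)
    # Strip a leading currency symbol and convert the price to a float.
    cleaned["price"] = float(str(item["price"]).lstrip("$"))
    return cleaned
```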
Absorber
Absorber collects information from the data stream, which can later be used for analysis.
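One way to picture an Absorber is as a handler that accumulates statistics without modifying the data. The class below is a hypothetical example; the `domain` field and counting behavior are assumptions for illustration.

```python
from collections import Counter

# Hypothetical Absorber: observes items and tallies statistics for
# later analysis, without changing the data stream.
class DomainAbsorber:
    def __init__(self):
        self.counts = Counter()

    def __call__(self, item):
        # Count how many items came from each domain (illustrative field).
        self.counts[item.get("domain", "unknown")] += 1
        return item
```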
Filter
If you don't want to process all the data, you can define a Filter to avoid processing invalid data.
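A Filter can be as simple as a predicate that returns True to keep an item and False to discard it. The function and field names below are assumptions for illustration, not the actual Pider API.

```python
# Hypothetical Filter: rejects items that are missing required fields,
# so invalid data never reaches the Reaction handlers.
def has_required_fields(item):
    return bool(item.get("url")) and bool(item.get("title"))
```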
For more details on customizing your own ETL model, please check out DataProcess in Pider.