Home
Ricard Lado edited this page Mar 9, 2025 · 6 revisions
Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python.
- Standardized: Canonada provides a standardized way to build your data projects
- Modular: Canonada is modular and allows you to build and visualize data pipelines with ease
- Memory Efficient: Canonada is memory efficient and can handle large datasets by streaming data through the pipeline instead of loading it all at once
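The memory-efficiency idea can be illustrated with plain Python generators. This is a minimal sketch and does not use Canonada's API. `read_records`, `double_values` and `sum_values` are hypothetical names, and the source generates synthetic records. The sketch assumes that "streaming" means handing records from step to step one at a time rather than building the whole dataset in memory.

```python
from typing import Iterable, Iterator


def read_records(n: int) -> Iterator[dict]:
    """Hypothetical source: yields synthetic records one at a time instead of building a list."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def double_values(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transform step: processes each record as soon as it arrives."""
    for record in records:
        yield {**record, "value": record["value"] * 2}


def sum_values(records: Iterable[dict]) -> int:
    """Hypothetical sink: consumes the stream, keeping only a running total in memory."""
    total = 0
    for record in records:
        total += record["value"]
    return total


# Only one record is alive at a time, even though a million records flow through the chain.
total = sum_values(double_values(read_records(1_000_000)))
```

Nothing is computed until `sum_values` starts pulling records, so memory use stays flat regardless of `n`.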
Canonada is built around some basic concepts that are important to understand:
- Pipeline: A pipeline is a group of functions, connected through their inputs and outputs, that transforms data. In Canonada, pipelines are defined as Python objects that contain functions known as Nodes.
- Node: A node is a function that performs a specific task in a pipeline.
- System: A system (or pipeline system) is a Canonada object that manages the sequential execution of pipelines.
- Catalog: The catalog is an abstraction that contains all the data sources and data sinks that can be used to build pipelines. It is defined in the `catalog.toml` file in the `config/` directory, which centralizes control of data sources and sinks. To learn more about the catalog, check out the Catalog section.
- Datahandler: Datahandlers are objects dispatched by the catalog to provide streaming functionality for the data sources and sinks. Canonada provides several built-in datahandlers, but you can also define your own.
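To make the relationships between these concepts concrete, here is a minimal plain-Python sketch. It is not Canonada's API. The classes `ListDataHandler`, `Node`, `Pipeline` and `System`, their methods, and the plain dictionary standing in for the catalog are all hypothetical. The sketch assumes that each node transforms one record at a time and that nodes are wired together by matching output and input names.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional


class ListDataHandler:
    """Hypothetical datahandler backed by a Python list (stands in for a file, database, etc.)."""

    def __init__(self, records: Optional[List[dict]] = None) -> None:
        self.records: List[dict] = list(records or [])

    def stream(self) -> Iterator[dict]:
        # Yield records one at a time instead of returning the whole list.
        yield from self.records

    def write(self, records: Iterable[dict]) -> None:
        # Consume a stream and append each record as it arrives.
        for record in records:
            self.records.append(record)


class Node:
    """Hypothetical node: a per-record function wired by input and output names."""

    def __init__(self, func: Callable[[dict], dict], input_name: str, output_name: str) -> None:
        self.func, self.input_name, self.output_name = func, input_name, output_name

    def run(self, records: Iterable[dict]) -> Iterator[dict]:
        # Lazily apply the function to each incoming record.
        return (self.func(record) for record in records)


class Pipeline:
    """Hypothetical pipeline: nodes connected by matching output and input names."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def run(self, catalog: Dict[str, ListDataHandler]) -> None:
        streams: Dict[str, Iterator[dict]] = {}
        for node in self.nodes:
            # Read from an upstream node if one produced this name, otherwise from the catalog.
            if node.input_name in streams:
                source = streams.pop(node.input_name)
            else:
                source = catalog[node.input_name].stream()
            streams[node.output_name] = node.run(source)
        # Outputs named in the catalog are written to their sinks; this pulls data through the chain.
        for name, stream in streams.items():
            if name in catalog:
                catalog[name].write(stream)


class System:
    """Hypothetical system: runs pipelines one after another."""

    def __init__(self, pipelines: List[Pipeline]) -> None:
        self.pipelines = pipelines

    def run(self, catalog: Dict[str, ListDataHandler]) -> None:
        for pipeline in self.pipelines:
            pipeline.run(catalog)


# Usage: a plain dict plays the role of the catalog.
catalog = {
    "raw": ListDataHandler([{"x": 1}, {"x": 2}, {"x": 3}]),
    "clean": ListDataHandler(),
    "report": ListDataHandler(),
}
cleaning = Pipeline([
    Node(lambda r: {"x": r["x"] * 10}, "raw", "scaled"),
    Node(lambda r: {**r, "ok": r["x"] > 15}, "scaled", "clean"),
])
reporting = Pipeline([
    Node(lambda r: {"x": r["x"], "label": "big" if r["ok"] else "small"}, "clean", "report"),
])
System([cleaning, reporting]).run(catalog)
```

Because every node returns a generator, records are pulled through the chain only when a sink writes them. No intermediate stage materializes the dataset; only the list-backed handlers in this toy example hold records in memory. In this simplified version each intermediate stream can feed only one downstream node, so fan-out would need extra handling (for example, `itertools.tee`).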
Canonada is available on PyPI and can be installed using pip:

```shell
pip install canonada
```

Visit the Getting started section to learn how to start a new project with Canonada.