Ricard Lado edited this page Mar 9, 2025 · 6 revisions

Canonada is a data science framework that helps you build production-ready streaming pipelines for data processing in Python.

Why Canonada?

  • Standardized: Canonada gives every data project the same structure, so pipelines are built and organized in a consistent way
  • Modular: Pipelines are assembled from small, reusable pieces, which makes them easy to build and visualize
  • Memory Efficient: Canonada streams data through the pipeline instead of loading it all at once, so it can handle large datasets

Basic Concepts

Canonada is built around some basic concepts that are important to understand:

  • Pipeline: A pipeline is a group of functions, related by their inputs and outputs, designed to transform data. In Canonada, pipelines are defined as Python objects that contain functions known as Nodes.
  • Node: A node is a function that performs a specific task in a pipeline.
  • System: A system or pipeline system is a Canonada object that manages the sequential execution of pipelines.
  • Catalog: The catalog is an abstraction that contains all the data sources and data sinks that can be used to build pipelines. It is defined in the catalog.toml file in the config/ directory, centralizing the control of data sources and sinks. To learn more about the catalog, check out the Catalog section.
  • Datahandler: Datahandlers are objects dispatched by the catalog to provide data streaming functionality for the data sources and sinks. Canonada provides several built-in datahandlers, but you can also define your own.
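
To make these concepts concrete, here is a minimal conceptual sketch in plain Python. It does not use the Canonada API: the names read_source, clean, add_length and run_pipeline are hypothetical stand-ins chosen for illustration. It only shows the general idea of nodes chained into a pipeline that streams records one at a time instead of loading the whole dataset.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stand-in for illustration; this is NOT Canonada's Node class.
Node = Callable[[dict], dict]


def read_source(rows: Iterable[dict]) -> Iterator[dict]:
    """Plays the role of a datahandler: yields one record at a time."""
    for row in rows:
        yield row


def clean(record: dict) -> dict:
    """Node: normalize whitespace and capitalization of the name field."""
    return {**record, "name": record["name"].strip().title()}


def add_length(record: dict) -> dict:
    """Node: add a field derived from the output of the previous node."""
    return {**record, "name_length": len(record["name"])}


def run_pipeline(nodes: list[Node], source: Iterator[dict]) -> Iterator[dict]:
    """Pipeline: pass each record through every node, in order, lazily."""
    for record in source:
        for node in nodes:
            record = node(record)
        yield record


raw = [{"name": "  ada lovelace "}, {"name": "alan turing"}]

# Only one record is in flight at a time until list() collects the results.
results = list(run_pipeline([clean, add_length], read_source(raw)))
print(results)
```

Because run_pipeline is a generator, a sink could consume and write each record as it arrives, which is the property that keeps memory use flat on large inputs. In Canonada itself, sources and sinks are declared in the catalog rather than passed in by hand as they are here.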

Installation

Canonada is available on PyPI and can be installed using pip:

pip install canonada

Ready to start?

Visit the Getting started section to learn how to start a new project with Canonada.
