Data Sinks

Jefferson Smith edited this page May 25, 2021 · 3 revisions

This is a rough sketch of the content for this section, which will be filled in with more concrete details.

Data Sinks are any locations, platforms or services in which project data is stored, either permanently, or as a stepping stone between sources and final destinations.

Cassandra

Ethica collects data from phones into a rapid-storage Cassandra database distributed over the cloud.

CSV

INTERACT's primary long-term storage and migration format is CSV. Files are fingerprinted with MD5 checksums (at the source, when possible) and then packed into zip or tar archives for efficient storage and transfer.
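
The fingerprint-then-pack step could look roughly like the sketch below. It is an illustration only, not INTERACT's actual tooling: the function names, the `MD5SUMS` manifest file, and the choice of a gzip-compressed tar are all assumptions made for the example, and it assumes the CSV files have distinct base names.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_with_checksums(csv_paths, archive_path, manifest_name="MD5SUMS"):
    """Write a manifest of MD5 digests and pack it with the CSVs into a tar.gz.

    Hypothetical helper: the manifest name and layout are assumptions.
    """
    archive_path = Path(archive_path)
    manifest_path = archive_path.parent / manifest_name
    checksums = {Path(p).name: md5_of_file(p) for p in csv_paths}
    # One "digest  filename" line per file, sorted by file name.
    lines = [f"{digest}  {name}\n" for name, digest in sorted(checksums.items())]
    manifest_path.write_text("".join(lines))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(manifest_path, arcname=manifest_name)
        for p in csv_paths:
            tar.add(p, arcname=Path(p).name)
    return checksums


def verify_archive(archive_path, manifest_name="MD5SUMS"):
    """Recompute each member's MD5 against the manifest; return mismatched names."""
    mismatches = []
    with tarfile.open(archive_path, "r:gz") as tar:
        manifest = tar.extractfile(manifest_name).read().decode()
        for line in manifest.splitlines():
            expected, name = line.split("  ", 1)
            actual = hashlib.md5(tar.extractfile(name).read()).hexdigest()
            if actual != expected:
                mismatches.append(name)
    return mismatches
```

Keeping the manifest inside the archive means a receiving site can call `verify_archive` after transfer without needing a separate checksum file. An empty list means every file matches its recorded fingerprint.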

PostgreSQL

INTERACT's primary working format is table data in a PostgreSQL instance running on Compute Canada.

Compute Canada

Describe CC, their mandate, and a bit about their infrastructure.

Institutional file servers

  1. yakitori
  2. harmonizer

($sketched)
