To perform advanced reporting or analytics on your ControlShift data, you can mirror all your data into an external database.
This can be a helpful tool for answering high-level questions about member engagement or integrating activity with data from other tools.
Once your data is in an external data warehouse replica, analysts can use SQL to answer questions about activity or join it
with data from other sources.
<aside class="warning">
A fresh copy of the data is provided nightly, but in between nightly exports, only additions are available.
If real-time data is required, we recommend consuming the specific relevant webhooks (e.g. <code>signature.updated</code>) rather than relying on the incremental bulk data exports.
</aside>
To get the data into your external system, you'll need to consume **bulk data exports**. There are two types of exports:
- The **full export** happens once a day, and includes a complete copy of the current data in the tables.
- The **incremental export** happens once a _minute_, and includes only rows that have been added to the table in the last minute.
A bulk data export (full or incremental) is a set of CSV files, one for each [ControlShift table](#bulk-data-bulk-data-data-schemas).
## How to use full and incremental exports
The data in the full exports should _replace_ the existing data in your mirror database.
<strong>Refreshing your mirror database with the nightly full export is essential to ensuring an accurate copy of the data.</strong>
If you're using the incremental exports, the data in them should be _added_ to your mirror database.
Remember, the incremental exports do _not_ include any updates or deletions of existing rows; you'll have to wait for the nightly export to receive fresh data with updates and deletions included.
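The replace-vs-append rule above can be sketched against a local SQLite mirror. This is an illustration only: the `signatures` table and its two columns are made up for the example, and real table schemas come from the bulk data schemas described below.

```python
import csv
import io
import sqlite3

def apply_full_export(conn, table, csv_text):
    """Full exports REPLACE the mirror table's contents."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Table and column names come from the trusted export, not user input.
    conn.execute(f"DELETE FROM {table}")
    placeholders = ", ".join("?" * len(header))
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})", data
    )
    conn.commit()

def apply_incremental_export(conn, table, csv_text):
    """Incremental exports only ADD rows; updates and deletes arrive with the nightly full export."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    placeholders = ", ".join("?" * len(header))
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})", data
    )
    conn.commit()
```

The key design point is that incremental loads are pure appends; correctness over time depends on the nightly full refresh.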
## Webhooks
When a new bulk data export is ready, you'll receive a webhook to each of your webhook endpoints.
- The webhook for full exports is [`data.full_table_exported`](#webhook-endpoints-data-full_table_exported).
- The webhook for incremental exports is [`data.incremental_table_exported`](#webhook-endpoints-data-incremental_table_exported).
<aside class="notice">
Bulk data webhooks should be automatically included when adding a new webhook endpoint. Please contact support to report any issues with bulk data webhook generation. For testing, you can manually trigger these webhooks by visiting <code>https://<your controlshift instance>/org/settings/integrations/webhook_endpoints</code> and clicking on "Trigger" under "Test Nightly Bulk Data Export Webhook".
</aside>
Your system should listen for those webhooks to know when and where to get the exported data.
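As a sketch, a listener might dispatch on the webhook type like this. The payload field name `type` and the returned action names are assumptions made for the example; see the webhook endpoint documentation for the exact payload shape.

```python
def handle_bulk_data_webhook(payload: dict) -> str:
    """Pick the mirroring action for a bulk data webhook payload.

    The "type" field is an assumption for this sketch; check the
    webhook endpoint reference for the real payload shape.
    """
    kind = payload.get("type")
    if kind == "data.full_table_exported":
        return "replace"  # nightly full export: replace the mirror table
    if kind == "data.incremental_table_exported":
        return "append"   # minutely incremental export: append new rows only
    return "ignore"       # some other webhook type; not a bulk data export
```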
It's possible to consume the Bulk Data API in its underlying format as CSV files in an S3 bucket, or as a higher-level
HTTPS Webhook API that is not specific to AWS or S3. Many data warehouse integration technologies like [BigQuery S3 Transfers](https://cloud.google.com/bigquery/docs/s3-transfer),
[Airbyte](https://docs.airbyte.com/integrations/sources/s3/) or [AWS Glue](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html)
are able to natively process files in S3 buckets. However, if you are using a different technology or want to implement a custom integration,
you can use our webhook events to get the same data in a cloud-platform-agnostic way.
We provide a [ControlShift to Redshift Pipeline](#bulk-data-controlshift-to-redshift-pipeline) as an example of sample code that demonstrates how to use the high-level webhooks to mirror your ControlShift data into Redshift.
Similar strategies can be used to mirror your data into other data warehouses. We've designed the underlying APIs to work flexibly regardless of
your technical architecture. Since we expose the file events as standard HTTPS webhooks, they should be compatible with any programming language.
## Bulk Data Files
Each table exposed by the bulk data API is made available as a CSV file, with the URL to download each file sent via webhook.
We expire access to data 6 hours after it has been generated. This means that if you are building an automated system
to ingest data from this API, it must process webhook notifications within 6 hours.
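A minimal fetch step might look like the following sketch, which downloads one exported CSV from the URL delivered in the webhook and parses it. It assumes a UTF-8 CSV body; any error handling and retry logic is left out.

```python
import csv
import io
import urllib.request

def fetch_export_rows(url):
    """Download one exported CSV from the URL delivered by the webhook.

    Export URLs expire 6 hours after generation, so fetch promptly.
    """
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")  # assumption: UTF-8 CSV body
    return list(csv.DictReader(io.StringIO(text)))
```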
When the **Compress bulk data exports** option is enabled (available at the Webhooks integration page), incremental and nightly bulk data export files will be compressed in [`bzip2` format](https://sourceware.org/bzip2/). This improves performance when fetching the files from S3, since they are considerably smaller.
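A consumer can handle both compressed and uncompressed files with the standard `bz2` module. This sketch assumes the payload is a UTF-8 CSV once decompressed:

```python
import bz2
import csv
import io

def decode_export(raw, compressed):
    """Parse a downloaded export, bz2-decompressing first when compression is enabled."""
    if compressed:
        raw = bz2.decompress(raw)
    return list(csv.reader(io.StringIO(raw.decode("utf-8"))))
```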
## Bulk Data Data Schemas
The bulk data webhooks include exports of the following tables: