
Commit dd35313

reorganize the bulk data section for easier reading
1 parent 73c4a29 commit dd35313

1 file changed

Lines changed: 37 additions & 21 deletions

File tree

source/includes/_bulk_data.md.erb

@@ -2,7 +2,7 @@
To perform advanced reporting or analytics on your ControlShift data, you can mirror all your data into an external database.
This can be a helpful tool for answering high-level questions about member engagement or integrating activity with data from other tools.
-Once your data is in an external data warehouse replica analysts can use SQL to answer questions about activity or join it
+Once your data is in an external data warehouse replica, analysts can use SQL to answer questions about activity or join it
with data from other sources.

<aside class="warning">
@@ -11,43 +11,48 @@ A fresh copy of the data is provided nightly, but in between nightly exports, on
If real-time data is required, we recommend consuming the specific relevant webhooks (e.g. <code>signature.updated</code>) rather than relying on the incremental bulk data exports.
</aside>

-We provide a set of automated bulk exports and webhooks, along with examples (linked below) on how to use them.
+To get the data into your external system, you'll need to consume **bulk data exports**. There are two types of exports:

-It's possible to consume the Bulk Data API in its underlying format as CSV files in an S3 bucket or as a higher level
-HTTPS Webhook API that is not specific to AWS or S3. Many data warehouse integration technologies like [BigQuery S3 Transfers](https://cloud.google.com/bigquery/docs/s3-transfer),
-[Airbyte](https://docs.airbyte.com/integrations/sources/s3/) or [AWS Data Glue](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html)
-are able to natively process files in S3 buckets. However, if you are using a different technology or want to implement a custom integration
-you can use our webhook events to get the same data in a cloud platform agnostic way.
+- The **full export** happens once a day, and includes a complete copy of the current data in the tables.
+- The **incremental export** happens once a _minute_, and includes only rows that have been added to the table in the last minute.

-We provide a [ControlShift to Redshift Pipeline](#bulk-data-controlshift-to-redshift-pipeline) as an example of sample code that demonstrates how to use the high-level webhooks to mirror your ControlShift data into Redshift.
-Similar strategies can be used to mirror your data into other data warehouses. We've designed the underlying APIs to work flexibly regardless of
-your technical architecture. Since we expose the file events as standard HTTPS webhooks they should be compatible with any programming language.
+A bulk data export (full or incremental) is a set of CSV files, one for each [ControlShift table](#bulk-data-bulk-data-data-schemas).

-## Export schedule and webhooks
-
-Every night, we'll export the most up-to-date version of all of your data into a set of CSV files, one for each internal ControlShift table. The [data.full_table_exported](#webhook-endpoints-data-full_table_exported) indicates such an export. These full CSV files should _replace_ the existing data in your mirror database.
+## How to use full and incremental exports
+The data in the full exports should _replace_ the existing data in your mirror database.
<strong>Refreshing your mirror database with the nightly full export is essential to ensuring an accurate copy of the data.</strong>

-Additionally, once a minute, we'll produce CSV files with any new rows that have been _added_ to ControlShift's internal tables. The [data.incremental_table_exported](#webhook-endpoints-data-incremental_table_exported) webhooks indicates a set of these added-rows exports.
+If you're using the incremental exports, the data in them should be _added_ to your mirror database.
Remember, the incremental exports do _not_ include any updates or deletions of existing rows; you'll have to wait for the nightly export to receive fresh data with updates and deletions included.
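
As a rough illustration of the difference, here is a minimal sketch of how a mirror might apply a full export versus an incremental export, using Python's built-in `sqlite3` and `csv` modules. The file paths and table names are hypothetical, and a real pipeline would target your actual warehouse instead:

```python
import csv
import sqlite3

conn = sqlite3.connect("controlshift_mirror.db")  # hypothetical local mirror database

def load_rows(path):
    # Read an exported CSV into a list of rows; the first row is the header.
    with open(path, newline="") as f:
        return list(csv.reader(f))

def apply_full_export(table, path):
    # Full export: replace the mirror table's contents entirely.
    # Assumes the mirror table's columns match the CSV columns.
    header, *rows = load_rows(path)
    placeholders = ",".join("?" for _ in header)
    with conn:
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

def apply_incremental_export(table, path):
    # Incremental export: append newly added rows only; updates and deletions
    # arrive with the next nightly full export.
    header, *rows = load_rows(path)
    placeholders = ",".join("?" for _ in header)
    with conn:
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
```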

+## Webhooks
+
+When a new bulk data export is ready, you'll receive a webhook to each of your webhook endpoints.
+
+- The webhook for full exports is [`data.full_table_exported`](#webhook-endpoints-data-full_table_exported).
+- The webhook for incremental exports is [`data.incremental_table_exported`](#webhook-endpoints-data-incremental_table_exported).
+
<aside class="notice">
Bulk data webhooks should be automatically included when adding a new webhook endpoint. Please contact support to report any issues with bulk data webhook generation. For testing, you can manually trigger these webhooks by visiting <code>https://&lt;your controlshift instance&gt;/org/settings/integrations/webhook_endpoints</code> and clicking on "Trigger" under "Test Nightly Bulk Data Export Webhook".
</aside>

-## Bulk Data Data Schemas
+Your system should listen for those webhooks to know when and where to get the exported data.
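
For example, a minimal webhook receiver could look like the sketch below (Python with Flask). The endpoint path and the payload field names shown here are illustrative assumptions, so check the webhook documentation for the exact payload structure:

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical path; register the full URL as a webhook endpoint in ControlShift.
@app.route("/controlshift/webhooks", methods=["POST"])
def handle_webhook():
    event = request.get_json(force=True)
    event_type = event.get("type")  # assumed field name for the event type

    if event_type == "data.full_table_exported":
        # Hand off to your own job that downloads the CSV and replaces the mirror table.
        print("full export ready:", event)
    elif event_type == "data.incremental_table_exported":
        # Hand off to your own job that downloads the CSV and appends the new rows.
        print("incremental export ready:", event)

    return "", 200

if __name__ == "__main__":
    app.run()
```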

-The bulk data webhooks include exports of the following tables:
+## How to get the exported data

-<% data.export_tables['tables'].each do |tbl_info| %>
-* <%= tbl_info['table']['name'] %>
-<% end %>
+It's possible to consume the Bulk Data API in its underlying format as CSV files in an S3 bucket, or as a higher-level
+HTTPS Webhook API that is not specific to AWS or S3. Many data warehouse integration technologies like [BigQuery S3 Transfers](https://cloud.google.com/bigquery/docs/s3-transfer),
+[Airbyte](https://docs.airbyte.com/integrations/sources/s3/) or [AWS Glue](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html)
+can natively process files in S3 buckets. However, if you are using a different technology or want to implement a custom integration,
+you can use our webhook events to get the same data in a cloud-platform-agnostic way.

-For full information on the schema of each table, use the `/api/bulk_data/schema.json` API endpoint.
+We provide a [ControlShift to Redshift Pipeline](#bulk-data-controlshift-to-redshift-pipeline) as sample code that demonstrates how to use the high-level webhooks to mirror your ControlShift data into Redshift.
+Similar strategies can be used to mirror your data into other data warehouses. We've designed the underlying APIs to work flexibly regardless of
+your technical architecture. Since we expose the file events as standard HTTPS webhooks, they should be compatible with any programming language.

## Bulk Data Files

-Each table exposed by the bulk data API is made available as a CSV file with the URL to download each file sent via webhook.
+Each table exposed by the bulk data API is made available as a CSV file, with the URL to download each file sent via webhook.

We expire access to data 6 hours after it has been generated. This means that if you are building an automated system
to ingest data from this API, it must process webhook notifications within 6 hours.
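
As an illustration, a small helper that fetches and parses one exported CSV might look like this sketch (Python standard library only). It assumes the webhook payload carries a download URL for the file, which you should confirm against the webhook payload documentation:

```python
import csv
import io
import urllib.request

def fetch_export(url):
    # Download an exported CSV from the URL delivered in the webhook payload.
    # Access to the file expires 6 hours after the export is generated, so this
    # should run promptly after the webhook notification is received.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```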
@@ -75,6 +80,17 @@ Finally, when the compression for data exports is enabled the filename includes
When the **Compress bulk data exports** option is enabled (available at the Webhooks integration page), incremental and nightly bulk data export files will be compressed in [`bzip2` format](https://sourceware.org/bzip2/). This will improve performance when fetching the files from S3, since they will be considerably smaller.
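
If compression is enabled, downloaded files need to be decompressed before parsing. A brief sketch using Python's standard `bz2` module (the file path is hypothetical):

```python
import bz2
import csv

def read_compressed_export(path):
    # Decompress a bzip2-compressed export and parse it as CSV rows.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))
```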

+## Bulk Data Data Schemas
+
+The bulk data webhooks include exports of the following tables:
+
+<% data.export_tables['tables'].each do |tbl_info| %>
+* <%= tbl_info['table']['name'] %>
+<% end %>
+
+For full information on the schema of each table, use the `/api/bulk_data/schema.json` API endpoint.
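
For example, the schema could be retrieved with a sketch like the one below (Python standard library). The hostname is a placeholder for your own ControlShift instance, and any authentication the endpoint may require is omitted here:

```python
import json
import urllib.request

# Placeholder hostname; substitute your own ControlShift instance's domain.
SCHEMA_URL = "https://your-instance.example.org/api/bulk_data/schema.json"

def fetch_schema():
    # Retrieve the bulk data schema describing the columns of each exported table.
    with urllib.request.urlopen(SCHEMA_URL) as response:
        return json.load(response)
```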

### Interpreting the share_clicks table

The `share_clicks` table is designed to help you understand in detail how social media sharing influences member actions.
