This example shows how to fetch data from a database and write it to a CSV file. Let's say you have a mysql database with customer table and an order table. You want to generate a single file for customer's orders by joining data from both the tables.
Table 1: ORDERS
| order_id | customer_id | order_date | total_amount |
|---|---|---|---|
| 101 | 12 | 12-09-2023 | 234 |
| 102 | 13 | 12-09-2023 | 198 |
| 103 | 12 | 13-09-2023 | 789 |
| 104 | 15 | 13-09-2023 | 236 |
| 105 | 10 | 15-09-2023 | 100 |
Table 2: CUSTOMERS
| customer_id | customer_name | customer_email | customer_address |
|---|---|---|---|
| 10 | Andrew Symonds | andrew.symonds@ingen.com | 1C 104, Viviana Villas, Riverdale |
| 11 | Lilly Charms | lilly.charms@ingen.com | 12B, Westwoods, Hillcrest |
| 12 | Pat Ronald | pat.ronald@ingen.com | C-2, Green woods, Pinehurst |
| 13 | Mathew Josh | matt.josh@ingen.com | 112, Amar Heights, Oakwood |
| 14 | Elizabeth | elli@ingen.com | 23, South Park, Amsteryork |
| 15 | Rahul Bajaj | rahul.bajaj@ingen.com | 14, Indiana Heights, New Town |
In the output file we want to print customer's order details along with their name and address. So the output file will look like this:
| order_id | customer_id | order_date | total_amount | customer_name | customer_address |
|---|---|---|---|---|---|
| 101 | 12 | 12-09-2023 | 234 | Pat Ronald | C-2, Green woods, Pinehurst |
| 102 | 13 | 12-09-2023 | 198 | Mathew Josh | 112, Amar Heights, Oakwood |
| 103 | 12 | 13-09-2023 | 789 | Pat Ronald | C-2, Green woods, Pinehurst |
| 104 | 15 | 13-09-2023 | 236 | Rahul Bajaj | 14, Indiana Heights, New Town |
| 105 | 10 | 15-09-2023 | 100 | Andrew Symonds | 1C 104, Viviana Villas, Riverdale |
The configuration file defines the database connection properties.
datasource.mysql.host=<host_ip_or_url>
datasource.mysql.user=<username>
datasource.mysql.password=<password>
For a mysql database, we declare mysql as type in sources and its properties like database and query to be executed. The
following code snippet lists our example sources:
sources:
- id: SAMPLE
type: mysql
database: SAMPLE_DATABASE
query: 'select
o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'
The output generated by running InterfaceGenerator is called an interface, in our example it is a CSV file created using joining two data sets from single database we defined above. This requires defining the sources, columns required in the output, and output properties. The following code snippet lists our example interface:
interfaces:
sample:
sources: [ 'SAMPLE' ]
columns:
- src_col_name: 'order_id'
- src_col_name: 'customer_id'
- src_col_name: 'order_date'
- src_col_name: 'total_amount'
- src_col_name: 'customer_name'
- src_col_name: 'customer_address'
output:
type: delimited_file
props:
delimiter: '|'
path: ./test/customer_orders.csv
header:
type: delimited_result_header
In the above example, output properties contains type and props.
The output will be written to a file called customer_orders.psv which is a pipe separated file. The file will also
contain a header row, which will be just the column names in our example.
Here's the complete yaml file with interface and source definition:
interfaces:
sample:
sources: [ 'SAMPLE' ]
columns:
- src_col_name: 'order_id'
- src_col_name: 'customer_id'
- src_col_name: 'order_date'
- src_col_name: 'total_amount'
- src_col_name: 'customer_name'
- src_col_name: 'customer_address'
output:
type: delimited_file
props:
delimiter: '|'
path: ./test/customer_orders.csv
header:
type: delimited_result_header
sources:
- id: SAMPLE
type: mysql
database: SAMPLE_DATABASE
query: 'select
o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.customer_address
from ORDERS o inner join CUSTOMERS c on o.customer_id = c.customer_id'