8 changes: 0 additions & 8 deletions fybrik/README.md
@@ -34,14 +34,6 @@ You will need a copy of the Fybrik repository (`git clone https://github.com/fyb
kubectl config set-context --current --namespace=fybrik-airbyte-sample
```

1. Create a policy to allow access to any asset (we use a policy that does not restrict access nor mandate any transformations):
```bash
kubectl -n fybrik-system create configmap sample-policy --from-file=$AIRBYTE_MODULE_DIR/fybrik/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done
```


1. Create an asset (the `userdata` asset) and an application that requires this asset:
```bash
kubectl apply -f $AIRBYTE_MODULE_DIR/fybrik/read-flow/asset.yaml -n fybrik-airbyte-sample
2 changes: 1 addition & 1 deletion fybrik/README_Chaining.md
@@ -6,7 +6,7 @@ Therefore, to satisfy the constraints, Fybrik must deploy two modules: the airby
dataset, and the [arrow-flight-module](https://github.com/fybrik/arrow-flight-module) for transforming the
dataset based on the governance policies.

The current use case differs from the `Unrestricted Read` use case outlined [here](README.md) in that governance policies mandate transformation of sensitive data. See our use case's [policy](sample-policy-restrictive.rego) vs. the `Unrestricted Read` [policy](sample-policy.rego).
The current use case differs from the `Unrestricted Read` use case outlined [here](README.md) in that governance policies mandate transformation of sensitive data. See our use case's [policy](sample-policy-restrictive.rego).

We demonstrate how, using an Airbyte FybrikModule, a workload can access data stored in google-sheets, postgres, and other data stores supported by Airbyte connectors. To do so a FybrikApplication (i.e. the request) must be submitted indicating the desired data set(s). In this example, we use the `userdata` dataset, a Parquet file found in https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata2.parquet.

63 changes: 51 additions & 12 deletions fybrik/README_mysql.md
@@ -25,13 +25,6 @@ You will need a copy of the Fybrik repository (`git clone https://github.com/fyb
kubectl config set-context --current --namespace=fybrik-airbyte-sample
```

1. Create a policy to allow access to any asset (we use a policy that does not restrict access nor mandate any transformations):
```bash
kubectl -n fybrik-system create configmap sample-policy --from-file=$AIRBYTE_MODULE_DIR/fybrik/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done
```

1. Setup and initialize mysql for reading a dataset

1. Deploy [mysql](https://bitnami.com/stack/mysql/helm) helm chart in `fybrik-airbyte-sample` namespace.
@@ -111,9 +104,9 @@ In this example, a small dataset is written to mysql table. To do so a FybrikApp

As above, you will need a copy of the Fybrik repository (`git clone https://github.com/fybrik/fybrik.git`). Set the following environment variables: FYBRIK_DIR for the path of the `fybrik` directory, and AIRBYTE_MODULE_DIR for the path of the `airbyte-module` directory.

Repeat steps 1-5 above.
Repeat steps 1-4 above.

6. Setup and initialize mysql for writing a dataset
5. Setup and initialize mysql for writing a dataset

1. Deploy [mysql](https://bitnami.com/stack/mysql/helm) helm chart in `fybrik-airbyte-sample` namespace:
```bash
@@ -151,7 +144,7 @@ Repeat steps 1-5 above.
EOF
```

1. Create an asset (the `userdata` asset), the policy to access it (we use a policy that does not restrict access nor mandate any transformations), and an application that requires this asset:
1. Create an asset (the `userdata` asset) and an application that requires this asset:
```bash
kubectl apply -f $AIRBYTE_MODULE_DIR/fybrik/write-flow/asset-mysql.yaml -n fybrik-airbyte-sample
kubectl apply -f $AIRBYTE_MODULE_DIR/fybrik/write-flow/application.yaml -n fybrik-airbyte-sample
@@ -163,7 +156,7 @@ Repeat steps 1-5 above.
```
> _NOTE:_ See the note in step 9 above.

1. Run the following commands to exceute a write command:
1. Run the following commands to execute a write command:
```bash
export CATALOGED_ASSET=fybrik-airbyte-sample/userdata
export ENDPOINT_HOSTNAME=$(kubectl get fybrikapplication my-app-write -n fybrik-airbyte-sample -o "jsonpath={.status.assetStates.${CATALOGED_ASSET}.endpoint.fybrik-arrow-flight.hostname}")
@@ -186,6 +179,52 @@ Repeat steps 1-5 above.
select * from demo;
```

# Writing and Registering a New Dataset with Fybrik Application

In this example, a small dataset is written to a mysql table and registered in the catalog.

As above, you will need a copy of the Fybrik repository (`git clone https://github.com/fybrik/fybrik.git`). Set the following environment variables: FYBRIK_DIR for the path of the `fybrik` directory, and AIRBYTE_MODULE_DIR for the path of the `airbyte-module` directory.

Repeat steps 1-4 above.

5. Register the credentials required for writing the dataset as a Kubernetes secret. Set MYSQL_ROOT_PASSWORD to the mysql service password, as described in the section above:

```bash
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: creds-mysql
  namespace: fybrik-system
type: Opaque
stringData:
  username: root
  password: "${MYSQL_ROOT_PASSWORD}"
EOF
```

1. Register a storage account that specifies the mysql server details and the credentials, then apply an application that writes this asset:
```bash
kubectl apply -f $AIRBYTE_MODULE_DIR/fybrik/write-flow/storage-account.yaml -n fybrik-system
kubectl apply -f $AIRBYTE_MODULE_DIR/fybrik/write-flow/write-new-asset-app.yaml -n fybrik-airbyte-sample
```

1. After the application is created, the Fybrik manager attempts to create the data path for the application. Fybrik realizes that the Airbyte module is required, and deploys it in the `fybrik-blueprints` namespace. To verify that the Airbyte module was indeed deployed, run:
```bash
kubectl get pods -n fybrik-blueprints
```
> _NOTE:_ See the note in [Reading a Dataset by a Fybrik Application](#reading-a-dataset-by-a-fybrik-application).

1. Run the following commands to execute a write command:
```bash
export ASSET_NAME=newdata
export ENDPOINT_HOSTNAME=$(kubectl get fybrikapplication my-notebook-write -n fybrik-airbyte-sample -o "jsonpath={.status.assetStates.${ASSET_NAME}.endpoint.fybrik-arrow-flight.hostname}")
export AIRBYTE_POD_NAME=$(kubectl get pods -n fybrik-blueprints | grep air |awk '{print $1}')
cd $AIRBYTE_MODULE_DIR/helm/client
./deploy_airbyte_module_client_pod.sh
kubectl exec -it my-shell -n default -- python3 /root/client.py --host ${ENDPOINT_HOSTNAME} --port 80 --asset ${ASSET_NAME} --operation put
```
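
The `kubectl get ... -o jsonpath` call in the last step prints an empty string until Fybrik has populated the endpoint in the application status, so `ENDPOINT_HOSTNAME` can end up empty if the command runs too early. The following is a minimal sketch of a retry wrapper, not part of the repository: the function name `wait_for_output` and the variable `APP_NAME` are hypothetical, and the attempt count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the airbyte-module repository.
# Re-runs a command until it prints a non-empty value or attempts run out.
# Usage: wait_for_output <max_attempts> <delay_seconds> <command> [args...]
wait_for_output() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt value
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # A failing command is treated the same as an empty result.
    value=$("$@" 2>/dev/null) || value=""
    if [[ -n "$value" ]]; then
      printf '%s\n' "$value"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  echo "gave up after ${max_attempts} attempts: $*" >&2
  return 1
}

# Example (assumes APP_NAME holds the name of the applied FybrikApplication):
# ENDPOINT_HOSTNAME=$(wait_for_output 60 5 kubectl get fybrikapplication "$APP_NAME" \
#   -n fybrik-airbyte-sample \
#   -o "jsonpath={.status.assetStates.${ASSET_NAME}.endpoint.fybrik-arrow-flight.hostname}")
```

Because the helper returns a non-zero status when it gives up, a script running under `set -e` stops before the client pod is invoked with an empty `--host` argument.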

# Cleanup

When you're finished experimenting with a sample, you may clean up as follows:
@@ -197,7 +236,7 @@ kubectl delete namespace fybrik-airbyte-sample
```

To experiment with a sample after the deletion of `fybrik-airbyte-sample` namespace,
re-create the namespace with the following commands and continue from step 6 in the chosen sample.
re-create the namespace with the following commands and continue from step 5 in the chosen sample.

```bash
kubectl create namespace fybrik-airbyte-sample
2 changes: 0 additions & 2 deletions fybrik/sample-policy.rego

This file was deleted.

12 changes: 12 additions & 0 deletions fybrik/write-flow/storage-account.yaml
@@ -0,0 +1,12 @@
apiVersion: app.fybrik.io/v1beta2
kind: FybrikStorageAccount
metadata:
  name: account-mysql
spec:
  id: theshire-mysql
  type: mysql
  secretRef: creds-mysql
  geography: theshire
  mysql:
    host: mysql.fybrik-airbyte-sample
    port: 3306
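
The `secretRef: creds-mysql` field ties this storage account to the Secret that the README creates in the `fybrik-system` namespace, so the write flow depends on both objects being present. Below is a minimal pre-flight sketch, assuming the Fybrik CRDs are installed so that `kubectl` recognizes the `fybrikstorageaccount` resource type; the function name `check_resources` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of the repository.
# Reports every listed resource that kubectl cannot find in the namespace.
# Usage: check_resources <namespace> <type/name> [<type/name>...]
check_resources() {
  local namespace=$1
  shift
  local resource missing=0
  for resource in "$@"; do
    if ! kubectl get "$resource" -n "$namespace" >/dev/null 2>&1; then
      echo "missing: ${resource} in namespace ${namespace}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, using the names from the manifests in this change:
# check_resources fybrik-system secret/creds-mysql fybrikstorageaccount/account-mysql
```

The function checks all resources before returning, so a single run lists everything that is missing rather than stopping at the first gap.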
35 changes: 35 additions & 0 deletions fybrik/write-flow/write-new-asset-app.yaml
@@ -0,0 +1,35 @@
apiVersion: app.fybrik.io/v1beta1
kind: FybrikApplication
metadata:
  name: my-notebook-write
  labels:
    app: my-notebook-write
spec:
  selector:
    clusterName: thegreendragon
    workloadSelector:
      matchLabels:
        app: my-notebook-write
  appInfo:
    intent: Fraud Detection
  data:
    - dataSetID: newdata
      flow: write
      requirements:
        flowParams:
          isNewDataSet: true
          metadata:
            tags:
              Purpose.finance: true
            columns:
              - name: DOB
                tags:
                  PII.Sensitive: true
              - name: FirstName
                tags:
                  PII.Sensitive: true
              - name: LastName
                tags:
                  PII.Sensitive: true
        interface:
          protocol: fybrik-arrow-flight
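
With `isNewDataSet: true`, Fybrik is expected to register the written dataset in the catalog. The sketch below shows one way to read the resulting catalog identifier back from the application status. It assumes the per-asset status exposes a `catalogedAsset` field, as described in the Fybrik CRD reference; the function name `cataloged_asset_id` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Prints the catalog ID recorded for a dataset in a FybrikApplication status.
# Assumption: the status carries .status.assetStates.<dataSetID>.catalogedAsset.
# Usage: cataloged_asset_id <application> <namespace> <dataSetID>
cataloged_asset_id() {
  local app=$1 namespace=$2 dataset=$3
  kubectl get fybrikapplication "$app" -n "$namespace" \
    -o "jsonpath={.status.assetStates.${dataset}.catalogedAsset}"
}

# Example, using the names from the manifest above:
# cataloged_asset_id my-notebook-write fybrik-airbyte-sample newdata
```

An empty result means the field is not populated yet, which can happen while the application is still being reconciled.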
16 changes: 0 additions & 16 deletions hack/test_module.sh
@@ -157,21 +157,6 @@ ${TOOLBIN}/kubectl apply -f $WORKING_DIR/asset.yaml -n fybrik-airbyte-sample

${TOOLBIN}/kubectl describe asset userdata -n fybrik-airbyte-sample


kubectl -n fybrik-system create configmap sample-policy --from-file=$WORKING_DIR/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done


c=0
while [[ $(${TOOLBIN}/kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]
do
echo "waiting"
((c++)) && ((c==25)) && break
sleep 1
done


# apply fybrik application for writing
${TOOLBIN}/kubectl apply -f $WORKING_DIR/write-fybrikapplication.yaml -n fybrik-airbyte-sample

@@ -230,7 +215,6 @@ ${TOOLBIN}/kubectl get cm -o yaml -n fybrik-blueprints

rm -rf ${tmp_dir}
${TOOLBIN}/kubectl delete namespace fybrik-airbyte-sample
${TOOLBIN}/kubectl -n fybrik-system delete configmap sample-policy

if [ ${RES} == 1 ]
then
5 changes: 0 additions & 5 deletions tests/test_fybrik.sh
@@ -118,10 +118,6 @@ kubectl config set-context --current --namespace=fybrik-airbyte-sample

kubectl apply -f $AIRBYTE_FYBRIK_TEST/read-flow/asset.yaml

kubectl -n fybrik-system create configmap sample-policy --from-file=$AIRBYTE_FYBRIK_TEST/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done

kubectl apply -f $AIRBYTE_FYBRIK_TEST/read-flow/application.yaml
CMD="kubectl wait --for=condition=ready --all pod -n fybrik-blueprints --timeout=300s
"
@@ -150,7 +146,6 @@ DIFF=$(diff -b $WORKING_DIR/expected.txt res.out)
# cleanup
/bin/rm res.out
kubectl delete namespace fybrik-airbyte-sample
kubectl -n fybrik-system delete configmap sample-policy
/bin/rm -Rf $FYBRIK_DIR

if [ "${DIFF}" == "" ]