16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM golang:1.18-alpine

WORKDIR /
COPY go.mod ./
COPY go.sum ./
RUN go mod download

COPY main.go ./

RUN chmod +rwx ./tmp

EXPOSE 8080

RUN go build -o go-module-server main.go

CMD [ "/go-module-server" ]
50 changes: 50 additions & 0 deletions Makefile
@@ -0,0 +1,50 @@
DOCKER_NAME = go-module
DOCKER_HOSTNAME = ghcr.io
DOCKER_NAMESPACE = aradhalevy
DOCKER_TAG ?= master

TEMP := /tmp
CHART_LOCAL_PATH ?= helm
CHART_NAME ?= go-module-chart
HELM_RELEASE ?= rel1-${DOCKER_NAME}
HELM_TAG ?= 0.0.0
HELM_VALUES ?= \
--set hello=world1


IMG := ${DOCKER_HOSTNAME}/${DOCKER_NAMESPACE}/${DOCKER_NAME}:${DOCKER_TAG}


CHART_REGISTRY_PATH := oci://${DOCKER_HOSTNAME}/${DOCKER_NAMESPACE}

# To enable OCI experimental support for Helm versions prior to v3.8.0, HELM_EXPERIMENTAL_OCI is set
export HELM_EXPERIMENTAL_OCI=1
export GODEBUG=x509ignoreCN=0

.PHONY: helm-verify
helm-verify:
	helm lint ${CHART_LOCAL_PATH}
	helm install --dry-run ${HELM_RELEASE} ${CHART_LOCAL_PATH} ${HELM_VALUES}

.PHONY: helm-uninstall
helm-uninstall:
	helm uninstall ${HELM_RELEASE} || true

.PHONY: helm-install
helm-install: helm
	helm install ${HELM_RELEASE} ${CHART_LOCAL_PATH} ${HELM_VALUES}

.PHONY: helm-chart-push
helm-chart-push:
	helm package ${CHART_LOCAL_PATH} --version=${HELM_TAG} --destination=${TEMP}
	helm push ${TEMP}/${CHART_NAME}-${HELM_TAG}.tgz ${CHART_REGISTRY_PATH}
	rm -rf ${TEMP}/${CHART_NAME}-${HELM_TAG}.tgz

.PHONY: docker-build
docker-build:
	docker build . -t ${IMG}

.PHONY: docker-push
docker-push: docker-build
	docker push ${IMG}

194 changes: 193 additions & 1 deletion README.md
@@ -1 +1,193 @@
# Go-module

This Go-module is intended as a template for Fybrik modules written in [Go](https://go.dev/).

The module provides read access to data assets through a generic server implemented with the [Gin Web Framework](https://pkg.go.dev/github.com/gin-gonic/gin) for Go.

## How a Fybrik application can access a dataset using the Go-module
To see the Go-module in action, take the following steps:
1. Install Fybrik
2. Register the Go-module to Fybrik
3. Prepare a data asset and register it in a data catalog
4. Deploy a Fybrik application
5. Access the data asset using the Go-module

### Install Fybrik
Install Fybrik v1.3 by following the [Quick Start](https://fybrik.io/v1.3/get-started/quickstart/), skipping the `Install modules` section. Make sure to install Fybrik with Katalog as the data catalog.

### Register the Go-module to Fybrik

To register the Go-module as a Fybrik module, apply `module.yaml` to the `fybrik-system` namespace of your cluster:

```bash
kubectl apply -f https://raw.githubusercontent.com/aradhalevy/go-module/go-module-setup/module.yaml -n fybrik-system
```

### Prepare a data asset and register it in a data catalog

First, create a new Kubernetes namespace and set it as the active namespace:

```bash
kubectl create namespace fybrik-notebook-sample
kubectl config set-context --current --namespace=fybrik-notebook-sample
```

This example uses a sample of 100 lines of the [Synthetic Financial Datasets For Fraud Detection](https://www.kaggle.com/ealaxi/paysim1) dataset. Download [`PS_20174392719_1491204439457_log.csv`](https://raw.githubusercontent.com/fybrik/fybrik/master/samples/notebook/PS_20174392719_1491204439457_log.csv) from GitHub.

Upload the CSV file to an object storage service of your choice, such as AWS S3. For experimentation, you can install localstack in your cluster instead of using a cloud service:

1. Define variables for the access key and secret key:
```bash
export ACCESS_KEY="myaccesskey"
export SECRET_KEY="mysecretkey"
```
2. Install localstack in the currently active namespace and wait for it to be ready:
```bash
helm repo add localstack-charts https://localstack.github.io/helm-charts
helm install localstack localstack-charts/localstack \
--version 0.4.3 \
--set image.tag="1.2.0" \
--set startServices="s3" \
--set service.type=ClusterIP \
--set livenessProbe.initialDelaySeconds=25
kubectl wait --for=condition=ready --all pod -n fybrik-notebook-sample --timeout=120s
```

3. Create a port-forward to communicate with the localstack server:
```bash
kubectl port-forward svc/localstack 4566:4566 &
```
4. Use the [AWS CLI](https://aws.amazon.com/cli/) to upload the dataset to a newly created bucket in the localstack server (replace `/path/to/` in `FILEPATH` with the directory you downloaded the dataset to):
```bash
export ENDPOINT="http://127.0.0.1:4566"
export BUCKET="demo"
export OBJECT_KEY="PS_20174392719_1491204439457_log.csv"
export FILEPATH="/path/to/PS_20174392719_1491204439457_log.csv"
export REGION=theshire
aws configure set aws_access_key_id ${ACCESS_KEY} && aws configure set aws_secret_access_key ${SECRET_KEY}
aws configure set region ${REGION}
aws --endpoint-url=${ENDPOINT} s3api create-bucket --bucket ${BUCKET} --region ${REGION} --create-bucket-configuration LocationConstraint=${REGION}
aws --endpoint-url=${ENDPOINT} s3api put-object --bucket ${BUCKET} --key ${OBJECT_KEY} --body ${FILEPATH}
```

In this step you play the role of the data owner: you register the data in the data catalog and register the credentials for accessing it in the credential manager.

We now explain how to register a dataset in the Katalog data catalog.

Begin by registering the credentials required for accessing the dataset as a Kubernetes secret. The snippet below takes the values of `access_key` and `secret_key` from the `ACCESS_KEY` and `SECRET_KEY` variables defined earlier; if you used a different object storage service, replace them with that service's credentials, then run:

```bash
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: paysim-csv
type: Opaque
stringData:
  access_key: "${ACCESS_KEY}"
  secret_key: "${SECRET_KEY}"
EOF
```

Next, register the data asset itself in the data catalog by creating a Katalog `Asset` resource that points to the object uploaded above:

```bash
cat << EOF | kubectl apply -f -
apiVersion: katalog.fybrik.io/v1alpha1
kind: Asset
metadata:
  name: paysim-csv
spec:
  secretRef:
    name: paysim-csv
  details:
    dataFormat: csv
    connection:
      name: s3
      s3:
        endpoint: "http://localstack.fybrik-notebook-sample.svc.cluster.local:4566"
        bucket: "demo"
        object_key: "PS_20174392719_1491204439457_log.csv"
  metadata:
    name: Synthetic Financial Datasets For Fraud Detection
    geography: theshire
    tags:
      finance: true
EOF
```

### Deploy a Fybrik application

Create a `FybrikApplication` resource to register the notebook workload with the Fybrik control plane. The value of the `dataSetID` field is the asset ID, which in Katalog has the form `<namespace>/<asset name>` (here `fybrik-notebook-sample/paysim-csv`). Run the following to create the `FybrikApplication` resource for this example:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: app.fybrik.io/v1beta1
kind: FybrikApplication
metadata:
  name: my-notebook
  labels:
    app: my-notebook
spec:
  selector:
    workloadSelector:
      matchLabels:
        app: my-notebook
  appInfo:
    intent: Fraud Detection
  data:
    - dataSetID: "fybrik-notebook-sample/paysim-csv"
      flow: read
      requirements:
        interface:
          protocol: fybrik-go
EOF
```

Run the following command to wait until the `FybrikApplication` is ready:

```bash
while [[ $(kubectl get fybrikapplication my-notebook -o 'jsonpath={.status.ready}') != "true" ]]; do echo "waiting for FybrikApplication" && sleep 5; done
```

### Access the data asset using the Go-module

In your terminal, run the following commands to print the endpoint for reading the data. They read the endpoint's scheme, hostname, and port from the status of the `FybrikApplication` resource:

```bash
export ENDPOINT_SCHEME=$(kubectl get fybrikapplication my-notebook -o "jsonpath={.status.assetStates.fybrik-notebook-sample/paysim-csv.endpoint.fybrik-go.scheme}")
export ENDPOINT_HOSTNAME=$(kubectl get fybrikapplication my-notebook -o "jsonpath={.status.assetStates.fybrik-notebook-sample/paysim-csv.endpoint.fybrik-go.hostname}")
export ENDPOINT_PORT=$(kubectl get fybrikapplication my-notebook -o "jsonpath={.status.assetStates.fybrik-notebook-sample/paysim-csv.endpoint.fybrik-go.port}")
export ASSET_NAME="fybrik-notebook-sample%%2Fpaysim-csv"
printf "\n${ENDPOINT_SCHEME}://${ENDPOINT_HOSTNAME}:${ENDPOINT_PORT}/${ASSET_NAME}\n\n"
```
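The `%%2F` in `ASSET_NAME` is there because `printf` treats `%` specially: `%%` prints a literal `%`, so the asset ID `fybrik-notebook-sample/paysim-csv` ends up as the single URL path segment `fybrik-notebook-sample%2Fpaysim-csv`. A minimal Go sketch of the same URL construction follows; `endpointURL` is a hypothetical helper and the host in `main` is a placeholder, not a value Fybrik reports.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// endpointURL assembles the read URL from the scheme, hostname and port found
// in the FybrikApplication status. url.PathEscape turns the "/" inside the
// asset ID into "%2F", keeping the whole ID in one path segment.
func endpointURL(scheme, host, port, assetID string) string {
	return fmt.Sprintf("%s://%s/%s", scheme, net.JoinHostPort(host, port), url.PathEscape(assetID))
}

func main() {
	// Placeholder host and port; substitute the values printed by kubectl.
	fmt.Println(endpointURL("http", "go-module.example", "8080", "fybrik-notebook-sample/paysim-csv"))
}
```

`net.JoinHostPort` is used instead of plain concatenation so that an IPv6 hostname would be bracketed correctly.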

The next steps use this endpoint to read the data from within a Kubernetes pod.

First, start a basic pod:

```bash
kubectl run mypod --image=docker.io/library/alpine:3.18 -i --tty -- sh
```

Now, in the shell of the pod that opens, run the following, replacing `<ENDPOINT>` with the endpoint you printed above:

```bash
apk --no-cache add curl util-linux && curl -L -o /tmp/data.csv "<ENDPOINT>" && column -s, -t < /tmp/data.csv
```

You should now see the dataset printed as a table.

### Cleanup

When you're finished experimenting with the sample, clean up as follows:

1. Stop the `kubectl port-forward` processes (e.g., using `pkill kubectl`).
2. Delete the namespace created for this sample:

```bash
kubectl delete namespace fybrik-notebook-sample
```
14 changes: 14 additions & 0 deletions conf.sample.yaml
@@ -0,0 +1,14 @@

data:
  - name: "fybrik-notebook-sample/paysim-csv"
    capability: read
    format: "csv"
    connection:
      type: s3
      s3:
        endpoint_url: http://localstack.fybrik-notebook-sample.svc.cluster.local:4566
        vault_credentials:
          address: http://vault.fybrik-system:8200
          authPath: /v1/auth/kubernetes/login
          role: module
          secretPath: /v1/kubernetes-secrets/paysim-csv?namespace=fybrik-notebook-sample
18 changes: 18 additions & 0 deletions fybrikapplication.yaml
@@ -0,0 +1,18 @@
apiVersion: app.fybrik.io/v1beta1
kind: FybrikApplication
metadata:
  name: my-notebook
  labels:
    app: my-notebook
spec:
  selector:
    workloadSelector:
      matchLabels:
        app: my-notebook
  appInfo:
    intent: Fraud Detection
  data:
    - dataSetID: "fybrik-notebook-sample/paysim-csv"
      requirements:
        interface:
          protocol: fybrik-go
48 changes: 48 additions & 0 deletions go.mod
@@ -0,0 +1,48 @@
module main

go 1.18

require (
	github.com/minio/minio-go/v7 v7.0.50
	github.com/rs/zerolog v1.29.0
	gopkg.in/yaml.v2 v2.4.0
)

require (
	github.com/bytedance/sonic v1.8.0 // indirect
	github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 // indirect
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/gin-contrib/sse v0.1.0 // indirect
	github.com/gin-gonic/gin v1.9.0 // indirect
	github.com/go-playground/locales v0.14.1 // indirect
	github.com/go-playground/universal-translator v0.18.1 // indirect
	github.com/go-playground/validator/v10 v10.11.2 // indirect
	github.com/goccy/go-json v0.10.0 // indirect
	github.com/google/uuid v1.3.0 // indirect
	github.com/json-iterator/go v1.1.12 // indirect
	github.com/klauspost/compress v1.16.0 // indirect
	github.com/klauspost/cpuid/v2 v2.2.4 // indirect
	github.com/kr/pretty v0.3.0 // indirect
	github.com/leodido/go-urn v1.2.1 // indirect
	github.com/mattn/go-colorable v0.1.12 // indirect
	github.com/mattn/go-isatty v0.0.17 // indirect
	github.com/minio/md5-simd v1.1.2 // indirect
	github.com/minio/sha256-simd v1.0.0 // indirect
	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
	github.com/modern-go/reflect2 v1.0.2 // indirect
	github.com/pelletier/go-toml/v2 v2.0.6 // indirect
	github.com/rogpeppe/go-internal v1.8.0 // indirect
	github.com/rs/xid v1.4.0 // indirect
	github.com/sirupsen/logrus v1.9.0 // indirect
	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
	github.com/ugorji/go/codec v1.2.9 // indirect
	golang.org/x/arch v0.0.0-20210923205945-b76863e36670 // indirect
	golang.org/x/crypto v0.6.0 // indirect
	golang.org/x/net v0.7.0 // indirect
	golang.org/x/sys v0.5.0 // indirect
	golang.org/x/text v0.7.0 // indirect
	google.golang.org/protobuf v1.28.1 // indirect
	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
	gopkg.in/ini.v1 v1.67.0 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)