rrivirr/rriv-cloud


RRIV Cloud

Kubernetes infrastructure for rriv's IoT cloud

Authors:

Prerequisites

  • Proton Pass - for storing credentials to various accounts
  • DigitalOcean account access - this is where our cloud lives
  • Tailscale VPN access - this is the VPN that gatekeeps some of our infrastructure
  • kubectl - k8s CLI tool
  • Helm and Helmfile - Kubernetes templating
  • Helm Diff Plugin installed: helm plugin install https://github.com/databus23/helm-diff
  • doctl - CLI tool for DigitalOcean
  • Terraform - IaC for our multi-cloud environment

Background

RRIV cloud is set up to use DigitalOcean (DO) for its Chirpstack servers. Its main database is a managed Postgres database, also in DO. Each environment has two k8s clusters: one for the project, and one for the Vault secrets.

We use Helm for configuring the k8s clusters. Additionally, the DO infrastructure is captured in Terraform. The Terraform statefiles live in DO.

AWS is used sparingly. Its main purpose is to act as the guardian of our Vault root token. For this, we have some very basic infrastructure captured in Terraform as well, with its statefiles again stored in DigitalOcean.

This diagram attempts to show where each piece of infrastructure lives, and where various resources retrieve data/access from.

(Diagram: rriv network)

How-Tos

There are a handful of resources that have been set up manually. If you are setting up your own project, you will need to create these yourself in your own cloud provider accounts:

  • An infrastructure management AWS account (ours is named rriv)
  • AWS S3 bucket in the rriv account for the initial AWS Terraform statefile (see ./00-tf-bootstrap/account-setup/backend.hcl.example)
  • 4 DO Teams linked to the same organization: rriv-management, rriv-dev, rriv-staging, rriv-prod
  • DO S3 bucket in each of the dev, staging, and prod team accounts - for storing the Terraform statefiles - see the backend.hcl.example files in each environment in ./02-tf/environments
  • DO S3 bucket access key with read/write/delete on the above S3 buckets (limited scope, and unable to be created with Terraform)

Deployment order

This section is under construction :)

  1. 00-tf-bootstrap: initial terraform that is applied from within this directory to set up AWS resources that then administer the DO resources
  2. TODO: resources in 02-tf need to be moved into the bootstrap directory
  3. 01-helm: Vault and rriv k8s clusters setup
  4. Unseal the Vault clusters in the various environments
  5. 02-tf: Final terraform to create the remaining resources, including DNS and Vault secrets. You must begin with environments/management/, then do dev, etc.

First-time setup

In order to get set up with k8s, you must first be able to run the Terraform. Terraform maintains the k8s cluster itself and everything around it; Helm maintains everything inside the cluster.

  1. Open the .zshrc.example file and follow the instructions.
  2. See the section below on "Applying Terraform changes". The secret tfvars and backend.hcl must be set up.
  3. See "Authenticating into DigitalOcean" below. Before you can run kubectl commands, run a terraform refresh from each environment directory (./02-tf/environments/...) to generate the kubeconfig files.

Applying Terraform changes

You will need to set up the following files (copy and alter the .example files):

secrets.auto.tfvars

  • DO token: This is the token that you use to authenticate to DO and apply terraform. See the section "Authenticating into DigitalOcean" on how to create a new PAT. Make sure it has permissions to modify clusters and droplets - you may need to create a new one if it doesn't.
  • Tailscale auth key: Auth keys are generated by an admin here: https://login.tailscale.com/admin/settings/keys. You can ask an admin to generate a key for you if you are unable to. Save this in your Proton Pass personal vault.
  • ssh fingerprint: Create an ssh key locally: ssh-keygen -t ed25519 -C "your_email@rriv.org". Go to the Security section of the Settings page of the rriv account in DO, and add your key. Copy the fingerprint.
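
Putting those three values together, a secrets.auto.tfvars might look roughly like this (the variable names are assumptions, so match them against the names declared in the environment's variables.tf):

```hcl
# Hypothetical variable names -- check variables.tf in your environment directory.
do_token           = "dop_v1_..."     # DO Personal Access Token
tailscale_auth_key = "tskey-auth-..." # generated in the Tailscale admin console
ssh_fingerprint    = "aa:bb:cc:..."   # fingerprint of the key you added in DO
```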

backend.hcl

  • DO bucket & key: Get these from a project admin.
  • DO access key & secret key: Create an access key in the Access Keys section of the Spaces Object Storage in DO. Save this in your Proton Pass personal vault.
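
A backend.hcl sketch for the DO Spaces backend, assuming the repo uses Terraform's S3-compatible backend with partial configuration (all values are placeholders; the real bucket and key come from a project admin):

```hcl
# Spaces is S3-compatible, so this feeds Terraform's s3 backend via
# `terraform init -backend-config=backend-$ENV.hcl`. Values are placeholders.
bucket     = "<statefile-bucket>"
key        = "<path/to/terraform.tfstate>"
endpoints  = { s3 = "https://<region>.digitaloceanspaces.com" } # Terraform >= 1.6; older versions use `endpoint = "..."`
access_key = "<spaces-access-key>"
secret_key = "<spaces-secret-key>"
region     = "us-east-1" # ignored by Spaces but required by the s3 backend

# Typical flags when pointing the s3 backend at a non-AWS provider:
skip_credentials_validation = true
skip_requesting_account_id  = true
skip_metadata_api_check     = true
```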

To apply Terraform changes, cd into the environment directory (./02-tf/environments/...) and run your terraform command from there. It is important that you begin with the management environment, in order to populate DNS from the load balancers that Helm already created. When you run terraform init, you must pass in -backend-config=backend-$ENV.hcl.
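
The per-environment flow can be sketched as a loop. The commands are echoed rather than executed here; directory names follow ./02-tf/environments, and management comes first per the note above:

```shell
# Sketch only: prints the command sequence for each environment in the
# required order (management first, so DNS can be populated from the
# load balancers that Helm already created).
ORDERED_ENVS="management dev staging prod"
for ENV in $ORDERED_ENVS; do
  echo "cd 02-tf/environments/$ENV && terraform init -backend-config=backend-$ENV.hcl && terraform apply"
done
```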

Debugging "Unauthorized Resource" Terraform Errors

If you get this error on the kubernetes terraform resources, you may need to re-apply the k8s-cluster module in order to regenerate the kubeconfig: tf apply -target=module.$ENV_do_sfo2_k8s_rriv_cluster or tf apply -target=module.$ENV_do_sfo2_k8s_vault_cluster

Authenticating into DigitalOcean

The first time you authenticate, you will need to set up DigitalOcean's CLI, doctl. See DO's documentation for using the package manager of your choice.

Log in to DO on the web. Go to the API tokens page and create a new Personal Access Token. Give it an expiration of 90 days (for security purposes, we don't want to create forever tokens). Give it full access, as you will need to be able to touch most different kinds of resources in order to develop in DO. Save the PAT in your personal Proton Pass vault.

With the PAT copied, run doctl auth init --context rriv-dev. Paste in the token. You will need to come back and do this step with each team (account) you plan to develop in.

Now run: doctl kubernetes cluster list. You should see the rriv-dev and vault-dev clusters.

You should now be able to run kubectl get no and see the DO droplets.

Applying k8s changes

Changes to the k8s yaml configurations must be applied using the helmfile command. helmfile must be run from the helm directory.

Show all new changes without applying them: helmfile -e dev diff

Apply all new changes from all k8s yamls: helmfile -e dev apply

To target a specific module using release name in the helmfile gotmpl: helmfile -e dev apply --selector name=rriv-chirpstack

Chirpstack k8s

The k8s files were generated from this chirpstack-docker project.

The Chirpstack Gateway Bridge receives packets from LoRaWAN devices in the field. Currently we are hardcoded to a single channel plan, us915_1. In order to use multiple channel plans, we need to run separate gateway bridges with their own ClusterIp services, and route to them behind an nginx loadbalancer.

Credentials

Admin credentials are stored in ProtonPass. Have an admin create an account for you.

Vault

Hashicorp Vault is installed in a separate k8s cluster. To switch over:

kubectl config get-contexts
kubectl config use-context <vault-context-name>

This should update the current cluster to be the one containing Vault.

The Vault cluster is still maintained under the same root helmfile, as a separate release. The environments directory takes care of pointing to the correct cluster when updating Vault.

Initial setup

You may need to manually install the dependencies with the CRDs.

After this you may run helmfile -e $ENV apply.

Vault is set to auto-unseal via AWS KMS keys. This means that as Vault pods go down, they come back up without needing to be manually unsealed. However, the first time Vault is installed, a manual initialization is required. To do this:

$ kubectl exec -it rriv-vault-0 -n vault -- /bin/sh
/ $ export VAULT_ADDR="https://127.0.0.1:8200"
/ $ export VAULT_SKIP_VERIFY=true # skip TLS verification
/ $ vault status # check your connection
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       n/a
Initialized              false
Sealed                   true
...

/ $ vault operator init
Recovery Key 1: ...
Recovery Key 2: ...
Recovery Key 3: ...
Recovery Key 4: ...
Recovery Key 5: ...

Initial Root Token: ...

Success! Vault is initialized
...

(If any of this errors, try adding the flag: -tls-skip-verify)

After this, it is critical that you write down the recovery keys and root token. SAVE these in Proton Pass. Keep half of the recovery keys and distribute the other half to another admin.

You must now exec into the rriv-vault-0 pod and unseal Vault. You only need to do this once.

Now, if the vault pod goes down, it should be able to come up and unseal itself. Test this by deleting the pod and checking its logs.

The Terraform Vault module needs a bit of manual work to set up. Once you have Vault installed, you will need to create a temporary token in order to set up the Vault auth method for k8s. Refer to vault-kubernetes.sh.

How it works

The Vault agent injects secrets into pods that are specified via annotations. For an example of this, look at the Chirpstack deployment file. Secrets are injected into /vault/secrets/credentials, then used by the container. This centralizes secrets across cloud resources and keeps them out of plaintext k8s manifests.

The secrets needed by Chirpstack are:

POSTGRES_CONN_STRING        # Chirpstack's database connection string
PG_INTEGRATION_CONN_STRING  # The TimescaleDB database that Chirpstack pipes data into
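
The annotation shape looks roughly like this; the role name and secret path here are placeholders, and the Chirpstack deployment file has the real values:

```yaml
# Sketch of Vault agent injector annotations on a pod template.
# The "chirpstack" role and secret path are assumptions, not the real values.
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "chirpstack"
        vault.hashicorp.com/agent-inject-secret-credentials: "secret/data/chirpstack"
        vault.hashicorp.com/agent-inject-template-credentials: |
          {{- with secret "secret/data/chirpstack" -}}
          export POSTGRES_CONN_STRING="{{ .Data.data.POSTGRES_CONN_STRING }}"
          export PG_INTEGRATION_CONN_STRING="{{ .Data.data.PG_INTEGRATION_CONN_STRING }}"
          {{- end -}}
```

The agent renders the template to /vault/secrets/credentials, which the container can then source at startup.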

Command line

Install the CLI tool via the package manager of your choice: https://developer.hashicorp.com/vault/install

With your kubectl context set to the Vault cluster (see above), forward traffic to localhost:

kubectl port-forward svc/vault -n vault 8200:8200

In another terminal:

export VAULT_ADDR=http://localhost:8200
vault secrets list

Metrics Server

Installed with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Queue

The message queue uses NATS.

TBD: Might need to create the PVC manually with the pvc.yml file.

Modify the consumers in jetstream.yaml as desired. For more info: https://github.com/nats-io/k8s/blob/main/helm/charts/nack/README.md.
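
With nack, each consumer in jetstream.yaml is a Consumer resource. A sketch using the stream and consumer names from the test commands below (field shapes per the nack chart's CRDs; treat this as a starting point, not the repo's actual config):

```yaml
# Sketch of a nack pull consumer; names match the `nats consumer next` example.
apiVersion: jetstream.nats.io/v1beta2
kind: Consumer
metadata:
  name: my-pull-consumer
spec:
  streamName: mystream
  durableName: my-pull-consumer
  ackPolicy: explicit
  filterSubject: orders.received
```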

To test the sending and receiving of messages:

# Run nats-box which includes the NATS management utilities:
kubectl exec -it deploy/nats-box -- /bin/sh -l

# Publish a couple of messages
$ nats req orders.received "order 1"
$ nats req orders.received "order 2"

# Pull first message.
$ nats consumer next mystream my-pull-consumer

# Subscribe to the message stream:
$ nats sub my-push-consumer.orders

Keycloak

Keycloak is used as an identity provider throughout the project. Its purpose is twofold:

  1. Authenticate end users who wish to use the rriv monitoring software.
  2. Act as an IDP for the VPN, which gates access to certain private backend services (such as Vault).

Once you have applied the $ENV_vault_sfo2 module, you are ready to log in to keycloak. You may need to also do a helm apply/sync again, in case the certs aren't working yet: helmfile -e $ENV sync --selector name=rriv-network

  1. To use keycloak, you must first log in with the keycloak_username/keycloak_password credentials that are stored in Vault in the <$ENV>-keycloak-creds secret. Use these to log in and manually create a new Client in the UI with service accounts activated.
  2. You can leave "OpenID Connect" selected.
  3. Give the client the name "terraform".
  4. Turn on "Client Authentication" and make sure that the following are checked: "OAuth 2.0 Device Authorization Grant", "Direct access grants", and "Service accounts roles". Root URL and Home URL should be set to "https://auth.<$ENV>.rriv.org" ("https://auth.rriv.org" for prod).
  5. Put the credentials in your local secrets file so that Terraform can use them to create a "rriv-beta" realm.
  6. Under the "Service accounts roles" tab, select "Assign role" > "Filter by realm roles" > select "admin" > Save. From here, once you have applied the keycloak module terraform, you can log in and create users.
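
The "terraform" client's credentials then go into your local secrets file so the keycloak Terraform provider can authenticate. A hypothetical sketch (the variable names are assumptions; match whatever the keycloak module declares):

```hcl
# Hypothetical variable names -- match the keycloak module's declared variables.
keycloak_client_id     = "terraform"
keycloak_client_secret = "<secret from the client's Credentials tab in the Keycloak UI>"
```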

Terraform

You must authenticate with DO and AWS before running terraform. Use doctl to authenticate to DO.

For AWS, you need an SSO user. After logging in, you will have the option to retrieve access/secret keys. Paste these into your terminal and run aws sts get-caller-identity to verify that you are logged in as a user with an admin role.

A note about the kubernetes cluster resource node_size variable

To update the node_size variable in the k8s-cluster module, you must NOT allow the cluster to be destroyed. This is a shortcoming of the current setup. Upgrade each node one at a time to the new size, and change the node_size value to reflect the changes when you are done. Vault data is persisted in k8s persistent volumes, but currently, they will be destroyed if the entire cluster is destroyed. Having the nodes come back up one at a time ensures that there is always a leader node that can pass the data along.
