2 changes: 1 addition & 1 deletion .github/workflows/doc-build.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: sphinx-doc-build
name: doc-build

on:
  push:
4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,9 @@
# CogStack-NiFi

[![nifi](https://github.com/CogStack/CogStack-NiFi/actions/workflows/docker-nifi.yml/badge.svg?branch=main)](https://github.com/CogStack/CogStack-NiFi/actions/workflows/docker-nifi.yml)
[![doc-build](https://github.com/CogStack/CogStack-NiFi/actions/workflows/doc-build.yml/badge.svg?branch=main)](https://github.com/CogStack/CogStack-NiFi/actions/workflows/doc-build.yml)
[![elasticsearch-stack](https://github.com/CogStack/CogStack-NiFi/actions/workflows/docker-elasticsearch-stack.yml/badge.svg?branch=main)](https://github.com/CogStack/CogStack-NiFi/actions/workflows/docker-elasticsearch-stack.yml)

## Introduction

This repository proposes a possible next step in the evolution of free-text data processing originally implemented in [CogStack-Pipeline](https://github.com/CogStack/CogStack-Pipeline), moving towards a more modular, Platform-as-a-Service (PaaS) approach.
2 changes: 1 addition & 1 deletion deploy/Makefile
@@ -64,7 +64,7 @@ start-samples:
$(WITH_ENV) docker compose -f services.yml $(DC_START_CMD) samples-db

start-jupyter:
$(WITH_ENV) docker compose -f ../services/cogstack-jupyter-hub/docker/docker-compose.yml $(DC_START_CMD) cogstack-jupyter-hub
$(WITH_ENV) docker compose -f ../services/cogstack-jupyter-hub/docker/docker-compose.base.yml -f ../services/cogstack-jupyter-hub/docker/docker-compose.prod.yml $(DC_START_CMD) cogstack-jupyter-hub

start-medcat-service:
$(WITH_ENV) docker compose -f ../services/cogstack-nlp/medcat-service/docker/docker-compose.yml $(DC_START_CMD) nlp-medcat-service-production
2 changes: 0 additions & 2 deletions deploy/export_env_vars.sh
@@ -8,8 +8,6 @@ echo "🔧 Running $(basename "${BASH_SOURCE[0]}")..."

set -a

current_dir=$(pwd)

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DEPLOY_DIR="$SCRIPT_DIR"
SECURITY_DIR="$SCRIPT_DIR/../security/env"
15 changes: 15 additions & 0 deletions docs/.markdownlint.yaml
@@ -0,0 +1,15 @@
# Disable rules that conflict with Sphinx/MyST conventions
default: true

MD013: false # Line length — Sphinx often has long code blocks / directives
MD033: false # Inline HTML — needed for raw HTML and directives
MD041: false # First line should be a heading — breaks for included partials
MD024: false # Multiple headings with same content — Sphinx pages can reuse titles
MD034: false # Bare URL — Sphinx cross-refs handle this
MD043: false # Required heading structure — not needed for Sphinx TOC

# Optional fine-tuning
MD007:
  indent: 2 # Consistent list indentation
MD004:
  style: dash # Consistent unordered list marker (-)
8 changes: 6 additions & 2 deletions docs/index.rst
@@ -11,12 +11,16 @@ Welcome to CogStack-Nifi's documentation!
:caption: Contents:

main.md
news.md
nifi/main.md
security/main.md
security/certificates.md
security/elasticsearch_opensearch.md
security/nifi.md
security/services.md
deploy/main.md
deploy/services.md
deploy/workflows.md
security.md
news.md


Indices and tables
220 changes: 220 additions & 0 deletions docs/security/elasticsearch_opensearch.md
@@ -0,0 +1,220 @@
# Elasticsearch / OpenSearch Security

This section describes how to secure both **Elasticsearch (native)** and **OpenSearch** clusters used in the CogStack-NiFi stack, including certificate setup, user management, and role configuration.

All related certificates are stored in `security/certificates/elastic/`, and are generated from the shared **Root CA** created via [`create_root_ca_cert.sh`](certificates.md).

---

## 🔒 Overview

Both **Elasticsearch** and **OpenSearch** deployments require:

- TLS certificates for all nodes and HTTPS endpoints,
- secure credentials for built-in and custom users,
- properly configured roles and role mappings.

Certificates and credentials are generated using the scripts provided in `security/scripts/` and are controlled through the `.env` files under `security/env/`.

---

## 📄 Environment files used

All scripts reference the following environment configuration files:

| File | Purpose |
|------|----------|
| `certificates_elasticsearch.env` | Hostnames, instance names, and certificate parameters for ES / OpenSearch nodes |
| `certificates_general.env` | Root CA configuration |
| `elasticsearch_users.env` | Internal user credentials |

Reload them before running any security-related script:

```bash
cd ../deploy
source export_env_vars.sh
cd ../security
```

---

## 🧩 Common certificate layout

Certificate naming and folder structure are consistent across both ES and OpenSearch:

```text
security/certificates/elastic/
├── elasticsearch/
│   ├── elastic-stack-ca.crt.pem
│   ├── elastic-stack-ca.key.pem
│   ├── elasticsearch/
│   │   └── elasticsearch-{1,2,3}/
│   │       ├── http-elasticsearch-*.crt
│   │       ├── http-elasticsearch-*.key
│   │       ├── http-elasticsearch-*.p12
│   │       ├── elasticsearch-*.crt
│   │       ├── elasticsearch-*.key
│   │       └── elasticsearch-*.p12
│   └── kibana/
│       ├── sample-kibana.yml
│       └── README.txt
└── opensearch/
    ├── admin.*, es_kibana_client.*, root-ca.*
    └── elasticsearch/{1,2,3}/...
```

Each version has its own generation scripts, but they all depend on the same `.env` configuration and naming patterns.

---

## 🏗️ Generating certificates

### Elasticsearch (native)

To generate certificates for Elasticsearch:

```bash
bash ./create_es_native_certs.sh
```

This script creates all required node and HTTP certificates under:

```text
security/certificates/elastic/elasticsearch/elasticsearch-{1,2,3}/
```

The script uses variables such as:

- `ES_INSTANCE_NAME_*`: node names (must match `ELASTICSEARCH_NODE_*_NAME` in `deploy/elasticsearch.env`)
- `ES_INSTANCE_ALTERNATIVE_*_NAME`: alternative hostnames
- `ES_HOSTNAMES`: list of all node hostnames
- `ES_CLIENT_SUBJ_ALT_NAMES` / `ES_NODE_SUBJ_ALT_NAMES`: additional domain aliases for the SAN fields

Make sure the environment variables are set correctly before running the script.
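
One way to catch unset variables before generating certificates is a small pre-flight check. The sketch below assumes the variables were exported by `export_env_vars.sh` (which runs under `set -a`). `require_vars` is a hypothetical helper, not part of the repository, and `ES_INSTANCE_NAME_1` is only an assumed example of the `ES_INSTANCE_NAME_*` pattern.

```bash
# Hypothetical helper: fail if any named variable is unset, empty, or not exported.
# printenv only sees exported variables, which matches how export_env_vars.sh
# loads the .env files (set -a).
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (variable names other than ES_HOSTNAMES are assumptions):
#   require_vars ES_HOSTNAMES ES_INSTANCE_NAME_1 && bash ./create_es_native_certs.sh
```

The function returns a non-zero status as soon as at least one variable is missing, so it can gate the certificate script with `&&`.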

---

### OpenSearch

For OpenSearch nodes:

```bash
bash ./create_opensearch_node_cert.sh elasticsearch-1 elasticsearch-2 elasticsearch-3
```

Then generate the admin and client certificates:

```bash
bash ./create_opensearch_client_admin_certs.sh
```

This produces:

| File | Purpose |
|------|----------|
| `admin.pem`, `admin-key.pem` | Admin dashboard certificate |
| `es_kibana_client.pem`, `es_kibana_client.key` | Client certificate for Kibana / OpenSearch Dashboards |
| `*.jks` | Node keystores/truststores for HTTPS and inter-node encryption |

The resulting certificates are placed in:

```text
security/certificates/elastic/opensearch/
```
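
After both scripts have run, a quick check that the expected admin and client files exist can save a failed container start. This is a minimal sketch: `check_opensearch_certs` is a hypothetical helper, and the file list is taken from the table above rather than from the scripts themselves.

```bash
# Hypothetical helper: verify that the admin/client certificate files listed
# in the table above exist and are non-empty in the given directory.
check_opensearch_certs() {
  local dir="$1" file missing=0
  for file in admin.pem admin-key.pem es_kibana_client.pem es_kibana_client.key; do
    if [ ! -s "$dir/$file" ]; then
      echo "missing or empty: $dir/$file" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, run from the repository root:
#   check_opensearch_certs security/certificates/elastic/opensearch
```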

---

## ⚙️ Version variable

Set the ES/OS version in `deploy/elasticsearch.env` before launching containers:

```bash
ELASTICSEARCH_VERSION=opensearch
# or
ELASTICSEARCH_VERSION=elasticsearch
```

This ensures the correct certificate directory (`elasticsearch` or `opensearch`) is mounted into containers.
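
The mapping from version value to certificate directory can be sketched as follows. `cert_dir_for_version` is a hypothetical function used only for illustration; the real selection happens in the compose files, which are not shown here.

```bash
# Hypothetical illustration: map ELASTICSEARCH_VERSION to the certificate
# directory described in this section, rejecting any other value.
cert_dir_for_version() {
  case "$1" in
    elasticsearch|opensearch)
      echo "security/certificates/elastic/$1"
      ;;
    *)
      echo "unsupported ELASTICSEARCH_VERSION: '$1'" >&2
      return 1
      ;;
  esac
}

# Example usage:
#   cert_dir_for_version "$ELASTICSEARCH_VERSION"
```

Rejecting unknown values early makes a typo in `elasticsearch.env` visible before any container mounts an empty directory.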

---

## 📁 Kibana / OpenSearch Dashboards certificates

| Platform | Required Certificates | Source Folder |
|-----------|----------------------|----------------|
| **Kibana** | `elasticsearch-{1,2,3}.crt`, `elasticsearch-{1,2,3}.key`, `elastic-stack-ca.crt.pem` | `security/certificates/elastic/elasticsearch/` |
| **OpenSearch Dashboards** | `admin.pem`, `admin-key.pem`, `es_kibana_client.pem`, `es_kibana_client.key` | `security/certificates/elastic/opensearch/` |

All certificate references in `services/kibana/config/kibana_opensearch.yml` or `services.yml` must point to these locations.

---

## 🔐 Users and roles

### OpenSearch

1. Edit `security/es_roles/opensearch/internal_users.yml` to define users.
2. Optionally generate password hashes:

```bash
bash ./create_opensearch_internal_passwords.sh
```

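3. Apply changes by recreating containers (note that `-v` also removes the named volumes declared in the compose file, so any data stored in them is lost):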

```bash
docker compose down -v
docker compose up -d
```

4. Use `create_opensearch_users.sh` to populate roles and user mappings.

OpenSearch ships with default internal users (`admin`, `kibanaserver`, `readall`, `snapshotrestore`, etc.); always change their passwords after the first run.

---

### Elasticsearch (native)

Run after containers start:

```bash
bash ./create_es_native_credentials.sh
```

This script creates system users, roles, and a service account token for Kibana.

You can modify credentials in `security/env/elasticsearch_users.env`.

**New roles** created:

- `ingest` — for NiFi and pipeline ingestion (`cogstack_*`, `nifi_*` indices)
- `cogstack_access` — read-only access to `cogstack_*` and `nifi_*`

**New users**:

- `nifi` → `ingest`
- `cogstack_user` → `cogstack_access`
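
For orientation, a read-only role such as `cogstack_access` could be expressed through Elasticsearch's create-or-update role API (`PUT /_security/role/<name>`). The sketch below only builds the request body. It is an assumption about what such a role might look like, not a copy of what `create_es_native_credentials.sh` sends, and `readonly_role_body`, the host name, and `ELASTIC_PASSWORD` in the usage comment are hypothetical.

```bash
# Hypothetical sketch: JSON body for a read-only role over the cogstack_* and
# nifi_* index patterns, in the shape expected by PUT /_security/role/<name>.
readonly_role_body() {
  cat <<'EOF'
{
  "indices": [
    {
      "names": ["cogstack_*", "nifi_*"],
      "privileges": ["read"]
    }
  ]
}
EOF
}

# Example usage (host and credentials are assumptions):
#   readonly_role_body | curl --cacert ./root-ca.pem -u "elastic:${ELASTIC_PASSWORD}" \
#     -H 'Content-Type: application/json' \
#     -X PUT "https://elasticsearch-1:9200/_security/role/cogstack_access" -d @-
```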

---

## ⚠️ Notes

- The `security/certificates/` folder is also **mounted inside NiFi** so NiFi processors can access ES/OS securely without restarting.
- For OpenSearch role details, see the [OpenSearch Security Plugin documentation](https://opensearch.org/docs/latest/security-plugin/index/).
- For Elasticsearch, refer to the [official Elastic Security docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-security.html).

---

## ✅ Verification

To verify HTTPS access and trust:

```bash
curl -v --cacert ./root-ca.pem https://elasticsearch-1:9200
```

To check inter-node encryption (inside a container):

```bash
openssl s_client -connect elasticsearch-2:9300 -CAfile ./root-ca.pem
```
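
The HTTPS check can be repeated for every node with a short loop. This is a sketch under assumptions: `check_nodes` is a hypothetical helper, the node names and port 9200 come from the examples above, and `CURL_BIN` exists only so the command can be swapped out when testing. Without `--fail`, curl exits 0 even on an HTTP 401, so this checks TLS trust only, not credentials.

```bash
# Hypothetical helper: test that each node's HTTPS endpoint presents a
# certificate trusted by the given CA file.
# Usage: check_nodes <ca-file> <node> [<node> ...]
check_nodes() {
  local ca_file="$1" node failed=0
  shift
  for node in "$@"; do
    if "${CURL_BIN:-curl}" --silent --show-error --output /dev/null \
        --cacert "$ca_file" "https://${node}:9200"; then
      echo "OK   ${node}"
    else
      echo "FAIL ${node}"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage:
#   check_nodes ./root-ca.pem elasticsearch-1 elasticsearch-2 elasticsearch-3
```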
50 changes: 50 additions & 0 deletions docs/security/services.md
@@ -0,0 +1,50 @@
# Service Security

## General services TLS configuration

This page describes the TLS setup for CogStack services that use TLS and are not listed in `deploy/services.yml`. Services that are still part of the stack but live under `services/<SERVICE_NAME>` (and are likewise absent from `deploy/services.yml`) handle TLS differently; their setup is described in each service's `README.md`. In general, most services should use the root CA certificates from `security/certificates/root/`.

## Gitea TLS Configuration

This section describes how **Gitea** is secured using the shared **Root Certificate Authority (CA)** generated by `create_root_ca_cert.sh`.

Unlike other services (such as NiFi or Elastic), **Gitea does not require its own dedicated certificate pair** or an NGINX reverse proxy.
It operates directly with the **Root CA** to provide HTTPS encryption and mutual trust within the CogStack-NiFi stack.

---

### 📁 Certificate source

All certificates used by Gitea originate from:

```text
security/certificates/root/
```

| File | Purpose |
|------|----------|
| `root-ca.pem` | Public CA certificate used by Gitea for HTTPS trust |
| `root-ca.key` | Root CA private key (used only when generating new certificates) |
| `root-ca.p12` | Optional PKCS#12 keystore (not required by Gitea) |


### 🧠 Notes

- The **Root CA** (`root-ca.pem`) is shared across all CogStack services for internal TLS trust.
- You do **not** need to create a new `gitea.crt` or `gitea.key`; the Root CA cert/key pair is sufficient.
- Ensure `root-ca.key` remains private and is not committed to version control.
- The same CA also secures NiFi, Elasticsearch, OpenSearch, Kibana, and JupyterHub.

---

### ✅ Verification

To confirm Gitea is serving HTTPS correctly:

```bash
curl -v --cacert ./security/certificates/root/root-ca.pem https://gitea.local:2222/
```

You should see a valid TLS handshake and an HTTP 200 response.

---
8 changes: 4 additions & 4 deletions nifi/Dockerfile
@@ -98,10 +98,10 @@ WORKDIR /opt/nifi/nifi-current/lib/
RUN mkdir -p /opt/nifi/groovy
WORKDIR /opt/nifi/groovy/

RUN curl https://archive.apache.org/dist/groovy/5.0.0/distribution/apache-groovy-binary-5.0.0.zip --output apache-groovy-binary-5.0.0.zip --max-time 3600 && \
unzip apache-groovy-binary-5.0.0.zip && \
rm apache-groovy-binary-5.0.0.zip
ENV GROOVY_BIN=/opt/nifi/groovy/groovy-5.0.0/bin
RUN curl https://archive.apache.org/dist/groovy/5.0.2/distribution/apache-groovy-binary-5.0.2.zip --output apache-groovy-binary-5.0.2.zip --max-time 3600 && \
unzip apache-groovy-binary-5.0.2.zip && \
rm apache-groovy-binary-5.0.2.zip
ENV GROOVY_BIN=/opt/nifi/groovy/groovy-5.0.2/bin
RUN $GROOVY_BIN/grape -V install org.apache.avro avro 1.12.0

# copy configuration files
8 changes: 8 additions & 0 deletions nifi/conf/logback.xml
@@ -177,6 +177,14 @@
<!-- Py4J set to ERROR to avoid verbose socket communication messages -->
<logger name="py4j" level="ERROR" />

<logger name="org.apache.nifi.python" level="DEBUG"/>
<logger name="org.apache.nifi.python.processor" level="DEBUG"/>
<logger name="org.apache.nifi.python.processor.PythonProcessorAdapter" level="DEBUG"/>
<logger name="org.apache.nifi.python.processor.PythonProcessorProxy" level="DEBUG"/>
<logger name="org.apache.nifi.python.exec" level="DEBUG"/>
<logger name="org.apache.nifi.python.controller" level="DEBUG"/>
<logger name="org.apache.nifi.python.processor.PythonProcessorLoader" level="DEBUG"/>

<logger name="org.apache.zookeeper.ClientCnxn" level="ERROR" />
<logger name="org.apache.zookeeper.server.NIOServerCnxn" level="ERROR" />
<logger name="org.apache.zookeeper.server.NIOServerCnxnFactory" level="ERROR" />
2 changes: 1 addition & 1 deletion nifi/recreate_nifi_docker_image.sh
@@ -13,4 +13,4 @@ if [[ $NIFI_GID == 1000 ]]; then
NIFI_GID=$(id -g)
fi

docker build --build-arg GID=${NIFI_GID} --build-arg UID=${NIFI_UID} -t cogstacksystems/cogstack-nifi:latest -f Dockerfile .
docker build --build-arg GID=${NIFI_GID} --build-arg UID=${NIFI_UID} -t cogstacksystems/cogstack-nifi:latest -f Dockerfile .