diff --git a/.github/workflows/doc-build.yml b/.github/workflows/doc-build.yml index 25f6c83..cd098c3 100644 --- a/.github/workflows/doc-build.yml +++ b/.github/workflows/doc-build.yml @@ -23,4 +23,5 @@ jobs: pip3 install -r requirements.txt make clean # Fail buiild on any docs warning - make html O=-W \ No newline at end of file + # make html O=-W # Removed whilst migrating existing docs + make html \ No newline at end of file diff --git a/.readthedocs.yml b/.readthedocs.yml index 2af7806..0b2f93b 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -11,7 +11,7 @@ build: sphinx: configuration: docs/conf.py - fail_on_warning: true + fail_on_warning: false # Removed warnings to migrate existing docs python: install: diff --git a/docs/Makefile b/docs/Makefile index d4bb2cb..b7e8724 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -18,3 +18,6 @@ help: # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). %: Makefile @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +build: + sphinx-autobuild . 
_build/ \ No newline at end of file diff --git a/docs/cogstack-logo.png b/docs/cogstack-logo.png new file mode 100644 index 0000000..0b60483 Binary files /dev/null and b/docs/cogstack-logo.png differ diff --git a/docs/conf.py b/docs/conf.py index 4fd3c9e..0cf05c1 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -10,11 +10,11 @@ # -- Project information ----------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information -project = 'CogStack Platform Toolkit' +project = 'CogStack Documentation' copyright = '2025, CogStack Org' author = 'CogStack Org' release = 'latest' -html_title = "CogStack Platform Toolkit" +html_title = "CogStack Documentation" # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration @@ -24,14 +24,38 @@ 'sphinx.ext.autodoc', 'myst_parser', 'sphinx.ext.inheritance_diagram', + 'sphinx.ext.intersphinx' ] + templates_path = ['_templates'] exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] - # -- Options for HTML output ------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output html_theme = "furo" html_static_path = ['_static'] +html_logo = "cogstack-logo.png" + +intersphinx_mapping = { + "sphinx": ("https://www.sphinx-doc.org/en/master/", None), +} +intersphinx_disabled_reftypes = ["*"] + +myst_enable_extensions = [ + "amsmath", + "attrs_inline", + "colon_fence", + "deflist", + "dollarmath", + "fieldlist", + "html_admonition", + "html_image", + # "linkify", + "replacements", + "smartquotes", + "strikethrough", + "substitution", + "tasklist", +] diff --git a/docs/index.md b/docs/index.md index 6402543..e718036 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,15 +1,43 @@ - -# Cogstack Platform Toolit +# Cogstack Documentation -This project provides utilities for running Cogstack in 
production. +Welcome to the CogStack Documentation site. + +Get started by reading the [CogStack Overview](overview/cogstack-documentation.md). + +If you have any broad questions, please reach out in our [community space](https://discourse.cogstack.org/). + +Further projects that are still in development are listed in the [CogStack GitHub organisation](https://github.com/orgs/CogStack/repositories). + +![](./overview/attachments/43c14755-e565-4ae0-a0a3-ec6dc18a691c.png) + +| Tool | Description | +|:-----|:------------| +| ![CogStack-Nifi](overview/attachments/36c0d23f-a632-4fbf-9f7c-6669e88bbd39.png){width=100}
[**CogStack-Nifi**](https://cogstack-nifi.readthedocs.io/en/latest/main.html) | Data flow orchestration using Apache NiFi | +| ![MedCAT](overview/attachments/09a8bb60-9864-41fa-be7b-cf9a9dc04498.png){width=100}
[**MedCAT**](https://medcat.readthedocs.io/en/latest/) | Medical Concept Annotation Toolkit | +| ![MedCATTrainer](overview/attachments/09a8bb60-9864-41fa-be7b-cf9a9dc04498.png){width=100}
[**MedCATTrainer**](https://medcattrainer.readthedocs.io/en/latest/) | Web-based annotation and training interface for MedCAT | -- [CogStack Observability](observability/_index.md) ```{toctree} :hidden: +:maxdepth: 5 +overview/_index + +``` -observability/_index +```{toctree} +:hidden: +:caption: CogStack NLP +MedCAT +MedCAT Trainer ``` +```{toctree} +:hidden: +:caption: CogStack Platform + +NiFi + +platform/_index +``` diff --git a/docs/overview/CogStack ecosystem (v1).md b/docs/overview/CogStack ecosystem (v1).md new file mode 100644 index 0000000..36fab95 --- /dev/null +++ b/docs/overview/CogStack ecosystem (v1).md @@ -0,0 +1,152 @@ +# CogStack ecosystem (v1) + +This part covers the services that can run in an example CogStack deployment. We refer to such a deployment with many running services as an *ecosystem* or a *platform*. Below is a high-level view of the CogStack platform and the possibilities it enables through its many components and services. In practice, many of the functionalities that the CogStack platform enables are implemented as separate but interconnected services working inside the ecosystem. + +## Core services + +In most scenarios the CogStack platform will consist of *core* services tailored to specific use-cases. Additional applications and services can be run on top of it, such as [SemEHR](../../CogStack%20General/CogStack%20Wiki/CogStack%20projects/SemEHR.md), [Patient Timeline](../../CogStack%20General/CogStack%20Wiki/CogStack%20projects/Patient%20Timeline.md), Live Alerting (through ElasticSearch plugins) or any other custom-developed applications. For ease of use, when deploying a sample CogStack platform, we always recommend using Docker Compose (see: [Running CogStack](Running%20CogStack.md)). + +Below is one of the simplest and most common scenarios: ingesting and processing EHR data from a proprietary data source. 
+ +![](./attachments/5503ea0e-ac74-40ba-936a-1287ad3f1cf5.png) + +The CogStack platform presented here consists of the following core services: + +- *CogStack Pipeline* service for ingesting and processing the EHR data from the source database, +- *CogStack Job Repository* (PostgreSQL database) used for job status control, +- *ElasticSearch* sink where the processed EHR records are stored, +- (optional) *Kibana* user interface to easily perform exploratory data analysis over the processed records. + +Note that this is a very simplified scenario, which can easily be deployed even on a local machine with limited resources. Kibana is used here as an optional, out-of-the-box and easy-to-use solution for exploring the data, although many other data analysis or BI tools can be used. Moreover, ElasticSearch connectors are available in many languages, such as Java, Python, R or JavaScript, allowing for fast development of custom user applications. + +:::{tip} +Note + +The picture shows ElasticSearch running as a single node. In practice, however, one should consider using at least 3 ElasticSearch nodes deployed as a cluster, which greatly improves resilience, query performance and reliability. +Similarly, the picture shows only one CogStack Pipeline instance and only one data source. In practice, there may be multiple sources available, with multiple Pipeline components running in parallel. This is why, when considering deploying the CogStack platform in production, one should keep in mind the scalability and resilience of the platform and its running services. +::: + + +### CogStack Pipeline + +CogStack Pipeline is the main data processing service used inside the CogStack platform. Within the ecosystem its main responsibility is to ingest the EHR data from a specified data source, process the data (e.g. 
by applying text extraction methods, de-identifying records or extracting NLP annotations) and store the resulting data in the specified sink. + +Usually, the sink will be the ElasticSearch store, which keeps the processed EHRs ready for use by other applications. However, when performing computationally expensive processing tasks, such as running OCR-based text extraction from the documents, one may prefer to store the partial results in a cache. In such a case, PostgreSQL can be used as a temporary store – [Examples](Examples.md) covers this case. + +Information about the data processing components offered by CogStack Pipeline can be found in the [CogStack Pipeline](CogStack%20Pipeline.md) part. + +:::{info} +We recommend using the CogStack Pipeline component in the newest version 1.3.0. +::: + +--- + +--- + + + +### PostgreSQL + +[PostgreSQL](https://www.postgresql.org/) is a widely used object-relational database management system. In the CogStack platform it is primarily used as a job repository, storing the job execution status of running CogStack Pipeline instances. However, there may be cases where one needs to store partial results, treating the PostgreSQL DB either as a data cache (see: [Examples](Examples.md)) or as an auxiliary data sink. + +When used as a job repository, it requires defining the appropriate tables and a user that will be used by the running CogStack Pipeline instance(s). This schema is defined by the [Spring Batch META-DATA schema definition](https://docs.spring.io/spring-batch/trunk/reference/html/metaDataSchema.html) and is also available in the `CogStack-Pipeline/examples/docker-common/pgjobrepo/create_repo.sh` script. + +:::{Info} +We recommend using PostgreSQL in versions >= 10. +In the [Examples](Examples.md) part we use PostgreSQL in version 11.1. +::: + +:::{warning} +Note + +PostgreSQL by default has a connection limit of 100. 
Since a single CogStack Pipeline instance using multiple processing threads uses a connection pool both for retrieving the EHR data from the source database and for updating the job repository, one may need to increase the default connection limit along with the available memory buffers. To do so, one may specify parameters: `"-c 'shared_buffers=256MB' -c 'max_connections=1000'"` when initialising the database. +::: + +### ElasticSearch + +[ElasticSearch](https://www.elastic.co/guide/) is a popular NoSQL search engine based on the Lucene library that provides a distributed full-text search engine storing the data as schema-free JSON documents. Inside the CogStack platform it is usually used as the primary data store for EHR data processed by CogStack Pipeline. + +Depending on the use-case, the processed EHR data is usually stored in indices as defined in the corresponding CogStack Pipeline job description property files (see: [CogStack Pipeline](CogStack%20Pipeline.md)). Once stored, it can be easily queried using ElasticSearch's own REST API (see: [ElasticSearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html)), using [Kibana](#kibana) or using an ElasticSearch connector available in many programming languages. Apart from the standard functionality and features provided in its open-source free version, ElasticSearch also offers more advanced ones distributed as [Elastic Stack](https://www.elastic.co/products/stack) (formerly: X-Pack extension), which require a license. These include modules for machine learning, alerting, monitoring, security and more. + +:::{tip} +In our [Examples](Examples.md) we use the free, open-source version of ElasticSearch without the Elastic Stack modules included. 
Note that when one requires secure and/or granular access to the processed EHR data in the ElasticSearch sink, one should explore the [Security](https://www.elastic.co/guide/en/x-pack/current/elasticsearch-security.html) module (formerly: Shield) offered in the Elastic Stack. Some of the features include (as stated on the official website): +- Preventing unauthorised access with password protection, role-based access control (even per index- or single document-level), and IP filtering. +- Preserving the integrity of your data with message authentication and SSL/TLS encryption. +- Maintaining an audit trail so you know who’s doing what to your cluster and the data it stores. +CogStack Pipeline fully supports the functionality provided by the ElasticSearch Security module used to securely access the node(s). +::: + +:::{Info} +In our [Examples](Examples.md) we use a simple, single-node ElasticSearch deployment. However, in practice, one should consider using at least 3 ElasticSearch nodes deployed as a cluster, which greatly improves resilience, query performance and reliability. +::: + +:::{important} +We recommend using ElasticSearch in versions >= 6.0. 
+::: + + +:::{warning} +Note + +If the ElasticSearch service does not start up and the following error is reported: + +> elasticsearch    | ERROR: [1] bootstrap checks failed +> elasticsearch    | [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536] + +one may need to increase the number of available file descriptors on the **host** machine – please refer to:  +::: + +:::{warning} +Note + +If the ElasticSearch service does not start up and the following error is reported: + +> elasticsearch    | ERROR: [1] bootstrap checks failed +> elasticsearch    | [1]: max virtual memory areas vm.max\_map\_count [65530] is too low, increase to at least [262144] + +one may need to increase the maximum number of virtual memory areas (`vm.max_map_count`) on the **host** machine – please refer to:  +::: + +--- + +--- + +### Kibana + +[Kibana](https://www.elastic.co/products/kibana) is a data visualisation module for ElasticSearch that can be easily used to explore and query the data. In sample CogStack platform deployments it can be used as a ready-to-use data exploration tool. + +Apart from providing exploratory data analysis functionality, it also offers administrative options over the ElasticSearch data store, such as adding/removing/updating documents using the command line or creating/removing indices. Moreover, custom user dashboards can be created according to use-case requirements. For a more detailed description of the available functionality please refer to the [official documentation](https://www.elastic.co/guide/en/kibana/current/introduction.html). + +:::{info} +In all our [Examples](Examples.md) we provide ElasticSearch bundled with Kibana. +::: + +--- + +--- + +### NGINX + +NGINX is a popular, open-source web server that can also be used as a reverse proxy, load balancer, HTTP cache and more. In CogStack platform deployments, it can be used as a reverse proxy that provides basic access security for the exposed data stores and service endpoints. 
Its functionality may include general user-based authentication, IP filtering and selective service access. A more detailed description of security features offered by NGINX can be found in the [official documentation](https://docs.nginx.com/nginx/admin-guide/security-controls/). + +[Examples](Examples.md) covers a simple use-case with NGINX serving as a basic authentication module. The example configuration of NGINX running as a proxy can be found in the `CogStack-Pipeline/examples/docker-common/nginx/config/` directory. + +:::{info} +Note, however, that the security and granularity of access to the data stored in ElasticSearch offered by NGINX is inferior to using the [Security](https://www.elastic.co/guide/en/x-pack/current/elasticsearch-security.html) module from Elastic Stack. +::: + +--- + +--- + +### Fluentd + +[Fluentd](https://www.fluentd.org/) is an open source data collector providing a unified logging layer. In sample CogStack platform deployments it can run as a service that collects the logs from all the running services, which can then be used for auditing. + +Fluentd provides a highly configurable and flexible set of rules, filters and plugins that can be used to set up logging for any running service inside the platform. The [official Fluentd documentation](https://docs.fluentd.org/v1.0/articles/quickstart) covers many Fluentd examples with detailed descriptions. + +[Examples](Examples.md) covers a simple use-case that uses Fluentd for logging. The example configuration file can be found in the `CogStack-Pipeline/examples/docker-common/fluentd/conf/` directory. + +--- + +--- diff --git a/docs/overview/Data pipelines.md b/docs/overview/Data pipelines.md new file mode 100644 index 0000000..541b071 --- /dev/null +++ b/docs/overview/Data pipelines.md @@ -0,0 +1,85 @@ + + + +# Data pipelines + +## Introduction + +This page covers the data pipelines used in the CogStack ecosystem. 
+ +:::{warning} +Please note that CogStack-Pipeline was the initial implementation of the CogStack platform and this pipeline engine is being deprecated – we are porting the existing pipeline functionality to Apache NiFi as the main data processing engine (see below: **CogStack-NiFi**). +::: + +## CogStack-Pipeline + +### Overview + +CogStack-Pipeline is an application for executing data pipelines that ingest EHR data from databases into ElasticSearch (primarily) or other databases. It implements a fixed set of ETL operations, including extraction of text from binary documents using Apache Tika, running NLP applications based on the [GATE NLP suite](https://gate.ac.uk/) and a custom de-identification application based on text scrubbing. It was built with Spring Batch and implements only a document-oriented data processing model. For a complete description of CogStack-Pipeline please refer to [the official documentation](https://cogstack.atlassian.net/wiki/spaces/COGDOC). + +:::{IMPORTANT} +The latest version of CogStack Pipeline is 1.3.1. 
+::: + +### Key resources + +- Documentation: [https://cogstack.atlassian.net/wiki/spaces/COGDOC](https://cogstack.atlassian.net/wiki/spaces/COGDOC) +- Deployment examples: [Examples](Examples.md) +- GitHub: +- DockerHub: + +## CogStack-NiFi + +### Overview + +| | | +|:---|:---| +| CogStack-NiFi is the re-architected version of CogStack-Pipeline that replaces the fixed Spring Batch-based pipeline engine with [Apache NiFi](https://nifi.apache.org/). It focuses on fully configurable and scalable data flows, with a data processing engine that is easy to use, deploy and tailor to any site-specific data flow requirements. Apache NiFi also comes with built-in monitoring, data provenance and security features that give operators better control and reliability.
**CogStack-NiFi useful links:**

**Apache NiFi resources:**

| ![](./attachments/b5fc6b57-faf2-4747-9e77-eb9adf51d8b3.jpg) | + +:::{IMPORTANT} +Please note that the CogStack-NiFi project is still under active development, with the newest version being **0.1.0**. +::: + +### Apache NiFi – overview + +*From the official documentation:* Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for design, control, feedback, and monitoring of dataflows. It is highly configurable along several dimensions of quality of service, such as loss-tolerant versus guaranteed delivery, low latency versus high throughput, and priority-based queuing. NiFi provides fine-grained data provenance for all data received, forked, joined cloned, modified, sent, and ultimately dropped upon reaching its configured end-state. + +Some of the key features of the Apache NiFi engine are: + +- Highly configurable and extendable + + - Custom data processors and modules can be built and easily integrated into a data pipeline + - Enables rapid prototyping, development and effective testing + - Data flows can be modified, inspected and troubleshot at runtime +- Web-based user interface + + - Seamless experience between design, control, feedback, and monitoring of the data flows +- Data Provenance + + - Can track data flow from beginning to end for addressing information governance requirements +- Security + + - Support for SSL, SSH, HTTPS, encrypted content, etc. + - Multi-tenant authorization and internal authorization/policy management + +For a detailed description of Apache NiFi, its functionality and broad set of features please refer to [the official documentation](https://nifi.apache.org/docs.html) and [the official Apache NiFi website](https://nifi.apache.org/). 
+ +### Major changes from CogStack-Pipeline + +There are some major changes when using and deploying Apache NiFi as compared with CogStack-Pipeline. + +One of the most important changes is how data flows are defined, configured and monitored. With CogStack-Pipeline, ingestion jobs were defined in `.properties` files and offered very limited job execution monitoring and troubleshooting possibilities. Apache NiFi provides an (optional) web-based user interface that can be used to define data flows in a drag-and-drop fashion, with further configuration and monitoring capabilities. The data flow definitions can be saved and exported in XML format and later loaded into other instances of Apache NiFi or simply kept under version control. + +Each ingestion job run by CogStack-Pipeline also requires a separate CogStack-Pipeline application instance. In Apache NiFi, multiple data flows can run in parallel, all managed by a single, main Apache NiFi data processing engine instance. + +Moreover, one of the main limitations of CogStack-Pipeline has been its support for only a document-centric data model for ingestion, where each ingested record could contain only one document to be processed. Apache NiFi does not enforce a document-centric data model and provides flexibility in defining custom data flows and data schemas. Handling multiple documents in a single record or using a patient-centric data model is a matter of tailoring the pipeline and defining or tailoring an appropriate schema. + +Finally, fixed ETL operations (implemented as modules in CogStack-Pipeline) can be included as custom ETL scripts or application modules inside a defined Apache NiFi data flow. 
For example, the text extraction done by [Apache Tika](https://tika.apache.org/) and NLP functionality (such as running [MedCAT](https://github.com/CogStack/MedCATservice) or [GATE NLP](https://github.com/CogStack/gate-nlp-service) applications) are implemented as external micro-services that expose a REST API and hence can be used directly in the data flow. All third-party application dependencies are handled by the external services, which further separates responsibilities. + +:::{IMPORTANT} +Please note that the recommended minimum resource requirements for running Apache NiFi will be higher than for CogStack-Pipeline and will depend on the actual use-case. +::: + +### Example deployment and services + +Please see [CogStack-NiFi example deployment with workflow examples](https://github.com/CogStack/CogStack-NiFi/tree/devel/deploy). diff --git a/docs/overview/Elasticsearch.md b/docs/overview/Elasticsearch.md new file mode 100644 index 0000000..113665b --- /dev/null +++ b/docs/overview/Elasticsearch.md @@ -0,0 +1,53 @@ + + + +# Elasticsearch + +## Introduction + +In CogStack the Elasticsearch ecosystem is used extensively. It plays a key role as the main data store and as one of the key analytics tools able to query free-text data quickly. + +## Elasticsearch + +[Elasticsearch](https://www.elastic.co/guide/) is a leading NoSQL search engine based on the Lucene library that provides a distributed full-text search engine storing the data as schema-free JSON documents. Inside the CogStack platform it is usually used as a data store for processed EHR free-text and annotation data. + +Depending on the use-case, the processed EHR data is usually stored in indices as defined in data pipeline jobs. 
Once stored, it can be easily queried using Elasticsearch's own REST API (see: [ElasticSearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html)), using [Kibana](#kibana) or using an Elasticsearch connector available in many programming languages. Apart from the standard functionality and features provided in its open-source free version, Elasticsearch also offers more advanced ones distributed as [Elastic Stack](https://www.elastic.co/products/stack) (formerly: X-Pack extension), which require a license. These include modules for machine learning, alerting, monitoring, advanced security and more. + +**Key resources:** + +- the official [practical introduction to Elasticsearch](https://www.elastic.co/blog/a-practical-introduction-to-elasticsearch) +- the official [Elasticsearch documentation](https://www.elastic.co/guide/) +- the official [Elasticsearch use-case examples](https://github.com/elastic/examples) + +## OpenDistro for Elasticsearch distribution + +[OpenDistro for Elasticsearch](https://opendistro.github.io/for-elasticsearch/) is a fully open-source, free and community-driven fork of Elasticsearch. It implements the functionality of many of the X-Pack components, such as the advanced security module, the alerting module and SQL support. Nonetheless, the standard core functionality and APIs of the official Elasticsearch and OpenDistro remain the same. Hence OpenDistro can be used as a drop-in replacement. 
+ +**Key resources:** + +- [the official website](https://opendistro.github.io/for-elasticsearch/) +- the official [OpenDistro for Elasticsearch documentation](https://opendistro.github.io/for-elasticsearch-docs/) + +:::{TIP} +For an example of using and deploying CogStack with Elasticsearch please see the tutorial: [CogStack using Apache NiFi Deployment Examples](https://github.com/CogStack/CogStack-NiFi/tree/devel/deploy) +::: + +## Kibana + +[Kibana](https://www.elastic.co/products/kibana) is a data visualisation application for Elasticsearch that can be easily used to explore and query the data. In sample CogStack platform deployments it can be used as a ready-to-use data exploration tool. + +Apart from providing exploratory data analysis functionality, it also offers administrative options over the Elasticsearch data store, such as adding/removing/updating documents using the command line or creating/removing indices. Moreover, custom user dashboards can be created according to use-case requirements. For a more detailed description of the available functionality please refer to the [official documentation](https://www.elastic.co/guide/en/kibana/current/introduction.html). + +An example dashboard is presented below. + +![](./attachments/f7a03376-ae0e-4980-929a-e3897da5d186.jpg) + +## Security + +OpenDistro implements the functionality of many of the commercial X-Pack components, such as the advanced security module, the alerting module and SQL support. Some of the features include: + +- Preventing unauthorised access with password protection, role-based access control (even per index- or single document-level), and IP filtering. +- Preserving the integrity of your data with message authentication and SSL/TLS encryption. +- Maintaining an audit trail so you know who’s doing what to your cluster and the data it stores. 
+ +The security aspects and configuration are covered extensively in [the official OpenDistro for Elasticsearch documentation](https://opendistro.github.io/for-elasticsearch-docs/). diff --git a/docs/overview/Natural Language Processing.md b/docs/overview/Natural Language Processing.md new file mode 100644 index 0000000..21af7f8 --- /dev/null +++ b/docs/overview/Natural Language Processing.md @@ -0,0 +1,321 @@ + + + +# Natural Language Processing + +## Overview + +The CogStack ecosystem provides a standard set of natural language processing applications that are used either as standalone applications or implemented as RESTful services with a uniform API, each running in a Docker container. When used inside the data processing pipeline, these NLP applications cover one of the key steps of information extraction. They may include extracting medical concepts from free-text notes using a specific terminology, such as [SNOMED CT](https://en.wikipedia.org/wiki/SNOMED_CT), or using all the terminologies available in [UMLS](https://www.nlm.nih.gov/research/umls/index.html). Often, more specialised applications will be built on top of the standard set of NLP applications provided in CogStack, utilising both structured and unstructured information tailored to a defined use-case. These custom applications can be further integrated into CogStack and used as a part of the standard set of NLP applications. + +:::{tip} +Please see [CogStack using Apache NiFi Deployment Examples](https://github.com/CogStack/CogStack-NiFi/tree/devel/deploy) to see how to integrate NLP services in example data pipelines. +::: + +:::{tip} +Apart from being integrated directly in the data processing pipeline, many NLP applications are often used as standalone applications and have a rich set of tools built around them – please see below for more details. 
+::: + +## MedCAT - Medical Concept Annotation Tool + +### Overview + +One of the key tools is MedCAT – a Medical Concept Annotation Tool that is used for Named Entity Recognition and Linking (NER+L) tasks for clinical concepts from free-text documents. + +| | | +|:---|:---| +| MedCAT is based on a light-weight neural network that calculates vector embeddings and is used for disambiguation and concept detection. MedCAT also uses a Deep Learning Language Model for detection of negation, experiencer or any other type of classification.
MedCAT can utilise a concepts dictionary with a vocabulary provided by the end-user, which is used to annotate the concepts in the clinical notes. The provided concepts dictionary can be, e.g., the SNOMED CT terminology or the full UMLS resource or a subset of it. Apart from providing the vocabulary and concepts dictionary, the underlying MedCAT model can be further trained and fine-tuned to perform context-aware concept disambiguation with additional meta-annotation tasks. MedCAT can also be run directly with pre-trained models.
**MedCAT** can be used as a standalone Python module, as part of the model trainer application **MedCAT Trainer**, or deployed as a RESTful **MedCAT Service** inside a data processing pipeline. These ways of working with MedCAT are briefly covered below.
| ![](./attachments/df995677-ab49-4f74-ab65-a160882b23a6.jpg) | + +:::{WARNING} +Please note that we only provide a few basic models for MedCAT that have been prepared using open datasets. Some of the models are restricted by the external licensing of the resources that were used to build them, such as SNOMED CT or UMLS. In such cases, the user needs to apply for an appropriate license – please see: [UMLS licensing](https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/license_agreement.html) and [SNOMED CT licensing](http://www.snomed.org/snomed-ct/get-snomed). +::: + +:::{IMPORTANT} +When deploying MedCAT into data processing pipelines, one may be interested in training and tailoring the MedCAT models as a part of model preparation. This can be done directly by using MedCAT Trainer or the MedCAT library with a corpus of input documents. The trained model can then be provided to MedCAT Service, which is deployed as a service and used in the data pipeline. +::: + +### MedCAT Python module + +Key resources: + +- GitHub repository with code and documentation: +- MedCAT publication: +- Tutorial on MedCAT: [MedCAT – Analysing Electronic Health Records](https://towardsdatascience.com/medcat-introduction-analyzing-electronic-health-records-e1c420afa13a) (in a series of articles) +- PIP repository: + +:::{tip} +The MedCAT Python library is the functional core of the MedCAT project. The library is used by MedCAT Trainer when training and updating the models. It is also used within the MedCAT Service that exposes the medical concepts extraction functionality. 
+::: + +## MedCAT Trainer + +| | | +|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------| +| MedCAT Trainer is an interface for building, improving and customising a given Named Entity Recognition and Linking models for biomedical domain text.
The models trained by MedCAT Trainer can later be used directly in custom Python applications built on the MedCAT module. Alternatively, the models can be deployed in data pipelines, e.g. behind a RESTful API via MedCAT Service.
**Key resources:**
| ![](./attachments/e4aa0fa4-04c2-4811-96e7-cad849e60b07.jpg)
| + +## MedCAT Service + +MedCAT Service implements a RESTful API over the MedCAT module to extract concepts from the provided text. Usually, a single instance of MedCAT Service serves a single MedCAT model. A model served this way can then be used in data processing pipelines. The API specification is provided in the sections below. + +Key resources: + +- GitHub repository with code, documentation and use examples: + +:::{tip} +Please note that a public MedCAT model trained on the MedMentions corpus is available for experimentation. +::: + +## GATE NLP applications + +### Overview of GATE NLP suite + +[GATE NLP suite](https://gate.ac.uk/) is a well-established and rich set of open-source technologies implementing a full-lifecycle solution for text processing. The GATE ecosystem is very broad and outside of the scope of this documentation – here we will only focus on two applications: + +- [GATE Developer](https://gate.ac.uk/family/developer.html), +- [GATE Embedded](https://gate.ac.uk/family/embedded.html). + +GATE Developer is a development environment that provides a large set of graphical interactive tools for the creation, measurement and maintenance of software components for natural language processing. It allows users to design, create and run NLP applications using an intuitive user interface. These applications can later be exported as a custom *gapp* or *xgapp* application together with the resources they use. + +GATE Embedded, on the other hand, is an object-oriented framework (or class library) implemented in Java. It is used in all GATE-based systems, and forms the core (non-visual) elements of GATE Developer. In principle, it implements the runtime for executing GATE applications. It can run the *gapp* and *xgapp* applications that have been previously created in GATE Developer. 
+ + +:::{IMPORTANT} +When deploying GATE applications within CogStack, one may be interested in defining and tailoring custom GATE applications directly by using GATE Developer. The prepared application can then be provided to the CogStack **GATE NLP Runner Service**, which uses GATE Embedded to execute GATE applications. This way, the provided NLP application can be deployed as a service and used in the data pipeline. +::: + +Although many applications have been developed and published in the GATE NLP suite, on this page we only briefly cover Bio-YODIE. + +### Bio-YODIE + +Bio-YODIE is a named entity linking system derived from the GATE YODIE system. It links mentions in biomedical text to their referents in the UMLS. It defines a broad set of types such as `Disease`, `Drug`, `Observation` and many more, all belonging to the `Bio` group – for detailed information please refer to [the official documentation](https://gate.ac.uk/applications/bio-yodie.html). + +Bio-YODIE can be run either within the GATE Developer application or as a service within CogStack (based on GATE Embedded and running as a Service). Here we primarily focus on the latter and refer the reader to the official Bio-YODIE website. + +**Key resources:** + +- The official website: +- GitHub repository with application code: +- GitHub repository with code to prepare UMLS resources for Bio-YODIE: + +:::{WARNING} +Please note that Bio-YODIE requires resources to be prepared using UMLS. These are limited by individual license and cannot be openly shared. +::: + +### GATE NLP Runner service + +CogStack implements a GATE NLP Runner service that serves GATE NLP applications as a service exposing a RESTful API. It uses GATE Embedded to execute GATE applications that are provided in either *gapp* or *xgapp* format. The API specification is provided in the sections below. 
+ +For more information please refer to the official GitHub with code and documentation: + +## NLP REST API + +CogStack defines a simple, uniform, RESTful API for processing free-text documents. Its primary focus has been on providing an application-independent and uniform interface for extracting entities from free text. The data exchange should be stateless and synchronous. The use-case is: given a document (or a corpus of documents), extract the recognised named entities with associated meta-data. This way, any NLP application can be used or any NLP model can be served in the data processing pipeline as long as it stays compatible with the interface. + +### REST API definition + +The API defines three endpoints that consume and return data in JSON format: + +- *GET* `/api/info` - displays general information about the NLP application, +- *POST* `/api/process` - processes the provided single document and returns the annotations, +- *POST* `/api/process_bulk` - processes the provided list of documents and returns the annotations. + +The full definition is available as [OpenAPI or Swagger](https://github.com/CogStack/gate-nlp-service/tree/devel/api-specs) specification. + +#### GET `/api/info` + +Returns information about the used NLP application. The returned fields are: + +- `name`, `version`, `language` of the underlying NLP application +- `parameters` – a generic JSON object representing any relevant parameters that have been specified to the application (optional) + +#### POST `/api/process` + +Returns the annotations extracted from the provided document. 
+ +The request message payload JSON consists of the following objects: + +- `content` that represents the single document content to be processed +- `applicationParams` – a generic JSON object representing NLP application run-time parameters (optional) + +The single document processing `content` (\*\*\*) has the following keys: + +- `text` – the document to be processed +- `metadata` – a generic JSON object representing any relevant metadata associated with the document that will be consumed by the NLP application (optional) +- `footer` – a generic JSON object representing a payload footer that will be returned back with the result (optional) + +The response message payload JSON consists of an object `result` that has the following fields: + +- `text` – the input document that was processed (optional) +- `annotations` – an array of generic JSON annotation objects, not enforcing any schema +- `metadata` – metadata associated with the processed document that was reported by the NLP application (optional) +- `success` – boolean value indicating whether the NLP processing was successful +- `timestamp` – document processing timestamp +- `errors` – an array of NLP processor errors (present only when `success` is `false`) +- `footer` – the footer object as provided in the request payload (present only when provided in the request message) + +#### POST `/api/process_bulk` + +Returns the annotations extracted from a list of documents. + +The request message payload JSON consists of the following objects: + +- `content` – an array of document content objects to be processed +- `applicationParams` – a generic JSON object representing NLP application run-time parameters (optional) + +Here, the `content` object holds an array of single document content objects as defined above in (\*\*\*). 
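
The request and response shapes described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not part of any CogStack client: the helper names (`make_document`, `make_process_request`, `make_bulk_request`, `extract_annotations`) and the `doc_id` footer key are hypothetical, and only the JSON field names come from the API definition above. The layout of the `/api/process_bulk` response is not spelled out in this section, so the sketch only parses the single-document `result` object.

```python
import json
from typing import Optional


def make_document(text: str,
                  metadata: Optional[dict] = None,
                  footer: Optional[dict] = None) -> dict:
    """Build one document `content` object; optional keys are omitted when unset."""
    doc = {"text": text}
    if metadata is not None:
        doc["metadata"] = metadata
    if footer is not None:
        doc["footer"] = footer
    return doc


def make_process_request(document: dict,
                         application_params: Optional[dict] = None) -> dict:
    """Payload for POST /api/process: a single document under `content`."""
    payload = {"content": document}
    if application_params is not None:
        payload["applicationParams"] = application_params
    return payload


def make_bulk_request(documents: list,
                      application_params: Optional[dict] = None) -> dict:
    """Payload for POST /api/process_bulk: an array of documents under `content`."""
    payload = {"content": list(documents)}
    if application_params is not None:
        payload["applicationParams"] = application_params
    return payload


def extract_annotations(response_body: str) -> list:
    """Return `annotations` from a single-document response, or raise on failure."""
    result = json.loads(response_body)["result"]
    if not result["success"]:
        # `errors` is only present when `success` is false.
        raise RuntimeError(f"NLP processing failed: {result.get('errors', [])}")
    return result["annotations"]


if __name__ == "__main__":
    # `doc_id` is a hypothetical footer key, used here only to show the round trip.
    docs = [
        make_document("The patient was diagnosed with leukemia.", footer={"doc_id": 1}),
        make_document("lung cancer diagnosis", footer={"doc_id": 2}),
    ]
    print(json.dumps(make_bulk_request(docs), indent=2))
```

Since the `footer` object is returned with the result, it is a convenient place to carry an identifier that ties each set of annotations back to its source document.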
+ +### Example use + +:::{tip} +Please see [CogStack using Apache NiFi Deployment Examples](https://github.com/CogStack/CogStack-NiFi/tree/devel/deploy) to see how to deploy example NLP services, i.e. MedCAT with a public MedMentions model and example GATE NLP Drug application. +::: + +#### MedCAT + +Assuming that the application is running on the `localhost` with the API exposed on port `5000`, one can run: + +```bash +curl -XPOST http://localhost:5000/api/process \ + -H 'Content-Type: application/json' \ + -d '{"content":{"text":"The patient was diagnosed with leukemia."}}' + +``` + +and the received result: + +```json +{ + "result": { + "text": "The patient was diagnosed with leukemia.", + "annotations": [ + { + "pretty_name": "leukemia", + "cui": "C0023418", + "tui": "T191", + "type": "Neoplastic Process", + "source_value": "leukemia", + "acc": "1", + "start": 31, + "end": 39, + "info": {}, + "id": "0", + "meta_anns": {} + } + ], + "success": true, + "timestamp": "2019-12-03T16:09:58.196+00:00" + } +} +``` + +### Bio-YODIE + +Bio-YODIE is being run as a service using CogStack GATE NLP Runner Service as described above. In this example Bio-YODIE application will only output annotations of `Disease` type from `Bio` group (defined in the service configuration file). 
Assuming that the service is running on `localhost` with the API exposed on port `8095`, one can run: + +```bash +curl --header "Content-Type: application/json" \ + --request POST \ + --data '{"content":{"text": "lung cancer diagnosis"}}' \ + http://localhost:8095/api/process +``` + +and the received result: + +```json +{ + "result": { + "text": "lung cancer diagnosis", + "annotations": [ + { + "end_idx": 11, + "set": "Bio", + "Negation": "Affirmed", + "Experiencer": "Patient", + "PREF": "Lung Cancer", + "end_node_id": "17", + "TUI": "T191", + "language": "", + "start_node_id": "16", + "type": "Disease", + "LABELVOCABS": "CHV,MEDLINEPLUS,MSH", + "CUIVOCABS": "MTH,CHV,MSH,SNOMEDCT_US,NCI,LCH_NW,OMIM,MEDLINEPLUS,COSTAR,NCI_CTRP-SDC", + "inst_full": "http://linkedlifedata.com/resource/umls/id/C0242379", + "inst": "C0242379", + "string_orig": "lung cancer", + "STY": "Neoplastic Process", + "start_idx": 0, + "id": 18, + "text": "lung cancer", + "Temporality": "Recent", + "tui_full": "http://linkedlifedata.com/resource/semanticnetwork/id/T191" + } + ], + "metadata": { + "document_features": { + "keyOverlapsOnly": false, + "gate.SourceURL": "created from String", + "docType": "generic", + "deleteNonNNPLookups": "true", + "lang": "en" + } + }, + "success": true, + "timestamp": "2019-12-03T16:10:13.281+00:00" + } +} +``` + +### Extra: a simple GATE-based drug names extraction application + +As an extra example, a simple application for extracting drug names from free text was developed in GATE Developer using the ANNIE Gazetteer. Its input is data downloaded from the [Drugs@FDA database](https://www.accessdata.fda.gov/scripts/cder/daf/) and further refined into a curated list of drugs and active ingredients. The application functionality is exposed using CogStack GATE NLP Runner Service. 
+ +Similarly as in above, assuming that the application is running on the `localhost` with the API exposed on port `8095`, one can run: + +```bash +curl -XPOST http://localhost:8095/api/process \ + -H 'Content-Type: application/json' \ + -d '{"content":{"text":"The patient was prescribed with Aspirin."}}' + +``` + +and the received result: + +```json +{ + "result": { + "text": "The patient was prescribed with Aspirin.", + "annotations": [ + { + "end_idx": 39, + "majorType": "Drug", + "set": "", + "name": "ASPIRIN", + "start_idx": 32, + "language": "", + "id": 12, + "minorType": "ActiveComponent", + "text": "Aspirin", + "type": "Drug" + }, + { + "end_idx": 39, + "majorType": "Drug", + "set": "", + "name": "ASPIRIN", + "start_idx": 32, + "language": "", + "id": 13, + "minorType": "Medication", + "text": "Aspirin", + "type": "Drug" + } + ], + "metadata": { + "document_features": { + "gate.SourceURL": "created from String" + } + }, + "success": true, + "timestamp": "2019-12-04T09:51:32.246Z" + } +} +``` diff --git a/docs/overview/_index.md b/docs/overview/_index.md new file mode 100644 index 0000000..fe014a8 --- /dev/null +++ b/docs/overview/_index.md @@ -0,0 +1,10 @@ +# Overview + +```{toctree} +:maxdepth: 1 +cogstack-documentation +CogStack ecosystem (v1) +Data pipelines +Elasticsearch +Natural Language Processing +``` \ No newline at end of file diff --git a/docs/overview/attachments/09a8bb60-9864-41fa-be7b-cf9a9dc04498.png b/docs/overview/attachments/09a8bb60-9864-41fa-be7b-cf9a9dc04498.png new file mode 100644 index 0000000..b50841e Binary files /dev/null and b/docs/overview/attachments/09a8bb60-9864-41fa-be7b-cf9a9dc04498.png differ diff --git a/docs/overview/attachments/36c0d23f-a632-4fbf-9f7c-6669e88bbd39.png b/docs/overview/attachments/36c0d23f-a632-4fbf-9f7c-6669e88bbd39.png new file mode 100644 index 0000000..f3f46cc Binary files /dev/null and b/docs/overview/attachments/36c0d23f-a632-4fbf-9f7c-6669e88bbd39.png differ diff --git 
a/docs/overview/attachments/43c14755-e565-4ae0-a0a3-ec6dc18a691c.png b/docs/overview/attachments/43c14755-e565-4ae0-a0a3-ec6dc18a691c.png new file mode 100644 index 0000000..0e573ff Binary files /dev/null and b/docs/overview/attachments/43c14755-e565-4ae0-a0a3-ec6dc18a691c.png differ diff --git a/docs/overview/attachments/54bb85e8-0428-4a56-a702-fd359272ed6e.png b/docs/overview/attachments/54bb85e8-0428-4a56-a702-fd359272ed6e.png new file mode 100644 index 0000000..0e573ff Binary files /dev/null and b/docs/overview/attachments/54bb85e8-0428-4a56-a702-fd359272ed6e.png differ diff --git a/docs/overview/attachments/5503ea0e-ac74-40ba-936a-1287ad3f1cf5.png b/docs/overview/attachments/5503ea0e-ac74-40ba-936a-1287ad3f1cf5.png new file mode 100644 index 0000000..f4df7d8 Binary files /dev/null and b/docs/overview/attachments/5503ea0e-ac74-40ba-936a-1287ad3f1cf5.png differ diff --git a/docs/overview/attachments/b5fc6b57-faf2-4747-9e77-eb9adf51d8b3.jpg b/docs/overview/attachments/b5fc6b57-faf2-4747-9e77-eb9adf51d8b3.jpg new file mode 100644 index 0000000..6afd177 Binary files /dev/null and b/docs/overview/attachments/b5fc6b57-faf2-4747-9e77-eb9adf51d8b3.jpg differ diff --git a/docs/overview/attachments/df995677-ab49-4f74-ab65-a160882b23a6.jpg b/docs/overview/attachments/df995677-ab49-4f74-ab65-a160882b23a6.jpg new file mode 100644 index 0000000..07dcc8a Binary files /dev/null and b/docs/overview/attachments/df995677-ab49-4f74-ab65-a160882b23a6.jpg differ diff --git a/docs/overview/attachments/e4aa0fa4-04c2-4811-96e7-cad849e60b07.jpg b/docs/overview/attachments/e4aa0fa4-04c2-4811-96e7-cad849e60b07.jpg new file mode 100644 index 0000000..585f645 Binary files /dev/null and b/docs/overview/attachments/e4aa0fa4-04c2-4811-96e7-cad849e60b07.jpg differ diff --git a/docs/overview/attachments/f7a03376-ae0e-4980-929a-e3897da5d186.jpg b/docs/overview/attachments/f7a03376-ae0e-4980-929a-e3897da5d186.jpg new file mode 100644 index 0000000..a1f9127 Binary files /dev/null and 
b/docs/overview/attachments/f7a03376-ae0e-4980-929a-e3897da5d186.jpg differ diff --git a/docs/overview/cogstack-documentation.md b/docs/overview/cogstack-documentation.md new file mode 100644 index 0000000..f28e402 --- /dev/null +++ b/docs/overview/cogstack-documentation.md @@ -0,0 +1,22 @@ + + + +# CogStack Documentation + +## What is CogStack? + +CogStack is a lightweight, distributed, fault-tolerant database processing architecture and ecosystem, intended to make NLP processing and preprocessing easier in resource-constrained environments. It comprises multiple components, and has been designed to provide configurable data processing pipelines for working with EHR data. For the moment it mainly uses databases and files as the primary source of EHR data, with the possibility of adding custom data connectors in the near future. It makes use of the [Apache NiFi](https://nifi.apache.org/) framework in order to provide a fully configurable data processing pipeline with the goal of generating annotated JSON standardised schema files that can be readily indexed into [ElasticSearch](https://www.elastic.co/), stored as files or pushed back to a database. + +![](./attachments/54bb85e8-0428-4a56-a702-fd359272ed6e.png) + +The CogStack ecosystem has been developed as an open source project with the code available on GitHub: [https://github.com/CogStack/](https://github.com/CogStack/CogStack-Pipeline). + +:::{tip} +Starting from version 1.2, CogStack is preferably run as an ecosystem of microservices deployed using [Docker Compose](https://docs.docker.com/compose/). The ready-to-use CogStack images are available to pull directly from the official Docker Hub under the [cogstacksystems](https://hub.docker.com/u/cogstacksystems/) organisation. We are also actively working on running the stack in a K8s cluster. +::: + +## Why does this project exist? 
+ +CogStack consists of a range of technologies designed to support modern, open source healthcare analytics within the NHS, and is chiefly comprised of the Elastic stack ([ElasticSearch](https://www.elastic.co/products/elasticsearch), [Kibana](https://www.elastic.co/products/kibana), etc.), [MedCAT](https://github.com/CogStack/MedCAT) (clinical natural language processing for named entity extraction and linking), clinical text [OCR](https://github.com/CogStack/ocr-service), and clinical text de-identification. Since the processed EHR data can be represented and stored in databases or ElasticSearch, CogStack can be used as one of the solutions for integrating EHR data with other types of data, such as biomedical, -omics and wearables data. + +--- \ No newline at end of file diff --git a/docs/platform/_index.md b/docs/platform/_index.md new file mode 100644 index 0000000..6cb0d11 --- /dev/null +++ b/docs/platform/_index.md @@ -0,0 +1,8 @@ +# CogStack Tooling + +```{toctree} +:maxdepth: 2 + +observability/_index.md + +``` diff --git a/docs/observability/_index.md b/docs/platform/observability/_index.md similarity index 100% rename from docs/observability/_index.md rename to docs/platform/observability/_index.md diff --git a/docs/observability/customization/_index.md b/docs/platform/observability/customization/_index.md similarity index 100% rename from docs/observability/customization/_index.md rename to docs/platform/observability/customization/_index.md diff --git a/docs/observability/customization/alerts-customization.md b/docs/platform/observability/customization/alerts-customization.md similarity index 100% rename from docs/observability/customization/alerts-customization.md rename to docs/platform/observability/customization/alerts-customization.md diff --git a/docs/observability/customization/blackbox-exporter-config.md b/docs/platform/observability/customization/blackbox-exporter-config.md similarity index 100% rename from 
docs/observability/customization/blackbox-exporter-config.md rename to docs/platform/observability/customization/blackbox-exporter-config.md diff --git a/docs/observability/customization/custom-dashboards.md b/docs/platform/observability/customization/custom-dashboards.md similarity index 100% rename from docs/observability/customization/custom-dashboards.md rename to docs/platform/observability/customization/custom-dashboards.md diff --git a/docs/observability/customization/custom-prometheus-configs.md b/docs/platform/observability/customization/custom-prometheus-configs.md similarity index 100% rename from docs/observability/customization/custom-prometheus-configs.md rename to docs/platform/observability/customization/custom-prometheus-configs.md diff --git a/docs/observability/get-started/_index.md b/docs/platform/observability/get-started/_index.md similarity index 100% rename from docs/observability/get-started/_index.md rename to docs/platform/observability/get-started/_index.md diff --git a/docs/observability/get-started/quickstart.md b/docs/platform/observability/get-started/quickstart.md similarity index 100% rename from docs/observability/get-started/quickstart.md rename to docs/platform/observability/get-started/quickstart.md diff --git a/docs/observability/get-started/userguide-tutorial.md b/docs/platform/observability/get-started/userguide-tutorial.md similarity index 89% rename from docs/observability/get-started/userguide-tutorial.md rename to docs/platform/observability/get-started/userguide-tutorial.md index a2d43d6..3ee8dba 100644 --- a/docs/observability/get-started/userguide-tutorial.md +++ b/docs/platform/observability/get-started/userguide-tutorial.md @@ -2,7 +2,7 @@ This guide walks you through how to monitor your stack using the included Grafana dashboards. It shows how to use each dashboard, and some ideas of what things to look out for. ## Availability - How well are things running? 
-![Availability Dashboard](../../_static/screenshots-dashboards-availability.png) +![Availability Dashboard](../../../_static/screenshots-dashboards-availability.png) Open the Cogstack Monitoring Dashboard on [localhost/grafana](http://localhost/grafana/d/NEzutrbMk/cogstack-monitoring-dashboard) @@ -20,7 +20,7 @@ Use the filters at the top, or click in the table to better filter the view down See [Setup Probing](../setup/probing.md) to do the full setup of probers. ## Inventory - What is running? -![Docker Metrics Dashboard](../../_static/screenshots-dashboards-docker-metrics.png) +![Docker Metrics Dashboard](../../../_static/screenshots-dashboards-docker-metrics.png) Use the Docker Metrics dashboard to check which containers are running, where, and whether they're healthy. This is useful for verifying deployments or diagnosing issues. @@ -36,7 +36,7 @@ See [telemetry](../setup/telemetry.md) to set this up Some additional dashboards are setup to provide more metrics. ### VM Metrics -![ VM Metrics dashboard ](../../_static/screenshots-dashboards-vm-metrics.png) +![ VM Metrics dashboard ](../../../_static/screenshots-dashboards-vm-metrics.png) Open the VM Metrics dashboard on [localhost/grafana](http://localhost/grafana/d/rYdddlPWk/vm-metrics-in-cogstack) @@ -50,7 +50,7 @@ Look for things like: - Trends over time, by setting the time filter to 30 days. Is your disk usage increasing over time? ### Elasticsearch Metrics -![ElasticSearch Metrics Dashboard](../../_static/screenshots-dashboards-es-metrics.png) +![ElasticSearch Metrics Dashboard](../../../_static/screenshots-dashboards-es-metrics.png) Open the Elasticsearch Metrics dashboard on [localhost/grafana](http://localhost/grafana/d/n_nxrE_mk/elasticsearch-metrics-in-cogstack) This dashboard helps you understand how your ElasticSearch or Opensearch cluster is behaving. 
@@ -66,7 +66,7 @@ See [telemetry](../setup/telemetry.md) to set this up Alerting is setup using Grafana Alerts, but paused by default When alerts are setup, the grafana graphs will show when the alerts were fired. -![Alerts Firing on dashboard](../../_static/screenshots-dashboards-alerts.png) +![Alerts Firing on dashboard](../../../_static/screenshots-dashboards-alerts.png) Two sets of rules are defined in this project: diff --git a/docs/observability/reference/_index.md b/docs/platform/observability/reference/_index.md similarity index 100% rename from docs/observability/reference/_index.md rename to docs/platform/observability/reference/_index.md diff --git a/docs/observability/reference/concept-materials.md b/docs/platform/observability/reference/concept-materials.md similarity index 100% rename from docs/observability/reference/concept-materials.md rename to docs/platform/observability/reference/concept-materials.md diff --git a/docs/observability/reference/project-details.md b/docs/platform/observability/reference/project-details.md similarity index 100% rename from docs/observability/reference/project-details.md rename to docs/platform/observability/reference/project-details.md diff --git a/docs/observability/reference/quickstart-manual.md b/docs/platform/observability/reference/quickstart-manual.md similarity index 74% rename from docs/observability/reference/quickstart-manual.md rename to docs/platform/observability/reference/quickstart-manual.md index bbf00f6..03ca2ea 100644 --- a/docs/observability/reference/quickstart-manual.md +++ b/docs/platform/observability/reference/quickstart-manual.md @@ -11,8 +11,8 @@ mkdir -p observability-simple/alloy/probers Download these two files, and place in the right folder -- [docker-compose.yml](../../../observability/examples/simple/docker-compose.yml) in observability-simple/ -- [probe-observability.yml](../../../observability/examples/simple/alloy/probers/probe-observability.yml) into 
observability-simple/alloy/probers +- [docker-compose.yml](../../../../observability/examples/simple/docker-compose.yml) in observability-simple/ +- [probe-observability.yml](../../../../observability/examples/simple/alloy/probers/probe-observability.yml) into observability-simple/alloy/probers ### Step 2: Start the stack diff --git a/docs/observability/reference/understanding-metrics.md b/docs/platform/observability/reference/understanding-metrics.md similarity index 100% rename from docs/observability/reference/understanding-metrics.md rename to docs/platform/observability/reference/understanding-metrics.md diff --git a/docs/observability/setup/_index.md b/docs/platform/observability/setup/_index.md similarity index 100% rename from docs/observability/setup/_index.md rename to docs/platform/observability/setup/_index.md diff --git a/docs/observability/setup/alerting.md b/docs/platform/observability/setup/alerting.md similarity index 100% rename from docs/observability/setup/alerting.md rename to docs/platform/observability/setup/alerting.md diff --git a/docs/observability/setup/probing.md b/docs/platform/observability/setup/probing.md similarity index 100% rename from docs/observability/setup/probing.md rename to docs/platform/observability/setup/probing.md diff --git a/docs/observability/setup/production-setup.md b/docs/platform/observability/setup/production-setup.md similarity index 79% rename from docs/observability/setup/production-setup.md rename to docs/platform/observability/setup/production-setup.md index 17bb80d..4399f10 100644 --- a/docs/observability/setup/production-setup.md +++ b/docs/platform/observability/setup/production-setup.md @@ -26,15 +26,15 @@ This script will setup all the folder structure, and download all the relevant f The script automates making folders, and downloading these files: Downloads the example docker compose files: -- [docker-compose.yml](../../../observability/examples/full/docker-compose.yml) -- 
[exporters.docker-compose.yml](../../../observability/examples/full/exporters.docker-compose.yml) -- [exporters.elastic.docker-compose.yml](../../../observability/examples/full/exporters.elastic.docker-compose.yml) +- [docker-compose.yml](../../../../observability/examples/full/docker-compose.yml) +- [exporters.docker-compose.yml](../../../../observability/examples/full/exporters.docker-compose.yml) +- [exporters.elastic.docker-compose.yml](../../../../observability/examples/full/exporters.elastic.docker-compose.yml) Downloads the configurations: -- [alloy/probers/probe-external.yml](../../../observability/examples/full/alloy/probers/probe-external.yml) -- [alloy/probers/probe-observability.yml ](../../../observability/examples/full/alloy/probers/probe-observability.yml) -- [prometheus/scrape-configs/exporters/exporters.yml](../../../observability/examples/full/prometheus/scrape-configs/exporters/exporters.yml) -- [prometheus/scrape-configs/recording-rules/slo.yml](../../../observability/examples/full/prometheus/scrape-configs/recording-rules/slo.yml) +- [alloy/probers/probe-external.yml](../../../../observability/examples/full/alloy/probers/probe-external.yml) +- [alloy/probers/probe-observability.yml ](../../../../observability/examples/full/alloy/probers/probe-observability.yml) +- [prometheus/scrape-configs/exporters/exporters.yml](../../../../observability/examples/full/prometheus/scrape-configs/exporters/exporters.yml) +- [prometheus/scrape-configs/recording-rules/slo.yml](../../../../observability/examples/full/prometheus/scrape-configs/recording-rules/slo.yml) @@ -86,7 +86,7 @@ This is probably the hardest step: You will actually need to know what is runnin ## Step 5: Run Grafana Alloy on every VM The Grafana Alloy image needs to be run on each VM that you want to get information from. 
-Use the example docker compose file in [exporters.docker-compose.yml](../../../observability/examples/full/exporters.docker-compose.yml) which will start up alloy and get metrics +Use the example docker compose file in [exporters.docker-compose.yml](../../../../observability/examples/full/exporters.docker-compose.yml) which will start up alloy and get metrics ``` docker compose -f exporters.docker-compose.yml up -d diff --git a/docs/observability/setup/telemetry.md b/docs/platform/observability/setup/telemetry.md similarity index 89% rename from docs/observability/setup/telemetry.md rename to docs/platform/observability/setup/telemetry.md index 9c7401b..82d5512 100644 --- a/docs/observability/setup/telemetry.md +++ b/docs/platform/observability/setup/telemetry.md @@ -16,7 +16,7 @@ We have to run Grafana Alloy on every single VM to get telemetry. Alloy is setup to push metrics to a central prometheus instance. -- Copy this docker compose file: [exporters.docker-compose.yml](../../../observability/examples/full/exporters.docker-compose.yml) +- Copy this docker compose file: [exporters.docker-compose.yml](../../../../observability/examples/full/exporters.docker-compose.yml) - Edit the environment variables to point to your prometheus URL: ```yaml @@ -34,8 +34,8 @@ Now you have the setup, you will have to run this on every VM you want metrics f ### Elastic Search Metrics To get elasticsearch metrics we have to mount an alloy config file into the image. 
-- Copy this docker compose file: [exporters.elastic.docker-compose.yml](../../../observability/examples/full/exporters.elastic.docker-compose.yml) -- Copy this configuration file [elasticsearch.alloy](../../../observability/examples/full/alloy/elasticsearch.alloy) into `alloy/elasticsearch.alloy` +- Copy this docker compose file: [exporters.elastic.docker-compose.yml](../../../../observability/examples/full/exporters.elastic.docker-compose.yml) +- Copy this configuration file [elasticsearch.alloy](../../../../observability/examples/full/alloy/elasticsearch.alloy) into `alloy/elasticsearch.alloy` In the docker compose file, we can see there are two changes to the usual exporter: