From 07aeff2799695b985e351cd3b17fd05f9dd0fe6e Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Fri, 9 Jan 2026 12:18:19 -0500 Subject: [PATCH 01/12] reorganized deployment to keep it inside specific services and created a developer doc for UI - init --- docs/_toc.yml | 6 ++---- docs/deployment_braikbservices.md | 2 +- docs/deployment_userinterface.md | 2 +- docs/ui_developer_document.md | 1 + 4 files changed, 5 insertions(+), 6 deletions(-) create mode 100644 docs/ui_developer_document.md diff --git a/docs/_toc.yml b/docs/_toc.yml index d067c48..bddbc3b 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -18,15 +18,13 @@ parts: numbered: True chapters: - file: ingestion_service + - file: deployment_braikbservices - caption: BrainKB User Interface numbered: True chapters: - file: brainkbui - - caption: Deployment - numbered: True - chapters: + - file: ui_developer_document - file: deployment_userinterface - - file: deployment_braikbservices - caption: StructSense numbered: True chapters: diff --git a/docs/deployment_braikbservices.md b/docs/deployment_braikbservices.md index c2960f0..c9727ec 100644 --- a/docs/deployment_braikbservices.md +++ b/docs/deployment_braikbservices.md @@ -1,4 +1,4 @@ -# Deployment of BrainKB Services +# Deployment BrainKB consists of multiple service components, as highlighted in the {ref}`brainkb_architecture_figure` All of the service components can be deployed independently. However, there are a few dependencies, such as setting up the PostgreSQL database that is used by JWT Users and Scope Manager, that need to be setup first. diff --git a/docs/deployment_userinterface.md b/docs/deployment_userinterface.md index 657e6b4..27524cb 100644 --- a/docs/deployment_userinterface.md +++ b/docs/deployment_userinterface.md @@ -1,4 +1,4 @@ -# Deployment of User Interface +# Deployment This section provides information regarding the deployment of the BrainKB UI, both in the development and the production mode. 
```{note} diff --git a/docs/ui_developer_document.md b/docs/ui_developer_document.md new file mode 100644 index 0000000..03b4cb0 --- /dev/null +++ b/docs/ui_developer_document.md @@ -0,0 +1 @@ +# Developer Documentation \ No newline at end of file From 979a910d5a7ca530ed4126fb8e2f9816962938e5 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 13:42:53 -0500 Subject: [PATCH 02/12] updated info about agent task configuration variables --- docs/structsense_configuration.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/docs/structsense_configuration.md b/docs/structsense_configuration.md index 076068e..b6f53bf 100644 --- a/docs/structsense_configuration.md +++ b/docs/structsense_configuration.md @@ -9,12 +9,11 @@ Pass the YAML via CLI, e.g. `--config config/ner_agent.yaml`. - `agent_config` - `task_config` -**Do not replace** runtime variables in braces `{}`: -- `{literature}` — input text (e.g., extracted PDF content) -- `{extracted_structured_information}` — extractor output -- `{aligned_structured_information}` — alignment output -- `{judged_structured_information_with_human_feedback}` — judge output -- `{modification_context}`, `{user_feedback_text}` — inputs to feedback agent +**Do not replace variables** enclosed in curly braces (`{}`); they are dynamically populated at runtime. Names must match the pipeline input map (see `config_template` for examples): +- **Extraction input:** `{input_text}` — input text (e.g. PDF content or raw text) +- **Alignment input:** `{extracted_structured_information}` — output from the extractor agent +- **Judge input:** `{aligned_structured_information}` — output from the alignment agent +- **Human feedback input:** `{judged_structured_information_with_human_feedback}` — output from the judge agent; `{modification_context}` and `{user_feedback_text}` — user feedback for the feedback agent A blank template is available in `config_template/`. See **Templates**. 
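As a quick illustration of the placeholder convention documented above, the sketch below shows how a `{input_text}` placeholder left verbatim in a task description gets filled at runtime. This is not the StructSense implementation; the task fields, helper name, and sample text are all hypothetical.

```python
# Hypothetical task entry mirroring the task_config YAML structure;
# the config author leaves "{input_text}" untouched and the pipeline
# substitutes the real input when the task runs.
task = {
    "description": "Extract named entities from the following text: {input_text}",
    "expected_output": '{"entities": []}',
    "agent_id": "extractor_agent",
}

def render_description(task: dict, **runtime_inputs) -> str:
    # Replace each {placeholder} with the matching runtime input.
    return task["description"].format(**runtime_inputs)

prompt = render_description(task, input_text="BDNF is expressed in cortex.")
print(prompt)
```

Renaming a placeholder in the YAML breaks this substitution step, which is why the variable names must match the pipeline input map exactly.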
From cef8f8cdad4cab4571ca9fff798ea43c015e558d Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 13:46:58 -0500 Subject: [PATCH 03/12] Added legacy section --- docs/structsense_configuration.md | 34 +++++++++++++++---------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/structsense_configuration.md b/docs/structsense_configuration.md index b6f53bf..89888b4 100644 --- a/docs/structsense_configuration.md +++ b/docs/structsense_configuration.md @@ -15,7 +15,7 @@ Pass the YAML via CLI, e.g. `--config config/ner_agent.yaml`. - **Judge input:** `{aligned_structured_information}` — output from the alignment agent - **Human feedback input:** `{judged_structured_information_with_human_feedback}` — output from the judge agent; `{modification_context}` and `{user_feedback_text}` — user feedback for the feedback agent -A blank template is available in `config_template/`. See **Templates**. +A blank template, as well as templates for tasks such as `NER`, `Resource Extraction` and `PDF2ReproSchema`, is available in `config_template/`. See **Templates**. 
## Agent Configuration @@ -61,7 +61,7 @@ Run without a paid API key: ```bash structsense-cli extract \ --source SOME.pdf \ - --config ner_config_gpt.yaml \ + --config ner-config.yaml \ --env_file .env ``` @@ -75,7 +75,7 @@ Required task IDs (do not rename): - `humanfeedback_task` Each task includes: -- `description` — includes expected input (e.g., `{literature}`) +- `description` — includes expected input (e.g., `{input_text}`) - `expected_output` — **JSON** output format or example - `agent_id` — must match an agent ID from `agent_config` @@ -104,6 +104,20 @@ embedder_config: model: nomic-embed-text:latest ``` +### Experiment Tracking (optional) +| Variable | Description | Default | +|---|---|---| +| `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` | +| `ENABLE_MLFLOW` | Enable MLflow | `false` | +| `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` | + +### Minimal (no tracking, no knowledge source) +```bash +ENABLE_WEIGHTSANDBIAS=false +ENABLE_MLFLOW=false +ENABLE_KG_SOURCE=false +``` +## Legacy ### Knowledge Source (Vector DB) `WEAVIATE_*` environment variables are optional and only needed if you enable a knowledge source for schema/ontology lookup. @@ -144,20 +158,6 @@ embedder_config: > If Ollama runs on host and Weaviate in Docker, use `http://host.docker.internal:11434`. > If both are in Docker on the same host network, use `http://localhost:11434`. 
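The host/Docker note above can be captured as a tiny helper. The URL values come from the note itself; the function is illustrative only and not part of StructSense.

```python
def ollama_base_url(ollama_in_docker: bool, weaviate_in_docker: bool) -> str:
    """Pick the Ollama base URL for the embedder / knowledge-source config."""
    if weaviate_in_docker and not ollama_in_docker:
        # Weaviate (in a container) must reach Ollama running on the host.
        return "http://host.docker.internal:11434"
    # Both services on the same host network (or both outside Docker).
    return "http://localhost:11434"

print(ollama_base_url(ollama_in_docker=False, weaviate_in_docker=True))
```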
-### Experiment Tracking (optional) -| Variable | Description | Default | -|---|---|---| -| `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` | -| `ENABLE_MLFLOW` | Enable MLflow | `false` | -| `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` | - -### Minimal (no tracking, no knowledge source) -```bash -ENABLE_WEIGHTSANDBIAS=false -ENABLE_MLFLOW=false -ENABLE_KG_SOURCE=false -``` - ### Example `.env` ```bash WEAVIATE_API_KEY=your_api_key From 08120bafcd5c4ec7b325892e5a6c35e630ebbb18 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 15:16:19 -0500 Subject: [PATCH 04/12] updated to include tutorial information --- docs/structsense_examples.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/structsense_examples.md b/docs/structsense_examples.md index fc56d68..5894e28 100644 --- a/docs/structsense_examples.md +++ b/docs/structsense_examples.md @@ -1,19 +1,18 @@ -# Examples +# Tutorials & Examples -- See the `example/` directory for usage demonstrations and reference configs. -- A blank configuration is provided under `config_template/`. +- See the `tutorial/` directory for usage demonstrations. +- See the `example/` directory task specific reference configs that can be used for `StructSense`. +- A configuration is provided under `config_template/`. -## Example Use Cases -- Entity and relation extraction from text -- Knowledge graph construction ## Blank Configuration Template -A starting template is provided in `config_template/`. +A starting template is provided in `config_template/`. +Please note that `config_template/` folder also contains configuration files for `NER`, `Resource Extraction` and `PDF2ReproSchema` task. 
Before modifying, read: - **Configuration Overview & Template** - **Agents** - **Tasks** - **Embeddings & Knowledge** -- **Environment Variables** +- **Environment Variables (see `.env_example` from the `StructSense` repository)** From c9e6b6aea9b1e7b8f3255b2a546a510d2ca7a6d5 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 15:17:16 -0500 Subject: [PATCH 05/12] note added --- docs/structsense_getting_started.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/structsense_getting_started.md b/docs/structsense_getting_started.md index 104cd9c..123386c 100644 --- a/docs/structsense_getting_started.md +++ b/docs/structsense_getting_started.md @@ -7,6 +7,8 @@ pip install structsense ``` Alternatively, you can install the latest version of StructSense from the source code on GitHub: +**Note:** The latest updates are not pushed to PyPI, so for now it's recommended to install from GitHub. + ```bash git clone https://github.com/sensein/structsense.git cd structsense @@ -19,16 +21,18 @@ StructSense supports **Python >=3.10,<3.13**. -## Requirements +## Requirements + ### PDF Extraction with Grobid StructSense supports PDF extraction using **[Grobid](https://grobid.readthedocs.io/en/latest/Introduction/)** (default) or an external API service. #### Default: Grobid -By default, StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup. +StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup. We recommend using Docker for easier setup and dependency management. 
##### Run Grobid with Docker + ```bash docker pull lfoppiano/grobid:0.8.0 docker run --init -p 8070:8070 -e JAVA_OPTS="-XX:+UseZGC" lfoppiano/grobid:0.8.0 From 64b2a7aa0c3eb047623d77e4c820c14929500516 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 15:21:44 -0500 Subject: [PATCH 06/12] FAQ updated --- docs/structsense_troubleshooting.md | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/docs/structsense_troubleshooting.md b/docs/structsense_troubleshooting.md index d684ed7..36f9c66 100644 --- a/docs/structsense_troubleshooting.md +++ b/docs/structsense_troubleshooting.md @@ -23,11 +23,32 @@ Ensure Python version is **>=3.10,<3.13**. ## FAQ -**Q: Do I need Weaviate to run StructSense?** -A: No. Set `ENABLE_KG_SOURCE=false` to run without a vector DB. +**Q: Why does the agent prompt “Would you like to view your execution traces?”** +A: This happens when execution tracing or telemetry is enabled by default. You can disable the prompt by turning off tracing and telemetry via environment variables. + +```bash +CREWAI_TRACING_ENABLED=false +CREWAI_DISABLE_TELEMETRY=true +CREWAI_DISABLE_TRACING=true +CREWAI_TELEMETRY=false +OTEL_SDK_DISABLED=true +ENABLE_CREW_MEMORY=false +``` +**Q: I am seeing non-fatal agent memory errors. What should I do?** +A: This is commonly related to agent memory being enabled without a valid OpenAI key. If you don’t need memory, disable it explicitly. + +```bash +ENABLE_CREW_MEMORY=false +``` + +**Q: How do chunk sizes affect performance and accuracy?** +A: Smaller chunk sizes generally improve extraction accuracy, but they also increase processing time. Larger chunks run faster but may reduce accuracy—choose based on your priority. + **Q: Can I use local models without API keys?** A: Yes, via **Ollama**. Update agent configs to use the Ollama base URL and model. **Q: Where do I find a minimal `.env`?** A: See **Environment Variables → Minimal** section. 
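The chunk-size tradeoff can be made concrete with a character-based chunker sketch. This is illustrative only: the real behaviour is controlled by the `--chunk_size` CLI option, and `chunk_text` is not a StructSense function.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    # Split into fixed-size character chunks; each chunk becomes one
    # extraction call, so smaller chunks mean more calls (slower overall,
    # but each call sees a smaller, more focused context).
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc = "x" * 10_000
print(len(chunk_text(doc, 2000)))  # 5 extraction calls
print(len(chunk_text(doc, 5000)))  # 2 extraction calls
```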
+ + From fbeac4e3f5ec0532eba3c9474caefbec04ee9028 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Tue, 10 Feb 2026 15:23:18 -0500 Subject: [PATCH 07/12] developer info updated --- docs/structsense_troubleshooting.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/structsense_troubleshooting.md b/docs/structsense_troubleshooting.md index 36f9c66..d164f1f 100644 --- a/docs/structsense_troubleshooting.md +++ b/docs/structsense_troubleshooting.md @@ -44,6 +44,8 @@ ENABLE_CREW_MEMORY=false **Q: How do chunk sizes affect performance and accuracy?** A: Smaller chunk sizes generally improve extraction accuracy, but they also increase processing time. Larger chunks run faster but may reduce accuracy—choose based on your priority. +**Q: Where can I find developer documentation?** +A: Developer documentation is available in the repository under `Developer.md`. **Q: Can I use local models without API keys?** A: Yes, via **Ollama**. Update agent configs to use the Ollama base URL and model. From dc91c867cc999727bd3cd508db04ae75587426d6 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri Date: Wed, 11 Feb 2026 10:23:03 -0500 Subject: [PATCH 08/12] tutorial content moved to getting started --- docs/structsense_getting_started.md | 172 ++++++++++++++++++++++++++-- 1 file changed, 160 insertions(+), 12 deletions(-) diff --git a/docs/structsense_getting_started.md b/docs/structsense_getting_started.md index 123386c..86078bb 100644 --- a/docs/structsense_getting_started.md +++ b/docs/structsense_getting_started.md @@ -62,31 +62,180 @@ In our default setup, Ollama is used for embedding generation. You can also use -## Running +## Using StructSense (CLI and Python) + +### Command-line (CLI) + +After installing (`pip install -e .`), the entry point is **`structsense-cli`**. + +#### Full pipeline (extract) + +Runs extraction → alignment → judge → optional human feedback and returns the final structured result. 
+ +```bash +structsense-cli extract \ + --config path/to/config.yaml \ + --source path/to/file.pdf \ + --env_file .env \ + --save_file result.json +``` + +| Option | Description | +|--------|-------------| +| `--config` | **(Required)** Path to YAML config (agent + task + embedder). | +| `--source` | **(Required)** Input: path to a PDF/text file, a folder, or a text string. | +| `--api_key` | OpenRouter (or other) API key; can also be set in `.env` as `OPENROUTER_API_KEY`. | +| `--env_file` | Path to `.env` (default: `.env` in current directory). | +| `--save_file` | Save the result JSON to this path. | +| `--enable_chunking` | Enable chunking for long documents (flag). | +| `--chunk_size` | Chunk size in characters (e.g. `2000`); used when chunking is enabled. | +| `--max_workers` | Max parallel workers for chunked extraction. | +| `--downstream_max_input_chars` | Max input length for alignment/judge (default 80000). | +| `--max_extraction_chunk_chars` | Cap per-chunk size for extraction (default 25000). | + +**With OpenRouter (API key):** -### Using OpenRouter ```bash structsense-cli extract \ --source somefile.pdf \ - --api_key \ + --api_key \ --config someconfig.yaml \ --env_file .env \ - --save_file result.json # optional + --save_file result.json ``` -### Using Ollama (Local) +**With Ollama (local, no API key):** + ```bash structsense-cli extract \ --source somefile.pdf \ --config someconfig.yaml \ - --env_file .env_file \ - --save_file result.json # optional + --env_file .env \ + --save_file result.json ``` -### Chunking -Disabled by default. Enable with: +**With chunking (recommended for long PDFs):** + +```bash +structsense-cli extract \ + --config config.yaml \ + --source file.pdf \ + --enable_chunking \ + --chunk_size 2000 \ + --save_file result.json +``` + +#### Single agent–task (run-agent) + +Run one agent and one task only (e.g. 
extractor only), without the full pipeline: + ```bash ---chunking True +structsense-cli run-agent \ + --config path/to/config.yaml \ + --agent_key extractor_agent \ + --task_key extraction_task \ + --source path/to/file.pdf \ + --env_file .env \ + --save_file result.json +``` + +Use the same chunking/worker options as `extract` when needed. + + +### Python (programmatic) + +Use **StructSenseFlow** as the single entry point. Run the **full pipeline** with `information_extraction_task()`, or a **single agent** with `kickoff(agent_key, task_key)` or `extraction()`. + +**API key when running via Python:** For OpenRouter (or other cloud LLMs), either pass `api_key="your-key"` to `StructSenseFlow(...)` or set `OPENROUTER_API_KEY` in a `.env` file and pass `env_file=".env"`. The key is injected into the agent LLM config so all agents use it. Get an OpenRouter key at [openrouter.ai/keys](https://openrouter.ai/keys). If you get `401 User not found`, the key is missing or invalid. + +#### Full pipeline (recommended) + +```python +import asyncio +from structsense.app import StructSenseFlow + +# Config can be paths to YAML files or dicts +flow = StructSenseFlow( + agent_config="path/to/config.yaml", + task_config="path/to/config.yaml", + embedder_config="path/to/config.yaml", + input_source="path/to/file.pdf", # or a text string, or path to .txt + enable_chunking=True, + chunk_size=2000, + max_workers=8, + env_file=".env", + api_key=None, # or set OPENROUTER_API_KEY in .env +) + +# Run full pipeline: extraction → alignment → judge → human feedback (if enabled) +result = asyncio.run(flow.information_extraction_task()) + +# Result is a dict: entities, key_terms, resources, judged_terms, concept_mapping, etc. 
+print(result.get("task_type"), result.get("elapsed_time")) + +# Save to file +import json +with open("result.json", "w") as f: + json.dump(result, f, indent=2, default=str) +``` + +#### Single agent (one agent–task pair) + +You can run **any** single agent–task pair with `kickoff(agent_key=..., task_key=...)`. For the extractor only, the convenience method is `extraction()`. For the **full pipeline** (extraction → alignment → judge → humanfeedback), use `information_extraction_task()`. + +```python +import asyncio +from structsense.app import StructSenseFlow + +flow = StructSenseFlow( + agent_config="path/to/config.yaml", + task_config="path/to/config.yaml", + embedder_config="path/to/config.yaml", + input_source="path/to/file.pdf", # or source_text="raw text" + enable_chunking=True, + chunk_size=2000, +) + +# Run only the extractor (convenience method) +result = asyncio.run(flow.extraction()) + +# Or run any specific agent–task pair +result = asyncio.run(flow.kickoff( + agent_key="extractor_agent", + task_key="extraction_task", +)) +# Other pairs: alignment_agent/alignment_task, judge_agent/judge_task, +# humanfeedback_agent/humanfeedback_task +``` + +**Note:** Alignment, judge, and humanfeedback tasks are designed to receive **output from the previous stage** when run in the full pipeline. When you run them alone via `kickoff(...)`, they receive the raw `source_text` as input (useful for debugging or custom flows). + +#### Passing config as dicts + +```python +import asyncio +import yaml +from structsense.app import StructSenseFlow + +with open("ner-config.yaml") as f: + all_config = yaml.safe_load(f) + +flow = StructSenseFlow( + agent_config=all_config["agent_config"], + task_config=all_config["task_config"], + embedder_config=all_config.get("embedder_config", {}), + input_source="path/to/file.pdf", # or source_text="raw text" + enable_chunking=True, + chunk_size=2000, + max_workers=8, + env_file=".env", # optional; loads OPENROUTER_API_KEY etc. 
+ api_key=None, # or pass key here; injected into LLM config +) +result = asyncio.run(flow.information_extraction_task()) + +import json +with open("result.json", "w") as f: + json.dump(result, f, indent=2, default=str) ``` @@ -95,9 +244,8 @@ Disabled by default. Enable with: The `docker/` directory contains **Docker Compose** files for running the following components: - **Grobid** – for PDF extraction -- **Weaviate** – In our StructSense architecture, Weaviate acts as the vector database responsible for storing the ontology, effectively serving as the Ontology database. -These Compose files allow you to quickly stand up a complete local **StructSense** stack. +- These Compose files allow you to quickly stand up a complete local **StructSense** stack. If you prefer not to install dependencies system-wide, you can use the provided Docker Compose setup to run everything in **container mode**. This makes it easy to isolate services and manage your environment with minimal setup. From 2d37d2631897bf03dd2e72af6d075aa70537197f Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri <52251022+tekrajchhetri@users.noreply.github.com> Date: Wed, 11 Feb 2026 10:23:47 -0500 Subject: [PATCH 09/12] Update docs/structsense_examples.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/structsense_examples.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/structsense_examples.md b/docs/structsense_examples.md index 5894e28..3bbb2e4 100644 --- a/docs/structsense_examples.md +++ b/docs/structsense_examples.md @@ -1,7 +1,7 @@ # Tutorials & Examples - See the `tutorial/` directory for usage demonstrations. -- See the `example/` directory task specific reference configs that can be used for `StructSense`. +- See the `example/` directory for task specific reference configs that can be used for `StructSense`. - A configuration is provided under `config_template/`. 
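When configs are passed as dicts as shown earlier, a renamed task ID or mismatched `agent_id` only surfaces at kickoff; a small pre-flight check can catch it sooner. This is a sketch, not part of the StructSense API; the required IDs are the fixed ones from the Tasks section.

```python
REQUIRED_TASK_IDS = {
    "extraction_task", "alignment_task", "judge_task", "humanfeedback_task",
}

def preflight(agent_config: dict, task_config: dict) -> list[str]:
    # Collect every problem instead of raising on the first one.
    problems = [f"missing task ID: {t}"
                for t in sorted(REQUIRED_TASK_IDS - task_config.keys())]
    for task_id, task in task_config.items():
        agent_id = task.get("agent_id")
        if agent_id not in agent_config:
            problems.append(f"{task_id}: unknown agent_id {agent_id!r}")
    return problems

agents = {"extractor_agent": {}, "alignment_agent": {}}
tasks = {"extraction_task": {"agent_id": "extractor_agent"}}
for p in preflight(agents, tasks):
    print(p)
```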
From d5fb42ae807ee7568fb7ac5338612d39613f7d19 Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri <52251022+tekrajchhetri@users.noreply.github.com> Date: Wed, 11 Feb 2026 10:23:55 -0500 Subject: [PATCH 10/12] Update docs/structsense_getting_started.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/structsense_getting_started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/structsense_getting_started.md b/docs/structsense_getting_started.md index 0137433..63adbfa 100644 --- a/docs/structsense_getting_started.md +++ b/docs/structsense_getting_started.md @@ -21,7 +21,7 @@ StructSense supports **Python >=3.10,<3.13**. -## Requirements +## Requirements ### PDF Extraction with Grobid From 95327f98e3d1fd062ac4361c4208e21690fdabca Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri <52251022+tekrajchhetri@users.noreply.github.com> Date: Wed, 11 Feb 2026 10:24:03 -0500 Subject: [PATCH 11/12] Update docs/deployment_braikbservices.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/deployment_braikbservices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deployment_braikbservices.md b/docs/deployment_braikbservices.md index c9727ec..5d8f4d0 100644 --- a/docs/deployment_braikbservices.md +++ b/docs/deployment_braikbservices.md @@ -1,4 +1,4 @@ -# Deployment +# Microservices Deployment BrainKB consists of multiple service components, as highlighted in the {ref}`brainkb_architecture_figure` All of the service components can be deployed independently. However, there are a few dependencies, such as setting up the PostgreSQL database that is used by JWT Users and Scope Manager, that need to be setup first. 
From 246bf2cef88b723636901596cafab11b1b88766c Mon Sep 17 00:00:00 2001 From: Tek Raj Chhetri <52251022+tekrajchhetri@users.noreply.github.com> Date: Wed, 11 Feb 2026 10:24:11 -0500 Subject: [PATCH 12/12] Update docs/structsense_examples.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/structsense_examples.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/structsense_examples.md b/docs/structsense_examples.md index 3bbb2e4..41f11b5 100644 --- a/docs/structsense_examples.md +++ b/docs/structsense_examples.md @@ -8,7 +8,7 @@ ## Blank Configuration Template A starting template is provided in `config_template/`. -Please note that `config_template/` folder also contains configuration files for `NER`, `Resource Extraction` and `PDF2ReproSchema` task. +Please note that `config_template/` folder also contains configuration files for `NER`, `Resource Extraction` and `PDF2ReproSchema` tasks. Before modifying, read: - **Configuration Overview & Template**