6 changes: 2 additions & 4 deletions docs/_toc.yml
@@ -18,15 +18,13 @@ parts:
    numbered: True
    chapters:
      - file: ingestion_service
      - file: deployment_braikbservices
  - caption: BrainKB User Interface
    numbered: True
    chapters:
      - file: brainkbui
  - caption: Deployment
    numbered: True
    chapters:
      - file: ui_developer_document
      - file: deployment_userinterface
      - file: deployment_braikbservices
  - caption: StructSense
    numbered: True
    chapters:
2 changes: 1 addition & 1 deletion docs/deployment_braikbservices.md
@@ -1,4 +1,4 @@
# Deployment of BrainKB Services
# Microservices Deployment

BrainKB consists of multiple service components, as highlighted in {ref}`brainkb_architecture_figure`. All of the service components can be deployed independently. However, there are a few dependencies, such as the PostgreSQL database used by JWT Users and Scope Manager, that need to be set up first.

2 changes: 1 addition & 1 deletion docs/deployment_userinterface.md
@@ -1,4 +1,4 @@
# Deployment of User Interface
# Deployment
**Review comment (medium):**

To improve clarity, consider making the title more specific, as this file is now part of the 'BrainKB User Interface' section in the table of contents. A more descriptive title would help readers understand the context better.

Suggested change
# Deployment
# User Interface Deployment

This section provides information regarding the deployment of the BrainKB UI in both development and production modes.

```{note}
64 changes: 42 additions & 22 deletions docs/structsense_configuration.md
@@ -9,15 +9,13 @@ Pass the YAML via CLI, e.g. `--config config/ner_agent.yaml`.
- `agent_config`
- `task_config`

**Do not replace** runtime variables in braces `{}`:
- `{literature}` — input text (e.g., extracted PDF content)
- `{extracted_structured_information}` — extractor output
- `{aligned_structured_information}` — alignment output
- `{judged_structured_information_with_human_feedback}` — judge output
- `{modification_context}`, `{user_feedback_text}` — inputs to feedback agent
**Do not replace variables** enclosed in curly braces (`{}`); they are dynamically populated at runtime. Names must match the pipeline input map (see `config_template` for examples):
- **Extraction input:** `{input_text}` — input text (e.g. PDF content or raw text)
- **Alignment input:** `{extracted_structured_information}` — output from the extractor agent
- **Judge input:** `{aligned_structured_information}` — output from the alignment agent
- **Human feedback input:** `{judged_structured_information_with_human_feedback}` — output from the judge agent; `{modification_context}` and `{user_feedback_text}` — user feedback for the feedback agent
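
For instance, a task `description` embeds a placeholder verbatim and StructSense substitutes it at runtime. A minimal sketch (the wording is illustrative; the field layout follows `config_template`):

```yaml
extraction_task:
  description: >
    Extract structured information from the following text:
    {input_text}
```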

**Config Template**\
A blank template is available in [config_template](https://github.com/sensein/structsense/blob/main/config_template/config.yaml).
A blank template, as well as templates for tasks such as `NER`, `Resource Extraction`, and `PDF2ReproSchema`, is available in `config_template/`. See **Templates**.

<!--Agent Configuration -->
## Agent Configuration
@@ -63,7 +61,7 @@ Run without a paid API key:
```bash
structsense-cli extract \
--source SOME.pdf \
--config ner_config_gpt.yaml \
--config ner-config.yaml \
--env_file .env
```

@@ -77,7 +75,7 @@ Required task IDs (do not rename):
- `humanfeedback_task`

Each task includes:
- `description` — includes expected input (e.g., `{literature}`)
- `description` — includes expected input (e.g., `{input_text}`)
- `expected_output` — **JSON** output format or example
- `agent_id` — must match an agent ID from `agent_config`
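
Putting the three fields together, one task entry might look like the following sketch (the `expected_output` shape is up to you; `extractor_agent` must be defined in `agent_config`):

```yaml
extraction_task:
  description: >
    Extract entities from the text below and return them as JSON.
    {input_text}
  expected_output: >
    {"entities": [{"label": "...", "text": "..."}]}
  agent_id: extractor_agent
```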

@@ -106,6 +104,20 @@ embedder_config:
model: nomic-embed-text:latest
```
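
For reference, a full `embedder_config` block might look like this sketch; the `provider`/`config` nesting is an assumption based on `config_template`, with only the Ollama model name taken from the fragment above:

```yaml
embedder_config:
  provider: ollama          # assumed provider key
  config:
    api_base: http://localhost:11434
    model: nomic-embed-text:latest
```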

### Experiment Tracking (optional)
| Variable | Description | Default |
|---|---|---|
| `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` |
| `ENABLE_MLFLOW` | Enable MLflow | `false` |
| `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` |
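
For example, to enable MLflow tracking against a local server, a `.env` could contain:

```bash
ENABLE_MLFLOW=true
MLFLOW_TRACKING_URL=http://localhost:5000
```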

### Minimal (no tracking, no knowledge source)
```bash
ENABLE_WEIGHTSANDBIAS=false
ENABLE_MLFLOW=false
ENABLE_KG_SOURCE=false
```
## Legacy
**Review comment (medium):**

The `## Legacy` heading here might cause confusion. It implies that using a 'Knowledge Source (Vector DB)' is a legacy feature, yet the `## Environment Variables` section below provides detailed current configuration for Weaviate (a vector DB). To avoid ambiguity, please clarify if using a vector DB is a legacy approach. If so, consider moving all Weaviate-related environment variable documentation under this Legacy section.

### Knowledge Source (Vector DB)
`WEAVIATE_*` environment variables are optional and only needed if you enable a knowledge source for schema/ontology lookup.
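
For example, enabling the knowledge source could look like this in `.env` (hosts and ports mirror the example below; adjust them to your Weaviate deployment):

```bash
ENABLE_KG_SOURCE=true
WEAVIATE_API_KEY=your_api_key
WEAVIATE_HTTP_HOST=localhost
WEAVIATE_HTTP_PORT=8080
WEAVIATE_GRPC_HOST=localhost
WEAVIATE_GRPC_PORT=50051
```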

@@ -146,17 +158,25 @@ embedder_config:
> If Ollama runs on host and Weaviate in Docker, use `http://host.docker.internal:11434`.
> If both are in Docker on the same host network, use `http://localhost:11434`.

### Experiment Tracking (optional)
| Variable | Description | Default |
|---|---|---|
| `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` |
| `ENABLE_MLFLOW` | Enable MLflow | `false` |
| `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` |

<!-- Example .env -->
## Example `.env`
### Example `.env`
```bash
ENABLE_KG_SOURCE=false
OLLAMA_API_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text:v1.5
WEAVIATE_API_KEY=your_api_key
WEAVIATE_HTTP_HOST=localhost
WEAVIATE_HTTP_PORT=8080
WEAVIATE_HTTP_SECURE=false

WEAVIATE_GRPC_HOST=localhost
WEAVIATE_GRPC_PORT=50051
WEAVIATE_GRPC_SECURE=false

WEAVIATE_TIMEOUT_INIT=30
WEAVIATE_TIMEOUT_QUERY=60
WEAVIATE_TIMEOUT_INSERT=120

OLLAMA_API_ENDPOINT=http://host.docker.internal:11434
OLLAMA_MODEL=nomic-embed-text

ENABLE_WEAVE=true
ENABLE_MLFLOW=true
MLFLOW_TRACKING_URL=http://localhost:5000
```
23 changes: 16 additions & 7 deletions docs/structsense_examples.md
@@ -1,9 +1,18 @@
# Examples
# Tutorials & Examples

- See the [example/](https://github.com/sensein/structsense/tree/main/example) directory for usage demonstrations and reference configs.
- See the `tutorial/` directory for usage demonstrations.
- See the `example/` directory for task-specific reference configs that can be used with `StructSense`.
- A configuration template is provided under `config_template/`.

## Example Use Cases
**For more information about StructSense use cases, see the [StructSense paper on arXiv](https://arxiv.org/html/2507.03674v2#S5).**
- Neuroscience Named Entity Extraction from text
- Resource (e.g., models, datasets) Extraction
- ReproSchema Extraction

## Blank Configuration Template

A starting template is provided in `config_template/`.
Note that the `config_template/` folder also contains configuration files for the `NER`, `Resource Extraction`, and `PDF2ReproSchema` tasks.

Before modifying, read:
- **Configuration Overview & Template**
- **Agents**
- **Tasks**
- **Embeddings & Knowledge**
- **Environment Variables (see `.env_example` from the `StructSense` repository)**
178 changes: 165 additions & 13 deletions docs/structsense_getting_started.md
@@ -7,6 +7,8 @@ pip install structsense
```
Alternatively, you can install the latest version of StructSense from the source code on GitHub:

**Note:** The latest updates are not pushed to PyPI, so for now it is recommended to install from GitHub.

```bash
git clone https://github.com/sensein/structsense.git
cd structsense
@@ -20,15 +22,17 @@ StructSense supports **Python >=3.10,<3.13**.

<!-- # Requirements -->
## Requirements

### PDF Extraction with Grobid

StructSense supports PDF extraction using **[Grobid](https://grobid.readthedocs.io/en/latest/Introduction/)** (default) or an external API service.

#### Default: Grobid
By default, StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup.
StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup.
We recommend using Docker for easier setup and dependency management.

##### Run Grobid with Docker

```bash
docker pull lfoppiano/grobid:0.8.0
docker run --init -p 8070:8070 -e JAVA_OPTS="-XX:+UseZGC" lfoppiano/grobid:0.8.0
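
# Optional sanity check (sketch): Grobid exposes an isalive endpoint
# that returns "true" once the service is ready.
curl http://localhost:8070/api/isalive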
@@ -58,31 +62,180 @@ In our default setup, Ollama is used for embedding generation. You can also use

<!--Running -->

## Running
## Using StructSense (CLI and Python)

### Command-line (CLI)

After installing (`pip install -e .`), the entry point is **`structsense-cli`**.

#### Full pipeline (extract)

Runs extraction → alignment → judge → optional human feedback and returns the final structured result.

```bash
structsense-cli extract \
--config path/to/config.yaml \
--source path/to/file.pdf \
--env_file .env \
--save_file result.json
```

| Option | Description |
|--------|-------------|
| `--config` | **(Required)** Path to YAML config (agent + task + embedder). |
| `--source` | **(Required)** Input: path to a PDF/text file, a folder, or a text string. |
| `--api_key` | OpenRouter (or other) API key; can also be set in `.env` as `OPENROUTER_API_KEY`. |
| `--env_file` | Path to `.env` (default: `.env` in current directory). |
| `--save_file` | Save the result JSON to this path. |
| `--enable_chunking` | Enable chunking for long documents (flag). |
| `--chunk_size` | Chunk size in characters (e.g. `2000`); used when chunking is enabled. |
| `--max_workers` | Max parallel workers for chunked extraction. |
| `--downstream_max_input_chars` | Max input length for alignment/judge (default 80000). |
| `--max_extraction_chunk_chars` | Cap per-chunk size for extraction (default 25000). |
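
As a sketch combining the long-document options from the table above (all values are illustrative):

```bash
structsense-cli extract \
  --config config.yaml \
  --source long_paper.pdf \
  --enable_chunking \
  --chunk_size 2000 \
  --max_workers 8 \
  --downstream_max_input_chars 80000 \
  --max_extraction_chunk_chars 25000
```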

**With OpenRouter (API key):**

### Using OpenRouter
```bash
structsense-cli extract \
--source somefile.pdf \
--api_key <YOUR_API_KEY> \
--api_key <YOUR_OPENROUTER_API_KEY> \
--config someconfig.yaml \
--env_file .env \
--save_file result.json # optional
--save_file result.json
```

### Using Ollama (Local)
**With Ollama (local, no API key):**

```bash
structsense-cli extract \
--source somefile.pdf \
--config someconfig.yaml \
--env_file .env_file \
--save_file result.json # optional
--env_file .env \
--save_file result.json
```

### Chunking
Disabled by default. Enable with:
**With chunking (recommended for long PDFs):**

```bash
--chunking True
structsense-cli extract \
--config config.yaml \
--source file.pdf \
--enable_chunking \
--chunk_size 2000 \
--save_file result.json
```

#### Single agent–task (run-agent)

Run one agent and one task only (e.g. extractor only), without the full pipeline:

```bash
structsense-cli run-agent \
--config path/to/config.yaml \
--agent_key extractor_agent \
--task_key extraction_task \
--source path/to/file.pdf \
--env_file .env \
--save_file result.json
```

Use the same chunking/worker options as `extract` when needed.
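
For example, a chunked extractor-only run might look like this (values are illustrative):

```bash
structsense-cli run-agent \
  --config config.yaml \
  --agent_key extractor_agent \
  --task_key extraction_task \
  --source long_paper.pdf \
  --enable_chunking \
  --chunk_size 2000 \
  --max_workers 8
```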


### Python (programmatic)

Use **StructSenseFlow** as the single entry point. Run the **full pipeline** with `information_extraction_task()`, or a **single agent** with `kickoff(agent_key, task_key)` or `extraction()`.

**API key when running via Python:** For OpenRouter (or other cloud LLMs), either pass `api_key="your-key"` to `StructSenseFlow(...)` or set `OPENROUTER_API_KEY` in a `.env` file and pass `env_file=".env"`. The key is injected into the agent LLM config so all agents use it. Get an OpenRouter key at [openrouter.ai/keys](https://openrouter.ai/keys). If you get `401 User not found`, the key is missing or invalid.
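
A minimal `.env` carrying the key might look like this (the file name is whatever you pass as `env_file`):

```bash
OPENROUTER_API_KEY=your_openrouter_key
```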

#### Full pipeline (recommended)

```python
import asyncio
from structsense.app import StructSenseFlow

# Config can be paths to YAML files or dicts
flow = StructSenseFlow(
    agent_config="path/to/config.yaml",
    task_config="path/to/config.yaml",
    embedder_config="path/to/config.yaml",
    input_source="path/to/file.pdf",  # or a text string, or path to .txt
    enable_chunking=True,
    chunk_size=2000,
    max_workers=8,
    env_file=".env",
    api_key=None,  # or set OPENROUTER_API_KEY in .env
)

# Run full pipeline: extraction → alignment → judge → human feedback (if enabled)
result = asyncio.run(flow.information_extraction_task())

# Result is a dict: entities, key_terms, resources, judged_terms, concept_mapping, etc.
print(result.get("task_type"), result.get("elapsed_time"))

# Save to file
import json
with open("result.json", "w") as f:
    json.dump(result, f, indent=2, default=str)
```

#### Single agent (one agent–task pair)

You can run **any** single agent–task pair with `kickoff(agent_key=..., task_key=...)`. For the extractor only, the convenience method is `extraction()`. For the **full pipeline** (extraction → alignment → judge → humanfeedback), use `information_extraction_task()`.

```python
import asyncio
from structsense.app import StructSenseFlow

flow = StructSenseFlow(
    agent_config="path/to/config.yaml",
    task_config="path/to/config.yaml",
    embedder_config="path/to/config.yaml",
    input_source="path/to/file.pdf",  # or source_text="raw text"
    enable_chunking=True,
    chunk_size=2000,
)

# Run only the extractor (convenience method)
result = asyncio.run(flow.extraction())

# Or run any specific agent–task pair
result = asyncio.run(flow.kickoff(
    agent_key="extractor_agent",
    task_key="extraction_task",
))
# Other pairs: alignment_agent/alignment_task, judge_agent/judge_task,
# humanfeedback_agent/humanfeedback_task
```

**Note:** Alignment, judge, and humanfeedback tasks are designed to receive **output from the previous stage** when run in the full pipeline. When you run them alone via `kickoff(...)`, they receive the raw `source_text` as input (useful for debugging or custom flows).

#### Passing config as dicts

```python
import asyncio
import yaml
from structsense.app import StructSenseFlow

with open("ner-config.yaml") as f:
    all_config = yaml.safe_load(f)

flow = StructSenseFlow(
    agent_config=all_config["agent_config"],
    task_config=all_config["task_config"],
    embedder_config=all_config.get("embedder_config", {}),
    input_source="path/to/file.pdf",  # or source_text="raw text"
    enable_chunking=True,
    chunk_size=2000,
    max_workers=8,
    env_file=".env",  # optional; loads OPENROUTER_API_KEY etc.
    api_key=None,  # or pass key here; injected into LLM config
)
result = asyncio.run(flow.information_extraction_task())

import json
with open("result.json", "w") as f:
    json.dump(result, f, indent=2, default=str)
```

<!-- Docker -->
@@ -91,9 +244,8 @@ Disabled by default. Enable with:
The `docker/` directory contains **Docker Compose** files for running the following components:

- **Grobid** – for PDF extraction
- **Weaviate** – In our StructSense architecture, Weaviate acts as the vector database responsible for storing the ontology, effectively serving as the Ontology database.

These Compose files allow you to quickly stand up a complete local **StructSense** stack.
- These Compose files allow you to quickly stand up a complete local **StructSense** stack.

If you prefer not to install dependencies system-wide, you can use the provided Docker Compose setup to run everything in **container mode**.
This makes it easy to isolate services and manage your environment with minimal setup.
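
Assuming the Compose files sit under `docker/`, bringing the stack up is typically:

```bash
cd docker
docker compose up -d   # starts Grobid, Weaviate, etc. in the background
```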