Align documentation with latest StructSense updates #20
base: main
Changes from all commits
07aeff2
979a910
cef8f8c
08120ba
c9e6b6a
64b2a7a
fbeac4e
b65cda0
dc91c86
f069596
2d37d26
d5fb42a
95327f9
246bf2c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,15 +9,13 @@ Pass the YAML via CLI, e.g. `--config config/ner_agent.yaml`. | |
| - `agent_config` | ||
| - `task_config` | ||
|
|
||
| **Do not replace** runtime variables in braces `{}`: | ||
| - `{literature}` — input text (e.g., extracted PDF content) | ||
| - `{extracted_structured_information}` — extractor output | ||
| - `{aligned_structured_information}` — alignment output | ||
| - `{judged_structured_information_with_human_feedback}` — judge output | ||
| - `{modification_context}`, `{user_feedback_text}` — inputs to feedback agent | ||
| **Do not replace variables** enclosed in curly braces (`{}`); they are dynamically populated at runtime. Names must match the pipeline input map (see `config_template` for examples): | ||
| - **Extraction input:** `{input_text}` — input text (e.g. PDF content or raw text) | ||
| - **Alignment input:** `{extracted_structured_information}` — output from the extractor agent | ||
| - **Judge input:** `{aligned_structured_information}` — output from the alignment agent | ||
| - **Human feedback input:** `{judged_structured_information_with_human_feedback}` — output from the judge agent; `{modification_context}` and `{user_feedback_text}` — user feedback for the feedback agent | ||
|
|
||
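The placeholder rule above can be illustrated with a small sketch. It assumes nothing about StructSense internals: Python's built-in `str.format` stands in for whatever templating the pipeline actually uses, and the sample description, the `pipeline_inputs` dictionary, and its contents are hypothetical. The point is only that a placeholder resolves when its name matches a key in the input map, and fails when it does not.

```python
# Illustrative only: str.format stands in for the pipeline's real templating.
task_description = "Extract named entities from the text below.\n\n{input_text}"

# Hypothetical pipeline input map; the key must match the placeholder name.
pipeline_inputs = {"input_text": "Purkinje cells are found in the cerebellum."}

# The placeholder is filled because "input_text" exists in the map.
rendered = task_description.format(**pipeline_inputs)
print(rendered)

# A stale placeholder name (the older {literature}) has no matching key,
# so substitution fails instead of silently producing an empty prompt.
try:
    "Summarize: {literature}".format(**pipeline_inputs)
    stale_name_resolved = True
except KeyError as missing:
    stale_name_resolved = False
    print(f"unresolved placeholder: {missing}")
```

This is why renaming `{literature}` to `{input_text}` in the docs matters: a description that still uses the old name would no longer line up with the input map.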
| **Config Template**\ | ||
| A blank template is available in [config_template](https://github.com/sensein/structsense/blob/main/config_template/config.yaml). | ||
| A blank template as well as templates for tasks such as `NER`, `Resource Extraction` and `PDF2 ReproSchema` is available in `config_template/`. See **Templates**. | ||
|
|
||
| <!--Agent Configuration --> | ||
| ## Agent Configuration | ||
|
|
@@ -63,7 +61,7 @@ Run without a paid API key: | |
| ```bash | ||
| structsense-cli extract \ | ||
| --source SOME.pdf \ | ||
| --config ner_config_gpt.yaml \ | ||
| --config ner-config.yaml \ | ||
| --env_file .env | ||
| ``` | ||
|
|
||
|
|
@@ -77,7 +75,7 @@ Required task IDs (do not rename): | |
| - `humanfeedback_task` | ||
|
|
||
| Each task includes: | ||
| - `description` — includes expected input (e.g., `{literature}`) | ||
| - `description` — includes expected input (e.g., `{input_text}`) | ||
| - `expected_output` — **JSON** output format or example | ||
| - `agent_id` — must match an agent ID from `agent_config` | ||
|
|
||
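The three task fields listed above lend themselves to a quick pre-flight check. The following is a minimal sketch, assuming the YAML has already been loaded into plain dictionaries keyed by ID; the helper name `check_tasks`, the agent ID `humanfeedback_agent`, and the sample values are hypothetical and not part of StructSense.

```python
REQUIRED_TASK_KEYS = ("description", "expected_output", "agent_id")


def check_tasks(agent_config: dict, task_config: dict) -> list[str]:
    """Return human-readable problems found in task_config (empty if none)."""
    problems = []
    known_agents = set(agent_config)
    for task_id, task in task_config.items():
        # Every task needs all three documented fields.
        for key in REQUIRED_TASK_KEYS:
            if key not in task:
                problems.append(f"{task_id}: missing '{key}'")
        # agent_id must refer to an agent defined in agent_config.
        agent_id = task.get("agent_id")
        if agent_id is not None and agent_id not in known_agents:
            problems.append(f"{task_id}: unknown agent_id '{agent_id}'")
    return problems


# Hypothetical, already-parsed config fragments.
agents = {"humanfeedback_agent": {"role": "reviewer"}}
tasks = {
    "humanfeedback_task": {
        "description": "Revise using {user_feedback_text}",
        "expected_output": '{"entities": []}',
        "agent_id": "humanfeedback_agnt",  # deliberate typo
    }
}

for problem in check_tasks(agents, tasks):
    print(problem)
```

Running a check like this before invoking the CLI surfaces a mistyped `agent_id` early, rather than partway through a pipeline run.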
|
|
@@ -106,6 +104,20 @@ embedder_config: | |
| model: nomic-embed-text:latest | ||
| ``` | ||
|
|
||
| ### Experiment Tracking (optional) | ||
| | Variable | Description | Default | | ||
| |---|---|---| | ||
| | `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` | | ||
| | `ENABLE_MLFLOW` | Enable MLflow | `false` | | ||
| | `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` | | ||
|
|
||
| ### Minimal (no tracking, no knowledge source) | ||
| ```bash | ||
| ENABLE_WEIGHTSANDBIAS=false | ||
| ENABLE_MLFLOW=false | ||
| ENABLE_KG_SOURCE=false | ||
| ``` | ||
| ## Legacy | ||
|
Contributor
The |
||
| ### Knowledge Source (Vector DB) | ||
| `WEAVIATE_*` environment variables are optional and only needed if you enable a knowledge source for schema/ontology lookup. | ||
|
|
||
|
|
@@ -146,17 +158,25 @@ embedder_config: | |
| > If Ollama runs on host and Weaviate in Docker, use `http://host.docker.internal:11434`. | ||
| > If both are in Docker on the same host network, use `http://localhost:11434`. | ||
|
|
||
| ### Experiment Tracking (optional) | ||
| | Variable | Description | Default | | ||
| |---|---|---| | ||
| | `ENABLE_WEIGHTSANDBIAS` | Enable W&B | `false` | | ||
| | `ENABLE_MLFLOW` | Enable MLflow | `false` | | ||
| | `MLFLOW_TRACKING_URL` | MLflow tracking URL | `http://localhost:5000` | | ||
|
|
||
| <!-- Example .env --> | ||
| ## Example `.env` | ||
| ### Example `.env` | ||
| ```bash | ||
| ENABLE_KG_SOURCE=false | ||
| OLLAMA_API_ENDPOINT=http://localhost:11434 | ||
| OLLAMA_MODEL=nomic-embed-text:v1.5 | ||
| WEAVIATE_API_KEY=your_api_key | ||
| WEAVIATE_HTTP_HOST=localhost | ||
| WEAVIATE_HTTP_PORT=8080 | ||
| WEAVIATE_HTTP_SECURE=false | ||
|
|
||
| WEAVIATE_GRPC_HOST=localhost | ||
| WEAVIATE_GRPC_PORT=50051 | ||
| WEAVIATE_GRPC_SECURE=false | ||
|
|
||
| WEAVIATE_TIMEOUT_INIT=30 | ||
| WEAVIATE_TIMEOUT_QUERY=60 | ||
| WEAVIATE_TIMEOUT_INSERT=120 | ||
|
|
||
| OLLAMA_API_ENDPOINT=http://host.docker.internal:11434 | ||
| OLLAMA_MODEL=nomic-embed-text | ||
|
|
||
| ENABLE_WEAVE=true | ||
| ENABLE_MLFLOW=true | ||
| MLFLOW_TRACKING_URL=http://localhost:5000 | ||
| ``` | ||
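To make the defaults in the Experiment Tracking table concrete, here is a small sketch of how such flags could be read from the environment. It is an assumption-laden illustration, not StructSense's actual loader: the function name `tracking_settings`, the returned keys, and the set of accepted truthy strings are all hypothetical. Only the variable names and default values come from the table.

```python
import os

# Defaults taken from the Experiment Tracking table.
DEFAULTS = {
    "ENABLE_WEIGHTSANDBIAS": "false",
    "ENABLE_MLFLOW": "false",
    "MLFLOW_TRACKING_URL": "http://localhost:5000",
}

# Assumed truthy spellings; the real parser may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}


def tracking_settings(env=None) -> dict:
    """Resolve tracking options from env, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def read(name: str) -> str:
        return env.get(name, DEFAULTS[name]).strip()

    return {
        "wandb": read("ENABLE_WEIGHTSANDBIAS").lower() in TRUTHY,
        "mlflow": read("ENABLE_MLFLOW").lower() in TRUTHY,
        "mlflow_url": read("MLFLOW_TRACKING_URL"),
    }


# With nothing set, everything stays off and the local MLflow URL is used.
print(tracking_settings({}))
# Enabling MLflow only, as a .env entry would.
print(tracking_settings({"ENABLE_MLFLOW": "true"}))
```

Passing a dictionary instead of reading `os.environ` directly keeps the sketch easy to test without touching the real environment.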
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,18 @@ | ||
| # Examples | ||
| # Tutorials & Examples | ||
|
|
||
| - See the [example/](https://github.com/sensein/structsense/tree/main/example) directory for usage demonstrations and reference configs. | ||
| - See the `tutorial/` directory for usage demonstrations. | ||
| - See the `example/` directory for task specific reference configs that can be used for `StructSense`. | ||
| - A configuration is provided under `config_template/`. | ||
|
|
||
| ## Example Use Cases | ||
| **For more information about StructSense use cases, see the [StructSense paper on arXiv](https://arxiv.org/html/2507.03674v2#S5)** | ||
| - Neuroscience Named Entity Extraction from text | ||
| - Resource (i.e. models, datasets) Extraction | ||
| - ReproSchema Extraction | ||
|
|
||
| ## Blank Configuration Template | ||
|
|
||
| A starting template is provided in `config_template/`. | ||
| Please note that `config_template/` folder also contains configuration files for `NER`, `Resource Extraction` and `PDF2ReproSchema` tasks. | ||
|
|
||
| Before modifying, read: | ||
| - **Configuration Overview & Template** | ||
| - **Agents** | ||
| - **Tasks** | ||
| - **Embeddings & Knowledge** | ||
| - **Environment Variables (see `.env_example` from the `StructSense` repository)** |
To improve clarity, consider making the title more specific, as this file is now part of the 'BrainKB User Interface' section in the table of contents. A more descriptive title would help readers understand the context better.