Prism User Guide

Complete guide to using Prism for document extraction and knowledge querying.

Overview

Prism transforms unstructured documents into searchable, queryable knowledge. It supports:

PDF files - Technical documents, specifications, reports
Excel files - Spreadsheets, data tables
Email files - .msg email archives with attachments

Core Concepts

Projects

A project is a container for related documents. Each project has:

Documents - Uploaded source files
Output - Extracted and processed content
Workflow Config - Custom question sections
Search Index - Azure AI Search index
Knowledge Agent - AI query interface

Pipeline

The processing pipeline transforms raw documents into queryable knowledge:

Upload → Process → Deduplicate → Chunk → Embed → Index → Agent → Query

Workflows

Workflows are structured question sets for systematically extracting information:

Sections - Groups of related questions (e.g., "Technical Specs")
Questions - Individual queries with instructions
Templates - Section-level prompts that guide the AI

Getting Started

1. Login

Navigate to http://localhost:3000 and enter your password.

2. Create a Project

Go to Projects
Click New Project
Enter a descriptive name (e.g., "vendor-proposal-2024")
Click Create

3. Upload Documents

Open your project
Use the upload area to add files:
- Drag and drop multiple files
- Or click to browse
Supported formats: PDF, XLSX, XLSM, MSG
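
Before uploading a large batch, it can help to separate the files Prism accepts from those it does not. The sketch below is a hypothetical client-side check, not part of Prism itself. The function name `split_supported` is illustrative, and the extension list is taken from the supported formats above.

```python
from pathlib import PurePath

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".xlsx", ".xlsm", ".msg"}


def split_supported(filenames):
    """Return (accepted, rejected) lists, matching extensions case-insensitively."""
    accepted, rejected = [], []
    for name in filenames:
        if PurePath(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


# Illustrative filenames only.
accepted, rejected = split_supported(
    ["spec.PDF", "prices.xlsm", "notes.docx", "thread.msg"]
)
print("upload:", accepted)
print("skip:", rejected)
```

Whether Prism itself treats extensions case-insensitively is not stated here, so treat that part as an assumption.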

4. Run the Pipeline

Execute each pipeline stage in order:

Stage	Description	Output
Process	Extract text and images from documents	Markdown files
Deduplicate	Identify and remove duplicate content	Clean content
Chunk	Split into semantic chunks for search	JSON chunks
Embed	Generate vector embeddings	Embedded chunks
Index Create	Create Azure AI Search index	Search index
Index Upload	Upload chunks to search index	Indexed content
Source Create	Create knowledge source wrapper	Knowledge source
Agent Create	Create knowledge retrieval agent	Ready for queries

Click Run on each step, or Run All for the full pipeline.
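
The ordering matters because each stage consumes the output of the one before it. As a minimal sketch of running the stages in order, the helper below stops at the first failure. It assumes a caller-supplied `run_stage` callable (hypothetical) that returns True on success. Whether Run All in Prism stops on failure in the same way is an assumption.

```python
# Stage names as listed in the table above.
STAGES = [
    "Process", "Deduplicate", "Chunk", "Embed",
    "Index Create", "Index Upload", "Source Create", "Agent Create",
]


def run_all(stages, run_stage):
    """Run stages in order and stop at the first failure.

    `run_stage` is a caller-supplied callable returning True on success.
    Returns (completed_stages, failed_stage_or_None).
    """
    completed = []
    for stage in stages:
        if not run_stage(stage):
            return completed, stage
        completed.append(stage)
    return completed, None


# Stand-in runner that pretends the Embed stage fails.
completed, failed = run_all(STAGES, lambda stage: stage != "Embed")
print("completed:", completed)
print("failed at:", failed)
```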

Incremental Processing

Prism tracks which documents have already been extracted. When you run the "Process" stage:

Already-extracted documents are skipped - Saves time and API costs
Only new documents are processed - Add documents incrementally
Use "Re-run" to force re-extraction - Click the "Re-run" button to re-process all documents

The extraction status is tracked in output/extraction_status.json.
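
The skip logic can be sketched as a simple filter. The schema of extraction_status.json is not documented here, so the example assumes a hypothetical mapping from filename to an object with a `status` field. It also assumes that anything not marked `extracted` is processed again.

```python
import json


def documents_to_process(documents, status, force=False):
    """Return the documents that still need extraction.

    With force=True (the "Re-run" case) every document is returned.
    """
    if force:
        return list(documents)
    return [
        doc for doc in documents
        if status.get(doc, {}).get("status") != "extracted"
    ]


# Assumed file contents; the real schema may differ.
status = json.loads(
    '{"a.pdf": {"status": "extracted"}, "b.xlsx": {"status": "failed"}}'
)
docs = ["a.pdf", "b.xlsx", "c.msg"]

pending = documents_to_process(docs, status)
everything = documents_to_process(docs, status, force=True)
print(pending)
print(everything)
```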

5. Query Documents

Once the pipeline is complete:

Go to Query
Select your project
Type a natural language question
Get AI-generated answers with source citations
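
The same query can also be issued against the `/api/query` endpoint. The sketch below only builds the request and does not send it. The base URL is assumed from the port used by the Swagger UI address under API Documentation, and the body fields `project` and `question` are hypothetical. Check the interactive API docs for the real schema and for how to authenticate.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed from the API docs address


def build_query_request(project, question):
    """Build (but do not send) a POST request for /api/query."""
    # Field names are assumptions, not the documented schema.
    body = json.dumps({"project": project, "question": question}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("vendor-proposal-2024", "What is the rated voltage?")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it once the backend is running.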

Configuring Workflows

Adding Sections

Open your project
Click Configure Workflow
Click Add Section
Enter:
- Name - Section title (e.g., "Technical Specifications")
- Template - Instructions for the AI

Example template:

Answer the following question based on the technical documents.
Focus on specific values, measurements, and standards references.
If the information is not found, state "Not specified in documents."

Adding Questions

Expand a section
Click Add Question
Enter:
- Question - The query to answer
- Instructions - Specific guidance for this question

Example:

Question: "What is the rated voltage?"
Instructions: "Look for voltage ratings in electrical specifications. Include all voltage levels if multiple exist."
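
To make the relationship between sections, templates, and questions concrete, here is a minimal sketch of the structure as Python dataclasses. The class and field names are illustrative assumptions. The actual schema of workflow_config.json is not described in this guide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Question:
    question: str
    instructions: str = ""


@dataclass
class Section:
    name: str
    template: str
    questions: list = field(default_factory=list)


section = Section(
    name="Technical Specifications",
    template='If the information is not found, state "Not specified in documents."',
    questions=[
        Question(
            question="What is the rated voltage?",
            instructions="Include all voltage levels if multiple exist.",
        )
    ],
)

# asdict converts nested dataclasses, including those inside lists, to plain dicts.
section_dict = asdict(section)
print(json.dumps(section_dict, indent=2))
```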

Running Workflows

Go to Workflows
Select a section
Click Run Section
Monitor progress in real-time
View results when complete

Viewing Results

Results Page

View all answered questions
Filter by section
See completion percentage
Export to CSV

Result Details

Each result shows:

Question - The original query
Answer - AI-generated response
References - Source document citations
Comments - Additional notes
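
As a rough illustration of what a CSV export of these four fields could look like, the sketch below writes one row per question. It assumes results are available as a list of dicts keyed by the field names above, with References held as a list. The real export format may differ, and the sample row is made up.

```python
import csv
import io

FIELDS = ["Question", "Answer", "References", "Comments"]


def results_to_csv(results):
    """Render a list of result dicts as CSV text, one row per question."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for result in results:
        # Missing fields become empty cells.
        row = {name: result.get(name, "") for name in FIELDS}
        # Join multiple citations into a single cell.
        row["References"] = "; ".join(result.get("References", []))
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample row for illustration.
csv_text = results_to_csv([
    {
        "Question": "What is the rated voltage?",
        "Answer": "example answer",
        "References": ["spec.pdf", "datasheet.pdf"],
    }
])
print(csv_text)
```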

API Access

Prism provides a REST API for programmatic access.

Endpoints

Endpoint	Method	Description
`/api/projects`	GET	List all projects
`/api/projects`	POST	Create a project
`/api/projects/{id}`	GET	Get project details
`/api/projects/{id}`	DELETE	Delete a project
`/api/projects/{id}/files`	GET	List project files
`/api/projects/{id}/files`	POST	Upload files
`/api/pipeline/stages`	GET	List pipeline stages
`/api/pipeline/{id}/run`	POST	Run a pipeline stage
`/api/query`	POST	Query documents
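
The `{id}` placeholders in the table are filled in per request. The helper below is a hypothetical sketch that turns a row of the table into a request object without sending it. The base URL is assumed from the API docs address, and using the project name as the `id` value is also an assumption.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed from the API docs address


def api_request(method, template, **params):
    """Fill a path template from the endpoint table and build a request."""
    # Percent-encode each value so it is safe inside a URL path segment.
    safe = {key: quote(str(value), safe="") for key, value in params.items()}
    return Request(BASE_URL + template.format(**safe), method=method)


list_projects = api_request("GET", "/api/projects")
list_files = api_request("GET", "/api/projects/{id}/files", id="vendor-proposal-2024")
delete_project = api_request("DELETE", "/api/projects/{id}", id="vendor-proposal-2024")

for req in (list_projects, list_files, delete_project):
    print(req.get_method(), req.full_url)
```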

API Documentation

Interactive API docs are available at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Best Practices

Document Preparation

Use clear, descriptive filenames
Ensure PDFs have selectable text (not scanned images)
Organize related documents in the same project

Question Design

Be specific in your questions
Include context in instructions
Use consistent terminology
Group related questions in sections

Performance Tips

Process documents in batches
Use focused, specific queries
Create targeted sections for different topics

Troubleshooting

Pipeline Fails

Check Azure credentials in .env
Verify documents are valid and not corrupted
Check backend logs: docker-compose -f infra/docker/docker-compose.yml logs backend

Poor Query Results

Verify pipeline completed successfully
Check if documents contain the information
Try rephrasing the question
Add more specific instructions

Slow Processing

Large documents take longer to process
Check Azure service quotas
Consider splitting very large files

Keyboard Shortcuts

Shortcut	Action
`Ctrl+Enter`	Submit query
`Escape`	Close modal

Data Storage

All data is stored in the projects/ directory:

projects/
  {project-name}/
    config.json             # Project settings (extraction instructions)
    documents/              # Uploaded files
    output/                 # Processed content
      extraction_results/   # Extracted markdown
      extraction_status.json # Per-document extraction tracking
      chunked_documents/    # JSON chunks
      embedded_documents/   # With embeddings
      results.json          # Workflow answers
    workflow_config.json    # Sections & questions
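
For scripts that inspect this layout, a small path helper keeps the directory names in one place. This is a sketch based only on the tree above. The function name `project_paths` and the dictionary keys are illustrative.

```python
from pathlib import PurePosixPath


def project_paths(project_name, root="projects"):
    """Return the locations from the tree above for one project."""
    base = PurePosixPath(root) / project_name
    output = base / "output"
    return {
        "config": base / "config.json",
        "documents": base / "documents",
        "extraction_results": output / "extraction_results",
        "extraction_status": output / "extraction_status.json",
        "chunks": output / "chunked_documents",
        "embeddings": output / "embedded_documents",
        "results": output / "results.json",
        "workflow_config": base / "workflow_config.json",
    }


paths = project_paths("vendor-proposal-2024")
print(paths["extraction_status"])
print(paths["workflow_config"])
```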

Security Notes

All queries use Azure AI services
Documents remain in your Azure subscription
API protected by password authentication
HTTPS recommended for production

Getting Help

Check logs: docker-compose -f infra/docker/docker-compose.yml logs
API docs: http://localhost:8000/docs
See QUICKSTART.md for setup
Review CLAUDE.md for architecture
