- File: settings.py
- Features:
- Centralized configuration management
- Environment-based settings (development, staging, production)
- Configuration validation
- Safe dictionary export (excludes secrets)
- Dataclass-based design for type safety
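The features listed above could be realized roughly as follows. This is a minimal sketch, not the actual contents of settings.py: only `Settings`, `PDFConfig`, `APIConfig`, `groq_api_key`, `get_settings()`, `validate()` and `to_dict()` are named in this document, while the `environment` field, the default values, the validation rules, and the environment variable names `APP_ENV` and `GROQ_API_KEY` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field, asdict
from functools import lru_cache
from typing import Optional, Tuple


@dataclass
class PDFConfig:
    chunk_size: int = 1000                   # assumed default
    max_file_size: int = 10 * 1024 * 1024    # assumed default, in bytes


@dataclass
class APIConfig:
    port: int = 8000                         # assumed default


@dataclass
class Settings:
    environment: str = "development"         # assumed field name
    groq_api_key: str = ""                   # secret, never exported
    pdf: PDFConfig = field(default_factory=PDFConfig)
    api: APIConfig = field(default_factory=APIConfig)

    def validate(self) -> Tuple[bool, Optional[str]]:
        """Return (is_valid, error_message); the rules here are illustrative."""
        if self.environment not in ("development", "staging", "production"):
            return False, f"Unknown environment: {self.environment}"
        if self.pdf.chunk_size <= 0:
            return False, "pdf.chunk_size must be positive"
        if not 1 <= self.api.port <= 65535:
            return False, "api.port must be between 1 and 65535"
        return True, None

    def to_dict(self) -> dict:
        """Export settings as a plain dict, leaving out secret values."""
        data = asdict(self)
        data.pop("groq_api_key", None)
        return data


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once and reuse the same instance afterwards."""
    return Settings(
        environment=os.environ.get("APP_ENV", "development"),   # assumed variable name
        groq_api_key=os.environ.get("GROQ_API_KEY", ""),        # assumed variable name
    )


settings = Settings(groq_api_key="secret-value")
is_valid, error = settings.validate()
safe = settings.to_dict()
```

Caching `get_settings()` is one way to get a single source of truth; a module-level instance would work as well.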
Key Classes:
- Settings # Main configuration container
- PDFConfig # PDF processing settings
- EmbeddingConfig # Embedding model settings
- CacheConfig # Caching behavior
- LLMConfig # Language model settings
- APIConfig # API server settings
- LoggingConfig # Logging behavior

Usage:

    from src.config import get_settings

    settings = get_settings()
    chunk_size = settings.pdf.chunk_size
    api_port = settings.api.port

- File: models.py
- Features:
- Type-safe data models
- Automatic dictionary conversion (for JSON responses)
- Enum for message types
- Full traceability with timestamps and metadata
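To illustrate how these features might fit together, here is a small sketch of a chat message model. It is an assumption-laden example, not the real models.py: the `MessageType` enum name and its members, and the `content`, `message_type`, `timestamp` and `metadata` field names are hypothetical. Only `ChatMessage` and `to_dict()` come from this document.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Any, Dict


class MessageType(Enum):          # hypothetical enum name and members
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class ChatMessage:
    content: str
    message_type: MessageType
    timestamp: float = field(default_factory=time.time)     # traceability
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Return a JSON-safe dict; the enum is replaced by its string value."""
        data = asdict(self)
        data["message_type"] = self.message_type.value
        return data


message = ChatMessage(
    "What does section 2 say?",
    MessageType.USER,
    timestamp=1700000000.0,   # fixed value so the example is reproducible
)
payload = json.dumps(message.to_dict())
```

Converting the enum inside `to_dict()` keeps `json.dumps` free of custom encoders.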
Key Classes:
- Document # Represents a processed PDF document
- ChatMessage # Individual chat message
- ChatSession # Chat session with multiple messages
- RetrievalResult # Semantic search result
- QueryResponse # Response from LLM query
- FileUpload # File upload metadata
- APIResponse # Standard API response format
- HealthStatus # System health information

Example Usage:

    import time

    from src.core.models import Document, ChatMessage, APIResponse

    # Create a document (chunk_list and embeddings_array come from earlier processing)
    doc = Document(
        id="doc_123",
        filename="research.pdf",
        chunks=chunk_list,
        embeddings=embeddings_array,
        created_at=time.time(),
        updated_at=time.time(),
    )

    # Convert to JSON-safe dict
    doc_dict = doc.to_dict()

    # Wrap in API response
    response = APIResponse(
        success=True,
        data=doc_dict,
        metadata={'processing_time': 1.23},
    )

- File: exceptions.py
- Features:
- Structured exception hierarchy
- Error codes and details
- Automatic API error mapping
- HTTP status code mapping
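One possible shape for such a hierarchy is sketched below. The constructor arguments (`message`, `error_code`, `details`), the 404 status for `DocumentNotFoundError`, and the helper names `get_error_response` and `http_status_code` follow the usage shown in this document. The `status_code` class attribute and the layout of the response dict are assumptions, and the real exceptions.py may implement the mapping differently.

```python
from typing import Any, Dict, Optional


class IntelliQueryException(Exception):
    """Base class carrying a machine-readable code and extra details."""

    status_code = 500  # assumed attribute used for the HTTP mapping

    def __init__(
        self,
        message: str,
        error_code: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        # Fall back to the class name when no explicit code is given.
        self.error_code = error_code or type(self).__name__
        self.details = details or {}

    def to_dict(self) -> Dict[str, Any]:
        return {
            "error": self.error_code,
            "message": self.message,
            "details": self.details,
        }


class RepositoryError(IntelliQueryException):
    pass


class DocumentNotFoundError(RepositoryError):
    status_code = 404


def http_status_code(exc: Exception) -> int:
    """Map an exception to an HTTP status; unknown exceptions become 500."""
    return getattr(exc, "status_code", 500)


def get_error_response(exc: Exception) -> Dict[str, Any]:
    """Build a uniform error body (the layout is an assumption)."""
    if isinstance(exc, IntelliQueryException):
        return {"success": False, **exc.to_dict()}
    return {
        "success": False,
        "error": "INTERNAL_ERROR",
        "message": str(exc),
        "details": {},
    }


err = DocumentNotFoundError(
    "No such document",
    error_code="DOCUMENT_NOT_FOUND",
    details={"id": "doc_123"},
)
body, status = get_error_response(err), http_status_code(err)
```

Storing the status on the class means a subclass inherits its parent's mapping unless it overrides it.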
Exception Tree:

    IntelliQueryException (base)
    ├── ConfigurationError
    ├── ValidationError
    ├── PDFProcessingError
    │   ├── PDFExtractionError
    │   ├── PDFValidationError
    │   ├── UnsupportedFileFormatError
    │   └── FileSizeExceededError
    ├── EmbeddingError
    │   ├── EmbeddingCreationError
    │   ├── ModelLoadError
    │   └── SemanticSearchError
    ├── LLMError
    │   ├── APIKeyError
    │   ├── RateLimitError
    │   ├── TokenLimitError
    │   └── LLMResponseError
    ├── CacheError
    │   ├── CacheNotFoundError
    │   └── CacheCorruptedError
    ├── RepositoryError
    │   ├── DocumentNotFoundError
    │   ├── SessionNotFoundError
    │   └── RepositoryAccessError
    └── APIError
        ├── AuthenticationError
        ├── AuthorizationError
        ├── BadRequestError
        ├── NotFoundError
        ├── ConflictError
        └── InternalServerError
Usage:

    from src.core.exceptions import (
        PDFExtractionError,
        DocumentNotFoundError,
        get_error_response,
        http_status_code,
    )

    # Hypothetical wrapper: the `return` below needs an enclosing function,
    # such as a route handler.
    def handle_extraction(file_path):
        try:
            extract_pdf(file_path)
        except PDFExtractionError as e:
            error_response = get_error_response(e)
            status_code = http_status_code(e)
            return error_response, status_code

Project layout:

    intelligent_query/
    ├── src/
    │   ├── config/
    │   │   ├── __init__.py
    │   │   └── settings.py            ✅ NEW
    │   ├── core/
    │   │   ├── __init__.py            ✅ NEW
    │   │   ├── models.py              ✅ NEW
    │   │   └── exceptions.py          ✅ NEW
    │   ├── services/
    │   │   ├── __init__.py            ✅ NEW (empty, for Phase 2)
    │   │   ├── pdf_service.py         (Phase 2)
    │   │   ├── embedding_service.py   (Phase 2)
    │   │   ├── llm_service.py         (Phase 2)
    │   │   └── cache_service.py       (Phase 2)
    │   ├── app.py                     (existing)
    │   ├── web_app.py                 (existing)
    │   └── new_app.py                 (existing)
    ├── ARCHITECTURE_IMPROVEMENTS.md   ✅ NEW
    └── ... other files
✅ Centralized Configuration
- Single source of truth for all settings
- Easy environment-based configuration
- Automatic validation
- Type-safe with dataclasses
✅ Strong Typing
- Type-safe data models with dataclasses
- Clear contracts for data structures
- Better IDE support and autocomplete
- Easier debugging
✅ Structured Error Handling
- Clear exception hierarchy
- Error codes and details
- Automatic HTTP status mapping
- Consistent error responses
✅ Better Testability
- Mockable configuration
- Testable data models
- Clear exception behavior
- Dependency injection ready
✅ Maintainability
- Clear separation of concerns
- Self-documenting code
- Easier onboarding for new developers
- Centralized business logic
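The testability points above (mockable configuration, dependency injection) can be illustrated with a short sketch. `UploadValidator` is a hypothetical class invented for this example; the only name taken from this document is the `settings.pdf.max_file_size` attribute path. The idea is that a component receives its settings as an argument, so a test can pass in a small stand-in object instead of the real configuration.

```python
from types import SimpleNamespace


class UploadValidator:  # hypothetical class, for illustration only
    """Checks upload sizes against a limit taken from injected settings."""

    def __init__(self, settings) -> None:
        # Settings are passed in rather than imported, so tests can fake them.
        self._max_file_size = settings.pdf.max_file_size

    def is_allowed(self, size_bytes: int) -> bool:
        return 0 < size_bytes <= self._max_file_size


# A stand-in settings object with just the attribute the validator reads.
fake_settings = SimpleNamespace(pdf=SimpleNamespace(max_file_size=100))
validator = UploadValidator(fake_settings)

at_limit = validator.is_allowed(100)
over_limit = validator.is_allowed(101)
```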
Create service classes that encapsulate business logic:
- PDFService - PDF extraction and processing
- EmbeddingService - Vector embeddings and search
- LLMService - Language model interactions
- CacheService - Document caching with TTL
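As an illustration of the last item, caching with a TTL, a service could look roughly like this. It is only a sketch under assumptions: the method names `set` and `get`, the `ttl_seconds` and `clock` parameters, and the choice to return `None` on a miss (rather than raising `CacheNotFoundError`) are not taken from the planned implementation.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheService:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on access.
            del self._entries[key]
            return None
        return value


# A controllable clock makes expiry testable without sleeping.
now = [0.0]
cache = CacheService(ttl_seconds=60, clock=lambda: now[0])
cache.set("doc_123", {"filename": "research.pdf"})
fresh = cache.get("doc_123")
now[0] = 61.0
expired = cache.get("doc_123")
```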
Abstract data access patterns:
- DocumentRepository - Document storage/retrieval
- SessionRepository - Chat session management
- Multiple implementations (InMemory, Redis, Database)
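A repository abstraction with interchangeable backends might be sketched as below. The method names `save` and `get` and the class name `InMemoryDocumentRepository` are assumptions; only `DocumentRepository` and the idea of an in-memory implementation come from the list above. Documents are plain dicts here to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DocumentRepository(ABC):
    """Storage interface; callers depend on this, not on a concrete backend."""

    @abstractmethod
    def save(self, doc_id: str, document: dict) -> None:
        ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]:
        ...


class InMemoryDocumentRepository(DocumentRepository):
    """Dict-backed implementation, useful for tests and local development."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def save(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = document

    def get(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)


repo: DocumentRepository = InMemoryDocumentRepository()
repo.save("doc_123", {"filename": "research.pdf"})
found = repo.get("doc_123")
```

A Redis or database implementation would subclass the same interface, so calling code would not need to change.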
Update Flask and FastAPI to use new services:
- Dependency injection in routes
- Cleaner request handlers
- Consistent error handling
- Better code organization
- Unit tests for all services
- Integration tests for APIs
- API documentation updates
- Service documentation
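For the route-level items above (dependency injection, cleaner handlers, consistent error handling), the following framework-free sketch shows one way the pieces could fit. It deliberately uses neither Flask nor FastAPI: `handle_errors`, `get_document`, the local `NotFoundError` stand-in with its `status_code` attribute, and the response layout are all hypothetical.

```python
import functools
from typing import Any, Callable, Dict, Tuple


class NotFoundError(Exception):  # local stand-in for the project's exception
    status_code = 404


def handle_errors(handler: Callable[..., Any]) -> Callable[..., Tuple[Dict[str, Any], int]]:
    """Wrap a handler so every outcome becomes a (body, status) pair."""

    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Tuple[Dict[str, Any], int]:
        try:
            return {"success": True, "data": handler(*args, **kwargs)}, 200
        except Exception as exc:
            # Exceptions without a status_code attribute map to 500.
            status = getattr(exc, "status_code", 500)
            body = {
                "success": False,
                "error": type(exc).__name__,
                "message": str(exc),
            }
            return body, status

    return wrapper


@handle_errors
def get_document(doc_id: str, documents: Dict[str, dict]) -> dict:
    # The document store is injected as an argument instead of being a global.
    if doc_id not in documents:
        raise NotFoundError(f"Document {doc_id} not found")
    return documents[doc_id]


store = {"doc_123": {"filename": "research.pdf"}}
ok_body, ok_status = get_document("doc_123", store)
err_body, err_status = get_document("missing", store)
```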
| Aspect | Before | After |
|---|---|---|
| Configuration | Scattered globally | Centralized, validated |
| Data Models | Dictionaries | Type-safe dataclasses |
| Error Handling | Generic try/except | Structured hierarchy |
| Testing | Difficult, tightly coupled | Easy, loosely coupled |
| Code Duplication | High | Reduced (services layer) |
| Maintainability | Hard to modify | Easy to extend |
| Onboarding | Time-consuming | Clear structure |
| Type Safety | Minimal | Complete with mypy support |

Configuration:

    import logging

    from src.config import get_settings

    logger = logging.getLogger(__name__)
    settings = get_settings()

    # Access any setting
    pdf_max_size = settings.pdf.max_file_size
    embedding_model = settings.embedding.model_name
    api_port = settings.api.port
    groq_key = settings.groq_api_key

    # Validate settings
    is_valid, error = settings.validate()
    if not is_valid:
        raise RuntimeError(error)

    # Export for logging
    config_dict = settings.to_dict()  # No secrets!
    logger.info("Configuration: %s", config_dict)

Exceptions:

    from src.core.exceptions import (
        PDFExtractionError,
        DocumentNotFoundError,
        http_status_code,
    )

    # Raise with details
    raise PDFExtractionError(
        message="Failed to extract text from PDF",
        error_code="PDF_EXTRACTION_FAILED",
        details={"file": "document.pdf", "reason": "Corrupted file"},
    )

    # Handle with automatic API response
    # (hypothetical handler: the `return` needs an enclosing function)
    def handle_request():
        try:
            process_document()
        except DocumentNotFoundError as e:
            response = e.to_dict()
            status_code = http_status_code(e)  # Returns 404
            return response, status_code

Data models:

    import json
    import time

    from src.core.models import Document, APIResponse

    # Create domain objects
    doc = Document(
        id="doc_123",
        filename="research.pdf",
        chunks=chunks,
        embeddings=embeddings,
        created_at=time.time(),
        updated_at=time.time(),
        file_size=1024000,
    )

    # Convert to JSON with to_dict()
    response = APIResponse(
        success=True,
        data=doc.to_dict(),
        metadata={'operation': 'upload'},
    )

    # Easy JSON serialization
    json_str = json.dumps(response.to_dict())

Quick checks from the command line:

    # Test imports
    python -c "from src.config import get_settings; print(get_settings().to_dict())"
    # Test models
    python -c "from src.core.models import Document; print(Document.__doc__)"
    # Test exceptions (expected to end with a PDFExtractionError traceback)
    python -c "from src.core.exceptions import PDFExtractionError; raise PDFExtractionError('test')"

When ready to start Phase 2 (Service Layer):
- Create src/services/pdf_service.py with PDFService class
- Create src/services/embedding_service.py with EmbeddingService class
- Create src/services/llm_service.py with LLMService class
- Create src/services/cache_service.py with CacheService class
- Each service uses configuration from get_settings()
- Each service raises appropriate exceptions from src.core.exceptions
- Each service returns data models from src.core.models
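The three conventions in the list above (read configuration, raise project exceptions, return data models) could be combined in a service skeleton like the one below. This is a sketch under stated assumptions, not the planned `PDFService`: the `build_document` method, its parameters, and the simplified local stand-ins for `PDFConfig`, `Document` and `UnsupportedFileFormatError` are hypothetical, and real text extraction is left out.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


class UnsupportedFileFormatError(Exception):  # stand-in for src.core.exceptions
    pass


@dataclass
class PDFConfig:                 # stand-in for the settings.pdf section
    chunk_size: int = 500


@dataclass
class Document:                  # simplified stand-in for src.core.models.Document
    id: str
    filename: str
    chunks: List[str] = field(default_factory=list)


class PDFService:
    """Validates an upload and splits already-extracted text into chunks."""

    def __init__(self, config: PDFConfig) -> None:
        # Configuration is injected; in the project it would come from get_settings().pdf
        self._config = config

    def build_document(self, doc_id: str, filename: str, text: str) -> Document:
        if Path(filename).suffix.lower() != ".pdf":
            raise UnsupportedFileFormatError(f"Not a PDF: {filename}")
        size = self._config.chunk_size
        # Fixed-size character chunks; the last chunk may be shorter.
        chunks = [text[i:i + size] for i in range(0, len(text), size)]
        return Document(id=doc_id, filename=filename, chunks=chunks)


service = PDFService(PDFConfig(chunk_size=5))
doc = service.build_document("doc_123", "research.pdf", "hello world")
```

Callers receive a typed `Document` on success and a specific exception on failure, which keeps route handlers free of PDF-specific checks.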
Status: ✅ Phase 1 Complete - Foundation Ready for Services Layer