Port and enhance unified ai api#73
Conversation
…ister custom docs blueprint; keep Swagger disabled
…r transcribe endpoints
Co-authored-by: denizcan.uelker <denizcan.uelker@mercedes-benz.com>
|
Cursor Agent can help with this pull request. Just |
Reviewer's GuideThis PR refactors the secure API server to disable the default Swagger UI, register a custom docs blueprint, add lazy-loaded summarization and transcription services, expose new text/voice processing endpoints with corresponding API models, update the OpenAPI spec, and add integration tests for these extensions. Sequence diagram for the new /api/analyze/voice_journal endpointsequenceDiagram
actor User
participant API_Server
participant Transcriber
participant JournalAnalyzer
User->>API_Server: POST /api/analyze/voice_journal (audio file)
API_Server->>Transcriber: transcribe(audio)
Transcriber-->>API_Server: transcription result
API_Server->>JournalAnalyzer: analyze_journal(transcription.text)
JournalAnalyzer-->>API_Server: analysis result
API_Server-->>User: return journal analysis response
Class diagram for new and updated API models and servicesclassDiagram
class SummarizeRequest {
+string text
+string model
}
class SummarizeResponse {
+string summary
+object meta
}
class JournalRequest {
+string text
+bool generate_summary
+float emotion_threshold
}
class JournalResponse {
+object emotion_analysis
+object summary
+float processing_time_ms
+object pipeline_status
}
class TranscriptionResponse {
+string text
+string language
+float confidence
+float duration
+string audio_quality
+int word_count
+float speaking_rate
}
class BatchTranscriptionResponse {
+int total_files
+array results
}
class T5SummarizationModel {
+generate_summary(text, max_length, min_length)
}
class WhisperTranscriber {
+transcribe(audio_path)
}
SummarizeRequest --> SummarizeResponse
JournalRequest --> JournalResponse
WhisperTranscriber --> TranscriptionResponse
WhisperTranscriber --> BatchTranscriptionResponse
T5SummarizationModel --> SummarizeResponse
JournalResponse --> SummarizeResponse
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing Touches🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Summary of Changes
Hello @uelkerd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the unified AI API by introducing new core endpoints for advanced text and voice processing. It expands the API's functionality beyond basic emotion detection to include text summarization, detailed journal analysis, and audio transcription. Additionally, it addresses and resolves a critical issue with the Swagger documentation, ensuring that the API's capabilities are properly exposed and accessible.
Highlights
- Expanded AI API Capabilities: The API now includes new endpoints for text summarization, comprehensive journal analysis (combining emotion detection with summarization), and robust voice transcription capabilities for both single and batch audio files.
- Improved API Documentation: A custom Swagger UI has been implemented and integrated, resolving the persistent 500 error on the
/docsendpoint and providing functional API documentation. - Optimized Model Loading: Summarization and transcription models are now lazy-loaded, meaning they are only initialized when their respective endpoints are first called, optimizing server startup and resource utilization.
- Comprehensive Integration Tests: New integration tests have been added to ensure the stability and correctness of the newly introduced text and voice processing API endpoints.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
|
Here's the code health analysis summary for commits Analysis Summary
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant new functionality with text summarization and voice processing endpoints. The implementation is generally solid, with good use of lazy loading for models and a custom Swagger UI fix. My review focuses on improving API consistency, making error handling more robust, and addressing some placeholder values in the implementation. I've also added suggestions to enhance the OpenAPI specification for better client-side experience and a minor improvement in the new integration tests.
| summary_results = { | ||
| 'summary': summary_text, | ||
| 'key_emotions': [e.get('emotion') for e in (emotion_results.get('emotions') or [])[:1]], | ||
| 'compression_ratio': 0.5, |
There was a problem hiding this comment.
The compression_ratio is hardcoded to 0.5. This seems to be a placeholder value. It should be calculated dynamically based on the length of the original text and the generated summary to be meaningful.
| 'compression_ratio': 0.5, | |
| 'compression_ratio': len(summary_text) / len(safe_text) if len(safe_text) > 0 else 0, |
| 'summary': summary_text, | ||
| 'key_emotions': [e.get('emotion') for e in (emotion_results.get('emotions') or [])[:1]], | ||
| 'compression_ratio': 0.5, | ||
| 'emotional_tone': 'neutral' |
There was a problem hiding this comment.
The emotional_tone is hardcoded to 'neutral'. This appears to be a placeholder. This value should be derived from the emotion_analysis results, for example by selecting the top emotion or a combination of emotions.
| 'emotional_tone': 'neutral' | |
| 'emotional_tone': (emotion_results.get('emotions') or [{}])[0].get('emotion', 'neutral') |
| summary: | ||
| type: object | ||
| nullable: true |
There was a problem hiding this comment.
The summary object within JournalResponse is defined with type: object without specifying its properties. This makes the API contract less clear for consumers. It's better to define the full structure of the summary object for clarity and to enable better client-side validation.
summary:
type: object
nullable: true
properties:
summary:
type: string
description: The generated summary text.
key_emotions:
type: array
items:
type: string
description: Key emotions detected in the text.
compression_ratio:
type: number
format: float
description: The ratio of original text length to summary length.
emotional_tone:
type: string
description: The overall emotional tone of the summary.
| items: | ||
| type: object | ||
| properties: | ||
| index: | ||
| type: integer | ||
| success: | ||
| type: boolean | ||
| text: | ||
| type: string | ||
| nullable: true | ||
| language: | ||
| type: string | ||
| nullable: true | ||
| confidence: | ||
| type: number | ||
| format: float | ||
| nullable: true | ||
| error: | ||
| type: string | ||
| nullable: true |
There was a problem hiding this comment.
The schema for successful items in BatchTranscriptionResponse is inconsistent with TranscriptionResponse. It's missing fields like duration, audio_quality, word_count, and speaking_rate. To maintain consistency across the API, the batch response should provide the same level of detail for each successful transcription.
| except Exception as exc: | ||
| logger.warning("Summarizer unavailable: %s", exc) | ||
| _summarizer = None | ||
| return False |
There was a problem hiding this comment.
Catching a broad Exception can hide specific issues and make debugging harder. It's better to catch more specific exceptions, like ImportError if a dependency is missing, or RuntimeError for model loading issues. This also applies to the ensure_transcriber_loaded function.
| except Exception as exc: | |
| logger.warning("Summarizer unavailable: %s", exc) | |
| _summarizer = None | |
| return False | |
| except (ImportError, RuntimeError) as exc: | |
| logger.warning("Summarizer unavailable: %s", exc) | |
| _summarizer = None | |
| return False |
| try: | ||
| os.remove(audio_path) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception. This same issue exists in the transcribe_batch and analyze_voice_journal endpoints.
| try: | |
| os.remove(audio_path) | |
| except Exception: | |
| pass | |
| try: | |
| os.remove(audio_path) | |
| except Exception as e: | |
| logger.warning(f"Failed to remove temporary file {audio_path}: {e}") |
| temp_paths.append(p) | ||
| try: | ||
| r = _transcriber.transcribe(p) | ||
| results.append({'index': idx, 'success': True, 'text': r.text, 'language': r.language, 'confidence': r.confidence}) |
There was a problem hiding this comment.
The response for successful batch transcriptions is missing several fields that are available in the TranscriptionResult object and are returned by the single /transcribe endpoint (e.g., duration, audio_quality, word_count, speaking_rate). For consistency, these fields should be included in the batch response as well. This would also align the implementation with the suggested changes in openapi.yaml.
| results.append({'index': idx, 'success': True, 'text': r.text, 'language': r.language, 'confidence': r.confidence}) | |
| results.append({ | |
| 'index': idx, | |
| 'success': True, | |
| 'text': r.text, | |
| 'language': r.language, | |
| 'confidence': r.confidence, | |
| 'duration': r.duration, | |
| 'audio_quality': r.audio_quality, | |
| 'word_count': r.word_count, | |
| 'speaking_rate': r.speaking_rate | |
| }) |
| try: | ||
| os.remove(p) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception.
try:
os.remove(p)
except Exception as e:
logger.warning(f"Failed to remove temporary file {p}: {e}")| try: | ||
| os.remove(audio_path) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception.
try:
os.remove(audio_path)
except Exception as e:
logger.warning(f"Failed to remove temporary file {audio_path}: {e}")| try: | ||
| fh.close() | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The except Exception: pass silently ignores potential errors when closing file handles. While this is a test file, it's still good practice to handle exceptions explicitly, for example by logging them, to avoid hiding potential issues during test runs.
| try: | |
| fh.close() | |
| except Exception: | |
| pass | |
| try: | |
| fh.close() | |
| except Exception as e: | |
| print(f"Warning: failed to close file handle for {name}: {e}") |
…cs blueprint defaults
…tests for lazy loaders and error paths
Co-authored-by: denizcan.uelker <denizcan.uelker@mercedes-benz.com>
Resolved issues in the following files with DeepSource Autofix: 1. tests/integration/conftest.py 2. tests/integration/test_secure_api_extensions.py 3. tests/unit/test_lazy_loaders.py
Implement core text and voice processing API endpoints and fix Swagger documentation.
This PR significantly expands the API's capabilities beyond basic emotion detection by integrating text summarization, journal analysis, and voice transcription features. It also resolves the persistent 500 error on the
/docsendpoint by serving a custom, functional Swagger UI.Summary by Sourcery
Port and enhance unified AI API by integrating text summarization, journal analysis, and voice transcription endpoints with lazy model loading, custom Swagger UI, and updated OpenAPI spec and integration tests.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests: