Port and enhance unified ai api #73

Draft
uelkerd wants to merge 12 commits into main from cursor/port-and-enhance-unified-ai-api-03a1

uelkerd (Owner) commented Aug 12, 2025

Implement core text and voice processing API endpoints and fix Swagger documentation.

This PR significantly expands the API's capabilities beyond basic emotion detection by integrating text summarization, journal analysis, and voice transcription features. It also resolves the persistent 500 error on the /docs endpoint by serving a custom, functional Swagger UI.


Summary by Sourcery

Port and enhance unified AI API by integrating text summarization, journal analysis, and voice transcription endpoints with lazy model loading, custom Swagger UI, and updated OpenAPI spec and integration tests.

New Features:

  • Add /api/summarize endpoint for text summarization
  • Add /api/analyze/journal endpoint for combined emotion detection and optional summarization
  • Add /api/transcribe and /api/transcribe_batch endpoints for single and batch audio transcription
  • Add /api/analyze/voice_journal endpoint to transcribe audio and perform full journal analysis

Bug Fixes:

  • Fix persistent 500 error on /docs by disabling default Swagger UI and registering a custom docs blueprint

Enhancements:

  • Implement lazy loading for T5 summarizer and Whisper transcriber to defer model imports and initialization

Documentation:

  • Extend openapi.yaml with new paths and request/response schemas for summarization, journal analysis, and transcription APIs

Tests:

  • Add integration tests for summarization, journal analysis, single and batch transcription endpoints

sourcery-ai Bot (Contributor) commented Aug 12, 2025

Reviewer's Guide

This PR refactors the secure API server to disable the default Swagger UI, register a custom docs blueprint, add lazy-loaded summarization and transcription services, expose new text/voice processing endpoints with corresponding API models, update the OpenAPI spec, and add integration tests for these extensions.

Sequence diagram for the new /api/analyze/voice_journal endpoint

sequenceDiagram
    actor User
    participant API_Server
    participant Transcriber
    participant JournalAnalyzer
    User->>API_Server: POST /api/analyze/voice_journal (audio file)
    API_Server->>Transcriber: transcribe(audio)
    Transcriber-->>API_Server: transcription result
    API_Server->>JournalAnalyzer: analyze_journal(transcription.text)
    JournalAnalyzer-->>API_Server: analysis result
    API_Server-->>User: return journal analysis response
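The flow in the sequence diagram can be sketched as a small function that chains the two stages. This is a minimal illustration only: the `Transcription` class, the function signature, and the stub services are hypothetical stand-ins, not the code in secure_api_server.py.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Transcription:
    """Hypothetical stand-in for the transcriber's result object."""
    text: str
    language: str


def analyze_voice_journal(
    audio_path: str,
    transcribe: Callable[[str], Transcription],
    analyze_journal: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Transcribe the audio first, then run journal analysis on the text."""
    transcription = transcribe(audio_path)
    analysis = analyze_journal(transcription.text)
    # Return both stages so the caller can see which text was analyzed.
    return {
        "transcription": {
            "text": transcription.text,
            "language": transcription.language,
        },
        "analysis": analysis,
    }


# Stub services so the sketch runs without any real models.
def fake_transcribe(path: str) -> Transcription:
    return Transcription(text="I felt calm today.", language="en")


def fake_analyze(text: str) -> Dict[str, Any]:
    return {"word_count": len(text.split())}


result = analyze_voice_journal("entry.wav", fake_transcribe, fake_analyze)
```

Passing the two services in as callables keeps the chaining logic testable without loading Whisper or T5.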

Class diagram for new and updated API models and services

classDiagram
    class SummarizeRequest {
      +string text
      +string model
    }
    class SummarizeResponse {
      +string summary
      +object meta
    }
    class JournalRequest {
      +string text
      +bool generate_summary
      +float emotion_threshold
    }
    class JournalResponse {
      +object emotion_analysis
      +object summary
      +float processing_time_ms
      +object pipeline_status
    }
    class TranscriptionResponse {
      +string text
      +string language
      +float confidence
      +float duration
      +string audio_quality
      +int word_count
      +float speaking_rate
    }
    class BatchTranscriptionResponse {
      +int total_files
      +array results
    }
    class T5SummarizationModel {
      +generate_summary(text, max_length, min_length)
    }
    class WhisperTranscriber {
      +transcribe(audio_path)
    }
    SummarizeRequest --> SummarizeResponse
    JournalRequest --> JournalResponse
    WhisperTranscriber --> TranscriptionResponse
    WhisperTranscriber --> BatchTranscriptionResponse
    T5SummarizationModel --> SummarizeResponse
    JournalResponse --> SummarizeResponse

File-Level Changes

Change: Configure custom Swagger UI and disable default docs
Details:
  • Import and conditionally register a docs_blueprint with fallback to None
  • Set OPENAPI_SPEC_PATH and OPENAPI_ALLOWED_DIR when unset
  • Disable Flask-RESTX default Swagger docs to prevent 500 errors
Files: deployment/cloud-run/secure_api_server.py

Change: Add lazy-loading mechanisms for summarization and transcription models
Details:
  • Define ensure_summarizer_loaded with thread lock and dynamic import of T5SummarizationModel
  • Define ensure_transcriber_loaded with thread lock and dynamic import of WhisperTranscriber
  • Introduce global variables and locks for both models
Files: deployment/cloud-run/secure_api_server.py

Change: Implement new text and voice processing API endpoints
Details:
  • Define RESTX models for SummarizeRequest/Response and JournalRequest/Response
  • Implement /summarize and /analyze/journal endpoints with sanitization, rate limiting, and error handling
  • Add _save_upload helper and implement /transcribe and /transcribe_batch endpoints
  • Implement /analyze/voice_journal endpoint by chaining transcription and journal analysis
Files: deployment/cloud-run/secure_api_server.py

Change: Expand OpenAPI specification with new paths and schemas
Details:
  • Add OpenAPI path entries for summarization, journal analysis, transcription, batch transcription, and voice journal analysis
  • Define new components schemas: SummarizeRequest/Response, JournalRequest/Response, TranscriptionResponse, BatchTranscriptionResponse
Files: deployment/cloud-run/openapi.yaml

Change: Add integration tests for the extended API
Details:
  • Create pytest-based smoke tests for /api/summarize and /api/analyze/journal
  • Add file-based tests for /api/transcribe and /api/transcribe_batch endpoints
Files: tests/integration/test_secure_api_extensions.py

coderabbitai Bot commented Aug 12, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

gemini-code-assist Bot (Contributor) left a comment

Summary of Changes

Hello @uelkerd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the unified AI API by introducing new core endpoints for advanced text and voice processing. It expands the API's functionality beyond basic emotion detection to include text summarization, detailed journal analysis, and audio transcription. Additionally, it addresses and resolves a critical issue with the Swagger documentation, ensuring that the API's capabilities are properly exposed and accessible.

Highlights

  • Expanded AI API Capabilities: The API now includes new endpoints for text summarization, comprehensive journal analysis (combining emotion detection with summarization), and robust voice transcription capabilities for both single and batch audio files.
  • Improved API Documentation: A custom Swagger UI has been implemented and integrated, resolving the persistent 500 error on the /docs endpoint and providing functional API documentation.
  • Optimized Model Loading: Summarization and transcription models are now lazy-loaded, meaning they are only initialized when their respective endpoints are first called, optimizing server startup and resource utilization.
  • Comprehensive Integration Tests: New integration tests have been added to ensure the stability and correctness of the newly introduced text and voice processing API endpoints.

deepsource-io Bot (Contributor) commented Aug 12, 2025

Here's the code health analysis summary for commits 69ddfe4..cb9c733. View details on DeepSource ↗.

Analysis Summary

| Analyzer | Status | Summary |
| --- | --- | --- |
| Test coverage | ⚠️ Artifact not reported | Timed out: Artifact was never reported |
| Python | ❌ Failure | ❗ 100 occurrences introduced |
| Terraform | ✅ Success | |
| Secrets | ✅ Success | |
| Shell | ✅ Success | |
| Docker | ✅ Success | |

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces significant new functionality with text summarization and voice processing endpoints. The implementation is generally solid, with good use of lazy loading for models and a custom Swagger UI fix. My review focuses on improving API consistency, making error handling more robust, and addressing some placeholder values in the implementation. I've also added suggestions to enhance the OpenAPI specification for better client-side experience and a minor improvement in the new integration tests.

    summary_results = {
        'summary': summary_text,
        'key_emotions': [e.get('emotion') for e in (emotion_results.get('emotions') or [])[:1]],
        'compression_ratio': 0.5,

high

The compression_ratio is hardcoded to 0.5. This seems to be a placeholder value. It should be calculated dynamically based on the length of the original text and the generated summary to be meaningful.

Suggested change
-   'compression_ratio': 0.5,
+   'compression_ratio': len(summary_text) / len(safe_text) if len(safe_text) > 0 else 0,
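One way to keep this calculation out of the response literal is a small helper. The sketch below follows the suggestion in measuring length in characters; the function name is hypothetical, and a word- or token-based ratio would be an equally reasonable choice.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Return summary length divided by original length, in characters.

    Returns 0.0 when the original text is empty, to avoid dividing by zero.
    """
    if not original:
        return 0.0
    return len(summary) / len(original)


ratio = compression_ratio("abcdefghij", "abcde")  # 5 / 10
```

With this definition, smaller values mean stronger compression.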

        'summary': summary_text,
        'key_emotions': [e.get('emotion') for e in (emotion_results.get('emotions') or [])[:1]],
        'compression_ratio': 0.5,
        'emotional_tone': 'neutral'

high

The emotional_tone is hardcoded to 'neutral'. This appears to be a placeholder. This value should be derived from the emotion_analysis results, for example by selecting the top emotion or a combination of emotions.

Suggested change
-   'emotional_tone': 'neutral'
+   'emotional_tone': (emotion_results.get('emotions') or [{}])[0].get('emotion', 'neutral')
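The one-liner above takes the first entry, which is only the top emotion if the list is already sorted. A slightly more defensive sketch picks the highest-scoring entry explicitly. The `confidence` key and the function name are assumptions here, since the shape of each emotion entry is not shown in this thread.

```python
from typing import Any, Dict, List, Optional


def dominant_emotion(emotion_results: Optional[Dict[str, Any]],
                     default: str = "neutral") -> str:
    """Return the label of the highest-scoring emotion, or `default`."""
    emotions: List[Dict[str, Any]] = (emotion_results or {}).get("emotions") or []
    if not emotions:
        return default
    # Assumption: each entry looks like {"emotion": str, "confidence": float}.
    top = max(emotions, key=lambda entry: entry.get("confidence", 0.0))
    return top.get("emotion") or default


tone = dominant_emotion({
    "emotions": [
        {"emotion": "joy", "confidence": 0.4},
        {"emotion": "sadness", "confidence": 0.9},
    ]
})
```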

Comment on lines +443 to +445
    summary:
      type: object
      nullable: true

medium

The summary object within JournalResponse is defined with type: object without specifying its properties. This makes the API contract less clear for consumers. It's better to define the full structure of the summary object for clarity and to enable better client-side validation.

        summary:
          type: object
          nullable: true
          properties:
            summary:
              type: string
              description: The generated summary text.
            key_emotions:
              type: array
              items:
                type: string
              description: Key emotions detected in the text.
            compression_ratio:
              type: number
              format: float
              description: The ratio of summary length to original text length.
            emotional_tone:
              type: string
              description: The overall emotional tone of the summary.

Comment on lines +481 to +500
    items:
      type: object
      properties:
        index:
          type: integer
        success:
          type: boolean
        text:
          type: string
          nullable: true
        language:
          type: string
          nullable: true
        confidence:
          type: number
          format: float
          nullable: true
        error:
          type: string
          nullable: true

medium

The schema for successful items in BatchTranscriptionResponse is inconsistent with TranscriptionResponse. It's missing fields like duration, audio_quality, word_count, and speaking_rate. To maintain consistency across the API, the batch response should provide the same level of detail for each successful transcription.

Comment on lines +162 to +165
    except Exception as exc:
        logger.warning("Summarizer unavailable: %s", exc)
        _summarizer = None
        return False

medium

Catching a broad Exception can hide specific issues and make debugging harder. It's better to catch more specific exceptions, like ImportError if a dependency is missing, or RuntimeError for model loading issues. This also applies to the ensure_transcriber_loaded function.

Suggested change
-   except Exception as exc:
-       logger.warning("Summarizer unavailable: %s", exc)
-       _summarizer = None
-       return False
+   except (ImportError, RuntimeError) as exc:
+       logger.warning("Summarizer unavailable: %s", exc)
+       _summarizer = None
+       return False

Comment on lines +369 to +372
    try:
        os.remove(audio_path)
    except Exception:
        pass

medium

The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception. This same issue exists in the transcribe_batch and analyze_voice_journal endpoints.

Suggested change
-   try:
-       os.remove(audio_path)
-   except Exception:
-       pass
+   try:
+       os.remove(audio_path)
+   except Exception as e:
+       logger.warning(f"Failed to remove temporary file {audio_path}: {e}")
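Because the same cleanup appears in three endpoints, one option is to factor it into a shared helper. The sketch below is an illustration under that assumption: the helper name is hypothetical, it narrows the catch to OSError (the base class for the filesystem errors os.remove raises), and it treats an already-missing file as success.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def remove_temp_file(path):
    """Delete a temporary file; log a warning instead of raising on failure."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return True  # already gone, nothing left to clean up
    except OSError as exc:
        logger.warning("Failed to remove temporary file %s: %s", path, exc)
        return False


# Usage: create a throwaway file, then clean it up.
fd, tmp_path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
removed = remove_temp_file(tmp_path)
```

Each `finally` block would then reduce to a single call such as `remove_temp_file(audio_path)`.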

    temp_paths.append(p)
    try:
        r = _transcriber.transcribe(p)
        results.append({'index': idx, 'success': True, 'text': r.text, 'language': r.language, 'confidence': r.confidence})

medium

The response for successful batch transcriptions is missing several fields that are available in the TranscriptionResult object and are returned by the single /transcribe endpoint (e.g., duration, audio_quality, word_count, speaking_rate). For consistency, these fields should be included in the batch response as well. This would also align the implementation with the suggested changes in openapi.yaml.

Suggested change
-   results.append({'index': idx, 'success': True, 'text': r.text, 'language': r.language, 'confidence': r.confidence})
+   results.append({
+       'index': idx,
+       'success': True,
+       'text': r.text,
+       'language': r.language,
+       'confidence': r.confidence,
+       'duration': r.duration,
+       'audio_quality': r.audio_quality,
+       'word_count': r.word_count,
+       'speaking_rate': r.speaking_rate
+   })
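To keep the single and batch payloads from drifting apart again, both endpoints could build their per-transcription fields from one function. The following is a sketch under assumptions: the helper names are hypothetical, the field list is taken from the TranscriptionResponse model in the reviewer's guide, and a SimpleNamespace stands in for the real result object.

```python
from types import SimpleNamespace

# Field names taken from the TranscriptionResponse model.
TRANSCRIPTION_FIELDS = (
    "text", "language", "confidence", "duration",
    "audio_quality", "word_count", "speaking_rate",
)


def transcription_to_dict(result):
    """Single source of truth for the per-transcription payload."""
    return {name: getattr(result, name, None) for name in TRANSCRIPTION_FIELDS}


def batch_item(index, result=None, error=None):
    """Build one entry of the batch `results` array."""
    if error is not None:
        return {"index": index, "success": False, "error": str(error)}
    return {"index": index, "success": True, **transcription_to_dict(result)}


# Stand-in for a real transcription result.
fake = SimpleNamespace(
    text="hello there world", language="en", confidence=0.9, duration=1.5,
    audio_quality="good", word_count=3, speaking_rate=120.0,
)
ok_item = batch_item(0, fake)
failed_item = batch_item(1, error=ValueError("bad audio"))
```

The single /transcribe handler could return `transcription_to_dict(result)` directly, so adding a field later only touches the tuple.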

Comment on lines +416 to +419
    try:
        os.remove(p)
    except Exception:
        pass

medium

The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception.

                    try:
                        os.remove(p)
                    except Exception as e:
                        logger.warning(f"Failed to remove temporary file {p}: {e}")

Comment on lines +449 to +452
    try:
        os.remove(audio_path)
    except Exception:
        pass

medium

The except Exception: pass in the finally block silently ignores any errors during file cleanup. This can lead to temporary files being left behind without any notification. It's better to log the exception.

                try:
                    os.remove(audio_path)
                except Exception as e:
                    logger.warning(f"Failed to remove temporary file {audio_path}: {e}")

Comment on lines +90 to +93
    try:
        fh.close()
    except Exception:
        pass

medium

The except Exception: pass silently ignores potential errors when closing file handles. While this is a test file, it's still good practice to handle exceptions explicitly, for example by logging them, to avoid hiding potential issues during test runs.

Suggested change
-   try:
-       fh.close()
-   except Exception:
-       pass
+   try:
+       fh.close()
+   except Exception as e:
+       print(f"Warning: failed to close file handle for {name}: {e}")
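An alternative that avoids manual close calls altogether is to let a context manager own the handles. The sketch below uses contextlib.ExitStack from the standard library; the helper name is hypothetical and the in-memory BytesIO objects are stand-ins for the test's real audio files.

```python
import io
from contextlib import ExitStack


def open_all(stack, openers):
    """Open every handle through `stack` so all are closed when it exits."""
    return [stack.enter_context(opener()) for opener in openers]


with ExitStack() as stack:
    # In a real test each opener would be something like
    # `lambda: open(path, "rb")`.
    handles = open_all(stack, [lambda: io.BytesIO(b"a"), lambda: io.BytesIO(b"b")])
    payload = [fh.read() for fh in handles]

all_closed = all(fh.closed for fh in handles)
```

ExitStack closes the handles in reverse order even if the request inside the `with` block raises, so no try/except around close is needed.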

@uelkerd uelkerd self-assigned this Aug 12, 2025
Resolved issues in the following files with DeepSource Autofix:
1. tests/integration/conftest.py
2. tests/integration/test_secure_api_extensions.py
3. tests/unit/test_lazy_loaders.py