flexible api: explicit timeouts + no PII in errors by uelkerd · Pull Request #57 · uelkerd/SAMO--DL

uelkerd · 2025-08-10T23:03:40Z

Summary by Sourcery

Add a Flask-based flexible emotion detection API server supporting serverless, endpoint, and local deployments with explicit timeouts and PII-safe error handling

New Features:

Implement a Flask server offering single (/predict) and batch (/predict_batch) emotion prediction endpoints along with health and documentation routes
Support serverless HuggingFace Inference API, paid inference endpoints, and local model deployments through a unified detector class

Enhancements:

Introduce environment-configurable timeouts and retry strategy for remote API calls
Redact input text in error logs to avoid exposing PII
Add secure default host binding and runtime security warnings for all-interfaces requests

Summary by CodeRabbit

New Features
- Introduced a flexible emotion detection API server with multiple deployment options, including serverless, paid endpoints, and local models.
- Added endpoints for health checks, single and batch emotion prediction, and API documentation.
- Enhanced error handling and security warnings for safer deployment.
- Provided structured JSON responses with emotion predictions and metadata.
- Included example usage instructions and curl commands for easier testing.

coderabbitai · 2025-08-10T23:03:46Z

Walkthrough

A new Flask-based API server for emotion detection has been implemented. It supports three HuggingFace deployment strategies—serverless inference API, paid inference endpoints, and local self-hosted models—configurable via environment variables. The server exposes endpoints for health checks, single and batch predictions, and API documentation, with robust error handling, logging, and security guidance.

Changes

Cohort / File(s)	Change Summary
Flexible Emotion Detection API Server `deployment/flexible_api_server.py`	Introduced a configurable Flask API server for emotion detection supporting HuggingFace serverless, endpoints, and local models. Added the `FlexibleEmotionDetector` class, multiple prediction methods, error handling, logging, and endpoints for health, single/batch prediction, and documentation.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FlaskAPI
    participant FlexibleEmotionDetector
    participant HuggingFaceAPI
    participant LocalModel

    Client->>FlaskAPI: POST /predict (text)
    FlaskAPI->>FlexibleEmotionDetector: predict(text)
    alt Serverless/Endpoint
        FlexibleEmotionDetector->>HuggingFaceAPI: Request prediction
        HuggingFaceAPI-->>FlexibleEmotionDetector: Return prediction
    else Local
        FlexibleEmotionDetector->>LocalModel: Predict emotion
        LocalModel-->>FlexibleEmotionDetector: Return prediction
    end
    FlexibleEmotionDetector-->>FlaskAPI: Prediction result
    FlaskAPI-->>Client: JSON response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

A bunny hopped to code today,
And built an API the flexible way!
Emotions detected, with models three—
Cloud, endpoint, or run locally.
Secure and robust, with logs that delight,
This fluffy release is working just right!
🐇✨

sourcery-ai · 2025-08-10T23:03:52Z

Reviewer's Guide

This PR introduces a new Flask-based emotion detection API server that supports serverless, endpoint, and local deployment strategies, with explicit timeout configuration, robust retry handling, and PII-safe error logging implemented within a single Python module.

Sequence diagram for /predict endpoint request flow

sequenceDiagram
    actor User
    participant FlaskApp as Flask App
    participant Detector as FlexibleEmotionDetector
    User->>FlaskApp: POST /predict {"text": "..."}
    FlaskApp->>Detector: predict(text)
    Detector->>Detector: Select deployment strategy
    alt Serverless
        Detector->>Detector: _predict_serverless(text)
    else Endpoint
        Detector->>Detector: _predict_endpoint(text)
    else Local
        Detector->>Detector: _predict_local(text)
    end
    Detector-->>FlaskApp: prediction result
    FlaskApp-->>User: JSON response

Sequence diagram for /predict_batch endpoint request flow

sequenceDiagram
    actor User
    participant FlaskApp as Flask App
    participant Detector as FlexibleEmotionDetector
    User->>FlaskApp: POST /predict_batch {"texts": ["...", ...]}
    FlaskApp->>Detector: predict(text) for each text
    Detector->>Detector: Select deployment strategy per text
    Detector-->>FlaskApp: prediction results (list)
    FlaskApp-->>User: JSON response with predictions

Class diagram for FlexibleEmotionDetector and DeploymentType

classDiagram
    class FlexibleEmotionDetector {
        - deployment_type: DeploymentType
        - model_name: str
        - hf_token: str
        - emotion_labels: List[str]
        - model: Any
        - tokenizer: Any
        - session: Any
        + __init__()
        + predict(text: str) Dict[str, Any]
        + get_status() Dict[str, Any]
        - _initialize()
        - _initialize_serverless()
        - _initialize_endpoint()
        - _initialize_local()
        - _predict_serverless(text: str) Dict[str, Any]
        - _predict_endpoint(text: str) Dict[str, Any]
        - _predict_local(text: str) Dict[str, Any]
        - _get_model_device_str() Optional[str]
    }
    class DeploymentType {
        SERVERLESS
        ENDPOINT
        LOCAL
    }
    FlexibleEmotionDetector --> DeploymentType

File-Level Changes

Change	Details	Files
Configurable request timeouts with user guidance	Added TIMEOUT_SECONDS environment variable for serverless and endpoint calls; applied timeout parameter on session.post requests; special handling of Timeout exceptions to return a friendly error and suggestion	`deployment/flexible_api_server.py`
Redacted input text in logs to avoid PII leaks	Logged only a 20-character preview of user text on errors; replaced raw text in exception handlers with redacted preview	`deployment/flexible_api_server.py`
Built-in HTTP retry strategy for serverless API calls	Created requests.Session with HTTPAdapter and Retry; configured retry on status codes [429, 500, 502, 503, 504] and specific HTTP methods	`deployment/flexible_api_server.py`
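
As a rough illustration of how the timeout and retry settings in the table above could be read in one place, here is a minimal sketch. `RemoteCallConfig` and `load_config` are hypothetical names that do not appear in the PR, and the 30-second default is an assumption; only `TIMEOUT_SECONDS` and the retry values (3 retries, backoff factor 1, the status list) come from the review text.

```python
import math
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCallConfig:
    """Hypothetical container for remote-call settings (not part of the PR)."""
    timeout_seconds: float = 30.0  # assumed default; the PR's default is not shown
    retry_total: int = 3
    backoff_factor: float = 1.0
    retry_statuses: tuple = (429, 500, 502, 503, 504)


def load_config(env=None) -> RemoteCallConfig:
    """Build the config from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    default = RemoteCallConfig.timeout_seconds
    try:
        timeout = float(env.get("TIMEOUT_SECONDS", default))
    except ValueError:
        timeout = default  # value was not a number
    # Reject zero, negative, NaN and infinite timeouts.
    if not (math.isfinite(timeout) and timeout > 0):
        timeout = default
    return RemoteCallConfig(timeout_seconds=timeout)
```

The resulting object could then be passed to whichever code builds the session, so that a test can substitute its own values without touching the environment.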

gemini-code-assist

Summary of Changes

Hello @uelkerd, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've added a new Flask-based API server that provides a flexible solution for emotion detection. This server can seamlessly integrate with various HuggingFace deployment options, from cost-effective serverless solutions to high-performance dedicated endpoints, or even local self-hosted models. A key focus of this change was to enhance the API's robustness by implementing explicit timeouts for external requests and improving privacy by ensuring that sensitive user data is not inadvertently logged during errors.

Highlights

Flexible Emotion Detection API Server: I've introduced a new Flask API server designed for emotion detection. This server is highly flexible, supporting three distinct HuggingFace deployment strategies: the free Serverless Inference API, paid Inference Endpoints, and self-hosted local models. This allows for adaptable deployment based on performance and cost requirements.
Explicit Timeouts and PII Redaction: To improve reliability and prevent hanging requests, I've implemented explicit timeouts for all external API calls made to HuggingFace. Additionally, I've focused on privacy by ensuring that Personal Identifiable Information (PII) is not exposed in error logs, redacting sensitive input text when logging prediction failures.
Robust Error Handling and Cold Start Management: The new server includes comprehensive error handling, particularly for the HuggingFace Serverless API, which can experience 'cold starts'. The system now intelligently retries requests if a 503 (Service Unavailable) status is received, allowing the model time to load. This makes the API more resilient to transient issues.
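
The redaction mentioned in the highlights above might look roughly like the following. This is a sketch under assumptions, not the PR's code: `redact_preview` is a hypothetical helper, and the 20-character limit is taken from the Sourcery reviewer's guide.

```python
def redact_preview(text, limit: int = 20) -> str:
    """Return a short, single-line preview of `text` suitable for error logs."""
    if not isinstance(text, str):
        return "<non-text input>"
    # Keep only the first `limit` characters and flatten line breaks.
    preview = text[:limit].replace("\n", " ").replace("\r", " ")
    suffix = "..." if len(text) > limit else ""
    return f"{preview}{suffix} ({len(text)} chars)"
```

A call such as `logger.error("Prediction failed for input: %s", redact_preview(text))` would then log the preview instead of the full text. Even a short preview can contain personal data, so logging only the length may be the safer choice.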

deepsource-io · 2025-08-10T23:03:55Z

Here's the code health analysis summary for commits 70cb2f6..ae92171. View details on DeepSource ↗.

Analysis Summary

Analyzer	Status	Summary	Link
Test coverage	⚠️ Artifact not reported	Timed out: Artifact was never reported	View Check ↗
Python	❌ Failure	❗ 51 occurrences introduced	View Check ↗
Terraform	✅ Success		View Check ↗
Secrets	✅ Success		View Check ↗
Shell	✅ Success		View Check ↗
Docker	✅ Success		View Check ↗

sourcery-ai

Hey @uelkerd - I've reviewed your changes - here's some feedback:

Blocking issues:

time.sleep() call; did you mean to leave this in?

General comments:

There’s a lot of duplicated logic in _predict_serverless and _predict_endpoint for parsing and formatting results—consider extracting a shared helper to reduce repetition.
This file mixes CLI startup, Flask routes, and deployment logic into one large script—splitting into modules (e.g. routes, model client, config) would improve readability and maintainability.
Configuration values like timeouts and retry settings are scattered across methods—centralizing them into a single config object or class would make adjustments and testing much easier.
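
One way the shared helper from the first comment could look is sketched below. `format_prediction` is a hypothetical name, the input shape (a list of `{"label", "score"}` dicts, optionally nested one level) is an assumption about what the remote API returns, and the `confidence` key is illustrative rather than taken from the PR.

```python
from typing import Any, Dict


def format_prediction(raw: Any, deployment_type: str) -> Dict[str, Any]:
    """Normalize a label/score list into one response dict (hypothetical helper)."""
    if not isinstance(raw, list):
        return {"error": "Unexpected response format", "deployment_type": deployment_type}
    # Assumed shapes: [[{"label": ..., "score": ...}, ...]] or the flat inner list.
    items = raw[0] if raw and isinstance(raw[0], list) else raw
    if not items:
        return {"error": "Empty prediction response", "deployment_type": deployment_type}
    scores = {item["label"]: float(item["score"]) for item in items}
    top = max(scores, key=scores.get)
    return {
        "emotion": top,
        "confidence": scores[top],
        "all_emotions": scores,
        "deployment_type": deployment_type,
    }
```

Both `_predict_serverless` and `_predict_endpoint` could then delegate to this function after decoding the JSON body, leaving only the request logic specific to each method.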

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic in _predict_serverless and _predict_endpoint for parsing and formatting results—consider extracting a shared helper to reduce repetition.
- This file mixes CLI startup, Flask routes, and deployment logic into one large script—splitting into modules (e.g. routes, model client, config) would improve readability and maintainability.
- Configuration values like timeouts and retry settings are scattered across methods—centralizing them into a single config object or class would make adjustments and testing much easier.

## Individual Comments

### Comment 1
<location> `deployment/flexible_api_server.py:39` </location>
<code_context>
+    def __init__(self):
+        """Initialize based on environment configuration."""
+        self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless'))
+        self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')
+        self.hf_token = os.getenv('HF_TOKEN')
+
</code_context>

<issue_to_address>
Default model name is a placeholder and may cause runtime errors.

If MODEL_NAME is not set, loading the model may fail. Add a check to raise an explicit error when MODEL_NAME is missing or still set to the placeholder.
</issue_to_address>

### Comment 2
<location> `deployment/flexible_api_server.py:81` </location>
<code_context>
+        from requests.adapters import HTTPAdapter
+        from requests.packages.urllib3.util.retry import Retry
+
+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[429, 500, 502, 503, 504],
+            allowed_methods={"POST", "GET", "PUT", "PATCH"}  # Enable retries for these HTTP methods
+        )
+        adapter = HTTPAdapter(max_retries=retry_strategy)
</code_context>

<issue_to_address>
allowed_methods should be a list or tuple for compatibility.

Using a set for allowed_methods may lead to compatibility issues; please use a list or tuple instead.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods={"POST", "GET", "PUT", "PATCH"}  # Enable retries for these HTTP methods
        )
=======
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["POST", "GET", "PUT", "PATCH"]  # Enable retries for these HTTP methods
        )
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `deployment/flexible_api_server.py:294` </location>
<code_context>
+            predicted_class = predicted_class.cpu()
+
+            # Get emotion label
+            if hasattr(self.model.config, 'id2label'):
+                emotion = self.model.config.id2label[predicted_class.item()]
+            else:
+                emotion = self.emotion_labels[predicted_class.item()]
</code_context>

<issue_to_address>
Potential KeyError if id2label does not contain predicted_class index.

Use .get() with a default value or check if the index exists before accessing id2label to avoid a KeyError.
</issue_to_address>

### Comment 4
<location> `deployment/flexible_api_server.py:304` </location>
<code_context>
+
+            # Get all emotion probabilities
+            all_emotions = {}
+            for i, prob in enumerate(probabilities[0]):
+                if hasattr(self.model.config, 'id2label'):
+                    label = self.model.config.id2label[i]
+                else:
+                    label = self.emotion_labels[i] if i < len(self.emotion_labels) else f"emotion_{i}"
+                all_emotions[label] = prob.item()
+
</code_context>

<issue_to_address>
Possible index error if emotion_labels is shorter than model output.

Consider adding a warning log when the fallback label is used to aid in identifying configuration mismatches.

Suggested implementation:

```python
            # Get all emotion probabilities
            all_emotions = {}
            for i, prob in enumerate(probabilities[0]):
                if hasattr(self.model.config, 'id2label'):
                    label = self.model.config.id2label[i]
                else:
                    if i < len(self.emotion_labels):
                        label = self.emotion_labels[i]
                    else:
                        label = f"emotion_{i}"
                        logging.warning(
                            f"Emotion label index {i} exceeds emotion_labels length ({len(self.emotion_labels)}). "
                            f"Using fallback label '{label}'. This may indicate a configuration mismatch."
                        )
                all_emotions[label] = prob.item()

```

```python
import os
import time
import logging
from typing import Dict, List, Optional, Any
from enum import Enum

```
</issue_to_address>

### Comment 5
<location> `deployment/flexible_api_server.py:389` </location>
<code_context>
+    try:
+        data = request.get_json()
+
+        if not data or 'text' not in data:
+            return jsonify({'error': 'No text provided'}), 400
+
</code_context>

<issue_to_address>
Error response for missing 'text' is clear, but could include expected format.

Including the expected request format in the error response will make it easier for clients to fix their requests.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        if not data or 'text' not in data:
            return jsonify({'error': 'No text provided'}), 400
=======
        if not data or 'text' not in data:
            return jsonify({
                'error': 'No text provided',
                'expected_format': {'text': '<your text here>'}
            }), 400
>>>>>>> REPLACE

</suggested_fix>

### Comment 6
<location> `deployment/flexible_api_server.py:564` </location>
<code_context>
+        print(f"   • Use {all_interfaces_env} only in production with firewall/proxy")
+        print("   • Never expose debug=True to external networks")
+
+    app.run(host=host, port=port, debug=debug)
</code_context>

<issue_to_address>
Flask debug mode should not be enabled in production.

Consider adding a check to prevent debug=True when the app is bound to all interfaces, to avoid accidental exposure.
</issue_to_address>

## Security Issues

### Issue 1
<location> `deployment/flexible_api_server.py:167` </location>

<issue_to_address>
**security (python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?

*Source: opengrep*
</issue_to_address>

sourcery-ai · 2025-08-10T23:04:39Z

+    def __init__(self):
+        """Initialize based on environment configuration."""
+        self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless'))
+        self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')


issue (bug_risk): Default model name is a placeholder and may cause runtime errors.

If MODEL_NAME is not set, loading the model may fail. Add a check to raise an explicit error when MODEL_NAME is missing or still set to the placeholder.

sourcery-ai · 2025-08-10T23:04:39Z

+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[429, 500, 502, 503, 504],
+            allowed_methods={"POST", "GET", "PUT", "PATCH"}  # Enable retries for these HTTP methods
+        )


suggestion: allowed_methods should be a list or tuple for compatibility.

Using a set for allowed_methods may lead to compatibility issues; please use a list or tuple instead.

Suggested change

-        retry_strategy = Retry(
-            total=3,
-            backoff_factor=1,
-            status_forcelist=[429, 500, 502, 503, 504],
-            allowed_methods={"POST", "GET", "PUT", "PATCH"}  # Enable retries for these HTTP methods
-        )
+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[429, 500, 502, 503, 504],
+            allowed_methods=["POST", "GET", "PUT", "PATCH"]  # Enable retries for these HTTP methods
+        )

sourcery-ai · 2025-08-10T23:04:39Z

+            if hasattr(self.model.config, 'id2label'):
+                emotion = self.model.config.id2label[predicted_class.item()]


issue: Potential KeyError if id2label does not contain predicted_class index.

Use .get() with a default value or check if the index exists before accessing id2label to avoid a KeyError.
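
A minimal sketch of the guarded lookup suggested above, assuming `id2label` is a mapping with integer keys. `resolve_label` is a hypothetical helper; the `emotion_{i}` fallback mirrors the one already used in the PR's probability loop.

```python
from typing import List, Mapping, Optional


def resolve_label(
    index: int,
    id2label: Optional[Mapping[int, str]],
    fallback_labels: List[str],
) -> str:
    """Map a predicted class index to a label without raising on gaps."""
    # Prefer the model's own mapping when it has an entry for this index.
    if id2label is not None and index in id2label:
        return id2label[index]
    # Otherwise fall back to the configured label list, if long enough.
    if 0 <= index < len(fallback_labels):
        return fallback_labels[index]
    # Last resort: a synthetic label, as in the PR's fallback.
    return f"emotion_{index}"
```

The same function could serve both the top-class lookup and the per-class probability loop, so the two code paths cannot drift apart.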

sourcery-ai · 2025-08-10T23:04:39Z

+            for i, prob in enumerate(probabilities[0]):
+                if hasattr(self.model.config, 'id2label'):
+                    label = self.model.config.id2label[i]
+                else:
+                    label = self.emotion_labels[i] if i < len(self.emotion_labels) else f"emotion_{i}"


suggestion: Possible index error if emotion_labels is shorter than model output.

Consider adding a warning log when the fallback label is used to aid in identifying configuration mismatches.

Suggested implementation:

            # Get all emotion probabilities
            all_emotions = {}
            for i, prob in enumerate(probabilities[0]):
                if hasattr(self.model.config, 'id2label'):
                    label = self.model.config.id2label[i]
                else:
                    if i < len(self.emotion_labels):
                        label = self.emotion_labels[i]
                    else:
                        label = f"emotion_{i}"
                        logging.warning(
                            f"Emotion label index {i} exceeds emotion_labels length ({len(self.emotion_labels)}). "
                            f"Using fallback label '{label}'. This may indicate a configuration mismatch."
                        )
                all_emotions[label] = prob.item()

import os
import time
import logging
from typing import Dict, List, Optional, Any
from enum import Enum

sourcery-ai · 2025-08-10T23:04:40Z

+        if not data or 'text' not in data:
+            return jsonify({'error': 'No text provided'}), 400


suggestion: Error response for missing 'text' is clear, but could include expected format.

Including the expected request format in the error response will make it easier for clients to fix their requests.

Suggested change

-        if not data or 'text' not in data:
-            return jsonify({'error': 'No text provided'}), 400
+        if not data or 'text' not in data:
+            return jsonify({
+                'error': 'No text provided',
+                'expected_format': {'text': '<your text here>'}
+            }), 400

sourcery-ai · 2025-08-10T23:04:40Z

+        print(f"   • Use {all_interfaces_env} only in production with firewall/proxy")
+        print("   • Never expose debug=True to external networks")
+
+    app.run(host=host, port=port, debug=debug)


🚨 suggestion (security): Flask debug mode should not be enabled in production.

Consider adding a check to prevent debug=True when the app is bound to all interfaces, to avoid accidental exposure.
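
The check proposed above could be sketched as follows. `effective_debug` and `ALL_INTERFACES` are hypothetical names, and treating `0.0.0.0` and `::` as the all-interfaces addresses is an assumption about how the server's `host` value is configured.

```python
import logging

logger = logging.getLogger(__name__)

# Addresses that bind the server to every network interface.
ALL_INTERFACES = {"0.0.0.0", "::"}


def effective_debug(host: str, debug_requested: bool) -> bool:
    """Return the debug flag to use, forcing it off on all-interfaces binds."""
    if debug_requested and host in ALL_INTERFACES:
        logger.warning(
            "Ignoring debug=True because host %s listens on all interfaces", host
        )
        return False
    return debug_requested
```

The startup line would then read `app.run(host=host, port=port, debug=effective_debug(host, debug))`, which keeps debug mode available for loopback development only.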

sourcery-ai · 2025-08-10T23:04:40Z

+            if response.status_code == 503:
+                # Model is loading (cold start)
+                logger.info("🔄 Model loading, waiting...")
+                time.sleep(10)  # Wait for model to load


security (python.lang.best-practice.arbitrary-sleep): time.sleep() call; did you mean to leave this in?

Source: opengrep
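
If the fixed `time.sleep(10)` is replaced rather than kept, a bounded backoff loop is one option. The sketch below is an illustration under assumptions: `post_with_cold_start_retry` is a hypothetical helper, `send` stands in for the `self.session.post(...)` call and is assumed to return a `(status_code, body)` pair, and the delay values are arbitrary.

```python
import time
from typing import Any, Callable, Tuple


def post_with_cold_start_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send()` until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            break  # success, or an error that waiting will not fix
        if attempt < max_attempts - 1:
            # Exponential backoff, capped so one request cannot stall for long.
            sleep(min(base_delay * (2 ** attempt), max_delay))
    return status, body
```

Passing `sleep` as a parameter lets tests run without waiting. The loop still blocks the handling worker while it sleeps, so the attempt count and delay cap should stay small.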

sourcery-ai · 2025-08-10T23:04:40Z

+        """Predict using local model."""
+        try:
+            # Tokenize input
+            inputs = self.tokenizer(


issue (code-quality): We've found these issues:

Extract code out into method (extract-method)

Replace if statement with if expression (assign-if-exp)

sourcery-ai · 2025-08-10T23:04:40Z

+        if 'error' in result:
+            return jsonify(result), 500
+        return jsonify(result)
+


suggestion (code-quality): We've found these issues:

Lift code into else after jump in control flow (reintroduce-else)

Replace if statement with if expression (assign-if-exp)

Suggested change

-        if 'error' in result:
-            return jsonify(result), 500
-        return jsonify(result)
+        return (jsonify(result), 500) if 'error' in result else jsonify(result)

gemini-code-assist

Code Review

This pull request introduces a new flexible API server for emotion detection, supporting multiple deployment strategies. The implementation is well-structured, but there are several areas for improvement. Key issues include a critical security vulnerability where raw exception details are exposed to clients, and a high-severity performance issue in the batch prediction endpoint which processes requests sequentially instead of in a true batch. Additionally, there are opportunities to improve maintainability by reducing code duplication, and to increase robustness by adding retry logic and validating configuration more strictly. My review provides specific suggestions to address these points.

gemini-code-assist · 2025-08-10T23:05:34Z

+    except Exception as e:
+        return jsonify({'error': str(e)}), 500


Exposing raw exception messages to the client is a security risk. It can leak sensitive information about the application's internal workings, such as library versions, file paths, or underlying error details. It's better to log the detailed exception for debugging and return a generic, user-friendly error message.

    except Exception as e:
        logger.error(f"An unexpected error occurred in /predict: {e}", exc_info=True)
        return jsonify({'error': 'An internal server error occurred.'}), 500

gemini-code-assist · 2025-08-10T23:05:34Z

+        results = []
+        for text in texts:
+            if text and text.strip():
+                result = detector.predict(text.strip())
+                results.append(result)


The batch prediction endpoint processes texts sequentially in a loop, calling detector.predict() for each text. This is highly inefficient and defeats the purpose of a batch endpoint, as it results in N separate predictions (and N API calls for serverless/endpoint modes) instead of a single batch operation. Both HuggingFace APIs and local transformer models are optimized for batch processing. You should add a predict_batch method to the FlexibleEmotionDetector class that is optimized for batching.

        texts_to_predict = [text.strip() for text in texts if text and text.strip()]
        if not texts_to_predict:
            return jsonify({'predictions': [], 'count': 0, 'deployment_type': detector.deployment_type.value})

        # Assumes a new `predict_batch` method is added to the detector for true batching
        results = detector.predict_batch(texts_to_predict)
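
The `predict_batch` method assumed in the suggestion above does not exist in the PR. One possible shape for it is sketched here as a standalone function: `batch_fn` is a hypothetical callable that performs a single batched inference (one remote call, or one forward pass of a local model) and returns one result per input, and the batch size of 16 is arbitrary.

```python
from typing import Any, Callable, Dict, List, Sequence


def predict_batch(
    texts: Sequence[Any],
    batch_fn: Callable[[List[str]], List[Dict[str, Any]]],
    batch_size: int = 16,
) -> List[Dict[str, Any]]:
    """Run `batch_fn` over cleaned texts in chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Drop non-strings and blank entries, as the endpoint already does.
    cleaned = [t.strip() for t in texts if isinstance(t, str) and t.strip()]
    results: List[Dict[str, Any]] = []
    for start in range(0, len(cleaned), batch_size):
        chunk = cleaned[start:start + batch_size]
        chunk_results = batch_fn(chunk)
        if len(chunk_results) != len(chunk):
            raise ValueError("batch_fn must return one result per input text")
        results.extend(chunk_results)
    return results
```

With this structure, N texts cost roughly N / `batch_size` inference calls instead of N, and the per-deployment differences stay inside `batch_fn`.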

gemini-code-assist · 2025-08-10T23:05:34Z

+    def __init__(self):
+        """Initialize based on environment configuration."""
+        self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless'))
+        self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')


The default MODEL_NAME is a placeholder value 'your-username/samo-dl-emotion-model'. If the MODEL_NAME environment variable is not set, the application will fail with a potentially confusing error from the HuggingFace library. It would be more robust to check for this placeholder and raise a clear ValueError if it's not been changed, improving the user experience on first run.

Suggested change

-        self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')
+        self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')
+        if 'your-username' in self.model_name:
+            raise ValueError("MODEL_NAME environment variable is not set or is using the default placeholder. Please set it to your HuggingFace model.")

gemini-code-assist · 2025-08-10T23:05:34Z

+        from requests.adapters import HTTPAdapter
+        from requests.packages.urllib3.util.retry import Retry


Imports should be at the top of the file as per PEP 8 guidelines. Placing them inside a function or method makes it harder to see the file's dependencies at a glance and defers any import error until the function is first called.

gemini-code-assist · 2025-08-10T23:05:34Z

+        self.headers = {"Authorization": f"Bearer {self.hf_token}"}
+
+        # Create session
+        self.session = requests.Session()


The requests.Session for the endpoint deployment type is initialized without a retry strategy. Inference Endpoints can also experience transient network issues or temporary unavailability. Adding a retry mechanism, similar to the one used for the serverless deployment, would make the client more resilient to these issues.

self.session = requests.Session()
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods={"POST", "GET"}
)
adapter = HTTPAdapter(max_retries=retry_strategy)
self.session.mount("http://", adapter)
self.session.mount("https://", adapter)

gemini-code-assist · 2025-08-10T23:05:34Z

+            if response.status_code == 503:
+                # Model is loading (cold start)
+                logger.info("🔄 Model loading, waiting...")
+                time.sleep(10)  # Wait for model to load
+                response = self.session.post(
+                    self.api_url, 
+                    headers=self.headers, 
+                    json=payload,
+                    timeout=timeout
+                )


The self.session.post call is duplicated to handle the 503 (model loading) status code. This makes the code harder to maintain. You can refactor this into a loop to avoid repetition.

for attempt in range(2):  # Try up to 2 times
    response = self.session.post(
        self.api_url,
        headers=self.headers,
        json=payload,
        timeout=timeout
    )
    if response.status_code != 503:
        break  # Success or other error
    if attempt == 0:  # If it's the first attempt and model is loading
        logger.info("🔄 Model loading, waiting...")
        time.sleep(10)  # Wait for model to load
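The same loop can be sketched as a reusable helper that takes the request as a callable, which also makes the cold-start handling testable without a network. This assumes a hypothetical zero-argument `post` callable wrapping the real `self.session.post(...)` call; `post_with_cold_start_retry`, `max_attempts`, `wait_seconds`, and `sleep` are illustrative names, not part of the PR.

```python
import time
from typing import Any, Callable


def post_with_cold_start_retry(
    post: Callable[[], Any],
    max_attempts: int = 2,
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `post()` until it returns a non-503 response or attempts run out.

    `post` is a hypothetical zero-argument callable wrapping the real
    request; the returned object only needs a `status_code` attribute.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    response = None
    for attempt in range(max_attempts):
        response = post()
        if response.status_code != 503:
            break  # Success, or an error that waiting will not fix.
        if attempt < max_attempts - 1:
            # Model is still loading; wait before the next attempt.
            sleep(wait_seconds)
    return response
```

One thing to keep in mind: if the session's retry adapter also lists 503 in `status_forcelist`, those retries happen first, inside the adapter, so this loop only sees a 503 when the adapter is configured to hand the response back.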

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (12)

deployment/flexible_api_server.py (12)

2-9: Docstring style fixes (D205, D212, D415) and trailing whitespace (W291).

Tighten the module docstring to meet Ruff pydocstyle rules and remove trailing whitespace.

-"""
-🚀 FLEXIBLE EMOTION DETECTION API SERVER
-========================================
-Supports multiple HuggingFace deployment strategies:
-- Serverless Inference API (free)
-- Inference Endpoints (paid)  
-- Self-hosted (local)
-"""
+"""Flexible emotion detection API server.
+
+Supports multiple HuggingFace deployment strategies:
+- Serverless Inference API (free)
+- Inference Endpoints (paid)
+- Self-hosted (local)
+"""

14-14: Use built-in generics and remove unused imports (UP006, F401).

Drop unused List.
Prefer builtin generics over typing.Dict for Python 3.9+.

-from typing import Dict, List, Optional, Any
+from typing import Optional, Any

And update signatures to builtins:

-def predict(self, text: str) -> Dict[str, Any]:
+def predict(self, text: str) -> dict[str, Any]:
@@
-def _predict_serverless(self, text: str) -> Dict[str, Any]:
+def _predict_serverless(self, text: str) -> dict[str, Any]:
@@
-def _predict_endpoint(self, text: str) -> Dict[str, Any]:
+def _predict_endpoint(self, text: str) -> dict[str, Any]:
@@
-def _predict_local(self, text: str) -> Dict[str, Any]:
+def _predict_local(self, text: str) -> dict[str, Any]:
@@
-def get_status(self) -> Dict[str, Any]:
+def get_status(self) -> dict[str, Any]:

Also applies to: 131-131, 150-150, 214-214, 261-261, 343-343

28-32: Add docstring to public Enum (D101).

Minor: document DeploymentType values for clarity and lint compliance.

 class DeploymentType(Enum):
+    """Supported deployment strategies for emotion detector."""
     SERVERLESS = "serverless"
     ENDPOINT = "endpoint"
     LOCAL = "local"

55-67: Add return annotation to initializer and handle unknown types explicitly (ANN202).

Small cleanups; also good to raise if an unsupported type appears (defensive programming).

-    def _initialize(self):
+    def _initialize(self) -> None:
@@
-        elif self.deployment_type == DeploymentType.LOCAL:
+        elif self.deployment_type == DeploymentType.LOCAL:
             self._initialize_local()
+        else:
+            raise ValueError(f"Unsupported deployment type: {self.deployment_type!r}")

93-108: Mirror retry config for endpoints (consistency) and add missing return annotation (ANN202).

Endpoint init currently lacks retry configuration. Consider reusing the same Session adapter strategy for consistent behavior, and annotate return type.

-    def _initialize_endpoint(self):
+    def _initialize_endpoint(self) -> None:
@@
-        # Create session
-        self.session = requests.Session()
+        # Create session with retry strategy similar to serverless
+        from requests.adapters import HTTPAdapter
+        from urllib3.util.retry import Retry
+        retry_strategy = Retry(
+            total=3,
+            backoff_factor=1,
+            status_forcelist=[429, 500, 502, 503, 504],
+            allowed_methods={"POST", "GET", "PUT", "PATCH"},
+            respect_retry_after_header=True,
+        )
+        adapter = HTTPAdapter(max_retries=retry_strategy)
+        self.session = requests.Session()
+        self.session.mount("http://", adapter)
+        self.session.mount("https://", adapter)

109-131: Add return annotation to local initializer (ANN202).

Nit: annotate and trim whitespace (W291 noted elsewhere).

-    def _initialize_local(self):
+    def _initialize_local(self) -> None:

293-301: Fix confidence indexing (minor correctness).

Indexing with a tensor returns a 1D tensor; use the scalar index for clarity.

-            # Get confidence
-            confidence = probabilities[0][predicted_class].item()
+            # Get confidence
+            predicted_idx = predicted_class.item()
+            confidence = probabilities[0, predicted_idx].item()

343-355: Compute a more accurate readiness flag.

Currently always True; reflect real readiness across modes.

     def get_status(self) -> dict[str, Any]:
         """Get detector status information."""
-        return {
+        ready = False
+        if self.deployment_type == DeploymentType.LOCAL:
+            ready = bool(self.model and self.tokenizer)
+        else:
+            ready = bool(self.session)
+        return {
             "deployment_type": self.deployment_type.value,
             "model_name": self.model_name,
             "emotion_labels": self.emotion_labels,
-            "ready": True,
+            "ready": ready,
             "config": {
                 "serverless_api": self.api_url if hasattr(self, 'api_url') else None,
                 "endpoint_url": self.endpoint_url if hasattr(self, 'endpoint_url') else None,
                 "local_device": self._get_model_device_str() if self.model else None,
             }
         }

366-366: Add return type annotations for Flask route functions (ANN201).

Use Flask’s ResponseReturnValue for accurate typing.

-from flask import Flask, request, jsonify
+from flask import Flask, request, jsonify
+from flask.typing import ResponseReturnValue
@@
-def health_check():
+def health_check() -> ResponseReturnValue:
@@
-def predict_emotion():
+def predict_emotion() -> ResponseReturnValue:
@@
-def predict_batch():
+def predict_batch() -> ResponseReturnValue:
@@
-def home():
+def home() -> ResponseReturnValue:

Also applies to: 379-379, 408-408, 441-441

193-196: Optional: avoid echoing full input text in successful responses.

While the PR focuses on errors, echoing request text can still propagate PII to downstream logs/clients. Consider omitting or replacing with metadata (length/hash).

-                    "text": text,
+                    # "text": text,  # Consider omitting to avoid PII propagation

Also applies to: 239-243, 315-319
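One way to replace the echoed text with metadata, as suggested above, is sketched here. `text_metadata` and its field names are hypothetical; the choice of SHA-256 and a 12-character prefix is an assumption for illustration only.

```python
import hashlib
from typing import Any


def text_metadata(text: str) -> dict[str, Any]:
    """Describe a text by length and digest instead of echoing it."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "text_length": len(text),
        # Truncated digest for correlating requests; short or common
        # inputs may still be guessable from it.
        "text_sha256_prefix": digest[:12],
    }
```

The response could then carry `**text_metadata(text)` in place of `"text": text`. A plain hash is not a strong privacy guarantee for short inputs, so omitting the field entirely remains the more conservative option.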

482-564: Replace prints with logger or silence lints (T201); handle S104 intentionally.

If Ruff T201 is enforced, convert prints to logger.info for consistency. Otherwise, add noqa to specific lines. Also explicitly mark the all-interfaces constant line as intentional to satisfy S104.

-if __name__ == '__main__':
-    print("🌐 Starting Flexible Emotion Detection API...")
-    print("=" * 60)
+if __name__ == '__main__':
+    logger.info("🌐 Starting Flexible Emotion Detection API...")
+    logger.info("=" * 60)
@@
-        print(f"📋 Deployment Type: {status['deployment_type'].upper()}")
-        print(f"🤖 Model: {status['model_name']}")
-        print(f"🎭 Emotions: {len(status['emotion_labels'])} classes")
+        logger.info("📋 Deployment Type: %s", status['deployment_type'].upper())
+        logger.info("🤖 Model: %s", status['model_name'])
+        logger.info("🎭 Emotions: %d classes", len(status['emotion_labels']))
@@
-    ALL_INTERFACES = '0.0.0.0'
+    ALL_INTERFACES = '0.0.0.0'  # noqa: S104
@@
-    print(f"\n🚀 Server starting on {server_url}")
+    logger.info("🚀 Server starting on %s", server_url)

If you prefer prints for CLI UX, consider adding # noqa: T201 on those lines instead.

62-62: Trim trailing whitespace flagged by Ruff (W291).

Minor; remove trailing spaces on the noted lines.

Also applies to: 465-465, 547-547, 158-159, 169-170, 266-269

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 70cb2f6 and ae92171.

📒 Files selected for processing (1)

deployment/flexible_api_server.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

deployment/flexible_api_server.py

2-9: 1 blank line required between summary line and description (D205)

2-9: Multi-line docstring summary should start at the first line. Remove whitespace after opening quotes (D212)

2-9: First line should end with a period, question mark, or exclamation point. Add closing punctuation (D415)

7-7: Trailing whitespace. Remove trailing whitespace (W291)

14-14: typing.List imported but unused. Remove unused import: typing.List (F401)

28-28: Missing docstring in public class (D101)

36-36: Missing return type annotation for special method __init__