flexible api: explicit timeouts + no PII in errors#57
Conversation
WalkthroughA new Flask-based API server for emotion detection has been implemented. It supports three HuggingFace deployment strategies—serverless inference API, paid inference endpoints, and local self-hosted models—configurable via environment variables. The server exposes endpoints for health checks, single and batch predictions, and API documentation, with robust error handling, logging, and security guidance. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant FlaskAPI
participant FlexibleEmotionDetector
participant HuggingFaceAPI
participant LocalModel
Client->>FlaskAPI: POST /predict (text)
FlaskAPI->>FlexibleEmotionDetector: predict(text)
alt Serverless/Endpoint
FlexibleEmotionDetector->>HuggingFaceAPI: Request prediction
HuggingFaceAPI-->>FlexibleEmotionDetector: Return prediction
else Local
FlexibleEmotionDetector->>LocalModel: Predict emotion
LocalModel-->>FlexibleEmotionDetector: Return prediction
end
FlexibleEmotionDetector-->>FlaskAPI: Prediction result
FlaskAPI-->>Client: JSON response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Poem
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Reviewer's GuideThis PR introduces a new Flask-based emotion detection API server that supports serverless, endpoint, and local deployment strategies, with explicit timeout configuration, robust retry handling, and PII-safe error logging implemented within a single Python module. Sequence diagram for /predict endpoint request flowsequenceDiagram
actor User
participant FlaskApp as Flask App
participant Detector as FlexibleEmotionDetector
User->>FlaskApp: POST /predict {"text": "..."}
FlaskApp->>Detector: predict(text)
Detector->>Detector: Select deployment strategy
alt Serverless
Detector->>Detector: _predict_serverless(text)
else Endpoint
Detector->>Detector: _predict_endpoint(text)
else Local
Detector->>Detector: _predict_local(text)
end
Detector-->>FlaskApp: prediction result
FlaskApp-->>User: JSON response
Sequence diagram for /predict_batch endpoint request flowsequenceDiagram
actor User
participant FlaskApp as Flask App
participant Detector as FlexibleEmotionDetector
User->>FlaskApp: POST /predict_batch {"texts": ["...", ...]}
FlaskApp->>Detector: predict(text) for each text
Detector->>Detector: Select deployment strategy per text
Detector-->>FlaskApp: prediction results (list)
FlaskApp-->>User: JSON response with predictions
Class diagram for FlexibleEmotionDetector and DeploymentTypeclassDiagram
class FlexibleEmotionDetector {
- deployment_type: DeploymentType
- model_name: str
- hf_token: str
- emotion_labels: List[str]
- model: Any
- tokenizer: Any
- session: Any
+ __init__()
+ predict(text: str) Dict[str, Any]
+ get_status() Dict[str, Any]
- _initialize()
- _initialize_serverless()
- _initialize_endpoint()
- _initialize_local()
- _predict_serverless(text: str) Dict[str, Any]
- _predict_endpoint(text: str) Dict[str, Any]
- _predict_local(text: str) Dict[str, Any]
- _get_model_device_str() Optional[str]
}
class DeploymentType {
SERVERLESS
ENDPOINT
LOCAL
}
FlexibleEmotionDetector --> DeploymentType
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Summary of Changes
Hello @uelkerd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've added a new Flask-based API server that provides a flexible solution for emotion detection. This server can seamlessly integrate with various HuggingFace deployment options, from cost-effective serverless solutions to high-performance dedicated endpoints, or even local self-hosted models. A key focus of this change was to enhance the API's robustness by implementing explicit timeouts for external requests and improving privacy by ensuring that sensitive user data is not inadvertently logged during errors.
Highlights
- Flexible Emotion Detection API Server: I've introduced a new Flask API server designed for emotion detection. This server is highly flexible, supporting three distinct HuggingFace deployment strategies: the free Serverless Inference API, paid Inference Endpoints, and self-hosted local models. This allows for adaptable deployment based on performance and cost requirements.
- Explicit Timeouts and PII Redaction: To improve reliability and prevent hanging requests, I've implemented explicit timeouts for all external API calls made to HuggingFace. Additionally, I've focused on privacy by ensuring that Personal Identifiable Information (PII) is not exposed in error logs, redacting sensitive input text when logging prediction failures.
- Robust Error Handling and Cold Start Management: The new server includes comprehensive error handling, particularly for the HuggingFace Serverless API, which can experience 'cold starts'. The system now intelligently retries requests if a 503 (Service Unavailable) status is received, allowing the model time to load. This makes the API more resilient to transient issues.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
|
Here's the code health analysis summary for commits Analysis Summary
|
There was a problem hiding this comment.
Hey @uelkerd - I've reviewed your changes - here's some feedback:
Blocking issues:
- time.sleep() call; did you mean to leave this in? (link)
General comments:
- There’s a lot of duplicated logic in _predict_serverless and _predict_endpoint for parsing and formatting results—consider extracting a shared helper to reduce repetition.
- This file mixes CLI startup, Flask routes, and deployment logic into one large script—splitting into modules (e.g. routes, model client, config) would improve readability and maintainability.
- Configuration values like timeouts and retry settings are scattered across methods—centralizing them into a single config object or class would make adjustments and testing much easier.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of duplicated logic in _predict_serverless and _predict_endpoint for parsing and formatting results—consider extracting a shared helper to reduce repetition.
- This file mixes CLI startup, Flask routes, and deployment logic into one large script—splitting into modules (e.g. routes, model client, config) would improve readability and maintainability.
- Configuration values like timeouts and retry settings are scattered across methods—centralizing them into a single config object or class would make adjustments and testing much easier.
## Individual Comments
### Comment 1
<location> `deployment/flexible_api_server.py:39` </location>
<code_context>
+ def __init__(self):
+ """Initialize based on environment configuration."""
+ self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless'))
+ self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model')
+ self.hf_token = os.getenv('HF_TOKEN')
+
</code_context>
<issue_to_address>
Default model name is a placeholder and may cause runtime errors.
If MODEL_NAME is not set, loading the model may fail. Add a check to raise an explicit error when MODEL_NAME is missing or still set to the placeholder.
</issue_to_address>
### Comment 2
<location> `deployment/flexible_api_server.py:81` </location>
<code_context>
+ from requests.adapters import HTTPAdapter
+ from requests.packages.urllib3.util.retry import Retry
+
+ retry_strategy = Retry(
+ total=3,
+ backoff_factor=1,
+ status_forcelist=[429, 500, 502, 503, 504],
+ allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods
+ )
+ adapter = HTTPAdapter(max_retries=retry_strategy)
</code_context>
<issue_to_address>
allowed_methods should be a list or tuple for compatibility.
Using a set for allowed_methods may lead to compatibility issues; please use a list or tuple instead.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods
)
=======
retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods=["POST", "GET", "PUT", "PATCH"] # Enable retries for these HTTP methods
)
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `deployment/flexible_api_server.py:294` </location>
<code_context>
+ predicted_class = predicted_class.cpu()
+
+ # Get emotion label
+ if hasattr(self.model.config, 'id2label'):
+ emotion = self.model.config.id2label[predicted_class.item()]
+ else:
+ emotion = self.emotion_labels[predicted_class.item()]
</code_context>
<issue_to_address>
Potential KeyError if id2label does not contain predicted_class index.
Use .get() with a default value or check if the index exists before accessing id2label to avoid a KeyError.
</issue_to_address>
### Comment 4
<location> `deployment/flexible_api_server.py:304` </location>
<code_context>
+
+ # Get all emotion probabilities
+ all_emotions = {}
+ for i, prob in enumerate(probabilities[0]):
+ if hasattr(self.model.config, 'id2label'):
+ label = self.model.config.id2label[i]
+ else:
+ label = self.emotion_labels[i] if i < len(self.emotion_labels) else f"emotion_{i}"
+ all_emotions[label] = prob.item()
+
</code_context>
<issue_to_address>
Possible index error if emotion_labels is shorter than model output.
Consider adding a warning log when the fallback label is used to aid in identifying configuration mismatches.
Suggested implementation:
```python
# Get all emotion probabilities
all_emotions = {}
for i, prob in enumerate(probabilities[0]):
if hasattr(self.model.config, 'id2label'):
label = self.model.config.id2label[i]
else:
if i < len(self.emotion_labels):
label = self.emotion_labels[i]
else:
label = f"emotion_{i}"
logging.warning(
f"Emotion label index {i} exceeds emotion_labels length ({len(self.emotion_labels)}). "
f"Using fallback label '{label}'. This may indicate a configuration mismatch."
)
all_emotions[label] = prob.item()
```
```python
import os
import time
import logging
from typing import Dict, List, Optional, Any
from enum import Enum
```
</issue_to_address>
### Comment 5
<location> `deployment/flexible_api_server.py:389` </location>
<code_context>
+ try:
+ data = request.get_json()
+
+ if not data or 'text' not in data:
+ return jsonify({'error': 'No text provided'}), 400
+
</code_context>
<issue_to_address>
Error response for missing 'text' is clear, but could include expected format.
Including the expected request format in the error response will make it easier for clients to fix their requests.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
if not data or 'text' not in data:
return jsonify({'error': 'No text provided'}), 400
=======
if not data or 'text' not in data:
return jsonify({
'error': 'No text provided',
'expected_format': {'text': '<your text here>'}
}), 400
>>>>>>> REPLACE
</suggested_fix>
### Comment 6
<location> `deployment/flexible_api_server.py:564` </location>
<code_context>
+ print(f" • Use {all_interfaces_env} only in production with firewall/proxy")
+ print(" • Never expose debug=True to external networks")
+
+ app.run(host=host, port=port, debug=debug)
</code_context>
<issue_to_address>
Flask debug mode should not be enabled in production.
Consider adding a check to prevent debug=True when the app is bound to all interfaces, to avoid accidental exposure.
</issue_to_address>
## Security Issues
### Issue 1
<location> `deployment/flexible_api_server.py:167` </location>
<issue_to_address>
**security (python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def __init__(self): | ||
| """Initialize based on environment configuration.""" | ||
| self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless')) | ||
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') |
There was a problem hiding this comment.
issue (bug_risk): Default model name is a placeholder and may cause runtime errors.
If MODEL_NAME is not set, loading the model may fail. Add a check to raise an explicit error when MODEL_NAME is missing or still set to the placeholder.
| retry_strategy = Retry( | ||
| total=3, | ||
| backoff_factor=1, | ||
| status_forcelist=[429, 500, 502, 503, 504], | ||
| allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods | ||
| ) |
There was a problem hiding this comment.
suggestion: allowed_methods should be a list or tuple for compatibility.
Using a set for allowed_methods may lead to compatibility issues; please use a list or tuple instead.
| retry_strategy = Retry( | |
| total=3, | |
| backoff_factor=1, | |
| status_forcelist=[429, 500, 502, 503, 504], | |
| allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods | |
| ) | |
| retry_strategy = Retry( | |
| total=3, | |
| backoff_factor=1, | |
| status_forcelist=[429, 500, 502, 503, 504], | |
| allowed_methods=["POST", "GET", "PUT", "PATCH"] # Enable retries for these HTTP methods | |
| ) |
| if hasattr(self.model.config, 'id2label'): | ||
| emotion = self.model.config.id2label[predicted_class.item()] |
There was a problem hiding this comment.
issue: Potential KeyError if id2label does not contain predicted_class index.
Use .get() with a default value or check if the index exists before accessing id2label to avoid a KeyError.
| for i, prob in enumerate(probabilities[0]): | ||
| if hasattr(self.model.config, 'id2label'): | ||
| label = self.model.config.id2label[i] | ||
| else: | ||
| label = self.emotion_labels[i] if i < len(self.emotion_labels) else f"emotion_{i}" |
There was a problem hiding this comment.
suggestion: Possible index error if emotion_labels is shorter than model output.
Consider adding a warning log when the fallback label is used to aid in identifying configuration mismatches.
Suggested implementation:
# Get all emotion probabilities
all_emotions = {}
for i, prob in enumerate(probabilities[0]):
if hasattr(self.model.config, 'id2label'):
label = self.model.config.id2label[i]
else:
if i < len(self.emotion_labels):
label = self.emotion_labels[i]
else:
label = f"emotion_{i}"
logging.warning(
f"Emotion label index {i} exceeds emotion_labels length ({len(self.emotion_labels)}). "
f"Using fallback label '{label}'. This may indicate a configuration mismatch."
)
all_emotions[label] = prob.item()import os
import time
import logging
from typing import Dict, List, Optional, Any
from enum import Enum| if not data or 'text' not in data: | ||
| return jsonify({'error': 'No text provided'}), 400 |
There was a problem hiding this comment.
suggestion: Error response for missing 'text' is clear, but could include expected format.
Including the expected request format in the error response will make it easier for clients to fix their requests.
| if not data or 'text' not in data: | |
| return jsonify({'error': 'No text provided'}), 400 | |
| if not data or 'text' not in data: | |
| return jsonify({ | |
| 'error': 'No text provided', | |
| 'expected_format': {'text': '<your text here>'} | |
| }), 400 |
| print(f" • Use {all_interfaces_env} only in production with firewall/proxy") | ||
| print(" • Never expose debug=True to external networks") | ||
|
|
||
| app.run(host=host, port=port, debug=debug) |
There was a problem hiding this comment.
🚨 suggestion (security): Flask debug mode should not be enabled in production.
Consider adding a check to prevent debug=True when the app is bound to all interfaces, to avoid accidental exposure.
| if response.status_code == 503: | ||
| # Model is loading (cold start) | ||
| logger.info("🔄 Model loading, waiting...") | ||
| time.sleep(10) # Wait for model to load |
There was a problem hiding this comment.
security (python.lang.best-practice.arbitrary-sleep): time.sleep() call; did you mean to leave this in?
Source: opengrep
| """Predict using local model.""" | ||
| try: | ||
| # Tokenize input | ||
| inputs = self.tokenizer( |
There was a problem hiding this comment.
issue (code-quality): We've found these issues:
- Extract code out into method (
extract-method) - Replace if statement with if expression (
assign-if-exp)
| if 'error' in result: | ||
| return jsonify(result), 500 | ||
| return jsonify(result) | ||
|
|
There was a problem hiding this comment.
suggestion (code-quality): We've found these issues:
- Lift code into else after jump in control flow (
reintroduce-else) - Replace if statement with if expression (
assign-if-exp)
| if 'error' in result: | |
| return jsonify(result), 500 | |
| return jsonify(result) | |
| return (jsonify(result), 500) if 'error' in result else jsonify(result) |
There was a problem hiding this comment.
Code Review
This pull request introduces a new flexible API server for emotion detection, supporting multiple deployment strategies. The implementation is well-structured, but there are several areas for improvement. Key issues include a critical security vulnerability where raw exception details are exposed to clients, and a high-severity performance issue in the batch prediction endpoint which processes requests sequentially instead of in a true batch. Additionally, there are opportunities to improve maintainability by reducing code duplication, and to increase robustness by adding retry logic and validating configuration more strictly. My review provides specific suggestions to address these points.
| except Exception as e: | ||
| return jsonify({'error': str(e)}), 500 |
There was a problem hiding this comment.
Exposing raw exception messages to the client is a security risk. It can leak sensitive information about the application's internal workings, such as library versions, file paths, or underlying error details. It's better to log the detailed exception for debugging and return a generic, user-friendly error message.
except Exception as e:
logger.error(f"An unexpected error occurred in /predict: {e}", exc_info=True)
return jsonify({'error': 'An internal server error occurred.'}), 500| results = [] | ||
| for text in texts: | ||
| if text and text.strip(): | ||
| result = detector.predict(text.strip()) | ||
| results.append(result) |
There was a problem hiding this comment.
The batch prediction endpoint processes texts sequentially in a loop, calling detector.predict() for each text. This is highly inefficient and defeats the purpose of a batch endpoint, as it results in N separate predictions (and N API calls for serverless/endpoint modes) instead of a single batch operation. Both HuggingFace APIs and local transformer models are optimized for batch processing. You should add a predict_batch method to the FlexibleEmotionDetector class that is optimized for batching.
texts_to_predict = [text.strip() for text in texts if text and text.strip()]
if not texts_to_predict:
return jsonify({'predictions': [], 'count': 0, 'deployment_type': detector.deployment_type.value})
# Assumes a new `predict_batch` method is added to the detector for true batching
results = detector.predict_batch(texts_to_predict)| def __init__(self): | ||
| """Initialize based on environment configuration.""" | ||
| self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless')) | ||
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') |
There was a problem hiding this comment.
The default MODEL_NAME is a placeholder value 'your-username/samo-dl-emotion-model'. If the MODEL_NAME environment variable is not set, the application will fail with a potentially confusing error from the HuggingFace library. It would be more robust to check for this placeholder and raise a clear ValueError if it's not been changed, improving the user experience on first run.
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') | |
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') | |
| if 'your-username' in self.model_name: | |
| raise ValueError("MODEL_NAME environment variable is not set or is using the default placeholder. Please set it to your HuggingFace model.") |
| from requests.adapters import HTTPAdapter | ||
| from requests.packages.urllib3.util.retry import Retry |
| self.headers = {"Authorization": f"Bearer {self.hf_token}"} | ||
|
|
||
| # Create session | ||
| self.session = requests.Session() |
There was a problem hiding this comment.
The requests.Session for the endpoint deployment type is initialized without a retry strategy. Inference Endpoints can also experience transient network issues or temporary unavailability. Adding a retry mechanism, similar to the one used for the serverless deployment, would make the client more resilient to these issues.
self.session = requests.Session()
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods={"POST", "GET"}
)
adapter = HTTPAdapter(max_retries=retry_strategy)
self.session.mount("http://", adapter)
self.session.mount("https://", adapter)| if response.status_code == 503: | ||
| # Model is loading (cold start) | ||
| logger.info("🔄 Model loading, waiting...") | ||
| time.sleep(10) # Wait for model to load | ||
| response = self.session.post( | ||
| self.api_url, | ||
| headers=self.headers, | ||
| json=payload, | ||
| timeout=timeout | ||
| ) |
There was a problem hiding this comment.
The requests.post call is duplicated to handle the 503 (model loading) status code. This makes the code harder to maintain. You can refactor this into a loop to avoid repetition.
for attempt in range(2): # Try up to 2 times
response = self.session.post(
self.api_url,
headers=self.headers,
json=payload,
timeout=timeout
)
if response.status_code != 503:
break # Success or other error
if attempt == 0: # If it's the first attempt and model is loading
logger.info("🔄 Model loading, waiting...")
time.sleep(10) # Wait for model to loadThere was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (12)
deployment/flexible_api_server.py (12)
2-9: Docstring style fixes (D205, D212, D415) and trailing whitespace (W291).Tighten the module docstring to meet Ruff pydocstyle rules and remove trailing whitespace.
-""" -🚀 FLEXIBLE EMOTION DETECTION API SERVER -======================================== -Supports multiple HuggingFace deployment strategies: -- Serverless Inference API (free) -- Inference Endpoints (paid) -- Self-hosted (local) -""" +"""Flexible emotion detection API server. + +Supports multiple HuggingFace deployment strategies: +- Serverless Inference API (free) +- Inference Endpoints (paid) +- Self-hosted (local) +"""
14-14: Use built-in generics and remove unused imports (UP006, F401).
- Drop unused List.
- Prefer builtin generics over typing.Dict for Python 3.9+.
-from typing import Dict, List, Optional, Any +from typing import Optional, AnyAnd update signatures to builtins:
-def predict(self, text: str) -> Dict[str, Any]: +def predict(self, text: str) -> dict[str, Any]: @@ -def _predict_serverless(self, text: str) -> Dict[str, Any]: +def _predict_serverless(self, text: str) -> dict[str, Any]: @@ -def _predict_endpoint(self, text: str) -> Dict[str, Any]: +def _predict_endpoint(self, text: str) -> dict[str, Any]: @@ -def _predict_local(self, text: str) -> Dict[str, Any]: +def _predict_local(self, text: str) -> dict[str, Any]: @@ -def get_status(self) -> Dict[str, Any]: +def get_status(self) -> dict[str, Any]:Also applies to: 131-131, 150-150, 214-214, 261-261, 343-343
28-32: Add docstring to public Enum (D101).Minor: document DeploymentType values for clarity and lint compliance.
class DeploymentType(Enum): + """Supported deployment strategies for emotion detector.""" SERVERLESS = "serverless" ENDPOINT = "endpoint" LOCAL = "local"
55-67: Add return annotation to initializer and handle unknown types explicitly (ANN202).Small cleanups; also good to raise if an unsupported type appears (defensive programming).
- def _initialize(self): + def _initialize(self) -> None: @@ - elif self.deployment_type == DeploymentType.LOCAL: + elif self.deployment_type == DeploymentType.LOCAL: self._initialize_local() + else: + raise ValueError(f"Unsupported deployment type: {self.deployment_type!r}")
93-108: Mirror retry config for endpoints (consistency) and add missing return annotation (ANN202).Endpoint init currently lacks retry configuration. Consider reusing the same Session adapter strategy for consistent behavior, and annotate return type.
- def _initialize_endpoint(self): + def _initialize_endpoint(self) -> None: @@ - # Create session - self.session = requests.Session() + # Create session with retry strategy similar to serverless + from requests.adapters import HTTPAdapter + from urllib3.util.retry import Retry + retry_strategy = Retry( + total=3, + backoff_factor=1, + status_forcelist=[429, 500, 502, 503, 504], + allowed_methods={"POST", "GET", "PUT", "PATCH"}, + respect_retry_after_header=True, + ) + adapter = HTTPAdapter(max_retries=retry_strategy) + self.session = requests.Session() + self.session.mount("http://", adapter) + self.session.mount("https://", adapter)
109-131: Add return annotation to local initializer (ANN202).Nit: annotate and trim whitespace (W291 noted elsewhere).
- def _initialize_local(self): + def _initialize_local(self) -> None:
293-301: Fix confidence indexing (minor correctness).Indexing with a tensor returns a 1D tensor; use the scalar index for clarity.
- # Get confidence - confidence = probabilities[0][predicted_class].item() + # Get confidence + predicted_idx = predicted_class.item() + confidence = probabilities[0, predicted_idx].item()
343-355: Compute a more accurate readiness flag.Currently always True; reflect real readiness across modes.
def get_status(self) -> dict[str, Any]: """Get detector status information.""" - return { + ready = False + if self.deployment_type == DeploymentType.LOCAL: + ready = bool(self.model and self.tokenizer) + else: + ready = bool(self.session) + return { "deployment_type": self.deployment_type.value, "model_name": self.model_name, "emotion_labels": self.emotion_labels, - "ready": True, + "ready": ready, "config": { "serverless_api": self.api_url if hasattr(self, 'api_url') else None, "endpoint_url": self.endpoint_url if hasattr(self, 'endpoint_url') else None, "local_device": self._get_model_device_str() if self.model else None, } }
366-366: Add return type annotations for Flask route functions (ANN201).Use Flask’s ResponseReturnValue for accurate typing.
-from flask import Flask, request, jsonify +from flask import Flask, request, jsonify +from flask.typing import ResponseReturnValue @@ -def health_check(): +def health_check() -> ResponseReturnValue: @@ -def predict_emotion(): +def predict_emotion() -> ResponseReturnValue: @@ -def predict_batch(): +def predict_batch() -> ResponseReturnValue: @@ -def home(): +def home() -> ResponseReturnValue:Also applies to: 379-379, 408-408, 441-441
193-196: Optional: avoid echoing full input text in successful responses.While the PR focuses on errors, echoing request text can still propagate PII to downstream logs/clients. Consider omitting or replacing with metadata (length/hash).
- "text": text, + # "text": text, # Consider omitting to avoid PII propagationAlso applies to: 239-243, 315-319
482-564: Replace prints with logger or silence lints (T201); handle S104 intentionally.If Ruff T201 is enforced, convert prints to logger.info for consistency. Otherwise, add noqa to specific lines. Also explicitly mark the all-interfaces constant line as intentional to satisfy S104.
-if __name__ == '__main__': - print("🌐 Starting Flexible Emotion Detection API...") - print("=" * 60) +if __name__ == '__main__': + logger.info("🌐 Starting Flexible Emotion Detection API...") + logger.info("=" * 60) @@ - print(f"📋 Deployment Type: {status['deployment_type'].upper()}") - print(f"🤖 Model: {status['model_name']}") - print(f"🎭 Emotions: {len(status['emotion_labels'])} classes") + logger.info("📋 Deployment Type: %s", status['deployment_type'].upper()) + logger.info("🤖 Model: %s", status['model_name']) + logger.info("🎭 Emotions: %d classes", len(status['emotion_labels'])) @@ - ALL_INTERFACES = '0.0.0.0' + ALL_INTERFACES = '0.0.0.0' # noqa: S104 @@ - print(f"\n🚀 Server starting on {server_url}") + logger.info("🚀 Server starting on %s", server_url)If you prefer prints for CLI UX, consider adding
# noqa: T201on those lines instead.
62-62: Trim trailing whitespace flagged by Ruff (W291).Minor; remove trailing spaces on the noted lines.
Also applies to: 465-465, 547-547, 158-159, 169-170, 266-269
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deployment/flexible_api_server.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
deployment/flexible_api_server.py
2-9: 1 blank line required between summary line and description
(D205)
2-9: Multi-line docstring summary should start at the first line
Remove whitespace after opening quotes
(D212)
2-9: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
7-7: Trailing whitespace
Remove trailing whitespace
(W291)
14-14: typing.List imported but unused
Remove unused import: typing.List
(F401)
28-28: Missing docstring in public class
(D101)
36-36: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
55-55: Missing return type annotation for private function _initialize
Add return type annotation: None
(ANN202)
62-62: Trailing whitespace
Remove trailing whitespace
(W291)
68-68: Missing return type annotation for private function _initialize_serverless
Add return type annotation: None
(ANN202)
93-93: Missing return type annotation for private function _initialize_endpoint
Add return type annotation: None
(ANN202)
109-109: Missing return type annotation for private function _initialize_local
Add return type annotation: None
(ANN202)
131-131: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
150-150: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
158-158: Trailing whitespace
Remove trailing whitespace
(W291)
159-159: Trailing whitespace
Remove trailing whitespace
(W291)
169-169: Trailing whitespace
Remove trailing whitespace
(W291)
170-170: Trailing whitespace
Remove trailing whitespace
(W291)
214-214: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
261-261: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
266-266: Trailing whitespace
Remove trailing whitespace
(W291)
267-267: Trailing whitespace
Remove trailing whitespace
(W291)
268-268: Trailing whitespace
Remove trailing whitespace
(W291)
269-269: Trailing whitespace
Remove trailing whitespace
(W291)
343-343: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
366-366: Missing return type annotation for public function health_check
(ANN201)
379-379: Missing return type annotation for public function predict_emotion
(ANN201)
408-408: Missing return type annotation for public function predict_batch
(ANN201)
441-441: Missing return type annotation for public function home
(ANN201)
465-465: Trailing whitespace
Remove trailing whitespace
(W291)
483-483: print found
Remove print
(T201)
484-484: print found
Remove print
(T201)
488-488: print found
Remove print
(T201)
489-489: print found
Remove print
(T201)
490-490: print found
Remove print
(T201)
493-493: print found
Remove print
(T201)
494-494: print found
Remove print
(T201)
496-496: print found
Remove print
(T201)
497-497: print found
Remove print
(T201)
499-499: print found
Remove print
(T201)
500-500: print found
Remove print
(T201)
502-502: print found
Remove print
(T201)
503-503: print found
Remove print
(T201)
504-504: print found
Remove print
(T201)
505-505: print found
Remove print
(T201)
506-506: print found
Remove print
(T201)
508-508: print found
Remove print
(T201)
513-513: Possible binding to all interfaces
(S104)
528-528: print found
Remove print
(T201)
529-529: print found
Remove print
(T201)
530-530: print found
Remove print
(T201)
531-531: print found
Remove print
(T201)
536-536: print found
Remove print
(T201)
538-538: print found
Remove print
(T201)
539-539: print found
Remove print
(T201)
540-540: print found
Remove print
(T201)
541-541: print found
Remove print
(T201)
547-547: Trailing whitespace
Remove trailing whitespace
(W291)
551-551: print found
Remove print
(T201)
552-552: print found
Remove print
(T201)
553-553: print found
Remove print
(T201)
554-554: print found
Remove print
(T201)
558-558: print found
Remove print
(T201)
559-559: print found
Remove print
(T201)
561-561: print found
Remove print
(T201)
562-562: print found
Remove print
(T201)
| def __init__(self): | ||
| """Initialize based on environment configuration.""" | ||
| self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless')) | ||
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') | ||
| self.hf_token = os.getenv('HF_TOKEN') | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Harden DEPLOYMENT_TYPE parsing; add return annotation (ANN204).
Current Enum casting is case-sensitive and may raise on unexpected values. Normalize and fall back safely. Also annotate init.
- def __init__(self):
+ def __init__(self) -> None:
"""Initialize based on environment configuration."""
- self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless'))
+ raw = os.getenv('DEPLOYMENT_TYPE', 'serverless')
+ normalized = (raw or 'serverless').strip().lower()
+ try:
+ self.deployment_type = DeploymentType(normalized)
+ except ValueError:
+ logger.warning("Unknown DEPLOYMENT_TYPE=%r; falling back to 'serverless'", raw)
+ self.deployment_type = DeploymentType.SERVERLESS📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__(self): | |
| """Initialize based on environment configuration.""" | |
| self.deployment_type = DeploymentType(os.getenv('DEPLOYMENT_TYPE', 'serverless')) | |
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') | |
| self.hf_token = os.getenv('HF_TOKEN') | |
| def __init__(self) -> None: | |
| """Initialize based on environment configuration.""" | |
| raw = os.getenv('DEPLOYMENT_TYPE', 'serverless') | |
| normalized = (raw or 'serverless').strip().lower() | |
| try: | |
| self.deployment_type = DeploymentType(normalized) | |
| except ValueError: | |
| logger.warning("Unknown DEPLOYMENT_TYPE=%r; falling back to 'serverless'", raw) | |
| self.deployment_type = DeploymentType.SERVERLESS | |
| self.model_name = os.getenv('MODEL_NAME', 'your-username/samo-dl-emotion-model') | |
| self.hf_token = os.getenv('HF_TOKEN') |
🧰 Tools
🪛 Ruff (0.12.2)
36-36: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 36 to 41, the __init__ method
lacks a return type annotation and the DeploymentType enum casting is
case-sensitive, which can raise errors on unexpected values. Add a return type
annotation of None to __init__. Normalize the DEPLOYMENT_TYPE environment
variable to lowercase before casting and implement a safe fallback to a default
DeploymentType value if the provided value is invalid or unexpected.
| def _initialize_serverless(self): | ||
| """Initialize serverless inference API.""" | ||
| if not self.hf_token: | ||
| raise ValueError("HF_TOKEN environment variable required for serverless API") | ||
|
|
||
| self.api_url = f"https://api-inference.huggingface.co/models/{self.model_name}" | ||
| self.headers = {"Authorization": f"Bearer {self.hf_token}"} | ||
|
|
||
| # Create session with retry strategy | ||
| self.session = requests.Session() | ||
| from requests.adapters import HTTPAdapter | ||
| from requests.packages.urllib3.util.retry import Retry | ||
|
|
||
| retry_strategy = Retry( | ||
| total=3, | ||
| backoff_factor=1, | ||
| status_forcelist=[429, 500, 502, 503, 504], | ||
| allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods | ||
| ) | ||
| adapter = HTTPAdapter(max_retries=retry_strategy) | ||
| self.session.mount("http://", adapter) | ||
| self.session.mount("https://", adapter) | ||
|
|
||
| logger.info(f"📡 Serverless API: {self.api_url}") |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Retry import path and respect Retry-After; trim trailing whitespace (UP, W291).
- Prefer urllib3 import path (less brittle).
- Respect Retry-After headers for 429/503.
- Remove trailing whitespace.
- from requests.adapters import HTTPAdapter
- from requests.packages.urllib3.util.retry import Retry
+ from requests.adapters import HTTPAdapter
+ from urllib3.util.retry import Retry
@@
- retry_strategy = Retry(
+ retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
- allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods
+ allowed_methods={"POST", "GET", "PUT", "PATCH"},
+ respect_retry_after_header=True,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _initialize_serverless(self): | |
| """Initialize serverless inference API.""" | |
| if not self.hf_token: | |
| raise ValueError("HF_TOKEN environment variable required for serverless API") | |
| self.api_url = f"https://api-inference.huggingface.co/models/{self.model_name}" | |
| self.headers = {"Authorization": f"Bearer {self.hf_token}"} | |
| # Create session with retry strategy | |
| self.session = requests.Session() | |
| from requests.adapters import HTTPAdapter | |
| from requests.packages.urllib3.util.retry import Retry | |
| retry_strategy = Retry( | |
| total=3, | |
| backoff_factor=1, | |
| status_forcelist=[429, 500, 502, 503, 504], | |
| allowed_methods={"POST", "GET", "PUT", "PATCH"} # Enable retries for these HTTP methods | |
| ) | |
| adapter = HTTPAdapter(max_retries=retry_strategy) | |
| self.session.mount("http://", adapter) | |
| self.session.mount("https://", adapter) | |
| logger.info(f"📡 Serverless API: {self.api_url}") | |
| self.api_url = f"https://api-inference.huggingface.co/models/{self.model_name}" | |
| self.headers = {"Authorization": f"Bearer {self.hf_token}"} | |
| # Create session with retry strategy | |
| self.session = requests.Session() | |
| from requests.adapters import HTTPAdapter | |
| from urllib3.util.retry import Retry | |
| retry_strategy = Retry( | |
| total=3, | |
| backoff_factor=1, | |
| status_forcelist=[429, 500, 502, 503, 504], | |
| allowed_methods={"POST", "GET", "PUT", "PATCH"}, | |
| respect_retry_after_header=True, | |
| ) | |
| adapter = HTTPAdapter(max_retries=retry_strategy) | |
| self.session.mount("http://", adapter) | |
| self.session.mount("https://", adapter) | |
| logger.info(f"📡 Serverless API: {self.api_url}") |
🧰 Tools
🪛 Ruff (0.12.2)
68-68: Missing return type annotation for private function _initialize_serverless
Add return type annotation: None
(ANN202)
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 68 to 91, update the import of
Retry to use the urllib3 package directly instead of requests.packages.urllib3
for better reliability. Modify the Retry configuration to set
respect_retry_after_header=True to honor Retry-After headers on 429 and 503
responses. Also, remove any trailing whitespace on these lines to comply with
style guidelines.
| except Exception as e: | ||
| # Log error with redacted text for debugging (avoid PII exposure in logs) | ||
| text_preview = f"{text[:20]}..." if len(text) > 20 else text | ||
| logger.error(f"❌ Prediction failed: {e} (input preview: {text_preview})") | ||
| return { | ||
| "error": str(e), | ||
| "deployment_type": self.deployment_type.value | ||
| } |
There was a problem hiding this comment.
Sanitize error details returned to clients to avoid PII and internal leakage.
Returning str(e) exposes internal messages upstream (contrary to “no PII in errors”). Keep details in logs; return stable codes/messages to clients.
- except Exception as e:
- # Log error with redacted text for debugging (avoid PII exposure in logs)
+ except Exception as e:
+ # Log with redacted preview; do not expose details to clients
text_preview = f"{text[:20]}..." if len(text) > 20 else text
logger.error(f"❌ Prediction failed: {e} (input preview: {text_preview})")
return {
- "error": str(e),
+ "error": "Prediction failed. Please try again later.",
+ "error_code": "PREDICT_FAILURE",
"deployment_type": self.deployment_type.value
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| # Log error with redacted text for debugging (avoid PII exposure in logs) | |
| text_preview = f"{text[:20]}..." if len(text) > 20 else text | |
| logger.error(f"❌ Prediction failed: {e} (input preview: {text_preview})") | |
| return { | |
| "error": str(e), | |
| "deployment_type": self.deployment_type.value | |
| } | |
| except Exception as e: | |
| # Log with redacted preview; do not expose details to clients | |
| text_preview = f"{text[:20]}..." if len(text) > 20 else text | |
| logger.error(f"❌ Prediction failed: {e} (input preview: {text_preview})") | |
| return { | |
| "error": "Prediction failed. Please try again later.", | |
| "error_code": "PREDICT_FAILURE", | |
| "deployment_type": self.deployment_type.value | |
| } |
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 141 to 148, the error response
returns str(e) which may expose PII or internal details to clients. Modify the
return statement to send a generic, stable error message or code instead of
str(e), while keeping the detailed error logged internally. This ensures
sensitive information is not leaked upstream.
| def _predict_serverless(self, text: str) -> Dict[str, Any]: | ||
| """Predict using serverless inference API.""" | ||
| try: | ||
| payload = {"inputs": text} | ||
|
|
||
| # Add timeout and rate limit handling | ||
| timeout = int(os.getenv('TIMEOUT_SECONDS', '30')) | ||
| response = self.session.post( | ||
| self.api_url, | ||
| headers=self.headers, | ||
| json=payload, | ||
| timeout=timeout | ||
| ) | ||
|
|
||
| if response.status_code == 503: | ||
| # Model is loading (cold start) | ||
| logger.info("🔄 Model loading, waiting...") | ||
| time.sleep(10) # Wait for model to load | ||
| response = self.session.post( | ||
| self.api_url, | ||
| headers=self.headers, | ||
| json=payload, | ||
| timeout=timeout | ||
| ) | ||
|
|
||
| response.raise_for_status() | ||
| result = response.json() | ||
|
|
||
| # Convert HuggingFace format to our format | ||
| if isinstance(result, list) and len(result) > 0: | ||
| # Standard classification output | ||
| predictions = result[0] if isinstance(result[0], list) else result | ||
|
|
||
| # Find highest scoring emotion | ||
| best_prediction = max(predictions, key=lambda x: x['score']) | ||
|
|
||
| # Create emotion probabilities dict | ||
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | ||
|
|
||
| return { | ||
| "emotion": best_prediction['label'], | ||
| "confidence": best_prediction['score'], | ||
| "all_emotions": all_emotions, | ||
| "text": text, | ||
| "deployment_type": "serverless", | ||
| "model": self.model_name | ||
| } | ||
| return { | ||
| "error": "Unexpected response format", | ||
| "raw_response": result, | ||
| "deployment_type": "serverless" | ||
| } | ||
| except requests.exceptions.Timeout: | ||
| return { | ||
| "error": "Request timeout (model may be cold starting)", | ||
| "suggestion": "Try again in a few seconds", | ||
| "deployment_type": "serverless" | ||
| } | ||
| except requests.exceptions.RequestException as e: | ||
| return { | ||
| "error": f"API request failed: {e}", | ||
| "deployment_type": "serverless" | ||
| } | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Harden serverless prediction error handling and keep errors generic.
- Avoid returning raw exception strings to clients.
- Optionally add a capped retry instead of a fixed 10s sleep on 503.
- if response.status_code == 503:
+ if response.status_code == 503:
# Model is loading (cold start)
logger.info("🔄 Model loading, waiting...")
- time.sleep(10) # Wait for model to load
+ time.sleep(10) # Wait for model to load (consider exponential backoff)
@@
- except requests.exceptions.Timeout:
+ except requests.exceptions.Timeout:
return {
- "error": "Request timeout (model may be cold starting)",
+ "error": "Request timeout.",
"suggestion": "Try again in a few seconds",
"deployment_type": "serverless"
}
- except requests.exceptions.RequestException as e:
+ except requests.exceptions.RequestException as e:
+ logger.error("Serverless HTTP error: %s", e)
return {
- "error": f"API request failed: {e}",
+ "error": "Upstream API request failed.",
+ "error_code": "SERVERLESS_HTTP_ERROR",
"deployment_type": "serverless"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _predict_serverless(self, text: str) -> Dict[str, Any]: | |
| """Predict using serverless inference API.""" | |
| try: | |
| payload = {"inputs": text} | |
| # Add timeout and rate limit handling | |
| timeout = int(os.getenv('TIMEOUT_SECONDS', '30')) | |
| response = self.session.post( | |
| self.api_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| if response.status_code == 503: | |
| # Model is loading (cold start) | |
| logger.info("🔄 Model loading, waiting...") | |
| time.sleep(10) # Wait for model to load | |
| response = self.session.post( | |
| self.api_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| response.raise_for_status() | |
| result = response.json() | |
| # Convert HuggingFace format to our format | |
| if isinstance(result, list) and len(result) > 0: | |
| # Standard classification output | |
| predictions = result[0] if isinstance(result[0], list) else result | |
| # Find highest scoring emotion | |
| best_prediction = max(predictions, key=lambda x: x['score']) | |
| # Create emotion probabilities dict | |
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | |
| return { | |
| "emotion": best_prediction['label'], | |
| "confidence": best_prediction['score'], | |
| "all_emotions": all_emotions, | |
| "text": text, | |
| "deployment_type": "serverless", | |
| "model": self.model_name | |
| } | |
| return { | |
| "error": "Unexpected response format", | |
| "raw_response": result, | |
| "deployment_type": "serverless" | |
| } | |
| except requests.exceptions.Timeout: | |
| return { | |
| "error": "Request timeout (model may be cold starting)", | |
| "suggestion": "Try again in a few seconds", | |
| "deployment_type": "serverless" | |
| } | |
| except requests.exceptions.RequestException as e: | |
| return { | |
| "error": f"API request failed: {e}", | |
| "deployment_type": "serverless" | |
| } | |
| def _predict_serverless(self, text: str) -> Dict[str, Any]: | |
| """Predict using serverless inference API.""" | |
| try: | |
| payload = {"inputs": text} | |
| # Add timeout and rate limit handling | |
| timeout = int(os.getenv('TIMEOUT_SECONDS', '30')) | |
| response = self.session.post( | |
| self.api_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| if response.status_code == 503: | |
| # Model is loading (cold start) | |
| logger.info("🔄 Model loading, waiting...") | |
| time.sleep(10) # Wait for model to load (consider exponential backoff) | |
| response = self.session.post( | |
| self.api_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| response.raise_for_status() | |
| result = response.json() | |
| # Convert HuggingFace format to our format | |
| if isinstance(result, list) and len(result) > 0: | |
| # Standard classification output | |
| predictions = result[0] if isinstance(result[0], list) else result | |
| # Find highest scoring emotion | |
| best_prediction = max(predictions, key=lambda x: x['score']) | |
| # Create emotion probabilities dict | |
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | |
| return { | |
| "emotion": best_prediction['label'], | |
| "confidence": best_prediction['score'], | |
| "all_emotions": all_emotions, | |
| "text": text, | |
| "deployment_type": "serverless", | |
| "model": self.model_name | |
| } | |
| return { | |
| "error": "Unexpected response format", | |
| "raw_response": result, | |
| "deployment_type": "serverless" | |
| } | |
| except requests.exceptions.Timeout: | |
| return { | |
| "error": "Request timeout.", | |
| "suggestion": "Try again in a few seconds", | |
| "deployment_type": "serverless" | |
| } | |
| except requests.exceptions.RequestException as e: | |
| logger.error("Serverless HTTP error: %s", e) | |
| return { | |
| "error": "Upstream API request failed.", | |
| "error_code": "SERVERLESS_HTTP_ERROR", | |
| "deployment_type": "serverless" | |
| } |
🧰 Tools
🪛 Ruff (0.12.2)
150-150: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
158-158: Trailing whitespace
Remove trailing whitespace
(W291)
159-159: Trailing whitespace
Remove trailing whitespace
(W291)
169-169: Trailing whitespace
Remove trailing whitespace
(W291)
170-170: Trailing whitespace
Remove trailing whitespace
(W291)
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py between lines 150 and 213, the serverless
prediction method currently returns raw exception messages to clients and uses a
fixed 10-second sleep on 503 errors. To fix this, replace raw exception strings
with generic error messages that do not expose internal details. Implement a
capped retry mechanism with a limited number of retries and incremental backoff
instead of a single fixed sleep to handle 503 responses more robustly.
| def _predict_endpoint(self, text: str) -> Dict[str, Any]: | ||
| """Predict using inference endpoints.""" | ||
| try: | ||
| payload = {"inputs": text} | ||
| timeout = int(os.getenv('TIMEOUT_SECONDS', '10')) | ||
|
|
||
| response = self.session.post( | ||
| self.endpoint_url, | ||
| headers=self.headers, | ||
| json=payload, | ||
| timeout=timeout | ||
| ) | ||
|
|
||
| response.raise_for_status() | ||
| result = response.json() | ||
|
|
||
| # Same processing as serverless | ||
| if isinstance(result, list) and len(result) > 0: | ||
| predictions = result[0] if isinstance(result[0], list) else result | ||
| best_prediction = max(predictions, key=lambda x: x['score']) | ||
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | ||
|
|
||
| return { | ||
| "emotion": best_prediction['label'], | ||
| "confidence": best_prediction['score'], | ||
| "all_emotions": all_emotions, | ||
| "text": text, | ||
| "deployment_type": "endpoint", | ||
| "model": self.model_name | ||
| } | ||
| return { | ||
| "error": "Unexpected response format", | ||
| "raw_response": result, | ||
| "deployment_type": "endpoint" | ||
| } | ||
| except requests.exceptions.Timeout: | ||
| return { | ||
| "error": "Request timeout (endpoint may be starting up)", | ||
| "suggestion": "Try again in a few seconds", | ||
| "deployment_type": "endpoint" | ||
| } | ||
| except requests.exceptions.RequestException as e: | ||
| return { | ||
| "error": f"Endpoint request failed: {e}", | ||
| "deployment_type": "endpoint" | ||
| } | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Mirror sanitized errors for endpoint backend; consider DRY refactor.
- Keep client-facing error messages generic (no str(e)).
- Optionally factor out shared HTTP prediction logic to reduce duplication.
- except requests.exceptions.Timeout:
+ except requests.exceptions.Timeout:
return {
- "error": "Request timeout (endpoint may be starting up)",
+ "error": "Request timeout.",
"suggestion": "Try again in a few seconds",
"deployment_type": "endpoint"
}
- except requests.exceptions.RequestException as e:
+ except requests.exceptions.RequestException as e:
+ logger.error("Endpoint HTTP error: %s", e)
return {
- "error": f"Endpoint request failed: {e}",
+ "error": "Endpoint request failed.",
+ "error_code": "ENDPOINT_HTTP_ERROR",
"deployment_type": "endpoint"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _predict_endpoint(self, text: str) -> Dict[str, Any]: | |
| """Predict using inference endpoints.""" | |
| try: | |
| payload = {"inputs": text} | |
| timeout = int(os.getenv('TIMEOUT_SECONDS', '10')) | |
| response = self.session.post( | |
| self.endpoint_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| response.raise_for_status() | |
| result = response.json() | |
| # Same processing as serverless | |
| if isinstance(result, list) and len(result) > 0: | |
| predictions = result[0] if isinstance(result[0], list) else result | |
| best_prediction = max(predictions, key=lambda x: x['score']) | |
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | |
| return { | |
| "emotion": best_prediction['label'], | |
| "confidence": best_prediction['score'], | |
| "all_emotions": all_emotions, | |
| "text": text, | |
| "deployment_type": "endpoint", | |
| "model": self.model_name | |
| } | |
| return { | |
| "error": "Unexpected response format", | |
| "raw_response": result, | |
| "deployment_type": "endpoint" | |
| } | |
| except requests.exceptions.Timeout: | |
| return { | |
| "error": "Request timeout (endpoint may be starting up)", | |
| "suggestion": "Try again in a few seconds", | |
| "deployment_type": "endpoint" | |
| } | |
| except requests.exceptions.RequestException as e: | |
| return { | |
| "error": f"Endpoint request failed: {e}", | |
| "deployment_type": "endpoint" | |
| } | |
| def _predict_endpoint(self, text: str) -> Dict[str, Any]: | |
| """Predict using inference endpoints.""" | |
| try: | |
| payload = {"inputs": text} | |
| timeout = int(os.getenv('TIMEOUT_SECONDS', '10')) | |
| response = self.session.post( | |
| self.endpoint_url, | |
| headers=self.headers, | |
| json=payload, | |
| timeout=timeout | |
| ) | |
| response.raise_for_status() | |
| result = response.json() | |
| # Same processing as serverless | |
| if isinstance(result, list) and len(result) > 0: | |
| predictions = result[0] if isinstance(result[0], list) else result | |
| best_prediction = max(predictions, key=lambda x: x['score']) | |
| all_emotions = {pred['label']: pred['score'] for pred in predictions} | |
| return { | |
| "emotion": best_prediction['label'], | |
| "confidence": best_prediction['score'], | |
| "all_emotions": all_emotions, | |
| "text": text, | |
| "deployment_type": "endpoint", | |
| "model": self.model_name | |
| } | |
| return { | |
| "error": "Unexpected response format", | |
| "raw_response": result, | |
| "deployment_type": "endpoint" | |
| } | |
| except requests.exceptions.Timeout: | |
| return { | |
| "error": "Request timeout.", | |
| "suggestion": "Try again in a few seconds", | |
| "deployment_type": "endpoint" | |
| } | |
| except requests.exceptions.RequestException as e: | |
| logger.error("Endpoint HTTP error: %s", e) | |
| return { | |
| "error": "Endpoint request failed.", | |
| "error_code": "ENDPOINT_HTTP_ERROR", | |
| "deployment_type": "endpoint" | |
| } |
🧰 Tools
🪛 Ruff (0.12.2)
214-214: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 214 to 260, the error handling
for the endpoint backend currently exposes detailed exception messages to
clients, which should be sanitized to keep error messages generic. Update the
error responses to avoid including the string representation of exceptions and
instead provide general error messages. Additionally, consider refactoring the
HTTP prediction logic into a shared helper method to reduce code duplication
between different deployment types.
| @app.route('/predict', methods=['POST']) | ||
| def predict_emotion(): | ||
| """Predict emotion for given text.""" | ||
| if detector is None: | ||
| return jsonify({ | ||
| 'error': 'Detector not initialized' | ||
| }), 503 | ||
|
|
||
| try: | ||
| data = request.get_json() | ||
|
|
||
| if not data or 'text' not in data: | ||
| return jsonify({'error': 'No text provided'}), 400 | ||
|
|
||
| text = data['text'] | ||
| if not text.strip(): | ||
| return jsonify({'error': 'Empty text provided'}), 400 | ||
|
|
||
| # Make prediction | ||
| result = detector.predict(text) | ||
|
|
||
| # Return appropriate status code | ||
| if 'error' in result: | ||
| return jsonify(result), 500 | ||
| return jsonify(result) | ||
|
|
||
| except Exception as e: | ||
| return jsonify({'error': str(e)}), 500 | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verification: scan for potential PII in error responses.
Ensure no error JSON includes user-provided text or secrets. The current diffs remove str(e) from client responses; validate across the repo.
Also applies to: 407-439
🏁 Script executed:
#!/bin/bash
# Find any error responses that might include request text or variables named 'text'
# 1) Look for error dicts that also include "text"
rg -n -A1 -B1 $'jsonify\\(\\{[^\\)]*error[^\\}]*\\}\\)' | sed -n '1,200p'
rg -n $'\"text\"\\s*:' | sed -n '1,200p'
# 2) Ensure no direct exception strings are returned to clients
rg -n $'jsonify\\(\\{[^\\)]*str\\(e\\)' | sed -n '1,200p'
# 3) Check detector error returns for str(e)
rg -n $'\"error\"\\s*:\\s*str\\(e\\)'Length of output: 32652
Replace direct exception messages in API responses
Multiple endpoints still return str(e) in their JSON error responses, which can expose internal details or sensitive information. Instead, return a generic error message and log the exception internally. For example, replace:
except Exception as e:
return jsonify({'error': str(e)}), 500with something like:
except Exception as e:
logger.error("Prediction error", exc_info=e)
return jsonify({'error': 'Internal server error'}), 500Key locations to update (search for return jsonify({'error': str(e)}):
- deployment/flexible_api_server.py: lines 405, 438
- deployment/api_server.py: lines 53, 73
- deployment/secure_api_server.py: lines 185, 434, 464, 500, 530, 574, 617, 634, 694
- deployment/local/api_server.py: lines 214, 253, 299, 375
- deployment/gcp/predict.py: line 123
- deployment/cloud-run/minimal_api_server.py: line 73
- deployment/cloud-run/onnx_api_server.py: lines 271, 304
- scripts/deployment/deploy_locally.py: lines 156, 183
- …and any other instances of
str(e)in error responses.
Run a global grep for str(e) in JSONify calls to catch all occurrences, then swap to a generic message and ensure exceptions are logged with logger.error(..., exc_info=e).
🧰 Tools
🪛 Ruff (0.12.2)
379-379: Missing return type annotation for public function predict_emotion
(ANN201)
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 378 to 406, the exception
handling in the predict_emotion endpoint returns the direct exception message in
the JSON response, which can expose sensitive internal details. To fix this,
replace the return statement inside the except block to log the exception using
logger.error with exc_info=e, and return a generic error message like {'error':
'Internal server error'} with a 500 status code instead of returning str(e).
Apply this pattern consistently to all similar exception handlers in the file
and other specified files.
| except Exception as e: | ||
| return jsonify({'error': str(e)}), 500 | ||
|
|
There was a problem hiding this comment.
Do not return raw exception messages to clients (no PII in errors).
Return a generic message and log details server-side.
- except Exception as e:
- return jsonify({'error': str(e)}), 500
+ except Exception as e:
+ logger.exception("Unhandled error in /predict: %s", e)
+ return jsonify({'error': 'Internal error. Please try again later.', 'error_code': 'PREDICT_INTERNAL_ERROR'}), 500📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| return jsonify({'error': str(e)}), 500 | |
| except Exception as e: | |
| logger.exception("Unhandled error in /predict: %s", e) | |
| return jsonify({ | |
| 'error': 'Internal error. Please try again later.', | |
| 'error_code': 'PREDICT_INTERNAL_ERROR' | |
| }), 500 |
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 404 to 406, avoid returning
raw exception messages to clients to prevent exposing PII. Instead, return a
generic error message like "Internal server error" in the JSON response, and log
the detailed exception message on the server side for debugging purposes.
| texts = data['texts'] | ||
| if not isinstance(texts, list): | ||
| return jsonify({'error': 'Texts must be a list'}), 400 | ||
|
|
||
| results = [] | ||
| for text in texts: | ||
| if text and text.strip(): | ||
| result = detector.predict(text.strip()) | ||
| results.append(result) | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Validate batch items and enforce a max batch size.
Prevent accidental large payloads and non-string items from triggering errors.
- texts = data['texts']
+ texts = data['texts']
+ max_batch = int(os.getenv('MAX_BATCH_SIZE', '32'))
+ if len(texts) > max_batch:
+ return jsonify({'error': f'Batch size exceeds limit of {max_batch}'}), 400
@@
- for text in texts:
- if text and text.strip():
- result = detector.predict(text.strip())
- results.append(result)
+ for idx, text in enumerate(texts):
+ if not isinstance(text, str):
+ results.append({'error': 'Invalid item type; expected string', 'index': idx})
+ continue
+ stripped = text.strip()
+ if stripped:
+ results.append(detector.predict(stripped))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| texts = data['texts'] | |
| if not isinstance(texts, list): | |
| return jsonify({'error': 'Texts must be a list'}), 400 | |
| results = [] | |
| for text in texts: | |
| if text and text.strip(): | |
| result = detector.predict(text.strip()) | |
| results.append(result) | |
| texts = data['texts'] | |
| max_batch = int(os.getenv('MAX_BATCH_SIZE', '32')) | |
| if len(texts) > max_batch: | |
| return jsonify({'error': f'Batch size exceeds limit of {max_batch}'}), 400 | |
| if not isinstance(texts, list): | |
| return jsonify({'error': 'Texts must be a list'}), 400 | |
| results = [] | |
| for idx, text in enumerate(texts): | |
| if not isinstance(text, str): | |
| results.append({'error': 'Invalid item type; expected string', 'index': idx}) | |
| continue | |
| stripped = text.strip() | |
| if stripped: | |
| results.append(detector.predict(stripped)) |
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 421 to 430, the code currently
does not validate individual items in the 'texts' list or enforce a maximum
batch size. To fix this, add a check to ensure each item in 'texts' is a string
and skip or handle non-string items appropriately. Also, enforce a maximum batch
size limit by returning an error response if the list exceeds this limit to
prevent processing overly large payloads.
| except Exception as e: | ||
| return jsonify({'error': str(e)}), 500 | ||
|
|
There was a problem hiding this comment.
Sanitize batch endpoint errors and log server-side.
Same rationale as single predict.
- except Exception as e:
- return jsonify({'error': str(e)}), 500
+ except Exception as e:
+ logger.exception("Unhandled error in /predict_batch: %s", e)
+ return jsonify({'error': 'Internal error. Please try again later.', 'error_code': 'PREDICT_BATCH_INTERNAL_ERROR'}), 500📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| return jsonify({'error': str(e)}), 500 | |
| except Exception as e: | |
| logger.exception("Unhandled error in /predict_batch: %s", e) | |
| return jsonify({ | |
| 'error': 'Internal error. Please try again later.', | |
| 'error_code': 'PREDICT_BATCH_INTERNAL_ERROR' | |
| }), 500 |
🤖 Prompt for AI Agents
In deployment/flexible_api_server.py around lines 437 to 439, the batch endpoint
error handling currently returns the error message directly to the client
without sanitization or server-side logging. Modify the except block to log the
full exception details on the server side for debugging, and return a sanitized,
generic error message to the client in the JSON response to avoid exposing
sensitive information.
Summary by Sourcery
Add a Flask-based flexible emotion detection API server supporting serverless, endpoint, and local deployments with explicit timeouts and PII-safe error handling
New Features:
Enhancements:
Summary by CodeRabbit