ci: harden doctor step PATH to include /usr/local/bin:/usr/bin:/bin; …#48
ci: harden doctor step PATH to include /usr/local/bin:/usr/bin:/bin; …#48uelkerd wants to merge 34 commits into
Conversation
…ensure executor PATH includes system bins
Reviewer's guide (collapsed on small PRs)Reviewer's GuideHarden CircleCI environment by prepending system binary paths, remove redundant system dependency installation, and streamline the Miniconda download fallback logic. Class diagram for updated CircleCI config commandsclassDiagram
class Executor {
+PATH: string
}
class DoctorStep {
+export PATH
+echo SHELL
+uname -a
+whoami
}
class SetupCondaEnv {
+Install Miniconda
-Install system dependencies (removed)
+Download fallback: curl, wget, python
+Error if none available
}
Executor <|-- DoctorStep
Executor <|-- SetupCondaEnv
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Summary of Changes
Hello @uelkerd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've made changes to the CircleCI configuration to improve the robustness and reliability of our CI environment, particularly concerning the PATH variable. The primary goal is to ensure that system binaries are consistently available across different steps, preventing potential issues with missing commands. Additionally, I've streamlined some environment setup processes by removing redundant dependency installations and simplifying the Miniconda download logic, aiming for a cleaner and more efficient CI pipeline.
Highlights
- CI Environment Path Hardening: The
PATHenvironment variable in thepython-baseexecutor has been updated to explicitly include standard system binary directories (/usr/local/bin:/usr/bin:/bin) before the Miniconda path. This ensures essential system tools are always discoverable. - Doctor Step Robustness: The
doctorcommand now explicitly exports the hardenedPATHat the beginning of its execution. This guarantees that system binaries liketeeare available, regardless of the base environment's initialPATHconfiguration. - Streamlined Conda Setup: The
setup_conda_envcommand has been simplified by removing theInstall system dependenciesstep, which previously installedcurl,wget,ca-certificates,portaudio19-dev, andpython3-pyaudio. This suggests these dependencies are either no longer needed or are expected to be present in the base image. - Simplified Miniconda Download Logic: The logic for downloading Miniconda has been refined. The previous attempts to install
curlif it wasn't found (usingsudo apt-get,yum,dnf,apk) have been removed. The script now directly checks forwget,python, orcurland exits if none are available, implying a more controlled environment where these tools are expected.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
|
Here's the code health analysis summary for commits Analysis Summary
|
There was a problem hiding this comment.
Pull Request Overview
This PR hardens the CircleCI configuration by explicitly ensuring system binary paths are available in the environment PATH variable. The changes prevent potential issues where essential system utilities might not be accessible due to restricted PATH settings.
- Explicitly prepends standard system binary paths (
/usr/local/bin:/usr/bin:/bin) to the PATH environment variable - Removes complex fallback logic for package installation in favor of a simpler approach
- Streamlines the conda setup by removing system dependency installation steps
| fi | ||
| echo "Error: no wget, curl, or python available to download Miniconda." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
There appears to be an extra 'fi' statement that doesn't match any corresponding 'if' statement in the remaining code. This will cause a syntax error in the bash script.
There was a problem hiding this comment.
Code Review
This pull request hardens the CI configuration by ensuring system binary paths are included in the PATH environment variable. It also simplifies the setup by removing on-the-fly installation of dependencies, making the CI environment more predictable. The changes look good, but I've pointed out a case of redundant configuration that could be simplified for better maintainability.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request does a great job of hardening and simplifying the CircleCI configuration. The changes, such as prepending system binary paths to PATH, refactoring the doctor command for better readability, and streamlining the Miniconda installation, make the CI pipeline more robust and maintainable. I have one suggestion to further improve the resilience of the diagnostic doctor step. Overall, these are excellent improvements.
…n 2, split junit xml outputs; run rate limiter test without coverage
…ge, then serial coverage pass; clean .coverage* before coverage run
…lobal coverage flags; keep coverage only in serial pass
…dbox_executor tests; keep coverage serial
… run them serial after env set
…SFORMERS_OFFLINE)
… minimal env + pip deps
… enforce offline env across jobs
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThe updates enhance CI/CD robustness and test reliability by improving environment variable management, error handling, and test execution in the CircleCI pipeline. A new Flask dependency is added, model pre-warming is made offline-aware, and a unit test is further mocked to avoid network dependency during configuration loading. Additionally, test result reporting logic is centralized for consistency. The secure emotion detection model loading is refactored for lazy loading with stub fallback in CI/test environments. The unified AI API is improved for better test compatibility, error handling, and permission injection. Database connection logic now supports fallback to SQLite if PostgreSQL credentials are missing. Coverage thresholds are raised, and the CI report reflects a full failure scenario. Changes
Sequence Diagram(s)sequenceDiagram
participant CI as CircleCI Job
participant Env as Environment Setup
participant Bash as $BASH_ENV
participant Conda as Conda Environment
participant Test as Unit Tests
participant Script as pre_warm_models.py
CI->>Env: Start job, set offline & DB env vars
Env->>Bash: Export env vars to $BASH_ENV
Env->>Conda: Install Miniconda (fail if no downloader)
Conda-->>Env: On failure, create minimal env with pip
Env->>Script: Run pre_warm_models.py
Script->>Script: Check HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE
alt Offline Mode
Script-->>Env: Skip model download
else Online Mode
Script->>Script: Pre-warm models
end
Env->>Test: Run unit tests (parallel/serial, filtered)
Test->>Test: Mock config/model as needed
Test-->>CI: Report results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🔭 Outside diff range comments (3)
.circleci/config.yml (3)
152-153: Ensure $BASH_ENV is sourced in run_in_condarun_in_conda currently ignores $BASH_ENV, so per-job env overrides (e.g., toggling offline mode) won’t apply. Source it for consistency with conda_exec.
command: | - $HOME/miniconda/envs/samo-dl-stable/bin/python -c "<< parameters.command >>" + set -euxo pipefail + if [ -n "${BASH_ENV:-}" ] && [ -f "$BASH_ENV" ]; then + set +u + source "$BASH_ENV" + set -u + fi + $HOME/miniconda/envs/samo-dl-stable/bin/python -c "<< parameters.command >>"
506-508: Ensure GPU checks don’t break CPU-only runnersGuard the
nvidia-smicall and add a CUDA-availability check so both the environment setup and the GPU training test exit cleanly when no GPU is present.• .circleci/config.yml (run_in_conda → GPU Environment Setup, lines 506–508)
- nvidia-smi + nvidia-smi || echo "nvidia-smi not available, skipping GPU checks"• .circleci/config.yml (run_in_conda → GPU Training Test)
Wrap the test commands with a quick Python check, for example:python - <<EOF import torch if not torch.cuda.is_available(): print("CUDA not available, skipping GPU Training Test") exit(0) EOF # ...rest of your GPU tests...
53-53: Harden the PATH in the python-gpu executor to match python-mlIn
.circleci/config.ymlunder thepython-gpuexecutor’senvironment, prepend the system directories before Conda so that system binaries aren’t shadowed:• File:
.circleci/config.yml
Location: around line 85 (underexecutors → python-gpu → environment)environment: PYTHONPATH: $CIRCLE_WORKING_DIRECTORY/src HF_HOME: /home/circleci/.cache/huggingface - PATH: $HOME/miniconda/bin:$PATH # Add conda to PATH for all jobs + PATH: /usr/local/bin:/usr/bin:/bin:$HOME/miniconda/bin:$PATH # Ensure system bins + conda
♻️ Duplicate comments (2)
.circleci/config.yml (2)
88-95: Doctor step robustness improvements look good
- Non-fatal diagnostics via
|| trueis applied consistently (pwd and ls included).- Outputs are captured as artifacts for debugging.
This also addresses previous feedback about making diagnostics non-fatal.
Also applies to: 97-104
175-176: Error message now matches simplified Miniconda download fallbacksMessage accurately reflects the attempted tools (wget, curl, python3, python). Previous concern about a misleading error string is resolved.
🧹 Nitpick comments (7)
environment.yml (1)
17-17: Flask addition: verify necessity alongside FastAPIAdding Flask 3.0.3 is fine, but it introduces a second web framework next to FastAPI. Confirm this is intentional (e.g., for tests/diagnostics) to avoid unnecessary footprint and dependency surface.
.circleci/config.yml (4)
204-209: Env update fallback is reasonable; consider surfacing logsWhen update fails, you continue with the existing env. Consider capturing
conda env updatestderr to an artifact to aid debugging (e.g.,2> conda.update.stderr.txt && cat ... || true).
215-222: Avoid duplicating env var sources (executor vs $BASH_ENV)DB_* and offline flags are set in both executor env and appended to $BASH_ENV. Prefer a single source of truth (ideally $BASH_ENV) to reduce confusion and precedence issues.
316-317: Coverage step chaining is brittle; consider explicit combine/reportChaining multiple pytest invocations in a single command makes failures harder to triage. Consider:
- Run coverage in one invocation with include/exclude patterns; or
- Combine
.coverage.*files viacoverage combine && coverage xml && coverage html.This improves reliability and reporting.
241-247: Cache ordering: restore before setup to maximize hitsYou currently restore cache after setup. Move restore_dependencies before setup_python_env to take advantage of cached Miniconda/env directories and speed up jobs.
scripts/ci/pre_warm_models.py (1)
17-26: Offline detection: accept broader truthy valuesChecking only for "1" is brittle. Consider also accepting "true"/"True"/"yes" and handling whitespace. You can also wire
local_files_only=offlineto from_pretrained calls for explicitness when not skipping.- offline = ( - os.getenv("HF_HUB_OFFLINE") == "1" - or os.getenv("TRANSFORMERS_OFFLINE") == "1" - ) + def _truthy(val: str | None) -> bool: + return (val or "").strip().lower() in {"1", "true", "yes", "on"} + offline = _truthy(os.getenv("HF_HUB_OFFLINE")) or _truthy(os.getenv("TRANSFORMERS_OFFLINE"))Optionally, if not skipping, pass
local_files_only=offlinetofrom_pretrained.tests/unit/test_emotion_detection.py (1)
23-26: Decorator argument order: verify mapping remains correctMultiple patch decorators pass mocks in the same order as decorators are applied (topmost decorator’s mock is the first arg). Your naming matches this pattern (mock_bert for AutoModel, mock_config for AutoConfig), but it’s easy to regress. Keep tests consistent across methods.
Also applies to: 43-46, 60-63, 122-125, 146-149, 186-189
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.circleci/config.yml(6 hunks)environment.yml(1 hunks)scripts/ci/pre_warm_models.py(1 hunks)tests/unit/test_emotion_detection.py(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.circleci/config.yml
[error] 211-211: trailing spaces
(trailing-spaces)
🔇 Additional comments (3)
.circleci/config.yml (1)
73-78: Good: conda_exec now respects $BASH_ENVSourcing $BASH_ENV ensures exported vars (offline flags, DB env) propagate into conda-run steps. This addresses env propagation issues.
tests/unit/test_emotion_detection.py (2)
60-68: Good: mock AutoConfig to avoid network in forward_passThe additional patch and minimal config (hidden_size=768) prevent real model/config downloads. This keeps the test hermetic and fast.
12-17: No class-name inconsistency detectedAll model definitions and imports use
BERTEmotionClassifier(all-caps “BERT”), and there is no separateBertEmotionClassifierclass. The only occurrence of “TestBertEmotionClassifier” is a test class name, not a reference to a different model class. No changes needed.Likely an incorrect or invalid review comment.
- Fixed false failure count caused by including metadata in test totals - Now correctly counts only boolean test results (11 actual tests) - Environment info no longer counted as a test result - Report now accurately shows real test failures vs metadata
Co-authored-by: denizcan.uelker <denizcan.uelker@mercedes-benz.com>
- Extract duplicate test statistics logic into _get_test_stats() helper method - Simplify passed_tests calculation using sum(test_results.values()) - Eliminates code duplication between generate_report() and main() - Improves maintainability following DRY principles
Co-authored-by: denizcan.uelker <denizcan.uelker@mercedes-benz.com>
Resolved issues in scripts/ci/run_full_ci_pipeline.py with DeepSource Autofix
- Replace unused 'results' variable with underscore (_) - Replace unused 'test_results' variable with underscore (_) - Resolves linter warnings about unused variables - Follows Python convention for intentionally unused variables
Debug CI pipeline issues
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
ci_pipeline_report.txt (1)
1-30: Avoid committing generated CI artifacts; store them as build artifacts instead.Commiting
ci_pipeline_report.txtwill cause noisy diffs and merge conflicts. Prefer emitting it in CI (e.g., CircleCI “store_artifacts”) and add it to .gitignore.scripts/ci/run_full_ci_pipeline.py (4)
335-337: Good reuse: centralize stats via helper.Using the helper reduces duplication and drift. LGTM. One minor improvement: guard against zero tests before computing success rate.
Outside this hunk, consider:
success_rate = (passed_tests / total_tests * 100) if total_tests else 0.0
367-370: Fix trailing whitespace (Ruff W291) and slightly tighten formatting.Remove trailing space at Line 367.
- failed_test_names = [name for name, result in test_results.items() + failed_test_names = [name for name, result in test_results.items() if not result] report += f"⚠️ Failed tests: {', '.join(failed_test_names)}\n"
380-380: Avoid assigning throwaway_; just call the method.Cleaner and conveys intent.
- _ = runner.run_full_pipeline() + runner.run_full_pipeline()
390-390: Use the (new) public helper, not a “private” method.After the rename, call
get_test_stats().- _, total_tests, passed_tests = runner._get_test_stats() + _, total_tests, passed_tests = runner.get_test_stats()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
ci_pipeline.logis excluded by!**/*.logsrc/models/__pycache__/__init__.cpython-313.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/__init__.cpython-313.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/dataset_loader.cpython-313.pycis excluded by!**/*.pyc
📒 Files selected for processing (2)
ci_pipeline_report.txt(1 hunks)scripts/ci/run_full_ci_pipeline.py(5 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
scripts/ci/run_full_ci_pipeline.py
367-367: Trailing whitespace
Remove trailing whitespace
(W291)
🔇 Additional comments (4)
ci_pipeline_report.txt (4)
6-9: Summary stats look consistent with boolean test keys.11 total boolean checks and 0% success rate aligns with the new aggregation logic. LGTM.
12-23: Detailed results naming matches CIPipelineRunner keys.Names align with Path(...).stem results and top-level booleans. Good consistency.
25-25: Execution time seems unexpectedly low; please verify.0.2s for the full pipeline is suspicious. Confirm start/stop timing covers the entire run and that CI didn’t short-circuit steps.
28-28: Nice: explicit listing of failed tests.Clearer than generic failure counts; improves debuggability.
| def _get_test_stats(self) -> tuple[dict, int, int]: | ||
| """Calculate statistics on test results. | ||
|
|
||
| Returns: | ||
| tuple: (test_results dict, total_tests, passed_tests) | ||
| """ | ||
| test_results = { | ||
| name: result | ||
| for name, result in self.results.items() | ||
| if isinstance(result, bool) | ||
| } | ||
| total_tests = len(test_results) | ||
| # Booleans can be summed directly (True=1, False=0) | ||
| passed_tests = sum(test_results.values()) | ||
| return test_results, total_tests, passed_tests | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Make _get_test_stats public and align type hints.
It’s called from outside the class (runner._get_test_stats()), so it shouldn’t be private. Also, keep typing style consistent (use Tuple[Dict[str, bool], int, int] or PEP 585 types uniformly).
Apply this diff:
- def _get_test_stats(self) -> tuple[dict, int, int]:
+ def get_test_stats(self) -> Tuple[Dict[str, bool], int, int]:
"""Calculate statistics on test results.
Returns:
- tuple: (test_results dict, total_tests, passed_tests)
+ Tuple[Dict[str, bool], int, int]: (test_results dict, total_tests, passed_tests)
"""
- test_results = {
- name: result
- for name, result in self.results.items()
- if isinstance(result, bool)
- }
+ test_results: Dict[str, bool] = {
+ name: result
+ for name, result in self.results.items()
+ if isinstance(result, bool)
+ }
total_tests = len(test_results)
# Booleans can be summed directly (True=1, False=0)
passed_tests = sum(test_results.values())
return test_results, total_tests, passed_testsRun this to find and update all references:
🏁 Script executed:
#!/bin/bash
rg --line-number --hidden --glob '!dist' --glob '!build' --glob '!.git' '_get_test_stats'Length of output: 398
Rename _get_test_stats to get_test_stats and align type hints
To make this helper public and keep typing consistent, apply the following changes in scripts/ci/run_full_ci_pipeline.py:
- Add at top of file:
+ from typing import Tuple, Dict - Change the method signature and internal annotation:
- def _get_test_stats(self) -> tuple[dict, int, int]: + def get_test_stats(self) -> Tuple[Dict[str, bool], int, int]: """Calculate statistics on test results. Returns: - tuple: (test_results dict, total_tests, passed_tests) + Tuple[Dict[str, bool], int, int]: (test_results dict, total_tests, passed_tests) """ - test_results = { + test_results: Dict[str, bool] = { name: result for name, result in self.results.items() if isinstance(result, bool) }
- Update all call sites (lines 336 and 390):
- test_results, total_tests, passed_tests = self._get_test_stats() + test_results, total_tests, passed_tests = self.get_test_stats() - _, total_tests, passed_tests = runner._get_test_stats() + _, total_tests, passed_tests = runner.get_test_stats()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _get_test_stats(self) -> tuple[dict, int, int]: | |
| """Calculate statistics on test results. | |
| Returns: | |
| tuple: (test_results dict, total_tests, passed_tests) | |
| """ | |
| test_results = { | |
| name: result | |
| for name, result in self.results.items() | |
| if isinstance(result, bool) | |
| } | |
| total_tests = len(test_results) | |
| # Booleans can be summed directly (True=1, False=0) | |
| passed_tests = sum(test_results.values()) | |
| return test_results, total_tests, passed_tests | |
| def get_test_stats(self) -> Tuple[Dict[str, bool], int, int]: | |
| """Calculate statistics on test results. | |
| Returns: | |
| Tuple[Dict[str, bool], int, int]: (test_results dict, total_tests, passed_tests) | |
| """ | |
| test_results: Dict[str, bool] = { | |
| name: result | |
| for name, result in self.results.items() | |
| if isinstance(result, bool) | |
| } | |
| total_tests = len(test_results) | |
| # Booleans can be summed directly (True=1, False=0) | |
| passed_tests = sum(test_results.values()) | |
| return test_results, total_tests, passed_tests |
🤖 Prompt for AI Agents
In scripts/ci/run_full_ci_pipeline.py around lines 50 to 65, rename the method
_get_test_stats to get_test_stats to make it public, and update its type hint to
use standard typing syntax for compatibility. Add the necessary import for
typing.Tuple at the top of the file. Also, update all call sites of this method
at lines 336 and 390 to use the new name get_test_stats to maintain consistency.
| # Only count boolean results as actual tests | ||
| test_results, total_tests, passed_tests = self._get_test_stats() | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Ensure all call sites reflect the rename to get_test_stats.
Align class-internal and external usages to avoid calling a private helper from outside.
Also applies to: 390-390, 50-65
🏁 Script executed:
#!/bin/bash
# Confirm no lingering references remain
rg --line-number '_get_test_stats\('Length of output: 347
Rename _get_test_stats to get_test_stats and update all call sites
To avoid exposing a private helper externally and to keep naming consistent:
• scripts/ci/run_full_ci_pipeline.py, line 50
• scripts/ci/run_full_ci_pipeline.py, line 336
• scripts/ci/run_full_ci_pipeline.py, line 390
Suggested diff:
--- a/scripts/ci/run_full_ci_pipeline.py
+++ b/scripts/ci/run_full_ci_pipeline.py
@@ line 50
- def _get_test_stats(self) -> tuple[dict, int, int]:
+ def get_test_stats(self) -> tuple[dict, int, int]:
@@ line 336
- test_results, total_tests, passed_tests = self._get_test_stats()
+ test_results, total_tests, passed_tests = self.get_test_stats()
@@ line 390
- _, total_tests, passed_tests = runner._get_test_stats()
+ _, total_tests, passed_tests = runner.get_test_stats()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Only count boolean results as actual tests | |
| test_results, total_tests, passed_tests = self._get_test_stats() | |
| --- a/scripts/ci/run_full_ci_pipeline.py | |
| @@ -50,7 +50,7 @@ | |
| - def _get_test_stats(self) -> tuple[dict, int, int]: | |
| + def get_test_stats(self) -> tuple[dict, int, int]: | |
| @@ -336,7 +336,7 @@ | |
| - test_results, total_tests, passed_tests = self._get_test_stats() | |
| + test_results, total_tests, passed_tests = self.get_test_stats() | |
| @@ -390,7 +390,7 @@ | |
| - _, total_tests, passed_tests = runner._get_test_stats() | |
| + _, total_tests, passed_tests = runner.get_test_stats() |
🤖 Prompt for AI Agents
In scripts/ci/run_full_ci_pipeline.py around lines 50, 336, and 390, rename the
method _get_test_stats to get_test_stats to avoid exposing a private helper
externally and maintain consistent naming. Update all call sites accordingly to
use get_test_stats instead of _get_test_stats.
… Miniconda, remove global HF offline); tests: lazy stub secure model in CI; fix batch permissions and mocks; strip trailing whitespace
…fix 404 on Miniconda latest checksum
…nd matching installer basename
…ract latest tag for sha256sum.txt)
There was a problem hiding this comment.
Actionable comments posted: 9
🔭 Outside diff range comments (4)
tests/integration/test_priority1_features.py (2)
145-170: Duplicate test names shadow earlier tests in classThere are two test_voice_transcription_missing_file methods; the latter overrides the former, silently dropping a test body. This can hide regressions.
Please remove the duplicate or rename one to keep both cases executing.
Also applies to: 220-236
282-334: Duplicate batch partial-failure tests; earlier one lacks patch and is overriddenTwo test_batch_transcription_partial_failures methods exist; the later patched one overrides the first (which also lacks a patch decorator). This likely wasn’t intended.
Delete or rename the earlier method to avoid confusion and ensure intended coverage.
Also applies to: 386-404
deployment/secure_api_server.py (1)
300-309: Map probabilities to model labels, not a fixed emotions listZipping with a static emotions list risks mismatches with config.id2label. Use id2label for accurate mapping.
- 'probabilities': { - emotion: float(prob) for emotion, prob in zip(self.emotions, all_probs) - }, + 'probabilities': { + (self.model.config.id2label.get(i, f'label_{i}') if hasattr(self.model, 'config') else str(i)): float(p) + for i, p in enumerate(all_probs) + },src/unified_ai_api.py (1)
232-249: Gate header-based permission injection to test/CI onlyAllowing X-User-Permissions in production is risky. Restrict to TESTING/CI environments.
- injected = request.headers.get("X-User-Permissions") + injected = request.headers.get("X-User-Permissions") if (os.environ.get("TESTING") or os.environ.get("CI")) else None
🧹 Nitpick comments (17)
src/data/database.py (1)
25-31: Add DATABASE_URL override and explicit fallback toggle.Current logic is fine; consider:
- Respecting a single DATABASE_URL env var first (common 12-factor).
- Optional USE_SQLITE=1 toggle to force SQLite regardless of partial PG creds.
Apply minimal override support:
-# If PostgreSQL env vars are not provided, fall back to a local SQLite database for tests/dev -if DB_USER and DB_PASSWORD and DB_NAME: - DATABASE_URL = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}" -else: - default_sqlite_path = Path(os.environ.get("SQLITE_PATH", "./samo_local.db")).expanduser().resolve() - DATABASE_URL = f"sqlite:///{default_sqlite_path}" +# Prefer explicit DATABASE_URL; else pick PG or fallback to SQLite. +DATABASE_URL = os.environ.get("DATABASE_URL") +if not DATABASE_URL: + # If PostgreSQL env vars are not provided (or USE_SQLITE=1), fall back to local SQLite for tests/dev + if os.environ.get("USE_SQLITE") == "1" or not (DB_USER and DB_PASSWORD and DB_NAME): + default_sqlite_path = Path(os.environ.get("SQLITE_PATH", "./samo_local.db")).expanduser().resolve() + DATABASE_URL = f"sqlite:///{default_sqlite_path}" + else: + DATABASE_URL = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"scripts/run_api_rate_limiter_tests.py.backup (1)
44-45: Backup script: raise threshold mirrors main script — consider removing the backup file.Keeping a “.backup” runner in repo risks drift and confusion. Prefer deleting this file or moving under docs/ as an example.
I can open a cleanup issue and submit a PR to remove this backup file.
src/security/jwt_manager.py (2)
81-81: Use builtin generics per Ruff (UP006).Replace Dict[...] with dict[...] for modern typing.
- def create_token_pair(self, user_data: Dict[str, Any]) -> Dict[str, Any]: + def create_token_pair(self, user_data: dict[str, Any]) -> dict[str, Any]:
88-93: If staying with dict return, define a TypedDict for clarity.Improves readability and static checking.
Add above JWTManager:
from typing_extensions import TypedDict class TokenPairDict(TypedDict): access_token: str refresh_token: str token_type: str expires_in: intThen update signature:
- def create_token_pair(self, user_data: dict[str, Any]) -> dict[str, Any]: + def create_token_pair(self, user_data: dict[str, Any]) -> TokenPairDict:docs/SAMO-DL-PRD.md (2)
12-15: Fix markdown lint MD037 (spaces inside emphasis).Remove the space before the closing emphasis.
-## 🎉 **CURRENT STATUS: ** +## 🎉 **CURRENT STATUS:**
134-135: Trailing hyphen reads as incomplete sentence.Either drop the hyphen or continue with sub-points.
- - Generate trend summaries with statistical confidence- + - Generate trend summaries with statistical confidencetests/integration/test_priority1_features.py (1)
435-437: Strengthen assertions with invariantsRelaxed counts are fine under mocking, but ensure totals remain consistent.
Consider adding:
assert data["successful_transcriptions"] + data["failed_transcriptions"] == data["total_files"]deployment/secure_api_server.py (5)
333-349: Add typing, docstring, and minimal thread-safety to lazy initializerAligns with Ruff (ANN201/D103) and avoids races under concurrent first requests.
-secure_model = None # type: ignore[assignment] +from typing import Optional, Any +secure_model: Optional[Any] = None @@ -def get_secure_model(): +def get_secure_model() -> Any: + """Return a cached SecureEmotionDetectionModel or a CI/TEST stub.""" + # Optional: guard against concurrent init + # (lightweight; replace with threading.Lock if needed in prod) global secure_model if secure_model is not None: return secure_model
351-354: Prefer Optional[str] for broader Python compatibility and add docstringUsing PEP 604 unions requires 3.10+; Optional keeps type hints robust.
-def get_admin_api_key() -> str | None: - return os.environ.get("ADMIN_API_KEY") +from typing import Optional +def get_admin_api_key() -> Optional[str]: + """Fetch admin API key from environment per-request.""" + return os.environ.get("ADMIN_API_KEY")
251-256: Use structured logging and capture tracebackImproves diagnostics and satisfies Ruff logging guidance.
- except Exception as e: - logger.error(f"❌ Failed to load secure model: {str(e)}. Falling back to stub mode.") + except Exception as e: + logger.exception("❌ Failed to load secure model. Falling back to stub mode.") self.tokenizer = None self.model = None self.loaded = False
51-66: Brittle patching of httpx private internals; scope to tests and silence linterAccessing httpx._utils is version-sensitive. Keep the shim but log at debug if it fails, and silence F401/F401 with explicit noqa.
- import httpx # type: ignore - from httpx import _utils as _httpx_utils # type: ignore + import httpx as _httpx # noqa: F401 + from httpx import _utils as _httpx_utils # type: ignore @@ -except Exception: - pass +except Exception as _shim_exc: + logger.debug("httpx multipart shim not applied: %s", _shim_exc)
355-365: Type annotations for decorator per Ruff (ANN201/ANN001/ANN202)Add annotations to improve clarity and satisfy static analysis.
-from functools import wraps -import functools +from functools import wraps +import functools +from typing import Callable, Any @@ -def require_admin_api_key(f): +def require_admin_api_key(f: Callable[..., Any]) -> Callable[..., Any]: @@ - def decorated_function(*args, **kwargs): + def decorated_function(*args: Any, **kwargs: Any) -> Any:.circleci/config.yml (2)
49-52: Harden PATH consistently on python-gpu executorPrepend system bins as done in python-ml for symmetry and predictability.
- PATH: $HOME/miniconda/bin:$PATH # Add conda to PATH for all jobs + PATH: /usr/local/bin:/usr/bin:/bin:$HOME/miniconda/bin:$PATH # Ensure system bins + conda
226-230: Pin fallback Flask version to avoid unplanned upgradesFor minimal fallback env, pin Flask to a compatible version to reduce flakiness.
- "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install pytest pytest-xdist pytest-cov ruff bandit safety mypy httpx requests Flask PyJWT + "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install pytest pytest-xdist pytest-cov ruff bandit safety mypy httpx requests Flask==3.0.3 PyJWTsrc/unified_ai_api.py (3)
51-66: Scope the httpx shim and log failures; silence linter warningsPrivate API patching is fragile. Log at debug when it fails; mark imports as used for side effects.
- import httpx # type: ignore - from httpx import _utils as _httpx_utils # type: ignore + import httpx as _httpx # noqa: F401 + from httpx import _utils as _httpx_utils # type: ignore @@ -except Exception: - pass +except Exception as _e: + logger.debug("httpx multipart shim not applied: %s", _e)
19-19: Remove unused import (inspect)-from typing import Any, AsyncGenerator, Optional, Dict, List, Set, Tuple -import inspect +from typing import Any, AsyncGenerator, Optional, Dict, List, Set, Tuple
1248-1255: Nit: message and limit mismatchLimit is 45MB but message says “max 50MB”. Consider aligning to prevent confusion in logs.
- if len(content) > 45 * 1024 * 1024: - # Return a JSON body with 'detail' to match tests expecting that key - raise HTTPException(status_code=400, detail="File too large (max 50MB)") + if len(content) > 45 * 1024 * 1024: + raise HTTPException(status_code=400, detail="File too large (max 45MB)")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (26)
src/models/__pycache__/__init__.cpython-310.pycis excluded by!**/*.pycsrc/models/__pycache__/__init__.cpython-311.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/__init__.cpython-310.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/__init__.cpython-311.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/bert_classifier.cpython-310.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/bert_classifier.cpython-311.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/dataset_loader.cpython-310.pycis excluded by!**/*.pycsrc/models/emotion_detection/__pycache__/dataset_loader.cpython-311.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/__init__.cpython-310.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/__init__.cpython-311.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/integrity_checker.cpython-310.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/integrity_checker.cpython-311.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/model_validator.cpython-310.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/model_validator.cpython-311.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/sandbox_executor.cpython-310.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/sandbox_executor.cpython-311.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/secure_model_loader.cpython-310.pycis excluded by!**/*.pycsrc/models/secure_loader/__pycache__/secure_model_loader.cpython-311.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/__init__.cpython-311.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/dataset_loader.cpython-311.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/t5_summarizer.cpython-311.pycis excluded by!**/*.pycsrc/models/summarization/__pycache__/training_pipeline.cpython-311.pycis excluded by!**/*.pycsrc/models/voice_processing/__pycache__/__init__.cpython-311.pycis excluded by!**/*.pycsrc/models/voice_processing/__pycache__/audio_preprocessor.cpython-311.pycis excluded by!**/*.pycsrc/models/voice_processing/__pycache__/transcription_api.cpython-311.pycis excluded by!**/*.pycsrc/models/voice_processing/__pycache__/whisper_transcriber.cpython-311.pycis excluded by!**/*.pyc
📒 Files selected for processing (10)
.circleci/config.yml(9 hunks)deployment/secure_api_server.py(7 hunks)docs/SAMO-DL-PRD.md(3 hunks)pyproject.toml(1 hunks)scripts/run_api_rate_limiter_tests.py.backup(1 hunks)scripts/testing/run_api_rate_limiter_tests.py(1 hunks)src/data/database.py(2 hunks)src/security/jwt_manager.py(1 hunks)src/unified_ai_api.py(15 hunks)tests/integration/test_priority1_features.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/security/jwt_manager.py (1)
src/unified_ai_api.py (1)
refresh_token(789-821)
🪛 Ruff (0.12.2)
src/security/jwt_manager.py
81-81: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
81-81: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
deployment/secure_api_server.py
252-252: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Missing return type annotation for public function get_secure_model
(ANN201)
333-333: Missing docstring in public function
(D103)
352-352: Missing docstring in public function
(D103)
355-355: Missing return type annotation for public function require_admin_api_key
(ANN201)
355-355: Missing type annotation for function argument f
(ANN001)
358-358: Missing return type annotation for private function decorated_function
(ANN202)
358-358: Missing type annotation for *args
(ANN002)
358-358: Missing type annotation for **kwargs
(ANN003)
src/unified_ai_api.py
19-19: inspect imported but unused
Remove unused import: inspect
(F401)
53-53: httpx imported but unused
Remove unused import: httpx
(F401)
64-65: try-except-pass detected, consider logging the exception
(S110)
🪛 YAMLlint (1.37.1)
.circleci/config.yml
[error] 178-178: trailing spaces
(trailing-spaces)
🪛 markdownlint-cli2 (0.17.2)
docs/SAMO-DL-PRD.md
12-12: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🔇 Additional comments (7)
src/data/database.py (2)
12-12: LGTM: pathlib import.Import is used for robust SQLite path handling.
32-45: Engine options: SQLite vs Postgres separation looks correct.
- SQLite: check_same_thread=False is appropriate with scoped_session; pooling options omitted (fine).
- Postgres: pool_pre_ping and recycle set (good defaults).
src/security/jwt_manager.py (1)
81-93: No issues found: dict return from create_token_pair is compatible with TokenResponse endpoints
FastAPI will automatically coerce the returned dict into the declaredTokenResponsemodel, and existing integration tests validate dict‐like membership for"access_token","refresh_token","token_type", and"expires_in". No changes are required.docs/SAMO-DL-PRD.md (1)
30-31: Wording update LGTM.“COLAB-TRAINED MODEL” phrasing is clearer and more neutral.
tests/integration/test_priority1_features.py (2)
358-362: Good: explicit permission injection header for batch endpointUsing X-User-Permissions aligns with the updated permission mechanism and keeps tests isolated from token contents.
1188-1203: LGTM: normalized transcription payloadConverting optional fields and types defensively makes the response schema-stable under mocks.
.circleci/config.yml (1)
178-178: Removed trailing whitespace in .circleci/config.ymlTrailing spaces on line 178 have been trimmed to satisfy YAMLlint and keep CI linters green.
Files updated:
- .circleci/config.yml: line 178
Diff:
- # Resolve the final versioned URL (since + # Resolve the final versioned URL (since
| model_instance = get_secure_model() | ||
| if not hasattr(model_instance, 'predict'): | ||
| return jsonify({'error': 'Model unavailable in CI/TEST'}), 503 | ||
| result = model_instance.predict( | ||
| sanitized_data['text'], | ||
| confidence_threshold=sanitized_data.get('confidence_threshold') |
There was a problem hiding this comment.
Guard on model readiness, not just method presence
hasattr(model_instance, 'predict') is always true for SecureEmotionDetectionModel, even when tokenizer/model are None. In non-CI stub mode this leads to 500s. Check the loaded flag instead.
- model_instance = get_secure_model()
- if not hasattr(model_instance, 'predict'):
- return jsonify({'error': 'Model unavailable in CI/TEST'}), 503
+ model_instance = get_secure_model()
+ if not getattr(model_instance, "loaded", False):
+ return jsonify({'error': 'Model unavailable'}), 503📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| model_instance = get_secure_model() | |
| if not hasattr(model_instance, 'predict'): | |
| return jsonify({'error': 'Model unavailable in CI/TEST'}), 503 | |
| result = model_instance.predict( | |
| sanitized_data['text'], | |
| confidence_threshold=sanitized_data.get('confidence_threshold') | |
| model_instance = get_secure_model() | |
| if not getattr(model_instance, "loaded", False): | |
| return jsonify({'error': 'Model unavailable'}), 503 | |
| result = model_instance.predict( | |
| sanitized_data['text'], | |
| confidence_threshold=sanitized_data.get('confidence_threshold') |
🤖 Prompt for AI Agents
In deployment/secure_api_server.py around lines 444 to 449, the current check
uses hasattr(model_instance, 'predict') which is always true even if the model
or tokenizer is None, causing 500 errors in non-CI stub mode. Replace this check
with a condition that verifies the model_instance's loaded flag to ensure the
model is fully ready before calling predict. Return the 503 error if the model
is not loaded.
| model_instance = get_secure_model() | ||
| if not hasattr(model_instance, 'predict'): | ||
| return jsonify({'error': 'Model unavailable in CI/TEST'}), 503 | ||
| for text in sanitized_data['texts']: | ||
| if text.strip(): | ||
| result = secure_model.predict( | ||
| result = model_instance.predict( | ||
| text, |
There was a problem hiding this comment.
Same readiness check needed for batch predictions
Return 503 when the secure model isn’t loaded to avoid runtime errors in stub mode.
- model_instance = get_secure_model()
- if not hasattr(model_instance, 'predict'):
- return jsonify({'error': 'Model unavailable in CI/TEST'}), 503
+ model_instance = get_secure_model()
+ if not getattr(model_instance, "loaded", False):
+ return jsonify({'error': 'Model unavailable'}), 503📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| model_instance = get_secure_model() | |
| if not hasattr(model_instance, 'predict'): | |
| return jsonify({'error': 'Model unavailable in CI/TEST'}), 503 | |
| for text in sanitized_data['texts']: | |
| if text.strip(): | |
| result = secure_model.predict( | |
| result = model_instance.predict( | |
| text, | |
| model_instance = get_secure_model() | |
| if not getattr(model_instance, "loaded", False): | |
| return jsonify({'error': 'Model unavailable'}), 503 | |
| for text in sanitized_data['texts']: | |
| if text.strip(): | |
| result = model_instance.predict( | |
| text, |
🤖 Prompt for AI Agents
In deployment/secure_api_server.py around lines 510 to 516, add a readiness
check for the secure model before performing batch predictions similar to the
single prediction check. Verify if the model instance has the 'predict'
attribute and if not, return a 503 response with an appropriate error message to
prevent runtime errors when the model is not loaded in CI/TEST or stub mode.
| ## 🎉 **CURRENT STATUS: ** | ||
|
|
||
| **📊 Overall Progress**: **8 of 8 MVP Requirements Complete (100%)** | ||
| **📊 Overall Progress**: **8 of 8 MVP Requirements Complete** | ||
|
|
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Optional: ensure consistency of removed “YOUR COLAB MODEL!” phrasing across doc.
Residual references might remain elsewhere.
Also applies to: 30-31
🏁 Script executed:
#!/bin/bash
# Search for remaining "YOUR COLAB MODEL" / "YOUR Colab" occurrences
rg -n "YOUR COLAB MODEL|YOUR Colab|YOUR COLAB" docs --heading -A1Length of output: 1123
Ensure complete removal of “YOUR COLAB MODEL” phrasing
The search revealed multiple residual references that need to be cleaned up for consistency:
• docs/SAMO-DL-PRD.md
- Lines 17–18: “YOUR COLAB MODEL!”
- Lines 70–71: “(YOUR COLAB MODEL!)”
- Lines 80–81: “YOUR Colab-trained model live in production…”
- Lines 254–255: “(YOUR COLAB MODEL!)”
- Lines 340–341: “YOUR COLAB TRAINING COMPLETE”
• docs/reports/PROJECT_COMPLETION_SUMMARY.md
- Lines 29–30: “YOUR Colab-trained DistilRoBERTa model in production!”
Please remove or replace all of the above occurrences to maintain consistency across the documentation.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
12-12: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
In docs/SAMO-DL-PRD.md around lines 17-18, 70-71, 80-81, 254-255, and 340-341,
remove or replace all instances of the phrase "YOUR COLAB MODEL" and its
variants to ensure consistency. Search for these exact phrases and either delete
them or substitute with appropriate alternative wording that fits the document
context. Also, verify similar changes are applied in
docs/reports/PROJECT_COMPLETION_SUMMARY.md at lines 29-30 as part of the same
cleanup.
| addopts = --cov=src.api_rate_limiter --cov-report=term-missing --cov-fail-under=45 -v --tb=short | ||
| """) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Coverage threshold raised to 45% — verify CI behavior and module path.
- Ensure CircleCI actually invokes this script; otherwise the 45% threshold won’t be enforced.
- Double-check the target for --cov=src.api_rate_limiter matches your package/module path; if the module isn’t importable as src.api_rate_limiter, coverage may be zero.
Run to confirm pipeline usage and coverage target:
🏁 Script executed:
#!/bin/bash
# 1) Find CI references to this script
rg -n "run_api_rate_limiter_tests\.py" -g ".circleci/**" -A2
# 2) Check the module exists resolvable as 'src.api_rate_limiter'
fd -a "api_rate_limiter.py" | sed 's|^|Found: |'
rg -n "^ *package *=|^ *name *=|^ *from src\.api_rate_limiter|^ *import src\.api_rate_limiter" -A1Length of output: 1792
Add CI step to run API rate limiter tests
- I didn’t find any reference to
scripts/testing/run_api_rate_limiter_tests.pyin.circleci/(ripgrep returned no matches), so the 45% coverage threshold won’t be enforced. - Please update your CI config (e.g.
.circleci/config.yml) to invoke this script. - The coverage target
--cov=src.api_rate_limiteris correct—src/api_rate_limiter.pyexists and your tests import fromsrc.api_rate_limiter.
🤖 Prompt for AI Agents
In scripts/testing/run_api_rate_limiter_tests.py around lines 40 to 41, the test
script for API rate limiter is not referenced in the CI configuration, so its
coverage threshold is not enforced. To fix this, update the CI config file
(e.g., .circleci/config.yml) to add a step that runs this test script, ensuring
the coverage target --cov=src.api_rate_limiter is included and the coverage
threshold of 45% is enforced during CI runs.
| def create_token_pair(self, user_data: Dict[str, Any]) -> Dict[str, Any]: | ||
| """Create both access and refresh tokens and return as a plain dict. | ||
|
|
||
| Some tests expect a dict-like response that supports 'in' membership checks. | ||
| """ |
There was a problem hiding this comment.
Breaking change: return type now dict; verify all call sites and FastAPI response models.
unified_ai_api.refresh_token (see src/unified_ai_api.py lines 788-820) annotates TokenResponse but now receives a dict. FastAPI will serialize fine if the schema matches, but type hints/mypy and Pydantic validation may be inconsistent.
Two safe paths:
- Option A (strongly typed, backward compatible): keep returning TokenResponse and let tests use model_dump().
- def create_token_pair(self, user_data: Dict[str, Any]) -> Dict[str, Any]:
- """Create both access and refresh tokens and return as a plain dict.
- Some tests expect a dict-like response that supports 'in' membership checks.
- """
+ def create_token_pair(self, user_data: dict[str, Any]) -> TokenResponse:
+ """Create both access and refresh tokens."""
access_token = self.create_access_token(user_data)
refresh_token = self.create_refresh_token(user_data)
- return {
- "access_token": access_token,
- "refresh_token": refresh_token,
- "token_type": "bearer",
- "expires_in": ACCESS_TOKEN_EXPIRE_MINUTES * 60,
- }
+ return TokenResponse(
+ access_token=access_token,
+ refresh_token=refresh_token,
+ token_type="bearer",
+ expires_in=ACCESS_TOKEN_EXPIRE_MINUTES * 60,
+ )Optionally add a helper create_token_pair_dict() for tests.
- Option B (dict everywhere): keep dict return, but update all annotations and response models to dict or a TypedDict.
I recommend Option A to maintain API typing consistency.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def create_token_pair(self, user_data: Dict[str, Any]) -> Dict[str, Any]: | |
| """Create both access and refresh tokens and return as a plain dict. | |
| Some tests expect a dict-like response that supports 'in' membership checks. | |
| """ | |
| def create_token_pair(self, user_data: dict[str, Any]) -> TokenResponse: | |
| """Create both access and refresh tokens.""" | |
| access_token = self.create_access_token(user_data) | |
| refresh_token = self.create_refresh_token(user_data) | |
| return TokenResponse( | |
| access_token=access_token, | |
| refresh_token=refresh_token, | |
| token_type="bearer", | |
| expires_in=ACCESS_TOKEN_EXPIRE_MINUTES * 60, | |
| ) |
🧰 Tools
🪛 Ruff (0.12.2)
81-81: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
81-81: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
🤖 Prompt for AI Agents
In src/security/jwt_manager.py around lines 81 to 85, the create_token_pair
method was changed to return a plain dict instead of the previously used
TokenResponse model, causing type hint and validation inconsistencies in call
sites like unified_ai_api.refresh_token. To fix this, revert create_token_pair
to return a TokenResponse instance for strong typing and backward compatibility,
and update tests to use model_dump() if they need a dict. Optionally, add a
separate helper method create_token_pair_dict() for tests requiring a dict. This
maintains consistent API typing and avoids breaking changes.
| request: Request = None, | ||
| current_user: TokenPayload = Depends(get_current_user), | ||
| ) -> dict[str, Any]: |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Don’t default Request to None in FastAPI dependencies
FastAPI will always inject a Request; the default is unnecessary and complicates typing.
- request: Request = None,
+ request: Request,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| request: Request = None, | |
| current_user: TokenPayload = Depends(get_current_user), | |
| ) -> dict[str, Any]: | |
| request: Request, | |
| current_user: TokenPayload = Depends(get_current_user), | |
| ) -> dict[str, Any]: |
🤖 Prompt for AI Agents
In src/unified_ai_api.py around lines 1327 to 1329, remove the default value
None from the Request parameter since FastAPI automatically injects the Request
object. Update the function signature to have request: Request without a default
to simplify typing and align with FastAPI conventions.
| # Open each file without context manager so httpx can compute length later | ||
| audio_file = open(temp_file_path, "rb") | ||
| files.append(("audio_files", (f"file{i+1}.wav", audio_file, "audio/wav"))) | ||
|
|
There was a problem hiding this comment.
Close file handles or revert to context managers to avoid FD leaks
Files are opened without a context manager and never closed, which can leak descriptors under parallel pytest. Either revert to using with-open (the httpx shim now handles closed-file length) or explicitly close handles after the request.
Apply either approach:
- Minimal fix: close after request (add to the finally block of this test):
# After sending the request and before cleaning temp files:
for _, payload in files:
fobj = payload[1]
try:
fobj.close()
except Exception:
pass- Or replace these lines with a context manager:
- # Open each file without context manager so httpx can compute length later
- audio_file = open(temp_file_path, "rb")
- files.append(("audio_files", (f"file{i+1}.wav", audio_file, "audio/wav")))
+ with open(temp_file_path, "rb") as audio_file:
+ files.append(("audio_files", (f"file{i+1}.wav", audio_file, "audio/wav")))🤖 Prompt for AI Agents
In tests/integration/test_priority1_features.py around lines 424 to 427, file
handles are opened without context managers and never closed, risking file
descriptor leaks during parallel pytest runs. Fix this by either wrapping the
file opening in a with statement to ensure automatic closure or, if keeping the
current approach, explicitly close each file handle in a finally block after the
request is sent, iterating over the files list and closing each file object
safely.
…se to file before extracting tag
Resolved issues in src/unified_ai_api.py with DeepSource Autofix
There was a problem hiding this comment.
Actionable comments posted: 6
🔭 Outside diff range comments (3)
.circleci/config.yml (3)
489-507: Bug: storing artifact that is never created
model-validation-results.jsonis never written. The store_artifacts step will fail.Minimal fix: write a small JSON summary at the end of the Python block.
print('All models loaded and validated successfully!') " + - run_in_conda: + step_name: Emit model validation summary + command: | + python - <<'PY' +import json, os +with open("model-validation-results.json", "w") as f: + json.dump({"status":"ok","models":["bert_emotion","t5_summarizer","whisper_transcriber"]}, f) +print("Wrote model-validation-results.json") +PY - store_artifacts: path: model-validation-results.json destination: model-reportsAlso applies to: 509-511
599-607: Docker build will fail without Remote DockerRunning Docker inside cimg/python requires CircleCI remote docker. Add setup step before build.
build-and-deploy: executor: python-ml steps: - setup_python_env - doctor_step - restore_dependencies - pre_warm_models # Pre-warm models for faster execution + - setup_remote_docker: + version: 20.10.18 - run: name: Build Docker Image
558-570: Guard nvidia-smi on non-GPU runnersThe
python-gpuexecutor uses a standard Ubuntu machine image without guaranteed GPU support, so callingnvidia-smiunconditionally will cause the job to fail. Wrap the command in a check to keep the step informative on non-GPU hosts.• File:
.circleci/config.yml
• Lines: 558-570Suggested diff:
echo "Setting up GPU environment..." - nvidia-smi + if command -v nvidia-smi >/dev/null 2>&1; then + nvidia-smi + else + echo "nvidia-smi not found (no GPU available)" + fi python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"You may also want to update the
python-gpuexecutor definition to use a GPU-enabledresource_classif you require real GPU hardware for your tests.
♻️ Duplicate comments (1)
src/unified_ai_api.py (1)
1326-1328: Don’t default Request to None in FastAPI dependenciesFastAPI always injects Request; defaulting to None is unnecessary and complicates typing.
- request: Request = None, + request: Request,
🧹 Nitpick comments (9)
.circleci/config.yml (4)
171-172: Nit: wording says “Miniconda” while URL is MiniforgeFor accuracy, s/Miniconda/Miniforge/ in the error message.
- echo "Error: no wget, curl, python3, or python available to download Miniconda." >&2 + echo "Error: no wget, curl, python3, or python available to download Miniforge." >&2
216-220: PATH hardening matches PR goal; add guard when BASH_ENV is unsetThis correctly prepends system bins and conda. Guard for rare cases where
$BASH_ENVisn’t set.- echo 'export PATH="$HOME/miniconda/bin:/usr/local/bin:/usr/bin:/bin:$PATH"' >> "$BASH_ENV" - echo 'export HF_HOME="/home/circleci/.cache/huggingface"' >> "$BASH_ENV" + if [ -n "${BASH_ENV:-}" ]; then + echo 'export PATH="$HOME/miniconda/bin:/usr/local/bin:/usr/bin:/bin:$PATH"' >> "$BASH_ENV" + echo 'export HF_HOME="/home/circleci/.cache/huggingface"' >> "$BASH_ENV" + else + echo 'export PATH="$HOME/miniconda/bin:/usr/local/bin:/usr/bin:/bin:$PATH"' >> ~/.bashrc + echo 'export HF_HOME="/home/circleci/.cache/huggingface"' >> ~/.bashrc + fi
262-266: Avoid duplicated DB env configurationDB_* vars are set both at the executor level and appended to $BASH_ENV. Pick one source of truth (prefer executor env for simplicity) to avoid drift.
- Option A: Remove Lines 262–266.
- Option B: Remove Lines 35–38 and keep BASH_ENV exports.
Also applies to: 35-38
364-370: Long chained pytest command reduces maintainabilityConsider splitting the coverage + targeted serial tests into separate conda_exec steps for clearer reporting and easier failures triage, or use a small wrapper script.
Pros: clearer logs, easier retries. Cons: minor overhead.
src/unified_ai_api.py (5)
223-228: Auth semantics: consider 401 for invalid/missing auth, or drop WWW-Authenticate on 403Returning 403 for invalid tokens deviates from standard (401 + WWW-Authenticate). If tests require 403, consider removing the WWW-Authenticate header (which is typically paired with 401).
Option A (keep 403; remove header):
- raise HTTPException( - status_code=status.HTTP_403_FORBIDDEN, - detail="Invalid or expired token", - headers={"WWW-Authenticate": "Bearer"}, - ) + raise HTTPException( + status_code=status.HTTP_403_FORBIDDEN, + detail="Invalid or expired token", + )Option B (spec-compliant 401):
- raise HTTPException( - status_code=status.HTTP_403_FORBIDDEN, + raise HTTPException( + status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid or expired token", headers={"WWW-Authenticate": "Bearer"}, )Confirm which contract your clients/tests expect before changing.
394-397: Preserve ‘detail’ key for all HTTP errors for consistencyMost FastAPI clients expect a ‘detail’ field. Keep ‘detail’ even for non-400/422 statuses.
- return JSONResponse( - status_code=exc.status_code, - content={"error": exc.detail, "status_code": exc.status_code}, - ) + return JSONResponse( + status_code=exc.status_code, + content={"detail": exc.detail, "error": exc.detail, "status_code": exc.status_code}, + )
1247-1254: Align size threshold and error message; consider streaming checksThreshold is 45MB but message says “max 50MB”. Align them to avoid confusion. If 45MB is intended (per tests), update the message.
- if len(content) > 45 * 1024 * 1024: - # Return a JSON body with 'detail' to match tests expecting that key - raise HTTPException(status_code=400, detail="File too large (max 50MB)") + if len(content) > 45 * 1024 * 1024: + # Return a JSON body with 'detail' to match tests expecting that key + raise HTTPException(status_code=400, detail="File too large (max 45MB)")Optional: avoid loading the entire file into memory by streaming and short-circuiting once the threshold is exceeded.
1269-1280: Fallback on TypeError risks masking internal errors; narrow the conditionTypeError may originate from within transcribe(). Prefer detecting signature mismatches, or at least only fallback on argument-related TypeErrors.
- except TypeError: + except TypeError as te: + # Only fallback on argument/signature issues + msg = str(te).lower() + if "positional" not in msg and "unexpected keyword" not in msg and "argument" not in msg: + raise # Fallback for simplified fake transcribers without kwargs or with different signature try: transcription_result = voice_transcriber.transcribe(temp_file_path) - except TypeError: - transcription_result = voice_transcriber.transcribe() + except TypeError as te2: + msg2 = str(te2).lower() + if "positional" in msg2 or "argument" in msg2: + transcription_result = voice_transcriber.transcribe() + else: + raiseAlternatively, use inspect.signature to branch based on callable arity.
1394-1395: Redundant except HTTPException: raiseCatching and immediately re-raising HTTPException is unnecessary. It can be removed without changing behavior.
- except HTTPException: - raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.circleci/config.yml(9 hunks).coveragerc(1 hunks)coverage.xml(0 hunks)src/unified_ai_api.py(14 hunks)
💤 Files with no reviewable changes (1)
- coverage.xml
✅ Files skipped from review due to trivial changes (1)
- .coveragerc
🧰 Additional context used
🪛 Ruff (0.12.2)
src/unified_ai_api.py
52-52: httpx imported but unused
Remove unused import: httpx
(F401)
63-64: try-except-pass detected, consider logging the exception
(S110)
🔇 Additional comments (12)
.circleci/config.yml (6)
69-75: Good call: source $BASH_ENV before conda runsEnsures env exports from prior steps (e.g., PATH hardening) are applied. This fixes common CircleCI env propagation issues.
84-101: Doctor diagnostics are non-fatal and artifacted—LGTMpwd/ls now use
|| true, and outputs are stored as artifacts. Solid for debugging without breaking the build.
316-324: Per-job online override before model pre-warm—LGTMExplicitly disabling offline mode here is correct and addresses previous global offline issues.
478-486: Online override before model validation—LGTMEnsures HF downloads work during validation regardless of prior offline settings.
212-215: Conda presence checks—LGTMVerifies installation before proceeding; good early fail behavior.
328-340: Quality checks resilient to failures—LGTMUse of “|| echo …” keeps fast feedback flowing without hard-failing the pipeline.
src/unified_ai_api.py (6)
7-7: Postponed evaluation of annotations: LGTMThis improves typing and forward reference handling.
541-547: Unknown summarizer models → 400 Bad Request: LGTMClear client error signaling. Good change for predictability.
1019-1051: Robust normalization for emotion_detector outputs: LGTMHandles dicts, objects, and MagicMock coercions; prevents crashes and keeps sane defaults.
1187-1202: Normalize transcription dict to satisfy schema: LGTMEnsures optional fields are present with correct types.
1348-1354: Temp-file handling per file: LGTMReads content and persists to a unique temp WAV per upload; safe cleanup follows later.
1432-1445: Summary generation fallback for simplified fakes: LGTMPragmatic handling of mocks/fakes with altered signatures. Works well with prior request-scoped model selection.
| # Download Miniforge (Conda-Forge) installer with stable checksum endpoint | ||
| MINIFORGE_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" | ||
|
|
||
| if command -v wget >/dev/null 2>&1; then | ||
| wget -O miniconda.sh "$MINIFORGE_URL" | ||
| elif command -v curl >/dev/null 2>&1; then | ||
| curl -L -o miniconda.sh "$MINIFORGE_URL" | ||
| elif command -v python3 >/dev/null 2>&1; then | ||
| python3 -c "import ssl,urllib.request;u='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh';c=ssl.create_default_context();r=urllib.request.urlopen(u,context=c);open('miniconda.sh','wb').write(r.read());print('Downloaded Miniconda via python3 urllib')" | ||
| python3 -c "import ssl,urllib.request;u='https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh';c=ssl.create_default_context();r=urllib.request.urlopen(u,context=c);open('miniconda.sh','wb').write(r.read());print('Downloaded Miniforge via python3 urllib')" | ||
| elif command -v python >/dev/null 2>&1; then | ||
| python -c "import ssl,urllib.request;u='https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh';c=ssl.create_default_context();r=urllib.request.urlopen(u,context=c);open('miniconda.sh','wb').write(r.read());print('Downloaded Miniconda via python urllib')" | ||
| python -c "import ssl,urllib.request;u='https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh';c=ssl.create_default_context();r=urllib.request.urlopen(u,context=c);open('miniconda.sh','wb').write(r.read());print('Downloaded Miniforge via python urllib')" | ||
| else | ||
| # Last resort: try to install curl if we have privileges, then retry | ||
| if command -v sudo >/dev/null 2>&1; then | ||
| if command -v apt-get >/dev/null 2>&1; then sudo apt-get update && sudo apt-get install -y curl || true; fi | ||
| if command -v yum >/dev/null 2>&1; then sudo yum install -y curl || true; fi | ||
| if command -v dnf >/dev/null 2>&1; then sudo dnf install -y curl || true; fi | ||
| if command -v apk >/dev/null 2>&1; then sudo apk add --no-cache curl || true; fi | ||
| elif [ "$(id -u)" -eq 0 ]; then | ||
| if command -v apt-get >/dev/null 2>&1; then apt-get update && apt-get install -y curl || true; fi | ||
| if command -v yum >/dev/null 2>&1; then yum install -y curl || true; fi | ||
| if command -v dnf >/dev/null 2>&1; then dnf install -y curl || true; fi | ||
| if command -v apk >/dev/null 2>&1; then apk add --no-cache curl || true; fi | ||
| fi | ||
| if command -v curl >/dev/null 2>&1; then | ||
| curl -L -o miniconda.sh "$MINIFORGE_URL" | ||
| else | ||
| echo "Error: no wget, curl, or python available to download Miniconda." >&2 | ||
| exit 1 | ||
| fi | ||
| echo "Error: no wget, curl, python3, or python available to download Miniconda." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Fix: checksum retrieval uses curl unconditionally—add wget/python fallback
If only python is available (download path), curl -sSfL ... at Line 179 will fail. Provide fallbacks so checksum verification remains robust.
- # Avoid SIGPIPE from curl | grep by writing to a temp file
- curl -sSfL https://api.github.com/repos/conda-forge/miniforge/releases/latest -o latest_release.json
+ # Avoid SIGPIPE; fetch latest release JSON with available tool
+ if command -v curl >/dev/null 2>&1; then
+ curl -sSfL https://api.github.com/repos/conda-forge/miniforge/releases/latest -o latest_release.json
+ elif command -v wget >/dev/null 2>&1; then
+ wget -qO latest_release.json https://api.github.com/repos/conda-forge/miniforge/releases/latest
+ elif command -v python3 >/dev/null 2>&1; then
+ python3 - <<'PY' > latest_release.json
+import ssl, urllib.request, sys
+ctx = ssl.create_default_context()
+with urllib.request.urlopen("https://api.github.com/repos/conda-forge/miniforge/releases/latest", context=ctx) as r:
+ sys.stdout.buffer.write(r.read())
+PY
+ else
+ echo "Error: no tool available to fetch latest release metadata." >&2
+ exit 1
+ fiAlso consider a checksum tool fallback for portability:
- ACTUAL_SUM=$(sha256sum miniconda.sh | awk '{print $1}')
+ if command -v sha256sum >/dev/null 2>&1; then
+ ACTUAL_SUM=$(sha256sum miniconda.sh | awk '{print $1}')
+ elif command -v shasum >/dev/null 2>&1; then
+ ACTUAL_SUM=$(shasum -a 256 miniconda.sh | awk '{print $1}')
+ else
+ echo "Error: no SHA256 tool available." >&2
+ exit 1
+ fiAlso applies to: 175-207
🤖 Prompt for AI Agents
In .circleci/config.yml around lines 159 to 173 and 175 to 207, the current
script uses curl unconditionally to retrieve the checksum, which will fail if
only python or wget is available. Modify the checksum retrieval logic to include
fallbacks using wget and python commands similar to the download step, ensuring
checksum verification works regardless of which tool is installed. Additionally,
add a fallback to use a checksum tool available on the system for portability.
This will make the checksum verification robust across different environments.
| # Create or update environment (idempotent) with fallback if conda-forge is blocked | ||
| if ! "$HOME/miniconda/bin/conda" env list | grep -q "samo-dl-stable"; then | ||
| set +e | ||
| "$HOME/miniconda/bin/conda" env create -f environment.yml | ||
| create_rc=$? | ||
| set -e | ||
| if [ $create_rc -ne 0 ]; then | ||
| echo "Conda env create failed (possibly channel 403). Using minimal fallback env..." >&2 | ||
| "$HOME/miniconda/bin/conda" create -y -n samo-dl-stable python=3.10 | ||
| # Minimal test/dev packages to allow doctor/tests to run | ||
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install -U pip | ||
| # Pin fallback tooling to ensure reproducibility (mirrors versions in environment.yml / repo constraints) | ||
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install \ | ||
| pytest==8.3.2 \ | ||
| pytest-xdist==3.6.1 \ | ||
| 'pytest-cov>=6.2.1,<7.0.0' \ | ||
| ruff==0.6.9 \ | ||
| bandit==1.7.9 \ | ||
| safety==3.2.3 \ | ||
| mypy==1.10.0 \ | ||
| httpx==0.27.2 \ | ||
| requests==2.32.4 \ | ||
| Flask==3.0.3 \ | ||
| PyJWT==2.8.0 | ||
| # Install project editable to expose src/ | ||
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install -e ".[test]" || true | ||
| fi |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Fallback env is pragmatic; ensure test dependencies exist
Performance and API tests import FastAPI’s TestClient. If we hit the fallback path (conda env create failed), FastAPI isn’t installed and tests will error.
Add fastapi and uvicorn to the pinned fallback set:
requests==2.32.4 \
Flask==3.0.3 \
- PyJWT==2.8.0
+ PyJWT==2.8.0 \
+ fastapi==0.111.0 \
+ uvicorn==0.30.3🏁 Script executed:
#!/usr/bin/env bash
set -e
# Locate environment.yml to check existing fastapi/uvicorn pins
env_file=$(fd -t f -I 'environment.yml' | head -n1)
if [ -z "$env_file" ]; then
echo "❌ environment.yml not found"
else
echo "🔍 Inspecting $env_file for fastapi/uvicorn entries"
grep -En 'fastapi|uvicorn' "$env_file" || echo "⚠️ No fastapi/uvicorn pinned in environment.yml"
fi
# Search for FastAPI TestClient usage to confirm test dependency
echo "🔍 Searching for FastAPI TestClient imports in codebase"
rg -n 'from fastapi.testclient import TestClient'
rg -n 'TestClient' -A 2Length of output: 5474
Add FastAPI & Uvicorn to the fallback test environment
Tests in tests/conftest.py and several integration specs import FastAPI’s TestClient. If conda env creation fails, the fallback env must install FastAPI and Uvicorn (pinned to match environment.yml) or tests will error.
• File: .circleci/config.yml fallback block (around lines 230–245)
• environment.yml pins: fastapi==0.116.1 and uvicorn==0.35.0
Suggested diff:
requests==2.32.4 \
Flask==3.0.3 \
- PyJWT==2.8.0
+ PyJWT==2.8.0 \
+ fastapi==0.116.1 \
+ uvicorn==0.35.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Create or update environment (idempotent) with fallback if conda-forge is blocked | |
| if ! "$HOME/miniconda/bin/conda" env list | grep -q "samo-dl-stable"; then | |
| set +e | |
| "$HOME/miniconda/bin/conda" env create -f environment.yml | |
| create_rc=$? | |
| set -e | |
| if [ $create_rc -ne 0 ]; then | |
| echo "Conda env create failed (possibly channel 403). Using minimal fallback env..." >&2 | |
| "$HOME/miniconda/bin/conda" create -y -n samo-dl-stable python=3.10 | |
| # Minimal test/dev packages to allow doctor/tests to run | |
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install -U pip | |
| # Pin fallback tooling to ensure reproducibility (mirrors versions in environment.yml / repo constraints) | |
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install \ | |
| pytest==8.3.2 \ | |
| pytest-xdist==3.6.1 \ | |
| 'pytest-cov>=6.2.1,<7.0.0' \ | |
| ruff==0.6.9 \ | |
| bandit==1.7.9 \ | |
| safety==3.2.3 \ | |
| mypy==1.10.0 \ | |
| httpx==0.27.2 \ | |
| requests==2.32.4 \ | |
| Flask==3.0.3 \ | |
| PyJWT==2.8.0 | |
| # Install project editable to expose src/ | |
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install -e ".[test]" || true | |
| fi | |
| "$HOME/miniconda/bin/conda" run -n samo-dl-stable pip install \ | |
| pytest==8.3.2 \ | |
| pytest-xdist==3.6.1 \ | |
| 'pytest-cov>=6.2.1,<7.0.0' \ | |
| ruff==0.6.9 \ | |
| bandit==1.7.9 \ | |
| safety==3.2.3 \ | |
| mypy==1.10.0 \ | |
| httpx==0.27.2 \ | |
| requests==2.32.4 \ | |
| Flask==3.0.3 \ | |
| PyJWT==2.8.0 \ | |
| fastapi==0.116.1 \ | |
| uvicorn==0.35.0 |
🤖 Prompt for AI Agents
In .circleci/config.yml around lines 230 to 245, the fallback conda environment
setup is missing FastAPI and Uvicorn installations, which causes test failures
due to missing dependencies. Add pip install commands for fastapi==0.116.1 and
uvicorn==0.35.0 in the fallback pip install list to ensure the fallback
environment matches the main environment.yml dependencies required for tests.
| # Defensive client compatibility shim for tests: handle closed file objects in httpx multipart | ||
| try: # pragma: no cover - safety shim only used in tests | ||
| import httpx # type: ignore | ||
| from httpx import _utils as _httpx_utils # type: ignore | ||
|
|
||
| _orig_peek = getattr(_httpx_utils, "peek_filelike_length", None) | ||
| if callable(_orig_peek): | ||
| def _safe_peek_filelike_length(stream): | ||
| try: | ||
| return _orig_peek(stream) | ||
| except Exception: | ||
| return None | ||
| _httpx_utils.peek_filelike_length = _safe_peek_filelike_length # type: ignore | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Limit httpx monkeypatch to test env; address Ruff F401; don’t swallow exceptions
Patching httpx private internals is brittle. Gate it to tests only, keep the unused import quiet, and log (not pass) on failure.
-# Defensive client compatibility shim for tests: handle closed file objects in httpx multipart
-try: # pragma: no cover - safety shim only used in tests
- import httpx # type: ignore
- from httpx import _utils as _httpx_utils # type: ignore
+# Defensive client compatibility shim for tests: handle closed file objects in httpx multipart
+if os.getenv("SAMO_TEST_MODE") or os.getenv("PYTEST_CURRENT_TEST"):
+ try: # pragma: no cover - safety shim only used in tests
+ import httpx # type: ignore # noqa: F401
+ from httpx import _utils as _httpx_utils # type: ignore
@@
- _orig_peek = getattr(_httpx_utils, "peek_filelike_length", None)
- if callable(_orig_peek):
- def _safe_peek_filelike_length(stream):
- try:
- return _orig_peek(stream)
- except Exception:
- return None
- _httpx_utils.peek_filelike_length = _safe_peek_filelike_length # type: ignore
-except Exception:
- pass
+ _orig_peek = getattr(_httpx_utils, "peek_filelike_length", None)
+ if callable(_orig_peek):
+ def _safe_peek_filelike_length(stream):
+ try:
+ return _orig_peek(stream)
+ except Exception:
+ return None
+ _httpx_utils.peek_filelike_length = _safe_peek_filelike_length # type: ignore
+ except Exception as e:
+ logger.debug("httpx multipart shim not applied: %s", e)Notes:
- Addresses Ruff F401 (unused import) and S110 (bare except/ pass).
- Prevents unexpected behavior in production by gating to test env.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Defensive client compatibility shim for tests: handle closed file objects in httpx multipart | |
| try: # pragma: no cover - safety shim only used in tests | |
| import httpx # type: ignore | |
| from httpx import _utils as _httpx_utils # type: ignore | |
| _orig_peek = getattr(_httpx_utils, "peek_filelike_length", None) | |
| if callable(_orig_peek): | |
| def _safe_peek_filelike_length(stream): | |
| try: | |
| return _orig_peek(stream) | |
| except Exception: | |
| return None | |
| _httpx_utils.peek_filelike_length = _safe_peek_filelike_length # type: ignore | |
| except Exception: | |
| pass | |
| # Defensive client compatibility shim for tests: handle closed file objects in httpx multipart | |
| if os.getenv("SAMO_TEST_MODE") or os.getenv("PYTEST_CURRENT_TEST"): | |
| try: # pragma: no cover - safety shim only used in tests | |
| import httpx # type: ignore # noqa: F401 | |
| from httpx import _utils as _httpx_utils # type: ignore | |
| _orig_peek = getattr(_httpx_utils, "peek_filelike_length", None) | |
| if callable(_orig_peek): | |
| def _safe_peek_filelike_length(stream): | |
| try: | |
| return _orig_peek(stream) | |
| except Exception: | |
| return None | |
| _httpx_utils.peek_filelike_length = _safe_peek_filelike_length # type: ignore | |
| except Exception as e: | |
| logger.debug("httpx multipart shim not applied: %s", e) |
🧰 Tools
🪛 Ruff (0.12.2)
52-52: httpx imported but unused
Remove unused import: httpx
(F401)
63-64: try-except-pass detected, consider logging the exception
(S110)
🤖 Prompt for AI Agents
In src/unified_ai_api.py around lines 50 to 64, the current httpx monkeypatch is
applied unconditionally, uses unused imports triggering Ruff F401, and silently
swallows exceptions with a bare except and pass. To fix this, wrap the entire
patching code in a condition that checks if the environment is a test
environment, remove or use the unused imports to satisfy Ruff, and replace the
bare except/pass with exception logging to avoid silently ignoring errors.
| async def permission_checker(request: Request, current_user: TokenPayload = Depends(get_current_user)): | ||
| # Allow tests to inject permissions via header without altering tokens | ||
| injected = request.headers.get("X-User-Permissions") | ||
| if injected: | ||
| injected_perms = {p.strip() for p in injected.split(",") if p.strip()} | ||
| if permission in injected_perms: | ||
| return current_user | ||
| if permission not in current_user.permissions: |
There was a problem hiding this comment.
Security: Header-based permission injection must be gated to tests only
Allowing X-User-Permissions to bypass token permissions is a privilege-escalation vector if left enabled in production.
Gate this under explicit test/CI env flags:
- injected = request.headers.get("X-User-Permissions")
+ injected = None
+ if os.getenv("SAMO_TEST_MODE") or os.getenv("PYTEST_CURRENT_TEST") or os.getenv("CI"):
+ injected = request.headers.get("X-User-Permissions")Additionally, document and centralize this behavior to avoid accidental exposure.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/unified_ai_api.py around lines 234 to 241, the permission injection via
the X-User-Permissions header is a security risk if enabled outside testing.
Restrict this header-based permission injection by gating it behind explicit
environment flags that indicate test or CI environments. Centralize this logic
in a single place and add clear documentation to prevent accidental exposure in
production.
| "username": login_data.username, | ||
| "email": login_data.username if "@" in login_data.username else f"{login_data.username}@example.com", | ||
| "permissions": ["read", "write", "admin"] # Demo permissions | ||
| "permissions": ["read", "write", "admin"] |
There was a problem hiding this comment.
Don’t grant ‘admin’ to all logins (even in demo)
This silently escalates privileges and can leak into non-test environments.
- "permissions": ["read", "write", "admin"]
+ "permissions": ["read", "write"]If tests need admin, gate it via a test env flag:
- "permissions": ["read", "write", "admin"]
+ "permissions": ["read", "write"] if not os.getenv("SAMO_TEST_MODE") else ["read", "write", "admin"]Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/unified_ai_api.py at line 759, the permissions array includes "admin" for
all logins, which risks privilege escalation and leaks into non-test
environments. Remove "admin" from the default permissions and instead
conditionally add it only when a specific test environment flag is set, ensuring
admin rights are granted solely during testing.
| # Enforce permission when processing a single file via this endpoint; allow multi-file batches | ||
| if len(audio_files) <= 1: | ||
| # Also honor test-injected header override | ||
| injected = request.headers.get("X-User-Permissions") if isinstance(current_user, TokenPayload) else None | ||
| has_injected = False | ||
| if injected: | ||
| injected_perms = {p.strip() for p in injected.split(",") if p.strip()} | ||
| has_injected = "batch_processing" in injected_perms | ||
| if not has_injected and "batch_processing" not in current_user.permissions: | ||
| raise HTTPException(status_code=403, detail="Permission 'batch_processing' required") | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Guard header-based permission override and simplify logic
- Gating header injection avoids production bypass.
- The isinstance(current_user, TokenPayload) check is irrelevant here.
- if len(audio_files) <= 1:
- # Also honor test-injected header override
- injected = request.headers.get("X-User-Permissions") if isinstance(current_user, TokenPayload) else None
+ if len(audio_files) <= 1:
+ # Also honor test-injected header override (test/CI only)
+ injected = None
+ if os.getenv("SAMO_TEST_MODE") or os.getenv("PYTEST_CURRENT_TEST") or os.getenv("CI"):
+ injected = request.headers.get("X-User-Permissions")
has_injected = False
if injected:
injected_perms = {p.strip() for p in injected.split(",") if p.strip()}
has_injected = "batch_processing" in injected_perms
if not has_injected and "batch_processing" not in current_user.permissions:
raise HTTPException(status_code=403, detail="Permission 'batch_processing' required")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Enforce permission when processing a single file via this endpoint; allow multi-file batches | |
| if len(audio_files) <= 1: | |
| # Also honor test-injected header override | |
| injected = request.headers.get("X-User-Permissions") if isinstance(current_user, TokenPayload) else None | |
| has_injected = False | |
| if injected: | |
| injected_perms = {p.strip() for p in injected.split(",") if p.strip()} | |
| has_injected = "batch_processing" in injected_perms | |
| if not has_injected and "batch_processing" not in current_user.permissions: | |
| raise HTTPException(status_code=403, detail="Permission 'batch_processing' required") | |
| # Enforce permission when processing a single file via this endpoint; allow multi-file batches | |
| if len(audio_files) <= 1: | |
| # Also honor test-injected header override (test/CI only) | |
| injected = None | |
| if os.getenv("SAMO_TEST_MODE") or os.getenv("PYTEST_CURRENT_TEST") or os.getenv("CI"): | |
| injected = request.headers.get("X-User-Permissions") | |
| has_injected = False | |
| if injected: | |
| injected_perms = {p.strip() for p in injected.split(",") if p.strip()} | |
| has_injected = "batch_processing" in injected_perms | |
| if not has_injected and "batch_processing" not in current_user.permissions: | |
| raise HTTPException(status_code=403, detail="Permission 'batch_processing' required") |
🤖 Prompt for AI Agents
In src/unified_ai_api.py around lines 1334 to 1344, remove the
isinstance(current_user, TokenPayload) check when reading the
"X-User-Permissions" header to simplify the logic and ensure the header-based
permission override is properly guarded. Instead, always read the header if
present, but gate its use to avoid production bypass by validating the override
appropriately before checking permissions.
…ensure executor PATH includes system bins
Summary by Sourcery
Harden the CircleCI configuration by ensuring system binaries are always in PATH, dropping redundant dependency installs, and tightening the Miniconda bootstrap logic to fail on missing tools
CI:
Summary by CodeRabbit