refactor: remove unnecessary whitespace by deepsource-autofix[bot] · Pull Request #42 · uelkerd/SAMO--DL

deepsource-autofix · 2025-08-08T22:58:20Z

Blank lines should not contain any tabs or spaces.

Summary by CodeRabbit

Refactor
- Extensive whitespace and formatting cleanups were applied across multiple scripts and modules, improving code readability and consistency without altering any functionality or behavior.
Chores
- Several obsolete or redundant debugging and test scripts were removed.
New Features
- Enhanced validation and error handling were added to the bulletproof training script, including improved data checks, batch validation, exception handling, and detailed logging for increased robustness.
Documentation
- Print statements and comments were updated for clarity and consistency in several scripts.

deepsource-io · 2025-08-08T22:59:16Z

Here's the code health analysis summary for commits da422f8..a6628a7. View details on DeepSource ↗.

Analysis Summary

Analyzer	Status	Summary	Link
Test coverage	⚠️ Artifact not reported	Timed out: Artifact was never reported	View Check ↗
Python	❌ Failure	❗ 2 occurrences introduced 🎯 2757 occurrences resolved	View Check ↗
Terraform	✅ Success		View Check ↗
Secrets	✅ Success		View Check ↗
Shell	✅ Success		View Check ↗
Docker	✅ Success		View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

Resolved issues in the following files with DeepSource Autofix:
1. scripts/legacy/comprehensive_model_validation.py
2. scripts/legacy/deep_model_analysis.py

Copilot

Pull Request Overview

This PR removes unnecessary whitespace from multiple Python files to improve code quality by cleaning up blank lines that contain tabs or spaces.

Removes whitespace-only lines from empty test files
Converts f-string print statements that contain no placeholders to plain string literals
Eliminates trailing whitespace from blank lines in analysis scripts

Reviewed Changes

Copilot reviewed 6 out of 113 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
scripts/testing/simple_rate_limiter_test.py	Removes whitespace-only line from empty file
scripts/testing/debug_rate_limiter_test.py	Removes whitespace-only line from empty file
scripts/legacy/deep_model_analysis.py	Converts f-string print statements to regular strings and removes trailing whitespace
scripts/legacy/comprehensive_model_validation.py	Converts f-string print statements to regular strings for static text
debug_ci_timing.py	Removes whitespace-only line from empty file
debug_ci_robust.py	Removes whitespace-only line from empty file

Copilot · 2025-08-08T23:30:49Z

-    
-    print(f"\n📊 EMOTION MAPPING ANALYSIS")
+
+    print("\n📊 EMOTION MAPPING ANALYSIS")


[nitpick] Converting from an f-string to a regular string for static text is correct: an f-string without placeholders carries a redundant prefix, and linters flag it (for example, Pyflakes/Ruff rule F541). The change is stylistic only. CPython compiles a placeholder-free f-string to the same constant as the plain literal, so no performance difference should be expected.
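
Whether a placeholder-free f-string costs anything at runtime can be checked directly. The sketch below assumes CPython and uses only the standard `compile` and `dis` facilities; the helper name `opnames` is illustrative and does not come from the PR.

```python
import dis

# The two spellings of the message, as source text for a single expression.
PLAIN_SRC = '"\\n📊 EMOTION MAPPING ANALYSIS"'
FSTRING_SRC = 'f"\\n📊 EMOTION MAPPING ANALYSIS"'


def opnames(source: str) -> list:
    """Return the bytecode instruction names for one expression."""
    code = compile(source, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


plain_ops = opnames(PLAIN_SRC)
fstring_ops = opnames(FSTRING_SRC)

# Both expressions evaluate to the same string.
same_value = eval(PLAIN_SRC) == eval(FSTRING_SRC)

# Collect any formatting or joining instructions emitted for the f-string.
formatting_ops = [
    op for op in fstring_ops
    if op.startswith("FORMAT") or op == "BUILD_STRING"
]
```

If `formatting_ops` comes back empty, the f-prefix added no runtime work, and the case for dropping it rests on lint cleanliness rather than speed.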

uelkerd · 2025-08-08T23:30:51Z

/gemini review

uelkerd · 2025-08-08T23:31:18Z

/sourcery-ai review

gemini-code-assist

Code Review

This pull request provides a comprehensive cleanup by removing unnecessary whitespace from blank lines across a large number of files. This is a valuable refactoring that improves code style and consistency throughout the repository. The changes are correct and well-executed. I have no further comments or suggestions.

uelkerd · 2025-08-10T20:08:25Z

@coderabbitai full review

coderabbitai · 2025-08-10T20:08:33Z

✅ Actions performed

Full review triggered.

coderabbitai · 2025-08-10T20:08:36Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This update is a large-scale codebase cleanup focused almost entirely on formatting and whitespace adjustments. It removes trailing spaces, extraneous blank lines, and enforces consistent code style across numerous scripts, modules, and notebooks. Additionally, several obsolete or debugging-related files are deleted. One file, scripts/training/bulletproof_training.py, receives substantial enhancements for validation, error handling, and logging, while another (scripts/training/improve_expanded_training_notebook.py) programmatically augments a notebook's training function with advanced features.

Changes

Cohort / File(s)	Change Summary
Whitespace & Formatting Cleanup `deployment/api_server.py`, `deployment/cloud-run/robust_predict.py`, `deployment/gcp/predict.py`, `deployment/inference.py`, `deployment/local/api_server.py`, `deployment/secure_api_server.py`, `scripts/ci/model_calibration_test.py`, `scripts/ci/run_full_ci_pipeline.py`, `scripts/ci/whisper_transcription_test.py`, `scripts/deployment/complete_project_deployment.py`, `scripts/deployment/create_model_deployment_package.py`, `scripts/deployment/deploy_locally.py`, `scripts/deployment/deploy_to_gcp_vertex_ai.py`, `scripts/deployment/save_trained_model_for_deployment.py`, `scripts/legacy/add_comprehensive_features.py`, `scripts/legacy/add_wandb_setup.py`, `scripts/legacy/comprehensive_model_validation.py`, `scripts/legacy/convert_to_onnx.py`, `scripts/legacy/create_bulletproof_cell.py`, `scripts/legacy/create_final_bulletproof_cell.py`, `scripts/legacy/create_unique_fallback_dataset.py`, `scripts/legacy/deep_model_analysis.py`, `scripts/legacy/evaluate_whisper_wer.py`, `scripts/legacy/expand_journal_dataset.py`, `scripts/legacy/finalize_emotion_model.py`, `scripts/legacy/improve_model_f1.py`, `scripts/legacy/integrate_cmu_mosei.py`, `scripts/legacy/reorganize_model_directory.py`, `scripts/legacy/retrain_with_expanded_dataset.py`, `scripts/legacy/retrain_with_validation.py`, `scripts/legacy/simple_cmu_mosei_download.py`, `scripts/legacy/simple_f1_evaluation.py`, `scripts/legacy/validate_model_performance.py`, `scripts/maintenance/emergency_f1_fix.py`, `scripts/maintenance/fix_code_quality.py`, `scripts/maintenance/fix_import_paths.py`, `scripts/maintenance/fix_label_mapping.py`, `scripts/maintenance/fix_linting_issues_conservative.py`, `scripts/maintenance/fix_model_architecture_mismatch.py`, `scripts/maintenance/fix_model_reconfiguration.py`, `scripts/maintenance/fix_remaining_linting.py`, `scripts/maintenance/quick_label_fix.py`, `scripts/testing/check_model_health.py`, `scripts/testing/create_journal_test_dataset.py`, 
`scripts/testing/debug_dataset_structure.py`, `scripts/testing/debug_go_emotions_labels.py`, `scripts/testing/debug_label_mismatch.py`, `scripts/testing/debug_model_loading.py`, `scripts/testing/final_temperature_test.py`, `scripts/testing/mega_comprehensive_model_test.py`, `scripts/testing/mega_test_summary.py`, `scripts/testing/setup_model_testing.py`, `scripts/testing/simple_model_test.py`, `scripts/testing/simple_temperature_test.py`, `scripts/training/SAMO_Colab_Setup.py`, `scripts/training/add_advanced_features_to_notebook.py`, `scripts/training/complete_simple_notebook.py`, `scripts/training/comprehensive_domain_adaptation_training.py`, `scripts/training/create_bulletproof_colab_notebook.py`, `scripts/training/create_colab_expanded_training.py`, `scripts/training/create_colab_notebook.py`, `scripts/training/create_comprehensive_notebook.py`, `scripts/training/create_corrected_specialized_notebook.py`, `scripts/training/create_emotion_specialized_notebook.py`, `scripts/training/create_final_bulletproof_notebook.py`, `scripts/training/create_final_colab_notebook.py`, `scripts/training/create_fixed_bulletproof_notebook.py`, `scripts/training/create_fixed_colab_notebook.py`, `scripts/training/create_fixed_notebook.py`, `scripts/training/create_fixed_specialized_training_notebook.py`, `scripts/training/create_improved_expanded_notebook.py`, `scripts/training/create_minimal_working_notebook.py`, `scripts/training/create_model_ensemble_notebook.py`, `scripts/training/create_simple_ultimate_notebook.py`, `scripts/training/create_ultimate_bulletproof_notebook.py`, `scripts/training/debug_colab_compatibility.py`, `scripts/training/debug_training_loss.py`, `scripts/training/final_combined_training.py`, `scripts/training/final_expanded_training.py`, `scripts/training/fix_imports_in_notebook.py`, `scripts/training/fix_notebook_json.py`, `scripts/training/fix_preprocessing_in_notebook.py`, `scripts/training/fix_training_arguments.py`, 
`scripts/training/fixed_focal_training.py`, `scripts/training/full_dataset_focal_training.py`, `scripts/training/full_focal_training.py`, `scripts/training/full_scale_focal_training.py`, `scripts/training/setup_colab_environment.py`, `scripts/training/summarize_comprehensive_notebook.py`, `scripts/training/summarize_ultimate_notebook.py`, `scripts/training/validate_improved_notebook.py`, `scripts/validation/check_dependencies.py`, `scripts/validation/validate_security_config.py`, `src/api_rate_limiter.py`, `src/input_sanitizer.py`, `src/models/emotion_detection/dataset_loader.py`, `src/models/secure_loader/integrity_checker.py`, `src/models/secure_loader/model_validator.py`, `src/models/secure_loader/sandbox_executor.py`	Removed trailing spaces, adjusted blank lines, and improved code formatting. No changes to logic, control flow, or functionality.
File Deletions: Debug/Obsolete Scripts `debug_ci_robust.py`, `debug_ci_timing.py`, `scripts/testing/debug_rate_limiter_test.py`, `scripts/testing/simple_rate_limiter_test.py`	Entire files deleted, removing all their content and exported entities.
Whitespace-Only Cleanup in Debug/Utility Scripts `debug_rate_limiter.py`	Removed trailing whitespace lines only; no logic or behavior changes.
Robustness & Validation Enhancements `scripts/training/bulletproof_training.py`	Added comprehensive validation, error handling, and logging throughout data loading, batching, training, and environment setup. Introduced input checks, batch validation, exception handling, and detailed progress reporting.
Notebook Training Function Enhancement `scripts/training/improve_expanded_training_notebook.py`	Programmatically augments a notebook's training function with GPU optimizations, early stopping, learning rate scheduler, and mixed-precision training via regex transformation.

Sequence Diagram(s)

Enhanced Bulletproof Training Flow

sequenceDiagram
    participant User
    participant Main
    participant EnvironmentValidator
    participant DataPreparer
    participant Trainer
    participant Logger

    User->>Main: Run script
    Main->>EnvironmentValidator: validate_environment()
    EnvironmentValidator-->>Main: Success/Failure
    Main->>DataPreparer: prepare_filtered_data()
    DataPreparer-->>Main: Filtered data or None
    Main->>Trainer: train_model_simple()
    loop For each batch
        Trainer->>Trainer: Validate batch structure & label ranges
        alt Invalid batch
            Trainer->>Logger: Log warning, skip batch
        else Valid batch
            Trainer->>Trainer: Train on batch (try-except)
            alt Exception
                Trainer->>Logger: Log error, skip batch
            end
        end
    end
    Trainer-->>Main: Training result
    Main->>Logger: Log completion or error

Notebook Training Function Enhancement (Regex Transformation)

sequenceDiagram
    participant Script
    participant Notebook
    participant RegexEngine

    Script->>Notebook: Load notebook JSON
    Script->>RegexEngine: Apply regex substitutions to training function cell
    RegexEngine-->>Notebook: Modified cell with GPU, scheduler, mixed-precision, early stopping
    Script->>Notebook: Save updated notebook

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–25 minutes

Poem

In a warren of code, the rabbits did sweep,
Tidying whitespace where old bugs might creep.
With a hop and a skip, they cleaned every line,
Deleting old clutter—oh, the code now looks fine!
Robustness was added where errors once grew,
Now the scripts run smoother—hippity-hop, woo-hoo!
🐇✨

coderabbitai

Actionable comments posted: 16

🔭 Outside diff range comments (13)

scripts/legacy/create_unique_fallback_dataset.py (1)

233-237: Align the success message with the duplicate check outcome

If duplicates are found, the function returns None but the script still prints a success message. Reflect the outcome.

-    create_unique_fallback_dataset()
-    print("\n🎉 Unique fallback dataset created successfully!") 
+    result = create_unique_fallback_dataset()
+    if result is None:
+        print("\n⚠️ Failed to create dataset due to duplicates.")
+        # Optionally: sys.exit(1)
+    else:
+        print("\n🎉 Unique fallback dataset created successfully!")

scripts/legacy/reorganize_model_directory.py (1)

237-246: Handle broken symlinks with lexists to avoid failures

If a broken symlink exists at deployment/model, os.path.exists(...) returns False, so the stale link remains and os.symlink(...) will fail with “File exists.” Use os.path.lexists(...) so broken links are detected and unlinked.
-    if os.path.exists(symlink_path):
+    if os.path.lexists(symlink_path):
         if os.path.islink(symlink_path):
             os.unlink(symlink_path)
         else:
             # Backup the original model directory
             backup_path = "deployment/model_backup"
             if os.path.exists(backup_path):
                 shutil.rmtree(backup_path)
             shutil.move(symlink_path, backup_path)
             print(f"✅ Backed up original model to: {backup_path}")
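
The difference between the two checks is easy to reproduce in isolation. The following sketch builds a broken symlink in a temporary directory; the path names are made up for the demonstration, and it assumes a platform where the process may create symlinks (on Windows this can require extra privileges).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    missing_target = os.path.join(workdir, "missing_model_dir")
    link_path = os.path.join(workdir, "model")

    # Create a symlink whose target does not exist (a "broken" link).
    os.symlink(missing_target, link_path)

    follows_link = os.path.exists(link_path)   # follows the link to nothing
    sees_link = os.path.lexists(link_path)     # inspects the link entry itself

    # Because lexists() detects the stale link, it can be removed before re-linking.
    if os.path.lexists(link_path) and os.path.islink(link_path):
        os.unlink(link_path)
    removed = not os.path.lexists(link_path)
```

Here `follows_link` is False while `sees_link` is True, which is exactly the case the original `os.path.exists` guard misses.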

scripts/legacy/finalize_emotion_model.py (1)

187-191: Critical: incorrect unpacking from create_bert_emotion_classifier().

That factory returns (model, loss_function), not a tokenizer. Unpacking into (model, tokenizer) is wrong and will propagate incorrect types to create_augmented_dataset(...).
-    model, tokenizer = create_bert_emotion_classifier()
+    model, _ = create_bert_emotion_classifier()
If you truly need a tokenizer here, initialize it explicitly:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

scripts/maintenance/fix_code_quality.py (1)

57-66: fix_f_strings corrupts valid f-strings (data-loss bug)

Patterns at Line 60/61 will strip the f-prefix from all f-strings, including those with placeholders. The follow-up “invalid syntax” fix can’t recover because the f-prefix is already removed.

Use patterns that only demote f-strings with no placeholders, or drop this risky transform entirely.
-        # Fix f-strings without placeholders
-        content = re.sub(r'f"([^"]*)"', r'"\1"', content)
-        content = re.sub(r"f'([^']*)'", r"'\1'", content)
-
-        # Fix f-strings with invalid syntax
-        content = re.sub(r'f"([^"]*)\{([^}]*)\}([^"]*)"', r'f"\1{\2}\3"', content)
+        # Only demote f-strings that have no braces/placeholders
+        content = re.sub(r'f"([^"{}]*)"', r'"\1"', content)
+        content = re.sub(r"f'([^'{}]*)'", r"'\1'", content)
+        # Avoid attempting to "fix" arbitrary f-string syntax via regex; leave to linters/formatters.
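
To see the data loss concretely, the two patterns can be compared on a static and a dynamic f-string. This is a minimal sketch; `demote` and the sample lines are illustrative names, not code from the repository.

```python
import re

# The original, overly broad pattern versus the brace-excluding one.
BROAD = re.compile(r'f"([^"]*)"')
NARROW = re.compile(r'f"([^"{}]*)"')


def demote(pattern, source):
    """Drop the f-prefix wherever `pattern` matches."""
    return pattern.sub(r'"\1"', source)


static_line = 'print(f"done")'
dynamic_line = 'print(f"loss={loss:.3f}")'

# The broad pattern strips the prefix even though a placeholder is present.
broad_dynamic = demote(BROAD, dynamic_line)

# The narrow pattern leaves the dynamic line alone and only fixes the static one.
narrow_dynamic = demote(NARROW, dynamic_line)
narrow_static = demote(NARROW, static_line)
```

Even the narrow pattern is a heuristic: it knows nothing about Python tokenization, so a linter autofix remains the safer tool.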

scripts/maintenance/fix_remaining_linting.py (3)

104-124: Import sorting moves comments and can mangle imports

Lines 113-117 treat “#” comments as import lines; don’t.
Re-assembly doesn’t insert a separator or preserve groups.

-            if (stripped.startswith('import ') or
-                stripped.startswith('from ') or
-                stripped.startswith('#')):
+            if (stripped.startswith('import ')
+                or stripped.startswith('from ')):
                 import_lines.append(line)
             else:
                 non_import_lines.append(line)
 
-        import_lines.sort()
+        import_lines.sort()
 
-        new_content = '\n'.join(import_lines + non_import_lines)
+        # Separate imports from code
+        new_content = '\n'.join([*import_lines, '', *non_import_lines])

Consider using isort for correctness.

198-208: Directory existence check is ineffective

Line 206 uses if Path(directory): which is always truthy. Use exists()/is_dir().

-        for directory in directories:
-            if Path(directory):
-                self.process_directory(directory)
+        for directory in directories:
+            p = Path(directory)
+            if p.exists() and p.is_dir():
+                self.process_directory(str(p))
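
A short check shows why the original guard never filters anything. The directory names below are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    present = Path(workdir)
    missing = present / "no_such_directory"

    # A Path object is truthy whether or not anything exists on disk.
    truthy_even_if_missing = bool(missing)

    # Asking the filesystem gives the answer the guard actually needs.
    missing_is_dir = missing.exists() and missing.is_dir()
    present_is_dir = present.exists() and present.is_dir()
```

Since `is_dir()` already returns False for a path that does not exist, the `exists()` call is redundant but harmless and keeps the intent explicit.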

175-187: Broken regex in fix_minor_issues

Pattern at Line 179 appears malformed and the replacement duplicates noqa tags.

Either remove this mutation or constrain it precisely:

-        pattern = r'TEST_USER_PASSWORD_HASH = "test_hashed_password_123"  # noqa: S105]*)"'
-        replacement = r'TEST_USER_PASSWORD_HASH = "test_hashed_password_123"  # noqa: S105  # noqa: S105'
+        pattern = r'^(TEST_USER_PASSWORD_HASH\s*=\s*)"[^"]*".*$'
+        replacement = r'\1"test_hashed_password_123"  # noqa: S105'

Add re.MULTILINE and only apply in test fixtures, not globally.
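
One property worth checking for a rewrite like this is idempotence: running the fixer twice should not stack up noqa tags. The sketch below is one possible formulation, under the assumption that any existing trailing comment on the assignment can be discarded; `normalize_fixture` is a hypothetical helper name.

```python
import re

# Match the whole assignment line, including any trailing comment.
PATTERN = re.compile(
    r'^(TEST_USER_PASSWORD_HASH\s*=\s*)"[^"]*".*$',
    re.MULTILINE,
)
# Re-emit the assignment with exactly one noqa tag.
REPLACEMENT = r'\1"test_hashed_password_123"  # noqa: S105'


def normalize_fixture(source: str) -> str:
    """Rewrite the fixture assignment without duplicating the noqa tag."""
    return PATTERN.sub(REPLACEMENT, source)


source = 'import os\nTEST_USER_PASSWORD_HASH = "abc"  # noqa: S105\n'
once = normalize_fixture(source)
twice = normalize_fixture(once)
```

Because the pattern consumes the rest of the line before the tag is re-added, a second pass leaves the file unchanged.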

scripts/testing/final_temperature_test.py (1)

156-166: “Raw probs” printed for wrong samples (last-batch only)

probabilities refers to the last batch (Line 165). You’re indexing it with i from the full dataset, so it doesn’t align with test_texts.

Accumulate probs alongside predictions/labels and reference by global index:

-        all_predictions = []
-        all_labels = []
+        all_predictions = []
+        all_labels = []
+        all_probabilities = []
@@
-                probabilities = torch.sigmoid(outputs / temp)
+                probabilities = torch.sigmoid(outputs / temp)
@@
-                # Convert to numpy for sklearn
+                # Convert to numpy for sklearn/printing
                 all_predictions.append(predictions.cpu().numpy())
                 all_labels.append(labels.cpu().numpy())
+                all_probabilities.append(probabilities.cpu().numpy())
@@
-        all_predictions = np.concatenate(all_predictions, axis=0)
-        all_labels = np.concatenate(all_labels, axis=0)
+        all_predictions = np.concatenate(all_predictions, axis=0)
+        all_labels = np.concatenate(all_labels, axis=0)
+        all_probabilities = np.concatenate(all_probabilities, axis=0)
@@
-            logging.info(f"    Raw probs: {probabilities[i].cpu().numpy()}")
+            logging.info(f"    Raw probs: {all_probabilities[i]}")

src/api_rate_limiter.py (2)

403-433: Critical: concurrent request counters are never released in middleware

allow_request increments concurrent_requests, but release_request is never called. This leaks slots and will eventually block all traffic.

Apply this fix to call release_request in a finally block and still attach headers:

     @app.middleware("http")
     async def rate_limit_middleware(request: Request, call_next):
         """Rate limiting middleware."""
         client_ip = request.client.host if request.client else "unknown"
         user_agent = request.headers.get("user-agent", "")
@@
         # Check rate limit
-        allowed, reason, meta = rate_limiter.allow_request(client_ip, user_agent)
-
-        if not allowed:
-            return JSONResponse(
-                status_code=429,
-                content={
-                    "error": "Rate limit exceeded",
-                    "message": reason,
-                    "retry_after": meta.get("retry_after", 60)
-                }
-            )
-
-        # Add rate limit headers
-        response = await call_next(request)
-        response.headers["X-RateLimit-Limit"] = str(config.requests_per_minute)
-        response.headers["X-RateLimit-Remaining"] = str(meta.get("tokens_remaining", 0))
-        response.headers["X-RateLimit-Reset"] = str(meta.get("reset_time", 0))
-
-        return response
+        allowed, reason, meta = rate_limiter.allow_request(client_ip, user_agent)
+
+        if not allowed:
+            return JSONResponse(
+                status_code=429,
+                content={
+                    "error": "Rate limit exceeded",
+                    "message": reason,
+                    "retry_after": meta.get("retry_after", 60),
+                },
+            )
+
+        try:
+            response = await call_next(request)
+        finally:
+            # Always release the slot after request completes
+            rate_limiter.release_request(client_ip, user_agent)
+
+        # Add rate limit headers
+        response.headers["X-RateLimit-Limit"] = str(config.requests_per_minute)
+        response.headers["X-RateLimit-Remaining"] = str(meta.get("tokens_remaining", 0))
+        if "reset_time" in meta:
+            response.headers["X-RateLimit-Reset"] = str(meta["reset_time"])
+        return response

1-433: Ensure every allowed request releases its slot
The rate_limit_middleware in src/api_rate_limiter.py calls allow_request() (which increments concurrent_requests) but never calls release_request(), causing slots to leak. Wrap call_next() in a try/finally so that release_request(client_ip, user_agent) is always invoked—even on errors.

Locations to update:

src/api_rate_limiter.py, inside rate_limit_middleware
- Before returning the response (or on exception), add:
```
try:
    response = await call_next(request)
finally:
    rate_limiter.release_request(client_ip, user_agent)
```
- Then set rate-limit headers on response and return it.

This guarantees that every successful or failed request frees its concurrent slot.
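
The acquire/release pairing can be exercised without a web framework. The sketch below uses a simplified, hypothetical `SlotLimiter` in place of the project's rate limiter (whose real `allow_request` returns a tuple and takes client details) to show that the slot is freed even when the downstream handler raises.

```python
import asyncio


class SlotLimiter:
    """Hypothetical stand-in that only tracks concurrent slots."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.concurrent_requests = 0

    def allow_request(self):
        if self.concurrent_requests >= self.max_concurrent:
            return False
        self.concurrent_requests += 1
        return True

    def release_request(self):
        self.concurrent_requests = max(0, self.concurrent_requests - 1)


async def handle(limiter, call_next):
    """Middleware-style wrapper: acquire a slot, always release it."""
    if not limiter.allow_request():
        return "429"
    try:
        return await call_next()
    finally:
        limiter.release_request()


async def ok_handler():
    return "200"


async def failing_handler():
    raise RuntimeError("handler failed")


async def demo():
    limiter = SlotLimiter(max_concurrent=1)
    first = await handle(limiter, ok_handler)
    try:
        await handle(limiter, failing_handler)
    except RuntimeError:
        pass  # the slot must still have been released
    third = await handle(limiter, ok_handler)
    return first, third, limiter.concurrent_requests


results = asyncio.run(demo())
```

Without the `finally` clause, the failing request would keep its slot and the third call would be rejected with the single-slot limit used here.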

scripts/training/final_combined_training.py (2)

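197-218: Verify the TrainingArguments parameter name: eval_strategy vs. evaluation_strategy

Which spelling Hugging Face TrainingArguments accepts depends on the installed transformers version. Older releases accept only evaluation_strategy; around v4.41 the parameter was renamed to eval_strategy and the old name was deprecated. Passing a name the installed version does not know raises a TypeError.

If the project pins an older transformers release, apply:
-        eval_strategy="steps",
+        evaluation_strategy="steps",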
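
Because the accepted spelling appears to differ between transformers releases, one defensive option is to pick the keyword by inspecting the constructor. The sketch below is an assumption-laden illustration: `eval_strategy_kwarg` is a hypothetical helper, and the two classes are stand-ins for different library versions rather than the real TrainingArguments.

```python
import inspect


def eval_strategy_kwarg(args_cls, value="steps"):
    """Return the evaluation-strategy keyword that `args_cls` accepts,
    preferring the newer spelling when both are available."""
    accepted = inspect.signature(args_cls).parameters
    for name in ("eval_strategy", "evaluation_strategy"):
        if name in accepted:
            return {name: value}
    raise TypeError(f"{args_cls.__name__} accepts neither spelling")


# Hypothetical stand-ins for two library versions, used only for the demo.
class OldStyleArgs:
    def __init__(self, output_dir, evaluation_strategy="no"):
        self.evaluation_strategy = evaluation_strategy


class NewStyleArgs:
    def __init__(self, output_dir, eval_strategy="no"):
        self.eval_strategy = eval_strategy


old_kwargs = eval_strategy_kwarg(OldStyleArgs)
new_kwargs = eval_strategy_kwarg(NewStyleArgs)
```

With the real library the result would be unpacked into the constructor call alongside the other arguments; pinning the transformers version in the project's requirements is the simpler fix if that is an option.
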
245-267: Ensure model/input device alignment during ad-hoc testing

After training on GPU, model likely resides on CUDA. The ad-hoc test creates CPU tensors and can error. Move inputs to model.device or move model to CPU.

Minimal fix:
-    model.eval()
+    model.eval()
+    device = next(model.parameters()).device
@@
-            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
-            outputs = model(**inputs)
+            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+            inputs = {k: v.to(device) for k, v in inputs.items()}
+            outputs = model(**inputs)

scripts/training/comprehensive_domain_adaptation_training.py (1)

523-571: DomainAdaptedEmotionClassifier must subclass nn.Module (critical for .to(), train/eval, etc.)

Current class isn’t an nn.Module yet ModelManager calls self.model.to(self.device). This will fail at runtime.

Minimal fix (follow-up PR recommended):

-class DomainAdaptedEmotionClassifier:
+from torch import nn
+
+class DomainAdaptedEmotionClassifier(nn.Module):
     def __init__(self, model_name="bert-base-uncased", num_labels=None, dropout=0.3):
-        import torch.nn as nn
+        super().__init__()  # must run before submodules are assigned
         from transformers import AutoModel
         self.bert = AutoModel.from_pretrained(model_name)
         self.dropout = nn.Dropout(dropout)
         self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
         self.domain_classifier = nn.Sequential(
             nn.Linear(self.bert.config.hidden_size, 512),
             nn.ReLU(),
             nn.Dropout(0.3),
             nn.Linear(512, 2)
         )

♻️ Duplicate comments (1)

scripts/legacy/deep_model_analysis.py (1)

31-31: Converting static f-strings to plain strings is fine.

Repeating earlier feedback: using plain strings for static text removes unnecessary f-prefix.

coderabbitai · 2025-08-10T20:18:55Z

    with open('data/unique_fallback_dataset.json', 'w') as f:
        json.dump(unique_samples, f, indent=2)


🛠️ Refactor suggestion

Ensure output directory exists before writing the JSON file

If data/ doesn’t exist, open(...) will raise FileNotFoundError. Create the directory first.

Apply within this hunk:

-    with open('data/unique_fallback_dataset.json', 'w') as f:
+    os.makedirs('data', exist_ok=True)
+    with open('data/unique_fallback_dataset.json', 'w') as f:

Also add the missing import near the top of the file:

import os

🤖 Prompt for AI Agents

In scripts/legacy/create_unique_fallback_dataset.py around lines 216 to 217, before opening the file for writing, check if the 'data/' directory exists and create it if it does not using os.makedirs with exist_ok=True. Also, add 'import os' near the top of the file to enable directory operations. This will prevent FileNotFoundError when the directory is missing.
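
The same guard can be folded into a small helper so that every writer gets it. This is a sketch only; `write_json` is a hypothetical name, and the demo writes into a temporary directory rather than the repository's data/ folder.

```python
import json
import os
import tempfile


def write_json(path, payload):
    """Write `payload` as JSON, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


with tempfile.TemporaryDirectory() as workdir:
    # The "data" subdirectory does not exist yet at this point.
    target = os.path.join(workdir, "data", "unique_fallback_dataset.json")
    write_json(target, [{"text": "example", "label": 0}])

    with open(target, encoding="utf-8") as handle:
        round_trip = json.load(handle)
```

`exist_ok=True` makes the call safe to repeat, so the helper behaves the same on the first run and on every later one.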

coderabbitai · 2025-08-10T20:18:55Z

        f"{template} {random.choice(['It\'s been a long day.', 'Things are going well.', 'I need to process this.', 'This is important to me.'])}",
        f"{template} {random.choice(['I hope this continues.', 'I wonder what\'s next.', 'This feels right.', 'I\'m processing this.'])}",
        f"{template} {random.choice(['I should reflect on this.', 'This is meaningful.', 'I appreciate this moment.', 'I\'m learning from this.'])}"


⚠️ Potential issue

Fix f-string escape sequence syntax issue

The f-strings contain backslash escape sequences inside their replacement fields (the expressions between braces), which is a syntax error before Python 3.12; PEP 701 lifted that restriction in 3.12, so the script fails to parse on Python 3.9.

Avoid the issue by using regular string concatenation or format(), so that the escapes sit in ordinary string literals:

-        f"{template} {random.choice(['It\'s been a long day.', 'Things are going well.', 'I need to process this.', 'This is important to me.'])}",
-        f"{template} {random.choice(['I hope this continues.', 'I wonder what\'s next.', 'This feels right.', 'I\'m processing this.'])}",
-        f"{template} {random.choice(['I should reflect on this.', 'This is meaningful.', 'I appreciate this moment.', 'I\'m learning from this.'])}"
+        template + " " + random.choice(['It\'s been a long day.', 'Things are going well.', 'I need to process this.', 'This is important to me.']),
+        template + " " + random.choice(['I hope this continues.', 'I wonder what\'s next.', 'This feels right.', 'I\'m processing this.']),
+        template + " " + random.choice(['I should reflect on this.', 'This is meaningful.', 'I appreciate this moment.', 'I\'m learning from this.'])

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.12.2)

234-234: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

234-234: SyntaxError: Cannot use an escape sequence (backslash) in f-strings on Python 3.9 (syntax was added in Python 3.12)

235-235: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

235-235: SyntaxError: Cannot use an escape sequence (backslash) in f-strings on Python 3.9 (syntax was added in Python 3.12)

235-235: SyntaxError: Cannot use an escape sequence (backslash) in f-strings on Python 3.9 (syntax was added in Python 3.12)

236-236: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

236-236: SyntaxError: Cannot use an escape sequence (backslash) in f-strings on Python 3.9 (syntax was added in Python 3.12)

🤖 Prompt for AI Agents

In scripts/legacy/expand_journal_dataset.py around lines 234 to 236, the f-strings use backslash escape sequences inside the expressions, which causes syntax errors in Python 3.9. To fix this, replace the f-strings with either regular string concatenation or use the format() method to avoid backslash escapes inside the expressions, ensuring compatibility with Python 3.9.
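
Another way to sidestep the restriction is to hoist the choices out of the f-string so that only a plain name appears between the braces. The sketch below runs on Python 3.9 and later; `CLOSING_LINES` and `extend_template` are illustrative names, and the list reuses the first set of sentences from the script.

```python
import random

# Plain string literals: apostrophes need no backslash inside double quotes.
CLOSING_LINES = [
    "It's been a long day.",
    "Things are going well.",
    "I need to process this.",
    "This is important to me.",
]


def extend_template(template, rng):
    """Append one randomly chosen closing line to `template`."""
    closing = rng.choice(CLOSING_LINES)
    # Only simple names appear in the replacement fields, so no escapes are needed.
    return f"{template} {closing}"


rng = random.Random(0)  # seeded only to make the demo repeatable
sample = extend_template("Today I felt calm.", rng)
```

Keeping the option lists as named constants also makes them easier to extend than literals buried inside each f-string.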

coderabbitai · 2025-08-10T20:18:55Z

        # Calculate F1 score
        val_predictions = torch.cat(val_predictions, dim=0)
        val_labels = torch.cat(val_labels, dim=0)
-        
+
        f1 = f1_score(val_labels, val_predictions, average='micro', zero_division=0)
-        
+


💡 Verification agent

🧩 Analysis chain

Ensure sklearn gets NumPy arrays, not torch tensors.

f1_score should receive NumPy arrays. Convert after concatenation.

-        val_predictions = torch.cat(val_predictions, dim=0)
-        val_labels = torch.cat(val_labels, dim=0)
-        f1 = f1_score(val_labels, val_predictions, average='micro', zero_division=0)
+        val_predictions = torch.cat(val_predictions, dim=0).cpu().numpy().astype(int)
+        val_labels = torch.cat(val_labels, dim=0).cpu().numpy().astype(int)
+        f1 = f1_score(val_labels, val_predictions, average='micro', zero_division=0)

I've added a quick search to confirm how val_predictions and val_labels are assembled and whether they reside on GPU:

🏁 Script executed:

#!/bin/bash
rg -n "val_predictions" -A 10 -B 10 scripts/legacy/finalize_emotion_model.py

Length of output: 1402

Convert concatenated tensors to NumPy arrays before calling f1_score

Ensure that sklearn.metrics.f1_score receives NumPy arrays (and integer labels) rather than raw PyTorch tensors:

• File: scripts/legacy/finalize_emotion_model.py
• Lines: 239–244

-        val_predictions = torch.cat(val_predictions, dim=0)
-        val_labels = torch.cat(val_labels, dim=0)
+        val_predictions = torch.cat(val_predictions, dim=0).cpu().numpy().astype(int)
+        val_labels = torch.cat(val_labels, dim=0).cpu().numpy().astype(int)
        f1 = f1_score(val_labels, val_predictions, average='micro', zero_division=0)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

        # Calculate F1 score
-        val_predictions = torch.cat(val_predictions, dim=0)
-        val_labels = torch.cat(val_labels, dim=0)
+        val_predictions = torch.cat(val_predictions, dim=0).cpu().numpy().astype(int)
+        val_labels = torch.cat(val_labels, dim=0).cpu().numpy().astype(int)
        f1 = f1_score(val_labels, val_predictions, average='micro', zero_division=0)

🤖 Prompt for AI Agents

In scripts/legacy/finalize_emotion_model.py around lines 239 to 244, the concatenated PyTorch tensors val_labels and val_predictions need to be converted to NumPy arrays with integer type before passing to sklearn.metrics.f1_score. Fix this by calling .cpu().numpy() on each tensor and converting them to integers to ensure f1_score receives the correct input format.
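
The conversion could also be factored into one small helper so that both evaluation paths share it. This is a sketch under assumptions: `to_label_array` is a hypothetical name, and tensors are detected by duck typing (anything with a `detach` method) rather than by importing torch.

```python
import numpy as np


def to_label_array(values):
    """Return an integer NumPy array from a tensor-like object or a sequence."""
    # PyTorch tensors expose detach()/cpu()/numpy(); other inputs go through np.asarray.
    if hasattr(values, "detach"):
        values = values.detach().cpu().numpy()
    return np.asarray(values).astype(int)
```

With such a helper, the call would read `f1_score(to_label_array(val_labels), to_label_array(val_predictions), average='micro', zero_division=0)`.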

coderabbitai · 2025-08-10T20:18:55Z

    # Calculate metrics
    micro_f1 = f1_score(labels, predictions, average='micro', zero_division=0)
    macro_f1 = f1_score(labels, predictions, average='macro', zero_division=0)
    precision, recall, _, _ = precision_recall_fscore_support(
        labels, predictions, average='micro', zero_division=0
    )
-    
+


🛠️ Refactor suggestion

Same conversion needed in ensemble evaluation.

Convert tensors to NumPy before computing metrics to avoid type/dtype surprises.

-    predictions = torch.cat(predictions, dim=0)
-    labels = torch.cat(labels, dim=0)
+    predictions = torch.cat(predictions, dim=0).cpu().numpy().astype(int)
+    labels = torch.cat(labels, dim=0).cpu().numpy().astype(int)

📝 Committable suggestion

Suggested change

+    # Aggregate predictions and labels, then convert to NumPy ints
+    predictions = torch.cat(predictions, dim=0).cpu().numpy().astype(int)
+    labels = torch.cat(labels, dim=0).cpu().numpy().astype(int)
+
    # Calculate metrics
    micro_f1 = f1_score(labels, predictions, average='micro', zero_division=0)
    macro_f1 = f1_score(labels, predictions, average='macro', zero_division=0)
    precision, recall, _, _ = precision_recall_fscore_support(
        labels, predictions, average='micro', zero_division=0
    )

🤖 Prompt for AI Agents

In scripts/legacy/finalize_emotion_model.py around lines 327 to 333, the labels and predictions are used directly as tensors in metric calculations, which can cause type or dtype issues. Convert both labels and predictions tensors to NumPy arrays before passing them to f1_score and precision_recall_fscore_support functions to ensure compatibility and avoid errors.

coderabbitai · 2025-08-10T20:18:55Z

        data_loader = GoEmotionsDataLoader()
        test_data = data_loader.get_test_data()
        _, tokenizer = create_bert_emotion_classifier()
-        
+
        metrics = evaluate_ensemble(ensemble, test_data, tokenizer, device)


⚠️ Potential issue

Critical: repeated incorrect unpacking; also tokenizer unused.

Same issue: create_bert_emotion_classifier() does not return a tokenizer. Moreover, evaluate_ensemble(...) doesn’t use tokenizer at all.

-        _, tokenizer = create_bert_emotion_classifier()
-        metrics = evaluate_ensemble(ensemble, test_data, tokenizer, device)
+        _, _ = create_bert_emotion_classifier()
+        metrics = evaluate_ensemble(ensemble, test_data, device=device, tokenizer=None)

And simplify evaluate_ensemble signature to drop unused tokenizer param (see next comment).

📝 Committable suggestion

Suggested change

        data_loader = GoEmotionsDataLoader()
        test_data = data_loader.get_test_data()
-        _, tokenizer = create_bert_emotion_classifier()
-        metrics = evaluate_ensemble(ensemble, test_data, tokenizer, device)
+        _, _ = create_bert_emotion_classifier()
+        metrics = evaluate_ensemble(ensemble, test_data, device=device, tokenizer=None)

🤖 Prompt for AI Agents

In scripts/legacy/finalize_emotion_model.py around lines 412 to 416, remove the incorrect unpacking of create_bert_emotion_classifier() since it does not return a tokenizer, and eliminate the tokenizer variable entirely as it is unused. Also, update the call to evaluate_ensemble() to remove the tokenizer argument, and subsequently modify the evaluate_ensemble function signature to drop the unused tokenizer parameter.

coderabbitai · 2025-08-10T20:18:56Z

            logger.info(f"   Temperature {temp}: F1 = {results.get('f1_score', 'N/A'):.4f}")
-            
+


⚠️ Potential issue

Guard f-string formatting for missing metrics.

results.get('f1_score', 'N/A') with :.4f will raise if key is absent. Format robustly.

-            logger.info(f"   Temperature {temp}: F1 = {results.get('f1_score', 'N/A'):.4f}")
+            f1 = results.get('f1_score')
+            f1_str = f"{f1:.4f}" if isinstance(f1, (int, float)) else "N/A"
+            logger.info(f"   Temperature {temp}: F1 = {f1_str}")

📝 Committable suggestion

Suggested change

-            logger.info(f"   Temperature {temp}: F1 = {results.get('f1_score', 'N/A'):.4f}")
+            f1 = results.get('f1_score')
+            f1_str = f"{f1:.4f}" if isinstance(f1, (int, float)) else "N/A"
+            logger.info(f"   Temperature {temp}: F1 = {f1_str}")

🤖 Prompt for AI Agents

In scripts/testing/simple_temperature_test.py around lines 80 to 81, the f-string formatting uses :.4f on results.get('f1_score', 'N/A'), which raises an error if 'f1_score' is missing because 'N/A' is a string. Fix this by first checking if 'f1_score' exists and is a number before formatting; if not present, log 'N/A' without formatting. This ensures robust formatting without exceptions.
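
If several metrics are logged this way, the guard could live in one helper. The following is a minimal sketch, not the script's actual code: `format_metric` is a hypothetical name, and it checks against `numbers.Real` instead of `(int, float)` as a variation on the suggestion above.

```python
from numbers import Real


def format_metric(results, key, digits=4):
    """Format results[key] to a fixed number of decimals, or return 'N/A'."""
    value = results.get(key)
    # bool is a subclass of int; exclude it so True/False are not printed as numbers.
    if isinstance(value, Real) and not isinstance(value, bool):
        return f"{value:.{digits}f}"
    return "N/A"
```

The log line then becomes `logger.info(f"   Temperature {temp}: F1 = {format_metric(results, 'f1_score')}")`, which cannot raise on a missing or non-numeric entry.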

coderabbitai · 2025-08-10T20:18:56Z

    # Add mixed precision training loop
    training_loop_pattern = r'optimizer\.zero_grad\(\)\s+outputs = model\(input_ids=input_ids, attention_mask=attention_mask\)\s+loss = criterion\(outputs, labels\)\s+loss\.backward\(\)\s+optimizer\.step\(\)'
    training_loop_code = '''optimizer.zero_grad()
            with autocast():
                outputs = model(input_ids=input_ids, attention_mask=attention_mask)
                loss = criterion(outputs, labels)
-            
+
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()'''
-    
+
    new_source = re.sub(training_loop_pattern, training_loop_code, new_source)


💡 Verification agent

🧩 Analysis chain

Regex-based code surgery is brittle; add a quick verification step

Patterns may miss slightly different formatting. Recommend verifying that all intended substitutions occurred before writing.

Add this check before saving:

assert 'GradScaler' in new_source and 'autocast' in new_source and 'scheduler.step' in new_source, "Expected training enhancements not injected"

Or run this script to scan the modified cell content after execution:

🏁 Script executed:

#!/bin/bash
rg -n "autocast|GradScaler|scheduler\.step|cudnn\.benchmark|Early stopping triggered" notebooks/expanded_dataset_training_improved.ipynb

Length of output: 229

🏁 Script executed:

#!/bin/bash
# List all notebook files in the repository
echo "Notebook files found:"
fd --extension ipynb
echo ""
# Search for mixed-precision and training enhancements in all notebooks
echo "Searching for patterns in notebooks:"
rg -n "autocast|GradScaler|scheduler\.step|cudnn\.benchmark|Early stopping triggered" -g '*.ipynb'

Length of output: 4871

Add verification in improve_expanded_training_notebook.py to guard against brittle regex substitutions

Our scan confirms that autocast, GradScaler, and scheduler.step appear in notebooks/training/expanded_dataset_training_improved.ipynb, so the substitution is working—but relying on a single regex is fragile. Please add an assertion right after your re.sub call in scripts/training/improve_expanded_training_notebook.py to fail fast if the injection didn’t occur:

    new_source = re.sub(training_loop_pattern, training_loop_code, new_source)
+    # Verify that mixed-precision and scheduler calls were injected
+    assert "GradScaler" in new_source, "Expected GradScaler injection missing"
+    assert "autocast" in new_source, "Expected autocast context missing"
+    assert "scheduler.step" in new_source, "Expected scheduler.step to remain in the loop"

• File needing change:
– scripts/training/improve_expanded_training_notebook.py (after line 105)

This will catch formatting variations or future edits that break your regex.

📝 Committable suggestion

Suggested change

    # Add mixed precision training loop
    training_loop_pattern = r'optimizer\.zero_grad\(\)\s+outputs = model\(input_ids=input_ids, attention_mask=attention_mask\)\s+loss = criterion\(outputs, labels\)\s+loss\.backward\(\)\s+optimizer\.step\(\)'
    training_loop_code = '''optimizer.zero_grad()
            with autocast():
                outputs = model(input_ids=input_ids, attention_mask=attention_mask)
                loss = criterion(outputs, labels)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()'''

    new_source = re.sub(training_loop_pattern, training_loop_code, new_source)
+    # Verify that mixed-precision and scheduler calls were injected
+    assert "GradScaler" in new_source, "Expected GradScaler injection missing"
+    assert "autocast" in new_source, "Expected autocast context missing"
+    assert "scheduler.step" in new_source, "Expected scheduler.step to remain in the loop"

🤖 Prompt for AI Agents

In scripts/training/improve_expanded_training_notebook.py after line 105, add an assertion immediately following the re.sub call to verify that the substitution actually occurred. This can be done by checking that the new source code string differs from the original or by confirming the presence of the injected mixed precision training loop code. This assertion will cause the script to fail fast if the regex substitution did not apply, protecting against brittle regex failures due to formatting changes or future edits.
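
An alternative to string-presence assertions is to count the substitutions directly with `re.subn`. The sketch below is one possible shape, with `replace_exactly_once` as a hypothetical helper name. It raises instead of asserting, so the check survives `python -O`. It also passes the replacement through a callable so the text is inserted literally, which is a small behavioural difference from a plain `re.sub` with a string replacement.

```python
import re


def replace_exactly_once(pattern, replacement, source):
    """Apply a regex substitution and fail fast unless it matched exactly once."""
    # A callable replacement skips backslash and group-reference processing,
    # so the replacement text is inserted exactly as written.
    new_source, count = re.subn(pattern, lambda _match: replacement, source)
    if count != 1:
        raise ValueError(f"expected exactly 1 substitution, got {count}")
    return new_source
```

In the script this would replace the bare call, for instance `new_source = replace_exactly_once(training_loop_pattern, training_loop_code, new_source)`, so a notebook whose formatting no longer matches the pattern stops the run before anything is written.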

coderabbitai · 2025-08-10T20:18:56Z

            # Check if tokens available
            if self.buckets[client_key] < 0.999999:  # Use small epsilon to handle floating-point precision
                return False, "Rate limit exceeded", {
                    "client_key": client_key,
                    "tokens": self.buckets[client_key],
                    "rate_limit": self.config.requests_per_minute
                }
-            
+


🛠️ Refactor suggestion

Provide accurate retry_after (and optionally reset_time) in rate-limit responses

When tokens are exhausted, the 429 body sets a default retry_after=60. Compute and return the actual time until the next token.

            if self.buckets[client_key] < 0.999999:  # Use small epsilon to handle floating-point precision
-                return False, "Rate limit exceeded", {
-                    "client_key": client_key,
-                    "tokens": self.buckets[client_key],
-                    "rate_limit": self.config.requests_per_minute
-                }
+                rate_per_sec = self.config.requests_per_minute / 60.0
+                missing = max(0.0, 1.0 - self.buckets[client_key])
+                retry_after = max(0, int(missing / rate_per_sec)) if rate_per_sec > 0 else 60
+                reset_time = int(time.time()) + retry_after
+                return False, "Rate limit exceeded", {
+                    "client_key": client_key,
+                    "tokens": self.buckets[client_key],
+                    "rate_limit": self.config.requests_per_minute,
+                    "retry_after": retry_after,
+                    "reset_time": reset_time,
+                }

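If you adopt this, note that the integer cast truncates, so a wait shorter than one second is reported as retry_after=0. Add import math and use math.ceil if you prefer rounding up, or keep the integer cast as shown.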

📝 Committable suggestion

Suggested change

            # Check if tokens available
            if self.buckets[client_key] < 0.999999:  # Use small epsilon to handle floating-point precision
-                return False, "Rate limit exceeded", {
-                    "client_key": client_key,
-                    "tokens": self.buckets[client_key],
-                    "rate_limit": self.config.requests_per_minute
-                }
+                rate_per_sec = self.config.requests_per_minute / 60.0
+                missing = max(0.0, 1.0 - self.buckets[client_key])
+                retry_after = max(0, int(missing / rate_per_sec)) if rate_per_sec > 0 else 60
+                reset_time = int(time.time()) + retry_after
+                return False, "Rate limit exceeded", {
+                    "client_key": client_key,
+                    "tokens": self.buckets[client_key],
+                    "rate_limit": self.config.requests_per_minute,
+                    "retry_after": retry_after,
+                    "reset_time": reset_time,
+                }

🤖 Prompt for AI Agents

In src/api_rate_limiter.py around lines 301 to 308, the rate-limit response currently returns a fixed retry_after value of 60 seconds when tokens are exhausted. Modify the code to calculate the actual retry_after time based on the token refill rate and the current token count, then include this computed retry_after value in the response dictionary. Optionally, add a reset_time field if relevant. Also, add an import for the math module if you choose to use math.ceil for rounding the retry_after value.
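
The rounding-up variant could be sketched as a standalone function. It assumes, as the suggestion does, that the bucket refills linearly at `requests_per_minute / 60` tokens per second. `compute_retry_after` and its `now` parameter are hypothetical names added here for testability.

```python
import math
import time


def compute_retry_after(tokens, requests_per_minute, now=None):
    """Return (retry_after_seconds, reset_time) until one full token is available."""
    if requests_per_minute <= 0:
        # Same 60-second fallback as the suggestion when no refill rate is configured.
        retry_after = 60
    else:
        rate_per_sec = requests_per_minute / 60.0
        missing = max(0.0, 1.0 - tokens)
        # Round up so a client is never told to retry before a token exists.
        retry_after = math.ceil(missing / rate_per_sec)
    now = time.time() if now is None else now
    return retry_after, int(now) + retry_after
```

With 0.5 tokens left at 60 requests per minute, the truncating cast would yield 0 seconds, while this version yields 1.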

coderabbitai · 2025-08-10T20:18:57Z

                model_data = torch.load(model_path, map_location='cpu', weights_only=True)
                validation_info['loadable'] = True
-                
+
                # Validate model data structure
                if not isinstance(model_data, dict):
                    validation_info['issues'].append("Model file is not a valid state dict")
                else:
                    # Check for required keys
                    if 'state_dict' not in model_data:
                        validation_info['issues'].append("Model file missing state_dict")
-                    
+
                    if 'config' not in model_data:
                        validation_info['issues'].append("Model file missing config")
-                
+
            except Exception as e:


🛠️ Refactor suggestion

Add backward-compatible torch.load fallback for weights_only.

weights_only was added to torch.load in PyTorch 1.13, so older releases reject the keyword. A fallback improves compatibility with your own version matrix.

-                model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                try:
+                    model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                except TypeError:
+                    # Fallback for PyTorch < 1.13, which lacks weights_only
+                    model_data = torch.load(model_path, map_location='cpu')

📝 Committable suggestion

Suggested change

-                model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                try:
+                    model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                except TypeError:
+                    # Fallback for PyTorch < 1.13, which lacks weights_only
+                    model_data = torch.load(model_path, map_location='cpu')
                validation_info['loadable'] = True

                # Validate model data structure
                if not isinstance(model_data, dict):
                    validation_info['issues'].append("Model file is not a valid state dict")
                else:
                    # Check for required keys
                    if 'state_dict' not in model_data:
                        validation_info['issues'].append("Model file missing state_dict")

                    if 'config' not in model_data:
                        validation_info['issues'].append("Model file missing config")

            except Exception as e:

🤖 Prompt for AI Agents

In src/models/secure_loader/model_validator.py around lines 197 to 211, the torch.load call uses the weights_only parameter which is unsupported in older PyTorch versions. To fix this, add a try-except block that first attempts to load with weights_only=True, and if it raises a TypeError, retry loading without the weights_only argument. This fallback ensures compatibility with older PyTorch versions while maintaining the intended behavior.
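
Because the same fallback is needed in two places, it could be written once as a wrapper. This is a sketch under assumptions: `load_with_fallback` is a hypothetical helper, `load_fn` stands in for `torch.load`, and the check on the error text assumes the TypeError raised for an unsupported keyword mentions `weights_only`.

```python
def load_with_fallback(load_fn, path, **kwargs):
    """Call load_fn with weights_only=True, retrying without it if unsupported."""
    try:
        return load_fn(path, weights_only=True, **kwargs)
    except TypeError as exc:
        # Fall back only when the error concerns the unknown keyword itself
        # (assumption: its message names the keyword); re-raise anything else.
        if "weights_only" not in str(exc):
            raise
        return load_fn(path, **kwargs)
```

A call site would then read `model_data = load_with_fallback(torch.load, model_path, map_location='cpu')`. The fallback branch loads without the weights-only restriction, so it is best reserved for files from a trusted source.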

coderabbitai · 2025-08-10T20:18:57Z

                    model_data = torch.load(model_path, map_location='cpu', weights_only=True)
-                    
+
                    # Filter model_config to only include valid constructor parameters
                    import inspect
                    constructor_params = inspect.signature(model_class.__init__).parameters
                    valid_params = {k: v for k, v in model_config.items() if k in constructor_params}
                    model = model_class(**valid_params)
-                    
+
                    if 'state_dict' in model_data:
                        model.load_state_dict(model_data['state_dict'])
-                    
+
                    structure_valid, structure_info = self.validate_model_structure(model)
                    comprehensive_info['structure_validation'] = structure_info
                    if not structure_valid:


🛠️ Refactor suggestion

Mirror torch.load fallback during comprehensive load as well.

Keeps behavior consistent across validation paths.

-                    model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                    try:
+                        model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                    except TypeError:
+                        model_data = torch.load(model_path, map_location='cpu')

📝 Committable suggestion

Suggested change

-                    model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                    try:
+                        model_data = torch.load(model_path, map_location='cpu', weights_only=True)
+                    except TypeError:
+                        model_data = torch.load(model_path, map_location='cpu')

                    # Filter model_config to only include valid constructor parameters
                    import inspect
                    constructor_params = inspect.signature(model_class.__init__).parameters
                    valid_params = {k: v for k, v in model_config.items() if k in constructor_params}
                    model = model_class(**valid_params)

                    if 'state_dict' in model_data:
                        model.load_state_dict(model_data['state_dict'])

                    structure_valid, structure_info = self.validate_model_structure(model)
                    comprehensive_info['structure_validation'] = structure_info
                    if not structure_valid:

🤖 Prompt for AI Agents

In src/models/secure_loader/model_validator.py around lines 373 to 386, the torch.load call during the comprehensive load passes weights_only=True without the TypeError fallback proposed for the other validation path, so the two paths would behave differently on older PyTorch versions. Wrap this call in the same try-except block: attempt the load with weights_only=True and, if it raises a TypeError, retry without the weights_only argument.

refactor: remove unnecessary whitespace

ed60e3e

Blank lines should not contain any tabs or spaces.

refactor: remove unnecessary whitespace

a6628a7

Resolved issues in the following files with DeepSource Autofix: 1. scripts/legacy/comprehensive_model_validation.py 2. scripts/legacy/deep_model_analysis.py

uelkerd requested a review from Copilot August 8, 2025 23:30

Copilot AI reviewed Aug 8, 2025

gemini-code-assist Bot reviewed Aug 8, 2025

coderabbitai Bot reviewed Aug 10, 2025


		print(f"\n📊 EMOTION MAPPING ANALYSIS")

		print("\n📊 EMOTION MAPPING ANALYSIS")

		with open('data/unique_fallback_dataset.json', 'w') as f:
		    json.dump(unique_samples, f, indent=2)

		logger.info(f" Temperature {temp}: F1 = {results.get('f1_score', 'N/A'):.4f}")

-            logger.info(f"   Temperature {temp}: F1 = {results.get('f1_score', 'N/A'):.4f}")
+            f1 = results.get('f1_score')
+            f1_str = f"{f1:.4f}" if isinstance(f1, (int, float)) else "N/A"
+            logger.info(f"   Temperature {temp}: F1 = {f1_str}")
