Skip to content

Phase 2.3: Download Wait Logic ✅ Added exponential backoff to app/states/texas/texas_downloader.py:#8

Open
jreakin wants to merge 1 commit into01-27-summary_i_ve_completed_the_implementation_of_phase_1_quick_wins_from_the_code_review_plan._here_s_what_was_accomplished_from
phase2_to_3
Open

Phase 2.3: Download Wait Logic ✅ Added exponential backoff to app/states/texas/texas_downloader.py:#8
jreakin wants to merge 1 commit into01-27-summary_i_ve_completed_the_implementation_of_phase_1_quick_wins_from_the_code_review_plan._here_s_what_was_accomplished_from
phase2_to_3

Conversation

@jreakin
Copy link
Member

@jreakin jreakin commented Jan 28, 2026

Exponential Backoff for Texas Downloader and Core Module Refactoring

Added exponential backoff to app/states/texas/texas_downloader.py:

  • Created exponential_backoff() helper function with jitter to prevent synchronized retries
  • Added 1-hour timeout protection for downloads
  • Replaced fixed time.sleep() calls with adaptive delays:
    • Error recovery: 1s base, 30s max
    • Download polling: 2s base, 60s max
  • Added detailed logging for delay times and elapsed duration

Phase 3: Architecture Refactor ✅

Moved unified modules to app/core/:

  • Moved 6 unified_*.py files from app/states/ to app/core/
  • Created app/core/init.py with proper exports
  • Updated imports in 38 Python files and 6 documentation files
  • Fixed relative imports in remaining app/states/ files

Verification

  • All 5 unit tests pass
  • Core module imports work correctly
  • No remaining old import paths in the codebase
  • New code passes linting (pre-existing lint issues in moved files are not related to my changes)

Summary by CodeRabbit

  • New Features

    • Consolidated public API for Campaign Finance Core module, centralizing access to field definitions, unified data models, SQL models, database management, and data loaders.
  • Improvements

    • Enhanced data downloader robustness with exponential backoff retry logic, timeout protection, and improved progress tracking and logging.
  • Chores

    • Internal module reorganization for consistency.

✏️ Tip: You can customize this high-level summary in your review settings.

Added exponential backoff to app/states/texas/texas_downloader.py:

Created exponential_backoff() helper function with jitter to prevent synchronized retries
Added 1-hour timeout protection for downloads
Replaced fixed time.sleep() calls with adaptive delays:
Error recovery: 1s base, 30s max
Download polling: 2s base, 60s max
Added detailed logging for delay times and elapsed duration
Phase 3: Architecture Refactor ✅
Moved unified modules to app/core/:

Moved 6 unified_*.py files from app/states/ to app/core/
Created app/core/__init__.py with proper exports
Updated imports in 38 Python files and 6 documentation files
Fixed relative imports in remaining app/states/ files
Verification
All 5 unit tests pass
Core module imports work correctly
No remaining old import paths in the codebase
New code passes linting (pre-existing lint issues in moved files are not related to my changes)
@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • ✅ Full review completed - (🔄 Check again to review again)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Member Author

jreakin commented Jan 28, 2026

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds more resilient download retry/polling behavior for the Texas TEC downloader and refactors the “unified_*” infrastructure into a new app/core/ module with updated imports across code, tests, scripts, and docs.

Changes:

  • Added exponential backoff + jitter and a 1-hour timeout to Texas download polling/error recovery.
  • Introduced app/core/ unified modules (models, field library, integration helpers) and exported them via app/core/__init__.py.
  • Updated imports throughout tests, scripts, and documentation from app.states.unified_* to app.core.unified_*.

Reviewed changes

Copilot reviewed 50 out of 53 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/test_unified_processor.py Updates unified SQLModels import to app.core.
tests/test_unified_models.py Updates unified (dataclass) models import to app.core.
tests/test_unified_loader.py Updates unified SQLModels import to app.core.
tests/test_state_loader.py Updates unified state loader imports (including docs snippet).
tests/test_sqlmodels.py Updates unified SQLModels + db_manager imports to app.core.
tests/test_single_record.py Updates unified state loader import to app.core.
tests/test_simple_transaction.py Updates unified SQLModels imports to app.core (including inner import).
tests/test_simple_address_dedup.py Updates unified SQLModels imports to app.core (including inner import).
tests/test_session_address_dedup.py Updates unified SQLModels imports to app.core (including inner import).
tests/test_officer_financial_activities.py Updates unified SQLModels + db_manager imports to app.core.
tests/test_limited_load.py Updates unified state loader import to app.core.
tests/test_committee_persons.py Updates unified SQLModels + db_manager imports to app.core (including inner import).
tests/test_committee_creation.py Updates unified state loader import to app.core.
tests/test_automatic_linking.py Updates unified SQLModels + db_manager imports to app.core.
tests/test_address_dedup_fix.py Updates unified SQLModels imports to app.core.
tests/test_address_dedup.py Updates unified SQLModels import to app.core.
tests/simple_address_test.py Updates unified SQLModels imports to app.core (including inner import).
tests/conftest.py Updates field library fixture import to app.core.
scripts/loaders/unified_postgres_loader.py Updates unified SQLModels import to app.core.
scripts/loaders/simple_loader.py Updates unified SQLModels imports to app.core (including inner import).
scripts/loaders/rebuild_and_load_texas.py Updates unified SQLModels import block to app.core.
scripts/loaders/production_loader.py Updates field library + unified SQLModels imports to app.core.
scripts/loaders/optimized_loader.py Updates unified SQLModels imports to app.core (including inner import).
scripts/loaders/load_working_files.py Updates unified state loader import to app.core.
scripts/loaders/load_single_file_test.py Updates unified state loader import to app.core.
scripts/loaders/load_oklahoma_test.py Updates unified state loader import to app.core.
scripts/loaders/load_data_batched.py Updates unified state loader import to app.core.
scripts/loaders/load_all_data_simple.py Updates unified state loader import to app.core.
scripts/loaders/load_actual_data_to_postgres.py Updates unified SQLModels import to app.core.
scripts/loaders/high_performance_loader.py Updates unified SQLModels imports to app.core (including inner import).
scripts/loaders/fixed_loader.py Updates unified SQLModels imports to app.core (including inner import).
scripts/debug/transaction_save.py Updates unified SQLModels import to app.core.
scripts/debug/committee.py Updates unified SQLModels import to app.core.
scripts/debug/address_dedup.py Updates unified SQLModels imports to app.core (including inner import).
scripts/analysis/analyze_fields.py Updates field library import to app.core.
docs/TESTING.md Updates field library import example to app.core.
docs/RUNBOOK.md Updates field library import example to app.core.
docs/DATA_DICTIONARY.md Updates field library import examples to app.core.
app/states/texas/texas_downloader.py Adds exponential backoff + jitter, timeout, and improved logging during download wait loops.
app/states/postgres_state_loader.py Updates loader base class import to app.core.unified_state_loader.
app/states/postgres_config.py Updates DB manager import to app.core.unified_database.
app/states/field_analyzer.py Updates unified field library imports to app.core.
app/ingest/file_reader.py Updates unified field library import to app.core.
app/core/unified_state_loader.py Updates internal SQLModels import to app.core.
app/core/unified_sqlmodels.py Updates internal db_manager imports to app.core.
app/core/unified_models.py Adds dataclass-based unified models + builder/processor.
app/core/unified_integration.py Adds integration utilities/examples for unified models usage.
app/core/unified_field_library.py Adds unified field definitions and state mapping registry.
app/core/unified_database.py Updates PostgresConfig import source to app.states.
app/core/init.py Adds central exports for core unified modules.
CONTRIBUTING.md Updates field library import example to app.core.
CLAUDE.md Updates unified SQLModels import example to app.core.
AGENTS.md Updates db_manager / field_library / unified SQLModels import examples to app.core.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from icecream import ic

from .unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping
from app.core.unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'FieldCategory' is not used.
Import of 'FieldType' is not used.
Import of 'StateFieldMapping' is not used.

Suggested change
from app.core.unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping
from app.core.unified_field_library import UnifiedFieldLibrary, FieldDefinition

Copilot uses AI. Check for mistakes.
import polars as pl

from app.states.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library
from app.core.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'FieldDefinition' is not used.

Suggested change
from app.core.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library
from app.core.unified_field_library import FieldType, StateFieldMapping, field_library

Copilot uses AI. Check for mistakes.
Comment on lines +11 to 14
from app.core.unified_sqlmodels import (
unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType,
CommitteeRole, UnifiedPerson, UnifiedCommittee, UnifiedTransactionPerson
)
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'unified_sql_processor' is not used.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to 14
from app.core.unified_sqlmodels import (
unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType,
CommitteeRole
)
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'UnifiedTransaction' is not used.
Import of 'TransactionType' is not used.
Import of 'unified_sql_processor' is not used.
Import of 'PersonRole' is not used.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to 14
from app.core.unified_sqlmodels import (
unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType,
CommitteeRole, UnifiedPerson, UnifiedCommittee, UnifiedTransactionPerson
)
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'unified_sql_processor' is not used.

Copilot uses AI. Check for mistakes.
Comment on lines +10 to 12
from app.core.unified_sqlmodels import (
unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType
)
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'UnifiedTransaction' is not used.
Import of 'PersonRole' is not used.

Copilot uses AI. Check for mistakes.
from pathlib import Path
from icecream import ic
from app.states.unified_models import unified_processor, TransactionType, PersonType
from app.core.unified_models import unified_processor, TransactionType, PersonType
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'TransactionType' is not used.
Import of 'PersonType' is not used.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
tests/test_simple_transaction.py (1)

25-25: Multiple calls to UnifiedSQLModelBuilder missing required state_id argument.

The UnifiedSQLModelBuilder constructor signature requires state_id as a positional argument: __init__(self, state: str, state_id: Optional[int], state_code: Optional[str] = None). This call only passes state, which will raise TypeError: __init__() missing 1 required positional argument: 'state_id'.

🐛 Affected instances
  • tests/test_simple_transaction.py:25
  • tests/test_session_address_dedup.py
  • tests/test_address_dedup.py
  • tests/simple_address_test.py
  • scripts/debug/transaction_save.py
  • scripts/debug/committee.py

All use: UnifiedSQLModelBuilder("oklahoma")

Pass state_id=None as a keyword argument or provide the state_id value:

-    builder = UnifiedSQLModelBuilder("oklahoma")
+    builder = UnifiedSQLModelBuilder("oklahoma", state_id=None)
scripts/loaders/high_performance_loader.py (1)

106-111: NameError at runtime: UnifiedAddress and select are not in scope.

UnifiedAddress and select are imported inside process_batch_optimized (lines 125-126), but this code block in high_performance_loader executes before that function and references both symbols without importing them at module level.

🐛 Proposed fix: Add missing imports at module level
 from app.core.unified_sqlmodels import unified_sql_processor
+from app.core.unified_sqlmodels import UnifiedAddress
+from sqlalchemy import select
 from app.states.postgres_config import create_postgres_database_manager
scripts/debug/transaction_save.py (1)

24-25: Pass state_id to UnifiedSQLModelBuilder constructor.

The UnifiedSQLModelBuilder.__init__ signature requires state_id: Optional[int] as a mandatory second parameter. Calling it with only "oklahoma" will raise a TypeError at runtime. Provide a value for state_id (e.g., state_id=1) or modify the constructor to make state_id optional with a default value.

scripts/loaders/load_oklahoma_test.py (1)

37-38: Remove setting max_records_per_file and max_files attributes; these do not exist on UnifiedStateLoader.

The UnifiedStateLoader class does not define max_records_per_file or max_files as instance attributes. The load_state_data() method accepts a max_records parameter instead. Pass it directly to the method:

result = loader.load_state_data(
    max_records=1000,
    auto_link_officers=True,
    create_relationships=True,
    progress_callback=...
)

Note: If you need separate per-file and total-file limits, the current API only supports a single max_records limit per file. Consider whether max_records=1000 is sufficient, or if the API needs to be extended.

CLAUDE.md (2)

78-85: Important Files table references stale file paths.

The import example on line 69 correctly uses app.core.unified_sqlmodels, but the Important Files table still references the old locations (app/states/unified_sqlmodels.py and app/states/unified_field_library.py). According to the PR objectives, these files were moved to app/core/.

📝 Suggested fix
 | File | Purpose |
 |------|---------|
-| `app/states/unified_sqlmodels.py` | Unified SQLModel implementations |
-| `app/states/unified_field_library.py` | Cross-state field mapping |
+| `app/core/unified_sqlmodels.py` | Unified SQLModel implementations |
+| `app/core/unified_field_library.py` | Cross-state field mapping |
 | `app/ingest/file_reader.py` | Schema-driven CSV/file parsing |
 | `app/abcs/abc_category.py` | Core data processing ABC |
 | `scripts/loaders/production_loader.py` | Production data loader |

34-48: Architecture overview may need updating.

The Architecture Overview section still shows unified_*.py files under app/states/. If the unified modules have been moved to app/core/ as indicated by the PR objectives, consider updating this section to reflect the new structure by adding an app/core/ entry.

scripts/debug/committee.py (1)

22-27: Constructor call will fail due to missing required state_id argument.

The UnifiedSQLModelBuilder constructor requires both state and state_id as positional arguments: __init__(state: str, state_id: Optional[int], state_code: Optional[str] = None). The call on line 23 passes only the state name, which will raise TypeError: __init__() missing 1 required positional argument: 'state_id' at runtime. Pass the state ID as the second argument to fix this.

tests/test_session_address_dedup.py (1)

50-51: Missing required state_id argument.

UnifiedSQLModelBuilder.__init__ requires state_id as the second positional argument (signature: state: str, state_id: Optional[int], state_code: Optional[str] = None). This call will raise TypeError at runtime.

🐛 Proposed fix
     # Create builder
-    builder = UnifiedSQLModelBuilder("oklahoma")
+    builder = UnifiedSQLModelBuilder("oklahoma", state_id=None)

Or if a specific state ID is needed for Oklahoma:

     # Create builder
-    builder = UnifiedSQLModelBuilder("oklahoma")
+    builder = UnifiedSQLModelBuilder("oklahoma", state_id=40, state_code="OK")
🤖 Fix all issues with AI agents
In `@app/core/__init__.py`:
- Around line 60-100: The __all__ list is not alphabetically sorted (Ruff
RUF022); reorder the entries alphabetically to satisfy static analysis while
preserving the existing logical groupings/comments (e.g., Field Library, Base
Models, SQLModels, Database, Loaders). Update the items such as "field_library",
"FieldCategory", "FieldDefinition", "FieldType", "StateFieldMapping",
"UnifiedFieldLibrary", and the other group members (e.g., "unified_processor",
"BaseUnifiedTransaction", "UnifiedTransaction", "UnifiedPerson", "db_manager",
"UnifiedStateLoader", etc.) so each group is alphabetized; keep the group
headers and formatting intact and ensure no duplicates are introduced.
- Around line 17-25: Add a short clarifying comment above the grouped imports in
app/core/__init__.py explaining that the aliases like UnifiedTransaction as
BaseUnifiedTransaction, TransactionType as BaseTransactionType, PersonType as
BasePersonType, UnifiedPerson as BaseUnifiedPerson, UnifiedAddress as
BaseUnifiedAddress refer to the original dataclass-based/base models (used for
core logic/serialization), while separately defined SQLModel/ORM-backed types
named UnifiedTransaction, UnifiedPerson, etc. exist for DB mapping; mention
unified_processor to indicate which module provides processing logic and note
that the `Base*` prefix denotes the non-ORM/base variants to avoid confusion for
consumers.

In `@app/core/unified_state_loader.py`:
- Around line 348-349: Remove the redundant inline import of UnifiedAddress
inside unified_state_loader.py (the local inline "from
app.core.unified_sqlmodels import UnifiedAddress") since UnifiedAddress is
already imported at module level; keep the sqlalchemy select import if needed
and only remove the duplicated UnifiedAddress import to avoid redundancy.

In `@app/states/texas/texas_downloader.py`:
- Around line 77-78: The initial polling logic that does
time.sleep(exponential_backoff(...)) then breaks if no .crdownload exists should
be changed to actively wait (with exponential_backoff) until either a new zip
file appears or a .crdownload (in-progress) file is observed, while still
enforcing the existing timeout; update the code around exponential_backoff(),
the loop that checks download_dir / target zip (e.g., any function or block
referencing download_dir, target_zip_name or find_latest_zip()), and the
start_time/timeout_seconds checks so the loop only exits on timeout or when the
zip is actually ready, and apply the same fix to the analogous polling block
referenced later (lines 93-118).

In `@scripts/loaders/high_performance_loader.py`:
- Around line 125-126: Remove the redundant in-function imports that duplicate
module-level imports: delete the local imports of UnifiedAddress and
UnifiedCommittee from app.core.unified_sqlmodels and the select import from
sqlalchemy (the ones inside the function) and rely on the existing top-level
imports instead; ensure no other references rely on those local import lines and
run tests/style checks to confirm cleanup.

In `@scripts/loaders/optimized_loader.py`:
- Around line 126-127: The imports for UnifiedAddress and select are inside the
inner loop of process_batch causing repeated import lookup; move "from
app.core.unified_sqlmodels import UnifiedAddress" and "from sqlalchemy import
select" out of the inner loop to the top of the module or top of process_batch,
then remove the in-loop import statements so the loop directly references
UnifiedAddress and select.

In `@tests/test_session_address_dedup.py`:
- Around line 68-69: Move the two imports "from app.core.unified_sqlmodels
import UnifiedAddress" and "from sqlalchemy import select" out of the
loop/function body and add them at the module top with the other imports; then
remove the in-function duplicates (the imports currently inside the loop and the
later duplicated import in that test function) so the test uses the top-level
UnifiedAddress and select references. Ensure there are no circular-import issues
when relocating them.

Comment on lines +17 to +25
from .unified_models import (
unified_processor,
UnifiedTransaction as BaseUnifiedTransaction,
TransactionType as BaseTransactionType,
PersonType as BasePersonType,
UnifiedPerson as BaseUnifiedPerson,
UnifiedAddress as BaseUnifiedAddress,
UnifiedDataProcessor,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider documenting the distinction between Base* and SQLModel types.

The aliasing pattern (UnifiedTransaction as BaseUnifiedTransaction, etc.) distinguishes dataclass-based models from SQLModel-based models. This is a reasonable approach, but consumers may find it confusing that BaseUnifiedTransaction and UnifiedTransaction are different classes. Consider adding a brief comment explaining this distinction.

📝 Proposed documentation
+# Base models are dataclass-based (from unified_models), used for in-memory processing
+# SQLModel-based equivalents (from unified_sqlmodels) are for database persistence
 from .unified_models import (
     unified_processor,
     UnifiedTransaction as BaseUnifiedTransaction,
     TransactionType as BaseTransactionType,
     PersonType as BasePersonType,
     UnifiedPerson as BaseUnifiedPerson,
     UnifiedAddress as BaseUnifiedAddress,
     UnifiedDataProcessor,
 )
🤖 Prompt for AI Agents
In `@app/core/__init__.py` around lines 17 - 25, Add a short clarifying comment
above the grouped imports in app/core/__init__.py explaining that the aliases
like UnifiedTransaction as BaseUnifiedTransaction, TransactionType as
BaseTransactionType, PersonType as BasePersonType, UnifiedPerson as
BaseUnifiedPerson, UnifiedAddress as BaseUnifiedAddress refer to the original
dataclass-based/base models (used for core logic/serialization), while
separately defined SQLModel/ORM-backed types named UnifiedTransaction,
UnifiedPerson, etc. exist for DB mapping; mention unified_processor to indicate
which module provides processing logic and note that the `Base*` prefix denotes
the non-ORM/base variants to avoid confusion for consumers.

Comment on lines +60 to +100
__all__ = [
# Field Library
"field_library",
"FieldCategory",
"FieldType",
"FieldDefinition",
"StateFieldMapping",
"UnifiedFieldLibrary",
# Base Models (dataclass-based)
"unified_processor",
"BaseUnifiedTransaction",
"BaseTransactionType",
"BasePersonType",
"BaseUnifiedPerson",
"BaseUnifiedAddress",
"UnifiedDataProcessor",
# SQLModels
"unified_sql_processor",
"UnifiedTransaction",
"UnifiedPerson",
"UnifiedAddress",
"UnifiedCommittee",
"UnifiedTransactionPerson",
"UnifiedCommitteePerson",
"UnifiedTransactionVersion",
"UnifiedPersonVersion",
"UnifiedAddressVersion",
"UnifiedCommitteeVersion",
"UnifiedCommitteePersonVersion",
"TransactionType",
"PersonType",
"PersonRole",
"CommitteeRole",
"State",
# Database
"db_manager",
"UnifiedDatabaseManager",
# Loaders
"UnifiedStateLoader",
"UnifiedStateProcessor",
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider sorting __all__ for consistency.

Static analysis (Ruff RUF022) flags that __all__ is not sorted. While the current grouping by category is logical and readable, sorting alphabetically within each category or applying isort-style sorting can improve maintainability and reduce merge conflicts.

🧰 Tools
🪛 Ruff (0.14.14)

60-100: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
In `@app/core/__init__.py` around lines 60 - 100, The __all__ list is not
alphabetically sorted (Ruff RUF022); reorder the entries alphabetically to
satisfy static analysis while preserving the existing logical groupings/comments
(e.g., Field Library, Base Models, SQLModels, Database, Loaders). Update the
items such as "field_library", "FieldCategory", "FieldDefinition", "FieldType",
"StateFieldMapping", "UnifiedFieldLibrary", and the other group members (e.g.,
"unified_processor", "BaseUnifiedTransaction", "UnifiedTransaction",
"UnifiedPerson", "db_manager", "UnifiedStateLoader", etc.) so each group is
alphabetized; keep the group headers and formatting intact and ensure no
duplicates are introduced.

Comment on lines +348 to 349
from app.core.unified_sqlmodels import UnifiedAddress
from sqlalchemy import select
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Redundant import: UnifiedAddress is already imported at module level.

UnifiedAddress is already imported at the top of the file (line 17) from .unified_sqlmodels. This inline import is unnecessary and can be removed.

♻️ Suggested fix
                     # Check if address already exists in session
-                        from app.core.unified_sqlmodels import UnifiedAddress
                         from sqlalchemy import select
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from app.core.unified_sqlmodels import UnifiedAddress
from sqlalchemy import select
from sqlalchemy import select
🤖 Prompt for AI Agents
In `@app/core/unified_state_loader.py` around lines 348 - 349, Remove the
redundant inline import of UnifiedAddress inside unified_state_loader.py (the
local inline "from app.core.unified_sqlmodels import UnifiedAddress") since
UnifiedAddress is already imported at module level; keep the sqlalchemy select
import if needed and only remove the duplicated UnifiedAddress import to avoid
redundancy.

Comment on lines +77 to +78
# Initial wait for download to start
time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid exiting the polling loop before the download actually starts.

If no .crdownload exists after the initial sleep, the loop breaks immediately, bypassing the timeout and potentially using a stale zip or raising FileNotFoundError under slow starts. Consider waiting until either a new zip appears or a .crdownload is observed, with backoff and the existing timeout.

🛠️ Proposed fix (wait for download to start or finish)
-        # Initial wait for download to start
-        time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0))
+        download_start_time = time.time()
+        # Initial wait for download to start
+        time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0))
@@
-        poll_attempt = 0
-        download_start_time = time.time()
+        poll_attempt = 0
         while True:
             # Check for timeout
             elapsed = time.time() - download_start_time
             if elapsed > DOWNLOAD_TIMEOUT_SECONDS:
                 raise TimeoutError(f"Download timed out after {DOWNLOAD_TIMEOUT_SECONDS // 60} minutes")
 
             dl_files = list(tmp.glob("*.crdownload"))
+            zip_files = list(tmp.glob("*.zip"))
+            latest_zip = max(zip_files, key=lambda f: f.stat().st_mtime) if zip_files else None
             if dl_files:
                 if not in_progress:
                     progress.update_task(_task2, "Still in progress")
                     in_progress = True
                 # Use exponential backoff for polling (2s to 60s)
                 poll_delay = exponential_backoff(poll_attempt, base=2.0, max_delay=60.0)
                 poll_attempt = min(poll_attempt + 1, 10)  # Cap attempts to limit max delay
                 logger.debug("Download in progress, waiting", extra={"delay_seconds": round(poll_delay, 2), "elapsed_minutes": round(elapsed / 60, 1)})
                 time.sleep(poll_delay)
-            else:
-                if in_progress:
-                    progress.update_task(_task2, "Download Complete")
-                    logger.info("Download completed", extra={"elapsed_minutes": round(elapsed / 60, 1)})
-                    _safe_to_delete = True
-                break
+            elif latest_zip and latest_zip.stat().st_mtime >= download_start_time:
+                progress.update_task(_task2, "Download Complete")
+                logger.info("Download completed", extra={"elapsed_minutes": round(elapsed / 60, 1)})
+                _safe_to_delete = True
+                break
+            else:
+                poll_delay = exponential_backoff(poll_attempt, base=2.0, max_delay=10.0)
+                poll_attempt = min(poll_attempt + 1, 10)
+                logger.debug(
+                    "Waiting for download to start",
+                    extra={"delay_seconds": round(poll_delay, 2), "elapsed_minutes": round(elapsed / 60, 1)},
+                )
+                time.sleep(poll_delay)

Also applies to: 93-118

🤖 Prompt for AI Agents
In `@app/states/texas/texas_downloader.py` around lines 77 - 78, The initial
polling logic that does time.sleep(exponential_backoff(...)) then breaks if no
.crdownload exists should be changed to actively wait (with exponential_backoff)
until either a new zip file appears or a .crdownload (in-progress) file is
observed, while still enforcing the existing timeout; update the code around
exponential_backoff(), the loop that checks download_dir / target zip (e.g., any
function or block referencing download_dir, target_zip_name or
find_latest_zip()), and the start_time/timeout_seconds checks so the loop only
exits on timeout or when the zip is actually ready, and apply the same fix to
the analogous polling block referenced later (lines 93-118).

Comment on lines +125 to 126
from app.core.unified_sqlmodels import UnifiedAddress, UnifiedCommittee
from sqlalchemy import select
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Import path update is correct, but imports are duplicated.

The import path change to app.core.unified_sqlmodels is consistent with the refactor. However, once the module-level imports are added (as suggested above), these in-function imports become redundant and can be removed for cleaner code.

🤖 Prompt for AI Agents
In `@scripts/loaders/high_performance_loader.py` around lines 125 - 126, Remove
the redundant in-function imports that duplicate module-level imports: delete
the local imports of UnifiedAddress and UnifiedCommittee from
app.core.unified_sqlmodels and the select import from sqlalchemy (the ones
inside the function) and rely on the existing top-level imports instead; ensure
no other references rely on those local import lines and run tests/style checks
to confirm cleanup.

Comment on lines +126 to 127
from app.core.unified_sqlmodels import UnifiedAddress
from sqlalchemy import select
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Move imports outside the loop to avoid repeated import overhead.

The imports for UnifiedAddress and select are inside the inner loop of process_batch. While Python caches imports, placing them at the function or module level improves clarity and avoids unnecessary lookup overhead on each iteration.

♻️ Proposed fix

Move imports to the top of the function:

 def process_batch(records, db_manager, progress, record_task):
     """Process a batch of records"""
+    from app.core.unified_sqlmodels import UnifiedAddress
+    from sqlalchemy import select
+    
     batch_success = 0
     batch_errors = 0
     
     with db_manager.get_session() as session:
         for i, record in enumerate(records):
             try:
                 # Use unified processor
                 transaction = unified_sql_processor.process_record(record, "oklahoma")
                 
                 if transaction:
                     # Save committee first if it exists
                     if transaction.committee:
                         session.merge(transaction.committee)
                         session.flush()
                     
                     # Save addresses for all persons with deduplication
                     for tx_person in transaction.persons:
                         if tx_person.person and tx_person.person.address:
                             # Check if address already exists in database
-                            from app.core.unified_sqlmodels import UnifiedAddress
-                            from sqlalchemy import select
                             existing_address = session.exec(
🤖 Prompt for AI Agents
In `@scripts/loaders/optimized_loader.py` around lines 126 - 127, The imports for
UnifiedAddress and select are inside the inner loop of process_batch causing
repeated import lookup; move "from app.core.unified_sqlmodels import
UnifiedAddress" and "from sqlalchemy import select" out of the inner loop to the
top of the module or top of process_batch, then remove the in-loop import
statements so the loop directly references UnifiedAddress and select.

Comment on lines +68 to 69
from app.core.unified_sqlmodels import UnifiedAddress
from sqlalchemy import select
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Move imports to module level.

The import path update is correct, but placing imports inside the loop body is a code smell. Move these to the top of the file alongside other imports for better readability and conventional structure.

♻️ Suggested refactor

At the top of the file:

 from app.core.unified_sqlmodels import UnifiedSQLModelBuilder
+from app.core.unified_sqlmodels import UnifiedAddress
 from app.states.postgres_config import create_postgres_database_manager
+from sqlalchemy import select, text

Then remove lines 68-69 and 96 from inside the function.

🤖 Prompt for AI Agents
In `@tests/test_session_address_dedup.py` around lines 68 - 69, Move the two
imports "from app.core.unified_sqlmodels import UnifiedAddress" and "from
sqlalchemy import select" out of the loop/function body and add them at the
module top with the other imports; then remove the in-function duplicates (the
imports currently inside the loop and the later duplicated import in that test
function) so the test uses the top-level UnifiedAddress and select references.
Ensure there are no circular-import issues when relocating them.

@sentry
Copy link

sentry bot commented Jan 28, 2026

✅ All tests passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants