Phase 2.3: Download Wait Logic ✅ Added exponential backoff to app/states/texas/texas_downloader.py:#8
Conversation
Added exponential backoff to app/states/texas/texas_downloader.py: Created exponential_backoff() helper function with jitter to prevent synchronized retries Added 1-hour timeout protection for downloads Replaced fixed time.sleep() calls with adaptive delays: Error recovery: 1s base, 30s max Download polling: 2s base, 60s max Added detailed logging for delay times and elapsed duration Phase 3: Architecture Refactor ✅ Moved unified modules to app/core/: Moved 6 unified_*.py files from app/states/ to app/core/ Created app/core/__init__.py with proper exports Updated imports in 38 Python files and 6 documentation files Fixed relative imports in remaining app/states/ files Verification All 5 unit tests pass Core module imports work correctly No remaining old import paths in the codebase New code passes linting (pre-existing lint issues in moved files are not related to my changes)
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
There was a problem hiding this comment.
Pull request overview
Adds more resilient download retry/polling behavior for the Texas TEC downloader and refactors the “unified_*” infrastructure into a new app/core/ module with updated imports across code, tests, scripts, and docs.
Changes:
- Added exponential backoff + jitter and a 1-hour timeout to Texas download polling/error recovery.
- Introduced
app/core/unified modules (models, field library, integration helpers) and exported them viaapp/core/__init__.py. - Updated imports throughout tests, scripts, and documentation from
app.states.unified_*toapp.core.unified_*.
Reviewed changes
Copilot reviewed 50 out of 53 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_unified_processor.py | Updates unified SQLModels import to app.core. |
| tests/test_unified_models.py | Updates unified (dataclass) models import to app.core. |
| tests/test_unified_loader.py | Updates unified SQLModels import to app.core. |
| tests/test_state_loader.py | Updates unified state loader imports (including docs snippet). |
| tests/test_sqlmodels.py | Updates unified SQLModels + db_manager imports to app.core. |
| tests/test_single_record.py | Updates unified state loader import to app.core. |
| tests/test_simple_transaction.py | Updates unified SQLModels imports to app.core (including inner import). |
| tests/test_simple_address_dedup.py | Updates unified SQLModels imports to app.core (including inner import). |
| tests/test_session_address_dedup.py | Updates unified SQLModels imports to app.core (including inner import). |
| tests/test_officer_financial_activities.py | Updates unified SQLModels + db_manager imports to app.core. |
| tests/test_limited_load.py | Updates unified state loader import to app.core. |
| tests/test_committee_persons.py | Updates unified SQLModels + db_manager imports to app.core (including inner import). |
| tests/test_committee_creation.py | Updates unified state loader import to app.core. |
| tests/test_automatic_linking.py | Updates unified SQLModels + db_manager imports to app.core. |
| tests/test_address_dedup_fix.py | Updates unified SQLModels imports to app.core. |
| tests/test_address_dedup.py | Updates unified SQLModels import to app.core. |
| tests/simple_address_test.py | Updates unified SQLModels imports to app.core (including inner import). |
| tests/conftest.py | Updates field library fixture import to app.core. |
| scripts/loaders/unified_postgres_loader.py | Updates unified SQLModels import to app.core. |
| scripts/loaders/simple_loader.py | Updates unified SQLModels imports to app.core (including inner import). |
| scripts/loaders/rebuild_and_load_texas.py | Updates unified SQLModels import block to app.core. |
| scripts/loaders/production_loader.py | Updates field library + unified SQLModels imports to app.core. |
| scripts/loaders/optimized_loader.py | Updates unified SQLModels imports to app.core (including inner import). |
| scripts/loaders/load_working_files.py | Updates unified state loader import to app.core. |
| scripts/loaders/load_single_file_test.py | Updates unified state loader import to app.core. |
| scripts/loaders/load_oklahoma_test.py | Updates unified state loader import to app.core. |
| scripts/loaders/load_data_batched.py | Updates unified state loader import to app.core. |
| scripts/loaders/load_all_data_simple.py | Updates unified state loader import to app.core. |
| scripts/loaders/load_actual_data_to_postgres.py | Updates unified SQLModels import to app.core. |
| scripts/loaders/high_performance_loader.py | Updates unified SQLModels imports to app.core (including inner import). |
| scripts/loaders/fixed_loader.py | Updates unified SQLModels imports to app.core (including inner import). |
| scripts/debug/transaction_save.py | Updates unified SQLModels import to app.core. |
| scripts/debug/committee.py | Updates unified SQLModels import to app.core. |
| scripts/debug/address_dedup.py | Updates unified SQLModels imports to app.core (including inner import). |
| scripts/analysis/analyze_fields.py | Updates field library import to app.core. |
| docs/TESTING.md | Updates field library import example to app.core. |
| docs/RUNBOOK.md | Updates field library import example to app.core. |
| docs/DATA_DICTIONARY.md | Updates field library import examples to app.core. |
| app/states/texas/texas_downloader.py | Adds exponential backoff + jitter, timeout, and improved logging during download wait loops. |
| app/states/postgres_state_loader.py | Updates loader base class import to app.core.unified_state_loader. |
| app/states/postgres_config.py | Updates DB manager import to app.core.unified_database. |
| app/states/field_analyzer.py | Updates unified field library imports to app.core. |
| app/ingest/file_reader.py | Updates unified field library import to app.core. |
| app/core/unified_state_loader.py | Updates internal SQLModels import to app.core. |
| app/core/unified_sqlmodels.py | Updates internal db_manager imports to app.core. |
| app/core/unified_models.py | Adds dataclass-based unified models + builder/processor. |
| app/core/unified_integration.py | Adds integration utilities/examples for unified models usage. |
| app/core/unified_field_library.py | Adds unified field definitions and state mapping registry. |
| app/core/unified_database.py | Updates PostgresConfig import source to app.states. |
| app/core/init.py | Adds central exports for core unified modules. |
| CONTRIBUTING.md | Updates field library import example to app.core. |
| CLAUDE.md | Updates unified SQLModels import example to app.core. |
| AGENTS.md | Updates db_manager / field_library / unified SQLModels import examples to app.core. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from icecream import ic | ||
|
|
||
| from .unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping | ||
| from app.core.unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping |
There was a problem hiding this comment.
Import of 'FieldCategory' is not used.
Import of 'FieldType' is not used.
Import of 'StateFieldMapping' is not used.
| from app.core.unified_field_library import UnifiedFieldLibrary, FieldCategory, FieldType, FieldDefinition, StateFieldMapping | |
| from app.core.unified_field_library import UnifiedFieldLibrary, FieldDefinition |
| import polars as pl | ||
|
|
||
| from app.states.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library | ||
| from app.core.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library |
There was a problem hiding this comment.
Import of 'FieldDefinition' is not used.
| from app.core.unified_field_library import FieldDefinition, FieldType, StateFieldMapping, field_library | |
| from app.core.unified_field_library import FieldType, StateFieldMapping, field_library |
| from app.core.unified_sqlmodels import ( | ||
| unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType, | ||
| CommitteeRole, UnifiedPerson, UnifiedCommittee, UnifiedTransactionPerson | ||
| ) |
There was a problem hiding this comment.
Import of 'unified_sql_processor' is not used.
| from app.core.unified_sqlmodels import ( | ||
| unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType, | ||
| CommitteeRole | ||
| ) |
There was a problem hiding this comment.
Import of 'UnifiedTransaction' is not used.
Import of 'TransactionType' is not used.
Import of 'unified_sql_processor' is not used.
Import of 'PersonRole' is not used.
| from app.core.unified_sqlmodels import ( | ||
| unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType, | ||
| CommitteeRole, UnifiedPerson, UnifiedCommittee, UnifiedTransactionPerson | ||
| ) |
There was a problem hiding this comment.
Import of 'unified_sql_processor' is not used.
| from app.core.unified_sqlmodels import ( | ||
| unified_sql_processor, UnifiedTransaction, PersonRole, TransactionType | ||
| ) |
There was a problem hiding this comment.
Import of 'UnifiedTransaction' is not used.
Import of 'PersonRole' is not used.
| from pathlib import Path | ||
| from icecream import ic | ||
| from app.states.unified_models import unified_processor, TransactionType, PersonType | ||
| from app.core.unified_models import unified_processor, TransactionType, PersonType |
There was a problem hiding this comment.
Import of 'TransactionType' is not used.
Import of 'PersonType' is not used.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
tests/test_simple_transaction.py (1)
25-25: Multiple calls to UnifiedSQLModelBuilder missing required state_id argument.The
UnifiedSQLModelBuilderconstructor signature requiresstate_idas a positional argument:__init__(self, state: str, state_id: Optional[int], state_code: Optional[str] = None). This call only passesstate, which will raiseTypeError: __init__() missing 1 required positional argument: 'state_id'.🐛 Affected instances
- tests/test_simple_transaction.py:25
- tests/test_session_address_dedup.py
- tests/test_address_dedup.py
- tests/simple_address_test.py
- scripts/debug/transaction_save.py
- scripts/debug/committee.py
All use:
UnifiedSQLModelBuilder("oklahoma")Pass
state_id=Noneas a keyword argument or provide the state_id value:- builder = UnifiedSQLModelBuilder("oklahoma") + builder = UnifiedSQLModelBuilder("oklahoma", state_id=None)scripts/loaders/high_performance_loader.py (1)
106-111:NameErrorat runtime:UnifiedAddressandselectare not in scope.
UnifiedAddressandselectare imported insideprocess_batch_optimized(lines 125-126), but this code block inhigh_performance_loaderexecutes before that function and references both symbols without importing them at module level.🐛 Proposed fix: Add missing imports at module level
from app.core.unified_sqlmodels import unified_sql_processor +from app.core.unified_sqlmodels import UnifiedAddress +from sqlalchemy import select from app.states.postgres_config import create_postgres_database_managerscripts/debug/transaction_save.py (1)
24-25: Passstate_idtoUnifiedSQLModelBuilderconstructor.The
UnifiedSQLModelBuilder.__init__signature requiresstate_id: Optional[int]as a mandatory second parameter. Calling it with only"oklahoma"will raise aTypeErrorat runtime. Provide a value forstate_id(e.g.,state_id=1) or modify the constructor to makestate_idoptional with a default value.scripts/loaders/load_oklahoma_test.py (1)
37-38: Remove settingmax_records_per_fileandmax_filesattributes; these do not exist onUnifiedStateLoader.The
UnifiedStateLoaderclass does not definemax_records_per_fileormax_filesas instance attributes. Theload_state_data()method accepts amax_recordsparameter instead. Pass it directly to the method:result = loader.load_state_data( max_records=1000, auto_link_officers=True, create_relationships=True, progress_callback=... )Note: If you need separate per-file and total-file limits, the current API only supports a single
max_recordslimit per file. Consider whethermax_records=1000is sufficient, or if the API needs to be extended.CLAUDE.md (2)
78-85: Important Files table references stale file paths.The import example on line 69 correctly uses
app.core.unified_sqlmodels, but the Important Files table still references the old locations (app/states/unified_sqlmodels.pyandapp/states/unified_field_library.py). According to the PR objectives, these files were moved toapp/core/.📝 Suggested fix
| File | Purpose | |------|---------| -| `app/states/unified_sqlmodels.py` | Unified SQLModel implementations | -| `app/states/unified_field_library.py` | Cross-state field mapping | +| `app/core/unified_sqlmodels.py` | Unified SQLModel implementations | +| `app/core/unified_field_library.py` | Cross-state field mapping | | `app/ingest/file_reader.py` | Schema-driven CSV/file parsing | | `app/abcs/abc_category.py` | Core data processing ABC | | `scripts/loaders/production_loader.py` | Production data loader |
34-48: Architecture overview may need updating.The Architecture Overview section still shows
unified_*.pyfiles underapp/states/. If the unified modules have been moved toapp/core/as indicated by the PR objectives, consider updating this section to reflect the new structure by adding anapp/core/entry.scripts/debug/committee.py (1)
22-27: Constructor call will fail due to missing requiredstate_idargument.The
UnifiedSQLModelBuilderconstructor requires bothstateandstate_idas positional arguments:__init__(state: str, state_id: Optional[int], state_code: Optional[str] = None). The call on line 23 passes only the state name, which will raiseTypeError: __init__() missing 1 required positional argument: 'state_id'at runtime. Pass the state ID as the second argument to fix this.tests/test_session_address_dedup.py (1)
50-51: Missing requiredstate_idargument.
UnifiedSQLModelBuilder.__init__requiresstate_idas the second positional argument (signature:state: str, state_id: Optional[int], state_code: Optional[str] = None). This call will raiseTypeErrorat runtime.🐛 Proposed fix
# Create builder - builder = UnifiedSQLModelBuilder("oklahoma") + builder = UnifiedSQLModelBuilder("oklahoma", state_id=None)Or if a specific state ID is needed for Oklahoma:
# Create builder - builder = UnifiedSQLModelBuilder("oklahoma") + builder = UnifiedSQLModelBuilder("oklahoma", state_id=40, state_code="OK")
🤖 Fix all issues with AI agents
In `@app/core/__init__.py`:
- Around line 60-100: The __all__ list is not alphabetically sorted (Ruff
RUF022); reorder the entries alphabetically to satisfy static analysis while
preserving the existing logical groupings/comments (e.g., Field Library, Base
Models, SQLModels, Database, Loaders). Update the items such as "field_library",
"FieldCategory", "FieldDefinition", "FieldType", "StateFieldMapping",
"UnifiedFieldLibrary", and the other group members (e.g., "unified_processor",
"BaseUnifiedTransaction", "UnifiedTransaction", "UnifiedPerson", "db_manager",
"UnifiedStateLoader", etc.) so each group is alphabetized; keep the group
headers and formatting intact and ensure no duplicates are introduced.
- Around line 17-25: Add a short clarifying comment above the grouped imports in
app/core/__init__.py explaining that the aliases like UnifiedTransaction as
BaseUnifiedTransaction, TransactionType as BaseTransactionType, PersonType as
BasePersonType, UnifiedPerson as BaseUnifiedPerson, UnifiedAddress as
BaseUnifiedAddress refer to the original dataclass-based/base models (used for
core logic/serialization), while separately defined SQLModel/ORM-backed types
named UnifiedTransaction, UnifiedPerson, etc. exist for DB mapping; mention
unified_processor to indicate which module provides processing logic and note
that the `Base*` prefix denotes the non-ORM/base variants to avoid confusion for
consumers.
In `@app/core/unified_state_loader.py`:
- Around line 348-349: Remove the redundant inline import of UnifiedAddress
inside unified_state_loader.py (the local inline "from
app.core.unified_sqlmodels import UnifiedAddress") since UnifiedAddress is
already imported at module level; keep the sqlalchemy select import if needed
and only remove the duplicated UnifiedAddress import to avoid redundancy.
In `@app/states/texas/texas_downloader.py`:
- Around line 77-78: The initial polling logic that does
time.sleep(exponential_backoff(...)) then breaks if no .crdownload exists should
be changed to actively wait (with exponential_backoff) until either a new zip
file appears or a .crdownload (in-progress) file is observed, while still
enforcing the existing timeout; update the code around exponential_backoff(),
the loop that checks download_dir / target zip (e.g., any function or block
referencing download_dir, target_zip_name or find_latest_zip()), and the
start_time/timeout_seconds checks so the loop only exits on timeout or when the
zip is actually ready, and apply the same fix to the analogous polling block
referenced later (lines 93-118).
In `@scripts/loaders/high_performance_loader.py`:
- Around line 125-126: Remove the redundant in-function imports that duplicate
module-level imports: delete the local imports of UnifiedAddress and
UnifiedCommittee from app.core.unified_sqlmodels and the select import from
sqlalchemy (the ones inside the function) and rely on the existing top-level
imports instead; ensure no other references rely on those local import lines and
run tests/style checks to confirm cleanup.
In `@scripts/loaders/optimized_loader.py`:
- Around line 126-127: The imports for UnifiedAddress and select are inside the
inner loop of process_batch causing repeated import lookup; move "from
app.core.unified_sqlmodels import UnifiedAddress" and "from sqlalchemy import
select" out of the inner loop to the top of the module or top of process_batch,
then remove the in-loop import statements so the loop directly references
UnifiedAddress and select.
In `@tests/test_session_address_dedup.py`:
- Around line 68-69: Move the two imports "from app.core.unified_sqlmodels
import UnifiedAddress" and "from sqlalchemy import select" out of the
loop/function body and add them at the module top with the other imports; then
remove the in-function duplicates (the imports currently inside the loop and the
later duplicated import in that test function) so the test uses the top-level
UnifiedAddress and select references. Ensure there are no circular-import issues
when relocating them.
| from .unified_models import ( | ||
| unified_processor, | ||
| UnifiedTransaction as BaseUnifiedTransaction, | ||
| TransactionType as BaseTransactionType, | ||
| PersonType as BasePersonType, | ||
| UnifiedPerson as BaseUnifiedPerson, | ||
| UnifiedAddress as BaseUnifiedAddress, | ||
| UnifiedDataProcessor, | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider documenting the distinction between Base* and SQLModel types.
The aliasing pattern (UnifiedTransaction as BaseUnifiedTransaction, etc.) distinguishes dataclass-based models from SQLModel-based models. This is a reasonable approach, but consumers may find it confusing that BaseUnifiedTransaction and UnifiedTransaction are different classes. Consider adding a brief comment explaining this distinction.
📝 Proposed documentation
+# Base models are dataclass-based (from unified_models), used for in-memory processing
+# SQLModel-based equivalents (from unified_sqlmodels) are for database persistence
from .unified_models import (
unified_processor,
UnifiedTransaction as BaseUnifiedTransaction,
TransactionType as BaseTransactionType,
PersonType as BasePersonType,
UnifiedPerson as BaseUnifiedPerson,
UnifiedAddress as BaseUnifiedAddress,
UnifiedDataProcessor,
)🤖 Prompt for AI Agents
In `@app/core/__init__.py` around lines 17 - 25, Add a short clarifying comment
above the grouped imports in app/core/__init__.py explaining that the aliases
like UnifiedTransaction as BaseUnifiedTransaction, TransactionType as
BaseTransactionType, PersonType as BasePersonType, UnifiedPerson as
BaseUnifiedPerson, UnifiedAddress as BaseUnifiedAddress refer to the original
dataclass-based/base models (used for core logic/serialization), while
separately defined SQLModel/ORM-backed types named UnifiedTransaction,
UnifiedPerson, etc. exist for DB mapping; mention unified_processor to indicate
which module provides processing logic and note that the `Base*` prefix denotes
the non-ORM/base variants to avoid confusion for consumers.
| __all__ = [ | ||
| # Field Library | ||
| "field_library", | ||
| "FieldCategory", | ||
| "FieldType", | ||
| "FieldDefinition", | ||
| "StateFieldMapping", | ||
| "UnifiedFieldLibrary", | ||
| # Base Models (dataclass-based) | ||
| "unified_processor", | ||
| "BaseUnifiedTransaction", | ||
| "BaseTransactionType", | ||
| "BasePersonType", | ||
| "BaseUnifiedPerson", | ||
| "BaseUnifiedAddress", | ||
| "UnifiedDataProcessor", | ||
| # SQLModels | ||
| "unified_sql_processor", | ||
| "UnifiedTransaction", | ||
| "UnifiedPerson", | ||
| "UnifiedAddress", | ||
| "UnifiedCommittee", | ||
| "UnifiedTransactionPerson", | ||
| "UnifiedCommitteePerson", | ||
| "UnifiedTransactionVersion", | ||
| "UnifiedPersonVersion", | ||
| "UnifiedAddressVersion", | ||
| "UnifiedCommitteeVersion", | ||
| "UnifiedCommitteePersonVersion", | ||
| "TransactionType", | ||
| "PersonType", | ||
| "PersonRole", | ||
| "CommitteeRole", | ||
| "State", | ||
| # Database | ||
| "db_manager", | ||
| "UnifiedDatabaseManager", | ||
| # Loaders | ||
| "UnifiedStateLoader", | ||
| "UnifiedStateProcessor", | ||
| ] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider sorting __all__ for consistency.
Static analysis (Ruff RUF022) flags that __all__ is not sorted. While the current grouping by category is logical and readable, sorting alphabetically within each category or applying isort-style sorting can improve maintainability and reduce merge conflicts.
🧰 Tools
🪛 Ruff (0.14.14)
60-100: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
In `@app/core/__init__.py` around lines 60 - 100, The __all__ list is not
alphabetically sorted (Ruff RUF022); reorder the entries alphabetically to
satisfy static analysis while preserving the existing logical groupings/comments
(e.g., Field Library, Base Models, SQLModels, Database, Loaders). Update the
items such as "field_library", "FieldCategory", "FieldDefinition", "FieldType",
"StateFieldMapping", "UnifiedFieldLibrary", and the other group members (e.g.,
"unified_processor", "BaseUnifiedTransaction", "UnifiedTransaction",
"UnifiedPerson", "db_manager", "UnifiedStateLoader", etc.) so each group is
alphabetized; keep the group headers and formatting intact and ensure no
duplicates are introduced.
| from app.core.unified_sqlmodels import UnifiedAddress | ||
| from sqlalchemy import select |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Redundant import: UnifiedAddress is already imported at module level.
UnifiedAddress is already imported at the top of the file (line 17) from .unified_sqlmodels. This inline import is unnecessary and can be removed.
♻️ Suggested fix
# Check if address already exists in session
- from app.core.unified_sqlmodels import UnifiedAddress
from sqlalchemy import select📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from app.core.unified_sqlmodels import UnifiedAddress | |
| from sqlalchemy import select | |
| from sqlalchemy import select |
🤖 Prompt for AI Agents
In `@app/core/unified_state_loader.py` around lines 348 - 349, Remove the
redundant inline import of UnifiedAddress inside unified_state_loader.py (the
local inline "from app.core.unified_sqlmodels import UnifiedAddress") since
UnifiedAddress is already imported at module level; keep the sqlalchemy select
import if needed and only remove the duplicated UnifiedAddress import to avoid
redundancy.
| # Initial wait for download to start | ||
| time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0)) |
There was a problem hiding this comment.
Avoid exiting the polling loop before the download actually starts.
If no .crdownload exists after the initial sleep, the loop breaks immediately, bypassing the timeout and potentially using a stale zip or raising FileNotFoundError under slow starts. Consider waiting until either a new zip appears or a .crdownload is observed, with backoff and the existing timeout.
🛠️ Proposed fix (wait for download to start or finish)
- # Initial wait for download to start
- time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0))
+ download_start_time = time.time()
+ # Initial wait for download to start
+ time.sleep(exponential_backoff(0, base=2.0, max_delay=10.0))
@@
- poll_attempt = 0
- download_start_time = time.time()
+ poll_attempt = 0
while True:
# Check for timeout
elapsed = time.time() - download_start_time
if elapsed > DOWNLOAD_TIMEOUT_SECONDS:
raise TimeoutError(f"Download timed out after {DOWNLOAD_TIMEOUT_SECONDS // 60} minutes")
dl_files = list(tmp.glob("*.crdownload"))
+ zip_files = list(tmp.glob("*.zip"))
+ latest_zip = max(zip_files, key=lambda f: f.stat().st_mtime) if zip_files else None
if dl_files:
if not in_progress:
progress.update_task(_task2, "Still in progress")
in_progress = True
# Use exponential backoff for polling (2s to 60s)
poll_delay = exponential_backoff(poll_attempt, base=2.0, max_delay=60.0)
poll_attempt = min(poll_attempt + 1, 10) # Cap attempts to limit max delay
logger.debug("Download in progress, waiting", extra={"delay_seconds": round(poll_delay, 2), "elapsed_minutes": round(elapsed / 60, 1)})
time.sleep(poll_delay)
- else:
- if in_progress:
- progress.update_task(_task2, "Download Complete")
- logger.info("Download completed", extra={"elapsed_minutes": round(elapsed / 60, 1)})
- _safe_to_delete = True
- break
+ elif latest_zip and latest_zip.stat().st_mtime >= download_start_time:
+ progress.update_task(_task2, "Download Complete")
+ logger.info("Download completed", extra={"elapsed_minutes": round(elapsed / 60, 1)})
+ _safe_to_delete = True
+ break
+ else:
+ poll_delay = exponential_backoff(poll_attempt, base=2.0, max_delay=10.0)
+ poll_attempt = min(poll_attempt + 1, 10)
+ logger.debug(
+ "Waiting for download to start",
+ extra={"delay_seconds": round(poll_delay, 2), "elapsed_minutes": round(elapsed / 60, 1)},
+ )
+ time.sleep(poll_delay)Also applies to: 93-118
🤖 Prompt for AI Agents
In `@app/states/texas/texas_downloader.py` around lines 77 - 78, The initial
polling logic that does time.sleep(exponential_backoff(...)) then breaks if no
.crdownload exists should be changed to actively wait (with exponential_backoff)
until either a new zip file appears or a .crdownload (in-progress) file is
observed, while still enforcing the existing timeout; update the code around
exponential_backoff(), the loop that checks download_dir / target zip (e.g., any
function or block referencing download_dir, target_zip_name or
find_latest_zip()), and the start_time/timeout_seconds checks so the loop only
exits on timeout or when the zip is actually ready, and apply the same fix to
the analogous polling block referenced later (lines 93-118).
| from app.core.unified_sqlmodels import UnifiedAddress, UnifiedCommittee | ||
| from sqlalchemy import select |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Import path update is correct, but imports are duplicated.
The import path change to app.core.unified_sqlmodels is consistent with the refactor. However, once the module-level imports are added (as suggested above), these in-function imports become redundant and can be removed for cleaner code.
🤖 Prompt for AI Agents
In `@scripts/loaders/high_performance_loader.py` around lines 125 - 126, Remove
the redundant in-function imports that duplicate module-level imports: delete
the local imports of UnifiedAddress and UnifiedCommittee from
app.core.unified_sqlmodels and the select import from sqlalchemy (the ones
inside the function) and rely on the existing top-level imports instead; ensure
no other references rely on those local import lines and run tests/style checks
to confirm cleanup.
| from app.core.unified_sqlmodels import UnifiedAddress | ||
| from sqlalchemy import select |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Move imports outside the loop to avoid repeated import overhead.
The imports for UnifiedAddress and select are inside the inner loop of process_batch. While Python caches imports, placing them at the function or module level improves clarity and avoids unnecessary lookup overhead on each iteration.
♻️ Proposed fix
Move imports to the top of the function:
def process_batch(records, db_manager, progress, record_task):
"""Process a batch of records"""
+ from app.core.unified_sqlmodels import UnifiedAddress
+ from sqlalchemy import select
+
batch_success = 0
batch_errors = 0
with db_manager.get_session() as session:
for i, record in enumerate(records):
try:
# Use unified processor
transaction = unified_sql_processor.process_record(record, "oklahoma")
if transaction:
# Save committee first if it exists
if transaction.committee:
session.merge(transaction.committee)
session.flush()
# Save addresses for all persons with deduplication
for tx_person in transaction.persons:
if tx_person.person and tx_person.person.address:
# Check if address already exists in database
- from app.core.unified_sqlmodels import UnifiedAddress
- from sqlalchemy import select
existing_address = session.exec(🤖 Prompt for AI Agents
In `@scripts/loaders/optimized_loader.py` around lines 126 - 127, The imports for
UnifiedAddress and select are inside the inner loop of process_batch causing
repeated import lookup; move "from app.core.unified_sqlmodels import
UnifiedAddress" and "from sqlalchemy import select" out of the inner loop to the
top of the module or top of process_batch, then remove the in-loop import
statements so the loop directly references UnifiedAddress and select.
| from app.core.unified_sqlmodels import UnifiedAddress | ||
| from sqlalchemy import select |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Move imports to module level.
The import path update is correct, but placing imports inside the loop body is a code smell. Move these to the top of the file alongside other imports for better readability and conventional structure.
♻️ Suggested refactor
At the top of the file:
from app.core.unified_sqlmodels import UnifiedSQLModelBuilder
+from app.core.unified_sqlmodels import UnifiedAddress
from app.states.postgres_config import create_postgres_database_manager
+from sqlalchemy import select, textThen remove lines 68-69 and 96 from inside the function.
🤖 Prompt for AI Agents
In `@tests/test_session_address_dedup.py` around lines 68 - 69, Move the two
imports "from app.core.unified_sqlmodels import UnifiedAddress" and "from
sqlalchemy import select" out of the loop/function body and add them at the
module top with the other imports; then remove the in-function duplicates (the
imports currently inside the loop and the later duplicated import in that test
function) so the test uses the top-level UnifiedAddress and select references.
Ensure there are no circular-import issues when relocating them.
|
✅ All tests passed. |

Exponential Backoff for Texas Downloader and Core Module Refactoring
Added exponential backoff to app/states/texas/texas_downloader.py:
Phase 3: Architecture Refactor ✅
Moved unified modules to app/core/:
Verification
Summary by CodeRabbit
New Features
Improvements
Chores
✏️ Tip: You can customize this high-level summary in your review settings.