
Re-branching after screwing up last PR. #4

Open
jreakin wants to merge 26 commits into main from jan25-polars

Conversation


@jreakin jreakin commented Jan 18, 2025

This is just the start of a new PR after I messed up the last one.

Summary by CodeRabbit

  • New Features

    • Parquet read/write support, Ohio downloader, Texas search workflow, terminal live progress tracker, unified PostgreSQL-backed data manager, and a production batch loader.
  • Documentation

    • README expanded with package overview, examples, and built-in data download notes.
  • Refactor

    • Dataflow moved to dataframe-first processing; Texas schemas, validators, and models reorganized for modularity.
  • Chores

    • VCS ignore/LFS rules, IDE and CI configs, and dependency metadata updated.
  • Tests

    • Added hypothesis-driven ingestion and consolidation tests.


Copilot AI review requested due to automatic review settings November 9, 2025 02:17

Copilot AI left a comment


Pull Request Overview

This PR adds a unified data model system for campaign finance data across multiple states, incorporating PostgreSQL support with SQLModel/SQLAlchemy ORM. The changes include dependency updates, new infrastructure for data loading and processing, and significant refactoring of Texas state validators.

Key changes:

  • Introduced unified SQLModel-based data models for cross-state campaign finance data normalization
  • Added production-ready data loader with batch processing and deduplication capabilities
  • Updated dependencies including pyarrow, selenium, icecream, and added web-scraping utilities
  • Refactored Texas validators to use base models and reduce code duplication
  • Added database management utilities for PostgreSQL operations
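
The cross-state normalization idea described above can be sketched with plain dataclasses. This is a simplified, hypothetical stand-in for the PR's SQLModel tables; the field names and mappings here are illustrative, not the PR's actual schema:

```python
from dataclasses import dataclass

@dataclass
class UnifiedTransaction:
    """Simplified stand-in for the SQLModel-backed unified table."""
    state: str
    amount: float
    transaction_date: str

# Hypothetical per-state field mappings into the unified shape.
STATE_FIELD_MAP = {
    "TX": {"amount": "expendAmount", "transaction_date": "expendDt"},
    "OH": {"amount": "AMOUNT", "transaction_date": "TRANS_DATE"},
}

def to_unified(state: str, record: dict) -> UnifiedTransaction:
    """Map a raw state-specific record onto the unified model."""
    mapping = STATE_FIELD_MAP[state]
    return UnifiedTransaction(
        state=state,
        amount=float(record[mapping["amount"]]),
        transaction_date=record[mapping["transaction_date"]],
    )
```

The real implementation adds an ORM layer, versioning, and associations on top of this mapping step.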

Reviewed Changes

Copilot reviewed 51 out of 64 changed files in this pull request and generated 35 comments.

Summary per file
File Description
uv.lock Updated package versions and added new dependencies (pyarrow, web-scrape-utils, websockets, undetected-chromedriver)
pyproject.toml Added pyarrow and web-scrape-utils as dependencies
app/states/unified_sqlmodels.py New comprehensive unified data model definitions with SQLModel/SQLAlchemy
app/states/unified_database.py New database manager for PostgreSQL operations
production_loader.py New production data loader with batch processing
recreate_tables.py Utility script for recreating database tables
app/states/texas/validators/*.py Refactored validators to use base models and reduce duplication
app/states/texas/texas_downloader.py Updated to use web-scraping utilities and polars
app/states/texas/texas_search.py Refactored search functionality with polars DataFrames
tmp/texas/*.parquet Binary data files (21 parquet files added)
tmp/.DS_Store macOS system file added
Files not reviewed (4)
  • .idea/campaignfinance.iml: Language not supported
  • .idea/csv-editor.xml: Language not supported
  • .idea/misc.xml: Language not supported
  • .idea/ruff.xml: Language not supported


Comment on lines +60 to +65
# class TECFilerCTALinkModel(TECBaseModel, table=True):
#     __tablename__ = "tx_filer_cta_link"
#     __table_args__ = {"schema": "texas"}
#     filerIdent: int = Field(foreign_key="texas.tx_filer_name.filerIdent", primary_key=True)
#     ctaSeekOfficeCd: Optional[str] = Field(foreign_key="texas.tx_cta.id", primary_key=True)


Copilot AI Nov 9, 2025


This comment appears to contain commented-out code.

Suggested change: delete these commented-out lines.

Comment on lines +89 to +96
# class TECFilerTreasurerLinkModel(TECBaseModel, table=True):
#     __tablename__ = "tx_filer_treasurer_link"
#     __table_args__ = {"schema": "texas"}
#     filer_name_id: int = Field(foreign_key="texas.tx_filer_name.filerIdent", primary_key=True)
#     treasurer_id: Optional[str] = Field(foreign_key="texas.tx_treasurer.id", primary_key=True)
#     assttreas_id: Optional[str] = Field(foreign_key="texas.tx_assistant_treasurer.id", primary_key=True)
#     chair_id: Optional[str] = Field(foreign_key="texas.tx_chair.id", primary_key=True)


Copilot AI Nov 9, 2025


This comment appears to contain commented-out code.

Suggested change: delete these commented-out lines.

Comment on lines +65 to +70
# @model_validator(mode='before')
# @classmethod
# def store_raw(cls, values):
#     values['raw_input'] = {k: v for k, v in values.items()}
#     return values


Copilot AI Nov 9, 2025


This comment appears to contain commented-out code.

Suggested change: delete these commented-out lines.

Comment on lines +53 to +57
# def __init__(self, **data):
#     _flags = TECFlags(**data)
#     if len(_flags.model_dump(exclude_none=True).values()) > 0:
#         self.flags = _flags
#     super().__init__(**data)

Copilot AI Nov 9, 2025


This comment appears to contain commented-out code.

unlinked_transactions = session.exec(
    select(UnifiedTransactionPerson).where(
        UnifiedTransactionPerson.person_id == cp.person_id,
        UnifiedTransactionPerson.committee_person_id == None

Copilot AI Nov 9, 2025


Testing for None should use the 'is' operator.

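
For SQLAlchemy/SQLModel column expressions, the idiomatic explicit form is `.is_(None)` rather than `== None` (the latter does compile to `IS NULL`, but trips E711-style lints like the one above). A minimal sketch with a stand-in table, assuming SQLAlchemy 1.4+:

```python
from sqlalchemy import Column, Integer, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class TransactionPerson(Base):  # minimal stand-in for UnifiedTransactionPerson
    __tablename__ = "transaction_person"
    id = Column(Integer, primary_key=True)
    committee_person_id = Column(Integer, nullable=True)

# Explicit NULL test: compiles to "... WHERE committee_person_id IS NULL"
stmt = select(TransactionPerson).where(
    TransactionPerson.committee_person_id.is_(None)
)
```

Note that a plain Python `is None` would not work here: it would compare the column object itself, not build a SQL expression.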
if original_address is not ensured_address and original_address is not None:
    try:
        session.expunge(original_address)
    except InvalidRequestError:

Copilot AI Nov 9, 2025


'except' clause does nothing but pass and there is no explanatory comment.

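
One way to address this class of finding is to keep the intentional ignore but make it observable. A hedged sketch (the exception class is a stand-in for `sqlalchemy.exc.InvalidRequestError`; `safe_expunge` is an illustrative helper, not the PR's API):

```python
import logging

logger = logging.getLogger(__name__)

class NotAttachedError(Exception):
    """Stand-in for sqlalchemy.exc.InvalidRequestError in this sketch."""

def safe_expunge(session, obj) -> None:
    """Detach obj from the session, tolerating objects that were never attached."""
    try:
        session.expunge(obj)
    except NotAttachedError as exc:
        # The object is not in this session (already expunged or never added).
        # Ignoring is intentional, but logging at debug level keeps the
        # silent path observable and documents why the pass is safe.
        logger.debug("expunge skipped: %s", exc)
```
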
if original_address is not ensured_address and original_address is not None:
    try:
        session.expunge(original_address)
    except InvalidRequestError:

Copilot AI Nov 9, 2025


'except' clause does nothing but pass and there is no explanatory comment.

if original_person is not ensured_person:
    try:
        session.expunge(original_person)
    except InvalidRequestError:

Copilot AI Nov 9, 2025


'except' clause does nothing but pass and there is no explanatory comment.

result = self.choose_options(result)
if not result.selected_choices:
    return result
result.selected_choices = result.selected_choices

Copilot AI Nov 9, 2025


This assignment assigns a variable to itself.

Suggested change: delete this self-assignment.

}
if type_str in type_synonyms:
    return type_synonyms[type_str]
for transaction_type in TransactionType:

Copilot AI Nov 9, 2025


This for-loop may attempt to iterate over a non-iterable instance of class type.

- Modified app/funcs/__init__.py and csv_reader.py
- Updated Texas state modules (downloader, search, validators)
- Updated unified_database.py
- Updated test_validation_class.py
- Updated pyproject.toml and uv.lock
- Removed parquet files from git tracking (now in .gitignore)

coderabbitai bot commented Dec 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Parquet/TXT readers and Polars-based data flows, a Rich-based ProgressTracker, Texas downloader/search/consolidation refactors, an Ohio downloader scaffold, a unified SQLModel schema and DB manager, a production batch ingestion pipeline with deduplication, extensive validator/model restructures, and project/IDE/dependency updates.

Changes

Cohort / File(s) Summary
Repo config & IDE
.gitattributes, .gitignore, pyproject.toml, .idea/*
Replace LFS patterns (add *.parquet, remove *.zip/*.csv), reintroduce *.parquet and /tmp in .gitignore, add runtime/dev deps (psycopg2-binary, web-scrape-utils, pyarrow, hypothesis, pytest), update IDE SDK/exclusions and add Ruff config.
Progress UI
app/live_display.py
New ProgressTracker class: thread-safe Rich Live terminal task table with start/stop, add/update/remove tasks, background refresh and print API.
ABCs & download helpers
app/abcs/*
app/abcs/__init__.py, app/abcs/abc_download.py, app/abcs/abc_category.py, app/abcs/abc_db_loader.py, app/abcs/abc_validation.py, app/abcs/abc_validation_errors.py
Migrate imports to app.*, expose module-level progress (started at import), convert FileDownloaderABC to use ClassVar driver and classmethod interfaces, add ZIP extraction and CSV→Parquet helpers, and add headless control helper.
File readers & validators
app/funcs/*
app/funcs/__init__.py, app/funcs/csv_reader.py, app/funcs/validator_functions.py, app/funcs/db_loader.py
Switch to package-scoped imports; add Parquet and TXT readers (lazy + fallback), per-file try/except handling, metadata wrappers (file_origin/download_date), and new check_contains_factory validator factory.
Texas downloader, consolidation & search
app/states/texas/*
app/states/texas/texas_downloader.py, app/states/texas/texas_search.py, app/states/texas/__init__.py
Refactor TECDownloader to classmethod/shared-driver flow with headless control, safe cleanup, ZIP extraction and CSV→Parquet conversion, per-category consolidation (align columns, stack, write consolidated parquet), and add Polars-based TexasSearch pipeline and types; update imports and validator references.
Texas models & validators (large restructure)
app/states/texas/models.py, app/states/texas/validators/*
Add TECBaseModel and check_contains_factory; add flags models; refactor person/address/filer/expense models into base vs table variants, add link models and FK updates, consolidate payee handling, and change exported validator symbols.
Ohio downloader
app/states/ohio/ohio_downloader.py
New OhioDownloader with entry URL and initial driver-based FTP discovery/download scaffold; placeholders for consolidate/read.
Unified schema & DB manager
app/states/unified_sqlmodels.py, app/states/unified_database.py, recreate_tables.py
New unified SQLModel layer (many Unified* entities, versions, associations), builder/processor for mapping state input to unified models, and UnifiedDatabaseManager with engine/session management, ingest/query APIs and recreate utility.
Production ingestion
production_loader.py
New ProductionLoader with LoaderConfig/LoaderStats: batch processing, in-memory caches, deduplication (_dedupe_addresses, _dedupe_persons_and_entities), upserts, DB interactions, progress UI and summary display.
Main, docs & logs
app/main.py, README.md, app/logs/*
app/main.py converted to a Polars/pandas dataframe-centric processing path; README updated; new log file added.
Tests & housekeeping
app/tests/*, removed app/workflows/texas_download.py
Add Hypothesis-driven ingestion/consolidation tests and update validation tests for injected validator style; remove a commented example workflow file.
Field library & unified helpers
app/states/unified_field_library.py
New UnifiedFieldLibrary with FieldDefinition/StateFieldMapping, category/type enums, baked-in state mappings, and JSON import/export utilities.
Misc import path updates
app/states/oklahoma/*, many state validators and modules
Numerous relative/local imports converted to absolute app.* package imports across state modules and validators; behavior unchanged beyond import resolution.
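
The key-based deduplication described for production_loader.py can be sketched as an in-memory cache keyed on normalized fields. This is a hypothetical illustration (field names and normalization rules are assumptions, not the loader's actual `_dedupe_addresses` logic):

```python
def dedupe_addresses(records: list[dict]) -> list[dict]:
    """Collapse records that normalize to the same (street, zip5) key."""
    seen: dict[tuple, dict] = {}
    for rec in records:
        # Normalize: trim and upper-case the street, keep the 5-digit ZIP.
        key = (rec["street"].strip().upper(), rec["zip"][:5])
        seen.setdefault(key, rec)  # first occurrence wins
    return list(seen.values())
```

The real loader additionally rewires foreign-key references from the dropped duplicates to the surviving row, which is where most of the review risk lies.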

Sequence Diagram(s)

sequenceDiagram
    actor CLI as Client/CLI
    participant Loader as ProductionLoader
    participant Reader as GenericFileReader
    participant Processor as UnifiedSQLDataProcessor
    participant DB as UnifiedDatabaseManager
    participant Session as DB Session

    CLI->>Loader: load_file(file_path, state)
    Loader->>DB: load_existing_data(session)
    Loader->>Reader: _get_reader(state)
    Reader-->>Loader: reader

    loop per batch
        Loader->>Reader: read batch
        Reader-->>Loader: records
        Loader->>Processor: process_records(records, state)
        Processor-->>Loader: UnifiedTransaction objects
        Loader->>Loader: dedupe/cache & manage associations
        Loader->>DB: save_transactions(transactions)
        DB->>Session: persist & commit
        Session-->>DB: commit result
        Loader->>Loader: update progress
    end

    Loader->>Loader: _dedupe_persons_and_entities(session)
    Loader->>CLI: display_results()
sequenceDiagram
    actor User as Client
    participant TEC as TECDownloader
    participant Driver as CreateWebDriver
    participant Progress as ProgressTracker
    participant Extract as Zip/Parquet Processor
    participant FS as Filesystem

    User->>TEC: download(read_from_temp, headless)
    TEC->>Progress: add_task("cleanup")
    TEC->>FS: safe temp cleanup
    Progress->>Progress: update_task("cleanup","completed")

    TEC->>Driver: CreateWebDriver(...)
    Driver-->>TEC: driver instance
    TEC->>Progress: add_task("download")
    TEC->>Driver: navigate & click to download
    Driver-->>FS: save zip
    TEC->>Progress: update_task("download","completed")

    TEC->>Extract: extract_zipfile(zip, tmp)
    Extract->>Extract: _process_csv -> convert CSV→Parquet
    Extract-->>FS: write parquet files
    TEC->>TEC: consolidate_files() (align columns & stack)
    TEC->>Progress: stop()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Areas needing focused review:
    • app/states/unified_sqlmodels.py and app/states/unified_database.py — FK correctness, indexes, versioning, builder/processor mapping and JSON/raw storage.
    • production_loader.py — batch atomicity, deduplication correctness, reference rewiring, caching and error handling.
    • app/states/texas/texas_downloader.py and app/funcs/csv_reader.py — Parquet consolidation/read/write, column alignment, lazy-read fallbacks, and metadata handling.
    • app/states/texas/validators and models — model/table separation, Before/field validators, FK string targets, and link models.
    • app/abcs/abc_download.py — module-level side effects (ProgressTracker.start()) and shared driver lifecycle as ClassVar.
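
The "align columns & stack" consolidation step flagged above can be sketched in plain Python; Polars offers comparable behavior via `pl.concat(..., how="diagonal")`. The helper below is an illustrative sketch over list-of-dict rows, not the PR's code:

```python
def align_and_stack(tables: list[list[dict]]) -> list[dict]:
    """Pad each row out to the union of all column names, then stack the rows."""
    columns = sorted({col for rows in tables for row in rows for col in row})
    return [
        {col: row.get(col) for col in columns}  # missing columns become None
        for rows in tables
        for row in rows
    ]
```

Reviewing the Parquet consolidation mostly means checking that this alignment preserves dtypes and that null-filling missing columns is acceptable downstream.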

Poem

🐇
I hopped through schemas, columns neat and bright,
Parquets stacked softly in the cool night light.
Texas names and Ohio's FTP file,
Loaders and trackers marched on for a while.
A rabbit applauds — pipelines humming with style.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.91% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title is vague and self-referential, using informal language ('screwing up') that doesn't describe the actual technical changes in the changeset. Replace with a specific, professional title that describes the primary changes, such as 'Migrate to Polars-based data processing and unified ORM models' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95b4304 and eaedb1d.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml (1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
.github/workflows/ci.yml

[warning] 9-40: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}
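
A minimal permissions block resolving this warning might look like the following, placed at the top level of the workflow file (widen the scopes only if the jobs actually need them):

```yaml
# .github/workflows/ci.yml
permissions:
  contents: read
```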

🪛 YAMLlint (1.37.1)
.github/workflows/ci.yml

[warning] 3-3: truthy value should be one of [false, true]

(truthy)


[error] 41-41: too many blank lines (1 > 0)

(empty-lines)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test
  • GitHub Check: test

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 81

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
.idea/misc.xml (1)

3-9: Inconsistent SDK references between components.

The Black component (line 7) still references the old "Poetry (campaignfinance) (2)" SDK, while the ProjectRootManager (line 9) has been updated to "Python 3.12 UV (campaignfinance)". This inconsistency may cause Black to fail or use the wrong Python environment.

Apply this diff to align the Black SDK reference:

   <component name="Black">
     <option name="enabledOnReformat" value="true" />
     <option name="executionMode" value="BINARY" />
     <option name="pathToExecutable" value="/opt/homebrew/bin/black" />
-    <option name="sdkName" value="Poetry (campaignfinance) (2)" />
+    <option name="sdkName" value="Python 3.12 UV (campaignfinance)" />
   </component>
app/tests/test_validation_class.py (1)

3-25: Tests are incompatible with the StateFileValidation API.

Several issues will cause these tests to fail:

  1. StateFileValidation is an abstract class (abc.ABC) and cannot be instantiated directly
  2. validate_record(record) takes only one argument, not (record, MockModel)
  3. validate(records) takes only one argument, not (records, MockModel)
  4. The validator_to_use should be a class attribute, not passed to methods

Based on app/abcs/abc_validation.py, the tests need to create a concrete subclass:

-@given(st.dictionaries(st.text(), st.text()))
-def test_validate_record(record):
-    validator = StateFileValidation()
-    result = validator.validate_record(record, MockModel)
+class TestValidator(StateFileValidation):
+    validator_to_use = MockModel
+
+@given(st.dictionaries(st.text(), st.text()))
+def test_validate_record(record):
+    validator = TestValidator()
+    result = validator.validate_record(record)
     assert result[0] in ['passed', 'failed']
app/states/texas/validators/texas_filers.py (1)

186-638: Large block of commented-out legacy code.

Lines 186-638 contain extensive commented-out code. If this code serves as a reference during refactoring, consider documenting the migration plan or removing it entirely, relying on version control for history.

app/states/texas/validators/texas_address.py (1)

83-86: Critical logic bug: condition is always True.

The expression if "PO" or "PO BOX" in _address: is always truthy because "PO" (a non-empty string) is truthy by itself. This means the PO BOX parsing logic runs for every address, not just PO boxes.

-        if "PO" or "PO BOX" in _address:
+        if "PO" in _address or "PO BOX" in _address:
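
The always-truthy behavior is easy to demonstrate; this is a short illustration of the operator-precedence pitfall, not project code:

```python
addr = "123 MAIN ST"

# Parsed as ("PO") or ("PO BOX" in addr): the left operand is a non-empty
# string, so the whole expression is truthy for every address.
buggy = bool("PO" or "PO BOX" in addr)
assert buggy is True

# Correct membership tests:
assert ("PO" in addr or "PO BOX" in addr) is False
assert any(token in "PO BOX 12" for token in ("PO", "PO BOX")) is True
```
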
app/states/texas/validators/texas_expenses.py (1)

235-246: Incorrect field name in error message.

The validator validates expendDescr and expendDt, but the error message on line 243 incorrectly references 'filerName'. This will confuse users when validation fails.

     @field_validator('expendDescr', 'expendDt', mode='before')
     @classmethod
     def validate_required_fields(cls, v):
         if v == "" or v is None:
             raise PydanticCustomError(
                 'missing_required_value',
                 "Field is required",
                 {
-                    'column': 'filerName',
+                    'column': 'expendDescr/expendDt',
                     'value': v
                 }
             )
         return v

Alternatively, use Pydantic's validation info to get the actual field name dynamically.

app/states/texas/validators/texas_personname.py (2)

74-75: Potential AttributeError if no key contains 'NamePrefixCd'.

If _key_pfx is None, calling .split() on it will raise an AttributeError. Add a guard or handle the None case.

     @model_validator(mode='before')
     @classmethod
     def set_type_prefix(cls, values):
         _key_pfx = next((key for key in values.keys() if 'NamePrefixCd' in key), None)
+        if not _key_pfx:
+            values['field_pfx'] = None
+            return values
         values['field_pfx'] = _key_pfx.split('NamePrefixCd')[0]
         return values

92-94: Same potential AttributeError for _pfx in validate_address.

If no key contains 'NamePrefixCd', _pfx will be None, and line 94 will fail when calling .split().

         _pfx = next((key for key in values.keys() if 'NamePrefixCd' in key), None)
-        if _pfx:
-            _pfx = _pfx.split('NamePrefixCd')[0]
+        if not _pfx:
+            return values
+        _pfx = _pfx.split('NamePrefixCd')[0]
♻️ Duplicate comments (20)
app/main.py (2)

4-5: Unused import re.

The re module is imported but not used anywhere in the active code.


30-32: Unused imports: Path and contextmanager.

Path and contextmanager are imported but not used in the active code. The polars import (pl) is used and should be kept.

app/states/texas/validators/texas_filers.py (2)

60-65: Commented-out code.

This block contains commented-out code that should be removed or documented.


89-96: Commented-out code.

This block contains commented-out code that should be removed or documented.

production_loader.py (4)

602-605: Empty except clause silently swallows InvalidRequestError.

The except InvalidRequestError: pass silently ignores the error without logging. Add a comment explaining why this is expected, or log at debug level.


628-631: Empty except clause silently swallows InvalidRequestError.


775-778: Empty except clause silently swallows InvalidRequestError.


796-799: Empty except clause silently swallows InvalidRequestError.

app/states/unified_database.py (3)

1078-1085: Use is None for None comparisons.

Testing for None should use the is operator rather than ==.

This was flagged in a previous review. Apply:

-                        UnifiedTransactionPerson.committee_person_id == None
+                        UnifiedTransactionPerson.committee_person_id.is_(None)

1176-1184: Use is None for None comparisons.

Testing for None should use the is operator rather than ==.

This was flagged in a previous review. Apply:

-                        UnifiedTransactionPerson.committee_person_id == None
+                        UnifiedTransactionPerson.committee_person_id.is_(None)

16-22: Unused import: PersonType.

PersonType is imported but not used in this file.

 from .unified_sqlmodels import (
     UnifiedTransaction, UnifiedPerson, UnifiedAddress, UnifiedCommittee,
-    UnifiedTransactionPerson, PersonRole, TransactionType, PersonType,
+    UnifiedTransactionPerson, PersonRole, TransactionType,
     unified_sql_processor, UnifiedTransactionVersion,
     UnifiedPersonVersion, UnifiedCommitteeVersion, UnifiedAddressVersion,
     UnifiedCommitteePerson, UnifiedCommitteePersonVersion, CommitteeRole, State
 )
app/states/texas/validators/texas_settings.py (1)

51-57: Remove commented-out code.

This commented-out code for flags handling should be removed. If it's intended for future implementation, track it in an issue instead.

app/states/ohio/ohio_downloader.py (1)

2-4: Unused imports detected.

CategoryTypes, RecordGen, CategoryConfig, progress, and CreateWebDriver are imported but not used in this file.

Remove unused imports:

-from abcs import (
-    FileDownloaderABC, StateConfig, CategoryTypes, RecordGen, CategoryConfig, progress, CSVReaderConfig)
-from web_scrape_utils import CreateWebDriver, By
+from abcs import FileDownloaderABC, StateConfig, CSVReaderConfig
+from web_scrape_utils import By
app/states/texas/models.py (1)

4-9: Multiple unused imports.

BaseModel, ConfigDict, model_validator (line 4), SQLModel (line 5), tx_funcs (line 7), and RecordKeyGenerator (line 9) are imported but not used.

app/states/texas/texas_downloader.py (2)

2-18: Remove unused imports.

Optional (line 2) and CreateWebDriver (line 18) are imported but never used in the code.

-from typing import Self, ClassVar, Optional
+from typing import Self, ClassVar
-from web_scrape_utils import CreateWebDriver, By
+from web_scrape_utils import By

103-103: Unused variables _file_count.

_file_count is assigned but never used on both lines 103 and 122.

-        _file_count = len(list(_folder.glob("*.parquet")))
-        _file_count = len(_file_types)

Also applies to: 122-122

app/states/texas/validators/texas_personname.py (2)

2-4: Remove unused imports.

Annotated (line 2), field_validator, create_model, AliasChoices (line 4), and re (line 12) are imported but never used.

-from typing import Optional, Annotated
+from typing import Optional
-from pydantic import field_validator, model_validator, BeforeValidator, create_model, AliasChoices
+from pydantic import model_validator, BeforeValidator
-import re

Also applies to: 12-12


65-70: Remove commented-out code.

Commented-out code should be removed to improve readability. If it's needed for reference, it should be tracked in version control or documented elsewhere.

app/states/texas/texas_search.py (2)

2-2: Use modern type hints and remove unused imports.

Type, Self, and Tuple are imported but never used. Also, List and Dict are deprecated in favor of built-in list and dict.

-from typing import Optional, List, Dict, Any, Type, NamedTuple, Self, Tuple
+from typing import Optional, Any, NamedTuple

Then update usages:

  • List[str]list[str]
  • Dict[str, List[str]]dict[str, list[str]]

258-258: Self-assignment has no effect.

result.selected_choices = result.selected_choices is a no-op.

-        result.selected_choices = result.selected_choices
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33af3b3 and 2b78c27.

⛔ Files ignored due to path filters (6)
  • .DS_Store is excluded by !**/.DS_Store
  • .idea/.DS_Store is excluded by !**/.DS_Store
  • app/.DS_Store is excluded by !**/.DS_Store
  • poetry.lock is excluded by !**/*.lock
  • tmp/.DS_Store is excluded by !**/.DS_Store
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
  • .gitattributes (1 hunks)
  • .gitignore (2 hunks)
  • .idea/campaignfinance.iml (1 hunks)
  • .idea/csv-editor.xml (0 hunks)
  • .idea/misc.xml (1 hunks)
  • .idea/ruff.xml (1 hunks)
  • README.md (1 hunks)
  • app/abcs/__init__.py (1 hunks)
  • app/abcs/abc_download.py (2 hunks)
  • app/abcs/abc_state_config.py (1 hunks)
  • app/abcs/abc_validation_errors.py (1 hunks)
  • app/funcs/__init__.py (1 hunks)
  • app/funcs/csv_reader.py (2 hunks)
  • app/funcs/validator_functions.py (1 hunks)
  • app/live_display.py (1 hunks)
  • app/logs/campaign_finance.log.2025-01-31 (1 hunks)
  • app/main.py (3 hunks)
  • app/states/ohio/ohio_downloader.py (1 hunks)
  • app/states/texas/__init__.py (2 hunks)
  • app/states/texas/models.py (1 hunks)
  • app/states/texas/texas_downloader.py (2 hunks)
  • app/states/texas/texas_search.py (1 hunks)
  • app/states/texas/validators/__init__.py (1 hunks)
  • app/states/texas/validators/texas_address.py (4 hunks)
  • app/states/texas/validators/texas_expenses.py (4 hunks)
  • app/states/texas/validators/texas_filers.py (5 hunks)
  • app/states/texas/validators/texas_flags.py (1 hunks)
  • app/states/texas/validators/texas_personname.py (2 hunks)
  • app/states/texas/validators/texas_settings.py (3 hunks)
  • app/states/unified_database.py (1 hunks)
  • app/states/unified_sqlmodels.py (1 hunks)
  • app/tests/test_validation_class.py (1 hunks)
  • app/workflows/texas_download.py (0 hunks)
  • production_loader.py (1 hunks)
  • pyproject.toml (1 hunks)
  • recreate_tables.py (1 hunks)
💤 Files with no reviewable changes (2)
  • .idea/csv-editor.xml
  • app/workflows/texas_download.py
🧰 Additional context used
🧬 Code graph analysis (19)
recreate_tables.py (1)
app/op.py (2)
  • database (91-92)
  • database_url (111-122)
app/abcs/abc_state_config.py (1)
app/funcs/toml_reader.py (1)
  • read_toml (6-8)
app/tests/test_validation_class.py (1)
app/abcs/abc_validation.py (1)
  • StateFileValidation (24-85)
app/main.py (2)
app/states/texas/texas_search.py (1)
  • TexasSearch (165-294)
app/states/texas/texas_downloader.py (2)
  • download (30-98)
  • dataframes (277-279)
app/abcs/__init__.py (1)
app/abcs/abc_download.py (1)
  • FileDownloaderABC (24-142)
app/states/texas/validators/texas_flags.py (3)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (2)
  • TECBaseModel (44-57)
  • check_contains_factory (15-20)
app/funcs/validator_functions.py (1)
  • check_contains_factory (166-171)
app/states/texas/__init__.py (2)
app/states/texas/texas_search.py (1)
  • TexasSearch (165-294)
app/states/texas/validators/texas_filers.py (1)
  • TECFilerName (126-184)
app/funcs/validator_functions.py (1)
app/states/texas/validators/texas_settings.py (2)
  • check_contains_factory (15-20)
  • check_contains (16-19)
app/states/texas/validators/texas_personname.py (3)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (2)
  • TECBaseModel (44-57)
  • check_contains_factory (15-20)
app/states/texas/validators/texas_address.py (1)
  • TECAddressBase (20-142)
app/states/ohio/ohio_downloader.py (2)
app/abcs/abc_download.py (4)
  • FileDownloaderABC (24-142)
  • download (109-110)
  • consolidate_files (114-115)
  • read (119-120)
app/abcs/abc_state_config.py (4)
  • StateConfig (140-180)
  • CategoryTypes (110-136)
  • CategoryConfig (65-106)
  • CSVReaderConfig (40-42)
app/states/texas/validators/texas_settings.py (2)
app/funcs/validator_functions.py (2)
  • check_contains_factory (166-171)
  • check_contains (167-170)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/funcs/csv_reader.py (3)
app/abcs/abc_validation.py (1)
  • logger (32-34)
app/abcs/abc_category.py (1)
  • logger (99-101)
app/logger.py (1)
  • Logger (10-71)
app/states/texas/validators/texas_address.py (2)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (1)
  • TECBaseModel (44-57)
app/states/texas/validators/__init__.py (4)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_filers.py (1)
  • TECFilerName (126-184)
app/states/texas/validators/texas_contributions.py (1)
  • TECContribution (10-294)
app/states/texas/validators/texas_address.py (1)
  • TECAddressBase (20-142)
app/funcs/__init__.py (4)
app/funcs/csv_reader.py (1)
  • FileReader (54-144)
app/funcs/toml_reader.py (1)
  • read_toml (6-8)
app/funcs/file_exporters.py (1)
  • write_records_to_csv_validation (8-38)
app/funcs/depreciated.py (1)
  • deprecated (5-17)
app/abcs/abc_download.py (4)
app/abcs/abc_state_config.py (3)
  • StateConfig (140-180)
  • CategoryTypes (110-136)
  • TEMP_FOLDER (149-150)
app/live_display.py (4)
  • ProgressTracker (9-106)
  • start (21-25)
  • add_task (34-44)
  • update_task (46-52)
app/states/texas/texas_downloader.py (3)
  • download (30-98)
  • consolidate_files (120-206)
  • read (209-274)
app/abcs/abc_category.py (1)
  • read (103-125)
app/states/texas/validators/texas_expenses.py (2)
app/states/texas/validators/texas_personname.py (1)
  • TECPersonNameBase (15-107)
app/states/texas/validators/texas_settings.py (1)
  • TECSettings (22-41)
app/states/unified_sqlmodels.py (2)
app/op.py (1)
  • role (107-108)
app/states/unified_database.py (1)
  • get_session (77-79)
production_loader.py (5)
app/states/unified_sqlmodels.py (2)
  • State (101-119)
  • process_record (1321-1444)
app/abcs/abc_validation.py (1)
  • failed_records (62-65)
app/logger.py (3)
  • error (61-62)
  • warning (58-59)
  • info (52-53)
app/states/unified_database.py (1)
  • get_session (77-79)
app/live_display.py (1)
  • add_task (34-44)
🪛 LanguageTool
README.md

[style] ~15-~15: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1026 characters long)
Context: ...r-the-badge&logo=python&logoColor=blue) Pydantic Pandas ![Selenium](https://img.shields.io/badge/...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)
README.md

3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


7-7: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


14-14: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


18-18: Files should end with a single newline character

(MD047, single-trailing-newline)

🪛 Ruff (0.14.7)
recreate_tables.py

1-1: Shebang is present but file is not executable

(EXE001)


12-12: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


30-30: Do not catch blind exception: Exception

(BLE001)


65-65: Do not catch blind exception: Exception

(BLE001)


70-70: Do not catch blind exception: Exception

(BLE001)


81-81: Do not catch blind exception: Exception

(BLE001)


109-109: Do not catch blind exception: Exception

(BLE001)

app/states/texas/models.py

5-5: Redefinition of unused Field from line 1

(F811)


13-13: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/states/texas/validators/texas_flags.py

10-10: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/states/texas/texas_downloader.py

25-25: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


32-32: Boolean-typed positional argument in function definition

(FBT001)


32-32: Boolean default positional argument in function definition

(FBT002)


32-32: Unused class method argument: read_from_temp

(ARG003)


33-33: Boolean-typed positional argument in function definition

(FBT001)


33-33: Boolean default positional argument in function definition

(FBT002)


36-36: Unnecessary map() usage (rewrite using a generator expression)

Replace map() with a generator expression

(C417)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


87-87: Unnecessary generator (rewrite as a list comprehension)

Rewrite as a list comprehension

(C400)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Missing return type annotation for classmethod consolidate_files

(ANN206)


147-147: Do not catch blind exception: Exception

(BLE001)


178-178: Do not catch blind exception: Exception

(BLE001)


201-201: datetime.datetime.now() called without a tz argument

(DTZ005)


204-204: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


209-209: Boolean-typed positional argument in function definition

(FBT001)


209-209: Boolean default positional argument in function definition

(FBT002)


217-217: Avoid specifying long messages outside the exception class

(TRY003)


240-240: Do not catch blind exception: Exception

(BLE001)


264-264: Do not catch blind exception: Exception

(BLE001)


269-269: Do not catch blind exception: Exception

(BLE001)


277-277: Missing return type annotation for classmethod dataframes

(ANN206)
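The several C417 findings above for texas_downloader.py all reduce to the same rewrite; a minimal stdlib-only sketch with illustrative data:

```python
# C417: map() over a lambda is clearer as a comprehension.
paths = ["a.csv", "b.csv", "c.csv"]

# Before (flagged by Ruff):
stems_map = list(map(lambda p: p.removesuffix(".csv"), paths))

# After (idiomatic):
stems = [p.removesuffix(".csv") for p in paths]

assert stems == stems_map == ["a", "b", "c"]
```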

app/funcs/validator_functions.py

169-169: Avoid specifying long messages outside the exception class

(TRY003)

app/live_display.py

4-4: typing.Dict is deprecated, use dict instead

(UP035)


10-10: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


40-40: datetime.datetime.now() called without a tz argument

(DTZ005)


41-41: datetime.datetime.now() called without a tz argument

(DTZ005)


51-51: datetime.datetime.now() called without a tz argument

(DTZ005)


74-74: datetime.datetime.now() called without a tz argument

(DTZ005)


94-94: Missing return type annotation for private function _queue_update

Add return type annotation: None

(ANN202)


96-96: Boolean positional value in function call

(FBT003)


98-98: Missing return type annotation for private function _update_display

Add return type annotation: None

(ANN202)

app/states/ohio/ohio_downloader.py

21-21: Boolean-typed positional argument in function definition

(FBT001)


21-21: Unused method argument: overwrite

(ARG002)


21-21: Boolean-typed positional argument in function definition

(FBT001)


21-21: Unused method argument: read_from_temp

(ARG002)


29-29: Unnecessary ... literal

Remove unnecessary ...

(PIE790)

app/states/texas/validators/texas_settings.py

5-5: Redefinition of unused Field from line 1

(F811)


18-18: Avoid specifying long messages outside the exception class

(TRY003)

app/funcs/csv_reader.py

100-101: Replace yield over for loop with yield from

Replace with yield from

(UP028)


102-102: Do not catch blind exception: Exception

(BLE001)


103-103: Use explicit conversion flag

Replace with conversion flag

(RUF010)


108-108: Missing type annotation for **kwargs

(ANN003)


121-122: Replace yield over for loop with yield from

Replace with yield from

(UP028)


124-124: Do not catch blind exception: Exception

(BLE001)


125-125: Use explicit conversion flag

Replace with conversion flag

(RUF010)


129-130: Replace yield over for loop with yield from

Replace with yield from

(UP028)


131-131: Do not catch blind exception: Exception

(BLE001)


132-132: Use explicit conversion flag

Replace with conversion flag

(RUF010)


136-136: Missing type annotation for **kwargs

(ANN003)


137-137: Unnecessary mode argument

Remove mode argument

(UP015)
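For the repeated UP028 findings in csv_reader.py, the fix is mechanical; a minimal sketch with a toy generator:

```python
# UP028: a for loop that only yields each item of an inner iterable
# can be replaced by `yield from`.
def rows_loop(chunks):
    for chunk in chunks:
        for row in chunk:   # flagged pattern
            yield row

def rows_delegate(chunks):
    for chunk in chunks:
        yield from chunk    # preferred: same behavior, less ceremony

data = [[1, 2], [3]]
assert list(rows_loop(data)) == list(rows_delegate(data)) == [1, 2, 3]
```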

app/states/texas/validators/texas_address.py

147-147: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/states/texas/validators/texas_filers.py

23-23: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


30-30: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


46-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


53-53: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


68-68: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


84-84: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


84-84: Missing type annotation for **data

(ANN003)


99-99: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


114-114: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


123-123: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


128-128: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


152-152: Missing return type annotation for classmethod validate_name

(ANN206)


158-158: Missing return type annotation for classmethod validate_addresses

(ANN206)


167-167: Missing return type annotation for classmethod validate_treasurer

(ANN206)


174-174: Missing return type annotation for classmethod validate_assistant_treasurer

(ANN206)


181-181: Missing return type annotation for classmethod validate_chair

(ANN206)
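The RUF012 findings above share one fix: annotate mutable class-level attributes with typing.ClassVar, so type checkers treat them as shared class state rather than per-instance fields. A minimal sketch (class and attribute names are illustrative, not the project's actual models):

```python
from typing import ClassVar

class TECRecordSketch:
    # Left bare, a mutable class attribute is flagged by RUF012.
    # ClassVar marks it as shared class state, not a model field:
    field_aliases: ClassVar[dict[str, str]] = {"filerIdent": "filer_id"}

a, b = TECRecordSketch(), TECRecordSketch()
assert a.field_aliases is b.field_aliases  # shared by design
```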

app/states/unified_database.py

5-5: typing.List is deprecated, use list instead

(UP035)


5-5: typing.Dict is deprecated, use dict instead

(UP035)


31-31: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


44-44: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Boolean-typed positional argument in function definition

(FBT001)


152-152: Boolean default positional argument in function definition

(FBT002)


283-283: Naive datetime constructed using datetime.datetime.strptime() without %z

(DTZ007)


284-284: Naive datetime constructed using datetime.datetime.strptime() without %z

(DTZ007)


574-574: datetime.datetime.utcnow() used

(DTZ003)


583-583: datetime.datetime.utcnow() used

(DTZ003)


617-617: datetime.datetime.utcnow() used

(DTZ003)


651-651: datetime.datetime.utcnow() used

(DTZ003)


685-685: datetime.datetime.utcnow() used

(DTZ003)


738-738: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


744-744: datetime.datetime.utcnow() used

(DTZ003)


745-745: datetime.datetime.utcnow() used

(DTZ003)


753-753: Boolean-typed positional argument in function definition

(FBT001)


753-753: Boolean default positional argument in function definition

(FBT002)


760-760: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


768-768: Boolean-typed positional argument in function definition

(FBT001)


768-768: Boolean default positional argument in function definition

(FBT002)


777-777: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


817-817: datetime.datetime.utcnow() used

(DTZ003)


827-827: datetime.datetime.utcnow() used

(DTZ003)


855-855: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


865-865: Boolean-typed positional argument in function definition

(FBT001)


865-865: Boolean default positional argument in function definition

(FBT002)


878-878: Unused method argument: user

(ARG002)


891-891: datetime.datetime.utcnow() used

(DTZ003)


935-935: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


985-985: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


1059-1059: Unused method argument: user

(ARG002)


1069-1069: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


1081-1081: Comparison to None should be cond is None

Replace with cond is None

(E711)


1092-1092: datetime.datetime.utcnow() used

(DTZ003)


1107-1107: Unused method argument: user

(ARG002)


1132-1132: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


1148-1148: datetime.datetime.utcnow() used

(DTZ003)


1163-1163: Avoid equality comparisons to True; use `if UnifiedCommitteePerson.is_active:` for truth checks

Replace with UnifiedCommitteePerson.is_active

(E712)


1179-1179: Comparison to None should be cond is None

Replace with cond is None

(E711)


1203-1203: Do not catch blind exception: Exception

(BLE001)


1208-1208: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)
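Two patterns dominate the unified_database.py findings. For DTZ003, replace datetime.utcnow() with a timezone-aware call; for the E712 hits, note that in SQLAlchemy/SQLModel filter expressions == True is often intentional, and the idiomatic fix there is .is_(True) rather than the bare truthiness check Ruff suggests. A stdlib-only sketch of the datetime fix:

```python
from datetime import datetime, timezone

# DTZ003: utcnow() returns a naive datetime, which compares and
# serializes ambiguously. Prefer an aware UTC timestamp.
naive = datetime.utcnow()           # flagged (and deprecated in 3.12)
aware = datetime.now(timezone.utc)  # preferred

assert naive.tzinfo is None
assert aware.tzinfo is timezone.utc
```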

app/abcs/abc_download.py

6-6: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)


6-6: typing.Dict is deprecated, use dict instead

(UP035)


30-30: Missing return type annotation for special method __post_init__

(ANN204)


36-36: Missing return type annotation for classmethod not_headless

(ANN206)


62-62: Missing return type annotation for classmethod extract_zipfile

Add return type annotation: None

(ANN206)


68-68: Do not catch blind exception: Exception

(BLE001)


73-73: Missing return type annotation for classmethod _process_csv

Add return type annotation: None

(ANN206)


85-85: datetime.datetime.now() called without a tz argument

(DTZ005)


109-109: Boolean-typed positional argument in function definition

(FBT001)


109-109: Boolean-typed positional argument in function definition

(FBT001)


114-114: Missing return type annotation for classmethod consolidate_files

(ANN206)


119-119: Missing return type annotation for classmethod read

(ANN206)

app/states/texas/validators/texas_expenses.py

161-161: Missing return type annotation for classmethod validate_payee

(ANN206)

app/states/texas/texas_search.py

2-2: typing.List is deprecated, use list instead

(UP035)


2-2: typing.Dict is deprecated, use dict instead

(UP035)


2-2: typing.Type is deprecated, use type instead

(UP035)


2-2: typing.Tuple is deprecated, use tuple instead

(UP035)


48-48: Missing return type annotation for special method __post_init__

(ANN204)


53-53: Missing return type annotation for private function _sort_fields

(ANN202)


63-63: Missing return type annotation for private function _recast_sorted_field_types

(ANN202)


85-85: Avoid specifying long messages outside the exception class

(TRY003)


94-94: Missing return type annotation for private function _coalesce_names

(ANN202)


120-120: Missing return type annotation for special method __repr__

Add return type annotation: str

(ANN204)


169-169: Missing return type annotation for special method __repr__

Add return type annotation: str

(ANN204)


172-172: Missing return type annotation for special method __post_init__

(ANN204)


180-180: Unnecessary dict() call (rewrite as a literal)

Rewrite as a literal

(C408)


230-230: Missing type annotation for *args

(ANN002)


235-235: Unnecessary dict() call (rewrite as a literal)

Rewrite as a literal

(C408)

app/states/unified_sqlmodels.py

8-8: typing.List is deprecated, use list instead

(UP035)


8-8: typing.Dict is deprecated, use dict instead

(UP035)


136-136: Unnecessary UTF-8 encoding argument to encode

Remove unnecessary encoding argument

(UP012)


164-164: Missing return type annotation for special method __post_init__

(ANN204)


227-227: Missing return type annotation for special method __post_init__

(ANN204)


288-288: Missing return type annotation for special method __post_init__

(ANN204)


414-414: Class field state_id is defined multiple times

Remove duplicate field definition for state_id

(PIE794)


747-747: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


853-853: Unused method argument: entity_role

(ARG002)


953-953: Dynamically typed expressions (typing.Any) are disallowed in _get_field_value

(ANN401)


990-990: Consider moving this statement to an else block

(TRY300)


991-991: Do not catch blind exception: Exception

(BLE001)


1007-1007: Do not catch blind exception: Exception

(BLE001)


1036-1036: Do not catch blind exception: Exception

(BLE001)


1094-1094: Do not catch blind exception: Exception

(BLE001)


1183-1183: Possible SQL injection vector through string-based query construction

(S608)


1189-1189: Do not catch blind exception: Exception

(BLE001)


1201-1201: Do not catch blind exception: Exception

(BLE001)


1204-1204: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1218-1218: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1228-1228: Naive datetime constructed using datetime.datetime.strptime() without %z

(DTZ007)


1234-1234: Consider moving this statement to an else block

(TRY300)


1238-1238: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1293-1293: Dynamically typed expressions (typing.Any) are disallowed in obj

(ANN401)


1293-1293: Dynamically typed expressions (typing.Any) are disallowed in _json_default

(ANN401)


1304-1304: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)

production_loader.py

1-1: Shebang is present but file is not executable

(EXE001)


19-19: typing.Dict is deprecated, use dict instead

(UP035)


19-19: typing.List is deprecated, use list instead

(UP035)


19-19: typing.Tuple is deprecated, use tuple instead

(UP035)


152-152: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


203-203: Loop control variable slug not used within loop body

Rename unused slug to _slug

(B007)


206-206: Avoid specifying long messages outside the exception class

(TRY003)


451-451: Possible SQL injection vector through string-based query construction

(S608)


469-469: Possible SQL injection vector through string-based query construction

(S608)


536-536: Missing return type annotation for private function _cache_entity

(ANN202)


552-552: Missing return type annotation for private function _cache_campaign

(ANN202)


568-568: Missing return type annotation for private function _ensure_address

(ANN202)


700-700: Avoid specifying long messages outside the exception class

(TRY003)


823-823: Do not catch blind exception: Exception

(BLE001)


832-832: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


832-832: Logging statement uses f-string

(G004)


855-855: Avoid specifying long messages outside the exception class

(TRY003)


876-876: Avoid specifying long messages outside the exception class

(TRY003)


931-931: Logging statement uses f-string

(G004)


1038-1038: f-string without any placeholders

Remove extraneous f prefix

(F541)


1044-1044: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


1044-1044: Logging statement uses f-string

(G004)


1048-1048: f-string without any placeholders

Remove extraneous f prefix

(F541)


1049-1049: f-string without any placeholders

Remove extraneous f prefix

(F541)


1050-1050: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (12)
.gitattributes (1)

1-1: Verify removal of CSV and ZIP from LFS tracking.

The change removes *.csv and *.zip from Git LFS tracking and replaces them with *.parquet. Combined with the .gitignore changes that now actively ignore *.csv, CSV files would be completely untracked (neither in Git nor LFS).

Confirm this is intentional: Are CSV files temporary/transient, or do they need to be persisted in some form? If CSVs are generated/downloaded in CI pipelines or shared workflows, verify that removing LFS tracking won't cause issues.
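If the intent is to keep large generated files in LFS while still versioning CSVs, the two rules can coexist; a hypothetical .gitattributes keeping both under LFS:

```
# Illustrative .gitattributes: track both formats via LFS
*.parquet filter=lfs diff=lfs merge=lfs -text
*.csv filter=lfs diff=lfs merge=lfs -text
```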

app/abcs/abc_validation_errors.py (1)

7-7: LGTM!

The logging prefix standardization to 'campaignfinance|' aligns with the project-wide logging consistency effort.

recreate_tables.py (1)

12-12: The import is intentional for side-effect registration.

The # noqa: F401 comment is appropriate here since the import registers SQLModel metadata as a side effect. However, Ruff reports the directive as unused (RUF100) because the F401 rule is not enabled in the linter configuration; either enable F401 or drop the directive to silence the warning. The import itself is correctly used.
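A stdlib-only stand-in for the pattern in question: with a registering metaclass (as SQLModel uses under the hood), merely importing the module that defines the models populates the registry as a side effect, so the "unused" import is doing real work. Names here are illustrative:

```python
# Class definitions register themselves as a side effect, mirroring how
# importing a models module populates SQLModel.metadata before create_all().
registry: dict[str, type] = {}

class RegisteringMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # skip the abstract base itself
            registry[name] = cls

class Base(metaclass=RegisteringMeta):
    pass

class UnifiedEntity(Base):  # defining (or importing) this registers the "table"
    pass

assert "UnifiedEntity" in registry
```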

app/states/texas/validators/texas_filers.py (1)

28-37: Unusual composite primary key design with all-optional fields.

TECFilerIDNameLink defines multiple primary_key=True fields that are all Optional except filer_id. Having optional primary key columns is unconventional and may cause issues with NULL values in primary key constraints. Consider whether this link table design is intentional.
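The conventional link-table design makes every primary-key column NOT NULL; a stdlib sqlite3 sketch (table and column names illustrative) showing the constraint doing its job:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE filer_name_link (
        filer_id INTEGER NOT NULL,
        name_id  INTEGER NOT NULL,
        PRIMARY KEY (filer_id, name_id)
    )
""")
con.execute("INSERT INTO filer_name_link VALUES (1, 10)")

# A NULL in a NOT NULL primary-key column is rejected outright.
try:
    con.execute("INSERT INTO filer_name_link VALUES (NULL, 11)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

assert rejected
```

(SQLite permits NULLs in PRIMARY KEY columns unless NOT NULL is spelled out, which is why the sketch declares it; PostgreSQL enforces NOT NULL on primary keys regardless.)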

app/abcs/abc_state_config.py (1)

154-158: LGTM!

The reformatted multiline expression improves readability while maintaining identical functionality.

app/states/texas/__init__.py (2)

6-6: LGTM!

The new TexasSearch import exposes the search functionality from the module's public interface.


32-34: TECFilerName validator is correctly implemented with proper field mappings.

The change from TECFiler to TECFilerName is intentional and complete. TECFilerName is properly defined with comprehensive field mappings including filerIdent, filerJdiCd, committeeStatusCd, filerName, filerAddress, filerMailing, and relationships to related entities. The old TECFiler class is fully deprecated (commented out), with no active references remaining. The integration into the filers category configuration is correct.

app/abcs/__init__.py (1)

7-7: LGTM!

The progress object is now part of the public API, enabling progress tracking across downloader components. This aligns with its usage in FileDownloaderABC and the Texas downloader modules.

app/states/texas/validators/__init__.py (1)

1-10: LGTM on the restructured exports.

The updated public API surface correctly exposes the new base models (TECBaseModel, TECAddressBase) and renamed entities (TECFilerName), aligning with the refactored validator architecture.

app/live_display.py (1)

64-92: LGTM on the table generation logic.

The color-coded status display with duration tracking is well-implemented. Thread-safe access via self._lock during table generation ensures consistent reads.

app/states/texas/validators/texas_expenses.py (1)

18-20: table=True removed but table metadata retained.

TECExpense no longer has table=True but still has __tablename__ and __table_args__ defined. If this is intentionally not a table model, remove the metadata. If it should be a table, add table=True back.

app/states/ohio/ohio_downloader.py (1)

21-35: Method signature mismatch with abstract base class.

The download, consolidate_files, and read methods use self (instance methods), but FileDownloaderABC declares them as @classmethod with cls parameter. This inconsistency will cause issues when these methods are invoked.

Apply this diff to align with the ABC:

-    def download(self, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+    @classmethod
+    def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
-        _driver = self.driver.create_driver()
-        _wait = self.driver
+        _driver = cls.driver.create_driver()
+        _wait = cls.driver
         _driver.get(OHIO_ENTRY_URL)
         _wait.wait_until_clickable(By.LINK_TEXT, "FTP Site")
         _ftp_site_url = _driver.find_element(By.LINK_TEXT, "FTP Site").get_attribute("href")
         _driver.get(_ftp_site_url)
-
-        ...
+        return cls

-    def consolidate_files(self):
-        ...
+    @classmethod
+    def consolidate_files(cls):
+        pass

-    def read(self):
-        ...
+    @classmethod
+    def read(cls):
+        pass

Likely an incorrect or invalid review comment.

*.zip
*.DS_Store
*.env
# *.parquet

🛠️ Refactor suggestion | 🟠 Major

Consolidate redundant commented and uncommented ignore patterns.

Lines 7 and 10 contain commented versions of patterns, while lines 178–179 contain their uncommented equivalents. This duplication creates maintenance burden. Consider removing the commented versions or consolidating at the end of the file with a single definitive block.

Also applies to: 10-10, 178-179

🤖 Prompt for AI Agents
.gitignore lines 7, 10 and lines 178–179: there are duplicate commented ignore
patterns (lines 7 and 10) that match the uncommented equivalents at 178–179;
remove the redundant commented entries (or move any intentionally kept comments
into a single definitive explanatory block at the end of the file) so each
ignore pattern appears only once (keep the active/uncommented patterns where
they currently are and delete the commented duplicates).

Comment on lines 176 to 177
*.csv
*.txt

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l .gitignore

Repository: Abstract-Data/campaignfinance-2023

Length of output: 92


🏁 Script executed:

head -10 .gitignore | cat -n

Repository: Abstract-Data/campaignfinance-2023

Length of output: 244


🏁 Script executed:

sed -n '170,180p' .gitignore | cat -n

Repository: Abstract-Data/campaignfinance-2023

Length of output: 543


🏁 Script executed:

# Check if *.csv appears multiple times
rg "^\*\.csv$" .gitignore

Repository: Abstract-Data/campaignfinance-2023

Length of output: 89


🏁 Script executed:

# Check if *.txt already exists elsewhere
rg "^\*\.txt$" .gitignore

Repository: Abstract-Data/campaignfinance-2023

Length of output: 83


Remove duplicate CSV rule and clarify intent of broad *.txt pattern.

Line 176 duplicates the *.csv rule already present at line 2. Additionally, line 177's *.txt pattern is overly broad and could inadvertently ignore important text files (README, documentation, config templates, etc.).

Clarify the intent: Is *.txt meant to ignore temporary text outputs, or is this too broad for the repository?

🤖 Prompt for AI Agents
In .gitignore around lines 176-177, remove the duplicate "*.csv" entry (it's
already present at line 2) and replace or clarify the overly broad "*.txt" rule:
either delete it if not needed, scope it to a specific directory/pattern for
temporary outputs (e.g., temp/*.txt or *.out.txt) or add explicit negations for
important files (e.g., !README.txt, !docs/*.txt) and include a short comment
above the rule explaining the intent (temporary outputs vs generated artifacts).
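A scoped alternative to the blanket *.txt rule, sketched with illustrative paths:

```
# Ignore only generated text outputs; keep documentation tracked
tmp/*.txt
*.out.txt
!README.txt
!docs/*.txt
```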

Comment on lines 6 to +8
<excludeFolder url="file://$MODULE_DIR$/.venv/bin" />
<excludeFolder url="file://$MODULE_DIR$/.venv/lib" />
<excludeFolder url="file://$MODULE_DIR$/.venv" />

🧹 Nitpick | 🔵 Trivial

Redundant folder exclusions.

Lines 6-7 exclude specific subdirectories of .venv, but line 8 excludes the entire .venv directory. When a parent directory is excluded, all its subdirectories are implicitly excluded, making lines 6-7 redundant.

Apply this diff to remove the redundant exclusions:

     <content url="file://$MODULE_DIR$">
       <sourceFolder url="file://$MODULE_DIR$/app" isTestSource="false" />
-      <excludeFolder url="file://$MODULE_DIR$/.venv/bin" />
-      <excludeFolder url="file://$MODULE_DIR$/.venv/lib" />
       <excludeFolder url="file://$MODULE_DIR$/.venv" />
     </content>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<excludeFolder url="file://$MODULE_DIR$/.venv/bin" />
<excludeFolder url="file://$MODULE_DIR$/.venv/lib" />
<excludeFolder url="file://$MODULE_DIR$/.venv" />
<excludeFolder url="file://$MODULE_DIR$/.venv" />
🤖 Prompt for AI Agents
In .idea/campaignfinance.iml around lines 6 to 8, remove the two redundant
excludeFolder entries for file://$MODULE_DIR$/.venv/bin and
file://$MODULE_DIR$/.venv/lib and keep only the single excludeFolder entry that
excludes file://$MODULE_DIR$/.venv so that the parent .venv exclusion covers its
subdirectories.

<excludeFolder url="file://$MODULE_DIR$/.venv" />
</content>
<orderEntry type="jdk" jdkName="Poetry (campaignfinance) (2)" jdkType="Python SDK" />
<orderEntry type="jdk" jdkName="Python 3.12 UV (campaignfinance)" jdkType="Python SDK" />

🧹 Nitpick | 🔵 Trivial

Hardcoded SDK reference is machine-specific.

The SDK name "Python 3.12 UV (campaignfinance)" is an IDE-specific identifier that may differ across team members' machines. IDE configuration files in .idea/ are typically added to .gitignore because they contain machine-specific settings like SDK paths, tool locations, and personal preferences.

Consider adding .idea/ to .gitignore (keeping only shared configurations like code style if needed), or document that developers must configure their local SDK to match this exact name.

🤖 Prompt for AI Agents
In .idea/campaignfinance.iml around line 10 the file contains a hardcoded,
machine-specific SDK entry "Python 3.12 UV (campaignfinance)"; remove that
SDK-specific orderEntry (or delete the whole .iml) and add .idea/ to .gitignore
(or keep only shared IDE files) so personal SDK names aren't committed;
alternatively document required local SDK name if you must keep a shared
reference.

@@ -6,7 +6,7 @@
<option name="pathToExecutable" value="/opt/homebrew/bin/black" />

🧹 Nitpick | 🔵 Trivial

Hardcoded absolute path is machine-specific.

The path /opt/homebrew/bin/black is specific to macOS with Homebrew and won't work on Linux, Windows, or systems with different package managers. This will cause Black integration to fail for other developers.

As noted for the previous file, .idea/ configurations are typically machine-specific and should be excluded from version control. If you choose to commit them, document the expected tool locations or use IDE features to detect tools from the project's virtual environment automatically.

🤖 Prompt for AI Agents
In .idea/misc.xml around line 6 the Black executable path is hardcoded to
/opt/homebrew/bin/black which is machine-specific; replace this by removing the
absolute path and either rely on IDE's auto-detection (use the project's
virtualenv/interpreter setting) or store a relative/variable-based path (e.g.,
use a configurable IDEA path macro or project interpreter) and add .idea/ files
to .gitignore or document expected tool locations in repo docs so other
developers don't get a broken config.

Comment on lines +1 to +18
# Texas Campaign Finance Package

## Overview
This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.

## Examples
Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.

## Ability to Download TEC File Data Built-In
Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.


## Dependencies
![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
![Pydantic](https://img.shields.io/badge/Pydantic-E92063?style=for-the-badge&logo=Pydantic&logoColor=white)
![Pandas](https://img.shields.io/badge/Pandas-2C2D72?style=for-the-badge&logo=pandas&logoColor=white)
![Selenium](https://img.shields.io/badge/Selenium-43B02A?style=for-the-badge&logo=Selenium&logoColor=white) No newline at end of file

🧹 Nitpick | 🔵 Trivial

Minor markdown formatting improvements.

Add blank lines after headings and ensure file ends with a newline for proper markdown rendering and linting compliance.

 # Texas Campaign Finance Package
 
 ## Overview
+
 This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
 It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.
 
 ## Examples
+
 Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.
 
 ## Ability to Download TEC File Data Built-In
+
 Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.
 
 
 ## Dependencies
+
 ![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
 ![Pydantic](https://img.shields.io/badge/Pydantic-E92063?style=for-the-badge&logo=Pydantic&logoColor=white)
 ![Pandas](https://img.shields.io/badge/Pandas-2C2D72?style=for-the-badge&logo=pandas&logoColor=white)
 ![Selenium](https://img.shields.io/badge/Selenium-43B02A?style=for-the-badge&logo=Selenium&logoColor=white)
+
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Texas Campaign Finance Package

## Overview

This package is designed to provide a simple interface for accessing campaign finance data from the Texas Ethics Commission.
It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.

## Examples

Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.

## Ability to Download TEC File Data Built-In

Using [Selenium](https://www.selenium.dev/), this package can download the latest campaign finance data from the Texas Ethics Commission website. The data is then processed and saved as CSV files.

## Dependencies

![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge&logo=python&logoColor=blue)
![Pydantic](https://img.shields.io/badge/Pydantic-E92063?style=for-the-badge&logo=Pydantic&logoColor=white)
![Pandas](https://img.shields.io/badge/Pandas-2C2D72?style=for-the-badge&logo=pandas&logoColor=white)
![Selenium](https://img.shields.io/badge/Selenium-43B02A?style=for-the-badge&logo=Selenium&logoColor=white)
🧰 Tools
🪛 LanguageTool

[style] ~15-~15: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1026 characters long)
Context: ...r-the-badge&logo=python&logoColor=blue) Pydantic Pandas ![Selenium](https://img.shields.io/badge/...

(EN_EXCESSIVE_EXCLAMATION)

🪛 markdownlint-cli2 (0.18.1)

3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


7-7: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


10-10: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


14-14: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


18-18: Files should end with a single newline character

(MD047, single-trailing-newline)

🤖 Prompt for AI Agents
In README.md around lines 1 to 18, headings are missing blank lines after them
and the file likely does not end with a newline; add a single blank line
immediately after each Markdown heading (e.g., after "# Texas Campaign Finance
Package", "## Overview", "## Examples", "## Ability to Download TEC File Data
Built-In", and "## Dependencies") and ensure the file ends with a trailing
newline character.

It also reduces duplication of fields and joins data from multiple files into a single table to reduce the size of the data.

## Examples
Across all files, there are over 317 columns. This package reduces the number of columns to [number of columns] by joining data from multiple files.

⚠️ Potential issue | 🟡 Minor

Placeholder text needs to be replaced.

The text [number of columns] appears to be a placeholder that should be replaced with the actual column count.

🤖 Prompt for AI Agents
In README.md around line 8, the placeholder "[number of columns]" must be
replaced with the actual column count; determine the correct number (e.g., 317
or the current accurate count) and replace the placeholder with that numeric
value, keeping the sentence grammatical and updating any surrounding punctuation
or wording as needed.

Comment on lines +30 to +32
except Exception as e:
console.print(f"[red]❌ Setup failed: {e}[/red]")
return

🧹 Nitpick | 🔵 Trivial

Catching bare Exception hides root causes.

Broad exception handling makes debugging difficult. Catch specific exceptions:

-    except Exception as e:
-        console.print(f"[red]❌ Setup failed: {e}[/red]")
+    except (ImportError, ConnectionError, OperationalError) as e:
+        console.print(f"[red]❌ Setup failed: {e}[/red]")

At minimum, import traceback at the top of the module and log the full traceback for debugging:

import traceback

try:
    ...
except Exception as e:
    console.print(f"[red]❌ Setup failed: {e}[/red]")
    console.print(f"[dim]{traceback.format_exc()}[/dim]")

Also applies to: 70-72, 81-83

🧰 Tools
🪛 Ruff (0.14.7)

30-30: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In recreate_tables.py around lines 30-32 (and similarly at 70-72 and 81-83), the
code catches a bare Exception and only prints the exception message; update each
except block to catch more specific exceptions where possible and at minimum
include the full traceback in the output for debugging: import the traceback
module at the top, and in those except blocks print the original error message
and the formatted traceback (traceback.format_exc()) so root causes are visible;
prefer replacing broad Exception with the specific exception types raised by the
failing operations if known.

Comment on lines +34 to +83
    # Drop existing unified tables
    console.print("\n[bold blue]Dropping existing unified tables...[/bold blue]")
    try:
        with engine.connect() as conn:
            # Drop tables in reverse dependency order
            tables_to_drop = [
                'unified_committee_person_versions',
                'unified_transaction_versions',
                'unified_person_versions',
                'unified_committee_versions',
                'unified_address_versions',
                'unified_campaign_entities',
                'unified_entity_associations',
                'unified_transaction_persons',
                'unified_committee_persons',
                'unified_contributions',
                'unified_loans',
                'unified_transactions',
                'file_origins',
                'unified_campaigns',
                'unified_entities',
                'unified_persons',
                'unified_committees',
                'unified_addresses',
                'states'
            ]

            for table in tables_to_drop:
                try:
                    conn.execute(text(f"DROP TABLE IF EXISTS {table} CASCADE"))
                    console.print(f"[green]✅ Dropped {table}[/green]")
                except Exception as e:
                    console.print(f"[yellow]⚠️ Could not drop {table}: {e}[/yellow]")

            conn.commit()

    except Exception as e:
        console.print(f"[red]❌ Error dropping tables: {e}[/red]")
        return

    # Create new tables with correct schema
    console.print("\n[bold blue]Creating new unified tables...[/bold blue]")
    try:
        # Create all tables
        SQLModel.metadata.create_all(engine)
        console.print("[green]✅ All unified tables created successfully![/green]")

    except Exception as e:
        console.print(f"[red]❌ Error creating tables: {e}[/red]")
        return

⚠️ Potential issue | 🟠 Major

Schema migration lacks transactional safety.

If table creation fails after tables are dropped, the database is left in an inconsistent state with missing tables. Consider:

  1. Wrap the entire operation in a transaction
  2. Or use a backup/restore approach
  3. Or at minimum, warn users this is destructive
+    console.print("[yellow]⚠️ WARNING: This operation is destructive and non-recoverable![/yellow]")
+    console.print("[yellow]Consider backing up your database first.[/yellow]")
+    
     # Drop existing unified tables
     console.print("\n[bold blue]Dropping existing unified tables...[/bold blue]")

For production use, consider migration tools like Alembic that provide rollback capabilities.
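A minimal transactional sketch using plain SQLAlchemy (SQLModel.metadata is a SQLAlchemy MetaData, so it works the same way); the function name and the `cascade` flag are illustrative, not part of the PR:

```python
from sqlalchemy import MetaData, text

def recreate_tables(engine, metadata: MetaData, tables_to_drop, cascade: bool = True) -> None:
    """Drop and recreate tables atomically.

    engine.begin() opens a transaction that commits only if every
    statement succeeds; any exception rolls back the drops along with
    the creates, so the database is never left half-migrated.
    """
    suffix = " CASCADE" if cascade else ""
    with engine.begin() as conn:
        for table in tables_to_drop:
            conn.execute(text(f'DROP TABLE IF EXISTS "{table}"{suffix}'))
        # Bind create_all to the same connection so it joins the open
        # transaction instead of autocommitting separately.
        metadata.create_all(conn)
```

Against PostgreSQL this would be called as `recreate_tables(engine, SQLModel.metadata, tables_to_drop)`; `cascade=False` is for backends like SQLite that do not accept the CASCADE clause.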

📝 Committable suggestion


Suggested change
    # Drop existing unified tables
    console.print("\n[bold blue]Dropping existing unified tables...[/bold blue]")
    console.print("[yellow]⚠️ WARNING: This operation is destructive and non-recoverable![/yellow]")
    console.print("[yellow]Consider backing up your database first.[/yellow]")
    try:
        with engine.connect() as conn:
            # Drop tables in reverse dependency order
            tables_to_drop = [
                'unified_committee_person_versions',
                'unified_transaction_versions',
                'unified_person_versions',
                'unified_committee_versions',
                'unified_address_versions',
                'unified_campaign_entities',
                'unified_entity_associations',
                'unified_transaction_persons',
                'unified_committee_persons',
                'unified_contributions',
                'unified_loans',
                'unified_transactions',
                'file_origins',
                'unified_campaigns',
                'unified_entities',
                'unified_persons',
                'unified_committees',
                'unified_addresses',
                'states'
            ]
            for table in tables_to_drop:
                try:
                    conn.execute(text(f"DROP TABLE IF EXISTS {table} CASCADE"))
                    console.print(f"[green]✅ Dropped {table}[/green]")
                except Exception as e:
                    console.print(f"[yellow]⚠️ Could not drop {table}: {e}[/yellow]")
            conn.commit()
    except Exception as e:
        console.print(f"[red]❌ Error dropping tables: {e}[/red]")
        return

    # Create new tables with correct schema
    console.print("\n[bold blue]Creating new unified tables...[/bold blue]")
    try:
        # Create all tables
        SQLModel.metadata.create_all(engine)
        console.print("[green]✅ All unified tables created successfully![/green]")
    except Exception as e:
        console.print(f"[red]❌ Error creating tables: {e}[/red]")
        return
🧰 Tools
🪛 Ruff (0.14.7)

65-65: Do not catch blind exception: Exception

(BLE001)


70-70: Do not catch blind exception: Exception

(BLE001)


81-81: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In recreate_tables.py around lines 34 to 83, the script drops tables then
creates them without transactional safety; change the flow so the destructive
drops and subsequent create_all occur inside a single transaction (use
engine.begin()/engine.connect().begin() or a Session/transaction context) and
only commit after SQLModel.metadata.create_all succeeds, bind create_all to the
same connection/transaction; additionally add a pre-check/backup step (e.g.,
prompt the user and optionally dump or rename existing tables/schemas) and
explicit destructive-warning messaging, and consider switching this workflow to
Alembic migrations for production to enable proper rollbacks.

Comment on lines +61 to +68
            for table in tables_to_drop:
                try:
                    conn.execute(text(f"DROP TABLE IF EXISTS {table} CASCADE"))
                    console.print(f"[green]✅ Dropped {table}[/green]")
                except Exception as e:
                    console.print(f"[yellow]⚠️ Could not drop {table}: {e}[/yellow]")

            conn.commit()

🧹 Nitpick | 🔵 Trivial

Use parameterized queries or identifier quoting to avoid SQL injection patterns.

While the table names are hardcoded in tables_to_drop, directly interpolating into SQL strings establishes a risky pattern. Use SQLAlchemy's quoted_name or text().bindparams() for safety:

+from sqlalchemy import quoted_name
+from sqlalchemy.sql import text
 ...
 for table in tables_to_drop:
     try:
-        conn.execute(text(f"DROP TABLE IF EXISTS {table} CASCADE"))
+        # Use quoted identifier to prevent SQL injection patterns
+        conn.execute(text(f"DROP TABLE IF EXISTS {quoted_name(table, quote=True)} CASCADE"))

Alternatively, use SQLAlchemy's Table.drop() API with the metadata for type-safe operations.
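A sketch of the identifier-preparer route; the helper names are illustrative, and `IdentifierPreparer.quote` is SQLAlchemy's dialect-aware quoting entry point:

```python
from sqlalchemy import text
from sqlalchemy.engine import Connection

def quote_ident(dialect, name: str) -> str:
    """Return `name` quoted for the target database when quoting is needed."""
    return dialect.identifier_preparer.quote(name)

def drop_table_quoted(conn: Connection, table_name: str) -> None:
    # A quoted identifier cannot break out of the identifier position,
    # so a malformed name produces a harmless error instead of
    # injected SQL.
    conn.execute(text(f"DROP TABLE IF EXISTS {quote_ident(conn.dialect, table_name)} CASCADE"))
```

On the SQLite dialect, for example, `quote_ident` leaves a plain lowercase name untouched and doubles any embedded quote characters in a hostile one.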

🧰 Tools
🪛 Ruff (0.14.7)

65-65: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In recreate_tables.py around lines 61 to 68, the code interpolates table names
directly into a DROP TABLE SQL string which encourages SQL-injection-prone
patterns; replace this by using SQLAlchemy-safe identifier handling — either
construct Table objects from your MetaData and call table.drop(bind=conn,
checkfirst=True) for each name, or if you must use text(), quote identifiers
with sqlalchemy.sql.elements.quoted_name or use sql.text() with proper
identifier binding via metadata/engine.dialect.identifier_preparer.quote to
produce a safe quoted identifier before executing; update the try/except branch
to execute the safe API and keep the conn.commit() after successful drops.

@coderabbitai coderabbitai bot left a comment


Review continued from previous batch...

Comment on lines +20 to +21
progress = ProgressTracker()
progress.start()

⚠️ Potential issue | 🟠 Major

Side effect at module import time.

Starting ProgressTracker at module load (progress.start()) means the live display begins whenever this module is imported, even if no download is happening. This can cause unexpected console output and resource usage.

Consider lazy initialization or explicit start:

 progress = ProgressTracker()
-progress.start()

Then call progress.start() explicitly in the download() method when needed.

Committable suggestion skipped: line range outside the PR's diff.
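A generic lazy-start wrapper sketch; the `ProgressTracker` API with `.start()`/`.stop()` is assumed from the PR, and the wrapper name is illustrative:

```python
class LazyProgress:
    """Defers creating and starting a tracker until first use,
    so importing the module produces no console output."""

    def __init__(self, factory):
        self._factory = factory  # e.g. ProgressTracker
        self._tracker = None

    def get(self):
        # Create and start the live display only on first access.
        if self._tracker is None:
            self._tracker = self._factory()
            self._tracker.start()
        return self._tracker

    def stop(self):
        # Tear down the live display and allow a fresh start later.
        if self._tracker is not None:
            self._tracker.stop()
            self._tracker = None
```

The module level then becomes `progress = LazyProgress(ProgressTracker)`, and `download()` calls `progress.get()` before its first `add_task`.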

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 20-21, creating and starting
ProgressTracker at module import time (progress = ProgressTracker();
progress.start()) causes an unwanted live display and resource use whenever the
module is imported; change to lazy initialization by removing progress.start()
from module scope, instantiate ProgressTracker (or call progress.start()) inside
the download() method when an actual download begins (or use a helper that
creates/starts the tracker on first use), and ensure the tracker is properly
stopped/cleaned up after the download completes or on errors.

Comment on lines +66 to +69
try:
cls._process_csv(zip_ref, file, tmp)
except Exception as e:
ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")

⚠️ Potential issue | 🟡 Minor

Broad exception catch may hide critical errors.

Catching bare Exception and only logging with ic() can hide serious issues like disk full, permission errors, or corrupted archives. Consider catching specific exceptions or at least logging the full traceback.

             try:
                 cls._process_csv(zip_ref, file, tmp)
-            except Exception as e:
-                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+            except (OSError, IOError, pl.exceptions.ComputeError) as e:
+                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+                raise
📝 Committable suggestion


Suggested change
            try:
                cls._process_csv(zip_ref, file, tmp)
            except (OSError, IOError, pl.exceptions.ComputeError) as e:
                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
                raise
🧰 Tools
🪛 Ruff (0.14.7)

68-68: Do not catch blind exception: Exception

(BLE001)

Comment on lines +79 to +83
_csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started")
zip_ref.extract(file, tmp)

if file_name.suffix == '.txt':
return

⚠️ Potential issue | 🟡 Minor

Progress task not completed for .txt files.

When the file is a .txt, the function returns early on line 83 without calling progress.update_task(_csv_task, "Complete"). This leaves the task in "Started" status indefinitely.

         if file_name.suffix == '.txt':
+            progress.update_task(_csv_task, "Complete")
             return
📝 Committable suggestion


Suggested change
        _csv_task = progress.add_task("T5", f"Extract CSV {file_name.stem}", "Started")
        zip_ref.extract(file, tmp)

        if file_name.suffix == '.txt':
            progress.update_task(_csv_task, "Complete")
            return
🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 79 to 83, the progress task _csv_task
is left in "Started" when a .txt file causes an early return; before returning
for file_name.suffix == '.txt' call progress.update_task(_csv_task, "Complete")
(or an equivalent completion/cleanup call) so the task status is set to
"Complete" before the function returns.

if file_name.suffix == '.txt':
return

rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"

🧹 Nitpick | 🔵 Trivial

datetime.now() called without timezone.

Using naive datetime for file naming can cause inconsistencies across different environments or if the system timezone changes.

+from datetime import datetime, timezone
...
-        rename = f"{file_name.stem}_{datetime.now():%Y%m%d}dl"
+        rename = f"{file_name.stem}_{datetime.now(timezone.utc):%Y%m%d}dl"
📝 Committable suggestion


Suggested change
        rename = f"{file_name.stem}_{datetime.now(timezone.utc):%Y%m%d}dl"
🧰 Tools
🪛 Ruff (0.14.7)

85-85: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around line 85, the filename uses a naive datetime
(datetime.now()) which lacks timezone information; replace it with a
timezone-aware timestamp (for example datetime.now(tz=timezone.utc) or
datetime.now().astimezone()) when formatting the rename string and ensure the
appropriate timezone import (from datetime import datetime, timezone) is added
at the top of the file so the generated filename is consistently based on an
explicit timezone (e.g., UTC).

Comment on lines +107 to 120
 @classmethod
 @abc.abstractmethod
 def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
     ...

 @classmethod
 @abc.abstractmethod
-def download(self, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+def consolidate_files(cls):
     ...

 @classmethod
 @abc.abstractmethod
-def read(self):
+def read(cls):
     ...

🧹 Nitpick | 🔵 Trivial

Consider keyword-only arguments for boolean parameters.

The download method has two boolean positional parameters which can be confusing at call sites (e.g., download(True, False)).

     @classmethod
     @abc.abstractmethod
-    def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+    def download(cls, *, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
         ...
🧰 Tools
🪛 Ruff (0.14.7)

109-109: Boolean-typed positional argument in function definition

(FBT001)


109-109: Boolean-typed positional argument in function definition

(FBT001)


114-114: Missing return type annotation for classmethod consolidate_files

(ANN206)


119-119: Missing return type annotation for classmethod read

(ANN206)

🤖 Prompt for AI Agents
In app/abcs/abc_download.py around lines 107 to 120, the download method
currently accepts two boolean positional arguments which are ambiguous at call
sites; change the abstractmethod signature to require keyword-only booleans
(e.g., introduce a positional-only marker by adding a lone * so overwrite and
read_from_temp must be passed by name), update all concrete implementations to
match the new signature, and update all call sites to use explicit keywords
(overwrite=..., read_from_temp=...) to avoid confusion; ensure type hints and
return type remain unchanged and run tests to catch any mismatches.
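A sketch of the keyword-only signature; the class and return names follow the PR, while the concrete subclass is hypothetical:

```python
from __future__ import annotations
import abc

class FileDownloaderABC(abc.ABC):
    @classmethod
    @abc.abstractmethod
    def download(cls, *, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
        """The lone * forces callers to name both flags."""

class TexasDownloader(FileDownloaderABC):
    @classmethod
    def download(cls, *, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
        # Hypothetical concrete implementation; real logic omitted.
        return cls()
```

With this signature, `download(True, False)` raises TypeError, and call sites must spell out `TexasDownloader.download(overwrite=True, read_from_temp=False)`.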

Comment on lines +1 to +5
from dataclasses import Field
from typing import Optional
from datetime import date
from pydantic import ConfigDict, model_validator, BaseModel, BeforeValidator
from sqlmodel import SQLModel, Field

⚠️ Potential issue | 🟡 Minor

Remove unused and redefined Field import.

Line 1 imports Field from dataclasses, which is immediately shadowed by the Field import from sqlmodel on line 5. The dataclasses.Field is never used. Additionally, BaseModel from pydantic (line 4) appears unused since you're using SQLModel.

-from dataclasses import Field
 from typing import Optional
 from datetime import date
-from pydantic import ConfigDict, model_validator, BaseModel, BeforeValidator
+from pydantic import ConfigDict, model_validator, BeforeValidator
 from sqlmodel import SQLModel, Field
📝 Committable suggestion


Suggested change
from typing import Optional
from datetime import date
from pydantic import ConfigDict, model_validator, BeforeValidator
from sqlmodel import SQLModel, Field
🧰 Tools
🪛 Ruff (0.14.7)

5-5: Redefinition of unused Field from line 1

(F811)

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_settings.py around lines 1 to 5, remove the
unused and shadowed imports: drop "Field" from dataclasses (line 1) since it is
immediately shadowed by sqlmodel.Field, and remove "BaseModel" from pydantic
(line 4) if it is not used elsewhere in the file; also trim any other unused
pydantic imports (e.g., BeforeValidator) so that only the actually used symbols
(like model_validator or ConfigDict) remain, leaving the sqlmodel Field import
intact.

Comment on lines +15 to +20
def check_contains_factory(match_string: str):
    def check_contains(value: str) -> str:
        if value and match_string not in value:
            raise ValueError(f"Value must contain '{match_string}'")
        return value
    return check_contains

🛠️ Refactor suggestion | 🟠 Major

Duplicate of check_contains_factory in validator_functions.py.

This function is identical to the one in app/funcs/validator_functions.py (lines 166-171). Import it from there instead of redefining it.

-def check_contains_factory(match_string: str):
-    def check_contains(value: str) -> str:
-        if value and match_string not in value:
-            raise ValueError(f"Value must contain '{match_string}'")
-        return value
-    return check_contains
+from funcs.validator_functions import check_contains_factory

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.7)

18-18: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_settings.py around lines 15 to 20 there is
a duplicate definition of check_contains_factory; remove the local function and
instead import check_contains_factory from app/funcs/validator_functions.py (add
the import at the top of the file), update any local references to use the
imported name, and ensure imports are organized (and tests/types still pass).

Comment on lines +44 to +50
class TECBaseModel(TECSettings):
    recordType: Optional[str] = Field(BeforeValidator(check_contains_factory('recordType')), description="Record type code")
    formTypeCd: Optional[str] = Field(BeforeValidator(check_contains_factory('formType')), description="TEC form used")
    schedFormTypeCd: Optional[str] = Field(BeforeValidator(check_contains_factory('schedFormType')), description="TEC Schedule Used")
    receivedDt: Optional[date] = Field(BeforeValidator(check_contains_factory('receivedDt')), description="Date report received by TEC")
    filerIdent: Optional[str] = Field(default=None, description="Filer account #", max_length=100)
    filerTypeCd: Optional[str] = Field(default=None, description="Type of filer", max_length=30)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the check_contains_factory function definition
rg -n "def check_contains_factory|check_contains_factory" --type py -B 2 -A 10

Repository: Abstract-Data/campaignfinance-2023

Length of output: 12330


🏁 Script executed:

# Search for receivedDt usage patterns throughout the codebase
rg -n "receivedDt" --type py -C 3

Repository: Abstract-Data/campaignfinance-2023

Length of output: 9386


🏁 Script executed:

# Check if there are test files that show how data is instantiated
fd -e "test_" -o -e "_test.py" | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 194


🏁 Script executed:

# Let me examine the check_contains function more closely and see if there's type coercion
cat -n app/funcs/validator_functions.py | sed -n '166,171p'

Repository: Abstract-Data/campaignfinance-2023

Length of output: 373


🏁 Script executed:

# Check if there are any Pydantic validators that handle date parsing before the BeforeValidator
cat -n app/states/texas/validators/texas_settings.py | sed -n '1,60p'

Repository: Abstract-Data/campaignfinance-2023

Length of output: 3032


🏁 Script executed:

# Look for how data is loaded/instantiated to understand input format
fd -type f -name "*.py" | xargs rg -l "TECBaseModel\|receivedDt" | head -10

Repository: Abstract-Data/campaignfinance-2023

Length of output: 248


🏁 Script executed:

# Search for validate_dates function to understand what it does
rg -n "def validate_dates" --type py -A 15

Repository: Abstract-Data/campaignfinance-2023

Length of output: 3023


🏁 Script executed:

# Also check if there are any tests that show actual instantiation patterns
fd -type f -name "*.py" | xargs rg -l "test.*TECBaseModel\|TECBaseModel.*test" | head -5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 248


🏁 Script executed:

# Verify the execution order - check if there are any tests showing the error
rg -n "receivedDt" app/states/texas/validators/texas_settings.py -B 5 -A 5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 971


🏁 Script executed:

# Double-check the Pydantic v2 behavior by looking at imports and config
head -20 app/states/texas/validators/texas_settings.py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 758


Remove BeforeValidator(check_contains_factory('receivedDt')) from the receivedDt field.

The validate_dates model validator (mode='before') runs before field-level validators and converts the receivedDt string to a date object. When check_contains_factory subsequently runs, it receives a date object but expects a string, causing a TypeError at runtime. The substring-check logic is also semantically incorrect for date fields. Apply the same fix to any other non-string fields using this validator pattern.

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_settings.py around lines 44 to 50, remove
the Field-level BeforeValidator(check_contains_factory('receivedDt')) from the
receivedDt field because validate_dates (mode='before') already converts the
incoming string to a date and the check_contains validator expects a string;
update receivedDt to be a plain Optional[date] Field (no BeforeValidator) and
audit other non-string fields to remove any
BeforeValidator(check_contains_factory(...)) uses so substring checks only run
on string fields (keep check_contains_factory on string-typed fields only or
move checks into appropriate pre-validation for strings).
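The ordering conflict is easy to reproduce outside the codebase. Below is a minimal sketch (assuming Pydantic v2; the `Report` model, the inline `check_contains`, and the `%Y%m%d` date format are illustrative, not the project's actual classes): the `mode='before'` model validator converts the string first, so the field-level `BeforeValidator` receives a `date` and its substring check fails.

```python
from datetime import date, datetime
from typing import Annotated, Optional

from pydantic import BaseModel, BeforeValidator, model_validator


def check_contains(value):
    # Mirrors check_contains_factory('receivedDt'): assumes `value` is a str.
    if value and 'receivedDt' not in value:
        raise ValueError("Value must contain 'receivedDt'")
    return value


class Report(BaseModel):
    receivedDt: Annotated[Optional[date], BeforeValidator(check_contains)] = None

    @model_validator(mode='before')
    @classmethod
    def validate_dates(cls, values):
        # Runs before any field-level validator: string -> date conversion.
        raw = values.get('receivedDt')
        if isinstance(raw, str):
            values['receivedDt'] = datetime.strptime(raw, '%Y%m%d').date()
        return values


raised = False
try:
    Report(receivedDt='20240115')
except Exception:
    # The BeforeValidator evaluated `'receivedDt' not in <date object>`.
    raised = True
print(raised)
```

Dropping the field-level validator (as suggested above) removes the conflict, since the model-level before-validator already owns the string-to-date conversion.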

Comment on lines +554 to +589
# Get current version number
version_count = session.exec(select(UnifiedTransactionVersion).where(UnifiedTransactionVersion.transaction_id == tx.id)).count()
# Save current state as version
# Convert data to JSON-serializable format
data_dict = {}
for k in tx.__fields__.keys():
    value = getattr(tx, k)
    if isinstance(value, date):
        data_dict[k] = value.isoformat()
    elif isinstance(value, datetime):
        data_dict[k] = value.isoformat()
    elif isinstance(value, Decimal):
        data_dict[k] = float(value)
    else:
        data_dict[k] = value

version = UnifiedTransactionVersion(
    transaction_id=tx.id,
    version_number=version_count + 1,
    data=json.dumps(data_dict),
    changed_at=datetime.utcnow(),
    changed_by=user,
    change_reason=reason,
    amendment_details=amendment_details
)
session.add(version)
# Update fields
for k, v in updates.items():
    setattr(tx, k, v)
tx.last_modified_at = datetime.utcnow()
tx.last_modified_by = user
tx.change_reason = reason
tx.amendment_details = amendment_details
session.add(tx)
session.commit()
session.refresh(tx)

⚠️ Potential issue | 🟠 Major

Race condition in version counting and deprecated datetime.utcnow().

  1. The version count query followed by version creation is not atomic—concurrent updates could result in duplicate version numbers.
  2. datetime.utcnow() is deprecated; use datetime.now(timezone.utc) instead.

For the race condition, consider using a database sequence or SELECT ... FOR UPDATE. For the datetime issue:

+from datetime import datetime, date, timezone
...
-            changed_at=datetime.utcnow(),
+            changed_at=datetime.now(timezone.utc),
...
-            tx.last_modified_at = datetime.utcnow()
+            tx.last_modified_at = datetime.now(timezone.utc)

The same pattern applies to all other datetime.utcnow() calls at lines 583, 617, 651, 685, 744, 745, 817, 827, 891, 1092, 1148.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.7)

574-574: datetime.datetime.utcnow() used

(DTZ003)


583-583: datetime.datetime.utcnow() used

(DTZ003)

🤖 Prompt for AI Agents
In app/states/unified_database.py around lines 554 to 589, the version_count
query followed by creating a new UnifiedTransactionVersion is not atomic (race
can yield duplicate version_number) and uses deprecated datetime.utcnow(); to
fix, perform the version calculation and insert inside the same DB transaction
while acquiring a lock on the related transaction row (e.g., SELECT ... FOR
UPDATE or ORM equivalent) or use a dedicated DB sequence/auto-increment for
version_number so concurrent workers cannot generate the same number; also
replace datetime.utcnow() with datetime.now(timezone.utc) (and apply that
replacement to the other occurrences at lines 583, 617, 651, 685, 744, 745, 817,
827, 891, 1092, 1148) so timestamps are timezone-aware.
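As a sketch of the single-statement alternative (stdlib `sqlite3` only; the table and column names here are invented for illustration and do not match the project's schema), the next version number can be computed inside the INSERT itself, so there is no read-then-write window for a concurrent worker to race:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE transaction_versions (
        transaction_id INTEGER NOT NULL,
        version_number INTEGER NOT NULL,
        changed_at TEXT NOT NULL,
        UNIQUE (transaction_id, version_number)
    )
""")

def add_version(tx_id: int) -> None:
    # The read (MAX) and the write (INSERT) are one atomic statement,
    # and the UNIQUE constraint rejects any duplicate that slips through.
    conn.execute(
        """
        INSERT INTO transaction_versions (transaction_id, version_number, changed_at)
        SELECT ?, COALESCE(MAX(version_number), 0) + 1, ?
        FROM transaction_versions WHERE transaction_id = ?
        """,
        (tx_id, datetime.now(timezone.utc).isoformat(), tx_id),
    )

for _ in range(3):
    add_version(42)
rows = conn.execute(
    "SELECT version_number FROM transaction_versions "
    "WHERE transaction_id = 42 ORDER BY version_number"
).fetchall()
print(rows)  # [(1,), (2,), (3,)]
```

With an ORM, `SELECT ... FOR UPDATE` on the parent transaction row achieves the same serialization without restructuring the insert.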

Comment on lines +613 to +628
version = UnifiedPersonVersion(
    person_id=person.id,
    version_number=version_count + 1,
    data=json.dumps({k: getattr(person, k) for k in person.__fields__.keys()}),
    changed_at=datetime.utcnow(),
    changed_by=user,
    change_reason=reason,
    amendment_details=amendment_details
)
session.add(version)
for k, v in updates.items():
    setattr(person, k, v)
session.add(person)
session.commit()
session.refresh(person)
return person

⚠️ Potential issue | 🟠 Major

Non-serializable data in person version snapshot.

Unlike update_transaction, the update_person method doesn't handle non-JSON-serializable types (e.g., date, datetime, Decimal) before calling json.dumps(). This will raise TypeError if the person model contains such fields.

Apply consistent serialization logic:

-            version = UnifiedPersonVersion(
-                person_id=person.id,
-                version_number=version_count + 1,
-                data=json.dumps({k: getattr(person, k) for k in person.__fields__.keys()}),
+            data_dict = {}
+            for k in person.__fields__.keys():
+                value = getattr(person, k)
+                if isinstance(value, date):
+                    data_dict[k] = value.isoformat()
+                elif isinstance(value, datetime):
+                    data_dict[k] = value.isoformat()
+                elif isinstance(value, Decimal):
+                    data_dict[k] = float(value)
+                else:
+                    data_dict[k] = value
+            version = UnifiedPersonVersion(
+                person_id=person.id,
+                version_number=version_count + 1,
+                data=json.dumps(data_dict),

The same issue exists in update_committee (line 650) and update_address (line 684).

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.7)

617-617: datetime.datetime.utcnow() used

(DTZ003)

🤖 Prompt for AI Agents
In app/states/unified_database.py around lines 613-628, the snapshot JSON for
UnifiedPersonVersion uses json.dumps on model fields directly which will raise
TypeError for non-serializable types (date, datetime, Decimal, UUID, etc.);
update the code to serialize the person dict using the same approach as
update_transaction (use a shared helper or json.dumps(..., default=str) or
explicitly convert known field types to ISO strings/primitive types) so all
values are JSON-serializable before creating the version. Apply the same fix to
the analogous snapshots in update_committee (around line 650) and update_address
(around line 684) so they also use the shared serialization helper or json.dumps
with a default converter.
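A shared converter along the lines the prompt suggests might look like this (a hedged sketch; the helper name and the snapshot fields are illustrative, not the project's actual models):

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_jsonable(value):
    # Shared `default=` converter for version snapshots: covers the types
    # the review calls out (date, datetime, Decimal); anything else must
    # already be JSON-serializable, otherwise json.dumps reports it.
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"Unserializable type: {type(value).__name__}")


snapshot = {
    'amount': Decimal('125.50'),
    'filed_on': date(2024, 1, 15),
    'name': 'ACME PAC',
}
payload = json.dumps(snapshot, default=to_jsonable, sort_keys=True)
print(payload)
```

Routing `update_transaction`, `update_person`, `update_committee`, and `update_address` through one such helper keeps the four snapshot paths consistent.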

@coderabbitai coderabbitai bot left a comment


Review continued from previous batch...

Comment on lines +201 to +202
                file=cls.config.TEMP_FOLDER / f"{k}_{datetime.now():%Y%m%d}w.parquet",
                compression='lz4')

🧹 Nitpick | 🔵 Trivial

Use timezone-aware datetime.

datetime.now() without a timezone argument produces a naive datetime, which can cause issues in systems with multiple timezones.

+from datetime import datetime, timezone
...
             df.write_parquet(
-                file=cls.config.TEMP_FOLDER / f"{k}_{datetime.now():%Y%m%d}w.parquet",
+                file=cls.config.TEMP_FOLDER / f"{k}_{datetime.now(timezone.utc):%Y%m%d}w.parquet",
                 compression='lz4')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
file=cls.config.TEMP_FOLDER / f"{k}_{datetime.now():%Y%m%d}w.parquet",
compression='lz4')
file=cls.config.TEMP_FOLDER / f"{k}_{datetime.now(timezone.utc):%Y%m%d}w.parquet",
compression='lz4')
🧰 Tools
🪛 Ruff (0.14.7)

201-201: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents
In app/states/texas/texas_downloader.py around lines 201-202, the code uses
datetime.now() to generate a filename which produces a naive datetime; update
this to a timezone-aware timestamp by using datetime.now(timezone.utc) (or
datetime.now(tz=your_app_tz) if a local zone is required) and ensure you import
timezone from datetime (or use datetime.utcnow().replace(tzinfo=timezone.utc) if
you prefer); replace the current call in the filename construction with the
timezone-aware call so the generated filename includes an aware datetime.
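For reference, the difference between naive and aware stamps is observable directly (the `contribs_` dataset key below is a placeholder, not the downloader's real key):

```python
from datetime import datetime, timezone

naive = datetime.now()
aware = datetime.now(timezone.utc)

# Naive stamps carry no tzinfo; mixing the two in a comparison raises.
try:
    _ = naive < aware
    comparable = True
except TypeError:
    comparable = False

# An aware stamp formats identically in the filename pattern:
filename = f"contribs_{aware:%Y%m%d}w.parquet"
print(comparable, filename)
```

Since the stamp only feeds a `%Y%m%d` filename, the practical risk is low, but standardizing on aware datetimes keeps it consistent with the `utcnow()` fixes elsewhere in this PR.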

- Updated all imports to use app. prefix for proper module resolution
- Fixed app/abcs/*.py imports
- Fixed app/funcs/*.py imports
- Fixed app/states/texas/**/*.py imports
- Fixed app/states/oklahoma/**/*.py imports
- Fixed app/states/ohio/*.py imports
- Fixed app/main.py imports
- Updated test_validation_class.py to use new StateFileValidation API
- Updated test_ingest_hypothesis.py with Hypothesis health check suppression
- Cast pl.lit(None) to String type when adding missing columns to fix
  'type String is incompatible with expected type Null' error
- Add cleanup of leftover parquet files at start of test_texas_consolidate_files
  to prevent flaky test failures from tmp_path reuse between Hypothesis examples
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
app/abcs/abc_category.py (1)

79-94: The validation setter recursively assigns to itself.

In the setter, self.validation = value invokes the same setter again, causing infinite recursion whenever validation is set. Either drop the setter or have it assign to a private backing attribute (e.g. _validation) instead.
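The backing-attribute fix looks like this in miniature (the class name is illustrative, not the actual abc_category API):

```python
class Category:
    def __init__(self):
        self._validation = None

    @property
    def validation(self):
        return self._validation

    @validation.setter
    def validation(self, value):
        # Assign to the private backing attribute; assigning to
        # self.validation here would re-enter this setter forever.
        self._validation = value


c = Category()
c.validation = 'ready'
print(c.validation)
```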

app/states/texas/validators/texas_personname.py (1)

87-101: Guard against missing NamePrefixCd when deriving address prefixes.

If no key containing 'NamePrefixCd' is present, _pfx will be None, and key.startswith(_pfx) will raise a TypeError. While your data likely always includes a prefix field, a defensive check would avoid surprising failures:

-        _pfx = next((key for key in values.keys() if 'NamePrefixCd' in key), None)
-        if _pfx:
-            _pfx = _pfx.split('NamePrefixCd')[0]
-        _mail_keys = [key for key in values.keys() if 'Mail' in key and key.startswith(_pfx)]
+        _pfx = next((key for key in values.keys() if 'NamePrefixCd' in key), None)
+        if not _pfx:
+            return values
+        _pfx = _pfx.split('NamePrefixCd')[0]
+        _mail_keys = [key for key in values.keys() if 'Mail' in key and key.startswith(_pfx)]

This keeps the happy path untouched while making the validator resilient to missing prefix fields.

app/states/texas/validators/direct_expenditures.py (1)

1-44: Import is fine, but most TECDirectExpenses fields lack type annotations and won’t be modeled.

The new import of TECSettings from app.states.texas.validators.texas_settings is correct, but in TECDirectExpenses only the first few attributes are typed. All of the bare names (e.g. infoOnlyFlag, filerIdent, expendDt, etc.) will not become SQLModel/Pydantic fields and therefore won’t be persisted or validated.

If this model is meant to represent the full direct‑expenses record, you’ll want to add explicit type annotations (and Field(...) where appropriate) for each of those attributes, for example:

-    infoOnlyFlag
-    filerIdent
+    infoOnlyFlag: bool | None = None
+    filerIdent: str | None = None
    # ...and so on for the remaining attributes, with appropriate types...

Adjust the concrete types/defaults to match your TEC schema.

app/states/texas/validators/texas_address.py (1)

83-83: Logic bug: condition always evaluates to True.

The expression if "PO" or "PO BOX" in _address: is incorrect. It's parsed as if ("PO") or ("PO BOX" in _address):, and since "PO" is a non-empty string (truthy), this condition is always True regardless of _address content.

Apply this diff:

-        if "PO" or "PO BOX" in _address:
+        if "PO" in _address or "PO BOX" in _address:

Or more succinctly:

-        if "PO" or "PO BOX" in _address:
+        if any(x in _address for x in ("PO", "PO BOX")):
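A short demonstration of why the original condition can never be False (the helper names are invented for illustration):

```python
def has_po_box_buggy(addr: str) -> bool:
    # Parses as ("PO") or ("PO BOX" in addr): the literal "PO" is a
    # non-empty string and therefore truthy, so this is always True.
    return bool("PO" or "PO BOX" in addr)


def has_po_box_fixed(addr: str) -> bool:
    return any(x in addr for x in ("PO", "PO BOX"))


print(has_po_box_buggy("123 MAIN ST"))  # True (wrong)
print(has_po_box_fixed("123 MAIN ST"))  # False
print(has_po_box_fixed("PO BOX 99"))    # True
```

Note that since "PO BOX" contains "PO", the fixed version still only needs the "PO" membership test; keeping both is harmless but redundant.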
app/states/texas/validators/texas_expenses.py (1)

235-247: Minor: Incorrect field name in error message.

Line 243 references 'column': 'filerName' in the error dict, but this validator is for expendDescr and expendDt fields. The error message should reference the actual field being validated.

     @field_validator('expendDescr', 'expendDt', mode='before')
     @classmethod
-    def validate_required_fields(cls, v):
+    def validate_required_fields(cls, v, info):
         if v == "" or v is None:
             raise PydanticCustomError(
                 'missing_required_value',
                 "Field is required",
                 {
-                    'column': 'filerName',
+                    'column': info.field_name,
                     'value': v
                 }
             )
         return v
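In Pydantic v2, `info.field_name` carries the name of the field currently being validated, so the error context stays accurate for both fields. A minimal sketch (assuming Pydantic v2; the `Expense` model here is illustrative, not the project's full validator):

```python
from pydantic import BaseModel, ValidationError, field_validator
from pydantic_core import PydanticCustomError


class Expense(BaseModel):
    expendDescr: str
    expendDt: str

    @field_validator('expendDescr', 'expendDt', mode='before')
    @classmethod
    def validate_required_fields(cls, v, info):
        if v == "" or v is None:
            raise PydanticCustomError(
                'missing_required_value',
                "Field is required",
                # info.field_name names whichever field is being checked.
                {'column': info.field_name, 'value': v},
            )
        return v


try:
    Expense(expendDescr="", expendDt="20240101")
except ValidationError as e:
    err = e.errors()[0]
    print(err['ctx']['column'])
```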
♻️ Duplicate comments (27)
app/states/texas/validators/texas_personname.py (1)

51-55: field_pfx validator still appears to be a copy‑paste error.

field_pfx is populated by set_type_prefix from a key containing NamePrefixCd, but its BeforeValidator checks for 'NameSuffixCd', which doesn’t match the field’s purpose and duplicates the suffix logic. This is almost certainly unintended and can also interfere if a value is ever passed in.

Recommend removing the BeforeValidator from field_pfx (or changing it to the correct key if you truly want validation there), e.g.:

-    field_pfx: Optional[str] = Field(
-        BeforeValidator(funcs.check_contains_factory('NameSuffixCd')),
-        description="Field prefix"
-    )
+    field_pfx: Optional[str] = Field(
+        default=None,
+        description="Field prefix",
+    )
app/states/texas/validators/texas_flags.py (2)

9-10: Annotate __table_args__ with ClassVar.

Mutable class attributes should be annotated with typing.ClassVar to indicate they are class-level configuration, not instance fields.

-from typing import Optional
+from typing import Optional, ClassVar

 class TECFlagsBase(TECBaseModel):
     __tablename__ = "tx_flags"
-    __table_args__ = {"schema": "texas"}
+    __table_args__: ClassVar[dict] = {"schema": "texas"}

12-14: Type mismatch: BeforeValidator expects str, but field is Optional[bool].

The check_contains_factory returns a validator operating on str values, but primaryUsaPhoneFlag is typed as Optional[bool]. This will cause a runtime error when the validator attempts string operations on a boolean.

Either remove the BeforeValidator if the field should be boolean, or change the type to Optional[str] if it holds string values like "Y"/"N":

-    primaryUsaPhoneFlag: Optional[bool] = Field(
-        BeforeValidator(funcs.check_contains_factory('PrimaryUsaPhoneFlag')),
-        description="Y indicates that the phone is a USA")
+    primaryUsaPhoneFlag: Optional[str] = Field(
+        default=None,
+        description="Y indicates that the phone is a USA")
app/states/texas/validators/texas_settings.py (3)

1-5: Remove unused Field import from dataclasses and unused BaseModel import.

Line 1 imports Field from dataclasses, which is immediately shadowed by the Field import from sqlmodel on line 5. The dataclasses.Field is never used. Additionally, BaseModel from pydantic appears unused.

-from dataclasses import Field
 from typing import Optional
 from datetime import date
-from pydantic import ConfigDict, model_validator, BaseModel, BeforeValidator
+from pydantic import ConfigDict, model_validator, BeforeValidator
 from sqlmodel import SQLModel, Field

15-20: Duplicate of check_contains_factory in validator_functions.py.

This function is identical to the one in app/funcs/validator_functions.py (lines 165-170). Import it from there instead of redefining it to maintain DRY principles.

-def check_contains_factory(match_string: str):
-    def check_contains(value: str) -> str:
-        if value and match_string not in value:
-            raise ValueError(f"Value must contain '{match_string}'")
-        return value
-    return check_contains
+from app.funcs.validator_functions import check_contains_factory

44-50: Remove BeforeValidator from receivedDt field - type mismatch will cause runtime error.

The validate_dates model validator (mode='before') converts receivedDt from string to date object before field-level validators run. When check_contains_factory subsequently executes, it receives a date object but expects a string, causing a TypeError. The substring-check logic is semantically incorrect for date fields.

-    receivedDt: Optional[date] = Field(BeforeValidator(check_contains_factory('receivedDt')), description="Date report received by TEC")
+    receivedDt: Optional[date] = Field(default=None, description="Date report received by TEC")
app/main.py (2)

71-73: doug_deason is created but not included in donation_list.

Line 71 creates doug_deason filtering for DOUG DEASON contributions, but line 73 only concatenates tim_dunn, farris_wilks, and don_dyer.

Apply this diff to include doug_deason:

-donation_list = pd.concat([tim_dunn, farris_wilks, don_dyer])
+donation_list = pd.concat([tim_dunn, farris_wilks, don_dyer, doug_deason])

83-348: Large block of commented-out code should be removed.

This extensive commented-out code clutters the file and reduces readability. Rely on version control history for reference instead.

app/states/ohio/ohio_downloader.py (1)

21-29: Incomplete implementation with unused parameters.

The overwrite and read_from_temp parameters are declared but never used. The trailing ... on line 29 is unnecessary since there's preceding code. Either implement the logic using these parameters or remove them if not needed.

app/states/texas/validators/texas_address.py (2)

32-32: Type annotation mismatch with default value.

standardized has default=None but is typed as str (non-optional). This will cause validation errors when the field is None.

-    standardized: str = Field(default=None, description="Address standardized")
+    standardized: Optional[str] = Field(default=None, description="Address standardized")

145-151: Nullable primary key fields violate database integrity.

TECPersonAddressLinkModel uses Optional fields with primary_key=True. NULL values in primary key columns prevent SQL from enforcing uniqueness, potentially allowing duplicate rows.

Consider creating separate link tables with non-nullable primary keys, or add a required discriminator column with validation ensuring exactly one optional field is set per row.
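SQLite makes the hazard easy to see, since (unlike PostgreSQL) it tolerates NULLs in primary-key columns unless NOT NULL is explicit, which is exactly the situation nullable PK fields invite (table names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Nullable composite PK: SQLite accepts NULLs, and since NULLs never
# compare equal, the "duplicate" row is not rejected.
conn.execute("""
    CREATE TABLE link_nullable (
        person_id INTEGER,
        address_id INTEGER,
        PRIMARY KEY (person_id, address_id)
    )
""")
conn.execute("INSERT INTO link_nullable VALUES (NULL, 7)")
conn.execute("INSERT INTO link_nullable VALUES (NULL, 7)")
count = conn.execute("SELECT COUNT(*) FROM link_nullable").fetchone()[0]
print(count)  # 2

# Non-nullable PK columns reject the NULL outright.
conn.execute("""
    CREATE TABLE link_strict (
        person_id INTEGER NOT NULL,
        address_id INTEGER NOT NULL,
        PRIMARY KEY (person_id, address_id)
    )
""")
try:
    conn.execute("INSERT INTO link_strict VALUES (NULL, 7)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```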

app/abcs/abc_download.py (3)

20-21: Side effect at module import time.

Starting ProgressTracker at module load (progress.start()) means the live display begins whenever this module is imported, even if no download is happening. This can cause unexpected console output and resource usage.

Consider lazy initialization:

 progress = ProgressTracker()
-progress.start()

Then call progress.start() explicitly in the download() method when needed.


82-83: Progress task not completed for .txt files.

When the file is a .txt, the function returns early on line 83 without calling progress.update_task(_csv_task, "Complete"). This leaves the task in "Started" status indefinitely.

         if file_name.suffix == '.txt':
+            progress.update_task(_csv_task, "Complete")
             return

66-69: Broad exception catch may hide critical errors.

Catching bare Exception and only logging with ic() can hide serious issues like disk full, permission errors, or corrupted archives.

             try:
                 cls._process_csv(zip_ref, file, tmp)
-            except Exception as e:
-                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+            except (OSError, IOError, pl.exceptions.ComputeError) as e:
+                ic(f"Zip File Extraction Error on {file.filename.upper()}: {e}")
+                raise
app/states/texas/texas_downloader.py (4)

36-36: Consuming the iterator prematurely causes deletion to fail.

_existing_files is an iterator from itertools.chain. It's consumed when checking _safe_to_delete, but the actual deletion at line 83 happens after the download completes. By that time, the iterator is exhausted and no files get deleted.

-        _existing_files = itertools.chain(*map(lambda x: tmp.glob(x), ["*.csv", "*.parquet", "*.txt"]))
+        _existing_files = list(tmp.glob("*.csv")) + list(tmp.glob("*.parquet")) + list(tmp.glob("*.txt"))
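The exhaustion behavior is easy to verify in isolation (a standalone sketch; the file names are arbitrary):

```python
import itertools
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
for name in ("a.csv", "b.parquet"):
    (tmp / name).touch()

# An itertools.chain over glob results is a one-shot iterator:
existing = itertools.chain(*(tmp.glob(p) for p in ("*.csv", "*.parquet", "*.txt")))
first_pass = sum(1 for _ in existing)   # the safety check consumes it here
second_pass = sum(1 for _ in existing)  # nothing left for the deletion loop
print(first_pass, second_pass)  # 2 0

# A concrete list survives repeated iteration:
existing_list = [p for pat in ("*.csv", "*.parquet", "*.txt") for p in tmp.glob(pat)]
print(len(existing_list), len(list(existing_list)))  # 2 2
```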

218-227: _data is reassigned to empty dict, discarding previously collected files.

Line 218-221 populates _data with file mappings, but line 223 immediately reassigns _data = {}, discarding all the collected data. The subsequent loop iterates over the now-empty dictionary, making the entire parquet reading logic a no-op.

             _data = {}
             for file in parquet_files:
                 _type = file.stem.rsplit('_', 1)[0]
                 _data.setdefault(_type, []).append(file)

-            _data = {}
             for k, v in _data.items():

143-149: Redundant file reads during consolidation from mixing the iterator with list slices.

The loop at lines 143-149 iterates v[1:] to collect columns, while line 157 iterates _files. Because next(_files) already consumed the first file at line 132 and the v[1:] loop never touches the iterator, iterating _files at line 157 starts at v[1] — so every file after the first is opened twice, once for column discovery and once for processing.

Use the list directly instead of an iterator:

-                _files = iter(v)
-                # Start with scanning the first file instead of empty DataFrame
-                _first_file = next(_files)
                 df = (
-                    pl.read_parquet(_first_file)
+                    pl.read_parquet(v[0])
                     .with_columns(
-                        pl.lit(_first_file.stem)
+                        pl.lit(v[0].stem)
                         .alias('file_origin')
                     )
                 )
...
-                for _file in _files:
+                for _file in v[1:]:

Also applies to: 157-157


32-33: Unused parameter read_from_temp and boolean positional arguments.

read_from_temp is declared but never used. Additionally, boolean positional arguments can be confusing at call sites.

Either implement the logic using read_from_temp or remove it, and consider keyword-only arguments:

     @classmethod
     def download(
             cls,
-            read_from_temp: bool = True,
-            headless: bool = True
+            *,
+            headless: bool = True
     ) -> Self:
app/states/texas/models.py (3)

1-9: Critical: Remove dataclasses Field import and clean up unused imports.

Line 1 imports Field from dataclasses, but Line 5 redefines it by importing from sqlmodel. The dataclasses import is incorrect and unused. Additionally, several imports appear unused: ConfigDict, model_validator, BaseModel from pydantic (Line 4), SQLModel from sqlmodel (Line 5), tx_funcs (Line 7), and RecordKeyGenerator (Line 9).

Apply this diff to fix the import issues:

-from dataclasses import Field
 from typing import Optional
 from datetime import date
-from pydantic import ConfigDict, model_validator, BaseModel, BeforeValidator
+from pydantic import BeforeValidator
-from sqlmodel import SQLModel, Field
+from sqlmodel import Field
 from .validators.texas_settings import TECSettings
-import app.states.texas.funcs.tx_validation_funcs as tx_funcs
 import app.funcs.validator_functions as funcs
-from app.funcs.record_keygen import RecordKeyGenerator

15-17: Critical: Incorrect BeforeValidator usage with Field().

BeforeValidator must be used with Annotated type hints in Pydantic v2, not as a positional argument to Field(). This will cause runtime errors.

Apply this pattern to fix:

+from typing import Annotated
...
-    primaryUsaPhoneFlag: Optional[bool] = Field(
-        BeforeValidator(funcs.check_contains_factory('PrimaryUsaPhoneFlag')),
-        description="Y indicates that the phone is a USA")
+    primaryUsaPhoneFlag: Annotated[
+        Optional[bool],
+        BeforeValidator(funcs.check_contains_factory('PrimaryUsaPhoneFlag'))
+    ] = Field(default=None, description="Y indicates that the phone is a USA")

Apply the same fix to Lines 59-62 in TECBaseModel for recordType, formTypeCd, schedFormTypeCd, and receivedDt.
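For reference, the Annotated pattern wires the validator in cleanly (a sketch assuming Pydantic v2; the Y/N-to-bool coercion below is an illustrative guess at the intended flag semantics, not the project's actual validator):

```python
from typing import Annotated, Optional

from pydantic import BaseModel, BeforeValidator


def yn_to_bool(value):
    # Illustrative assumption: TEC flag columns arrive as 'Y'/'N' strings.
    if isinstance(value, str):
        return value.strip().upper() == 'Y'
    return value


class Flags(BaseModel):
    primaryUsaPhoneFlag: Annotated[
        Optional[bool], BeforeValidator(yn_to_bool)
    ] = None


print(Flags(primaryUsaPhoneFlag='Y').primaryUsaPhoneFlag)  # True
print(Flags(primaryUsaPhoneFlag='N').primaryUsaPhoneFlag)  # False
print(Flags().primaryUsaPhoneFlag)                         # None
```

Passing BeforeValidator positionally to Field() instead leaves it stored as the field's default, so it never runs, which is why the Annotated form is required.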


58-64: Major: Potential duplicate TECBaseModel definition.

According to past review comments, TECBaseModel appears to be already defined in app/states/texas/validators/texas_settings.py. The AI summary also references TECBaseModel being introduced in texas_settings.py as part of this PR. This could create code duplication and inconsistency.

Verify whether TECBaseModel already exists in texas_settings.py:

#!/bin/bash
# Check for TECBaseModel definitions across the codebase
rg -n "class TECBaseModel" --type py -C3
app/states/texas/validators/texas_expenses.py (3)

71-135: Remove commented-out code.

This large block of commented payee field definitions (65 lines) should be removed rather than left in the codebase. Git history preserves the old implementation if needed.


159-173: Critical: Field extraction incompatible with BeforeValidator checks in TECPersonNameBase.

The validate_payee validator extracts payee fields by key prefix (k.startswith("payee")), creating a dictionary like {'payeeNameFirst': 'John', 'payeeNameLast': 'Doe'}. It then passes this to TECPersonNameBase(**_payee). However, TECPersonNameBase fields use BeforeValidator(funcs.check_contains_factory(...)) which expect the VALUE to contain substrings like 'NameFirst'. After extraction, _payee['payeeNameFirst'] = "John", but the validator expects "John" to contain 'NameFirst', which will fail.

The check_contains_factory validators are designed to validate raw input keys/metadata, not cleaned extracted values. The validate_name method in TECPersonNameBase (shown in relevant snippets) extracts values by finding keys containing substrings (e.g., 'NameFirst' in key), which works with raw input but conflicts with the BeforeValidator checks.

Consider one of these solutions:

Solution 1 (Recommended): Pass raw values dict directly and let TECPersonNameBase handle extraction:

     @model_validator(mode='before')
     @classmethod
     def validate_payee(cls, values):
-        _payee = {k: v for k, v in values.items() if k.startswith("payee")}
         if not _payee:
             raise PydanticCustomError(
                 'missing_required_value',
                 "Payee data is required",
                 {
                     'column': 'payee',
                     'value': _payee
                 }
             )
-        values['payee'] = TECPersonNameBase(**_payee)
+        # Pass all values; TECPersonNameBase will extract payee fields via validate_name
+        values['payee'] = TECPersonNameBase(**values)
         return values

Solution 2: Remove BeforeValidator constraints from TECPersonNameBase fields that are populated via validate_name.


174-233: Remove commented-out code.

This 60-line block of commented validation logic should be removed. Git history preserves it if needed.

app/states/texas/validators/texas_filers.py (3)

110-110: Critical: Broken back_populates references on relationships.

Lines 145-147 in TECFilerName define relationships with back_populates="campaign" pointing to TECTreasurer, TECAssistantTreasurer, and TECCampaignChair. However, the corresponding campaign relationships are commented out (Lines 110, 119, 124). This will cause SQLAlchemy relationship configuration errors at runtime.

Either uncomment the campaign relationships in the target classes:

 class TECTreasurer(TECPersonNameBase, table=True):
     ...
-    # campaign: "TECFilerName" = Relationship(back_populates="treasurer", link_model=TECFilerIDNameLink)
+    campaign: "TECFilerName" = Relationship(back_populates="treasurer", link_model=TECFilerIDNameLink)

 class TECAssistantTreasurer(TECPersonNameBase, table=True):
     ...
-    # campaign: "TECFilerName" = Relationship(back_populates="asstTreasurer", link_model=TECFilerIDNameLink)
+    campaign: "TECFilerName" = Relationship(back_populates="asstTreasurer", link_model=TECFilerIDNameLink)

 class TECCampaignChair(TECPersonNameBase, table=True):
     ...
-    # campaign: "TECFilerName" = Relationship(back_populates="chair", link_model=TECFilerIDNameLink)
+    campaign: "TECFilerName" = Relationship(back_populates="chair", link_model=TECFilerIDNameLink)

Or remove the back_populates parameters from Lines 145-147.

Also applies to: 119-119, 124-124


150-154: Critical: Type mismatch - assigning TECPersonName object to str field.

The validate_name validator assigns a TECPersonName object to values['filerName'] (Line 153), but filerName is defined as str = Field(...) on Line 135. This will cause validation errors or type mismatches at runtime.

Consider one of these fixes:

Solution 1: Change filerName field type to TECPersonName:

-    filerName: str = Field(..., description="Filer name")
+    filerName: TECPersonName = Field(..., description="Filer name", sa_type=JSON)

Solution 2: Extract string representation before assignment:

     @model_validator(mode='before')
     @classmethod
     def validate_name(cls, values):
-        values['filerName'] = TECPersonName(**values)
+        person_name = TECPersonName(**values)
+        values['filerName'] = person_name.nameFull or f"{person_name.first} {person_name.last}"
         return values

Solution 3: Store in a separate field:

+    filer_person_name: TECPersonName = Field(default=None, description="Filer person name", sa_type=JSON)
     ...
     @model_validator(mode='before')
     @classmethod
     def validate_name(cls, values):
-        values['filerName'] = TECPersonName(**values)
+        values['filer_person_name'] = TECPersonName(**values)
         return values

60-65: Remove hundreds of lines of commented-out code.

This file contains massive blocks of commented-out code:

  • Lines 60-65: Old link model
  • Lines 89-96: Old link model
  • Lines 186-729: 500+ lines of commented class definitions and fields

These should be removed entirely. Git history preserves the old implementations if needed. Keeping this much commented code severely harms readability and maintainability.

Also applies to: 89-96, 186-729

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b78c27 and 1fb1a7c.

📒 Files selected for processing (30)
  • app/abcs/__init__.py (1 hunks)
  • app/abcs/abc_category.py (2 hunks)
  • app/abcs/abc_db_loader.py (1 hunks)
  • app/abcs/abc_download.py (2 hunks)
  • app/abcs/abc_state_config.py (2 hunks)
  • app/abcs/abc_validation.py (1 hunks)
  • app/funcs/db_loader.py (1 hunks)
  • app/main.py (3 hunks)
  • app/states/ohio/ohio_downloader.py (1 hunks)
  • app/states/oklahoma/funcs/ok_validation_funcs.py (1 hunks)
  • app/states/oklahoma/oklahoma.py (1 hunks)
  • app/states/oklahoma/validators/ok_contribution.py (1 hunks)
  • app/states/oklahoma/validators/ok_expenditure.py (1 hunks)
  • app/states/oklahoma/validators/ok_lobbyist_expenditure.py (1 hunks)
  • app/states/oklahoma/validators/ok_settings.py (1 hunks)
  • app/states/texas/__init__.py (2 hunks)
  • app/states/texas/funcs/tx_validation_funcs.py (1 hunks)
  • app/states/texas/models.py (1 hunks)
  • app/states/texas/texas_downloader.py (1 hunks)
  • app/states/texas/validators/direct_expenditures.py (1 hunks)
  • app/states/texas/validators/texas_address.py (3 hunks)
  • app/states/texas/validators/texas_contributions.py (1 hunks)
  • app/states/texas/validators/texas_debtdata.py (1 hunks)
  • app/states/texas/validators/texas_expenses.py (4 hunks)
  • app/states/texas/validators/texas_filers.py (5 hunks)
  • app/states/texas/validators/texas_flags.py (1 hunks)
  • app/states/texas/validators/texas_personname.py (2 hunks)
  • app/states/texas/validators/texas_settings.py (2 hunks)
  • app/tests/test_ingest_hypothesis.py (1 hunks)
  • app/tests/test_validation_class.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (16)
app/states/ohio/ohio_downloader.py (3)
app/abcs/abc_download.py (4)
  • FileDownloaderABC (24-142)
  • download (109-110)
  • consolidate_files (114-115)
  • read (119-120)
app/abcs/abc_state_config.py (4)
  • StateConfig (140-180)
  • CategoryTypes (110-136)
  • CategoryConfig (65-106)
  • CSVReaderConfig (40-42)
app/abcs/abc_category.py (1)
  • read (103-125)
app/funcs/db_loader.py (3)
app/abcs/abc_category.py (1)
  • logger (99-101)
app/abcs/abc_validation.py (1)
  • logger (32-34)
app/logger.py (1)
  • Logger (10-71)
app/abcs/abc_validation.py (4)
app/abcs/abc_validation_errors.py (1)
  • ValidationErrorList (29-95)
app/funcs/validator_functions.py (1)
  • create_record_id (16-27)
app/abcs/abc_category.py (1)
  • logger (99-101)
app/logger.py (1)
  • Logger (10-71)
app/abcs/abc_state_config.py (1)
app/funcs/toml_reader.py (1)
  • read_toml (6-8)
app/states/texas/texas_downloader.py (5)
app/funcs/csv_reader.py (3)
  • FileReader (54-144)
  • read_parquet (108-132)
  • read_folder (82-104)
app/abcs/abc_download.py (6)
  • FileDownloaderABC (24-142)
  • download (109-110)
  • not_headless (36-39)
  • extract_zipfile (62-70)
  • consolidate_files (114-115)
  • read (119-120)
app/abcs/abc_state_config.py (4)
  • StateConfig (140-180)
  • CategoryTypes (110-136)
  • CategoryConfig (65-106)
  • TEMP_FOLDER (149-150)
app/states/ohio/ohio_downloader.py (3)
  • download (21-29)
  • consolidate_files (31-32)
  • read (34-35)
app/live_display.py (2)
  • add_task (34-44)
  • update_task (46-52)
app/tests/test_validation_class.py (1)
app/abcs/abc_validation.py (3)
  • StateFileValidation (24-85)
  • validate_record (36-47)
  • validate (49-55)
app/abcs/abc_db_loader.py (3)
app/abcs/abc_category.py (1)
  • logger (99-101)
app/abcs/abc_validation.py (1)
  • logger (32-34)
app/logger.py (1)
  • Logger (10-71)
app/states/texas/validators/direct_expenditures.py (1)
app/states/texas/validators/texas_settings.py (1)
  • TECSettings (22-41)
app/states/texas/__init__.py (4)
app/logger.py (1)
  • Logger (10-71)
app/states/texas/texas_downloader.py (1)
  • TECDownloader (22-279)
app/states/texas/texas_search.py (1)
  • TexasSearch (165-294)
app/states/texas/validators/texas_filers.py (1)
  • TECFilerName (126-184)
app/states/texas/validators/texas_flags.py (3)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (2)
  • TECBaseModel (44-57)
  • check_contains_factory (15-20)
app/funcs/validator_functions.py (1)
  • check_contains_factory (166-171)
app/abcs/__init__.py (6)
app/abcs/abc_state_config.py (4)
  • CSVReaderConfig (40-42)
  • StateConfig (140-180)
  • CategoryConfig (65-106)
  • CategoryTypes (110-136)
app/abcs/abc_category.py (1)
  • StateCategoryClass (32-169)
app/abcs/abc_validation.py (1)
  • StateFileValidation (24-85)
app/abcs/abc_validation_errors.py (1)
  • ValidationErrorList (29-95)
app/abcs/abc_db_loader.py (1)
  • DBLoaderClass (13-162)
app/abcs/abc_download.py (1)
  • FileDownloaderABC (24-142)
app/states/texas/validators/texas_address.py (3)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (1)
  • TECBaseModel (44-57)
app/funcs/record_keygen.py (1)
  • RecordKeyGenerator (10-56)
app/states/texas/validators/texas_expenses.py (2)
app/states/texas/validators/texas_personname.py (1)
  • TECPersonNameBase (15-107)
app/states/texas/validators/texas_settings.py (1)
  • TECSettings (22-41)
app/main.py (2)
app/states/texas/texas_search.py (1)
  • TexasSearch (165-294)
app/states/texas/texas_downloader.py (2)
  • download (30-98)
  • dataframes (277-279)
app/states/texas/validators/texas_personname.py (4)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
app/states/texas/validators/texas_settings.py (2)
  • TECBaseModel (44-57)
  • check_contains_factory (15-20)
app/states/texas/validators/texas_address.py (1)
  • TECAddressBase (20-142)
app/funcs/validator_functions.py (1)
  • check_contains_factory (166-171)
app/states/texas/validators/texas_settings.py (3)
app/funcs/record_keygen.py (1)
  • RecordKeyGenerator (10-56)
app/funcs/validator_functions.py (2)
  • check_contains_factory (166-171)
  • check_contains (167-170)
app/states/texas/models.py (1)
  • TECBaseModel (58-64)
🪛 Ruff (0.14.7)
app/states/ohio/ohio_downloader.py

21-21: Boolean-typed positional argument in function definition

(FBT001)


21-21: Unused method argument: overwrite

(ARG002)


21-21: Boolean-typed positional argument in function definition

(FBT001)


21-21: Unused method argument: read_from_temp

(ARG002)


29-29: Unnecessary ... literal

Remove unnecessary ...

(PIE790)

app/states/texas/texas_downloader.py

25-25: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


32-32: Boolean-typed positional argument in function definition

(FBT001)


32-32: Boolean default positional argument in function definition

(FBT002)


32-32: Unused class method argument: read_from_temp

(ARG003)


33-33: Boolean-typed positional argument in function definition

(FBT001)


33-33: Boolean default positional argument in function definition

(FBT002)


36-36: Unnecessary map() usage (rewrite using a generator expression)

Replace map() with a generator expression

(C417)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


87-87: Unnecessary generator (rewrite as a list comprehension)

Rewrite as a list comprehension

(C400)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Missing return type annotation for classmethod consolidate_files

(ANN206)


147-147: Do not catch blind exception: Exception

(BLE001)


178-178: Do not catch blind exception: Exception

(BLE001)


201-201: datetime.datetime.now() called without a tz argument

(DTZ005)


204-204: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


209-209: Boolean-typed positional argument in function definition

(FBT001)


209-209: Boolean default positional argument in function definition

(FBT002)


217-217: Avoid specifying long messages outside the exception class

(TRY003)


240-240: Do not catch blind exception: Exception

(BLE001)


264-264: Do not catch blind exception: Exception

(BLE001)


269-269: Do not catch blind exception: Exception

(BLE001)


277-277: Missing return type annotation for classmethod dataframes

(ANN206)

app/abcs/abc_download.py

6-6: Import from collections.abc instead: Generator

Import from collections.abc

(UP035)


6-6: typing.Dict is deprecated, use dict instead

(UP035)


30-30: Missing return type annotation for special method __post_init__

(ANN204)


36-36: Missing return type annotation for classmethod not_headless

(ANN206)


62-62: Missing return type annotation for classmethod extract_zipfile

Add return type annotation: None

(ANN206)


68-68: Do not catch blind exception: Exception

(BLE001)


73-73: Missing return type annotation for classmethod _process_csv

Add return type annotation: None

(ANN206)


85-85: datetime.datetime.now() called without a tz argument

(DTZ005)


109-109: Boolean-typed positional argument in function definition

(FBT001)


109-109: Boolean-typed positional argument in function definition

(FBT001)


114-114: Missing return type annotation for classmethod consolidate_files

(ANN206)


119-119: Missing return type annotation for classmethod read

(ANN206)

app/states/texas/validators/texas_flags.py

10-10: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/tests/test_ingest_hypothesis.py

7-7: typing.Dict is deprecated, use dict instead

(UP035)


7-7: typing.List is deprecated, use list instead

(UP035)


7-7: typing.Tuple is deprecated, use tuple instead

(UP035)


122-122: datetime.datetime.fromtimestamp() called without a tz argument

(DTZ006)


144-144: datetime.datetime.fromtimestamp() called without a tz argument

(DTZ006)


174-174: Unused lambda argument: self

(ARG005)


206-206: Loop control variable files not used within loop body

(B007)

app/states/texas/validators/texas_address.py

147-147: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/states/texas/models.py

5-5: Redefinition of unused Field from line 1

(F811)


13-13: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/states/texas/validators/texas_expenses.py

161-161: Missing return type annotation for classmethod validate_payee

(ANN206)

app/states/texas/validators/texas_filers.py

23-23: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


30-30: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


46-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


53-53: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


68-68: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


84-84: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


84-84: Missing type annotation for **data

(ANN003)


99-99: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


114-114: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


123-123: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


128-128: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


152-152: Missing return type annotation for classmethod validate_name

(ANN206)


158-158: Missing return type annotation for classmethod validate_addresses

(ANN206)


167-167: Missing return type annotation for classmethod validate_treasurer

(ANN206)


174-174: Missing return type annotation for classmethod validate_assistant_treasurer

(ANN206)


181-181: Missing return type annotation for classmethod validate_chair

(ANN206)

app/states/texas/validators/texas_settings.py

5-5: Redefinition of unused Field from line 1

(F811)


18-18: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (25)
app/abcs/abc_db_loader.py (1)

4-4: LGTM: Import path standardization.

The update to absolute import (from app.logger import Logger) improves consistency across the codebase and aligns with the project-wide refactor to app-level imports.

app/states/oklahoma/validators/ok_lobbyist_expenditure.py (1)

7-7: LGTM: Import path standardization.

The import path update to app.funcs.validator_functions aligns with the project-wide migration to absolute imports and is consistent with other Oklahoma validator modules.

app/states/texas/funcs/tx_validation_funcs.py (1)

2-2: LGTM: Import path standardization.

The update to app.funcs.validator_functions maintains consistency with the project-wide refactor to absolute imports.

app/abcs/abc_state_config.py (2)

15-15: LGTM: Import path standardization.

The import path update to app.funcs aligns with the project-wide migration to absolute imports.


154-158: LGTM: Formatting improvement.

The multi-line formatting of the return statement improves readability without changing functionality.

app/states/texas/validators/texas_contributions.py (1)

7-7: LGTM: Import path standardization.

The import path update to app.states.texas.funcs.tx_validation_funcs maintains consistency with the project-wide refactor to absolute imports.

app/states/oklahoma/validators/ok_settings.py (1)

3-3: LGTM: Import path standardization.

The import path update to app.funcs.validator_functions aligns with other Oklahoma validator modules and the project-wide migration to absolute imports.

app/funcs/db_loader.py (1)

3-3: LGTM: Import path standardization.

The update to absolute import (from app.logger import Logger) maintains consistency with the project-wide refactor to app-level imports.

app/states/oklahoma/validators/ok_contribution.py (1)

7-7: LGTM: Import path standardization.

The import path update to app.funcs.validator_functions completes the consistent migration to absolute imports across Oklahoma validator modules.

app/states/oklahoma/validators/ok_expenditure.py (1)

6-6: Import path refactor looks correct and consistent.

Switching to app.funcs.validator_functions keeps existing usages (validate_date, person_name_parser) intact and aligns with the new app‑scoped import style.

app/states/oklahoma/funcs/ok_validation_funcs.py (1)

1-1: OK to shift validator_functions to app‑scoped import.

This keeps ok_date_validation behavior the same while matching the new app.funcs namespace.

app/states/texas/validators/texas_debtdata.py (1)

7-8: Texas validator import realignment looks good.

Using app.funcs.validator_functions and app.states.texas.funcs.tx_validation_funcs keeps the existing DebtData validators wired through the new app‑level paths.

app/abcs/abc_category.py (1)

6-20: App‑scoped ABC imports wired correctly.

The shift to app.abcs.*, app.logger, and app.funcs keeps existing usages (StateFileValidation, DBLoaderClass, FileReader, logging) consistent with the new package layout.

app/states/oklahoma/oklahoma.py (1)

5-9: Oklahoma configuration imports correctly re‑namespaced.

Using app.funcs, app.logger, app.abcs and a relative .validators keeps the Oklahoma configuration self‑contained under app.states.oklahoma and consistent with the rest of the project.

app/tests/test_ingest_hypothesis.py (1)

26-145: GenericFileReader CSV/Parquet property‑based tests look solid.

The schema builder + _expected_records_from_schema pattern, together with Hypothesis strategies, gives good coverage of type conversion and metadata (file_origin, download_date) for both CSV and Parquet inputs. The assertions align well with the expected schema behavior.

app/abcs/__init__.py (1)

2-7: Import path updates and new progress export look good.

The import paths are consistently updated to use the app.abcs.* prefix, aligning with the project-wide module structure changes. The addition of progress to the public exports makes the ProgressTracker instance accessible to downstream modules.

app/abcs/abc_validation.py (1)

9-11: Import path updates are consistent.

The import paths are correctly updated to use the app.* prefix, consistent with the project-wide module structure changes.

app/states/texas/__init__.py (2)

2-7: Import path updates look good.

The import paths are correctly updated to use app.* prefixes and relative imports where appropriate. The addition of TexasSearch import aligns with the module's expanded functionality.


32-34: Unable to locate the file or validators referenced in the review comment.

After comprehensive searching, I could not find the file app/states/texas/__init__.py or any references to TECFiler/TECFilerName validators in the repository. The verification cannot be completed because:

  1. The file path app/states/texas/__init__.py does not appear to exist in the repository
  2. No search results returned any definitions or usages of TECFiler or TECFilerName validators
  3. The validators module cannot be located to confirm the validator structure

Recommendation: Verify that the file path is correct and the validators module exists in the codebase before proceeding with this review comment. If this is a newly added file or the path has changed, please provide the correct location.

app/states/texas/validators/texas_flags.py (1)

15-53: Remaining boolean flag fields look correct.

The other flag fields are properly defined as Optional[bool] with default=None and descriptive documentation. Note that the file is missing a trailing newline.

app/tests/test_validation_class.py (3)

7-10: MockModel definition is appropriate for testing.

The MockModel correctly extends SQLModel with an optional id field and required field, suitable for testing the validation logic.


14-22: Test correctly validates single record handling.

The test properly instantiates StateFileValidation with the new validator_to_use parameter and correctly asserts the tuple structure of passed/failed results.


25-36: Test correctly handles generator-based validation.

The test properly iterates over the generator returned by validate() and validates each (status, record) tuple, aligning with the updated API.

app/states/ohio/ohio_downloader.py (1)

18-35: Method signatures don't match the ABC abstract methods.

The ABC FileDownloaderABC defines download, consolidate_files, and read as @classmethod, but OhioDownloader implements them as instance methods. This violates the abstract interface contract and will cause issues when called as classmethods.

Apply this diff to align with the ABC:

 class OhioDownloader(FileDownloaderABC):
     config: StateConfig = OHIO_CONFIGURATION

-    def download(self, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
+    @classmethod
+    def download(cls, overwrite: bool, read_from_temp: bool) -> FileDownloaderABC:
-        _driver = self.driver.create_driver()
-        _wait = self.driver
+        _driver = cls.driver.create_driver()
+        _wait = cls.driver
         _driver.get(OHIO_ENTRY_URL)
         _wait.wait_until_clickable(By.LINK_TEXT, "FTP Site")
         _ftp_site_url = _driver.find_element(By.LINK_TEXT, "FTP Site").get_attribute("href")
         _driver.get(_ftp_site_url)
-
-        ...

-    def consolidate_files(self):
+    @classmethod
+    def consolidate_files(cls):
         ...

-    def read(self):
+    @classmethod
+    def read(cls):
         ...

Likely an incorrect or invalid review comment.
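For reference, the classmethod-abstract pattern the diff above moves toward can be sketched as follows (class and method names follow the review; the bodies are stubs, not the real Selenium logic):

```python
from abc import ABC, abstractmethod


class FileDownloaderABC(ABC):
    @classmethod
    @abstractmethod
    def download(cls, overwrite: bool, read_from_temp: bool) -> "FileDownloaderABC":
        """Subclasses must provide a classmethod with this signature."""


class OhioDownloader(FileDownloaderABC):
    @classmethod
    def download(cls, overwrite: bool, read_from_temp: bool) -> "FileDownloaderABC":
        # A real implementation would drive the browser here; stubbed out.
        return cls()


# Callable directly on the class, matching how the ABC is consumed elsewhere.
downloader = OhioDownloader.download(overwrite=False, read_from_temp=False)
```

Note that Python does not enforce that an override is also a classmethod, so keeping the decorator and signature aligned in subclasses is a convention the ABC's callers rely on.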

app/states/texas/validators/texas_filers.py (1)

165-184: Code snippet does not exist in the repository.

The review comment references lines 165-184 in app/states/texas/validators/texas_filers.py, but the file contains only 149 lines. The validators validate_treasurer, validate_assistant_treasurer, and validate_chair with the shown implementation do not exist in the codebase. Additionally, the referenced classes TECTreasurer, TECAssistantTreasurer, and TECCampaignChair are not defined or imported anywhere in the repository.

This appears to be a review comment for a different version or branch of the code than what is currently present.

Likely an incorrect or invalid review comment.

Comment on lines +77 to +80
donation_campaign_expenses['new_PayeeName'] = donation_campaign_expenses['payeeNameOrganization'].fillna('') \
    .where(donation_campaign_expenses['payeeNameOrganization'].notna(),
           (donation_campaign_expenses['payeeNameFirst'].fillna('') + ' ' +
            donation_campaign_expenses['payeeNameLast'].fillna('')).str.strip())

🧹 Nitpick | 🔵 Trivial

Chained .where() with .fillna() is confusing and potentially incorrect.

The logic here is convoluted. You use .fillna('') then .where() with the same notna() condition, which is redundant since fillna('') already handles NaN values. The intent seems to be: use organization name if present, otherwise concatenate first + last name.

Consider using a clearer approach:

-donation_campaign_expenses['new_PayeeName'] = donation_campaign_expenses['payeeNameOrganization'].fillna('') \
-    .where(donation_campaign_expenses['payeeNameOrganization'].notna(),
-           (donation_campaign_expenses['payeeNameFirst'].fillna('') + ' ' +
-            donation_campaign_expenses['payeeNameLast'].fillna('')).str.strip())
+donation_campaign_expenses['new_PayeeName'] = donation_campaign_expenses['payeeNameOrganization'].fillna(
+    (donation_campaign_expenses['payeeNameFirst'].fillna('') + ' ' +
+     donation_campaign_expenses['payeeNameLast'].fillna('')).str.strip()
+)
🤖 Prompt for AI Agents
In app/main.py around lines 77 to 80, the chained use of .fillna('') then
.where(...notna()) is redundant and confusing; replace it with a clear
conditional that picks payeeNameOrganization when it is non-empty/non-null,
otherwise concatenates payeeNameFirst + ' ' + payeeNameLast (ensuring first/last
are filled with '' before concatenation and the result is stripped). Implement
this by using a single np.where (or Series.where with a boolean mask) that
checks for organization presence (notna and not empty) and otherwise uses the
concatenated first+last name after fillna and strip; remove the redundant
fillna->where chaining.
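The single-mask approach described in the prompt above can be sketched with np.where (column names are taken from the diff; the sample data is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "payeeNameOrganization": ["Acme PAC", None, ""],
    "payeeNameFirst": [None, "Jane", "John"],
    "payeeNameLast": [None, "Doe", "Smith"],
})

org = df["payeeNameOrganization"]
# Fallback: concatenated first + last name, with NaNs blanked and whitespace stripped.
person = (df["payeeNameFirst"].fillna("") + " " + df["payeeNameLast"].fillna("")).str.strip()
# Single boolean mask: organization must be present and non-empty.
df["new_PayeeName"] = np.where(org.notna() & (org != ""), org, person)
```

This expresses the intent (organization name if present, otherwise person name) in one conditional rather than a redundant fillna/where chain.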


class TECFlags(TECSettings, table=True):
    __tablename__ = "tx_flags"
    __table_args__ = {"schema": "texas"}

🧹 Nitpick | 🔵 Trivial

Minor: Annotate table_args with ClassVar.

SQLAlchemy's __table_args__ is a class-level configuration attribute and should be annotated with typing.ClassVar to indicate it's not an instance attribute.

+from typing import ClassVar
...
-    __table_args__ = {"schema": "texas"}
+    __table_args__: ClassVar[dict] = {"schema": "texas"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    __table_args__ = {"schema": "texas"}
+from typing import ClassVar
+    __table_args__: ClassVar[dict] = {"schema": "texas"}
🧰 Tools
🪛 Ruff (0.14.7)

13-13: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In app/states/texas/models.py around line 13, __table_args__ is defined as a
class-level SQLAlchemy config but lacks a typing.ClassVar annotation; update the
declaration to use typing.ClassVar (e.g., ClassVar[dict] or
ClassVar[Mapping[str, Any]]) and add the corresponding import from typing at the
top of the file so the attribute is clearly marked as a class variable rather
than an instance attribute.

@classmethod
def dataframes(cls):
    _files = cls._create_file_type_dict()
    return {k: pl.scan_parquet(v) for k, v in _files.items()}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does polars scan_parquet accept a list of files

💡 Result:

Short answer: No — polars.scan_parquet does not accept a Python list of file paths. It expects a single path/string (a file, directory, or a glob pattern) and can read multiple files via a directory or glob, or by using read_parquet which accepts multiple paths. [1][2]

References

  • scan_parquet signature and "Lazily read ... file or multiple files via glob patterns." [1]
  • Detailed scan_parquet params and behavior (source is a path/str; use glob or directory to read multiple files). [2]

🏁 Script executed:

# Find the file and examine context around line 279
cd app/states/texas && head -290 texas_downloader.py | tail -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 892


🏁 Script executed:

# Look at the _files variable and understand its structure
cd app/states/texas && grep -n "_files" texas_downloader.py | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1034


🏁 Script executed:

cd app/states/texas && grep -n "_create_file_type_dict" texas_downloader.py -A 30 | head -50

Repository: Abstract-Data/campaignfinance-2023

Length of output: 2444


pl.scan_parquet does not accept a list of paths—it only accepts a single path, directory, or glob pattern.

The _create_file_type_dict() method returns values as lists of file paths (line 114: _file_type_dict.setdefault(parts, []).append(file)), but pl.scan_parquet(v) at line 279 will fail when v is a list. Use a glob pattern or convert to a string path, or iterate through files individually like the consolidate_files() method does (lines 133–146).

🤖 Prompt for AI Agents
In app/states/texas/texas_downloader.py around line 279, pl.scan_parquet is
being called with v which is a list of file paths but scan_parquet accepts a
single path/directory/glob; replace the single call with logic that handles
lists: either convert the list to an appropriate glob/string path if files share
a directory/pattern, or iterate the list and call pl.scan_parquet for each file
and then combine the resulting LazyFrames (mirroring the consolidate_files()
approach at lines 133–146) so the function returns a mapping of keys to a single
concatenated LazyFrame per key rather than attempting to pass a Python list into
pl.scan_parquet.



-class TECExpense(TECSettings, table=True):
+class TECExpense(TECSettings):

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Get context around the class declaration
cat -n app/states/texas/validators/texas_expenses.py | head -30

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1413


🏁 Script executed:

# Search for TECExpense usage in the codebase
rg "TECExpense" --type py -B2 -A2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1646


🏁 Script executed:

# Check what TECSettings looks like
rg "class TECSettings" --type py -B2 -A10

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1046


🏁 Script executed:

# Check if tx_expenses table is referenced anywhere else
rg "tx_expenses" --type py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 157


🏁 Script executed:

# Check other validators to see if they have table=True
rg "class TEC\w+\(TECSettings.*\)" --type py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 826


Remove dead code: __tablename__ and __table_args__ have no effect without table=True.

The TECExpense class lost its table=True parameter (line 18), which means it is no longer a SQLModel table. However, lines 19-20 still define __tablename__ and __table_args__, which are only meaningful for table models in SQLModel and now serve no purpose. Either restore table=True if TECExpense should remain an ORM table, or remove these two attributes to clean up dead code.

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_expenses.py around line 18, the TECExpense
class no longer declares table=True so its following __tablename__ and
__table_args__ are dead; either restore table=True on the TECExpense class
definition if it is intended to be a SQLModel ORM table, or remove the
__tablename__ and __table_args__ attributes entirely to eliminate dead code;
make the change consistent with other models (if you restore table=True, ensure
imports and any DB metadata usage remain correct; if you remove them, run
tests/lint to confirm no references rely on those attributes).

Comment on lines +153 to +157
# clear_blank_strings = model_validator(mode='before')(funcs.clear_blank_strings)
# check_dates = model_validator(mode='before')(tx_funcs.validate_dates)
# check_zipcodes = model_validator(mode='before')(tx_funcs.check_zipcodes)
# address_formatting = model_validator(mode='before')(tx_funcs.address_formatting)
# phone_number_validation = model_validator(mode='before')(tx_funcs.phone_number_validation)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "clear_blank_strings|check_dates|check_zipcodes|phone_number_validation|address_formatting" app/states/texas/validators/texas_settings.py -A2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 534


🏁 Script executed:

rg -n "class TECExpense" app/states/texas/validators/texas_expenses.py -A5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 674


Commented-out validators are redundant and may mask inheritance issues.

These validators are inherited from TECSettings (via class inheritance on line 18), so commenting them out here is redundant. However, the situation is more nuanced:

  • clear_blank_strings and check_dates are active in TECSettings (lines 32-33), so commenting them in TECExpense serves no purpose and could confuse maintainers.
  • phone_number_validation in TECExpense corresponds to check_phone_numbers in TECSettings (line 35, active), with a naming mismatch that adds confusion.
  • check_zipcodes and address_formatting are also commented in TECSettings (lines 34, 36), making these redundant comments match the parent class state.

If duplicate execution is a concern, remove these comments entirely and rely on inheritance. If they're disabled intentionally, document why in a comment explaining the specific reason (e.g., "disabled due to performance concerns").

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_expenses.py around lines 153 to 157,
commented-out model validators duplicate or conflict with validators inherited
from TECSettings and create confusion (and a naming mismatch for phone
validator). Remove these commented validator lines so the class relies on
inherited validators, or if you intentionally want to disable or rename any
validator, add a concise inline comment explaining the reason (e.g., "disabled
due to performance" or "renamed to check_phone_numbers in parent"); ensure names
match the parent (phone_number_validation -> check_phone_numbers) if you
reintroduce them.

    __tablename__ = "tx_person_names"
    __table_args__ = {"schema": "texas"}
    addresses: list["TECAddress"] = Relationship(back_populates="person_name", link_model=TECPersonAddressLinkModel)
    # filer_name: list["TECFilerName"] = Relationship(back_populates="filer_name", link_model=TECFilerIDNameLink)

⚠️ Potential issue | 🟠 Major

Major: Commented-out filer_name relationship may cause issues.

Line 48 has a commented-out relationship: # filer_name: list["TECFilerName"] = Relationship(back_populates="filer_name", link_model=TECFilerIDNameLink). However, Line 144 in TECFilerName references this with filer_name: Optional["TECPersonName"] = Relationship(back_populates="filer_name", link_model=TECFilerIDNameLink). This asymmetry will cause SQLAlchemy relationship configuration errors.

Either uncomment and fix Line 48, or remove/update the relationship on Line 144.

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_filers.py around line 48, the filer_name
relationship is commented out which creates an asymmetric relationship because
TECFilerName (line 144) declares filer_name with back_populates and
link_model=TECFilerIDNameLink; restore symmetry by uncommenting and correcting
the relationship on line 48 so the attribute name and back_populates value
exactly match the counterpart on TECFilerName, use the proper typing (e.g.,
list["TECFilerName"] or Optional depending on cardinality) and include
link_model=TECFilerIDNameLink and back_populates="filer_name"; alternatively, if
you intend a different cardinality, update TECFilerName’s back_populates to
match the relationship name used here and ensure forward refs/imports are
present.

Comment on lines +84 to +86
def __init__(self, **data):
super().__init__(**data)
self.id = self.generate_key(self.ctaSeekOfficeCd, self.ctaSeekOfficeDistrict, self.ctaSeekOfficePlace)

🧹 Nitpick | 🔵 Trivial

Add type annotations to init method.

The __init__ method is missing return type annotation (-> None) and parameter type annotation for **data.

-    def __init__(self, **data):
+    def __init__(self, **data: Any) -> None:
         super().__init__(**data)
         self.id = self.generate_key(self.ctaSeekOfficeCd, self.ctaSeekOfficeDistrict, self.ctaSeekOfficePlace)

Add to imports:

 from typing import Optional
+from typing import Any
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    def __init__(self, **data):
+    def __init__(self, **data: Any) -> None:
         super().__init__(**data)
         self.id = self.generate_key(self.ctaSeekOfficeCd, self.ctaSeekOfficeDistrict, self.ctaSeekOfficePlace)
🧰 Tools
🪛 Ruff (0.14.7)

84-84: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


84-84: Missing type annotation for **data

(ANN003)

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_filers.py around lines 84 to 86, the
__init__ method lacks type annotations; change its signature to include
parameter and return types (e.g., def __init__(self, **data: Any) -> None:) and
update imports to include Any from typing (add "from typing import Any"). Ensure
the rest of the method remains unchanged and type-checks under these
annotations.

filer_names: Optional["TECFilerID"] = Relationship(back_populates="other_names", link_model=TECFilerIDNameLink)
@model_validator(mode='before')
@classmethod
def validate_name(cls, values):

🧹 Nitpick | 🔵 Trivial

Add return type annotations to validators.

All the model_validator classmethods are missing return type annotations. While this is a minor issue, adding -> dict or -> Any would improve code clarity and satisfy type checkers.

     @model_validator(mode='before')
     @classmethod
-    def validate_name(cls, values):
+    def validate_name(cls, values: dict) -> dict:
         ...

Apply the same pattern to all validators in this file.

Also applies to: 158-158, 167-167, 174-174, 181-181

🧰 Tools
🪛 Ruff (0.14.7)

152-152: Missing return type annotation for classmethod validate_name

(ANN206)

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_filers.py around lines 152, 158, 167, 174
and 181, the model_validator classmethods (e.g., validate_name and the other
validators) lack return type annotations; update each validator signature to
include an explicit return type (-> dict or -> Any) consistent across the file,
import Any from typing if using Any, and run type checks to ensure signatures
match their actual return values.

Comment on lines +51 to +57
# flags: Optional[TECFlags] = Field(default=None, description="Flags")
#
# def __init__(self, **data):
# _flags = TECFlags(**data)
# if len(_flags.model_dump(exclude_none=True).values()) > 0:
# self.flags = _flags
# super().__init__(**data)

🧹 Nitpick | 🔵 Trivial

Remove or address commented-out code.

The commented-out flags field and __init__ method represent incomplete functionality. Either implement this feature or remove the dead code to avoid confusion.

🤖 Prompt for AI Agents
In app/states/texas/validators/texas_settings.py around lines 51-57, remove the
commented-out flags field and __init__ or implement them properly; either delete
the dead code to avoid confusion, or restore and implement the flags feature by
adding a typed Optional[TECFlags] Field and an initializer that constructs
TECFlags from incoming data only when flag values are present (using Pydantic v2
model validation/creation patterns and model_dump/exclude_none to detect
values), then call super().__init__(**data); ensure imports and typing match the
project Pydantic version.

import datetime as dt
from decimal import Decimal
from pathlib import Path
from typing import Dict, List, Tuple

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

# First, let's check GenericFileReader's handling of download_date and timezone
rg -n "download_date|fromtimestamp" app/ingest/file_reader.py -A 3 -B 1

Repository: Abstract-Data/campaignfinance-2023

Length of output: 143


🏁 Script executed:

# Now let's examine the test file at the specific lines mentioned
cat -n app/tests/test_ingest_hypothesis.py | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 819


🏁 Script executed:

# Check lines 122-123, 144-145, 174-175 in the test file
sed -n '120,125p; 142,147p; 172,177p' app/tests/test_ingest_hypothesis.py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1026


🏁 Script executed:

# Also check the Python version requirement
fd -t f "setup.py|setup.cfg|pyproject.toml|.python-version" -x head -20 {} \;

Repository: Abstract-Data/campaignfinance-2023

Length of output: 576


🏁 Script executed:

# Search for GenericFileReader implementation across all files
find . -name "*.py" -type f -exec grep -l "class GenericFileReader\|download_date" {} \;

Repository: Abstract-Data/campaignfinance-2023

Length of output: 854


🏁 Script executed:

# Get more context around line 174
sed -n '168,180p' app/tests/test_ingest_hypothesis.py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 684


🏁 Script executed:

# Check if there are any datetime timezone usages in GenericFileReader or related files
find . -name "*.py" -type f -exec grep -l "timezone\|UTC\|utc" {} \; | grep -E "(file_reader|ingest)" | head -5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


🏁 Script executed:

# Try to find if GenericFileReader is defined in app directory structure
find ./app -name "*.py" -type f | xargs grep -l "class GenericFileReader"

Repository: Abstract-Data/campaignfinance-2023

Length of output: 62


Address minor Ruff hints: built‑in generics, tz‑aware datetime, and unused lambda arg.

  • Line 7: Switch to built‑in generics for annotations (UP035): dict[str, object], list[str], tuple[...] instead of Dict, List, Tuple.
  • Lines 122 & 144: Add timezone information to dt.datetime.fromtimestamp() (DTZ006), e.g., dt.datetime.fromtimestamp(file_path.stat().st_mtime, tz=dt.timezone.utc) to make tests explicit with tz-aware datetimes rather than naive local times.
  • Line 174: Change lambda parameter from self to _ (ARG005): property(lambda _: tmp_path) since the parameter is unused.

These are optional cleanups; functionality is otherwise fine.

🧰 Tools
🪛 Ruff (0.14.7)

7-7: typing.Dict is deprecated, use dict instead

(UP035)


7-7: typing.List is deprecated, use list instead

(UP035)


7-7: typing.Tuple is deprecated, use tuple instead

(UP035)

🤖 Prompt for AI Agents
In app/tests/test_ingest_hypothesis.py around lines 7, 122, 144 and 174, address
Ruff hints by switching typing imports to built-in generics on line 7 (use
dict[str, object], list[str], tuple[...] instead of Dict, List, Tuple), make
timestamps tz-aware on lines 122 and 144 by passing a timezone (e.g.,
tz=dt.timezone.utc) to dt.datetime.fromtimestamp(...), and rename the unused
lambda parameter on line 174 from self to _ (property(lambda _: tmp_path)) to
indicate it's unused.
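The timezone point in particular is worth making concrete: a naive `fromtimestamp()` result depends on the local timezone of whatever machine runs the tests, while the tz-aware form is reproducible everywhere. A minimal sketch:

```python
import datetime as dt
import tempfile
from pathlib import Path

# Create a throwaway file so we have a real mtime to convert.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    file_path = Path(handle.name)

# Naive: interpreted in the local timezone, tzinfo is None.
naive = dt.datetime.fromtimestamp(file_path.stat().st_mtime)
# Aware: pinned to UTC regardless of where the test runs.
aware = dt.datetime.fromtimestamp(file_path.stat().st_mtime, tz=dt.timezone.utc)

assert naive.tzinfo is None
assert aware.tzinfo is dt.timezone.utc
file_path.unlink()
```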

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (10)
app/states/texas/texas_downloader.py (7)

32-33: Unused parameter flagged in previous review.

The read_from_temp parameter is never used in the method body and should be removed as indicated in the previous review.


36-36: Iterator consumption issue flagged in previous review.

The iterator _existing_files will be exhausted when checking _safe_to_delete and cannot be reused for deletion later, as detailed in the previous review.


80-84: Deletion logic issue flagged in previous review.

The _existing_files iterator is exhausted by this point and the deletion will be a no-op, as explained in the previous review comment.


143-180: File reading inefficiency flagged in previous review.

The iterator consumption pattern causes files to be read multiple times, as detailed in the previous review.


201-202: Timezone-aware datetime recommended in previous review.

Using datetime.now() without a timezone argument was flagged in the previous review.


276-279: pl.scan_parquet does not accept a list, as flagged in previous review.

The _create_file_type_dict() method returns lists of file paths, but pl.scan_parquet(v) at line 279 expects a single path, directory, or glob pattern, not a list. This was detailed in the previous review.


208-274: Critical bug: _data dictionary is cleared before use.

Line 218 initializes _data = {}, lines 219-221 populate it with file mappings, but line 223 immediately reassigns _data = {}, discarding all collected data. The subsequent loop at line 224 iterates over an empty dictionary, making the entire read() method non-functional.

Apply this diff to remove the redundant reinitialization:

             _data = {}
             for file in parquet_files:
                 _type = file.stem.rsplit('_', 1)[0]
                 _data.setdefault(_type, []).append(file)

-            _data = {}
             for k, v in _data.items():
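With the reassignment removed, the grouping step works as intended. A standalone sketch of the same pattern (the file names are illustrative):

```python
from pathlib import Path

# Hypothetical consolidated file names like those produced by the downloader.
parquet_files = [
    Path("contribs_2023.parquet"),
    Path("contribs_2024.parquet"),
    Path("expend_2023.parquet"),
]

# Group files by the category prefix before the final underscore, as the
# corrected read() loop does.
data: dict[str, list[Path]] = {}
for file in parquet_files:
    category = file.stem.rsplit("_", 1)[0]
    data.setdefault(category, []).append(file)

assert sorted(data) == ["contribs", "expend"]
assert len(data["contribs"]) == 2
```

Reassigning `data = {}` between the grouping loop and the consumption loop, as the buggy version does, would leave the second loop iterating over nothing.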
app/tests/test_ingest_hypothesis.py (3)

7-7: Deprecated typing imports flagged in previous review.

The previous review recommended switching to built-in generics (dict, list, tuple) instead of typing.Dict, typing.List, typing.Tuple.


122-122: Timezone-aware datetime recommended in previous review.

The previous review flagged dt.datetime.fromtimestamp() calls without a tz argument at lines 122 and 144.

Also applies to: 144-144


174-174: Unused lambda parameter flagged in previous review.

The previous review noted that the lambda parameter should be renamed from self to _ to indicate it's unused.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fb1a7c and 8037b27.

📒 Files selected for processing (2)
  • app/states/texas/texas_downloader.py (1 hunks)
  • app/tests/test_ingest_hypothesis.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
app/states/texas/texas_downloader.py (3)
app/funcs/csv_reader.py (3)
  • FileReader (54-144)
  • read_parquet (108-132)
  • read_folder (82-104)
app/abcs/abc_download.py (6)
  • FileDownloaderABC (24-142)
  • download (109-110)
  • not_headless (36-39)
  • extract_zipfile (62-70)
  • consolidate_files (114-115)
  • read (119-120)
app/live_display.py (2)
  • add_task (34-44)
  • update_task (46-52)
app/tests/test_ingest_hypothesis.py (4)
app/abcs/abc_state_config.py (1)
  • StateConfig (140-180)
app/states/texas/texas_downloader.py (1)
  • consolidate_files (120-206)
app/abcs/abc_download.py (1)
  • consolidate_files (114-115)
app/states/ohio/ohio_downloader.py (1)
  • consolidate_files (31-32)
🪛 Ruff (0.14.7)
app/states/texas/texas_downloader.py

25-25: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


32-32: Boolean-typed positional argument in function definition

(FBT001)


32-32: Boolean default positional argument in function definition

(FBT002)


32-32: Unused class method argument: read_from_temp

(ARG003)


33-33: Boolean-typed positional argument in function definition

(FBT001)


33-33: Boolean default positional argument in function definition

(FBT002)


36-36: Unnecessary map() usage (rewrite using a generator expression)

Replace map() with a generator expression

(C417)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


83-83: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


87-87: Unnecessary generator (rewrite as a list comprehension)

Rewrite as a list comprehension

(C400)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Missing return type annotation for classmethod consolidate_files

(ANN206)


147-147: Do not catch blind exception: Exception

(BLE001)


178-178: Do not catch blind exception: Exception

(BLE001)


201-201: datetime.datetime.now() called without a tz argument

(DTZ005)


204-204: Unnecessary map() usage (rewrite using a list comprehension)

Replace map() with a list comprehension

(C417)


209-209: Boolean-typed positional argument in function definition

(FBT001)


209-209: Boolean default positional argument in function definition

(FBT002)


217-217: Avoid specifying long messages outside the exception class

(TRY003)


240-240: Do not catch blind exception: Exception

(BLE001)


264-264: Do not catch blind exception: Exception

(BLE001)


269-269: Do not catch blind exception: Exception

(BLE001)


277-277: Missing return type annotation for classmethod dataframes

(ANN206)

app/tests/test_ingest_hypothesis.py

7-7: typing.Dict is deprecated, use dict instead

(UP035)


7-7: typing.List is deprecated, use list instead

(UP035)


7-7: typing.Tuple is deprecated, use tuple instead

(UP035)


122-122: datetime.datetime.fromtimestamp() called without a tz argument

(DTZ006)


144-144: datetime.datetime.fromtimestamp() called without a tz argument

(DTZ006)


174-174: Unused lambda argument: self

(ARG005)


211-211: Loop control variable files not used within loop body

(B007)

🔇 Additional comments (1)
app/tests/test_ingest_hypothesis.py (1)

1-224: LGTM: Comprehensive hypothesis-driven tests for ingestion and consolidation.

The test suite effectively validates:

  • GenericFileReader for both CSV and Parquet formats with schema mapping and type conversions
  • Metadata augmentation (file_origin, download_date)
  • Texas consolidation workflow with dynamic schema alignment and file lifecycle management

The property-based testing approach with Hypothesis provides strong coverage of edge cases and various input combinations.

Comment on lines +39 to +40
if not headless or not cls.driver.headless:
cls.not_headless()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /repo && cat -n app/states/texas/texas_downloader.py | head -80

Repository: Abstract-Data/campaignfinance-2023

Length of output: 133


🏁 Script executed:

cd /repo && rg -n "not_headless" --type=py -A 3 -B 3

Repository: Abstract-Data/campaignfinance-2023

Length of output: 133


🏁 Script executed:

cd /repo && rg -n "def not_headless|driver\.headless\s*=" --type=py -B 2 -A 5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 133


🏁 Script executed:

pwd && git ls-files | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 529


🏁 Script executed:

find . -name "texas_downloader.py" -type f 2>/dev/null

Repository: Abstract-Data/campaignfinance-2023

Length of output: 116


🏁 Script executed:

cat -n app/states/texas/texas_downloader.py | head -80

Repository: Abstract-Data/campaignfinance-2023

Length of output: 3225


🏁 Script executed:

rg -n "not_headless|def not_headless" --type=py -B 3 -A 5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1239


🏁 Script executed:

rg -n "driver\.headless" --type=py -B 2 -A 2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 647


🏁 Script executed:

rg -n "class CreateWebDriver|def __init__" app/abcs/abc_download.py -B 2 -A 10 | head -50

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


🏁 Script executed:

rg -n "CreateWebDriver" --type=py -B 2 -A 8 | grep -A 8 "class CreateWebDriver"

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


🏁 Script executed:

grep -r "class CreateWebDriver" --include="*.py"

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


🏁 Script executed:

cat -n app/abcs/abc_download.py | head -50

Repository: Abstract-Data/campaignfinance-2023

Length of output: 2099


🏁 Script executed:

rg -n "self.driver|cls.driver" app/abcs/abc_download.py -B 2 -A 2 | head -40

Repository: Abstract-Data/campaignfinance-2023

Length of output: 222


🏁 Script executed:

rg -n "FileDownloaderABC.driver.*=" --type=py -B 2 -A 2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 407


Simplify the headless mode logic.

The conditional logic is unnecessarily complex:

if not headless or not cls.driver.headless:
    cls.not_headless()

When headless=True, this evaluates to if False or not cls.driver.headless:, which redundantly calls not_headless() only if the driver is already in non-headless mode. When headless=False, the first condition is always True, making the second condition irrelevant. Simplify to directly handle the user's requested mode:

-if not headless or not cls.driver.headless:
-    cls.not_headless()
+if not headless:
+    cls.not_headless()
🤖 Prompt for AI Agents
In app/states/texas/texas_downloader.py around lines 39-40, the conditional "if
not headless or not cls.driver.headless: cls.not_headless()" is overly complex;
change it to directly act on the user's requested mode by calling
cls.not_headless() only when headless is False (i.e., use "if not headless:
cls.not_headless()") and remove the redundant driver.headless check so the code
clearly follows the requested headless parameter.
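The claimed equivalence can be checked exhaustively. A quick truth-table sketch, independent of the downloader code:

```python
# Compare the original compound condition with the simplified one across all
# four flag combinations.
differing = []
for headless in (True, False):
    for driver_headless in (True, False):
        original = (not headless) or (not driver_headless)
        simplified = not headless
        if original != simplified:
            differing.append((headless, driver_headless))

# The only divergence is headless=True with a driver already in non-headless
# mode, where calling not_headless() again would be a no-op anyway.
assert differing == [(True, False)]
```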


if not parquet:
_reader = FileReader()
cls.data = _reader.read_folder(cls.folder, file_counts=cls.config.FILE_COUNTS)

⚠️ Potential issue | 🔴 Critical

cls.folder may not exist as a class attribute.

The read classmethod references cls.folder at line 213, but folder is defined as an instance attribute (field(init=False)) in the base class FileDownloaderABC. Accessing it as a class attribute will fail with an AttributeError.

Replace with the class-level cls.config.TEMP_FOLDER:

         if not parquet:
             _reader = FileReader()
-            cls.data = _reader.read_folder(cls.folder, file_counts=cls.config.FILE_COUNTS)
+            cls.data = _reader.read_folder(cls.config.TEMP_FOLDER, file_counts=cls.config.FILE_COUNTS)
         else:

Suggested change
-            cls.data = _reader.read_folder(cls.folder, file_counts=cls.config.FILE_COUNTS)
+            cls.data = _reader.read_folder(cls.config.TEMP_FOLDER, file_counts=cls.config.FILE_COUNTS)
🤖 Prompt for AI Agents
In app/states/texas/texas_downloader.py around line 213, the classmethod uses
cls.folder which is an instance-only attribute and will raise AttributeError;
replace the reference with the class-level temp folder (cls.config.TEMP_FOLDER)
so the classmethod uses a valid class attribute, i.e., call _reader.read_folder
using cls.config.TEMP_FOLDER and keep passing cls.config.FILE_COUNTS.
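The underlying pitfall is that a dataclass field with no default leaves nothing on the class object itself. A minimal sketch (the class name is a stand-in for FileDownloaderABC):

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DownloaderBase:
    """Stand-in for FileDownloaderABC's folder attribute."""
    folder: Path = field(init=False)

    def __post_init__(self) -> None:
        self.folder = Path("downloads")


# The attribute exists on instances, because __post_init__ sets it...
instance = DownloaderBase()
assert instance.folder == Path("downloads")

# ...but not on the class, so cls.folder raises AttributeError in a classmethod.
class_access_failed = False
try:
    DownloaderBase.folder
except AttributeError:
    class_access_failed = True
assert class_access_failed
```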

Comment on lines +211 to +213
for category, files in category_files.items():
consolidated = list(tmp_path.glob(f"{category}_*.parquet"))
assert len(consolidated) == 1, f"Expected a single consolidated file for {category}"

🧹 Nitpick | 🔵 Trivial

Unused loop variable files.

The loop unpacks category_files.items() into (category, files) but only category is used in the loop body. The files variable can be replaced with _ to indicate it's intentionally unused.

Apply this diff:

-    for category, files in category_files.items():
+    for category, _ in category_files.items():
         consolidated = list(tmp_path.glob(f"{category}_*.parquet"))
         assert len(consolidated) == 1, f"Expected a single consolidated file for {category}"

Suggested change
-    for category, files in category_files.items():
+    for category, _ in category_files.items():
         consolidated = list(tmp_path.glob(f"{category}_*.parquet"))
         assert len(consolidated) == 1, f"Expected a single consolidated file for {category}"
🧰 Tools
🪛 Ruff (0.14.7)

211-211: Loop control variable files not used within loop body

(B007)

🤖 Prompt for AI Agents
In app/tests/test_ingest_hypothesis.py around lines 211 to 213, the loop unpacks
category_files.items() into (category, files) but never uses files; replace the
unused variable with an underscore (e.g., for category, _ in
category_files.items():) to indicate it is intentionally unused and silence
linters.

- Add direct lookup for unified field names before fuzzy matching
- Skip fuzzy matching on fields with unified prefixes (person_, address_, etc.)
- Make transaction fields optional in field library (required=False)

This fixes the issue where campaign names like 'LYTAL LYTAL LYTAL'
appeared instead of 'JAMES H. LYTAL' due to fuzzy matching incorrectly
matching person_last_name when looking for person_first_name.
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (6)
app/states/unified_sqlmodels.py (6)

129-129: datetime.utcnow() is deprecated in Python 3.12+.

This issue was previously flagged. Consider using datetime.now(timezone.utc) for forward compatibility.
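A minimal illustration of the replacement:

```python
from datetime import datetime, timezone

# datetime.utcnow() returns a naive datetime (tzinfo=None) and is deprecated
# since Python 3.12; datetime.now(timezone.utc) returns an aware one that
# compares and serializes unambiguously.
aware = datetime.now(timezone.utc)
assert aware.tzinfo is timezone.utc
assert aware.utcoffset().total_seconds() == 0
```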


413-414: Duplicate field definition: state_id is defined twice.

This bug was previously flagged. Remove Line 414 to fix.


853-853: Unused parameter entity_role.

This was previously flagged. Either remove the parameter or use it to differentiate address field lookups by role (e.g., contributor vs payee address fields).


1001-1002: Broad exception handling returns False silently.

This was previously flagged. Consider catching specific exceptions or logging unexpected errors.


1017-1018: Broad exception handling returns None silently.

This was previously flagged. Consider logging unexpected exceptions at debug level.


1191-1196: SQL injection risk and incorrect session.exec usage.

The SQL injection risk was previously flagged. Additionally, Line 1194 uses session.exec(text(query), params), but SQLModel's exec() method signature doesn't accept positional parameters this way. This will likely raise a TypeError or ignore the parameters.

Use SQLModel's query builder as suggested in the previous review, or fix the raw SQL approach:

-                if len(conditions) >= 2:
-                    query = f"SELECT * FROM unified_addresses WHERE {' AND '.join(conditions)} LIMIT 1"
-                    result = session.exec(text(query), params).first()
-                    if result:
-                        return UnifiedAddress(**dict(result))
+                if len(params) >= 2:
+                    stmt = select(UnifiedAddress)
+                    if params.get("street_1"):
+                        stmt = stmt.where(UnifiedAddress.street_1 == params["street_1"])
+                    if params.get("city"):
+                        stmt = stmt.where(UnifiedAddress.city == params["city"])
+                    if params.get("state"):
+                        stmt = stmt.where(UnifiedAddress.state == params["state"])
+                    if params.get("zip_code"):
+                        stmt = stmt.where(UnifiedAddress.zip_code == params["zip_code"])
+                    return session.exec(stmt).first()
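If raw SQL must stay, the key is to keep the statement constant and bind the values. The same principle shown with stdlib sqlite3 (SQLAlchemy's text() clause accepts named bind parameters analogously, e.g. :street_1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unified_addresses (street_1 TEXT, city TEXT)")
conn.execute("INSERT INTO unified_addresses VALUES ('1 Main St', 'Austin')")

# Bound parameters are matched as data, never parsed as SQL, so a classic
# injection payload finds nothing.
payload = "x' OR '1'='1"
rows = conn.execute(
    "SELECT * FROM unified_addresses WHERE street_1 = ? LIMIT 1",
    (payload,),
).fetchall()
assert rows == []

rows = conn.execute(
    "SELECT * FROM unified_addresses WHERE street_1 = ? AND city = ? LIMIT 1",
    ("1 Main St", "Austin"),
).fetchall()
assert rows == [("1 Main St", "Austin")]
```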
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8037b27 and ff9a994.

📒 Files selected for processing (2)
  • app/states/unified_field_library.py (1 hunks)
  • app/states/unified_sqlmodels.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/states/unified_sqlmodels.py (2)
app/states/unified_field_library.py (1)
  • get_state_mappings (380-382)
app/states/unified_database.py (1)
  • get_session (77-79)
🪛 Ruff (0.14.7)
app/states/unified_field_library.py

8-8: typing.Dict is deprecated, use dict instead

(UP035)


8-8: typing.List is deprecated, use list instead

(UP035)


8-8: typing.Set is deprecated, use set instead

(UP035)


8-8: typing.Tuple is deprecated, use tuple instead

(UP035)


108-108: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


115-115: Missing return type annotation for private function _initialize_unified_fields

Add return type annotation: None

(ANN202)


296-296: Missing return type annotation for private function _initialize_state_mappings

Add return type annotation: None

(ANN202)


465-465: Unnecessary mode argument

Remove mode argument

(UP015)

app/states/unified_sqlmodels.py

8-8: typing.List is deprecated, use list instead

(UP035)


8-8: typing.Dict is deprecated, use dict instead

(UP035)


136-136: Unnecessary UTF-8 encoding argument to encode

Remove unnecessary encoding argument

(UP012)


164-164: Missing return type annotation for special method __post_init__

(ANN204)


227-227: Missing return type annotation for special method __post_init__

(ANN204)


288-288: Missing return type annotation for special method __post_init__

(ANN204)


414-414: Class field state_id is defined multiple times

Remove duplicate field definition for state_id

(PIE794)


747-747: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


853-853: Unused method argument: entity_role

(ARG002)


953-953: Dynamically typed expressions (typing.Any) are disallowed in _get_field_value

(ANN401)


1000-1000: Consider moving this statement to an else block

(TRY300)


1001-1001: Do not catch blind exception: Exception

(BLE001)


1017-1017: Do not catch blind exception: Exception

(BLE001)


1046-1046: Do not catch blind exception: Exception

(BLE001)


1104-1104: Do not catch blind exception: Exception

(BLE001)


1193-1193: Possible SQL injection vector through string-based query construction

(S608)


1199-1199: Do not catch blind exception: Exception

(BLE001)


1211-1211: Do not catch blind exception: Exception

(BLE001)


1214-1214: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1228-1228: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1238-1238: Naive datetime constructed using datetime.datetime.strptime() without %z

(DTZ007)


1244-1244: Consider moving this statement to an else block

(TRY300)


1248-1248: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


1303-1303: Dynamically typed expressions (typing.Any) are disallowed in obj

(ANN401)


1303-1303: Dynamically typed expressions (typing.Any) are disallowed in _json_default

(ANN401)


1314-1314: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)

🔇 Additional comments (13)
app/states/unified_field_library.py (3)

15-77: LGTM!

The enum definitions are comprehensive and well-organized with clear semantic groupings for campaign finance data categories and field types.


80-99: LGTM!

The dataclass definitions are well-designed with appropriate defaults. The confidence field in StateFieldMapping is a thoughtful addition for handling approximate field mappings.


296-374: LGTM!

The state-specific field mappings are comprehensive and well-structured. The varying confidence scores (e.g., 0.9 for receivedDt → filed_date, 0.8 for Purpose → description) appropriately indicate mapping uncertainty.

app/states/unified_sqlmodels.py (10)

23-99: LGTM!

The enum definitions are comprehensive and well-organized, covering all necessary transaction types, person types, roles, and association types for campaign finance data modeling.


140-189: LGTM!

The UnifiedAddress model is well-structured with appropriate field types and normalization in __post_init__. Note: The datetime.utcnow deprecation applies to multiple models throughout this file (Lines 157-158, 214-215, 278-279, etc.)—consider a bulk update if addressing the deprecation.


438-546: LGTM!

The version/history tables follow a consistent pattern for audit trails with JSON snapshots. The UnifiedCommitteePerson junction table properly models committee-person relationships with role information and change tracking.


548-622: LGTM!

The entity association and campaign models are well-designed with proper relationship definitions. The UnifiedEntityAssociation pattern effectively models various relationship types (treasurer-of, vendor-for, etc.) between entities.


625-688: LGTM!

The UnifiedContribution and UnifiedLoan models properly normalize specialized transaction data with appropriate entity relationships and unique constraints on transaction_id.


741-786: LGTM!

The UnifiedSQLModelBuilder initialization and build_transaction method are well-implemented. Storing raw data as JSON provides valuable debugging/audit capability.


788-851: LGTM!

The build_person method has good logic for detecting person types, including handling special placeholder cases like "NON-ITEMIZED CONTRIBUTOR" and "ANONYMOUS".


885-951: LGTM!

The build_committee method has sensible fallback logic for handling committees without names, using candidate name or filer_id as alternatives.


1264-1301: LGTM!

The _determine_transaction_type method has good logic for inferring transaction types from both explicit type fields and field name patterns, with appropriate fallbacks.


1309-1477: LGTM!

The UnifiedSQLDataProcessor effectively orchestrates the building of complete transaction graphs with related entities (persons, committees, campaigns) and specialized financial records (contributions, loans). The process_record method properly creates all necessary junction table entries.

across different states, enabling consistent data processing and analysis.
"""

from typing import Dict, List, Set, Optional, Any, Tuple

🧹 Nitpick | 🔵 Trivial

Consider using built-in generic types instead of typing imports.

Python 3.9+ supports dict, list, set, and tuple as generic types directly, making typing.Dict, typing.List, etc. unnecessary.

-from typing import Dict, List, Set, Optional, Any, Tuple
+from typing import Optional, Any

Then update type hints throughout the file (e.g., Dict[str, FieldDefinition] → dict[str, FieldDefinition]).

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.7)

8-8: typing.Dict is deprecated, use dict instead

(UP035)


8-8: typing.List is deprecated, use list instead

(UP035)


8-8: typing.Set is deprecated, use set instead

(UP035)


8-8: typing.Tuple is deprecated, use tuple instead

(UP035)
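The migration above is mechanical under PEP 585: the built-in containers are themselves generic on Python 3.9+. A small sketch with a stubbed `FieldDefinition` (a stand-in for illustration, not the project's class):

```python
# Built-in generics (PEP 585, Python 3.9+): no typing.Dict/List/Tuple needed.
from dataclasses import dataclass, field
from typing import Any, Optional  # Optional and Any still come from typing


@dataclass
class FieldDefinition:  # minimal stand-in for illustration
    name: str
    examples: list[str] = field(default_factory=list)
    validation_rules: dict[str, Any] = field(default_factory=dict)


# dict[str, FieldDefinition] replaces typing.Dict[str, FieldDefinition]
unified_fields: dict[str, FieldDefinition] = {
    "committee_name": FieldDefinition(name="committee_name", examples=["filerName"]),
}


def lookup(fields: dict[str, FieldDefinition], key: str) -> Optional[FieldDefinition]:
    return fields.get(key)
```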

Comment on lines +250 to +268
        # Committee fields
        self.unified_fields.update({
            "committee_name": FieldDefinition(
                name="committee_name",
                category=FieldCategory.COMMITTEE_NAME,
                field_type=FieldType.STRING,
                description="Name of the political committee",
                examples=["filerName", "Committee Name"],
                validation_rules={"max_length": 200}
            ),
            "committee_type": FieldDefinition(
                name="committee_type",
                category=FieldCategory.COMMITTEE_TYPE,
                field_type=FieldType.CODE,
                description="Type of political committee",
                examples=["filerTypeCd", "Committee Type"],
                validation_rules={"enum_values": ["candidate", "pac", "party", "other"]}
            ),
        })

⚠️ Potential issue | 🟠 Major

Missing committee_filer_id unified field definition.

The state mappings reference committee_filer_id (Lines 332, 369), but this unified field is not defined in _initialize_unified_fields. This will cause get_unified_field("committee_filer_id") to return None and may break field lookups.

Add the missing field definition:

         "committee_type": FieldDefinition(
             name="committee_type",
             category=FieldCategory.COMMITTEE_TYPE,
             field_type=FieldType.CODE,
             description="Type of political committee",
             examples=["filerTypeCd", "Committee Type"],
             validation_rules={"enum_values": ["candidate", "pac", "party", "other"]}
         ),
+        "committee_filer_id": FieldDefinition(
+            name="committee_filer_id",
+            category=FieldCategory.COMMITTEE_IDENTIFICATION,
+            field_type=FieldType.IDENTIFIER,
+            description="Unique identifier for the committee from the state filing system",
+            examples=["filerIdent", "Org ID"],
+            validation_rules={"required": False}
+        ),
     })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        # Committee fields
        self.unified_fields.update({
            "committee_name": FieldDefinition(
                name="committee_name",
                category=FieldCategory.COMMITTEE_NAME,
                field_type=FieldType.STRING,
                description="Name of the political committee",
                examples=["filerName", "Committee Name"],
                validation_rules={"max_length": 200}
            ),
            "committee_type": FieldDefinition(
                name="committee_type",
                category=FieldCategory.COMMITTEE_TYPE,
                field_type=FieldType.CODE,
                description="Type of political committee",
                examples=["filerTypeCd", "Committee Type"],
                validation_rules={"enum_values": ["candidate", "pac", "party", "other"]}
            ),
        })
        # Committee fields
        self.unified_fields.update({
            "committee_name": FieldDefinition(
                name="committee_name",
                category=FieldCategory.COMMITTEE_NAME,
                field_type=FieldType.STRING,
                description="Name of the political committee",
                examples=["filerName", "Committee Name"],
                validation_rules={"max_length": 200}
            ),
            "committee_type": FieldDefinition(
                name="committee_type",
                category=FieldCategory.COMMITTEE_TYPE,
                field_type=FieldType.CODE,
                description="Type of political committee",
                examples=["filerTypeCd", "Committee Type"],
                validation_rules={"enum_values": ["candidate", "pac", "party", "other"]}
            ),
            "committee_filer_id": FieldDefinition(
                name="committee_filer_id",
                category=FieldCategory.COMMITTEE_IDENTIFICATION,
                field_type=FieldType.IDENTIFIER,
                description="Unique identifier for the committee from the state filing system",
                examples=["filerIdent", "Org ID"],
                validation_rules={"required": False}
            ),
        })
🤖 Prompt for AI Agents
In app/states/unified_field_library.py around lines 250 to 268, add a unified
field definition for "committee_filer_id" because state mappings reference it
but it's missing; create a FieldDefinition with name="committee_filer_id", an
appropriate category (e.g., FieldCategory.COMMITTEE_ID or COMMITTEE_FILER_ID),
field_type=FieldType.STRING, a short description like "Unique filer identifier
for the committee", useful examples (e.g., ["filerId", "committeeFilerId"]), and
validation_rules such as {"max_length": 100} (and an optional pattern if your
IDs follow a specific regex); insert this into the
self.unified_fields.update({...}) block alongside the other committee fields so
get_unified_field("committee_filer_id") returns the new definition.

Comment on lines +463 to +466
    def import_mappings(self, file_path: Path):
        """Import mappings from a JSON file"""
        with open(file_path, 'r') as f:
            import_data = json.load(f)

🧹 Nitpick | 🔵 Trivial

Minor: Unnecessary mode argument.

The 'r' mode is the default for open() and can be omitted.

-        with open(file_path, 'r') as f:
+        with open(file_path) as f:
             import_data = json.load(f)
📝 Committable suggestion


Suggested change
    def import_mappings(self, file_path: Path):
        """Import mappings from a JSON file"""
        with open(file_path, 'r') as f:
            import_data = json.load(f)
    def import_mappings(self, file_path: Path):
        """Import mappings from a JSON file"""
        with open(file_path) as f:
            import_data = json.load(f)
🧰 Tools
🪛 Ruff (0.14.7)

465-465: Unnecessary mode argument

Remove mode argument

(UP015)

🤖 Prompt for AI Agents
In app/states/unified_field_library.py around lines 463 to 466, the
open(file_path, 'r') call uses the explicit 'r' mode which is unnecessary;
remove the mode argument (use open(file_path) or preferably file_path.open()) or
even simpler use file_path.read_text() and then json.loads to read the JSON, so
the function reads the file without the redundant 'r' argument.

    """Normalized file origin references for ingested data."""
    __tablename__ = "file_origins"

    id: str = Field(default=None, primary_key=True, max_length=64)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for FileOrigin instantiation patterns
rg -n "FileOrigin\(" --type py -A 3 -B 1

Repository: Abstract-Data/campaignfinance-2023

Length of output: 750


🏁 Script executed:

# Look for the build_key method implementation
rg -n "def build_key" --type py -A 10

Repository: Abstract-Data/campaignfinance-2023

Length of output: 928


🏁 Script executed:

# Search for how FileOrigin instances are created in tests or usage
rg -n "FileOrigin\s*=" --type py -A 2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


FileOrigin.id has default=None but is a primary key—remove the default or use default_factory.

All instantiations in the codebase explicitly set id via the build_key() result (e.g., FileOrigin(id=key, ...)), so the default is never actually used. However, defining default=None on a primary key field is poor practice and confusing. Either remove the default entirely or use default_factory to compute the key automatically if instances are created without an id.

🤖 Prompt for AI Agents
In app/states/unified_sqlmodels.py around line 126, the FileOrigin.id field is
declared as a primary_key with default=None which is confusing and poor
practice; remove the default=None so callers must supply an id, or replace it
with a default_factory that generates the key (e.g., call the existing build_key
function) if you want automatic ids—pick one approach, update the Field
declaration accordingly, and ensure any tests/instantiations that relied on
implicit defaults still pass.
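One way to realize the automatic-id option is to compute the key from the record's own fields. Since a `default_factory` cannot see the other field values, the sketch below uses `__post_init__` instead; the sha256-over-path-and-size scheme is an assumption for illustration, and the project's actual `build_key()` may use different inputs:

```python
# Sketch: deterministic primary-key generation when the caller omits an id.
# The hashing scheme here is assumed for illustration only.
import hashlib
from dataclasses import dataclass


def build_key(path: str, size: int) -> str:
    # sha256 hexdigest is exactly 64 chars, matching max_length=64
    return hashlib.sha256(f"{path}:{size}".encode()).hexdigest()


@dataclass
class FileOrigin:
    path: str
    size: int
    id: str = ""

    def __post_init__(self) -> None:
        # Compute the key when the caller did not supply one,
        # instead of carrying a default=None primary key.
        if not self.id:
            self.id = build_key(self.path, self.size)


origin = FileOrigin(path="data/tx/contribs.csv", size=1024)
```

The same id is produced for the same inputs, so re-ingesting a file maps to the same row; an explicitly supplied `id` is left untouched.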

    updated_at: datetime = Field(default_factory=datetime.utcnow)

    # Relationships
    address: Optional[UnifiedAddress] = Relationship()

⚠️ Potential issue | 🟡 Minor

Missing back_populates on UnifiedCommittee.address relationship.

Unlike other address relationships (e.g., UnifiedPerson.address, UnifiedEntity.address), this relationship doesn't specify back_populates. This inconsistency may cause bidirectional relationship sync issues.

-    address: Optional[UnifiedAddress] = Relationship()
+    address: Optional[UnifiedAddress] = Relationship(back_populates="committees")

Note: You'll also need to add a corresponding relationship on UnifiedAddress:

# In UnifiedAddress class
committees: List["UnifiedCommittee"] = Relationship(back_populates="address")
🤖 Prompt for AI Agents
In app/states/unified_sqlmodels.py around line 282, the UnifiedCommittee.address
Relationship is missing back_populates which breaks bidirectional sync; update
UnifiedCommittee.address to include back_populates="committees" and add the
corresponding Relationship on the UnifiedAddress model (e.g., committees:
List["UnifiedCommittee"] = Relationship(back_populates="address") ) so both
sides reference each other consistently.

Comment on lines +690 to +739
# Database indexes for performance
class UnifiedTransactionIndexes:
    """Database indexes for the unified transaction system"""

    # Transaction indexes
    idx_transactions_state = Index("idx_transactions_state", UnifiedTransaction.state_id)
    idx_transactions_type = Index("idx_transactions_type", UnifiedTransaction.transaction_type)
    idx_transactions_date = Index("idx_transactions_date", UnifiedTransaction.transaction_date)
    idx_transactions_amount = Index("idx_transactions_amount", UnifiedTransaction.amount)
    idx_transactions_committee = Index("idx_transactions_committee", UnifiedTransaction.committee_id)
    idx_transactions_id = Index("idx_transactions_id", UnifiedTransaction.transaction_id)
    idx_transactions_file_origin = Index("idx_transactions_file_origin", UnifiedTransaction.file_origin_id)

    # Person indexes
    idx_persons_name = Index("idx_persons_name", UnifiedPerson.last_name, UnifiedPerson.first_name)
    idx_persons_organization = Index("idx_persons_organization", UnifiedPerson.organization)
    idx_persons_type = Index("idx_persons_type", UnifiedPerson.person_type)

    # Transaction-Person relationship indexes
    idx_transaction_persons_role = Index("idx_transaction_persons_role", UnifiedTransactionPerson.role)
    idx_transaction_persons_transaction = Index("idx_transaction_persons_transaction", UnifiedTransactionPerson.transaction_id)
    idx_transaction_persons_person = Index("idx_transaction_persons_person", UnifiedTransactionPerson.person_id)

    # Address indexes
    idx_addresses_state = Index("idx_addresses_state", UnifiedAddress.state)
    idx_addresses_city = Index("idx_addresses_city", UnifiedAddress.city)

    # Committee indexes
    idx_committees_name = Index("idx_committees_name", UnifiedCommittee.name)
    idx_committees_type = Index("idx_committees_type", UnifiedCommittee.committee_type)
    # Note: filer_id is already indexed as primary key

    # Entity indexes
    idx_entities_type = Index("idx_entities_type", UnifiedEntity.entity_type)
    idx_entities_name = Index("idx_entities_name", UnifiedEntity.normalized_name)

    # Campaign indexes
    idx_campaigns_year = Index("idx_campaigns_year", UnifiedCampaign.election_year)
    idx_campaigns_office = Index("idx_campaigns_office", UnifiedCampaign.office_sought)
    idx_campaigns_name = Index("idx_campaigns_name", UnifiedCampaign.normalized_name)
    idx_campaign_entity_role = Index("idx_campaign_entity_role", UnifiedCampaignEntity.role)

    # Contribution indexes
    idx_contributions_date = Index("idx_contributions_date", UnifiedContribution.receipt_date)
    idx_contributions_amount = Index("idx_contributions_amount", UnifiedContribution.amount)

    # Loan indexes
    idx_loans_date = Index("idx_loans_date", UnifiedLoan.loan_date)
    idx_loans_due_date = Index("idx_loans_due_date", UnifiedLoan.due_date)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for references to UnifiedTransactionIndexes and its indexes
rg -n "UnifiedTransactionIndexes|idx_transactions_|idx_persons_|idx_addresses_|idx_committees_" --type py -B 2 -A 2

Repository: Abstract-Data/campaignfinance-2023

Length of output: 2773


🏁 Script executed:

# Check if indexes are used in __table_args__ of model classes
rg -n "__table_args__" --type py -A 5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 17028


🏁 Script executed:

# Look for migration files or initialization code that might apply these indexes
fd -e py -e sql | xargs grep -l "idx_" 2>/dev/null | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 109


🏁 Script executed:

# Check the import statements and usage of Index from SQLAlchemy
rg -n "from sqlalchemy|from sqlmodel" app/states/unified_sqlmodels.py -A 3

Repository: Abstract-Data/campaignfinance-2023

Length of output: 498


🏁 Script executed:

# Search for Index usage within __table_args__
rg -n "__table_args__.*Index|Index.*__table_args__" --type py -B 5 -A 5

Repository: Abstract-Data/campaignfinance-2023

Length of output: 60


🏁 Script executed:

# Check if there are any migration files
fd -e py -path "*/migrations/*" | head -20

Repository: Abstract-Data/campaignfinance-2023

Length of output: 246


🏁 Script executed:

# Look for alembic or db initialization files
fd -e py | xargs grep -l "alembic\|migrate\|create_all\|metadata" 2>/dev/null | head -10

Repository: Abstract-Data/campaignfinance-2023

Length of output: 302


🏁 Script executed:

# Check unified_database.py for index-related code
cat -n app/states/unified_database.py | head -100

Repository: Abstract-Data/campaignfinance-2023

Length of output: 4198


🏁 Script executed:

# Check recreate_tables.py
cat -n recreate_tables.py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 5343


🏁 Script executed:

# Search for any reference to UnifiedTransactionIndexes specifically
rg -n "UnifiedTransactionIndexes" --type py

Repository: Abstract-Data/campaignfinance-2023

Length of output: 146


🏁 Script executed:

# Check if any model class has __table_args__ with Index to confirm pattern
rg -n "class Unified" app/states/unified_sqlmodels.py -A 10 | head -50

Repository: Abstract-Data/campaignfinance-2023

Length of output: 2934


🏁 Script executed:

# Verify the specific model definitions that need indexes
rg -n "class UnifiedTransaction\(|class UnifiedPerson\(|class UnifiedAddress\(" app/states/unified_sqlmodels.py -A 8

Repository: Abstract-Data/campaignfinance-2023

Length of output: 1434


Index definitions in UnifiedTransactionIndexes won't be automatically applied to the database.

The UnifiedTransactionIndexes class defines indexes as class attributes, but SQLModel/SQLAlchemy doesn't automatically apply indexes defined this way. These Index objects need to be registered with each model's __table_args__. Currently, the models use individual index=True flags on fields but don't include composite or additional indexes via __table_args__.

Move index definitions to the respective model classes. For example, on UnifiedTransaction:

class UnifiedTransaction(SQLModel, table=True):
    __tablename__ = "unified_transactions"
    __table_args__ = (
        Index("idx_transactions_state", "state_id"),
        Index("idx_transactions_type", "transaction_type"),
        Index("idx_transactions_date", "transaction_date"),
        # ... other indexes
    )

Repeat for UnifiedPerson, UnifiedAddress, UnifiedCommittee, and other models with indexes defined in UnifiedTransactionIndexes.

Comment on lines +1214 to +1226
    def _parse_amount(self, value: Any) -> Optional[Decimal]:
        """Parse an amount value to Decimal"""
        if value is None:
            return None

        try:
            # Remove currency symbols and commas
            if isinstance(value, str):
                value = re.sub(r'[^\d.-]', '', value)

            return Decimal(str(value))
        except (ValueError, TypeError):
            return None

⚠️ Potential issue | 🟡 Minor

Amount parsing may produce invalid results.

The regex on Line 1222 removes all characters except digits, ., and -, but doesn't validate the result. Inputs like "$1.2.3" or "--100" would produce invalid decimals that could raise InvalidOperation when converting to Decimal.

Consider adding validation after stripping:

     def _parse_amount(self, value: Any) -> Optional[Decimal]:
         """Parse an amount value to Decimal"""
         if value is None:
             return None
         
         try:
             # Remove currency symbols and commas
             if isinstance(value, str):
                 value = re.sub(r'[^\d.-]', '', value)
+                # Validate the cleaned string has valid decimal format
+                if not re.match(r'^-?\d*\.?\d+$', value):
+                    return None
             
             return Decimal(str(value))
-        except (ValueError, TypeError):
+        except (ValueError, TypeError, InvalidOperation):
             return None

Note: Import InvalidOperation from decimal module.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.7)

1214-1214: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)

🤖 Prompt for AI Agents
In app/states/unified_sqlmodels.py around lines 1214 to 1226, the current
_parse_amount strips non-numeric characters but doesn't validate the cleaned
string so inputs like "$1.2.3" or "--100" can produce InvalidOperation; after
stripping, validate the result with a regex that only allows an optional single
leading '-' and at most one decimal point (e.g. ^-?\d+(\.\d+)?$), ensure the
cleaned string is non-empty, then convert to Decimal while catching
decimal.InvalidOperation (import InvalidOperation) and return None on any
validation or conversion failure.
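The hardened parser suggested above can be sketched as a standalone helper (the function name mirrors the review comment, not the exact project code; a stricter regex could also accept leading-dot forms like ".5"):

```python
# Sketch: strip currency formatting, then validate before calling Decimal,
# so malformed leftovers like "1.2.3" or "--100" return None instead of raising.
import re
from decimal import Decimal, InvalidOperation
from typing import Any, Optional

# Optional single leading '-', digits, at most one decimal point
_AMOUNT_RE = re.compile(r'^-?\d+(\.\d+)?$')


def parse_amount(value: Any) -> Optional[Decimal]:
    if value is None:
        return None
    if isinstance(value, str):
        value = re.sub(r'[^\d.-]', '', value)
        if not _AMOUNT_RE.match(value):
            return None
    try:
        return Decimal(str(value))
    except (ValueError, TypeError, InvalidOperation):
        return None
```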

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff9a994 and 26ea85c.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/ci.yml

[warning] 3-3: truthy value should be one of [false, true]

(truthy)


[error] 35-35: too many blank lines (1 > 0)

(empty-lines)

🔇 Additional comments (3)
.github/workflows/ci.yml (3)

17-17: Verify Python 3.12 dependency compatibility.

The workflow specifies Python 3.12. Given the extensive dependency updates in this PR (Parquet support, Polars, Rich, SQLModel, etc.), please verify that all project dependencies are compatible with Python 3.12. If you haven't tested against 3.12 locally, consider doing so before merging.


24-25: Confirm CODECOV_TOKEN secret is configured.

The workflow uploads coverage to Codecov using ${{ secrets.CODECOV_TOKEN }}. Ensure this secret is already configured in the repository settings; otherwise, the upload step will fail silently or error out at runtime.


1-35: LGTM—solid CI foundation.

The workflow provides a clean, modern CI setup that aligns well with the PR's test infrastructure additions:

  • Modern Python 3.12 with uv package manager
  • Pytest with coverage reporting
  • Codecov integration for tracking coverage over time
  • Triggers on both push and PR events for comprehensive testing

Pending verification of Python 3.12 compatibility and Codecov secret setup, this is ready to go.

Comment on lines +3 to +5
on:
  push:
  pull_request:

🧹 Nitpick | 🔵 Trivial

Consider explicit trigger configuration for YAML linting.

The push: and pull_request: events without explicit configuration are valid GitHub Actions syntax and will trigger on all push/PR events. However, YAMLlint prefers explicit values. This is a minor style note—the current syntax works fine, but if you want to satisfy stricter linting, consider using explicit empty dicts or null:

 on:
-  push:
-  pull_request:
+  push: {}
+  pull_request: {}
📝 Committable suggestion


Suggested change
on:
  push:
  pull_request:
on:
  push: {}
  pull_request: {}
🧰 Tools
🪛 YAMLlint (1.37.1)

[warning] 3-3: truthy value should be one of [false, true]

(truthy)

🤖 Prompt for AI Agents
.github/workflows/ci.yml lines 3-5: the workflow triggers use bare "push:" and
"pull_request:" which are valid but fail stricter YAML lint rules; change each
to an explicit empty mapping or null (e.g., "push: {}" and "pull_request: {}" or
"push: null" and "pull_request: null") so the YAML linter accepts the syntax
while preserving the same trigger behavior.

        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          slug: Abstract-Data/campaignfinance-2023


🧹 Nitpick | 🔵 Trivial

Remove trailing blank line.

YAMLlint flags the trailing blank line after line 34. Apply this cleanup:

          slug: Abstract-Data/campaignfinance-2023
-
📝 Committable suggestion


Suggested change
slug: Abstract-Data/campaignfinance-2023
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 35-35: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around line 35, there is a trailing blank line
flagged by YAMLlint; remove the empty line at the end of the file so the file
ends on line 34 without an extra newline-only line, then save the file (ensure
the file still ends with a single newline character if your editor enforces it).

@sentry

sentry bot commented Dec 4, 2025

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 1 | 1 | 0 | 0 |
View the top 1 failed test(s) by shortest run time
app/tests/test_ingest_hypothesis.py::app.tests.test_ingest_hypothesis
Stack Traces | 0s run time
ImportError while importing test module '.../app/tests/test_ingest_hypothesis.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../..../uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app/tests/test_ingest_hypothesis.py:20: in <module>
    from app.ingest.file_reader import FieldSpec, FieldType, GenericFileReader, SchemaDefinition
E   ModuleNotFoundError: No module named 'app.ingest'

To view more test analytics, go to the Prevent Tests Dashboard

Comment on lines +9 to +40
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: |
          uv pip install --system .
          uv pip install --system pytest pytest-cov

      - name: Run tests with coverage
        run: pytest --cov --cov-branch --cov-report=xml --junitxml=junit.xml -o junit_family=legacy

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          slug: Abstract-Data/campaignfinance-2023

      - name: Upload test results to Codecov
        if: ${{ !cancelled() }}
        uses: codecov/test-results-action@v1
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

The recommended fix is to explicitly set a permissions block for the workflow or for the individual job(s). In this case, since there’s only one job and the workflow’s steps do not require write access to repository contents or other privileged scopes, the minimal permission should be set: contents: read. This should be added at the root level of the workflow file (just below name: and before on:), to apply to all jobs. No other imports, method changes, or additional code definitions are needed; only a single YAML key addition to .github/workflows/ci.yml.

Suggested changeset 1
.github/workflows/ci.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1,4 +1,6 @@
 name: CI
+permissions:
+  contents: read
 
 on:
   push:
EOF
