Implement SQL-based document search matching MATLAB's database behavior #17

Merged: stevevanhooser merged 3 commits into main from claude/sync-did-sqlite-behavior-KZgb8 on Mar 15, 2026
@stevevanhooser

Summary

This PR implements SQL-based document search for the SQLite backend, matching the behavior of MATLAB's did.implementations.database.m. It flattens documents via doc2sql, indexes their fields, and generates SQL queries for the supported search operations.

Key Changes

SQLiteDB (src/did/implementations/sqlitedb.py)

  • Added _sqlite_regexp() function to support regex operations in SQLite
  • Added _sql_escape() helper for safe SQL string literal escaping
  • Implemented _get_field_idx() to manage field indexing with caching, converting triple-underscores to dots per MATLAB convention
  • Implemented _populate_doc_data() to flatten documents via doc2sql and insert into fields and doc_data tables
  • Updated _do_add_doc() to properly populate field data instead of placeholder logic
  • Implemented search() method as the main entry point for SQL-based searches
  • Implemented _search_doc_ids() for recursive search structure traversal:
    • Handles struct arrays (AND operations)
    • Handles OR operations with union semantics
    • Delegates leaf queries to SQL generation
    • Supports negation via ~ prefix
  • Implemented _query_struct_to_sql_str() to convert query structures to SQL WHERE clauses, supporting:
    • String operations: exact_string, exact_string_anycase, contains_string, regexp
    • Numeric operations: exact_number, lessthan, lessthaneq, greaterthan, greaterthaneq
    • Field operations: hasfield, isa, depends_on
    • Returns None for unsupported operations so the caller can fall back to brute-force search
  • Implemented _brute_force_search() fallback using field_search for unsupported SQL operations
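
The following is a minimal sketch of how a REGEXP function, literal escaping, and leaf-query SQL generation can fit together with Python's sqlite3 module. It is not the code from this PR: the function names (sqlite_regexp, sql_escape, leaf_to_where, matching_docs), the three-column doc_data layout, and the exact SQL emitted per operation are assumptions made for illustration.

```python
import re
import sqlite3


def sqlite_regexp(pattern, value):
    """Return True when `pattern` matches anywhere in `value`; NULL never matches."""
    if value is None:
        return False
    return re.search(pattern, str(value)) is not None


def sql_escape(text):
    """Escape a string for use inside a single-quoted SQL literal."""
    return str(text).replace("'", "''")


def leaf_to_where(field_idx, operation, param):
    """Build a WHERE clause for one leaf query, or None if unsupported.

    Only a few operations are sketched here; the clause shapes are assumed.
    """
    prefix = f"field_idx = {int(field_idx)} AND "
    if operation == "exact_string":
        return prefix + f"value = '{sql_escape(param)}'"
    if operation == "contains_string":
        return prefix + f"instr(value, '{sql_escape(param)}') > 0"
    if operation == "regexp":
        return prefix + f"value REGEXP '{sql_escape(param)}'"
    if operation == "lessthan":
        return prefix + f"CAST(value AS REAL) < {float(param)}"
    return None  # caller is expected to fall back to a brute-force search


# Hypothetical, simplified doc_data table for the demonstration.
conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` by calling the user function regexp(Y, X).
conn.create_function("REGEXP", 2, sqlite_regexp)
conn.execute("CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO doc_data VALUES (?, ?, ?)",
    [(1, 7, "O'Brien lab"), (2, 7, "Smith lab"), (3, 7, None)],
)


def matching_docs(where):
    """Run a WHERE clause against doc_data and return sorted document indexes."""
    sql = f"SELECT DISTINCT doc_idx FROM doc_data WHERE {where} ORDER BY doc_idx"
    return [row[0] for row in conn.execute(sql)]
```

For example, matching_docs(leaf_to_where(7, "regexp", "^O'Brien")) selects only document 1, and the embedded apostrophe survives because sql_escape doubles it. An operation outside the sketched set yields None, which is the signal to use the slower fallback path.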

doc2sql (src/did/implementations/doc2sql.py)

  • Added _get_class_name() to extract class names from both DID-python and NDI/MATLAB document formats
  • Added _get_superclass_str() to extract and format superclass strings matching MATLAB's comma-space separated, sorted format
  • Added _serialize_depends_on() to serialize dependency information in MATLAB's format (name,value;name,value;)
  • Added _flatten_dict() to flatten nested dictionaries using triple-underscore separators
  • Added _get_meta_table_from() to create meta-tables from field groups
  • Enhanced doc_to_sql() to properly build meta-tables with:
    • Standard meta table with doc_id, class, superclass, datestamp, creation, deletion, depends_on
    • Per-group tables for all top-level dict fields (excluding skip list)
    • Proper handling of both DID-python and MATLAB document schemas
  • Improved get_field() to handle empty strings and IndexError exceptions
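
The flattening and serialization conventions listed above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the function names and the input shapes (a nested dict for a field group, a list of name/value dicts for dependencies, a list of class names for superclasses) are hypothetical.

```python
def flatten_dict(d, prefix=""):
    """Flatten nested dicts, joining nested keys with triple underscores."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}___{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dict(value, name))
        else:
            flat[name] = value
    return flat


def column_to_field_name(group, column):
    """Convert a flattened column name to the {group}.{field} form."""
    return f"{group}.{column.replace('___', '.')}"


def serialize_depends_on(deps):
    """Serialize dependencies as name,value;name,value; (trailing semicolon kept)."""
    return "".join(f"{dep['name']},{dep['value']};" for dep in deps)


def superclass_str(names):
    """Join superclass names sorted and separated by a comma and a space."""
    return ", ".join(sorted(names))
```

With these helpers, flatten_dict({"session": {"name": "s1"}}) produces the column session___name, and column_to_field_name("base", "session___name") turns it into base.session.name.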

Query (src/did/query.py)

  • Implemented to_search_structure() to resolve high-level operations
  • Added _resolve_search_structure() for recursive resolution of search structures
  • Added _resolve_single() to handle operation-specific resolution:
    • isa: kept unresolved to work with both field_search and SQL paths
    • depends_on: converted to hasanysubfield_exact_string with proper parameter handling
    • or: recursively resolves sub-structures
    • Supports negation prefix (~) throughout
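
A rough sketch of the resolution step described above is shown below. The search-structure layout (dicts with field, operation, param1, and param2 keys) and the way depends_on parameters map onto hasanysubfield_exact_string are assumptions for illustration; the PR text does not spell them out.

```python
def resolve_single(item):
    """Resolve one search element, preserving a leading ~ negation prefix."""
    op = item["operation"]
    negated = op.startswith("~")
    base_op = op[1:] if negated else op
    prefix = "~" if negated else ""

    if base_op == "or":
        # Both branches are themselves search structures; resolve them recursively.
        return {
            "field": item.get("field", ""),
            "operation": prefix + "or",
            "param1": resolve(item["param1"]),
            "param2": resolve(item["param2"]),
        }
    if base_op == "depends_on":
        # Assumed mapping: match the dependency's name and value subfields.
        return {
            "field": "depends_on",
            "operation": prefix + "hasanysubfield_exact_string",
            "param1": ["name", "value"],
            "param2": [item["param1"], item["param2"]],
        }
    # isa and all low-level operations pass through unchanged.
    return dict(item)


def resolve(structure):
    """Resolve a search structure: a list (AND) or a single element."""
    if isinstance(structure, list):
        return [resolve_single(element) for element in structure]
    return resolve_single(structure)
```

Leaving isa untouched at this stage means the same resolved structure can be handed either to an in-memory matcher or to a SQL generator, each of which interprets isa in its own way.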

Notable Implementation Details

  • Field names in the database use {group}.{field} format matching MATLAB convention, with triple-underscores in column names converted to dots
  • Search supports both AND (struct arrays) and OR operations with proper set semantics
  • Negation is handled at each level of the search tree
  • SQL generation includes fallback to brute-force search for unsupported operations
  • Document flattening properly handles both DID-python and MATLAB/NDI document formats
  • Superclass and dependency information is serialized in MATLAB-compatible formats
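
The AND, OR, and negation semantics noted above can be illustrated with plain Python sets. This sketch replaces the SQL leaf queries with an in-memory callback so that it stays self-contained; search_doc_ids, exact_string_leaf, the docs mapping, and the search-structure keys are all hypothetical.

```python
def search_doc_ids(structure, all_ids, leaf):
    """Return the set of document ids matching a search structure.

    A list means AND (intersection), an "or" element means union, and a
    leading ~ on the operation complements the result against all_ids.
    """
    if isinstance(structure, list):
        result = set(all_ids)
        for element in structure:
            result &= search_doc_ids(element, all_ids, leaf)
        return result

    op = structure["operation"]
    negated = op.startswith("~")
    base_op = op[1:] if negated else op

    if base_op == "or":
        matched = search_doc_ids(structure["param1"], all_ids, leaf) | search_doc_ids(
            structure["param2"], all_ids, leaf
        )
    else:
        matched = leaf({**structure, "operation": base_op})
    return set(all_ids) - matched if negated else matched


# Hypothetical flattened documents keyed by id, used only for the demonstration.
docs = {
    "d1": {"base.name": "alpha"},
    "d2": {"base.name": "beta"},
    "d3": {"base.name": "alpha", "base.kind": "x"},
}


def exact_string_leaf(query):
    """Stand-in for a SQL leaf query: exact string match on one field."""
    return {doc_id for doc_id, fields in docs.items() if fields.get(query["field"]) == query["param1"]}


name_alpha = {"field": "base.name", "operation": "exact_string", "param1": "alpha"}
kind_x = {"field": "base.kind", "operation": "exact_string", "param1": "x"}
```

Here search_doc_ids([name_alpha, kind_x], docs, exact_string_leaf) intersects the two leaf results, while wrapping the same two queries in an "or" element takes their union.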

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj

claude added 3 commits March 15, 2026 21:50
…mpatibility

- Update doc2sql.py to properly flatten documents into MATLAB-compatible
  field names (meta.class, meta.superclass, meta.depends_on, etc.)
- Update _do_add_doc in sqlitedb.py to populate fields/doc_data tables
  using doc2sql, with field caching for performance
- Add SQL-based search in SQLiteDB that queries doc_data directly,
  matching MATLAB's query_struct_to_sql_str logic for all operations
  (exact_string, contains_string, regexp, isa, depends_on, numeric
  comparisons, hasfield, negation)
- Update query.py to_search_structure to resolve depends_on into
  hasanysubfield_exact_string for the SQL search path
- Add brute-force fallback for operations not expressible in SQL

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj
- Add lint job with black --check and ruff check on src/ and tests/
- Add Python 3.12 to the test matrix (was only 3.10, 3.11)
- Add workflow_dispatch trigger for manual runs
- Matches CI pattern from NDI-python

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj
- Apply black formatting to all src/ and tests/ files
- Fix bare except clauses (E722) in binarydoc.py and database.py
- Remove unused variable in query.py (F841)
- Replace star import in test_datastructures.py with explicit imports (F403/F405)
- Remove unused imports auto-fixed by ruff (F401)

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj
@stevevanhooser merged commit d50eb10 into main on Mar 15, 2026
4 checks passed
@stevevanhooser deleted the claude/sync-did-sqlite-behavior-KZgb8 branch on March 15, 2026 23:25
2 participants