Implement SQL-based document search matching MATLAB's database behavior #17
Merged
stevevanhooser merged 3 commits into main on Mar 15, 2026
…mpatibility
- Update doc2sql.py to properly flatten documents into MATLAB-compatible field names (meta.class, meta.superclass, meta.depends_on, etc.)
- Update _do_add_doc in sqlitedb.py to populate fields/doc_data tables using doc2sql, with field caching for performance
- Add SQL-based search in SQLiteDB that queries doc_data directly, matching MATLAB's query_struct_to_sql_str logic for all operations (exact_string, contains_string, regexp, isa, depends_on, numeric comparisons, hasfield, negation)
- Update query.py to_search_structure to resolve depends_on into hasanysubfield_exact_string for the SQL search path
- Add brute-force fallback for operations not expressible in SQL

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj

- Add lint job with black --check and ruff check on src/ and tests/
- Add Python 3.12 to the test matrix (was only 3.10, 3.11)
- Add workflow_dispatch trigger for manual runs
- Matches CI pattern from NDI-python

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj

- Apply black formatting to all src/ and tests/ files
- Fix bare except clauses (E722) in binarydoc.py and database.py
- Remove unused variable in query.py (F841)
- Replace star import in test_datastructures.py with explicit imports (F403/F405)
- Remove unused imports auto-fixed by ruff (F401)

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj
Summary

This PR implements comprehensive SQL-based document search functionality for SQLite, matching MATLAB's `did.implementations.database.m` behavior. It includes proper document flattening via `doc2sql`, field indexing, and SQL query generation for various search operations.

Key Changes

SQLiteDB (`src/did/implementations/sqlitedb.py`)
- `_sqlite_regexp()` function to support regex operations in SQLite
- `_sql_escape()` helper for safe SQL string literal escaping
- `_get_field_idx()` to manage field indexing with caching, converting triple-underscores to dots per MATLAB convention
- `_populate_doc_data()` to flatten documents via `doc2sql` and insert into `fields` and `doc_data` tables
- `_do_add_doc()` to properly populate field data instead of placeholder logic
- `search()` method as the main entry point for SQL-based searches
- `_search_doc_ids()` for recursive search structure traversal: … `~` prefix
- `_query_struct_to_sql_str()` to convert query structures to SQL WHERE clauses, supporting:
  - `exact_string`, `exact_string_anycase`, `contains_string`, `regexp`
  - `exact_number`, `lessthan`, `lessthaneq`, `greaterthan`, `greaterthaneq`
  - `hasfield`, `isa`, `depends_on`
  - Returns `None` for unsupported operations
- `_brute_force_search()` fallback using `field_search` for unsupported SQL operations

doc2sql (`src/did/implementations/doc2sql.py`)
- `_get_class_name()` to extract class names from both DID-python and NDI/MATLAB document formats
- `_get_superclass_str()` to extract and format superclass strings matching MATLAB's comma-space separated, sorted format
- `_serialize_depends_on()` to serialize dependency information in MATLAB's format (`name,value;name,value;`)
- `_flatten_dict()` to flatten nested dictionaries using triple-underscore separators
- `_get_meta_table_from()` to create meta-tables from field groups
- `doc_to_sql()` to properly build meta-tables with:
  - `meta` table with doc_id, class, superclass, datestamp, creation, deletion, depends_on
- `get_field()` to handle empty strings and IndexError exceptions

Query (`src/did/query.py`)
- `to_search_structure()` to resolve high-level operations
- `_resolve_search_structure()` for recursive resolution of search structures
- `_resolve_single()` to handle operation-specific resolution:
  - `isa`: kept unresolved to work with both field_search and SQL paths
  - `depends_on`: converted to `hasanysubfield_exact_string` with proper parameter handling
  - `or`: recursively resolves sub-structures
- … (`~`) throughout

Notable Implementation Details
- Field names use the `{group}.{field}` format matching MATLAB convention, with triple-underscores in column names converted to dots

https://claude.ai/code/session_01DvD2oeqUFXWGPUfoenh2aj
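For reviewers unfamiliar with the triple-underscore convention, the flattening that `_flatten_dict()` is described as doing, and the later conversion of column names to dotted field names, could look roughly like the sketch below. The function names `flatten_dict` and `column_to_field_name` and the sample document are illustrative assumptions, not the code in this PR.

```python
from typing import Any, Dict

SEP = "___"  # triple-underscore separator named in the summary


def flatten_dict(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into single-level keys joined by '___'."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}{SEP}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the joined key along.
            flat.update(flatten_dict(value, full_key))
        else:
            flat[full_key] = value
    return flat


def column_to_field_name(column: str) -> str:
    """Convert a triple-underscore column name to its dotted field name."""
    return column.replace(SEP, ".")


# Hypothetical document, for illustration only.
doc = {"base": {"id": "abc123", "name": "demo"}, "app": {"version": {"major": 1}}}
flat = flatten_dict(doc)
dotted = {column_to_field_name(k): v for k, v in flat.items()}
```

Here `flat` has keys such as `base___id`, while `dotted` has `base.id`, which is the form the summary says is stored as the field name.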
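The two string formats mentioned for `_get_superclass_str()` and `_serialize_depends_on()` are easy to misread, so here is a minimal sketch of each. It assumes superclasses arrive as a list of names and dependencies as a list of dicts with `name` and `value` keys; those input shapes and the function names are assumptions for illustration.

```python
from typing import Dict, List


def superclass_str(names: List[str]) -> str:
    """Join superclass names in sorted order, separated by comma-space."""
    return ", ".join(sorted(names))


def serialize_depends_on(deps: List[Dict[str, str]]) -> str:
    """Serialize dependencies as 'name,value;name,value;' (note the trailing ';')."""
    return "".join(f"{d['name']},{d['value']};" for d in deps)


# Hypothetical inputs, for illustration only.
supers = superclass_str(["did.document", "base"])
deps = serialize_depends_on(
    [{"name": "element_id", "value": "41"}, {"name": "subject_id", "value": "7"}]
)
```

With these inputs `supers` is `base, did.document` and `deps` is `element_id,41;subject_id,7;`.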
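The cached field lookup described for `_get_field_idx()` might be sketched as follows. The `fields` table name comes from the summary; its column names (`field_idx`, `field_name`) and the `FieldIndex` class are assumptions made for this example.

```python
import sqlite3
from typing import Dict


class FieldIndex:
    """Look up (or create) a row in `fields`, remembering results in memory."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[str, int] = {}

    def get(self, column_name: str) -> int:
        # Column names use '___'; stored field names use dots.
        name = column_name.replace("___", ".")
        if name in self.cache:
            return self.cache[name]
        row = self.conn.execute(
            "SELECT field_idx FROM fields WHERE field_name = ?", (name,)
        ).fetchone()
        if row is None:
            cur = self.conn.execute(
                "INSERT INTO fields (field_name) VALUES (?)", (name,)
            )
            idx = cur.lastrowid
        else:
            idx = row[0]
        self.cache[name] = idx
        return idx


conn = sqlite3.connect(":memory:")
# Assumed schema, for illustration only.
conn.execute(
    "CREATE TABLE fields (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE)"
)
index = FieldIndex(conn)
first = index.get("meta___class")
again = index.get("meta.class")  # served from the cache, no new row
other = index.get("base___id")
row_count = conn.execute("SELECT COUNT(*) FROM fields").fetchone()[0]
```

The cache avoids one `SELECT` per field per document when many documents share the same field names, which is presumably the performance concern the commit message refers to.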
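SQLite has no built-in `REGEXP` implementation, so a helper like `_sqlite_regexp()` has to be registered on the connection. The sketch below shows one way to do that with the standard `sqlite3` module, together with a quote-doubling escape in the spirit of `_sql_escape()`. The table columns, sample rows, and the `search` helper are assumptions for illustration; only the `fields` and `doc_data` table names come from the summary.

```python
import re
import sqlite3


def sqlite_regexp(pattern, value):
    """Back SQLite's REGEXP operator; `X REGEXP Y` calls regexp(Y, X)."""
    if value is None:
        return False
    return re.search(pattern, str(value)) is not None


def sql_escape(text):
    """Escape a string for use inside a single-quoted SQL literal."""
    return text.replace("'", "''")


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, sqlite_regexp)
# Assumed schema and data, for illustration only.
conn.executescript(
    """
    CREATE TABLE fields (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE);
    CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT);
    """
)
conn.execute("INSERT INTO fields VALUES (1, 'meta.class')")
conn.executemany(
    "INSERT INTO doc_data VALUES (?, ?, ?)",
    [(1, 1, "demo_a"), (2, 1, "demo_b"), (3, 1, "other's")],
)


def search(where):
    """Return sorted doc_idx values whose doc_data rows satisfy `where`."""
    sql = (
        "SELECT DISTINCT doc_idx FROM doc_data JOIN fields USING (field_idx) "
        f"WHERE {where} ORDER BY doc_idx"
    )
    return [row[0] for row in conn.execute(sql)]


regexp_hits = search("field_name = 'meta.class' AND value REGEXP '^demo_'")
literal = sql_escape("other's")
exact_hits = search(f"field_name = 'meta.class' AND value = '{literal}'")
```

Where the WHERE clause does not have to be assembled as a string, bound parameters (`?`) are generally the safer choice than manual escaping.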
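The contract described for `_query_struct_to_sql_str()`, a WHERE clause for supported operations and `None` otherwise, lets the caller fall back to brute-force search. A simplified sketch of that contract is shown below. The query dict keys (`field`, `operation`, `param1`), the column names, and the exact SQL produced are assumptions; the PR's actual clauses may differ, and `isa`, `depends_on`, and negation are left out here.

```python
from typing import Optional


def sql_escape(text: str) -> str:
    """Double single quotes for use inside a SQL string literal."""
    return str(text).replace("'", "''")


# Numeric operation names from the summary mapped to SQL comparison operators.
_NUMERIC_OPS = {
    "exact_number": "=",
    "lessthan": "<",
    "lessthaneq": "<=",
    "greaterthan": ">",
    "greaterthaneq": ">=",
}


def query_to_where(query: dict) -> Optional[str]:
    """Return a WHERE fragment, or None if the operation is not handled."""
    op = query["operation"]
    name_test = f"field_name = '{sql_escape(query['field'])}'"
    if op == "hasfield":
        return name_test
    if op == "exact_string":
        return f"{name_test} AND value = '{sql_escape(query['param1'])}'"
    if op == "contains_string":
        # Simplified: '%' and '_' inside param1 are not escaped for LIKE.
        return f"{name_test} AND value LIKE '%{sql_escape(query['param1'])}%'"
    if op == "regexp":
        # Requires a REGEXP function registered on the connection.
        return f"{name_test} AND value REGEXP '{sql_escape(query['param1'])}'"
    if op in _NUMERIC_OPS:
        number = float(query["param1"])
        return f"{name_test} AND CAST(value AS REAL) {_NUMERIC_OPS[op]} {number!r}"
    return None  # caller is expected to fall back to brute-force search


clause = query_to_where(
    {"field": "meta.class", "operation": "exact_string", "param1": "it's"}
)
numeric = query_to_where(
    {"field": "app.version", "operation": "lessthan", "param1": 3}
)
unsupported = query_to_where(
    {"field": "x", "operation": "some_unsupported_op", "param1": None}
)
```

Returning `None` rather than raising keeps the SQL path and the brute-force path behind a single `search()` entry point.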
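The `depends_on` resolution in `_resolve_single()` is described only as a conversion to `hasanysubfield_exact_string` "with proper parameter handling". One plausible shape for that rewrite is sketched below. The parameter layout (sub-field names in `param1`, values to match in `param2`), the dict keys, and the treatment of a leading `~` as negation are all assumptions for illustration and should be checked against `src/did/query.py`.

```python
def resolve_depends_on(query: dict) -> dict:
    """Rewrite a depends_on query; return other queries as an unchanged copy."""
    op = query["operation"]
    negated = op.startswith("~")  # assumed: '~' prefix marks negation
    if op.lstrip("~") != "depends_on":
        return dict(query)
    return {
        "field": "depends_on",
        "operation": ("~" if negated else "") + "hasanysubfield_exact_string",
        # Assumed layout: sub-field names, then the values they must equal.
        "param1": ["name", "value"],
        "param2": [query["param1"], query["param2"]],
    }


# Hypothetical query, for illustration only.
resolved = resolve_depends_on(
    {"field": "", "operation": "~depends_on", "param1": "element_id", "param2": "41"}
)
untouched = resolve_depends_on(
    {"field": "meta.class", "operation": "isa", "param1": "base", "param2": ""}
)
```

In this sketch `isa` passes through unchanged, in line with the note that it is kept unresolved for both the `field_search` and SQL paths.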