Skip to content

Commit 4ee911d

Browse files
rustyconoverclaude
andcommitted
refactor: change cardinality() method to property
Change cardinality from a method to a property for API consistency with output_schema. Both return immutable data, so having consistent access patterns improves the API. Changes: - vgi/table_function.py: Add @Property decorator to cardinality - vgi/examples/table.py: Add @Property to all cardinality overrides - vgi/examples/table_in_out.py: Add @Property to all cardinality overrides - tests/: Update test overrides and call sites (.cardinality instead of .cardinality()) - docs/generator-api.md: Update example to use @Property Closes: vgi-python-bku 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 8682e0f commit 4ee911d

8 files changed

Lines changed: 22 additions & 10 deletions

File tree

.beads/issues.jsonl

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,15 @@
11
{"id":"vgi-python-0hr","title":"Remove redundant InitInputType class attribute","description":"InitInputType class attribute duplicates the generic type parameter: 'class ScalarFunctionGenerator(Function[FunctionInitInput])' already specifies the type, but 'InitInputType = FunctionInitInput' repeats it. Investigate using get_type_hints or __orig_bases__ to infer the type and remove the redundant attribute.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:53.780529-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:37.960914-05:00"}
2-
{"id":"vgi-python-1s5","title":"Move distributed state management to optional mixin","description":"The Function base class in function.py includes ~200 lines for distributed state management (store_state, collect_states, enqueue_work, dequeue_work, work queue storage). Not all functions need this. Extract to DistributedStateMixin that functions can opt into, keeping Function base class simpler for basic use cases.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:53.606614-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:37.839559-05:00"}
2+
{"id":"vgi-python-1s5","title":"Move distributed state management to optional mixin","description":"The Function base class in function.py includes ~200 lines for distributed state management (store_state, collect_states, enqueue_work, dequeue_work, work queue storage). Not all functions need this. Extract to DistributedStateMixin that functions can opt into, keeping Function base class simpler for basic use cases.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:53.606614-05:00","created_by":"rusty","updated_at":"2026-01-04T21:22:09.772825-05:00","closed_at":"2026-01-04T21:22:09.772825-05:00","close_reason":"Analysis complete: extraction not recommended. The distributed state methods are tightly coupled with execution_identifier and storage, which are used by core initialization methods. Extraction would require moving initialize_global_state/load_global_state to the mixin, breaking the protocol and requiring multiple inheritance. Current API is already opt-in (just don't call the methods) and well-documented."}
33
{"id":"vgi-python-36f","title":"Split metadata.py Arrow serialization into separate module","description":"metadata.py is 932 lines with two distinct concerns: 1) metadata resolution (enums, dataclasses, parameter extraction, resolve_metadata) and 2) Arrow serialization (schema definitions, to_arrow/from_arrow functions). Split Arrow serialization into metadata_serialization.py or metadata_arrow.py for better separation of concerns.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:53.481364-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:37.718814-05:00"}
4-
{"id":"vgi-python-3fq","title":"Abstract common worker batch processing logic","description":"Worker batch processing methods _process_scalar_batches (377-466), _process_batches (468-550), and _generate_batches (552-593) share significant structure: IPC writer/reader setup, batch counting/logging, main processing loop. Extract common logic to reduce duplication - consider a BatchProcessor helper class or template method pattern.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:53.350497-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:37.598552-05:00"}
4+
{"id":"vgi-python-3fq","title":"Abstract common worker batch processing logic","description":"Worker batch processing methods _process_scalar_batches (377-466), _process_batches (468-550), and _generate_batches (552-593) share significant structure: IPC writer/reader setup, batch counting/logging, main processing loop. Extract common logic to reduce duplication - consider a BatchProcessor helper class or template method pattern.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:53.350497-05:00","created_by":"rusty","updated_at":"2026-01-04T21:20:24.509785-05:00","closed_at":"2026-01-04T21:20:24.509785-05:00","close_reason":"Analysis complete: abstraction not warranted. The three methods have sufficiently different logic (input handling, log message loops, protocol types) that abstracting them would add complexity without meaningful benefit. Current code is already readable at ~70-90 lines each."}
55
{"id":"vgi-python-5er","title":"Extract _should_terminate into shared base class","description":"Identical _should_terminate method is copy-pasted in all three function modules. Implementation is always: check if log_message exists and level is EXCEPTION. Move to shared base class (Function or new ProcessingMixin) to eliminate duplication.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:41.190482-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:16.071737-05:00","dependencies":[{"issue_id":"vgi-python-5er","depends_on_id":"vgi-python-6o0","type":"blocks","created_at":"2026-01-04T20:07:49.283865-05:00","created_by":"rusty"}]}
66
{"id":"vgi-python-67w","title":"Create example function using DuckDB settings","description":"Create an example function that demonstrates using DuckDB settings to determine its output.\n\nRequirements:\n- Function declares required_settings in Meta\n- Output schema depends on a setting value (e.g., include extra column based on setting)\n- Clear documentation showing the pattern\n\nExample ideas:\n1. TimezoneAwareFunction: Output includes timezone info based on 'timezone' setting\n2. VerboseOutput: Adds debug columns when 'debug_mode' setting is true\n3. NumericPrecision: Uses 'numeric_precision' to determine output type precision\n\nAdd to vgi/examples/ and register in ExampleWorker.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:48.503681-05:00","created_by":"rusty","updated_at":"2026-01-04T13:22:23.779895-05:00","closed_at":"2026-01-04T13:22:23.779895-05:00","close_reason":"Added SettingsAwareFunction example","dependencies":[{"issue_id":"vgi-python-67w","depends_on_id":"vgi-python-c2b","type":"blocks","created_at":"2026-01-04T13:06:13.865474-05:00","created_by":"rusty"},{"issue_id":"vgi-python-67w","depends_on_id":"vgi-python-ivf","type":"blocks","created_at":"2026-01-04T13:06:13.890269-05:00","created_by":"rusty"},{"issue_id":"vgi-python-67w","depends_on_id":"vgi-python-bqb","type":"blocks","created_at":"2026-01-04T13:06:13.912531-05:00","created_by":"rusty"},{"issue_id":"vgi-python-67w","depends_on_id":"vgi-python-a99","type":"blocks","created_at":"2026-01-04T13:06:13.936552-05:00","created_by":"rusty"},{"issue_id":"vgi-python-67w","depends_on_id":"vgi-python-j4t","type":"blocks","created_at":"2026-01-04T13:06:13.958494-05:00","created_by":"rusty"}]}
7-
{"id":"vgi-python-6o0","title":"Consolidate _OutputComplete classes into shared module","description":"Three nearly identical _OutputComplete classes exist in scalar_function.py:168-197 (_ScalarOutputComplete), table_function.py:136-175 (_OutputComplete), and table_in_out_function.py:356-400 (_OutputComplete). All are frozen dataclasses with batch field, log_message field, and from_process_result() classmethod. Extract to shared module (e.g., vgi/protocol_types.py) with a single parameterized class.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:40.893139-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:15.806567-05:00"}
7+
{"id":"vgi-python-6o0","title":"Consolidate _OutputComplete classes into shared module","description":"Three nearly identical _OutputComplete classes exist in scalar_function.py:168-197 (_ScalarOutputComplete), table_function.py:136-175 (_OutputComplete), and table_in_out_function.py:356-400 (_OutputComplete). All are frozen dataclasses with batch field, log_message field, and from_process_result() classmethod. Extract to shared module (e.g., vgi/protocol_types.py) with a single parameterized class.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:40.893139-05:00","created_by":"rusty","updated_at":"2026-01-04T21:18:34.529683-05:00","closed_at":"2026-01-04T21:18:34.529683-05:00","close_reason":"PR #5 created: https://github.com/Query-farm/vgi-python/pull/5"}
88
{"id":"vgi-python-79e","title":"Unify ProtocolInput classes with shared base","description":"ProtocolInput classes in scalar_function.py:151-166 and table_in_out_function.py:109-142 have similar structure with batch and metadata fields. The table_in_out version adds is_finalize logic. Create shared base ProtocolInput in protocol_types.py with table_in_out extending it.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:41.31917-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:16.240397-05:00"}
99
{"id":"vgi-python-a99","title":"Add settings accessor to function base classes","description":"Add a property to access DuckDB settings values in function implementations.\n\nChanges needed:\n- Add 'settings: dict[str, str]' property to Function base class\n- Property should return self.invocation.duckdb_settings or empty dict\n- Add convenience method like 'get_setting(name, default=None)'\n- Update ScalarFunction, TableFunctionGenerator, TableInOutFunction\n\nExample usage in function:\ndef compute(self, batch):\n tz = self.get_setting('timezone', 'UTC')\n # or\n tz = self.settings.get('timezone', 'UTC')","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:48.221602-05:00","created_by":"rusty","updated_at":"2026-01-04T13:20:41.171991-05:00","closed_at":"2026-01-04T13:20:41.171991-05:00","close_reason":"Implementation complete, all tests pass","dependencies":[{"issue_id":"vgi-python-a99","depends_on_id":"vgi-python-aad","type":"blocks","created_at":"2026-01-04T13:06:13.738212-05:00","created_by":"rusty"}]}
1010
{"id":"vgi-python-aad","title":"Design: DuckDB settings/pragmas access for VGI functions","description":"Design how VGI functions can declare required DuckDB settings/pragmas in their Meta class, and how these settings values should be passed during the bind phase.\n\nKey design decisions:\n1. How to declare required settings in function Meta (e.g., required_settings = ['timezone', 'threads'])\n2. How to add settings to Invocation dataclass\n3. How settings values should be accessed in function code\n4. Serialization format for settings in Arrow IPC\n\nRecommendation: Add 'duckdb_settings: dict[str, str] | None' to Invocation and 'required_settings: list[str]' to Meta class.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-04T13:05:47.619105-05:00","created_by":"rusty","updated_at":"2026-01-04T13:11:13.197139-05:00","closed_at":"2026-01-04T13:11:13.197139-05:00","close_reason":"Design document created at docs/design-duckdb-settings.md"}
1111
{"id":"vgi-python-bi8","title":"Extract common _process_with_exception_handling into mixin","description":"The _process_with_exception_handling and _process_and_validate methods are duplicated across scalar_function.py:296-346, table_function.py:386-438, and table_in_out_function.py:586-642. All follow same pattern: try _process_and_validate, catch exceptions, return OutputComplete with error message. Extract to ProcessingMixin that all function types inherit from.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-04T20:06:41.02111-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:15.947758-05:00","dependencies":[{"issue_id":"vgi-python-bi8","depends_on_id":"vgi-python-6o0","type":"blocks","created_at":"2026-01-04T20:07:49.181408-05:00","created_by":"rusty"}]}
12-
{"id":"vgi-python-bku","title":"Change cardinality() method to property for consistency with output_schema","description":"Inconsistent access patterns: output_schema is a property but cardinality() is a method. Both return immutable data. Change cardinality() to a property for API consistency. Located in table_function.py:304-314.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:53.211782-05:00","created_by":"rusty","updated_at":"2026-01-04T20:07:37.463727-05:00"}
12+
{"id":"vgi-python-bku","title":"Change cardinality() method to property for consistency with output_schema","description":"Inconsistent access patterns: output_schema is a property but cardinality() is a method. Both return immutable data. Change cardinality() to a property for API consistency. Located in table_function.py:304-314.","status":"in_progress","priority":3,"issue_type":"task","created_at":"2026-01-04T20:06:53.211782-05:00","created_by":"rusty","updated_at":"2026-01-04T21:22:27.430091-05:00"}
1313
{"id":"vgi-python-bqb","title":"Update worker to handle DuckDB settings during bind","description":"Update vgi/worker.py to process DuckDB settings from Invocation during the bind phase.\n\nChanges needed:\n- Read settings from invocation.duckdb_settings\n- Validate that all required_settings (from Meta) are present in invocation\n- Pass settings to function instance for access\n- Log settings usage for debugging\n\nThe worker should validate settings early in bind to fail fast if required settings are missing.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:48.04037-05:00","created_by":"rusty","updated_at":"2026-01-04T13:20:41.17079-05:00","closed_at":"2026-01-04T13:20:41.17079-05:00","close_reason":"Implementation complete, all tests pass","dependencies":[{"issue_id":"vgi-python-bqb","depends_on_id":"vgi-python-aad","type":"blocks","created_at":"2026-01-04T13:06:13.714281-05:00","created_by":"rusty"}]}
1414
{"id":"vgi-python-c2b","title":"Add duckdb_settings field to Invocation class","description":"Update vgi/invocation.py to add a duckdb_settings field to the Invocation dataclass.\n\nChanges needed:\n- Add 'duckdb_settings: dict[str, str] | None = None' field to Invocation\n- Update serialize() to include settings in Arrow IPC batch\n- Update deserialize() to read settings from Arrow IPC batch\n- Handle None case (no settings requested)\n\nSerialization: Use a struct field with string key-value pairs or a map type.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T13:05:47.765077-05:00","created_by":"rusty","updated_at":"2026-01-04T13:20:41.167817-05:00","closed_at":"2026-01-04T13:20:41.167817-05:00","close_reason":"Implementation complete, all tests pass","dependencies":[{"issue_id":"vgi-python-c2b","depends_on_id":"vgi-python-aad","type":"blocks","created_at":"2026-01-04T13:06:13.664038-05:00","created_by":"rusty"}]}
1515
{"id":"vgi-python-e37","title":"move Invocation from function.py out to own file","description":"The Invocation clas is kind of seperate from functions, so it should be in its own file. Move it and all of its other associated classes like InvocationType to its own file","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-04T09:18:46.605941-05:00","created_by":"rusty","updated_at":"2026-01-04T09:24:37.922675-05:00","closed_at":"2026-01-04T09:24:37.922675-05:00","close_reason":"Closed"}

docs/generator-api.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -95,6 +95,7 @@ class MyTableFunction(TableFunctionGenerator):
9595
def output_schema(self) -> pa.Schema:
9696
return pa.schema([("value", pa.int64())])
9797

98+
@property
9899
def cardinality(self) -> TableCardinality:
99100
"""Optional: provide row count estimate."""
100101
return TableCardinality(estimate=self.count, max=self.count)

tests/table/generator/test_constant_table_function.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ def test_cardinality(self) -> None:
3939
invocation=invocation,
4040
logger=structlog.get_logger(),
4141
)
42-
cardinality = func.cardinality()
42+
cardinality = func.cardinality
4343
assert cardinality is not None
4444
assert cardinality.estimate == 1
4545
assert cardinality.max == 1

tests/table/generator/test_sequence_function.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ def test_cardinality(self) -> None:
4040
invocation=invocation,
4141
logger=structlog.get_logger(),
4242
)
43-
cardinality = func.cardinality()
43+
cardinality = func.cardinality
4444
assert cardinality is not None
4545
assert cardinality.estimate == 100
4646
assert cardinality.max == 100

tests/table/test_function.py

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -196,22 +196,23 @@ def output_schema(self) -> pa.Schema:
196196
func = NoCardinalityFunction(
197197
invocation=invocation, logger=structlog.get_logger()
198198
)
199-
assert func.cardinality() is None
199+
assert func.cardinality is None
200200

201201
def test_custom_cardinality(self) -> None:
202-
"""Custom cardinality() should be respected."""
202+
"""Custom cardinality should be respected."""
203203

204204
class CardinalityFunction(TableFunctionGenerator):
205205
@property
206206
def output_schema(self) -> pa.Schema:
207207
return make_schema([pa.field("x", pa.int64())])
208208

209+
@property
209210
def cardinality(self) -> TableCardinality:
210211
return TableCardinality(estimate=100, max=1000)
211212

212213
invocation = make_invocation()
213214
func = CardinalityFunction(invocation=invocation, logger=structlog.get_logger())
214-
cardinality = func.cardinality()
215+
cardinality = func.cardinality
215216
assert cardinality is not None
216217
assert cardinality.estimate == 100
217218
assert cardinality.max == 1000

vgi/examples/table.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,7 @@ def output_schema(self) -> pa.Schema:
9191
"""Return output schema with single integer column."""
9292
return pa.schema([pa.field("n", pa.int64())])
9393

94+
@property
9495
def cardinality(self) -> TableCardinality:
9596
"""Return exact cardinality since we know the count."""
9697
return TableCardinality(estimate=self.count, max=self.count)
@@ -160,6 +161,7 @@ def output_schema(self) -> pa.Schema:
160161
"""Return output schema with single integer column."""
161162
return pa.schema([pa.field("value", pa.int64())])
162163

164+
@property
163165
def cardinality(self) -> TableCardinality:
164166
"""Return cardinality based on range parameters."""
165167
if self.end <= self.start:
@@ -233,6 +235,7 @@ def output_schema(self) -> pa.Schema:
233235
"""Return output schema with single integer column."""
234236
return pa.schema([pa.field("value", pa.int64())])
235237

238+
@property
236239
def cardinality(self) -> TableCardinality:
237240
"""Return cardinality of exactly one row."""
238241
return TableCardinality(estimate=1, max=1)
@@ -299,6 +302,7 @@ def output_schema(self) -> pa.Schema:
299302
]
300303
return pa.schema(fields)
301304

305+
@property
302306
def cardinality(self) -> TableCardinality:
303307
"""Return cardinality estimate."""
304308
return TableCardinality(estimate=self.count, max=self.count)
@@ -466,6 +470,7 @@ def output_schema(self) -> pa.Schema:
466470
"""Return output schema with single integer column."""
467471
return pa.schema([pa.field("value", pa.int64())])
468472

473+
@property
469474
def cardinality(self) -> TableCardinality:
470475
"""Return cardinality estimate.
471476
@@ -583,6 +588,7 @@ def output_schema(self) -> pa.Schema:
583588
"""Return the projected schema based on init_input."""
584589
return self.apply_projection(self.FULL_SCHEMA)
585590

591+
@property
586592
def cardinality(self) -> TableCardinality:
587593
"""Return exact cardinality since we know the count."""
588594
return TableCardinality(estimate=self.count, max=self.count)
@@ -705,6 +711,7 @@ def output_schema(self) -> pa.Schema:
705711

706712
return pa.schema(fields)
707713

714+
@property
708715
def cardinality(self) -> TableCardinality:
709716
"""Return exact cardinality since we know the count."""
710717
return TableCardinality(estimate=self.count, max=self.count)

vgi/examples/table_in_out.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -357,6 +357,7 @@ class Meta:
357357

358358
data: TableInput = Arg[TableInput](0, doc="Input table with numeric columns") # type: ignore[assignment]
359359

360+
@property
360361
def cardinality(self) -> TableCardinality | None:
361362
"""Return cardinality estimate of exactly 1 row."""
362363
return TableCardinality(estimate=1, max=1)
@@ -676,6 +677,7 @@ def __init__(
676677
super().__init__(invocation=invocation, logger=logger)
677678
self.sums: dict[str, pa.Scalar[Any]] = {}
678679

680+
@property
679681
def cardinality(self) -> TableCardinality | None:
680682
"""Return cardinality estimate of exactly 1 row."""
681683
return TableCardinality(estimate=1, max=1)

0 commit comments

Comments
 (0)