@@ -40,6 +40,9 @@ VGI (Vector Gateway Interface) provides an Apache Arrow-based protocol for conne
4040│ ▼ │
4141│ ┌───────────────────────────────────────────────────────────────┐ │
4242│ │ Worker Process │ │
43+ │ │ SCALAR FUNCTION (ScalarFunction) │ │
44+ │ │ - compute(batch): Transform each row to single output column │ │
45+ │ │ OR │ │
4346│ │ TABLE FUNCTION (TableFunctionGenerator) │ │
4447│ │ - process(): Generator yielding output batches (no input) │ │
4548│ │ OR │ │
@@ -52,13 +55,15 @@ VGI (Vector Gateway Interface) provides an Apache Arrow-based protocol for conne
5255
5356| Type | Base Class | Input | Use Case |
5457| ------| ------------| -------| ----------|
58+ | **Scalar Function** | `ScalarFunction` | Batches | Per-row transforms (1:1 row mapping, single output column) |
5559| **Table Function** | `TableFunctionGenerator` | None | Generate data (sequences, ranges) |
5660| **Table-In-Out Function** | `TableInOutFunction` | Batches | Transform, filter, aggregate |
5761
5862### Key Components
5963
6064- **Worker** (`vgi/worker.py`): Subprocess that hosts functions
6165- **Client** (`vgi/client/client.py`): Spawns workers, streams data
66+ - **ScalarFunction** (`vgi/scalar_function.py`): Base for scalar functions
6267- **TableFunctionGenerator** (`vgi/table_function.py`): Base for table functions
6368- **TableInOutFunction** (`vgi/table_in_out_function.py`): Base for table-in-out functions
6469
@@ -67,7 +72,8 @@ VGI (Vector Gateway Interface) provides an Apache Arrow-based protocol for conne
6772```
6873vgi/
6974 __init__.py # Package exports
70- function.py # Invocation, OutputSpec, Arguments, GlobalInitResult
75+ function.py # Invocation, OutputSpec, Arguments, FunctionType
76+ scalar_function.py # ScalarFunction, ScalarFunctionGenerator
7177 table_function.py # TableFunctionGenerator, CardinalityInfo, Output
7278 table_in_out_function.py # TableInOutFunction, TableInOutGeneratorFunction
7379 metadata.py # Function metadata for introspection
7682 client/
7783 client.py # Client class
7884 examples/
85+ scalar.py # Example scalar functions
7986 table.py # Example table functions
8087 table_in_out.py # Example table-in-out functions
8188 worker.py # ExampleWorker with registry
@@ -89,6 +96,32 @@ vgi-client --input data.parquet --function echo --server vgi-example-worker
8996vgi-client --input data.parquet --function sum_all_columns --server vgi-example-worker
9097```
9198
99+ ## Creating a Scalar Function (Per-Row Transform)
100+
101+ ``` python
102+ import pyarrow as pa
103+ import pyarrow.compute as pc
104+ from vgi import ScalarFunction, Arg
105+
106+ class DoubleColumn(ScalarFunction):
107+     """Double the value in a specified column."""
108+
109+     column = Arg[str](0, doc="Column to double")
110+
111+     @property
112+     def output_type(self) -> pa.DataType:
113+         # Output type matches input column type
114+         return self.input_schema.field(self.column).type
115+
116+     def compute(self, batch: pa.RecordBatch) -> pa.Array:
117+         return pc.multiply(batch.column(self.column), 2)
118+ ```
119+
120+ ### Key Constraints for Scalar Functions:
121+ - **1:1 row mapping**: Output must have exactly the same number of rows as input
122+ - **Single column output**: Output schema has exactly one column named "result"
123+ - **No finalize phase**: All processing happens in `compute()`
124+
92125## Creating a Table-In-Out Function (Recommended)
93126
94127``` python
@@ -182,6 +215,9 @@ if __name__ == "__main__":
182215### Imports
183216
184217``` python
218+ # Scalar Functions (per-row transform)
219+ from vgi import ScalarFunction, Arg, Worker
220+
185221# Table Functions (no input)
186222from vgi import TableFunctionGenerator, Output, Arg, Worker
187223
@@ -221,6 +257,17 @@ output_schema = schema_like(self.input_schema, rename={"old": "new"})
221257
222258### Method Override Summary
223259
260+ **ScalarFunction:**
261+
262+ | Method | When to Override | Default |
263+ | --------| ------------------| ---------|
264+ | `output_type` | Define output column type | Required |
265+ | `compute(batch)` | Transform batch to single array | Required |
266+ | `setup()` | Acquire resources | No-op |
267+ | `teardown()` | Release resources | No-op |
268+
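As a rough illustration of the `setup()` and `teardown()` hooks in the ScalarFunction table, the sketch below uses a stand-in base class rather than the real `ScalarFunction`. It assumes the worker calls `setup()` once before the first `compute()` and `teardown()` once after the last; that ordering is not stated here, so treat it as an assumption. `FakeScalarBase`, `LookupFunction`, and `run_once` are hypothetical names, and plain lists stand in for Arrow batches.

```python
from typing import Dict, List, Optional


class FakeScalarBase:
    """Stand-in for vgi.ScalarFunction (assumption: both hooks default to no-ops)."""

    def setup(self) -> None:
        pass

    def teardown(self) -> None:
        pass


class LookupFunction(FakeScalarBase):
    """Maps country codes to names using a table acquired in setup()."""

    def __init__(self) -> None:
        self.table: Optional[Dict[str, str]] = None

    def setup(self) -> None:
        # Acquire the resource once, not once per batch.
        self.table = {"de": "Germany", "fr": "France"}

    def compute(self, batch: List[str]) -> List[Optional[str]]:
        assert self.table is not None, "setup() must run before compute()"
        # One output value per input value; unknown codes map to None.
        return [self.table.get(code) for code in batch]

    def teardown(self) -> None:
        # Release the resource.
        self.table = None


def run_once(func: LookupFunction, batches: List[List[str]]) -> List[List[Optional[str]]]:
    """Assumed call order: setup, compute per batch, teardown."""
    func.setup()
    try:
        return [func.compute(b) for b in batches]
    finally:
        func.teardown()


func = LookupFunction()
outputs = run_once(func, [["de", "fr"], ["xx"]])
```

Keeping resource acquisition out of `compute()` means the cost is paid once per invocation rather than once per batch.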
269+ **TableInOutFunction:**
270+
224271| Method | When to Override | Default |
225272| --------| ------------------| ---------|
226273| `output_schema` | Change output columns | Returns input_schema |
@@ -232,17 +279,22 @@ output_schema = schema_like(self.input_schema, rename={"old": "new"})
232279### Pattern Decision Tree
233280
234281```
235- Need to implement a VGI function?
236- │
237- ├─ Does the function receive input data?
238- │ │
239- │ ├─ NO → Use TableFunctionGenerator
240- │ │ Override process() to yield Output batches
241- │ │
242- │ └─ YES → Use TableInOutFunction
243- │ ├─ Transform each batch? → Override transform()
244- │ ├─ Aggregate results? → Accumulate in transform(), emit in finish()
245- │ └─ Need generator control? → See docs/generator-api.md
282+ How will your function be used in SQL?
283+
284+ 1. SELECT my_func(col1, col2) FROM table
285+ → SCALAR FUNCTION: Returns one value per input row
286+ → Use ScalarFunction, override output_type and compute()
287+ → Example: upper(), abs(), concat()
288+
289+ 2. SELECT * FROM my_func(args)
290+ → TABLE FUNCTION: Generates rows from arguments (no input table)
291+ → Use TableFunctionGenerator, override process()
292+ → Example: range(), read_csv(), glob()
293+
294+ 3. SELECT * FROM my_func(args, (SELECT * FROM input_table))
295+ → TABLE-IN-OUT FUNCTION: Transforms input rows to output rows
296+ → Use TableInOutFunction, override transform() and optionally finish()
297+ → Example: filtering, enrichment, aggregation
246298```
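The decision tree can also be written down as a small lookup, which may be handy in project scaffolding or documentation tests. This is only a sketch: the `SqlUsage` enum and `recommend` function are hypothetical names, while the base classes and method names are the ones listed in the tree.

```python
from enum import Enum
from typing import Dict, Tuple


class SqlUsage(Enum):
    """The three SQL call shapes from the decision tree (hypothetical enum)."""

    SCALAR_CALL = "SELECT my_func(col1, col2) FROM table"
    TABLE_CALL = "SELECT * FROM my_func(args)"
    TABLE_IN_OUT_CALL = "SELECT * FROM my_func(args, (SELECT * FROM input_table))"


# Base class name and the methods to override for each call shape.
# finish() is optional for TableInOutFunction.
RECOMMENDATION: Dict[SqlUsage, Tuple[str, Tuple[str, ...]]] = {
    SqlUsage.SCALAR_CALL: ("ScalarFunction", ("output_type", "compute")),
    SqlUsage.TABLE_CALL: ("TableFunctionGenerator", ("process",)),
    SqlUsage.TABLE_IN_OUT_CALL: ("TableInOutFunction", ("transform", "finish")),
}


def recommend(usage: SqlUsage) -> Tuple[str, Tuple[str, ...]]:
    """Return (base class name, methods to override) for a SQL call shape."""
    return RECOMMENDATION[usage]


base_class, methods = recommend(SqlUsage.SCALAR_CALL)
```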
247299
248300## Additional Documentation