
Commit cba8a51

TimelordUK and claude committed
feat: Add UNNEST parser support and test data for FIX allocations
Add parser support for UNNEST syntax and comprehensive design documentation for row expansion feature to handle FIX repeated groups.

Parser Changes:
- Add parse_unnest() function to primary expression parser
- Parses UNNEST(column_expr, 'delimiter') syntax
- Creates SqlExpression::Unnest with column and delimiter
- Validates delimiter is a string literal
- Currently gets parsed as FunctionCall "UNNEST" due to parser architecture
- Will be handled specially in evaluator/executor

Test Data:
- data/fix_allocations.csv - FIX message example data
- Contains pipe-delimited accounts and comma-delimited amounts
- Three test cases including mismatched lengths

Documentation:
- docs/UNNEST_DESIGN.md - Complete design specification
- Input/output examples with expected results
- NULL padding behavior for mismatched lengths
- Implementation requirements for evaluator and executor

Next Steps:
- Handle UNNEST in arithmetic evaluator (return array of split values)
- Update query executor to detect UNNEST and multiply rows
- Implement NULL padding for mismatched array lengths

All tests passing (397 passed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 76692bb commit cba8a51

3 files changed

Lines changed: 123 additions & 0 deletions

data/fix_allocations.csv

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+msg_type,order_id,symbol,accounts,amounts
+AS,ORD001,ZX5Y,ACC_1|ACC_2|ACC_3,"200,200,200"
+AS,ORD002,ABCD,ACC_4|ACC_5,"300,700"
+8,ORD003,WXYZ,ACC_1,"1000"

docs/UNNEST_DESIGN.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# UNNEST Goal - FIX Allocation Example
+
+## Input Data (fix_allocations.csv)
+```csv
+msg_type,order_id,symbol,accounts,amounts
+AS,ORD001,ZX5Y,ACC_1|ACC_2|ACC_3,"200,200,200"
+AS,ORD002,ABCD,ACC_4|ACC_5,"300,700"
+8,ORD003,WXYZ,ACC_1,"1000"
+```
+
+## Desired Query
+```sql
+SELECT
+    msg_type,
+    order_id,
+    symbol,
+    UNNEST(accounts, '|') AS account,
+    UNNEST(amounts, ',') AS amount
+FROM fix_allocations;
+```
+
+## Expected Output
+```
++----------+----------+--------+---------+--------+
+| msg_type | order_id | symbol | account | amount |
++----------+----------+--------+---------+--------+
+| AS       | ORD001   | ZX5Y   | ACC_1   | 200    |
+| AS       | ORD001   | ZX5Y   | ACC_2   | 200    |
+| AS       | ORD001   | ZX5Y   | ACC_3   | 200    |
+| AS       | ORD002   | ABCD   | ACC_4   | 300    |
+| AS       | ORD002   | ABCD   | ACC_5   | 700    |
+| 8        | ORD003   | WXYZ   | ACC_1   | 1000   |
++----------+----------+--------+---------+--------+
+```
+
+## How It Works
+
+1. **Row ORD001**: `accounts` has 3 items, `amounts` has 3 items
+   - Creates 3 output rows (max of both)
+   - Each regular column (msg_type, order_id, symbol) is replicated 3 times
+   - UNNEST columns get their respective split values
+
+2. **Row ORD002**: `accounts` has 2 items, `amounts` has 2 items
+   - Creates 2 output rows
+   - Values aligned by index
+
+3. **Row ORD003**: `accounts` has 1 item, `amounts` has 1 item
+   - Creates 1 output row (no expansion needed)
+
+## Mismatched Length Example (NULL Padding)
+
+If we had mismatched data:
+```csv
+AS,ORD004,TEST,ACC_1|ACC_2|ACC_3,"100,200"
+```
+
+Query result would be:
+```
+| AS       | ORD004   | TEST   | ACC_1   | 100    |
+| AS       | ORD004   | TEST   | ACC_2   | 200    |
+| AS       | ORD004   | TEST   | ACC_3   | NULL   |  <- NULL padding
+```
+
+## Implementation Requirements
+
+1. **Parser**: Recognize `UNNEST(column_expr, 'delimiter')`
+2. **Evaluator**: Return array of split values (not executed row-by-row)
+3. **Query Executor**:
+   - Detect all UNNEST expressions in SELECT
+   - For each input row:
+     - Evaluate each UNNEST → get arrays
+     - Find max array length
+     - Generate N output rows
+     - Fill in values (NULL if array exhausted)
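
The executor steps listed in the design document (split each UNNEST column, find the longest array, emit that many rows, pad with NULL) can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the project's implementation: `split_field` and `expand_row` are hypothetical names, rows are modelled as plain vectors of strings, and `None` stands in for SQL NULL.

```rust
/// Split one delimited field into its items.
/// Hypothetical helper; the real evaluator may treat empty fields differently.
fn split_field(value: &str, delimiter: &str) -> Vec<String> {
    value.split(delimiter).map(|s| s.to_string()).collect()
}

/// Expand one input row into N output rows, where N is the length of the
/// longest UNNEST array.
///
/// `regular` holds the columns that are replicated unchanged on every output row.
/// `unnested` holds one already-split array per UNNEST expression.
/// An array that runs out of items contributes `None` (standing in for NULL).
fn expand_row(regular: &[&str], unnested: &[Vec<String>]) -> Vec<Vec<Option<String>>> {
    // With no UNNEST expressions the row passes through once (assumption).
    let n = unnested.iter().map(|a| a.len()).max().unwrap_or(1);

    (0..n)
        .map(|i| {
            // Replicate the regular columns.
            let mut row: Vec<Option<String>> =
                regular.iter().map(|v| Some(v.to_string())).collect();
            // Append the i-th item of each array, or None once it is exhausted.
            for array in unnested {
                row.push(array.get(i).cloned());
            }
            row
        })
        .collect()
}
```

Calling `expand_row(&["AS", "ORD004", "TEST"], &[split_field("ACC_1|ACC_2|ACC_3", "|"), split_field("100,200", ",")])` yields three rows, and the last one carries `None` in the amount position, matching the NULL padding case in the design document.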

src/sql/parser/expressions/primary.rs

Lines changed: 45 additions & 0 deletions
@@ -61,6 +61,11 @@ where
             parse_datetime_constructor(parser)
         }
 
+        Token::Unnest => {
+            debug!("Parsing UNNEST expression");
+            parse_unnest(parser)
+        }
+
         Token::Identifier(id) => {
             let id_upper = id.to_uppercase();
             let id_clone = id.clone();
@@ -442,6 +447,46 @@ where
     }
 }
 
+/// Parse UNNEST expression
+/// Syntax: UNNEST(column_expr, 'delimiter')
+fn parse_unnest<P>(parser: &mut P) -> Result<SqlExpression, String>
+where
+    P: ParsePrimary + ExpressionParser + ?Sized,
+{
+    debug!("parse_unnest: starting");
+    ExpressionParser::advance(parser); // consume UNNEST
+    ExpressionParser::consume(parser, Token::LeftParen)?;
+
+    // Parse the column expression (first argument)
+    let column = parser.parse_logical_or()?;
+    debug!("parse_unnest: parsed column expression");
+
+    // Expect comma
+    ExpressionParser::consume(parser, Token::Comma)?;
+
+    // Parse the delimiter (second argument - must be a string literal)
+    let delimiter = match ExpressionParser::current_token(parser) {
+        Token::StringLiteral(s) => {
+            let delim = s.clone();
+            ExpressionParser::advance(parser);
+            delim
+        }
+        _ => {
+            return Err("UNNEST delimiter must be a string literal".to_string());
+        }
+    };
+
+    debug!(delimiter = %delimiter, "parse_unnest: parsed delimiter");
+
+    ExpressionParser::consume(parser, Token::RightParen)?;
+
+    debug!("parse_unnest: complete");
+    Ok(SqlExpression::Unnest {
+        column: Box::new(column),
+        delimiter,
+    })
+}
+
 /// Trait that parsers must implement to use primary expression parsing
 pub trait ParsePrimary {
     fn current_token(&self) -> &Token;
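
To see the control flow of `parse_unnest` in isolation, here is a reduced, self-contained model of the same steps. It is a sketch under assumptions rather than the project's code: the `Token` and `Expr` enums below are simplified stand-ins for the real `Token` and `SqlExpression` types, the column argument is limited to a bare identifier (the real function calls `parse_logical_or`), and `expect` and `parse_unnest_sketch` are hypothetical names.

```rust
/// Simplified stand-in for the project's token type (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Unnest,
    LeftParen,
    RightParen,
    Comma,
    Identifier(String),
    StringLiteral(String),
}

/// Simplified stand-in for SqlExpression (assumption).
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Unnest { column: Box<Expr>, delimiter: String },
}

/// Consume `expected` at `pos`, or report what was found instead.
fn expect(tokens: &[Token], pos: &mut usize, expected: Token) -> Result<(), String> {
    match tokens.get(*pos) {
        Some(t) if *t == expected => {
            *pos += 1;
            Ok(())
        }
        other => Err(format!("expected {:?}, found {:?}", expected, other)),
    }
}

/// Parse `UNNEST(column, 'delimiter')` from a token slice.
fn parse_unnest_sketch(tokens: &[Token]) -> Result<Expr, String> {
    let mut pos = 0;
    expect(tokens, &mut pos, Token::Unnest)?;
    expect(tokens, &mut pos, Token::LeftParen)?;

    // First argument: only a bare column name in this sketch.
    let column = match tokens.get(pos) {
        Some(Token::Identifier(name)) => Expr::Column(name.clone()),
        other => return Err(format!("expected column name, found {:?}", other)),
    };
    pos += 1;

    expect(tokens, &mut pos, Token::Comma)?;

    // Second argument: must be a string literal, as in the commit.
    let delimiter = match tokens.get(pos) {
        Some(Token::StringLiteral(s)) => s.clone(),
        _ => return Err("UNNEST delimiter must be a string literal".to_string()),
    };
    pos += 1;

    expect(tokens, &mut pos, Token::RightParen)?;

    Ok(Expr::Unnest {
        column: Box::new(column),
        delimiter,
    })
}
```

Feeding it the tokens for `UNNEST(accounts, '|')` returns an `Expr::Unnest` node, while replacing the string literal with an identifier returns the same "must be a string literal" error that the committed function produces.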
