72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,77 @@
# Changelog

## [3.1.0]

> New aggregate `COLLECT_OBJECT` for `GROUP BY` queries — collects rows in
> each group into an array of structured objects via an inner mini-SELECT
> with optional `ORDER BY`. Plugs into the existing aggregate pipeline
> alongside `COUNT` / `SUM` / `GROUP_CONCAT`; no breaking changes to other
> aggregates or to the public API.

### Added

- **`COLLECT_OBJECT(...)` aggregate function.** Per `GROUP BY` group, returns
`array<array<string, mixed>>` — each row in the group projected through an
inner SELECT (with field aliases) and optionally sorted by an inner
`ORDER BY` (ASC/DESC, multi-key). Inner items accept scalar functions
(`CONCAT`, `ROUND`, `IF`, `COALESCE`, `UPPER`, `LOWER`, …), arithmetic
expressions (`price * 1.21 AS price_with_vat`), and aliases.
FQL grammar:
`COLLECT_OBJECT(expr [AS alias], … [ORDER BY expr [ASC|DESC], …])`.
- **`FQL\Query\Builder\CollectObject` fluent builder.** Tiny chainable DSL —
`select(string ...$fields)` (full FQL expression syntax, accepts inline
`"expr AS alias"` and comma-separated lists via `FieldListSplitter`),
`as(string $alias)` for the idiomatic main-Query-style aliasing, plus the
`orderBy/asc/desc` triple inherited from `Sortable`. Used as
`$query->collectObject((new CollectObject())->select('id')->as('i')->orderBy('name'))->as('alias')`.
- **`FQL\Sql\Ast\Expression\WholeRowNode`.** New AST node that the
`ExpressionEvaluator` resolves to the entire source `$item`. Used as the
`spec.expression` of `COLLECT_OBJECT`, so the standard
`Stream::applyGrouping` path — `$evaluator->evaluate(spec.expression, $item)`
followed by `$class::accumulate($acc, $value)` — automatically delivers the
whole row as the aggregate's value, with no special case in the Stream
pipeline. `CollectObject` is a plain `AggregateFunction` and finalises by
running a one-off `Query` over a `ResultStreamProvider` of the collected
rows — full SELECT/ORDER BY pipeline reuse, no parallel evaluator state.
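
The accumulate-then-finalise flow described in the last item can be pictured with a plain-PHP sketch. It does not use the library's classes: `accumulate()` and `finalize()` are hypothetical stand-ins, the sample rows are invented, and the projection and ordering are hard-coded here rather than driven by a real `Query`.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for an aggregate whose input value is the whole row.
// Names and signatures are illustrative only, not the library's API.
function accumulate(array $acc, array $row): array
{
    $acc[] = $row; // the entire source row is the aggregate's value
    return $acc;
}

function finalize(array $acc): array
{
    // Project each collected row to {id, name} ...
    $projected = array_map(
        static fn (array $row): array => [
            'id' => $row['productId'],
            'name' => $row['productName'],
        ],
        $acc
    );
    // ... then order by name ascending.
    usort($projected, static fn (array $a, array $b): int => $a['name'] <=> $b['name']);
    return $projected;
}

// Invented sample data.
$rows = [
    ['categoryId' => 1, 'productId' => 101, 'productName' => 'Pixel'],
    ['categoryId' => 1, 'productId' => 100, 'productName' => 'Galaxy'],
    ['categoryId' => 2, 'productId' => 200, 'productName' => 'Kindle'],
];

// Grouping loop: one accumulator per GROUP BY key.
$groups = [];
foreach ($rows as $row) {
    $key = $row['categoryId'];
    $groups[$key] = accumulate($groups[$key] ?? [], $row);
}

// Finalise every group; array_map keeps the group keys.
$result = array_map('finalize', $groups);
```

Because the accumulator only stores rows, all shaping happens once per group at finalisation time, which is the property the changelog entry relies on.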

### Changed

- `ExpressionEvaluator` learned to evaluate `WholeRowNode` (returns the
source `$item`). `Stream` aggregate grouping path is unchanged.
- **`Traits\Sortable` return types changed from `Query` to `static`.** Same
for the corresponding `Interface\Query` signatures (`orderBy`, `sortBy`,
`asc`, `desc`). Existing fluent chains on `Query` keep their behaviour
(Query continues to return itself); the change unlocks `use Sortable;` in
builders that aren't full `Query` objects — `Builder\CollectObject` now
inherits the trio instead of duplicating it.
- `OrderByClauseParser::parseItem()` is now public, enabling `ORDER BY` item
reuse inside expression contexts (used by `COLLECT_OBJECT(... ORDER BY …)`).
- `ExpressionParser` gained a lazy `setOrderByParser()` setter, wired in
`Parser::create()` (full FQL statements) and
`Sql\Provider::freshExpressionParser()` (fragment parsers used by the
fluent API). `ExpressionCompiler` learned to render
`CollectObjectExpressionNode` so SELECT round-trip (compile → string →
re-parse) works for FQL-string inputs.
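
The effect of switching a trait's return types to `static` can be sketched in isolation. The trait and classes below are hypothetical miniatures written for this illustration (they are not the library's `Sortable`, `Query` or `CollectObject`), and the `static` return type assumes PHP 8.0 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical miniature of the pattern; not the library's actual trait.
trait SortableSketch
{
    /** @var list<array{string, string}> */
    private array $orderings = [];

    public function orderBy(string $field): static
    {
        $this->orderings[] = [$field, 'ASC'];
        return $this; // typed as the using class, not a fixed class name
    }

    public function desc(): static
    {
        $last = array_key_last($this->orderings);
        if ($last === null) {
            throw new LogicException('desc() called before orderBy()');
        }
        $this->orderings[$last][1] = 'DESC';
        return $this;
    }

    public function orderings(): array
    {
        return $this->orderings;
    }
}

// Two unrelated classes can share the trait and keep their own fluent type.
final class QuerySketch
{
    use SortableSketch;
}

final class CollectObjectSketch
{
    use SortableSketch;
}

$builder = (new CollectObjectSketch())->orderBy('price')->desc()->orderBy('name');
```

With a fixed return type such as `QuerySketch`, the second class could not satisfy the declaration by returning `$this`; `static` resolves to whichever class uses the trait.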

### Notes

- **Empty groups** produce no output row (consistent with the other
aggregates).
- **`ORDER BY` inside `COLLECT_OBJECT` recognises projected aliases** —
finalisation runs as a full `Query` over the accumulated rows, so the
ordering clause sees both source columns and the aliases declared inside
`COLLECT_OBJECT(...)`. Standard SQL semantics.
- **Null values propagate** into the produced objects (unlike `SUM` / `AVG`,
which skip them).
- **Stable sort** preserves accumulation order on ties.
- **Aggregates inside `COLLECT_OBJECT`** are supported but rarely useful —
inner aggregates collapse the accumulated rows to a single output object,
so `COLLECT_OBJECT(SUM(x))` yields an array of length 1. Prefer scalar
aggregates at the outer level alongside `COLLECT_OBJECT` for per-group
summary numbers.
- **Out of MVP scope** (rejected with a clear exception): `DISTINCT`,
`LIMIT`, `WHERE` inside `COLLECT_OBJECT`, and nested `COLLECT_OBJECT`.
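
The null-handling note can be illustrated with a few lines of plain PHP. This is a sketch of the described behaviour using invented rows, not the library's code: the summing loop skips nulls, while the collecting step keeps every row as it is.

```php
<?php
declare(strict_types=1);

// Invented sample rows for one group.
$rows = [
    ['name' => 'A', 'price' => 10.0],
    ['name' => 'B', 'price' => null],
    ['name' => 'C', 'price' => 5.0],
];

// SUM-like behaviour: null inputs are skipped.
$sum = 0.0;
foreach ($rows as $row) {
    if ($row['price'] !== null) {
        $sum += $row['price'];
    }
}

// COLLECT_OBJECT-like behaviour: every row is kept, null included.
$collected = array_map(
    static fn (array $row): array => ['name' => $row['name'], 'price' => $row['price']],
    $rows
);
```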

## [3.0.2]

### Fixed
26 changes: 15 additions & 11 deletions README.md
@@ -7,8 +7,8 @@
![Packagist Dependency Version](https://img.shields.io/packagist/dependency-v/1biot/fiquela/php)
![Packagist License](https://img.shields.io/packagist/l/1biot/fiquela)

![Coverage](https://img.shields.io/badge/coverage-90.22%25-lightgreen)
![PHPUnit Tests](https://img.shields.io/badge/PHPUnit-tests%3A_1286-lightgreen)
![Coverage](https://img.shields.io/badge/coverage-90.19%25-lightgreen)
![PHPUnit Tests](https://img.shields.io/badge/PHPUnit-tests%3A_1302-lightgreen)
![PHPStan](https://img.shields.io/badge/phpstan-level_8-lightgreen)

**FiQueLa** brings SQL querying to structured files. Filter, join, group, aggregate, and export data from XML, CSV, JSON, NDJSON, YAML, NEON, XLSX, ODS, and HTTP access logs — using familiar SQL syntax or a fluent PHP API.
@@ -356,15 +356,19 @@ fiquela-cli "SELECT name, price FROM csv(data.csv).* WHERE price > 100;"

Full documentation at **[docs.fiquela.io](https://docs.fiquela.io)**

- [Quickstart](https://docs.fiquela.io/quickstart)
- [FQL Syntax](https://docs.fiquela.io/querying/fql-syntax)
- [Fluent API](https://docs.fiquela.io/querying/fluent-api)
- [Joins](https://docs.fiquela.io/querying/joins)
- [Conditions](https://docs.fiquela.io/querying/conditions)
- [Functions](https://docs.fiquela.io/functions/string-functions)
- [EXPLAIN ANALYZE](https://docs.fiquela.io/advanced/explain-analyze)
- [Export with INTO](https://docs.fiquela.io/advanced/export-into)
- [API Reference](docs/api-reference.md)
- [x] ~~**Operator BETWEEN**: Add operator `BETWEEN` for filtering data and add support for dates and ranges.~~
- [x] ~~**XLS/XLSX**: Add Excel file support.~~
- [x] ~~**Custom cast type**: Add support for custom cast type for `SELECT` clause.~~
- [x] ~~**Add explain method**: Add method `explain()` for explaining query execution from actual query debugger and provide more complex information about query.~~
- [x] ~~**PHPStan 8**: Fix all PHPStan 8 errors.~~
- [x] ~~**Tests**: Increase test coverage (80%+).~~
- [x] ~~**Optimize GROUP BY**: Optimize `GROUP BY` for more memory efficient data processing.~~
- [x] ~~**DELETE, UPDATE, INSERT**: Support for manipulating data in files.~~ ~~- Instead of this, support for exporting data to files (CSV, NDJson, MessagePack, and more...) comes via the `INTO` clause.~~
- [x] ~~**Documentation**: Create detailed guides and examples for advanced use cases.~~ - [docs.fiquela.io](https://docs.fiquela.io)
- [x] ~~**Tests**: Increase test coverage (90%+).~~
- [ ] **Next file formats**: Add next file formats [MessagePack](https://msgpack.org/), [Parquet](https://parquet.apache.org/docs/file-format/), [INI](https://en.wikipedia.org/wiki/INI_file) and [TOML](https://toml.io/en/)
- [ ] **Hashmap cache**: Add hashmap cache (Redis, Memcache) for more memory efficient data processing.

---

104 changes: 96 additions & 8 deletions docs/file-query-language.md
@@ -503,14 +503,15 @@ WHERE

### Aggregations

| Function | Description |
|-----------------|--------------------|
| `COUNT` | Count rows |
| `SUM` | Sum values |
| `AVG` | Average values |
| `MIN` | Minimum value |
| `MAX` | Maximum value |
| `GROUP_CONCAT` | Concatenate values |
| Function | Description |
|-------------------|------------------------------------------------|
| `COUNT` | Count rows |
| `SUM` | Sum values |
| `AVG` | Average values |
| `MIN` | Minimum value |
| `MAX` | Maximum value |
| `GROUP_CONCAT` | Concatenate values |
| `COLLECT_OBJECT` | Collect rows as an array of structured objects |

Aggregate functions support `DISTINCT` in the same way as SQL, for example `COUNT(DISTINCT id)`.

@@ -533,6 +534,93 @@ HAVING
OR maxPrice < 500
```

### COLLECT_OBJECT

`COLLECT_OBJECT` accumulates source rows within a `GROUP BY` group into an **array of objects**
(`array<array<string, mixed>>`). Each object contains the fields (and optional aliases) declared
inside the function. You can apply scalar functions and an inner `ORDER BY` to shape and order the
collected rows. The result is suitable for producing nested, JSON-like structures directly from FQL.

```sql
COLLECT_OBJECT(
    field_expr [AS alias] [, field_expr [AS alias]] ...
    [ORDER BY order_expr [ASC | DESC] [, order_expr [ASC | DESC]] ...]
) [AS alias]
```

**Minimal example** — select a few fields with aliases:

```sql
SELECT
    categoryId,
    categoryName,
    COLLECT_OBJECT(
        productId AS id,
        productName AS name,
        price
    ) AS products
FROM csv(./examples/data/products.csv).*
GROUP BY categoryId

**Complex example** — scalar functions and multi-key ORDER BY:

```sql
SELECT
    categoryId,
    COLLECT_OBJECT(
        productId AS id,
        CONCAT(productName, " (", code, ")") AS label,
        ROUND(price, 2) AS price
        ORDER BY price DESC, productName ASC
    ) AS products
FROM csv(./examples/data/products.csv).*
GROUP BY categoryId

**Example output** (for the minimal example above):

```json
[
  {
    "categoryId": 1,
    "categoryName": "Phones",
    "products": [
      {"id": 100, "name": "iPhone", "price": 23990.0},
      {"id": 101, "name": "Samsung", "price": 21990.0}
    ]
  }
]
```
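
On the PHP side, a nested result of this shape can be walked with two loops. The sketch below assumes the rows have been serialized to JSON exactly as in the example output and decodes them with `json_decode()`; how the rows are fetched from the library is outside this sketch.

```php
<?php
declare(strict_types=1);

// The example output from this section, embedded as a JSON string.
$json = <<<'JSON'
[
  {
    "categoryId": 1,
    "categoryName": "Phones",
    "products": [
      {"id": 100, "name": "iPhone", "price": 23990.0},
      {"id": 101, "name": "Samsung", "price": 21990.0}
    ]
  }
]
JSON;

// Decode to associative arrays; throw instead of returning null on bad input.
$categories = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Flatten the nested structure into printable lines.
$lines = [];
foreach ($categories as $category) {
    foreach ($category['products'] as $product) {
        $lines[] = sprintf(
            '%s: %s (%.2F)',
            $category['categoryName'],
            $product['name'],
            $product['price']
        );
    }
}
```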

#### Semantics

- **Empty group** — produces no output row (consistent with all other aggregate functions).
- **Single-row group** — produces an array of length 1, not a scalar or nullable value.
- **`null` propagation** — unlike `SUM`/`AVG`, which skip nulls, `COLLECT_OBJECT` stores whatever
the expression evaluator returns, including `null`.
- **Stable sort** — rows with equal sort keys preserve their accumulation (source) order.
- **ORDER BY recognises projected aliases** — the inner SELECT runs as a real query over the
accumulated rows, so `ORDER BY` sees both the source columns and the aliases declared inside
`COLLECT_OBJECT(...)`. Standard SQL semantics:

```sql
COLLECT_OBJECT(
    ROUND(price, 2) AS roundedPrice
    ORDER BY roundedPrice DESC
) AS products
```
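
Why the alias is visible to `ORDER BY`, and what a stable sort means on ties, can be sketched in plain PHP. The rows are invented and the two steps are hand-written stand-ins for the inner query; the sketch assumes PHP 8.0 or newer, where `usort()` is stable.

```php
<?php
declare(strict_types=1);

// Invented sample rows, already in accumulation (source) order.
$accumulated = [
    ['productName' => 'A', 'price' => 9.994],
    ['productName' => 'B', 'price' => 19.5],
    ['productName' => 'C', 'price' => 9.991],
];

// Step 1: projection introduces the alias `roundedPrice`.
$projected = array_map(
    static fn (array $row): array => [
        'productName' => $row['productName'],
        'roundedPrice' => round($row['price'], 2),
    ],
    $accumulated
);

// Step 2: ordering runs after projection, so it can refer to the alias.
// A and C both round to 9.99; a stable sort keeps A ahead of C.
usort(
    $projected,
    static fn (array $a, array $b): int => $b['roundedPrice'] <=> $a['roundedPrice']
);

$order = array_column($projected, 'productName');
```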

#### Limitations (outside MVP)

The following features are not supported. Attempting to use them raises a `ParseException` or
`InvalidArgumentException` at parse/build time:

- `DISTINCT` inside `COLLECT_OBJECT`
- `LIMIT` inside `COLLECT_OBJECT`
- `WHERE` inside `COLLECT_OBJECT`
- Nested `COLLECT_OBJECT(COLLECT_OBJECT(...))`

## 8. Sorting and Filtering

```sql
130 changes: 121 additions & 9 deletions docs/fluent-api.md
@@ -459,7 +459,7 @@ AND (
## 5. Grouping and Aggregations

Use the `groupBy()` method to group the data in your query results. You can use the `having()` method to filter the grouped data.
Also, you can use these aggregations functions `count()`, `sum()`, `avg()`, `min()`, `max()` and `groupConcat()` methods to aggregate the data.
You can also aggregate the data with the `count()`, `sum()`, `avg()`, `min()`, `max()`, `groupConcat()` and `collectObject()` methods.
`count()`, `sum()`, `min()`, `max()` and `groupConcat()` accept a `bool $distinct` parameter.

`groupBy()` is the last method that uses dot notation for nested fields.
@@ -472,14 +472,15 @@ $query->groupBy('category.id');

### Aggregations

| Function | Description |
|---------------|-----------------------------|
| `count` | Count rows |
| `sum` | Sum values |
| `avg` | Average values |
| `min` | Minimum value |
| `max` | Maximum value |
| `groupConcat` | Concatenate values |
| Function | Description |
|-----------------|------------------------------------------------|
| `count` | Count rows |
| `sum` | Sum values |
| `avg` | Average values |
| `min` | Minimum value |
| `max` | Maximum value |
| `groupConcat` | Concatenate values |
| `collectObject` | Collect rows as an array of structured objects |

**Example:**

@@ -503,6 +504,117 @@ $query->count('category.id', true)->as('COUNT_DISTINCT')
->groupConcat('name', ',', true)->as('GROUP_CONCAT_DISTINCT');
```

### COLLECT_OBJECT

`collectObject()` accumulates source rows within a `GROUP BY` group into an **array of objects**
(`array<array<string, mixed>>`). Build the inner projection using `CollectObject` — a small
fluent builder with three surfaces: `select(...)` for inner SELECT items, `as(...)` to alias the
last selected item (mirrors the main Query pattern), and the `orderBy/asc/desc` trio inherited
from `Sortable` for optional inner ordering.

**Minimal example** — plain field selection with aliases:

```php
use FQL\Query\Builder\CollectObject;

$query->collectObject(
    (new CollectObject())
        ->select('productId')->as('id')
        ->select('productName')->as('name')
        ->select('price')
)->as('products')
->groupBy('categoryId');
```

**Complex example** — scalar functions and multi-key ORDER BY:

```php
use FQL\Query\Builder\CollectObject;

$query->select('categoryId')
    ->collectObject(
        (new CollectObject())
            ->select('productId')->as('id')
            ->select('CONCAT(productName, " (", code, ")")')->as('label')
            ->select('ROUND(price, 2)')->as('price')
            ->orderBy('price')->desc()
            ->orderBy('productName')->asc()
    )->as('products')
    ->groupBy('categoryId');
```

**Combined with other aggregations:**

```php
use FQL\Query\Builder\CollectObject;

$query->select('categoryId', 'categoryName')
    ->count('productId')->as('total')
    ->sum('price')->as('totalPrice')
    ->collectObject(
        (new CollectObject())
            ->select('productId')->as('id')
            ->select('productName')->as('name')
            ->select('ROUND(price, 2)')->as('price')
            ->orderBy('price')->desc()
    )->as('products')
    ->groupBy('categoryId');
```

**Compact form** — `select()` also accepts inline aliases and comma-separated multi-field
strings via `FieldListSplitter`, so a whole projection can fit in a single call:

```php
(new CollectObject())
    ->select('productId AS id, productName AS name, ROUND(price, 2) AS price')
    ->orderBy('price')->desc()
```

#### CollectObject builder methods

| Method | Description |
|-------------------------------|---------------------------------------------------------------------------------------------------------------------|
| `select(string ...$fields)` | Add one or more inner SELECT items. Each argument can be a single expression, an `expr AS alias`, or a comma list. |
| `as(string $alias)` | Alias the most recently added select item. Throws `LogicException` if called before any `select()`. |
| `orderBy($field, ?Sort $dir)` | Add an ORDER BY key (default `ASC`). Inherited from `Sortable`. |
| `asc()` | Set the last ORDER BY key to ascending. |
| `desc()` | Set the last ORDER BY key to descending. |
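
The `select()` / `as()` contract in the table (alias the most recent item, fail when there is none) can be mimicked with a toy class. `MiniCollectObject` is hypothetical and exists only for this illustration: it does not parse expressions, split comma lists or handle inline `AS`, and its exception message is made up.

```php
<?php
declare(strict_types=1);

// Hypothetical toy builder mirroring the documented select()/as() behaviour;
// it is not the library's CollectObject class.
final class MiniCollectObject
{
    /** @var list<array{expr: string, alias: ?string}> */
    private array $items = [];

    public function select(string ...$fields): static
    {
        foreach ($fields as $field) {
            $this->items[] = ['expr' => $field, 'alias' => null];
        }
        return $this;
    }

    public function as(string $alias): static
    {
        $last = array_key_last($this->items);
        if ($last === null) {
            // Mirrors the documented LogicException when no item exists yet.
            throw new LogicException('as() requires a preceding select()');
        }
        $this->items[$last]['alias'] = $alias;
        return $this;
    }

    public function items(): array
    {
        return $this->items;
    }
}

// Aliasing applies to the item added immediately before as().
$builder = (new MiniCollectObject())->select('productId')->as('id')->select('price');

// Calling as() first is rejected.
$threw = false;
try {
    (new MiniCollectObject())->as('x');
} catch (LogicException $e) {
    $threw = true;
}
```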

Any scalar function known to the parser — `CONCAT`, `ROUND`, `IF`, `COALESCE`, `UPPER`, `LOWER`,
arithmetic, … — works inside the `select()` string:

```php
(new CollectObject())
    ->select('IF(stock > 0, "in stock", "sold out")')->as('availability')
```

#### Semantics

- **Empty group** — produces no output row (consistent with all other aggregate functions).
- **Single-row group** — produces an array of length 1, not a scalar or nullable value.
- **`null` propagation** — unlike `sum()`/`avg()`, which skip nulls, `collectObject()` stores
whatever the expression evaluator returns, including `null`.
- **Stable sort** — rows with equal sort keys preserve their accumulation (source) order.
- **`orderBy()` recognises projected aliases** — internally the builder runs a full Query over
the accumulated rows, so `orderBy()` can reference both source columns and aliases declared
via `->as(...)` (or inline `AS` inside the `select()` string). Standard SQL semantics:

```php
(new CollectObject())
    ->select('ROUND(price, 2)')->as('roundedPrice')
    ->orderBy('roundedPrice')->desc()
```

#### Limitations (outside MVP)

The following features are not supported. Attempting to use them raises an
`InvalidArgumentException` at build time:

- `DISTINCT` inside the `CollectObject` builder
- `LIMIT` inside the `CollectObject` builder
- `WHERE` inside the `CollectObject` builder
- Nested `collectObject()` calls inside another `CollectObject`
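
Until such features exist, one possible workaround is to trim the collected array in PHP after the result row has been fetched. This is a sketch on invented data, not a documented feature of the library; where the condition applies to source rows, filtering with the outer query's `where()` may be the simpler route.

```php
<?php
declare(strict_types=1);

// One result row as it might look after fetching (invented sample data).
$row = [
    'categoryId' => 1,
    'products' => [
        ['id' => 100, 'price' => 23990.0],
        ['id' => 101, 'price' => 21990.0],
        ['id' => 102, 'price' => 4990.0],
    ],
];

// WHERE-like filter applied after the fact; array_values() reindexes the list.
$affordable = array_values(array_filter(
    $row['products'],
    static fn (array $product): bool => $product['price'] < 22000.0
));

// LIMIT-like cut: keep the first two collected objects.
$topTwo = array_slice($row['products'], 0, 2);
```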

## 6. Sorting

Use the `orderBy()` method to sort the data in your query results. You can use the `limit()` and `offset()` methods to filter the data.