Skip to content

Latest commit

 

History

History
1222 lines (1028 loc) · 24.5 KB

File metadata and controls

1222 lines (1028 loc) · 24.5 KB

Back to index

Aggregate Functions

Navigation: Functions — Date / Time · Functions — Conditional

This page documents aggregate functions for summarizing and analyzing data.


Table of Contents

  1. COUNT
  2. SUM
  3. AVG
  4. MIN
  5. MAX
  6. FIRST_VALUE
  7. LAST_VALUE
  8. ARRAY_AGG

Overview

Aggregate functions perform calculations on sets of rows and return a single result. They are commonly used with GROUP BY clauses to summarize data by categories.

Key Concepts:

  • Aggregate Functions: Operate on multiple rows to produce a single result
  • GROUP BY: Groups rows that have the same values in specified columns
  • HAVING: Filters groups based on aggregate conditions
  • Window Functions: Perform calculations across rows related to the current row

Function: COUNT

Description:
Count rows or non-null expressions. With DISTINCT counts distinct values.

Syntax:

COUNT(*)
COUNT(expr)
COUNT(DISTINCT expr)
COUNT(*) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

  • * - Count all rows (including NULLs)
  • expr - Count non-NULL values in expression
  • DISTINCT expr - Count distinct non-NULL values

Output:

  • BIGINT - Number of rows/values

NULL Handling:

  • COUNT(*) includes rows with NULL values
  • COUNT(expr) excludes NULL values
  • COUNT(DISTINCT expr) excludes NULL values

Examples:

Basic COUNT:

-- Count all rows
SELECT COUNT(*) AS total FROM emp;
-- Result: total = 42

-- Count non-NULL values
SELECT COUNT(manager) AS employees_with_manager FROM emp;

-- Count with WHERE
SELECT COUNT(*) AS it_employees 
FROM emp 
WHERE department = 'IT';

COUNT DISTINCT:

-- Count distinct salaries
SELECT COUNT(DISTINCT salary) AS distinct_salaries FROM emp;
-- Result: 8

-- Count distinct departments
SELECT COUNT(DISTINCT department) AS dept_count FROM emp;

-- Count distinct non-NULL emails
SELECT COUNT(DISTINCT email) AS unique_emails FROM users;

COUNT with GROUP BY:

-- Count employees per department
SELECT 
  department,
  COUNT(*) AS employee_count
FROM emp
GROUP BY department;

-- Count orders per customer
SELECT 
  customer_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- Count with multiple columns
SELECT 
  department,
  job_title,
  COUNT(*) AS count
FROM emp
GROUP BY department, job_title;

COUNT with HAVING:

-- Departments with more than 10 employees
SELECT 
  department,
  COUNT(*) AS count
FROM emp
GROUP BY department
HAVING COUNT(*) > 10;

-- Customers with multiple orders
SELECT 
  customer_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;

COUNT vs COUNT(*):

-- COUNT(*) counts all rows
SELECT COUNT(*) AS total_rows FROM emp;
-- Result: 42 (includes rows with NULL values)

-- COUNT(column) counts non-NULL values
SELECT COUNT(email) AS rows_with_email FROM emp;
-- Result: 38 (excludes 4 NULL emails)

-- Difference shows NULL count
SELECT 
  COUNT(*) AS total,
  COUNT(email) AS with_email,
  COUNT(*) - COUNT(email) AS without_email
FROM emp;

Practical Examples:

1. Data quality check:

SELECT 
  COUNT(*) AS total_records,
  COUNT(email) AS records_with_email,
  COUNT(phone) AS records_with_phone,
  COUNT(CASE WHEN email IS NOT NULL AND phone IS NOT NULL THEN 1 END) AS complete_records
FROM contacts;

2. Completion rate:

SELECT 
  department,
  COUNT(*) AS total,
  COUNT(performance_review) AS reviewed,
  ROUND(COUNT(performance_review) * 100.0 / COUNT(*), 2) AS review_completion_rate
FROM employees
GROUP BY department;

3. Active users:

SELECT 
  DATE_TRUNC('month', login_date) AS month,
  COUNT(DISTINCT user_id) AS active_users
FROM user_logins
WHERE login_date >= DATE_SUB(CURRENT_DATE, INTERVAL 6 MONTH)
GROUP BY DATE_TRUNC('month', login_date)
ORDER BY month;

Function: SUM

Description:
Sum of values.

Syntax:

SUM(expr)
SUM(DISTINCT expr)
SUM(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

  • expr - Numeric expression (INT, DOUBLE, DECIMAL, etc.)
  • DISTINCT expr - Sum of distinct values only

Output:

  • NUMERIC - Sum of values (same type as input, or promoted)

NULL Handling:

  • NULL values are ignored
  • If all values are NULL, returns NULL
  • Empty set returns NULL

Examples:

Basic SUM:

-- Total salary
SELECT SUM(salary) AS total_salary FROM emp;

-- Total revenue
SELECT SUM(amount) AS total_revenue FROM sales;

-- Sum with WHERE
SELECT SUM(salary) AS it_total_salary 
FROM emp 
WHERE department = 'IT';

SUM with GROUP BY:

-- Total salary per department
SELECT 
  department,
  SUM(salary) AS total_salary
FROM emp
GROUP BY department;

-- Revenue per product
SELECT 
  product_id,
  SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id;

-- Monthly sales
SELECT 
  DATE_TRUNC('month', order_date) AS month,
  SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

SUM DISTINCT:

-- Sum of distinct salaries (removes duplicates)
SELECT SUM(DISTINCT salary) AS sum_distinct_salaries FROM emp;

-- Sum of unique prices
SELECT SUM(DISTINCT price) AS sum_unique_prices FROM products;

SUM with Calculations:

-- Total compensation (salary + bonus)
SELECT 
  department,
  SUM(salary + COALESCE(bonus, 0)) AS total_compensation
FROM emp
GROUP BY department;

-- Total order value with tax
SELECT 
  SUM(subtotal + tax + shipping) AS total_order_value
FROM orders;

-- Weighted average preparation
SELECT 
  SUM(score * weight) AS weighted_sum,
  SUM(weight) AS total_weight
FROM test_scores;

SUM with CASE:

-- Conditional sum
SELECT 
  SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
  SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_revenue
FROM orders;

-- Sum by category
SELECT 
  department,
  SUM(CASE WHEN gender = 'M' THEN salary ELSE 0 END) AS male_total,
  SUM(CASE WHEN gender = 'F' THEN salary ELSE 0 END) AS female_total
FROM emp
GROUP BY department;

Practical Examples:

1. Financial summary:

SELECT 
  DATE_TRUNC('quarter', order_date) AS quarter,
  SUM(total_amount) AS revenue,
  SUM(cost) AS expenses,
  SUM(total_amount - cost) AS profit
FROM orders
GROUP BY DATE_TRUNC('quarter', order_date)
ORDER BY quarter;

2. Inventory value:

SELECT 
  category,
  SUM(quantity * unit_price) AS inventory_value
FROM inventory
GROUP BY category
ORDER BY inventory_value DESC;

3. Running total (with window function):

SELECT 
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

Function: AVG

Description:
Average of values.

Syntax:

AVG(expr)
AVG(DISTINCT expr)
AVG(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

  • expr - Numeric expression
  • DISTINCT expr - Average of distinct values only

Output:

  • DOUBLE - Average value

NULL Handling:

  • NULL values are ignored
  • If all values are NULL, returns NULL
  • Empty set returns NULL

Examples:

Basic AVG:

-- Average salary
SELECT AVG(salary) AS avg_salary FROM emp;

-- Average with WHERE
SELECT AVG(salary) AS avg_it_salary 
FROM emp 
WHERE department = 'IT';

-- Average order value
SELECT AVG(total_amount) AS avg_order_value FROM orders;

AVG with GROUP BY:

-- Average salary per department
SELECT 
  department,
  AVG(salary) AS avg_salary
FROM emp
GROUP BY department;

-- Average rating per product
SELECT 
  product_id,
  AVG(rating) AS avg_rating,
  COUNT(*) AS review_count
FROM reviews
GROUP BY product_id;

-- Average daily sales
SELECT 
  DATE_TRUNC('day', order_date) AS day,
  AVG(total_amount) AS avg_daily_order
FROM orders
GROUP BY DATE_TRUNC('day', order_date)
ORDER BY day;

AVG DISTINCT:

-- Average of distinct salaries
SELECT AVG(DISTINCT salary) AS avg_distinct_salary FROM emp;

-- Average of unique prices
SELECT 
  category,
  AVG(DISTINCT price) AS avg_unique_price
FROM products
GROUP BY category;

AVG with Rounding:

-- Round average to 2 decimals
SELECT 
  department,
  ROUND(AVG(salary), 2) AS avg_salary
FROM emp
GROUP BY department;

-- Format as currency
SELECT 
  CONCAT('$', ROUND(AVG(salary), 2)) AS avg_salary_formatted
FROM emp;

AVG vs Manual Calculation:

-- Using AVG function
SELECT AVG(salary) AS avg_salary FROM emp;

-- Manual calculation (equivalent)
SELECT SUM(salary) / COUNT(salary) AS avg_salary FROM emp;

-- Difference with NULL handling
SELECT 
  AVG(bonus) AS avg_with_function,
  SUM(bonus) / COUNT(*) AS avg_manual_all,
  SUM(bonus) / COUNT(bonus) AS avg_manual_non_null
FROM emp;

Practical Examples:

1. Performance metrics:

SELECT 
  employee_id,
  AVG(sales_amount) AS avg_sale,
  AVG(customer_rating) AS avg_rating,
  COUNT(*) AS total_sales
FROM sales
WHERE sale_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
GROUP BY employee_id
HAVING AVG(sales_amount) > 1000;

2. Grade analysis:

SELECT 
  course_id,
  AVG(score) AS avg_score,
  MIN(score) AS min_score,
  MAX(score) AS max_score,
  COUNT(*) AS student_count
FROM exam_results
GROUP BY course_id
ORDER BY avg_score DESC;

3. Response time analysis:

SELECT 
  service_name,
  AVG(response_time_ms) AS avg_response,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_time_ms) AS p95_response
FROM api_logs
WHERE log_date >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY service_name;

Function: MIN

Description:
Minimum value in group.

Syntax:

MIN(expr)
MIN(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

  • expr - Any comparable type (numeric, string, date, etc.)

Output:

  • Same type as input

NULL Handling:

  • NULL values are ignored
  • If all values are NULL, returns NULL
  • Empty set returns NULL

Examples:

Basic MIN:

-- Minimum salary
SELECT MIN(salary) AS min_salary FROM emp;

-- Earliest hire date
SELECT MIN(hire_date) AS earliest FROM emp;

-- Lowest price
SELECT MIN(price) AS lowest_price FROM products;

MIN with Different Types:

-- Numeric MIN
SELECT MIN(age) AS youngest FROM users;

-- Date MIN
SELECT MIN(order_date) AS first_order FROM orders;

-- String MIN (alphabetically first)
SELECT MIN(name) AS first_alphabetically FROM products;

-- Timestamp MIN
SELECT MIN(created_at) AS earliest_record FROM logs;

MIN with GROUP BY:

-- Minimum salary per department
SELECT 
  department,
  MIN(salary) AS min_salary
FROM emp
GROUP BY department;

-- Earliest order per customer
SELECT 
  customer_id,
  MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id;

-- Lowest price per category
SELECT 
  category,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM products
GROUP BY category;

MIN with WHERE:

-- Minimum salary in IT department
SELECT MIN(salary) AS min_it_salary 
FROM emp 
WHERE department = 'IT';

-- Earliest order in 2025
SELECT MIN(order_date) AS first_2025_order 
FROM orders 
WHERE YEAR(order_date) = 2025;

Practical Examples:

1. Find oldest/newest records:

SELECT 
  customer_id,
  MIN(order_date) AS first_order,
  MAX(order_date) AS last_order,
  DATEDIFF(MAX(order_date), MIN(order_date)) AS customer_lifetime_days
FROM orders
GROUP BY customer_id;

2. Price range analysis:

SELECT 
  category,
  MIN(price) AS min_price,
  AVG(price) AS avg_price,
  MAX(price) AS max_price,
  MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;

3. Performance bounds:

SELECT 
  server_name,
  MIN(response_time) AS best_response,
  AVG(response_time) AS avg_response,
  MAX(response_time) AS worst_response
FROM server_logs
WHERE log_date = CURRENT_DATE
GROUP BY server_name;

Function: MAX

Description:
Maximum value in group.

Syntax:

MAX(expr)
MAX(expr) OVER (PARTITION BY partition_expr, ...) --(since v0.14.0)

Inputs:

  • expr - Any comparable type (numeric, string, date, etc.)

Output:

  • Same type as input

NULL Handling:

  • NULL values are ignored
  • If all values are NULL, returns NULL
  • Empty set returns NULL

Examples:

Basic MAX:

-- Maximum salary
SELECT MAX(salary) AS top_salary FROM emp;

-- Latest hire date
SELECT MAX(hire_date) AS most_recent FROM emp;

-- Highest price
SELECT MAX(price) AS highest_price FROM products;

MAX with Different Types:

-- Numeric MAX
SELECT MAX(age) AS oldest FROM users;

-- Date MAX
SELECT MAX(order_date) AS last_order FROM orders;

-- String MAX (alphabetically last)
SELECT MAX(name) AS last_alphabetically FROM products;

-- Timestamp MAX
SELECT MAX(updated_at) AS latest_update FROM records;

MAX with GROUP BY:

-- Maximum salary per department
SELECT 
  department,
  MAX(salary) AS max_salary
FROM emp
GROUP BY department;

-- Latest order per customer
SELECT 
  customer_id,
  MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id;

-- Highest price per category
SELECT 
  category,
  MAX(price) AS max_price
FROM products
GROUP BY category;

MAX with WHERE:

-- Maximum salary in IT department
SELECT MAX(salary) AS max_it_salary 
FROM emp 
WHERE department = 'IT';

-- Latest order in 2025
SELECT MAX(order_date) AS last_2025_order 
FROM orders 
WHERE YEAR(order_date) = 2025;

Practical Examples:

1. Find top performers:

SELECT 
  department,
  MAX(salary) AS top_salary,
  AVG(salary) AS avg_salary,
  MAX(salary) - AVG(salary) AS gap_to_top
FROM emp
GROUP BY department;

2. Latest activity:

SELECT 
  user_id,
  MAX(login_date) AS last_login,
  DATEDIFF(CURRENT_DATE, MAX(login_date)) AS days_since_login
FROM user_logins
GROUP BY user_id
HAVING DATEDIFF(CURRENT_DATE, MAX(login_date)) > 30;

3. Peak values:

SELECT 
  DATE_TRUNC('day', timestamp) AS day,
  MAX(cpu_usage) AS peak_cpu,
  MAX(memory_usage) AS peak_memory,
  MAX(active_connections) AS peak_connections
FROM system_metrics
WHERE timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;

Function: FIRST_VALUE

Description:
Window function: returns the first value in an ordered partition. Pushed as top_hits size=1 to Elasticsearch when possible.

Syntax:

FIRST_VALUE(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

  • expr - Expression to return
  • PARTITION BY - Optional grouping columns
  • ORDER BY - Ordering specification (if not provided, only expr column name is used for sorting)

Output:

  • Same type as input expression

Behavior:

  • Returns the first value based on ORDER BY within each partition
  • If OVER is not provided, only the expr column name is used for sorting
  • Optimized to Elasticsearch top_hits aggregation with size=1

Examples:

Basic FIRST_VALUE:

-- First salary in each department (ordered by hire date)
SELECT 
  department,
  name,
  salary,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_salary
FROM emp;

Without PARTITION BY:

-- First hire across entire company
SELECT 
  name,
  hire_date,
  FIRST_VALUE(name) OVER (ORDER BY hire_date ASC) AS first_hired_employee
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  FIRST_VALUE(salary) AS first_salary_value
FROM emp;

Multiple Partitions:

-- First employee hired in each department and job title
SELECT 
  department,
  job_title,
  name,
  hire_date,
  FIRST_VALUE(name) OVER (
    PARTITION BY department, job_title
    ORDER BY hire_date ASC
  ) AS first_in_role
FROM emp;

Practical Examples:

1. First purchase per customer:

SELECT 
  customer_id,
  order_id,
  order_date,
  total_amount,
  FIRST_VALUE(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date ASC
  ) AS first_order_amount
FROM orders;

2. Initial stock price:

SELECT 
  stock_symbol,
  trade_date,
  closing_price,
  FIRST_VALUE(closing_price) OVER (
    PARTITION BY stock_symbol
    ORDER BY trade_date ASC
  ) AS initial_price
FROM stock_prices;

3. Baseline metrics:

SELECT 
  server_name,
  timestamp,
  cpu_usage,
  FIRST_VALUE(cpu_usage) OVER (
    PARTITION BY server_name
    ORDER BY timestamp ASC
  ) AS baseline_cpu
FROM server_metrics
WHERE DATE(timestamp) = CURRENT_DATE;

Comparison with MIN:

-- FIRST_VALUE (order-dependent)
SELECT 
  department,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_hired_salary
FROM emp;

-- MIN (value-dependent)
SELECT 
  department,
  MIN(salary) AS lowest_salary
FROM emp
GROUP BY department;

Function: LAST_VALUE

Description:
Window function: returns the last value in an ordered partition. Pushed to Elasticsearch by flipping sort order in top_hits.

Syntax:

LAST_VALUE(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

  • expr - Expression to return
  • PARTITION BY - Optional grouping columns
  • ORDER BY - Ordering specification (if not provided, only expr column name is used for sorting)

Output:

  • Same type as input expression

Behavior:

  • Returns the last value based on ORDER BY within each partition
  • If OVER is not provided, only the expr column name is used for sorting
  • Optimized to Elasticsearch top_hits by reversing sort order

Examples:

Basic LAST_VALUE:

-- Last salary in each department (ordered by hire date)
SELECT 
  department,
  name,
  salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_salary
FROM emp;

Without PARTITION BY:

-- Most recent hire across entire company
SELECT 
  name,
  hire_date,
  LAST_VALUE(name) OVER (ORDER BY hire_date ASC) AS last_hired_employee
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  LAST_VALUE(salary) AS last_salary_value
FROM emp;

Multiple Partitions:

-- Last employee hired in each department and job title
SELECT 
  department,
  job_title,
  name,
  hire_date,
  LAST_VALUE(name) OVER (
    PARTITION BY department, job_title
    ORDER BY hire_date ASC
  ) AS last_in_role
FROM emp;

Practical Examples:

1. Most recent purchase per customer:

SELECT 
  customer_id,
  order_id,
  order_date,
  total_amount,
  LAST_VALUE(total_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date ASC
  ) AS last_order_amount
FROM orders;

2. Latest stock price:

SELECT 
  stock_symbol,
  trade_date,
  closing_price,
  LAST_VALUE(closing_price) OVER (
    PARTITION BY stock_symbol
    ORDER BY trade_date ASC
  ) AS current_price
FROM stock_prices;

3. Current status:

SELECT 
  user_id,
  status_change_date,
  status,
  LAST_VALUE(status) OVER (
    PARTITION BY user_id
    ORDER BY status_change_date ASC
  ) AS current_status
FROM user_status_history;

Comparison with MAX:

-- LAST_VALUE (order-dependent)
SELECT 
  department,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_hired_salary
FROM emp;

-- MAX (value-dependent)
SELECT 
  department,
  MAX(salary) AS highest_salary
FROM emp
GROUP BY department;

FIRST_VALUE vs LAST_VALUE:

-- Compare first and last values
SELECT 
  department,
  FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS first_hire_salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS last_hire_salary,
  LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) - FIRST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
  ) AS salary_change
FROM emp;

Function: ARRAY_AGG

Description:
Collect values into an array for each partition. Implemented using OVER and pushed to Elasticsearch as top_hits. Post-processing converts hits to an array of scalars.

Syntax:

ARRAY_AGG(expr) OVER (
  [PARTITION BY partition_expr, ...]
  [ORDER BY order_expr [ASC|DESC], ...]
)

Inputs:

  • expr - Expression to collect
  • PARTITION BY - Optional grouping columns
  • ORDER BY - Optional ordering (if not provided, only expr column name is used for sorting)

Output:

  • ARRAY<type_of_expr> - Array of collected values

Behavior:

  • Collects all values of expr within each partition into an array
  • If OVER is not provided, only the expr column name is used for sorting
  • Optimized to Elasticsearch top_hits aggregation
  • Post-processing converts hits to array of scalars

Examples:

Basic ARRAY_AGG:

-- Collect employee names per department
SELECT 
  department,
  ARRAY_AGG(name) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
    LIMIT 100
  ) AS employees
FROM emp;
-- Result: employees as an array of name values per department (sorted and limited)

Without PARTITION BY:

-- Collect all employee names (ordered)
SELECT 
  ARRAY_AGG(name) OVER (ORDER BY hire_date ASC) AS all_employees
FROM emp;

Without OVER clause:

-- Uses expr column name for sorting
SELECT 
  department,
  ARRAY_AGG(name) AS employee_list
FROM emp;

With Multiple Columns:

-- Collect salaries per department
SELECT 
  department,
  ARRAY_AGG(salary) OVER (
    PARTITION BY department 
    ORDER BY salary DESC
  ) AS salary_list
FROM emp;

Ordered Collection:

-- Collect products by price (high to low)
SELECT 
  category,
  ARRAY_AGG(product_name) OVER (
    PARTITION BY category 
    ORDER BY price DESC
  ) AS products_by_price
FROM products
LIMIT 100;

Practical Examples:

1. Customer order history:

SELECT 
  customer_id,
  ARRAY_AGG(order_id) OVER (
    PARTITION BY customer_id 
    ORDER BY order_date DESC
  ) AS order_history,
  ARRAY_AGG(total_amount) OVER (
    PARTITION BY customer_id 
    ORDER BY order_date DESC
  ) AS amount_history
FROM orders
LIMIT 1000;

2. Product tags:

SELECT 
  product_id,
  product_name,
  ARRAY_AGG(tag) OVER (
    PARTITION BY product_id 
    ORDER BY tag ASC
  ) AS tags
FROM product_tags
LIMIT 500;

3. Timeline of events:

SELECT 
  user_id,
  ARRAY_AGG(event_type) OVER (
    PARTITION BY user_id 
    ORDER BY event_timestamp ASC
  ) AS event_timeline,
  ARRAY_AGG(event_timestamp) OVER (
    PARTITION BY user_id 
    ORDER BY event_timestamp ASC
  ) AS timestamp_timeline
FROM user_events
WHERE event_timestamp >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
LIMIT 1000;

4. Hierarchical data:

SELECT 
  manager_id,
  ARRAY_AGG(employee_name) OVER (
    PARTITION BY manager_id 
    ORDER BY hire_date ASC
  ) AS direct_reports
FROM employees
WHERE manager_id IS NOT NULL
LIMIT 100;

LIMIT Consideration:

-- Always use LIMIT with ARRAY_AGG to prevent memory issues
SELECT 
  department,
  ARRAY_AGG(name) OVER (
    PARTITION BY department 
    ORDER BY hire_date ASC
    LIMIT 100  -- Important: limits result set size
  ) AS employees
FROM emp;

Aggregate Functions Summary

Function Purpose Input Output NULL Handling
COUNT(*) Count all rows Any BIGINT Includes NULLs
COUNT(expr) Count non-NULL values Any BIGINT Excludes NULLs
COUNT(DISTINCT expr) Count distinct values Any BIGINT Excludes NULLs
SUM(expr) Sum values Numeric Numeric Ignores NULLs
AVG(expr) Average values Numeric DOUBLE Ignores NULLs
MIN(expr) Minimum value Comparable Same as input Ignores NULLs
MAX(expr) Maximum value Comparable Same as input Ignores NULLs
FIRST_VALUE(expr) First value (ordered) Any Same as input Depends on ORDER
LAST_VALUE(expr) Last value (ordered) Any Same as input Depends on ORDER
ARRAY_AGG(expr) Collect into array Any ARRAY<type> Includes NULLs

Back to index