API Reference
Complete reference for all PipeFrame functions and classes.
- Core Classes
- Data Selection
- Data Filtering
- Data Transformation
- Grouping & Aggregation
- Joining & Combining
- Sorting & Ordering
- Reshaping
- Utilities
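Every function in this reference is applied with the `>>` operator. This page does not say how PipeFrame implements it, so the following is only a minimal standard-library sketch of one common way to build such an operator in Python, using `__rrshift__`. The names `Verb`, `verb`, `keep`, and `take` are hypothetical, and the sketch works on plain lists of dicts rather than on PipeFrame objects.

```python
from functools import wraps


class Verb:
    """Holds a function plus its arguments until data arrives via `>>`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Called for `data >> verb` when `data` does not define `>>` itself.
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Turn func(data, ...) into a factory that returns a pipeable Verb."""
    @wraps(func)
    def factory(*args, **kwargs):
        return Verb(func, *args, **kwargs)
    return factory


@verb
def keep(rows, predicate):
    # Hypothetical row filter that takes a callable, not a string expression.
    return [row for row in rows if predicate(row)]


@verb
def take(rows, n=5):
    # Hypothetical counterpart of "first n rows".
    return rows[:n]


rows = [{'age': age} for age in (22, 31, 45, 28, 52)]
result = rows >> keep(lambda row: row['age'] > 25) >> take(n=2)
```

Because `>>` is left-associative, each step receives the output of the previous one, which is what lets the chained examples in this reference be read top to bottom.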
PipeFrame's enhanced DataFrame class.
from pipeframe import DataFrame
df = DataFrame(data, columns=None, index=None)

Arguments:
- data: dict, array, or pandas DataFrame
- columns: column labels (optional)
- index: row labels (optional)
Returns: PipeFrame DataFrame (fully compatible with pandas)
Example:
df = DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

Enhanced Series class with pipe support.
from pipeframe import Series
s = Series(data, index=None, name=None)

Select columns from DataFrame.
df >> select(*columns)

Arguments:
- *columns: Column names to select
Special patterns:
- starts_with('prefix'): Columns starting with prefix
- ends_with('suffix'): Columns ending with suffix
- contains('text'): Columns containing text
- matches('regex'): Columns matching regex
- -column: Exclude column
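To make the selection rules above concrete, here is a rough standard-library sketch of how such patterns could be resolved against a list of column names. It is not PipeFrame's implementation: `resolve_columns` is a hypothetical helper, the pattern helpers are re-defined locally as simple predicates, and the behaviour when only exclusions are given (start from all columns) is an assumption.

```python
import re


def starts_with(prefix):
    return lambda name: name.startswith(prefix)


def ends_with(suffix):
    return lambda name: name.endswith(suffix)


def contains(text):
    return lambda name: text in name


def matches(pattern):
    regex = re.compile(pattern)
    return lambda name: regex.search(name) is not None


def resolve_columns(all_columns, *selectors):
    """Return the column names picked by names, patterns, and '-name' exclusions."""
    included, excluded = [], set()
    for selector in selectors:
        if callable(selector):
            # Pattern helper: keep every column the predicate accepts.
            included.extend(c for c in all_columns if selector(c))
        elif selector.startswith('-'):
            excluded.add(selector[1:])
        else:
            included.append(selector)
    if not included:
        # Assumption: exclusions alone mean "everything except these".
        included = list(all_columns)
    result = []
    for column in included:
        if column not in excluded and column not in result:
            result.append(column)
    return result


columns = ['id', 'name', 'sales_q1', 'sales_q2', 'total_2024', 'temp']
sales = resolve_columns(columns, starts_with('sales_'))
without_temp = resolve_columns(columns, '-temp', '-id')
```

The sketch preserves the order in which selectors are given and drops repeated names, which is one reasonable choice but not necessarily the library's.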
Examples:
# Select specific columns
df >> select('name', 'age', 'salary')
# Select with patterns
df >> select(starts_with('sales_'))
# Exclude columns
df >> select('-temp', '-id')

Filter rows based on conditions.
df >> filter(condition)
df >> where(condition)  # Alias

Arguments:
- condition: String expression for filtering
Examples:
# Simple condition
df >> filter('age > 25')
# Multiple conditions with &
df >> filter('age > 25 & salary > 50000')
# Or conditions with |
df >> filter('department == "Sales" | department == "IT"')
# String operations
df >> filter('name.str.startswith("A")')
df >> filter('city.str.contains("York")')
# IN operator
df >> filter('status in ["Active", "Pending"]')

Create or modify columns.
df >> define(**kwargs)
df >> mutate(**kwargs)  # Alias

Arguments:
- **kwargs: name=expression pairs
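The name=expression pairs can be pictured as being evaluated in order, so that a later expression may refer to a column created earlier in the same call. The sketch below illustrates that idea on plain dicts using only the standard library. `define_rows` is a hypothetical helper, the left-to-right evaluation order is an assumption about PipeFrame, and `eval` is used purely for brevity.

```python
def define_rows(rows, **expressions):
    """Add one key per expression to a copy of each row, evaluated left to right."""
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for name, expression in expressions.items():
            # The row itself serves as the namespace, so earlier results are visible.
            # eval() is for illustration only; never feed it untrusted strings.
            new_row[name] = eval(expression, {"__builtins__": {}}, new_row)
        result.append(new_row)
    return result


rows = [
    {'salary': 50000, 'age': 40},
    {'salary': 30000, 'age': 28},
]
out = define_rows(
    rows,
    bonus='salary * 0.1',
    total='salary + bonus',   # uses the bonus defined just above
    is_senior='age > 35',
)
```

Keyword arguments keep their order in Python 3.7 and later, which is what makes the `total` expression able to see `bonus` here.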
Examples:
# Create new column
df >> define(bonus='salary * 0.1')
# Multiple columns
df >> define(
    bonus='salary * 0.1',
    total='salary + bonus',
    is_senior='age > 35'
)
# Complex expressions
df >> define(
category='''
"High" if amount > 1000 else
"Medium" if amount > 500 else
"Low"
'''
)

Drop columns.
df >> drop(*columns)

Examples:
df >> drop('temp_column')
df >> drop('col1', 'col2', 'col3')

Rename columns.
df >> rename(mapper, **kwargs)

Examples:
# Using dict
df >> rename({'old_name': 'new_name'})
# Using kwargs
df >> rename(old_name='new_name')

Group DataFrame by columns.
df >> group_by(*columns)

Examples:
df >> group_by('department')
df >> group_by('department', 'location')

Aggregate grouped data.
grouped_df >> summarize(**kwargs)
grouped_df >> summarise(**kwargs)  # British spelling

Arguments:
- **kwargs: name='aggregation_function(column)' pairs
Common aggregations:
- mean(column): Average
- sum(column): Total
- count(): Count rows
- min(column): Minimum
- max(column): Maximum
- std(column): Standard deviation
- nunique(column): Count unique values
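As an illustration of what the aggregation strings listed above mean, the following standard-library sketch parses specs such as 'mean(salary)' and applies them per group to plain dicts. It is not how PipeFrame is implemented: `summarize_rows` is a hypothetical helper, it groups by a single key only, and `std` is taken here to be the sample standard deviation, which is an assumption.

```python
import re
import statistics
from collections import defaultdict

# Maps the aggregation names listed above to plain-Python callables.
AGGREGATIONS = {
    'mean': statistics.mean,
    'sum': sum,
    'min': min,
    'max': max,
    'std': statistics.stdev,  # assumption: sample standard deviation
    'nunique': lambda values: len(set(values)),
}

SPEC_PATTERN = re.compile(r'^\s*(\w+)\((\w*)\)\s*$')


def summarize_rows(rows, by, **specs):
    """Group rows by one key and compute one value per name='func(column)' spec."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    result = []
    for key, members in groups.items():
        summary = {by: key}
        for name, spec in specs.items():
            match = SPEC_PATTERN.match(spec)
            if match is None:
                raise ValueError(f'cannot parse aggregation: {spec!r}')
            func, column = match.groups()
            if func == 'count':
                summary[name] = len(members)  # count() takes no column
            else:
                values = [member[column] for member in members]
                summary[name] = AGGREGATIONS[func](values)
        result.append(summary)
    return result


rows = [
    {'department': 'Sales', 'salary': 50000},
    {'department': 'Sales', 'salary': 70000},
    {'department': 'IT', 'salary': 90000},
]
summary = summarize_rows(
    rows, 'department',
    avg_salary='mean(salary)',
    total_employees='count()',
    max_salary='max(salary)',
)
```

Groups come out in the order their keys first appear in the input; a real DataFrame library may sort them instead.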
Examples:
result = (df
    >> group_by('department')
    >> summarize(
        avg_salary='mean(salary)',
        total_employees='count()',
        max_salary='max(salary)'
    )
)

Count observations.
df >> count(*columns)

Examples:
# Count all rows
df >> count()
# Count by group
df >> count('department')
df >> count('department', 'location')

Left outer join.
df >> left_join(right, on=None, left_on=None, right_on=None)

Right outer join.
df >> right_join(right, on=None, left_on=None, right_on=None)

Inner join.
df >> inner_join(right, on=None, left_on=None, right_on=None)

Full outer join.
df >> full_join(right, on=None, left_on=None, right_on=None)

Examples:
# Join on common column
result = orders >> left_join(customers, on='customer_id')
# Join on different column names
result = orders >> left_join(
    customers,
    left_on='cust_id',
    right_on='id'
)

Concatenate DataFrames vertically.
df >> bind_rows(other)

Concatenate DataFrames horizontally.
df >> bind_cols(other)

Sort DataFrame by columns.
df >> arrange(*columns)
df >> order_by(*columns)  # Alias

Arguments:
- *columns: Column names (prefix with - for descending)
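The `-` prefix convention can be pictured with a small standard-library sketch that sorts plain dicts by several keys with mixed directions. `arrange_rows` is a hypothetical helper, not PipeFrame code. It relies on Python's sort being stable: the keys are applied from last to first, so the first key ends up taking priority.

```python
def arrange_rows(rows, *columns):
    """Sort rows by the given keys; a leading '-' means descending for that key."""
    result = list(rows)  # do not reorder the caller's list
    # Apply the least significant key first; the stable sort preserves
    # that ordering among rows that tie on the more significant keys.
    for column in reversed(columns):
        descending = column.startswith('-')
        name = column[1:] if descending else column
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


rows = [
    {'department': 'IT', 'salary': 90},
    {'department': 'Sales', 'salary': 50},
    {'department': 'IT', 'salary': 120},
    {'department': 'Sales', 'salary': 70},
]
ordered = arrange_rows(rows, 'department', '-salary')
```

Here `ordered` lists the IT rows before the Sales rows, with the higher salary first inside each department.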
Examples:
# Ascending
df >> arrange('age')
# Descending (use - prefix)
df >> arrange('-salary')
# Multiple columns
df >> arrange('department', '-salary')

Pivot from long to wide format.
df >> pivot_wider(id_cols, names_from, values_from)

Pivot from wide to long format.
df >> pivot_longer(cols, names_to='name', values_to='value')

Examples:
# Wide to long
long_df = (wide_df
    >> pivot_longer(
        cols=['Q1', 'Q2', 'Q3', 'Q4'],
        names_to='quarter',
        values_to='sales'
    )
)
# Long to wide
wide_df = (long_df
    >> pivot_wider(
        id_cols='product',
        names_from='quarter',
        values_from='sales'
    )
)

Return first n rows.
df >> head(n=5)

Return last n rows.
df >> tail(n=5)

Random sample of rows.
df >> sample(n=None, frac=None)

Examples:
# Sample 10 rows
df >> sample(n=10)
# Sample 10% of rows
df >> sample(frac=0.1)

Remove duplicate rows.
df >> distinct(*columns)

Examples:
# Remove all duplicate rows
df >> distinct()
# Remove duplicates based on columns
df >> distinct('customer_id', 'date')

Drop rows with missing values.
df >> drop_na(*columns)

Examples:
# Drop rows with any NA
df >> drop_na()
# Drop rows with NA in specific columns
df >> drop_na('critical_column')

Fill missing values.
df >> fill_na(value, **kwargs)

Examples:
# Fill all NAs with 0
df >> fill_na(0)
# Fill specific columns
df >> fill_na(age=0, salary=50000)

View intermediate results (for debugging).
df >> peek(n=5)

Example:
result = (df
    >> filter('age > 25')
    >> peek(n=3)  # Shows first 3 rows
    >> define(bonus='salary * 0.1')
    >> peek()  # Shows first 5 rows
)

Select columns starting with prefix.
df >> select(starts_with('sales_'))

Select columns ending with suffix.
df >> select(ends_with('_total'))

Select columns containing text.
df >> select(contains('amount'))

Select columns matching regex pattern.
df >> select(matches(r'\d{4}'))  # Columns whose names contain 4 consecutive digits

- Quick Start Guide - Learn the basics
- Examples - Real-world use cases
- FAQ - Common questions
Need help? Open an issue: https://github.com/Yasser03/pipeframe/issues