FAQ
PipeFrame is a Python library for data manipulation that uses the pipe operator (>>) to create readable, chainable data workflows. It's built on pandas but provides a more intuitive syntax inspired by R's dplyr and tidyverse.
PipeFrame doesn't replace pandas; it enhances it. Benefits:
- More Readable: Workflows read like natural language
- Less Nesting: No more deeply nested function calls
- Easier to Learn: Consistent verb-based API
- 100% Compatible: Works seamlessly with pandas
You can mix PipeFrame and pandas freely in the same code.
Yes! PipeFrame is:
- ✅ Tested thoroughly
- ✅ Type-hinted
- ✅ Well-documented
- ✅ Built on proven pandas foundation
- ✅ MIT licensed
```bash
pip install pipeframe
```

Core dependencies:
- pandas >= 1.5.0
- numpy >= 1.21.0

Optional dependencies are available through extras, e.g. `pip install pipeframe[excel]`.
No, PipeFrame requires Python 3.8 or higher.
Read `>>` as "then" or "pipe to":

```python
df >> filter('x > 5')
# "Take df, THEN filter where x > 5"
```
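To make the "then" reading concrete, here is a minimal sketch of how a `>>` pipe can be layered on top of pandas. It assumes a hypothetical `PipeDF` subclass that overrides `__rshift__`, plus two hypothetical verbs, `keep_rows` and `keep_cols`; it illustrates the operator mechanics only and is not PipeFrame's actual implementation.

```python
import pandas as pd


class PipeDF(pd.DataFrame):
    """Hypothetical DataFrame subclass that supports `df >> verb`."""

    @property
    def _constructor(self):
        # Keep pandas operations (query, column selection, ...) returning PipeDF
        return PipeDF

    def __rshift__(self, verb):
        # `df >> verb` calls this method: take df, THEN apply verb to it
        result = verb(self)
        # Re-wrap DataFrame results so the next `>>` still finds __rshift__
        if isinstance(result, pd.DataFrame) and not isinstance(result, PipeDF):
            result = PipeDF(result)
        return result


def keep_rows(expr):
    """Hypothetical verb: keep rows matching a DataFrame.query expression."""
    return lambda df: df.query(expr)


def keep_cols(*cols):
    """Hypothetical verb: keep only the named columns."""
    return lambda df: df[list(cols)]


df = PipeDF({'x': [1, 6, 8], 'y': ['a', 'b', 'c']})
result = df >> keep_rows('x > 5') >> keep_cols('y')
print(result)
```

In this sketch, applying the same pipe to a plain `pd.DataFrame` raises `TypeError`, because the plain class defines no `__rshift__`: the left-hand object decides what `>>` does.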
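Absolutely! PipeFrame DataFrames are pandas DataFrames:

```python
from pipeframe import DataFrame, filter, select

df = DataFrame({'x': [1, 2, 3]})

# Mix PipeFrame and pandas. Method calls bind more tightly than >>,
# so parenthesize the pipe before calling a pandas method on its result.
result = (
    (df >> filter('x > 1'))   # PipeFrame
    .reset_index(drop=True)   # pandas
    >> select('x')            # PipeFrame
)
```

Use `peek()` to view intermediate results: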
```python
from pipeframe import define, filter, group_by, peek

result = (df
    >> filter('age > 25')
    >> peek(n=5)        # Shows first 5 rows
    >> define(category='age // 10')
    >> peek()           # Shows first 5 rows again
    >> group_by('category')
)
```

Yes, save them to variables:
```python
filtered = df >> filter('age > 25')
with_bonus = filtered >> define(bonus='salary * 0.1')
final = with_bonus >> select('name', 'bonus')
```

`filter()` accepts any valid pandas query expression:
```python
# Comparisons
df >> filter('age > 25')
df >> filter('name == "Alice"')

# Logic
df >> filter('age > 25 & salary > 50000')
df >> filter('age > 60 | salary > 100000')

# String operations
df >> filter('name.str.startswith("A")')
df >> filter('name.str.contains("Smith")')

# In operator
df >> filter('department in ["Sales", "IT"]')
```

Use `define()` (an alias for `mutate()`) to create new columns:
```python
df >> define(
    bonus='salary * 0.1',
    total='salary + bonus',
    is_senior='age > 35'
)
```

Yes, use `select()` with a `-` prefix to exclude columns:
```python
# Select everything except id
df >> select('-id')

# Select everything except the id and temp columns
df >> select('-id', '-temp')
```

PipeFrame has minimal overhead. It's built on pandas, so most operations run as fast as the equivalent pandas calls, and the pipe operator itself adds negligible overhead (<1%).
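Because PipeFrame is built on pandas, the verbs in the answers above have close plain-pandas counterparts. The sketch below approximates `filter()`, `peek()`, `define()` and `select('-col')` with `DataFrame.query`, `DataFrame.pipe`, `DataFrame.assign` and `DataFrame.drop`. It illustrates the equivalent pandas calls only, not PipeFrame's actual implementation; `peek_rows` and the sample data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [30, 22, 41],
    'salary': [60000, 40000, 90000],
    'temp': [0, 0, 0],
})


def peek_rows(frame, n=5):
    """Hypothetical peek(): print the first n rows, then pass the frame on."""
    print(frame.head(n))
    return frame


result = (
    df
    .query('age > 25')                               # ~ filter('age > 25')
    .pipe(peek_rows, n=5)                            # ~ peek(n=5)
    .assign(
        bonus=lambda d: d['salary'] * 0.1,           # ~ define(bonus=...)
        total=lambda d: d['salary'] + d['bonus'],    # uses bonus from the line above
    )
    .drop(columns=['id', 'temp'])                    # ~ select('-id', '-temp')
)
```

`DataFrame.assign` evaluates its keyword arguments in order, which is why `total` can refer to `bonus` within the same call.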
Yes! PipeFrame uses pandas under the hood, so it handles large datasets as efficiently as pandas does. For data larger than available memory, consider dask or polars.
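PipeFrame inherits pandas' performance characteristics, so it does not parallelize work by itself. To combine it with dask, compute the dask result into pandas first:

```python
import dask.dataframe as dd
from pipeframe import DataFrame, filter, select

ddf = dd.read_csv('large_file.csv')
# compute() loads everything into memory as pandas; reduce in dask first (e.g. ddf[ddf['x'] > 5]) if it won't fit
result = DataFrame(ddf.compute()) >> filter('x > 5') >> select('a', 'b')
```

Make sure you're using PipeFrame's DataFrame: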
```python
# Wrong
import pandas as pd
df = pd.DataFrame(...)  # Regular pandas DataFrame

# Right
from pipeframe import DataFrame
df = DataFrame(...)  # PipeFrame DataFrame
```

Or convert an existing pandas DataFrame:

```python
from pipeframe import DataFrame
df = DataFrame(pandas_df)
```

If a `filter()` expression doesn't work, check the following:
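- String values need quotes inside the expression: `'name == "Alice"'`
- Combine conditions with `&` and `|` (pandas query expressions also accept the keywords `and` and `or`)
- Column names with spaces need backticks: `` `column name` ``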
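The checklist above can be tried directly with pandas' `DataFrame.query`, whose expression syntax is what `filter()` accepts. A small sketch with made-up data (all column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [30, 22, 65],
    'salary': [60000, 40000, 120000],
    'department': ['Sales', 'IT', 'HR'],
    'start year': [2019, 2021, 2010],
})

# String values are quoted inside the expression string
alice = df.query('name == "Alice"')

# Combine conditions with & and |
both = df.query('age > 25 & salary > 50000')
either = df.query('age > 60 | salary > 100000')

# Membership test against a list literal
sales_or_it = df.query('department in ["Sales", "IT"]')

# Column names containing spaces are wrapped in backticks
veterans = df.query('`start year` < 2015')

# .str methods: passing engine='python' avoids the numexpr engine
a_names = df.query('name.str.startswith("A")', engine='python')
```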
```python
# Filter out NA values
df >> filter('column.notna()')

# Replace NA before filtering
df >> define(column='column.fillna(0)') >> filter('column > 0')
```

Yes! PipeFrame works great in Jupyter. DataFrames display exactly like pandas DataFrames.
Yes! Since PipeFrame DataFrames are pandas DataFrames, `.plot()` works on them directly:

```python
import matplotlib.pyplot as plt
from pipeframe import filter, group_by, summarize

plot_data = df >> filter('year == 2024') >> group_by('month') >> summarize(total='sum(sales)')
plot_data.plot(x='month', y='total', kind='line')
plt.show()
```

Absolutely:
```python
from sklearn.model_selection import train_test_split
from pipeframe import *

# Prepare data
X = data >> select('-target')   # every column except the target
y = data['target']              # 1-D Series, as most estimators expect
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

See our Contributing Guide for:
- Reporting bugs
- Suggesting features
- Submitting pull requests
- Writing documentation
Open an issue on GitHub: https://github.com/Yasser03/pipeframe/issues
- Check the API Reference
- Join Discussions
- Open an Issue