This package is a Python implementation of a ggplot2-like grammar that renders
to Plotly. Treat ggplot2/R compatibility as the default product requirement:
function names, argument names, aliases, and behavior should match ggplot2
unless Python makes that impossible. Do not rename public parameters to more
Pythonic spellings when a ggplot2 spelling already exists; preserve aliases such
as colour, linewidth, show.legend, na.rm, and position_*.
- `ggplotly/ggplot.py` owns composition and final draw ordering: layers, scales, coords, theme, labels, guides, annotations, and size.
- Most geoms convert mapped data into Plotly traces through `Geom._transform_fig` and `ggplotly/trace_builders.py`.
- `ggplotly/aesthetic_mapper.py` is the shared resolver for `color`, `fill`, `size`, `shape`, `linetype`, `group`, and `alpha`.
- Keep computation-heavy or invariant-heavy rendering logic in small pure helper functions that return values/specs; keep Plotly `fig.add_trace` and `fig.update_layout` mutation inside geom/theme/layout shells.
- Avoid broad manager/service abstractions. Add a helper module only when it encodes a real rendering concept or is reused/tested directly.
- `fill` and `color` have distinct ggplot2 meanings. For bars/histograms, `fill` is interior color and `color`/`colour` is outline color.
- Do not change the shared `position_dodge` math casually. Different geoms may use it differently; bar/column chart dodging can be implemented at the trace/layout level without changing the position object itself.
- `geom_bar()` defaults to stacked bars when `fill` is mapped.
- `position="dodge"`/`position_dodge()` should render side-by-side groups.
- `position_fill()` should normalize each x stack to 1.0.
- `geom_density(fill=...)` should fill the area under the density curve, not only style the line.
- Map overlays must use geo traces when a geo context exists. Cartesian `Scatter`/`Heatmap` overlays detach visually from Plotly geo projections.
- Date/time index values must stay as datetimes in rendered traces; avoid accidentally converting `DatetimeIndex` values to nanosecond integers.
- Mapped numeric point sizes should be scaled to a visible marker range, not passed through raw data values.
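
Two of the rules above (normalizing each x stack to 1.0 and scaling mapped sizes into a visible marker range) fit the "small pure helper" style. The following is a minimal sketch of that style, not the package's implementation: the function names `fill_fractions` and `rescale_sizes`, the tuple-based input, and the 4 to 20 pixel range are all assumptions made for illustration.

```python
from collections import defaultdict


def fill_fractions(rows):
    """Normalize stacked values so each x stack sums to 1.0.

    `rows` is a list of (x, group, value) tuples with one row per
    (x, group) pair (an assumption of this sketch). The result maps
    (x, group) to that group's share of the stack at x.
    """
    totals = defaultdict(float)
    for x, _group, value in rows:
        totals[x] += value
    return {
        (x, group): (value / totals[x] if totals[x] else 0.0)
        for x, group, value in rows
    }


def rescale_sizes(values, min_px=4.0, max_px=20.0):
    """Linearly map raw numeric values onto a visible marker-size range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: use the midpoint instead of dividing by zero.
        return [(min_px + max_px) / 2.0 for _ in values]
    span = hi - lo
    return [min_px + (v - lo) / span * (max_px - min_px) for v in values]


rows = [("a", "g1", 1.0), ("a", "g2", 3.0), ("b", "g1", 2.0)]
fractions = fill_fractions(rows)
sizes = rescale_sizes([0.001, 0.5, 1000.0])
```

Both helpers return plain values and never touch a figure, so a geom shell could call them and keep the `fig.add_trace` mutation to itself.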
- Preferred full test command: `.venv/bin/python -m pytest pytest -q --no-cov`
- Add targeted tests near the behavior owner, and use `pytest/test_fp_review_regressions.py` for cross-cutting rendering regressions found during review.
- For notebook visual review, execution passing is not enough. Render notebooks to HTML, capture PNGs, and inspect whether the chart semantics are correct.
- Use npm/Node Playwright with its managed Chromium browser for notebook PNG capture. Do not use Kaleido or the installed macOS Chrome app for bulk review; those paths can launch the user's desktop Chrome profile/updater and cause hangs or crashes.
- In this environment, Jupyter notebook execution may need permission to bind local kernel ports. If nbconvert fails with `PermissionError` while finding a port, rerun the same command with escalation rather than changing notebook code.
- Work on feature branches and commit coherent chunks frequently.
- Do not rewrite or revert unrelated user changes. If the working tree is dirty, inspect and preserve user-owned changes.
- Keep generated review artifacts under `/private/tmp` unless the user asks for repo-tracked artifacts.
IMPORTANT: This library aims to faithfully replicate R's ggplot2 API in Python.
When contributing or modifying code:
- Follow ggplot2 conventions - Match R's ggplot2 function names, parameter names, and behavior as closely as possible
- Consult ggplot2 documentation - When implementing existing ggplot2 features, reference https://ggplot2.tidyverse.org/reference/
- Extrapolate for new features - For functionality not in ggplot2 (e.g., `geom_candlestick`, `geom_stl`, `geom_sankey`), follow ggplot2 naming conventions and design patterns:
  - Use `geom_*` prefix for geometric objects
  - Use `stat_*` prefix for statistical transformations
  - Use `scale_*_*` pattern for scales (e.g., `scale_x_log10`, `scale_color_manual`)
  - Accept `aes()` mappings consistently
  - Support `data=` parameter override in geoms
- Pythonic adaptations - Only deviate from ggplot2 when Python requires it (e.g., strings for column names in `aes()`)
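
The naming rules above can be checked mechanically. This is a rough sketch under stated assumptions: the regular expressions and the `follows_convention` helper are hypothetical, written for this note, and are not part of the package.

```python
import re

# Hypothetical patterns derived from the conventions listed above.
NAME_PATTERNS = {
    "geom": re.compile(r"^geom_[a-z0-9]+(_[a-z0-9]+)*$"),
    "stat": re.compile(r"^stat_[a-z0-9]+(_[a-z0-9]+)*$"),
    # scale_<aesthetic>_<type>, e.g. scale_x_log10 or scale_color_manual
    "scale": re.compile(r"^scale_[a-z]+_[a-z0-9]+(_[a-z0-9]+)*$"),
}


def follows_convention(kind, name):
    """Return True when `name` matches the ggplot2-style pattern for `kind`."""
    pattern = NAME_PATTERNS.get(kind)
    return bool(pattern and pattern.match(name))


checks = {
    "geom_candlestick": follows_convention("geom", "geom_candlestick"),
    "CandlestickGeom": follows_convention("geom", "CandlestickGeom"),
    "scale_x_log10": follows_convention("scale", "scale_x_log10"),
    "scale_log10": follows_convention("scale", "scale_log10"),
}
```

A check like this could run in a test over the names exported from the package, so that a non-conforming public name fails review early.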
GGPLOTLY is a Python data visualization library that combines R's ggplot2 Grammar of Graphics with Plotly's interactive capabilities.
- Version: 0.3.5 (Beta)
- Author: Ben Cho
- Python: 3.9+
- License: MIT
Example:

    from ggplotly import ggplot, aes, geom_point, theme_minimal

    (ggplot(df, aes(x='col1', y='col2', color='category'))
     + geom_point()
     + theme_minimal())

Project layout:

    ggplotly/
    ├── ggplotly/           # Main package
    │   ├── ggplot.py       # Core ggplot class
    │   ├── aes.py          # Aesthetic mappings
    │   ├── layer.py        # Layer abstraction
    │   ├── geoms/          # 46 geometric objects
    │   ├── stats/          # 13 statistical transformations
    │   ├── scales/         # 19 scales
    │   ├── coords/         # 5 coordinate systems
    │   ├── themes.py       # 9 built-in themes
    │   ├── facets.py       # facet_wrap, facet_grid
    │   └── data/           # 16 built-in datasets (CSV)
    ├── pytest/             # Test suite (39 files)
    ├── examples/           # Jupyter notebooks
    └── docs/               # MkDocs documentation

- `ggplotly/ggplot.py` - Main ggplot class, rendering pipeline
- `ggplotly/layer.py` - Layer abstraction combining data, geom, stat, position
- `ggplotly/aes.py` - `aes()` function and `after_stat()` for aesthetic mappings
- `ggplotly/trace_builders.py` - Strategy pattern for Plotly trace creation
- `ggplotly/aesthetic_mapper.py` - Maps aesthetics to visual properties
- `ggplotly/data_utils.py` - Data normalization, index handling

Commands:

    # Install
    pip install -e .

    # Run tests
    pytest pytest/ -v

    # Run specific test file
    pytest pytest/test_geoms.py -v

    # Run example notebooks as tests (catches real-world usage bugs)
    pytest --nbmake examples/*.ipynb

    # Run all tests including notebooks
    pytest pytest/ -v && pytest --nbmake examples/*.ipynb

    # Build docs
    mkdocs build

    # Serve docs locally
    mkdocs serve

Core: pandas, plotly, numpy, scikit-learn, scipy, statsmodels

Optional:
- `pip install ggplotly[geo]` - geopandas, shapely
- `pip install ggplotly[network]` - igraph, searoute

- Grammar of Graphics: Uses `+` operator for composition
- Immutable: `+` returns copy, doesn't modify in place
- Strategy Pattern: `trace_builders.py` handles different grouping scenarios
- Registry Pattern: `ScaleRegistry` enforces one scale per aesthetic
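
The immutable `+` and the one-scale-per-aesthetic registry can be pictured with a toy model. This is a sketch only: `PlotSketch` and `ScaleRegistrySketch` are hypothetical stand-ins, components are plain tuples, and the choice that a later scale replaces an earlier one for the same aesthetic is an assumption (the real `ScaleRegistry` may enforce the rule differently).

```python
import copy


class ScaleRegistrySketch:
    """Keeps at most one scale per aesthetic (later registration replaces earlier)."""

    def __init__(self):
        self._scales = {}

    def register(self, aesthetic, scale):
        self._scales[aesthetic] = scale

    def get(self, aesthetic):
        return self._scales.get(aesthetic)


class PlotSketch:
    """Hypothetical plot object whose `+` returns a modified copy."""

    def __init__(self):
        self.layers = []
        self.scales = ScaleRegistrySketch()

    def __add__(self, component):
        # Copy first so the left-hand operand is never mutated.
        new_plot = copy.deepcopy(self)
        kind, payload = component
        if kind == "layer":
            new_plot.layers.append(payload)
        elif kind == "scale":
            aesthetic, scale = payload
            new_plot.scales.register(aesthetic, scale)
        else:
            raise TypeError(f"unsupported component kind: {kind!r}")
        return new_plot


base = PlotSketch()
with_points = base + ("layer", "points")
rescaled = with_points + ("scale", ("x", "log10")) + ("scale", ("x", "reverse"))
```

Because every `+` works on a copy, `base` can be reused as a starting point for several plots without one variant leaking layers or scales into another.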
| Component | Count | Location |
|---|---|---|
| Geoms | 46 | ggplotly/geoms/ |
| Stats | 13 | ggplotly/stats/ |
| Scales | 19 | ggplotly/scales/ |
| Themes | 9 | ggplotly/themes.py |
| Coords | 5 | ggplotly/coords/ |
| Datasets | 16 | ggplotly/data/ |
Tests live in `pytest/` and use the pytest framework:
- `test_geoms.py` - Geom functionality
- `test_showcase.py` - Integration/showcase tests
- `test_stats_positions_limits.py` - Stats and positions
- `test_facets.py` - Faceting
- `test_scales.py` - Scale functionality

When adding new features or modifying existing code, tests MUST include all four categories:
Verify the feature works as expected in isolation:

    def test_stroke_with_value(self):
        """Test stroke parameter sets marker border width."""
        df = pd.DataFrame({"x": [1, 2], "y": [1, 2]})
        plot = ggplot(df, aes(x="x", y="y")) + geom_point(stroke=2)
        fig = plot.draw()
        assert fig.data[0].marker.line.width == 2

Test boundary conditions, empty data, type variations:

    def test_stroke_with_large_value(self):
        """Test stroke with unusually large value."""
        # ...

    def test_stroke_empty_dataframe(self):
        """Test stroke with empty DataFrame doesn't crash."""
        df = pd.DataFrame({"x": [], "y": []})
        plot = ggplot(df, aes(x="x", y="y")) + geom_point(stroke=2)
        fig = plot.draw()  # Should not raise

    def test_stroke_with_float_value(self):
        """Test stroke accepts float values."""
        # ...

Test with faceting, color aesthetics, and multiple geoms:

    def test_stroke_with_facet_wrap(self):
        """Test stroke parameter works with faceting."""
        df = pd.DataFrame({
            "x": [1, 2, 3, 4], "y": [1, 2, 3, 4],
            "cat": ["A", "A", "B", "B"]
        })
        plot = ggplot(df, aes(x="x", y="y")) + geom_point(stroke=2) + facet_wrap("cat")
        fig = plot.draw()
        # Verify stroke applied across all facets
        for trace in fig.data:
            if hasattr(trace, "marker") and trace.marker:
                assert trace.marker.line.width == 2

    def test_stroke_with_color_aesthetic(self):
        """Test stroke works when color aesthetic is mapped."""
        df = pd.DataFrame({
            "x": [1, 2, 3], "y": [1, 2, 3],
            "cat": ["A", "B", "C"]
        })
        plot = ggplot(df, aes(x="x", y="y", color="cat")) + geom_point(stroke=1.5)
        fig = plot.draw()
        # Each category trace should have stroke
        for trace in fig.data:
            assert trace.marker.line.width == 1.5

Capture and verify figure structure/properties:

    class TestVisualRegression:
        """Visual regression tests that verify figure structure."""

        def get_figure_signature(self, fig):
            """Extract key properties from figure for comparison."""
            signature = {"num_traces": len(fig.data), "traces": []}
            for trace in fig.data:
                trace_sig = {"type": trace.type, "mode": getattr(trace, "mode", None)}
                if hasattr(trace, "marker") and trace.marker:
                    trace_sig["marker"] = {
                        "size": getattr(trace.marker, "size", None),
                        "line_width": getattr(trace.marker.line, "width", None) if trace.marker.line else None,
                    }
                signature["traces"].append(trace_sig)
            return signature

        def test_stroke_visual_signature(self):
            """Test that stroke produces expected visual signature."""
            df = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})
            plot = ggplot(df, aes(x="x", y="y")) + geom_point(stroke=2.5)
            fig = plot.draw()
            sig = self.get_figure_signature(fig)
            assert sig["num_traces"] == 1
            assert sig["traces"][0]["type"] == "scatter"
            assert sig["traces"][0]["marker"]["line_width"] == 2.5

Test naming patterns:
- `test_<feature>_default` - Test default behavior
- `test_<feature>_with_value` - Test with explicit value
- `test_<feature>_with_<aesthetic>` - Test with specific aesthetic
- `test_<feature>_empty_dataframe` - Test empty data handling
- `test_<feature>_with_facet_wrap` - Test with faceting
- `test_<feature>_visual_signature` - Visual regression test

See pytest/test_new_parameters.py for comprehensive examples of all four test categories.
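
When a figure-signature assertion fails, a bare `==` on nested dicts gives little detail about what changed. One possible companion to the signature approach is a small diff helper; the sketch below is hypothetical (the name `diff_signatures` and its message format are not from the test suite) and works on plain dicts and lists, so it needs no Plotly objects.

```python
def diff_signatures(expected, actual, path="figure"):
    """Return human-readable differences between two nested signatures."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}"
            if key not in actual:
                problems.append(f"{child}: missing from actual")
            elif key not in expected:
                problems.append(f"{child}: unexpected key")
            else:
                problems.extend(diff_signatures(expected[key], actual[key], child))
        return problems
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: expected {len(expected)} items, got {len(actual)}"]
        problems = []
        for index, (exp_item, act_item) in enumerate(zip(expected, actual)):
            problems.extend(diff_signatures(exp_item, act_item, f"{path}[{index}]"))
        return problems
    if expected != actual:
        return [f"{path}: expected {expected!r}, got {actual!r}"]
    return []


expected_sig = {
    "num_traces": 1,
    "traces": [{"type": "scatter", "mode": "markers",
                "marker": {"size": None, "line_width": 2.5}}],
}
actual_sig = {
    "num_traces": 1,
    "traces": [{"type": "scatter", "mode": "markers",
                "marker": {"size": None, "line_width": 2.0}}],
}
differences = diff_signatures(expected_sig, actual_sig)
```

A test could then assert `differences == []` and get a readable list of mismatched paths in the failure output.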

    aes(
        x='column',      # X-axis mapping
        y='column',      # Y-axis mapping
        color='column',  # Line/point color (categorical or continuous)
        fill='column',   # Fill color (bars, areas)
        size='column',   # Point/line size
        shape='column',  # Point shape (categorical)
        alpha=0.5,       # Transparency (0-1)
        group='column',  # Grouping without color
        label='column',  # Text labels
    )

Use `after_stat()` to reference computed statistics:

    aes(y=after_stat('density'))                # Use density instead of count in histograms
    aes(y=after_stat('count / count.sum()'))    # Proportions

| Geom | Use Case |
|---|---|
| `geom_point()` | Scatter plots |
| `geom_line()` | Line charts |
| `geom_path()` | Connect points in data order |
| `geom_bar()` | Bar charts (`stat='count'` default) |
| `geom_col()` | Bar charts (`stat='identity'`) |
| `geom_histogram()` | Histograms |
| `geom_boxplot()` | Box plots |
| `geom_violin()` | Violin plots |
| `geom_density()` | Density curves |
| `geom_smooth()` | Trend lines with CI |
| `geom_area()` | Area charts |
| `geom_ribbon()` | Confidence bands (ymin/ymax) |
| `geom_tile()` | Heatmaps |
| `geom_rect()` | Rectangles (highlight regions) |
| `geom_text()` | Text labels |
| `geom_label()` | Text with background |
| `geom_errorbar()` | Error bars |
| `geom_segment()` | Line segments (with optional arrows) |
| `geom_vline()`, `geom_hline()` | Reference lines |
| `geom_abline()` | Slope/intercept lines |
| `geom_jitter()` | Jittered points |
| `geom_rug()` | Marginal tick marks |
| `geom_qq()`, `geom_qq_line()` | Q-Q plots |
| `geom_contour()` | Contour lines |
| `geom_candlestick()` | Financial OHLC |
| `geom_waterfall()` | Waterfall charts |
| `geom_map()` | Choropleth maps |

    from ggplotly import data

    # List all datasets
    data()

    # Load a specific dataset
    mpg = data('mpg')
    diamonds = data('diamonds')
    iris = data('iris')

Available: diamonds, mpg, iris, mtcars, economics, economics_long, msleep, faithfuld, seals, txhousing, midwest, presidential, commodity_prices, luv_colours, us_flights (network data)

    theme_default()   # Default Plotly theme
    theme_minimal()   # Clean, minimal
    theme_classic()   # Classic ggplot2 style
    theme_dark()      # Dark background
    theme_ggplot2()   # R ggplot2 style
    theme_nytimes()   # NYT style
    theme_bbc()       # BBC News style
    theme_custom()    # Custom theme builder

    position_dodge()   # Side by side (grouped bars)
    position_jitter()  # Add random noise (overlapping points)
    position_stack()   # Stack on top of each other
    position_fill()    # Stack normalized to 100%
    position_nudge()   # Shift by fixed amount

    coord_cartesian(xlim=(0, 10))  # Zoom without clipping data
    coord_fixed(ratio=1)           # Fixed aspect ratio (1:1 scaling)
    coord_flip()                   # Swap x and y axes
    coord_polar()                  # Polar coordinates (pie charts)
    coord_sf()                     # Geographic projections

The pandas index is automatically available:

    # Series: index becomes x-axis automatically
    ggplot(series, aes(y='value'))  # x uses index

    # DataFrame: reference index with 'index'
    ggplot(df, aes(x='index', y='col'))

    # Named index becomes axis label automatically

Faceting:

    # Wrap into rows/columns
    facet_wrap('category', ncol=3)

    # Grid by two variables
    facet_grid(rows='var1', cols='var2')

    # Free scales
    facet_wrap('category', scales='free')  # 'free_x', 'free_y'

Scales:

    # Axis transforms
    scale_x_log10()                      # Log scale
    scale_y_log10()                      # Log scale (y-axis)
    scale_x_reverse()                    # Reversed x-axis
    scale_y_reverse()                    # Reversed y-axis
    scale_x_continuous(limits=(0, 100))  # Set range
    scale_x_date(date_labels='%Y-%m')    # Date formatting

    # Manual colors
    scale_color_manual(['red', 'blue', 'green'])
    scale_fill_manual({'A': 'red', 'B': 'blue'})  # Dict mapping

    # Color gradients
    scale_color_gradient(low='white', high='red')
    scale_fill_viridis_c()  # Viridis colorscale

    # ColorBrewer palettes
    scale_color_brewer(palette='Set1')
    scale_fill_brewer(palette='Blues')

    # Interactive
    scale_x_rangeslider()    # Add range slider
    scale_x_rangeselector()  # Add range buttons

| Stat | Purpose | Used By |
|---|---|---|
| `stat_identity` | No transformation | `geom_col`, `geom_point` |
| `stat_count` | Count observations | `geom_bar` |
| `stat_bin` | Bin data | `geom_histogram` |
| `stat_density` | Kernel density | `geom_density` |
| `stat_smooth` | Smoothed line + CI | `geom_smooth` |
| `stat_ecdf` | Empirical CDF | - |
| `stat_summary` | Summary statistics | - |
| `stat_function` | Apply function | - |
| `stat_qq` | Q-Q plot points | `geom_qq` |
| `stat_qq_line` | Q-Q reference line | `geom_qq_line` |
| `stat_contour` | Contour computation | `geom_contour` |
| `stat_stl` | STL decomposition | `geom_stl` |
| `stat_fanchart` | Fan chart percentiles | `geom_fanchart` |

    # Candlestick (requires open, high, low, close columns)
    geom_candlestick(aes(x='date', open='open', high='high', low='low', close='close'))
    geom_ohlc()       # OHLC bars
    geom_waterfall()  # Waterfall charts

    geom_point_3d(aes(x='x', y='y', z='z'))
    geom_surface()    # 3D surface
    geom_wireframe()  # Wireframe surface

    geom_map(aes(fill='value'))  # Choropleth
    geom_sf()                    # Simple features
    coord_sf(projection='...')   # Map projections

    geom_edgebundle()  # Edge bundling
    geom_sankey()      # Sankey diagrams

    geom_stl()       # STL decomposition (trend, seasonal, residual)
    geom_acf()       # Autocorrelation function
    geom_pacf()      # Partial autocorrelation function
    geom_fanchart()  # Fan charts for uncertainty
    geom_range()     # Historical range plots (5-year range)

- `geom_rect` - Draw rectangles (highlight regions, backgrounds)
- `geom_label` - Text labels with background boxes
- `scale_x_reverse()` - Reversed x-axis
- `scale_y_reverse()` - Reversed y-axis

| Geom | Parameter | Description |
|---|---|---|
| `geom_point` | `stroke` | Marker border width |
| `geom_segment` | `arrow`, `arrow_size` | Add arrows to segments |
| `geom_errorbar` | `width` | Error bar cap width |
| `geom_text` | `parse` | Enable LaTeX/MathJax rendering |
| `geom_col` | `width` | Bar width control |
| `geom_smooth` | `fullrange` | Extend line to full x-axis |
| `geom_area` | `position` | Stacking support |

- `linewidth` → `size` (ggplot2 3.4+)
- `colour` → `color` (British spelling)

Adding a geom:
- Create `ggplotly/geoms/geom_newname.py`
- Inherit from `GeomBase` in `geom_base.py`
- Implement `_draw_impl()` method
- Export in `ggplotly/__init__.py`
- Add tests in `pytest/test_geoms.py`

Adding a stat:
- Create `ggplotly/stats/stat_newname.py`
- Inherit from `Stat` in `stat_base.py`
- Implement `compute()` method returning `(data, mapping)` tuple
- Export in `ggplotly/__init__.py`

Adding a scale:
- Create `ggplotly/scales/scale_newname.py`
- Inherit from `Scale` in `scale_base.py`
- Implement `apply()` method
- Export in `ggplotly/__init__.py`

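
The `compute()` contract for a stat (take data and a mapping, return a `(data, mapping)` tuple) can be sketched without the package. Everything in this example is an assumption made for illustration: `StatCountSketch` does not inherit from the real `Stat` base class, the data is a list of dicts rather than a DataFrame, and the mapping is a plain dict rather than an `aes` object.

```python
from collections import Counter


class StatCountSketch:
    """Hypothetical stat that counts rows per x value."""

    def compute(self, data, mapping):
        x_column = mapping["x"]
        counts = Counter(row[x_column] for row in data)
        # One output row per distinct x value, in first-seen order.
        new_data = [{x_column: value, "count": n} for value, n in counts.items()]
        # Point y at the computed column so the geom draws the counts.
        new_mapping = dict(mapping, y="count")
        return new_data, new_mapping


rows = [{"cut": "Ideal"}, {"cut": "Good"}, {"cut": "Ideal"}]
stat_data, stat_mapping = StatCountSketch().compute(rows, {"x": "cut"})
```

Returning a new mapping alongside the new data, rather than mutating the input mapping, keeps the caller's `aes` reusable across layers.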
Geom parameters follow a three-level inheritance pattern:

    class geom_example(Geom):
        # Subclass defaults - override base class defaults
        default_params = {"size": 2, "alpha": 0.8}

Parameter resolution order (later takes precedence):
- Base class defaults (always applied): `{"na_rm": False, "show_legend": True}`
- Subclass `default_params`: Class-specific defaults like `{"size": 2}`
- User-provided params: Explicit values passed to constructor

Important: Do NOT include `na_rm` or `show_legend` in subclass `default_params` - they are automatically inherited from the base class.

The base class handles these ggplot2 compatibility aliases:
- `linewidth` → `size` (ggplot2 3.4+ compatibility)
- `colour` → `color` (British spelling)
- `showlegend` → `show_legend` (Plotly convention)

Explicit user params take precedence: if both linewidth=10 and size=5 are passed, size=5 wins.
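
The resolution order and alias handling described above can be expressed as a single merge. This is a minimal sketch, assuming a free function named `resolve_params` and module-level dicts; the real base class may structure this differently.

```python
# Values taken from the text above; the names of these constants are hypothetical.
BASE_DEFAULTS = {"na_rm": False, "show_legend": True}
ALIASES = {"linewidth": "size", "colour": "color", "showlegend": "show_legend"}


def resolve_params(subclass_defaults, user_params):
    """Merge defaults and user params; later sources take precedence."""
    # Canonical names passed explicitly always win over their aliases.
    explicit = {k: v for k, v in user_params.items() if k not in ALIASES}
    aliased = {
        ALIASES[k]: v
        for k, v in user_params.items()
        if k in ALIASES and ALIASES[k] not in explicit
    }
    return {**BASE_DEFAULTS, **subclass_defaults, **aliased, **explicit}


defaults = {"size": 2, "alpha": 0.8}
both_given = resolve_params(defaults, {"linewidth": 10, "size": 5})
alias_only = resolve_params(defaults, {"linewidth": 10, "colour": "red"})
```

In `both_given` the explicit `size=5` wins over `linewidth=10`, while in `alias_only` the aliases are translated to `size` and `color`.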
The before_add() method is called when a geom is added to a plot via the + operator. Use it to:
- Create sub-layers (e.g., `geom_ribbon` creates multiple `geom_line` layers)
- Transform the geom before it's added to the plot
- Return additional layers to be added

Example:

    class geom_ribbon(Geom):
        def before_add(self):
            # Create additional layers for ribbon edges
            color = self.params.get("color", None)

            # Create line layers for ymin and ymax edges
            min_line = geom_line(mapping=aes(x=self.mapping['x'], y=self.mapping['ymin']),
                                 color=color)
            max_line = geom_line(mapping=aes(x=self.mapping['x'], y=self.mapping['ymax']),
                                 color=color)

            # Return list of additional layers
            return [min_line, max_line]

When to use `before_add()`:
- Composite geoms that consist of multiple sub-geoms
- Geoms that need to generate additional visual elements
- When the geom itself shouldn't render but spawns other geoms
Implementation notes:
- Return `None` (or omit return) if no additional layers needed
- Returned layers are added to the plot after the original geom
- The method is called by `ggplot.__add__()` during composition
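
The calling side of this hook can be sketched as follows. This is an illustration under stated assumptions, not the code in `ggplot.__add__()`: `LayerSketch`, `RibbonSketch`, and `HookPlotSketch` are hypothetical classes that only model the ordering rules listed above.

```python
class LayerSketch:
    """Hypothetical layer; the default hook returns no extra layers."""

    def __init__(self, name):
        self.name = name

    def before_add(self):
        return None


class RibbonSketch(LayerSketch):
    """Composite layer that spawns two edge lines when added."""

    def before_add(self):
        return [LayerSketch("ymin_line"), LayerSketch("ymax_line")]


class HookPlotSketch:
    def __init__(self, layers=()):
        self.layers = list(layers)

    def __add__(self, layer):
        # A None result means "no additional layers".
        extra = layer.before_add() or []
        # Extra layers go after the layer that produced them.
        return HookPlotSketch(self.layers + [layer] + list(extra))


plot = HookPlotSketch() + LayerSketch("points") + RibbonSketch("ribbon")
layer_names = [layer.name for layer in plot.layers]
```

Treating `None` and an empty list the same way keeps simple geoms free of boilerplate: they can omit the method's return entirely.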
- String column names: Always use strings in `aes()`: `aes(x='col')` not `aes(x=col)`

- Parentheses for chaining: Wrap in `()` for multi-line `+` chains:

      (ggplot(df, aes(x='x', y='y'))
       + geom_point()
       + theme_minimal())

- geom_bar vs geom_col:
  - `geom_bar()` counts rows (`stat='count'`)
  - `geom_col()` uses y values directly (`stat='identity'`)

- Color vs Fill:
  - `color` = outline/line color
  - `fill` = interior color (bars, areas, boxes)

- Saving plots:

      from ggplotly import ggsave
      ggsave(plot, 'output.html')  # Interactive HTML
      ggsave(plot, 'output.png')   # Static image (requires kaleido)

- Plot sizing:

      from ggplotly import ggsize
      plot + ggsize(width=800, height=600)

- Labels and titles:

      from ggplotly import labs
      plot + labs(title='Title', x='X Label', y='Y Label', color='Legend')

- Multiple geoms: Layer geoms for complex plots:

      (ggplot(df, aes(x='x', y='y'))
       + geom_point()
       + geom_smooth()
       + geom_hline(yintercept=0))

- Access Plotly figure: Get underlying figure for custom modifications:

      fig = plot.draw()        # Returns plotly.graph_objects.Figure
      fig.update_layout(...)   # Standard Plotly customization

- Per-geom data: Override plot data for specific geoms:

      (ggplot(df1, aes(x='x', y='y'))
       + geom_point()
       + geom_line(data=df2))  # Different data for this geom

Located in examples/:
- `ggplotly_master_examples.ipynb` - Comprehensive examples
- `Examples.ipynb` - Core functionality
- `view_all.ipynb` - Gallery of all features
- `prices.ipynb` - Financial data
- `maps.ipynb` - Geographic mapping
- `EdgeBundling.ipynb` - Network visualization

| ggplot2 (R) | ggplotly (Python) |
|---|---|
| `aes(x = col)` | `aes(x='col')` (strings required) |
| `%+%` for data replacement | Not supported |
| `stat_bin(geom="line")` | Use `geom_line(stat=stat_bin)` |
| `theme(text = element_text(...))` | `theme(text=element_text(...))` |
| Automatic printing | Use .show() or Jupyter auto-display |

    # Check what data a geom receives
    plot = ggplot(df, aes(x='x', y='y')) + geom_point()
    fig = plot.draw()  # Renders and returns figure

    # Inspect Plotly traces
    for trace in fig.data:
        print(trace)

    # Check aesthetic mappings
    print(plot.mapping)  # Shows aes mappings

    # Verify data normalization
    from ggplotly.data_utils import normalize_data
    normalized_df, mapping = normalize_data(df, aes(x='x', y='y'))

Always test before flagging issues. When auditing code for bugs or missing features:
- Don't grep-and-flag - Pattern matching on code structure without understanding behavior leads to false positives

- Run the code - A 30-second `python3 -c "..."` test catches most false positives:

      # Instead of assuming geom_bar fill is broken because it's commented out:
      python3 -c "
      from ggplotly import ggplot, aes, geom_bar
      import pandas as pd
      df = pd.DataFrame({'x': ['A', 'B', 'C']})
      fig = (ggplot(df, aes(x='x')) + geom_bar(fill='red')).draw()
      print(fig.data[0].marker.color)  # Actually works!
      "

- Trace cross-file interactions - Code in one file may have fallback logic in another (e.g., `geom_bar` relies on `geom_base._apply_color_targets` fallback)

- Ask "why" before flagging - If something looks wrong, investigate whether it's intentional design (e.g., `stat_edgebundle` doesn't inherit from `Stat` because it has a different API contract)

- Check existing tests - If tests pass for a "broken" feature, the feature probably works

- Test diverse input types - Don't just test the happy path. Test edge cases like:
  - Dict input vs DataFrame vs Series
  - Empty data, single row, large data
  - Different column types (numeric, categorical, datetime)
  - Missing values, NaN handling

  Example: The test suite only used DataFrames, so dict input to `ggplot()` was broken and went undetected.

- Visually verify visualization code - For charts and plots, passing tests and running without errors is NOT sufficient. The visual output IS the test:
  - Generate the actual chart and open it in a browser to view it
  - Check that visual properties match expectations (stacking, colors, positions, aspect ratios)
  - Don't assume correct data flow means correct rendering
  - NEVER claim "all checks passed" without actually viewing the output

  Examples of bugs only visible through actual visual inspection:
  - Histogram code passed all tests and set `barmode='stack'`, but bars weren't actually stacking because each group had different bin edges
  - `coord_fixed(ratio=2)` set the correct Plotly properties (`scaleratio=2`), but the `constrain='domain'` setting was overriding the aspect ratio - squares appeared as squares instead of tall rectangles
  - Structural tests passed (correct number of traces, correct types), but visual output was completely wrong

  Required visual verification process:

      fig = plot.draw()
      fig.write_html('/tmp/test_output.html')
      # THEN: open /tmp/test_output.html
      # Actually view the file!
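
The histogram example above (bars not stacking because each group had its own bin edges) comes down to a simple invariant: stacked groups must be counted against one shared set of edges. The sketch below illustrates that invariant in plain Python; `shared_bin_edges` and `bin_counts` are hypothetical helpers, equal-width bins are an assumption, and this is not the fix used in the package.

```python
def shared_bin_edges(groups, bins=10):
    """Compute one set of equal-width bin edges covering every group."""
    all_values = [v for values in groups.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    if hi == lo:
        # Degenerate range: widen it so the bin width is not zero.
        hi = lo + 1.0
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_counts(values, edges):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / len(counts)
    for v in values:
        index = min(int((v - lo) / width), len(counts) - 1)
        counts[index] += 1
    return counts


groups = {"a": [0.0, 1.0, 2.0], "b": [2.0, 3.0, 4.0]}
edges = shared_bin_edges(groups, bins=4)
stacked = {name: bin_counts(values, edges) for name, values in groups.items()}
```

Because both groups share `edges`, their count lists have the same length and line up bin by bin, which is what stacking requires. A structural test on trace counts alone would not catch misaligned edges, which is why the rendered chart still has to be viewed.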