# Tidyplots for Python
<div align="center">
<table>
<tr>
<td><img src="figures/1_scatter.png" width="150"/></td>
<td><img src="figures/1.4_box.png" width="150"/></td>
<td><img src="figures/1.5_violin.png" width="150"/></td>
<td><img src="figures/1.6_density.png" width="150"/></td>
<td><img src="figures/1_line.png" width="150"/></td>
<td><img src="figures/1.3_bar.png" width="150"/></td>
</tr>
<tr>
<td><img src="figures/1.8_dot.png" width="150"/></td>
<td><img src="figures/2.1_mean_bar.png" width="150"/></td>
<td><img src="figures/7.5_curve_fit.png" width="150"/></td>
<td><img src="figures/7.6.1_sem_ribbon.png" width="150"/></td>
<td><img src="figures/7.1_hexbin.png" width="150"/></td>
<td><img src="figures/3.5_count.png" width="150"/></td>
</tr>
<tr>
<td><img src="figures/5.4.2_scatter.png" width="150"/></td>
<td><img src="figures/7.2_density_2d_filled.png" width="150"/></td>
<td><img src="figures/4.1_plot.png" width="150"/></td>
<td><img src="figures/5.4_bar.png" width="150"/></td>
<td><img src="figures/2.7_scatter.png" width="150"/></td>
<td><img src="figures/2.8_scatter.png" width="150"/></td>
</tr>
<tr>
<td><img src="figures/3.1_scatter.png" width="150"/></td>
<td><img src="figures/3.4_scatter.png" width="150"/></td>
<td><img src="figures/4.2_plot.png" width="150"/></td>
<td><img src="figures/4_scatter.png" width="150"/></td>
<td><img src="figures/5.1_scatter.png" width="150"/></td>
<td><img src="figures/5.2_scatter.png" width="150"/></td>
</tr>
<tr>
<td><img src="figures/5.3_scatter.png" width="150"/></td>
<td><img src="figures/5.4.1_line.png" width="150"/></td>
<td><img src="figures/5.4.3_scatter.png" width="150"/></td>
<td><img src="figures/6.1_bar.png" width="150"/></td>
<td><img src="figures/6.2_bar.png" width="150"/></td>
<td><img src="figures/7.2_plot.png" width="150"/></td>
</tr>
<tr>
<td><img src="figures/7.3.1_sum_bar_value.png" width="150"/></td>
<td><img src="figures/7.3.2_sum_line_dot.png" width="150"/></td>
<td><img src="figures/7.3.3_sum_area.png" width="150"/></td>
<td><img src="figures/7.4.1_median_bar_value.png" width="150"/></td>
<td><img src="figures/7.4.2_median_line_dot.png" width="150"/></td>
<td><img src="figures/7.4.3_median_area.png" width="150"/></td>
</tr>
</table>
</div>
A Python library for creating publication-ready plots with a fluent, chainable interface, inspired by [tidyplots R version](https://github.com/jbengler/tidyplots).
Built on top of plotnine, it provides a pandas DataFrame extension method for easy and intuitive plot creation.
todo list
* [X] complete the whole set of plotting that maps to R version
* [ ] more themes, such as ggprism
* [ ] more palettes, including palettes from famous journals, for instance, ggsci
* [ ] more plotting variations from ggpubr
## Features
- Fluent, chainable interface using pandas DataFrame extension
- Publication-ready plots with Nature Publishing Group (NPG) color palette by default
- Comprehensive set of plot types and statistical visualizations
- Flexible faceting with `split_by` parameter for both single and multi-variable splits
- Statistical annotations (p-values, correlations, etc.)
- Multiple scientific journal color palettes (NPG, AAAS, NEJM, etc.)
- Easy customization of colors, labels, and themes
- Prism-style publication themes
## Installation
```bash
pip install tidyplots-python
```
for development version:
```
pip install git+https://github.com/JNU-Tangyin/tidyplots-python.git
```
## Quick Start
```python
import pandas as pd
import numpy as np
import seaborn as sns
from tidyplots import TidyPlot
# Create sample data
iris = sns.load_dataset("iris")
# Create a scatter plot with groups
(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')
.add_scatter(size=5, alpha=0.5)
.add_density_2d(alpha=0.1)
.adjust_labels(title='Customization Test',
x='Sepal Length', y='Sepal Width')
.adjust_colors(['#1f77b4', '#ff7f0e', '#2ca02c'])
.adjust_legend_position('right')
.show())
```
<img src="figures/5.4.2_scatter.png" width="500"/>
## API Reference
### Core Plot Creation
- `df.tidyplot(x, y=None, color=None)`: Initialize a plot with aesthetics
```python
df.tidyplot(x='column1', y='column2', color='group')
```
### Plot Types
- `.add_scatter(alpha=0.6, size=3)`: Add scatter points
- `.add_line(alpha=0.8)`: Add line plot
- `.add_boxplot(alpha=0.3)`: Add box plot
- `.add_violin(draw_quantiles=[0.25, 0.5, 0.75])`: Add violin plot
- `.add_density(alpha=0.5)`: Add density plot
- `.add_density_2d()`: Add 2D density contour plot
- `.add_bar()`: Add bar plot
- `.add_errorbar(ymin, ymax)`: Add error bars
- `.add_hex(bins=20)`: Add hexbin plot
- `.add_data_points(alpha=0.3)`: Add jittered data points
### Statistical Features
- `.add_smooth(method='lm')`: Add smoothing line
- `.add_correlation_text()`: Add correlation coefficient
- `.add_pvalue(p_value, x1, x2, y)`: Add p-value annotation
### Customization
- `.adjust_labels(title=None, x=None, y=None)`: Set plot labels
- `.adjust_colors(palette)`: Change color palette
- Available palettes: 'npg' (default), 'aaas', 'nejm', 'lancet', 'jama', 'd3', 'material', 'igv'
- `.scale_color_gradient(low, high)`: Set color gradient
- `.adjust_axis_text_angle(angle)`: Rotate axis text
### Themes
- Default theme: Prism-style with NPG colors
- Customizable base theme:
```python
.adjust_theme(base_size=11, base_family="Arial")
```
## Examples
### Basic Plots
```python
import pandas as pd
import seaborn as sns
from tidyplots import TidyPlot
# Load sample data
iris = sns.load_dataset("iris")
# Simple scatter plot with groups
(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')
.add_scatter()
.show())
# Faceted plot using single variable
(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species', split_by='species')
.add_scatter()
.show())
# Grid faceted plot using two variables
(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species',
split_by=['species', 'petal_size_category']) # petal_size_category is an example
.add_scatter()
.show())
```
### Scatter Plot
```python
import seaborn as sns
from tidyplots import TidyPlot
# Load iris dataset
iris = sns.load_dataset('iris')
# Create a scatter plot
(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')
.add_scatter(size=5, alpha=0.7)
.adjust_labels(title='Iris Sepal Dimensions',
x='Sepal Length', y='Sepal Width'))
```

### Box Plot
```python
# Create a box plot
(iris.tidyplot(x='species', y='sepal_length', fill='species')
.add_boxplot(alpha=0.5)
.add_jitter(width=0.2, size=3, alpha=0.7)
.adjust_labels(title='Sepal Length by Species',
x='Species', y='Sepal Length'))
```

### Violin Plot
```python
# Create a violin plot
(iris.tidyplot(x='species', y='petal_length', fill='species')
.add_violin(draw_quantiles=[0.25, 0.5, 0.75], alpha=0.6)
.adjust_labels(title='Petal Length Distribution',
x='Species', y='Petal Length'))
```

### Density Plot
```python
# Create a density plot
(iris.tidyplot(x='sepal_width', fill='species')
.add_density(alpha=0.3)
.adjust_labels(title='Sepal Width Density',
x='Sepal Width', y='Density'))
```

### Line Plot
```python
# Load flights dataset
flights = sns.load_dataset('flights')
# Create a line plot
(flights.tidyplot(x='year', y='passengers', fill='month')
.add_line(size=2)
.adjust_labels(title='Passenger Trends',
x='Year', y='Number of Passengers'))
```

### Bar Plot
```python
# Create a bar plot
tips = sns.load_dataset('tips')
# Average tip by day
(tips.tidyplot(x='day', y='tip', fill='sex')
.add_bar(stat='mean')
.adjust_labels(title='Average Tip by Day',
x='Day', y='Average Tip'))
```

### Dot Plot
```python
# Create a dot plot
(iris.tidyplot(x='species', y='sepal_length', fill='species')
.add_dotplot()
.adjust_labels(title='Sepal Length Dot Plot',
x='Species', y='Sepal Length'))
```

### Mean Bar Plot
```python
# Mean bar plot with error bars
(tips.tidyplot(x='day', y='total_bill', fill='sex')
.add_mean_bar(alpha=0.5)
.add_sem_errorbar()
.adjust_labels(title='Mean Total Bill by Day',
x='Day', y='Total Bill'))
```

### Curve Fitting
```python
# Scatter plot with curve fitting
(tips.tidyplot(x='total_bill', y='tip', fill='sex')
.add_scatter()
.add_smooth(method='lm')
.adjust_labels(title='Tip vs Total Bill',
x='Total Bill', y='Tip'))
```

### Ribbon Plot (SEM)
```python
# Ribbon plot showing standard error
(tips.groupby('day')['total_bill']
.apply(lambda x: pd.DataFrame({
'mean': x.mean(),
'sem': x.sem()
}))
.reset_index()
.tidyplot(x='day', y='mean', fill='sex')
.add_line(color='cyan')
.add_ribbon(ymin='mean - sem', ymax='mean + sem', alpha=0.3)
.adjust_labels(title='Total Bill with SEM',
x='Day', y='Total Bill'))
```

### Hexbin Plot
```python
# Hexbin plot for density visualization
(tips.tidyplot(x='total_bill', y='tip', fill='sex')
.add_hex(bins=20)
.adjust_labels(title='Total Bill vs Tip Density',
x='Total Bill', y='Tip'))
```

### Count Plot
```python
# Count plot
titanic = sns.load_dataset('titanic')
(titanic.tidyplot(x='class', fill='survived')
.add_bar(stat='count')
.adjust_labels(title='Passenger Count by Class and Survival',
x='Class', y='Count'))
```

## Pie Charts
TidyPlots supports creating customizable pie charts using the `add_pie()` method. You can customize colors, explode slices, add shadows, and format percentages.
#### Basic Pie Chart
```python
# Create a simple pie chart
titanic = sns.load_dataset("titanic")
survival_counts = titanic['survived'].value_counts().reset_index()
survival_counts.columns = ['Status', 'Count']
survival_counts['Status'] = survival_counts['Status'].map({0: 'Did Not Survive', 1: 'Survived'})
(survival_counts.tidyplot(x='Status', y='Count')
.add_pie(colors=['#ff9999', '#66b3ff'])
.show())
```
#### Exploded Pie Chart with Shadow
```python
# Create pie chart with exploded slices and shadow effect
class_counts = titanic['class'].value_counts().reset_index()
class_counts.columns = ['Class', 'Count']
(class_counts.tidyplot(x='Class', y='Count')
.add_pie(colors=['#ff9999', '#66b3ff', '#99ff99'],
explode=(0.1, 0, 0), # Explode first slice
shadow=True) # Add shadow effect
.show())
```
#### Custom Percentage Formatting
```python
# Show both percentages and counts
tips = sns.load_dataset("tips")
day_counts = tips['day'].value_counts().reset_index()
day_counts.columns = ['Day', 'Count']
(day_counts.tidyplot(x='Day', y='Count')
.add_pie(colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99'],
autopct=lambda pct: f'{pct:.1f}%\n({int(pct/100.*sum(day_counts["Count"])):.0f})',
startangle=45)
.show())
```
#### Advanced Label Positioning
```python
# Adjust label positions and add white edges
diamonds = sns.load_dataset("diamonds")
cut_counts = diamonds['cut'].value_counts().reset_index()
cut_counts.columns = ['Cut', 'Count']
(cut_counts.tidyplot(x='Cut', y='Count')
.add_pie(colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#ff99cc'],
pctdistance=0.85, # Move percentage labels outward
labeldistance=1.1, # Move labels outward
wedgeprops={'linewidth': 3, 'edgecolor': 'white'}) # Add white edges
.show())
```
The `add_pie()` method supports various customization options:
- `colors`: List of colors for pie slices
- `explode`: Tuple of floats to offset each slice
- `autopct`: String or function for percentage display (default: '%1.1f%%')
- `pctdistance`: Float for percentage label distance from center
- `labeldistance`: Float for label distance from center
- `shadow`: Bool to add shadow effect
- `startangle`: Int for starting angle in degrees (default: 90)
- `wedgeprops`: Dict of properties for pie wedges
Note: Pie charts do not support faceting with `split_by`. For faceted visualizations, consider using bar plots or other plot types.
## Faceting
The `split_by` parameter allows you to create faceted plots in two ways:
1. Single variable faceting using `facet_wrap`
2. Two-variable faceting using `facet_grid`
### Single Variable Faceting (facet_wrap)
```python
# Create faceted scatter plot by species
(iris.tidyplot(x='sepal_length', y='sepal_width', split_by='species', fill='species')
.add_scatter(alpha=0.6)
.adjust_labels(title='Iris Measurements by Species',
x='Sepal Length', y='Sepal Width')
.show())
# Create faceted violin plot by island
penguins = sns.load_dataset("penguins")
(penguins.tidyplot(x='species', y='body_mass_g', split_by='island', fill='species')
.add_violin(alpha=0.7)
.adjust_labels(title='Penguin Body Mass by Island',
x='Species', y='Body Mass (g)')
.show())
```
<div align="center">
<table>
<tr>
<td><img src="figures/penguins_violin.png" width="400"/></td>
</tr>
</table>
</div>
### Two-Variable Faceting (facet_grid)
```python
# Create faceted scatter plot by day and time
tips = sns.load_dataset("tips")
(tips.tidyplot(x='total_bill', y='tip', split_by=['day', 'time'], fill='smoker')
.add_scatter(alpha=0.6)
.adjust_labels(title='Tips by Day and Time',
x='Total Bill', y='Tip')
.show())
# Create faceted boxplot by color and clarity
diamonds = sns.load_dataset("diamonds")
diamonds_subset = diamonds.sample(n=1000, random_state=42)
(diamonds_subset.tidyplot(x='cut', y='price', split_by=['color', 'clarity'], fill='color')
.add_boxplot(alpha=0.7)
.adjust_labels(title='Diamond Prices by Cut, Color, and Clarity',
x='Cut', y='Price')
.show())
```
<div align="center">
<table>
<tr>
<td><img src="figures/13.2_tips_facet_grid.png" width="400"/></td>
</tr>
</table>
</div>
### Bar Plot with Faceting
```python
# Create faceted bar plot showing survival counts
titanic = sns.load_dataset("titanic")
survival_data = titanic.groupby(['class', 'sex', 'survived']).size().reset_index(name='count')
(survival_data.tidyplot(x='class', y='count', fill='survived', split_by='sex')
.add_bar(position='dodge', alpha=0.7)
.adjust_labels(title='Titanic Survival by Class and Sex',
x='Class', y='Count')
.show())
```
<div align="center">
<img src="figures/titanic_survival.png" width="400"/>
</div>
## Color Palettes
Default color palettes from scientific journals:
```python
# Change color palette to Nature Publishing Group (default)
df.tidyplot(...).adjust_colors('npg') #
```
Available palettes (thanks to ggsci):
- 'npg': Nature Publishing Group colors (default)
- 'aaas': Science/AAAS colors
- 'nejm': New England Journal of Medicine colors
- 'lancet': The Lancet colors
- 'jama': Journal of American Medical Association colors
- 'd3': D3.js colors
- 'material': Material Design colors
- 'igv': Integrative Genomics Viewer colors
and it supports the all color schemes in matplotlib by name:
```python
cmaps = [
('Perceptually Uniform Sequential', ['viridis', 'plasma', 'inferno', 'magma']),
('Sequential', [ 'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', 'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu', 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']),
('Sequential (2)', ['binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia','hot', 'afmhot', 'gist_heat', 'copper']),
('Diverging', ['PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', 'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']), ('Qualitative', [ 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20', 'tab20b', 'tab20c']),
('Miscellaneous', [ 'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar'])
]
# Change color palette using Matplotlib colormaps
df.tidyplot(...).adjust_colors('Set1')
```
more to come ...
## Dependencies
- pandas >= 1.0.0
- numpy >= 1.18.0
- plotnine >= 0.8.0
## Notices
This project is a Python transplant from [tidyplots R version](https://github.com/jbengler/tidyplots), and it is supported by [Windsurf](https://codeium.com/windsurf).
Althought I believe the philosophy we present in this work is convenient, pythonic, and easy to use, it is however rudimentary starting point, therefore it may contain bugs and missing features. Please forgive me if you find any.
And most importantly, contributions are more than welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Missing functions
Please note that not all the functions are available because of:
1. plotnine being a younger library with fewer contributors
2. Some R-specific features being harder to implement in Python
3. Different underlying graphics engines (R's grid vs Python's matplotlib)
The list is provided as below:
1. **Removed Functions** :
* `geom_hex()` - Hexagonal binning not available in plotnine
* `geom_text_repel()` - Text repulsion for avoiding overlaps not available
* `coord_polar()` - Polar coordinate system not fully supported
* `stat_density_2d_filled()` - Filled 2D density plots not available
* `geom_raster()` - High-performance raster rendering not available
* `geom_sf()` - Simple features (geographic data) not supported
* `geom_spoke()` - Spoke plots not available
* `stat_ellipse()` - Statistical ellipses not supported
* `geom_label()` - Labels with backgrounds not available
* `geom_curve()` - Curved line segments not available
2. **Modified Functions** :
* `add_data_labels_repel()` - Modified to use regular `geom_text()` instead of `geom_text_repel()`
* `add_pie_chart()` - Implemented using bar plots and coordinate transformations since native pie charts aren't supported
* `add_donut_chart()` - Similar to pie charts, implemented through workarounds
* `add_density_2d_filled()` - Had to use regular density_2d with modified aesthetics
3. **Limited Functionality** :
* `facet_grid()` - More limited options compared to ggplot2
* `scale_*_gradient2()` - Diverging color scales have limited options
* `theme()` - Some theme elements and customizations not available
* `coord_fixed()` - Fixed coordinate ratio support is limited
* `position_dodge2()` - Advanced dodging features not available
4. **Performance Differences** :
* Large dataset handling is generally slower in plotnine
* Some smoothing methods in `stat_smooth()` have fewer options
* Rendering of complex plots with many layers is less optimized
Raw data
{
"_id": null,
"home_page": "https://github.com/JNU-Tangyin/tidyplots-python",
"name": "tidyplots-python",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "data visualization, plotting, statistics, ggplot, tidyverse",
"author": "Tangyin",
"author_email": "tangyin@jnu.edu.cn",
"download_url": "https://files.pythonhosted.org/packages/15/1a/a30bef2e030090523b492231bb518fa7f0f1b47851261234af0b6cadc913/tidyplots_python-0.1.0.tar.gz",
"platform": null,
"description": "# Tidyplots for Python\n\n<div align=\"center\">\n<table>\n<tr>\n<td><img src=\"figures/1_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/1.4_box.png\" width=\"150\"/></td>\n<td><img src=\"figures/1.5_violin.png\" width=\"150\"/></td>\n<td><img src=\"figures/1.6_density.png\" width=\"150\"/></td>\n<td><img src=\"figures/1_line.png\" width=\"150\"/></td>\n<td><img src=\"figures/1.3_bar.png\" width=\"150\"/></td>\n</tr>\n<tr>\n<td><img src=\"figures/1.8_dot.png\" width=\"150\"/></td>\n<td><img src=\"figures/2.1_mean_bar.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.5_curve_fit.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.6.1_sem_ribbon.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.1_hexbin.png\" width=\"150\"/></td>\n<td><img src=\"figures/3.5_count.png\" width=\"150\"/></td>\n</tr>\n<tr>\n<td><img src=\"figures/5.4.2_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.2_density_2d_filled.png\" width=\"150\"/></td>\n<td><img src=\"figures/4.1_plot.png\" width=\"150\"/></td>\n<td><img src=\"figures/5.4_bar.png\" width=\"150\"/></td>\n<td><img src=\"figures/2.7_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/2.8_scatter.png\" width=\"150\"/></td>\n</tr>\n<tr>\n<td><img src=\"figures/3.1_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/3.4_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/4.2_plot.png\" width=\"150\"/></td>\n<td><img src=\"figures/4_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/5.1_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/5.2_scatter.png\" width=\"150\"/></td>\n</tr>\n<tr>\n<td><img src=\"figures/5.3_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/5.4.1_line.png\" width=\"150\"/></td>\n<td><img src=\"figures/5.4.3_scatter.png\" width=\"150\"/></td>\n<td><img src=\"figures/6.1_bar.png\" width=\"150\"/></td>\n<td><img src=\"figures/6.2_bar.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.2_plot.png\" width=\"150\"/></td>\n</tr>\n<tr>\n<td><img src=\"figures/7.3.1_sum_bar_value.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.3.2_sum_line_dot.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.3.3_sum_area.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.4.1_median_bar_value.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.4.2_median_line_dot.png\" width=\"150\"/></td>\n<td><img src=\"figures/7.4.3_median_area.png\" width=\"150\"/></td>\n</tr>\n</table>\n</div>\n\nA Python library for creating publication-ready plots with a fluent, chainable interface, inspired by [tidyplots R version](https://github.com/jbengler/tidyplots).\n\nBuilt on top of plotnine, it provides a pandas DataFrame extension method for easy and intuitive plot creation.\n\ntodo list\n\n* [X] complete the whole set of plotting that maps to R version\n* [ ] more themes, such as ggprism\n* [ ] more palettes, including palettes from famous journals, for instance, ggsci\n* [ ] more plotting variations from ggpubr\n\n## Features\n\n- Fluent, chainable interface using pandas DataFrame extension\n- Publication-ready plots with Nature Publishing Group (NPG) color palette by default\n- Comprehensive set of plot types and statistical visualizations\n- Flexible faceting with `split_by` parameter for both single and multi-variable splits\n- Statistical annotations (p-values, correlations, etc.)\n- Multiple scientific journal color palettes (NPG, AAAS, NEJM, etc.)\n- Easy customization of colors, labels, and themes\n- Prism-style publication themes\n\n## Installation\n\n```bash\npip install tidyplots-python\n```\n\nfor development version:\n\n```\npip install git+https://github.com/JNU-Tangyin/tidyplots-python.git\n```\n\n## Quick Start\n\n```python\nimport pandas as pd\nimport numpy as np\nimport seaborn as sns\nfrom tidyplots import TidyPlot\n\n# Create sample data\niris = sns.load_dataset(\"iris\")\n\n# Create a scatter plot with groups\n(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')\n .add_scatter(size=5, alpha=0.5)\n .add_density_2d(alpha=0.1)\n .adjust_labels(title='Customization Test',\n x='Sepal Length', y='Sepal Width')\n .adjust_colors(['#1f77b4', '#ff7f0e', '#2ca02c'])\n .adjust_legend_position('right')\n .show())\n```\n\n<img src=\"figures/5.4.2_scatter.png\" width=\"500\"/>\n\n## API Reference\n\n### Core Plot Creation\n\n- `df.tidyplot(x, y=None, color=None)`: Initialize a plot with aesthetics\n\n ```python\n df.tidyplot(x='column1', y='column2', color='group')\n ```\n\n### Plot Types\n\n- `.add_scatter(alpha=0.6, size=3)`: Add scatter points\n- `.add_line(alpha=0.8)`: Add line plot\n- `.add_boxplot(alpha=0.3)`: Add box plot\n- `.add_violin(draw_quantiles=[0.25, 0.5, 0.75])`: Add violin plot\n- `.add_density(alpha=0.5)`: Add density plot\n- `.add_density_2d()`: Add 2D density contour plot\n- `.add_bar()`: Add bar plot\n- `.add_errorbar(ymin, ymax)`: Add error bars\n- `.add_hex(bins=20)`: Add hexbin plot\n- `.add_data_points(alpha=0.3)`: Add jittered data points\n\n### Statistical Features\n\n- `.add_smooth(method='lm')`: Add smoothing line\n- `.add_correlation_text()`: Add correlation coefficient\n- `.add_pvalue(p_value, x1, x2, y)`: Add p-value annotation\n\n### Customization\n\n- `.adjust_labels(title=None, x=None, y=None)`: Set plot labels\n- `.adjust_colors(palette)`: Change color palette\n - Available palettes: 'npg' (default), 'aaas', 'nejm', 'lancet', 'jama', 'd3', 'material', 'igv'\n- `.scale_color_gradient(low, high)`: Set color gradient\n- `.adjust_axis_text_angle(angle)`: Rotate axis text\n\n### Themes\n\n- Default theme: Prism-style with NPG colors\n- Customizable base theme:\n\n ```python\n .adjust_theme(base_size=11, base_family=\"Arial\")\n ```\n\n## Examples\n\n### Basic Plots\n\n```python\nimport pandas as pd\nimport seaborn as sns\nfrom tidyplots import TidyPlot\n\n# Load sample data\niris = sns.load_dataset(\"iris\")\n\n# Simple scatter plot with groups\n(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')\n .add_scatter()\n .show())\n\n# Faceted plot using single variable\n(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species', split_by='species')\n .add_scatter()\n .show())\n\n# Grid faceted plot using two variables\n(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species', \n split_by=['species', 'petal_size_category']) # petal_size_category is an example\n .add_scatter()\n .show())\n```\n\n### Scatter Plot\n\n```python\nimport seaborn as sns\nfrom tidyplots import TidyPlot\n\n# Load iris dataset\niris = sns.load_dataset('iris')\n\n# Create a scatter plot\n(iris.tidyplot(x='sepal_length', y='sepal_width', fill='species')\n .add_scatter(size=5, alpha=0.7)\n .adjust_labels(title='Iris Sepal Dimensions', \n x='Sepal Length', y='Sepal Width'))\n```\n\n\n\n### Box Plot\n\n```python\n# Create a box plot\n(iris.tidyplot(x='species', y='sepal_length', fill='species')\n .add_boxplot(alpha=0.5)\n .add_jitter(width=0.2, size=3, alpha=0.7)\n .adjust_labels(title='Sepal Length by Species', \n x='Species', y='Sepal Length'))\n```\n\n\n\n### Violin Plot\n\n```python\n# Create a violin plot\n(iris.tidyplot(x='species', y='petal_length', fill='species')\n .add_violin(draw_quantiles=[0.25, 0.5, 0.75], alpha=0.6)\n .adjust_labels(title='Petal Length Distribution', \n x='Species', y='Petal Length'))\n```\n\n\n\n### Density Plot\n\n```python\n# Create a density plot\n(iris.tidyplot(x='sepal_width', fill='species')\n .add_density(alpha=0.3)\n .adjust_labels(title='Sepal Width Density', \n x='Sepal Width', y='Density'))\n```\n\n\n\n### Line Plot\n\n```python\n# Load flights dataset\nflights = sns.load_dataset('flights')\n\n# Create a line plot\n(flights.tidyplot(x='year', y='passengers', fill='month')\n .add_line(size=2)\n .adjust_labels(title='Passenger Trends', \n x='Year', y='Number of Passengers'))\n```\n\n\n\n### Bar Plot\n\n```python\n# Create a bar plot\ntips = sns.load_dataset('tips')\n\n# Average tip by day\n(tips.tidyplot(x='day', y='tip', fill='sex')\n .add_bar(stat='mean')\n .adjust_labels(title='Average Tip by Day', \n x='Day', y='Average Tip'))\n```\n\n\n\n### Dot Plot\n\n```python\n# Create a dot plot\n(iris.tidyplot(x='species', y='sepal_length', fill='species')\n .add_dotplot()\n .adjust_labels(title='Sepal Length Dot Plot', \n x='Species', y='Sepal Length'))\n```\n\n\n\n### Mean Bar Plot\n\n```python\n# Mean bar plot with error bars\n(tips.tidyplot(x='day', y='total_bill', fill='sex')\n .add_mean_bar(alpha=0.5)\n .add_sem_errorbar()\n .adjust_labels(title='Mean Total Bill by Day', \n x='Day', y='Total Bill'))\n```\n\n\n\n### Curve Fitting\n\n```python\n# Scatter plot with curve fitting\n(tips.tidyplot(x='total_bill', y='tip', fill='sex')\n .add_scatter()\n .add_smooth(method='lm')\n .adjust_labels(title='Tip vs Total Bill', \n x='Total Bill', y='Tip'))\n```\n\n\n\n### Ribbon Plot (SEM)\n\n```python\n# Ribbon plot showing standard error\n(tips.groupby('day')['total_bill']\n .apply(lambda x: pd.DataFrame({\n 'mean': x.mean(),\n 'sem': x.sem()\n }))\n .reset_index()\n .tidyplot(x='day', y='mean', fill='sex')\n .add_line(color='cyan')\n .add_ribbon(ymin='mean - sem', ymax='mean + sem', alpha=0.3)\n .adjust_labels(title='Total Bill with SEM', \n x='Day', y='Total Bill'))\n```\n\n\n\n### Hexbin Plot\n\n```python\n# Hexbin plot for density visualization\n(tips.tidyplot(x='total_bill', y='tip', fill='sex')\n .add_hex(bins=20)\n .adjust_labels(title='Total Bill vs Tip Density', \n x='Total Bill', y='Tip'))\n```\n\n\n\n### Count Plot\n\n```python\n# Count plot\ntitanic = sns.load_dataset('titanic')\n(titanic.tidyplot(x='class', fill='survived')\n .add_bar(stat='count')\n .adjust_labels(title='Passenger Count by Class and Survival', \n x='Class', y='Count'))\n```\n\n\n\n## Pie Charts\n\nTidyPlots supports creating customizable pie charts using the `add_pie()` method. You can customize colors, explode slices, add shadows, and format percentages.\n\n#### Basic Pie Chart\n```python\n# Create a simple pie chart\ntitanic = sns.load_dataset(\"titanic\")\nsurvival_counts = titanic['survived'].value_counts().reset_index()\nsurvival_counts.columns = ['Status', 'Count']\nsurvival_counts['Status'] = survival_counts['Status'].map({0: 'Did Not Survive', 1: 'Survived'})\n(survival_counts.tidyplot(x='Status', y='Count')\n .add_pie(colors=['#ff9999', '#66b3ff'])\n .show())\n```\n\n#### Exploded Pie Chart with Shadow\n```python\n# Create pie chart with exploded slices and shadow effect\nclass_counts = titanic['class'].value_counts().reset_index()\nclass_counts.columns = ['Class', 'Count']\n(class_counts.tidyplot(x='Class', y='Count')\n .add_pie(colors=['#ff9999', '#66b3ff', '#99ff99'],\n explode=(0.1, 0, 0), # Explode first slice\n shadow=True) # Add shadow effect\n .show())\n```\n\n#### Custom Percentage Formatting\n```python\n# Show both percentages and counts\ntips = sns.load_dataset(\"tips\")\nday_counts = tips['day'].value_counts().reset_index()\nday_counts.columns = ['Day', 'Count']\n(day_counts.tidyplot(x='Day', y='Count')\n .add_pie(colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99'],\n autopct=lambda pct: f'{pct:.1f}%\\n({int(pct/100.*sum(day_counts[\"Count\"])):.0f})',\n startangle=45)\n .show())\n```\n\n#### Advanced Label Positioning\n```python\n# Adjust label positions and add white edges\ndiamonds = sns.load_dataset(\"diamonds\")\ncut_counts = diamonds['cut'].value_counts().reset_index()\ncut_counts.columns = ['Cut', 'Count']\n(cut_counts.tidyplot(x='Cut', y='Count')\n .add_pie(colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#ff99cc'],\n pctdistance=0.85, # Move percentage labels outward\n labeldistance=1.1, # Move labels outward\n wedgeprops={'linewidth': 3, 'edgecolor': 'white'}) # Add white edges\n .show())\n```\n\nThe `add_pie()` method supports various customization options:\n- `colors`: List of colors for pie slices\n- `explode`: Tuple of floats to offset each slice\n- `autopct`: String or function for percentage display (default: '%1.1f%%')\n- `pctdistance`: Float for percentage label distance from center\n- `labeldistance`: Float for label distance from center\n- `shadow`: Bool to add shadow effect\n- `startangle`: Int for starting angle in degrees (default: 90)\n- `wedgeprops`: Dict of properties for pie wedges\n\nNote: Pie charts do not support faceting with `split_by`. For faceted visualizations, consider using bar plots or other plot types.\n\n## Faceting\n\nThe `split_by` parameter allows you to create faceted plots in two ways:\n1. Single variable faceting using `facet_wrap`\n2. Two-variable faceting using `facet_grid`\n\n### Single Variable Faceting (facet_wrap)\n\n```python\n# Create faceted scatter plot by species\n(iris.tidyplot(x='sepal_length', y='sepal_width', split_by='species', fill='species')\n .add_scatter(alpha=0.6)\n .adjust_labels(title='Iris Measurements by Species',\n x='Sepal Length', y='Sepal Width')\n .show())\n\n# Create faceted violin plot by island\npenguins = sns.load_dataset(\"penguins\")\n(penguins.tidyplot(x='species', y='body_mass_g', split_by='island', fill='species')\n .add_violin(alpha=0.7)\n .adjust_labels(title='Penguin Body Mass by Island',\n x='Species', y='Body Mass (g)')\n .show())\n```\n\n<div align=\"center\">\n<table>\n<tr>\n<td><img src=\"figures/penguins_violin.png\" width=\"400\"/></td>\n</tr>\n</table>\n</div>\n\n### Two-Variable Faceting (facet_grid)\n\n```python\n# Create faceted scatter plot by day and time\ntips = sns.load_dataset(\"tips\")\n(tips.tidyplot(x='total_bill', y='tip', split_by=['day', 'time'], fill='smoker')\n .add_scatter(alpha=0.6)\n .adjust_labels(title='Tips by Day and Time',\n x='Total Bill', y='Tip')\n .show())\n\n# Create faceted boxplot by color and clarity\ndiamonds = sns.load_dataset(\"diamonds\")\ndiamonds_subset = diamonds.sample(n=1000, random_state=42)\n(diamonds_subset.tidyplot(x='cut', y='price', split_by=['color', 'clarity'], fill='color')\n .add_boxplot(alpha=0.7)\n .adjust_labels(title='Diamond Prices by Cut, Color, and Clarity',\n x='Cut', y='Price')\n .show())\n```\n\n<div align=\"center\">\n<table>\n<tr>\n<td><img src=\"figures/13.2_tips_facet_grid.png\" width=\"400\"/></td>\n</tr>\n</table>\n</div>\n\n### Bar Plot with Faceting\n\n```python\n# Create faceted bar plot showing survival counts\ntitanic = sns.load_dataset(\"titanic\")\nsurvival_data = titanic.groupby(['class', 'sex', 'survived']).size().reset_index(name='count')\n(survival_data.tidyplot(x='class', y='count', fill='survived', split_by='sex')\n .add_bar(position='dodge', alpha=0.7)\n .adjust_labels(title='Titanic Survival by Class and Sex',\n x='Class', y='Count')\n .show())\n```\n\n<div align=\"center\">\n<img src=\"figures/titanic_survival.png\" width=\"400\"/>\n</div>\n\n## Color Palettes\n\nDefault color palettes from scientific journals:\n\n```python\n# Change color palette to Nature Publishing Group (default)\ndf.tidyplot(...).adjust_colors('npg') # \n```\n\nAvailable palettes (thanks to ggsci):\n\n- 'npg': Nature Publishing Group colors (default)\n- 'aaas': Science/AAAS colors\n- 'nejm': New England Journal of Medicine colors\n- 'lancet': The Lancet colors\n- 'jama': Journal of American Medical Association colors\n- 'd3': D3.js colors\n- 'material': Material Design colors\n- 'igv': Integrative Genomics Viewer colors\n\nand it supports the all color schemes in matplotlib by name:\n\n```python\ncmaps = [\n('Perceptually Uniform Sequential', ['viridis', 'plasma', 'inferno', 'magma']),\n('Sequential', [ 'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', 'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu', 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']),\n('Sequential (2)', ['binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia','hot', 'afmhot', 'gist_heat', 'copper']),\n ('Diverging', ['PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', 'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']), ('Qualitative', [ 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20', 'tab20b', 'tab20c']),\n('Miscellaneous', [ 'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar'])\n]\n# Change color palette using Matplotlib colormaps\ndf.tidyplot(...).adjust_colors('Set1')\n```\n\nmore to come ...\n\n## Dependencies\n\n- pandas >= 1.0.0\n- numpy >= 1.18.0\n- plotnine >= 0.8.0\n\n## Notices\n\nThis project is a Python transplant from [tidyplots R version](https://github.com/jbengler/tidyplots), and it is supported by [Windsurf](https://codeium.com/windsurf).\n\nAlthought I believe the philosophy we present in this work is convenient, pythonic, and easy to use, it is however rudimentary starting point, therefore it may contain bugs and missing features. Please forgive me if you find any.\n\nAnd most importantly, contributions are more than welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## Missing functions\n\nPlease note that not all the functions are available because of:\n\n1. plotnine being a younger library with fewer contributors\n2. Some R-specific features being harder to implement in Python\n3. Different underlying graphics engines (R's grid vs Python's matplotlib)\n\nThe list is provided as below:\n\n1. **Removed Functions** :\n\n* `geom_hex()` - Hexagonal binning not available in plotnine\n* `geom_text_repel()` - Text repulsion for avoiding overlaps not available\n* `coord_polar()` - Polar coordinate system not fully supported\n* `stat_density_2d_filled()` - Filled 2D density plots not available\n* `geom_raster()` - High-performance raster rendering not available\n* `geom_sf()` - Simple features (geographic data) not supported\n* `geom_spoke()` - Spoke plots not available\n* `stat_ellipse()` - Statistical ellipses not supported\n* `geom_label()` - Labels with backgrounds not available\n* `geom_curve()` - Curved line segments not available\n\n2. **Modified Functions** :\n\n* `add_data_labels_repel()` - Modified to use regular `geom_text()` instead of `geom_text_repel()`\n* `add_pie_chart()` - Implemented using bar plots and coordinate transformations since native pie charts aren't supported\n* `add_donut_chart()` - Similar to pie charts, implemented through workarounds\n* `add_density_2d_filled()` - Had to use regular density_2d with modified aesthetics\n\n3. **Limited Functionality** :\n\n* `facet_grid()` - More limited options compared to ggplot2\n* `scale_*_gradient2()` - Diverging color scales have limited options\n* `theme()` - Some theme elements and customizations not available\n* `coord_fixed()` - Fixed coordinate ratio support is limited\n* `position_dodge2()` - Advanced dodging features not available\n\n4. **Performance Differences** :\n\n* Large dataset handling is generally slower in plotnine\n* Some smoothing methods in `stat_smooth()` have fewer options\n* Rendering of complex plots with many layers is less optimized\n",
"bugtrack_url": null,
"license": null,
"summary": "A Python implementation of R's tidyplots for creating publication-ready plots",
"version": "0.1.0",
"project_urls": {
"Bug Reports": "https://github.com/JNU-Tangyin/tidyplots-python/issues",
"Documentation": "https://github.com/JNU-Tangyin/tidyplots-python#readme",
"Homepage": "https://github.com/JNU-Tangyin/tidyplots-python",
"Source": "https://github.com/JNU-Tangyin/tidyplots-python"
},
"split_keywords": [
"data visualization",
" plotting",
" statistics",
" ggplot",
" tidyverse"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "be494200e2e174d43dbc123bf8120bb78f4effafa99d8bdd64002b59f58c6f02",
"md5": "72907a6e7ce843b439eadcb1b3f705dc",
"sha256": "42e6e3c0d50122f0acf48afed7f93993f7ae950e7295e0f26a78df742b358732"
},
"downloads": -1,
"filename": "tidyplots_python-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "72907a6e7ce843b439eadcb1b3f705dc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 23384,
"upload_time": "2024-12-31T03:42:50",
"upload_time_iso_8601": "2024-12-31T03:42:50.581550Z",
"url": "https://files.pythonhosted.org/packages/be/49/4200e2e174d43dbc123bf8120bb78f4effafa99d8bdd64002b59f58c6f02/tidyplots_python-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "151aa30bef2e030090523b492231bb518fa7f0f1b47851261234af0b6cadc913",
"md5": "79f76fa2e847502b4fc72763fab6fda7",
"sha256": "e3c286585e3a3e06903a4944bca6a011af0eb84c1f06f3875af50f2a2b11e834"
},
"downloads": -1,
"filename": "tidyplots_python-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "79f76fa2e847502b4fc72763fab6fda7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 28720,
"upload_time": "2024-12-31T03:42:54",
"upload_time_iso_8601": "2024-12-31T03:42:54.942221Z",
"url": "https://files.pythonhosted.org/packages/15/1a/a30bef2e030090523b492231bb518fa7f0f1b47851261234af0b6cadc913/tidyplots_python-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-31 03:42:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "JNU-Tangyin",
"github_project": "tidyplots-python",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pandas",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "plotnine",
"specs": [
[
">=",
"0.10.0"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.18.0"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.4.0"
]
]
},
{
"name": "scikit-misc",
"specs": [
[
">=",
"0.1.4"
]
]
},
{
"name": "statsmodels",
"specs": [
[
">=",
"0.13.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.4.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"22.0.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"0.900"
]
]
}
],
"lcname": "tidyplots-python"
}