Preface

Goal: Pretty statistical visualization with Seaborn, with an example script for each plot.

Let’s face it: plotting raw numbers in matplotlib is like eating plain oatmeal. Nutritious, but a little… bland. Enter Seaborn, the library that dresses our data in a tuxedo, adds a spotlight, and whispers the punchline for us.

In this article, we will find ready-to-run example scripts for each plot. No tedious step-by-step handholding (the internet’s already brimming with tutorials). Our mission is to showcase what Seaborn can do for the statistical properties of our data, so we can spend less time wrestling with code and more time interpreting elegant visuals.

Remember: real-world data rarely stays neat and tidy. These are simple demos; consider them our rehearsal dinner before the big data wedding. Now, let’s grab our favorite beverage and see Seaborn’s stat-smarts in action.


Visualizing Linear Regression

Yes, we are still talking about trends, now with Seaborn.

Data Series

Instead of just one series, we’ll play with three: ys₁, ys₂, and ys₃. That way we can compare how different curves behave side by side.

xs, ys1, ys2, ys3
0,  5,   5,   5
1,  9,   12,  14
2,  13,  25,  41
3,  17,  44,  98
4,  21,  69,  197
5,  25,  100, 350
6,  29,  137, 569
7,  33,  180, 866
8,  37,  229, 1253
9,  41,  284, 1742
10, 45,  345, 2345
11, 49,  412, 3074
12, 53,  485, 3941

Python: Seaborn: Statistical Properties: CSV Source

Comparing multiple series in one plot helps us see at a glance which trendlines are stubbornly linear and which ones go off to dramatic nonlinear land.
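One quick way to back that eyeball judgment with numbers is to fit a straight line to each series and look at the largest deviation. The sketch below is only an illustration: the helper name max_linear_residual is made up for this example, and the values are copied from the table above instead of being read from series.csv.

```python
import numpy as np

# Values copied from the table above (hard-coded so the sketch runs on its own)
xs = np.arange(13, dtype=float)
ys1 = np.array([5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53], dtype=float)
ys2 = np.array([5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485], dtype=float)
ys3 = np.array([5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941], dtype=float)


def max_linear_residual(x, y):
    """Largest absolute gap between y and its least-squares straight line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(np.max(np.abs(y - (slope * x + intercept))))


residuals = {
    name: max_linear_residual(xs, ys)
    for name, ys in [("ys1", ys1), ("ys2", ys2), ("ys3", ys3)]
}

for name, gap in residuals.items():
    print(f"{name}: largest gap from a straight line = {gap:.2f}")
```

A gap of practically zero marks ys1 as linear, while ys2 and ys3 drift further and further from any straight line.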

Regression Plot

A simple way to overlay regression lines on scatter points for each series. Plotting a linear regression is straightforward, and we can plot all three series at once in one figure.

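import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Getting Matrix Values
pairCSV = np.genfromtxt("series.csv",
  skip_header=1, delimiter=",", dtype=float)

# Extract x and y values from CSV data
xs, ys1, ys2, ys3 = pairCSV.T

# Scatter plot with regression line, one call per series
plt.figure(figsize=(8, 6))
sns.regplot(x=xs, y=ys1, label='ys1')
sns.regplot(x=xs, y=ys2, label='ys2')
sns.regplot(x=xs, y=ys3, label='ys3')

plt.legend()
plt.show()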

With one command per series we get data points, regression line, and confidence band. It’s like magic, but with math under the hood.
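To peek at the math under the hood, here is a rough sketch of the idea: a least-squares line plus a band built by refitting that line on resampled data (a bootstrap). This is a simplified illustration and not seaborn's actual implementation; the seed, the number of resamples, and the 50-point grid are arbitrary choices made for this example.

```python
import numpy as np

# ys2 from the table above, hard-coded so the sketch runs on its own
xs = np.arange(13, dtype=float)
ys2 = np.array([5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485], dtype=float)

# Ordinary least-squares line through all points
slope, intercept = np.polyfit(xs, ys2, deg=1)
grid = np.linspace(xs.min(), xs.max(), 50)
fit = slope * grid + intercept

# Bootstrap: refit the line on (x, y) pairs resampled with replacement
rng = np.random.default_rng(seed=0)  # fixed seed, arbitrary choice
n_boot = 1000
boot_lines = np.empty((n_boot, grid.size))
for b in range(n_boot):
    idx = rng.integers(0, xs.size, size=xs.size)
    b_slope, b_intercept = np.polyfit(xs[idx], ys2[idx], deg=1)
    boot_lines[b] = b_slope * grid + b_intercept

# 95% band: 2.5th and 97.5th percentile of the refitted lines at each grid point
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)

print(f"fitted slope = {slope:.1f}, intercept = {intercept:.1f}")
print(f"band width at x = 0: {upper[0] - lower[0]:.1f}")
```

The band is widest where the straight line describes the data worst, which is why curved series such as ys2 and ys3 get a visibly wider shade than ys1.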

Python: Seaborn: Linear Regression: Regression Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Regression Plot

You can obtain the interactive JupyterLab notebook from the following link:

If you wish, you can instead have three subplots in one figure, with the help of tight layout.

Three-in-One Subplots

When overlaying clutters the view, we can split into three panels, all neatly aligned.

Prepare our data first: get the matrix values, then extract the x and y values from the CSV data.

pairCSV = np.genfromtxt("series.csv",
  skip_header=1, delimiter=",", dtype=float)

xs, ys1, ys2, ys3 = pairCSV.T

Create the subplots, and also define a seaborn color palette. You can specify the number of colors here.

# Creating subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))

palette = sns.color_palette("husl", 3)

Then plot each scatter plot with its regression line.

pairs = zip([ys1, ys2, ys3], ['ys1', 'ys2', 'ys3'])

for i, (ys, title) in enumerate(pairs):
  sns.regplot(x=xs, y=ys,
    ax=axs[i], color=palette[i])

  axs[i].set_title(title)
  axs[i].set_xlabel('x')
  axs[i].set_ylabel('y')

plt.tight_layout()
plt.show()

Python: Seaborn: Linear Regression: Regression Plot

The result of the plot can be visualized as follows, all in pretty colors from the husl palette, arguably nicer than the matplotlib defaults.

Python: Visualization with Seaborn: Multiple Regression Plot

You can obtain the interactive JupyterLab notebook from the following link:

Side-by-side panels make it easy to compare slopes and scatter spread across series. Plus, the colors from husl give our eyes a treat.

Linear Model Plot

LM: Linear Model

Leverage lmplot for a concise call that handles hue-based grouping and faceting under the hood. We can make the code above simpler with lmplot.

With a pandas DataFrame, we can read data from CSV directly. But remember to strip the leading spaces from the column names.

Before using the DataFrame, we need to transform it to long format for the linear model plot. We can do this using the melt method from pandas.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')
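To see what these two steps actually do, here is a small sketch on the first three rows of the table. The CSV text is inlined through io.StringIO only so the example runs without series.csv; the names CSV_TEXT and raw are made up for this illustration.

```python
import io
import pandas as pd

# First rows of the table above, inlined so the sketch runs without series.csv
CSV_TEXT = """xs, ys1, ys2, ys3
0,  5,   5,   5
1,  9,   12,  14
2,  13,  25,  41
"""

raw = pd.read_csv(io.StringIO(CSV_TEXT))
print(list(raw.columns))  # header names keep the space that follows each comma

# Strip the spaces, then go from wide format (one column per series)
# to long format (one row per observation)
df = raw.rename(columns=lambda x: x.strip())
df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')

print(df_melted)
```

Three rows with three series become nine rows with the columns xs, y, and value, which is the shape that hue and col arguments expect.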

Then we can draw the scatter plot with regression lines. lmplot is a figure-level function that creates its own figure, so the size is set through height and aspect rather than plt.figure. For convenience, I adjust the top margin a bit, so a title fits in a small figure.

sns.lmplot(x='xs', y='value',
  data=df_melted, hue='y',
  height=6, aspect=8/6)

plt.subplots_adjust(top=0.9)

Python: Seaborn: Linear Regression: Linear Model Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Linear Model Plot

You can obtain the interactive JupyterLab notebook from the following link:

lmplot combines regression, hue-based grouping, and faceting in one shot. It’s the Swiss Army knife of trend visualization.
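As a sketch of the faceting side, the same call can put each series in its own panel by adding col. This assumes a reasonably recent seaborn where lmplot accepts facet_kws; the Agg backend line is only there so the example runs without a display, and the data is hard-coded from the table above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Values copied from the table above
df = pd.DataFrame({
  'xs':  range(13),
  'ys1': [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53],
  'ys2': [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485],
  'ys3': [5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941],
})

df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')

# col='y' gives one panel per series, hue='y' colors them,
# and sharey=False lets every panel keep its own y scale
g = sns.lmplot(data=df_melted, x='xs', y='value',
  hue='y', col='y', height=3,
  facet_kws={'sharey': False})
```

One call yields three colored panels, without any explicit loop over axes.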

Facet Grid

Grid of Plot

For ultimate control we can use FacetGrid, which arranges multiple subplots that may or may not share axes. Instead of using subplots, we can arrange our plots in a grid. I give you two different examples: one with a separate y-axis for each panel, and the other with a shared y-axis.

First we need to get the matrix values, then convert them to a pandas DataFrame. For use with FacetGrid, we need to melt the DataFrame to long format.

pairCSV = np.genfromtxt("series.csv",
  skip_header=1, delimiter=",", dtype=float)

cols_all = ['xs', 'ys1', 'ys2', 'ys3']
cols_sel = ['ys1', 'ys2', 'ys3']

df = pd.DataFrame(pairCSV, columns=cols_all)

df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')

We create a FacetGrid with one row and three columns, with a different y-axis for each panel (sharey=False). Then we can map regplot to each facet.

g = sns.FacetGrid(df_melted,
  col='y', col_wrap=3, height=4, sharey=False)

g.map_dataframe(sns.regplot,
  x='xs', y='value', color='b')

We can also iterate over the selected columns and draw a regplot on each axis of the FacetGrid. These are drawn on top of the blue plots mapped above.

In the iteration, we filter the DataFrame subset for each ys category. Each ys category also gets a different color, based on sns.color_palette.

for ax, ys_name in zip(g.axes.flat, cols_sel):
  df_subset = df_melted[
    df_melted['y'] == ys_name]

  color = sns.color_palette("husl", 3)[
    cols_sel.index(ys_name)]

  sns.regplot(x='xs', y='value',
    data=df_subset, ax=ax,
    color=color)

Python: Seaborn: Linear Regression: Facet Grid

The result of the plot can be visualized as below. Each panel has its own y-axis scale.

Python: Visualization with Seaborn: Facet Grid

You can obtain the interactive JupyterLab notebook from the following link:

If you want, the panels can share one y-axis instead, while the x-axis limits stay independent.

With a pandas DataFrame, we can read data from CSV directly. Do not forget to strip leading spaces from column names. Now we define selected columns for the ys series.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())
  
cols_sel = ['ys1', 'ys2', 'ys3']

As usual, we melt the DataFrame to long format for FacetGrid, and then create the FacetGrid with seaborn.

df_melted = df.melt(
  id_vars='xs', value_vars=cols_sel)

g = sns.FacetGrid(df_melted,
  col='variable', col_wrap=3,
  sharex=False, sharey=True)

Like the previous example, we can iterate over selected columns and map regplot to each column in the FacetGrid.

for ax, col in zip(g.axes.flatten(), cols_sel):
  df_subset = df.melt(
    id_vars='xs', value_vars=col)

  color = sns.color_palette("husl", 3)[
    cols_sel.index(col)]

  sns.regplot(x='xs', y='value',
    data=df_subset, ax=ax, color=color)

Python: Seaborn: Linear Regression: Facet Grid

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Facet Grid

You can obtain the interactive JupyterLab notebook from the following link:

FacetGrid is our multi-paneled stage. We decide whether our subplots share scales or stand alone, giving us granular control over comparisons.
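The effect of sharing scales can be checked directly by reading back the y-limits of each panel. In this sketch the helper panel_ylims is a made-up name, ci=None is passed only to keep the example fast, and the Agg backend is selected so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Values copied from the table above
df = pd.DataFrame({
  'xs':  range(13),
  'ys1': [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53],
  'ys2': [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485],
  'ys3': [5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941],
})

df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')


def panel_ylims(sharey):
  """Draw one regplot per series and return the y-limits of each panel."""
  g = sns.FacetGrid(df_melted, col='y', sharey=sharey)
  g.map_dataframe(sns.regplot, x='xs', y='value', ci=None)
  return [tuple(round(v, 6) for v in ax.get_ylim())
          for ax in g.axes.flat]


shared = panel_ylims(sharey=True)
independent = panel_ylims(sharey=False)

print("sharey=True :", shared)
print("sharey=False:", independent)
```

With sharey=True all three panels report the same limits, so ys1 looks almost flat next to ys3. With sharey=False each panel zooms to its own series.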


Visualizing Statistics Properties

We have four plots that share an almost identical setup. Each one gives us a different lens on the same data. Let’s prepare our data once and then explore:

  1. Boxplot
  2. Violinplot
  3. Swarmplot
  4. Stripplot

Preparing Dataframe

These plots require the same data preparation. As usual, you might either read the DataFrame with pandas directly, or use numpy’s np.genfromtxt.

First we need to get the matrix values, then convert them to a pandas DataFrame. For use with these four kinds of plot, we need to melt the DataFrame to long format.

pairCSV = np.genfromtxt("series.csv",
  skip_header=1, delimiter=",", dtype=float)

cols_all = ['xs', 'ys1', 'ys2', 'ys3']
df = pd.DataFrame(pairCSV, columns=cols_all)

df_melted = pd.melt(df,
  id_vars='xs', var_name='y', value_name='value')

We load the CSV, convert to a DataFrame, and melt it into long form. This step powers every plot below.

Python: Seaborn: Statistics Properties: Preparing Dataframe

One tidy DataFrame fuels many plots. We avoid copy paste and ensure consistency across visuals.

Box Plot

The box plot is the classic. It shows median, quartiles, and outliers at a glance.

Creating a boxplot is as simple as below:

plt.figure(figsize=(8, 6))
sns.boxplot(x='y', y='value', data=df_melted)

Python: Seaborn: Statistics Properties: Box Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Box Plot

You can obtain the interactive JupyterLab notebook from the following link:

Box plots highlight center, spread, and extreme points. If a whisker goes rogue we notice immediately.
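The numbers behind a box can be reproduced by hand. The sketch below assumes the common 1.5 × IQR rule for the whisker fences and uses ys3 copied from the table above.

```python
import numpy as np

# ys3 from the table above, hard-coded so the sketch runs on its own
ys3 = np.array([5, 14, 41, 98, 197, 350, 569, 866,
                1253, 1742, 2345, 3074, 3941], dtype=float)

# The box spans Q1..Q3, with a line at the median
q1, median, q3 = np.percentile(ys3, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (assumed rule);
# points outside the fences would be drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ys3[(ys3 < lower_fence) | (ys3 > upper_fence)]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"outliers = {outliers}")
```

For ys3 no value falls outside the fences, so under this rule the steep growth of the series stretches the upper whisker instead of producing outlier markers.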

Violin Plot

Violin plots wrap a kernel density estimate around the box plot structure for extra flair. The density is basically a sum of small normal curves, one centered on each data point.

Creating a violinplot is also simple.

plt.figure(figsize=(8, 6))
sns.violinplot(x='y', y='value', data=df_melted)

Python: Seaborn: Statistics Properties: Violin Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Violin Plot

You can obtain the interactive JupyterLab notebook from the following link:

Violin plots reveal the full distribution shape. We see multimodal bumps or smooth bell curves at a glance.

Swarm Plot

Swarm plots show individual observations while avoiding overlap. It’s like inviting each data point to stand in its own space.

We can define colors for the swarmplot, adjusting the number of colors as needed, so each category gets a different color.

colors = sns.color_palette("husl", 3)

plt.figure(figsize=(8, 6))
sns.swarmplot(x='y', y='value',
  hue='y', data=df_melted, palette=colors)

Python: Seaborn: Statistics Properties: Swarm Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Swarm Plot

You can obtain the interactive JupyterLab notebook from the following link:

Swarm plots let us see the actual data points. We spot clusters, gaps, and any singletons that box or violin plots might hide.

Strip Plot

Strip plots are like swarm plots, but the points are jittered at random rather than arranged to avoid overlap, and they can be dodged side by side. They give a sense of overlap density.

Just like swarmplot, we can define colors, adjusting the number of colors as needed, so we can create the stripplot with different colors.

colors = sns.color_palette("husl", 3)

plt.figure(figsize=(8, 6))
sns.stripplot(x='y', y='value', data=df_melted,
  hue='y', palette=colors, dodge=True)

Python: Seaborn: Statistics Properties: Strip Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Strip Plot

You can obtain the interactive JupyterLab notebook from the following link:

Strip plots combine the clarity of swarm plots with grouping dodge. We see individual values and group separation at once.


Visualizing Distribution

When it comes to distribution plots, Seaborn makes our lives a breeze. We get beautiful charts with minimal code, and maximum statistical insight.

KDE Plot

Kernel Density Estimation

Kernel Density Estimation turns each data point into a little bell curve and sums them all together. The result is a smooth estimate of our data’s shape.

This is the sum of normal distributions, one for each point in a data series.
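Here is that idea written out with plain numpy, as a minimal sketch. The bandwidth follows Scott's rule for one-dimensional data, which is an assumption made for this example, and the bumps are averaged (summed, then divided by the number of points) so the area under the curve is one.

```python
import numpy as np

# ys1 from the table above, hard-coded so the sketch runs on its own
ys1 = np.array([5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53], dtype=float)

# Width of each bell curve, by Scott's rule (assumed for this sketch)
n = ys1.size
bandwidth = ys1.std(ddof=1) * n ** (-1 / 5)


def gaussian(u):
  """Standard normal density."""
  return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)


# Evaluate on a grid that extends well past the data on both sides
grid = np.linspace(ys1.min() - 5 * bandwidth,
                   ys1.max() + 5 * bandwidth, 2001)

# One bell curve per observation (columns), averaged into one density
bumps = gaussian((grid[:, None] - ys1[None, :]) / bandwidth) / bandwidth
density = bumps.mean(axis=1)

# Riemann sum as a sanity check on the total area
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"bandwidth = {bandwidth:.2f}, area under curve = {area:.4f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one smooths them into a single broad hill.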

As usual, we prepare the data first. Then we apply seaborn decoration such as the style, and define a color palette for the KDE plot, with as many colors as needed.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

sns.set_style("whitegrid")
palette = sns.color_palette("husl", 3)

plt.figure(figsize=(8, 6))

And create a KDE plot for each ys category.

for i, col in enumerate(['ys1', 'ys2', 'ys3']):
  sns.kdeplot(data=df[col],
    color=palette[i], label=col)

plt.legend()
plt.show()

Python: Seaborn: Visualizing Distribution: KDE Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: KDE Plot

You can obtain the interactive JupyterLab notebook from the following link:

If you wish, you can customize the style with other parameters.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

df_melted = pd.melt(df, id_vars='xs',
  var_name='Category', value_name='Value')

sns.set_style("darkgrid")

plt.figure(figsize=(8, 6))

Then we can create the KDE plot for all categories with a one-liner.

sns.kdeplot(data=df_melted,
  x='Value', hue='Category', palette='deep',
  alpha=0.7, multiple='stack', linewidth=2)

Python: Seaborn: Visualizing Distribution: KDE Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: KDE Plot

Customize further:

KDE plots reveal subtle bumps and tails that histograms can miss. They help us spot multimodal distributions or heavy tails in a flash.

Rug Plot

Sometimes the simplest visualization is the most telling. A rug plot adds one tick for each observation. It’s like sprinkling sugar on a cake. Small but impactful.

As usual we need to melt the dataframe to long format for rugplot.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

df_melted = pd.melt(df, id_vars='xs',
  var_name='Category', value_name='Value')

For decoration purposes we define a color palette for the rug plots, using one less color than the number of columns, since ‘xs’ is excluded.

palette = sns.color_palette(
  "husl", len(df.columns) - 1)  

plt.figure(figsize=(8, 6))

Then we can create rug plot for each category, with ‘xs’ column excluded.

for i, col in enumerate(df.columns[1:]):
  df_subset = df_melted[df_melted['Category'] == col]
  sns.rugplot(data=df_subset, x='Value',
    color=palette[i], label=col, alpha=0.7)

Python: Seaborn: Visualizing Distribution: Rug Plot

The result of the plot can be visualized as below. This looks like an empty chart at first, but you can see the ticks along the bottom of the figure.

Python: Visualization with Seaborn: Rug Plot

You can obtain the interactive JupyterLab notebook from the following link:

Rug plots show raw data density without binning. They let us see exact observation locations and gaps in the data.

Histogram Plot

We all know histograms. So what is so special with this histogram? Seaborn’s histplot can add a KDE curve on top to combine the best of both worlds.

As usual we need to prepare the data, then select columns such as ys₁, ys₂, and ys₃, and then create a figure.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

cols_selected = ['ys1', 'ys2', 'ys3']

plt.figure(figsize=(8, 6))

This way we can plot a histplot for the selected columns.

sns.histplot(data=df[cols_selected],
  kde=True, element='step',
  multiple='layer', palette='husl')

Python: Seaborn: Visualizing Distribution: Histogram Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Histogram Plot

Explore more:

Layering KDE on a histogram helps us see both individual bin counts and the underlying density estimate. It’s clarity and style rolled into one.
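The bin counts themselves are easy to inspect without drawing anything. This sketch uses numpy's histogram with bins='auto', which is an assumption for illustration; the bin edges seaborn picks for the figure may differ.

```python
import numpy as np

# ys2 from the table above, hard-coded so the sketch runs on its own
ys2 = np.array([5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485], dtype=float)

# Let numpy choose the bin edges, then count the values in each bin
counts, edges = np.histogram(ys2, bins='auto')

# Tiny text histogram: one '#' per observation in the bin
for left, right, count in zip(edges[:-1], edges[1:], counts):
  print(f"[{left:7.1f}, {right:7.1f}) {'#' * int(count)}")
```

Because ys2 grows faster and faster, more values land in the low bins than in the high ones, which the KDE curve then smooths into a right-skewed shape.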

Distribution Plot

Seaborn’s displot ties it all together, combining histogram, KDE, and rug in a single convenient function.

As above, we need to select columns, such as ys₁, ys₂, and ys₃.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

cols_selected = ['ys1', 'ys2', 'ys3']
df_selected = df[cols_selected]

Let’s decorate the figure as usual, defining a color palette for the displot.

palette = sns.color_palette(
  "husl", len(cols_selected))

Now we can create the displot for the selected columns. displot is a figure-level function that creates its own figure, so the size is set through height and aspect rather than plt.figure.

sns.displot(data=df_selected,
  kind='hist', rug=True, kde=True,
  palette=palette, alpha=0.7, multiple='layer',
  height=6, aspect=8/6)

Python: Seaborn: Visualizing Distribution: Distribution Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Distribution Plot

You can obtain the interactive JupyterLab notebook from the following link:

displot is our all-in-one distribution toolkit. It saves us boilerplate code and gives a comprehensive view of frequency, density, and individual data points.


Further Visualization

We can combine multiple layers of information into a single figure. For example these two plots add marginal distributions on the top and right sides, giving us both joint and individual views in one glance.

Joint Plot

Seaborn’s jointplot makes it trivial to pair a scatterplot with marginal density or histogram plots. Here we’ll illustrate a regression between xs and ys₃ with filled distributions and KDE curves on the margins.

For example, we can use seaborn’s jointplot to create a scatter plot with a regression line, with KDE on the margins.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

sns.jointplot(data=df, x='xs', y='ys3',
  kind='reg', marginal_kws={'fill': True})

Python: Seaborn: Further Visualization: Joint Plot

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Joint Plot

You can obtain the interactive JupyterLab notebook from the following link:

This plot shows both the relationship between two variables and their individual distributions. We get regression insight and marginal shape in one compact view.

Joint Grid

For full customization we can build the same view by hand using JointGrid. This lets us choose any plot type for the center and the margins.

First we need to create a JointGrid object. Then we draw the regression plot in the center, and box plots on the marginal axes.

df = pd.read_csv("series.csv") \
  .rename(columns=lambda x: x.strip())

g = sns.JointGrid(data=df, x='xs', y='ys3')
g.plot_joint(sns.regplot)
g.plot_marginals(sns.boxplot)

Python: Seaborn: Further Visualization: Joint Grid

The result of the plot can be visualized as below:

Python: Visualization with Seaborn: Joint Grid

You can obtain the interactive JupyterLab notebook from the following link:

With JointGrid we control every layer. We can swap in histograms, violinplots, or any custom chart on the margins to suit our analysis needs.


What Comes Next 🤔?

We have dressed up our plots and explored distributions in all their glory. Now it is time to broaden our toolkit beyond Python and Seaborn.

I am eager to dive into PSPPire, the open source cousin of SPSS. It lets us run familiar statistical tests in a free, community-driven environment. Think of it as a statistical theme park, where all the rides are free and the cotton candy never runs out.

Learning PSPPire gives us another arrow in our quiver. When we need quick hypothesis tests or standardized reporting, PSPPire can deliver without licensing headaches.

Let us continue our adventure here: 🔗 [ Trend - Properties - PSPPire ].