### Preface

Goal: Pretty statistics visualization with Seaborn, with an example script for each plot.

We need a tool that makes it easy to visualize statistical properties. Fortunately, the seaborn library provides ready-to-use charts made specifically for statistics.

Each example chart in this article comes with its source code. There is no step-by-step tutorial here, as there are already plenty of tutorials on the internet. Our focus is on what you can do with Seaborn in relation to statistical properties.

Note that in real life we would face more complex data analysis, so the scripts would also be more complex than these simple examples.

Let’s take a tour and enjoy the view of each chart.

### Visualizing Linear Regression

Yes, we are still talking about trends.

#### Data Series

Instead of just one series,
we would like to use three series: `ys1`, `ys2`, and `ys3`:

```
xs, ys1, ys2, ys3
0, 5, 5, 5
1, 9, 12, 14
2, 13, 25, 41
3, 17, 44, 98
4, 21, 69, 197
5, 25, 100, 350
6, 29, 137, 569
7, 33, 180, 866
8, 37, 229, 1253
9, 41, 284, 1742
10, 45, 345, 2345
11, 49, 412, 3074
12, 53, 485, 3941
```
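If you do not have `series.csv` at hand, the table can be regenerated. The sketch below assumes the three series follow a linear, a quadratic, and a cubic polynomial; the coefficients are not stated in the article, they were fitted by hand so that they reproduce every row of the table above.

```python
import numpy as np

# Assumption: these polynomials are inferred from the table,
# they reproduce all thirteen rows exactly.
xs = np.arange(13)
ys1 = 4 * xs + 5
ys2 = 3 * xs**2 + 4 * xs + 5
ys3 = 2 * xs**3 + 3 * xs**2 + 4 * xs + 5

# One row per x value, one column per series
table = np.column_stack([xs, ys1, ys2, ys3])

# Write the table in the same layout as the listing above
np.savetxt("series.csv", table, fmt="%d", delimiter=", ",
           header="xs, ys1, ys2, ys3", comments="")
```

The file written this way has a space after each comma, which is why the later examples strip the column names.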

#### Regression Plot

Plotting a linear regression is straightforward. You can plot all three series at once in one figure.

```
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Getting Matrix Values
pairCSV = np.genfromtxt("series.csv",
    skip_header=1, delimiter=",", dtype=float)

# Extract x and y values from CSV data
xs, ys1, ys2, ys3 = pairCSV.T

# Scatter plot with regression line
plt.figure(figsize=(8, 6))
sns.regplot(x=xs, y=ys1)
sns.regplot(x=xs, y=ys2)
sns.regplot(x=xs, y=ys3)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:
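To get a feeling for the line that a regression plot draws, here is a minimal sketch using plain `np.polyfit`. It is not seaborn's internal code, only an ordinary least-squares fit on the same table, with R² added to show how well a straight line describes each series.

```python
import numpy as np

xs = np.arange(13)
series = {
    "ys1": [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53],
    "ys2": [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485],
    "ys3": [5, 14, 41, 98, 197, 350, 569, 866,
            1253, 1742, 2345, 3074, 3941],
}

results = {}
for name, values in series.items():
    ys = np.asarray(values, dtype=float)

    # Least-squares straight line: ys ~ slope * xs + intercept
    slope, intercept = np.polyfit(xs, ys, 1)

    # R squared: share of the variance explained by the line
    ss_res = np.sum((ys - (slope * xs + intercept)) ** 2)
    ss_tot = np.sum((ys - ys.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot

    results[name] = (slope, intercept, r2)
    print(f"{name}: slope={slope:.2f}, "
          f"intercept={intercept:.2f}, R2={r2:.3f}")
```

Only `ys1` is a perfect straight line. The other two series curve upward, so the line is just a rough summary of their trend.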

Or, if you wish, you can have three subplots in one figure, with the help of tight layout.

Prepare the data first: get the matrix values, then extract the x and y values from the CSV data.

```
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

pairCSV = np.genfromtxt("series.csv",
    skip_header=1, delimiter=",", dtype=float)
xs, ys1, ys2, ys3 = pairCSV.T
```

Create the subplots, and define a seaborn color palette. You can specify the number of colors here.

```
# Creating subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))
palette = sns.color_palette("husl", 3)
```

Then plot each scatter plot with its regression line.

```
pairs = zip([ys1, ys2, ys3], ['ys1', 'ys2', 'ys3'])
for i, (ys, title) in enumerate(pairs):
    sns.regplot(x=xs, y=ys,
        ax=axs[i], color=palette[i])
    axs[i].set_title(title)
    axs[i].set_xlabel('x')
    axs[i].set_ylabel('y')

plt.tight_layout()
plt.show()
```

The resulting plot is shown below, all in pretty colors. You may find the seaborn palette more pleasant than the default matplotlib colors.

You can obtain the interactive `JupyterLab` notebook from the following link:
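Since `ys2` and `ys3` are curved, a straight line is only a rough fit. `regplot` accepts an `order` parameter for polynomial regression, for example `sns.regplot(x=xs, y=ys2, order=2)`. The sketch below shows the same kind of fit computed directly with `np.polyfit`, so you can compare the error of a line against a second-order curve.

```python
import numpy as np

xs = np.arange(13)
ys2 = np.array([5, 12, 25, 44, 69, 100, 137,
                180, 229, 284, 345, 412, 485], dtype=float)

# Coefficients are returned from the highest power down
line = np.polyfit(xs, ys2, 1)
curve = np.polyfit(xs, ys2, 2)

# Largest vertical distance between the fit and the data
line_error = np.abs(np.polyval(line, xs) - ys2).max()
curve_error = np.abs(np.polyval(curve, xs) - ys2).max()

print("second order coefficients:", np.round(curve, 3))
print("max error, line :", round(line_error, 3))
print("max error, curve:", round(curve_error, 3))
```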

#### Linear Model Plot

LM: Linear Model

We can make the code above simpler with `lmplot`.

With a pandas dataframe, we can read data from CSV directly. But remember to strip the leading spaces from the column names.

Before using the dataframe,
we need to transform it to long format for the linear model plot.
We can do this using the `melt` method from `pandas`.

```
import pandas as pd

# Read the CSV, then strip leading spaces from column names
df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())

# Long format: one row for each (xs, series) pair
df_melted = pd.melt(df,
    id_vars='xs', var_name='y', value_name='value')
```
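To see what the strip and the melt actually do, here is a small self-contained sketch. It uses only the first three rows of the table, embedded as a string, so it runs without `series.csv`.

```python
import io
import pandas as pd

# First rows of the table, with the same leading spaces
csv_text = """xs, ys1, ys2, ys3
0, 5, 5, 5
1, 9, 12, 14
2, 13, 25, 41
"""

df = pd.read_csv(io.StringIO(csv_text))
raw_columns = list(df.columns)  # names still carry the spaces

df = df.rename(columns=lambda x: x.strip())

# Wide format (3 rows x 4 columns) to long format (9 rows x 3 columns)
df_melted = pd.melt(df,
    id_vars='xs', var_name='y', value_name='value')

print(raw_columns)
print(df_melted)
```

Each wide row becomes three long rows, one per series, with the series name stored in the `y` column.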

Then we can draw the scatter plot with regression lines. For convenience, I adjust the top margin a bit, so that a title fits in a small figure.

```
# lmplot is a figure-level function that creates its own figure,
# so the size is set with height and aspect, not plt.figure()
sns.lmplot(x='xs', y='value',
    data=df_melted, hue='y',
    height=6, aspect=8/6)
plt.subplots_adjust(top=0.9)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Facet Grid

Grid of Plots

Instead of using subplots, we can arrange our plots in a grid. I give you two different examples: one with a separate y-axis for each facet, and the other with a shared y-axis.

First we need to get the matrix values.
Then convert the values to a pandas dataframe.
For use with `FacetGrid`,
we need to melt the dataframe to long format.

```
import numpy as np
import pandas as pd
import seaborn as sns

pairCSV = np.genfromtxt("series.csv",
    skip_header=1, delimiter=",", dtype=float)
cols_all = ['xs', 'ys1', 'ys2', 'ys3']
cols_sel = ['ys1', 'ys2', 'ys3']
df = pd.DataFrame(pairCSV, columns=cols_all)
df_melted = pd.melt(df,
    id_vars='xs', var_name='y', value_name='value')
```

We need to create a `FacetGrid`
with one row and three columns,
with a different y-axis for each.
Then we can map `regplot` to each facet.

```
g = sns.FacetGrid(df_melted,
    col='y', col_wrap=3, height=4, sharey=False)
g.map_dataframe(sns.regplot,
    x='xs', y='value', color='b')
```

We can iterate over the selected columns and
map `regplot` to each column in the `FacetGrid`.

In the iteration,
we filter the dataframe subset for each `ys` category.
For each `ys` category we can also use a different color,
based on `sns.color_palette`.

```
for ax, ys_name in zip(g.axes.flat, cols_sel):
    df_subset = df_melted[
        df_melted['y'] == ys_name]
    color = sns.color_palette("husl", 3)[
        cols_sel.index(ys_name)]
    sns.regplot(x='xs', y='value',
        data=df_subset, ax=ax,
        color=color)
```

The resulting plot is shown below. Each facet has its own y-axis, since the grid is created with `sharey=False`.

You can obtain the interactive `JupyterLab` notebook from the following link:
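As an alternative to the manual loop, the grid itself can assign one color per facet when it is given a `hue`. This is a sketch of a different route, not the script from the notebook. The dataframe is rebuilt from polynomials that reproduce the table, which is an assumption of this example, so that it runs without `series.csv`.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: these polynomials reproduce the table values
xs = np.arange(13)
df = pd.DataFrame({
    'xs': xs,
    'ys1': 4 * xs + 5,
    'ys2': 3 * xs**2 + 4 * xs + 5,
    'ys3': 2 * xs**3 + 3 * xs**2 + 4 * xs + 5,
})
df_melted = df.melt(
    id_vars='xs', var_name='y', value_name='value')

# hue='y' lets the grid pick one palette color per series,
# sharey=False gives every facet its own y-axis
g = sns.FacetGrid(df_melted, col='y', hue='y',
    palette='husl', height=4, sharey=False)
g.map_dataframe(sns.regplot, x='xs', y='value')
```

In a script, call `plt.show()` from `matplotlib.pyplot` or `g.savefig(...)` afterwards to see the result.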

If you want, you can share the same y-axis across the grid.

With a pandas dataframe,
we can read data from CSV directly.
Do not forget to strip leading spaces from column names.
Now we define the selected columns for the `ys` series.

```
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
cols_sel = ['ys1', 'ys2', 'ys3']
```

As usual, we melt the DataFrame to long format for `FacetGrid`.
Then we can create a `FacetGrid` with seaborn.

```
df_melted = df.melt(
    id_vars='xs', value_vars=cols_sel)
g = sns.FacetGrid(df_melted,
    col='variable', col_wrap=3,
    sharex=False, sharey=True)
```

Like the previous example, we can iterate over the selected columns and
map `regplot` to each column in the `FacetGrid`.

```
for ax, col in zip(g.axes.flatten(), cols_sel):
    df_subset = df.melt(
        id_vars='xs', value_vars=col)
    color = sns.color_palette("husl", 3)[
        cols_sel.index(col)]
    sns.regplot(x='xs', y='value',
        data=df_subset, ax=ax, color=color)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

### Visualizing Statistics Properties

We have four plots with almost identical settings:

- Boxplot
- Violinplot
- Swarmplot
- Stripplot

#### Preparing Dataframe

These plots require the same data preparation.
As usual, you might either read the dataframe with pandas directly,
or use numpy’s `np.genfromtxt`.

First we need to get the matrix values. Then convert the values to a pandas dataframe. For use with these four kinds of plot, we need to melt the dataframe to long format.

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

pairCSV = np.genfromtxt("series.csv",
    skip_header=1, delimiter=",", dtype=float)
cols_all = ['xs', 'ys1', 'ys2', 'ys3']
df = pd.DataFrame(pairCSV, columns=cols_all)
df_melted = pd.melt(df,
    id_vars='xs', var_name='y', value_name='value')
```

#### Box Plot

The box plot is one of the most common statistical visualizations.

Creating a `boxplot` is as simple as below:

```
plt.figure(figsize=(8, 6))
sns.boxplot(x='y', y='value', data=df_melted)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:
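The box in a box plot is built from quartiles. As a minimal sketch, the numbers behind the `ys1` box can be computed with `np.percentile`. The 1.5 × IQR fences are the usual default for the whiskers; treat the exact whisker rule as an assumption and check the `whis` parameter if you need something else.

```python
import numpy as np

# The ys1 column of the table: 5, 9, ..., 53
ys1 = np.arange(5, 54, 4)

# Box edges and the line inside the box
q1, median, q3 = np.percentile(ys1, [25, 50, 75])

# Interquartile range and the fences that bound the whiskers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences would be drawn as outliers
outliers = ys1[(ys1 < lower_fence) | (ys1 > upper_fence)]

print(q1, median, q3)            # 17.0 29.0 41.0
print(lower_fence, upper_fence)  # -19.0 77.0
print(len(outliers))             # 0
```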

#### Violin Plot

The violin plot is another common visualization. It is basically a box plot combined with a mirrored density curve, where the density is a sum of normal distributions.

Creating a `violinplot` is also simple.

```
plt.figure(figsize=(8, 6))
sns.violinplot(x='y', y='value', data=df_melted)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Swarm Plot

There are other visualizations as well.

We can define colors for `swarmplot`,
adjusting the number of colors as needed,
so we can create a `swarmplot` with different colors.

```
colors = sns.color_palette("husl", 3)
plt.figure(figsize=(8, 6))
sns.swarmplot(x='y', y='value',
    hue='y', data=df_melted, palette=colors)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Strip Plot

This looks like a swarm plot, but each dot gets a small random offset (jitter) instead of being arranged side by side, so we can see how the data points overlap each other.

Just like `swarmplot`,
we can define colors, adjusting the number of colors as needed,
so we can create the `stripplot` with different colors.

```
colors = sns.color_palette("husl", 3)
plt.figure(figsize=(8, 6))
sns.stripplot(x='y', y='value', data=df_melted,
    hue='y', palette=colors, dodge=True)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

### Visualizing Distribution

Compared to matplotlib, visualizing distributions is much easier with seaborn.

#### KDE Plot

Kernel Density Estimation

This is the sum of normal distributions, one centered on each point of a data series.
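The idea can be sketched by hand with numpy. This is only an illustration of the principle: the bandwidth below is a hypothetical value picked by hand, while seaborn chooses its own bandwidth automatically.

```python
import numpy as np

# The ys1 column of the table: 5, 9, ..., 53
ys1 = np.arange(5, 54, 4)

# Hypothetical bandwidth: the width of each normal curve
bandwidth = 10.0

def kde_by_hand(points, grid, h):
    """Average of normal curves, one centered on each data point."""
    z = (grid[:, None] - points[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Evaluate the density on a grid wide enough to hold the tails
grid = np.linspace(-60, 120, 1801)
density = kde_by_hand(ys1, grid, bandwidth)

# A density should integrate to about one
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

A smaller bandwidth gives a bumpier curve that follows individual points, and a larger one gives a smoother curve.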

As usual, we prepare the data first. Then set the seaborn decoration, such as the style. We also define a color palette for the KDE plot, with as many colors as you need.

```
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
sns.set_style("whitegrid")
palette = sns.color_palette("husl", 3)
plt.figure(figsize=(8, 6))
```

Then create a KDE plot for each `ys` category.

```
for i, col in enumerate(['ys1', 'ys2', 'ys3']):
    sns.kdeplot(data=df[col],
        color=palette[i], label=col)

# Show the labels given to each curve
plt.legend()
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

If you wish, you can customize the style with other parameters.

```
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv")
df_melted = pd.melt(df, id_vars='xs',
    var_name='Category', value_name='Value')
sns.set_style("darkgrid")
plt.figure(figsize=(8, 6))
```

Then we can create a KDE plot for all categories with a one-liner.

```
sns.kdeplot(data=df_melted,
    x='Value', hue='Category', palette='deep',
    alpha=0.7, multiple='stack', linewidth=2)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Rug Plot

Sometimes all you need is just the ticks. You can do this with a rug plot.

As usual, we need to `melt` the dataframe to long format for `rugplot`.

```
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("series.csv")
df_melted = pd.melt(df, id_vars='xs',
    var_name='Category', value_name='Value')
```

For decoration purposes, we define a color palette for the rug plots, using one color less than the number of columns, since `xs` is not plotted.

```
palette = sns.color_palette(
    "husl", len(df.columns) - 1)
plt.figure(figsize=(8, 6))
```

Then we can create a rug plot for each category, with the `xs` column excluded.

```
for i, col in enumerate(df.columns[1:]):
    df_subset = df_melted[df_melted['Category'] == col]
    sns.rugplot(data=df_subset, x='Value',
        color=palette[i], label=col, alpha=0.7)
```

The resulting plot is shown below. It looks like an empty chart at first, but you can see the ticks at the bottom of the figure.

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Histogram Plot

A histogram is a very basic plot, and it is available in matplotlib. So what is so special about this histogram?

With seaborn, we can add a KDE curve on top of the histogram.

As usual, we need to prepare the data.
Then select columns such as `ys1`, `ys2`, and `ys3`.
Then create the figure.

```
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
cols_selected = ['ys1', 'ys2', 'ys3']
plt.figure(figsize=(8, 6))
```

This way we can plot a `histplot` for the selected columns.

```
sns.histplot(data=df[cols_selected],
    kde=True, element='step',
    multiple='layer', palette='husl')
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:
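Behind every histogram is a simple count of values per bin. Here is a minimal sketch with `np.histogram` on the `ys1` column. The choice of four bins is an assumption made for illustration, since seaborn picks the number of bins automatically.

```python
import numpy as np

# The ys1 column of the table: 5, 9, ..., 53
ys1 = np.arange(5, 54, 4)

# Four equal-width bins between the minimum and the maximum
counts, edges = np.histogram(ys1, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```

The last bin also includes its right edge, which is why it holds four values while the others hold three.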

#### Distribution Plot

This is similar to the plot above, but here `displot` combines the histogram, the KDE curve, and the rug in one call.

As above, we need to select columns,
such as `ys1`, `ys2`, and `ys3`.

```
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
cols_selected = ['ys1', 'ys2', 'ys3']
df_selected = df[cols_selected]
```

Let’s decorate the figure as usual,
by defining a color palette for the `displot`.

```
# displot is a figure-level function that creates its own figure,
# so a separate plt.figure() call is not needed
palette = sns.color_palette(
    "husl", len(cols_selected))
```

Now we can create a `displot` for the selected columns.

```
sns.displot(data=df_selected,
    kind='hist', rug=True, kde=True,
    palette=palette, alpha=0.7, multiple='layer')
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

### Further Visualization

We can combine different kinds of information in one figure. For example, the two plots below have marginal plots on the top and right sides.

#### Joint Plot

The first approach is to use a single plot function, and put the marginal settings inside the call.

For example, we can use seaborn’s `jointplot`
to create a scatter plot with a regression line,
with KDE on the margins.

```
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
sns.jointplot(data=df, x='xs', y='ys3',
    kind='reg', marginal_kws={'fill': True})
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

#### Joint Grid

The second approach is to use a grid, so we can customize the margins manually.

First we need to create a `JointGrid` object. Then draw the regression plot in the center, and put box plots on the marginal axes.

```
import pandas as pd
import seaborn as sns

df = pd.read_csv("series.csv") \
    .rename(columns=lambda x: x.strip())
g = sns.JointGrid(data=df, x='xs', y='ys3')
g.plot_joint(sns.regplot)
g.plot_marginals(sns.boxplot)
```

The resulting plot is shown below:

You can obtain the interactive `JupyterLab` notebook from the following link:

### What Comes Next 🤔?

We can now visualize statistical properties in a practical way.

Next, I would really like to explore PSPPire. PSPP is a free, open source alternative to SPSS.

Consider continuing your exploration with [ Trend - Properties - PSPPire ].