### Preface

Goal: explore statistical plot visualization in Julia, with data provided by a linear model.

Julia has several plotting libraries, such as StatsPlots, Gadfly, and Vega-Lite. I haven’t explored them deeply.

### Distribution

We can start with the normal distribution.
The `pdf` (probability density function) can be used to
calculate the corresponding y-values
for the standard normal distribution.

#### Normal Distribution

We need to make a data series.
Using the `Distributions` library,
we can generate data points for the x-axis,
then calculate the corresponding y-values
for a standard normal distribution.

```
using StatsPlots, Distributions
x = range(-5, 5, length=1000)
# broadcast pdf over every x value
y = pdf.(Normal(), x)
```
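As a sanity check on what `pdf` returns, the standard normal density can also be written out by hand. This is only a sketch: `normal_pdf` is a hypothetical helper name, not something from `Distributions`, and the area check is a plain Riemann sum over the same grid.

```julia
# Normal density by hand: f(x) = exp(-((x - μ)/σ)^2 / 2) / (σ * sqrt(2π)).
# `normal_pdf` is a hypothetical helper, not part of Distributions.
normal_pdf(x; μ = 0.0, σ = 1.0) = exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))

x = range(-5, 5, length = 1000)
y = normal_pdf.(x)

# Riemann sum over [-5, 5]; almost all of the probability mass lies here.
area = sum(y) * step(x)
println("peak ≈ ", maximum(y), ", area ≈ ", area)
```

The values should agree with `pdf.(Normal(), x)` up to floating-point error.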

From this `x` and `y` series,
we can plot the normal distribution,
along with the labels and title.

```
plot(
x, y, fillrange = zero(x), fillalpha = 0.35,
color=:black,
label="Standard Normal Distribution", lw=1)
xlabel!("x")
ylabel!("Density")
title!("Standard Normal Distribution")
```

#### Normal Distribution with Quantiles

No luck

I had no luck visualizing quantiles with Julia.
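One possible approach, as a sketch that has not been checked against the original notebook: compute the quantile positions with `quantile` from `Distributions`, then mark them with `vline!`. The probability levels below are illustrative assumptions.

```julia
using StatsPlots, Distributions

d = Normal()
x = range(-5, 5, length = 1000)

# Quantile positions for a few probability levels (illustrative choice).
probs = [0.025, 0.25, 0.5, 0.75, 0.975]
qs = quantile.(d, probs)

plot(x, pdf.(d, x), color = :black, label = "Standard Normal", lw = 1)
# One dashed vertical line per quantile position.
vline!(qs, color = :red, linestyle = :dash, label = "Quantiles")
```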

#### Kurtosis

With the `pdf` method,
we can simulate kurtosis and skewness.

We start by generating data points for the x-axis, and calculating the corresponding y-values for the standard normal distribution.

```
using StatsPlots, Distributions
x = range(-5, 5, length=1000)
y_standard = pdf.(Normal(), x)
```

Let’s make examples of distributions with different levels of kurtosis.

- Standard normal distribution (Kurtosis = 0)
- Lower kurtosis
- Higher kurtosis

```
y_kurtosis_1 = pdf.(Normal(1, 1), x)
y_kurtosis_2 = pdf.(Normal(1, 0.5), x)
y_kurtosis_3 = pdf.(Normal(1, 2), x)
```
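A caveat worth checking: every normal distribution has excess kurtosis 0, whatever its mean or standard deviation, so the curves above differ in spread rather than in kurtosis in the strict sense. The sketch below compares them with two other families from `Distributions`; picking `Laplace` and `Uniform` is my own assumption, chosen only because their kurtosis values are well known.

```julia
using Distributions

# `kurtosis` in Distributions returns the excess kurtosis (normal = 0).
for d in (Normal(1, 1), Normal(1, 0.5), Normal(1, 2), Laplace(), Uniform())
    println(rpad(string(d), 45), " excess kurtosis = ", kurtosis(d))
end
```

Plotting `pdf.(Laplace(), x)` or `pdf.(Uniform(-2, 2), x)` next to the standard normal would show genuinely heavier and lighter tails.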

Make our first plot, using the standard normal distribution.

```
# Plot the standard normal distribution
plot(
x, y_standard, color=:black,
label="Standard Normal",
title = "Normal Distribution "
* "with Different Kurtosis",
xlabel = "x", ylabel = "Density",
)
```

Then add each level of kurtosis to the plot.

```
# distributions with different levels of kurtosis
plot!(
x, y_kurtosis_1, color=:red,
label="Standard Kurtosis = 0",
linestyle=:dash,
)
plot!(
x, y_kurtosis_2, color=:green,
label="Lower Kurtosis",
linestyle=:dash,
)
plot!(
x, y_kurtosis_3, color=:blue,
label="Higher Kurtosis",
linestyle=:dash,
)
```

#### Skewness

The same approach can be applied to skewness.

```
using StatsPlots, Distributions
x = range(-5, 5, length=1000)
y_standard = pdf.(Normal(), x)
```

Let’s make examples of distributions with different skewness parameters.

- Negative skewness
- Moderate positive skewness
- High positive skewness

```
# Skew-normal density: 2 * pdf(x) * cdf(alpha * x),
# with shape alpha = -4, 2, and 6
y_skewed_1 = (2 * pdf.(Normal(), x)
.* cdf.(Normal(), -4 * x))
y_skewed_2 = (2 * pdf.(Normal(), x)
.* cdf.(Normal(), 2 * x))
y_skewed_3 = (2 * pdf.(Normal(), x)
.* cdf.(Normal(), 6 * x))
```
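The skewed curves are built on the skew-normal form 2 φ(x) Φ(αx), where α is a shape parameter: negative values skew the curve to the left, positive values to the right. A minimal sketch to check that the result is still a density and that the mean moves in the expected direction; `skew_normal_pdf` is a hypothetical helper name.

```julia
using Distributions

# Skew-normal density with shape α: f(x) = 2 φ(x) Φ(αx).
# `skew_normal_pdf` is a hypothetical helper, written for illustration.
skew_normal_pdf(x, α) = 2 * pdf(Normal(), x) * cdf(Normal(), α * x)

x = range(-8, 8, length = 4001)
for α in (-4, 0, 2, 6)
    y = skew_normal_pdf.(x, α)
    area = sum(y) * step(x)       # should stay close to 1 for every α
    m = sum(x .* y) * step(x)     # negative for α < 0, positive for α > 0
    println("α = ", α, ": area ≈ ", round(area, digits = 4),
            ", mean ≈ ", round(m, digits = 4))
end
```

Note that multiplying the whole expression by an extra constant would break the area check, since the curve would no longer integrate to 1.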

Make our first plot, using the standard normal distribution.

```
plot(
x, y_standard, color=:black,
label="Standard Normal",
title = "Normal Distribution "
* "with Different Skewness",
xlabel = "x", ylabel = "Density",
)
```

Then add each distribution with a different skewness parameter to the plot.

```
plot!(
x, y_skewed_1, color=:red,
label="Negative Skewness = -4",
linestyle=:dash,
)
plot!(
x, y_skewed_2, color=:green,
label="Moderate Positive Skewness = 2",
linestyle=:dash,
)
plot!(
x, y_skewed_3, color=:blue,
label="High Positive Skewness = 6",
linestyle=:dash,
)
```

### Multiple Series

From a visualization perspective, we can display different series either in one plot, or in separate plots arranged in a grid.

#### Regression

This can be done in these four steps.

- Scatter plot for each series.
- Line plot for each series.
- Calculate the standard error of each series.
- Add a shaded region for the standard error of each series.

This makes six plot calls in total.

As usual, read the data from the CSV file into a DataFrame, then extract the x values and each series of y values.

```
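using CSV, DataFrames, Statistics
using StatsPlots, ColorSchemes

df = CSV.read("series.csv", DataFrame, types=Dict())
# strip stray whitespace from the column names
rename!(df, Symbol.(strip.(string.(names(df)))))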
xs = df.xs
ys1 = df.ys1
ys2 = df.ys2
ys3 = df.ys3
```

Scatter plot for each series, without regression lines.

```
scatter(
xs, ys1, label="ys1",
seriestype=:scatter, color=:red,
legend=:topright)
scatter!(
xs, ys2, label="ys2",
seriestype=:scatter, color=:green)
scatter!(
xs, ys3, label="ys3",
seriestype=:scatter, color=:blue)
```

Calculate the standard error of each series.

```
se1 = std(ys1) / sqrt(length(ys1))
se2 = std(ys2) / sqrt(length(ys2))
se3 = std(ys3) / sqrt(length(ys3))
```
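The quantity computed here is the standard error of the mean of each y-series, that is, the sample standard deviation divided by √n. It is a single number per series, so the shaded band has constant width; it is not a confidence band of a fitted regression. A small sketch with made-up values, where `standard_error` is a hypothetical helper name:

```julia
using Statistics

# Standard error of the mean: sample standard deviation divided by sqrt(n).
# `standard_error` is a hypothetical helper name.
standard_error(v) = std(v) / sqrt(length(v))

# Made-up values, only to show the arithmetic.
v = [2, 4, 4, 4, 5, 5, 7, 9]
println("mean = ", mean(v), ", se = ", standard_error(v))
```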

Also define a color scheme for shading.

`colors = ColorSchemes.magma.colors`

Line plot for each series, along with a shaded region drawn with `ribbon`, representing the standard error of each series.

```
plot!(
xs, ys1, label="", color=colors[1],
ribbon=(se1, se1), fillalpha=0.3)
plot!(
xs, ys2, label="", color=colors[2],
ribbon=(se2, se2), fillalpha=0.3)
plot!(
xs, ys3, label="", color=colors[3],
ribbon=(se3, se3), fillalpha=0.3)
```
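The line plots above connect the raw points. If a fitted straight line is wanted instead, a minimal least-squares sketch could look like the following. This is an assumption about what a regression line would mean here, not the notebook's own method; `fit_line` is a hypothetical helper, and the points are made up so the result is easy to verify.

```julia
# Ordinary least squares for y ≈ a + b*x, solved with the backslash operator.
# `fit_line` is a hypothetical helper name.
function fit_line(xs, ys)
    X = [ones(length(xs)) xs]   # design matrix: intercept column, then x
    a, b = X \ ys               # least-squares solution
    return a, b
end

# Made-up points lying exactly on y = 2 + 3x.
xs_demo = collect(0.0:4.0)
ys_demo = 2 .+ 3 .* xs_demo
a, b = fit_line(xs_demo, ys_demo)
println("intercept ≈ ", a, ", slope ≈ ", b)
```

With the real data, something like `plot!(xs, a .+ b .* xs)` after `a, b = fit_line(xs, ys1)` would overlay the fitted line on the scatter plot.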

#### Combined

For some reasons, it is better to separate the results, for example if you want a different y-axis scale for each series.

As usual, read the data from the CSV file into a DataFrame, then extract the x values and each series of y values. Calculate the standard error for each y series. Also consider aesthetics by defining a color scheme for shading.

```
df = CSV.read("series.csv", DataFrame, types=Dict())
rename!(df, Symbol.(strip.(string.(names(df)))))
xs = df.xs
ys1 = df.ys1
ys2 = df.ys2
ys3 = df.ys3
se1 = std(ys1) / sqrt(length(ys1))
se2 = std(ys2) / sqrt(length(ys2))
se3 = std(ys3) / sqrt(length(ys3))
colors = ColorSchemes.magma.colors
```

From this we can draw a plot for each series:
[`ys1`, `ys2`, `ys3`]

```
plot1 = scatter(
xs, ys1, label="ys1",
seriestype=:scatter, color=:red)
...
plot2 = scatter(
xs, ys2, label="ys2",
seriestype=:scatter, color=:green)
...
plot3 = scatter(
xs, ys3, label="ys3",
seriestype=:scatter, color=:blue)
...
```

Now we can combine plots into a single figure.

```
plot_combined = plot(
plot1, plot2, plot3, layout=(1, 3))
```

### Statistic Properties: StatsPlot

Three series in one axis plot

As you can see from the previous statistical properties,
we can analyze the data for each series.
For example, we can consider just the y-series,
and obtain the `mean`, `median`, `mode`,
and also the `minimum`, `maximum`, `range`, and `quantiles`.

We can use `StatsPlots` for a simple `Boxplot` and `Violinplot`.
But we require `Gadfly` to draw a `Swarm Plot`.
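The properties listed above can be computed before plotting anything. A small sketch using only the `Statistics` standard library; the values are made up, and `simple_mode` is a hypothetical helper written here because `mode` is not part of `Statistics`.

```julia
using Statistics

# Made-up values standing in for one y-series.
v = [1, 2, 2, 3, 10]

# Most frequent value, counted by hand; ties are broken arbitrarily.
# `simple_mode` is a hypothetical helper name.
function simple_mode(v)
    counts = Dict{eltype(v), Int}()
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    best, best_count = first(v), 0
    for (x, c) in counts
        if c > best_count
            best, best_count = x, c
        end
    end
    return best
end

lo, hi = extrema(v)
println("mean      = ", mean(v))
println("median    = ", median(v))
println("mode      = ", simple_mode(v))
println("min, max  = ", lo, ", ", hi, " (range = ", hi - lo, ")")
println("quartiles = ", quantile(v, [0.25, 0.5, 0.75]))
```

A box plot draws the median and quartiles from this list, which is why it is a natural next step.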

#### StatsPlot: Box Plot

There is a `boxplot` method in `StatsPlots`.

We need to read the data from the CSV file, then extract the columns `ys1`, `ys2`, and `ys3`.

```
using CSV, DataFrames, StatsPlots
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
data = [df.ys1, df.ys2, df.ys3]
```

Then call `boxplot` directly
to create a box plot using `StatsPlots`.

```
boxplot(data,
labels = ["ys1", "ys2", "ys3"],
linecolor = :black,
legend = false,
xlabel = "Variable",
ylabel = "Value",
title = "Box Plot for ys1, ys2, and ys3",
grid = false)
```

#### StatsPlot: Violin Plot

There is also a `violin` method in `StatsPlots`.

With the same data, we can use the method directly
to create a violin plot using `StatsPlots`.

```
violin(data,
labels = ["ys1", "ys2", "ys3"],
linecolor = :black,
legend = false,
xlabel = "Variable",
ylabel = "Value",
title = "Violin Plot for ys1, ys2, and ys3",
grid = false)
```

Unfortunately I can’t draw a swarm plot or strip plot using `StatsPlots`.
So I’m looking for something else.

### Statistic Properties: Gadfly

Three series in one axis plot

#### Gadfly: Box Plot

To build this `box_plot` with `Gadfly`,
we need to import `Cairo` and `Fontconfig`.

```
using CSV, DataFrames, Gadfly
import Cairo, Fontconfig
```

We need to melt the DataFrame to long format.

```
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
df_long = stack(df, Not(:xs))
```
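To see what melting to long format does, here is a tiny sketch with a made-up table that reuses the column names of `series.csv`. The numbers are invented for illustration only.

```julia
using DataFrames

# Made-up wide table with the same column names as `series.csv`.
df_demo = DataFrame(xs = [0, 1], ys1 = [5, 12], ys2 = [5, 9], ys3 = [5, 11])

# Every column except `xs` is stacked: its name goes into `variable`
# and its cell content into `value`.
df_demo_long = stack(df_demo, Not(:xs))
println(df_demo_long)
```

Two rows and three measured columns become six rows, which is the shape Gadfly expects when mapping `x=:variable` and `y=:value`.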

Then pass `Geom.boxplot()` to `Gadfly.plot`
to create a box plot using `Gadfly`.

```
box_plot = Gadfly.plot(
df_long,
x=:variable,
y=:value,
color=:variable,
Geom.boxplot(),
Guide.xlabel("Variable"),
Guide.ylabel("Value"),
Guide.title("Box Plot for ys1, ys2, and ys3"),
Theme(
key_position = :top,
boxplot_spacing = 100px,
background_color = "white",
)
)
```

#### Gadfly: Violin Plot

Likewise, to build this `violin_plot` with `Gadfly`,
we need to import `Cairo` and `Fontconfig`.

With the same data, we can pass `Geom.violin` directly
to create a violin plot using `Gadfly`.

```
violin_plot = Gadfly.plot(
df_long,
x=:variable,
y=:value,
color=:variable,
Geom.violin,
Guide.xlabel("Variable"),
Guide.ylabel("Value"),
Guide.title("Violin Plot for ys1, ys2, and ys3"),
Coord.cartesian(ymin=0),
Scale.y_continuous(minvalue=0),
Theme(
key_position=:top,
default_color="purple",
background_color="white",
panel_stroke=colorant"gray",
minor_label_font_size=10pt,
major_label_font_size=12pt,
)
)
```

#### Gadfly: Swarm Plot

To draw a swarm plot, we need the same `Gadfly.plot` call,
but with `Geom.beeswarm()` as the geometry.

```
swarm_plot = Gadfly.plot(
df_long,
x=:variable,
y=:value,
color=:variable,
Geom.beeswarm(),
Guide.xlabel("Variable"),
Guide.ylabel("Value"),
Guide.title("Swarm Plot for ys1, ys2, and ys3"),
Theme(
key_position = :top,
background_color = "white",
)
)
```

### Statistic Properties: Distribution

Just like the previous plots, we can analyze the y-axis values, but this time by the frequency of each series.

#### KDE Plot

Kernel Density Estimation

KDE shows the distribution of the frequency well.
This complex task can be done easily
with the `density` function from `StatsPlots`.

We need to melt the DataFrame to long format.

```
using CSV, DataFrames, StatsPlots
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
df_long = stack(df, Not(:xs))
```

Now we can create the KDE plot using StatsPlots.

```
kde_plot = density(
df_long.value,
group = df_long.variable,
fillalpha = 0.7,
legend = :topright,
xlabel = "Value",
ylabel = "Density",
title = "KDE Plot for ys1, ys2, and ys3",
lw = 2, # Line width
α = 0.5 # Opacity
)
```
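To get a feeling for what a KDE computes, here is a hand-written Gaussian kernel estimate. It is only a sketch: the sample is made up, `gauss` and `kde_at` are hypothetical helper names, and the bandwidth comes from a common rule of thumb that is not necessarily what `density` picks internally.

```julia
using Statistics

# Gaussian kernel density estimate written out by hand:
# estimate(x) = 1/(n*h) * sum of K((x - xi)/h), with K the standard normal density.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)
kde_at(x, data, h) = sum(gauss((x - xi) / h) for xi in data) / (length(data) * h)

# Made-up sample; rule-of-thumb bandwidth (an assumption of this sketch).
data = [5.0, 9.0, 11.0, 12.0, 18.0, 25.0, 31.0]
h = 1.06 * std(data) * length(data)^(-1 / 5)

grid = range(minimum(data) - 4h, maximum(data) + 4h, length = 500)
estimate = kde_at.(grid, Ref(data), h)
println("bandwidth ≈ ", h, ", area ≈ ", sum(estimate) * step(grid))
```

Each data point contributes one small bell curve, and the estimate is their average; a larger `h` gives a smoother curve.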

#### Rug Plot

No Luck

You know, I still have no luck drawing this plot in Julia. I’d better come back to it later.

#### Histogram

This looks like the most common chart for beginners.
This simple task can be done easily
with the `histogram` function from `StatsPlots`.

```
using CSV, DataFrames, StatsPlots
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
df_long = stack(df, Not(:xs))
```

Now we can create the histogram using StatsPlots, with custom colors.

```
hist_plot = histogram(
df_long.value,
group = df_long.variable,
bins = collect(0:50:maximum(df_long.value)),
linecolor = :black,
fillalpha = 0.7,
color = :Set1,
xlabel = "Value",
ylabel = "Frequency",
title = "Histogram Plot for ys1, ys2, and ys3",
legend = :topleft
)
```
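The `bins` argument is a list of bin edges, and the histogram counts how many values fall between each pair of neighbouring edges. A hand-rolled sketch of that counting, with made-up values; `bin_counts` is a hypothetical helper, and treating each bin as closed on the left is an assumption of this sketch.

```julia
# Count how many values fall in each half-open bin [edge, next edge).
# `bin_counts` is a hypothetical helper name.
function bin_counts(vals, edges)
    counts = zeros(Int, length(edges) - 1)
    for v in vals
        i = searchsortedlast(edges, v)   # index of the bin's left edge
        if 1 <= i <= length(counts)
            counts[i] += 1
        end
    end
    return counts
end

# Made-up values; the edges mirror the 0:50:max pattern used for `bins`.
vals = [5, 12, 48, 50, 75, 120, 149]
edges = 0:50:150
println(bin_counts(vals, edges))
```

One thing to watch with `0:50:maximum(...)`: when the maximum is not a multiple of 50, the last edge lands below it (for example `0:50:149` stops at 100), so the largest values may end up outside every bin.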

Unfortunately, I still don’t know how to use custom colors.

Well, I should learn more. I’ll do it later, when I’ve got the time.

#### Marginal

No Luck

Unfortunately, I have to explore more.

*I apologize.*

### What’s the Next Chapter 🤔?

You can obtain the interactive `JupyterLab` notebook from the following link:

- [github.com/…/trend/.ipynb]

We can visualize statistical properties in a practical way.

Besides statistical analysis with Python, R, and Julia, we can go further to TypeScript and Go, so you can integrate it with your application seamlessly.

But currently I’m pretty busy with my job.

### Conclusion

It is fun, right?

What do you think?

Farewell. We shall meet again.