Preface

Goal: Explore statistical plot visualization in Julia, with the data provided by a linear model.

Julia has several plotting libraries, including StatsPlots, Gadfly, and VegaLite. I haven’t explored them deeply.


Distribution

We can start with the normal distribution. Its pdf (probability density function) can be used to calculate the corresponding y-values for the standard normal distribution.

Normal Distribution

We need to make a data series. Using the Distributions library, we can generate data points for the x-axis, then calculate the corresponding y-values for a standard normal distribution.

using StatsPlots, Distributions

x = range(-5, 5, length=1000)
y = pdf.(Normal(), x)  # broadcast pdf over every x value
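
As a sanity check on what pdf returns, the standard normal density can also be written out by hand. This is only a sketch using Base Julia; the helper names normal_pdf and trapezoid are my own and are not part of Distributions.

```julia
# Closed-form density of the standard normal distribution:
#   φ(x) = exp(-x² / 2) / sqrt(2π)
# `normal_pdf` and `trapezoid` are illustrative helper names.
normal_pdf(x) = exp(-x^2 / 2) / sqrt(2π)

# Trapezoidal rule over an evenly spaced grid.
function trapezoid(xs, ys)
    h = step(xs)
    return h * (sum(ys) - (first(ys) + last(ys)) / 2)
end

x = range(-5, 5, length=1000)
y = normal_pdf.(x)

peak = maximum(y)        # close to 1 / sqrt(2π)
area = trapezoid(x, y)   # close to 1, as expected of a density
println("peak = ", peak, ", area = ", area)
```

The area is slightly below 1 because the grid stops at -5 and 5, but the missing tails are negligible.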

From this x and y series, we can plot the normal distribution, along with the labels and title.

plot(
  x, y, fillrange = zero(x), fillalpha = 0.35,
  color=:black,
  label="Standard Normal Distribution", lw=1)

xlabel!("x")
ylabel!("Density")
title!("Standard Normal Distribution with Quantiles")

The plot result can be shown as follows:

Julia: Distribution Plot: Normal

Normal Distribution with Quantiles

No luck

I had no luck visualizing quantiles with Julia.
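
One possible direction, as an untested sketch rather than a finished solution: compute the quartiles with quantile from Distributions and mark them with vline! from Plots (re-exported by StatsPlots). The choice of quartiles and the styling are my assumptions.

```julia
using StatsPlots, Distributions

d = Normal()
x = range(-5, 5, length=1000)
y = pdf.(d, x)

# Quartiles of the standard normal distribution.
probs = [0.25, 0.5, 0.75]
qs = quantile.(d, probs)

plot(x, y, color=:black, label="Standard Normal", lw=1)
# Draw one dashed vertical line per quantile.
vline!(qs, color=:red, linestyle=:dash, label="Quartiles")
xlabel!("x")
ylabel!("Density")
```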

Kurtosis

With the pdf method, we can simulate kurtosis and skewness.

We start with making series by generating data points for x-axis, and calculating the corresponding y-values for the standard normal distribution.

using StatsPlots, Distributions

x = range(-5, 5, length=1000)
y_standard = pdf.(Normal(), x)

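Let’s make examples of distributions with different peak shapes. Strictly speaking, every normal distribution has an excess kurtosis of 0. Changing the standard deviation only changes the spread, so the curves below mimic the visual impression of lower and higher kurtosis rather than a true difference in kurtosis.

  1. Normal(1, 1): same spread as the standard normal (Kurtosis = 0)
  2. Normal(1, 0.5): narrower curve, labeled lower kurtosis
  3. Normal(1, 2): wider curve, labeled higher kurtosis

y_kurtosis_1 = pdf.(Normal(1, 1), x)
y_kurtosis_2 = pdf.(Normal(1, 0.5), x)
y_kurtosis_3 = pdf.(Normal(1, 2), x)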
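
To see how kurtosis actually behaves, the kurtosis function from Distributions reports the excess kurtosis of a distribution. The sketch below compares two normal distributions with a Laplace and a uniform distribution. Those last two are my own choice of examples and do not appear in the plots of this article.

```julia
using Distributions

# Excess kurtosis is 0 for every normal distribution,
# whatever its mean or standard deviation.
k_normal_narrow = kurtosis(Normal(1, 0.5))
k_normal_wide   = kurtosis(Normal(1, 2))

# Distributions with genuinely different tails.
k_laplace = kurtosis(Laplace())       # heavier tails than normal
k_uniform = kurtosis(Uniform(0, 1))   # lighter tails than normal

println((k_normal_narrow, k_normal_wide, k_laplace, k_uniform))
```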

Make our first plot, using normal distribution.

# Plot the standard normal distribution as the base layer
plot(
  x, y_standard, color=:black,
  label="Standard Normal",
  title = "Normal Distribution "
    * "with Different Kurtosis",
  xlabel = "x", ylabel = "Density",
)

Then add each level of kurtosis to the existing plot.

# distributions with different levels of kurtosis
plot!(
  x, y_kurtosis_1, color=:red,
  label="Standard Kurtosis = 0",
  linestyle=:dash, 
)

plot!(
  x, y_kurtosis_2, color=:green,
  label="Lower Kurtosis",
  linestyle=:dash, 
)

plot!(
  x, y_kurtosis_3, color=:blue,
  label="Higher Kurtosis",
  linestyle=:dash, 
)

The plot result can be shown as follows:

Julia: Distribution Plot: Kurtosis

Skewness

The same can be applied with skewness.

using StatsPlots, Distributions

x = range(-5, 5, length=1000)
y_standard = pdf.(Normal(), x)

Let’s make examples of distributions with different skewness parameters, using the skew-normal density 2 φ(x) Φ(αx), where α is the shape parameter.

  1. Negative skewness (α = -4)
  2. Moderate positive skewness (α = 2)
  3. High positive skewness (α = 6)

# Skew-normal density: 2 * pdf(x) * cdf(α * x)
skew_pdf(x, α) = (2 .* pdf.(Normal(), x)
  .* cdf.(Normal(), α .* x))

y_skewed_1 = skew_pdf(x, -4)
y_skewed_2 = skew_pdf(x, 2)
y_skewed_3 = skew_pdf(x, 6)
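
As a cross-check of the skew-normal formula 2 φ(x) Φ(αx), the hand-written density can be compared with the SkewNormal type from Distributions. This is a sketch. The shape value -4.0 is just one example, and the Riemann-sum check is my own addition.

```julia
using Distributions

x = range(-5, 5, length=1000)
α = -4.0  # shape parameter; negative values skew to the left

# Hand-written skew-normal density: 2 φ(x) Φ(αx).
y_manual = 2 .* pdf.(Normal(), x) .* cdf.(Normal(), α .* x)

# Same density from the SkewNormal(location, scale, shape) type.
y_builtin = pdf.(SkewNormal(0.0, 1.0, α), x)

max_diff = maximum(abs.(y_manual .- y_builtin))
area = step(x) * sum(y_manual)   # Riemann sum, should be close to 1
println("max difference = ", max_diff, ", area = ", area)
```

A valid density integrates to 1, so multiplying the whole curve by a constant is not a way to increase skewness. Only the shape parameter α changes it.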

Make our first plot, using normal distribution.

plot(
  x, y_standard, color=:black,
  label="Standard Normal",
  title = "Normal Distribution "
    * "with Different Skewness",
  xlabel = "x", ylabel = "Density",
)

Then add each distribution with a different skewness parameter to the existing plot.

plot!(
  x, y_skewed_1, color=:red,
  label="Negative Skewness = -4",
  linestyle=:dash,
)

plot!(
  x, y_skewed_2, color=:green,
  label="Moderate Positive Skewness = 2",
  linestyle=:dash,
)

plot!(
  x, y_skewed_3, color=:blue,
  label="High Positive Skewness = 6",
  linestyle=:dash,
)

The plot result can be shown as follows:

Julia: Distribution Plot: Skewness


Multiple Series

From a visualization perspective, we can display different series either in one plot or in separate plots arranged in a grid.

Regression

This can be done in four steps.

  1. Scatter plot for each series.
  2. Line plot for each series.
  3. Calculate the standard error of each series.
  4. Add a shaded region for the standard error of each series.

In total, this draws six plot layers: three scatter plots and three line plots.

As usual, read the data from a CSV file into a DataFrame, then extract x and each of the y values.

using CSV, DataFrames, Statistics
using StatsPlots, ColorSchemes

df = CSV.read("series.csv", DataFrame, types=Dict())
rename!(df, Symbol.(strip.(string.(names(df)))))

xs = df.xs
ys1 = df.ys1
ys2 = df.ys2
ys3 = df.ys3

Scatter plot for each series, without regression lines yet.

scatter(
  xs, ys1, label="ys1", 
  seriestype=:scatter, color=:red, 
  legend=:topright)
scatter!(
  xs, ys2, label="ys2", 
  seriestype=:scatter, color=:green)
scatter!(
  xs, ys3, label="ys3", 
  seriestype=:scatter, color=:blue)

Calculate the standard error of each series.

se1 = std(ys1) / sqrt(length(ys1))
se2 = std(ys2) / sqrt(length(ys2))
se3 = std(ys3) / sqrt(length(ys3))
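
The formula used here is the standard error of the mean: the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up numbers, using only the Statistics standard library (the helper name standard_error is hypothetical):

```julia
using Statistics

# Standard error of the mean: sample standard deviation
# divided by the square root of the sample size.
# `standard_error` is an illustrative helper name.
standard_error(v) = std(v) / sqrt(length(v))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values, not from series.csv
se = standard_error(sample)
println("standard error = ", se)
```

Note that this gives one number per series, so the ribbon drawn from it has the same width along the whole x-axis.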

Also define color scheme for shading.

colors = ColorSchemes.magma.colors

Line plot for each series, along with shaded region using ribbon, representing standard error for each series.

plot!(
  xs, ys1, label="", color=colors[1],
  ribbon=(se1, se1), fillalpha=0.3)
plot!(
  xs, ys2, label="", color=colors[2],
  ribbon=(se2, se2), fillalpha=0.3)
plot!(
  xs, ys3, label="", color=colors[3],
  ribbon=(se3, se3), fillalpha=0.3)

The plot result can be shown as follows:

Julia: Multiple Series: Regression Plot

Combined

In some cases it is better to separate the results, for example when each series needs a different y-axis scale.

As usual, read the data from a CSV file into a DataFrame, then extract x and each of the y values. Calculate the standard error for each y series. Also consider aesthetics by defining a color scheme for shading.

df = CSV.read("series.csv", DataFrame, types=Dict())
rename!(df, Symbol.(strip.(string.(names(df)))))

xs = df.xs
ys1 = df.ys1
ys2 = df.ys2
ys3 = df.ys3

se1 = std(ys1) / sqrt(length(ys1))
se2 = std(ys2) / sqrt(length(ys2))
se3 = std(ys3) / sqrt(length(ys3))

colors = ColorSchemes.magma.colors

From this we can draw plot for each series: [ys1, ys2, ys3]

plot1 = scatter(
  xs, ys1, label="ys1",
  seriestype=:scatter, color=:red)
...

plot2 = scatter(
  xs, ys2, label="ys2",
  seriestype=:scatter, color=:green)
...

plot3 = scatter(
  xs, ys3, label="ys3",
  seriestype=:scatter, color=:blue)
...

Now we can combine plots into a single figure.

plot_combined = plot(
  plot1, plot2, plot3, layout=(1, 3))
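
For readers who want something self-contained to experiment with, here is a sketch of the same grid idea on made-up data. The helper series_panel and the numbers are hypothetical, and this is not necessarily how the elided parts of the code above were written.

```julia
using StatsPlots, Statistics

# Made-up data standing in for series.csv.
xs  = collect(0:12)
ys1 = 5 .+ 2.0 .* xs
ys2 = 40 .+ 15.0 .* xs
ys3 = 300 .- 10.0 .* xs

# One panel: scatter points, then a line with a standard-error ribbon.
# `series_panel` is a hypothetical helper.
function series_panel(xs, ys, name, color)
    se = std(ys) / sqrt(length(ys))
    p = scatter(xs, ys, label=name, color=color)
    plot!(p, xs, ys, label="", color=color, ribbon=se, fillalpha=0.3)
    return p
end

plot1 = series_panel(xs, ys1, "ys1", :red)
plot2 = series_panel(xs, ys2, "ys2", :green)
plot3 = series_panel(xs, ys3, "ys3", :blue)

# Each subplot keeps its own y-axis scale.
plot_combined = plot(plot1, plot2, plot3, layout=(1, 3))
```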

The plot result can be shown as follows:

Julia: Multiple Series: Combined Plot


Statistic Properties: StatsPlot

Three series in one axis plot

As you can see from the previous statistical properties, we can analyze the data for each series. For example, we can consider just the y-series and obtain the mean, median, mode, and also the minimum, maximum, range, and quantiles.

We can use StatsPlots for a simple box plot and violin plot, but we need Gadfly to draw a swarm plot.
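
The summary values mentioned above can be computed with the Statistics standard library before drawing anything. A small sketch on a made-up sample. The simple_mode helper is my own, since Statistics itself has no mode function.

```julia
using Statistics

# Mode via a simple frequency count (ties resolve to the smallest value).
# `simple_mode` is an illustrative helper.
function simple_mode(v)
    counts = Dict{eltype(v), Int}()
    for item in v
        counts[item] = get(counts, item, 0) + 1
    end
    top = maximum(values(counts))
    return minimum(k for (k, c) in counts if c == top)
end

sample = [12, 15, 15, 18, 21, 24, 30]   # made-up values
lo, hi = extrema(sample)

stats = (
    mean = mean(sample),
    median = median(sample),
    mode = simple_mode(sample),
    min = lo, max = hi, range = hi - lo,
    quartiles = quantile(sample, [0.25, 0.5, 0.75]),
)
println(stats)
```

A box plot draws the median and quartiles from this list, and a violin plot adds a density estimate around them.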

StatsPlot: Box Plot

StatsPlots provides a boxplot function.

We need to read data from CSV file, then extract the columns ys1, ys2, and ys3.

using CSV, DataFrames, StatsPlots

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

data = [df.ys1, df.ys2, df.ys3]

And utilize boxplot directly, to create a box plot using StatsPlots.

boxplot(data, 
  labels = ["ys1", "ys2", "ys3"],
  linecolor = :black,
  legend = false,
  xlabel = "Variable",
  ylabel = "Value",
  title = "Box Plot for ys1, ys2, and ys3",
  grid = false)

The plot result can be shown as follows:

Julia: Statistic Properties: StatsPlot: Box Plot

StatsPlot: Violin Plot

StatsPlots also provides a violin function.

With the same data, we can call it directly to create a violin plot using StatsPlots.

violin(data, 
  labels = ["ys1", "ys2", "ys3"],
  linecolor = :black,
  legend = false,
  xlabel = "Variable",
  ylabel = "Value",
  title = "Violin Plot for ys1, ys2, and ys3",
  grid = false)

The plot result can be shown as follows:

Julia: Statistic Properties: StatsPlot: Violin Plot

Unfortunately, I couldn’t draw a swarm plot or a strip plot using StatsPlots, so I looked for something else.


Statistic Properties: Gadfly

Three series in one axis plot

Gadfly: Box Plot

To draw a box plot with Gadfly, we use Geom.boxplot. We also import Cairo and Fontconfig, which Gadfly needs for PNG and PDF output.

using CSV, DataFrames, Gadfly
import Cairo, Fontconfig

We need to melt the DataFrame to long format.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

df_long = stack(df, Not(:xs))
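
To make the reshaping concrete, here is a sketch of what stack does on a tiny made-up table with the same column names. The values are invented for illustration.

```julia
using DataFrames

# Tiny made-up table with the same column names as series.csv.
df = DataFrame(xs = [0, 1], ys1 = [5, 9], ys2 = [5, 12], ys3 = [5, 11])

# Wide to long: every ys column becomes (variable, value) rows,
# while xs is kept as an identifier column.
df_long = stack(df, Not(:xs))
println(df_long)
```

The long table has one row per (xs, series) pair. This is the shape Gadfly expects when mapping x=:variable and y=:value.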

And use Geom.boxplot directly to create a box plot using Gadfly.

box_plot = Gadfly.plot(
  df_long,
  x=:variable,
  y=:value,
  color=:variable,
  Geom.boxplot(),
  Guide.xlabel("Variable"),
  Guide.ylabel("Value"),
  Guide.title("Box Plot for ys1, ys2, and ys3"),
  Theme(
    key_position = :top,
    boxplot_spacing = 100px,
    background_color = "white",
  )
)

The plot result can be shown as follows:

Julia: Statistic Properties: Gadfly: Box Plot

Gadfly: Violin Plot

The violin plot uses Geom.violin from Gadfly, with Cairo and Fontconfig imported as before.

With the same data, we can use the method directly, to create a violin plot using Gadfly.

violin_plot = Gadfly.plot(
  df_long,
  x=:variable,
  y=:value,
  color=:variable,
  Geom.violin,
  Guide.xlabel("Variable"),
  Guide.ylabel("Value"),
  Guide.title("Violin Plot for ys1, ys2, and ys3"),

  Coord.cartesian(ymin=0),

  Scale.y_continuous(minvalue=0),
  Theme(
    key_position=:top,
    default_color="purple", 
    background_color="white",
    panel_stroke=colorant"gray",
    minor_label_font_size=10pt, 
    major_label_font_size=12pt,
  )
)

The plot result can be shown as follows:

Julia: Statistic Properties: Gadfly: Violin Plot

Gadfly: Swarm Plot

To draw a swarm plot, we use the same Gadfly.plot call as the box plot, but with Geom.beeswarm() as the geometry.

swarm_plot = Gadfly.plot(
  df_long,
  x=:variable,
  y=:value,
  color=:variable,
  Geom.beeswarm(),
  Guide.xlabel("Variable"),
  Guide.ylabel("Value"),
  Guide.title("Swarm Plot for ys1, ys2, and ys3"),
  Theme(
    key_position = :top,
    background_color = "white",
  )
)

The plot result can be shown as follows:

Julia: Statistic Properties: Gadfly: Swarm Plot


Statistic Properties: Distribution

Just like previous plots, we can analyse the y-axis, but this time by frequency of each series.

KDE Plot

Kernel Density Estimation

KDE shows the distribution of the values well. This complex task can be done easily with the density function from StatsPlots.

We need to melt the DataFrame to long format.

using CSV, DataFrames, StatsPlots

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

df_long = stack(df, Not(:xs))

Now we can create KDE plot using StatsPlots with custom colors

kde_plot = density(
  df_long.value, 
  group = df_long.variable,
  fillalpha = 0.7,
  legend = :topright,
  xlabel = "Value",
  ylabel = "Density",
  title = "KDE Plot for ys1, ys2, and ys3",
  lw = 2, # Line width
  α = 0.5 # Opacity
)
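
To show what a kernel density estimate computes, here is a hand-rolled sketch in Base Julia plus Statistics. The sample is made up, and the bandwidth uses Silverman's rule of thumb, which is my assumption and not necessarily the rule that density applies internally.

```julia
using Statistics

# Gaussian kernel.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Kernel density estimate at point t: the average of kernels
# centred on each observation, scaled by the bandwidth h.
kde_at(t, data, h) =
    sum(gauss((t - d) / h) for d in data) / (length(data) * h)

data = [12.0, 15.0, 15.0, 18.0, 21.0, 24.0, 30.0]   # made-up sample

# Silverman's rule of thumb for the bandwidth (one common choice).
h = 1.06 * std(data) * length(data)^(-1 / 5)

grid = range(minimum(data) - 4h, maximum(data) + 4h, length=500)
dens = [kde_at(t, data, h) for t in grid]

area = step(grid) * sum(dens)   # Riemann sum, close to 1
println("bandwidth = ", h, ", area = ", area)
```

A smaller h gives a bumpier curve that follows individual observations, while a larger h gives a smoother one.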

The plot result can be shown as follows:

Julia: Statistic Properties: Distribution: KDE Plot

Rug Plot

No Luck

I still have no luck drawing this plot in Julia. I’d better come back to it later.
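
One idea I have not verified across backends: emulate a rug by placing vertical tick markers at y = 0 under a density curve. The sketch below assumes the :vline marker shape is supported by the active Plots backend, and the sample is made up.

```julia
using StatsPlots

data = [12.0, 15.0, 15.0, 18.0, 21.0, 24.0, 30.0]   # made-up sample

# Density curve as the main layer.
density(data, label="KDE", color=:black,
  xlabel="Value", ylabel="Density")

# Rug: one small vertical tick per observation along the x-axis.
rug_plot = scatter!(data, zeros(length(data)),
  marker=:vline, markersize=8, color=:red, label="Observations")
```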

Histogram

This looks like the most common chart for beginners. This simple task can be done easily with the histogram function from StatsPlots.

using CSV, DataFrames, StatsPlots

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

df_long = stack(df, Not(:xs))

Now we can create Histogram using StatsPlots with custom colors

hist_plot = histogram(
    df_long.value,
    group = df_long.variable,
    bins = collect(0:50:maximum(df_long.value)),
    linecolor = :black,
    fillalpha = 0.7,
    color = :Set1,
    xlabel = "Value",
    ylabel = "Frequency",
    title = "Histogram Plot for ys1, ys2, and ys3",
    legend = :topleft
)

The plot result can be shown as follows:

Julia: Statistic Properties: Distribution: Histogram

Unfortunately, I still don’t know how to use custom colors.

Well, I should learn more. I’ll do it later, when I’ve got the time.
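
A possible starting point, offered as a sketch and not a confirmed recipe: Plots generally reads a 1×n row of attribute values as one value per series, so a row of colors may map onto the groups. The data below is made up.

```julia
using StatsPlots

# Made-up long-format data: three groups of values.
value = [10, 60, 110, 20, 70, 160, 30, 130, 180]
variable = repeat(["ys1", "ys2", "ys3"], inner=3)

hist_plot = histogram(
  value, group = variable,
  bins = 0:50:200,
  color = [:red :green :blue],   # 1×3 row: one colour per group
  fillalpha = 0.7,
  xlabel = "Value", ylabel = "Frequency",
)
```

Note the spaces instead of commas inside the brackets: that builds a row matrix, which is different from a column vector.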

Marginal

No Luck

I have to explore more. Unfortunately.

I apologize.
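
For a later attempt, StatsPlots ships a marginalhist recipe that may be a good place to start. This is a sketch on made-up data that I have not tried against series.csv.

```julia
using StatsPlots

# Made-up paired data.
xs = collect(0.0:1.0:49.0)
ys = 3 .* xs .+ 10 .* sin.(xs)

# 2D histogram in the centre, 1D histograms on the top and right margins.
marginal_plot = marginalhist(xs, ys, bins=10)
```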


What’s the Next Chapter 🤔?

You can obtain the interactive JupyterLab notebook at the following link:

  • [github.com/…/trend/.ipynb]

We can visualize statistical properties in a practical way.

Besides statistical analysis with Python, R, and Julia, we can go further with TypeScript and Go, so you can integrate it with your application seamlessly.

But currently I’m pretty busy with my job.


Conclusion

It is fun, right?

What do you think?

Farewell. We shall meet again.