Preface

Goal: Explore visualization in the R programming language with ggplot2, and provide a set of example plots at your fingertips.

Let’s continue our previous ggplot2 journey. It is easy once we embrace the grammar.


Distribution

We can start with the normal distribution.

The dnorm function calculates the y-values (densities) of the standard normal distribution for the given x-values.

y <- dnorm(x)
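As a quick sanity check, the sketch below evaluates dnorm at a few points and confirms numerically that the curve integrates to roughly 1. The sample points and variable names are illustrative assumptions, not part of the original script.

```r
# Evaluate the standard normal density at a few illustrative points.
x_check <- c(-2, -1, 0, 1, 2)
y_check <- dnorm(x_check)

# The peak at x = 0 equals 1 / sqrt(2 * pi).
peak <- dnorm(0)

# integrate() approximates the area under the curve, which should be close to 1.
area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

print(round(y_check, 4))
print(peak)
print(area)
```

The density is symmetric around 0, so the values at -2 and 2 (and at -1 and 1) are identical.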

Normal Distribution

geom_line

We start by loading the required library, then generate data points for the x-axis. The dnorm function calculates the corresponding y-values for a standard normal distribution, so we can create a data frame for plotting.

library(ggplot2)

x <- seq(-5, 5, length.out = 1000)
y <- dnorm(x)

df <- data.frame(x = x, y = y)

This way we can plot the normal distribution using geom_line, adjust decoration such as the grid, labels and title, and finally save the plot as a PNG.

plot <- ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "black")

plot <- plot +
  theme_minimal() +
  theme(
    text = element_text(size = 4),
    panel.grid = element_blank()) +
  labs(
    x = "x", y = "Density",
    title = "Standard Normal Distribution with Quantiles")

ggsave("63-normal.png", plot,
  width = 800, height = 400, units = "px")

Normal Distribution with Quantiles

geom_area

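With the above plot in place, we can add quantiles.

First we calculate the quantiles at the defined percentile marks. Note that quantile(x, ...) here is taken over the evenly spaced x grid, so the cut points are positions along the plotted range rather than quantiles of the normal distribution itself.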

percentiles <- c(25, 50, 75, 100)
quantiles <- quantile(x, probs = percentiles / 100)

R: Distribution: Quantiles
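The quantile call above works on the evenly spaced x grid. If the aim is to cut the curve at quantiles of the standard normal distribution itself, one option is qnorm. The following is a sketch under that assumption; grid_quantiles and dist_quantiles are names introduced here for illustration.

```r
percentiles <- c(25, 50, 75, 100)

# Quantiles of the x grid: evenly spaced positions on the plotted range.
x <- seq(-5, 5, length.out = 1000)
grid_quantiles <- quantile(x, probs = percentiles / 100)

# Quantiles of the standard normal distribution; the 100% point is Inf,
# so it is capped at the right edge of the grid for plotting purposes.
dist_quantiles <- pmin(qnorm(percentiles / 100), max(x))

print(unname(grid_quantiles))
print(dist_quantiles)
```

Either vector can be used in the shading loop; the second one makes each shaded region cover the stated share of the area under the curve.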

Then add the shaded regions corresponding to the percentiles to the plot grammar. This can be done with geom_area.

for (i in seq_along(quantiles)) {
  plot <- plot + geom_area(
    data = subset(df,x <= quantiles[i]),
    aes(x = x, y = y),
    fill = i, alpha = 0.3)
}

R: Distribution: Quantiles

The plot result can be shown as follows:

R: Distribution: Normal Distribution with Quantiles


Kurtosis

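With the dnorm function, we can illustrate kurtosis and skewness.

Let’s make examples of distributions with different levels of peakedness. Strictly speaking, every normal distribution has an excess kurtosis of 0, so changing sd only makes the curve look taller or flatter; the curves below are a visual illustration rather than distributions with truly different kurtosis.

  1. Standard normal distribution (Kurtosis = 0)
  2. Lower kurtosis
  3. Higher kurtosis

y_standard <- dnorm(x)
df_standard <- data.frame(x = x, y = y_standard)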

y_kurtosis_1 <- dnorm(x, mean = 1, sd = 1)
y_kurtosis_2 <- dnorm(x, mean = 1, sd = 0.5)
y_kurtosis_3 <- dnorm(x, mean = 1, sd = 2)

df_kurtosis_1 <- data.frame(x = x, y = y_kurtosis_1)
df_kurtosis_2 <- data.frame(x = x, y = y_kurtosis_2)
df_kurtosis_3 <- data.frame(x = x, y = y_kurtosis_3)

R: Distribution: Kurtosis
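To see how the shape of a curve relates to the kurtosis statistic, the sketch below estimates excess kurtosis numerically on the same grid. The helper excess_kurtosis and the Student t comparison (dt with df = 5) are illustrative additions, not part of the original script.

```r
# Grid used for numerical integration (same range as the plots).
x <- seq(-5, 5, length.out = 1000)

# Hypothetical helper: excess kurtosis of a density sampled on a grid,
# using a simple Riemann sum for the moments.
excess_kurtosis <- function(x, y) {
  dx <- x[2] - x[1]
  w <- y / sum(y * dx)              # renormalise over the finite grid
  mu <- sum(x * w * dx)             # mean
  m2 <- sum((x - mu)^2 * w * dx)    # variance
  m4 <- sum((x - mu)^4 * w * dx)    # fourth central moment
  m4 / m2^2 - 3
}

k_narrow <- excess_kurtosis(x, dnorm(x, mean = 1, sd = 0.5))
k_unit   <- excess_kurtosis(x, dnorm(x, mean = 1, sd = 1))

# A heavier-tailed shape for comparison: Student t with 5 degrees of freedom.
# The finite grid cuts off the tails, so this value is an underestimate.
k_t5     <- excess_kurtosis(x, dt(x, df = 5))

print(c(narrow = k_narrow, unit = k_unit, t5 = k_t5))
```

Both normal curves come out close to 0 regardless of sd, while the t curve is clearly positive.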

Then add a geom_line for each level of kurtosis to the plot grammar.

plot <- ggplot() +
  geom_line(data = df_standard, color = "black",
    aes(x = x, y = y), linewidth = 0.2) +
  geom_line(data = df_kurtosis_1,
    aes(x = x, y = y), color = "red",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_kurtosis_2,
    aes(x = x, y = y), color = "green",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_kurtosis_3,
    aes(x = x, y = y), color = "blue",
    linetype = "dashed", linewidth = 0.2) +
  labs(x = "x", y = "Density",
    title = "Normal Distribution with Different Kurtosis") +
  # This scale only takes effect if linetype is mapped inside aes();
  # with the constant linetypes above, no legend is drawn.
  scale_linetype_manual(
    values = c("solid", "dashed", "dashed", "dashed"),
    labels = c(
      "Standard Normal", "Standard Kurtosis = 0",
      "Lower Kurtosis", "Higher Kurtosis")) +
  theme_minimal() +
  theme(
    text = element_text(size = 4))

R: Distribution: Kurtosis
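The scale_linetype_manual call above has no visible effect, because color and linetype are set as constants rather than mapped inside aes(). One possible way to get a legend is to stack the curves into a single long data frame and map a label column. This is a minimal sketch; the column name curve and the labels are assumptions made for illustration.

```r
library(ggplot2)

x <- seq(-5, 5, length.out = 1000)

# Stack the curves into one long data frame with a label column
# (the column name `curve` is an assumption made for this sketch).
df_curves <- rbind(
  data.frame(x = x, y = dnorm(x), curve = "Standard Normal"),
  data.frame(x = x, y = dnorm(x, mean = 1, sd = 0.5), curve = "sd = 0.5"),
  data.frame(x = x, y = dnorm(x, mean = 1, sd = 2), curve = "sd = 2")
)

# Mapping colour and linetype inside aes() lets the manual scales
# take effect and produces a legend.
plot_legend <- ggplot(df_curves,
    aes(x = x, y = y, color = curve, linetype = curve)) +
  geom_line(linewidth = 0.2) +
  scale_color_manual(values = c(
    "Standard Normal" = "black", "sd = 0.5" = "green", "sd = 2" = "blue")) +
  scale_linetype_manual(values = c(
    "Standard Normal" = "solid", "sd = 0.5" = "dashed", "sd = 2" = "dashed")) +
  theme_minimal()
```

Because both aesthetics map the same column, ggplot2 merges them into a single legend.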

The plot result can be shown as follows:

R: Distribution: Kurtosis


Skewness

The same approach applies to skewness.

Let’s make examples of distributions with different skewness parameters.

  1. Negative skewness
  2. Moderate positive skewness
  3. High positive skewness

y_standard <- dnorm(x)
df_standard <- data.frame(x = x, y = y_standard)

# Skew-normal density: 2 * dnorm(x) * pnorm(alpha * x),
# where the shape parameter alpha controls the skew.
# The alpha values (-4, 2, 6) are taken from the legend labels below.
y_skewed_1 <- dnorm(x) * 2 * pnorm(-4 * x)
y_skewed_2 <- dnorm(x) * 2 * pnorm(2 * x)
y_skewed_3 <- dnorm(x) * 2 * pnorm(6 * x)

df_skewed_1 <- data.frame(x = x, y = y_skewed_1)
df_skewed_2 <- data.frame(x = x, y = y_skewed_2)
df_skewed_3 <- data.frame(x = x, y = y_skewed_3)

R: Distribution: Skewness
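One common way to skew a normal curve is the skew-normal density, 2 * dnorm(x) * pnorm(alpha * x). The sketch below wraps it in a helper and checks that each curve still integrates to about 1 on the grid. The helper name dskewnorm and the chosen alpha values are assumptions for illustration.

```r
# Hypothetical helper: skew-normal density with shape parameter alpha.
# alpha = 0 gives the standard normal; the sign of alpha sets the skew direction.
dskewnorm <- function(x, alpha) {
  2 * dnorm(x) * pnorm(alpha * x)
}

x <- seq(-5, 5, length.out = 1000)
dx <- x[2] - x[1]
alphas <- c(-4, 0, 2, 6)

# Each curve should still integrate to about 1 on the grid.
areas <- sapply(alphas, function(a) sum(dskewnorm(x, a)) * dx)

# The mean moves left for negative alpha and right for positive alpha.
means <- sapply(alphas, function(a) sum(x * dskewnorm(x, a)) * dx)

print(round(areas, 3))
print(round(means, 3))
```

Multiplying a density by an extra constant, on the other hand, changes the area under the curve, so the result is no longer a valid density.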

Then again add a geom_line for each skewed distribution to the plot grammar.

plot <- ggplot() +
  geom_line(data = df_standard, color = "black",
    aes(x = x, y = y), linewidth = 0.2) +
  geom_line(data = df_skewed_1,
    aes(x = x, y = y), color = "red",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_skewed_2,
    aes(x = x, y = y), color = "green",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_skewed_3,
    aes(x = x, y = y), color = "blue",
    linetype = "dashed", linewidth = 0.2) +
  labs(x = "x", y = "Density",
    title = "Normal Distribution with Different Skewness") +
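  # This scale only takes effect if linetype is mapped inside aes();
  # with the constant linetypes above, no legend is drawn.
  scale_linetype_manual(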
    values = c("solid", "dashed", "dashed", "dashed"),
    labels = c(
      "Standard Normal",
      "Negative Skewness = -4",
      "Moderate Positive Skewness = 2",
      "High Positive Skewness = 6")) +
  theme_minimal() +
  theme(
    text = element_text(size = 4))

R: Distribution: Skewness

The plot result can be shown as follows:

R: Distribution: Skewness



Trend: Multiple

From the visualization perspective, we can display different series either in one plot or in separate plots arranged in a grid.

Geom Smooth

Instead of calculating the linear model manually, we can use geom_smooth to plot the fitted curve. geom_smooth can also draw the standard error band.

Let’s plot an example of this case. We start with geom_point for the data points, then use geom_smooth to add a regression line for each of ys1, ys2 and ys3.

plot <- ggplot(data, aes(x = xs)) +
  geom_point(
    aes(x = xs, y = ys1),
    size = 0.5, color = "firebrick") +  
  geom_smooth(
    aes(x = xs, y = ys1), method = "lm",
    se = TRUE, color = "firebrick",
    linewidth = 0.2) +
  ...

R: Trend: Multiple: geom smooth
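To show that geom_smooth with method = "lm" fits the same straight line as a manual lm call, here is a self-contained sketch. The data frame demo is synthetic and only stands in for the article's data; its values and the variable names are assumptions.

```r
library(ggplot2)

# Synthetic data standing in for the article's `data` (assumed columns: xs, ys1).
set.seed(42)
demo <- data.frame(xs = 0:12)
demo$ys1 <- 5 + 2 * demo$xs + rnorm(nrow(demo), sd = 3)

# Manual linear model for comparison.
fit <- lm(ys1 ~ xs, data = demo)

# geom_smooth(method = "lm") fits the same model internally and
# draws the confidence band when se = TRUE.
plot_demo <- ggplot(demo, aes(x = xs, y = ys1)) +
  geom_point(size = 0.5, color = "firebrick") +
  geom_smooth(method = "lm", formula = y ~ x,
    se = TRUE, color = "firebrick", linewidth = 0.2)

# The smoothed line's endpoints match the lm() predictions.
smooth_data <- layer_data(plot_demo, 2)
print(coef(fit))
print(range(smooth_data$y))
```

Passing formula = y ~ x explicitly avoids the message ggplot2 prints when it has to pick a default formula.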

We can also add a nice solarized theme. For this you need the ggthemes library.

  labs(x = "x", y = "y",
    title = "Scatter Plot with Regression Lines") +
  theme_solarized() +
  scale_color_solarized() +
  theme(
    text = element_text(size = 4))

R: Trend: Multiple: geom smooth

The plot result can be shown as follows:

R: Trend: Multiple: Geom Smooth


Grid Extra

Sometimes it is better to separate the results, for example when each series needs its own y-axis scale. For this you need the gridExtra library.

Let’s arrange plots using gridExtra horizontally.

grid_plot <- grid.arrange(
  plot_y1, plot_y2, plot_y3, ncol = 3)

R: Trend: Multiple: grid extra
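The snippet above assumes that plot_y1, plot_y2 and plot_y3 already exist. A minimal sketch of how such panels might be built and arranged follows; the synthetic data and the helper make_panel are assumptions introduced for illustration.

```r
library(ggplot2)
library(gridExtra)

# Synthetic series standing in for the article's data (assumed columns).
set.seed(1)
demo <- data.frame(xs = 0:12)
demo$ys1 <- 5 + 2 * demo$xs + rnorm(13)
demo$ys2 <- 100 - 3 * demo$xs + rnorm(13)
demo$ys3 <- 0.5 * demo$xs^2 + rnorm(13)

# Hypothetical helper that builds one panel per y column.
make_panel <- function(col) {
  ggplot(demo, aes(x = xs, y = .data[[col]])) +
    geom_point(size = 0.5) +
    geom_smooth(method = "lm", formula = y ~ x, linewidth = 0.2) +
    labs(y = col)
}

plot_y1 <- make_panel("ys1")
plot_y2 <- make_panel("ys2")
plot_y3 <- make_panel("ys3")

# arrangeGrob() builds the layout without drawing it;
# grid.arrange() does the same and also draws to the current device.
grid_plot <- arrangeGrob(plot_y1, plot_y2, plot_y3, ncol = 3)

# ggsave() accepts the arranged grob directly, for example:
# ggsave("grid-demo.png", grid_plot, width = 1200, height = 400, units = "px")
```

Each panel keeps its own y-axis scale, which is the main reason to split the series.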

The plot result can be shown as follows:

R: Trend: Multiple: Grid Extra



Statistic Properties: One Axis Plot

Three series in one axis plot

As you can see from the previous statistical properties, we can analyze the data for each series. For example, we can consider just the y-series and obtain the mean, median, and mode, as well as the minimum, maximum, range, and quantiles.
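The properties listed above can be computed with base R. This is a small sketch on made-up values; the vector ys and the helper stat_mode are assumptions for illustration.

```r
# Synthetic y-series standing in for one column of the article's data.
ys <- c(5, 9, 9, 12, 15, 18, 21, 21, 21, 30)

# Base R has no built-in statistical mode, so this hypothetical helper
# returns the most frequent value (the smallest one if there is a tie).
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

props <- list(
  mean = mean(ys),
  median = median(ys),
  mode = stat_mode(ys),
  min = min(ys),
  max = max(ys),
  range = max(ys) - min(ys),
  quantiles = quantile(ys, probs = c(0.25, 0.5, 0.75))
)

str(props)
```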

Long Format

Melt

To visualize multiple y-series, we need to melt the series into long format by piping it to the gather function. gather is available in the tidyr library, where it has since been superseded by pivot_longer.

series_longer <- series %>%
  gather(key = "y", value = "value", -xs)

R: Statistic Properties: Long Format
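For reference, the same reshaping can be written with pivot_longer, the current tidyr replacement for gather. The small wide data frame below is made up for illustration; only the column names follow the article.

```r
library(tidyr)

# Small wide-format example (assumed columns: xs, ys1, ys2, ys3).
series <- data.frame(
  xs = 0:2,
  ys1 = c(5, 9, 13),
  ys2 = c(12, 11, 9),
  ys3 = c(1, 4, 9)
)

# Every column except xs is stacked into a key column `y`
# and a value column `value`.
series_longer <- pivot_longer(
  series, cols = -xs, names_to = "y", values_to = "value")

print(series_longer)
```

The row order differs from gather (pivot_longer goes row by row), but the content is the same.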

You can check the result by printing the melted series with print or head.

Box Plot

The most common way to visualize this is the box plot. We can utilize geom_boxplot to get the plot.

plot <- ggplot(
    series_longer,
    aes(x = y, y = value, fill = y)) +
  geom_boxplot(color = "black", linewidth = 0.2) +
  ...

R: Statistic Properties: Box Plot

Let’s use custom colors for this example.

  scale_fill_manual(values = soft_colors) +

The plot result can be shown as follows:

R: Statistic Properties: One Axis: Box Plot
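A complete, runnable version of the box plot might look like the sketch below. The data is synthetic, and the soft_colors palette is a guess, since the article's actual color values are not shown.

```r
library(ggplot2)

# Small long-format example (assumed columns: y, value).
set.seed(7)
series_longer <- data.frame(
  y = rep(c("ys1", "ys2", "ys3"), each = 20),
  value = c(rnorm(20, 50, 10), rnorm(20, 80, 25), rnorm(20, 30, 5))
)

# Hypothetical palette; the article's soft_colors values are not shown.
soft_colors <- c(ys1 = "#A3C4DC", ys2 = "#F4B6C2", ys3 = "#B5EAD7")

plot_box <- ggplot(series_longer,
    aes(x = y, y = value, fill = y)) +
  geom_boxplot(color = "black", linewidth = 0.2) +
  scale_fill_manual(values = soft_colors) +
  theme_minimal()

# Each box is drawn from five summary values computed by the boxplot stat:
# whisker ends (ymin, ymax), quartiles (lower, upper) and median (middle).
box_stats <- layer_data(plot_box, 1)[,
  c("ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)
```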


Violin Plot

A richer way to visualize this is the violin plot, which also shows the shape of each distribution. We can use geom_violin to get the plot.

plot <- ggplot(
    series_longer,
    aes(x = y, y = value, fill = y)) +
  geom_violin(color = "black", linewidth = 0.2) +
 ...

R: Statistic Properties: Violin Plot

The plot result can be shown as follows:

R: Statistic Properties: One Axis: Violin Plot


Swarm Plot

This leaves us with other options such as the swarm plot and the strip plot. We can approximate a swarm plot by using jitter inside geom_point.

plot <- ggplot(
    series_longer,
    aes(x = y, y = value, color = y)) +
  geom_point(
    position = position_jitterdodge(
      jitter.width = 0.3, jitter.height = 0),
    size = 0.5) +
  ...

R: Statistic Properties: Swarm Plot

The plot result can be shown as follows:

R: Statistic Properties: One Axis: Swarm Plot


Strip Plot

We can get a strip plot using geom_jitter.

plot <- ggplot(
    series_longer,
    aes(x = y, y = value, color = y)) +
  geom_jitter(
    width = 0.3, height = 0, size = 0.5) +
  ...

R: Statistic Properties: Strip Plot

The plot result can be shown as follows:

R: Statistic Properties: One Axis: Strip Plot



Statistic Properties: Distribution

Just like the previous four plots, we can analyse the y-axis, but this time by the frequency of values in each series.

KDE Plot

Kernel Density Estimation

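KDE shows the distribution of the frequency well. This complex task can be done easily with geom_density. The snippet assumes the key column of the long-format data is named Category; if you kept the gather call shown earlier, that column is named y instead.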

plot <- ggplot(
    series_longer,
    aes(x = value, fill = Category)) +
  geom_density(alpha = 0.7, color = NA) +
  ...

R: Statistic Properties: KDE Plot
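To get a feel for what a kernel density estimate is, the sketch below calls the base R density function directly on made-up values. The sample data and variable names are assumptions for illustration.

```r
# Synthetic values standing in for one y-series.
set.seed(3)
values <- c(rnorm(60, mean = 50, sd = 10), rnorm(40, mean = 90, sd = 8))

# density() is the base R kernel density estimator; by default it uses
# a Gaussian kernel and a rule-of-thumb bandwidth (bw.nrd0).
kde <- density(values)

# The estimate is a smooth curve sampled at 512 points by default,
# and its area is approximately 1.
area <- sum(kde$y) * diff(kde$x[1:2])

print(kde$bw)
print(area)
```

A smaller bandwidth follows the data more closely and gives a bumpier curve, while a larger one smooths the two peaks together.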

The plot result can be shown as follows:

R: Statistic Properties: Distribution: KDE Plot


Rug Plot

We can also simply show the rug plot using geom_rug.

plot <- ggplot(
    series_longer,
    aes(x = value, color = Category)) +
  geom_rug(alpha = 0.5, sides = "b") +
  ...

R: Statistic Properties: Rug Plot

The plot result can be shown as follows:

R: Statistic Properties: Distribution: Rug Plot


Histogram

This is probably the most common chart for beginners, but geom_histogram offers more than the basic histogram.

plot <- ggplot(
    series_longer,
    aes(x = value, fill = Category)) +
  geom_histogram(
    binwidth = 50, linewidth = 0.2,
    alpha = 0.7, color = "black") +
  ...

R: Statistic Properties: Histogram

The plot result can be shown as follows:

R: Statistic Properties: Distribution: Histogram



Statistic Properties: Marginal

We can go a step further and show the single-axis analysis right on its own axis, using ggMarginal from the ggExtra library.

Density Example

For example, we can add a marginal density plot. Let’s start with the usual plot.

R: Statistic Properties: Marginal: Density
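The code for the base plot is not shown here, so the following is only a sketch of what such a plot p might look like before the margins are added. The synthetic data and column names are assumptions.

```r
library(ggplot2)
library(ggExtra)

# Synthetic data standing in for the article's series (assumed columns).
set.seed(11)
demo <- data.frame(xs = 0:29)
demo$ys <- 5 + 2 * demo$xs + rnorm(30, sd = 6)

# ggMarginal() expects a scatter plot, from which it reads
# the x and y variables for the margins.
p <- ggplot(demo, aes(x = xs, y = ys)) +
  geom_point(size = 0.5) +
  theme_minimal()

# Add density curves on the top and right margins.
p_with_margins <- ggMarginal(p, type = "density")
```

The result is no longer a regular ggplot object, so further layers should be added to p before calling ggMarginal.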

Then add marginal density plot.

p_with_margins <- ggMarginal(
  p, type = "density", linewidth = 0.2)

R: Statistic Properties: Marginal: Density

The plot result can be shown as follows:

R: Statistic Properties: Marginal:


Histogram Example

We can add a different marginal plot, such as a histogram.

p_with_margins <- ggMarginal(
  p, type = "histogram",
  color = "black", fill = alpha("#FFD700", 0.1),
  linewidth = 0.1)

R: Statistic Properties: Marginal: Histogram

The plot result can be shown as follows:

R: Statistic Properties: Marginal:



What’s the Next Chapter 🤔?

We can visualize statistical properties in a practical way.

Besides Python and R for statistical analysis, we can have a peek at Julia as a future programming language, and also TypeScript and Go, so you can integrate with your application seamlessly.

Consider continuing your exploration with [ Trend - Language - Julia - Part One ].


Conclusion

It is fun, right?

What do you think?

Farewell. We shall meet again.