### Preface

Goal: Explore visualization in the R programming language with ggplot2. Provide a collection of example plots at your fingertips.

Let’s continue our previous `ggplot2` journey.
It is easy once we embrace the grammar.

### Distribution

We can start with the normal distribution.

The `dnorm` function can be used to
calculate the corresponding y-values
for the standard normal distribution:

`y <- dnorm(x)`
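As a quick sanity check of what `dnorm` returns, the sketch below compares it with the textbook density formula exp(-x^2 / 2) / sqrt(2 * pi). The helper name `normal_pdf` and the sample points are made up for this illustration.

```r
# Hand-written standard normal density (hypothetical helper, for comparison only)
normal_pdf <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

# A few arbitrary points to compare
x_check <- c(-2, 0, 1.5)
max_diff <- max(abs(dnorm(x_check) - normal_pdf(x_check)))

# The difference should be zero up to floating point rounding
print(max_diff)
```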

#### Normal Distribution

geom_line

We can start by loading the required libraries.
Then generate data points for the x-axis,
and use the `dnorm` function to
calculate the corresponding y-values
for a standard normal distribution,
so we can create a data frame for plotting.

```
library(ggplot2)
x <- seq(-5, 5, length.out = 1000)
y <- dnorm(x)
df <- data.frame(x = x, y = y)
```

This way we can plot the normal distribution using `geom_line`.
Then adjust the decoration such as grid, labels and title.
And finally save the plot as PNG.

```
# Draw the density curve
plot <- ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "black")

# Decoration: minimal theme, no grid lines, labels and title
plot <- plot +
  theme_minimal() +
  theme(
    text = element_text(size = 4),
    panel.grid = element_blank()) +
  labs(
    x = "x", y = "Density",
    title = "Standard Normal Distribution with Quantiles")

# Save the plot as PNG
ggsave("63-normal.png", plot,
  width = 800, height = 400, units = "px")
```

#### Normal Distribution with Quantiles

geom_area

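On top of the plot above, we can add quantiles.

First we have to calculate the quantiles, based on the defined percentile marks.
Note that `quantile(x, ...)` here returns the quantiles of the evenly spaced `x` grid, not of the normal distribution itself.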

```
percentiles <- c(25, 50, 75, 100)
quantiles <- quantile(x, probs = percentiles / 100)
```
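If the theoretical quantiles of the standard normal distribution are needed instead, base R's `qnorm` is one option. The following is only a sketch of that alternative, not what the plot in this article uses, and the variable names are illustrative.

```r
# Theoretical quantiles of the standard normal distribution.
# The 100th percentile is left out because qnorm(1) is Inf.
probs <- c(0.25, 0.50, 0.75)
normal_quantiles <- qnorm(probs)
print(round(normal_quantiles, 3))

# pnorm is the inverse of qnorm, so this recovers the probabilities
recovered <- pnorm(normal_quantiles)
print(recovered)
```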

Then add the shaded regions corresponding to the percentiles
to the plot grammar. This can be done by using `geom_area`.

```
# Add one shaded area per quantile, from the left edge up to that quantile
for (i in seq_along(quantiles)) {
  plot <- plot + geom_area(
    data = subset(df, x <= quantiles[i]),
    aes(x = x, y = y),
    fill = i, alpha = 0.3)
}
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Kurtosis

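With the `dnorm` function,
we can illustrate kurtosis and skewness.

Let’s make examples of distributions with different levels of kurtosis.
Strictly speaking, every normal distribution has an excess kurtosis of 0;
changing `sd` only makes the curve flatter or more peaked,
which is the visual effect shown here.

- Standard normal distribution (Kurtosis = 0)
- Lower kurtosis
- Higher kurtosis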

```
y_standard <- dnorm(x)
df_standard <- data.frame(x = x, y = y_standard)
y_kurtosis_1 <- dnorm(x, mean = 1, sd = 1)
y_kurtosis_2 <- dnorm(x, mean = 1, sd = 0.5)
y_kurtosis_3 <- dnorm(x, mean = 1, sd = 2)
df_kurtosis_1 <- data.frame(x = x, y = y_kurtosis_1)
df_kurtosis_2 <- data.frame(x = x, y = y_kurtosis_2)
df_kurtosis_3 <- data.frame(x = x, y = y_kurtosis_3)
```
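A numerical check can make the effect of `sd` concrete. The sketch below approximates the excess kurtosis of a density sampled on a grid with a simple Riemann sum. The helper `excess_kurtosis` and the wider `grid` are names introduced here for illustration only. For any normal curve the result should be close to 0, whatever the `sd`.

```r
# Approximate excess kurtosis of a density given on an evenly spaced grid
# (hypothetical helper, simple Riemann sum)
excess_kurtosis <- function(x, y) {
  dx <- x[2] - x[1]
  w <- y * dx / sum(y * dx)        # normalize the weights to sum to 1
  mu <- sum(w * x)                 # mean
  m2 <- sum(w * (x - mu)^2)        # second central moment (variance)
  m4 <- sum(w * (x - mu)^4)        # fourth central moment
  m4 / m2^2 - 3
}

# A wide grid, so the tails of the widest curve are not cut off
grid <- seq(-30, 30, length.out = 20001)

k_narrow <- excess_kurtosis(grid, dnorm(grid, mean = 1, sd = 0.5))
k_wide <- excess_kurtosis(grid, dnorm(grid, mean = 1, sd = 2))

# Both values should be numerically close to zero
print(c(narrow = k_narrow, wide = k_wide))
```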

Then add `geom_line`
for each of the different levels of kurtosis to the plot grammar.

```
plot <- ggplot() +
  # Solid black line for the standard normal
  geom_line(data = df_standard, color = "black",
    aes(x = x, y = y), linewidth = 0.2) +
  # Dashed lines for the three variations
  geom_line(data = df_kurtosis_1,
    aes(x = x, y = y), color = "red",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_kurtosis_2,
    aes(x = x, y = y), color = "green",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_kurtosis_3,
    aes(x = x, y = y), color = "blue",
    linetype = "dashed", linewidth = 0.2) +
  labs(x = "x", y = "Density",
    title = "Normal Distribution with Different Kurtosis") +
  # Takes effect only when linetype is mapped inside aes()
  scale_linetype_manual(
    values = c("solid", "dashed", "dashed", "dashed"),
    labels = c(
      "Standard Normal", "Standard Kurtosis = 0",
      "Lower Kurtosis", "Higher Kurtosis")) +
  theme_minimal() +
  theme(
    text = element_text(size = 4))
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Skewness

The same approach can be applied to skewness.

Let’s make examples of distributions with different skewness parameters.

- Negative skewness
- Moderate positive skewness
- High positive skewness

```
y_standard <- dnorm(x)
df_standard <- data.frame(x = x, y = y_standard)
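# Skew-normal density: 2 * dnorm(x) * pnorm(alpha * x),
# with shape parameter alpha = -4, 2 and 6 to match the plot labels
y_skewed_1 <- 2 * dnorm(x) * pnorm(-4 * x)
y_skewed_2 <- 2 * dnorm(x) * pnorm(2 * x)
y_skewed_3 <- 2 * dnorm(x) * pnorm(6 * x)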
df_skewed_1 <- data.frame(x = x, y = y_skewed_1)
df_skewed_2 <- data.frame(x = x, y = y_skewed_2)
df_skewed_3 <- data.frame(x = x, y = y_skewed_3)
```
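The skew-normal density has the form 2 * dnorm(x) * pnorm(alpha * x), where the shape parameter alpha controls the direction and strength of the skew. As a minimal sketch, the hypothetical helper `dskew` below wraps that formula and checks numerically that each curve still encloses an area of about 1. The grid and the alpha values are chosen for illustration.

```r
# Skew-normal density with shape parameter alpha (hypothetical helper name).
# alpha = 0 gives back the standard normal density.
dskew <- function(x, alpha) 2 * dnorm(x) * pnorm(alpha * x)

# Evenly spaced grid used for a simple Riemann sum
grid <- seq(-10, 10, length.out = 20001)
dx <- grid[2] - grid[1]

# Each curve should integrate to roughly 1
alphas <- c(-4, 0, 2, 6)
areas <- sapply(alphas, function(a) sum(dskew(grid, a)) * dx)
print(round(areas, 4))
```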

Then again add `geom_line`
for each of the different skewed distributions to the plot grammar.

```
plot <- ggplot() +
  # Solid black line for the standard normal
  geom_line(data = df_standard, color = "black",
    aes(x = x, y = y), linewidth = 0.2) +
  # Dashed lines for the three skewed curves
  geom_line(data = df_skewed_1,
    aes(x = x, y = y), color = "red",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_skewed_2,
    aes(x = x, y = y), color = "green",
    linetype = "dashed", linewidth = 0.2) +
  geom_line(data = df_skewed_3,
    aes(x = x, y = y), color = "blue",
    linetype = "dashed", linewidth = 0.2) +
  labs(x = "x", y = "Density",
    title = "Normal Distribution with Different Skewness") +
  # Takes effect only when linetype is mapped inside aes()
  scale_linetype_manual(
    values = c("solid", "dashed", "dashed", "dashed"),
    labels = c(
      "Standard Normal",
      "Negative Skewness = -4",
      "Moderate Positive Skewness = 2",
      "High Positive Skewness = 6")) +
  theme_minimal() +
  theme(
    text = element_text(size = 4))
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

### Trend: Multiple

From the perspective of visualization, we can display different series in one plot, or in separate plots using a grid.

#### Geom Smooth

Instead of calculating the linear model manually,
we can utilize `geom_smooth` to plot the fitted curve.
`geom_smooth` also has a standard error feature.

Let’s plot an example of this case.
We can start the plot area with `geom_point`,
then use `geom_smooth` to add a regression line
for each of `ys1`, `ys2` and `ys3`.

```
plot <- ggplot(data, aes(x = xs)) +
  # Scatter points for the first series
  geom_point(
    aes(x = xs, y = ys1),
    size = 0.5, color = "firebrick") +
  # Linear regression line with standard error band
  geom_smooth(
    aes(x = xs, y = ys1), method = "lm",
    se = TRUE, color = "firebrick",
    linewidth = 0.2) +
  ...
```
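For comparison, here is roughly what `geom_smooth(method = "lm", se = TRUE)` computes, written out with base R. The data frame below is synthetic and only stands in for the `data` used in this article. The column names `xs` and `ys1` follow the plot code, while `data_demo`, `fit` and `band` are names made up for this sketch.

```r
# Synthetic stand-in data (assumption: the real `data` has columns xs and ys1)
set.seed(42)
data_demo <- data.frame(xs = 0:12)
data_demo$ys1 <- 3 + 2 * data_demo$xs + rnorm(13, sd = 1)

# Fit the straight line that a linear smoother would draw
fit <- lm(ys1 ~ xs, data = data_demo)
print(coef(fit))

# Fitted values with standard errors; the ribbon is built from these
pred <- predict(fit, newdata = data_demo, se.fit = TRUE)
t_crit <- qt(0.975, pred$df)   # 95% confidence level
band <- data.frame(
  xs    = data_demo$xs,
  fit   = pred$fit,
  lower = pred$fit - t_crit * pred$se.fit,
  upper = pred$fit + t_crit * pred$se.fit
)
print(head(band, 3))
```

Doing this by hand is rarely needed for plotting, but it helps when the coefficients or the interval bounds are wanted as numbers.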

We can also add a nice `solarized` theme.
To obtain this you need the `ggthemes` library.

```
  # Continuation of the plot chain; requires library(ggthemes)
  labs(x = "x", y = "y",
    title = "Scatter Plot with Regression Lines") +
  theme_solarized() +
  scale_color_solarized() +
  theme(
    text = element_text(size = 4))
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Grid Extra

In some cases it is better to separate the results,
for example if you want a different y-axis scale for each series.
To obtain this you need the `gridExtra` library.

Let’s arrange the plots horizontally using `gridExtra`.

```
library(gridExtra)

# Place the three plots side by side in one row
grid_plot <- grid.arrange(
  plot_y1, plot_y2, plot_y3, ncol = 3)
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

### Statistic Properties: One Axis Plot

Three series in one axis plot

As you can see from the previous statistical properties,
we can analyze the data for each series.
For example, we can consider just the y-series,
and obtain the `mean`, `median`, `mode`,
and also the `minimum`, `maximum`, `range`, and `quantiles`.
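The properties listed above can be computed with base R, as in the following minimal sketch. The values in `ys` are illustrative only, and `stat_mode` is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
# Illustrative values only; replace with one of the y-series
ys <- c(2, 4, 4, 7, 9, 10)

# Most frequent value (hypothetical helper; ties return the first one found)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

summary_stats <- list(
  mean      = mean(ys),
  median    = median(ys),
  mode      = stat_mode(ys),
  minimum   = min(ys),
  maximum   = max(ys),
  range     = max(ys) - min(ys),
  quantiles = quantile(ys, probs = c(0.25, 0.5, 0.75))
)
str(summary_stats)
```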

#### Long Format

Melt

To visualize multiple y-series,
we need to melt the series to long format
by piping it to the `gather` function.
The `gather` function is available in the `tidyr` library.

```
library(tidyr)

# Melt every column except xs into key/value pairs
series_longer <- series %>%
  gather(key = "y", value = "value", -xs)
```

You can check the result with `cat` or `print` on the melted series.
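In current versions of `tidyr`, `gather` is marked as superseded and `pivot_longer` is the suggested replacement. The sketch below shows the equivalent call on a tiny made-up data frame. `series_demo` only stands in for the real `series`, and the row order of the result may differ from what `gather` produces.

```r
library(tidyr)

# Tiny made-up data frame standing in for `series`
series_demo <- data.frame(
  xs  = c(0, 1),
  ys1 = c(5, 9),
  ys2 = c(3, 4),
  ys3 = c(8, 2)
)

# Same long layout as gather(key = "y", value = "value", -xs)
series_demo_longer <- pivot_longer(
  series_demo,
  cols = -xs,
  names_to = "y",
  values_to = "value"
)
print(series_demo_longer)
```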

#### Box Plot

The most common way to visualize this is the box plot.
We can utilize `geom_boxplot` to get the plot.

```
plot <- ggplot(
    series_longer,
    aes(x = y, y = value, fill = y)) +
  geom_boxplot(color = "black", linewidth = 0.2) +
  ...
```

Let’s use custom colors for this example.

`scale_fill_manual(values = soft_colors) +`

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Violin Plot

A richer way to visualize the distribution is the violin plot.
We can utilize `geom_violin` to get the plot.

```
plot <- ggplot(
    series_longer,
    aes(x = y, y = value, fill = y)) +
  geom_violin(color = "black", linewidth = 0.2) +
  ...
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Swarm Plot

This leaves us with other options such as the swarm plot and the strip plot.
We can get a swarm-like plot using `jitter` inside `geom_point`.

```
plot <- ggplot(
    series_longer,
    aes(x = y, y = value, color = y)) +
  # Spread the points horizontally only, so the values stay exact
  geom_point(
    position = position_jitterdodge(
      jitter.width = 0.3, jitter.height = 0),
    size = 0.5) +
  ...
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Strip Plot

We can get a strip plot using `geom_jitter`.

```
plot <- ggplot(
    series_longer,
    aes(x = y, y = value, color = y)) +
  geom_jitter(
    width = 0.3, height = 0, size = 0.5) +
  ...
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

### Statistic Properties: Distribution

Just like the previous four plots, we can analyse the y-axis, but this time by the frequency of each series.

#### KDE Plot

Kernel Density Estimation

KDE shows the distribution of the frequency well.
This complex task can be done easily with `geom_density`.

```
plot <- ggplot(
    series_longer,
    # `y` is the key column created by gather() above
    aes(x = value, fill = y)) +
  geom_density(alpha = 0.7, color = NA) +
  ...
```
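To see the estimate itself rather than its plot, base R's `density` function can be called directly on the values. The following sketch uses synthetic numbers in place of a real y-series and checks that the estimated curve encloses an area close to 1. The names `values` and `kde` are illustrative.

```r
# Synthetic values standing in for one y-series
set.seed(1)
values <- rnorm(200, mean = 100, sd = 15)

# Gaussian kernel density estimate, evaluated on 512 grid points by default
kde <- density(values)

# Riemann sum over the evaluation grid; should be close to 1
dx <- kde$x[2] - kde$x[1]
area <- sum(kde$y) * dx
print(round(area, 3))
```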

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Rug Plot

We can also simply show the rug plot using `geom_rug`.

```
plot <- ggplot(
    series_longer,
    # `y` is the key column created by gather() above
    aes(x = value, fill = y)) +
  geom_rug(alpha = 0.5, sides = "b") +
  ...
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Histogram

This looks like the most common chart for beginners.
But `geom_histogram` is more than the basic histogram.

```
plot <- ggplot(
    series_longer,
    # `y` is the key column created by gather() above
    aes(x = value, fill = y)) +
  geom_histogram(
    binwidth = 50, linewidth = 0.2,
    alpha = 0.7, color = "black") +
  ...
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

### Statistic Properties: Marginal

We can go a step further and analyse each single axis,
right on its own axis, using `ggMarginal` from the `ggExtra` library.

#### Density Example

For example, we can add a marginal density plot. Let’s start with the usual plot.

Then add the marginal density plot.

```
p_with_margins <- ggMarginal(
  p, type = "density", linewidth = 0.2)
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

#### Histogram Example

We can add a different marginal plot, such as a histogram.

```
p_with_margins <- ggMarginal(
  p, type = "histogram",
  color = "black", fill = alpha("#FFD700", 0.1),
  linewidth = 0.1)
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` notebook at the following link:

### What’s the Next Chapter 🤔?

We can visualize statistical properties in a practical way.

Besides Python and R for statistical analysis, we can have a peek at Julia as a future programming language, and also TypeScript and Go, so you can integrate with your application seamlessly.

Consider continuing your exploration with [ Trend - Language - Julia - Part One ].

### Conclusion

It is fun, right?

What do you think?

Farewell. We shall meet again.