### Preface

Goal: Explore visualization in the R programming language with ggplot2, fitting the data with linear models.

The thing about `R` is that I’m more into the data aspect than the coding aspect. This felt weird at first for me as a coder, so I avoided `R` at first, but then I came to love how `R` works.

Just like `python`’s `seaborn`, the `R` programming language is equipped with the powerful `ggplot2`. Just like `python`’s `polyfit`, there is the powerful `lm`. And just like `python`, `R` is also considered easy to learn.

Sure, I have a lot of questions about `R`. Instead of asking the `R` community directly, I chose to explore `R` first and build a bunch of working examples. That way I can answer questions in the `R` community whenever a member needs a working example.

### Preparation

Of course you need `R` installed on your system. There is no need for `RStudio`, but there are a few things to consider.

#### Library

The scripts provided here start from the very basics, and you will need additional libraries from time to time. You can install the packages from the `R` terminal.

```
install.packages("readr")
install.packages("ggplot2")
install.packages("ggthemes")
```

You might prefer `tidyverse` for convenience, but I’d rather pick one library at a time, to gain more understanding.

#### Jupyter Lab

You also need to activate the kernel for `R`.

`IRkernel::installspec()`

This is optional.

#### Data Series Samples

I provide a minimal case for visualization. With only two example datasets, we can make many kinds of visualization. This way, you don’t need to adapt to new data for each visualization. You can also reuse the `R` code, minimizing rethinking at each step.

The first one uses multiple series, suitable for experimenting with melting a dataframe.

```
xs, ys1, ys2, ys3
0, 5, 5, 5
1, 9, 12, 14
2, 13, 25, 41
3, 17, 44, 98
4, 21, 69, 197
5, 25, 100, 350
6, 29, 137, 569
7, 33, 180, 866
8, 37, 229, 1253
9, 41, 284, 1742
10, 45, 345, 2345
11, 49, 412, 3074
12, 53, 485, 3941
```

[R: ggplot2: Statistical Properties: CSV Source][017-vim-series]
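The melting mentioned above can be sketched with `tidyr::pivot_longer` (my choice of library; `reshape2::melt` would work too), with the series inlined as a data frame:

```r
library(tidyr)

# The multi-series sample above, as a data frame
data <- data.frame(
  xs  = 0:12,
  ys1 = c(5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53),
  ys2 = c(5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485),
  ys3 = c(5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941))

# Melt from wide to long: one row per (xs, series, value)
long <- pivot_longer(data,
  cols = c(ys1, ys2, ys3),
  names_to = "series",
  values_to = "value")

head(long)
```

The long format is what `ggplot2` expects when mapping the series name to a color or facet.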

And here is the simple one, for statistical properties such as least squares.

```
x,y
0,5
1,12
2,25
3,44
4,69
5,100
6,137
7,180
8,229
9,284
10,345
11,412
12,485
```

I use the word samples to distinguish them from a population, since the calculation results would differ.
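For instance, `R`’s built-in `var()` and `sd()` already use the sample formula (dividing by n − 1); a population variance has to be rescaled manually:

```r
y <- c(5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485)
n <- length(y)

# var() uses the sample formula: divide by n - 1
var_sample     <- var(y)
# The population variance divides by n instead
var_population <- var(y) * (n - 1) / n

cat("sample variance    :", var_sample, "\n")
cat("population variance:", var_population, "\n")
```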

### Trend: LM Model

Let’s get started with the Linear Model.

#### Vector

The array type in `R` is called a vector.

```
# Given data
x_values <- c(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
y_values <- c(
5, 14, 41, 98, 197, 350, 569, 866,
1253, 1742, 2345, 3074, 3941)
```
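Since the coefficients later come out as 2, 3, 4, 5, the `y_values` above appear to follow y = 2x³ + 3x² + 4x + 5. Vector arithmetic in `R` is element-wise, so the whole series can be rebuilt in one expression (my own sanity check, not part of the original script):

```r
x_values <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

# Element-wise arithmetic over the whole vector at once
y_check <- 2 * x_values^3 + 3 * x_values^2 + 4 * x_values + 5
print(y_check)  # 5 14 41 98 197 350 569 866 1253 1742 2345 3074 3941
```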

Let’s say we have our cubic regression as:

`y = ax³ + bx² + cx + d`

Let’s solve the linear model using `lm()`. First we need to define the order of the curve fitting, then perform the cubic regression using `lm()`. With the `lm_model` object we can get the coefficients, but for printing we need to reverse their order to match the equation above. At last, we can print the coefficients with `cat`.

```
order <- 3
lm_model <- lm(y_values ~
poly(x_values, order, raw = TRUE))
coefficients <- coef(lm_model)
coefficients <- coefficients[
length(coefficients):1]
cat("Coefficients (a, b, c, d):\n\t",
coefficients, "\n")
```

This should give the result below:

```
❯ Rscript 01-lm-vector.r
Coefficients (a, b, c, d):
2 3 4 5
```

It is so predictable, right?
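As a further sanity check (my own addition), `predict()` should reproduce the original data points almost exactly, since the series was generated from a cubic:

```r
x_values <- 0:12
y_values <- c(5, 14, 41, 98, 197, 350, 569, 866,
              1253, 1742, 2345, 3074, 3941)
lm_model <- lm(y_values ~ poly(x_values, 3, raw = TRUE))

# The fit should land right on the data points
predicted <- predict(lm_model,
  newdata = data.frame(x_values = c(0, 1, 2)))
cat(round(predicted), "\n")  # 5 14 41
```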

You can obtain the interactive `JupyterLab` in the following link:

#### Reading from CSV

Let’s continue, this time reading from a CSV file instead of a hardcoded vector. We can utilize the built-in `read.csv` function to read data from the CSV file.

We need to extract x values and y values from the data frame.

```
data <- read.csv("series.csv")
x_values <- data$xs
y_values <- data$ys3
```

The result is exactly the same as the previous one.

You can obtain the interactive `JupyterLab` in the following link:

#### Using Readr

For a more complex case, we can utilize the `readr` library.

First we need to load the required `readr` library, then read the data from the CSV file into a dataframe. Then create variable shortcuts by extracting the x values and y values.

```
library(readr)
data <- read_csv(
"series.csv",
show_col_types = FALSE)
column_spec <- spec(data)
x_values <- data$xs
y_values <- data$ys3
```

You can retrieve the column specification, and print it if you need to inspect the column types.
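For illustration, here is a minimal self-contained sketch (the inline `I()` input is my assumption, requiring readr 2.0 or newer; the article reads from `series.csv` instead):

```r
library(readr)

# Inline CSV for illustration; I() treats the string as literal data
data <- read_csv(I("xs,ys3\n0,5\n1,14\n2,41"),
  show_col_types = FALSE)

column_spec <- spec(data)
print(column_spec)  # shows the guessed type of each column
```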

You can obtain the interactive `JupyterLab` in the following link:

#### Different Order of LM

We can repeat the code above for each different order, or we can make it simpler.

We can write a generic function so the process is not repetitive.

This function performs linear regression using `lm()`. It also defines named vectors to map order numbers to curve types, gets the coefficients, reverses their order to match the equation above, and finally prints the result.

```
calc_coeff <- function(x_values, y_values, order) {
lm_model <- lm(y_values ~
poly(x_values, order, raw = TRUE))
coeff_text <- c(
"(a, b)" = 1, "(a, b, c)" = 2, "(a, b, c, d)" = 3)
order_text <- c(
"Linear" = 1, "Quadratic" = 2, "Cubic" = 3)
cat(paste("Using lm_model :",
names(order_text)[order], "\n"))
coefficients <- coef(lm_model)
coefficients <- coefficients[
length(coefficients):1]
cat("Coefficients ",
names(coeff_text)[order], ":\n\t",
coefficients, "\n")
```

This way we can calculate the coefficients for different orders and different series.

```
library(readr)
data <- read_csv(
"series.csv",
show_col_types = FALSE)
calc_coeff(data$xs, data$ys1, 1)
calc_coeff(data$xs, data$ys2, 2)
calc_coeff(data$xs, data$ys3, 3)
```

With the result as below:

```
❯ Rscript 04-lm-merge.r
Using lm_model : Linear
Coefficients (a, b) :
4 5
Using lm_model : Quadratic
Coefficients (a, b, c) :
3 4 5
Using lm_model : Cubic
Coefficients (a, b, c, d) :
2 3 4 5
```

You can obtain the interactive `JupyterLab` in the following link:

### Trend: Built-in Plot

`R` provides a built-in plot with no additional library. It is rather limited, but enough to get started with plotting.

#### Default Output

The default output is `Rplot.pdf`, but we can save to `png` instead, for example:

```
# Open PNG graphics device
png("11-lm-line.png", width = 800, height = 400)
```
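Note that the file is only written once the device is closed with `dev.off()`. A minimal end-to-end sketch (the example curve here is mine, not the article’s data):

```r
# Open the PNG graphics device, draw, then close it to flush the file
png("11-lm-line.png", width = 800, height = 400)
plot(0:12, (0:12)^2, pch = 16, col = "blue")
dev.off()
```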

#### Linear Equation

We can start with plotting the data points.

```
plot(
x_values, y_values,
pch = 16, col = "blue",
xlab = "x", ylab = "y",
main = "Straight line fitting")
```

And continue with lines from precalculated plot values. The `y` values come from the regression line previously fitted by `lm()` into `lm_model`.

```
x_plot <- seq(
min(x_values), max(x_values),
length.out = 100)
y_plot <- predict(
lm_model,
newdata = data.frame(x_values = x_plot))
lines(x_plot, y_plot, col = "red")
```

We can also add a decorative legend, to communicate the visual result.

```
legend("topright",
legend = c("Data points", "Linear Equation"),
col = c("blue", "red"),
pch = c(16, NA), lty = c(NA, 1))
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

#### Straight Line

We can also utilize `abline` to add the linear regression line to the plot, so we don’t have to generate the `y_plot` values manually.

`abline(lm_model, col = "red")`

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

#### Quadratic Curve

From this, we can repeat the process for quadratic curve fitting. All we need to do is use a different order on the corresponding data, then change a few minor things, such as the title and legend text. And that’s all.

`order <- 2`
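One caveat: `abline` only draws straight lines, so for order 2 and above the curve still has to come from `predict()` and `lines()`, as in the earlier snippet. A sketch in my own arrangement, using the simple series:

```r
x_values <- 0:12
y_values <- c(5, 12, 25, 44, 69, 100, 137, 180,
              229, 284, 345, 412, 485)

order <- 2
lm_model <- lm(y_values ~ poly(x_values, order, raw = TRUE))

png("quadratic.png", width = 800, height = 400)
plot(x_values, y_values, pch = 16, col = "blue",
  xlab = "x", ylab = "y", main = "Quadratic curve fitting")
# Draw the fitted curve from predicted values
x_plot <- seq(min(x_values), max(x_values), length.out = 100)
lines(x_plot,
  predict(lm_model, newdata = data.frame(x_values = x_plot)),
  col = "red")
dev.off()
```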

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

#### Cubic Curve

The same applies to the cubic curve: change the given data, the order, and a few decorative details. That simple.

`order <- 3`

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

### Trend: ggplot2

For complex cases, we require the `ggplot2` library. The thing is, we need to understand that this kind of plotting has its own grammar.

#### Linear Equation

Let’s try a straight line.

To keep the plot structure simple, let’s put the model outside. We need to generate values for the regression line, then apply the result to create a data frame for `ggplot2`.

```
x_plot <- seq(
min(x_values), max(x_values),
length.out = 100)
y_plot <- predict(
lm_model,
newdata = data.frame(x_values = x_plot))
data <- data.frame(x = x_values, y = y_values)
```

Now we are ready for the view: plot using `ggplot2`. As you can see, there are a lot of plus signs here. It is like one object stacked onto another, all in one `ggplot2` figure.

```
plot <- ggplot(data, aes(x = x, y = y)) +
geom_point(aes(color="Data Points"), size = 0.5) +
geom_line(
data = data.frame(x = x_plot, y = y_plot),
aes(x, y, color="Linear Equation"),
linewidth = 0.2) +
labs(
x = "x", y = "y",
title = "Straight line fitting") +
theme_minimal() +
theme(legend.position = "right",
legend.text = element_text(size = 2),
text = element_text(size = 4)) +
scale_color_manual(
name = "Plot",
breaks = c(
"Data Points",
"Linear Equation"),
values = c(
"Data Points"="red",
"Linear Equation"="black")) +
guides(
color = guide_legend(
override.aes = list(
shape = c(16, NA), linetype = c(0, 1)
)))
```
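As an aside, and not the approach used in this article, `ggplot2` can also fit and draw the regression itself with `geom_smooth(method = "lm")`, skipping the manual `predict()` step:

```r
library(ggplot2)

data <- data.frame(
  x = 0:12,
  y = c(5, 12, 25, 44, 69, 100, 137, 180,
        229, 284, 345, 412, 485))

# geom_smooth fits lm internally; the formula refers to the plot's x and y
plot <- ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "red", size = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x,
    se = FALSE, color = "black", linewidth = 0.2) +
  theme_minimal()
```

Building the line from a precomputed model, as above, gives more control over the legend and styling, which is why the manual route is still worth knowing.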

Do not forget to save to `png` for convenience. I’m using a specific size (width × height), so I can use the result directly in my blog article.

```
# Save plot as PNG
ggsave("14-lm-gg-line.png",
plot, width = 800, height = 400, units = "px")
```

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

#### Quadratic Curve

By changing the given data, the order, and some decorative details, we can apply the same `ggplot2` grammar: stack the smaller plot objects and sum them all into the `plot` variable. Finally, save the `png` based on this generated `plot` variable.

`order <- 2`

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

#### Cubic Curve

The same applies to cubic. You can see the details in the source code.

`order <- 3`

The plot result can be shown as follows:

You can obtain the interactive `JupyterLab` in the following link:

Easy peasy right?

### What’s the Next Chapter 🤔?

Let’s continue our previous `R` journey with building classes and statistical properties.

Consider continuing your exploration with [ Trend - Language - R - Part Two ].