Preface
Goal: Explore statistical plot visualization in Julia, using a linear model to describe the data.
Julia is a very interesting language. Unfortunately, I don’t have enough Julia knowledge to review it. I haven’t learned the fundamentals yet, so I have no right to talk about it. However, I can still plot a thing or two.
Preparation
You just need Julia installed on your system.
Library
The scripts provided here start from the very basics, and you will need to add more libraries from time to time.
You can install the packages from the Julia REPL: press ] at the julia> prompt to enter package mode, then add each one.
add Polynomials
add Plots
add CSV
add Printf
add DataFrames
add GLM
add Distributions
add StatsPlots
add ColorSchemes
add ColorTypes
add Gadfly
add IJulia
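add Cairo
add Fontconfig
Gadfly needs Cairo and Fontconfig to export PNG, PDF and PS images. Once installed, load them in a script alongside Gadfly:
import Cairo, Fontconfig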
Data Series Samples
As usual, I provide a minimal case for visualization, with only two example datasets.
The first one uses multiple series, which is suitable for experimenting with melting a dataframe.
The second one is simpler, for statistical properties such as least squares.
I use the word samples to distinguish them from a population, since some calculations (variance, for example) give different results for a sample than for a population.
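The sample files themselves are not reproduced here. Judging from the dataframe output and the coefficients printed later, the multiple-series file series.csv appears to hold xs from 0 to 12 with ys1 = 4x + 5, ys2 = 3x² + 4x + 5 and ys3 = 2x³ + 3x² + 4x + 5. Assuming that, here is a package-free sketch that regenerates it so the scripts below can be run. write_series_csv is a hypothetical helper, and the original header may contain spaces after the commas, which the scripts strip anyway.

```julia
# Hypothetical helper: regenerate series.csv from the polynomial formulas
# implied by the outputs in this article (an assumption, not the original file).
function write_series_csv(path::AbstractString = "series.csv")
    open(path, "w") do io
        println(io, "xs,ys1,ys2,ys3")            # header row
        for x in 0:12
            ys1 = 4x + 5                         # linear series
            ys2 = 3x^2 + 4x + 5                  # quadratic series
            ys3 = 2x^3 + 3x^2 + 4x + 5           # cubic series
            println(io, join((x, ys1, ys2, ys3), ","))
        end
    end
    return path
end

write_series_csv()
```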
Polynomials Fit
Let’s start simple: just read the data and interpret the result.
Vector
We can use arrays as xs and ys for the source data, then use Polynomials.fit to get the curve-fitting coefficients. This requires the Polynomials library.
# Given data
x_values = [
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_values = [
5, 9, 13, 17, 21, 25, 29,
33, 37, 41, 45, 49, 53]
Let’s say we have our linear regression as:
y = ax + b
We need to set the curve-fitting order, for example 1 for a straight line. Then we perform linear regression using Polynomials.fit.
From the result we extract the coefficients and reverse them to match the equation above, because coeffs returns them in ascending order of degree (b first, then a).
Finally, we print them rounded to two decimal places.
using Polynomials
order = 1                              # 1 = straight line
pf = fit(x_values, y_values, order)    # least-squares polynomial fit
println("Using Polynomials.fit")
coeffs_r = reverse(coeffs(pf))         # coeffs is ascending: [b, a]
println("Coefficients (a, b):")
coeffs_fmt = [
round(c, digits=2) for c in coeffs_r]
println(coeffs_fmt, "\n")
We can see the result as follows.
❯ julia 01-poly-vector.jl
Using Polynomials.fit
Coefficients (a, b):
[4.0, 5.0]
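Polynomials.fit computes a least-squares fit. To see why the coefficients need reversing, here is a rough package-free equivalent (only the LinearAlgebra standard library) that builds a Vandermonde matrix and solves it with the backslash operator. It illustrates the idea rather than reproducing the Polynomials.jl internals; lsq_coeffs is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical helper: least-squares polynomial fit without Polynomials.jl.
# Returns coefficients in ascending degree order, like coeffs(pf): [b, a].
function lsq_coeffs(xs::AbstractVector, ys::AbstractVector, order::Int)
    # Vandermonde matrix: column k+1 holds x^k for k = 0..order
    V = [float(x)^k for x in xs, k in 0:order]
    return V \ float.(ys)    # backslash on a tall matrix solves least squares
end

x_values = 0:12
y_values = [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]

c = lsq_coeffs(x_values, y_values, 1)
coeffs_fmt = round.(reverse(c), digits=2)    # reorder to (a, b)
println(coeffs_fmt)                          # [4.0, 5.0]
```

Since column k+1 of the matrix holds x^k, the solution comes back constant term first, which is exactly why the script reverses coeffs(pf).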
You can obtain the interactive JupyterLab notebook from the following link:
Dataframe
Instead of arrays, we can read from a CSV file and put the result into a dataframe.
First we read the data from CSV, and sanitize the column names by stripping whitespace.
using CSV, DataFrames
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
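The renaming line chains three broadcasts: each trailing dot applies the function to every element of the vector. Assuming the CSV header has a space after each comma and that space ends up in the column names (which seems to be why the script strips them), the effect looks like this on a plain vector of strings; raw_names is a made-up example.

```julia
# Made-up raw header names with stray leading spaces
raw_names = ["xs", " ys1", " ys2", " ys3"]

# string.(...) makes sure every name is a String,
# strip.(...)  removes leading/trailing whitespace from each name,
# Symbol.(...) turns each cleaned name into a Symbol.
clean = Symbol.(strip.(string.(raw_names)))
println(clean)    # [:xs, :ys1, :ys2, :ys3]
```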
Let’s show what’s in the dataframe.
println(last(df,5))
println()
println(names(df))
We can see the result as follows.
❯ julia 02-dataframe.jl
5×4 DataFrame
 Row │ xs     ys1    ys2    ys3
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     8     37    229   1253
   2 │     9     41    284   1742
   3 │    10     45    345   2345
   4 │    11     49    412   3074
   5 │    12     53    485   3941
["xs", "ys1", "ys2", "ys3"]
You can obtain the interactive JupyterLab notebook from the following link:
Stack
Long Format
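For some kinds of visualization, we need to melt the DataFrame into long format. Here stack keeps xs as the identifier column and gathers every other column into variable and value columns.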
df_long = stack(df, Not(:xs))
show(df_long, allrows=false)
println("\n")
show(names(df_long))
We can see the result as follows.
❯ julia 03-stack.jl
39×3 DataFrame
 Row │ xs     variable  value
     │ Int64  String    Int64
─────┼────────────────────────
   1 │     0  ys1           5
   2 │     1  ys1           9
   3 │     2  ys1          13
   4 │     3  ys1          17
  ⋮  │   ⋮       ⋮        ⋮
  36 │     9  ys3        1742
  37 │    10  ys3        2345
  38 │    11  ys3        3074
  39 │    12  ys3        3941
               31 rows omitted
["xs", "variable", "value"]
We can see that this dataframe has three series: ys1, ys2, and ys3.
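To make the reshape concrete without DataFrames, here is a rough plain-Julia equivalent of what stack(df, Not(:xs)) produces, using a NamedTuple of columns in place of the dataframe. The column values assume the formulas implied by the outputs above (ys1 = 4x + 5 and so on), and to_long is a hypothetical helper, not part of DataFrames.

```julia
# Hypothetical helper: wide-to-long reshape on a NamedTuple of columns.
# Every column except `id` becomes (id value, column name, cell value) rows,
# one column after another, like the stacked dataframe above.
function to_long(wide::NamedTuple, id::Symbol)
    long = Tuple{Int, String, Int}[]
    for col in keys(wide)
        col == id && continue                 # the id column is not stacked
        for (x, v) in zip(wide[id], wide[col])
            push!(long, (x, String(col), v))
        end
    end
    return long
end

xs = collect(0:12)
wide = (xs  = xs,                              # assumed series.csv contents
        ys1 = 4 .* xs .+ 5,
        ys2 = 3 .* xs .^ 2 .+ 4 .* xs .+ 5,
        ys3 = 2 .* xs .^ 3 .+ 3 .* xs .^ 2 .+ 4 .* xs .+ 5)

long = to_long(wide, :xs)
println(length(long))    # 39
println(long[1])         # (0, "ys1", 5)
```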
You can obtain the interactive JupyterLab notebook from the following link:
Curve Fitting
Using this dataframe, we can calculate the polynomial coefficients for each series.
This is how we can define each series. First, we read the data from the CSV file and strip spaces from the column names. Then we extract the columns from the DataFrame.
using CSV, DataFrames, Polynomials, Printf
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
x_values = df.xs
y_values1 = df.ys1
y_values2 = df.ys2
y_values3 = df.ys3
And this is how we can calculate the polynomial coefficients for each series:
- linear regression (order 1) for ys1
- quadratic curve fitting (order 2) for ys2
- cubic curve fitting (order 3) for ys3
pf_1 = fit(x_values, y_values1, 1)
coeffs_r1 = reverse(coeffs(pf_1))
coeffs_fmt_1 = [
round(c, digits=2) for c in coeffs_r1]
println("Coefficients (a, b) for ys1:")
println(coeffs_fmt_1, "\n")
pf_2 = fit(x_values, y_values2, 2)
...
pf_3 = fit(x_values, y_values3, 3)
...
We can see the result as follows.
❯ julia 04-poly-fit.jl
Coefficients (a, b) for ys1:
[4.0, 5.0]
Coefficients (a, b, c) for ys2:
[3.0, 4.0, 5.0]
Coefficients (a, b, c, d) for ys3:
[2.0, 3.0, 4.0, 5.0]
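The elided blocks for ys2 and ys3 presumably follow the same pattern as ys1. One way to avoid repeating that code is to loop over (series, order) pairs. The sketch below does this without packages, swapping Polynomials.fit for a hand-written least-squares solve (lsq_coeffs, hypothetical) and assuming series.csv holds the values implied by the outputs above.

```julia
using LinearAlgebra

# Hypothetical stand-in for Polynomials.fit: ascending coefficients via least squares
lsq_coeffs(xs, ys, order) = [float(x)^k for x in xs, k in 0:order] \ float.(ys)

xs  = collect(0:12)
ys1 = 4 .* xs .+ 5                                   # assumed columns
ys2 = 3 .* xs .^ 2 .+ 4 .* xs .+ 5
ys3 = 2 .* xs .^ 3 .+ 3 .* xs .^ 2 .+ 4 .* xs .+ 5

labels  = Dict(1 => "(a, b)", 2 => "(a, b, c)", 3 => "(a, b, c, d)")
results = Dict{Symbol, Vector{Float64}}()

for (name, ys, order) in ((:ys1, ys1, 1), (:ys2, ys2, 2), (:ys3, ys3, 3))
    cfs = round.(reverse(lsq_coeffs(xs, ys, order)), digits=2)   # highest degree first
    results[name] = cfs
    println("Coefficients $(labels[order]) for $name:")
    println(cfs, "\n")
end
```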
You can obtain the interactive JupyterLab notebook from the following link:
Merge All Series in One Plot
Instead of repeating the code, we can bundle similar code into a function. Here we pass a Symbol as the function argument to name the column.
With this symbol name, we can extract the x and y values, then perform polynomial fitting for any of the three polynomial orders.
To match the equation, we need to reverse the coefficients. We also round the coefficients and use string interpolation to print the result.
using CSV, DataFrames, Polynomials
function calc_coeff(df::DataFrame,
x_col::Symbol, y_col::Symbol, order::Int)
xs = df[!, x_col]
ys = df[!, y_col]
order_text = Dict(1 => "Linear",
2 => "Quadratic", 3 => "Cubic")
coeff_text = Dict(1 => "(a, b)",
2 => "(a, b, c)", 3 => "(a, b, c, d)")
pf = fit(xs, ys, order)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [round(c, digits=2) for c in cfs_r]
println("Curve type for $y_col: ",
order_text[order])
println("Coefficients ",
"$(coeff_text[order]):\n\t$cfs_fmt\n")
end
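One caveat of this design: order_text[order] throws a KeyError for any order other than 1, 2 or 3. The hypothetical variant below uses get with a default value, and builds the coefficient label from the order instead of keeping a second Dict. order_name and coeff_label are illustrative names, not part of the original script.

```julia
# Curve names for the orders we know; anything else gets a generic label
const ORDER_TEXT = Dict(1 => "Linear", 2 => "Quadratic", 3 => "Cubic")

order_name(order::Int) = get(ORDER_TEXT, order, "Order-$order polynomial")

# Build "(a, b, ...)" with one letter per coefficient (order + 1 letters)
coeff_label(order::Int) = "(" * join(('a' + i for i in 0:order), ", ") * ")"

println(order_name(2))    # Quadratic
println(order_name(4))    # Order-4 polynomial
println(coeff_label(3))   # (a, b, c, d)
```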
Let’s call the function for each ys series with its respective order.
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))
println("Using Polynomials.fit\n")
calc_coeff(df, :xs, :ys1, 1)
calc_coeff(df, :xs, :ys2, 2)
calc_coeff(df, :xs, :ys3, 3)
We can see the result as follows.
❯ julia 05-poly-merge.jl
Using Polynomials.fit
Curve type for ys1: Linear
Coefficients (a, b):
[4.0, 5.0]
Curve type for ys2: Quadratic
Coefficients (a, b, c):
[3.0, 4.0, 5.0]
Curve type for ys3: Cubic
Coefficients (a, b, c, d):
[2.0, 3.0, 4.0, 5.0]
You can obtain the interactive JupyterLab notebook from the following link:
Plot
Wouldn’t it be nice if we could visualize the results of the coefficients above? Let’s do it.
Straight Line
Let’s start with a simple straight line.
We need these libraries, which you can install from the Julia REPL.
using CSV, DataFrames, Polynomials, Plots
As usual, we read the data from CSV and sanitize the column names, then extract the columns from the DataFrame.
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))
xs = df.xs
ys = df.ys1
println(xs, "\n", ys, "\n")
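Now we can perform linear regression on ys, and get a new pair of series (xp, yp) for the plot: 100 evenly spaced x values between the minimum and maximum of xs, and the fitted polynomial evaluated at each of them.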
pf = fit(xs, ys, 1)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
round(c, digits=2) for c in cfs_r]
println("Coefficients (a, b) for ys:")
println(cfs_fmt, "\n")
xp = range(minimum(xs), maximum(xs), length=100)
yp = pf.(xp)
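Here pf.(xp) uses dot syntax to evaluate the fitted polynomial at every point of xp. The same idea can be sketched with Base’s evalpoly, which takes the coefficients constant term first; the tuple (5.0, 4.0) assumes the ys1 fit y = 4x + 5 from above.

```julia
cfs_asc = (5.0, 4.0)                        # (b, a): constant term first
xp = range(0, 12, length=100)               # 100 evenly spaced points
yp = evalpoly.(xp, Ref(cfs_asc))            # Ref stops the tuple itself from broadcasting

println(length(yp))                         # 100
println(first(yp), " ", last(yp))           # 5.0 53.0
```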
Then we draw the plot.
The grammar here is interesting. First we create the plot with scatter, then we add further parts on top of it.
Every additional part uses a function whose name ends with an exclamation mark (!), which modifies the current plot instead of creating a new one.
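The trailing ! is a general Julia naming convention: it marks a function that modifies its argument (or, in Plots, the current plot) instead of returning something new. A quick illustration with Base functions; the layers vector is only a rough, hypothetical model of a plot gaining parts, not how Plots.jl stores anything.

```julia
v = [3, 1, 2]
w = sort(v)          # returns a new sorted vector; v is unchanged
println(v, " ", w)   # [3, 1, 2] [1, 2, 3]

sort!(v)             # sorts v in place
println(v)           # [1, 2, 3]

# Like scatter followed by plot!, start with one part and add another in place
layers = ["Data Points"]
push!(layers, "Linear Equation")
println(layers)      # ["Data Points", "Linear Equation"]
```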
scatter(xs, ys,
label="Data Points")
plot!(xp, yp, color=:red,
label="Linear Equation")
xlabel!("X values")
ylabel!("Y values")
title!("Straight line fitting")
Finally, save the plot as a PNG file.
savefig("11-poly-linear.png")
We can see the result as follows.
❯ julia 11-poly-linear.jl
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]
Coefficients (a, b) for ys:
[4.0, 5.0]
The resulting plot is shown below:
You can obtain the interactive JupyterLab notebook from the following link:
Quadratic Curve
We can adapt the code above for the second order: extract the columns from the DataFrame, then perform quadratic regression on ys.
xs = df.xs
ys = df.ys2
println(xs, "\n", ys, "\n")
pf = fit(xs, ys, 2)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
round(c, digits=2) for c in cfs_r]
println("Coefficients (a, b, c) for ys:")
println(cfs_fmt, "\n")
The resulting plot is shown below:
You can obtain the interactive JupyterLab notebook from the following link:
Cubic Curve
We can also adapt the code above for the third order: extract the columns from the DataFrame, then perform cubic regression on ys.
xs = df.xs
ys = df.ys3
println(xs, "\n", ys, "\n")
pf = fit(xs, ys, 3)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
round(c, digits=2) for c in cfs_r]
println("Coefficients (a, b, c, d) for ys:")
println(cfs_fmt, "\n")
The resulting plot is shown below:
You can obtain the interactive JupyterLab notebook from the following link:
Merge
Instead of using three series, we can analyze only one series with three different orders.
We can start with the skeleton of the script.
using CSV, DataFrames, Polynomials, Plots, Printf
function calc_coeff(df::DataFrame,
x_col::Symbol, y_col::Symbol, order::Int)
...
end
function calc_coeffs(df::DataFrame)
...
end
function calc_plot_all(df::DataFrame,
x_col::Symbol, y_col::Symbol)
...
end
We are still using symbol names as parameter arguments. From these we extract the x and y values, then perform polynomial fitting as usual. Remember that we have three kinds of polynomial order.
To match the equation, we need to reverse the coefficients. We also round the coefficients and use string interpolation to print the result.
function calc_coeff(df::DataFrame,
x_col::Symbol, y_col::Symbol, order::Int)
xs = df[!, x_col]
ys = df[!, y_col]
order_text = Dict(1 => "Linear",
2 => "Quadratic", 3 => "Cubic")
coeff_text = Dict(1 => "(a, b)",
2 => "(a, b, c)", 3 => "(a, b, c, d)")
pf = fit(xs, ys, order)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [round(c, digits=2) for c in cfs_r]
println("Curve type for $y_col: ",
order_text[order])
println("Coefficients ",
"$(coeff_text[order]):\n\t$cfs_fmt\n")
end
Besides plotting, we need to display the coefficient results. Here we call the function only for the ys3 column, once for each order.
function calc_coeffs(df::DataFrame)
println("Using Polynomials.fit\n")
calc_coeff(df, :xs, :ys3, 1)
calc_coeff(df, :xs, :ys3, 2)
calc_coeff(df, :xs, :ys3, 3)
end
In this function, we calculate and plot:
- fit the chosen series with all three polynomial orders, and
- plot all three fitted curves over the data points.
function calc_plot_all(df::DataFrame,
x_col::Symbol, y_col::Symbol)
# Extract x and y values
xs = df[!, x_col]
ys = df[!, y_col]
# Evaluate each fitted polynomial on a dense grid
xp = range(minimum(xs), maximum(xs), length=100)
yp1 = fit(xs, ys, 1).(xp)
yp2 = fit(xs, ys, 2).(xp)
yp3 = fit(xs, ys, 3).(xp)
# Plotting
scatter(xs, ys,
label="Data Points")
plot!(xp, yp1, color=:red,
label="Linear Equation")
plot!(xp, yp2, color=:green,
label="Fitted second-order polynomial")
plot!(xp, yp3, color=:blue,
label="Fitted third-order polynomial")
# Decoration
xlabel!("X values")
ylabel!("Y values")
title!("Polynomial Curve Fitting")
# Save the plot as a PNG file
savefig("15-poly-merge.png")
end
Let’s put it all together: print the coefficients and plot all three fitted curves for ys3.
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
calc_coeffs(df)
calc_plot_all(df, :xs, :ys3)
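We can see the result as follows. Note that calc_coeffs fits only ys3, so every heading this script prints reads ys3; the output below matches the earlier three-series run instead, and only the cubic coefficients apply to ys3.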
❯ julia 15-poly-merge.jl
Using Polynomials.fit
Curve type for ys1: Linear
Coefficients (a, b):
[4.0, 5.0]
Curve type for ys2: Quadratic
Coefficients (a, b, c):
[3.0, 4.0, 5.0]
Curve type for ys3: Cubic
Coefficients (a, b, c, d):
[2.0, 3.0, 4.0, 5.0]
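Because calc_coeffs fits ys3 three times, its linear and quadratic results differ from the ys1 and ys2 coefficients shown above. Assuming the ys3 column is 2x³ + 3x² + 4x + 5 for x = 0 to 12, the package-free sketch below recomputes the three fits; fit_desc is a hypothetical helper standing in for reverse(coeffs(fit(...))). Under that assumption, the straight line works out to 306x - 721 and the quadratic to 39x² - 162x + 137, while the cubic recovers [2, 3, 4, 5].

```julia
using LinearAlgebra

# Hypothetical helper: least-squares fit, coefficients highest degree first
fit_desc(xs, ys, order) =
    reverse([float(x)^k for x in xs, k in 0:order] \ float.(ys))

xs  = collect(0:12)
ys3 = 2 .* xs .^ 3 .+ 3 .* xs .^ 2 .+ 4 .* xs .+ 5    # assumed ys3 column

fits = Dict(order => round.(fit_desc(xs, ys3, order), digits=2) for order in 1:3)
for order in 1:3
    println("ys3, order $order: ", fits[order])
end
```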
The resulting plot is shown below:
You can obtain the interactive JupyterLab notebook from the following link:
Struct
Building Class
Julia has a unique way to define a class. You can observe the example below.
It may look strange at first: a struct only holds data, and the functions that work on it are defined separately, outside the struct. I don’t know the real reason Julia has no common building block that bundles data and methods together.
All I can imagine is that, when working in a Jupyter notebook, it is easier to write modular code this way, without being limited by a single class block.
Let’s see the skeleton first.
using CSV, DataFrames, Polynomials, Plots, Printf
mutable struct CurveFitter
...
end
function calc_coeff(cf::CurveFitter, order::Int)
...
end
function calc_coeffs(cf::CurveFitter)
...
end
function calc_plot_all(cf::CurveFitter)
...
end
Now, let’s look at this mutable struct.
mutable struct CurveFitter
df::DataFrame
x_col::Symbol
y_col::Symbol
function CurveFitter(df::DataFrame,
x_col::Symbol, y_col::Symbol)
return new(df, x_col, y_col)
end
end
Now we can define each method of the class.
function calc_coeff(cf::CurveFitter, order::Int)
xs = cf.df[!, cf.x_col]
ys = cf.df[!, cf.y_col]
order_text = Dict(1 => "Linear",
2 => "Quadratic", 3 => "Cubic")
coeff_text = Dict(1 => "(a, b)",
2 => "(a, b, c)", 3 => "(a, b, c, d)")
pf = fit(xs, ys, order)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [round(c, digits=2) for c in cfs_r]
println("Curve type for $(cf.y_col): ",
order_text[order])
println("Coefficients ",
coeff_text[order], ":\n\t", cfs_fmt, "\n")
end
There is no self. Each method takes cf::CurveFitter as its first parameter, and Julia selects the method to call from the types of the arguments (multiple dispatch).
function calc_coeffs(cf::CurveFitter)
println("Using Polynomials.fit\n")
for order in 1:3
calc_coeff(cf, order)
end
end
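To see the struct-plus-functions pattern in isolation, here is a package-free sketch. It stores plain vectors instead of a DataFrame (an assumption for self-containment), and SimpleFitter, coefficients and describe are hypothetical names. The two describe methods show multiple dispatch: one function name, with the method picked from the argument type.

```julia
using LinearAlgebra

# Data only: no methods live inside the struct
struct SimpleFitter
    xs::Vector{Float64}
    ys::Vector{Float64}
end

# A "method" is just a function whose first argument is the struct;
# returns least-squares coefficients, highest degree first
coefficients(f::SimpleFitter, order::Int) =
    reverse([x^k for x in f.xs, k in 0:order] \ f.ys)

# Multiple dispatch: same name, method chosen by argument type
describe(f::SimpleFitter) = "SimpleFitter with $(length(f.xs)) points"
describe(x::Number) = "a number: $x"

xs = collect(0.0:12.0)
f = SimpleFitter(xs, 4 .* xs .+ 5)      # ys1-like data (assumed)

println(describe(f))                     # SimpleFitter with 13 points
println(describe(3.5))                   # a number: 3.5
println(round.(coefficients(f, 1), digits=2))
```

The article uses mutable struct, which lets you reassign a field later (for example cf.y_col = :ys2 to switch series); a plain struct, as here, is enough when the fields never change.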
This method is very similar to the previous function.
function calc_plot_all(cf::CurveFitter)
xs = cf.df[!, cf.x_col]
ys = cf.df[!, cf.y_col]
xp = range(minimum(xs), maximum(xs), length=100)
yp1 = fit(xs, ys, 1).(xp)
yp2 = fit(xs, ys, 2).(xp)
yp3 = fit(xs, ys, 3).(xp)
scatter(xs, ys,
label="Data Points")
plot!(xp, yp1, color=:red,
label="Linear Equation")
plot!(xp, yp2, color=:green,
label="Fitted second-order polynomial")
plot!(xp, yp3, color=:blue,
label="Fitted third-order polynomial")
xlabel!("X values")
ylabel!("Y values")
title!("Polynomial Curve Fitting")
savefig("16-poly-struct.png")
end
Again, let’s put everything together. After reading the data from the CSV file, we instantiate the class and call the necessary methods.
This is done by creating a CurveFitter object for ys3, then calculating the coefficients and plotting all three fitted curves.
df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
cf = CurveFitter(df, :xs, :ys3)
calc_coeffs(cf)
calc_plot_all(cf)
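We can see the result as follows. Here too, cf fits only ys3 for all three orders, so the ys1 and ys2 labels and the linear and quadratic coefficients below do not match what this script prints.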
❯ julia 16-poly-struct.jl
Using Polynomials.fit
Curve type for ys1: Linear
Coefficients (a, b):
[4.0, 5.0]
Curve type for ys2: Quadratic
Coefficients (a, b, c):
[3.0, 4.0, 5.0]
Curve type for ys3: Cubic
Coefficients (a, b, c, d):
[2.0, 3.0, 4.0, 5.0]
The resulting plot is shown below:
You can obtain the interactive JupyterLab notebook from the following link:
What’s the Next Chapter 🤔?
Let’s continue our Julia journey with building classes and statistical properties, and also explore UTF-8 symbols to make the calculations look similar to the original equations.
Consider continuing your exploration with [ Trend - Language - Julia - Part Two ].