Preface

Goal: Explore statistical plot visualization in Julia, fitting the data with a linear model.

Julia is a very interesting language. Unfortunately, I don’t have enough Julia knowledge to review it. I haven’t learned the fundamentals yet, so I am in no position to talk about the language itself. However, I can still plot two or three things.


Preparation

You just need Julia installed on your system.

Library

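The scripts provided here start from the very basics, and you will need additional libraries from time to time. You can install the packages from the Julia REPL: press ] to enter package mode, then run the add commands below. The final import line is run from the normal julia> prompt.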

add Polynomials
add Plots
add CSV
add Printf
add DataFrames
add GLM
add Distributions
add StatsPlots
add ColorSchemes
add ColorTypes
add Gadfly
add IJulia
import Cairo, Fontconfig

Data Series Samples

As usual, I provide minimal cases for visualization, with only two example data sets.

The first one uses multiple series, suitable for experimenting with melting a dataframe.

And here is the simple one, for statistical properties such as least squares.

I use the word samples to distinguish them from the population, since the calculation results differ between the two.
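
To make the sample versus population distinction concrete, here is a minimal sketch using the Statistics standard library. It assumes the ys1 values used later in this article, and the variable names are only for illustration. By default var divides by n - 1 (sample variance), while corrected=false divides by n (population variance).

```julia
using Statistics

# ys1 values used later in this article (y = 4x + 5 for x = 0..12)
ys = [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]

var_sample = var(ys)                        # divides by n - 1
var_population = var(ys, corrected=false)   # divides by n

println("Sample variance     : ", round(var_sample, digits=2))
println("Population variance : ", round(var_population, digits=2))
```

The two numbers differ only by the divisor, which is why it matters to say which one is meant.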


Polynomials Fit

Let’s start simple: just reading the data and interpreting it.

Vector

We can use arrays for xs and ys as the source data, then use Polynomials.fit to get the coefficients of the fitted curve. This requires the Polynomials library.

# Given data
x_values = [
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_values = [
  5, 9, 13, 17, 21, 25, 29,
  33, 37, 41, 45, 49, 53]

Julia: Trend: Polynomials Fit: Vector

Let’s say we have our linear regression as y = ax + b, with slope a and intercept b.

We need to set the curve fitting order, such as one for a straight line, then perform linear regression using Polynomials.fit. From the result we can extract the coefficients. coeffs returns them in ascending order of degree (constant term first), so we reverse them to match the equation above. Finally, we print them rounded to two decimals.

using Polynomials

order = 1
pf = fit(x_values, y_values, order)
println("Using Polynomials.fit")

coeffs_r= reverse(coeffs(pf)) 
println("Coefficients (a, b):")

coeffs_fmt = [
  round(c, digits=2) for c in coeffs_r]  
println(coeffs_fmt, "\n")

Julia: Trend: Polynomials Fit: Vector

We can see the result as follows.

❯ julia 01-poly-vector.jl
Using Polynomials.fit
Coefficients (a, b):
[4.0, 5.0]
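
As a cross-check, the same coefficients can be recovered without the Polynomials package. The sketch below is not how Polynomials.fit is implemented; it only assumes the standard least-squares setup with a Vandermonde matrix, and vander_fit is a hypothetical helper name.

```julia
using LinearAlgebra

# Hypothetical helper: least-squares polynomial fit.
# Returns coefficients in ascending order (constant term first).
function vander_fit(xs, ys, order)
  # One column per power of x: x^0, x^1, ..., x^order
  A = [float(x)^p for x in xs, p in 0:order]
  return A \ float.(ys)   # least-squares solve for a non-square matrix
end

x_values = collect(0:12)
y_values = 4 .* x_values .+ 5

cfs = vander_fit(x_values, y_values, 1)
cfs_r = reverse(round.(cfs, digits=2))   # (a, b): highest degree first
println(cfs_r)
```

The reverse call plays the same role as in the script above: the solver returns the constant term first, while the equation lists the slope first.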

Dataframe

Instead of arrays, we can read from a CSV file and put the result into a dataframe.

First we read the data from CSV, and sanitize the column names by stripping spaces.

using CSV, DataFrames

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))
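
What the rename! line does to the header can be sketched on plain strings, without reading any file. The raw header below is an assumed example of a CSV header with stray spaces, not the actual content of series.csv.

```julia
# Assumed example header with stray spaces around the names
raw_names = ["xs", " ys1", " ys2", " ys3 "]

# strip removes leading and trailing whitespace;
# the dot broadcasts the call over every element of the vector
clean_names = Symbol.(strip.(raw_names))

println(clean_names)
```

Without this step, a column read as " ys1" could not be accessed as df.ys1.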

Let’s show what’s in the dataframe.

println(last(df,5))
println()
println(names(df))

Julia: Trend: Polynomials Fit: Dataframe

We can see the result as follows.

❯ julia 02-dataframe.jl
5×4 DataFrame
 Row │ xs     ys1    ys2    ys3   
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     8     37    229   1253
   2 │     9     41    284   1742
   3 │    10     45    345   2345
   4 │    11     49    412   3074
   5 │    12     53    485   3941

["xs", "ys1", "ys2", "ys3"]

Julia: Trend: Polynomials Fit: Dataframe

Stack

Long Format

For some kinds of visualization, we need to melt the DataFrame into long format.

df_long = stack(df, Not(:xs))

show(df_long, allrows=false)
println("\n")

show(names(df_long))

Julia: Trend: Polynomials Fit: Stack

We can see the result as follows.

❯ julia 03-stack.jl
39×3 DataFrame
 Row │ xs     variable  value 
     │ Int64  String    Int64 
─────┼────────────────────────
   1 │     0  ys1           5
   2 │     1  ys1           9
   3 │     2  ys1          13
   4 │     3  ys1          17
  ⋮  │   ⋮       ⋮        ⋮
  36 │     9  ys3        1742
  37 │    10  ys3        2345
  38 │    11  ys3        3074
  39 │    12  ys3        3941
               31 rows omitted

["xs", "variable", "value"]

Julia: Trend: Polynomials Fit: Stack

We can see that this dataframe has three series: [ys1, ys2, ys3].
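
The reshaping that stack performs can be illustrated in plain Julia. This is only a sketch of the idea on a shortened subset of the data (the first three rows, computed from the coefficients shown later), not how DataFrames implements it.

```julia
# Wide format: one column per series (first three rows only, for brevity)
xs = [0, 1, 2]
wide = (ys1 = [5, 9, 13], ys2 = [5, 12, 25], ys3 = [5, 14, 41])

# Long format: one row per (x, series) pair,
# with the series name moved into a "variable" field
long = [(xs = x, variable = String(name), value = col[i])
        for (name, col) in pairs(wide) for (i, x) in enumerate(xs)]

foreach(println, long)
```

Three columns of three values become nine rows, just as the 13×3 value columns above become 39 rows.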

Curve Fitting

Using this dataframe, we can calculate the polynomial coefficients for each series.

This is how we define each series. First, we read the data from the CSV file and strip spaces from the column names. Then we extract the columns from the DataFrame.

using CSV, DataFrames, Polynomials, Printf

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

x_values = df.xs
y_values1 = df.ys1
y_values2 = df.ys2
y_values3 = df.ys3

Julia: Trend: Polynomials Fit: Curve Fitting

And this is how we calculate the polynomial coefficients for each series:

  • linear regression (order 1) for ys1
  • quadratic curve fitting (order 2) for ys2
  • cubic curve fitting (order 3) for ys3

pf_1 = fit(x_values, y_values1, 1)

coeffs_r1 = reverse(coeffs(pf_1))
coeffs_fmt_1 = [
  round(c, digits=2) for c in coeffs_r1]

println("Coefficients (a, b) for ys1:")
println(coeffs_fmt_1, "\n")

pf_2 = fit(x_values, y_values2, 2)
...

pf_3 = fit(x_values, y_values3, 3)
...

Julia: Trend: Polynomials Fit: Curve Fitting

We can see the result as follows.

❯ julia 04-poly-fit.jl
Coefficients (a, b) for ys1:
[4.0, 5.0]

Coefficients (a, b, c) for ys2:
[3.0, 4.0, 5.0]

Coefficients (a, b, c, d) for ys3:
[2.0, 3.0, 4.0, 5.0]
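
These coefficients can be checked against the data with evalpoly from Base (available since Julia 1.4). It expects coefficients in ascending order, the same order that coeffs returns before reversing. A small sketch, using the last row of the dataframe (x = 12) as the check value:

```julia
# Coefficients as printed above: highest degree first
cfs1 = [4, 5]
cfs2 = [3, 4, 5]
cfs3 = [2, 3, 4, 5]

x = 12
# evalpoly wants the constant term first, so reverse back
y1 = evalpoly(x, reverse(cfs1))
y2 = evalpoly(x, reverse(cfs2))
y3 = evalpoly(x, reverse(cfs3))

println((y1, y2, y3))
```

The three values should agree with the ys1, ys2 and ys3 entries of the row where xs is 12.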

Julia: Trend: Polynomials Fit: Curve Fitting

Merge All Series in One Plot

Instead of repeating the code, we can bundle similar code into a function. Here we use symbols as function arguments to name the columns.

With these symbol names, we can extract the x and y values, then perform polynomial fitting for any of the three polynomial orders.

coeffs returns the coefficients in ascending order of degree, so we need to reverse them to match the (a, b, ...) labels in the output. We also round the coefficients, and use string interpolation to print the result.

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)

  xs = df[!, x_col]
  ys = df[!, y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $y_col: ",
    order_text[order])
  println("Coefficients ",
    "$(coeff_text[order]):\n\t$cfs_fmt\n")
end
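
Note that the two Dict lookups only know orders 1 to 3, so any other order raises a KeyError. One possible variation, shown here as a sketch with hypothetical helper names order_label and coeff_label, falls back to generated text instead:

```julia
const ORDER_TEXT = Dict(1 => "Linear", 2 => "Quadratic", 3 => "Cubic")

# Hypothetical helper: named curve type, or a generic label for other orders
order_label(order::Int) = get(ORDER_TEXT, order, "Order $order")

# Hypothetical helper: builds "(a, b)", "(a, b, c)", ... from the order
coeff_label(order::Int) = "(" * join('a':('a' + order), ", ") * ")"

println(order_label(2), " ", coeff_label(2))
println(order_label(4), " ", coeff_label(4))
```

This keeps the printed output identical for orders 1 to 3 while still working for higher orders.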

Julia: Trend: Polynomials Fit: Merge

Let’s call the function for each ys series with its respective order.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))

println("Using Polynomials.fit\n")
calc_coeff(df, :xs, :ys1, 1)
calc_coeff(df, :xs, :ys2, 2)
calc_coeff(df, :xs, :ys3, 3)

Julia: Trend: Polynomials Fit: Merge

We can see the result as follows.

❯ julia 05-poly-merge.jl
Using Polynomials.fit

Curve type for ys1: Linear
Coefficients (a, b):
        [4.0, 5.0]

Curve type for ys2: Quadratic
Coefficients (a, b, c):
        [3.0, 4.0, 5.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Polynomials Fit: Merge


Plot

Wouldn’t it be nice if we could visualize the result of the coefficients above? Let’s do this.

Straight Line

Consider starting from a simple straight line.

We need these libraries. You can install them using the Julia REPL.

using CSV, DataFrames, Polynomials, Plots

As usual, we read the data from CSV and sanitize the column names, then extract the columns from the DataFrame.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))

xs = df.xs
ys = df.ys1
println(xs, "\n", ys, "\n")

Julia: Trend: Plot: Straight Line

Now we can perform linear regression for ys, and get a new pair of series (xp, yp) for the plot.

pf = fit(xs, ys, 1)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b) for ys:")
println(cfs_fmt, "\n")

xp = range(minimum(xs), maximum(xs), length=100)
yp = pf.(xp)
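
The dot in pf.(xp) is Julia’s broadcasting syntax: it calls the fitted polynomial on every element of xp. The same pattern can be tried with a plain function, where line is a hypothetical stand-in for the fitted polynomial and only five points are used for readability:

```julia
# Hypothetical stand-in for the fitted polynomial
line(x) = 4x + 5

# Five evenly spaced points between 0 and 12
xp = range(0, 12, length=5)

# Broadcasting: apply line to each element of xp
yp = line.(xp)

println(collect(xp))
println(yp)
```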

Julia: Trend: Plot: Straight Line

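Then draw the plot. As you can see, the grammar here is interesting. First we draw the base plot using scatter, then we add the fitted line and the decorations on top of it. All the additional parts use functions ending with an exclamation mark (!), which modify the current plot instead of creating a new one.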

scatter(xs, ys,
  label="Data Points")
plot!(xp, yp, color=:red,
  label="Linear Equation")
xlabel!("X values")
ylabel!("Y values")
title!("Straight line fitting")
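
The trailing ! is a general Julia naming convention for functions that modify their argument, not something specific to Plots. A sketch with Base functions shows the same difference between creating a new value and changing an existing one:

```julia
nums = [3, 1, 2]

sorted_copy = sort(nums)   # returns a new vector, nums is unchanged
println(nums, " ", sorted_copy)

sort!(nums)                # sorts nums in place
push!(nums, 4)             # appends to nums in place
println(nums)
```

In the same spirit, plot! and xlabel! add to the plot that scatter created.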

Julia: Trend: Plot: Straight Line

Then finally save the plot output as a PNG file.

savefig("11-poly-linear.png")

We can see the result as follows.

❯ julia 11-poly-linear.jl
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]

Coefficients (a, b) for ys:
[4.0, 5.0]

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Linear Equation

Quadratic Curve

We can adapt the code above for second order. Extract the columns from the DataFrame, then perform quadratic regression for ys.

xs = df.xs
ys = df.ys2
println(xs, "\n", ys, "\n")

pf = fit(xs, ys, 2)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b, c) for ys:")
println(cfs_fmt, "\n")

Julia: Trend: Plot: Quadratic Curve

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Quadratic Curve

Cubic Curve

We can also adapt the code above for third order. Extract the columns from the DataFrame, then perform cubic regression for ys.

xs = df.xs
ys = df.ys3
println(xs, "\n", ys, "\n")

pf = fit(xs, ys, 3)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b, c, d) for ys:")
println(cfs_fmt, "\n")

Julia: Trend: Plot: Cubic Curve

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Cubic Curve

Merge

Instead of using three series, we can analyze only one series, using three different orders.

We can start with the skeleton of the script.

using CSV, DataFrames, Polynomials, Plots, Printf

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)
  ...
end

function calc_coeffs(df::DataFrame)
  ...
end

function calc_plot_all(df::DataFrame,
    x_col::Symbol, y_col::Symbol)
  ...
end

Julia: Trend: Plot: Merge All Series

We are still using symbol names as arguments. From these we extract the x and y values, then perform polynomial fitting as usual. Remember that we have three kinds of polynomial order.

coeffs returns the coefficients in ascending order of degree, so we need to reverse them to match the (a, b, ...) labels in the output. We also round the coefficients, and use string interpolation to print the result.

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)

  xs = df[!, x_col]
  ys = df[!, y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $y_col: ",
    order_text[order])
  println("Coefficients ",
    "$(coeff_text[order]):\n\t$cfs_fmt\n")
end

Julia: Trend: Plot: Merge All Series

Besides plotting, we need to display the coefficient results. Here we call the function for the ys3 column only, once for each order.

function calc_coeffs(df::DataFrame)
  println("Using Polynomials.fit\n")

  calc_coeff(df, :xs, :ys3, 1)
  calc_coeff(df, :xs, :ys3, 2)
  calc_coeff(df, :xs, :ys3, 3)
end

Julia: Trend: Plot: Merge All Series

In this function, we calculate and plot:

  • Fit the selected series at all three orders, and
  • Plot all three fitted curves.

function calc_plot_all(df::DataFrame,
    x_col::Symbol, y_col::Symbol)

  # Extract x and y values
  xs = df[!, x_col]
  ys = df[!, y_col]

  # Draw Plot
  xp = range(minimum(xs), maximum(xs), length=100)
  yp1 = fit(xs, ys, 1).(xp)
  yp2 = fit(xs, ys, 2).(xp)
  yp3 = fit(xs, ys, 3).(xp)

  # Plotting
  scatter(xs, ys,
    label="Data Points")
  plot!(xp, yp1, color=:red,
    label="Linear Equation")
  plot!(xp, yp2, color=:green,
    label="Fitted second-order polynomial")
  plot!(xp, yp3, color=:blue,
    label="Fitted third-order polynomial")

  # Decoration
  xlabel!("X values")
  ylabel!("Y values")
  title!("Polynomial Curve Fitting")

  # Save the plot as a PNG file
  savefig("15-poly-merge.png")
end
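
To see in numbers how much the three fits differ for one series, the residuals can be compared without plotting. This is a sketch under assumptions: vander_fit is a hypothetical least-squares helper rather than Polynomials.fit, and the ys3 values are regenerated from the cubic coefficients printed earlier instead of being read from series.csv.

```julia
using LinearAlgebra

# Hypothetical helper: least-squares fit, coefficients in ascending order
vander_fit(xs, ys, order) =
  [float(x)^p for x in xs, p in 0:order] \ float.(ys)

xs = collect(0:12)
ys = [2x^3 + 3x^2 + 4x + 5 for x in xs]   # assumed ys3 series

# Sum of squared residuals for orders 1, 2 and 3
sse = map(1:3) do order
  cfs = vander_fit(xs, ys, order)
  sum(abs2, ys .- evalpoly.(xs, Ref(cfs)))
end

for (order, s) in enumerate(sse)
  println("order $order: SSE = ", round(s, digits=2))
end
```

The error should shrink as the order grows and be practically zero at order 3, since the data itself is cubic.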

Julia: Trend: Plot: Merge All Series

Let’s gather it all together, and plot all three fits of the ys3 series.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

calc_coeffs(df)
calc_plot_all(df, :xs, :ys3)

Julia: Trend: Plot: Merge All Series

We can see the result as follows.

❯ julia 15-poly-merge.jl
Using Polynomials.fit

Curve type for ys3: Linear
Coefficients (a, b):
        [306.0, -721.0]

Curve type for ys3: Quadratic
Coefficients (a, b, c):
        [39.0, -162.0, 137.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Plot: Merge All Series

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Merge Series

Struct

Building Class

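Julia has no classes in the usual object-oriented sense. Instead, you define a struct that holds the data, and write functions outside the struct that take it as an argument. You can observe the example below.

It may look strange at first, and I don’t know the full reasoning behind not having a single building block for a class.

What I can imagine is working with a Jupyter notebook: it is easier to write modular code this way, since each function can live in its own cell instead of inside one large class block.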
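
A tiny sketch of the idea, before the real script. The types Line and Parabola and the function evaluate are hypothetical names for illustration only. The functions live outside the structs, and Julia picks the right one from the type of the argument:

```julia
# Hypothetical types for illustration
struct Line
  a::Float64
  b::Float64
end

struct Parabola
  a::Float64
  b::Float64
  c::Float64
end

# One function name, two methods, selected by the argument type
evaluate(f::Line, x) = f.a * x + f.b
evaluate(f::Parabola, x) = f.a * x^2 + f.b * x + f.c

println(evaluate(Line(4, 5), 12))
println(evaluate(Parabola(3, 4, 5), 12))
```

New methods can be added later, in another cell or file, without touching the struct definition.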

Let’s see the skeleton first.

using CSV, DataFrames, Polynomials, Plots, Printf

mutable struct CurveFitter
  ...
end

function calc_coeff(cf::CurveFitter, order::Int)

  ...
end

function calc_coeffs(cf::CurveFitter)
  ...
end

function calc_plot_all(cf::CurveFitter)
  ...
end

Julia: Trend: Plot: Building Class with Struct

Now, let’s see this mutable struct.

mutable struct CurveFitter
  df::DataFrame
  x_col::Symbol
  y_col::Symbol

  function CurveFitter(df::DataFrame,
      x_col::Symbol, y_col::Symbol)
    return new(df, x_col, y_col)
  end
end

Julia: Trend: Plot: Building Class with Struct

Now we can define each method of the class.

function calc_coeff(cf::CurveFitter, order::Int)
  xs = cf.df[!, cf.x_col]
  ys = cf.df[!, cf.y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $(cf.y_col): ",
    order_text[order])
  println("Coefficients ",
    coeff_text[order], ":\n\t", cfs_fmt, "\n")
end

Julia: Trend: Plot: Building Class with Struct

There is no self

Each method takes cf::CurveFitter as its first parameter.

function calc_coeffs(cf::CurveFitter)
  println("Using Polynomials.fit\n")
  for order in 1:3
    calc_coeff(cf, order)
  end
end

Julia: Trend: Plot: Building Class with Struct

This method is very similar to the previous function.

function calc_plot_all(cf::CurveFitter)
  xs = cf.df[!, cf.x_col]
  ys = cf.df[!, cf.y_col]

  xp = range(minimum(xs), maximum(xs), length=100)
  yp1 = fit(xs, ys, 1).(xp)
  yp2 = fit(xs, ys, 2).(xp)
  yp3 = fit(xs, ys, 3).(xp)

  scatter(xs, ys,
    label="Data Points")
  plot!(xp, yp1, color=:red,
    label="Linear Equation")
  plot!(xp, yp2, color=:green,
    label="Fitted second-order polynomial")
  plot!(xp, yp3, color=:blue,
    label="Fitted third-order polynomial")

  xlabel!("X values")
  ylabel!("Y values")
  title!("Polynomial Curve Fitting")

  savefig("16-poly-struct.png")
end

Julia: Trend: Plot: Building Class with Struct

Again, let’s gather everything together. After reading the data from the CSV file, we need to instantiate the class and call the necessary methods.

This can be done by creating a CurveFitter object, then calculating the coefficients and plotting all three fits.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

cf = CurveFitter(df, :xs, :ys3)
calc_coeffs(cf)
calc_plot_all(cf)

We can see the result as follows.

❯ julia 16-poly-struct.jl
Using Polynomials.fit

Curve type for ys3: Linear
Coefficients (a, b):
        [306.0, -721.0]

Curve type for ys3: Quadratic
Coefficients (a, b, c):
        [39.0, -162.0, 137.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Plot: Building Class with Struct

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Building Class


What’s the Next Chapter 🤔?

Let’s continue our Julia journey with building classes and statistical properties, and also explore UTF-8 symbols to make the calculations look much like the original equations.

Consider continuing your exploration with [ Trend - Language - Julia - Part Two ].