Where to Discuss?

Local Group

Preface

Goal: Explore Julia statistic plot visualization. Providing the data using linear model.

Welcome to our Trend meets Julia trilogy.

Julia is like the youngest sibling in the programming family. Fast, smart, and still figuring out where the spoons are. While I haven’t yet studied all of its grammar rules, or fully learned how to conjugate its macros, we can already do what truly matters: plot pretty charts and poke at distributions.

Starting early with Julia means, we are future proofing our workflow , nd catching the train before it becomes the bullet train Itโ€™s okay if we don’t understand everything yet. Plots speak first. Syntax catches up later.

We may not be fluent yet, but weโ€™re statistically enthusiastic.


Preparation

All we need is Julia installed on our system. Preferably somewhere we can find it later without using locate.

Library

The script here begins from the absolute basics. As with any language trying to be both fast and friendly, libraries need to be added gradually, like toppings on statistical pizza.

Use Julia's terminal to install them. Yes, it talks back.

add Polynomials
add Plots
add CSV
add Printf
add DataFrames
add GLM
add Distributions
add StatsPlots
add ColorSchemes
add ColorTypes
add Gadfly
add IJulia
import Cairo, Fontconfig

These libraries are our statistical toolbox. GLM for models, Plots for the art, Distributions for the theory, and ColorSchemes. Letโ€™s face it, default colors are rarely emotionally fulfilling.

Data Series Samples

As is tradition, we begin with minimal cases. Two datasets to keep things simple. No fifty-column CSV monsters today.

The first is a multi-series file. Great for practicing melt operations, and pretending we’re experts in data tidying:

The second one is a clean, focused dataset for statistical modeling like least squares. Small enough to open in a text editor.

These sample files let us skip, the “where is my data” drama , nd focus on learning the plotting tools. Also, note the filename: “samples” not “population”. Statistically speaking, thatโ€™s not a typo, it is a worldview.


Polynomials Fit

Let’s start from simple, just reading data, and interpret.

Vector

We begin with simple arrays. No DataFrame drama. Just pure, honest xs and ys as a source data. Then use Polynomials.fit to get the coefficient of curve fitting This require Polynomials library.

# Given data
x_values = [
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y_values = [
  5, 9, 13, 17, 21, 25, 29,
  33, 37, 41, 45, 49, 53]

Julia: Trend: Polynomials Fit: Vector

We suspect a linear relationship. And yes, sometimes data is kind enough not to lie to our faces.

Our target regression model:

Letโ€™s let Julia do the math with Polynomials.fit:

using Polynomials

order = 1
pf = fit(x_values, y_values, order)
println("Using Polynomials.fit")

coeffs_r= reverse(coeffs(pf)) 
println("Coefficients (a, b):")

coeffs_fmt = [
  round(c, digits=2) for c in coeffs_r]  
println(coeffs_fmt, "\n")

We need to set the curve fitting order, such as one for straight line. Then perform linear regression using Polynomials.fit. With the result, we can extract coefficients, and reverse them to fit the equation above. And finally printing in rounding decimal format.

Julia: Trend: Polynomials Fit: Vector

We can see the result as follows.

โฏ julia 01-poly-vector.jl
Using Polynomials.fit
Coefficients (a, b):
[4.0, 5.0]

401-vim-poly-vector-03

๐Ÿ““ Jupyter version here:

This is the starting point for understanding, how our data trends behave over time or iterations. It helps reveal whether we’re dealing with a polite straight line, or something a bit more rebellious.

Dataframe

Ready to scale up? Let’s switch to using DataFrames for better structure and flexibility.

Instead of array, we can read from CSV, and put the result into dataframe. First we read data from CSV, and sanitize the column names by stripping faces.

using CSV, DataFrames

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

Check the contents like any cautious statistician would:

println(last(df,5))
println()
println(names(df))

Julia: Trend: Polynomials Fit: Dataframe

We can see the result as follows.

โฏ julia 02-dataframe.jl
5ร—4 DataFrame
 Row โ”‚ xs     ys1    ys2    ys3   
     โ”‚ Int64  Int64  Int64  Int64 
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚     8     37    229   1253
   2 โ”‚     9     41    284   1742
   3 โ”‚    10     45    345   2345
   4 โ”‚    11     49    412   3074
   5 โ”‚    12     53    485   3941

["xs", "ys1", "ys2", "ys3"]

Julia: Trend: Polynomials Fit: Dataframe

๐Ÿ““ Jupyter version here:

Stack

Long Format

For some kind of visualization, we need to melt the DataFrame to long format.

df_long = stack(df, Not(:xs))

show(df_long, allrows=false)
println("\n")

show(names(df_long))

Julia: Trend: Polynomials Fit: Stack

We can see the result as follows.

โฏ julia 03-stack.jl
39ร—3 DataFrame
 Row โ”‚ xs     variable  value 
     โ”‚ Int64  String    Int64 
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚     0  ys1           5
   2 โ”‚     1  ys1           9
   3 โ”‚     2  ys1          13
   4 โ”‚     3  ys1          17
  โ‹ฎ  โ”‚   โ‹ฎ       โ‹ฎ        โ‹ฎ
  36 โ”‚     9  ys3        1742
  37 โ”‚    10  ys3        2345
  38 โ”‚    11  ys3        3074
  39 โ”‚    12  ys3        3941
               31 rows omitted

["xs", "variable", "value"]%  

Julia: Trend: Polynomials Fit: Stack

We can see that this dataframe has three series: [ys1, ys2, ys3].

You can obtain the interactive JupyterLab in this following link:

Stacked (or melted) data makes it easier, to feed multiple series into a single plot. Our charts become neater, and our code becomes less repetitive.

Curve Fitting

Now, the real fun begins. Let’s fit curves of varying orders for our three series.

This is how we can define each series. First, we read data from CSV file, and also strip spaces from column names. Then extract columns from DataFrame.

using CSV, DataFrames, Polynomials, Printf

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

x_values = df.xs
y_values1 = df.ys1
y_values2 = df.ys2
y_values3 = df.ys3

Julia: Trend: Polynomials Fit: Curve Fitting

We match each series with a polynomial of suitable ambition. This is how we can calculate polynomial coefficient for each series.

  • linear regression (order 1) for ysโ‚
  • quadratic curve fitting (order 2) for ysโ‚‚
  • cubic curve fitting (order 3) for ysโ‚ƒ
pf_1 = fit(x_values, y_values1, 1)

coeffs_r1 = reverse(coeffs(pf_1))
coeffs_fmt_1 = [
  round(c, digits=2) for c in coeffs_r1]

println("Coefficients (a, b) for ys1:")
println(coeffs_fmt_1, "\n")

… and similarly for pf_2 and pf_3.

pf_2 = fit(x_values, y_values2, 2)
...

pf_3 = fit(x_values, y_values3, 3)
...

Julia: Trend: Polynomials Fit: Curve Fitting

We can see the result as follows.

โฏ julia 04-poly-fit.jl
Coefficients (a, b) for ys1:
[4.0, 5.0]

Coefficients (a, b, c) for ys2:
[3.0, 4.0, 5.0]

Coefficients (a, b, c, d) for ys3:
[2.0, 3.0, 4.0, 5.0]

Julia: Trend: Polynomials Fit: Curve Fitting

๐Ÿ““ Jupyter version here:

By fitting different polynomial orders, we can assess the complexity of each trend. Are we dealing with a gentle slope or a chaotic rollercoaster?

Merge All Series in One Function

Time to refactor. Let’s write a reusable function and pass symbols like pros.

With this symbol name, we can extract x and y values. Then perform polynomial fitting, for three kinds of polynomial order.

With this equation, we need to reverse coefficients to match output. We also need to round the coefficients, and using string interpolation to print the result

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)

  xs = df[!, x_col]
  ys = df[!, y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $y_col: ",
    order_text[order])
  println("Coefficients ",
    "$(coeff_text[order]):\n\t$cfs_fmt\n")
end

Julia: Trend: Polynomials Fit: Merge

Let’s call the function for each ys series with respective order.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))

println("Using Polynomials.fit\n")
calc_coeff(df, :xs, :ys1, 1)
calc_coeff(df, :xs, :ys2, 2)
calc_coeff(df, :xs, :ys3, 3)

Julia: Trend: Polynomials Fit: Merge

We can see the result as follows.

โฏ julia 05-poly-merge.jl
Using Polynomials.fit

Curve type for ys1: Linear
Coefficients (a, b):
        [4.0, 5.0]

Curve type for ys2: Quadratic
Coefficients (a, b, c):
        [3.0, 4.0, 5.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Polynomials Fit: Merge

๐Ÿ““ Jupyter version here:

We avoid copy-paste regression, by turning repeated logic into one elegant function. Reusability is the hallmark of statistical sanity.


Plot

Where regressions meet Renaissance.

Wouldn’t it be satisfying, if we could see the regression we just calculated? I mean, what’s the point of fitting models, if we don’t let them strut their stuff on a plot?

Straight Line

Let’s start with the most basic of polynomial creatures: the straight line. No drama, no surprises, just honest-to-goodness linearity.

To begin, we bring in our trusty tools. Think of these as the stat nerdโ€™s brush and palette: We can install using julia REPL.

using CSV, DataFrames, Polynomials, Plots

As usual, we load the data from CSV and sanitize those column names. Then extract columns from DataFrame, like we’re preparing them for polite statistical society:

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)), ' ')))

xs = df.xs
ys = df.ys1
println(xs, "\n", ys, "\n")

๐Ÿ–ผ๏ธ Visual checkpoint

Julia: Trend: Plot: Straight LineSeeing all fits

Now we fit a linear model by performing linear regression for ys. And get a new pair series (xp, yp) for the plot. We also pretty-print the coefficients because weโ€™re civilised like that:

pf = fit(xs, ys, 1)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b) for ys:")
println(cfs_fmt, "\n")

xp = range(minimum(xs), maximum(xs), length=100)
yp = pf.(xp)

๐Ÿ“‰ Mid-plot drama

Julia: Trend: Plot: Straight Line

And now, the fun partโ€”drawing it.

As you can see the grammar here is interesting. First we draw the first plot using scatter, then we can add the above plot using other plot parts. All additional parts use exclamation !.

scatter(xs, ys,
  label="Data Points")
plot!(xp, yp, color=:red,
  label="Linear Equation")
xlabel!("X values")
ylabel!("Y values")
title!("Straight line fitting")

Julia: Trend: Plot: Straight Line

๐Ÿ’พ Save our masterpiece

Then finally save the plot output, as a PNG file.

savefig("11-poly-linear.png")

We can see the result as follows.

โฏ julia 11-poly-linear.jl
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53]

Coefficients (a, b) for ys:
[4.0, 5.0]

๐Ÿ“ธ Gallery update

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Linear Equation

๐Ÿงช Interactive playground:

You can obtain the interactive JupyterLab in this following link:

Linear fitting is our first statistical hello to data trends. It tells us if the relationship walks in a straight line or takes a detour.

Quadratic Curve

Lines are great. But sometimes data likes to show off and curve a bit. Let’s adapt code above for second order. Extract columns from DataFrame, then perform quadratic regression for ys.

xs = df.xs
ys = df.ys2
println(xs, "\n", ys, "\n")

pf = fit(xs, ys, 2)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b, c) for ys:")
println(cfs_fmt, "\n")

๐Ÿ“ธ Quadratic got curves

Julia: Trend: Plot: Quadratic Curve

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Quadratic Curve

๐Ÿ“š Interactive version:

You can obtain the interactive JupyterLab in this following link:

Quadratics help us catch nonlinear patterns, that a straight line would totally miss. We’re not just drawing curves. Weโ€™re modeling real-world bounce and dip.

Cubic Curve

When quadratic still doesn’t cut it, bring in the big polynomial guns: cubic regression.

xs = df.xs
ys = df.ys3
println(xs, "\n", ys, "\n")

pf = fit(xs, ys, 3)
cfs_r = reverse(coeffs(pf))
cfs_fmt = [
  round(c, digits=2) for c in cfs_r]

println("Coefficients (a, b, c, d) for ys:")
println(cfs_fmt, "\n")

Julia: Trend: Plot: Cubic Curve

๐ŸŽข Now thatโ€™s a curve

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Cubic Curve

๐Ÿงช Interactive notebook:

You can obtain the interactive JupyterLab in this following link:

Cubic fits are flexible. Maybe too flexible. ike a statistics student overfitting their final project. Use with care.


Merging Plot

Why juggle multiple columns when one will do? Let’s use a single series and fit it using all three orders. Because comparing fits is our version of competitive curling.

Instead of using three series, we can analyze only one series, but using three different orders.

First, we create function skeletons:

using CSV, DataFrames, Polynomials, Plots, Printf

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)
  ...
end

function calc_coeffs(df::DataFrame)
  ...
end

function calc_plot_all(df::DataFrame,
    x_col::Symbol, y_col::Symbol)
  ...
end

Julia: Trend: Plot: Merge All Series

We are still using symbol name as parameter argument. Form this we extract x and y values. Then perform polynomial fitting as usual. Remember that we have three kinds of polynomial order.

Fit and print coefficients:

With this equation, we need to reverse coefficients to match output. We also need to round the coefficients, and using string interpolation to print the result

function calc_coeff(df::DataFrame,
    x_col::Symbol, y_col::Symbol, order::Int)

  xs = df[!, x_col]
  ys = df[!, y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $y_col: ",
    order_text[order])
  println("Coefficients ",
    "$(coeff_text[order]):\n\t$cfs_fmt\n")
end

Julia: Trend: Plot: Merge All Series

Aggregate results:

Beside plotting, we need to display the coefficient result. Here we call the function for only ys3 column with respective order.

function calc_coeffs(df::DataFrame)
  println("Using Polynomials.fit\n")

  calc_coeff(df, :xs, :ys3, 1)
  calc_coeff(df, :xs, :ys3, 2)
  calc_coeff(df, :xs, :ys3, 3)
end

Julia: Trend: Plot: Merge All Series

Plot them all:

In this function, we calc and plot.

  • Calc all three series and
  • Plot all three curve fittings.
function calc_plot_all(df::DataFrame,
    x_col::Symbol, y_col::Symbol)

  # Extract x and y values
  xs = df[!, x_col]
  ys = df[!, y_col]

  # Draw Plot
  xp = range(minimum(xs), maximum(xs), length=100)
  yp1 = fit(xs, ys, 1).(xp)
  yp2 = fit(xs, ys, 2).(xp)
  yp3 = fit(xs, ys, 3).(xp)

  # Plotting
  scatter(xs, ys,
    label="Data Points")
  plot!(xp, yp1, color=:red,
    label="Linear Equation")
  plot!(xp, yp2, color=:green,
    label="Fitted second-order polynomial")
  plot!(xp, yp3, color=:blue,
    label="Fitted third-order polynomial")

  # Decoration
  xlabel!("X values")
  ylabel!("Y values")
  title!("Polynomial Curve Fitting")

  # Save the plot as a PNG file
  savefig("15-poly-merge.png")
end

Julia: Trend: Plot: Merge All Series

Pull it together:

Let’s gather it all together. Plot all three series.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

calc_coeffs(df)
calc_plot_all(df, :xs, :ys3)

Julia: Trend: Plot: Merge All Series

We can see the result as follows.

โฏ julia 15-poly-merge.jl
Using Polynomials.fit

Curve type for ys1: Linear
Coefficients (a, b):
        [4.0, 5.0]

Curve type for ys2: Quadratic
Coefficients (a, b, c):
        [3.0, 4.0, 5.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Plot: Merge All Series

๐Ÿ“ธ The whole gallery

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Merge Series

๐Ÿ“š Interactive version:

You can obtain the interactive JupyterLab in this following link:

Seeing all fits on one canvas, helps us judge whether our data just wants: a line, a parabola, or full polynomial dramatics.


Struct

Building Class

Julia has unique way to define class. You can observe the example below.

Structs in Julia are how we create custom data types with methods. Think of it like classes, but on vacation. No self, no this, just vibes and clear structure.

It may strange at first. And I don’t know the real reason for not having common building block for the class.

All I can imagine is working with jupyter notebook. It is easier to write modular code this way, without the limit of building block issue.

Let’s see the skeleton first.

using CSV, DataFrames, Polynomials, Plots, Printf

mutable struct CurveFitter
  ...
end

function calc_coeff(cf::CurveFitter, order::Int)

  ...
end

function calc_coeffs(cf::CurveFitter)
  ...
end

function calc_plot_all(cf::CurveFitter)
  ...
end

Julia: Trend: Plot: Building Class with Struct

Create our CurveFitter type

Now, let’s see this mutable struct.

mutable struct CurveFitter
  df::DataFrame
  x_col::Symbol
  y_col::Symbol

  function CurveFitter(df::DataFrame,
      x_col::Symbol, y_col::Symbol)
    return new(df, x_col, y_col)
  end
end

Julia: Trend: Plot: Building Class with Struct

Add methods

Now we can define each methods of the class.

function calc_coeff(cf::CurveFitter, order::Int)
  xs = cf.df[!, cf.x_col]
  ys = cf.df[!, cf.y_col]

  order_text = Dict(1 => "Linear",
    2 => "Quadratic", 3 => "Cubic")
  coeff_text = Dict(1 => "(a, b)",
    2 => "(a, b, c)", 3 => "(a, b, c, d)")

  pf = fit(xs, ys, order)
  cfs_r = reverse(coeffs(pf))
  cfs_fmt = [round(c, digits=2) for c in cfs_r]

  println("Curve type for $(cf.y_col): ",
    order_text[order])
  println("Coefficients ",
    coeff_text[order], ":\n\t", cfs_fmt, "\n")
end

Julia: Trend: Plot: Building Class with Struct

There is no self

Each method use cf::CurveFitter as first parameter.

function calc_coeffs(cf::CurveFitter)
  println("Using Polynomials.fit\n")
  for order in 1:3
    calc_coeff(cf, order)
  end
end

Julia: Trend: Plot: Building Class with Struct

This method is very similar with previous function.

function calc_plot_all(cf::CurveFitter, y_col::Symbol)
  xs = cf.df[!, cf.x_col]
  ys = cf.df[!, cf.y_col]

  xp = range(minimum(xs), maximum(xs), length=100)
  yp1 = fit(xs, ys, 1).(xp)
  yp2 = fit(xs, ys, 2).(xp)
  yp3 = fit(xs, ys, 3).(xp)

  scatter(xs, ys,
    label="Data Points")
  plot!(xp, yp1, color=:red,
    label="Linear Equation")
  plot!(xp, yp2, color=:green,
    label="Fitted second-order polynomial")
  plot!(xp, yp3, color=:blue,
    label="Fitted third-order polynomial")

  xlabel!("X values")
  ylabel!("Y values")
  title!("Polynomial Curve Fitting")

  savefig("16-poly-struct.png")
end

Julia: Trend: Plot: Building Class with Struct

Again, let’s gather all stuff together. After reading data from CSV file. We need to instantiate the class, and call any necessary method.

This can be done by defining a CurveFitter object. then calculate coefficients and plot all three series.

df = CSV.read("series.csv", DataFrame)
rename!(df, Symbol.(strip.(string.(names(df)))))

cf = CurveFitter(df, :xs, :ys3)
calc_coeffs(cf)
calc_plot_all(cf)

We can see the result as follows.

โฏ julia 16-poly-struct.jl
Using Polynomials.fit

Curve type for ys1: Linear
Coefficients (a, b):
        [4.0, 5.0]

Curve type for ys2: Quadratic
Coefficients (a, b, c):
        [3.0, 4.0, 5.0]

Curve type for ys3: Cubic
Coefficients (a, b, c, d):
        [2.0, 3.0, 4.0, 5.0]

Julia: Trend: Plot: Building Class with Struct

The plot result can be shown as follows:

Julia: Trend: Built-in Plot: Building Class

You can obtain the interactive JupyterLab in this following link:

Modularizing our work like this makes reuse and experimentation a breeze. Itโ€™s not just tidy. Itโ€™s practical.


What’s the Next Chapter ๐Ÿค”?

Weโ€™re back on our noble quest through the kingdom of Julia, armed with semicolons and sharp wits. This time, we’re building classes (a.k.a. types, because classes are so last season), exploring statistical properties, and flexing our UTF-8 muscles, to write equations that look like, they walked straight out of a LaTeX fashion show.

Why bother? Because writing ฮผ and ฯƒยฒ, instead of mean and variance doesn’t just make us look smarter, it reduces cognitive switching between notation and code. It’s like giving our brain better variable names, but in Greek.

Also, this approach bridges the gap , etween our code and the math in the textbooks. Fewer translation errors, fewer headaches. It’s like bringing a fluent interpreter, to a multilingual statistics conference.

As always, we balance rigor with readability. And sneak in a pun when least expected.

Consider continuing your Julia-verse adventures with ๐Ÿ‘‰ [ Trend - Language - Julia - Part Two ].