Preface

Goal: Go beyond linear regression to polynomial regression, with a practical implementation showing how to turn the theory into Python code.

We’ve already built the theoretical scaffolding, and wrestled Excel into obedience. But real-world data is messy, expectations are high, and Ctrl+C only goes so far as the rows grow. It’s time to move from spreadsheets to scripts, from clicking to coding.

Now don’t get me wrong, spreadsheets are fantastic. They make us look brilliant in meetings, and our office admin can spot a trend faster than our manager can say “Q4 report.” But let’s face it: they’re not built for scale, automation, or creative chaos.

That’s where Python enters, stage left. 🐍 With Python, we will:

  • Reproduce our polynomial models programmatically

  • Automate the boring stuff (goodbye, repetitive formula updates!)

  • Customize for edge cases (reality rarely follows your tidy cell ranges)

When your data hits 10,000 rows, we’ll want something more robust than Excel and a prayer.

So sharpen our Jupyter notebooks, warm up our terminals, and prepare for lift-off into the statistically thrilling world of Polynomial Regression in Python.

Let’s bring theory to life, with code.


1: Statistical Properties

Welcome to the part where we roll up our sleeves, and inspect the machinery inside polynomial regression. Not just admiring the curve, but poking at the math that put it there.

We’ll begin with minimalist tools (hello, numpy) before bringing in reinforcements from scipy and sklearn. This is the programmer’s equivalent of starting with a torque wrench before switching to a power drill.

We are going to start minimalist, using manual numpy calculation only, then bring in other libraries as well. Since most of the topics were already covered in the previous section, we are going to skip the explanations and go straight to ready-to-use classes for practical application.

1: From Skeletons to Stats

Building Blocks

This time we’re engineering a master-detail class setup, with CurveFitting orchestrating the show, and PolynomialOrderAnalyzer doing the dirty work for linear, quadratic, and cubic fits. One main class manages three kinds of polynomial order.

import ...

class PolynomialOrderAnalyzer:
    ...

class CurveFitting:
    ...

def main() -> int:
    ...

if __name__ == "__main__":
    raise SystemExit(main())

A reusable, extensible codebase saves future-us from copy-paste regret, and lets us test multiple fits with a single call.

Python: Statistical Properties: Skeleton

Yes, it’s verbose. But so is our favorite statistics professor, who often explains things in great depth.

2: CurveFitting

The Conductor

The first piece is the CurveFitting class: the main controller. It stores data, prints general stats, and loops through each polynomial order.

class CurveFitting:
    def __init__(self, xs, ys : List[int]) -> None:
        ...

    def print_props_general(self) -> None:
        ...

    def process(self) -> None:
        ...

Polynomial fitting is only useful if we understand how the curve behaves across different complexities. This lets us test and compare all three in one go.

3: PolynomialOrderAnalyzer

The Intern Who Does the Actual Work

Then comes the PolynomialOrderAnalyzer class to analyze each order.

class PolynomialOrderAnalyzer:
    COEFF_TEXTS = {1: '(a, b)', 2: '(a, b, c)', 3: '(a, b, c, d)'}
    ORDER_TEXTS = {1: 'Linear', 2: 'Quadratic', 3: 'Cubic'}

    def __init__(self, xs, ys: np.ndarray, order: int) -> None:
        ...
       
    def calc_props_matrix(self) -> None:
        ...

    def calc_props_mse(self) -> None:
        ...

    def calc_props_t_p_value(self) -> None:
        ...

This is where we compute the good stuff. The matrix math, error stats, and the ever-humble p-values.

4: Required Imports

The Arsenal

We need to include a few import statements.

Python: Statistical Properties: Import Statement

import numpy as np
from numpy.polynomial import Polynomial
from scipy.stats import t
from sklearn.metrics import r2_score
from typing import List

You can see that we are going to use a few different libraries. Yes, scipy.stats.t is our friendly t-distribution. Like any good academic, it’s judgy and loves small p-values.
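As a tiny preview of how it will be used further down, here is a sketch with made-up numbers; the t-value and degrees of freedom are hypothetical, only there to show the two-sided p-value recipe:

from scipy.stats import t

# Hypothetical t-statistic and degrees of freedom,
# just to demonstrate the two-sided p-value formula used later.
t_value, df = 4.2, 10
p_value = 2 * (1 - t.cdf(abs(t_value), df))
print(f'{p_value:.4f}')  # about 0.002, small enough to look significant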

5: Data Series

From humble rows, mighty fits are born. For example, using these series, we may choose ys₁, ys₂, or ys₃:

xs, ys1, ys2, ys3
0,  5,   5,   5
1,  9,   12,  14
2,  13,  25,  41
3,  17,  44,  98
4,  21,  69,  197
5,  25,  100, 350
6,  29,  137, 569
7,  33,  180, 866
8,  37,  229, 1253
9,  41,  284, 1742
10, 45,  345, 2345
11, 49,  412, 3074
12, 53,  485, 3941

Python: Manual Calculation: Statistical Properties: CSV Source

Real-world data often bends and curves. Linear fits get nervous. Higher-order polynomials step in.
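The rest of the code assumes this data lives in a file named series.csv, the filename main() reads below. If you need to recreate the file, a minimal sketch:

import csv

# Recreate series.csv from the table above,
# including the header row that main() skips via skip_header=1.
rows = [
    (0, 5, 5, 5), (1, 9, 12, 14), (2, 13, 25, 41),
    (3, 17, 44, 98), (4, 21, 69, 197), (5, 25, 100, 350),
    (6, 29, 137, 569), (7, 33, 180, 866), (8, 37, 229, 1253),
    (9, 41, 284, 1742), (10, 45, 345, 2345), (11, 49, 412, 3074),
    (12, 53, 485, 3941),
]

with open("series.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["xs", "ys1", "ys2", "ys3"])
    writer.writerows(rows)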

6: Entry Point

Where the Show Begins

The entry point is as simple as this:

if __name__ == "__main__":
    raise SystemExit(main())

The main function itself is just a CSV loader, featuring a column chooser.

Python: Statistical Properties: Entry Point: Main

def main() -> int:
    # Getting Matrix Values
    mCSV = np.genfromtxt("series.csv",
      skip_header=1, delimiter=",", dtype=float)
    mCSVt   = np.transpose(mCSV)

    example = CurveFitting(mCSVt[0], mCSVt[3])
    example.process()

    return 0

Yes, we pick a column and unleash math on it.
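Here mCSVt[0] is the xs column, while mCSVt[3] is ys₃. Switching to another series is a one-line change inside main(); for instance, to analyze the quadratic-ish ys₂ instead:

    # mCSVt[1], mCSVt[2], mCSVt[3] hold ys1, ys2, ys3 respectively.
    example = CurveFitting(mCSVt[0], mCSVt[2])
    example.process()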

7: CurveFitting: Initialization

We just need to convert the lists to np.array for further use.

Python: Curve Fitting: Initialization

class CurveFitting:
    def __init__(self, xs, ys : List[int]) -> None:
        # Given data
        self.xs = np.array(xs)
        self.ys = np.array(ys)

8: CurveFitting: General Properties

ȳ and ∑(yᵢ - ȳ)²

Then we print the general properties.

Python: Curve Fitting: General Properties

    def print_props_general(self) -> None:
        y_mean = np.mean(self.ys)
        self.y_sq_deviation = np.sum((self.ys-y_mean) ** 2)
        print('General')
        print(f'\tȳ (mean)   : {y_mean:15,.2f}')
        print(f'\t∑(yᵢ-ȳ)²   : {self.y_sq_deviation:15,.2f}')
        print()

Python: Manual Calculation Result: Statistical Properties: General

Before fitting a curve, we need to know how wild the data is. This is our baseline. The “no model” chaos.

9: CurveFitting: Process

Loop Through Orders

Here we print all the properties.

Python: Curve Fitting: Execute Process

    def process(self) -> None:
        # Print Statistical Properties
        self.print_props_general()
        
        for order in [1, 2, 3]:
            case = PolynomialOrderAnalyzer(self.xs, self.ys, order)
            case.calc_props_matrix()
            case.calc_props_mse()
            case.calc_props_t_p_value()
            print()

Not all curves are created equal. Some just overfit and lie. This loop checks who’s telling the truth.

10: Polynomial Order Analyzer: Initialization

Here we take care of each polynomial order.

Python: Polynomial Order Analyzer: Initialization

class PolynomialOrderAnalyzer:
    COEFF_TEXTS = {1: '(a, b)', 2: '(a, b, c)', 3: '(a, b, c, d)'}
    ORDER_TEXTS = {1: 'Linear', 2: 'Quadratic', 3: 'Cubic'}

    def __init__(self, xs, ys: np.ndarray, order: int) -> None:
        self.xs = xs
        self.ys = ys
        self.order = order
        self.df = len(self.xs) - order - 1  # degrees of freedom: n - k - 1

        self.mA   = None
        self.mC   = None
        self.yp   = None
        self.diag = None
        self.MSE  = None

        # Display
        self.coeff_text = self.COEFF_TEXTS[order]
        self.order_text = self.ORDER_TEXTS[order]

11: Properties Calculation: Matrix

Design and Coefficients

The actual matrix calculation is short.

Python: Polynomial Order Analyzer: Matrix Properties

    def calc_props_matrix(self) -> None:
        # Perform regression using polynomial.
        poly = Polynomial.fit(self.xs, self.ys, deg=self.order)
        self.mC = poly.convert().coef
        self.yp = poly(self.xs)

        # Matrix Operation
        self.mA = np.flip(np.vander(self.xs, self.order+1), axis=1)
        mAt   = self.mA.T
        mAt_A = mAt @ self.mA # gram matrix
        mI   = np.linalg.inv(mAt_A)
        self.diag = np.diag(mI)

        print(f'Using polyfit : {self.order_text}')
        print(f'Coefficients  : {self.coeff_text}: '
            + f'{self.mC}')

Python: Manual Calculation Result: Statistical Properties: Matrix

But for a broader overview, there are different kinds of calculations. I commented out the alternative calculations.

Design Matrix A

        # Design Matrix A
        # mA_1 = np.column_stack([np.ones_like(self.xs), self.xs, self.xs**2])
        # mA_2 = np.column_stack([self.xs**j for j in range(self.order+1)])  # Design matrix
        self.mA = np.flip(np.vander(self.xs, self.order+1), axis=1)

Matrix Operation

        mB = self.ys
        mAt   = self.mA.T
        mAt_A = mAt @ self.mA # gram matrix
        mAt_B = mAt @ mB 
        # mC_1 = np.linalg.solve(mAt_A, mAt_B)
        # mC_2 = np.polyfit(self.xs, self.ys, deg=order)
        # mC_3 = np.linalg.inv(self.mA.T @ self.mA) @ (self.mA.T @ mB)
        # mC_4 = np.linalg.lstsq(self.mA, mB, rcond=None)[0]
        # print(mC_4)
        mI   = np.linalg.inv(mAt_A)
        self.diag = np.diag(mI)

Getting Prediction Series

        poly = Polynomial.fit(self.xs, self.ys, deg=self.order)
        self.mC = poly.convert().coef
        # yp_1    = np.polyval(mC, self.xs)
        self.yp = poly(self.xs)
        # y_pred_2 = self.mA @ mC
        # print(y_pred_2)

Print the Statistical Properties Header

        print(f'Using polyfit : {self.order_text}')
        print(f'Coefficients  : {self.coeff_text}: '
            + f'{self.mC}')

We don’t get meaningful t-tests unless we’ve built the model matrix right. This is our math foundation.
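For readers who prefer symbols, the matrix gymnastics above is just the classic normal-equations recipe. In the notation of the code, with A the design matrix and y the observations, this is what mI = np.linalg.inv(mAt_A) and self.diag are preparing for (the MSE arrives in the next section):

\hat{\beta} = (A^{\top} A)^{-1} A^{\top} y
\qquad
\operatorname{Var}(\hat{\beta}_j) = \mathrm{MSE} \cdot \bigl[ (A^{\top} A)^{-1} \bigr]_{jj}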

12: Properties Calculation: MSE and R²

Mean Square Error

Are We Even Close?

This is how we calculate MSE manually.

Python: Polynomial Order Analyzer: Error Properties

    def calc_props_mse(self) -> None:
        # Calculate SST and SSE
        y_mean = np.mean(self.ys)
        SST = np.sum((self.ys - y_mean) ** 2)  # ∑(yᵢ - ȳ)²
        SSR = np.sum((self.ys - self.yp) ** 2) # ∑ϵᵢ²
        R_squared = 1 - (SSR / SST)
        # r2 = r2_score(self.ys, yp)
        self.MSE = SSR/self.df

        print(f'\t∑(yᵢ - ŷᵢ)² : {SSR:15,.2f}')
        print(f'\tR²          : {R_squared:15,.4f}')
        print(f'\tMSE         : {self.MSE:15,.2f}')

Python: Manual Calculation Result: Statistical Properties: Mean Squared Error

R² is our curve’s bragging score. MSE is its humility check. Together, they balance hubris and usefulness.

r2_score()?

Note that we can calculate R² using the r2_score() function from sklearn.metrics. But why can’t we calculate MSE using the mean_squared_error() function from sklearn.metrics?

This is because of a difference in degrees of freedom. sklearn.metrics is not focused on statistical properties; it is designed for general use (e.g., evaluating predictions without knowing the model internals). mean_squared_error() uses n as the divisor, while we are using (n - k - 1).

Degrees of freedom matter, and sklearn doesn’t care about our academic rigor.
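If we still want sklearn in the loop, the two quantities only differ by their divisor. Here is a minimal sketch of the relationship, using the first six rows of ys₂ above as a toy example (nothing in it belongs to the original class, and the straight-line fit is deliberately too simple so the residuals are non-trivial):

import numpy as np
from numpy.polynomial import Polynomial
from sklearn.metrics import mean_squared_error

# Toy data: the first six rows of ys2, underfit with a straight line.
xs = np.array([0, 1, 2, 3, 4, 5], dtype=float)
ys = np.array([5, 12, 25, 44, 69, 100], dtype=float)
order = 1

yp  = Polynomial.fit(xs, ys, deg=order)(xs)
n   = len(xs)
SSR = np.sum((ys - yp) ** 2)

mse_sklearn = mean_squared_error(ys, yp)   # SSR / n
mse_manual  = SSR / (n - order - 1)        # SSR / (n - k - 1)

# Same sum of squares, different divisor.
print(np.isclose(mse_sklearn * n / (n - order - 1), mse_manual))  # True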

13: Properties Calculation: t-value, p-value

The Statistical Lie Detector

The calculation using the t-distribution is simple.

Python: Polynomial Order Analyzer: t-value, p-value

    def calc_props_t_p_value(self) -> None:
        # t-value, p-value
        SE   = np.sqrt(self.MSE * self.diag)
        t_value = self.mC/SE
        p_value = 2 * (1 - t.cdf(abs(t_value), self.df))

But we need to format the array values, so that the output looks nice.

        diag_formatted = [f"{x:15,.4f}" for x in self.diag]
        SE_formatted   = [f"{x:15,.4f}" for x in SE]
        t_v_formatted  = [f"{x:15,.2e}" for x in t_value]
        p_v_formatted  = [f"{x:15,.10f}" for x in p_value]

Now we can finally print the result.

        print(f"diag    : [" + " ".join(diag_formatted) + "]")
        print(f"SE(β)   : [" + " ".join(SE_formatted) + "]")
        print(f"t_value : [" + " ".join(t_v_formatted) + "]")
        print(f"p_value : [" + " ".join(p_v_formatted) + "]")

Python: Manual Calculation Result: Statistical Properties: t-value, p value

This is where we ask, “Are these coefficients statistically significant, or just enthusiastic guesswork?”
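For the symbol-minded, the SE, t_value, and p_value lines above implement the following, with df = n - k - 1 (the value stored in self.df) and F_t the CDF of Student’s t-distribution, i.e. scipy’s t.cdf:

SE(\hat{\beta}_j) = \sqrt{\mathrm{MSE} \cdot \bigl[ (A^{\top} A)^{-1} \bigr]_{jj}}
\qquad
t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}
\qquad
p_j = 2 \, \bigl( 1 - F_t(|t_j|;\ \mathrm{df}) \bigr)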


2: Simple Plot

After printing the stats (yes, all those glorious coefficients and R² values), it’s time to do what every stats professor secretly lives for: draw a nice plot. But this time, we will give our chart a little more personality, and a lot more polynomial.

This time with emphasis of selected properties.

Skeleton

Anatomy of a Trend-Plotter

This plotting class is lean and clean. We have only four new methods here:

Python: Matplotlib: Curve Fitting: Skeleton

class CurveFitting:
    def __init__(self, xs, ys : List[int]) -> None:
        ...

    def print_props_general(self) -> None:
        ...

    def calc_plot_all(self) -> None:
        ...

    def add_plot_text(self, plt) -> None:
        ...

    def plot_trend(self, plt) -> None:
        ...

    def draw_plot(self) -> None:
        ...

    def process(self) -> None:
        ...

Separating concerns in code is like labeling our axes. It saves our sanity later. Each method here has a clear purpose, just like each coefficient in a quadratic equation (even the one that always seems to vanish).

Prediction Calculation

A Gentle Squeeze from Chaos into Curve

From the given points, we get three beautiful, smooth curves: linear, quadratic, and cubic. No drama. The calculation is as simple as the one below:

Python: Matplotlib: Curve Fitting: Prediction Calculation

    def calc_plot_all(self) -> None:
        self.x_plot = xp = np.linspace(
            min(self.xs), max(self.xs), 100)

        # Calculate coefficients directly
        self.y1_plot = Polynomial.fit(self.xs, self.ys, deg=1)(xp)
        self.y2_plot = Polynomial.fit(self.xs, self.ys, deg=2)(xp)
        self.y3_plot = Polynomial.fit(self.xs, self.ys, deg=3)(xp)

A plot without prediction is like regression without a cause. These lines help us see how the model fits. Not just mathematically, but viscerally (cue dramatic music).

Plot Trend

The Curve Awakens

Time to let matplotlib strut its stuff. Scatter + curves = the stats professor’s version of abstract art. The plot is the same as in the previous section, but I put it in a separate function.

Python: Matplotlib: Curve Fitting: Plot Trend

    def plot_trend(self, plt) -> None:
        # Scatter plot with Seaborn color
        sns.scatterplot(
            x=self.xs, y=self.ys, color=self.colors[0], 
            s=100, label='Data points', edgecolor='w', linewidth=0.5)

        # Polynomial curves with Seaborn colors
        plt.plot(self.x_plot, self.y1_plot, color=self.colors[1], 
                linewidth=2.5, label='Linear fit')
        plt.plot(self.x_plot, self.y2_plot, color=self.colors[2], 
                linewidth=2.5, label='Quadratic fit')
        plt.plot(self.x_plot, self.y3_plot, color=self.colors[3], 
                linewidth=2.5, label='Cubic fit')

        plt.legend(fontsize=10, framealpha=0.9)
        plt.xlabel('x', fontsize=12)
        plt.ylabel('y', fontsize=12)
        plt.title('Polynomial Regression with Statistics')
        plt.grid(True, linestyle='--', alpha=0.3)

        plt.tight_layout()

Visualizing each polynomial degree side by side lets us compare their fit personalities. Linear is minimal, quadratic is flexible, and cubic is a drama queen. Pick our champion.
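One caveat before we move on: the plotting code above uses sns and a self.colors palette that never appeared in the earlier import list. A minimal sketch of the assumed setup, with the palette name being my own guess rather than anything fixed by the original code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List

class CurveFitting:
    def __init__(self, xs, ys: List[int]) -> None:
        # Given data
        self.xs = np.array(xs)
        self.ys = np.array(ys)

        # Shared palette: index 0 for the scatter points,
        # 1..3 for the linear, quadratic, and cubic curves.
        self.colors = sns.color_palette("tab10", 4)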

Main Plot Function

Bringing It All Together (With Style)

Now we can merge that trend plot into the main plot function. This way we can easily add additional plots later. This is where it all merges: scatterplot, curves, and a stats-packed annotation box.

Python: Matplotlib: Curve Fitting: Plot Text

    def draw_plot(self) -> None:
        plt.figure()
        self.plot_trend(plt)
        self.add_plot_text(plt)

        plt.tight_layout()
        plt.show()

Note that I added a somewhat odd line here. Yes, the line below may seem mysterious, but it’s the secret sauce:

        self.add_plot_text(plt)

Numbers are powerful, but annotations turn our chart into a conversation. Especially when our audience is tired and caffeinated.

Selected Properties

Stats in a Box

Sometimes, we want the plot to speak, specifically, to whisper something like:

_Hey, quadratic fit here._
_I’ve got an R²=0.928,_
_and a mean squared error of only 2.1._
_Just saying._

This adds a box for any text. For example, if we want to emphasize the result of the second-order polynomial, we can display its statistical properties easily.

Python: Matplotlib: Curve Fitting: Draw Plot

    def add_plot_text(self, plt) -> None:
        # Add statistics for quadratic fit (example)
        poly = Polynomial.fit(self.xs, self.ys, deg=2)
        mC = poly.convert().coef
        yp = poly(self.xs)
        SST = self.y_sq_deviation
        SSR = np.sum((self.ys - yp) ** 2)
        R_squared = 1 - (SSR / SST)
        MSE = SSR / (len(self.xs) - 3)  # deg=2 → 3 params (a, b, c)

        # Format equation and stats
        eqn = f"$y = {mC[0]:.2f} + {mC[1]:.2f}x + {mC[2]:.2f}x^2$"
        stats_text = (
            f"$R^2 = {R_squared:.3f}$\n"
            f"$MSE = {MSE:.2f}$\n"
        )

        # Place text on plot
        plt.text(
            0.05, 0.95, 
            f"{eqn}\n{stats_text}",
            transform=plt.gca().transAxes,
            ha='left', va='top',
            bbox=dict(facecolor='white', alpha=0.8)
        )

This turns our graph into a mini-report. When people ask, “So, how good is your model?”, we can just point at the chart smugly.

Execute The Process

The Grand Finale

Put it all together. This is the conductor cueing the orchestra. Then add the plot to the main process.

    def process(self) -> None:
        # Print Statistical Properties
        self.print_props_general()
        
        for order in [1, 2, 3]:
            case = PolynomialOrderAnalyzer(self.xs, self.ys, order)
            case.calc_props_matrix()
            case.calc_props_mse()
            case.calc_props_t_p_value()
            print()

        self.calc_plot_all()
        self.draw_plot()

If this method were a person, it would be your overachieving classmate, who not only finishes the assignment, but also builds a working demo with animated transitions.

The Result

Not Just a Chart, But a Conversation

Yes, it’s still a humble old scatter plot chart. But now it speaks. It tells us how well the curve fits, which model performs better, and whether our hypothesis has any statistical backbone.

Python: Matplotlib Result: Polynomial Prediction: Statistical Properties

Feedback

Got a different way to interpret the fit? Found a flaw in the logic? Or just want to argue that cubic fits are overkill 90% of the time? Let’s chat—data nerd to data nerd.

I welcome any other useful interpretation, or any feedback. If you think my visualization or interpretation is wrong, please let me know.


3: Visualizing the Invisible

Coefficients

When our regression results are feeling a little too abstract, like a shy statistician at a party, it’s time to give them a visual voice. Let’s plot the statistical properties of our polynomial coefficients, because staring at numbers in a table is so last century.

I actually do not have any grand idea for visualizing the statistical properties. But I think we can recap the coefficient results in a simple chart, so people can read our results easily.

We’re talking standard errors, t-values, and p-values. Yes, those awkward guests at every stats gathering, let’s shine a spotlight on them.

Saving the Statistical Goods

Before we can show off our coefficients’ credentials, we need to save them. Think of this step as grabbing their résumé before printing it on a billboard.

Python: Visualization: Curve Fitting: t-value, p-value

class PolynomialOrderAnalyzer:
    ...

    def calc_props_t_p_value(self) -> None:
        ...

        return {
            'SE'      :  SE,
            't-value' :  t_value,
            'p-value' :  p_value
        }

These values tell us how trustworthy each coefficient is. They’re like GPA scores for your polynomial terms.

  • Big t-values? Great!
  • Tiny p-values? Even better!

And no, there is no need to wrap the reported t_value in abs(). Negative coefficients deserve love too; the sign carries meaning.
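One wiring detail: for the bar charts below to find these numbers, process() has to keep the returned dictionary somewhere. A minimal sketch, assuming we stash each one in a self.stats dict keyed by polynomial order (the attribute name simply matches what plot_stats() reads below):

    def process(self) -> None:
        # Print Statistical Properties
        self.print_props_general()

        self.stats = {}
        for order in [1, 2, 3]:
            case = PolynomialOrderAnalyzer(self.xs, self.ys, order)
            case.calc_props_matrix()
            case.calc_props_mse()
            # Keep SE, t-value, p-value for the later bar plots.
            self.stats[order] = case.calc_props_t_p_value()
            print()

        self.calc_plot_all()
        self.draw_plot()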

Statistic Plot

Bar Charts to the Rescue

Numbers can mumble, but plots shout. Let’s pick one polynomial order (say, degree 2), and visualize how each coefficient performs.

Python: Visualization: Curve Fitting: Comparing Statistic Properties

    def plot_stats(self, plt) -> None:
        # Statistics bar plots
        order = 2
        stats = self.stats[order]
        labels = [f"β{i}" for i in range(order+1)]

        def stat_bar(ax, values, title, color):
            ax.bar(labels, values, color=color)
            ax.set_title(title)
            ax.grid(True, axis='y', linestyle='--', alpha=0.3)

        ax1 = plt.subplot(1, 3, 1)
        stat_bar(ax1, stats['SE'], "Standard Error (SE)", 'orange')

        ax2 = plt.subplot(1, 3, 2)
        stat_bar(ax2, stats['t-value'], "t-values", 'skyblue')

        ax3 = plt.subplot(1, 3, 3)
        stat_bar(ax3, stats['p-value'], "p-values", 'salmon')

        # log scale to better show small p-values
        ax3.set_yscale('log')

These plots reveal which coefficients are reliable, and which are just freeloaders pretending to help the model. The p-values in particular help us judge, whether a coefficient is significantly different from zero, or just statistically loitering.
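One small caveat from experience: with a very good fit, a p-value can underflow to exactly zero, and a log axis has no place to put a zero-height bar. A hedged workaround is to clip the values before plotting, something like:

        # Clip p-values so the log-scaled bar chart never receives exact zeros.
        safe_p = np.maximum(stats['p-value'], 1e-16)
        stat_bar(ax3, safe_p, "p-values", 'salmon')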

Main Plot Function

Drawing the Big Picture

Let’s blend everything together in the draw_plot() method, both the trend and the stats. Context is everything: without the trend line, those stats are just numbers in a vacuum.

Python: Visualization: Curve Fitting: Drawing Second Plot

    def draw_plot(self) -> None:
        # Plot 1: Trend + Equation
        plt.figure()
        self.plot_trend(plt)
        self.add_plot_text(plt)

        # Plot 2: Bar Plots
        plt.figure()
        self.plot_stats(plt)
        plt.show()

By combining the regression plot and its statistical underpinnings, we create a full narrative: “Here’s the curve, and here’s why we should believe it.”

Python: Matplotlib Result: Polynomial Prediction: Compare Statistical Properties

No, the chart doesn’t talk (yet), but it does whisper important truths: “This coefficient is legit,” or “This term might be nonsense.” And if our β₂ has a p-value that could legally be rounded to zero, that’s our cue to shout “Statistically significant!”, like a proud parent at graduation.

I’m still curious how to visualize this further, with better visual metaphors in future posts, though I have no concrete idea yet. For now, these bar plots help turn statistical smoke into readable signal.


What’s the Next Exciting Step 🤔?

Now that we’ve tamed the coefficients, what’s next on this statistical safari?

Well, we’ve plotted the trend. We’ve grilled the coefficients for their p-values. We’ve even thrown their t-values onto a public chart for all to see. Now it’s time to turn our eyes toward another shady character in statistics:

📈 The Distribution Curve.

If our statistical properties are the “what,” the distribution plot is the “why.” It helps us see the shape of the data that justifies those values. It’s like finally meeting the parents after dating the regression line for a while.

Understanding the distribution helps us validate assumptions. Are the residuals normal? Are we modeling randomness or just noise with a fancy hat?

But before we go full Gauss, we’ll need to brush up on the basics of drawing distribution plots. So we can later connect them with the statistical interpretations we’ve just made.

Curious how this unfolds? Continue your exploration in 👉 [ Trend - Visualizing Distribution ]. Because even regression models deserve to know where they come from.