Preface

Goal: Visualize the interpretation of statistical properties, using Python matplotlib.

We are ready to put our statistical properties Python helper to more use, by giving an interpretation of those properties in visual form. I believe we can read statistical charts better if we understand how to draw them.

I welcome any other useful interpretation, or any feedback. If you think my visualization or interpretation is wrong, please let me know.


Visualizing Interpretation

We can use matplotlib to visualize the interpretation of statistical properties. Of course not everything can be visualized. Some properties are just a number, with no need to be visualized at all.

You have seen some of the plots below in the previous article. This article shows how to make those plots. If you think my interpretation or calculation is wrong, I welcome any better opinion.

Skeleton

Let’s use our previous Properties.py helper.

Instead of hardcoded data, we can set up the source data in a CSV file.

A plot that uses the helper above has this skeleton:

import matplotlib.pyplot as plt

# Local Library
from Properties import get_properties, display

properties = get_properties("50-samples.csv")
display(properties)
locals().update(properties)

def plot() -> int:
  ...

  return 0

if __name__ == "__main__":
  raise SystemExit(plot())

The script returns a zero exit code if everything goes well.

Python: Visual Plot: Statistical Properties: Skeleton

We can reuse this pattern for all our visualizations.
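The Properties.py helper comes from the previous article and is not reproduced here. As a rough stand-in, assuming it returns a dictionary whose keys match the names used in the plots below, a minimal sketch could look like this. The function name properties_from_arrays and the choice of sample statistics (dividing by n - 1) are assumptions made for illustration, not the original helper's API.

```python
import numpy as np

def properties_from_arrays(x_observed, y_observed) -> dict:
    """Hypothetical stand-in for get_properties(), fed with arrays instead of a CSV."""
    x = np.asarray(x_observed, dtype=float)
    y = np.asarray(y_observed, dtype=float)
    n = len(x)

    x_mean, y_mean = x.mean(), y.mean()

    # Assumption: sample statistics (divide by n - 1)
    x_std_dev = x.std(ddof=1)
    y_std_dev = y.std(ddof=1)
    xy_covariance = np.sum((x - x_mean) * (y - y_mean)) / (n - 1)

    # Least-squares slope and intercept
    m_slope = xy_covariance / x.var(ddof=1)
    b_intercept = y_mean - m_slope * x_mean

    return {
        'x_observed': x, 'y_observed': y, 'n': n,
        'x_mean': x_mean, 'y_mean': y_mean,
        'x_std_dev': x_std_dev, 'y_std_dev': y_std_dev,
        'xy_covariance': xy_covariance,
        'm_slope': m_slope, 'b_intercept': b_intercept,
        'y_fit': m_slope * x + b_intercept,
    }

# Made-up data lying exactly on y = 2x + 1
properties = properties_from_arrays([0, 1, 2, 3], [1, 3, 5, 7])
print(properties['m_slope'], properties['b_intercept'])
```

Note that locals().update(properties) in the skeleton works because it runs at module level, where locals() is the module namespace. Inside a function, reading values explicitly from the dictionary is the safer choice.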


Basic Data Series

We can start with a basic data series, using scatter to plot the data points.

def plot() -> int:
  plt.figure(figsize=(10, 6))

  # Plot the data series
  plt.scatter(x_observed, y_observed, color='blue',
    s=100, label='Data Points')

Python: Visualization: Statistical Properties: Data Points

Then draw the mean as a horizontal line, and also the vertical lines that show the deviation from the mean for each observed x.

  # Plot deviation from mean
  plt.axhline(y=y_mean, color='orange',
    linestyle='--',  label='Mean of y')
  plt.vlines(x_observed, y_observed, y_mean,
    linestyle='--', color='teal',
    label='Deviation from Mean (y)')

Python: Visualization: Statistical Properties: Deviation from Mean
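Each dashed vertical line runs from an observed y to the mean, so its signed length is the deviation of that point from the mean. A small sketch with made-up numbers (not the article's CSV) shows what is being drawn. The deviations always sum to zero, which is why the lines above and below the mean balance out.

```python
from statistics import fmean

def deviations_from_mean(values):
    """Return the mean and the signed deviation of every value from it."""
    mean = fmean(values)
    return mean, [v - mean for v in values]

# Made-up sample, only for illustration
y_sample = [2.0, 4.0, 9.0]
y_mean, y_dev = deviations_from_mean(y_sample)

print(y_mean)  # 5.0
print(y_dev)   # [-3.0, -1.0, 4.0]
```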

In every chart, we define the decoration, such as labels, legend, title and so on. Then we show the plot.

def plot() -> int:
  ...

  # Chart Decoration
  plt.title('Mean and Deviation')
  plt.xlabel('x')
  plt.ylabel('y')
  plt.legend()
  plt.grid(True)

  plt.show()

  return 0

Python: Visualization: Statistical Properties: Decoration

And then we can plot the data points and the mean (average), along with the (yᵢ - ȳ) interpretation.

Python: Plot Visualization: Statistical Properties: Data Points

Plotting is pretty simple, right?
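Putting the three fragments together, a self-contained version could look like the sketch below. It uses made-up inline data instead of the CSV and the Properties helper, selects the non-interactive Agg backend so it also runs without a display, and returns the figure instead of calling plt.show(). The function name build_mean_deviation_figure is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window is opened
import matplotlib.pyplot as plt
from statistics import fmean

def build_mean_deviation_figure(x_observed, y_observed):
    """Scatter the data, draw the mean of y, and the deviation of each point."""
    y_mean = fmean(y_observed)
    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot the data series
    ax.scatter(x_observed, y_observed, color='blue',
               s=100, label='Data Points')

    # Plot deviation from mean
    ax.axhline(y=y_mean, color='orange',
               linestyle='--', label='Mean of y')
    ax.vlines(x_observed, y_observed, y_mean,
              linestyles='--', colors='teal',
              label='Deviation from Mean (y)')

    # Chart decoration
    ax.set_title('Mean and Deviation')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True)
    return fig

# Made-up data, only for illustration
fig = build_mean_deviation_figure([1, 2, 3, 4], [2.0, 4.0, 9.0, 5.0])
```

The returned figure can then be written to disk with fig.savefig, or shown interactively by dropping the Agg line and calling plt.show().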

Interactive JupyterLab

You can obtain the interactive JupyterLab from the following link:


Standard Deviation

Let’s continue with the interpretation of standard deviation relative to the mean. This is the interpretation when there is no information about linear regression at all.

I added my own color palette to enhance the color output. This palette uses Google Material colors.

blueScale = {
  0: '#E3F2FD', 1: '#BBDEFB', 2: '#90CAF9',
  3: '#64B5F6', 4: '#42A5F5', 5: '#2196F3',
  6: '#1E88E5', 7: '#1976D2', 8: '#1565C0',
  9: '#0D47A1'
}

Python: Visualization: Statistical Properties: Google Material Color

Now we can draw the previous plot, but with nicer colors:

  # Plot the data series
  plt.scatter(x_observed, y_observed,
    color=blueScale[9], s=100, zorder=5,
    label='Data Points')

  # Plot deviation from mean
  plt.axhline(y=y_mean, color=blueScale[7],
    linestyle='--',  label='Mean of y')
  plt.vlines(x_observed, y_observed, y_mean,
    linestyle='--', color=blueScale[5],
    label='Deviation from Mean (y)')

Python: Visualization: Statistical Properties: Coloring

Then append a shaded region to draw the standard deviation, and print the covariance as text on the chart.

  # Plot shaded region for standard deviation
  plt.fill_between(x_observed,
    y_mean - y_std_dev, y_mean + y_std_dev,
    color=blueScale[1], alpha=0.3, zorder=1,
    label='Standard Deviation')

  # Plot covariance
  plt.text(x_mean, max(y_observed),
    f'Covariance: {xy_covariance:.2f}',
    fontsize=12, color=blueScale[9])

Python: Visualization: Statistical Properties: Fill Between: Standard Deviation
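The covariance printed on the chart comes from the helper. As a reminder of what that number means, here is a minimal sketch that assumes the sample covariance (dividing by n - 1). The original helper may use the population form instead, and the data is made up.

```python
from statistics import fmean

def sample_covariance(xs, ys):
    """Sample covariance: sum of paired deviation products over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two series of equal length, at least 2 points')
    x_mean, y_mean = fmean(xs), fmean(ys)
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Made-up data: y rises together with x, so the covariance is positive
xy_cov = sample_covariance([1, 2, 3, 4], [2, 4, 6, 8])
print(f'Covariance: {xy_cov:.2f}')  # Covariance: 3.33
```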

Now we can plot the interpretation of standard deviation relative to the mean.

Python: Plot Visualization: Statistical Properties: Standard Deviation

With a simple touch, the plot already looks better, right?
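The shaded band covers the mean plus and minus one standard deviation. For roughly bell-shaped data, about two thirds of the points are expected to fall inside such a band. A quick way to check this for any data series is sketched below, assuming the sample standard deviation and using made-up numbers.

```python
from statistics import fmean, stdev

def share_within_one_std(values):
    """Fraction of values inside [mean - s, mean + s], s = sample std dev."""
    mean = fmean(values)
    s = stdev(values)
    inside = [v for v in values if mean - s <= v <= mean + s]
    return len(inside) / len(values)

# Made-up sample: mean is 5, sample standard deviation is about 2.14
y_sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(share_within_one_std(y_sample))  # 0.75
```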


Mean and Standard Deviation

The chart above is not the only representation. There is another, simpler interpretation. It is not very pretty and looks naive, but it is clear.

First, plot the data points, and also both means as a vertical line (mean of x) and a horizontal line (mean of y).

  plt.scatter(x_observed, y_observed,
    color='blue', label='Data Points')

  plt.axvline(x=x_mean, color='green',
    linestyle='--', label='Mean of x')
  plt.axhline(y=y_mean, color='orange',
    linestyle='--', label='Mean of y')

Python: Visualization: Statistical Properties: Mean as Axis X and Y

Then we can plot the standard deviation of both x and y as error bars centered on the means.

  plt.errorbar(x_mean, y_mean,
    xerr=x_std_dev, yerr=y_std_dev,
    fmt='o', color='purple',
    label='Standard Deviation')

Python: Visualization: Statistical Properties: Errorbar: Standard Deviation

Now we can naively plot the interpretation of both mean and standard deviation in the middle of the chart.

Python: Plot Visualization: Statistical Properties: Mean as Axis

Not very useful visually, but now we know that this kind of interpretation exists.


Linear Regression

Next, we will plot the linear regression based on our calculated least squares fit.

  # Plot the data and regression line
  plt.scatter(x_observed, y_observed,
    color=tealScale[9], label='Data Points')
  plt.plot(x_observed, y_fit,
    color=tealScale[5], label='Regression Line')

Python: Visualization: Statistical Properties: Linear Regression

I use the teal color scale from the Google Material colors.

tealScale = {
  0: '#E0F2F1', 1: '#B2DFDB', 2: '#80CBC4',
  3: '#4DB6AC', 4: '#26A69A', 5: '#009688',
  6: '#00897B', 7: '#00796B', 8: '#00695C',
  9: '#004D40'
}

No interpretation this time, just a plain chart:

  • Observed y, and
  • Predicted ŷ = fit(x) as a line.

Python: Plot Visualization: Statistical Properties: Linear Regression
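The y_fit values come from the helper of the previous article. As a reminder of how such a line can be obtained, here is a minimal sketch of an ordinary least squares fit in plain Python. The function name least_squares_line and the data are made up for illustration.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return slope and intercept of the least-squares line y = m*x + b."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m_slope = s_xy / s_xx
    b_intercept = y_mean - m_slope * x_mean
    return m_slope, b_intercept

# Made-up data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m_slope, b_intercept = least_squares_line(xs, ys)

# Predicted values, the line drawn by plt.plot
y_fit = [m_slope * x + b_intercept for x in xs]
print(m_slope, b_intercept)  # 2.0 1.0
```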

Very simple. No Comment.


Residual

How about the error (ϵ)? Of course we can draw it using vlines, from each observed y to its predicted value on the regression line.

  # Plot the data and regression line
  plt.scatter(x_observed, y_observed,
    color=blueScale[9], label='Data Points')
  plt.plot(x_observed, y_fit,
    color=blueScale[5], label='Regression Line')

  # Plot residual errors
  plt.vlines(x_observed, y_observed, y_fit,
    linestyle='--', color=blueScale[3],
    label='Residual')

Python: Visualization: Statistical Properties: Residual Error

The interpretation of the residual, or error (ϵ), is as simple as shown in the plot below:

Python: Plot Visualization: Statistical Properties: Residual Error

Simple, but enough to interpret the (yᵢ - ŷ) difference.
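In numbers, each dashed line is one residual: the observed value minus the predicted value at the same x. The sketch below, with made-up data and a hypothetical function name, fits a least-squares line and returns those residuals. For a fit with an intercept they sum to zero, up to floating point rounding.

```python
from statistics import fmean

def residuals(xs, ys):
    """Fit y = m*x + b by least squares and return observed minus predicted."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Made-up data, only for illustration
errors = residuals([1, 2, 3], [2, 2, 4])
print([round(e, 3) for e in errors])  # [0.333, -0.667, 0.333]
```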


Standard Deviation

How about the interpretation of standard deviation relative to the predicted values of the regression line?

First, we need to draw our data series and the regression line.

  # Plot the data and regression line
  plt.scatter(x_observed, y_observed,
    color=tealScale[9], label='Data Points')
  plt.plot(x_observed, y_fit,
    color=tealScale[5], label='Regression Line')

Then plot the standard deviation, both above and below the fitted trend line.

  plt.plot(x_observed, y_fit + y_std_dev,
    c=tealScale[1], linestyle='--')
  plt.plot(x_observed, y_fit - y_std_dev,
    c=tealScale[1], linestyle='--',
    label='Regression ± Standard Deviation')

Python: Plot Visualization: Statistical Properties: Standard Deviation

Then we fill a shaded region, between upper and lower bounds:

  plt.fill_between(x_observed,
    y_fit - y_std_dev, y_fit + y_std_dev,
    color=tealScale[1], alpha=0.3,
    label='Standard Deviation')

Python: Plot Visualization: Statistical Properties: Shaded Region
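One practical caveat: plt.plot and plt.fill_between connect points in the order they are given. If the rows of the CSV are not ordered by x, the band may fold over itself. A hedged sketch, assuming NumPy arrays as in the rest of this article, sorts every series by x first. The helper name sort_by_x is hypothetical.

```python
import numpy as np

def sort_by_x(x_observed, *series):
    """Return x and every accompanying series reordered by ascending x."""
    x = np.asarray(x_observed, dtype=float)
    order = np.argsort(x, kind='stable')
    return (x[order], *(np.asarray(s, dtype=float)[order] for s in series))

# Made-up, deliberately unordered data
x_sorted, y_fit_sorted = sort_by_x([3, 1, 2], [7, 3, 5])
print(x_sorted)      # [1. 2. 3.]
print(y_fit_sorted)  # [3. 5. 7.]
```

The sorted arrays can then be passed to plt.plot and plt.fill_between in place of x_observed and y_fit.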

With the settings above, we can plot the interpretation of standard deviation relative to the fitted trend line.

Python: Plot Visualization: Statistical Properties: Standard Deviation

This should be pretty cool.


Standard Error with Level of Confidence

It is basically the same as the shaded region of the standard deviation, but instead of using just the standard error, we can scale it into a confidence interval.

This confidence interval can be estimated from the residuals of the OLS (ordinary least squares) fit. For example, let’s use a 95% confidence level, which corresponds to a critical value of approximately 1.96.

import numpy as np

def get_CI() -> float:
  # Regression line and residuals (errors)
  y_fit = m_slope * x_observed + b_intercept
  y_err = y_observed - y_fit

  # Variance of residuals (MSE), with n - 2
  # degrees of freedom (slope and intercept)
  var_residuals = np.sum(y_err ** 2) / (n - 2)

  # Residual standard error
  SE = np.sqrt(var_residuals)

  # Half-width of the band at 95% confidence
  # (1.96 under a normal approximation)
  return 1.96 * SE

Python: Plot Visualization: Statistical Properties: Standard Error
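The factor 1.96 is the two-sided 95% critical value of the standard normal distribution. As a small sketch using only the standard library, it can be derived instead of hardcoded. The function names and the numbers below are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def residual_standard_error(y_observed, y_fit) -> float:
    """sqrt(SSE / (n - 2)), the same n - 2 divisor as in get_CI()."""
    n = len(y_observed)
    sse = sum((y - f) ** 2 for y, f in zip(y_observed, y_fit))
    return sqrt(sse / (n - 2))

print(round(z_value(0.95), 2))  # 1.96

# Made-up observed and fitted values: SSE = 2, n - 2 = 2, so SE = 1.0
se = residual_standard_error([1, 3, 2, 4], [1, 2, 3, 4])
margin = z_value(0.95) * se
```

For small samples, a Student's t critical value with n - 2 degrees of freedom would be somewhat larger than 1.96.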

Then use this calculation to fill the shaded region.

  # Fill between upper and lower bounds
  CI = get_CI()
  plt.fill_between(x_observed,
    y_fit - CI, y_fit + CI,
    color=tealScale[1], alpha=0.3,
    label='Standard Error')

Python: Plot Visualization: Statistical Properties: Shaded Region

With the calculation of the confidence interval above, we can plot the interpretation of the standard error around the fitted trend line.

Python: Plot Visualization: Statistical Properties: Standard Error
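Note that the band above has the same width everywhere. In textbook simple linear regression, the confidence interval for the fitted mean is narrowest at the mean of x and widens toward the edges. The sketch below shows that alternative; it is not the calculation used in this article. It keeps the 1.96 normal approximation, and the name mean_response_margin and the data are hypothetical.

```python
import numpy as np

def mean_response_margin(x_observed, y_observed, z: float = 1.96):
    """Return y_fit and the x-dependent half-width of the confidence band."""
    x = np.asarray(x_observed, dtype=float)
    y = np.asarray(y_observed, dtype=float)
    n = len(x)

    # Least-squares fit
    x_mean = x.mean()
    s_xx = np.sum((x - x_mean) ** 2)
    m_slope = np.sum((x - x_mean) * (y - y.mean())) / s_xx
    b_intercept = y.mean() - m_slope * x_mean
    y_fit = m_slope * x + b_intercept

    # Residual standard error with n - 2 degrees of freedom
    s = np.sqrt(np.sum((y - y_fit) ** 2) / (n - 2))

    # Standard error of the fitted mean at each x, scaled by z
    margin = z * s * np.sqrt(1 / n + (x - x_mean) ** 2 / s_xx)
    return y_fit, margin

# Made-up data, only for illustration
y_fit, margin = mean_response_margin(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

The result can be drawn with plt.fill_between, using y_fit - margin and y_fit + margin as the lower and upper bounds.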

Ultimately, the choice of plot depends on the specific interpretation and communication goals of your analysis. You should consult a statistician to get the most valid visual interpretation.


Other Tools: Seaborn

Matplotlib is not the only tool. There is also seaborn, which renders really cool graphics.

Add seaborn as sns to the import clause.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Getting Matrix Values
pairCSV = np.genfromtxt("50-samples.csv",
  skip_header=1, delimiter=",", dtype=float)

# Extract x and y values from CSV data
x_observed = pairCSV[:, 0]
y_observed = pairCSV[:, 1]

And then this cool one-liner, sns.regplot, does the rest.

# Scatter plot with regression line
plt.figure(figsize=(8, 6))
sns.regplot(x=x_observed, y=y_observed)
plt.title('Scatter Plot with Regression Line')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()

Python: Plot Visualization: Seaborn: Regression Line

And you will get the plot instantly.

Python: Plot Visualization: Seaborn: Regression Line

At the time of writing this article, I don’t know how seaborn calculates the curve, so I will not explore it further here.

All I know is that this plot is cool, and it has a pretty color palette too.
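For readers who want a starting point: according to the seaborn documentation, regplot fits a first-order (linear) regression by default and shades a bootstrapped 95% confidence interval around it. If that holds for your seaborn version, the drawn line should be close to an ordinary least squares fit, which can be reproduced with np.polyfit as a cross-check. The data below is made up.

```python
import numpy as np

# Made-up data, only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 polynomial fit: coefficients come highest power first
m_slope, b_intercept = np.polyfit(x, y, deg=1)

# Values on the fitted line, comparable to the line regplot draws
y_fit = np.polyval([m_slope, b_intercept], x)

print(f'slope={m_slope:.2f}, intercept={b_intercept:.2f}')
# slope=1.99, intercept=0.05
```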

At Last

I welcome any other useful interpretation, or any feedback.


What’s the Next Exciting Step 🤔?

Since we also need to visualize the interpretation of statistical properties against a distribution curve, we first need to learn the basics of making a distribution plot.

Consider continuing your exploration by reading [ Trend - Visualizing Distribution ].