Where to Discuss?

Local Group

Preface

Goal: Exploring additional statistical properties not related to trend.

While trends often hog the spotlight in data analysis (they’re the drama queens of statistics), some lesser-known side characters, like variance, standard deviation, and friends, play quietly in the background and actually hold the story together.

In the context of trends, we’ll call these additional properties. They may not tell you where the data is going, but they do whisper whether it’s marching in a straight line or chaotically dancing like it’s data prom night.

And here’s why it matters: before we strap our statistical properties onto a machine-learning rocket and launch it into our app, we should first test them with a solid mathematical model. Because if our code outputs nonsense, the issue might not be our coding, it could be that our math is drunk.

We can model these properties in a spreadsheet. Yes, Excel isn’t just for budget planning or tracking who borrowed your flash drive in 2012. But before that, let’s make sure we understand the math that powers the formulas. Trust me: understanding the math is like reading the manual before assembling IKEA furniture. Sure, it’s optional. But do you really want three leftover screws?

Stay tuned, this isn’t just academic. These properties will sneak their way into nearly every corner of statistical analysis, from A/B testing to anomaly detection. Ignore them at our peril (or worse, our boss’s angry emails).


Equation

The trend might be the star of the show, but these supporting characters often steal the scene. Now that we’ve left the trend spotlight, it’s time to meet the less glamorous, but just as critical, statistical properties. Think of them as the data janitors, quiet, consistent, and the reason the whole system doesn’t fall apart.

Let’s start laying down the symbolic groundwork. Yes, it’s math time. But the kind of math that makes you look really smart in meetings. We begin with the equation symbols as our base of calculation.

Min, Max, Range

The Bounds of Data Dignity

Obvious, yes. But still foundational. Like socks: they don’t get attention unless they’re missing.

Set-theoretically speaking, the max function can be expressed in set notation:

max(x) = m, where m ∈ {x₁, x₂, …, xₙ} and xᵢ ≤ m for all i

Pretty self-explanatory.

Range gives us the first glance at data spread. A huge range might signal outliers, or just a sensor gone rogue.
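NumPy even has a one-call shortcut for the range. A tiny sketch with made-up numbers (np.ptp is short for “peak to peak”):

```python
import numpy as np

# made-up sample, just to show the idea
data = np.array([3, 7, 2, 41])

# range = max - min; np.ptp ("peak to peak") computes it in one call
print(np.max(data), np.min(data), np.ptp(data))  # 41 2 39
```

That 41 sitting far from the pack is exactly the kind of “sensor gone rogue” a huge range hints at.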

Median

The Middle Manager of Our Data

No favoritism here, the median doesn’t care about how wild the numbers are, just who’s in the middle. With the data sorted in ascending order, the median can be described as below:

median(x) = x[(n+1)/2] when n is odd
median(x) = ( x[n/2] + x[n/2 + 1] ) / 2 when n is even

Median resists drama. It stays cool even if our data includes a billionaire or a zero-dollar bank account.
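A tiny sketch with made-up salary numbers shows how the median shrugs off that billionaire:

```python
import statistics

# made-up salaries, plus one billionaire crashing the party
salaries = [30, 35, 40, 45, 1_000_000]

print(statistics.mean(salaries))    # 200030, dragged into the stratosphere
print(statistics.median(salaries))  # 40, completely unimpressed
```

One extreme value yanks the mean far from where most of the data lives, while the median stays put.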

Mode

Popularity Contest Winner

The most frequently appearing value. The celebrity of the dataset.

We need three steps to find the mode. First we have to find the frequencies, then find the maximum frequency, and finally get the value (or values) with that maximum frequency.

To find the frequency f(v) of each unique value v in a dataset, we use I(), the indicator function, which equals 1 if the condition is true and 0 otherwise.

Step 1: Count how many times each value shows up.

f(v) = Σ I(xᵢ = v), summed over i = 1, …, n

Step 2: Find the biggest fanbase.

Then we find the maximum frequency:

f_max = max( f(v) ), taken over all unique values v

Or in set notation:

f_max = max{ f(v) : v ∈ {x₁, x₂, …, xₙ} }

Final step: identify the actual winner(s).

Mathematically, we would find the mode value using this equation:

mode(x) = { v : f(v) = f_max }

Or if you’re a fan of compact “math-as-a-one-liner” expressions:

mode(x) = argmax over v of Σ I(xᵢ = v)

If our dataset were a party, the mode is the person everyone’s talking to. Important for understanding dominant categories.
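The three steps above translate almost word for word into Python. A minimal sketch with made-up numbers (find_modes is just an illustrative name):

```python
from collections import Counter

def find_modes(data):
  # Step 1: count the frequency of each unique value
  freq = Counter(data)
  # Step 2: find the maximum frequency
  f_max = max(freq.values())
  # Step 3: collect every value that reaches that frequency
  return [v for v, f in freq.items() if f == f_max]

print(find_modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```

Note that this returns every winner when there is a tie, which is exactly what the set-notation equation promises.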

SEM (Standard Error of the Mean)

The Nervousness of the Mean

Think of the Standard Error of the Mean (SEM) as our data’s social anxiety. It tells us how much the mean might fluctuate with different samples.

The equation for SEM is:

SEM = s / √n

where s is the sample standard deviation and n is the number of data points.

Smaller SEM means more confidence in our average. Larger SEM? Time to re-evaluate our sampling or our life choices.
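As a sanity check on the formula, a minimal Python sketch (calc_sem is an illustrative name; it uses the sample standard deviation, ddof=1):

```python
import numpy as np

def calc_sem(data):
  # SEM = s / sqrt(n), with s the sample standard deviation (ddof=1)
  data = np.asarray(data, dtype=float)
  return np.std(data, ddof=1) / np.sqrt(len(data))

print(round(calc_sem([2, 4, 4, 4, 5, 5, 7, 9]), 4))  # 0.7559
```

More data points shrink the denominator’s effect, which is why larger samples make the mean less nervous.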

Kurtosis and Skewness

The Drama Analysts

These two detect asymmetry and “peakedness.” Skewness checks if our data is leaning left or right. Kurtosis checks if it’s just chill or obsessively spiky.

The standard error equations for both kurtosis and skewness can be tricky, and can differ from one reference to another.

These moments (third and fourth) help assess normality. If skewness or kurtosis are off the charts, so is our assumption of normal distribution.

Standard Error of Kurtosis and Skewness

Measuring Your Measurement’s Wobble

Here’s where things get spicy. There’s more than one way to calculate these.

Simplified approximations:

The simplified approximations of the standard errors can be expressed as below:

SE_skewness ≈ √(6 / n)
SE_kurtosis ≈ √(24 / n)

Fancy-pants formulas:

While the complex standard error equations can be described as follows:

SE_skewness = √( 6n(n - 1) / ((n - 2)(n + 1)(n + 3)) )
SE_kurtosis = √( 24n(n - 2)(n - 3) / ((n + 1)(n + 3)(n - 1)²) )

From StackOverflow

From StackOverflow I found out that, across different software, the equation for the standard error of gaussian kurtosis can be expressed as follows:

SE_kurtosis = √( 4(n² - 1) · SE_skewness² / ((n - 3)(n + 5)) )

Thanks to Howard Seltman, whose R script taught my spreadsheet to do calculus after midnight: I’m simply copying and pasting the code from here:

Note that the equation above only applies to data that follows a gaussian distribution.

Standard errors help us understand whether our skewness or kurtosis values are meaningful, or just statistical noise dressed in fancy math.


Spreadsheet Sorcery

Statistics at Your Fingertips

A brief tale of numbers, built-in functions, and spreadsheet wizardry.

I have already prepared the built-in formulas for these statistical properties.

Worksheet Source

The Magical Artefact

🧙 “Behold, the Spreadsheet of Secrets!”

The Excel file has been forged and uploaded. Tinker, test, or tear it apart. We’re holding a fully functional statistical toolbox:

Min, Max, Range

The Big Three

We begin with the three musketeers of statistical bounds: minimum, maximum, and range. They help define the battlefield, telling us how far our data stretches. Range might not be sophisticated, but it’s honest and loud.

The formulas are also obvious.

Trend: Worksheet: Min, Max, Range

These simple formulas can be summarized as follows:

properties  formula
x min       =MIN(x_sample)
y min       =MIN(y_sample)
x max       =MAX(x_sample)
y max       =MAX(y_sample)
x range     =MAX(x_sample)-MIN(x_sample)
y range     =MAX(y_sample)-MIN(y_sample)

If our range is zero, either our data is broken, or we’ve got a perfect flatline. Call a doctor or a mathematician.

Median and Mode

The Democratic Center and the Popular Vote

When mean feels too mainstream.

Median resists outliers like a grumpy old professor ignoring student trends. Mode, on the other hand, is all about popularity, if it exists.

But here’s the catch: for datasets with all unique values, mode politely declines to show up, giving us a #VALUE! error instead. Think of it as statistics saying “I don’t do groupies.”

We can use the built-in MEDIAN() formula. But we should be careful when using the MODE() formula. For example, for a completely unique data series, the result should be no mode, because every value has a frequency of 1. Then our beloved spreadsheet is right to give the #VALUE! error.

To fix this, we make a new helper column that calculates each value’s frequency (for example, =COUNTIF(y_sample, B2) in each row, where B2 is that row’s sample value; adjust to your own layout). Let’s name the resulting ranges x_freq and y_freq, and we can use our Excel/Calc expertise to craft this formula:

=INDEX(y_sample, MATCH(MAX(y_freq), y_freq, 0))

Voilà! Manual mode detection, spreadsheet-style.

The summary can be described as follows:

Trend: Worksheet: Median and Mode

properties           formula
x median             =MEDIAN(x_sample)
y median             =MEDIAN(y_sample)
x mode               =MODE(x_sample)
y mode               =MODE(y_sample)
x mode (alternative) =INDEX(x_sample, MATCH(MAX(x_freq), x_freq, 0))
y mode (alternative) =INDEX(y_sample, MATCH(MAX(y_freq), y_freq, 0))
x SE Mean            =STDEV.S(x_sample) / SQRT(COUNT(x_sample))
y SE Mean            =STDEV.S(y_sample) / SQRT(COUNT(y_sample))

Kurtosis and Skewness

The Drama Queens of Distribution

When our data has a mood.

Skewness shows asymmetry, how lopsided your data is. Kurtosis checks whether our data is heavy-tailed (peaky) or chill (flat). It’s like asking: is our dataset more like a volcano, a pancake, or somewhere in between?

Excel gives us KURT() and SKEW() to play with. But for the error bars (standard error), we will have to roll up our sleeves, since there is no built-in formula for standard error. With the equations above, we can craft our SE formulas as follows:

Trend: Worksheet: Kurtosis and Skewness

properties              formula
x kurtosis              =KURT(x_sample)
y kurtosis              =KURT(y_sample)
x skewness              =SKEW(x_sample)
y skewness              =SKEW(y_sample)
n                       =COUNT(x_sample)
SE kurtosis             =SQRT((24*n*(n-2)*(n-3))/((n+1)*(n+3)*(n-1)^2))
SE skewness             =SQRT((6*n*(n-1))/((n-2)*(n+1)*(n+3)))
SE kurtosis (gaussian)  =SQRT((4*(n^2-1)*SE_s^2)/((n-3)*(n+5)))

When Excel doesn’t have the function we need, we become the function. It’s like statistical DIY, but with more square roots and fewer splinters.

It is fun to play with these two properties once you understand the concept. We will also give you the visualization later in this article.

A Word About Outliers

We love them, we fear them.

Outliers are the eccentrics of the dataset, possibly geniuses, possibly errors. We will give them their own stage in another article, because like all rebels, they deserve their spotlight.


Python Tools

Once we’ve got the average, the real fun begins.

Welcome to the second half of our statistical makeover session, this time with Python as our glamorous assistant. We’re diving into those less-talked-about but absolutely vital properties: median, mode, range, skewness, and kurtosis. Think of them as the supporting actors who steal the show, if the mean ever calls in sick.

Python Source

You don’t have to reinvent the bell curve, just clone the script:

Data Source

No hardcoding here. We like our data fresh and CSV-flavored:

Min, Max, Range

Like in any good soap opera, it’s all about the highs and lows, and the emotional distance in between.

import numpy as np

# Getting Matrix Values
pairCSV = np.genfromtxt("50-samples.csv",
  skip_header=1, delimiter=",", dtype=int)

# Extract x and y values from CSV data
x_observed = pairCSV[:, 0]
y_observed = pairCSV[:, 1]

# Number of data points
n = len(x_observed)

Python: Additional Statistical Properties: Data Series

We can calculate the maximum, minimum, and range:

# Calculate maximum, minimum, and range
x_max = np.max(x_observed)
x_min = np.min(x_observed)
x_range = x_max - x_min

y_max = np.max(y_observed)
y_min = np.min(y_observed)
y_range = y_max - y_min

# Output of maximum, minimum, and range
print('x (min, max, range) = '
 + f'({x_min:7.2f}, {x_max:7.2f}, {x_range:7.2f} )')
print('y (min, max, range) = '
 + f'({y_min:7.2f}, {y_max:7.2f}, {y_range:7.2f} )')
print()

Python: Additional Statistical Properties: Min, Max, Range

Min, max, and range help us understand the battlefield. How wide is the spread? Are we talking table tennis or intergalactic warfare?

Median and Mode

If the mean is a people-pleaser, the median is the unbothered introvert, and the mode is… the popular kid who may or may not exist.

We can find the mode using the statistics library.

import statistics

x_mode = statistics.mode(x_observed)
y_mode = statistics.mode(y_observed)

But Python isn’t always diplomatic. If there’s no repeating value, statistics.mode() might throw a tantrum (in Python versions before 3.8 it raises an error; newer versions simply return the first value). So it pays to know the DIY route, starting with the median.

We can implement the median equation above as follows:

def calc_median(data: np.ndarray) -> float:
  # Sort the data
  sorted_data = np.sort(data)

  # Calculate the median
  n = len(sorted_data)
  if n % 2 == 1:
    # If odd number of data points
    median = sorted_data[n // 2]
  else:
    # If even number of data points
    median = (sorted_data[n // 2 - 1] \
           +  sorted_data[n // 2]) / 2
  
  return median

Python: Additional Statistical Properties: Median Calculation

And display.

# Calculate additional properties
x_median = calc_median(x_observed)
y_median = calc_median(y_observed)

x_mode = statistics.mode(x_observed)
y_mode = statistics.mode(y_observed)

# Output of additional properties
print(f'x median       = {x_median:9.2f}')
print(f'y median       = {y_median:9.2f}')
print(f'x mode         = {x_mode:9.2f}')
print(f'y mode         = {y_mode:9.2f}')
print()

Python: Additional Statistical Properties: Median and Mode

Pro tip for rebels: we can also utilize this one-liner:

y_mode = np.argmax(np.bincount(y_observed))

This bypasses statistics.mode()’s sensitivities. You’re welcome.

Median is robust against outliers. Mode? Great for categorical or oddly-behaved data. Know our tools before we wield them.
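If np.bincount feels too picky (it only digests non-negative integers), here is an alternative sketch using np.unique that also handles floats and negatives (mode_via_unique is an illustrative name; ties go to the smallest value, since np.unique sorts):

```python
import numpy as np

def mode_via_unique(data):
  # np.unique returns the sorted unique values and, optionally, their counts
  values, counts = np.unique(data, return_counts=True)
  # pick the value whose count is the largest
  return values[np.argmax(counts)]

print(mode_via_unique([1.5, 2.0, 2.0, -3.0]))  # 2.0
```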

Kurtosis and Skewness

Because data can be weird, lopsided, and prone to drama. Enter scipy.stats to calculate kurtosis and skewness.

from scipy.stats import kurtosis, skew

# Calculate kurtosis and skewness
x_kurtosis = kurtosis(x_observed, bias=False)
y_kurtosis = kurtosis(y_observed, bias=False)

x_skewness = skew(x_observed, bias=False)
y_skewness = skew(y_observed, bias=False)

print(f'x kurtosis     = {x_kurtosis:9.2f}')
print(f'y kurtosis     = {y_kurtosis:9.2f}')
print(f'x skewness     = {x_skewness:9.2f}')
print(f'y skewness     = {y_skewness:9.2f}')
print()

Python: Additional Statistical Properties: Kurtosis and Skewness

Skewness tells us if our data leans left, right, or politically neutral. Kurtosis tells us whether the tails are drama queens or chill.
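A quick sanity check of the sign convention, with made-up numbers:

```python
import numpy as np
from scipy.stats import skew

symmetric = np.array([1, 2, 3, 4, 5])
lopsided  = np.array([1, 1, 1, 2, 10])

print(skew(symmetric))  # 0.0: perfectly balanced
print(skew(lopsided))   # positive: a long tail stretching to the right
```

Positive skewness means the tail points right, negative means it points left.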

Standard Error of Kurtosis and Skewness

We should craft our own method to get the standard errors.

def calc_se_kurtosis(n):
  return np.sqrt( \
    (24 * n * (n - 2) * (n - 3)) \
    / ((n + 1) * (n + 3) * (n - 1) ** 2))

def calc_se_skewness(n):
  return np.sqrt( \
    (6 * n * (n - 1)) \
    / ((n - 2) * (n + 1) * (n + 3)))

def calc_se_kurtosis_gaussian(n):
  return np.sqrt( \
    (4 * (n**2 - 1) * calc_se_skewness(n)**2) \
    / ((n - 3) * (n + 5)))

Python: Additional Statistical Properties: Standard Error Calculation

Now deploy those formulas like a well-trained stats ninja:

# number of data points
x_n = len(x_observed)
y_n = len(y_observed)

# Calculate SE kurtosis and SE skewness
x_se_kurtosis = calc_se_kurtosis_gaussian(x_n)
y_se_kurtosis = calc_se_kurtosis_gaussian(y_n)
x_se_skewness = calc_se_skewness(x_n)
y_se_skewness = calc_se_skewness(y_n)

print(f'x SE kurtosis  = {x_se_kurtosis:9.2f}')
print(f'y SE kurtosis  = {y_se_kurtosis:9.2f}')
print(f'x SE skewness  = {x_se_skewness:9.2f}')
print(f'y SE skewness  = {y_se_skewness:9.2f}')
print()

Python: Additional Statistical Properties: Standard Error

Without standard errors, we’re just guessing in a lab coat. These formulas help us tell signal from noise, and wisdom from nonsense.
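As a rough rule of thumb (an assumption here, not an official test), a statistic roughly two standard errors away from zero is hard to dismiss as noise. A sketch, reusing this article’s y skewness result:

```python
def is_significant(stat, se, z_crit=2.0):
  # rule of thumb: a statistic more than ~2 standard errors
  # from zero is hard to dismiss as pure sampling noise
  return abs(stat / se) > z_crit

# reusing this article's y skewness (0.70) and its SE (0.62)
print(is_significant(0.70, 0.62))  # False: well within the noise
```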

Output Result

Just like we practiced, here’s what it looks like when the numbers come home from school:

x (min, max, range) = (   0.00,   12.00,   12.00 )
y (min, max, range) = (   5.00,  485.00,  480.00 )

x median       =      6.00
y median       =    137.00
x mode         =      0.00
y mode         =      5.00

x kurtosis     =     -1.20
y kurtosis     =     -0.73
x skewness     =      0.00
y skewness     =      0.70

x SE kurtosis  =      1.19
y SE kurtosis  =      1.19
x SE skewness  =      0.62
y SE skewness  =      0.62

Python: Additional Statistical Properties: Output Result

📈 Translation:

  • Our x data is symmetrical and well-behaved—basically a textbook student.
  • y is a little skewed and platykurtic (a fancy way of saying it’s allergic to drama).
  • But nothing’s too wild. Standard errors confirm: we’re in safe territory.

Interactive JupyterLab

Want to poke and prod these stats like a true data scientist? Try it live:


Properties Visualization

Visualizing Descriptive Stats Like a Statistician With a Paintbrush

Welcome to the part where statistics come alive. Not just as numbers in a table, but as glorious visualizations. Think of this as the “Instagram filter” phase of our dataset. We’re going to draw lines, plot curves, and throw some color around like a statistician in art school.

Let’s turn those abstract stat properties into actual plots. Python and matplotlib are our brushes. Our dataset is the canvas. Picasso would approve.

Min, Max, Range, Median and Mode

We’ll begin by scattering the important descriptive stats. But horizontally, like a minimalist’s bookshelf.

# Add horizontal lines for properties
plt.axhline(y_median, c='r', ls='--',
  label=f'Median: {y_median}')
plt.axhline(y_mean,   c='g', ls='--',
  label=f'Mean: {y_mean:.2f}')
plt.axhline(y_mode,   c='m', ls='--',
  label=f'Mode: {y_mode}')
plt.axhline(y_min,    c='c', ls='--',
  label=f'Min: {y_min}')
plt.axhline(y_max,    c='y', ls='--',
  label=f'Max: {y_max}')

Python: Visualization: Statistic Properties: Horizontal Line

The result of the plot can be visualized as below:

Python: Plot: Statistic Properties: Horizontal Line

Try it yourself:

These lines help us see where our data hangs out. Is the mean close to the median? Is the mode just photobombing? Horizontal stats lines = instant data vibes.

Histogram and Distribution Curve

Time to check how our data is distributed. A histogram shows the crowd, while the curve shows how it’s supposed to behave, in an ideal world. Spoiler: it rarely does.

You can create a histogram, and overlay a normal distribution curve on top of it.

# Calculate skewness and kurtosis of y
y_skewness = np.mean(
  ((y_observed - y_mean) / y_std) ** 3)
y_kurtosis = np.mean(
  ((y_observed - y_mean) / y_std) ** 4)

# Create histogram
plt.hist(y_observed, bins=10,
  density=True, alpha=0.6, color='cyan')

# Create range for x values
x_range = np.linspace(
  min(y_observed), max(y_observed), 100)

y_dist = norm.pdf(x_range, y_mean, y_std)

# Plot normal distribution curve
plt.plot(x_range, y_dist, color='blue',
  label='Normal Distribution')

Python: Visualization: Statistic Properties: Histogram

The result of the plot can be visualized as below:

Python: Distribution Curve Plot: Histogram and Distribution Curve

You can obtain the interactive JupyterLab in this following link:

Histograms show real data. The curve? That’s the stat professor’s dream. The comparison tells us how rebellious our data is.

Revisited: Min, Max, Range, Median and Mode

Same plot party, but now we flip everything 90 degrees. Using a scatter plot we can place the statistic properties, but this time as vertical lines.

# Add vertical lines for properties
plt.axvline(y_median, c='r', ls='--',
  label=f'Median: {y_median}')
plt.axvline(y_mean,   c='g', ls='--',
  label=f'Mean: {y_mean:.2f}')
plt.axvline(y_mode,   c='m', ls='--',
  label=f'Mode: {y_mode}')
plt.axvline(y_min,    c='c', ls='--',
  label=f'Min: {y_min}')
plt.axvline(y_max,    c='y', ls='--',
  label=f'Max: {y_max}')

Python: Visualization: Statistic Properties: Vertical Line

The result of the plot can be visualized as below:

Python: Distribution Curve Plot: Statistic Properties: Vertical Line

You can obtain the interactive JupyterLab in this following link:

Kurtosis and Skewness

Want to really see what tailedness and tiltedness look like? This is the grand performance. Yes, it’s experimental. Yes, it’s beautiful. No, I still don’t fully understand the shape parameter.

With histogram, we can also overlay a distribution curve, with applied kurtosis and skewness.

First we need to calculate the corresponding y-values for the standard normal distribution. Then we need to adjust the shape parameter manually to achieve the desired kurtosis; we may need to experiment with different values to get closer to it. Next we calculate the corresponding y-values for the skewnorm distribution, with the given skewness and the adjusted shape parameter. Finally we plot both the normal distribution curve and the skewnorm distribution.

y_standard = norm.pdf(x_range, y_mean, y_std)

shape_param = 2

y_ks = skewnorm.pdf(x_range, a=y_skewness,
  loc=y_mean, scale=y_std / shape_param)

plt.plot(x_range, y_standard,
  label='Standard Normal')

plt.plot(x_range, y_ks, ls='-.',
  label='With Kurtosis and Skewness')

Python: Visualization: Statistic Properties: Kurtosis and Skewness

The result of the plot can be visualized as below:

Python: Distribution Curve Plot: Kurtosis and Skewness

You can obtain the interactive JupyterLab in this following link:

Actually, I’m not sure if I get the visual right. I still don’t know how this shape parameter works. I may have plotted something artistic instead of accurate. If your plot starts to resemble abstract art, please consult your local statistician. Especially one who’s into distribution curves.

Density with Seaborn

Pretty Matters

Let’s end with dessert. Seaborn gives us plots so pretty they could be wall art. It combines histogram, KDE, and rug plot into a beautiful statistical parfait.

# Plot distribution of y with annotations
sns.set(style="whitegrid")
sns.displot(y_observed,
  bins=10, kde=True, rug=True)

Python: Visualization: Statistic Properties: Density with Seaborn

The result of the plot can be visualized as below:

Python: Seaborn Plot: Density with Seaborn

You can obtain the interactive JupyterLab in this following link:

Clean, informative, beautiful. All in one plot. Like the pie chart’s cooler cousin who went to design school.


What’s the Next Chapter 🤔?

A brief intermission before the next scatterplot.

So far, matplotlib has been like that trusty old wrench. Reliable, precise, and a bit clunky when it comes to aesthetics. It gets the job done, but sometimes, you want more… pizzazz.

Enter Seaborn, the statistics nerd’s favorite artist. Think of it as matplotlib after a makeover, easier to use, and with a deep love for statistical plots.

We’re about to crank up the visuals again, this time with Seaborn’s pre-built plots designed for statistics. It’s like switching from hand-tooling our car to using a diagnostic scanner with a touchscreen. Same insights, way more fun.

Great visualization tools reduce friction. Less time tweaking, more time interpreting. Or sipping our coffee while the plot impresses the team.

So buckle up, because the next chapter will unlock even prettier, more intuitive ways to visualize our dataset’s story. And don’t forget, the Seaborn library is made specifically with statistics in mind.

When we’re feeling ready to evolve, from matplotlib sketches to Seaborn artistry, head on over to: 🔗 [ Trend - Visualization - Seaborn ].