Preface
Goal: a quick glance at the open-source PSPPire.
We’ve built spreadsheets and Python scripts to conquer statistics. Now let’s double-check our results with a bona fide statistical application. PSPPire gives us a full GUI on top of PSPP’s trusty terminal interface. Think of it as putting a tuxedo on a spreadsheet: same reliable engine, plus a sleek dashboard.
Verifying results across tools helps catch slip-ups. If Excel, Python, and PSPPire all agree, our confidence in those numbers just took a joyride.
1: Using PSPPire
Getting started feels surprisingly familiar once we spot the menus.
User Interface
By default PSPP lives in the terminal. It’s lean and mean but can feel like driving a rally car with no windshield.
To get a GUI (graphical user interface), we run PSPPire instead: all the features of PSPP, dressed up in windows and buttons, for point-and-click joy.
A GUI reduces the learning curve. We spend fewer brain cycles on syntax and more on interpreting p-values.
2: Import Data
Let’s bring in our trusty CSV of (x, y) pairs.
We use the Import Data menu and follow PSPPire’s prompts.
For example, this CSV file:
x, y
0, 5
1, 12
2, 25
3, 44
4, 69
5, 100
6, 137
7, 180
8, 229
9, 284
10, 345
11, 412
12, 485
File → Import Data → CSV
We point PSPPire at 50-samples.csv.
We then follow the required steps, such as selecting which rows to import.
Select Data Start
We skip the header row and pick line 2 as the first case.
Choose Delimiter
Separator.
A comma tells PSPPire where one number ends and the next begins.
Define Variables
Variable format.
We assign x F2.0 and y F3.0 formats, enough precision for point plots.
Data View
Once complete, the Data View shows our numbers in a grid.
PSPPire also spits out the equivalent PSPP command syntax in the Output Viewer:
GET DATA
/TYPE=TXT
/FILE="/home/epsi/50-samples.csv"
/ARRANGEMENT=DELIMITED
/DELCASE=LINE
/FIRSTCASE=2
/DELIMITERS=","
/VARIABLES=
x F2.0
y F3.0.
The output above is the command-line equivalent of our clicks.
Having the generated command gives us reproducibility, and a behind-the-scenes peek at PSPP’s underbelly. If we ever need to automate or script a batch of files, we already have the template.
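If we ever want that automation, a small helper script can render this same template for every CSV in a folder and hand each result to the pspp command-line tool. This is a hypothetical sketch: the folder layout and the fixed x/y variable list are assumptions, not something PSPPire generates.

```python
from pathlib import Path

# Mirrors the syntax PSPPire generated above; the fixed x/y variable
# list is an assumption and would need adjusting per dataset.
TEMPLATE = """GET DATA
  /TYPE=TXT
  /FILE="{path}"
  /ARRANGEMENT=DELIMITED
  /DELCASE=LINE
  /FIRSTCASE=2
  /DELIMITERS=","
  /VARIABLES=
    x F2.0
    y F3.0.
FREQUENCIES /VARIABLES=x y /STATISTICS=ALL.
"""

def render_syntax(csv_path: str) -> str:
    """Return a PSPP syntax script for one CSV file."""
    return TEMPLATE.format(path=csv_path)

def write_batch(folder: str) -> list:
    """Write one .sps syntax script per CSV in `folder`."""
    scripts = []
    for csv in sorted(Path(folder).glob("*.csv")):
        sps = csv.with_suffix(".sps")
        sps.write_text(render_syntax(str(csv)))
        # each script could then be run with: subprocess.run(["pspp", str(sps)])
        scripts.append(sps)
    return scripts
```

Running pspp on each generated .sps file would then reproduce the GUI analysis headlessly.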
3: Frequency Analysis
With our data safely in PSPPire, we can summon a full suite of descriptive statistics in just a few clicks.
Running Frequencies
Analysis → Frequencies
From that menu, we select the variables x and y to analyze.
Dialog Settings
We check options for tables and all statistics. Then click OK.
Output Viewer
In an instant PSPPire serves up our results in the Output Viewer.
Behind the Scenes: PSPP Syntax
PSPPire graciously shows us the equivalent command syntax, so we know exactly what happened under the hood:
FREQUENCIES
/VARIABLES= x y
/FORMAT=AVALUE TABLE
/STATISTICS=ALL.
Seeing the command ensures our analysis is reproducible. We can save or tweak it later for batch processing, no mystery clicks required.
The Table of Truth
PSPPire lays out all the key properties we calculated earlier. Here’s a condensed view:
Statistics
╭─────────┬─────┬────────╮
│         │  x  │   y    │
├─────────┼─────┼────────┤
│N Valid  │   13│      13│
│ Missing │    0│       0│
├─────────┼─────┼────────┤
│Mean     │ 6.00│  179.00│
├─────────┼─────┼────────┤
│S.E. Mean│ 1.08│   44.52│
├─────────┼─────┼────────┤
│Median   │ 6.00│  137.00│
├─────────┼─────┼────────┤
│Mode     │    0│       5│
├─────────┼─────┼────────┤
│Std Dev  │ 3.89│  160.52│
├─────────┼─────┼────────┤
│Variance │15.17│25768.17│
├─────────┼─────┼────────┤
│Kurtosis │-1.20│    -.73│
├─────────┼─────┼────────┤
│S.E. Kurt│ 1.19│    1.19│
├─────────┼─────┼────────┤
│Skewness │  .00│     .70│
├─────────┼─────┼────────┤
│S.E. Skew│  .62│     .62│
├─────────┼─────┼────────┤
│Range    │12.00│  480.00│
├─────────┼─────┼────────┤
│Minimum  │    0│       5│
├─────────┼─────┼────────┤
│Maximum  │   12│     485│
├─────────┼─────┼────────┤
│Sum      │78.00│ 2327.00│
╰─────────┴─────┴────────╯
In one neat table we confirm sample size, missing data, central tendency, dispersion, shape, and range. It’s our statistical Swiss army knife.
Verify Excel
Cross-Tool Verification
We can compare these results with our earlier Excel calculations. Our built-in formulas produced identical means, medians, variances, and so on.
Verify Python
Cross-Tool Verification
We can also compare these results with our earlier Python calculations. Our numpy/scipy script echoed the same results:
x (max, min, range) = ( 0.00, 12.00, 12.00 )
y (max, min, range) = ( 5.00, 485.00, 480.00 )
x median = 6.00
y median = 137.00
x mode = 0.00
y mode = 5.00
x kurtosis = -1.20
y kurtosis = -0.73
x skewness = 0.00
y skewness = 0.70
x SE kurtosis = 1.19
y SE kurtosis = 1.19
x SE skewness = 0.62
y SE skewness = 0.62
When three independent tools agree we can trust our numbers. Discrepancies would signal a bug or a typo.
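For readers who want to see the arithmetic itself, the same figures can be reproduced without numpy or scipy at all. The sketch below uses the sample-adjusted skewness and kurtosis formulas (the SPSS/PSPP convention), and its output agrees with the table above:

```python
import math

xs = list(range(13))                       # x: 0..12
ys = [5, 12, 25, 44, 69, 100, 137, 180,
      229, 284, 345, 412, 485]             # y from 50-samples.csv

def describe(data):
    """Descriptive statistics using the sample-adjusted (SPSS-style) formulas."""
    n = len(data)
    mean = sum(data) / n
    dev = [v - mean for v in data]
    s2 = sum(d * d for d in dev) / (n - 1)          # sample variance
    s = math.sqrt(s2)
    # Adjusted Fisher-Pearson skewness and kurtosis
    skew = (n / ((n - 1) * (n - 2))) * sum(d ** 3 for d in dev) / s ** 3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * sum(d ** 4 for d in dev) / s ** 4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Standard errors depend only on n
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {"mean": mean, "variance": s2, "stddev": s,
            "skewness": skew, "kurtosis": kurt,
            "se_skew": se_skew, "se_kurt": se_kurt}

print(describe(xs))   # mean 6.00, variance 15.17, kurtosis -1.20, ...
print(describe(ys))   # mean 179.00, variance 25768.17, skewness .70, ...
```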
4: Linear Regression Analysis
We can harness PSPPire to run a full linear regression in a few clicks. Let us see how our GUI stacks up against spreadsheets and scripts.
Running the Regression
Analysis → Regression → Linear
Again, we fill in the necessary dialog: we assign x as the predictor and y as the dependent variable, then click OK.
Output Viewer
PSPPire serves up tables for model summary, ANOVA, and coefficients. No calculator required.
A built-in regression procedure frees us from manual formula work, and ensures consistency with standard statistical methods.
PSPP Command Syntax
PSPPire logs the equivalent command so we can script this later:
REGRESSION
/VARIABLES= x
/DEPENDENT= y
/METHOD=ENTER
/STATISTICS=COEFF R ANOVA.
Seeing the command offers reproducibility. We can batch process multiple datasets by tweaking one script.
Model Summary
PSPPire’s Model Summary matches our earlier calculations for R, R², adjusted R², and the standard error of the estimate:
Model Summary (y)
╭───┬────────┬─────────────────┬──────────────────────────╮
│ R │R Square│Adjusted R Square│Std. Error of the Estimate│
├───┼────────┼─────────────────┼──────────────────────────┤
│.97│     .94│              .94│                     40.47│
╰───┴────────┴─────────────────┴──────────────────────────╯
This table tells us how well x explains y. An R² of 0.94 means 94% of variance in y is captured by our linear model.
ANOVA Table
Next PSPPire gives us ANOVA details: sums of squares, degrees of freedom, mean squares, F statistic, and significance:
ANOVA (y)
╭──────────┬──────────────┬──┬───────────┬──────┬────╮
│          │Sum of Squares│df│Mean Square│  F   │Sig.│
├──────────┼──────────────┼──┼───────────┼──────┼────┤
│Regression│      291200.0│ 1│   291200.0│177.78│.000│
│Residual  │      18018.00│11│    1638.00│      │    │
│Total     │      309218.0│12│           │      │    │
╰──────────┴──────────────┴──┴───────────┴──────┴────╯
The huge F value and p-value < 0.001 confirm that our regression model is statistically significant.
Coefficients
Finally we see the unstandardized and standardized coefficients, their standard errors, t-values, and significance:
Coefficients (y)
╭──────────┬─────────────────────────────┬─────────────────────────┬─────┬────╮
│          │ Unstandardized Coefficients │Standardized Coefficients│     │    │
│          ├─────────────┬───────────────┼─────────────────────────┤     │    │
│          │      B      │  Std. Error   │          Beta           │  t  │Sig.│
├──────────┼─────────────┼───────────────┼─────────────────────────┼─────┼────┤
│(Constant)│       -61.00│          21.21│                      .00│-2.88│.014│
│x         │        40.00│           3.00│                      .97│13.33│.000│
╰──────────┴─────────────┴───────────────┴─────────────────────────┴─────┴────╯
Coefficient B = 40 tells us that for each one-unit increase in x, y increases by 40 on average.
The intercept −61 and its p-value let us judge whether the intercept differs meaningfully from zero.
Verify Excel
Cross-Tool Verification
We can compare the result with our previous Excel calculation. Our tabular worksheet yielded an identical slope, intercept, t-value, and standard errors, including the right-hand part of the worksheet.
Verify Python
Cross-Tool Verification
We can also compare the result with our previous Python calculation. Our numpy/scipy calculations match PSPPire’s output:
n = 13
∑x (total) = 78.00
∑y (total) = 2327.00
x̄ (mean) = 6.00
ȳ (mean) = 179.00
∑(xᵢ-x̄) = 0.00
∑(yᵢ-ȳ) = 0.00
∑(xᵢ-x̄)² = 182.00
∑(yᵢ-ȳ)² = 309218.00
∑(xᵢ-x̄)(yᵢ-ȳ) = 7280.00
m (slope) = 40.00
b (intercept) = -61.00
Equation: y = -61.00 + 40.00x
sₓ² (variance) = 14.00
sy² (variance) = 23786.00
covariance = 560.00
sₓ (std dev) = 3.74
sy (std dev) = 154.23
r (pearson) = 0.97
R² = 0.94
SSR = ∑ϵ² = 18018.00
MSE = ∑ϵ²/(n-2) = 1638.00
SE(β₁) = √(MSE/∑(xᵢ-x̄)²) = 3.00
t-value = β̂₁/SE(β₁) = 13.33
Agreement across Excel, Python, and PSPPire confirms that no computations are off. We can trust these regression estimates.
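Every number in PSPPire’s regression tables can also be derived by hand. Here is a plain-Python sketch of the textbook least-squares formulas, with the expected values from the tables above noted in comments:

```python
# Ordinary least squares by hand, reproducing PSPPire's regression output
xs = list(range(13))
ys = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)                    # 182.00
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # 7280.00
sst = sum((y - my) ** 2 for y in ys)                    # total SS: 309218.0

slope = sxy / sxx                 # B for x: 40.00
intercept = my - slope * mx       # (Constant): -61.00

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e * e for e in residuals)   # residual SS: 18018.00
mse = sse / (n - 2)                   # residual mean square: 1638.00
r2 = 1 - sse / sst                    # R Square: .94
se_slope = (mse / sxx) ** 0.5         # Std. Error for x: 3.00
t_slope = slope / se_slope            # t for x: 13.33
f_stat = (sst - sse) / mse            # F: 177.78
```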
5: Polynomial Regression Analysis
Using Shell
After getting comfortable with PSPPire, let’s go full command line and trade the GUI for a keyboard-only experience. The PSPP terminal gives us more flexibility than the point-and-click world. And yes, we can run polynomial regression right here in the shell.
Data Source
Let’s try a new dataset.
xs, ys1, ys2, ys3
0, 5, 5, 5
1, 9, 12, 14
2, 13, 25, 41
3, 17, 44, 98
4, 21, 69, 197
5, 25, 100, 350
6, 29, 137, 569
7, 33, 180, 866
8, 37, 229, 1253
9, 41, 284, 1742
10, 45, 345, 2345
11, 49, 412, 3074
12, 53, 485, 3941
We’ll focus on xs and ys3, but feel free to explore the other columns too. They’re not just filler; they’re backup dancers.
Import Data
Here’s how we get our data into PSPP shell. The command is simple, nothing fancy. This time we have four columns.
GET DATA
/TYPE=TXT
/FILE="/home/epsi/series.csv"
/ARRANGEMENT=DELIMITED
/DELCASE=LINE
/FIRSTCASE=2
/DELIMITERS=","
/VARIABLES=
xs F2.0
ys1 F2.0
ys2 F3.0
ys3 F4.0.
Statistic Properties
Check the Metrics
Before we dive into curves, let’s get a sense of scale. This time we request only what we need: mean, standard deviation, minimum, and maximum.
FREQUENCIES
/VARIABLES=xs ys3
/FORMAT=NOTABLE
/STATISTICS=MEAN STDDEV MIN MAX.
With the result as below:
Statistics
╭─────────┬────┬───────╮
│         │ xs │  ys3  │
├─────────┼────┼───────┤
│N Valid  │  13│     13│
│ Missing │   0│      0│
├─────────┼────┼───────┤
│Mean     │6.00│1115.00│
├─────────┼────┼───────┤
│Std Dev  │3.89│1296.44│
├─────────┼────┼───────┤
│Minimum  │   0│      5│
├─────────┼────┼───────┤
│Maximum  │  12│   3941│
╰─────────┴────┴───────╯
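As a quick sanity check, the mean and standard deviation reported for ys3 can be reproduced in a couple of lines of plain Python (data copied from series.csv):

```python
import math

# ys3 column from series.csv
ys3 = [5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941]
n = len(ys3)

mean = sum(ys3) / n                                           # 1115.00
# Sample standard deviation (n - 1 in the denominator), as PSPP reports
sd = math.sqrt(sum((v - mean) ** 2 for v in ys3) / (n - 1))   # ≈ 1296.44
```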
Polynomial Regression Coefficient
Quadratic
To compute polynomial regression, we first need to define a helper variable, xs². This turns our linear regression into something a bit curvier.
COMPUTE xs_squared = xs * xs.
REGRESSION
/VARIABLES=xs xs_squared
/DEPENDENT=ys3
/METHOD=ENTER
/STATISTICS=COEFF.
And we get:
Coefficients (ys3)
╭──────────┬─────────────────────────────┬─────────────────────────┬─────┬────╮
│          │ Unstandardized Coefficients │Standardized Coefficients│     │    │
│          ├─────────────┬───────────────┼─────────────────────────┤     │    │
│          │      B      │  Std. Error   │          Beta           │  t  │Sig.│
├──────────┼─────────────┼───────────────┼─────────────────────────┼─────┼────┤
│(Constant)│       137.00│          65.22│                      .00│ 2.10│.060│
│xs        │      -162.00│          25.25│                     -.49│-6.42│.000│
│xs_squared│        39.00│           2.03│                     1.46│19.23│.000│
╰──────────┴─────────────┴───────────────┴─────────────────────────┴─────┴────╯
It matches our results in Excel/Calc and Python. Harmony across platforms, every data analyst’s dream.
Cubic
Let’s go one degree hotter: cubic regression with xs² and xs³. Add a third power term and suddenly we’re modeling rollercoasters.
COMPUTE xs_squared = xs * xs.
COMPUTE xs_cubed = xs * xs * xs.
REGRESSION
/VARIABLES=xs xs_squared xs_cubed
/DEPENDENT=ys3
/METHOD=ENTER
/STATISTICS=COEFF.
And here’s the result:
Coefficients (ys3)
╭──────────┬─────────────────────────────┬─────────────────────────┬────────┬────╮
│          │ Unstandardized Coefficients │Standardized Coefficients│        │    │
│          ├─────────────┬───────────────┼─────────────────────────┤        │    │
│          │      B      │  Std. Error   │          Beta           │   t    │Sig.│
├──────────┼─────────────┼───────────────┼─────────────────────────┼────────┼────┤
│(Constant)│         5.00│            .00│                      .00│+Infinit│.000│
│xs        │         4.00│            .00│                      .01│+Infinit│.000│
│xs_squared│         3.00│            .00│                      .11│+Infinit│.000│
│xs_cubed  │         2.00│            .00│                      .88│+Infinit│.000│
╰──────────┴─────────────┴───────────────┴─────────────────────────┴────────┴────╯
Still consistent with our Excel/Calc and Python results.
Do you notice that the variable layout is very similar to the LINEST formula? Do you think this similarity has to do with the design matrix? That’s the power of methodical math. We can call this replication, but it feels more like statistical déjà vu.
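The resemblance is no coincidence: LINEST-style formulas and the REGRESSION command both solve a least-squares problem built on a design matrix. Put 1, x, x², x³ in the columns of X and solve the normal equations (XᵀX)b = Xᵀy. Here is a pure-Python illustration of that idea, not how PSPP implements it internally:

```python
def polyfit_normal(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X)b = X^T y,
    using a Vandermonde design matrix and Gaussian elimination."""
    n, k = len(xs), degree + 1
    X = [[x ** j for j in range(k)] for x in xs]          # design matrix
    # Build the normal equations: A = X^T X, v = X^T y
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         for r in range(k)]
    v = [sum(X[i][r] * ys[i] for i in range(n)) for r in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back-substitution
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b

xs = list(range(13))
ys3 = [5, 14, 41, 98, 197, 350, 569, 866, 1253, 1742, 2345, 3074, 3941]
print(polyfit_normal(xs, ys3, 3))   # ≈ [5.0, 4.0, 3.0, 2.0], as in the table
```

Degree 2 recovers the quadratic table (137, −162, 39) and degree 3 the cubic one (5, 4, 3, 2).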
Beyond Regression
Easy peasy? Don’t be so sure.
If all we needed were the coefficients, we could stop here and call it a day. But PSPP (and its big sibling SPSS) are statistical beasts, with much more beneath the surface. We’ve only tiptoed into the shallow end.
There’s a world of diagnostics, assumptions, plots, and tests ahead. Think of this as checking your tire pressure before a race. Important, but far from the whole event.
Now that we’ve done our part:
exit.
Stay humble, statisticians.
-
The more we explore, the clearer it becomes: we’ve only just touched the surface.
-
In the sea of statistics, every method we master reveals deeper waters ahead.
-
Moving from spreadsheets to statistical tools, we realize how vast the field truly is.
-
No matter how far we’ve come, there’s always more to learn.
Stay humble!
What’s Next for Our Curious Clipboard?
We’ve poked around PSPPire, run some regressions, and even peeked under the hood at those ANOVA pistons firing. But where do we steer our statistical engine next? Let’s break it down: three lanes ahead.
Further Analysis
PSPPire Knows More Than It Shows
If we regularly wrestle with data, PSPP is not just a friend. It’s the nerdy lab partner who finishes our sentences with a p-value. PSPPire offers way more than we’ve explored here, from nonparametric tests to factor analysis, but diving into every nook would require another trilogy.
Let’s be honest. This article was never meant to be a full tour of PSPPire. Think of it as a strong cup of coffee, and a friendly push into the statistics playground. The rest? Explore when the data screams louder, or when our spreadsheet starts judging us.
Mastering the basics lets us decide when to trust software and when to double-check. Knowing the tools means we’re never stuck staring at an error bar like it’s an existential crisis.
The Curiosities We Skipped (But Might Revisit)
What’s left behind?
Yes, we did regression. We danced with the line of best fit. We even toyed with quadratic and cubic forms, like high schoolers rewriting song lyrics into parabolas.
But we skipped a few gems:
-
What if we measured correlation between two datasets on the same x-axis?
-
Or looked at how the fluctuation of one variable predicts the fluctuation of another? Think of it as emotional intelligence for spreadsheets.
And we haven’t touched splines. Because honestly, splines are the deep-fried snacks of stats. Delicious but hard to digest without proper guidance.
Regression is just the beginning. Real-world data is rarely linear and never polite. These advanced approaches let us understand when data moves together, when it rebels, and when it just wants to be left alone.
Other Languages, Same Obsession
Sure, we’ve dabbled in Python. But the statistical family dinner is much bigger:
-
R: The grandmaster of stats. It comes with more packages than a post office in December.
-
Julia: Lightning-fast and math-savvy. For those who want to feel both trendy and efficient.
-
Go: For when we need stats to run inside a production-grade backend, and still sleep well at night.
Choosing a language is like picking a wrench. The goal isn’t to be fancy. It’s to make the data talk, and sometimes scream.
Feeling adventurous? Join us in the next article: [ Trend - Language - R - Part One ]