Preface

Goal: A quick look at the open source PSPPire.

I would really like to explore PSPPire. PSPP is a free, open source alternative to SPSS.

We need to verify the results of our previous Excel and Python calculations with a real statistics application.


Using PSPPire

Using PSPPire for the first time is easy once you know the interface.

User Interface

PSPP itself only comes with a TUI (terminal user interface).

PSPP: UI: Terminal User Interface

You need PSPPire to run the GUI (graphical user interface).

PSPP: UI: Graphical User Interface


Import Data

We can import our previous CSV series using the Import Data menu.

For example, this CSV file:

x, y
0, 5
1, 12
2, 25
3, 44
4, 69
5, 100
6, 137
7, 180
8, 229
9, 284
10, 345
11, 412
12, 485

PSPP: Import CSV: Menu

We need to follow the required steps, such as selecting rows.

PSPP: Import CSV: Selecting Lines

In this part, I choose the second line as the first line of data, because the first line holds the column names.

PSPP: Import CSV: Choose the first data

Next, choose the separator. This file uses a comma.

PSPP: Import CSV: Separator

Then set the variable format.

PSPP: Import CSV: Variable format

And you are done. You may switch to data view.

PSPP: Import CSV: Data View

Another window will pop up. This is the output viewer.

GET DATA
  /TYPE=TXT
  /FILE="/home/epsi/50-samples.csv"
  /ARRANGEMENT=DELIMITED
  /DELCASE=LINE
  /FIRSTCASE=2
  /DELIMITERS=","
  /VARIABLES=
    x F2.0
    y F3.0.

PSPP: Import CSV: Output Viewer

The output above is the PSPP command equivalent to the import steps.
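For comparison with the Python articles, the same import can be sketched in Python. This is only an illustration, not what PSPPire runs internally. The function name read_xy and the inline SAMPLE text are assumptions made for this example, with SAMPLE mirroring the first rows of the CSV above.

```python
import csv
import io

# Assumed sample: the first rows of the CSV shown earlier.
SAMPLE = """x, y
0, 5
1, 12
2, 25
"""

def read_xy(handle):
    """Skip the header line, split on commas, parse both columns as integers."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # comparable to /FIRSTCASE=2: data starts on the second line
    return [(int(x), int(y)) for x, y in reader]

rows = read_xy(io.StringIO(SAMPLE))
print(rows)
```

To read a real file, pass an open file handle instead of the io.StringIO object.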


Frequency Analysis

Now you can do analysis easily.

PSPPire

You can start by choosing the analysis menu.

PSPP: Analysis: Frequency: Menu

Fill in the necessary dialog fields. Click OK.

PSPP: Analysis: Frequency: Dialog

And voila, the output viewer.

PSPP: Analysis: Frequency:

The output is similar to the text below. The first part is the command.

FREQUENCIES
	/VARIABLES= x y
	/FORMAT=AVALUE TABLE
	/STATISTICS=ALL.

It is followed by a table of statistical properties.

        Statistics
╭─────────┬─────┬────────╮
│         │  x  │    y   │
├─────────┼─────┼────────┤
│N Valid  │   13│      13│
│  Missing│    0│       0│
├─────────┼─────┼────────┤
│Mean     │ 6.00│  179.00│
├─────────┼─────┼────────┤
│S.E. Mean│ 1.08│   44.52│
├─────────┼─────┼────────┤
│Median   │ 6.00│  137.00│
├─────────┼─────┼────────┤
│Mode     │    0│       5│
├─────────┼─────┼────────┤
│Std Dev  │ 3.89│  160.52│
├─────────┼─────┼────────┤
│Variance │15.17│25768.17│
├─────────┼─────┼────────┤
│Kurtosis │-1.20│    -.73│
├─────────┼─────┼────────┤
│S.E. Kurt│ 1.19│    1.19│
├─────────┼─────┼────────┤
│Skewness │  .00│     .70│
├─────────┼─────┼────────┤
│S.E. Skew│  .62│     .62│
├─────────┼─────┼────────┤
│Range    │12.00│  480.00│
├─────────┼─────┼────────┤
│Minimum  │    0│       5│
├─────────┼─────┼────────┤
│Maximum  │   12│     485│
├─────────┼─────┼────────┤
│Sum      │78.00│ 2327.00│
╰─────────┴─────┴────────╯
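Most rows of this table can be reproduced in plain Python as a cross-check. The sketch below assumes the common sample-adjusted formulas for skewness, kurtosis, and their standard errors. The PSPP manual remains the authority on what PSPP actually computes, and the helper name describe is made up for this example.

```python
import math
import statistics

x = list(range(13))
y = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

def describe(values):
    """Return sample statistics for one variable (assumed formulas)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    # Central moments with divisor n, used by skewness and kurtosis.
    m2, m3, m4 = (sum((v - mean) ** k for v in values) / n for k in (2, 3, 4))
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return {
        "mean": mean,
        "se_mean": std_dev / math.sqrt(n),
        "std_dev": std_dev,
        "variance": variance,
        "skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "se_skew": se_skew,
        "se_kurt": 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5))),
    }

for name, values in (("x", x), ("y", y)):
    stats = describe(values)
    print(name, {key: round(val, 2) for key, val in stats.items()})
```

Rounded to two decimals, these values can be compared row by row with the table above.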

Verify Excel

You may compare the result with our previous calculation in Excel.

Trend: Worksheet: Built-in Formula: Statistic

Verify Python

You may compare the result with our previous calculation in Python.

x (min, max, range) = (   0.00,   12.00,   12.00 )
y (min, max, range) = (   5.00,  485.00,  480.00 )

x median       =      6.00
y median       =    137.00
x mode         =      0.00
y mode         =      5.00

x kurtosis     =     -1.20
y kurtosis     =     -0.73
x skewness     =      0.00
y skewness     =      0.70

x SE kurtosis  =      1.19
y SE kurtosis  =      1.19
x SE skewness  =      0.62
y SE skewness  =      0.62

Python: Additional Statistical Properties: Output Result


Linear Regression Analysis

You can also run a linear regression analysis.

PSPPire

Again, fill in the necessary dialog fields. Click OK.

PSPP: Analysis: Linear Regression:

And again, voila, the output viewer.

PSPP: Analysis: Linear Regression:

The output is similar to the following text. The first part is the command, followed by tabular data.

REGRESSION
	/VARIABLES= x
	/DEPENDENT= y
	/METHOD=ENTER
	/STATISTICS=COEFF R ANOVA.

The first is the model summary. You can see that the table matches our previous calculation.

                     Model Summary (y)
╭───┬────────┬─────────────────┬──────────────────────────╮
│ R │R Square│Adjusted R Square│Std. Error of the Estimate│
├───┼────────┼─────────────────┼──────────────────────────┤
│.97│     .94│              .94│                     40.47│
╰───┴────────┴─────────────────┴──────────────────────────╯
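The four numbers in the model summary can be sketched from the sums of squares. This assumes the textbook formulas for a simple regression with one predictor. The variable names are chosen for this example only.

```python
import math

x = list(range(13))
y = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

n = len(x)
k = 1  # number of predictors
x_mean = sum(x) / n
y_mean = sum(y) / n

sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)
r_square = r ** 2
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Residual sum of squares, then the standard error of the estimate.
sse = syy - sxy ** 2 / sxx
std_error_estimate = math.sqrt(sse / (n - k - 1))

print(f"R = {r:.2f}, R Square = {r_square:.2f}")
print(f"Adjusted R Square = {adj_r_square:.2f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.2f}")
```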

The second is ANOVA. The table matches our previous calculation, with the addition of the F-statistic.

                       ANOVA (y)
╭──────────┬──────────────┬──┬───────────┬──────┬────╮
│          │Sum of Squares│df│Mean Square│   F  │Sig.│
├──────────┼──────────────┼──┼───────────┼──────┼────┤
│Regression│      291200.0│ 1│   291200.0│177.78│.000│
│Residual  │      18018.00│11│    1638.00│      │    │
│Total     │      309218.0│12│           │      │    │
╰──────────┴──────────────┴──┴───────────┴──────┴────╯
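The F-statistic is the ratio of the regression mean square to the residual mean square. A minimal sketch, assuming an ordinary least squares fit with one predictor, is shown below. The Sig. column is left out because it needs the F distribution, which the Python standard library does not provide.

```python
x = list(range(13))
y = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least squares fit: slope and intercept.
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean
predicted = [intercept + slope * xi for xi in x]

# Sums of squares for the three ANOVA rows.
ss_regression = sum((p - y_mean) ** 2 for p in predicted)
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
ss_total = sum((yi - y_mean) ** 2 for yi in y)

df_regression, df_residual = 1, n - 2
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

print(f"Regression {ss_regression:10.2f} {df_regression:2d} {ms_regression:10.2f}")
print(f"Residual   {ss_residual:10.2f} {df_residual:2d} {ms_residual:10.2f}")
print(f"Total      {ss_total:10.2f} {n - 1:2d}")
print(f"F = {f_value:.2f}")
```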

The last is the coefficients table. This also matches our calculation, except that we haven’t calculated the standard error of the constant yet.

                               Coefficients (y)
╭──────────┬────────────────────────────┬─────────────────────────┬─────┬────╮
│          │ Unstandardized Coefficients│Standardized Coefficients│     │    │
│          ├────────────┬───────────────┼─────────────────────────┤     │    │
│          │      B     │   Std. Error  │           Beta          │  t  │Sig.│
├──────────┼────────────┼───────────────┼─────────────────────────┼─────┼────┤
│(Constant)│      -61.00│          21.21│                      .00│-2.88│.014│
│x         │       40.00│           3.00│                      .97│13.33│.000│
╰──────────┴────────────┴───────────────┴─────────────────────────┴─────┴────╯
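The standard error of the constant can be sketched with the usual textbook formula for the intercept of a simple linear regression, SE(b) = √(MSE·(1/n + x̄²/∑(xᵢ-x̄)²)). The code below assumes that formula and is not taken from the PSPP source. The Sig. column is again left out, since it needs the t distribution.

```python
import math

x = list(range(13))
y = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Mean squared error of the residuals, with n - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1 / n + x_mean ** 2 / sxx))
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

# Standardized coefficient; with one predictor it equals Pearson's r.
beta = slope * math.sqrt(sxx / syy)

print(f"(Constant) B = {intercept:.2f}, SE = {se_intercept:.2f}, t = {t_intercept:.2f}")
print(f"x          B = {slope:.2f}, SE = {se_slope:.2f}, t = {t_slope:.2f}, Beta = {beta:.2f}")
```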

Verify Excel

You may compare the result with our previous calculation in Excel.

Trend: Worksheet: Correlation: Tabular Worksheet

And also the right part of the worksheet.

Trend: Worksheet: Correlation: Tabular Worksheet

Verify Python

You may compare the result with our previous calculation in Python.

n          =   13
∑x (total) =   78.00
∑y (total) = 2327.00
x̄ (mean)   =    6.00
ȳ (mean)   =  179.00

∑(xᵢ-x̄)    =      0.00
∑(yᵢ-ȳ)    =      0.00
∑(xᵢ-x̄)²   =    182.00
∑(yᵢ-ȳ)²   = 309218.00
∑(xᵢ-x̄)(yᵢ-ȳ)  =   7280.00
m (slope)      =     40.00
b (intercept)  =    -61.00

Equation     y = -61.00 + 40.00.x

sₓ² (variance) =     14.00
sy² (variance) =  23786.00
covariance     =    560.00
sₓ (std dev)   =      3.74
sy (std dev)   =    154.23
r (pearson)    =      0.97
R²             =      0.94

SSR = ∑ϵ²           =  18018.00
MSE = ∑ϵ²/(n-2)     =   1638.00
SE(β₁)  = √(MSE/sₓ) =      3.00
t-value = β̅₁/SE(β₁) =     13.33
Python: Manual Calculation: Statistical Properties: Result Output
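One difference is worth a note. The Python output lists a variance of 14.00 and a standard deviation of 3.74 for x, while the PSPP statistics table shows 15.17 and 3.89. The numbers suggest that the Python script divided by n (population variance), while PSPP divides by n - 1 (sample variance). The sketch below reproduces both sets of values under that assumption, using only the standard statistics module.

```python
import statistics

x = list(range(13))
y = [5, 12, 25, 44, 69, 100, 137, 180, 229, 284, 345, 412, 485]

for name, values in (("x", x), ("y", y)):
    population = statistics.pvariance(values)  # divisor n
    sample = statistics.variance(values)       # divisor n - 1
    print(f"{name}: population variance = {population:.2f}, "
          f"sample variance = {sample:.2f}")
    print(f"{name}: population std dev  = {statistics.pstdev(values):.2f}, "
          f"sample std dev  = {statistics.stdev(values):.2f}")
```

Neither result is wrong. They answer slightly different questions, so it helps to state which divisor is used when comparing tools.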


Further Analysis

If you work regularly with statistics, PSPP is your friend. There is a lot that can be done with PSPPire, but this article was never meant to cover every PSPP feature.

I think that’s all for PSPPire.


What’s the Next Chapter 🤔?

Besides Python, there are also R and Julia for statistical analysis. There is also Go, so you can integrate the analysis with your application seamlessly.

Consider continuing your exploration with [ Trend - Language - R - Part One ].


What’s left behind 🤔?

We are going to calculate regression for quadratic, cubic, and even spline curves. After that we are going to calculate the correlation between two data series that share the same x-axis. Then the correlation between the fluctuation of one data series and the fluctuation of another.