Preface
Goal: An overview of trend analysis, from data modelling in Excel to Python.
I have been eager to learn statistics, but I didn't know where to start. The light came on when I was given a challenge: fit a curve to a data series. From there I can step up to regression and correlation.
I never skipped a statistics class in my college days. The thing is, they didn't teach me that far. What I got was only basic statistics, while, as I understand it, other colleges taught ANOVA. Now I have to learn on my own, without a class or a course.
I actually started by reading a statistics book in 2017, which helped me get going. There are a lot of books and videos that explain the concepts, and there are also ready-to-use online calculators. But there are not many Excel sheet examples online for daily practical use. And how am I supposed to know whether my calculation is right if I can't calculate it manually? I need to know how the math works internally.
Even with online calculators, how am I supposed to understand how they work if they don't come with the math? Of course there are books for this. But even when I know the concept and the math, how do I automate my job with scripting?
Knowledge of statistics has evolved, so I decided to take a different approach:
- The math behind it
- Manual calculation with Excel
- Implementation with Python
Habit Change
How do you start solving a math problem?
In my college days, I used to take a piece of paper, write the equation down, and derive things on paper.
Things have changed: there is not much paper anymore. I no longer carry a pen in my bag the old-school way, so the way I write things down has also changed.
What tools do we use these days?
Spreadsheets have been around for about 25 years. My intuition says that I can just use Excel right away for data modelling. Then I can take a screenshot, or copy and paste the result into WhatsApp, and attach the spreadsheet file in the same conversation.
How do you communicate the equation?
Sure, Excel can produce the equation perfectly well. The thing is, we also need to share the equation source in a human-readable, or even machine-readable, form. This can be done nicely in LaTeX.
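As a small illustration (the coefficients a_0 to a_n are generic placeholders, not results from this series), the general polynomial model can be shared as LaTeX source like this, which renders cleanly and can still be read as plain text:

```latex
y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{k=0}^{n} a_k x^k
```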
What is the final form?
It depends on the requirements and preferences. For the polynomial case, I need a visual chart that compares the source data with the fitted XY lines, so that the result can be sent via any social media. This chart can easily be produced with matplotlib, although you may use any programming language.
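A minimal sketch of such a chart, assuming NumPy and matplotlib are installed. The helper name `plot_polynomial_fit`, the output file name, and the data (exactly y = 2x² + 1) are hypothetical, chosen only so the fit is easy to check:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_polynomial_fit(x, y, degree=2, filename="fit.png"):
    """Fit a polynomial of the given degree and save a chart of data vs. fit."""
    coeffs = np.polyfit(x, y, degree)          # coefficients, highest power first
    x_line = np.linspace(min(x), max(x), 200)  # smooth x grid for the curve
    y_line = np.polyval(coeffs, x_line)

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(x, y, label="Source data")
    ax.plot(x_line, y_line, label=f"Polynomial fit (degree {degree})")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(filename, dpi=150)             # image file ready to share
    plt.close(fig)
    return coeffs


# Hypothetical sample data: exactly y = 2x^2 + 1
x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 3, 9, 19, 33, 51, 73], dtype=float)
coeffs = plot_polynomial_fit(x, y, degree=2, filename="fit.png")
print(np.round(coeffs, 3))
```

The saved PNG is what you would post in a chat, next to the spreadsheet file.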
How to Start
We start with polynomial curve fitting.
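To show the math behind a polynomial fit, here is a sketch using the classic least-squares normal equations, (XᵀX)a = Xᵀy, where X is the Vandermonde matrix with columns 1, x, x², and so on. This is one textbook way to do it, not necessarily the method used in the later articles; the function name and data are hypothetical:

```python
import numpy as np


def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial fit by solving (X^T X) a = X^T y.

    Returns coefficients from the constant term upward: a0, a1, ..., a_degree.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design (Vandermonde) matrix with columns 1, x, x^2, ..., x^degree
    X = np.vander(x, degree + 1, increasing=True)
    # X^T X holds sums of powers of x; X^T y holds sums of x^k * y,
    # the same totals a manual spreadsheet table would build
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)


x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 3, 9, 19, 33, 51, 73]   # hypothetical data: exactly y = 1 + 2x^2
a = polyfit_normal_equations(x, y, degree=2)
print(np.round(a, 6))
# Cross-check against NumPy (polyfit lists the highest power first)
print(np.allclose(a[::-1], np.polyfit(x, y, 2)))
```

The normal equations are easy to follow by hand, but they become ill-conditioned for high degrees; `np.polyfit` uses a more numerically stable least-squares solver.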
Then we step up to least squares, along with regression and correlation, starting from an equation cheatsheet for linear regression on samples.
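A compact version of such a cheatsheet, using the standard textbook formulas for a least-squares line fitted to n sample pairs (x_i, y_i), might look like this. The symbols a (intercept), b (slope) and r (Pearson correlation) are my own choice here and may differ from the ones used later:

```latex
\begin{aligned}
\hat{y} &= a + b\,x \\
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \\
a &= \bar{y} - b\,\bar{x} \\
r &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
\end{aligned}
```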
I also made a complete worksheet helper for beginners, to solve linear regression along with its statistical properties using manual calculation. After the manual calculation, we continue with the built-in formulas in Excel/Calc.
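The manual worksheet approach translates almost column for column into plain Python. This sketch (hypothetical function name and data, standard library only) builds the x, y, xy, x² and y² totals, then derives the slope, intercept, correlation and the standard error of the estimate, √(SSE / (n − 2)):

```python
def linear_regression_worksheet(xs, ys):
    """Manual least-squares line, computed like a worksheet with column totals."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need paired samples with at least 3 points")

    # Column totals, as in the bottom row of a manual worksheet
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    sxy = n * sum_xy - sum_x * sum_y

    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    r = sxy / (sxx * syy) ** 0.5          # Pearson correlation coefficient

    # Standard error of the estimate from the sum of squared residuals
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = (sse / (n - 2)) ** 0.5

    return {"n": n, "slope": slope, "intercept": intercept,
            "r": r, "r_squared": r * r, "std_err": std_err}


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # hypothetical sample data
result = linear_regression_worksheet(xs, ys)
for key, value in result.items():
    print(f"{key:>10}: {value:.4f}")
```

Each intermediate total corresponds to a cell you would otherwise compute by hand, which makes it easy to compare against the spreadsheet row by row.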
We are also going to cover Python, from manual calculation, to methods from NumPy, and also statistics-related libraries. Then we visualize the interpretation of the statistical properties using matplotlib. For example, the standard error at a chosen confidence level can be shown as a shaded region.
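One common way to draw such a shaded region is a confidence band for the mean response of a least-squares line. The sketch below assumes NumPy, SciPy and matplotlib; it uses `scipy.stats.linregress` for the fit and Student's t with n − 2 degrees of freedom for the band. The data, the 95% level and the file name are hypothetical, and the later articles may draw a different band (for example, a prediction interval):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])  # hypothetical data

# Least-squares line from SciPy
fit = stats.linregress(x, y)

n = len(x)
y_hat = fit.intercept + fit.slope * x
residuals = y - y_hat
s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))   # standard error of the estimate

# Confidence band for the mean response, Student's t with n - 2 dof
confidence = 0.95
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 2)
x_line = np.linspace(x.min(), x.max(), 200)
y_line = fit.intercept + fit.slope * x_line
sxx = np.sum((x - x.mean()) ** 2)
half_width = t_crit * s_e * np.sqrt(1 / n + (x_line - x.mean()) ** 2 / sxx)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x, y, label="Data")
ax.plot(x_line, y_line, label="Least-squares line")
ax.fill_between(x_line, y_line - half_width, y_line + half_width,
                alpha=0.3, label=f"{confidence:.0%} confidence band")
ax.legend()
fig.savefig("confidence_band.png", dpi=150)
plt.close(fig)

print(f"slope={fit.slope:.4f}, intercept={fit.intercept:.4f}, r={fit.rvalue:.4f}")
```

The band is narrowest at the mean of x and widens toward the edges, which is exactly the kind of interpretation a chart makes easier to see than a table of numbers.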
Since we also need to analyze additional statistical properties and compare them with the data distribution, we need to learn the basics of plotting the distribution curve.
That way we can put the additional statistics in the same chart as the histogram.
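A minimal sketch of the idea, assuming NumPy and matplotlib and a hypothetical normally distributed sample: scale the histogram with `density=True` so its area is 1 and it becomes comparable with a normal PDF, then overlay the mean and a ±1 standard deviation span on the same axes. The file name and sample parameters are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=5, size=500)   # hypothetical sample

mean = data.mean()
std = data.std(ddof=1)                          # sample standard deviation

# Normal probability density function using the sample's mean and std
x = np.linspace(data.min(), data.max(), 200)
pdf = np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

fig, ax = plt.subplots(figsize=(8, 5))
# density=True rescales bar heights so the histogram area equals 1
ax.hist(data, bins=30, density=True, alpha=0.5, label="Histogram")
ax.plot(x, pdf, label="Normal PDF")
# Additional statistics on the same chart: mean and +/- 1 standard deviation
ax.axvline(mean, linestyle="--", label=f"Mean = {mean:.2f}")
ax.axvspan(mean - std, mean + std, alpha=0.15, label="Mean ± 1 std")
ax.legend()
fig.savefig("histogram_pdf.png", dpi=150)
plt.close(fig)
```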
Then we enhance the matplotlib visualization with Seaborn. Everything also comes with a Jupyter Lab counterpart on GitHub.
We also need to verify our manual calculation with PSPPire.
Then you can imagine how this can be implemented in other languages, such as R, Julia, TypeScript, and Golang.
What Comes Next 🤔?
Our journey begins with the built-in LINEST formula in spreadsheets, and the built-in polyfit method in Python.
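As a quick orientation, assuming NumPy: for a straight line, `=LINEST(known_y's, known_x's)` in Excel/Calc reports the slope first and then the intercept, and `numpy.polyfit(x, y, 1)` returns them in the same order (highest power first). The data below is hypothetical, and the quadratic comparison with Excel's `x^{1,2}` array trick is stated as I understand LINEST's output order, so check it against your own sheet:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)   # hypothetical data

# Degree 1: [slope, intercept], the same order LINEST(y, x) reports
slope, intercept = np.polyfit(x, y, 1)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")

# Degree 2: [c2, c1, c0] for y = c2*x^2 + c1*x + c0,
# comparable to Excel's LINEST(y_range, x_range^{1,2}) for a column of x values
c2, c1, c0 = np.polyfit(x, y, 2)
print(f"c2={c2:.4f}, c1={c1:.4f}, c0={c0:.4f}")
```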
Consider diving into the next step by exploring [ Trend - Built-in Method ].