Preface
Goal: Initialization and conditional optimization.
2: Restructure Data
I must admit, by the time I write this code. I don’t know which data structure is the best to represent, the pivot table.
However, I have start to code anyway.
Defining Variables
Here it is the restructured data:
puppy = 0
juvenile = 0
junior = { '04': 0, '06': 0, '08': 0 }
female = { '02': 0, '04': 0, '06': 0, '08':0 }
male = { '02': 0, '04': 0, '06': 0, '08':0 }
total = 0
Now using dictionary.
Walk The Loop
Consider scan different data source.
Instead of the first data source (B
and C
),
we can scan the second data source (D
and E
),
- Row
D
: Prediction - Row
E
: Probability
# start: 3
# end: 43
rows = range(3, 43)
for row in rows:
pred_col = ws['D'+str(row)]
prob_col = ws['E'+str(row)]
pred = pred_col.value
prob = float(prob_col.value)
Conditional
The skeleton conditional is very similar, except now it is using dictionary as below:
if pred=='Female':
if (0.2 <= prob < 0.4): female['02'] += 1
elif (0.4 <= prob < 0.6): female['04'] += 1
elif (0.6 <= prob < 0.8): female['06'] += 1
elif (0.8 <= prob <= 1): female['08'] += 1
else: raise Exception("Female not in range" + str(prob))
elif pred=='Male':
if (0.2 <= prob < 0.4): male['02'] += 1
elif (0.4 <= prob < 0.6): male['04'] += 1
elif (0.6 <= prob < 0.8): male['06'] += 1
elif (0.8 <= prob <= 1): male['08'] += 1
else: raise Exception("Male not in range" + str(prob))
elif pred=='Junior':
if (0.4 <= prob < 0.6): junior['04'] += 1
elif (0.6 <= prob < 0.8): junior['06'] += 1
elif (0.8 <= prob <= 1) : junior['08'] += 1
else: raise Exception("Junior not in range" + str(prob))
elif pred=='Juvenile':
juvenile += 1
elif pred=='Puppy':
puppy += 1
And I also write it in a more compact way.
Counting Total
This is still primitive, but with sum of dictionary.
total = puppy + juvenile + sum(junior.values()) \
+ sum(female.values()) + sum(male.values())
Printing The Output
I keep the code simple stupid, so you can catch up with the difference, from previous code.
print('Puppy : %3d' % puppy)
print('Juvenile : %3d' % juvenile)
print('Junior 0.4-0.6 : %3d' % junior['04'])
print('Junior 0.6-0.8 : %3d' % junior['06'])
print('Junior 0.8-1.0 : %3d' % junior['08'])
print('Female 0.2-0.4 : %3d' % female['02'])
print('Female 0.4-0.6 : %3d' % female['04'])
print('Female 0.6-0.8 : %3d' % female['06'])
print('Female 0.8-1.0 : %3d' % female['08'])
print('Male 0.2-0.4 : %3d' % male['02'])
print('Male 0.4-0.6 : %3d' % male['04'])
print('Male 0.6-0.8 : %3d' % male['06'])
print('Male 0.8-1.0 : %3d' % male['08'])
print('------------------------')
print('Total : %3d' % total)
It is still dumb, but with dictionary.
The Result
We haven’t had our pivot sheet yet. But the output is enough, to make a pivot.
Puppy : 1
Juvenile : 0
Junior 0.4-0.6 : 1
Junior 0.6-0.8 : 4
Junior 0.8-1.0 : 0
Female 0.2-0.4 : 0
Female 0.4-0.6 : 1
Female 0.6-0.8 : 17
Female 0.8-1.0 : 14
Male 0.2-0.4 : 0
Male 0.4-0.6 : 2
Male 0.6-0.8 : 0
Male 0.8-1.0 : 0
------------------------
Total : 40
We already have one code structure. We can bundle them in a function.
3: Pivot Function
Consider bundle the complexity inside a function.
Function Declaration
The function declaration is simply as below:
def pivot_me(rows, name_pred, name_prob):
# ...
Calling Function
At the end of the code, we are going to call the function as below:
# start: 3
# end: 43
rows = range(3, 43)
print('First Pool')
print('------------------------')
pivot_me(rows, 'B', 'C')
print('Second Pool')
print('------------------------')
pivot_me(rows, 'D', 'E')
print('Third Pool')
print('------------------------')
pivot_me(rows, 'F', 'G')
Defining Variables
We put data initialization in the function:
def pivot_me(rows, name_pred, name_prob):
# declare all local variables
puppy = 0
juvenile = 0
junior = { 40: 0, 60: 0, 80:0 }
female = { 20: 0, 40: 0, 60: 0, 80:0 }
male = { 20: 0, 40: 0, 60: 0, 80:0 }
total = 0
str_ranges = { 20: '0.2-0.4', 40: '0.4-0.6',
60: '0.6-0.8', 80: '0.8-1.0' }
We have str_ranges
here, for data output later.
str_ranges = { 20: '0.2-0.4', 40: '0.4-0.6',
60: '0.6-0.8', 80: '0.8-1.0' }
Walk The Loop
Intead of hardcoded cell address, we can be more flexible.
- Row
name_pred
: Prediction - Row
name_prob
: Probability
for row in rows:
pred_col = ws[name_pred + str(row)]
prob_col = ws[name_prob + str(row)]
pred = pred_col.value
And also, to anticipate cell with empty value, we alter the code a little bit.
if prob_col.value == None: continue
else:
prob = float(prob_col.value)
probint = int(prob*100)
probfloor = probint - (probint % 20)
Conditional
With probfloor
above,
we can reduce the conditional code into below:
if pred=='Female':
if (0.2 <= prob <= 1): female[probfloor] += 1
else: raise Exception("Female not in range" + str(prob))
elif pred=='Male':
if (0.2 <= prob <= 1): male[probfloor] += 1
else: raise Exception("Male not in range" + str(prob))
elif pred=='Junior':
if (0.4 <= prob <= 1): junior[probfloor] += 1
else: raise Exception("Junior not in range" + str(prob))
elif pred=='Juvenile':
juvenile += 1
elif pred=='Puppy':
puppy += 1
An I also write it in a more compact way.
Counting Total
The same as previous, except now it is counted inside the function.
total = puppy + juvenile + sum(junior.values()) \
+ sum(female.values()) + sum(male.values())
Printing The Output
The code get more complex.
print('Puppy : %3d' % puppy)
print('Juvenile : %3d' % juvenile)
for key in [40, 60, 80]:
print('Junior %s : %3d' % (str_ranges[key], junior[key]))
for key in [20, 40, 60, 80]:
print('Female %s : %3d' % (str_ranges[key], female[key]))
for key in [20, 40, 60, 80]:
print('Male %s : %3d' % (str_ranges[key], male[key]))
print('------------------------')
print('Total : %3d' % total)
print()
We are getting close to our pivot table.
The Result
The output is longer. I guess CLI is no longer suitable to show complex data.
First Pool
------------------------
Puppy : 1
Juvenile : 15
Junior 0.4-0.6 : 0
Junior 0.6-0.8 : 0
Junior 0.8-1.0 : 0
Female 0.2-0.4 : 0
Female 0.4-0.6 : 8
Female 0.6-0.8 : 2
Female 0.8-1.0 : 1
Male 0.2-0.4 : 0
Male 0.4-0.6 : 9
Male 0.6-0.8 : 3
Male 0.8-1.0 : 1
------------------------
Total : 40
Second Pool
------------------------
Puppy : 1
Juvenile : 0
Junior 0.4-0.6 : 1
Junior 0.6-0.8 : 4
Junior 0.8-1.0 : 0
Female 0.2-0.4 : 0
Female 0.4-0.6 : 1
Female 0.6-0.8 : 17
Female 0.8-1.0 : 14
Male 0.2-0.4 : 0
Male 0.4-0.6 : 2
Male 0.6-0.8 : 0
Male 0.8-1.0 : 0
------------------------
Total : 40
Third Pool
------------------------
Puppy : 1
Juvenile : 15
Junior 0.4-0.6 : 0
Junior 0.6-0.8 : 0
Junior 0.8-1.0 : 0
Female 0.2-0.4 : 0
Female 0.4-0.6 : 8
Female 0.6-0.8 : 13
Female 0.8-1.0 : 0
Male 0.2-0.4 : 0
Male 0.4-0.6 : 3
Male 0.6-0.8 : 0
Male 0.8-1.0 : 0
------------------------
Total : 40
Our multicolumn pivot is based on these value above. We will show the result in spreadsheet later. But in the next part, we need to clean up the code first.
What is Next 🤔?
It is still a terrible code. I must be ashamed. I have to refactor.
Consider continue reading [ Python - Excel - Pivot - Part Three ].