Preface
Goal: Refactoring to object oriented.
We need to clean-up the code. There will be nothing change in output. But so much change, in interal code structure.
4: Refactoring to Function
It is the same with the previous code. Except we are going to bundle everything in function.
Code Skeleton
The big picture.
import openpyxl
from openpyxl import load_workbook
# ...
def get_sample_init():
# ...
def get_total(s):
# ...
def display_sample(s):
# ...
def update_sample_count(pred, prob, s):
# ...
def get_sample_count(rows, name_pred, name_prob, s):
for row in rows:
# ...
def pivot_me(rows, name_pred, name_prob):
# ...
def main():
# ...
main()
Main function
For example, we could redefine the last part in a function.
def main():
r_start = 3
r_stop = 43
rows = range(r_start, r_stop)
print('First Pool')
print('------------------------')
pivot_me(rows, 'B', 'C')
print('Second Pool')
print('------------------------')
pivot_me(rows, 'D', 'E')
print('Third Pool')
print('------------------------')
pivot_me(rows, 'F', 'G')
And call it later as:
main()
Global Initialization
It is the same as previous.
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('recap.xlsx')
ws = wb["Combined"]
We will deal later on, in the next part, as object oriented.
Pivot Function
Now it consist of three function.
def pivot_me(rows, name_pred, name_prob):
s = get_sample_init()
s = get_sample_count(rows, name_pred, name_prob, s)
display_sample(s)
This step, we only have CLI output. We will output to worksheet later.
Defining Variables
Consider redefine in function.
def get_sample_init():
# declare all local variables
return {
'puppy' : 0,
'juvenile' : 0,
'junior' : { 40: 0, 60: 0, 80:0 },
'female' : { 20: 0, 40: 0, 60: 0, 80:0 },
'male' : { 20: 0, 40: 0, 60: 0, 80:0 }
}
Walk The Loop
While scanning process is as below:
def get_sample_count(rows, name_pred, name_prob, s):
for row in rows:
pred_col = ws[name_pred + str(row)]
prob_col = ws[name_prob + str(row)]
pred = pred_col.value
if prob_col.value == None: continue
else:
prob = float(prob_col.value)
s = update_sample_count(pred, prob, s)
return s
This require update_sample_count
function.
Updating Sample Count
Conditional Optimization
This is the place where, each sample update count.
def update_sample_count(pred, prob, s):
probint = int(prob*100)
probfloor = probint - (probint % 20)
if pred=='Female':
if (0.2 <= prob <= 1): s['female'][probfloor] += 1
else: raise Exception("Female not in range" + str(prob))
elif pred=='Male':
if (0.2 <= prob <= 1): s['male'][probfloor] += 1
else: raise Exception("Male not in range" + str(prob))
elif pred=='Junior':
if (0.4 <= prob <= 1): s['junior'][probfloor] += 1
else: raise Exception("Junior not in range" + str(prob))
elif pred=='Juvenile':
s['juvenile'] += 1
elif pred=='Puppy':
s['puppy'] += 1
return s
Counting Total
Also bundle this into function
def get_total(s):
return s['puppy'] \
+ s['juvenile'] \
+ sum(s['junior'].values()) \
+ sum(s['female'].values()) \
+ sum(s['male'].values())
Printing The Output
Again. Bundle.
def display_sample(s):
str_ranges = { 20: '0.2-0.4', 40: '0.4-0.6',
60: '0.6-0.8', 80: '0.8-1.0' }
total = get_total(s)
print('Puppy : %3d' % s['puppy'])
print('Juvenile : %3d' % s['juvenile'])
for key in [40, 60, 80]:
print('Junior %s : %3d' % (str_ranges[key], s['junior'][key]))
for key in [20, 40, 60, 80]:
print('Female %s : %3d' % (str_ranges[key], s['female'][key]))
for key in [20, 40, 60, 80]:
print('Male %s : %3d' % (str_ranges[key], s['male'][key]))
print('------------------------')
print('Total : %3d' % total)
print()
The Result
The output is exactly the same with the previous one.
First Pool
------------------------
Puppy : 1
Juvenile : 15
...
------------------------
Total : 40
Second Pool
------------------------
Puppy : 1
Juvenile : 0
...
------------------------
Total : 40
Third Pool
------------------------
Puppy : 1
Juvenile : 15
...
------------------------
Total : 40
Our multicolumn pivot is based on these value above.
5: Bundle into Class
After function, we can use object oriented, to simplify code, such as initialization, code reuse, and stuff.
Code Skeleton
The big picture of the Dog Class.
import openpyxl
from openpyxl import load_workbook
class DogSample:
# ...
def __init__(self, sheet_src, rows):
# ...
def reset(self):
# ...
def get_total(self):
# ...
def display_to_console(self):
# ...
def update(self, pred, prob):
# ...
def walk_each_row(self):
for row in self.rows:
# ...
def pivot(self, title, name_pred, name_prob):
# ...
def main():
# ...
main()
Main Function
We deal with the book part in main function. I think, it is more tidy this way.
def main():
book_src = load_workbook('recap.xlsx')
sheet_src = book_src["Combined"]
r_start = 3
r_stop = 43
rows = range(r_start, r_stop)
sample = DogSample(sheet_src, rows)
sample.pivot('First Pool', 'B', 'C')
sample.pivot('Second Pool', 'D', 'E')
sample.pivot('Third Pool', 'F', 'G')
And call it later as:
main()
Global Initialization
It is the same as previous.
import openpyxl
from openpyxl import load_workbook
Except, nomore workbook stuff.
Dog Class Property
The only property is this one below:
class DogSample:
str_ranges = { 20: '0.2-0.4', 40: '0.4-0.6',
60: '0.6-0.8', 80: '0.8-1.0' }
Dog Class Initialization
We only initialize two variables
sheet_src
,rows
def __init__(self, sheet_src, rows):
self.sheet_src = sheet_src
self.rows = rows
Pivot Method
It still consist of three function.
And it also initialize three variables
title
,name_pred
,name_prob
def pivot(self, title, name_pred, name_prob):
self.title = title
self.name_pred = name_pred
self.name_prob = name_prob
self.reset()
self.walk_each_row()
self.display_to_console()
Defining Property
We should reset the counter for each calls.
def reset(self):
self.sample = {
'puppy' : 0,
'juvenile' : 0,
'junior' : { 40: 0, 60: 0, 80: 0 },
'female' : { 20: 0, 40: 0, 60: 0, 80: 0 },
'male' : { 20: 0, 40: 0, 60: 0, 80: 0 }
}
Walk The Loop
The scanning process is very similar, with the plain function version.
def walk_each_row(self):
for row in self.rows:
pred_col = self.sheet_src[self.name_pred + str(row)]
prob_col = self.sheet_src[self.name_prob + str(row)]
pred = pred_col.value
if prob_col.value == None: continue
else:
prob = float(prob_col.value)
self.update(pred, prob)
This require update
method.
Updating Sample Count
Conditional Optimization
This is the place where, each sample update count.
def update(self, pred, prob):
# s is just a shorter form of sample variable
s = self.sample
probint = int(prob*100)
probfloor = probint - (probint % 20)
if pred=='Female':
if (0.2 <= prob <= 1): s['female'][probfloor] += 1
else: raise Exception("Female not in range" + str(prob))
elif pred=='Male':
if (0.2 <= prob <= 1): s['male'][probfloor] += 1
else: raise Exception("Male not in range" + str(prob))
elif pred=='Junior':
if (0.4 <= prob <= 1): s['junior'][probfloor] += 1
else: raise Exception("Junior not in range" + str(prob))
elif pred=='Juvenile':
s['juvenile'] += 1
elif pred=='Puppy':
s['puppy'] += 1
self.sample = s
I’m not fond of long code,
so I make the s
shortcut.
Counting Total
Also bundle the total count into method:
def get_total(self):
return self.sample['puppy'] \
+ self.sample['juvenile'] \
+ sum(self.sample['junior'].values()) \
+ sum(self.sample['female'].values()) \
+ sum(self.sample['male'].values())
Printing The Output
Bundle, as usual:
def display_to_console(self):
# s is just a shorter form of sample variable
s = self.sample
sr = self.str_ranges
print(self.title)
print('------------------------')
print('Puppy : %3d' % s['puppy'])
print('Juvenile : %3d' % s['juvenile'])
for key in [40, 60, 80]:
print('Junior %s : %3d' % (sr[key], s['junior'][key]))
for key in [20, 40, 60, 80]:
print('Female %s : %3d' % (sr[key], s['female'][key]))
for key in [20, 40, 60, 80]:
print('Male %s : %3d' % (sr[key], s['male'][key]))
print('------------------------')
print('Total : %3d' % self.get_total())
print()
The Result
The output is exactly the same with the previous one.
First Pool
...
Second Pool
...
Third Pool
...
Our multicolumn pivot is based on these value above. We will show the result in spreadsheet in the next part.
What is Next 🤔?
We are done with CLI output. Now it is a good time to write the output to the sheet.
Consider continue reading [ Python - Excel - Pivot - Part Four ].