Where to Discuss?

Local Group

Preface

Goal: Refactoring to object oriented.

We need to clean-up the code. There will be nothing change in output. But so much change, in interal code structure.


4: Refactoring to Function

It is the same with the previous code. Except we are going to bundle everything in function.

Code Skeleton

The big picture.

import openpyxl
from openpyxl import load_workbook

# ...

def get_sample_init():
  # ...

def get_total(s):
  # ...

def display_sample(s):
  # ...

def update_sample_count(pred, prob, s):
  # ...

def get_sample_count(rows, name_pred, name_prob, s):
  for row in rows:
    # ...

def pivot_me(rows, name_pred, name_prob):
  # ...

def main():
  # ...

main()

Main function

For example, we could redefine the last part in a function.

def main():
  r_start =  3
  r_stop  = 43
  rows = range(r_start, r_stop)

  print('First Pool')
  print('------------------------')
  pivot_me(rows, 'B', 'C')

  print('Second Pool')
  print('------------------------')
  pivot_me(rows, 'D', 'E')

  print('Third Pool')
  print('------------------------')
  pivot_me(rows, 'F', 'G')

And call it later as:

main()

Global Initialization

It is the same as previous.

import openpyxl
from openpyxl import load_workbook

wb = load_workbook('recap.xlsx')
ws = wb["Combined"]

We will deal later on, in the next part, as object oriented.

Pivot Function

Now it consist of three function.

def pivot_me(rows, name_pred, name_prob):
  s = get_sample_init()
  s = get_sample_count(rows, name_pred, name_prob, s)
  display_sample(s)

This step, we only have CLI output. We will output to worksheet later.

Defining Variables

Consider redefine in function.

def get_sample_init():
  # declare all local variables
  return {
    'puppy'    : 0,
    'juvenile' : 0,
    'junior'   : { 40: 0, 60: 0, 80:0 },
    'female'   : { 20: 0, 40: 0, 60: 0, 80:0 },
    'male'     : { 20: 0, 40: 0, 60: 0, 80:0 }
  }

Walk The Loop

While scanning process is as below:

def get_sample_count(rows, name_pred, name_prob, s):
  for row in rows:
    pred_col = ws[name_pred + str(row)]
    prob_col = ws[name_prob + str(row)]

    pred = pred_col.value

    if prob_col.value == None: continue
    else:
      prob = float(prob_col.value)
      s = update_sample_count(pred, prob, s)

  return s

This require update_sample_count function.

Updating Sample Count

Conditional Optimization

This is the place where, each sample update count.

def update_sample_count(pred, prob, s):
  probint   = int(prob*100)
  probfloor = probint - (probint % 20)

  if pred=='Female': 
    if   (0.2 <= prob <= 1): s['female'][probfloor] += 1
    else: raise Exception("Female not in range" + str(prob))
  elif pred=='Male':
    if   (0.2 <= prob <= 1): s['male'][probfloor] += 1
    else: raise Exception("Male not in range" + str(prob))
  elif pred=='Junior':
    if   (0.4 <= prob <= 1): s['junior'][probfloor] += 1
    else: raise Exception("Junior not in range" + str(prob))
  elif pred=='Juvenile':
    s['juvenile'] += 1
  elif pred=='Puppy':
    s['puppy'] += 1

  return s

Counting Total

Also bundle this into function

def get_total(s):
  return s['puppy'] \
       + s['juvenile'] \
       + sum(s['junior'].values()) \
       + sum(s['female'].values()) \
       + sum(s['male'].values())

Printing The Output

Again. Bundle.

def display_sample(s):
  str_ranges  = { 20: '0.2-0.4', 40: '0.4-0.6',
                  60: '0.6-0.8', 80: '0.8-1.0' }

  total = get_total(s)

  print('Puppy            : %3d' % s['puppy'])
  print('Juvenile         : %3d' % s['juvenile'])
  for key in [40, 60, 80]:
    print('Junior %s   : %3d' % (str_ranges[key], s['junior'][key]))
  for key in [20, 40, 60, 80]:
    print('Female   %s : %3d' % (str_ranges[key], s['female'][key]))
  for key in [20, 40, 60, 80]:
    print('Male     %s : %3d' % (str_ranges[key], s['male'][key]))
  print('------------------------')
  print('Total            : %3d' % total)
  print()

The Result

The output is exactly the same with the previous one.

First Pool
------------------------
Puppy            :   1
Juvenile         :  15
...
------------------------
Total            :  40

Second Pool
------------------------
Puppy            :   1
Juvenile         :   0
...
------------------------
Total            :  40

Third Pool
------------------------
Puppy            :   1
Juvenile         :  15
...
------------------------
Total            :  40

Our multicolumn pivot is based on these value above.


5: Bundle into Class

After function, we can use object oriented, to simplify code, such as initialization, code reuse, and stuff.

Code Skeleton

The big picture of the Dog Class.

import openpyxl
from openpyxl import load_workbook

class DogSample:
  # ...

  def __init__(self, sheet_src, rows):
    # ...

  def reset(self):
    # ...

  def get_total(self):
    # ...

  def display_to_console(self):
    # ...

  def update(self, pred, prob):
    # ...

  def walk_each_row(self):
    for row in self.rows:
      # ...

  def pivot(self, title, name_pred, name_prob):
    # ...

def main():
  # ...

main()

Main Function

We deal with the book part in main function. I think, it is more tidy this way.

def main():
  book_src  = load_workbook('recap.xlsx')
  sheet_src = book_src["Combined"]

  r_start =  3
  r_stop  = 43
  rows = range(r_start, r_stop)

  sample = DogSample(sheet_src, rows)
  sample.pivot('First Pool',  'B', 'C')
  sample.pivot('Second Pool', 'D', 'E')
  sample.pivot('Third Pool',  'F', 'G')

And call it later as:

main()

Global Initialization

It is the same as previous.

import openpyxl
from openpyxl import load_workbook

Except, nomore workbook stuff.

Dog Class Property

The only property is this one below:

class DogSample:
  str_ranges   = { 20: '0.2-0.4', 40: '0.4-0.6',
                   60: '0.6-0.8', 80: '0.8-1.0' }

Dog Class Initialization

We only initialize two variables

  • sheet_src,
  • rows
  def __init__(self, sheet_src, rows):
    self.sheet_src = sheet_src
    self.rows      = rows

Pivot Method

It still consist of three function.

And it also initialize three variables

  • title,
  • name_pred,
  • name_prob
  def pivot(self, title, name_pred, name_prob):
    self.title     = title
    self.name_pred = name_pred
    self.name_prob = name_prob

    self.reset()
    self.walk_each_row()
    self.display_to_console()

Defining Property

We should reset the counter for each calls.

  def reset(self):
    self.sample = {
      'puppy'    : 0,
      'juvenile' : 0,
      'junior'   : { 40: 0, 60: 0, 80: 0 },
      'female'   : { 20: 0, 40: 0, 60: 0, 80: 0 },
      'male'     : { 20: 0, 40: 0, 60: 0, 80: 0 }
    }

Walk The Loop

The scanning process is very similar, with the plain function version.

  def walk_each_row(self):
    for row in self.rows:
      pred_col = self.sheet_src[self.name_pred + str(row)]
      prob_col = self.sheet_src[self.name_prob + str(row)]

      pred = pred_col.value

      if prob_col.value == None: continue
      else:
        prob = float(prob_col.value)
        self.update(pred, prob)

This require update method.

Updating Sample Count

Conditional Optimization

This is the place where, each sample update count.

  def update(self, pred, prob):
    # s is just a shorter form of sample variable
    s = self.sample

    probint   = int(prob*100)
    probfloor = probint - (probint % 20)

    if pred=='Female': 
      if   (0.2 <= prob <= 1): s['female'][probfloor] += 1
      else: raise Exception("Female not in range" + str(prob))
    elif pred=='Male':
      if   (0.2 <= prob <= 1): s['male'][probfloor] += 1
      else: raise Exception("Male not in range" + str(prob))
    elif pred=='Junior':
      if   (0.4 <= prob <= 1): s['junior'][probfloor] += 1
      else: raise Exception("Junior not in range" + str(prob))
    elif pred=='Juvenile':
      s['juvenile'] += 1
    elif pred=='Puppy':
      s['puppy'] += 1

    self.sample = s

I’m not fond of long code, so I make the s shortcut.

Counting Total

Also bundle the total count into method:

  def get_total(self):
    return self.sample['puppy'] \
         + self.sample['juvenile'] \
         + sum(self.sample['junior'].values()) \
         + sum(self.sample['female'].values()) \
         + sum(self.sample['male'].values())

Printing The Output

Bundle, as usual:

  def display_to_console(self):
    # s is just a shorter form of sample variable
    s = self.sample
    sr = self.str_ranges

    print(self.title)
    print('------------------------')
    print('Puppy            : %3d' % s['puppy'])
    print('Juvenile         : %3d' % s['juvenile'])
    for key in [40, 60, 80]:
      print('Junior %s   : %3d' % (sr[key], s['junior'][key]))
    for key in [20, 40, 60, 80]:
      print('Female   %s : %3d' % (sr[key], s['female'][key]))
    for key in [20, 40, 60, 80]:
      print('Male     %s : %3d' % (sr[key], s['male'][key]))
    print('------------------------')
    print('Total            : %3d' % self.get_total())
    print()

The Result

The output is exactly the same with the previous one.

First Pool
...

Second Pool
...

Third Pool
...

Our multicolumn pivot is based on these value above. We will show the result in spreadsheet in the next part.


What is Next 🤔?

We are done with CLI output. Now it is a good time to write the output to the sheet.

Consider continue reading [ Python - Excel - Pivot - Part Four ].