Where to Discuss?

Local Group

Preface

Goal: Initialization and conditional optimization.


2: Restructure Data

I must admit, by the time I write this code. I don’t know which data structure is the best to represent, the pivot table.

However, I have start to code anyway.

Defining Variables

Here it is the restructured data:

puppy    = 0
juvenile = 0
junior   = { '04': 0, '06': 0, '08': 0 }
female   = { '02': 0, '04': 0, '06': 0, '08':0 }
male     = { '02': 0, '04': 0, '06': 0, '08':0 }
total    = 0

Now using dictionary.

Walk The Loop

Consider scan different data source. Instead of the first data source (B and C), we can scan the second data source (D and E),

  • Row D: Prediction
  • Row E: Probability
# start: 3
# end:  43
rows = range(3, 43)
for row in rows:
  pred_col = ws['D'+str(row)]
  prob_col = ws['E'+str(row)]

  pred = pred_col.value
  prob = float(prob_col.value)

Conditional

The skeleton conditional is very similar, except now it is using dictionary as below:

    if pred=='Female':
    if   (0.2 <= prob < 0.4): female['02'] += 1
    elif (0.4 <= prob < 0.6): female['04'] += 1
    elif (0.6 <= prob < 0.8): female['06'] += 1
    elif (0.8 <= prob <= 1):  female['08'] += 1
    else: raise Exception("Female not in range" + str(prob))
  elif pred=='Male':
    if   (0.2 <= prob < 0.4): male['02'] += 1
    elif (0.4 <= prob < 0.6): male['04'] += 1
    elif (0.6 <= prob < 0.8): male['06'] += 1
    elif (0.8 <= prob <= 1):  male['08'] += 1
    else: raise Exception("Male not in range" + str(prob))
  elif pred=='Junior':
    if   (0.4 <= prob < 0.6): junior['04'] += 1
    elif (0.6 <= prob < 0.8): junior['06'] += 1
    elif (0.8 <= prob <= 1) : junior['08'] += 1
    else: raise Exception("Junior not in range" + str(prob))
  elif pred=='Juvenile':
    juvenile += 1
  elif pred=='Puppy':
    puppy += 1

And I also write it in a more compact way.

Counting Total

This is still primitive, but with sum of dictionary.

total = puppy + juvenile + sum(junior.values()) \
      + sum(female.values()) + sum(male.values())

Printing The Output

I keep the code simple stupid, so you can catch up with the difference, from previous code.

print('Puppy          : %3d' % puppy)
print('Juvenile       : %3d' % juvenile)
print('Junior 0.4-0.6 : %3d' % junior['04'])
print('Junior 0.6-0.8 : %3d' % junior['06'])
print('Junior 0.8-1.0 : %3d' % junior['08'])
print('Female 0.2-0.4 : %3d' % female['02'])
print('Female 0.4-0.6 : %3d' % female['04'])
print('Female 0.6-0.8 : %3d' % female['06'])
print('Female 0.8-1.0 : %3d' % female['08'])
print('Male 0.2-0.4   : %3d' % male['02'])
print('Male 0.4-0.6   : %3d' % male['04'])
print('Male 0.6-0.8   : %3d' % male['06'])
print('Male 0.8-1.0   : %3d' % male['08'])
print('------------------------')
print('Total          : %3d' % total)

It is still dumb, but with dictionary.

The Result

We haven’t had our pivot sheet yet. But the output is enough, to make a pivot.

Puppy          :   1
Juvenile       :   0
Junior 0.4-0.6 :   1
Junior 0.6-0.8 :   4
Junior 0.8-1.0 :   0
Female 0.2-0.4 :   0
Female 0.4-0.6 :   1
Female 0.6-0.8 :  17
Female 0.8-1.0 :  14
Male 0.2-0.4   :   0
Male 0.4-0.6   :   2
Male 0.6-0.8   :   0
Male 0.8-1.0   :   0
------------------------
Total          :  40

We already have one code structure. We can bundle them in a function.


3: Pivot Function

Consider bundle the complexity inside a function.

Function Declaration

The function declaration is simply as below:

def pivot_me(rows, name_pred, name_prob):
  # ...

Calling Function

At the end of the code, we are going to call the function as below:

# start: 3
# end:  43
rows = range(3, 43)

print('First Pool')
print('------------------------')
pivot_me(rows, 'B', 'C')

print('Second Pool')
print('------------------------')
pivot_me(rows, 'D', 'E')

print('Third Pool')
print('------------------------')
pivot_me(rows, 'F', 'G')

Defining Variables

We put data initialization in the function:

def pivot_me(rows, name_pred, name_prob):

  # declare all local variables
  puppy       = 0
  juvenile    = 0
  junior      = { 40: 0, 60: 0, 80:0 }
  female      = { 20: 0, 40: 0, 60: 0, 80:0 }
  male        = { 20: 0, 40: 0, 60: 0, 80:0 }
  total       = 0
  str_ranges  = { 20: '0.2-0.4', 40: '0.4-0.6',
                  60: '0.6-0.8', 80: '0.8-1.0' }

We have str_ranges here, for data output later.

  str_ranges  = { 20: '0.2-0.4', 40: '0.4-0.6',
                  60: '0.6-0.8', 80: '0.8-1.0' }

Walk The Loop

Intead of hardcoded cell address, we can be more flexible.

  • Row name_pred: Prediction
  • Row name_prob: Probability
  for row in rows:
    pred_col = ws[name_pred + str(row)]
    prob_col = ws[name_prob + str(row)]

    pred = pred_col.value

And also, to anticipate cell with empty value, we alter the code a little bit.

    if prob_col.value == None: continue
    else:
      prob = float(prob_col.value)
      probint   = int(prob*100)
      probfloor = probint - (probint % 20)

Conditional

With probfloor above, we can reduce the conditional code into below:

    if pred=='Female': 
      if   (0.2 <= prob <= 1): female[probfloor] += 1
      else: raise Exception("Female not in range" + str(prob))
    elif pred=='Male':
      if   (0.2 <= prob <= 1): male[probfloor] += 1
      else: raise Exception("Male not in range" + str(prob))
    elif pred=='Junior':
      if   (0.4 <= prob <= 1): junior[probfloor] += 1
      else: raise Exception("Junior not in range" + str(prob))
    elif pred=='Juvenile':
      juvenile += 1
    elif pred=='Puppy':
      puppy += 1

An I also write it in a more compact way.

Counting Total

The same as previous, except now it is counted inside the function.

  total = puppy + juvenile + sum(junior.values()) \
        + sum(female.values()) + sum(male.values())

Printing The Output

The code get more complex.

  print('Puppy            : %3d' % puppy)
  print('Juvenile         : %3d' % juvenile)
  for key in [40, 60, 80]:
    print('Junior %s   : %3d' % (str_ranges[key], junior[key]))
  for key in [20, 40, 60, 80]:
    print('Female   %s : %3d' % (str_ranges[key], female[key]))
  for key in [20, 40, 60, 80]:
    print('Male     %s : %3d' % (str_ranges[key], male[key]))
  print('------------------------')
  print('Total            : %3d' % total)
  print()

We are getting close to our pivot table.

The Result

The output is longer. I guess CLI is no longer suitable to show complex data.

First Pool
------------------------
Puppy            :   1
Juvenile         :  15
Junior 0.4-0.6   :   0
Junior 0.6-0.8   :   0
Junior 0.8-1.0   :   0
Female 0.2-0.4   :   0
Female 0.4-0.6   :   8
Female 0.6-0.8   :   2
Female 0.8-1.0   :   1
Male 0.2-0.4     :   0
Male 0.4-0.6     :   9
Male 0.6-0.8     :   3
Male 0.8-1.0     :   1
------------------------
Total            :  40

Second Pool
------------------------
Puppy            :   1
Juvenile         :   0
Junior 0.4-0.6   :   1
Junior 0.6-0.8   :   4
Junior 0.8-1.0   :   0
Female 0.2-0.4   :   0
Female 0.4-0.6   :   1
Female 0.6-0.8   :  17
Female 0.8-1.0   :  14
Male 0.2-0.4     :   0
Male 0.4-0.6     :   2
Male 0.6-0.8     :   0
Male 0.8-1.0     :   0
------------------------
Total            :  40

Third Pool
------------------------
Puppy            :   1
Juvenile         :  15
Junior 0.4-0.6   :   0
Junior 0.6-0.8   :   0
Junior 0.8-1.0   :   0
Female 0.2-0.4   :   0
Female 0.4-0.6   :   8
Female 0.6-0.8   :  13
Female 0.8-1.0   :   0
Male 0.2-0.4     :   0
Male 0.4-0.6     :   3
Male 0.6-0.8     :   0
Male 0.8-1.0     :   0
------------------------
Total            :  40

Our multicolumn pivot is based on these value above. We will show the result in spreadsheet later. But in the next part, we need to clean up the code first.


What is Next 🤔?

It is still a terrible code. I must be ashamed. I have to refactor.

Consider continue reading [ Python - Excel - Pivot - Part Three ].