Where to Discuss?

Local Group

Preface

Goal: Begin from scratch.

This is a multipart article:

  1. Begin from scratch.
  2. Initialization and conditional optimization.
  3. Refactoring to object oriented.
  4. Writing the xslx sheet.

From the very start, this is not only an explanation, on how to handle worksheet. But rather, a step by step real life example, of building macros from scratch.

Most of the part is modelling the internal data, without writing to sheet at all. With strong backend, we can go further to other case, other than pivot, such as statistical analysis and such.

Capability
The script shall be able to build a custom multi column pivot table.

However, we need a simple case. So I humbly present you my reader, the pivot case.

Why Pivot?

My boss said summary.

I have weekly report to be delivered to management. This include a bunch of pivot table and charts.

Why Custom Script?

Just like Excel, LibreOffice Calc can manage pivot table. But just like Excel, Calc can only manage one data source at a time. If I should compare between data source, I should make the pivot one by one, and copy-paste the result.

It is a daunting process, and also error prone. So I decide to make my own multicolumn pivot generator.

Why Python?

It has openpyxl

Python also has bigger ecosystem. Large community base.

What Source?

The Puppy Case.

Imagine a case, you have three cameras that could identify dogs, by gender and age, and also with the confidence score.

All cameras identify the length of the dog, if the dog is short, it simply puppy without probabilty. Otherwise it is an adult dog, that could be female or male, with probability. Between adult and puppy there is other category as well.

Each camera come from different vendor, thus different terminology. The first and the third use the junior word with probability. While the second camera use juvenile word without probability.

openpyxl: Input: Plain Sheet

You can build a custom multicolumn pivot table. Then manually copy the format from a template, to get nice gradient looks as below worksheet.

openpyxl: Output: Colored Pivot

The figure above is a simplification of real data.

Example Data

You can download here

The Process

I’m making the output mockup first. And code later.

You can download the result here.

I mostly code, after making output model. The output could be changed later, as the code process begun.

Why Automation?

Time savings.

We should invest on thinking, rather than doing tedious repetitive task.


1: Code from Scratch

Apologize for my ignorance. I never have a proper IT education. Nor algorithm course.

So I what I do is just start coding. And see what I can do after. Just type, and then optimize whatever comes.

Have a look at this source code.

Opening Workbook

Most of the time, starting is scary. So I push myself to open workbook.

import openpyxl
from openpyxl import load_workbook

wb = load_workbook('recap.xlsx')
ws = wb["Combined"]

Defining Variables

The next step is defining what variable that I need. I guess the need of this variables from the colored output above.

puppy      = 0
juvenile   = 0
junior_04  = 0
junior_06  = 0
junior_08  = 0
female_02  = 0
female_04  = 0
female_06  = 0
female_08  = 0
male_02    = 0
male_04    = 0
male_06    = 0
male_08    = 0
total      = 0

Walk The Loop

I scan each row of the sheet for this two variable:

  • Row B: Prediction
  • Row C: Probability
# start: 3
# end:  43
rows = range(3, 43)
for row in rows:
  pred_col = ws['B'+str(row)]
  prob_col = ws['C'+str(row)]

  pred = pred_col.value
  prob = float(prob_col.value)

This would take two level conditional. The outer just based on prediction, and the inner count its probability.

Outer Conditional

The skeleton of the outer conditional is as below

  if pred=='Female':
    # ...
  elif pred=='Male':
    # ...
  elif pred=='Junior':
    # ...
  elif pred=='Juvenile':
    juvenile += 1
  elif pred=='Puppy':
    puppy += 1

There is three prediction that require probability.

  • Female
  • Male
  • Junior

Inner Conditional

The three inner conditional is very similar.

Don’t worry if the code below looks stupid. We will optimize later.

  • Female
  if pred=='Female':
    if (0.2 <= prob < 0.4):
      female_02 += 1
    elif (0.4 <= prob < 0.6):
      female_04 += 1
    elif (0.6 <= prob < 0.8):
      female_06 += 1
    elif (0.8 <= prob <= 1):
      female_08 += 1
    else:
      raise Exception("Female not in range" + str(prob))
  • Male
  elif pred=='Male':
    if (0.2 <= prob < 0.4):
      male_02 += 1
    elif (0.4 <= prob < 0.6):
      male_04 += 1
    elif (0.6 <= prob < 0.8):
      male_06 += 1
    elif (0.8 <= prob <= 1):
      male_08 += 1
    else:
      raise Exception("Male not in range" + str(prob))
  • Junior
  elif pred=='Junior':
    if (0.4 <= prob < 0.6):
      junior_04 += 1
    elif (0.6 <= prob < 0.8):
      junior_06 += 1
    elif (0.8 <= prob <= 1):
      junior_08 += 1
    else:
      raise Exception("Junior not in range" + str(prob))

Each has exception, to anticipate outliers.

Counting Total

This is also very primitive.

total = puppy + juvenile \
      + junior_04 + junior_06 + junior_08 \
      + female_02  + female_04  + female_06 + female_08 \
      + male_02  + male_04  + male_06 + male_08

Printing The Output

And this is another dumb code.

print('Puppy          : %3d' % puppy)
print('Juvenile       : %3d' % juvenile)
print('Junior 0.4-0.6 : %3d' % junior_04)
print('Junior 0.6-0.8 : %3d' % junior_06)
print('Junior 0.8-1.0 : %3d' % junior_08)
print('Female 0.2-0.4 : %3d' % female_02)
print('Female 0.4-0.6 : %3d' % female_04)
print('Female 0.6-0.8 : %3d' % female_06)
print('Female 0.8-1.0 : %3d' % female_08)
print('Male 0.2-0.4   : %3d' % male_02)
print('Male 0.4-0.6   : %3d' % male_04)
print('Male 0.6-0.8   : %3d' % male_06)
print('Male 0.8-1.0   : %3d' % male_08)
print('------------------------')
print('Total          : %3d' % total)

I know it is dumb. But it works as a starter.

The Result

We haven’t had our pivot yet. But the output is clean enough.

Puppy          :   1
Juvenile       :  15
Junior 0.4-0.6 :   0
Junior 0.6-0.8 :   0
Junior 0.8-1.0 :   0
Female 0.2-0.4 :   0
Female 0.4-0.6 :   8
Female 0.6-0.8 :   2
Female 0.8-1.0 :   1
Male 0.2-0.4   :   0
Male 0.4-0.6   :   9
Male 0.6-0.8   :   3
Male 0.8-1.0   :   1
------------------------
Total          :  40

What is Next 🤔?

It is pretty bad code for a starter. It means we have a lot to be improved.

Consider continue reading [ Python - Excel - Pivot - Part Two ].