# Pivot - Model - List Comprehension

Python Automation Article Series Test

Data Modelling

Python - Excel

LibreOffice Macro

LibreOffice UNO

Table of Content
Where to Discuss?

Local Group

### Preface

Goal: Transform data with list comprehension, from plain table to pivot.

In this article, we will explore how the data evolves from a table to a pivot. We’ll have fun with list comprehension and learn cool tricks while transforming data.

### Building Pivot

After reading the CSV, row by row, we can build model using build_pivot(), by calculating the pivot manually.

I’ve simplified the diagram for better understanding.

#### Main Program

The main program remains intact, exactly the same as before.

#### Class Skeleton

The class skeleton is as follows:

class PivotSample:
def __init__(self, source_csv: str,
categories: List[str]) -> None:
...

...

def build_pivot(self) -> None:
...

def display(self) -> None:
...

def process(self):
self.load_data()

Instead of returning a value, in OOP, we can store the value in a property.

#### Process

We are going to add build_pivot() method here:

  def process(self):
self.build_pivot()

#### Build Pivot

This is our main part. Let’s go through this step by step.

### Transformation Abstraction

Consider making a plan without ever writing a single line of code. Write down how the data transforms.

#### Raw Data

Imagine how to get the pivot using set theory. Denote the raw data as a set Raw Data, where each element is a triple:

For example:

Number,Date,Fruit
1,19/02/2017,Orange
2,19/02/2017,Grape
3,19/02/2017,Strawberry
4,19/02/2017,Orange
5,19/02/2017,Apple


#### Overall Process

Express the overall process in a single mathematical expression. Note that this expression would be more symbolic and conceptual, rather than an actual operation.

This expression uses the set union symbol (∪) to add the zero count for categories not found on a particular date.

#### Actual Process

The actual operations involve several steps. Let’s break down the operations.

1. Grouping by Date:

For example:

'01/03/2017': ['Strawberry',
'Strawberry',
'Grape',
'Grape',
'Banana',
'Apple',
'Orange',
'Orange',
'Banana',
'Grape',
'Banana',
'Strawberry',
'Orange',
'Strawberry'],
1. Counting Occurrences:

The result of above example is:

{'01/03/2017': {'Apple': 1,
'Banana': 3,
'Grape': 3,
'Orange': 3,
'Strawberry': 4},
'01/04/2017': {'Apple': 4,
'Banana': 6,
'Grape': 2,
'Mango': 2,
'Orange': 3,
'Strawberry': 4},
1. Ensuring All Categories with Zero Counts:

The data required to for each pivot row are:

{'01/03/2017': {'Apple': 1,
'Banana': 3,
'Grape': 3,
'Orange': 3,
'Strawberry': 4},
'01/04/2017': {'Apple': 4,
'Banana': 6,
'Grape': 2,
'Mango': 2,
'Orange': 3,
'Strawberry': 4},

Now we are ready for the actual code.

### Data Transformation

Let’s backtrack for a while:

#### CSV File

The original CSV file have a format as below

Number,Date,Fruit
1,19/02/2017,Orange
2,19/02/2017,Grape
3,19/02/2017,Strawberry
4,19/02/2017,Orange
5,19/02/2017,Apple


#### All Values: CSV Reader Result

As a result of the CSV reader, we have a self.all_values variable. This memory data can be shown in this List of List(s):

❯ python 02-model-view.py
['Number', 'Date', 'Fruit']
['1', '19/02/2017', 'Orange']
['2', '19/02/2017', 'Grape']
['3', '19/02/2017', 'Strawberry']
['4', '19/02/2017', 'Orange']
['5', '19/02/2017', 'Apple']

This is our first transformation.

#### All Dates: Grouping Category

I need to group the result by date, keeping the date as index for further use. This means we are the number index, will be disposed.

In order to group by date, we can use list comprehension in below:

  def build_pivot(self) -> None:
self.all_dates = {
row[1]: [item[2]
for item in self.all_values
if item[1] == row[1]]
for row in self.all_values
}

pprint(self.all_dates)

This is basically a loop in a loop, comparing all_values item. The first loop is iterating all values, while the second loop will append the category, only if the date matches.

The result would looks like below:

'01/03/2017': ['Strawberry',
'Strawberry',
'Grape',
'Grape',
'Banana',
'Apple',
'Orange',
'Orange',
'Banana',
'Grape',
'Banana',
'Strawberry',
'Orange',
'Strawberry'],

#### Occurences: Category Count

I need to count the distinct category for each group.

Counting occurrences can be done, by breaking row into an array of distinct category value. This can be done with list comprehension below:

    self.occurrences = {
date: {cat: row.count(cat) for cat in set(row)}
for date, row in self.all_dates.items()
}

pprint(self.occurrences)

The list comprehension is pretty self explanatory. The first loop is iterating all dates,

      for date, row in self.all_dates.items()

The second one is getting the count of distinct category. Distinct category can be obtained by using set. The count work for specific category in row (list of values).

      {cat: row.count(cat) for cat in set(row)}

The result would looks like below:

{'01/03/2017': {'Apple': 1,
'Banana': 3,
'Grape': 3,
'Orange': 3,
'Strawberry': 4},
'01/04/2017': {'Apple': 4,
'Banana': 6,
'Grape': 2,
'Mango': 2,
'Orange': 3,
'Strawberry': 4},

#### Ensure Occurences: Respect Zero Count

I need to simulate Excel feature, showing a category that does not exist in original table.

Ensuring occurrences can be done, by creating a new dictionary with all class values and their counts, even zero count if category not found. This can be done with list comprehension below:

    self.ensure_occurrences = {
date: {
cat: pairs.get(cat, 0)
for cat in self.categories
}
for date, pairs in self.occurrences.items()}

pprint(self.ensure_occurrences)

The list comprehension is pretty self explanatory. The first loop is iterating all occurrences, and getting the default value as zero if category not found.

    cat: pairs.get(cat, 0)

The result would looks like below:

{'01/03/2017': {'Apple': 1,
'Banana': 3,
'Dragon Fruit': 0,
'Durian': 0,
'Grape': 3,
'Mango': 0,
'Orange': 3,
'Strawberry': 4},
'01/04/2017': {'Apple': 4,
'Banana': 6,
'Dragon Fruit': 0,
'Durian': 0,
'Grape': 2,
'Mango': 2,
'Orange': 3,
'Strawberry': 4},

### Displaying Pivot

Moving from the model, let’s shift our focus to the view by expanding the display() method.

#### Main Program

The main program remains intact. No changes have been made.

#### Class Skeleton

The class skeleton is as below:

class PivotSample:
def __init__(self, source_csv: str,
categories: List[str]) -> None:
...

...

def build_pivot(self) -> None:
...

def process(self):
...

def display_basic(self) -> None:
...

def display_process(self) -> None:
...

...

def display_pivot_rows(self) -> None:
...

def display(self) -> None:
...

We have some additional display method.

#### Process

The main process is still intact. No changes have been made.

  def process(self):
self.build_pivot()

#### Build Pivot

This is where the list comprehension perform pivot operations

We should regularly clear the model from any view, that might have occurred while developing data model. Remove all pprint from the build_pivot() method.

  def build_pivot(self) -> None:
try:
self.all_dates = {
row[1]: [item[2]
for item in self.all_values
if item[1] == row[1]]
for row in self.all_values
}

self.occurrences = {
date: {cat: row.count(cat) for cat in set(row)}
for date, row in self.all_dates.items()
}

self.ensure_occurrences = {
date: {
cat: pairs.get(cat, 0)
for cat in self.categories
}
for date, pairs in self.occurrences.items()}

except Exception as e:
print("An error occurred " \
+ f"while processing data: {e}")

This is the final version of this build_pivot() method.

#### The Main Display Method

There is a lot happening on the display. We have four new methods here.

  def display(self) -> None:
print("No data to display.")
else:
self.display_basic()
self.display_process()

self.display_pivot_rows()

#### Display Basic: Plain Table

Sometimes we need to look back, at the original data.

  def display_basic(self) -> None:
# ['Number', 'Date', 'Fruit']

# Display the data
# ['1', '19/02/2017', 'Orange']
for row in self.all_values:
print(row)

print()

#### Display Process: Data Transformation

For debugging purposes, I document the example result of each data transformation

  def display_process(self) -> None:
# '19/02/2017': ['Orange', 'Grape', 'Strawberry', 'Orange',
#   'Apple', 'Banana', 'Strawberry', 'Banana', 'Strawberry']
print(self.all_dates)
print()

# '19/02/2017': {'Mango': 2, 'Banana': 3, 'Apple': 1,
#   'Strawberry': 4, 'Grape': 2, 'Orange': 3}
print(self.occurrences)
print()

# {'19/02/2017': {'Apple': 1, 'Banana': 3, 'Dragon Fruit': 0,
#  'Durian': 0, 'Grape': 2, 'Mango': 2,
#  'Orange': 3, 'Strawberry': 4}
print(self.ensure_occurrences)
print()

  def display_pivot_header(self) -> None:
line_title = "Date/Fruit"
for cat in self.categories:
line_title += f"\t{cat}"

print(line_title)

#### Display Pivot Rows: Count

  def display_pivot_rows(self) -> None:
for date, pairs in \
self.ensure_occurrences.items():
line_row = f"{date}"

for cat, count in pairs.items():
line_row += f"\t{count}"

print(line_row)

#### The Pivot Result Example

The pivot result example, can be shown as below. The spacing might looks different, because I use tabs instead of spaces.

Date/Fruit	Apple	Banana	Dragon Fruit	Durian	Grape	Mango Orange	Strawberry
19/02/2017	1	3	0	0	2	2	3	4
20/02/2017	0	4	0	0	0	2	2	2
21/02/2017	2	1	0	0	4	4	2	3
22/02/2017	4	1	0	0	1	1	4	5
23/02/2017	1	4	0	0	3	5	6	3
24/02/2017	2	3	0	0	4	2	5	5
25/02/2017	1	2	0	0	4	1	1	1
26/02/2017	4	2	0	0	2	2	3	4
27/02/2017	3	3	0	0	4	1	4	1
28/02/2017	3	2	0	0	3	4	3	2

It’s not a perfect display, but at least we have a preview.

### Displaying Total

What good is pivot without total summary? We are going to create three kinds of total.

• Total Row
• Total Column
• Grand Total

Each process stands in its own method, that all called in display() method. For the sake of clarity, I do not follow standard UML diagram.

#### Class Skeleton

There is only one additional method

class PivotSample:
...

...

def display_pivot_rows(self) -> None:
...

def display_pivot_total(self) -> None:
...

def display(self) -> None:
...

While the display_pivot_total() method show the total at the bottom. The row total at the rightmost column is done in display_pivot_rows().

We need to add additional Sub Total title in the header.

  def display_pivot_header(self) -> None:
line_title = "Date/Fruit"
for cat in self.categories:
line_title += f"\t{cat}"

line_title += f"\tSub Total"
print(line_title)

#### Total Row

For every row, we also add an additional column, that sums all the values in that row.

  def display_pivot_rows(self) -> None:
# Create rows for each category
for date, pairs in \
self.ensure_occurrences.items():
line_row = f"{date}"

for cat, count in pairs.items():
line_row += f"\t{count}"

# Calculate the total for each column
total = sum(pairs.values())
line_row += f"\t{total}"

print(line_row)

#### Total Column and Grand Total

To create a total row: sum of the column.

This is a new method, The task is adding an additional row, at the bottom of the table. Starting from the left header, with the sum of column values, and ending with the grand total.

  def display_pivot_total(self) -> None:
line_total = "Grand Total"
total = {}

for cat in self.categories:
total[cat] = 0

for date, pairs in \
self.ensure_occurrences.items():
total[cat] += pairs[cat]

line_total += f"\t{total[cat]}"

line_total += f"\t{sum(total.values())}"
print(line_total)

#### Pivot Display

We do not need to display all the stuff. All we need is just the pivot. So I commented out any unnecessary output.

  def display(self) -> None:
print("No data to display.")
else:
# self.display_basic()
# self.display_process()

self.display_pivot_rows()
self.display_pivot_total()

#### The Pivot Result Example

The pivot result example, can be shown as below. You can adjust the spacing manually for better looks.

❯ python 05-total.py
Date/Fruit	Apple	Banana	Dragon Fruit	Durian	Grape	Mango	Orange	Strawberry	Sub Total
19/02/2017	1	3	0	0	2	2	3	4 	15
20/02/2017	0	4	0	0	0	2	2	2 	10
21/02/2017	2	1	0	0	4	4	2	3 	16
22/02/2017	4	1	0	0	1	1	4	5 	16
...
...
...
02/04/2017	1	1	0	0	5	4	5	1 	17
03/04/2017	4	0	0	0	3	3	3	5 	18
04/04/2017	0	2	0	0	2	1	3	3 	11
05/04/2017	3	1	0	0	1	2	2	4 	13
Grand Total	114	125	0	0	127	115	148	126	755

We got ourselves, a very basic pivot table.

### Compare

Let’s compare it with our previous worksheet.

Since the count number are the same. I think we are doing just fine.

### What’s Up Next 🤔?

Feeling dizzy yet? We are going to simplify this pivot-building process using a dataframe.

Consider continuing to read [ Pivot - Model - Dataframe ].