Where to Discuss?

Local Group

Preface

Goal: Transform data with list comprehension, from plain table to pivot.

In this article, we will explore how the data evolves from a table to a pivot. We’ll have fun with list comprehension and learn cool tricks while transforming data.


Building Pivot

After reading the CSV, row by row, we can build model using build_pivot(), by calculating the pivot manually.

Pivot: List Comprehension: Class Diagram

I’ve simplified the diagram for better understanding.

Main Program

The main program remains intact, exactly the same as before.

Class Skeleton

The class skeleton is as follows:

class PivotSample:
  def __init__(self, source_csv: str,
      categories: List[str]) -> None:
    ...

  def load_data(self) -> None:
    ...

  def build_pivot(self) -> None:
    ...

  def display(self) -> None:
    ...

  def process(self):
    self.header = None
    self.load_data()

Pivot: List Comprehension: Class Skeleton

Instead of returning a value, in OOP, we can store the value in a property.

Process

We are going to add build_pivot() method here:

  def process(self):
    self.header = None
    self.load_data()
    self.build_pivot()

Build Pivot

This is our main part. Let’s go through this step by step.


Transformation Abstraction

Consider making a plan without ever writing a single line of code. Write down how the data transforms.

Raw Data

Imagine how to get the pivot using set theory. Denote the raw data as a set Raw Data, where each element is a triple:

For example:

Number,Date,Fruit
1,19/02/2017,Orange
2,19/02/2017,Grape
3,19/02/2017,Strawberry
4,19/02/2017,Orange
5,19/02/2017,Apple

Overall Process

Express the overall process in a single mathematical expression. Note that this expression would be more symbolic and conceptual, rather than an actual operation.

This expression uses the set union symbol (∪) to add the zero count for categories not found on a particular date.

Actual Process

The actual operations involve several steps. Let’s break down the operations.

  1. Grouping by Date:

For example:

'01/03/2017': ['Strawberry',
                'Strawberry',
                'Grape',
                'Grape',
                'Banana',
                'Apple',
                'Orange',
                'Orange',
                'Banana',
                'Grape',
                'Banana',
                'Strawberry',
                'Orange',
                'Strawberry'],
  1. Counting Occurrences:

The result of above example is:

{'01/03/2017': {'Apple': 1,
                'Banana': 3,
                'Grape': 3,
                'Orange': 3,
                'Strawberry': 4},
 '01/04/2017': {'Apple': 4,
                'Banana': 6,
                'Grape': 2,
                'Mango': 2,
                'Orange': 3,
                'Strawberry': 4},
  1. Ensuring All Categories with Zero Counts:

The data required to for each pivot row are:

{'01/03/2017': {'Apple': 1,
                'Banana': 3,
                'Grape': 3,
                'Orange': 3,
                'Strawberry': 4},
 '01/04/2017': {'Apple': 4,
                'Banana': 6,
                'Grape': 2,
                'Mango': 2,
                'Orange': 3,
                'Strawberry': 4},

Now we are ready for the actual code.


Data Transformation

Let’s backtrack for a while:

CSV File

The original CSV file have a format as below

Number,Date,Fruit
1,19/02/2017,Orange
2,19/02/2017,Grape
3,19/02/2017,Strawberry
4,19/02/2017,Orange
5,19/02/2017,Apple

All Values: CSV Reader Result

As a result of the CSV reader, we have a self.all_values variable. This memory data can be shown in this List of List(s):

❯ python 02-model-view.py
['Number', 'Date', 'Fruit']
['1', '19/02/2017', 'Orange']
['2', '19/02/2017', 'Grape']
['3', '19/02/2017', 'Strawberry']
['4', '19/02/2017', 'Orange']
['5', '19/02/2017', 'Apple']

This is our first transformation.

All Dates: Grouping Category

I need to group the result by date, keeping the date as index for further use. This means we are the number index, will be disposed.

In order to group by date, we can use list comprehension in below:

  def build_pivot(self) -> None:
    self.all_dates = {
      row[1]: [item[2]
      for item in self.all_values
      if item[1] == row[1]]
        for row in self.all_values
    }

    pprint(self.all_dates)

This is basically a loop in a loop, comparing all_values item. The first loop is iterating all values, while the second loop will append the category, only if the date matches.

Pivot: List Comprehension: Build Pivot: All Dates

The result would looks like below:

'01/03/2017': ['Strawberry',
                'Strawberry',
                'Grape',
                'Grape',
                'Banana',
                'Apple',
                'Orange',
                'Orange',
                'Banana',
                'Grape',
                'Banana',
                'Strawberry',
                'Orange',
                'Strawberry'],

Occurences: Category Count

I need to count the distinct category for each group.

Counting occurrences can be done, by breaking row into an array of distinct category value. This can be done with list comprehension below:

    self.occurrences = {
      date: {cat: row.count(cat) for cat in set(row)}
      for date, row in self.all_dates.items()
    }

    pprint(self.occurrences)

The list comprehension is pretty self explanatory. The first loop is iterating all dates,

      for date, row in self.all_dates.items()

The second one is getting the count of distinct category. Distinct category can be obtained by using set. The count work for specific category in row (list of values).

      {cat: row.count(cat) for cat in set(row)}

Pivot: List Comprehension: Build Pivot: Occurences

The result would looks like below:

{'01/03/2017': {'Apple': 1,
                'Banana': 3,
                'Grape': 3,
                'Orange': 3,
                'Strawberry': 4},
 '01/04/2017': {'Apple': 4,
                'Banana': 6,
                'Grape': 2,
                'Mango': 2,
                'Orange': 3,
                'Strawberry': 4},

Ensure Occurences: Respect Zero Count

I need to simulate Excel feature, showing a category that does not exist in original table.

Ensuring occurrences can be done, by creating a new dictionary with all class values and their counts, even zero count if category not found. This can be done with list comprehension below:

    self.ensure_occurrences = {
      date: {
        cat: pairs.get(cat, 0)
        for cat in self.categories
      }
      for date, pairs in self.occurrences.items()}

    pprint(self.ensure_occurrences)

The list comprehension is pretty self explanatory. The first loop is iterating all occurrences, and getting the default value as zero if category not found.

    cat: pairs.get(cat, 0)

Pivot: List Comprehension: Build Pivot: Ensure

The result would looks like below:

{'01/03/2017': {'Apple': 1,
                'Banana': 3,
                'Dragon Fruit': 0,
                'Durian': 0,
                'Grape': 3,
                'Mango': 0,
                'Orange': 3,
                'Strawberry': 4},
 '01/04/2017': {'Apple': 4,
                'Banana': 6,
                'Dragon Fruit': 0,
                'Durian': 0,
                'Grape': 2,
                'Mango': 2,
                'Orange': 3,
                'Strawberry': 4},

Displaying Pivot

Moving from the model, let’s shift our focus to the view by expanding the display() method.

Pivot: List Comprehension: Class Diagram

Main Program

The main program remains intact. No changes have been made.

Class Skeleton

The class skeleton is as below:

class PivotSample:
  def __init__(self, source_csv: str,
      categories: List[str]) -> None:
    ...

  def load_data(self) -> None:
    ...

  def build_pivot(self) -> None:
    ...

  def process(self):
    ...

  def display_basic(self) -> None:
    ...

  def display_process(self) -> None:
    ...

  def display_pivot_header(self) -> None:
    ...

  def display_pivot_rows(self) -> None:
    ...

  def display(self) -> None:
    ...

Pivot: List Comprehension: Class Skeleton

We have some additional display method.

Process

The main process is still intact. No changes have been made.

  def process(self):
    self.header = None
    self.load_data()
    self.build_pivot()

Build Pivot

This is where the list comprehension perform pivot operations

We should regularly clear the model from any view, that might have occurred while developing data model. Remove all pprint from the build_pivot() method.

  def build_pivot(self) -> None:
    try:
      self.all_dates = {
        row[1]: [item[2]
        for item in self.all_values
        if item[1] == row[1]]
          for row in self.all_values
      }

      self.occurrences = {
        date: {cat: row.count(cat) for cat in set(row)}
        for date, row in self.all_dates.items()
      }

      self.ensure_occurrences = {
        date: {
          cat: pairs.get(cat, 0)
          for cat in self.categories
        }
        for date, pairs in self.occurrences.items()}

    except Exception as e:
      print("An error occurred " \
        + f"while processing data: {e}")

Pivot: List Comprehension: Build Pivot

This is the final version of this build_pivot() method.

The Main Display Method

There is a lot happening on the display. We have four new methods here.

  def display(self) -> None:
    if self.header is None:
      print("No data to display.")
    else:
      self.display_basic()
      self.display_process()

      self.display_pivot_header()
      self.display_pivot_rows()

Pivot: List Comprehension: Display Main

Display Basic: Plain Table

Sometimes we need to look back, at the original data.

  def display_basic(self) -> None:
    # Display the header
    # ['Number', 'Date', 'Fruit']
    print(self.header)

    # Display the data
    # ['1', '19/02/2017', 'Orange']
    for row in self.all_values:
      print(row)

    print()

Pivot: List Comprehension: Display Basic

Display Process: Data Transformation

For debugging purposes, I document the example result of each data transformation

  def display_process(self) -> None:
      # '19/02/2017': ['Orange', 'Grape', 'Strawberry', 'Orange',
      #   'Apple', 'Banana', 'Strawberry', 'Banana', 'Strawberry']
      print(self.all_dates)
      print()

      # '19/02/2017': {'Mango': 2, 'Banana': 3, 'Apple': 1,
      #   'Strawberry': 4, 'Grape': 2, 'Orange': 3}
      print(self.occurrences)
      print()

      # {'19/02/2017': {'Apple': 1, 'Banana': 3, 'Dragon Fruit': 0,
      #  'Durian': 0, 'Grape': 2, 'Mango': 2,
      #  'Orange': 3, 'Strawberry': 4}
      print(self.ensure_occurrences)
      print()

Pivot: List Comprehension: Display Process

Display Pivot Header: Categories

  def display_pivot_header(self) -> None:
    line_title = "Date/Fruit"
    for cat in self.categories:
       line_title += f"\t{cat}"

    print(line_title)

Pivot: List Comprehension: Display Pivot Header

Display Pivot Rows: Count

  def display_pivot_rows(self) -> None:
    for date, pairs in \
        self.ensure_occurrences.items():
      line_row = f"{date}"

      for cat, count in pairs.items():
        line_row += f"\t{count}"

      print(line_row)

Pivot: List Comprehension: Display Pivot Rows

The Pivot Result Example

The pivot result example, can be shown as below. The spacing might looks different, because I use tabs instead of spaces.

Date/Fruit	Apple	Banana	Dragon Fruit	Durian	Grape	Mango Orange	Strawberry
19/02/2017	1	3	0	0	2	2	3	4
20/02/2017	0	4	0	0	0	2	2	2
21/02/2017	2	1	0	0	4	4	2	3
22/02/2017	4	1	0	0	1	1	4	5
23/02/2017	1	4	0	0	3	5	6	3
24/02/2017	2	3	0	0	4	2	5	5
25/02/2017	1	2	0	0	4	1	1	1
26/02/2017	4	2	0	0	2	2	3	4
27/02/2017	3	3	0	0	4	1	4	1
28/02/2017	3	2	0	0	3	4	3	2

Pivot: List Comprehension: Pivot Result

It’s not a perfect display, but at least we have a preview.


Displaying Total

What good is pivot without total summary? We are going to create three kinds of total.

  • Total Row
  • Total Column
  • Grand Total

Each process stands in its own method, that all called in display() method. For the sake of clarity, I do not follow standard UML diagram.

Pivot: List Comprehension: Class Diagram

Class Skeleton

There is only one additional method

class PivotSample:
  ...

  def display_pivot_header(self) -> None:
    ...

  def display_pivot_rows(self) -> None:
    ...

  def display_pivot_total(self) -> None:
    ...

  def display(self) -> None:
    ...

While the display_pivot_total() method show the total at the bottom. The row total at the rightmost column is done in display_pivot_rows().

Total Header

We need to add additional Sub Total title in the header.

  def display_pivot_header(self) -> None:
    # Create a header row
    line_title = "Date/Fruit"
    for cat in self.categories:
       line_title += f"\t{cat}"

    line_title += f"\tSub Total"
    print(line_title)

Pivot: List Comprehension: Total Header

Total Row

For every row, we also add an additional column, that sums all the values in that row.

  def display_pivot_rows(self) -> None:
    # Create rows for each category
    for date, pairs in \
        self.ensure_occurrences.items():
      line_row = f"{date}"

      for cat, count in pairs.items():
        line_row += f"\t{count}"

      # Calculate the total for each column
      total = sum(pairs.values())
      line_row += f"\t{total}"

      print(line_row)

Pivot: List Comprehension: Total Row

Total Column and Grand Total

To create a total row: sum of the column.

This is a new method, The task is adding an additional row, at the bottom of the table. Starting from the left header, with the sum of column values, and ending with the grand total.

  def display_pivot_total(self) -> None:
    line_total = "Grand Total"
    total = {}

    for cat in self.categories:
      total[cat] = 0

      for date, pairs in \
          self.ensure_occurrences.items():
        total[cat] += pairs[cat]

      line_total += f"\t{total[cat]}"

    line_total += f"\t{sum(total.values())}"
    print(line_total)

Pivot: List Comprehension: Total Column

Pivot Display

We do not need to display all the stuff. All we need is just the pivot. So I commented out any unnecessary output.

  def display(self) -> None:
    if self.header is None:
      print("No data to display.")
    else:
      # self.display_basic()
      # self.display_process()

      self.display_pivot_header()
      self.display_pivot_rows()
      self.display_pivot_total()

Pivot: List Comprehension: Display Main

The Pivot Result Example

The pivot result example, can be shown as below. You can adjust the spacing manually for better looks.

 python 05-total.py
Date/Fruit	Apple	Banana	Dragon Fruit	Durian	Grape	Mango	Orange	Strawberry	Sub Total
19/02/2017	1	3	0	0	2	2	3	4 	15
20/02/2017	0	4	0	0	0	2	2	2 	10
21/02/2017	2	1	0	0	4	4	2	3 	16
22/02/2017	4	1	0	0	1	1	4	5 	16
...
...
...
02/04/2017	1	1	0	0	5	4	5	1 	17
03/04/2017	4	0	0	0	3	3	3	5 	18
04/04/2017	0	2	0	0	2	1	3	3 	11
05/04/2017	3	1	0	0	1	2	2	4 	13
Grand Total	114	125	0	0	127	115	148	126	755

Pivot: List Comprehension: Pivot Result

We got ourselves, a very basic pivot table.


Compare

Let’s compare it with our previous worksheet.

Pivot: Result Preview

Since the count number are the same. I think we are doing just fine.


What’s Up Next 🤔?

Feeling dizzy yet? We are going to simplify this pivot-building process using a dataframe.

Consider continuing to read [ Pivot - Model - Dataframe ].