Where to Discuss?

Local Group

Preface

Goal: Simple spreadsheet diff tool using openpyxl.

Why Diff?

Tracking Changes

It is very common to have weekly report, with hundreds of rows. We can even share a document to be edited from multiple places, such as googledocs. This very sheet created by your team is normally, changed from time to time. Getting larger and larger. Further, there is actually a tracking changes feature, that nobody knows about. So what’s the problem?

Sometimes employee require further processing, or just need to take a note, then make a copy of the sheet for themself. This time the data is already detached.

After a while, let’s say a week later, you need to update your data. But the data has changed, and you are too shy to ask to your manager, which changes, at what cell.

Capability
The script shall be able to show difference between one sheet and another.

This is where the diff tools take place, and save your career.

How to Track?

Brute force compare, then coloring.

Consider the input. These required two sheet: First, and Second.

openpyxl: Input: First Sheet

Then, where is wally?

openpyxl: Input: Second Sheet

Both are pretty similar, but if you look closer there are differences. As shown in output below:

openpyxl: Output: Result Sheet

The figure above is a simplification of real data.

Example Data

You can download here

Why Automation?

Time savings.

It is pretty much easier than doing manually.

Shortly, we should invest on thinking, rather than doing tedious repetitive task.


Diff Code

Enough with introduction. Let’s get into the code.

There is only consist one short file required.

Skeleton

I’m using class.

import ...

class SheetDiff:
  def __init__(self, ...):
    ...

  def diff(self):
    ...

def main():
  sample = SheetDiff(...)
  sample.diff()

main()

Very simple, it only have one method diff, beside the constructor.

Import Header

Since this using openpyxl library, we must declare first at the top.

import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill
from openpyxl.utils import get_column_letter

openpyxl: ViM: Impot Header

Constructor

The constructor is also simple.

  def __init__(self, book_fst, sheet_fst,
                     book_snd, sheet_snd,
                     book_target, cols, rows):

This will be called in main later, for example as below:

      sample = SheetDiff(
             'first.xlsx',  'Combined',
             'second.xlsx', 'Combined',
             'diff.xlsx', cols, rows)

openpyxl: ViM: Constructor

Consider go back to our constructor, it is initialize a few stuff, such as open both sheet.

    wb_fst      = load_workbook(book_fst)
    self.ws_fst = wb_fst[sheet_fst]
    self.wb_snd = load_workbook(book_snd)
    self.ws_snd = self.wb_snd[sheet_snd]

And also keep other variable as well, bundle in class.

    self.target = book_target
    self.rows   = rows
    self.cols   = cols

At last, add some constant, that will be used throught out the loop later.

    color_lime5   = 'FFCDDC39'
    self.fill     = PatternFill(
      start_color   = color_lime5,
      end_color     = color_lime5,
      fill_type     = 'solid')

The Diff Method

This double loop compare all. And finally save the result to new spreadsheet.

  def diff(self):
    for col in self.cols:
      for row in self.rows:
        cursor = get_column_letter(col) + str(row)
        ...

    # Save the file
    self.wb_snd.save(self.target)

Inside the loop is just this code

        cursor = get_column_letter(col) + str(row)

        value_fst = self.ws_fst[cursor].value
        value_snd = self.ws_snd[cursor].value

        if (value_fst != value_snd):
          print(f'{cursor:6} : {value_fst} -> {value_snd}')
          self.ws_snd[cursor].fill = self.fill

If the value is different for both sheet, there are two actions that will be taken:

  1. Print in terminal.
  2. Color the cell, so we can see the difference.

openpyxl: ViM: Diff Method

Main

Program entry Point

This is also only consist one file

def main():
  cols = range(1, 8)
  rows = range(3, 43)
  sample = SheetDiff(
             'first.xlsx',  'Combined',
             'second.xlsx', 'Combined',
             'diff.xlsx', cols, rows)
  sample.diff()

main()

With the result similar as below:

B21    : Female -> Mouse
C11    : 0.55 -> 0.54
C20    : 0.57 -> 0.36
C24    : 1 -> 0.9
C25    : 1 -> 1.1
D7     : Female -> Male
D22    : Female -> Dog
E14    : 0.5 -> 0.77
F25    : Juvenile -> Cat
G22    : 0.66 -> 0.49
G42    : 0.58 -> 0.38

openpyxl: ViM: Program Entry Point

The Result

Finally we have peace, with complete coloring.

openpyxl: Output: Result Sheet

Now you can continue with your own further processing.


What is Next 🤔?

To prepare for massive database output, we also need to learn to decorate the spreadsheet. Consider start with importing master detail CSV.

Consider continue reading [ Python - Excel - CSV - Part One ].