Preface
Goal: Simple spreadsheet diff tool using openpyxl.
Why Diff?
Tracking Changes
It is very common to have weekly report, with hundreds of rows. We can even share a document to be edited from multiple places, such as googledocs. This very sheet created by your team is normally, changed from time to time. Getting larger and larger. Further, there is actually a tracking changes feature, that nobody knows about. So what’s the problem?
Sometimes employee require further processing, or just need to take a note, then make a copy of the sheet for themself. This time the data is already detached.
After a while, let’s say a week later, you need to update your data. But the data has changed, and you are too shy to ask to your manager, which changes, at what cell.
This is where the diff tools take place, and save your career.
How to Track?
Brute force compare, then coloring.
Consider the input. These required two sheet: First, and Second.
Then, where is wally?
Both are pretty similar, but if you look closer there are differences. As shown in output below:
The figure above is a simplification of real data.
Example Data
You can download here
-
Input: first.xlsx
-
Input: second.xlsx
-
Output: diff.xlsx
Why Automation?
Time savings.
It is pretty much easier than doing manually.
Shortly, we should invest on thinking, rather than doing tedious repetitive task.
Diff Code
Enough with introduction. Let’s get into the code.
There is only consist one short file required.
Skeleton
I’m using class.
import ...
class SheetDiff:
def __init__(self, ...):
...
def diff(self):
...
def main():
sample = SheetDiff(...)
sample.diff()
main()
Very simple, it only have one method diff
, beside the constructor.
Import Header
Since this using openpyxl
library,
we must declare first at the top.
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill
from openpyxl.utils import get_column_letter
Constructor
The constructor is also simple.
def __init__(self, book_fst, sheet_fst,
book_snd, sheet_snd,
book_target, cols, rows):
This will be called in main later, for example as below:
sample = SheetDiff(
'first.xlsx', 'Combined',
'second.xlsx', 'Combined',
'diff.xlsx', cols, rows)
Consider go back to our constructor, it is initialize a few stuff, such as open both sheet.
wb_fst = load_workbook(book_fst)
self.ws_fst = wb_fst[sheet_fst]
self.wb_snd = load_workbook(book_snd)
self.ws_snd = self.wb_snd[sheet_snd]
And also keep other variable as well, bundle in class.
self.target = book_target
self.rows = rows
self.cols = cols
At last, add some constant, that will be used throught out the loop later.
color_lime5 = 'FFCDDC39'
self.fill = PatternFill(
start_color = color_lime5,
end_color = color_lime5,
fill_type = 'solid')
The Diff Method
This double loop compare all. And finally save the result to new spreadsheet.
def diff(self):
for col in self.cols:
for row in self.rows:
cursor = get_column_letter(col) + str(row)
...
# Save the file
self.wb_snd.save(self.target)
Inside the loop is just this code
cursor = get_column_letter(col) + str(row)
value_fst = self.ws_fst[cursor].value
value_snd = self.ws_snd[cursor].value
if (value_fst != value_snd):
print(f'{cursor:6} : {value_fst} -> {value_snd}')
self.ws_snd[cursor].fill = self.fill
If the value is different for both sheet, there are two actions that will be taken:
- Print in terminal.
- Color the cell, so we can see the difference.
Main
Program entry Point
This is also only consist one file
def main():
cols = range(1, 8)
rows = range(3, 43)
sample = SheetDiff(
'first.xlsx', 'Combined',
'second.xlsx', 'Combined',
'diff.xlsx', cols, rows)
sample.diff()
main()
With the result similar as below:
B21 : Female -> Mouse
C11 : 0.55 -> 0.54
C20 : 0.57 -> 0.36
C24 : 1 -> 0.9
C25 : 1 -> 1.1
D7 : Female -> Male
D22 : Female -> Dog
E14 : 0.5 -> 0.77
F25 : Juvenile -> Cat
G22 : 0.66 -> 0.49
G42 : 0.58 -> 0.38
The Result
Finally we have peace, with complete coloring.
Now you can continue with your own further processing.
What is Next 🤔?
To prepare for massive database output, we also need to learn to decorate the spreadsheet. Consider start with importing master detail CSV.
Consider continue reading [ Python - Excel - CSV - Part One ].