Python - Excel - Coloring

LibreOffice

Python - Excel

Local Group

Preface

Goal: Simple conditional cell coloring using openpyxl.

Why Coloring?

Have you ever have to analyze data by the value of the cell? Well, if the data is a few, there is no problem with that. But when it comes to hundred of data, your eyes is going to be so weary.

The script shall be able to color cell based on its value.

The solution is using color to spot anomaly, instead of using the value itself.

What Color?

The Puppy Case.

Imagine a case, you have three cameras that could identify dogs, by gender and age, and also with the confidence score.

openpyxl: Input: Plain Sheet

You can color, not just by the result. But also the gradient based on the confidence score.

openpyxl: Output: Colored Sheet

The figure above is a simplification of real data.

Example Data

You can download here

combined.xlsx

Why Automation?

Time savings.

With so many worksheets, as a weekly routine, manual coloring really is an exhausting task. This takes so many working hours, and it is also so tedious that it become error prone.

This is where python comes saving you precious office hours. This means, you can do many thing else, beside just coloring. You still have to do further pivot analysis, or things like statistics.

Shortly, we should invest on thinking, rather than doing tedious repetitive task.

Simple Coloring

Consider getting started from simple thing. Have a look at this source code.

gitlab.com/…/coloring/01-simple.py

Accessing Cell Value

This script just displaying value of cells, and nothing more.

openpyxl: Input: Accessing Cell Value

import openpyxl
from openpyxl import load_workbook

wb = load_workbook('combined.xlsx')
ws = wb["Combined"]

b3 = ws['B3']
c3 = ws['C3']

print(b3.value)
print(c3.value)

With the result similar to below:

Female
0.53

Formatting Cell Style

We can go further by altering the style.

from openpyxl.styles import Color, PatternFill, Font, Border

redFill = PatternFill(
  start_color='FFFF0000',
  end_color='FFFF0000',
  fill_type='solid')

ws['c3'].fill = redFill

Saving Workbook

And do not forget to store the file somewhere.

# Save the file
wb.save("sample.xlsx")

The result of the sheets will be as below:

openpyxl: Output: Formatting Cell Style

Color Library

Since we deal with gradient of color, we should prepare the color. There are a few option, ranging from open color, to google material color.

Percentage Reasoning

My choice comes to Material color, because it has scale range from 100 to 900. with additional 50 that we can use. Perfect for percentage scale from zero to a hundred percent.

The Material Design Color System

The Script Module

I already write all colors, but for learning purpose, I need the code to be short. So I only write three colors here: blue, lime and green.

gitlab.com/…/coloring/mycolors.py

from openpyxl.styles import Color, PatternFill, Font, Border

blueScale = {
  0: 'FFE3F2FD',
  1: 'FFBBDEFB',
  2: 'FF90CAF9',
  3: 'FF64B5F6',
  4: 'FF42A5F5',
  5: 'FF2196F3',
  6: 'FF1E88E5',
  7: 'FF1976D2',
  8: 'FF1565C0',
  9: 'FF0D47A1'
}

limeScale = {
  ...
}

greenScale = {
  ...
}

blueScaleFill  = {}
limeScaleFill  = {}
greenScaleFill = {}

for key, value in blueScale.items():
  blueScaleFill[key] = PatternFill(
    start_color=value,
    end_color=value,
    fill_type='solid')

for key, value in limeScale.items():
  ...

for key, value in greenScale.items():
  ...

Conditional Coloring

Now we can format cell according to, the conditional state of the cell.

gitlab.com/…/coloring/02-conditional.py

Importing Library

We need to import the color library first

import openpyxl
import math
from openpyxl import load_workbook
from mycolors import blueScaleFill, limeScaleFill, greenScaleFill

Getting Color Scale

Now we can format cell according to, the conditional state of the cell.

wb = load_workbook('combined.xlsx')
ws = wb["Combined"]

b3 = ws['B3']
c3 = ws['C3']

pred = b3.value
prob = c3.value
colScale = math.floor(prob*10)

Notice that we have these two lines:

The Prediction.
The Probability.

pred = b3.value
prob = c3.value

From this probability value, we can convert the gradient scale of the material color.

colScale = math.floor(prob*10)

Getting Color Scale

Now we can test our conditional for these two cells. Using only prediction column, and probability column.

if pred=='Female': 
  ws['B3'].fill = blueScaleFill[colScale]
  ws['C3'].fill = blueScaleFill[colScale]
elif pred=='Male':
  ws['B3'].fill = limeScaleFill[colScale]
  ws['C3'].fill = limeScaleFill[colScale]
elif pred=='Junior':
  ws['B3'].fill = greenScaleFill[400]
  ws['C3'].fill = greenScaleFill[400]
elif pred=='Juvenile':
  ws['B3'].fill = greenScaleFill[300]
  ws['C3'].fill = greenScaleFill[300]
elif pred=='Puppy':
  ws['B3'].fill = greenScaleFill[200]
  ws['C3'].fill = greenScaleFill[200]

The result

As usual, do not forget to store the good result as a file somewhere.

# Save the file
wb.save("sample.xlsx")

Now we can highlight the first row of data with the right color as figure below:

openpyxl: Output: Testing Conditional Coloring

We should see the difference between gradient, as we have more row of data.

Applying Color to Range of Rows

After testing for one record, we can utilize the code above to color many rows.

Main Code

The main code is simple. The loop walk each row.

gitlab.com/…/coloring/03-range.py

We define the range manually, for example from first row to the 650th row.

# ... some header and initialization

def color_me(row, name_pred, name_prob):
  # ... some code

rows = range(1, 650)
for row in rows:
  color_me(row, 'B', 'C')
  color_me(row, 'D', 'E')
  color_me(row, 'F', 'G')

# Save the file
wb.save("sample.xlsx")

Each loop calls the color_me function three times. Each for different column.

The Color Me Function

We should make the function, flexible enough to handle any column. The generic function is as below code:

def color_me(row, name_pred, name_prob):
  column_pred = ws[name_pred+str(row)]
  column_prob = ws[name_prob+str(row)]

  pred = column_pred.value
  prob = column_prob.value
  
  if not (type(prob) == int or type(prob) == float): return
  colScale = math.floor(prob*10)
  
  # ... conditional formatting below

Generic Conditional Code

Now the formatting can also be generic:

  if pred=='Female': 
    column_pred.fill = blueScaleFill[colScale]
    column_prob.fill = blueScaleFill[colScale]
  elif pred=='Male':
    column_pred.fill = limeScaleFill[colScale]
    column_prob.fill = limeScaleFill[colScale]
  elif pred=='Junior':
    column_pred.fill = greenScaleFill[4]
    column_prob.fill = greenScaleFill[4]
  elif pred=='Juvenile':
    column_pred.fill = greenScaleFill[3]
    column_prob.fill = greenScaleFill[3]
  elif pred=='Puppy':
    column_pred.fill = greenScaleFill[2]
    column_prob.fill = greenScaleFill[2]

Handling None Value

You should aware ther presence of this code below:

  if not (type(prob) == int or type(prob) == float): return

When the probability is not a number, the entire line would be ignored, because the return statement would exit the function.

This means when we encounter empty white cells, that have None value. This line will be also ignored.

The Result

Finally we have peace, with complete coloring.

openpyxl: Output: Conditional Coloring Applied to Range

This is very useful for processing hundred of rows.

Conclusion

We are done. That is all. Thank you for reading.

What do you think?