Where to Discuss?

Local Group

Preface

Goal: Overview of daily office task automation, from merging CSVs, to format complex multiple tabular data.

We can use use worksheet to facilitate clearer communication so anyone who read your report can interpret the content easily. To create visual appeal you need to know how to format spreadsheet. As the formatting become your workflow, we need to automate formatting to saving valuable working time.

The reward is similar to server admin, who can automate server task with BASH script, make their job efficient, then having opportunity to raise the salary. But this is down to earth, since anybody can use spreadsheet, as daily routine.

Formatting Automation

Spreadsheet is basic skill that can emphasize modern day job. Well formatted worksheet would make your content more readable. Content are matters, and visual appeal would make your report interesting.

Good reporting can even put your team ahead. You can think of your worksheet as communication tools, and not just a data in a table. Then what you do is not just reporting result to your colleague, but rather proposing idea.

The design key is small details but everywhere. This applied to header color, text font, cell border, row height, column width, numbering, alignment et cetera. Once you found your formatting style, you can apply to many sheet you work with. Formatting is fun, make this as your daily routine, and you would looks seriously good at your task.

What good about this formatting is we can automate this process. Integrating automation into your workflow would make your task more efficient. For huge amount of document, this automation would save man hours. Reducing a week of task into just few hours. This means you can have time to focus to other higher priority, other than just reporting.

In Data Science

When we deal with data we can use jupyter notebook. But when we deal with boss or customer, what matters is simple spreadsheet. You might think that jupyter notebook is easy to use, you can filter, sort, pivoting the dataframe and so on. But with spreadsheet, your boss can scroll with Excel.

But here is the catch, Excel or Calc have limitation, when it comes to heavy processing, especially when using vlookup. And pivoting would also looks primitive compared to dataframe.

What we need is to separate the workflow.

  1. Heavy data processing using python script, from CSV to other CSV.
  2. Formatting using LibreOfice Macro, with result as either .ods or .xlsx.

Configuration

To face all above obstacles, we need to simplify differetn use cases into configuration.

This article is not merely just about spreadsheet formatting, but also implement configuration into code in python class. We will see how to transform from simple configuration, to unified configuration.

Why This Article?

The fact that the macro works, does not mean that it is good enough.

For me, documentation is a must. Making a code for myself, and for other people is completely different. I have think about the logic dozen times. Test dozen times. Make class diagram, and activity diagram. After I pour my code into diagram, any inconsistencies will shown up. So I have time to fix flaw on my logic. then I have to think about readibility as well. Not just giving proper comments, but concise code as well. After all these rituals, I produced better code.

I can use this better code in production. I mean daily task in my office. It is easier to alter a tidy code than the mess one. As an employee I become more productive.

This is a complete cycle. I this method for almost my project. Although not everything have to be shown in article series in blog. Most of my project is belong to the company, and I do the documentation in a very different approach, such as internal document. But the cycle is very similar.

In short: I need a good code for myself.


Topics

This is the steps in this article series.

  1. Preparation
    1. Generate example CSV data
    2. Merge CSV data into spreadsheet
    3. Coloring tabs as a bonus
  2. Formatting Data
    1. Simple Configuration: 7 Steps
    2. Unified Configuration: also 7 Steps

Steps Screenshot

First we need Example CSV data.

Example: Vim: CSV

Then we can open this CSV in LibreOffice

Example: LibreOffice: Import Raw

In LibreOffice, we can use macro to format the data easily.

Example: LibreOffice: Formatted

Not just plain format, but we can control more.

Example: LibreOffice: Formatted

Formatting Steps for Simple Configuration

I have mention above about seven steps of formatting. This is the core part of this article.

I’m experimenting with OOP design pattern, and implement with each steps.

  1. Step One

    • Class: FormatterBase, TabularFormatterCommon
    • Basic Flow: Multiple sheet
    • Formatting Procedure: Insert column gap
  2. Step Two:

    • Helper: Last Row
    • Helper: Prevent Reinsert Column
    • Formatting Procedure: Insert row gap, set row height
  3. Step Three

    • Abstract Method
    • Formatting Procedure: Show grid, Freeze.
  4. Step Four

    • Initialization: Prepare Sheet
    • Configuration: Field Metadata Configuration
    • Initialization: Prepare Sheet
    • Helper: Column Letter to Index
    • Helper: Number Formats
    • Formatting Procedure: Format Columns
  5. Step Five

    • Helper: Format Cell Rectangle
    • Predefined Constant: Color Scale (GMC)
    • Predefined Constant: Line Format
    • Formatting Procedure: Add merged title
  6. Step Six

    • Helper: Get head range, set head rectangle, apply head border
    • Formatting Procedure: Format head border
    • Formatting Procedure: Format head colors
  7. Step Seven

    • Helper: Get data range, set data rectangle, apply data border
    • Formatting Procedure: Format data border

I’m also experimenting with diagrams, based on UML but more compact in shape. And also I use symbol instead of legend such as:

  • «abstract»: ~ italic
  • «hook»: ~ italic
  • «private»: -
  • «public»: +

The conceptual representation of the code, can be summarized in this class diagram below.

Class Diagram: Formatter Tabular: Steps - Simple

The actual code is slightly different with the ideal concept for a few reason. The code works well, but the class diagrams seems to have things unproper. Everything is too crowded in base class. This will be fixed in the next class diagram.

Formatting Steps for Unified Configuration

Complex project require a bunch of configuration. So wee need to simplify a few things.

I have mention above about seven steps of formatting. This is the same as previous with additional methods, with one step simple coloring procedure. This simple procedure become complex, when we have to integrate to it into unified configuration.

  1. Step One

    • Class: FormatterBase,FormatterCommon, FormatterTabularData, FormatterTabularMovies
    • Configuration: Merge Metadata
    • Similar to Previous
  2. Step Two:

    • Similar to Previous
  3. Step Three

    • Class: FormatterTabular
    • Similar to Previous
  4. Step Four

    • Helper: Get Relative Column Letter
    • Formatting Procedure: Format Columns (multiple block of columns)
    • Similar to Previous
  5. Step Five

    • Class: BorderFormatManager
    • Formatting Procedure: Add merged titles (multiple block of columns)
    • Formatting Procedure: Set merge title (refactored for one block)
    • Formatting Procedure: Add merged titles post (Hook)
    • Similar to Previous
  6. Step Six

    • Formatting Procedure: Format head borders (multiple block of columns)
    • Similar to Previous
  7. Step Seven

    • Formatting Procedure: Format data borders (multiple block of columns)
    • Similar to Previous
  8. Step Eight: Coloring Cells.

    • TBD

The conceptual representation of the refactored code for the unified configuration, can be summarized in this class diagram below. Now we can automate the sheet formatting. Let’s start with a simple class, rearangging the column. Formatting single sheet and multiple sheet. Class Diagram: Formatter Tabular: Steps - Simple

This looks clear now.


Example Data

For learning purpose we can use IMDB movie as example data table. We can get Movie data using OMDB API.

Column Header

For this example, we will fetch these columns:

  • Year
  • Title
  • Genre
  • Plot
  • Actors
  • Director
  • Rated
  • Runtime
  • Metascore

Example: Vim: CSV: Header

Year,Title,Genre,Plot,Actors,Director,Rated,Runtime,Metascore

Ready to Use Data

Rows in CSView

I have built an example data for you, so you don’t need to genereate a new one.

With CSView the data would looks like below:

Example: CSView

The data consist of movies from year of 1900 until 2020.

Example: File Manager: EXample By Year

Now we are ready to use it.

Data Generator

If you want, you can generate your own data. The script to generate this data can be found here.

You can explore the fetcher script, change the required columns as you need.

Example: Python: OMDB: Fetch

If you want to use, you need to get API KEY from OMDB, then put it in secret.toml.

It’s likely that you don’t need this script, since I’ve already made the example data for you.


Prerequisite

TOML Config

This configuration rely on TOML Configuration. This TOML is required to avoid hardcoded data, so we have flexibility against different type of CSV.

Real world data processing is a hostile situation, in a sense that, you have to deliver task fast, but in the same time, you have too many use case as below:

  1. There is no single constant format of data source. Every customer has different flavour of data format.
  2. You can’t stop your mcolleague for adding-up different colum, as the accounting data grow.Now we can automate the sheet formatting. Let’s start with a simple class, rearangging the column. Formatting single sheet and multiple sheet.
  3. Your boss might need different kind of summary, so your pivot might likely change from time to time.
  4. And finally you can’t stop your own curiosity, and also adding-up different data view with different parameter.

However we can still make a code abstraction so that, a single script can carry the most use cases.

Windows

Installing TOML within python environment for LibreOffice in Windows

LibreOffice carry it’s own python environment. This should not be a problem for most linux, but in Windows we should be aware for the environment. We have to install TOML using python in LibreOffice environment. Installing TOML within python installed on windows won’t works for Macro. We need the right python environment.

In Windows you can open powershell and do these steps.

Prerequisite: Installing TOML within python environment for LibreOffice in Windows

We need to change directory to LibreOffice manuall, then get tthe PIP using CURL.

PS C:\WINDOWS\system32> cd "C:\Program Files\LibreOffice\program"

PS C:\Program Files\LibreOffice\program> curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

PS C:\Program Files\LibreOffice\program> ./python.exe -m pip install tomli
C:\Program Files\LibreOffice\program\python-core-3.8.19\bin\python.exe: No module named pip

Then add the PIP into the environment.

PS C:\Program Files\LibreOffice\program> .\python.exe .\get-pip.py
Collecting pip
  Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Collecting setuptools
  Downloading setuptools-75.3.0-py3-none-any.whl.metadata (6.9 kB)
Collecting wheel
  Downloading wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)Now we can automate the sheet formatting.
Let's start with a simple class, rearangging the column.
Formatting single sheet and multiple sheet.
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
   ---------------------------------------- 1.8/1.8 MB 644.1 kB/s eta 0:00:00
Downloading setuptools-75.3.0-py3-none-any.whl (1.3 MB)
   ---------------------------------------- 1.3/1.3 MB 393.2 kB/s eta 0:00:00
Downloading wheel-0.45.1-py3-none-any.whl (72 kB)
Installing collected packages: wheel, setuptools, pip
  WARNING: The script wheel.exe is installed in 'C:\Program Files\LibreOffice\program\python-core-3.8.19\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts pip.exe, pip3.8.exe and pip3.exe are installed in 'C:\Program Files\LibreOffice\program\python-core-3.8.19\Scripts' which is not on PATH.Now we can automate the sheet formatting.
Let's start with a simple class, rearangging the column.
Formatting single sheet and multiple sheet.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-24.3.1 setuptools-75.3.0 wheel-0.45.1

Now you can install TOML within LibreOffice’s python environment.

PS C:\Program Files\LibreOffice\program> ./python.exe -m pip install tomli
Collecting tomli
  Downloading tomli-2.2.1-py3-none-any.whl.metadata (10 kB)
Downloading tomli-2.2.1-py3-none-any.whl (14 kB)
Installing collected packages: tomli
Successfully installed tomli-2.2.1

What is Next 🤔?

Before we study the code in formatter class, we need to merge the example data above into a workbook. We are going to merge these example CSVs into one workbook using macro.

Consider continue reading [ Formatter - Config - Merge CSV ].