Where to Discuss?

Local Group

Preface

Goal: Merging multiple CSVs into a workbook using a Python macro with the UNO library.

Before diving into the code for the formatter class, we first need to merge the example data into a single workbook. In this section, we’ll use a macro to combine the example CSV files into one workbook.

CSV Source Options

How do we choose the CSV source?

There are two approaches to consider:

  1. Using a Directory: The simpler option is to use a directory as the source.

  2. Using a Config: In real-world scenarios, more control is often needed, such as using a TOML configuration.

Reference

While we can manually load CSV files into a Calc sheet using LibreOffice, automating this process requires a simple trick.

For more details, refer to the following link:

You can read the full details on that page.

Manual Loading

Normally, you can load CSV files directly from the file manager, and LibreOffice will show this dialog:

Merge CSV: Manual Loading: Dialog

But how do we load CSV files using a macro? We should avoid the dialog, especially when dealing with dozens of CSV files.


1: CSV Importer: By Directory

The simpler case would be to use a directory as the source.

Macro Skeleton

You can find the full macro here:

The macro consists of only a few methods.

Merge CSV: Calc Macro: Class Skeleton

import os, uno

class CSVImporterDir:
  def __init__(self, input_dir) -> None:
    ...

  def load_csv(self,
        csv_path: str, sheet_name_dst: str) -> None:
    ...

  def process(self) -> None:
    ...

def importDir():
  ...

  csv_importer = CSVImporterDir(input_dir)
  csv_importer.process()

The core part of the macro is the filter options. This is handled in the load_csv() method where the CSV is processed.

APSO Dialog

You can load the macro using APSO

Merge CSV: Calc Macro: APSO Dialog

Initialization

The initialization part is simple and nearly identical in all macros.

Merge CSV: Calc Macro: Initialization

  def __init__(self, input_dir) -> None:
    # config, all parameter arguments from tomli
    self.input_dir= input_dir

    # Get the current LibreOffice context
    self.desktop   = XSCRIPTCONTEXT.getDesktop()
    self.model_dst = self.desktop.loadComponentFromURL(
      "private:factory/scalc", "_blank", 0, ())

Just remember, whenever you plan to refactor, the XSCRIPTCONTEXT should remain in the main script.

Loading CSV

Filter Options

This is the trick we are referring to. With this method, we can avoid the dialog.

Merge CSV: Calc Macro: Loading CSV: Filter Options

First, define the filter options.

  def load_csv(self,
        csv_path: str, sheet_name_dst: str) -> None:

    # Prepare filter options for CSV import\
    filter_options = "44,34,0,1,"
    file_url = uno.systemPathToFileUrl(csv_path)

Then some ritual to set properties using official guidance.

    # Set properties for loading the CSV file
    properties = (
        uno.createUnoStruct("com.sun.star.beans.PropertyValue"),
        uno.createUnoStruct("com.sun.star.beans.PropertyValue"),
    )
    properties[0].Name  = "FilterName"
    properties[0].Value = "Text - txt - csv (StarCalc)"
    properties[1].Name  = "FilterOptions"
    properties[1].Value = filter_options

Now, create the sheet from thin air.

    # Load the CSV file as a spreadsheet
    model_src = self.desktop.loadComponentFromURL(
      file_url, "_blank", 0, properties)
    model_src.Sheets[0].Name = sheet_name_dst
    self.model_dst.Sheets.importSheet(
      model_src, sheet_name_dst, 0)
    model_src.close(True)

This is essentially practical magic. The behind-the-scenes process is something I can’t fully explain.

Process

Multiple CSV

Instead of just loading a single sheet, we can extend this method to handle mass CSV imports.

The main process for loading multiple CSV files, into a single workbook is as follows.

Merge CSV: Calc Macro: Process

We must ensure that we retain one non-empty sheet, and delete it once the process is complete.

  def process(self) -> None:
    # write something, do not delete this line
    first_sheet = self.model_dst.Sheets[0]
    first_sheet.getCellRangeByName(
      "A1").String = "Hello World!"

Next, manage the content of the directory as input to the loop.

    # Get a list of CSV files in the input directory
    all_files = sorted(os.listdir(self.input_dir))
    csv_files = [f for f in all_files if f.endswith('.csv')]

    # Iterate over the files in reverse order
    for idx, filename in enumerate(reversed(csv_files), start=1):
        ...

    self.model_dst.Sheets.removeByName(
      first_sheet.Name)

This is the actual CSV loading process.

    # Iterate over the files in reverse order
    # with an index starting from the largest number
    for idx, filename in enumerate(reversed(csv_files), start=1):
        input_csv = os.path.join(self.input_dir, filename)

        # Format the sheet name with
        # a two-digit index followed by year
        output_sheet = f"{len(csv_files) - idx + 1:02d}-" + \
                       f"{filename[7:11]}"

        print(f"Loading: {input_csv} into sheet: {output_sheet}")
        self.load_csv(input_csv, output_sheet)

Class Instance

Path

The macro function simply creates a class instance with path arguments. Be mindful that paths differ between Linux and Windows systems.

Merge CSV: Calc Macro: Main Method

def importDir():
  # Configure paths or parameters as needed
  
  # Linux based
  input_dir = '/home/epsi/movies-by-year/'

  # Windows based
  # input_dir = 'D:/movies-by-year/'

  csv_importer = CSVImporterDir(input_dir)
  csv_importer.process()

APSO Console

APSO python console [LibreOffice]

If you need to debug, you can use the APSO Console. Simply type the function name importDir() to run the script.

Merge CSV: Calc Macro: APSO Console

>>> importDir()
Loading: /home/epsi/movies-by-year/movies_2019.csv into sheet: 40-2019
Loading: /home/epsi/movies-by-year/movies_2018.csv into sheet: 39-2018
Loading: /home/epsi/movies-by-year/movies_2017.csv into sheet: 38-2017
Loading: /home/epsi/movies-by-year/movies_2016.csv into sheet: 37-2016
Loading: /home/epsi/movies-by-year/movies_2015.csv into sheet: 36-2015
Loading: /home/epsi/movies-by-year/movies_2014.csv into sheet: 35-2014
Loading: /home/epsi/movies-by-year/movies_2013.csv into sheet: 34-2013
Loading: /home/epsi/movies-by-year/movies_2012.csv into sheet: 33-2012
Loading: /home/epsi/movies-by-year/movies_2011.csv into sheet: 32-2011
Loading: /home/epsi/movies-by-year/movies_2010.csv into sheet: 31-2010

Sheet Result

The result of the macro is displayed in the screenshot below:

Merge CSV: Calc Macro: Sheet Result

Now you can save it to ODS, for example, as movies.ods.

Merge CSV: Calc Macro: Save to ODS

You can download the example ODS result here:

We are done here.


Prerequisite

This is the only parts that require tomli. The formatting steps that follow do not depend on tomli.

TOML Config

This configuration relies on TOML to avoid hardcoded data, giving us the flexibility to handle different types of CSV files.

Real-world data processing can be challenging, because you need to deliver tasks quickly, while dealing with a variety of use cases, such as:

  1. Inconsistent data formats: Every customer may have a different flavor of data format.

  2. Dynamic data: You can’t prevent your colleagues, from adding new columns as accounting data grows.

  3. Changing requirements: Your boss might need different kinds of summaries, so your pivot tables may change over time.

  4. Personal curiosity: You might want to explore different views of the data with varying parameters.

Despite these challenges, we can create an abstraction in the code, so that a single script can handle most of these use cases.

Windows

Installing TOML within python environment for LibreOffice on Windows

LibreOffice carries its own Python environment, which generally works fine on Linux. However, on Windows, we need to ensure we’re installing TOML in the correct Python environment, as installing it in the system’s Python won’t work for macros in LibreOffice.

Follow these steps in PowerShell:

Prerequisite: Installing TOML within python environment for LibreOffice in Windows

Change the directory to LibreOffice’s installation folder:

PS C:\WINDOWS\system32> cd "C:\Program Files\LibreOffice\program"

Download the get-pip.py script using CURL:

PS C:\Program Files\LibreOffice\program> curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Run the get-pip.py script to install PIP. If you encounter the error above, proceed to the next step.

PS C:\Program Files\LibreOffice\program> ./python.exe -m pip install tomli
C:\Program Files\LibreOffice\program\python-core-3.8.19\bin\python.exe: No module named pip

Add PIP to the environment by running get-pip.py. This will install PIP successfully.

PS C:\Program Files\LibreOffice\program> .\python.exe .\get-pip.py
Collecting pip
  Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Collecting setuptools
  Downloading setuptools-75.3.0-py3-none-any.whl.metadata (6.9 kB)
Collecting wheel
  Downloading wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)Now we can automate the sheet formatting.
Let's start with a simple class, rearangging the column.
Formatting single sheet and multiple sheet.
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
   ---------------------------------------- 1.8/1.8 MB 644.1 kB/s eta 0:00:00
Downloading setuptools-75.3.0-py3-none-any.whl (1.3 MB)
   ---------------------------------------- 1.3/1.3 MB 393.2 kB/s eta 0:00:00
Downloading wheel-0.45.1-py3-none-any.whl (72 kB)
Installing collected packages: wheel, setuptools, pip
  WARNING: The script wheel.exe is installed in 'C:\Program Files\LibreOffice\program\python-core-3.8.19\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts pip.exe, pip3.8.exe and pip3.exe are installed in 'C:\Program Files\LibreOffice\program\python-core-3.8.19\Scripts' which is not on PATH.Now we can automate the sheet formatting.
Let's start with a simple class, rearangging the column.
Formatting single sheet and multiple sheet.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-24.3.1 setuptools-75.3.0 wheel-0.45.1

Now, install TOML using PIP in LibreOffice’s Python environment. After this, the installation should complete successfully:

PS C:\Program Files\LibreOffice\program> ./python.exe -m pip install tomli
Collecting tomli
  Downloading tomli-2.2.1-py3-none-any.whl.metadata (10 kB)
Downloading tomli-2.2.1-py3-none-any.whl (14 kB)
Installing collected packages: tomli
Successfully installed tomli-2.2.1

2: CSV Importer: By Config

Real world case require more control, such as using TOML config.

TOML Configuration

The TOML configuration might look like this:

Merge CSV: Calc Macro: Config: TOML

# Linux Based
path = "/home/epsi/movies-by-year/"

# Windows Based
path = "D:/movies-by-year"

# Included Files
csv_s = [
  "movies_1980",
  "movies_1981",
  "movies_1982",
  ...
  ...
  "movies_2018",
  "movies_2019"
]

Just be aware that the path format differs between Linux and Windows.

Macro Skeleton

You can find the macro here:

There is only one additional method.

Merge CSV: Calc Macro: Class Skeleton

import tomli, uno

class CSVImporter:
  def __init__(self, config_path) -> None:
    ...

  def set_config(self) -> None:
    ...

  def load_csv(self, csv_path, sheet_name_dst):
    ...

  def process(self, sheetType) -> None:
    ...

def importCSVs():
  config_path = '/home/epsi/Dev/config.toml'  
  csv_importer = CSVImporter(config_path)
  csv_importer.process('Tabular')

Initialization

This is very similar to the previous approach, but instead of using a directory, we now use config.toml as the source.

Merge CSV: Calc Macro: Initialization

  def __init__(self, config_path) -> None:
    # config, all parameter arguments from tomli
    self.config_path = config_path
    self.set_config()

    # Get the current LibreOffice context
    self.desktop   = XSCRIPTCONTEXT.getDesktop()
    self.model_dst = self.desktop.loadComponentFromURL(
      "private:factory/scalc", "_blank", 0, ())

Configuration

We need to build a list with dictionaries containing the following:

  • filename (input-expand)
  • sheetname (sheet-expand)

The term expand here is my internal naming convention. I tend to name my data CSV files in various ways depending on the processing step. My naming steps are as follows: tabular, expand, pivot, and stat for statistics.

Merge CSV: Calc Macro: TOML Configuration

The result of this list will be stored in self.filenames.

  def set_config(self) -> None:
    # read toml configuration
    file_obj: TextIO = open(self.config_path, 'rb')
    config_root = tomli.load(file_obj)
    file_obj.close()

    # Get config content.
    path_root = config_root.get('path', '.')
    csv_s     = config_root.get('csv_s', [])

    # Define input and output file paths
    filenames = []
    for index, csv in enumerate(csv_s):
      filenames.append({
        'input-expand' : f'{path_root}/{csv}.csv',
        'sheet-expand' : f'{index+1:02d}-{csv[7:11]}', 
      })

    self.filenames = filenames

The worksheet names would look like below:

Merge CSV: Calc Macro: Sheetname Tab

Loading CSV

Filter Options

This part is exactly the same as in the previous method.

Merge CSV: Calc Macro: Loading CSV: Filter Options

So there’s no need to repeat the source code here.

Process

This is where we utilize the previously mentioned self.filenames.

Merge CSV: Calc Macro: Process

  def process(self, sheetType) -> None:
    # write something, do not delete this line
    first_sheet = self.model_dst.Sheets[0]
    first_sheet.getCellRangeByName(
      "A1").String = "Hello World!"

    # Main function to load Pivot CSV 
    for filename in reversed(self.filenames):
      input_csv    = filename['input-expand']
      output_sheet = filename['sheet-expand']

      print(f"Loading: {input_csv}")
      self.load_csv(input_csv, output_sheet)

    self.model_dst.Sheets.removeByName(
      first_sheet.Name)

Class Instance

Config Path

The main method is essentially just, instantiating the class with the prepared parameters.

Instead of using a folder path, we use the config path. This allows us to store more variables, not just the directory.

Merge CSV: Calc Macro: Main Method

def importCSVs():
  # Configure paths or parameters as needed
  config_path = '/home/epsi/Dev/config.toml'  
  csv_importer = CSVImporter(config_path)
  csv_importer.process('Tabular')

Sheet Result

The result of the macro is similar to the previous macro, but this time you have more control.

I’m using this script heavily, but of course, my CSV data is not about movies.


3: Tab Colorizer

For workbooks with many worksheets, it can be helpful to distinguish them with distinct colors. We can achieve this by defining a new Python macro in LibreOffice.

This is more of a “nice-to-have” feature, but I personally use it often. Therefore, I’d like to share the code with you.

Macro Skeleton

You can find the macro here.

Sheet Tab Colorizer: Macro Skeleton

# Color scale 
blueScale  = { ... }
tealScale  = { ... }
amberScale = { ... }
brownScale = { ... }

class SheetTabColorizer:
  def __init__(self):
    ...

  # Apply Color To Sheets
  def run(self):
    ...

def main():
  # Create the colorizer instance and run it
  colorizer = SheetTabColorizer()
  colorizer.run()

Color Scale

I’m using Google Material Color scales as usual.

Sheet Tab Colorizer: GMC Color Scale

# Google Material color scale 
blueScale = {
  0: 0xE3F2FD, 1: 0xBBDEFB, 2: 0x90CAF9,
  3: 0x64B5F6, 4: 0x42A5F5, 5: 0x2196F3,
  6: 0x1E88E5, 7: 0x1976D2, 8: 0x1565C0,
  9: 0x0D47A1
}

tealScale = {
  0: 0xE0F2F1, 1: 0xB2DFDB, 2: 0x80CBC4,
  3: 0x4DB6AC, 4: 0x26A69A, 5: 0x009688,
  6: 0x00897B, 7: 0x00796B, 8: 0x00695C,
  9: 0x004D40
}

amberScale = {
  0: 0xFFF8E1, 1: 0xFFECB3, 2: 0xFFE082,
  3: 0xFFD54F, 4: 0xFFCA28, 5: 0xFFC107,
  6: 0xFFB300, 7: 0xFFA000, 8: 0xFF8F00,
  9: 0xFF6F00
}

brownScale = {
  0: 0xEFEBE9, 1: 0xD7CCC8, 2: 0xBCAAA4,
  3: 0xA1887F, 4: 0x8D6E63, 5: 0x795548,
  6: 0x6D4C41, 7: 0x5D4037, 8: 0x4E342E,
  9: 0x3E2723
}

You can adjust the color scale to suit your needs.

Initialization

This step involves preparing the necessary variables.

Sheet Tab Colorizer: Initialization

class SheetTabColorizer:
  def __init__(self):
    # Get the current LibreOffice context and document
    self.document = XSCRIPTCONTEXT.getDocument()

    # List of palettes to cycle through
    self.palettes = [blueScale, tealScale, amberScale, brownScale]

Applying Color

You may adjust the logic to suit your needs. This is just an example.

Sheet Tab Colorizer: Run: Applying Color

  # Apply Color To Sheets
  def run(self):
    # Loop through each sheet and apply color based on index
    sheets = self.document.Sheets

    for i in range(sheets.Count):
      sheet = sheets[i]
      
      # Determine which palette to use based on sheet index
      palette_index = (i // 10) % 4  # Cycle through 4 palettes
      color_index = i % 10  # Cycle through 0 to 9 for each palette

      # Get the color from the selected palette
      tab_color = self.palettes[palette_index][color_index]
      sheet.TabColor = tab_color  # Set the tab color

Class Instance

The main method simply instantiates the class along with the prepared parameters.

Sheet Tab Colorizer: Main Method: Class Instance

def main():
  # Create the colorizer instance and run it
  colorizer = SheetTabColorizer()
  colorizer.run()

Sheet Result

The worksheet names will look like the example below:

Merge CSV: Calc Macro: Sheet Result with Colorized Tab

You can compare this with the previous preview:

Merge CSV: Calc Macro: Sheetname Tab


What is Next 🤔?

Now that we’ve automated sheet formatting, let’s move on to the next step: a simple-config class for rearranging columns.

The basic flow will focus on formatting both single and multiple sheets.

Feel free to continue reading [ Formatter - Simple Config - One ].