Pivot - LibreOffice - Table Reader

Data Modelling

Python - Excel

LibreOffice Macro

LibreOffice UNO

Local Group

Preface

Goal: Reading a predefined sheet to a plain dataframe, and and build pivot dataframe.

In this section, we embark on the journey of building a pivot with a Python macro in LibreOffice. This approach provides you with a readily applicable template for common office scenarios.

As part of our well-structured learning process, we’ll commence the LibreOffice Python macro tutorial by focusing on the creation of a straightforward spreadsheet table.

Reading

Understanding the code snippets below will necessitate a careful examination of the LibreOffice API.

api.libreoffice.org

Source

Our data source for this exercise is a simple worksheet table:

Pivot: Calc Macro: Source Preview: Table Worksheet

This table comprises three columns: A, B, and C, adorned with uncomplicated decoration.

Preview

The terminal display below represents the envisioned DataFrame we aim to generate:

Pivot: Calc Macro: Result Preview: APSO Terminal

Source Examples

Explore the source examples available for reference:

/python/libreoffice/pivot/

Reader: Main Macro

For this task, we will utilize our existing Example.ods file and read data from the Table worksheet.

github.com/…/pivot/pivot-21.py

Class(es) in Macro

Within the internal structure of the LibreOffice macro, the class is encapsulated within the macro itself. While in more sophisticated scenarios, one might consider relocating the class into a custom library, debugging could be challenging due to caching issues.

In my system, the Pivot file is stored in this directory.

/home/epsi/.config/libreoffice/4/user/Scripts/python/Pivot

In summary, all classes are stored within one macro, eliminating the need for local libraries.

Class Diagram

To provide a comprehensive illustration of the overall structure, the main macro is represented in a class diagram. Although this isn’t a standard UML representation, adjustments have been made for clarity. We’ll start with a simple diagram.

Pivot: Calc Macro: Model: Class Diagram

The class diagram above is created using multiple pages in Inkscape. The source can be obtained here, allowing you to adapt it for your own purposes.

class-diagram-macro.svg

For simplicity, I intentionally use XModel in the diagram, while the actual type is SpreadsheetDocument. The same applies to the sheet, where we use XSheet in the diagram, while the actual type is Spreadsheet. We will refine the types further in the UNO (Universal Network Objects) part of this article series.

Libraries

The only external library needed is pandas. The rest are optional.

import pandas as pd

from pprint import pprint
from typing import Dict

Pivot: Calc Macro: Model: Library

The XSCRIPTCONTEXT is already handled by LibreOffice.

Macro Skeleton

Here’s how the macro skeleton looks:

class PivotSample:
  ...

def main() -> None:
  ...

Pivot: Calc Macro: Model: Skeleton

Note that there is no equivalent to __main__ in a macro. The execution begins with the main method.

Main Method

The first part is custom variable, also categories to be used as feed for class initialization.

def main() -> None:
  columns = {
    'index' : 'A',
    'date'  : 'B',
    'cat'   : 'C'
  }

  sample = PivotSample('Table', columns)
  sample.process()

Pivot: Calc Macro: Table Reader: Main

We have defined the column that we are going to use: A, B, C.

Table Reader Class

This macro essentially consists of a loop that iterates through a range of rows in a worksheet, concatenating the values into a DataFrame.

With the ability to read and write into worksheets, your basic macro skills are sufficient, for straightforward data processing. You can play around with any bulk of worksheet data.

Class Skeleton

Here’s how the skeleton looks:

class PivotSample:
  def __init__(self,
      sheetSourceName: str,
      columns: Dict[str, str]) -> None:

    ...

  def get_last_used_row(self) -> None:
    ...
  def load_data(self) -> None:
    ...

  def process(self) -> None:
    ...

Pivot: Calc Macro: Table Reader: Skeleton

XSCRIPTCONTEXT

The root of all interfaces is XSCRIPTCONTEXT. From this point, you can trace all the interfaces and services required in this example. Further details about this interface can be found here.

api.libreoffice.org - XSCRIPTCONTEXT

Initialization

There are two parameter arguments here.

First is the data source.
Second is the columns for mapping the data.

And this XSCRIPTCONTEXT is crucial in LibreOffice macro development, especially for beginners.

class PivotSample:
  def __init__(self,
      sheetSourceName: str,
      columns: Dict[str, str]) -> None:

    # Getting tthe source sheet
    document = XSCRIPTCONTEXT.getDocument()
    self.sheet_src = document.Sheets[sheetSourceName]

Pivot: Calc Macro: Table Reader: Initialization

We can unpack the mapping column to self as usual.

    # Unpack the dictionary keys and values
    # into class attributes
    for key, value in columns.items():
      setattr(self, f"col_{key}", value)

And also initialize dataframe.

    # initialize dataframe
    self.df_source = pd.DataFrame({
      "Number": [], "Date": [], "Category": [] })

Pivot: Calc Macro: Table Reader: Initialization

Now, we can use the DataFrame based on the predefined list above.

LibreOffice API

While coding a LibreOffice Macro, I often refer to the API documentation, going back and forth between page until I can grasp what I need. For example, a seemingly cumbersome task like obtaining a sheet can be simplified as shown below. Though it may take some effort to understand, once the code works, you can reuse it for multiple projects. Understanding the official API is, more reliable than repeatedly searching on Stack Overflow.

Let’s take an example of getting a sheet. The longer form of the above method is:

    # Getting the source sheet
    document = XSCRIPTCONTEXT.getDocument()
    self.sheet_src = document.getSheets().getByName(sheetSourceName)

Getting the sheet can be expressed in a shorter form:

    self.sheet_src = document.Sheets[sheetSourceName]

From api.libreoffice.org, I can found the type. getDocument() return: XModel in this case is SpreadsheetDocument that is descendant of XModel. SpreadsheetDocument itself is descendant of XSpreadsheetDocument. XSpreadsheetDocument has method getSheets() method that return XSpreadsheets. XSpreadsheets is descendant of XNameContainer which is descendant of XNameReplace which is descendant of XNameAccess. XNameAccess has method getByName() with return of any. In this case the return type is supposed to be Spreadsheet as expected. Spreadsheet is descendant of XSpreadsheet.

Confused yet? This is why I love diagram.

Internal Diagram

To enhance your understanding of the internal workings of the LibreOffice API, let’s document the services, interfaces, and methods used in this macro and then visualize the class diagram. As usual, this is a conceptual diagram that requires clarity, so I won’t strictly adhere to the standard UML diagram.

Pivot: Calc Macro: Table Reader: Internal Class Diagram

The class diagram above was created using the multi-page Inkscape feature. The source can be obtained here, allowing you to modify it for your own purposes.

class-diagram-internal.svg

Service and Interface

In the context of the LibreOffice API and UNO (Universal Network Objects), the relationship between a service and an interface is more akin to an implementation relationship rather than a direct inheritance relationship.

When you have a Spreadsheet Service, it typically means that you have an instance of a component, or service that provides the functionality of a spreadsheet document. In this case, the service is an implementation of the com.sun.star.sheet.SpreadsheetDocument service.

On the other hand, XSpreadsheet is an interface. An interface defines a set of methods that an object can implement. The com.sun.star.sheet.XSpreadsheet interface, in particular, defines methods related to working with a spreadsheet.

So, in a sense, a spreadsheet document service is not, a descendant of XSpreadsheet interface in a classical inheritance sense, but an instance of a service can implement or support the functionality defined by XSpreadsheet.

In summary, a service provides a certain functionality, and an interface defines a set of methods. Instances of a service can implement or support one or more interfaces, allowing them to expose specific functionality defined by those interfaces.

Getting The Actual Sheet.

Now the question, with the return of getByName() is as any type, how come that the actual return is as Spreadsheet type?

Here, getSheets() returns an object of type XSpreadsheets, and the subsequent getByName(sheetSourceName) call retrieves a specific sheet with the given name (sheetSourceName). The getByName() method is designed to work with a predefined type, and in this context, it returns an object of type Spreadsheet.

Even though the return type of getByName() is specified as any in the interface hierarchy, the actual implementation ensures that the returned object is of the expected type (Spreadsheet in this case).

Maximum Used Row

Moving from concepts to actual code, let’s focus on obtaining the row length.

Fortunately, LibreOffice allows us to count the used rows, eliminating the need to find the table length manually. We can set a predefined range as a basis for our loop.

  def get_last_used_row(self) -> None:
    cursor = self.sheet_src.createCursor()
    cursor.gotoEndOfUsedArea(False)
    cursor.gotoStartOfUsedArea(True)
    rows = cursor.getRows()
    
    return len(rows)

Pivot: Calc Macro: Table Reader: Load Data: Max Row

Loading Data

This function further streamlines the process by removing the manual search for table length. We create a predefined range as a basis for our loop.

  def load_data(self) -> None:
    # range to be proceed
    # omit header and plus one for the last
    max_row = self.get_last_used_row()
    range_rows = range(2, max_row+1)

    print(f'Range  : {range_rows}')

Pivot: Calc Macro: Table Reader: Load Data: Max Row

Using this loop, we traverse each row of the worksheet, extracting three values to form a new dataframe.

    for row in range_rows:
      # Convert the new data to a DataFrame
      new_row = pd.DataFrame({
        "Number"   : int(self.sheet_src[
            f'{self.col_index}{row}'].Value),
        "Date"     : int(self.sheet_src[
            f'{self.col_date}{row}'].Value),
        "Category" : self.sheet_src[
            f'{self.col_cat}{row}'].String
      }, index=[0])

Pivot: Calc Macro: Table Reader: Load Data: Walk Sheet Row

We then concatenate each row, appending it to the dataframe.

      # Append the new row to the existing DataFrame
      self.df_source = pd.concat(
        [self.df_source, new_row], ignore_index=True)

Pivot: Calc Macro: Table Reader: Load Data: Concat

Voila… magic happens.

Behind The Scene

Internal Diagram

This diagram illustrates how the service and interface components come together to work with the Cell object.

Pivot: Calc Macro: Table Reader: Internal Class Diagram

Let’s consider an example: retrieving the value. The longer form of the method is as follows.

int(self.sheet_src.getCellRangeByName(
    f'{self.col_index}{row}').getCellByPosition(0,0).getValue()),

This can be simplified to a shorter form.

int(self.sheet_src[
   f'{self.col_index}{row}'].Value),

Obtaining the string is somewhat more complex. The longer form of the method is as follows.

self.sheet_src[
    f'{self.col_cat}{row}'].getText().getString()

Alternatively, using a property instead of a getter.

self.sheet_src[
    f'{self.col_cat}{row}'].Text.String

Retrieving the string can be further simplified.

self.sheet_src[
            f'{self.col_cat}{row}'].String

Process

The flow of the class is straightforward.

  def process(self) -> None:
    self.load_data()

    # Print the updated DataFrame
    print(self.df_source)
    print()

Pivot: Calc Macro: Table Reader: Running Process Flow

Running Macro in Calc

To execute a Python macro in LibreOffice Calc, you can use APSO:

Result Preview

Run the macro in the terminal, and you’ll see a text output similar to the one below:

>>> main()
Range  : range(2, 757)
     Number     Date    Category
0       1.0  42785.0      Orange
1       2.0  42785.0       Grape
2       3.0  42785.0  Strawberry
3       4.0  42785.0      Orange
4       5.0  42785.0       Apple
..      ...      ...         ...
750   751.0  42830.0  Strawberry
751   752.0  42830.0  Strawberry
752   753.0  42830.0       Mango
753   754.0  42830.0  Strawberry
754   755.0  42830.0      Banana

[755 rows x 3 columns]

Pivot: Calc Macro: Result Preview: APSO Terminal

As you can observe, the dates are stored as integer values.

Pivot Dataframe: Main Macro

After obtaining the dataframe result, the next step is to transform it into a pivot dataframe.

Class Diagram

Advancing from the simple diagram, we can continue building the pivot and visualize the result at startup.

Pivot: Calc Macro: Model: Class Diagram

Macro Skeleton

Here is the structure of the macro:

github.com/…/pivot/pivot-22.py

class PivotSample:
  def __init__(self,
      sheetSourceName: str,
      columns: Dict[str, str],
      categories: List[str]) -> None:
    ...

  def get_last_used_row(self) -> None:
    ...

  def load_data(self) -> None:
    ...

  def build_pivot(self) -> None:
    ...

  def add_total_column(self):
  ...
  
  def add_total_row(self):
    ...

  def process(self) -> None:
    ...

def main() -> None:
  ...

Pivot: Calc Macro: Model: Skeleton

Main Method

Similar to the previous one, with the addition of categories.

def main() -> None:
  columns = {
    'index' : 'A',
    'date'  : 'B',
    'cat'   : 'C'
  }

  categories = [
    "Apple", "Banana", "Dragon Fruit",
    "Durian", "Grape", "Mango",
    "Orange", "Strawberry"]

  pd.set_option('display.max_rows', 10)

  sample = PivotSample('Table', columns, categories)
  sample.process()

Pivot: Calc Macro: Model: Main

The category lookup serves as a reference value for the C column.

Initialization

The only difference lies in the initial setup of categories.

class PivotSample:
  def __init__(self,
      sheetSourceName: str,
      columns: Dict[str, str],
      categories: List[str]) -> None:

    # Getting the source sheet
    ...

    self.categories = categories

    # initialize dataframe
    ...

Pivot: Calc Macro: Model: Initialization

Building Pivot

We can now proceed to building the pivot. The details have already been explained in the previous dataframe example.

  def build_pivot(self) -> None:
    try:
      # Perform pivot operations
      self.pivot_table = self.df_source.pivot_table(
        index='Date', columns='Category',
        aggfunc='count', fill_value=0)

      # Ensure all specified columns are present
      for cat in self.categories:
        if ('Number', cat) not in self.pivot_table.columns:
          self.pivot_table[('Number', cat)] = 0

      # Sort the columns (fruits) in alphabetical order
      self.pivot_table = self.pivot_table.sort_index(axis=1)

    except Exception as e:
      print("An error occurred " \
        + f"while processing data: {e}")

Pivot: Calc Macro: Model: Building Pivot

Adding Total

Similar to the previous example, we can add both total columns and rows.

Total Column

  def add_total_column(self):
    # Calculate the row sums and add a total column
    row_sums = self.pivot_table.sum(axis=1)
    self.pivot_table[('Total Date', 'Total')] = row_sums

Pivot: Calc Macro: Model: Total Column

Total Row

  def add_total_row(self):
    # Calculate the sum for each column
    # and add a total row
    total_row = self.pivot_table.sum().to_frame().T
    total_row.index = ['Total']
    self.pivot_table = pd.concat(
      [self.pivot_table, total_row])

Pivot: Calc Macro: Model: Total Row

Process

Gathering All Together

Now, we can consolidate all the dataframe creation steps into a single process method.

  def process(self) -> None:
    self.load_data()
    self.build_pivot()
    self.add_total_column()
    self.add_total_row()

    # Print the newly created pivot table
    print(self.pivot_table)
    print()

Result Preview

You can run the Python macro in LibreOffice Calc using APSO and view the result in the APSO terminal as shown below:

>>> main()
Range  : range(2, 757)
         Number                                                          Total
Category  Apple Banana Dragon Fruit Durian Grape Mango Orange Strawberry Total
42785.0       1      3            0      0     2     2      3          4    15
42786.0       0      4            0      0     0     2      2          2    10
42787.0       2      1            0      0     4     4      2          3    16
42788.0       4      1            0      0     1     1      4          5    16
42789.0       1      4            0      0     3     5      6          3    22
...         ...    ...          ...    ...   ...   ...    ...        ...   ...
42827.0       1      1            0      0     5     4      5          1    17
42828.0       4      0            0      0     3     3      3          5    18
42829.0       0      2            0      0     2     1      3          3    11
42830.0       3      1            0      0     1     2      2          4    13
Total       114    125            0      0   127   115    148        126   755

[47 rows x 9 columns]

Pivot: Calc Macro: Result Preview: APSO Terminal

Now is an opportune time to write the result into its dedicated worksheet.

What Comes Next 🤔?

Our journey begins with the initial task of reading the table. Following this, the subsequent step involves constructing the pivot dataframe. These actions collectively contribute to the creation of a LibreOffice pivot using a Python macro.

After successfully reading the data, the next steps involve writing the pivot dataframe into a designated sheet and adding aesthetic embellishments to enhance the overall presentation of the worksheet.

Consider progressing further by exploring the next topic: [ Pivot - Calc Macro - Pivot Writer ].