Where to Discuss?

Local Group

Preface

Goal: Merge worksheet from multiple xlsx into one.

Have you ever been in a awkward situation. A choice between doing something manually fast. And automate with coding, that takes more time. Hesitate to code, because this looks worthless.

When the task getting harder. More sheets to be done. More report to your bozz in a short urgent time. More and more mistake as your eyes getting weary. You are going to have to shift your choice.

In short: prepare for routine task.

Office Automation

On a more basic requirement, there are a lot of routine based on your business flow. This requirement is cannot be generalized, in context of different business.

All I can do is giving an example on performing simple task.

  • Merging and Extract
  • Coloring Cell
  • Making your own pivot
  • Perform your own statistic algorithm
  • et cetera, I have a bunch of idea.

With this you can perform your own automation, based on your special task.

The Needs

But why do I even bother with automation, while I can do it manually?

The reason is human tend to err when in comes to tedious task.

The second reason is I prefer coding better than tedious task.


Example Data

Before you get lost in long introduction. Let’s get into the code. But hey, what case do I need to work with?

Instead of my confidential office document, I’m using my own company annual report.

  • Balance Sheet, monthly from January until December
  • Profit Loss, monthly from January until December

File Naming

I name my balance sheet as BS-01, and profit loss sheet as PL-01. You can see them all in this folder:

openpyxl: Monthly Accounting Report

Sheet Naming

Let’s merge some sheet

The expected result is as below:

openpyxl: Sheets Tab Result

You can see all the sheet name in sheets tab.

Manual Extract/Merge

Merging and extract manually usually can be done, with menu Move or Copy Sheet.

openpyxl: Manual

How does it works

The issue with merge in openpyxl is, there is no way to copy (or move) sheet, from one workbook to newly created worksheet object.

So we have to prepare a file with empty sheet first, and copy the result to from the source workbook to prepared file.

Limitation

This works only for similar sheet.

To make the issue worst. The prepared sheet should contain other properties, that all the other sheets have.

So here is what I do, I copy manually form one working sheet. Empty the sheet, and save this sheet as our empty sheet.


1: Query Sheet

Script

Consider begin with a short script, using openpyxl library.

import openpyxl
from openpyxl import load_workbook

wb = load_workbook('./monthly/01-BS.xlsx')

for sheet in wb.worksheets:
    print(sheet.title)

Result

The result is as simply as below:

Sheet1

ViM: opexpyxl: Query Sheet


2: Rename Sheet

Script

Still short script. We are assuming that the sheet has only one sheet. The other sheet will be ignored for this example.

import openpyxl
from openpyxl import load_workbook

wb = load_workbook('./monthly/01-BS.xlsx')

# Rename first sheet
sheet = wb.worksheets[0] 
sheet.title = '01 - Balance Sheet'

# Save the file
wb.save("sample.xlsx")

Result

There is nothing in CLI output.

ViM: opexpyxl: Rename Sheet

But you can examine the result directly in saved sample.xlsx file.

openpyxl: Renamed Sheet


3: Copy Sheet Between Workbook

Like I said. I utilize prepared xlsx file, named empty.xlsx.

Script

Still short script.

import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook

wb_dest   = load_workbook('./empty.xlsx')
wb_source = load_workbook('./monthly/01-BS.xlsx')

# Rename first sheet
sheet = wb_source.worksheets[0] 
sheet.title = '01-BS'

sheet._parent = wb_dest
wb_dest._add_sheet(sheet)

# Save the file
wb_dest.save("sample.xlsx")

ViM: opexpyxl: Copy Sheet

Result

Again. You can examine the result directly in saved sample.xlsx file.

openpyxl: Copy Sheet Between Workbook


4: Query File

Before we go further processing script. We need to make sure that we can list all the excel file properly.

Script

Still short script

import os
from os import listdir
from os.path import isfile, join

path = './monthly/'

onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs  = [f for f in onlyfiles if '.xlsx' in f]

onlyxlxs.sort()

for f in onlyxlxs:
  noext = os.path.splitext(f)[0]
  print(noext)

Or fancier with pprint.

import os
from os import listdir
from os.path import isfile, join
import pprint

path = './monthly/'

onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs  = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()

my_print  = pprint.PrettyPrinter(width=60, compact=True)
my_print.pprint(onlyxlxs)

ViM: opexpyxl: Query File

Result

You can check the output in CLI below:

['01-BS.xlsx', '01-PL.xlsx', '02-BS.xlsx', '02-PL.xlsx',
 '03-BS.xlsx', '03-PL.xlsx', '04-BS.xlsx', '04-PL.xlsx',
 '05-BS.xlsx', '05-PL.xlsx', '06-BS.xlsx', '06-PL.xlsx',
 '07-BS.xlsx', '07-PL.xlsx', '08-BS.xlsx', '08-PL.xlsx',
 '09-BS.xlsx', '09-PL.xlsx', '10-BS.xlsx', '10-PL.xlsx',
 '11-BS.xlsx', '11-PL.xlsx', '12-BS.xlsx', '12-PL.xlsx']

5: Build from One Directory

Now we can process the whole directory.

Script

This is a little bit longer. But it is actually simple.

import os
from os import listdir
from os.path import isfile, join

import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook

wb_dest   = load_workbook('./empty.xlsx')
path = './monthly/'

onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs  = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()

for f in onlyxlxs:
  wb_source = load_workbook(join(path, f))
  noext = os.path.splitext(f)[0]

  # Rename first sheet
  sheet = wb_source.worksheets[0] 
  sheet.title = noext

  sheet._parent = wb_dest
  wb_dest._add_sheet(sheet)

# Save the file
wb_dest.save("sample.xlsx")

ViM: opexpyxl: Merge One Directory

Result

Now the whole separated sheets are gathered into one, saved in sample.xlsx file.

openpyxl: All Sheets Gathered in One Workbook

And surprisingly, the process is pretty fast.


6: Build from Multiple Directory

I don’t know about your case. But in my case, I have to gather data from multiple directory as well.

And they are not accounting sheets

Script

After refactoring. Here is my final script:

import os
from os import listdir
from os.path import isfile, join

import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook

wb_dest   = load_workbook('./empty.xlsx')

def merge_all(path):
  onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
  onlyxlxs  = [f for f in onlyfiles if '.xlsx' in f]
  onlyxlxs.sort()

  for f in onlyxlxs:
    wb_source = load_workbook(join(path, f))
    noext = os.path.splitext(f)[0]

    # Process first sheet
    sheet = wb_source.worksheets[0] 
    sheet.title = noext
    sheet._parent = wb_dest
    wb_dest._add_sheet(sheet)

merge_all('./monthly/')
# merge_all('./other-dir-01/')
# merge_all('./other-dir-01/')

# Save the file
wb_dest.save("sample.xlsx")

ViM: opexpyxl: Merge Multiple Directories

Result

The script have about the same result as above script.


What is Next 🤔?

We have done with merging. How about extracting?

Consider continue reading [ Python - Excel - Extract Sheets ].