Preface
Goal: Merge worksheet from multiple xlsx into one.
Have you ever been in a awkward situation. A choice between doing something manually fast. And automate with coding, that takes more time. Hesitate to code, because this looks worthless.
When the task getting harder. More sheets to be done. More report to your bozz in a short urgent time. More and more mistake as your eyes getting weary. You are going to have to shift your choice.
In short: prepare for routine task.
Office Automation
On a more basic requirement, there are a lot of routine based on your business flow. This requirement is cannot be generalized, in context of different business.
All I can do is giving an example on performing simple task.
- Merging and Extract
- Coloring Cell
- Making your own pivot
- Perform your own statistic algorithm
- et cetera, I have a bunch of idea.
With this you can perform your own automation, based on your special task.
The Needs
But why do I even bother with automation, while I can do it manually?
The reason is human tend to err when in comes to tedious task.
The second reason is I prefer coding better than tedious task.
Example Data
Before you get lost in long introduction. Let’s get into the code. But hey, what case do I need to work with?
Instead of my confidential office document, I’m using my own company annual report.
- Balance Sheet, monthly from January until December
- Profit Loss, monthly from January until December
Requirement
Purpose: I want to be able to merge multiple worksheet into one xlsx file.
If the script does not work that way, then I failed.
File Naming
I name my balance sheet as BS-01
,
and profit loss sheet as PL-01
.
You can see them all in this folder:
Sheet Naming
Let’s merge some sheet
The expected result is as below:
You can see all the sheet name in sheets tab.
Manual Extract/Merge
Merging and extract manually usually can be done,
with menu Move or Copy Sheet
.
How does it works
The issue with merge
in openpyxl
is,
there is no way to copy (or move) sheet,
from one workbook to newly created worksheet object.
So we have to prepare a file with empty sheet first, and copy the result to from the source workbook to prepared file.
Limitation
This works only for similar sheet.
To make the issue worst. The prepared sheet should contain other properties, that all the other sheets have.
So here is what I do, I copy manually form one working sheet. Empty the sheet, and save this sheet as our empty sheet.
1: Query Sheet
Script
Consider begin with a short script,
using openpyxl
library.
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('./monthly/01-BS.xlsx')
for sheet in wb.worksheets:
print(sheet.title)
Result
The result is as simply as below:
Sheet1
2: Rename Sheet
Script
Still short script. We are assuming that the sheet has only one sheet. The other sheet will be ignored for this example.
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('./monthly/01-BS.xlsx')
# Rename first sheet
sheet = wb.worksheets[0]
sheet.title = '01 - Balance Sheet'
# Save the file
wb.save("sample.xlsx")
Result
There is nothing in CLI output.
But you can examine the result directly in saved sample.xlsx
file.
3: Copy Sheet Between Workbook
Like I said.
I utilize prepared xlsx file, named empty.xlsx
.
Script
Still short script.
import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook
wb_dest = load_workbook('./empty.xlsx')
wb_source = load_workbook('./monthly/01-BS.xlsx')
# Rename first sheet
sheet = wb_source.worksheets[0]
sheet.title = '01-BS'
sheet._parent = wb_dest
wb_dest._add_sheet(sheet)
# Save the file
wb_dest.save("sample.xlsx")
Result
Again. You can examine the result directly in saved sample.xlsx
file.
4: Query File
Before we go further processing script. We need to make sure that we can list all the excel file properly.
Script
Still short script
import os
from os import listdir
from os.path import isfile, join
path = './monthly/'
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()
for f in onlyxlxs:
noext = os.path.splitext(f)[0]
print(noext)
Or fancier with pprint
.
import os
from os import listdir
from os.path import isfile, join
import pprint
path = './monthly/'
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()
my_print = pprint.PrettyPrinter(width=60, compact=True)
my_print.pprint(onlyxlxs)
Result
You can check the output in CLI below:
['01-BS.xlsx', '01-PL.xlsx', '02-BS.xlsx', '02-PL.xlsx',
'03-BS.xlsx', '03-PL.xlsx', '04-BS.xlsx', '04-PL.xlsx',
'05-BS.xlsx', '05-PL.xlsx', '06-BS.xlsx', '06-PL.xlsx',
'07-BS.xlsx', '07-PL.xlsx', '08-BS.xlsx', '08-PL.xlsx',
'09-BS.xlsx', '09-PL.xlsx', '10-BS.xlsx', '10-PL.xlsx',
'11-BS.xlsx', '11-PL.xlsx', '12-BS.xlsx', '12-PL.xlsx']
5: Build from One Directory
Now we can process the whole directory.
Script
This is a little bit longer. But it is actually simple.
import os
from os import listdir
from os.path import isfile, join
import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook
wb_dest = load_workbook('./empty.xlsx')
path = './monthly/'
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()
for f in onlyxlxs:
wb_source = load_workbook(join(path, f))
noext = os.path.splitext(f)[0]
# Rename first sheet
sheet = wb_source.worksheets[0]
sheet.title = noext
sheet._parent = wb_dest
wb_dest._add_sheet(sheet)
# Save the file
wb_dest.save("sample.xlsx")
Result
Now the whole separated sheets are gathered into one,
saved in sample.xlsx
file.
And surprisingly, the process is pretty fast.
6: Build from Multiple Directory
I don’t know about your case. But in my case, I have to gather data from multiple directory as well.
And they are not accounting sheets
Script
After refactoring. Here is my final script:
import os
from os import listdir
from os.path import isfile, join
import openpyxl
from openpyxl import load_workbook
from openpyxl import Workbook
wb_dest = load_workbook('./empty.xlsx')
def merge_all(path):
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
onlyxlxs = [f for f in onlyfiles if '.xlsx' in f]
onlyxlxs.sort()
for f in onlyxlxs:
wb_source = load_workbook(join(path, f))
noext = os.path.splitext(f)[0]
# Process first sheet
sheet = wb_source.worksheets[0]
sheet.title = noext
sheet._parent = wb_dest
wb_dest._add_sheet(sheet)
merge_all('./monthly/')
# merge_all('./other-dir-01/')
# merge_all('./other-dir-01/')
# Save the file
wb_dest.save("sample.xlsx")
Result
The script have about the same result as above script.
What is Next 🤔?
We have done with merging. How about extracting?
Consider continue reading [ Python - Excel - Extract Sheets ].