Preface
Goal: Master Detail CSV Reader, Using Console
Using Faktur Pajak (Indonesian Tax) as Example.
There will be a lot of picture here. But the code is actually simple.
And there is not much paragraph either.
Local Issue
Indonesia regulation required any VAT (PPn), to be reported using goverment application called ETaxInvoice. This tax application from goverment can be exported into CSV. But this CSV cannot be imported into LibreOffice, because the CSV is a master-detail table relationship.
So I make a little tool that you can modify easily to suit your needs.
This is a local issue, but you can apply this tricks, for use with similar case.
1: CSV Case: Faktur Keluaran
Source
Consider have a look at this example data.
- Input: faktur-keluaran.csv
This data is simply a double quoted comma separated values.
Header
You can see the raw text file as below.
If you take a look closely at the first three rows. They are all, column header name, which each lines consist many quoted string.
The first column all started with either “FK”, “FAPR”, or “OF”.
I do not know what that means.
Content
You can see the content of the first “FK” line, and the first “FAPR” line below. These lines looks terrifying.
Now consider remove the hash content in “FAPR” line, to make the result less terrifying.
Row Content
And again, have a look, one by one. There are at least three distinct content.
- “FK”: This is the master table.
Master row can only consist one row, for example:
"FK","03","0","0042131234569","1","2020","18/01/2020","024441926721000","PT. Kucing Oren Lucu",...
- “FAPR”: This is also one to one relationship with “FK”.
This is actually a part of master table, for example:
"FAPR","PT Tiga Jam Saja","Gedung Gorila Lantai Dasar L13 JL. Wortel Dambaan...",
- “OF”: This is clearly a detail table of “FAPR”
The detail can consist of more than one row, for example:
"OF","","Ramuan Anti Gatal","13000000","2.0","26000000","0.0","26000000","2600000","0","0.0"
"OF","","Topping Coklat","15000.0","200.0","3000000.0","0.0","3000000.0","300000.0","0","0.0"
"OF","","Zat Pewarna Pink","8000.0","5000.0","40000000","0.0","40000000","4000000.0","0","0.0"
Hints: You can compare the CSV with your eFaktur Application.
2: Simple Parsing
Quick and Dirty
Consider go straight into coding.
Source
Just do it. Open The file and extract the content.
Header
First thing to do is copy-and-paste the column header definition.
header = \
'"FK","Kode","Ganti","Faktur","Masa",' + \
'"Tahun","Tanggal","NPWP","Nama","Alamat",' + \
'"DPP","PPn","PPnBM","Keterangan","FG",' + \
'"UM DPP","UM PPn","UM PPnBM","Referensi"'
We can just split the array using standard string split. But this can cause an issue, whenever we caught comma inside quoted text.
keys = header.split('",')
So we’d better use regular expression, instead.
import re
keys = re.split(r',(?=")', header)
keys = [key.replace('"', '') for key in keys]
Sample Content
Consider examine the fourth lines. The rows with “FK” signature.
# parse lines
filename = 'faktur-keluaran.csv'
with open(filename) as f:
lines = f.readlines()
f.close()
count = 0
for line in lines:
count += 1
if count==4:
# do something
We can pair the header and the content entry.
if count==4:
print(f'line {count}:\n{line}')
values = re.split(r',(?=")', line)
values = [value.replace('"', '')
for value in values]
pairs = dict(zip(keys, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
Result
Then display the result as below. There are many fields in just one line.
line 4:
"FK","01","0","0042131234567","1","2020","15/01/2020","081443518011000","PT. Teliti Telaten Tekun","Jl. Gigih Gagah No. 1234 RT:007 RW:007","12000000","1200000","0","","1","6000000","600000","0","",
FK : FK
Kode : 01
Ganti : 0
Faktur : 0042131234567
Masa : 1
Tahun : 2020
Tanggal : 15/01/2020
NPWP : 081443518011000
Nama : PT. Teliti Telaten Tekun
Alamat : Jl. Gigih Gagah No. 1234 RT:007 RW:007
DPP : 12000000
PPn : 1200000
PPnBM : 0
Keterangan :
FG : 1
UM DPP : 6000000
UM PPn : 600000
UM PPnBM : 0
Referensi : ,
Python is powerful, right! With only this few lines, we are already have this output above.
3: Three Kind of Headers
Examine Master Detail Header
This is exactly the same with previous example, but this time we examine three headers: “FK”, “FAPR”, or “OF”, instead just the first “FK” header.
Source
Just a modification of previous code.
Header
Again we need to copy-and-paste the column header definition.
hd_fk = \
'"FK","Kode","Ganti","Faktur","Masa",' +\
'"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
'"DPP","PPn","PPnBM","Keterangan","FG",' +\
'"UM DPP","UM PPn","UM PPnBM","Referensi"'
hd_fpr = \
'"LT","Nama","Alamat","Perekam",' +\
'"Wilayah","Timestamp","Hash"'
hd_of = \
'"OF","Kode","Nama","Satuan","Jumlah",' +\
'"Total","Diskon","DPP","PPN","Tarif","PPnBM"'
Then split using regular expression.
import re
keys_fk = re.split(r',(?=")', hd_fk)
keys_fk = [key.replace('"', '')
for key in keys_fk]
keys_fpr = re.split(r',(?=")', hd_fpr)
keys_fpr = [key.replace('"', '')
for key in keys_fpr]
keys_of = re.split(r',(?=")', hd_of)
keys_of = [key.replace('"', '')
for key in keys_of]
Sample Content
Consider examine all entries. We need to ignore the first three rows, because these contain header name, insetad of entry.
Since your output might have hundres of lines, consider limit the row to just the few lines. This could be the first ten lines or more, depend on your needs.
# parse lines
filename = 'faktur-keluaran.csv'
with open(filename) as f:
lines = f.readlines()
f.close()
count = 0
for line in lines:
count += 1
if (count>3) and (count<10):
# examine all entries
We can pair the header and the content entry, depend on the signature: “FK”, “FAPR”, or “OF”,
if (count>3) and (count<10):
values = re.split(r',(?=")', line)
values = [value.replace('"', '')
for value in values]
if values[0]=="FK":
pairs = dict(zip(keys_fk, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
print()
if values[0]=="FAPR":
pairs = dict(zip(keys_fpr, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
print()
if values[0]=="OF":
pairs = dict(zip(keys_of, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
print()
Result
Then display the result as below. There are three different output:
- “FK” Example
You might notice comma ”," int the last field. This is a bug and we are going to fix this later.
FK : FK
Kode : 01
Ganti : 0
Faktur : 0042131234567
Masa : 1
Tahun : 2020
Tanggal : 15/01/2020
NPWP : 081443518011000
Nama : PT. Teliti Telaten Tekun
Alamat : Jl. Gigih Gagah No. 1234 RT:007 RW:007
DPP : 12000000
PPn : 1200000
PPnBM : 0
Keterangan :
FG : 1
UM DPP : 6000000
UM PPn : 600000
UM PPnBM : 0
Referensi : ,
- “FAPR” Example, followed by one “OF” Example
LT : FAPR
Nama : PT Tiga Jam Saja
Alamat : Gedung Gorila Lantai Dasar L13 JL. Worte
Perekam : Nobita
Wilayah : Jakarta Timur
Timestamp : 20200115123148
Hash : MAACLIYJKoZIhvcNAGcCoIICHTCCAhkCAGExDzAN
OF : OF
Kode :
Nama : Motor iPhone
Satuan : 12000000
Jumlah : 1.0
Total : 12000000
Diskon : 0.0
DPP : 12000000
PPN : 1200000.0
Tarif : 0
PPnBM : 0.0
- Another “FAPR” Example, followed by one “OF” Example
LT : FAPR
Nama : PT Tiga Jam Saja
Alamat : Gedung Gorila Lantai Dasar L13 JL. Worte
Perekam : Suzuka
Wilayah : Jakarta Timur
Timestamp : 20200117113521
Hash : MAACLIYJKoZIhvcNAGcCoIICHTCCAhkCAGExDzAN
OF : OF
Kode :
Nama : Ramuan Anti Gatal
Satuan : 13000000
Jumlah : 1.0
Total : 13000000
Diskon : 0.0
DPP : 13000000
PPN : 1300000.0
Tarif : 0
PPnBM : 0.0
The last image, is actually has three “OF” details. But this would be too long to be shown. So I cut the output.
4: Class Parser
Object Oriented Approach
The fact that the code above run well. Does not means our the code is good.
After quick and dirty approach, We need to write proper code.
We shall begin with the simple class.
Source
Let’s get civilized, nomore barbaric code.
Skeleton
I would like to make a master detail faktur class.
class FakturMD2Sheet:
def __init__(self, sheet):
# ...
# some function here
# ...
def run(self):
# ...
def main():
# class initialization ...
main()
Header
This is just an OOP version of previous step by step starter.
import re
# Master Detail Faktur Exporter Class
class FakturMD:
header_fk = \
'"FK","Kode","Ganti","Faktur","Masa",' +\
'"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
'"DPP","PPn","PPnBM","Keterangan","FG",' +\
'"UM DPP","UM PPn","UM PPnBM","Referensi"'
Class Initialization
We can access using class properties using self keyword.
def __init__(self, filename):
# prepare header
keys = re.split(r',(?=")', self.header_fk)
self.keys = [key.replace('"', '') for key in keys]
# parse lines
with open(filename) as f:
self.lines = f.readlines()
f.close()
Run: Display Entries
And call the properties anywhere within the class.
def run(self):
count = 0
for line in self.lines:
count += 1
if count==4:
print(f'line {count}:\n{line}')
values = re.split(r',(?=")', line)
values = [value.replace('"', '')
for value in values]
pairs = dict(zip(self.keys, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
Main
And finally make a class instance in main function.
def main():
filename = 'faktur-keluaran.csv'
md = FakturMD(filename)
md.run()
main()
Result
Then display the result as below. The result is very similar to previous code.
5: Master Detail Class
Examine Master Detail in Class Fashioned
Consider go further with class, examining all the three headers “FK”, “FAPR”, or “OF”.
Source
Just an OOP version modification of previous code.
Skeleton
All definition shown here:
def split_quotes(header):
class FakturMD:
def __init__(self, filename):
def init_header_keys(self):
def print_record(self, keys, values):
def run(self):
def main():
main()
Helper: Split Quotes
Not everything have to be stored in class. Such as this class helper below:
import re
def split_quotes(header):
keys = re.split(r',(?=")', header)
return [key.replace('"', '') for key in keys]
Class Initialization
We can start all in initialization.
# Master Detail Faktur Exporter Class
class FakturMD:
def __init__(self, filename):
# prepare header
self.init_header_keys()
# parse lines
with open(filename) as f:
self.lines = f.readlines()
f.close()
Header
And put header definition in its own function.
def init_header_keys(self):
header_fk = \
'"FK","Kode","Ganti","Faktur","Masa",' +\
'"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
'"DPP","PPn","PPnBM","Keterangan","FG",' +\
'"UM DPP","UM PPn","UM PPnBM","Referensi"'
header_fpr = \
'"LT","Nama","Alamat","Perekam",' +\
'"Wilayah","Timestamp","Hash"'
header_of = \
'"OF","Kode","Nama","Satuan","Jumlah",' +\
'"Total","Diskon","DPP","PPN","Tarif","PPnBM"'
self.keys_fk = split_quotes(header_fk)
self.keys_fpr = split_quotes(header_fpr)
self.keys_of = split_quotes(header_of)
This way, we can be more flexible to alter class properties.
Helper: Print Record
We can also put helper method as a part of this FakturMD class.
def print_record(self, keys, values):
pairs = dict(zip(keys, values))
for key in pairs:
print(f'{key:>10} : {pairs[key][:40]}')
print()
Run: Display Entries
And call the method anywhere within the class.
def run(self):
count = 0
for line in self.lines:
count += 1
# ignore the first three lines
if (count>3) and (count<10):
values = split_quotes(line)
if values[0]=="FK":
self.print_record(self.keys_fk, values)
if values[0]=="FAPR":
self.print_record(self.keys_fpr, values)
if values[0]=="OF":
self.print_record(self.keys_of, values)
Main
And finally make a class instance in main function.
def main():
filename = 'faktur-keluaran.csv'
md = FakturMD(filename)
md.run()
main()
Result
Then display the result as below. The result is very similar to previous code.
What is Next 🤔?
We are not done yet. We haven’t even touch the spreadsheet part.
Consider continue reading [ Python - Excel - CSV - Part Two ].