Python - Excel - CSV

LibreOffice

Python - Excel

Local Group

Preface

Goal: Master Detail CSV Reader, Using Console

Using Faktur Pajak (Indonesian Tax) as Example.

There will be a lot of picture here. But the code is actually simple.

And there is not much paragraph either.

Local Issue

Indonesia regulation required any VAT (PPn), to be reported using goverment application called ETaxInvoice. This tax application from goverment can be exported into CSV. But this CSV cannot be imported into LibreOffice, because the CSV is a master-detail table relationship.

So I make a little tool that you can modify easily to suit your needs.

The script shall be able to convert CSV from ETaxInvoice into XLSX file.

This is a local issue, but you can apply this tricks, for use with similar case.

1: CSV Case: Faktur Keluaran

Source

Consider have a look at this example data.

Input: faktur-keluaran.csv

This data is simply a double quoted comma separated values.

You can see the raw text file as below.

CSV: Faktur Keluaran: Plain

If you take a look closely at the first three rows. They are all, column header name, which each lines consist many quoted string.

CSV: Faktur Keluaran: Header

The first column all started with either “FK”, “FAPR”, or “OF”.

I do not know what that means.

Content

You can see the content of the first “FK” line, and the first “FAPR” line below. These lines looks terrifying.

CSV: Faktur Keluaran: First

Now consider remove the hash content in “FAPR” line, to make the result less terrifying.

Row Content

And again, have a look, one by one. There are at least three distinct content.

“FK”: This is the master table.

CSV: Faktur Keluaran: FK

Master row can only consist one row, for example:

"FK","03","0","0042131234569","1","2020","18/01/2020","024441926721000","PT. Kucing Oren Lucu",...

“FAPR”: This is also one to one relationship with “FK”.

CSV: Faktur Keluaran: FAPR

This is actually a part of master table, for example:

"FAPR","PT Tiga Jam Saja","Gedung Gorila Lantai Dasar L13 JL. Wortel Dambaan...",

“OF”: This is clearly a detail table of “FAPR”

CSV: Faktur Keluaran: OF

The detail can consist of more than one row, for example:

"OF","","Ramuan Anti Gatal","13000000","2.0","26000000","0.0","26000000","2600000","0","0.0"
"OF","","Topping Coklat","15000.0","200.0","3000000.0","0.0","3000000.0","300000.0","0","0.0"
"OF","","Zat Pewarna Pink","8000.0","5000.0","40000000","0.0","40000000","4000000.0","0","0.0"

Hints: You can compare the CSV with your eFaktur Application.

2: Simple Parsing

Quick and Dirty

Consider go straight into coding.

Source

Just do it. Open The file and extract the content.

gitlab.com/…/csv-md-openpyxl/01-simple.py

Header

First thing to do is copy-and-paste the column header definition.

header = \
    '"FK","Kode","Ganti","Faktur","Masa",' + \
    '"Tahun","Tanggal","NPWP","Nama","Alamat",' + \
    '"DPP","PPn","PPnBM","Keterangan","FG",' + \
    '"UM DPP","UM PPn","UM PPnBM","Referensi"'

We can just split the array using standard string split. But this can cause an issue, whenever we caught comma inside quoted text.

keys = header.split('",')

So we’d better use regular expression, instead.

import re

keys = re.split(r',(?=")', header)
keys = [key.replace('"', '') for key in keys]

Simple Parser

Sample Content

Consider examine the fourth lines. The rows with “FK” signature.

# parse lines
filename = 'faktur-keluaran.csv'

with open(filename) as f:
  lines = f.readlines()
  f.close()

count = 0
for line in lines:
  count += 1
  if count==4:
    # do something

We can pair the header and the content entry.

  if count==4:
    print(f'line {count}:\n{line}')

    values = re.split(r',(?=")', line)
    values = [value.replace('"', '')
              for value in values]

    pairs = dict(zip(keys, values))

    for key in pairs:
      print(f'{key:>10} : {pairs[key][:40]}')

Simple Parser

Result

Then display the result as below. There are many fields in just one line.

Simple Parser: Output

line 4:
"FK","01","0","0042131234567","1","2020","15/01/2020","081443518011000","PT. Teliti Telaten Tekun","Jl. Gigih Gagah No. 1234 RT:007 RW:007","12000000","1200000","0","","1","6000000","600000","0","",

        FK : FK
      Kode : 01
     Ganti : 0
    Faktur : 0042131234567
      Masa : 1
     Tahun : 2020
   Tanggal : 15/01/2020
      NPWP : 081443518011000
      Nama : PT. Teliti Telaten Tekun
    Alamat : Jl. Gigih Gagah No. 1234 RT:007 RW:007
       DPP : 12000000
       PPn : 1200000
     PPnBM : 0
Keterangan : 
        FG : 1
    UM DPP : 6000000
    UM PPn : 600000
  UM PPnBM : 0
 Referensi : ,

Python is powerful, right! With only this few lines, we are already have this output above.

3: Three Kind of Headers

Examine Master Detail Header

This is exactly the same with previous example, but this time we examine three headers: “FK”, “FAPR”, or “OF”, instead just the first “FK” header.

Source

Just a modification of previous code.

gitlab.com/…/csv-md-openpyxl/02-check.py

Header

Again we need to copy-and-paste the column header definition.

hd_fk  = \
    '"FK","Kode","Ganti","Faktur","Masa",' +\
    '"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
    '"DPP","PPn","PPnBM","Keterangan","FG",' +\
    '"UM DPP","UM PPn","UM PPnBM","Referensi"'

hd_fpr = \
    '"LT","Nama","Alamat","Perekam",' +\
    '"Wilayah","Timestamp","Hash"'

hd_of  = \
    '"OF","Kode","Nama","Satuan","Jumlah",' +\
    '"Total","Diskon","DPP","PPN","Tarif","PPnBM"'

Checking Entries

Then split using regular expression.

import re

keys_fk = re.split(r',(?=")', hd_fk)
keys_fk = [key.replace('"', '')
           for key in keys_fk]

keys_fpr = re.split(r',(?=")', hd_fpr)
keys_fpr = [key.replace('"', '')
            for key in keys_fpr]

keys_of = re.split(r',(?=")', hd_of)
keys_of = [key.replace('"', '')
           for key in keys_of]

Checking Entries

Sample Content

Consider examine all entries. We need to ignore the first three rows, because these contain header name, insetad of entry.

Since your output might have hundres of lines, consider limit the row to just the few lines. This could be the first ten lines or more, depend on your needs.

# parse lines
filename = 'faktur-keluaran.csv'

with open(filename) as f:
  lines = f.readlines()
  f.close()

count = 0
for line in lines:
  count += 1
  if (count>3) and (count<10):
    # examine all entries

We can pair the header and the content entry, depend on the signature: “FK”, “FAPR”, or “OF”,

  if (count>3) and (count<10):

    values = re.split(r',(?=")', line)
    values = [value.replace('"', '')
              for value in values]

    if values[0]=="FK":
      pairs = dict(zip(keys_fk, values))
      for key in pairs:
        print(f'{key:>10} : {pairs[key][:40]}')
      print()

    if values[0]=="FAPR":
      pairs = dict(zip(keys_fpr, values))
      for key in pairs:
        print(f'{key:>10} : {pairs[key][:40]}')
      print()

    if values[0]=="OF":
      pairs = dict(zip(keys_of, values))
      for key in pairs:
        print(f'{key:>10} : {pairs[key][:40]}')
      print()

Checking Entries

Result

Then display the result as below. There are three different output:

“FK” Example

Checking Entries: Output

You might notice comma ”," int the last field. This is a bug and we are going to fix this later.

        FK : FK
      Kode : 01
     Ganti : 0
    Faktur : 0042131234567
      Masa : 1
     Tahun : 2020
   Tanggal : 15/01/2020
      NPWP : 081443518011000
      Nama : PT. Teliti Telaten Tekun
    Alamat : Jl. Gigih Gagah No. 1234 RT:007 RW:007
       DPP : 12000000
       PPn : 1200000
     PPnBM : 0
Keterangan : 
        FG : 1
    UM DPP : 6000000
    UM PPn : 600000
  UM PPnBM : 0
 Referensi : ,

“FAPR” Example, followed by one “OF” Example

Checking Entries: Output

        LT : FAPR
      Nama : PT Tiga Jam Saja
    Alamat : Gedung Gorila Lantai Dasar L13 JL. Worte
   Perekam : Nobita
   Wilayah : Jakarta Timur
 Timestamp : 20200115123148
      Hash : MAACLIYJKoZIhvcNAGcCoIICHTCCAhkCAGExDzAN

        OF : OF
      Kode : 
      Nama : Motor iPhone
    Satuan : 12000000
    Jumlah : 1.0
     Total : 12000000
    Diskon : 0.0
       DPP : 12000000
       PPN : 1200000.0
     Tarif : 0
     PPnBM : 0.0

Another “FAPR” Example, followed by one “OF” Example

Checking Entries: Output

        LT : FAPR
      Nama : PT Tiga Jam Saja
    Alamat : Gedung Gorila Lantai Dasar L13 JL. Worte
   Perekam : Suzuka
   Wilayah : Jakarta Timur
 Timestamp : 20200117113521
      Hash : MAACLIYJKoZIhvcNAGcCoIICHTCCAhkCAGExDzAN

        OF : OF
      Kode : 
      Nama : Ramuan Anti Gatal
    Satuan : 13000000
    Jumlah : 1.0
     Total : 13000000
    Diskon : 0.0
       DPP : 13000000
       PPN : 1300000.0
     Tarif : 0
     PPnBM : 0.0

The last image, is actually has three “OF” details. But this would be too long to be shown. So I cut the output.

4: Class Parser

Object Oriented Approach

The fact that the code above run well. Does not means our the code is good.

After quick and dirty approach, We need to write proper code.

We shall begin with the simple class.

Source

Let’s get civilized, nomore barbaric code.

gitlab.com/…/csv-md-openpyxl/11-simple.py

Skeleton

I would like to make a master detail faktur class.

class FakturMD2Sheet:
  def __init__(self, sheet):
    # ...

  # some function here
  # ...

  def run(self):
    # ...

def main():
  # class initialization ...

main()

Header

This is just an OOP version of previous step by step starter.

Class Parser

import re

# Master Detail Faktur Exporter Class
class FakturMD:
  header_fk  = \
    '"FK","Kode","Ganti","Faktur","Masa",' +\
    '"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
    '"DPP","PPn","PPnBM","Keterangan","FG",' +\
    '"UM DPP","UM PPn","UM PPnBM","Referensi"'

Class Initialization

We can access using class properties using self keyword.

Class Parser

  def __init__(self, filename):
    # prepare header
    keys = re.split(r',(?=")', self.header_fk)
    self.keys = [key.replace('"', '') for key in keys]

    # parse lines
    with open(filename) as f:
      self.lines = f.readlines()
      f.close()

Run: Display Entries

And call the properties anywhere within the class.

Class Parser

  def run(self):
    count = 0

    for line in self.lines:
      count += 1
      if count==4:
        print(f'line {count}:\n{line}')
        values = re.split(r',(?=")', line)
        values = [value.replace('"', '')
                  for value in values]

        pairs = dict(zip(self.keys, values))

        for key in pairs:
          print(f'{key:>10} : {pairs[key][:40]}')

Main

And finally make a class instance in main function.

Class Parser

def main():
  filename = 'faktur-keluaran.csv'
  md = FakturMD(filename)
  md.run()

main()

Result

Then display the result as below. The result is very similar to previous code.

Class Parser: Output

5: Master Detail Class

Examine Master Detail in Class Fashioned

Consider go further with class, examining all the three headers “FK”, “FAPR”, or “OF”.

Source

Just an OOP version modification of previous code.

gitlab.com/…/csv-md-openpyxl/12-check.py

Skeleton

All definition shown here:

def split_quotes(header):

class FakturMD:
  def __init__(self, filename):
  def init_header_keys(self):
  def print_record(self, keys, values):
  def run(self):

def main():

main()

Helper: Split Quotes

Not everything have to be stored in class. Such as this class helper below:

Class Parser

import re

def split_quotes(header):
  keys = re.split(r',(?=")', header)
  return [key.replace('"', '') for key in keys]

Class Initialization

We can start all in initialization.

Class Parser

# Master Detail Faktur Exporter Class
class FakturMD:
  def __init__(self, filename):
    # prepare header
    self.init_header_keys()

    # parse lines
    with open(filename) as f:
      self.lines = f.readlines()
      f.close()

Header

And put header definition in its own function.

Class Parser

  def init_header_keys(self):
    header_fk  = \
    '"FK","Kode","Ganti","Faktur","Masa",' +\
    '"Tahun","Tanggal","NPWP","Nama","Alamat",' +\
    '"DPP","PPn","PPnBM","Keterangan","FG",' +\
    '"UM DPP","UM PPn","UM PPnBM","Referensi"'

    header_fpr = \
    '"LT","Nama","Alamat","Perekam",' +\
    '"Wilayah","Timestamp","Hash"'

    header_of  = \
    '"OF","Kode","Nama","Satuan","Jumlah",' +\
    '"Total","Diskon","DPP","PPN","Tarif","PPnBM"'

    self.keys_fk  = split_quotes(header_fk)
    self.keys_fpr = split_quotes(header_fpr)
    self.keys_of  = split_quotes(header_of)

This way, we can be more flexible to alter class properties.

Helper: Print Record

We can also put helper method as a part of this FakturMD class.

Class Parser

  def print_record(self, keys, values):
    pairs = dict(zip(keys, values))
    for key in pairs:
      print(f'{key:>10} : {pairs[key][:40]}')
    print()

Run: Display Entries

And call the method anywhere within the class.

Class Parser

  def run(self):
    count = 0

    for line in self.lines:
      count += 1

      # ignore the first three lines
      if (count>3) and (count<10):
        values = split_quotes(line)

        if values[0]=="FK": 
          self.print_record(self.keys_fk, values)

        if values[0]=="FAPR":
          self.print_record(self.keys_fpr, values)

        if values[0]=="OF":
          self.print_record(self.keys_of, values)

Main

And finally make a class instance in main function.

Class Parser

def main():
  filename = 'faktur-keluaran.csv'
  md = FakturMD(filename)
  md.run()

main()

Result

Then display the result as below. The result is very similar to previous code.

Class Parser: Output

What is Next 🤔?

We are not done yet. We haven’t even touch the spreadsheet part.

Consider continue reading [ Python - Excel - CSV - Part Two ].

Python - Excel - CSV - Part One

Preface

Local Issue

1: CSV Case: Faktur Keluaran

Source

Header

Content

Row Content

2: Simple Parsing

Source

Header

Sample Content

Result

3: Three Kind of Headers

Source

Header

Sample Content

Result

4: Class Parser

Source

Skeleton

Header

Class Initialization

Run: Display Entries

Main

Result

5: Master Detail Class

Source

Skeleton

Helper: Split Quotes

Class Initialization

Header

Helper: Print Record

Run: Display Entries

Main

Result

What is Next 🤔?