Article Series

This article series discusses more than 30 different programming languages. Please read the overview before you read any of the details.

Preface

Goal: Utilizing GNU Sed for string processing.

This is the hardcore part of the shell article series. Regular expressions are scary for beginners because of their inhuman syntax. But after a while, regexps start to look like common sense.

Reference Reading

You need to read the official documentation first, before you read this article.

I have warned you.

I won’t say I’ve told you.

Source Examples

You can obtain source examples here:


Common Use Case

Task: Get the unique tag string

Please read the overview for more detail.

Data Structure Support

We are going to use an external text file, consisting of CSV-like fields.

It is very similar to the data in the previous awk example.

Prepopulated Data

Songs and Poetry

The data is simply a text file.

Cantaloupe Island; 60s,jazz
Let It Be;60s,rock
Knockin' on Heaven's Door;  70s,  rock
Emotion; 70s, pop
The River

SED Solution

The Answer

It looks like hieroglyphs.

#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d; s/^(.*);(.*)$/\2/
' "$@" | 
exec sed -E '
:label;N;$!b label
s/(,|  *|\n ?)/ /g;s/^ //g;s/ /\n/g
' | sort -u | 
exec sed -E '
:label;N;$!b label
s/\n/:/g
'

Enough with the introduction. At this point we should go straight to coding.

Environment

No special setup is needed. Just run it and voila..!


1: Field in Sed

We are going to check how far sed can handle data structures.

Simple Array

Consider beginning with a simple line containing the fields below.

rock,  jazz,rock, pop, pop

Sed: A Simple Array Data in Text File

Then we can process the text file with sed directly, using the code below:

#!/bin/sh
exec sed '
s/,/ /g
s/  */ /g
' "$@"
  • With output result as below:
❯ ./01-tags.sh my-tags.txt
rock jazz rock pop pop

Sed: Simple Tags

The sed script above replaces every comma with a space, then squeezes each run of spaces into a single space.
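As a small aside, the two substitutions can also be folded into one. This is only a sketch: it assumes a sed that accepts -E (GNU and BSD sed both do), and the function name normalize_tags is made up for illustration.

```shell
#!/bin/sh
# normalize_tags is a hypothetical helper name, not part of the original scripts.
normalize_tags() {
  # One extended regex: any run of commas and/or spaces becomes one space.
  sed -E 's/[, ]+/ /g' "$@"
}

# Feed the sample line through the helper.
printf 'rock,  jazz,rock, pop, pop\n' | normalize_tags
```

For the sample line this prints rock jazz rock pop pop, the same as the two-step version.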

Debug

If you would like to know what happens under the hood, you can use the --debug option of GNU sed.

#!/bin/sh
exec sed --debug '
s/,/ /g
s/  */ /g
' "$@"
  • With output result as below:
❯ ./01-debug.sh my-tags.txt
SED PROGRAM:
  s/,/ /g
  s/  */ /g
INPUT:   'my-tags.txt' line 1
PATTERN: rock,  jazz,rock, pop, pop
COMMAND: s/,/ /g
MATCHED REGEX REGISTERS
  regex[0] = 4-5 ','
PATTERN: rock   jazz rock  pop  pop
COMMAND: s/  */ /g
MATCHED REGEX REGISTERS
  regex[0] = 4-7 '   '
PATTERN: rock jazz rock pop pop
END-OF-CYCLE:
rock jazz rock pop pop

Sed: Debug

Record

Now consider this data, with a semicolon as the field delimiter.

Cantaloupe Island; 60s,jazz
Let It Be;60s,rock
Knockin' on Heaven's Door;  70s,  rock
Emotion; 70s, pop
The River

Sed: The Songs Record in CSV like Text File

We can delete unnecessary lines with a pattern, such as records without tags:

/^(.*);(.*)$/!d

And then keep only the second field, the part after the semicolon (;):

s/^(.*);(.*)$/\2/

Then process the result further, as in the complete script below:

#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
s/,/ /g
s/  */ /g
s/^ //g
' "$@"

With output result as below:

❯ ./02-extract-a.sh my-songs.txt
60s jazz
60s rock
70s rock
70s pop

Sed: Output associative array

Alternatively, we can shorten the code by replacing the first field with an empty string.

#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/(.*;)//
s/,/ /g
s/  */ /g
s/^ //g
' "$@"

With output result exactly the same as above.

Sed: Output associative array
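Another possible variant, sketched here rather than taken from the original scripts, turns off automatic printing with -n and handles only the lines that contain a semicolon inside a { } block. The function name extract_tags is hypothetical, and the sample records are fed through printf instead of my-songs.txt.

```shell
#!/bin/sh
# extract_tags is a hypothetical helper name, not part of the original scripts.
extract_tags() {
  sed -n -E '
    # Only lines that contain a semicolon enter the block.
    /;/{
      # Drop everything up to the last semicolon.
      s/^.*;//
      s/,/ /g
      s/  */ /g
      s/^ //
      # Automatic printing is off, so print explicitly.
      p
    }
  ' "$@"
}

# Same records as the sample data file.
printf '%s\n' \
  "Cantaloupe Island; 60s,jazz" \
  "Let It Be;60s,rock" \
  "Knockin' on Heaven's Door;  70s,  rock" \
  "Emotion; 70s, pop" \
  "The River" | extract_tags
```

With this layout the record without tags is never printed, so no separate delete command is needed.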


2: Join Lines with Label

How does it work?

Simple Example

To examine how a sed label works, start from a simple example:

one
two
three
four
five

Sed: Text File Containing Multiple Lines

And process with script below:

#!/bin/sh
exec sed -E '
:label
s/three//g
N
$!b label

s/\n/:/g
' "$@"
  • With output result as below:
❯ ./03-join.sh my-lines.txt
one:two::four:five

Sed: Simple Label

Applying to Data

We can pipe the output of the previous sed script into a second sed to flatten our data.

#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
s/,/ /g
s/  */ /g
s/^ //g
' "$@" | 
exec sed -E '
:label
N
$!b label
s/\n/ /g
s/ /:/g
'
  • With output result as below:
❯ ./04-flatten.sh my-songs.txt
60s:jazz:60s:rock:70s:rock:70s:pop

Sed: Flatten Array

Looks like magic?

The Joining Code

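The strange part lies in the joining loop. N appends the next input line to the pattern space, and $!b label jumps back to :label on every line except the last one. Once the last line has been read, the pattern space holds the whole input, and the substitutions run on it in one go.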

exec sed -E '
:label
N
$!b label
s/\n/ /g
s/ /:/g
'
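A slightly different way to write the loop, offered only as a sketch, is to guard N with $! so that it never runs on the last line. As far as I know, when N finds no next line GNU sed prints the pattern space and stops, so with the loop above a one-line input would skip the final substitutions. The helper name flatten_tags is hypothetical.

```shell
#!/bin/sh
# flatten_tags is a hypothetical helper name used only for this sketch.
flatten_tags() {
  sed -E '
    :label
    # Not on the last line yet: append the next line, then loop.
    $!{
      N
      b label
    }
    # Reached once, with the whole input in the pattern space.
    s/\n/ /g
    s/ /:/g
  ' "$@"
}

# Two-line input, then one-line input.
printf '60s jazz\n60s rock\n' | flatten_tags
printf '60s jazz\n' | flatten_tags
```

The first call prints 60s:jazz:60s:rock and the second prints 60s:jazz, so the single-line case is also flattened.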

3: Finishing The Task

Unique

There is a pure sed script for unique lines in the official documentation. But I prefer a simpler approach: use the sort -u command, then redisplay the result with a delimiter.

This contains three parts:

  1. Extract field
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
' "$@" | 
  2. Flatten, output one tag per line, then keep only unique lines
exec sed -E '
:label
N
$!b label

s/(,|  *|\n ?)/ /g
s/^ //g

s/ /\n/g
' | sort -u | 
  3. Redisplay with a delimiter.
exec sed -E '
:label
N
$!b label

s/\n/:/g
'
  • With output result as below:
❯ ./05-unique.sh my-songs.txt
:60s:70s:jazz:pop:rock

Sed: Unique Array

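This is not a perfect result: the leading colon comes from an empty line. Adjacent separators, such as a comma followed by spaces, each turn into their own space, so some tags end up two spaces apart and the split produces empty lines, which sort -u keeps once and places first. Still, it is enough for me.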
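One way to avoid the leading colon, sketched under the assumption of GNU sed (for \n in both the regex and the replacement), is to collapse each whole run of commas, spaces, and newlines into a single space, so that no empty line ever reaches sort -u. The function name unique_tags is hypothetical, and this is a variation on the script above rather than the original approach.

```shell
#!/bin/sh
# unique_tags is a hypothetical helper name; assumes GNU sed.
unique_tags() {
  # Part 1: keep only the text after the last semicolon.
  sed -n -E 's/^.*;//p' "$@" |
  # Part 2: join all lines, then split into one tag per line.
  sed -E '
    :label
    $!{
      N
      b label
    }
    # Collapse every run of commas, spaces and newlines into one space.
    s/(,| |\n)+/ /g
    # Trim both edges so that no empty token is produced.
    s/^ //
    s/ $//
    s/ /\n/g
  ' | sort -u |
  # Part 3: join the sorted unique tags with a colon.
  sed -E '
    :label
    $!{
      N
      b label
    }
    s/\n/:/g
  '
}

# Same records as the sample data file.
printf '%s\n' \
  "Cantaloupe Island; 60s,jazz" \
  "Let It Be;60s,rock" \
  "Knockin' on Heaven's Door;  70s,  rock" \
  "Emotion; 70s, pop" \
  "The River" | unique_tags
```

For the sample records this prints 60s:70s:jazz:pop:rock, without the leading colon.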

If you want to make it more cryptic, you can combine a few lines into one-liners.

#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d; s/^(.*);(.*)$/\2/
' "$@" | 
exec sed -E '
:label;N;$!b label
s/(,|  *|\n ?)/ /g;s/^ //g;s/ /\n/g
' | sort -u | 
exec sed -E '
:label;N;$!b label
s/\n/:/g
'
  • With the same result as above.

Happy Now?


What is Next 🤔?

Consider continuing with [ Perl - Playing with Records - Part One ].