Preface
Goal: Use GNU sed for string processing.
This is the hardcore part of the shell article series. Regular expressions are scary for beginners because of their inhuman syntax. But after a while, regexp starts to look like common sense.
Reference Reading
You should read the official documentation first, before you read this article.
You have been warned.
Don't say I didn't tell you.
Source Examples
You can obtain source examples here:
Common Use Case
Task: Get the unique tag string
Please read the overview for more detail.
Data Structure Support
We are going to use an external text file, consisting of CSV-like fields.
It is very similar to the data in the previous awk example.
Prepopulated Data
Songs and Poetry
The data is simply a text file.
Cantaloupe Island; 60s,jazz
Let It Be;60s,rock
Knockin' on Heaven's Door; 70s, rock
Emotion; 70s, pop
The River
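To follow along, the sample data can be written to a file with a here-document. This is only a minimal sketch: the file name my-songs.txt is taken from the commands used later in this article, and the spacing around the delimiters is assumed to match the listing above.

```shell
#!/bin/sh
# Write the sample records to my-songs.txt in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > my-songs.txt <<'EOF'
Cantaloupe Island; 60s,jazz
Let It Be;60s,rock
Knockin' on Heaven's Door; 70s, rock
Emotion; 70s, pop
The River
EOF
```

Note that the last record has no semicolon at all, so it has no tags.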
SED Solution
The Answer
It looks like hieroglyphs.
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d; s/^(.*);(.*)$/\2/
' "$@" |
exec sed -E '
:label;N;$!b label
s/(,|  *|\n ?)/ /g;s/^ //g;s/ /\n/g
' | sort -u |
exec sed -E '
:label;N;$!b label
s/\n/:/g
'
Enough with the introduction. At this point we should go straight to coding.
Environment
No special setup is needed. Just run and voila!
1: Field in Sed
We are going to check how far sed can handle data structures.
Simple Array
Consider beginning with a simple line containing the fields below.
rock, jazz,rock, pop, pop
Then we can process the text file with sed directly, using the code below:
#!/bin/sh
exec sed '
s/,/ /g
# two spaces before the star: one space, then zero or more
s/  */ /g
' "$@"
- With output result as below:
❯ ./01-tags.sh my-tags.txt
rock jazz rock pop pop
The sed script above replaces each comma with a space, then squeezes every run of spaces into a single space.
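To see what each substitution contributes, the two commands can be run one at a time. A small sketch, using the same line as inline input instead of a file; the variable names are only for illustration.

```shell
#!/bin/sh
line='rock, jazz,rock, pop, pop'

# Step 1: every comma becomes a space, so ", " turns into two spaces.
step1=$(printf '%s\n' "$line" | sed 's/,/ /g')

# Step 2: "  *" is one space followed by zero or more spaces,
# so each run of spaces is squeezed into a single space.
step2=$(printf '%s\n' "$step1" | sed 's/  */ /g')

# Brackets make the doubled spaces visible.
printf '[%s]\n' "$step1" "$step2"
```

The first bracketed line still contains doubled spaces where a comma was followed by a space; the second one is the clean result. Note that the pattern needs two spaces before the star: with a single space, the expression would also match the empty string between any two characters.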
Debug
If you would like to know what happens under the hood,
you can use the --debug option, available in GNU sed 4.6 and later.
#!/bin/sh
exec sed --debug '
s/,/ /g
s/  */ /g
' "$@"
- With output result as below:
❯ ./01-debug.sh my-tags.txt
SED PROGRAM:
s/,/ /g
s/  */ /g
INPUT: 'my-tags.txt' line 1
PATTERN: rock, jazz,rock, pop, pop
COMMAND: s/,/ /g
MATCHED REGEX REGISTERS
regex[0] = 4-5 ','
PATTERN: rock jazz rock pop pop
COMMAND: s/  */ /g
MATCHED REGEX REGISTERS
regex[0] = 4-7 ' '
PATTERN: rock jazz rock pop pop
END-OF-CYCLE:
rock jazz rock pop pop
Record
Now consider this data, which uses a semicolon as the field delimiter.
Cantaloupe Island; 60s,jazz
Let It Be;60s,rock
Knockin' on Heaven's Door; 70s, rock
Emotion; 70s, pop
The River
We can remove unnecessary lines, such as records without tags, by deleting every line that does not match the pattern:
/^(.*);(.*)$/!d
And then keep only the second field, the part after the semicolon:
s/^(.*);(.*)$/\2/
Then process it further, as in the complete script below:
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
s/,/ /g
s/  */ /g
s/^ //g
' "$@"
With output result as below:
❯ ./02-extract-a.sh my-songs.txt
60s jazz
60s rock
70s rock
70s pop
Alternatively, we can have shorter code by just replacing the first field with an empty string.
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/(.*;)//
s/,/ /g
s/  */ /g
s/^ //g
' "$@"
With output result exactly the same as above.
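Another way to express the same filter, not taken from the original scripts, is to turn the logic around: with -n, sed prints nothing by default, so only the lines that contain a semicolon and reach the p command are shown. In this sketch the function name extract_tags is hypothetical, and " +" is the extended-regex spelling of "one or more spaces".

```shell
#!/bin/sh
# extract_tags: hypothetical helper name, not part of the original scripts.
# Reads the files given as arguments, or standard input when there are none.
extract_tags() {
  sed -n -E '
/;/ {
# drop everything up to and including the last semicolon
s/^.*;//
# commas to spaces, squeeze runs of spaces, trim the leading space
s/,/ /g
s/ +/ /g
s/^ //
p
}
' "$@"
}

# Demo with the sample records as inline input.
printf '%s\n' \
  'Cantaloupe Island; 60s,jazz' \
  'Let It Be;60s,rock' \
  "Knockin' on Heaven's Door; 70s, rock" \
  'Emotion; 70s, pop' \
  'The River' | extract_tags
```

The record without a semicolon never enters the block, so it is silently skipped instead of being deleted explicitly.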
2: Join Lines with Label
How does it work?
Simple Example
To examine how a sed label works, start with a simple example:
one
two
three
four
five
And process it with the script below:
#!/bin/sh
exec sed -E '
:label
s/three//g
N
$!b label
s/\n/:/g
' "$@"
- With output result as below:
❯ ./03-join.sh my-lines.txt
one:two::four:five
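To watch the loop at work, the l command can print the pattern space in an unambiguous form after each N. A small sketch, using a shortened three-line input; trace_join is a hypothetical helper name, and -n suppresses the normal output so that only the trace is shown.

```shell
#!/bin/sh
# trace_join: hypothetical helper that shows the pattern space after each N.
trace_join() {
  sed -n '
:label
N
l
$!b label
' "$@"
}

printf '%s\n' one two three | trace_join
```

This prints `one\ntwo$` and then `one\ntwo\nthree$`. Each `\n` is a newline embedded in the pattern space, and the trailing `$` marks the end of the pattern space. The pattern space keeps growing until the last line has been read, which is why a single s/\n/:/g at the end can join everything.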
Applying to Data
We can pipe the output of the previous sed script into a second one to flatten our data.
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
s/,/ /g
s/  */ /g
s/^ //g
' "$@" |
exec sed -E '
:label
N
$!b label
s/\n/ /g
s/ /:/g
'
- With output result as below:
❯ ./04-flatten.sh my-songs.txt
60s:jazz:60s:rock:70s:rock:70s:pop
Looks like magic?
The Joining Code
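The strangeness lies in the joining loop. N appends the next input line to the pattern space, and $!b label jumps back to :label unless the current line is the last one. Once the whole input sits in the pattern space, the substitutions run on it a single time.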
exec sed -E '
:label
N
$!b label
s/\n/ /g
s/ /:/g
'
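The loop can also be written so that N is never called on the last line. This is a common variant and not the author's script: most sed implementations exit without printing when N finds no next line, while GNU sed prints the pattern space first, so guarding N with $! keeps a one-line input safe either way. The function name join_lines is hypothetical.

```shell
#!/bin/sh
# join_lines: hypothetical helper; joins all input lines with a colon.
join_lines() {
  sed '
:a
# while this is not the last line: append the next one and loop
$!{
N
ba
}
# the whole input is now in the pattern space
s/\n/:/g
' "$@"
}

printf '%s\n' one two three | join_lines
printf '%s\n' single | join_lines
```

The first call prints `one:two:three`, the second prints `single` unchanged.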
3: Finishing The Task
Unique
There is a pure sed unique script in the official documentation.
But I prefer the simple approach instead:
using the sort -u command, then redisplaying the result with a delimiter.
This contains three parts:
- Extract field
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d
s/^(.*);(.*)$/\2/
' "$@" |
- Flatten, and output with newlines, then unique
exec sed -E '
:label
N
$!b label
s/(,|  *|\n ?)/ /g
s/^ //g
s/ /\n/g
' | sort -u |
- Redisplay with delimiter separator.
exec sed -E '
:label
N
$!b label
s/\n/:/g
'
- With output result as below:
❯ ./05-unique.sh my-songs.txt
:60s:70s:jazz:pop:rock
This is not a perfect result, but it is enough for me.
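The leading colon in the output above comes from records such as "70s, rock": the comma and the following space are replaced separately, which leaves two spaces, and the second one becomes an empty line that sort -u places first. One possible fix, sketched here as an assumption and not as the author's method, is to replace a whole run of commas, spaces, and newlines at once with the + quantifier. The function name unique_tags is hypothetical.

```shell
#!/bin/sh
# unique_tags: hypothetical helper name, not part of the original scripts.
unique_tags() {
  # 1. keep only records with a semicolon, then keep the tag field
  sed -E '
/;/!d
s/^.*;//
' "$@" |
  # 2. join all lines, turn each run of separators into one space,
  #    trim both ends, then put one tag per line
  sed -E '
:label
N
$!b label
s/(,| |\n)+/ /g
s/^ //
s/ $//
s/ /\n/g
' | sort -u |
  # 3. join the sorted tags with a colon
  sed -E '
:label
N
$!b label
s/\n/:/g
'
}

printf '%s\n' \
  'Cantaloupe Island; 60s,jazz' \
  'Let It Be;60s,rock' \
  "Knockin' on Heaven's Door; 70s, rock" \
  'Emotion; 70s, pop' \
  'The River' | unique_tags
```

With the sample data this prints `60s:70s:jazz:pop:rock`, without the empty first field.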
If you want to make the script more cryptic, you can combine a few lines into one-liners.
#!/bin/sh
exec sed -E '
/^(.*);(.*)$/!d; s/^(.*);(.*)$/\2/
' "$@" |
exec sed -E '
:label;N;$!b label
s/(,|  *|\n ?)/ /g;s/^ //g;s/ /\n/g
' | sort -u |
exec sed -E '
:label;N;$!b label
s/\n/:/g
'
- With the same result as above.
Happy Now?
What is Next 🤔?
Consider continuing with [ Perl - Playing with Records - Part One ].