It’s pretty rare that I publish anything about my work, but given the difficulty I had in figuring out this particular problem with online research (in the end, I only found the answer by experimenting) I thought it would be useful to others if I published the solution.

One of the first problems I always encounter when searching industry standard technologies as they apply to the IBM i platform is that the name of the platform is incredibly hard to include in search terms, so I’ll helpfully mention here that this also applies if your search terms are AS/400, AS400, iSeries, System i, or i5/OS. Heck, I’ll even throw in a gratuitous eServer reference!

With that out of the way, the problem to be solved: using QShell to apply Unix text manipulation commands to a stream file, making use of the power of regular expressions. In my case I had downloaded a CSV file from a public site and needed to take care of some formatting prior to using the CPYFRMIMPF command to load it into a database file. I'd used this method before, but hit an additional snag this time around.

There were three problems to solve with the file, all of which I knew I could attack with sed. If you're not familiar with sed, here's a brief introduction to how I use it.

sed -e 's/find this/replace with this/'

You can probably figure out from the example what it will do. It's a simple search and replace (the initial 's' means 'substitute'). As with most Unix commands, in this default form it will take standard input, filter it, and write the result to standard output. Later on I'll hook those up to the files I need.

The next thing to know for this example is how regular expressions work. That's way too deep a subject for me to cover here. If you need to learn this, I recommend having a read of parts 17 and 18 of Bart Busschots' Taming the Terminal series. I'll be using some basic features plus back references to capture groups. They will be deployed within the sed command.

So to the first problem. The first two fields in each record were actually numeric but provided as quoted values, so CPYFRMIMPF would treat them as strings and complain about the target fields being numeric. Also, there was a requirement to concatenate these two values (ironically, as if they were strings) as a new value at the end of each record. The goal, then, to strip the quotes and to append a new value on the end of the record. Back references to the rescue.

sed -e 's/^"([0-9]{1,3})","([0-9]{1,4})"(.*)$/\1,\2\3,\1\2/'

That looks pretty complex but if you break it down, it's simply looking for two quoted groups of digits. (I specified the lengths but could have opted for lazy matching instead.)

But here's the key reason I'm writing this article – as written above, the command will not work in QShell. It's relatively common to need to escape capture parentheses thus: \(  ... \) and this is indeed required in QShell. But even then it won't work. The trick that took me a good hour of experimenting to discover is that QShell also requires the cardinality braces { } to be escaped! I've not seen this on any other platform. Here's what actually works.

sed -e 's/^"\([0-9]\{1,3\}\)","\([0-9]\{1,4\}\)"\(.*\)$/\1,\2\3,\1\2/'

It looks a complete mess with all the back-slashes required to escape both parentheses and braces. At least the square brackets do not need the same treatment!

With my new snag out of the way (I'd not previously used cardinality), let's move on to the next problem in the file – the dates are presented as "2017-07-15" and we need them as an 8 digit number. So we need to strip the quotes and the hyphens.

sed -e 's/"\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)"/\1\2\3/g'

This is a pretty simple example, but adds the 'g' modifier on the end to enable it to replace all matching dates in the line, not just the first.

Finally, there were a lot of empty fields in the file and because every field was quoted and some of these needed to be treated as numeric, I decided to remove all empty quotes. Which is to say, find any adjacent pair of double-quote characters and remove them. Again, this applies as many times as it can on the line.

sed -e 's/,""/,/g'

That was a really simple one. Once I had these working on the QShell command line, it was time to wrap them up for execution in my CL program, at which point I added the input file on the front by issuing a cat command and piping it into the multi-action sed command, and finally redirecting that output to my result file. Note the doubling up of the single quotes in the sed command to allow for the overall quoting of the CMD( ) parameter string.

STRQSH CMD('cat /tmp/inputfile.csv | +
sed -e ''s/^"\([0-9]\{1,3\}\)","\([0-9]\{1,4\}\)"\(.*\)$/\1,\2\3,\1\2/'' +
-e ''s/"\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)"/\1\2\3/g'' +
-e ''s/,""/,/g'' > /tmp/outputfile.csv') 

And there you have it! Simple!

Comment