Format a unix file on Teradata

I have a requirement to format a file, and I am noticing a bug and an inefficiency in our current approach. Can somebody give some suggestions?

Input data (column headers and rows)
-------------

current_licl_nbr| policy_id | plociyhold_id|mail_allowed_id|email_address_txt
--------------------
701000002990.| 200000000175.| 200000000175.| 2|xyz@abcd.ATT.NET
 

The output data should look like this:

701000002990|200000000175|200000000175|2|XYZ@ABCD.COM

The current commands we are using have a bug and are very inefficient.
They generate the output below, where there is no dot between ABCD and COM.

Current output generated
------------------------
701000002990|200000000175|200000000175|2|XYZ@ABCDCOM

The code we have in the script is:

csplit -ks -f ${DWH_OUT}/other/a1prefix ${DWH_OUT}/other/a1_xxxx.tmp 3
cat ${DWH_OUT}/other/a1prefix01|sed -e 's/ //g' -e 's/\.//g' >${DWH_OUT}/other/a1_xxxx.tmp

The file to be formatted has around 6,000,000 rows.
Does anybody have a suggestion for fixing the bug efficiently?
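The missing dot comes from the second sed expression, 's/\.//g', which deletes every dot on the line, including the ones inside the e-mail address. Only the dot directly in front of a '|' separator should go; in sed terms that would mean replacing -e 's/\.//g' with -e 's/\.|/|/g'. The Perl sketch below contrasts the two behaviours on the sample row. The sub names buggy_format and fixed_format are made up for illustration, and the sketch assumes that the numeric fields end in a single trailing dot and that no other field has a dot right before a '|'. The upper-casing and the ABCD.COM domain in the desired output are separate steps that the quoted script does not show.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mimics the current sed pipeline: drop every space, then every dot.
sub buggy_format {
    my ($line) = @_;
    $line =~ s/ //g;
    $line =~ s/\.//g;      # also removes the dots inside the e-mail address
    return $line;
}

# Fixed version: drop every space, then only a dot that sits right before '|'.
sub fixed_format {
    my ($line) = @_;
    $line =~ s/ //g;
    $line =~ s/\.\|/|/g;   # "701000002990.|" becomes "701000002990|"
    return $line;
}

# Sample row from the question ('@' escaped so Perl does not interpolate it).
my $sample = "701000002990.| 200000000175.| 200000000175.| 2|xyz\@abcd.ATT.NET";
print buggy_format($sample), "\n";   # dots in the address are lost
print fixed_format($sample), "\n";   # dots in the address are kept
```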
 

Have you thought about using Perl?
I noticed that your desired output has 2 additional changes that your code doesn't show:
1) the line is changed to uppercase
2) @abcd.ATT.NET is changed to @ABCD.COM

Here are 2 variations of 1 Perl solution (with Perl, "there is always more than one way to do it"). They will not make the changes I noted above, but those could easily be added.

From the command line:

perl -pe "s/\.\|\s*/|/g" input.txt > output.txt

or

perl -pi -e "s/\.\|\s*/|/g" input.txt

The second one edits the original file in place.
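As an illustration of how the two extra changes could be added, here is a minimal sketch that wraps the same substitution in a sub. It assumes every line should be upper-cased in full. The domain rewrite (turning a trailing .ATT.NET into .COM) is a HYPOTHETICAL rule chosen only to reproduce the sample, since the question does not say how ABCD.ATT.NET becomes ABCD.COM; format_line is likewise an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub format_line {
    my ($line) = @_;
    $line =~ s/\.\|\s*/|/g;        # same substitution as the one-liners
    $line = uc $line;              # change 1: upper-case the whole line
    # change 2: HYPOTHETICAL rule, not stated in the question.
    # Rewrites a trailing ".ATT.NET" suffix to ".COM"
    # ('$' also matches just before the line's trailing newline).
    $line =~ s/\.ATT\.NET$/.COM/;
    return $line;
}

# Filter: read rows on STDIN, write reformatted rows to STDOUT.
while (defined(my $line = <STDIN>)) {
    print format_line($line);
}
```

Saved as, say, format_line.pl (a hypothetical file name), it would be run as perl format_line.pl < input.txt > output.txt.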

I ran a benchmark on a 6,000,000-line file, and it took between 120 and 130 seconds to complete on a slow Windows PII 550 machine.
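The one-liners process every line, so they would also pass through the header rows that the original script removed with csplit ... 3 (which keeps everything from line 3 onward). A minimal sketch that drops those lines in the same single pass, and so makes the separate csplit and cat steps unnecessary, assuming, as that csplit call suggests, that the file always starts with exactly two header lines. $HEADER_LINES and reformat_stream are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: two header lines (column names plus a dashed underline),
# matching what "csplit ... 3" discards in the original script.
my $HEADER_LINES = 2;

# Copy $in to $out, skipping the header and fixing the trailing dots.
sub reformat_stream {
    my ($in, $out) = @_;
    my $n = 0;
    while (defined(my $line = <$in>)) {
        next if ++$n <= $HEADER_LINES;   # skip header lines
        $line =~ s/\.\|\s*/|/g;          # same substitution as the one-liners
        print {$out} $line;
    }
}

reformat_stream(\*STDIN, \*STDOUT);
```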

----------------------------------

For your reference, here are the complete scripts that I used for testing and benchmarking.

Script to create the source file:
#!/usr/bin/perl
use strict;
use warnings;

open my $out, '>', 'braveking.txt' or die $!;

# Write the sample row 6,000,000 times.
for (1..6000000) {
    print {$out} "701000002990.| 200000000175.| 200000000175.| 2|xyz\@abcd.ATT.NET\n";
}
close $out or die $!;
 

Benchmark script:
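#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes 'time';

# Run the same reformat 50 times and report each run's wall-clock time.
for my $i (1..50) {
    open my $in,  '<', 'braveking.txt' or die $!;
    open my $out, '>', 'reformat.txt'  or die $!;
    my $start = time;
    while (<$in>) {
        s/\.\|\s*/|/g;
        print {$out} $_;
    }
    close $out or die $!;    # include the final flush in the timing
    my $delta = time - $start;
    printf "Loop %d took %.2f seconds\n", $i, $delta;
}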


(c) www.gotothings.com All material on this site is Copyright.