The file has a numeric value toward the end of the first field:
" La la la, bla bla bla 123 bla bla ", "ksdjf"
" La , Bla 123 la la blala bla", "ksdjkjf"
I want to check the third word from the end of the first field. If it is numeric, add "," before that word to start a new field. If not, check the fourth word from the end, and if that is numeric, add "," before it. This isolates the numeric word and the text that follows it in field 2. The check needs to work from the end of the field.
Sed? Gawk? Awk? Grep?
A possible solution with awk:
-----------
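awk '
{
    # Get field 1: everything from the opening quote through the closing quote.
    # The pattern also matches the comma that follows, so drop it with RLENGTH-1.
    if (match($0, /^"[^"]*",/) == 0) {
        print $0;    # no quoted first field: print the line unchanged
        next;
    }
    field1 = substr($0, 1, RLENGTH-1);

    # Search field1 for a number that is the 3rd word from the end,
    # then (if that fails) the 4th word from the end.
    if (match(field1, /[0-9]+ +[^ ]+ +[^ ]+ *"$/) == 0) {
        if (match(field1, /[0-9]+ +[^ ]+ +[^ ]+ +[^ ]+ *"$/) == 0) {
            print $0;    # no number in either position: print unchanged
            next;
        }
    }

    # RSTART is where the number begins; insert "," just before it.
    print substr($0, 1, RSTART-1) "\",\"" substr($0, RSTART);
}
' input_file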
-----------
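The script above relies on match() setting two variables: RSTART (the position where the match begins, 0 if there is none) and RLENGTH (the length of the matched text). A minimal illustration, using made-up sample text and a hypothetical helper name (match_demo), could look like this:

```shell
# match_demo is a hypothetical name used only for this illustration.
# It reports where the first run of digits starts and how long it is.
match_demo() {
    printf '%s\n' 'abc 123 def' |
        awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH }'
}
match_demo
```

Here the digits start at character 5 and are 3 characters long, so the output is "5 3". The script uses RSTART the same way to decide where to cut the line.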
If your version of awk supports interval expressions, you can rewrite the two nested if statements that search for the number as:
if (match(field1,/[0-9]+( +[^ ]+){2} *"$/) == 0) {
if (match(field1,/[0-9]+( +[^ ]+){3} *"$/) == 0) {
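If you are not sure whether your awk handles interval expressions, a quick check along these lines may help. It is only a sketch: supports_intervals is a hypothetical name, and it assumes that an awk without interval support treats the braces as literal characters, so the test string does not match.

```shell
# supports_intervals is a hypothetical helper name.
# Prints "yes" if /^a{2}$/ matches "aa" (intervals understood), else "no".
supports_intervals() {
    printf 'aa\n' |
        awk '{ if ($0 ~ /^a{2}$/) print "yes"; else print "no" }'
}
supports_intervals
```

Some older gawk releases only enable interval expressions when started with --re-interval or --posix, so a "no" from gawk may just mean the option is missing.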
With the following input data:
" La la la, bla bla bla 123 bla bla ", "ksdjf"
" La , Bla 123 lala blala bla", "ksdjkjf"
" La , Bla 123 la la blala bla", "ksdjkjf"
The result is (the third line is unchanged because its number is the fifth word from the end):
" La la la, bla bla bla ","123 bla bla ", "ksdjf"
" La , Bla ","123 lala blala bla", "ksdjkjf"
" La , Bla 123 la la blala bla", "ksdjkjf"
(c) www.gotothings.com All material on this site is
Copyright.