Unix - Text Reformatting

I have a text file that looks like this:

03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
I need it to look like this (first line is a header line):
Date,oracle,nis,other,network,memory_management,other_user_root,
03/04/04,0.55,0.43,0.61,0.11,0.10,3.76,
03/04/04,0.00,0.68,0.11,0.07,3.14,0.69,

In other words, each polled application needs its own column. Applications can be added or removed during the month, so the script should insert a zero when an application has no value for a particular sample period.

I can use shell or Perl, but shell is preferred. I wrote a script to do it, but it is verrrry slooow, and I didn't know how to handle an application that isn't present throughout the sample period. Do you guys know of an easy way to do this?

---------->

Your problem would be simpler to solve if there were a line delimiting the periods in your file.
For example:

03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,NEW_PERIOD
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
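
If the data cannot be changed at the source, the markers could also be generated. A minimal sketch, assuming (this is not confirmed by the poster) that each application is reported at most once per sample period, so a repeated application name means a new period has started. `mark_periods` is a hypothetical function name. On the original data this puts the markers before the second and third "other" lines, which gives the same periods as the input used by the awk script further down.

```shell
#!/bin/sh
# Sketch: insert a "date,NEW_PERIOD" line whenever a new period starts.
# Assumption: an application appears at most once per period, so seeing
# an application name again means the previous period has ended.
mark_periods() {
    awk -F, '
    {
        app = $2
        sub(/[ \t]+$/, "", app)          # "oracle " -> "oracle"
        if (NR == 1 || (app in seen)) {  # first line, or a repeated app
            print $1 ",NEW_PERIOD"
            split("", seen)              # portable way to empty the array
        }
        seen[app] = 1
        print                            # pass the data line through as is
    }' "$@"
}
```

For example, `mark_periods data.txt | awk -f pivot.awk`, where data.txt and pivot.awk are hypothetical names for the input file and the awk script.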

---------->

Well, I could probably do that by extracting an additional time field. The date probably has a sample time that would be unique to each sample period; I would just omit that field from the output. I'm checking whether that is an option.

---------->

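#kludge, but fun
# For each date: print a header of the applications seen that day, then two
# rows with the first and last value of each application (0 in the first
# row when an application has only one sample). Because the file is sorted
# first, "first" and "last" follow sort order, not sample order.
Sfile=/tmp/$$
sort -o "$Sfile" "$1"
for d in $(awk -F, '{ print $1 }' "$Sfile" | sort -u)
do
    # unique application names for this date; echo $(...) squeezes the spaces
    f2=$(echo $(grep "$d" "$Sfile" | awk -F, '{ print $2 }' | sort -u))
    echo "Date $(echo $f2 | sed 's/ /,/g'),"
    v1=""
    v2=""
    for f in $f2
    do
        # number of samples for application $f on date $d
        vg1=$(grep -c "$d,.*$f ," "$Sfile")
        if [ "$vg1" -eq 1 ]
        then
            v1="$v1, 0"
        else
            v1="$v1, $(echo $(grep "$d,.*$f ," "$Sfile" | head -1 | awk -F, '{ print $3 }'))"
        fi
        v2="$v2, $(echo $(grep "$d,.*$f ," "$Sfile" | tail -1 | awk -F, '{ print $3 }'))"
    done
    echo "$d $v1,"
    echo "$d $v2,"
done
rm -f "$Sfile"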

---------->

The following awk script assumes that a delimiting line exists for each period (this line starts a new period and is identified by the NEW_PERIOD application).
The input data:

03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,NEW_PERIOD
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,NEW_PERIOD
03/04/04,other , 0.69

The awk script:

#!/usr/bin/awk -f
#
# Initialize
#
BEGIN {
    FS = ",";
    header = "Date";
}

#
# New period,
# Memorize in periods[]
#
$2=="NEW_PERIOD" {
    periods[++period] = $1;
    next;
}

#
# Application,
# Memorize appli in applis[] and value in values[]
#
{

    # Get appli (without trailing spaces) and value
    app = $2;
    val = $3;
    sub(/[[:space:]]*$/,"",app);

    # Get appli id; the first time, assign a new id and memorize it
    if (app in applis)
        app_id = applis[app];
    else {
        app_id = ++appli;
        applis[app] = app_id;
        header = header "," app;
    }

    # Memorize value for application in period
    values[period, app_id] += val;
}

#
# End of data, print result
#
END {
    print header;

    # For each period, display the value of each application
    for (periode_id=1; periode_id<=period; periode_id++) {
        line = periods[periode_id];

        for (app_id=1; app_id<=appli; app_id++) {
            id = periode_id SUBSEP app_id;
            if (id in values)
                line = line "," values[id];
            else
                line = line ",0.00";
        }

        print line;
    }
}

The result:

Date,oracle,nis,other,network,memory_management,other_user_root
03/04/04,0.55,0.43,0.61,0.11,0.1,3.76
03/04/04,0.00,0.00,0.68,0.11,0.07,3.14
03/04/04,0.00,0.00,0.69,0.00,0.00,0.00
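
The result shows 0.1 where the requested layout has 0.10: `+=` turns each value into a number, and awk converts it back to a string with its default "%.6g" format. If the two decimals matter, a small filter could reformat every value column while leaving the header line alone. A sketch; `fmt_values` is a hypothetical name. Alternatively, the value could be wrapped in `sprintf("%.2f", ...)` inside the END block.

```shell
#!/bin/sh
# Sketch: print every value column with two decimals (0.1 -> 0.10).
# The first line is assumed to be the header and is passed through as is.
fmt_values() {
    awk 'BEGIN { FS = OFS = "," }
    NR == 1 { print; next }              # header line: leave untouched
    {
        for (i = 2; i <= NF; i++)        # field 1 is the date
            $i = sprintf("%.2f", $i)
        print
    }' "$@"
}
```

For example, `awk -f pivot.awk data.txt | fmt_values`, with pivot.awk again a hypothetical name for the script above.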

---------->

I thought G-M wanted only the first and last values.
For this file:
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
03/05/04,oracle , 0.55,
03/05/04,nis , 0.43
03/05/04,other , 0.81,
03/05/04,ZOrt , 0.81,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.76,
03/05/04,other , 0.88,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.14,
03/05/04,other , 0.89

My script gives:

Date memory_management,network,nis,oracle,other,other_user_root,
03/04/04 , 0.07, 0.11, 0, 0, 0.61, 3.14,
03/04/04 , 0.10, 0.11, 0.43, 0.55, 0.69, 3.76,
Date network,nis,oracle,other,other_user_root,ZOrt,
03/05/04 , 0.11, 0, 0, 0.81, 3.14, 0,
03/05/04 , 0.11, 0.43, 0.55, 0.89, 3.76, 0.81,
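
The script above starts several grep and awk processes per application and date, which is probably part of why this kind of approach gets slow on a month of data. A single awk pass can collect first and last values instead. A sketch with a hypothetical `first_last` function; it differs from the script above in that it keeps file order (so "first" means the earliest line), prints one header for all dates, writes 0.00 for an application absent on a date, and shows the same value in both rows when an application has only one sample.

```shell
#!/bin/sh
# Sketch: one awk pass. For each date, print two rows: the first and the
# last value of each application in file order, 0.00 if it is absent.
first_last() {
    awk -F, '
    {
        app = $2; sub(/[ \t]+$/, "", app)
        val = $3 + 0                               # " 0.55" -> 0.55
        if (!($1 in dseen)) { dseen[$1] = 1; dates[++nd] = $1 }
        if (!(app in aseen)) { aseen[app] = 1; apps[++na] = app }
        k = $1 SUBSEP app
        if (!(k in first)) first[k] = val          # keep the earliest value
        last[k] = val                              # overwrite: keep the latest
    }
    END {
        hdr = "Date"
        for (j = 1; j <= na; j++) hdr = hdr "," apps[j]
        print hdr
        for (i = 1; i <= nd; i++) {
            d = dates[i]; r1 = d; r2 = d
            for (j = 1; j <= na; j++) {
                k = d SUBSEP apps[j]
                if (k in first) {
                    r1 = r1 "," sprintf("%.2f", first[k])
                    r2 = r2 "," sprintf("%.2f", last[k])
                } else {
                    r1 = r1 ",0.00"; r2 = r2 ",0.00"
                }
            }
            print r1; print r2
        }
    }' "$@"
}
```

Run it as `first_last data.txt` (hypothetical file name); the input file is read only once.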

---------->

First of all... thanks everyone for your kind suggestions.
To make it easier to determine where each new period (the "NEW_PERIOD" boundary) begins, I was able to change the data format.

Here's a sample:

03/15/04,1079308800,oracle , 0.21,
03/15/04,1079308800,other , 0.64,
03/15/04,1079308800,network , 0.10,
03/15/04,1079308800,memory_management , 0.05,
03/15/04,1079308800,other_user_root , 1.51,
03/15/04,1079312400,other , 0.63,
03/15/04,1079312400,network , 0.11,
03/15/04,1079312400,memory_management , 0.05,
03/15/04,1079312400,other_user_root , 1.51,

The second field is the sample time in seconds since 1970 (longtime, i.e. Unix epoch time), so it is unique to each sample period.

I appreciate the sample scripts. I will work through them to learn a little more about sorting using these methods.
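
With the sample time in field 2, no NEW_PERIOD marker is needed: the timestamp itself can serve as the period key. A sketch, assuming the fields are date, timestamp, application, value as in the sample above and that the lines are already in time order; `pivot_by_time` is a hypothetical name. Field 2 is used only for grouping and is left out of the output, and missing applications get 0.00.

```shell
#!/bin/sh
# Sketch: pivot "date,epoch,application,value" lines into one row per
# sample period. Field 2 (epoch seconds) is the period key and is not printed.
pivot_by_time() {
    awk -F, '
    {
        t = $2
        app = $3; sub(/[ \t]+$/, "", app)
        if (!(t in pdate)) { pdate[t] = $1; order[++np] = t }
        if (!(app in aseen)) { aseen[app] = 1; apps[++na] = app }
        val[t SUBSEP app] += $4          # summed if an app repeats in a period
    }
    END {
        hdr = "Date"
        for (j = 1; j <= na; j++) hdr = hdr "," apps[j]
        print hdr
        for (i = 1; i <= np; i++) {
            t = order[i]; row = pdate[t]
            for (j = 1; j <= na; j++) {
                k = t SUBSEP apps[j]
                row = row "," ((k in val) ? sprintf("%.2f", val[k]) : "0.00")
            }
            print row
        }
    }' "$@"
}
```

For the sample above this prints:

Date,oracle,other,network,memory_management,other_user_root
03/15/04,0.21,0.64,0.10,0.05,1.51
03/15/04,0.00,0.63,0.11,0.05,1.51

If a file is not in time order, `sort -t, -k2,2n data.txt | pivot_by_time` would order the periods first (data.txt is a hypothetical name; the column order then follows the sorted lines).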

(c) www.gotothings.com All material on this site is Copyright.