I have a text file that looks like this:
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
I need it to look like this (first line is a header line):
Date,oracle,nis,other,network,memory_management,other_user_root,
03/04/04,0.55,0.43,0.61,0.11,0.10,3.76,
03/04/04,0.00,0.68,0.11,0.07,3.14,0.69,
In plain English: every application polled needs to be in its own column. Applications can be added or removed during the month, so a zero should be placed wherever there is no value for that particular sample period.
I can use shell or Perl, but shell is preferred. I wrote a script to do it, but it is very slow, and I didn't know how to handle an application that isn't consistently present throughout the sample period. Do you guys know of an easy way to do this?
---------->
Your problem would be simpler to solve if there were a line delimiting
the periods in your file.
For example:
03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,NEW_PERIOD
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
---------->
Well, I could probably accomplish that by extracting an additional time field. For example, each record probably has a sample time that would be unique to each sample period. I would just omit that field on the output. I'm checking to see if that is an option.
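If such a time field does become available, the delimiter lines suggested above could probably be generated rather than written by hand. Here is a minimal sketch, assuming a hypothetical layout of date,sample_time,application,value (the function name add_period_markers is made up for illustration). It prints a NEW_PERIOD line each time the date and sample time change, then drops the time field:

```shell
#!/bin/sh
# Hypothetical helper: assumes field 2 is a sample time shared by all
# lines of one sample period (an assumption, not a confirmed format).
add_period_markers() {
    awk -F, '
        {
            key = $1 FS $2                 # date + sample time identify a period
            if (key != last) {             # period changed: emit a delimiter line
                print $1 ",NEW_PERIOD"
                last = key
            }
            # Rebuild the line without the sample time (field 2)
            line = $1
            for (i = 3; i <= NF; i++) line = line "," $i
            print line
        }
    ' "$@"
}
```

Called as add_period_markers somefile, it should write the delimited format to standard output, ready to be piped into a script that expects NEW_PERIOD lines. It relies on the lines of one period being adjacent in the input.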
---------->
# kludge, but fun
Sfile=/tmp/$$
sort -o $Sfile $1
for d in $(awk -F, '{ print $1 }' $Sfile|sort -u)
do
    f2=$(echo $(grep $d $Sfile|awk -F, '{ print $2 }' |sort -u))
    echo "Date $(echo $f2|sed 's/ /,/g'),"
    v1=""
    v2=""
    for f in $f2
    do
        vg1=$(grep -c "$d,.*$f ," $Sfile)
        if [ $vg1 = 1 ]
        then
            v1="$v1, 0"
        else
            v1="$v1, $(echo $(grep "$d,.*$f ," $Sfile|head -1|awk -F, '{ print $3 }'))"
        fi
        v2="$v2, $(echo $(grep "$d,.*$f ," $Sfile|tail -1|awk -F, '{ print $3 }'))"
    done
    echo "$d $v1,"
    echo "$d $v2,"
done
rm $Sfile
---------->
The following awk script assumes that a delimiting line exists for each
period (this line starts a new period and is identified by the NEW_PERIOD
application).
The input data:
03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,NEW_PERIOD
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,NEW_PERIOD
03/04/04,other , 0.69
The awk script:
#!/usr/bin/awk -f
#
# Initialize
#
BEGIN {
   FS = ",";
   header = "Date";
}
#
# New period:
# record its date in periods[]
#
$2=="NEW_PERIOD" {
   periods[++period] = $1;
   next;
}
#
# Application:
# record the application in applis[] and its value in values[]
#
{
   # Get the application name (without trailing spaces) and the value
   app = $2;
   val = $3;
   sub(/[[:space:]]*$/,"",app);
   # Look up the application id; on first sight, assign one and record it
   if (app in applis)
      app_id = applis[app];
   else {
      app_id = ++appli;
      applis[app] = app_id;
      header = header "," app;
   }
   # Accumulate the value for this application in the current period
   values[period, app_id ] += val;
}
#
# End of data: print the result
#
END {
   print header;
   # For each period, print the value of each application
   for (periode_id=1; periode_id<=period; periode_id++) {
      line = periods[periode_id];
      for (app_id=1; app_id<=appli; app_id++) {
          id = periode_id SUBSEP app_id;
          if (id in values)
             line = line "," values[id];
          else
             line = line ",0.00";
      }
      print line;
   }
}
The result:
Date,oracle,nis,other,network,memory_management,other_user_root
03/04/04,0.55,0.43,0.61,0.11,0.1,3.76
03/04/04,0.00,0.00,0.68,0.11,0.07,3.14
03/04/04,0.00,0.00,0.69,0.00,0.00,0.00
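Note that 0.10 in the input comes out as 0.1 in the result: the += turns the field into a number, and awk then prints it with its default number format. If two decimals are wanted everywhere, sprintf with "%.2f" should do it. A small sketch to show the difference (show_formats is just an illustrative name); for each input line it prints the value in the default format and then in the fixed format:

```shell
#!/bin/sh
# Illustration only: compare the default numeric output of awk with "%.2f".
show_formats() {
    awk -F, '{
        val = $3 + 0                          # numeric conversion, like += in the script
        print val "," sprintf("%.2f", val)    # default format, then two decimals
    }' "$@"
}
```

In the END block of the script above, the same idea would mean building the line with sprintf("%.2f", values[id]) instead of values[id].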
---------->
I thought G-M wanted only the 1st and last.
For file:
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
03/05/04,oracle , 0.55,
03/05/04,nis , 0.43
03/05/04,other , 0.81,
03/05/04,ZOrt , 0.81,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.76,
03/05/04,other , 0.88,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.14,
03/05/04,other , 0.89
my script gives:
Date memory_management,network,nis,oracle,other,other_user_root,
03/04/04 , 0.07, 0.11, 0, 0, 0.61, 3.14,
03/04/04 , 0.10, 0.11, 0.43, 0.55, 0.69, 3.76,
Date network,nis,oracle,other,other_user_root,ZOrt,
03/05/04 , 0.11, 0, 0, 0.81, 3.14, 0,
03/05/04 , 0.11, 0.43, 0.55, 0.89, 3.76, 0.81,
---------->
First of all, thanks everyone for your kind suggestions.
To help identify each "NEW_PERIOD" iteration, I was able to change
the data format so that it can be determined more easily.
Here's a sample:
03/15/04,1079308800,oracle , 0.21,
03/15/04,1079308800,other , 0.64,
03/15/04,1079308800,network , 0.10,
03/15/04,1079308800,memory_management , 0.05,
03/15/04,1079308800,other_user_root , 1.51,
03/15/04,1079312400,other , 0.63,
03/15/04,1079312400,network , 0.11,
03/15/04,1079312400,memory_management , 0.05,
03/15/04,1079312400,other_user_root , 1.51,
The second field is the sample time in seconds since 1970 (Unix epoch time).
I appreciate the sample scripts. I will work through them to learn a little more about sorting using these methods.
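With the sample time in the second field, the NEW_PERIOD lines may not be needed at all. Below is a rough adaptation of the awk approach from this thread that uses date plus sample time as the period key and leaves the time out of the output. It is a sketch under the assumption that every line looks like the sample above (date,seconds,application,value); the function name pivot_by_period is made up, and it has only been reasoned through against that sample, not run on a full month of data:

```shell
#!/bin/sh
# Sketch: pivot "date,epoch,app , value," lines into one row per sample period.
pivot_by_period() {
    awk -F, '
        {
            app = $3
            gsub(/^[ \t]+|[ \t]+$/, "", app)   # trim blanks around the name
            key = $1 FS $2                     # date + epoch identify a period
            if (!(key in seen)) {              # first line of a new period
                seen[key] = 1
                dates[++nperiods] = $1
                keys[nperiods] = key
            }
            if (!(app in col)) {               # first time this application appears
                col[app] = ++napps
                names[napps] = app
            }
            values[key, app] += $4             # accumulate value for this cell
        }
        END {
            header = "Date"
            for (a = 1; a <= napps; a++) header = header "," names[a]
            print header
            for (p = 1; p <= nperiods; p++) {
                line = dates[p]
                for (a = 1; a <= napps; a++) {
                    id = keys[p] SUBSEP names[a]
                    # missing application in this period prints as 0.00
                    line = line "," sprintf("%.2f", ((id in values) ? values[id] : 0))
                }
                print line
            }
        }
    ' "$@"
}
```

For the nine sample lines above, this should print a header of Date,oracle,other,network,memory_management,other_user_root followed by 03/15/04,0.21,0.64,0.10,0.05,1.51 and 03/15/04,0.00,0.63,0.11,0.05,1.51. Everything is done in a single pass with arrays held in memory, so it should be much faster than a loop that runs grep once per application, and the input does not need to be sorted.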