Example data:
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111
The file may contain data for two dates or for a single day. I need to find out whether it has two days of data or one.
---------------------------------------
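#!/bin/ksh
# Read ./junk.data through file descriptor 3.
exec 3< ./junk.data
while read -u3 line
do
    # Print only the lines that contain exactly two words.
    if [[ $(print "$line" | wc -w) -eq 2 ]]
    then
        print "$line"
    fi
done
exec 3<&-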
---------------------------------------
That solution prints each line that contains exactly two words.
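If the goal is only to print lines with exactly two words, awk's built-in field count NF can do the same filtering in a single process, instead of starting `wc` once per line. This swaps in awk for the `wc -w` test; it is a minimal sketch assuming whitespace-separated words, with hypothetical sample lines standing in for ./junk.data:

```shell
# Hypothetical sample input standing in for ./junk.data.
sample='aaa 20040110
aaa20040110
aaa 20040111 extra'

# NF is the number of whitespace-separated fields on the current line,
# so this keeps only the lines made of exactly two words.
two_word_lines=$(printf '%s\n' "$sample" | awk 'NF == 2')
printf '%s\n' "$two_word_lines"
```

Only the first sample line has two words, so only `aaa 20040110` is printed.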
To determine how many unique dates your file contains (keying on word1 in each line), you can isolate that word (with awk or cut, for example), then do a unique sort to eliminate duplicates and count the lines that remain. This is Korn shell syntax, but it can be converted for other shells.
k=$(awk '{print $1}' myfile.txt | sort -u | wc -l)
echo "number of unique dates:" $k
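To answer the original one-day-or-two-days question directly, the count can drive a branch. A minimal sketch in portable sh, using the sample records from the top of the thread instead of myfile.txt; the variable `msg` and its messages are illustrative. The `tr -d ' '` strips the leading spaces that some `wc` implementations print before the number.

```shell
# Sample records copied from the example at the top of the thread.
sample='aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111'

# Count the distinct values of word1.
k=$(printf '%s\n' "$sample" | awk '{print $1}' | sort -u | wc -l | tr -d ' ')

# Decide whether the file holds one day or two days of data.
case $k in
    1) msg="one day of data" ;;
    2) msg="two days of data" ;;
    *) msg="unexpected number of dates: $k" ;;
esac
echo "$msg"
```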
The following script will summarize your file on word1:
awk '\
    {date[$1]++}
END {for (i in date)
         print i, date[i]}
' myfile.txt
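Running the summary on the sample records shows what it produces. A minimal sketch with the data inlined instead of read from myfile.txt; the trailing `sort` is an addition, because awk's `for (i in date)` visits keys in no guaranteed order.

```shell
# Sample records copied from the example at the top of the thread.
sample='aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111'

# Count records per value of word1, then sort so the output order is stable.
summary=$(printf '%s\n' "$sample" | awk '
    { date[$1]++ }
    END { for (i in date) print i, date[i] }
' | sort)
printf '%s\n' "$summary"
```

This prints `aaa20040110 3` followed by `aaa20040111 2`.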
---------------------------------------
Thanks a lot. The next hurdle: if there are two or more dates, I want to know
how many records there are for each date.
In the above example:
20040110 - 3
20040111 - 2
Once again thanks a lot.
---------------------------------------
The first script gives you a count ($k) of how many unique dates you
have.
The second script provides the summary you are asking for.
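The requested layout (`date - count`) and the "only when there is more than one date" condition can both be handled in awk's END block. A minimal sketch, assuming the sample records from the top of the thread; the single-date message is illustrative. With these records word1 includes the `aaa` prefix, so the dates print as `aaa20040110` rather than `20040110`.

```shell
# Sample records copied from the example at the top of the thread.
sample='aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111'

report=$(printf '%s\n' "$sample" | awk '
    { count[$1]++ }
    END {
        # Count the distinct dates first.
        n = 0
        for (d in count) n++
        # Break the records down by date only when there is more than one.
        if (n > 1) {
            for (d in count) printf "%s - %d\n", d, count[d]
        } else {
            print "only one date in the file"
        }
    }' | sort)
printf '%s\n' "$report"
```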
---------------------------------------
It's working great. This may be a dumb question, but how can I get the
summary data into a Unix shell variable (or array) and manipulate it?
---------------------------------------
Also, how can I skip the first and last lines of the file?
---------------------------------------
It is very easy to waste the first line. awk could do an initial getline
to waste it, or tail +2 (tail -n +2 on POSIX systems) would do it. But the
last line is much more of a pain. awk could do that too, but only awkwardly
(sorry about that) by holding each line and processing it on a delayed
basis. I took the easy way below and just let sed strip both lines.
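For comparison with sed, the delayed-processing awk approach can be sketched like this. It is a minimal sketch; the HEADER and TRAILER sample lines are hypothetical.

```shell
# Hypothetical file with a header and a trailer around the data.
sample='HEADER
aaa20040110
aaa20040111
TRAILER'

# Hold each line in "prev" and print it one line late. Printing starts at
# NR == 3 (which emits line 2), so line 1 is never printed; the last line
# is stored in "prev" but input ends before it can be printed.
body=$(printf '%s\n' "$sample" | awk 'NR > 2 { print prev } { prev = $0 }')
printf '%s\n' "$body"
```

The result is the same as `sed '1d;$d'`: only the two data lines remain.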
There are several ways to get output from awk (or from any command)
back to the shell. But when the output is any number of lines, you need
a construct that can process multiple lines. Below, the awk output is
piped into a while-loop, where each iteration processes one line.
By the way, in some shells (bash, common on Linux, is one), every command in a pipeline runs in a subshell, so any variables set within that while-loop go away after the loop ends. The AT&T Korn shell runs the last command of a pipeline in the current shell, so there they survive.
sed '1d;$d' myfile.txt |
awk '\
{date[$1]++}
END {for (i in date)
        print i, date[i]}' |
while read date k
do
echo "date=$date k=$k"
done
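Because variables set inside a piped while-loop can be lost after the loop in some shells, one portable workaround is to capture the awk output first and feed the loop from a here-document; in bash, ksh and other POSIX shells the loop then runs in the current shell. A minimal sketch, assuming the sample records from the top of the thread; `total` and `ndates` are illustrative names.

```shell
# Summarize the sample records once and keep the "date count" lines.
summary=$(printf '%s\n' aaa20040110 aaa20040110 aaa20040110 aaa20040111 aaa20040111 |
    awk '{ date[$1]++ } END { for (i in date) print i, date[i] }' | sort)

total=0
ndates=0
# The loop reads from a here-document rather than a pipe, so it is not
# forced into a subshell and its variables are still set afterwards.
while read date k
do
    echo "date=$date k=$k"
    total=$((total + k))
    ndates=$((ndates + 1))
done <<EOF
$summary
EOF

echo "dates=$ndates records=$total"
```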
---------------------------------------
I used
set -A record_count `awk 'BEGIN{FS = "|"}{ if(NF!=1) date[$1]++}END {for (i in date) print date[i]}' "$fposted"`
and then the array record_count holds the data I need. Also, if(NF!=1)
gets rid of the header and trailer lines, since they don't contain the
delimiter (which I didn't mention in my post, sorry).
Thanks for all your help
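`set -A` is a Korn shell feature; bash and plain POSIX sh lack it. A minimal portable sketch loads the counts into the positional parameters instead. It also sorts by date before keeping the counts, because awk's `for (i in date)` has no defined order, so the array positions may not follow the dates. The pipe-delimited data below is hypothetical, modeled on the description of a header and trailer that contain no `|`.

```shell
# Hypothetical pipe-delimited data; the header and trailer lines contain
# no "|", so awk sees them as a single field (NF == 1).
sample='HEADER
20040110|aaa
20040110|bbb
20040110|ccc
20040111|ddd
20040111|eee
TRAILER'

# Count records per date, skipping single-field lines, then sort by date
# and keep only the counts.
counts=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "|" } NF != 1 { date[$1]++ }
         END { for (i in date) print i, date[i] }' |
    sort | awk '{ print $2 }')

# POSIX sh has no named arrays, but after "set --" the positional
# parameters act as one: $1 is the earliest date count, $2 the next.
set -- $counts
ndates=$#
first=$1
second=$2
echo "dates=$ndates first=$first second=$second"
```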