Compare Lines Within File

Example data:
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111

The file may contain data for a single day or for two days. I need to find out which of the two it is.

---------------------------------------
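#!/bin/ksh
# Open junk.data on file descriptor 3 and read it line by line.
exec 3< ./junk.data
while read -u3 line
do
    # Print only the lines that contain exactly two words.
    if [[ $(print -r -- "$line" | wc -w) -eq 2 ]]
    then
        print -r -- "$line"
    fi
done
exec 3<&-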

---------------------------------------
Solution prints each line containing 2 words.

To determine how many unique dates your file contains (keying on word1 in each line), you can isolate that word (with awk or cut, for example) and then do a unique sort, which eliminates duplicates. The line count of the result is the number of distinct dates. This is Korn shell syntax, but it can be converted for other shells.

k=$(awk '{print $1}' myfile.txt | sort -u | wc -l)
echo "number of unique dates:" $k
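
To turn that count into the one-day/two-day answer the question asks for, something like the following might do. It is only a sketch: the temporary file and its contents are made-up sample data, and `verdict` is a hypothetical variable name. The `tr` call strips the padding that some `wc` implementations put around the number.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file for illustration.
f=$(mktemp)
printf '%s\n' aaa20040110 aaa20040110 aaa20040110 aaa20040111 aaa20040111 > "$f"

# Count the distinct values of word1; tr removes any whitespace wc adds.
k=$(awk '{print $1}' "$f" | sort -u | wc -l | tr -d '[:space:]')

if [ "$k" -eq 1 ]
then
    verdict="single day"
else
    verdict="$k days"
fi
echo "$verdict"

rm -f "$f"
```

With the sample data above this prints `2 days`.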

The following script will summarize your file on word1:

awk '
    {date[$1]++}
END {for (i in date)
         print i, date[i]}
' myfile.txt
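
If the date is embedded in word1, as in the sample data (aaa20040110), the key can be trimmed before counting. The sketch below assumes the date is always the last 8 characters of word1, which the thread does not confirm; the temporary file, the `count` array and the `summary` variable are illustrative names. The output is piped through sort because awk does not guarantee the order of a `for (i in array)` loop.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file for illustration.
f=$(mktemp)
printf '%s\n' aaa20040110 aaa20040110 aaa20040110 aaa20040111 aaa20040111 > "$f"

# Assumption: the date is the last 8 characters of word1.
summary=$(awk '
    NF  { d = substr($1, length($1) - 7); count[d]++ }
    END { for (d in count) printf "%s - %d\n", d, count[d] }
' "$f" | sort)

echo "$summary"

rm -f "$f"
```

For the sample data this prints `20040110 - 3` followed by `20040111 - 2`.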

---------------------------------------
Thanks a lot. My next hurdle: if the file has two or more dates, I want to know how many records there are for each date.
For the example above:
20040110 - 3
20040111 - 2

Once again thanks a lot.

---------------------------------------
The first script gives you a count ($k) of how many unique dates you have.

The second script provides the summary you are asking for.

---------------------------------------
It's working great. This may be a dumb question, but how can I get the summary data into a Unix shell variable (or array) and manipulate it?

---------------------------------------
I also need to skip the first and last lines of the file.

---------------------------------------
It is very easy to waste the first line. awk could do an initial getline to waste it, or tail +2 (tail -n +2 on current systems) would do it. But the last line is much more of a pain. awk could do that too, but only awkwardly (sorry about that) by holding each line and processing it on a delayed basis. I took the easy way below and just let sed strip both lines.
There are several ways to get output from awk (or from any command) back to the shell. But when the output is any number of lines, you need a construct that can process multiple lines. Below, the awk output is piped into a while-loop, where each iteration processes one line.

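By the way, in some shells (bash and pdksh, for example, which are common on Linux) the while-loop at the end of a pipeline runs in a subshell, so any variables set inside that loop go away when the loop ends. ksh88 and ksh93 run the last stage of a pipeline in the current shell, so there the variables survive.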

sed '1d;$d' myfile.txt |
awk '
    {date[$1]++}
END {for (i in date)
         print i, date[i]}' |
while read date k
do
    echo "date=$date k=$k"
done
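
For comparison, here is a sketch of the "hold each line" approach in awk alone, without sed. Each line's key is held back and counted only once the next line has been read, so the last line is never counted; the `NR > 2` test keeps the held first line out as well. The HEADER/TRAILER file contents and the names `held`, `count` and `out` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: one header line, three records, one trailer line.
f=$(mktemp)
printf '%s\n' HEADER aaa20040110 aaa20040110 aaa20040111 TRAILER > "$f"

out=$(awk '
    # By the time line NR arrives, the held key (from line NR-1) is known
    # not to be the last line. NR > 2 skips the held header from line 1.
    NR > 2 { count[held]++ }
           { held = $1 }
    END    { for (d in count) print d, count[d] }
' "$f" | sort)

echo "$out"

rm -f "$f"
```

For the sample data this prints `aaa20040110 2` and `aaa20040111 1`. The sed version above is shorter, which is why it was the easy way out.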

---------------------------------------
I used
set -A record_count `awk 'BEGIN{FS = "|"}{ if(NF!=1) date[$1]++}END {for (i in date) print date[i]}' $fposted`
and now the array record_count holds the data I require. The if(NF!=1) test also got rid of the header and trailer lines, since they do not contain the delimiter (which I did not mention in my post, sorry).

Thanks for all your help
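
---------------------------------------
One caveat on the set -A approach: the array holds only the counts, and awk does not guarantee the order of a for (i in date) loop, so it is hard to tell which count belongs to which date. The sketch below is one possible alternative, assuming the same |-delimited layout with a header and trailer that lack the delimiter. The file contents and the names `summary`, `total` and `dates` are hypothetical. It captures the awk output in a variable and feeds the loop from a here-document instead of a pipe, so the loop runs in the current shell and its variables survive in any POSIX shell.

```shell
#!/bin/sh
# Hypothetical sample data: header and trailer have no "|" delimiter.
f=$(mktemp)
printf '%s\n' 'HDR' '20040110|a' '20040110|b' '20040110|c' \
              '20040111|d' '20040111|e' 'TRL' > "$f"

# One "date count" pair per line, sorted so the order is predictable.
summary=$(awk -F'|' '
    NF != 1 { count[$1]++ }
    END     { for (d in count) print d, count[d] }
' "$f" | sort)

total=0
dates=""
# The here-document keeps this loop in the current shell (no pipeline),
# so total and dates are still set after the loop finishes.
while read -r d n
do
    total=$((total + n))
    dates="$dates $d"
done <<EOF
$summary
EOF

echo "dates:$dates total=$total"

rm -f "$f"
```

For the sample data this prints `dates: 20040110 20040111 total=5`.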
