IP: unique IPs, the number of distinct IPs seen within a given time window
Counting method with squid
[root@localhost etc]# cut -d " " -f1 /usr/local/squid/var/logs/access.log|sort|uniq|wc -l
2823
When using awk:
awk '{print $1}'|sort|uniq test_8.5.2016
10.185.130.78 - - [04/Aug/2016:23:59:29 +0800] "GET http://swcdn.apple.com/content/downloads/52/05/041-9986/25o8fo3gq5m75ylgi9h1vgf4b0r5fovzhr/041-9986.zh_TW.dist HTTP/1.1" 200 53879 TCP_MISS:HIER_DIREC
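Note that the whole log line is printed here, unlike with cut. This is not a difference between cut and awk: the file name was given to uniq instead of awk, so uniq read the file directly and printed its lines unchanged. With the file passed to awk (awk '{print $1}' test_8.5.2016|sort|uniq), the output is the list of IPs, as with cut.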
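For comparison, here is a minimal sketch of the awk variant with the file name passed to awk itself. The helper name count_unique_ips and the sample lines are made up for illustration, and the log is assumed to use the space-separated layout shown above, with the client IP in the first field.

```shell
# Count distinct client IPs (first whitespace-separated field) in a log file.
# count_unique_ips is a hypothetical helper name, not part of squid.
count_unique_ips() {
    awk '{print $1}' "$1" | sort -u | wc -l
}

# Tiny made-up sample in the same layout as the log line above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.185.130.78 - - [04/Aug/2016:23:59:29 +0800] "GET http://swcdn.apple.com/a HTTP/1.1" 200 100
10.185.130.78 - - [04/Aug/2016:23:59:30 +0800] "GET http://swcdn.apple.com/b HTTP/1.1" 200 100
10.185.130.79 - - [04/Aug/2016:23:59:31 +0800] "GET http://www.whatismyip.com/ HTTP/1.1" 200 100
EOF

count_unique_ips "$sample"   # two distinct IPs in the sample
rm -f "$sample"
```

sort -u does the same job as sort|uniq in one step.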
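PV: page views, the number of times pages are viewed
In squid, every log entry counts as one page view, but the overall total is not very meaningful (what use is the combined PV of so many different sites? It is more useful to grep for the entries of a single domain and count those).
Rough count of everything: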
[root@localhost etc]# wc -l /usr/local/squid/var/logs/access.log|awk '{print $1}'
3874236
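Counting the entries for a single domain, as suggested above, could look like the following sketch. It uses awk with an exact host comparison rather than grep, so that a domain does not also match as a substring of other URLs. The helper name count_domain_pv is hypothetical, and the requested URL is assumed to be the seventh space-separated field in the form scheme://host/path.

```shell
# Count log entries whose URL host equals the given domain.
# Usage: count_domain_pv DOMAIN LOGFILE
# Assumes field 7 holds the URL in the form scheme://host/path.
count_domain_pv() {
    awk -v d="$1" '{
        split($7, parts, "/")        # parts[3] is the host for scheme://host/path
        if (parts[3] == d) n++
    } END { print n + 0 }' "$2"
}

# Example call (path taken from the commands above):
# count_domain_pv swcdn.apple.com /usr/local/squid/var/logs/access.log
```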
Count by domain
(implemented with a for loop)
[root@localhost etc]# for i in `cut -d " " -f7 test_8.5.2016 |cut -d "/" -f3|sort|uniq`;do echo "${i} is visited (times)"; grep $i test_8.5.2016 -c ;done
cn180156 is visited (times)
1
dungcoivb.googlepages.com is visited (times)
1
lh-hn-505 is visited (times)
8
swcdn.apple.com is visited (times)
4
www.whatismyip.com is visited (times)
1
Next, trying this out on the Zhengzhou server
#!/bin/bash
#Version 1.0
#Author Scott
#Mail yzh?????@sina.com
#Introduction This is for count website that was visited a few hours ago in special log
cut -d " " -f7 /bash/script/log|sort|uniq \
>/bash/script/website.log
for i in `grep -v ^htt /bash/script/website.log`
do
echo -ne "$i is visited\n"
grep -c $i /bash/script/log
done
for j in `cut -d "/" -f3 /bash/script/website.log|uniq`
do
echo "$j is visited"
grep $j /bash/script/log -c
done
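The script above ran for 1.5 hours without producing a result. The log has about 3 million entries, and every loop iteration runs grep over the whole log again.
Improved script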
#!/bin/bash
#Version 1.1
#Author Scott
#Mail yzh?????@sina.com
#Introduction This is for count website that was visited a few hours ago in special log
cut -d " " -f7 /bash/scripts/log|sort\
>/bash/scripts/website.log
grep -v ^htt /bash/scripts/website.log|uniq -c >loged.log &&\
cut -d "/" -f3 /bash/scripts/website.log|uniq -c >loged.log
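Further improvement (version 1.1 redirected the second uniq -c with >, which overwrote the first result; version 1.2 appends with >> and sorts the counts in descending order)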
[root@hadphost scripts]# wc -l log
2805719 log
[root@hadphost scripts]# cat pv_test.sh
#!/bin/bash
#Version 1.2
#Author Scott
#Mail yzh??????@sina.com
#Introduction This is for count website that was visited a few hours ago in special log
cut -d " " -f7 /bash/scripts/log|sort\
>/bash/scripts/website.log
grep -v ^htt /bash/scripts/website.log|uniq -c >loged.log &&\
cut -d "/" -f3 /bash/scripts/website.log|uniq -c >>loged.log &&\
sort -rbn loged.log >queue.log
[root@hadphost scripts]# wc -l queue.log
18408 queue.log
[root@hadphost scripts]# fg
time sh pv_test.sh
real 1m40.616s
user 1m37.560s
sys 0m1.807s
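The same kind of per-domain ranking could also be sketched as a single awk pass that tallies hosts in an associative array, so that only the list of totals is sorted rather than every URL in the log. This is an illustrative alternative, not the script timed above, and its output is not line-for-line identical to queue.log. The function name rank_domains is hypothetical, and the URL is assumed to be field 7.

```shell
# Tally hosts in one pass, then sort only the list of totals.
# Usage: rank_domains LOGFILE
# For entries whose field 7 has no scheme://host/ part (e.g. host:port),
# the whole field is used as the key.
rank_domains() {
    awk '{
        n = split($7, parts, "/")
        host = (n >= 3) ? parts[3] : $7
        count[host]++
    } END {
        for (h in count) print count[h], h
    }' "$1" | sort -rn
}

# Example call: rank_domains /bash/scripts/log > queue.log
```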
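UV: unique visitors, the number of distinct clients
The squid log does not currently contain the data needed for this, so it is not investigated here.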