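SEO (search engine optimization): to increase traffic from search engines, SEO has become a required course for many commercial websites. But how do you evaluate the SEO results of a site? I wrote the script below, which produces reference data on the following points:
1 Which pages have been crawled by search engine spiders: statistics on spider visits from search engines;
2 Which pages were found through a search and then clicked: statistics on referers from search engines;
3 Which keywords were used when the pages were found through a search engine: statistics on search keywords.
The script follows the note on log files below.
It assumes that the site's Apache logs are rotated with cronolog, or are otherwise written to files with regular, date-based names: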
/home/apache/logs/access_log.20040415
/home/apache/logs/access_log.20040416
/home/apache/logs/access_log.20040417
/home/apache/logs/access_log.20040418
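The date suffix in these file names is what the script uses to pick yesterday's log. As a minimal sketch (not part of the original script), the choice between the GNU and FreeBSD forms of the date command could be made automatically. It assumes that `date --version` succeeds only on GNU date; the variable names LOG_PREFIX and LOG_PATH are made up for this example.

```shell
#!/bin/sh
# Sketch: compute yesterday's date stamp with either GNU or BSD date,
# then build the name of the log file to analyze.
# LOG_PREFIX and LOG_PATH are hypothetical names used only here.
LOG_PREFIX='/home/apache/logs/access_log'

# Assumption: only GNU date accepts --version; BSD date rejects it.
if date --version >/dev/null 2>&1; then
    YESTERDAY=$(date -d yesterday +%Y%m%d)   # GNU date (Linux)
else
    YESTERDAY=$(date -v-1d +%Y%m%d)          # BSD date (FreeBSD)
fi

LOG_PATH="$LOG_PREFIX.$YESTERDAY"

# Report whether the expected log file is actually readable.
if [ -r "$LOG_PATH" ]; then
    echo "will analyze $LOG_PATH"
else
    echo "log file not found: $LOG_PATH" >&2
fi
```

Checking that the file is readable before running the statistics avoids writing empty result files on days when log rotation did not happen as expected.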
#!/bin/sh
#$Id: spider_stats.sh,v 1.9 2004/05/15 16:52:44 chedong Exp $
YESTERDAY=`date -d yesterday +%Y%m%d`
# for FreeBSD: YESTERDAY=`date -v-1d +%Y%m%d`
THISMONTH=`date -d yesterday +%m%Y`
LOG_FILE='/home/apache/logs/access_log'
grep -i Googlebot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > spider/$YESTERDAY.googlebot.txt
grep -i baiduspider $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.baiduspider.txt
grep -i msnbot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.msnbot.txt
grep -i slurp $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.inktomi.txt
grep -i openbot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn>spider/$YESTERDAY.openbot.txt
# for search entry stats
grep -i www.google.com/search $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.google.txt
grep -i www.baidu.com/baidu $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.baidu.txt
grep -i 3721.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.3721.txt
grep -i search.sohu.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.sohu.txt
grep -i search.sina.com.cn $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn > search/$YESTERDAY.sina.txt
grep -i search.yahoo.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn > search/$YESTERDAY.yahoo.txt
# for search keywords stats
grep www.baidu.com/baidu $LOG_FILE.$YESTERDAY | awk '{print $11}' | perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg' | perl -pe 's/(.*)?(word=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.baidu.txt
grep www.google.com/search $LOG_FILE.$YESTERDAY | awk '{print $11}' | perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?(q=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.google.txt
# in the next three patterns the keyword is capture group 4, because the alternation of parameter names adds a group
grep 3721.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((p|name)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.3721.txt
grep search.sohu.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((key_word|word)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.sohu.txt
grep search.sina.com.cn $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((_searchkey|word)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.sina.txt
grep search.yahoo.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?(p=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.yahoo.txt
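perl -pe 's/\\x(\w+)/%\1/gi' : converts escapes such as \xe4\x23 into the percent-encoded form %e4%23
perl -p -e 's/%(..)/pack("c", hex($1))/eg' : performs the URL decoding
sort|uniq -c|sort -rn : sorts the lines, counts identical ones, and outputs them in descending order of count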
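Author: 车东 (Che Dong) Posted: 2004-05-16 01:05 Last updated: 2007-04-15 19:04
Copyright notice: this article may be reposted. When reposting, be sure to indicate, in the form of a hyperlink, the original source of the article "Evaluating SEO Results: Spider, Referer, and Keywords", the author information, and this copyright notice.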
http://www.chedong.com/blog/archives/000432.html
Comments
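Great! It looks like I need to sit down and study regular expressions properly.
Posted by: 汤汤 on 2004-06-13 04:13
What do you think of Google Analytics?
I see you wrote this article in 2004.
Nowadays I use Google Analytics.
Posted by: sangern on 2006-09-29 09:27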