Evaluating SEO results: spider, referer, and keywords


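SEO (search engine optimization): to increase the traffic they get from search engines, many commercial sites now treat SEO as required homework. But how do you evaluate a site's SEO results? I wrote the script below, which provides reference data on three things:
1 Which pages have been fetched by search engine spiders: spider statistics;
2 Which pages were found through a search engine and then clicked: statistics on referers from search engines;
3 Which keywords were used when the pages were found: statistics on search engine keywords.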

The script is as follows.

It assumes the site's Apache logs are rotated with cronolog, or are otherwise given regular, date-based file names:
/home/apache/logs/access_log.20040415
/home/apache/logs/access_log.20040416
/home/apache/logs/access_log.20040417
/home/apache/logs/access_log.20040418

#!/bin/sh
#$Id: spider_stats.sh,v 1.9 2004/05/15 16:52:44 chedong Exp $
YESTERDAY=`date -d yesterday +%Y%m%d`
# for FreeBSD: YESTERDAY=`date -v-1d +%Y%m%d`

THISMONTH=`date -d yesterday +%m%Y`

LOG_FILE='/home/apache/logs/access_log'

# for spider stats: count the URLs (field 7) requested by each spider
grep -i Googlebot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > spider/$YESTERDAY.googlebot.txt
grep -i baiduspider $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.baiduspider.txt
grep -i msnbot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.msnbot.txt
grep -i slurp $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn>spider/$YESTERDAY.inktomi.txt
grep -i openbot $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn>spider/$YESTERDAY.openbot.txt

# for search entry stats
grep -i www.google.com/search $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.google.txt
grep -i www.baidu.com/baidu $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.baidu.txt
grep -i 3721.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.3721.txt
grep -i search.sohu.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort | uniq -c | sort -rn > search/$YESTERDAY.sohu.txt
grep -i search.sina.com.cn $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn > search/$YESTERDAY.sina.txt
grep -i search.yahoo.com $LOG_FILE.$YESTERDAY|awk '{print $7}' |sort |uniq -c | sort -rn > search/$YESTERDAY.yahoo.txt

# for search keywords stats
grep www.baidu.com/baidu $LOG_FILE.$YESTERDAY | awk '{print $11}' | perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg' | perl -pe 's/(.*)?(word=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.baidu.txt
grep www.google.com/search $LOG_FILE.$YESTERDAY | awk '{print $11}' | perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?(q=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.google.txt
grep 3721.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((p|name)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.3721.txt
grep search.sohu.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((key_word|word)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.sohu.txt
grep search.sina.com.cn $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?((_searchkey|word)=(.*?))[&"].*/$4/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.sina.txt
grep search.yahoo.com $LOG_FILE.$YESTERDAY | awk '{print $11}'| perl -pe 's/\\x(\w+)/%\1/gi' |perl -p -e 's/%(..)/pack("c", hex($1))/eg'|perl -pe 's/(.*)?(p=(.*?))[&"].*/$3/gi' |sort|uniq -c|sort -rn > keywords/$YESTERDAY.yahoo.txt
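
As a rough illustration of why the script prints fields 7 and 11, the sketch below splits one made-up log line with awk. It assumes Apache's "combined" log format; the IP address, path, and user agent are hypothetical and do not come from the original logs.

```shell
# Hypothetical access-log line in Apache "combined" format (an assumption).
sample_line='192.0.2.1 - - [15/Apr/2004:10:00:00 +0800] "GET /tech/index.html HTTP/1.1" 200 1234 "http://www.google.com/search?q=seo" "Mozilla/4.0"'

# awk splits on whitespace: field 7 is the requested path,
# field 11 is the referer (still wrapped in double quotes).
requested_path=$(printf '%s\n' "$sample_line" | awk '{print $7}')
referer=$(printf '%s\n' "$sample_line" | awk '{print $11}')

echo "$requested_path"
echo "$referer"
```

Because the referer keeps its surrounding double quotes, the keyword patterns accept either & or " as the end of the search term.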

perl -pe 's/\\x(\w+)/%\1/gi' : converts escapes such as \xe4\x23 into the %e4%23 form

perl -p -e 's/%(..)/pack("c", hex($1))/eg' : performs the URL decoding

sort|uniq -c|sort -rn : sorts the lines, collapses duplicates into counts, and outputs them ordered by count, highest first
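
The counting idiom can be tried on a handful of made-up keywords (the words are hypothetical sample data). The first sort groups identical lines together, uniq -c prefixes each distinct line with its count, and sort -rn orders the result by that count in descending order.

```shell
# Six hypothetical keywords, one per line, piped through the counting idiom.
counts=$(printf '%s\n' seo apache seo perl seo apache | sort | uniq -c | sort -rn)

# Lists seo (3), apache (2), perl (1); the width of the count column
# depends on the uniq implementation.
echo "$counts"
```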

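Author: Che Dong (车东). Posted: 2004-05-16 01:05. Last updated: 2007-04-15 19:04
Copyright notice: this article may be reproduced, provided that the original source, the author information, and this copyright notice are indicated with a hyperlink.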

Comments

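Excellent! It looks like I need to sit down and study regular expressions.

What do you think of Google Analytics?
I see you wrote this article in 2004.
These days I use Google Analytics.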
