最近很多站点都发现了一个名叫hl_ftien_spider的蜘蛛,这个蜘蛛的抓取频度:可是相当的厉害,几乎和DoS攻击差不多,自然也引起了不少公愤:
今天网站流量再度攀升,经检查原来是一个名为"hl_ftien_spider"的蜘蛛在疯狂的爬我的网站网页.
grep hl_ftien_spider access_log.20060304 |awk '{print $1 $2 $4 $3 $12}'
218.68.240.81-[04/Mar/2006:22:57:11-"hl_ftien_spider"
218.68.240.81-[04/Mar/2006:22:57:27-"hl_ftien_spider"
218.68.240.81-[04/Mar/2006:22:57:28-"hl_ftien_spider"
218.68.240.81-[04/Mar/2006:22:57:28-"hl_ftien_spider"
218.68.240.81-[04/Mar/2006:22:57:45-"hl_ftien_spider"
218.68.240.81-[04/Mar/2006:22:57:46-"hl_ftien_spider"
................
这样胡乱爬网页跟攻击没有什么区别,马上封了该ip地址,iptables在封锁ip上还是非常有效的:
iptables -A INPUT -s 218.68.240.38 -j REJECT
查询了一下该Ip地址,是来自天津的:
您要查询的是"218.68.240.38",它被理解为"218.68.240.38"
官方数据:
在亚洲与太平洋网络信息中心(APNIC)找到:
% [whois.apnic.net node-1]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html
路由: 218.68.0.0/15
单位全名和地址: CNC Group CHINA169 Tianjin Province Network
国家或地区: 中国
自治域(AS): AS4837
维护者: MAINT-CNCGROUP-RR
变更记录: abuse@cnc-noc.net 20060118
信息来源: APNIC
谁知道这是哪家的蜘蛛出来乱爬么?
Google上也没有有用的信息,似乎是流氓引擎吧.
看看WebMasterWorld上的评论:
This bot hit my site today, picked up robots.txt, then picked up a few dozen pages - and tried to pick up about 1800 more pages after it hit a bot trap.
All of which it did within 4 minutes.
The IP is the same as you had, and it resolves to net263.com in China.
I've banned it.
我今天从服务器的突然后台负载增高也发现了这个蜘蛛,上月的一次大量抓取来自上海某高校,当时不得不将我的twiki改成了认证登录。
60.28.249.27 - - [31/Mar/2006:17:32:42 +0800] "GET /phpMan.php/man/nl/3ncurses HTTP/1.1" 200 10235 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:42 +0800] "GET /phpMan.php/man/printw/3ncurses HTTP/1.1" 200 4939 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:42 +0800] "GET /phpMan.php/man/curl_mvsnprintf/3 HTTP/1.1" 200 6830 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:42 +0800] "GET /phpMan.php/man/HTML::Element/3pm HTTP/1.1" 200 61844 "http://www.chedong.com/phpMan.php/man/class/1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:43 +0800] "GET /phpMan.php/man/vw_printw/3ncurses HTTP/1.1" 200 4948 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:43 +0800] "GET /phpMan.php/man/vprintf/3 HTTP/1.1" 200 30165 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:43 +0800] "GET /phpMan.php/man/yes/1 HTTP/1.1" 200 3301 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:44 +0800] "GET /phpMan.php/man/snprintf/3 HTTP/1.1" 200 30168 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:45 +0800] "GET /phpMan.php/man/vfprintf/3 HTTP/1.1" 200 30168 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
60.28.249.27 - - [31/Mar/2006:17:32:45 +0800] "GET /phpMan.php/man/scrollok/3ncurses HTTP/1.1" 200 10253 "http://www.chedong.com/phpMan.php/man/output./1" "hl_ftien_spider" 60.28.249.27.38631143794139731
网站稍微差一点的都会承受不住压力的。我今天顺着来源的IP地址看了一下,http://60.28.249.27/ 这不是海量的DIGDIG搜索引擎的论坛吗?