Baiduspider, SSL, and the 301 redirect problem

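For sites served over SSL, Baidu's crawler still prefers to crawl over HTTP rather than HTTPS.
On mixed setups where HTTP requests are 301-redirected to HTTPS, this seems to be a fairly serious problem for Baiduspider.
Going through the past week's logs, its requests all look like the first entries in the log below: it does not fetch the content, it sees the 301 and simply stops.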

This happens a lot.
Cases where it follows the 301 and crawls the target URL are rare.

So if you depend on Baidu for your traffic, think it over carefully before moving to SSL.

# Like this:
180.76.15.34 - - [13/Aug/2016:17:33:55 +0800] "GET / HTTP/1.1" 301 436 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.9 - - [13/Aug/2016:17:34:45 +0800] "GET / HTTP/1.1" 301 436 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

42.156.138.38 - - [13/Aug/2016:17:44:52 +0800] "GET /ipv4/42.156.128.186 HTTP/1.1" 200 8548 "-" "YisouSpider"
42.120.160.38 - - [13/Aug/2016:17:45:02 +0800] "GET /ipv4/42.156.128.186 HTTP/1.1" 200 42504 "-" "YisouSpider"
220.181.108.162 - - [13/Aug/2016:17:52:32 +0800] "GET /asn/AS13006 HTTP/1.1" 301 458 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
123.125.71.71 - - [13/Aug/2016:17:52:33 +0800] "GET /asn/AS13006 HTTP/1.1" 200 15128 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
103.231.166.27 - - [13/Aug/2016:18:03:39 +0800] "OPTIONS / HTTP/1.0" 404 1643 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.0 Safari/537.36"
180.76.15.153 - - [13/Aug/2016:18:27:09 +0800] "GET / HTTP/1.1" 301 436 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.14 - - [13/Aug/2016:18:27:43 +0800] "GET / HTTP/1.1" 301 436 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
45.55.52.227 - - [13/Aug/2016:18:43:08 +0800] "GET / HTTP/1.1" 200 9514 "-" "Netcraft SSL Server Survey - contact [email protected]"
180.76.15.161 - - [13/Aug/2016:18:52:36 +0800] "GET /asn/AS4609 HTTP/1.1" 301 456 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.140 - - [13/Aug/2016:18:52:37 +0800] "GET /asn/AS4609 HTTP/1.1" 200 28073 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

I checked four separate sites, each on a different server.
All of them show the same behavior.
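
To put a number on "rarely follows the 301", the log can be tallied with a short script. This is only a rough sketch under a few assumptions that are mine, not part of the original setup: the log is in the combined format shown above, the HTTP and HTTPS virtual hosts write to the same file in time order, and a 200 for the same path within a few seconds of the 301 counts as "followed". The function name `baidu_redirect_stats` and the 5-second window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Combined log format, as in the sample above.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def parse(line):
    """Return (timestamp, path, status, user_agent), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Month names such as "Aug" assume an English/C locale.
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, m.group("path"), int(m.group("status")), m.group("ua")


def baidu_redirect_stats(lines, window_seconds=5):
    """Count Baiduspider 301 responses and how many were followed by a 200.

    A 301 is treated as "followed" when a later Baiduspider request for the
    same path returns 200 within window_seconds (an arbitrary threshold).
    Returns a tuple (redirects, followed).
    """
    window = timedelta(seconds=window_seconds)
    hits = [h for h in map(parse, lines) if h and "Baiduspider" in h[3]]
    redirects = followed = 0
    for i, (ts, path, status, _ua) in enumerate(hits):
        if status != 301:
            continue
        redirects += 1
        for ts2, path2, status2, _ua2 in hits[i + 1:]:
            if ts2 - ts > window:
                break  # log is assumed to be in chronological order
            if path2 == path and status2 == 200:
                followed += 1
                break
    return redirects, followed


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        total, ok = baidu_redirect_stats(f)
    print(f"Baiduspider 301s: {total}, followed by a 200: {ok}")
```

Run against the excerpt above, this counts six Baiduspider 301s, of which two (the /asn/AS13006 and /asn/AS4609 requests) are followed by a 200 a second later. Note that the User-Agent string can be spoofed, so the numbers are only as reliable as that field.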
