对于Apache和PHP代码屏蔽yisouspider的办法没有亲自测试,本站只采用了Nginx屏蔽yisouspider的办法,所以如果采用其他方法遇到问题的请留言求助。
Apache屏蔽爬虫yisouspider访问站点方法
1、通过修改 .htaccess文件
修改网站目录下的.htaccess,添加如下代码即可(2种代码任选):
可用代码 (1):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (^$|yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) [NC]
RewriteRule ^(.*)$ - [F]
可用代码 (2):
SetEnvIfNoCase ^User-Agent$ .*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms) BADBOT
Order Allow,Deny
Allow from all
Deny from env=BADBOT
2、通过修改httpd.conf配置文件
找到如下类似位置,根据以下代码 新增 / 修改,然后重启Apache即可:
DocumentRoot /home/wwwroot/xxx
<Directory "/home/wwwroot/xxx">
SetEnvIfNoCase User-Agent ".*(yisouspider|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms)" BADBOT
Order allow,deny
Allow from all
deny from env=BADBOT
</Directory>
以上是Apache屏蔽爬虫yisouspider访问站点方法