Bounty: complete robots.txt rules for an English WordPress site
Last edited by hardrock at 2014-12-29 15:45
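Please write it to these requirements:
1. Only the Alexa, AOL, Ask, Google, Yahoo, and Bing crawlers may crawl the site; all other crawlers are blocked.
2. A general set of rules that blocks specific directories and file types, suitable for English WordPress versions 3.6 through 4.1.
3. A Sitemap entry.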
Last edited by yzhvps at 2014-12-29 22:09
User-agent: Alexa*
Allow: /
User-agent: Aol*
Allow: /
User-agent: Ask*
Allow: /
User-agent: Yahoo*
Allow: /
User-agent: Bing*
Allow: /
User-agent: Google*
Allow: /
User-agent: *
Disallow: /
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
Sitemap: http://www.eefaq.com/sitemap.xml
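Before uploading, a file like this can be sanity-checked locally. The minimal sketch below uses Python's standard `urllib.robotparser`, purely as a convenient checker (it ignores the `*` and `$` path wildcards), on a trimmed-down copy of the structure above. It shows two things. First, `Google*` is read as a literal name rather than a wildcard, so Googlebot falls through to the catch-all group; RFC 9309 only gives `*` a special meaning when it is the entire User-agent value, so listing each crawler's exact token is the safer choice. Second, a crawler that does match a named group follows only that group, so the WordPress rules placed under `User-agent: *` never apply to it. The crawler name `SomeOtherBot` and the path `/hello-world/` are made-up examples.

```python
import urllib.robotparser


def check(robots_txt, agent, path):
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Trimmed-down copy of the structure above: a named group that only says
# "Allow: /", with the WordPress rules placed under the catch-all group.
AS_POSTED = """\
User-agent: Google*
Allow: /

User-agent: *
Disallow: /
Disallow: /wp-admin
"""

# Same structure, but naming the crawler by its exact product token.
EXACT_TOKEN = AS_POSTED.replace("Google*", "Googlebot")

if __name__ == "__main__":
    # "Google*" is not treated as a wildcard, so Googlebot falls through to "*".
    print(check(AS_POSTED, "Googlebot", "/hello-world/"))       # False
    # With the exact token, Googlebot follows only its own group, so the
    # Disallow lines under "*" do not apply to it.
    print(check(EXACT_TOKEN, "Googlebot", "/hello-world/"))     # True
    print(check(EXACT_TOKEN, "Googlebot", "/wp-admin/"))        # True
    # Any other crawler still lands in the catch-all group.
    print(check(EXACT_TOKEN, "SomeOtherBot", "/hello-world/"))  # False
```

This is the reason the longer version in this reply repeats the rule list under every named group.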
This is rough; once it's done, run it through a robots.txt tester. Or, the more tedious way:
User-agent: Alexa*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: Aol*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: Ask*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: Yahoo*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: Bing*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: Google*
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*
User-agent: *
Disallow: /
Sitemap: http://www.eefaq.com/sitemap.xml
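If copying the same rule list into every named group feels repetitive, robots.txt also lets several consecutive `User-agent` lines share one group. Below is a minimal Python sketch that generates such a file from the Disallow rules in this reply. The crawler tokens in `ALLOWED_AGENTS` are assumptions to verify against each search engine's documentation (AOL is left out because its crawler token is not confirmed here), and the quick check at the end uses Python's standard `urllib.robotparser`, which handles plain prefix rules but not the `*` and `$` wildcards.

```python
import urllib.robotparser

# Assumed crawler product tokens for Google, Bing, Yahoo, Alexa and Ask;
# check each against the search engine's own documentation. AOL is left
# out because its crawler token is not confirmed here.
ALLOWED_AGENTS = ["Googlebot", "bingbot", "Slurp", "ia_archiver", "Teoma"]

# Disallow rules copied from the reply above.
SHARED_DISALLOWS = [
    "/*.php$", "/search/", "/tag/*?*", "/*.html?*", "/page/*?*",
    "/wp-admin", "/wp-*.php", "/comments/", "/trackback", "/*.html/*?*",
    "/*/trackback", "/comments/feed", "/*?replytocom=*", "/*/comment-page-*",
]

SITEMAP = "http://www.eefaq.com/sitemap.xml"


def build_robots_txt(agents, disallows, sitemap):
    """Return robots.txt text with one group shared by all allowed crawlers,
    a catch-all group that blocks everyone else, and a Sitemap line."""
    # Consecutive User-agent lines belong to the same group.
    lines = [f"User-agent: {agent}" for agent in agents]
    lines += [f"Disallow: {path}" for path in disallows]
    lines += ["", "User-agent: *", "Disallow: /", "", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_robots_txt(ALLOWED_AGENTS, SHARED_DISALLOWS, SITEMAP)
    print(text)

    # Quick check with the standard-library parser. It only understands
    # plain prefix rules, not the * and $ wildcards, so test prefix paths.
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    print(parser.can_fetch("Googlebot", "/hello-world/"))     # True
    print(parser.can_fetch("Googlebot", "/wp-admin/"))        # False
    print(parser.can_fetch("SomeOtherBot", "/hello-world/"))  # False
```

Leaving `Allow: /` out of the shared group is deliberate: anything not disallowed is already allowed, and `urllib.robotparser` applies the first matching rule, so a leading `Allow: /` would make it report every path as allowed.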
1. Learn the robots.txt syntax; it's actually simple.
2. Google the user agent of each crawler.
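Last edited by work4seo at 2014-12-29 17:40
I've only written the rules for Google. For each other crawler you want to allow, just copy the block and change the User-agent.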
User-agent: Googlebot
Allow: /
Disallow: /mulu/
Disallow: /*.css$
User-agent: Googlebot-Mobile
Allow: /
Disallow: /mulu/
Disallow: /*.css$
User-agent: *
Disallow: /
Sitemap: http://www.xxx.com/sitemap.xml
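The `Allow: /` plus `Disallow:` combination works for Google because, under RFC 9309 and Google's documented behavior, the longest matching rule wins rather than the first one, and `*` and `$` act as path wildcards (`*` matches any run of characters, a trailing `$` anchors the end). A minimal Python sketch of that matching logic, applied to the Googlebot group above; the test paths are made-up examples.

```python
import re


def pattern_matches(pattern, path):
    """Match a robots.txt path pattern against a URL path (plus query).

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is literal and matched from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


def is_allowed(rules, path):
    """Decide access from one group's rules using longest-match precedence.

    `rules` is a list of ("allow" | "disallow", pattern) pairs. The matching
    rule with the longest pattern wins; on a tie, allow wins. No match means
    the path is allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if pattern_matches(pattern, path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# The Googlebot group from the reply above.
GOOGLEBOT_RULES = [
    ("allow", "/"),
    ("disallow", "/mulu/"),
    ("disallow", "/*.css$"),
]

if __name__ == "__main__":
    print(is_allowed(GOOGLEBOT_RULES, "/hello-world/"))                      # True
    print(is_allowed(GOOGLEBOT_RULES, "/mulu/page.html"))                    # False
    print(is_allowed(GOOGLEBOT_RULES, "/wp-content/themes/demo/style.css"))  # False
    print(is_allowed(GOOGLEBOT_RULES, "/style.css?ver=4.1"))                 # True
```

One consequence worth knowing: WordPress normally loads enqueued stylesheets with a `?ver=` query string, and `$` anchors the end of the full path plus query, so `/*.css$` does not match those URLs.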
Baidu Baike actually explains this in detail...
http://baike.baidu.com/view/9274458.htm#3_1
work4seo posted on 2014-12-29 17:39
I've only written the rules for Google. For each other crawler you want to allow, just copy the block and change the User-agent.
User-agent: Googlebot
Allow: /
And what about Alexa, AOL, Ask, Yahoo, and Bing??
Shouldn't
User-agent: *
Disallow: /
go at the very top? Do later settings override earlier ones?
User-agent: *
Disallow: /
User-agent: Google*
User-agent: Yahoo*
User-agent: msn*
.....
Allow: /
Disallow: /mulu/
Disallow: /*.css$
Sitemap: http://www.eefaq.com/sitemap.xml
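On the ordering question: crawlers that follow RFC 9309 do not read robots.txt top to bottom with later lines overriding earlier ones. A crawler picks the group or groups whose User-agent matches its own product token and ignores every other group, falling back to `User-agent: *` only when nothing matches, so putting the `*` group first or last gives the same result. A minimal Python sketch of that group selection, assuming exact case-insensitive token matching; the two sample files and `SomeOtherBot` are illustrative.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agents, rules) groups.

    Consecutive User-agent lines share one group; a User-agent line that
    follows rule lines starts a new group. Other lines (Sitemap, comments)
    are ignored here.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # rules already seen, so this starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(text, product_token):
    """Return the rules a crawler obeys: all groups naming its token
    (case-insensitive), or the "*" group(s) if none does. The position
    of a group in the file plays no part in the choice."""
    token = product_token.lower()
    groups = parse_groups(text)
    if any(token in agents for agents, _ in groups):
        return [rule for agents, rules in groups if token in agents for rule in rules]
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


STAR_FIRST = "User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nAllow: /\nDisallow: /mulu/\n"
STAR_LAST = "User-agent: Googlebot\nAllow: /\nDisallow: /mulu/\n\nUser-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print(rules_for(STAR_FIRST, "Googlebot"))     # [('allow', '/'), ('disallow', '/mulu/')]
    print(rules_for(STAR_LAST, "Googlebot"))      # same list as above
    print(rules_for(STAR_FIRST, "SomeOtherBot"))  # [('disallow', '/')]
```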
All the replies above are good; I'll just copy them.
That's a lot to write...
OP, if you find a solution, could you share it? :'(:'(:'(