phpBB robots.txt tutorial

 
 
Reply Sat 10 Apr, 2004 10:03 pm
robots.txt is a file that must be placed in the domain's root directory. Search engine spiders will load it and then follow the rules (if they are "good" spiders; spammer spiders and other miscreant spiders will not follow the rules).

robots.txt is basically the best way to keep certain pages from being spidered and indexed by search engines.

The following is an example for phpBB.

Create a file called robots.txt

Then add rules that keep spiders away from the pages you do not want them to crawl.

Here is an example for a phpBB install in a subdirectory called "forums". If you have it installed in a different directory you need to change this.

Code:
User-agent: *
Disallow: /forums/admin/
Disallow: /forums/images/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/common.php
Disallow: /forums/groupcp.php
Disallow: /forums/memberlist.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/profile.php
Disallow: /forums/privmsg.php
Disallow: /forums/viewonline.php
Disallow: /forums/faq.php
Disallow: /forums/updates-topic
Disallow: /forums/stop-updates-topic
Disallow: /forums/ptopic
Disallow: /forums/ntopic
Disallow: /post-


The last few pertain to the Able2Know.com SEO MOD. You might also consider preventing search.php from being spidered.
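
One way to sanity-check rules like these before uploading them is Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: `example.com` and the sample paths are placeholders, only a few of the rules are copied in, and real search engine spiders may interpret a rule differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the example rules (an arbitrary selection for the demo).
RULES = """\
User-agent: *
Disallow: /forums/admin/
Disallow: /forums/posting.php
Disallow: /forums/profile.php
Disallow: /forums/ptopic
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="*"):
    """Return True if the parsed rules let `agent` fetch `path`."""
    # example.com is a placeholder host; only the path is compared to the rules.
    return parser.can_fetch(agent, "http://example.com" + path)


for path in (
    "/forums/viewtopic.php?t=1",
    "/forums/admin/index.php",
    "/forums/profile.php?mode=register",
    "/forums/ptopic123.html",
):
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Each Disallow line is treated as a path prefix here, which is why a single `/forums/profile.php` rule also covers `profile.php?mode=register`.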

zoomsan
Reply Sun 11 Apr, 2004 01:12 pm
hello craven,

what would be the benefit or the consequence to throwing in search.php in that robots.txt file?

thanks!

zm

Craven de Kere
Reply Sun 11 Apr, 2004 04:49 pm
Well,

This type of page:

http://www.able2know.com/forums/search.php?search_author=zoomsan

It can either help or hurt; it depends on your site's "link flow" and a couple of other trivialities.

Usernames can be SE queries, and for that it can help (especially if you have thousands of members, because the variety means a couple of obscure hits on SERPs).

But if you have very context-targeted content in smaller numbers, it might just be seen as dilution of your content by search engines. To tell which it is for your site you need to follow the search engines closely as well as your log files.

zoomsan
Reply Mon 12 Apr, 2004 09:38 am
Sounds great!

I noticed that on the post here you left out:

Disallow: forums/post-*.html$

but you have that in the Mod file ....

what is the significance of this line and should it be included or left out in your opinion?

Thx

zm

Craven de Kere
Reply Mon 12 Apr, 2004 09:49 am
I left it out because that's not vanilla phpBB.

The significance is that all the post URLs are just seen as duplicate content.

zoomsan
Reply Fri 16 Apr, 2004 09:27 am
hello craven,

do you feel there is any reason not to Disallow login.php in the robots.txt file?

Thanks!

zm

BTW, will that link mod be released in the other thread? Or perhaps a separate thread? And do you use the link mod, as I noticed you disallow links completely ... Thanks in advance! And I will PM you a link so you can see your mod in action.

Craven de Kere
Reply Fri 16 Apr, 2004 10:03 am
I see no reason not to disallow login.php on a vanilla install.

Yeah, the links mod is in effect here. See: www.cnn.com

zoomsan
Reply Fri 16 Apr, 2004 01:58 pm
Fantastic .. where should I look for the link mod once it's done?

Also, what is the reason for not having the "/" in front of those files at the end of your robots.txt? I am sure it is something simple but it would be helpful to know.

Thx.

zm

Craven de Kere
Reply Fri 16 Apr, 2004 02:02 pm
Doh, that's a typo.

LocanT
Reply Sun 16 May, 2004 04:42 am
Craven or anyone else

Quote:
Disallow: forums/updates-topic.html*$
Disallow: forums/stop-updates-topic.html*$
Disallow: forums/ptopic*.html$
Disallow: forums/ntopic*.html$


are these file specific?

In other words, if my forum is named myforums, should these be in my robots.txt?

Disallow: myforums/updates-topic.html*$
Disallow: myforums/stop-updates-topic.html*$
Disallow: myforums/ptopic*.html$
Disallow: myforums/ntopic*.html$

Thank you very much in advance

Craven de Kere
Reply Mon 17 May, 2004 11:43 am
If you are using a different directory you will need to change the "forums" part. Basically, you seem to have the right idea.

LocanT
Reply Mon 17 May, 2004 08:42 pm
Craven thanks

On a similar subject

the mod_rewrite part
Quote:
#-----[ OPEN ]------------------------------------------
#

.htaccess

#
#-----[ ADD ]------------------------------------------
#

RewriteEngine On
RewriteRule ^forums.* index.php [L,NC]
RewriteRule ^post-([0-9]*).html&highlight=([a-zA-Z0-9]*) viewtopic.php?p=$1&highlight=$2 [L,NC]
RewriteRule ^post-([0-9]*).* viewtopic.php?p=$1 [L,NC]
RewriteRule ^view-poll([0-9]*)-([0-9]*)-([a-zA-Z]*).* viewtopic.php?t=$1&postdays=$2&postorder=$3&vote=viewresult [L,NC]
RewriteRule ^about([0-9]*).html&highlight=([a-zA-Z0-9]*) viewtopic.php?t=$1&highlight=$2 [L,NC]
RewriteRule ^about([0-9]*).html&view=newest viewtopic.php?t=$1&view=newest [L,NC]
RewriteRule ^about([0-9]*)-([0-9]*)-([a-zA-Z]*)-([0-9]*).* viewtopic.php?t=$1&postdays=$2&postorder=$3&start=$4 [L,NC]
RewriteRule ^about([0-9]*)-([0-9]*).* viewtopic.php?t=$1&start=$2 [L,NC]
RewriteRule ^about([0-9]*).* viewtopic.php?t=$1 [L,NC]
RewriteRule ^about([0-9]*).html viewtopic.php?t=$1&start=$2&postdays=$3&postorder=$4&highlight=$5 [L,NC]
RewriteRule ^mark-forum([0-9]*).html* viewforum.php?f=$1&mark=topics [L,NC]
RewriteRule ^updates-topic([0-9]*).html* viewtopic.php?t=$1&watch=topic [L,NC]
RewriteRule ^stop-updates-topic([0-9]*).html* viewtopic.php?t=$1&unwatch=topic [L,NC]
RewriteRule ^forum-([0-9]*).html viewforum.php?f=$1 [L,NC]
RewriteRule ^forum-([0-9]*).* viewforum.php?f=$1 [L,NC]
RewriteRule ^topic-([0-9]*)-([0-9]*)-([0-9]*).* viewforum.php?f=$1&topicdays=$2&start=$3 [L,NC]
RewriteRule ^ptopic([0-9]*).* viewtopic.php?t=$1&view=previous [L,NC]
RewriteRule ^ntopic([0-9]*).* viewtopic.php?t=$1&view=next [L,NC]


#
#----




The first line
Quote:
RewriteRule ^forums.*


Should this also be changed to (in my case) ^myforums.*

Craven de Kere
Reply Mon 17 May, 2004 08:43 pm
Nope, in fact that is pretty much useless. It's just there in case you want to link to index.php as forums.html

That being said, change it to whatever ya want, it won't be used unless you link to it that way.
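
To see why that first rule only matters when something links to forums.html, the first-match behaviour of a few of the quoted rules can be imitated in Python. This is a rough sketch, not how Apache itself works internally: the `rewrite` helper is hypothetical, only five of the rules are copied, and it assumes the path has already had the forum directory stripped (as happens for rules in a per-directory .htaccess).

```python
import re

# A handful of the rules quoted above, as (pattern, substitution) pairs,
# kept in the same relative order as in the .htaccess excerpt.
RULES = [
    (r"^forums.*", "index.php"),
    (r"^post-([0-9]*).*", "viewtopic.php?p=$1"),
    (r"^about([0-9]*)-([0-9]*).*", "viewtopic.php?t=$1&start=$2"),
    (r"^about([0-9]*).*", "viewtopic.php?t=$1"),
    (r"^forum-([0-9]*).*", "viewforum.php?f=$1"),
]


def rewrite(path):
    """Apply the first matching rule, imitating the [L,NC] flags."""
    for pattern, target in RULES:
        match = re.match(pattern, path, re.IGNORECASE)  # NC: ignore case
        if match:
            # Replace each $N in the target with capture group N.
            return re.sub(
                r"\$(\d)",
                lambda ref: match.group(int(ref.group(1))) or "",
                target,
            )  # L: stop at the first rule that matches
    return path  # no rule matched, so the path is left alone


for path in ("forums.html", "about123.html", "about123-15.html",
             "post-42.html", "forum-3.html", "viewtopic.php"):
    print(path, "->", rewrite(path))
```

Only a request for something starting with "forums" reaches the first rule; the about, post and forum- URLs fall through to their own rules.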

LocanT
Reply Wed 19 May, 2004 04:58 pm
LocanT wrote:
Craven or anyone else

Quote:
Disallow: forums/updates-topic.html*$
Disallow: forums/stop-updates-topic.html*$
Disallow: forums/ptopic*.html$
Disallow: forums/ntopic*.html$


are these file specific?

In other words, if my forum is named myforums, should these be in my robots.txt?

Disallow: myforums/updates-topic.html*$
Disallow: myforums/stop-updates-topic.html*$
Disallow: myforums/ptopic*.html$
Disallow: myforums/ntopic*.html$

Thank you very much in advance


One more question: should these have a / before them?

Example:

instead of
Disallow: myforums/updates-topic.html*$

it would be
Disallow: /myforums/updates-topic.html*$

My forum is not on a subdomain (perhaps it should have been); it is in a folder on the domain.


Thanks

Craven de Kere
Reply Wed 19 May, 2004 05:20 pm
It should have the slash.
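
The effect of the missing slash can be checked with Python's standard-library `urllib.robotparser`, which compares each rule against the URL path as a plain prefix. This is a small sketch under assumptions: the `blocked` helper and `example.com` are made up for the demo, the rule is shortened to a prefix, and other parsers or spiders may be more forgiving.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule, path):
    """Return True if a lone Disallow `rule` blocks `path` for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule])
    # example.com is a placeholder; only the path part is matched.
    return not parser.can_fetch("*", "http://example.com" + path)


url_path = "/myforums/updates-topic12.html"
print("without slash:", blocked("myforums/updates-topic", url_path))
print("with slash:   ", blocked("/myforums/updates-topic", url_path))
```

URL paths always begin with "/", so with prefix matching a rule written without it never lines up with any path.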

strum4life
Reply Tue 25 May, 2004 05:07 pm
Google Adsense and Robots.txt
I have Google Adsense on each page of my forum. The Adsense Bot needs to visit each page to deliver relevant ads. Do you know if your example robots.txt file causes problems with Adsense? Thanks.

Craven de Kere
Reply Tue 25 May, 2004 05:09 pm
It shouldn't, but you may want to allow post URLs, as the bot will want to monetize those as well.
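
One possible way to do that is to give the AdSense crawler its own group in robots.txt. The sketch below tries the idea out with Python's `urllib.robotparser`; `Mediapartners-Google` is the user-agent token AdSense uses, while the rule selection, `example.com`, and the post URL are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the AdSense group omits the post- rule,
# while the catch-all group keeps it.
RULES = """\
User-agent: Mediapartners-Google
Disallow: /forums/posting.php

User-agent: *
Disallow: /forums/post-
Disallow: /forums/posting.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

post_url = "http://example.com/forums/post-123.html"  # placeholder URL
adsense_ok = parser.can_fetch("Mediapartners-Google", post_url)
others_ok = parser.can_fetch("Googlebot", post_url)

print("AdSense crawler may fetch post URL:", adsense_ok)
print("Other spiders may fetch post URL:  ", others_ok)
```

A spider that finds a group naming it uses that group instead of the `*` group, so any rule the AdSense crawler should still obey has to be repeated in its own group.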

AdamStone
Reply Mon 8 Nov, 2004 08:32 am
For some reason, Google still indexes login.php and search.php. If anybody sees something wrong with my robots.txt, let me know. My phpbb files and robots.txt are in the root directory of my site. Thanks :)

Code:
User-agent: *
Disallow: /post-*.html$
Disallow: /updates-topic.html*$
Disallow: /stop-updates-topic.html*$
Disallow: /ptopic*.html$
Disallow: /ntopic*.html$
Disallow: /admin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /common.php
Disallow: /groupcp.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /posting.php
Disallow: /profile.php
Disallow: /profile.php?mode=register
Disallow: /privmsg.php
Disallow: /viewonline.php
Disallow: /faq.php
Disallow: /search.php
Disallow: /search.php?search_id=unanswered
Disallow: /login.php

Craven de Kere
Reply Mon 8 Nov, 2004 10:58 pm
Adam,

Use this link, and have Google remove urls based on the robots.txt exclusions:

http://services.google.com/urlconsole/controller

See if that works.

AdamStone
Reply Wed 10 Nov, 2004 10:11 am
First let me say: Craven, you're awesome! Thanks for all the effort you put into helping us out on these forums.

I tried what you suggested, and here's what Google said:

Quote:
URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
DISALLOW /post-*.html$


Similarly, when I check my robots.txt file with online programs, they pretty much all tell me the same thing - wildcards are a no-no.
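
The same complaint can be reproduced with Python's standard-library `urllib.robotparser`, which, like those checkers, has no wildcard support and treats `*` and `$` as literal characters. This is only a sketch: the `blocked` helper and `example.com` are made up for the demo, and it says nothing about how any particular search engine handles wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule, path):
    """Return True if a lone Disallow `rule` blocks `path` for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule])
    # example.com is a placeholder; only the path part is matched.
    return not parser.can_fetch("*", "http://example.com" + path)


print("wildcard rule:", blocked("/post-*.html$", "/post-123.html"))
print("prefix rule:  ", blocked("/post-", "/post-123.html"))
```

A plain prefix such as `/post-` covers every post URL without needing a wildcard, at least for parsers that match by prefix.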

Now, I don't actually have a problem with Google indexing ntopic or ptopic or any of the additions from your mod_rewrite text, but I'm wondering if there's some inadvertent correlation between that and the problem I'm having.

Something else I just realized - I have this tag inside overall_header.tpl:
Code:
<meta name="robots" content="index,follow">

Could that be telling the search engines to index everything regardless of robots.txt?

Thanks :)