phpBB robots.txt tutorial

 
 
steffun00
 
Reply Fri 4 Aug, 2006 04:54 am
I don't understand. Why wouldn't you want your topics and posts to be indexed by Google?
 
Notebookadvies
 
Reply Fri 11 Aug, 2006 12:23 pm
Yeah, I don't get it anymore. At first I needed to use * (wildcards); a few pages later I don't (and from reading your posts I know why).

So here is my robots.txt:

Disallow: forums/post-
Disallow: /updates-topic.html
Disallow: /stop-updates-topic.html
Disallow: /ptopic
Disallow: /ntopic
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /common.php
Disallow: /groupcp.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /profile.php
Disallow: /privmsg.php
Disallow: /viewonline.php
Disallow: /faq.php
Disallow: /updates-topic
Disallow: /stop-updates-topic
Disallow: /ptopic
Disallow: /ntopic
Disallow: /search.php

I have installed your SEO mod and the phpBB static URLs mod_rewrite 1.0.0 mod.
(By the way, extension.inc was missing from the default install, but I fixed that.)

So is this a good robots.txt for a phpBB forum that runs AdSense?

I know disallowing /search.php is a good one, because AdSense is never going to give relevant results on that page, but what about the others?

(/faq, /admin, /images and /profile are correct to block as far as I know.)
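
One way to sanity-check a file like this is to feed it to a parser and ask about specific URLs. The sketch below uses Python's standard-library urllib.robotparser, which does plain prefix matching and does not understand * wildcards. The helper name allowed, the agent name ExampleBot, and the sample paths are made up for illustration, and real crawlers may behave differently. Under those assumptions it suggests two things: a rule such as /updates-topic already covers /updates-topic.html, so listing both is redundant, and this particular parser ignores Disallow lines that are not preceded by a User-agent line.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# A trimmed-down version of the rules above, with a User-agent line added.
WITH_AGENT = """\
User-agent: *
Disallow: /updates-topic
Disallow: /ptopic
Disallow: /search.php
"""

# The same rules without a User-agent line, as in the posted file.
WITHOUT_AGENT = """\
Disallow: /updates-topic
Disallow: /ptopic
Disallow: /search.php
"""

# Hypothetical sample paths; print the verdict for both variants side by side.
for path in ("/updates-topic.html", "/ptopic-12.html", "/search.php", "/index.php"):
    print(path, allowed(WITH_AGENT, path), allowed(WITHOUT_AGENT, path))
```

If the second column prints True for every path, the rules were never applied, which is a hint that the User-agent line is missing.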
 
Notebookadvies
 
Reply Sun 20 Aug, 2006 07:45 am
Can someone compare it to his or her own robots.txt?

I'd like to know if mine is correct.
 
Notebookadvies
 
Reply Sun 24 Sep, 2006 09:25 am
Still waiting on a response...
 
cmr924
 
Reply Sun 8 Oct, 2006 06:18 pm
First of all, what is the point of blocking the robots from indexing these pages? Wouldn't it be most beneficial to let them index as many pages as possible?

Second, how does this robots.txt file look:
Code:
User-agent: *
Disallow: /forum/admin/
Disallow: /forum/images/
Disallow: /forum/includes/
Disallow: /forum/language/
Disallow: /forum/templates/
Disallow: /forum/common.php
Disallow: /forum/groupcp.php
Disallow: /forum/memberlist.php
Disallow: /forum/modcp.php
Disallow: /forum/posting.php
Disallow: /forum/profile.php
Disallow: /forum/privmsg.php
Disallow: /forum/viewonline.php
Disallow: /forum/faq.php
Disallow: /forum/updates-topic
Disallow: /forum/stop-updates-topic
Disallow: /forum/ptopic
Disallow: /forum/ntopic
Disallow: /forum/post-
Disallow: /forum/updates-topic-
Disallow: /forum/stop-updates-topic-
Disallow: /forum/ptopic-
Disallow: /forum/ntopic-


And an off-topic question: do I put the .htaccess file for the SEO Optimization Mod in my forum directory or in the root of my website?

Thanks for being so helpful!

-Chris
 
cmr924
 
Reply Mon 30 Oct, 2006 03:13 pm
bump
 
Notebookadvies
 
Reply Mon 6 Nov, 2006 03:41 pm
Ha ha... and another bump...
 
GMPCM
 
Reply Sun 4 Mar, 2007 11:08 am
aldemolay wrote:
I am having problems with bots signing up for my forum. Can someone help me out with what exactly I need to do in order to stop them from being able to register? I know very little about server-side code, so as much detail as possible would be helpful. I just recently started messing with the forum part of the website I took over.


Since this seems to be a nice, helpful forum, I thought I would let you in on a little tidbit for preventing bot registrations and posts, which works without a bunch of visual confirmation junk. All you need to do is add the following lines to your Apache conf file:

Edit [Moderator]: Link removed

The first two lines just ensure you have the required modules loaded, which is likely already done but good to double-check. It should be obvious that ALL spam bots use the HTTP 1.0 protocol rather than HTTP 1.1. This will not prevent an actual user from registering and posting spam, as a normal browser's access will be via HTTP 1.1, but that is far less prevalent than the onslaught of spam bot activity we seem to be subjected to now.

The fact is that this is quite elementary, and if you do not know how to configure Apache for something this simple, you likely should not be running an Apache server at this time, since there are far more dangerous exploits that need to be addressed in order to have anything close to a secured server. You should research how to configure Apache for hardened security and implement the proper methods before proceeding to run your server.

If you would like to prevent spam posts from actual users you can do that too, but it is much more involved and may require the AI monitoring software we develop for some servers.

Note: Since the moderators remove important links to text files, and since the filters here do not allow some very important parts of the conf file additions, you are going to have to go without my help. These forums are really nothing but a paranoia circus. My advice: learn how to run a website, how to manage a server, and how to control a forum without making it unusable.
 
georgism
 
Reply Sun 11 Nov, 2007 04:26 am
All good so far, but I don't understand this:

Disallow: /post-*.html$

Why would I disallow post-*.html from being spidered?
What is the point of this?

We all want Google to index our topics and posts, don't we?
 
Robert Gentel
 
Reply Sun 11 Nov, 2007 08:50 pm
The post URLs are redundant to the topic URLs: each one shows content that is already on a topic page. By preventing the spiders from crawling 10 additional URLs per topic page (or more, depending on your pagination settings), you let them spend that time spidering more unique content.
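
To see which URLs a rule like Disallow: /post-*.html$ keeps spiders away from, here is a minimal sketch of the wildcard matching that the major search engines document: * stands for any run of characters and a trailing $ anchors the end of the URL. The function name rule_matches and the sample paths (including /topic-5.html) are made up for illustration, so check them against your own forum's URLs rather than taking them as the mod's actual format.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Check `path` against a robots.txt rule using * and $ wildcard semantics."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each * stand for any run of characters.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # Without a trailing $, a rule is a prefix match, so allow any suffix.
    regex = body if anchored else body + ".*"
    return re.fullmatch(regex, path, flags=re.DOTALL) is not None


RULE = "/post-*.html$"
# Hypothetical sample paths: a post URL, the same URL with a query string,
# and a topic URL that the rule should leave alone.
for path in ("/post-123.html", "/post-123.html?sid=abc", "/topic-5.html"):
    print(path, rule_matches(RULE, path))
```

Note that the $ makes the rule strict: a post URL with a session ID or other query string appended no longer ends in .html, so under these semantics it would not be covered.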
 
georgism
 
Reply Mon 12 Nov, 2007 03:09 am
Okay, I now have a new question.

Here is the robots.txt:
Code:
Disallow: forum/post-*.html$
Disallow: forum/updates-topic.html*$
Disallow: forum/stop-updates-topic.html*$
Disallow: forum/ptopic*.html$
Disallow: forum/ntopic*.html$


When spiders crawl the pages with "www." in front of the URLs, will they index the pages?

There are two variants:

/forum/post....
and
forum/post...

Which one is best?
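
A quick way to compare the two variants is to test them against a parser. The sketch below uses Python's standard-library urllib.robotparser with the wildcards dropped, since that parser only does prefix matching. The helper name blocked, the agent name ExampleBot, and the sample URL are assumptions for illustration. Rule paths are matched against the URL path, which always begins with a slash, so in this parser the variant without the leading slash never matches. Other crawlers may normalize such rules differently, but the leading-slash form is the one the robots.txt documentation describes. The "www." part is not in the rule at all: a robots.txt file applies to the host it is served from, so the same rules apply on the www host as long as the file is reachable there too.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path: str, url_path: str) -> bool:
    """Return True if `Disallow: rule_path` blocks `url_path` for every agent."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("ExampleBot", url_path)


URL_PATH = "/forum/post-123.html"  # hypothetical post URL

# Variant with the leading slash, then the variant without it.
print("/forum/post-", blocked("/forum/post-", URL_PATH))
print("forum/post-", blocked("forum/post-", URL_PATH))
```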
 
 
