This may have been covered before, and if so, I apologize. This thread is getting enormous!
I'm having some problems with my robots.txt file in conjunction with this mod. The intention is to keep all dynamically generated URLs out of the index, including the "viewtopic.php", "search.php", "posting.php", etc. URLs. However, with the robots.txt as you've constructed it, I don't think this is happening.
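Example line: Disallow: /forums/ptopic*.html$
The original robots.txt standard does not allow for wildcards, so crawlers that follow only that standard will either ignore these lines or treat the asterisk as a literal character. Since this is a Google mod, and Google DOES support wildcards, you can leave the asterisks in. Google also supports the dollar sign: it anchors the pattern to the end of the URL, so this line blocks "/forums/ptopic123.html" but not the same URL with a query string appended. Drop the "$" (or replace it with an asterisk) only if you also want to block those longer URLs.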
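Here is a minimal Python sketch of Google-style matching as I understand it from Google's documentation: a rule is a prefix match against the path plus query string, "*" matches any run of characters, and a trailing "$" anchors the end of the URL. The helper name google_rule_matches and the sample URLs are hypothetical; this is not Google's code, and it ignores details such as Allow/Disallow precedence.

```python
import re


def google_rule_matches(pattern: str, url_path: str) -> bool:
    """Hypothetical helper: does a Google-style robots.txt path pattern match url_path?

    '*' matches any sequence of characters, a trailing '$' anchors the match
    to the end of the URL, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # re.match anchors only at the start of the string, which gives prefix semantics.
    return re.match(regex, url_path) is not None


rule = "/forums/ptopic*.html$"
for path in ("/forums/ptopic123.html",
             "/forums/ptopic123.html?highlight=robots",
             "/forums/viewtopic.php?t=123"):
    print(path, google_rule_matches(rule, path))
```

Only the first path matches. Without the "$", the rule goes back to being a plain prefix match, so "/forums/ptopic*.html" would also catch the "?highlight=" variant.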
More importantly, using partial URLs in robots.txt does not seem to screen out dynamically generated URLs, and I don't know why. My robots.txt has this:
Disallow: /forums/posting.php
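...but Google is still happily indexing URLs such as "/forums/posting.php?mode=quote&p=7799". Disallow values are prefix matches, so on paper that line already covers the query-string version; one likely explanation is that Google can still list a URL it has found through links even when robots.txt stops it from fetching the page.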
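A quick way to check what the file says on paper is Python's standard urllib.robotparser module, which implements the original prefix-matching rules (no wildcards). In this sketch www.example.com is a placeholder for the real forum host, and "/forums/viewtopic.php?t=42" is a made-up comparison URL.

```python
from urllib.robotparser import RobotFileParser

# The relevant lines from my robots.txt.
robots_lines = [
    "User-agent: *",
    "Disallow: /forums/posting.php",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse() accepts an iterable of lines

# Disallow is a prefix match, so the query string does not get around it.
blocked = "http://www.example.com/forums/posting.php?mode=quote&p=7799"
allowed = "http://www.example.com/forums/viewtopic.php?t=42"

print(parser.can_fetch("Googlebot", blocked))  # False
print(parser.can_fetch("Googlebot", allowed))  # True
```

If a check like this says the URL is blocked but it still shows up in Google, the file itself is probably fine and the listing is coming from somewhere else, for example links to that URL that Google recorded without fetching the page.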
According to Google's FAQ, to keep Googlebot out of dynamically generated pages (any URL containing a question mark), add this:
User-agent: Googlebot
Disallow: /*?
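Note that you should ONLY do this if you've implemented the URL-rewrite mod; otherwise Googlebot will skip your topic and forum pages, which without the rewrite only exist as "?" URLs.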
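For contrast, a parser that implements only the original standard does not expand wildcards. Python's urllib.robotparser has, as far as I know, no wildcard support, so it reads "/*?" literally and reports the posting URL as fetchable even for Googlebot (Google itself expands the "*"). Crawlers with no matching User-agent group are not restricted by this section at all, which is one reason to keep the rule in a Googlebot-only section. The host www.example.com and the agent name "Slurp" are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Google's suggested rule, scoped to Googlebot only.
robots_lines = [
    "User-agent: Googlebot",
    "Disallow: /*?",
]

parser = RobotFileParser()
parser.parse(robots_lines)

url = "http://www.example.com/forums/posting.php?mode=quote&p=7799"

# robotparser treats '*' as a literal character, so this rule never matches here.
print(parser.can_fetch("Googlebot", url))  # True
# A crawler with no matching User-agent group is not restricted at all.
print(parser.can_fetch("Slurp", url))  # True
```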
To verify your robots.txt is kosher, run it through the validator here:
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
I'm certainly not an expert in this, so all comments are welcome. I started looking into the problem because the mod itself seems to be working fine, yet Google is still indexing every dynamic URL it can find on my site.
Regards,
Foul