robot.txt 101 for dummies?

 
 
tthome
Thu 27 May, 2004 08:50 am
One last question, I promise...

I've heard of this robots.txt file and this .htaccess file that will prevent these spiders from hitting certain pages of the phpBB forum.

Is there a nice and tidy robots.txt for dummies thread somewhere that explains how to set this up and what pages are "recommended" that these spiders not see on phpBB? I'm not a coder, but I'm trying really hard to give my forum at least a fighting chance to get listed on Google and ahead of the others...

I'm still cutting my teeth on this stuff... and my gums are bleeding...

Tim
Type: Discussion • Score: 2 • Views: 10,540 • Replies: 33

 
Monger
Thu 27 May, 2004 11:18 am
Quote:
Is there a nice and tidy robots.txt for dummies thread somewhere that explains how to set this up and what pages are "recommended" that these spiders not see on phpBB?


Craven de Kere wrote up the following, which sounds like just what you're looking for: phpBB robots.txt tutorial


SearchEngineWorld did a basic robots.txt tutorial here: http://www.searchengineworld.com/robots/robots_tutorial.htm

Or try this google search: http://www.google.com/search?q=robots.txt+tutorial
 
tthome
Thu 27 May, 2004 03:38 pm
Thanks, Monger. I put my robots.txt file in the root of my site, which I'm assuming is where I see the following standard listing of folders, right? When I FTP to my site root, I get these folders. Am I supposed to put the robots.txt file here?

/etc
/mail
/public_ftp
/public_html
/tmp
/www
robots.txt
 
Monger
Thu 27 May, 2004 06:30 pm
Yes, robots.txt should go in the root, but what you listed above isn't the root of your site; it's the level above the root. The public_html folder you see there is your website's root directory.
The current location of your robots.txt file is inaccessible to search engine robots (and visitors, for that matter).
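To make the "root" point concrete: a crawler never hunts through your folders for robots.txt; it requests exactly one URL, /robots.txt on your host, and the web server maps that onto the folder it serves (public_html in the FTP listing above). A minimal Python sketch, assuming www.example.com as a stand-in domain and a hypothetical WEB_ROOT constant for the FTP view of public_html:

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

# Hypothetical: how the web root appears over FTP on this host
# (per the thread, public_html is the site's root directory).
WEB_ROOT = PurePosixPath("/public_html")


def robots_url(page_url: str) -> str:
    """Return the single robots.txt URL a crawler checks for page_url's host."""
    # An absolute-path reference replaces the whole path (and drops the query),
    # so every page on the site resolves to http(s)://host/robots.txt.
    return urljoin(page_url, "/robots.txt")


def ftp_path_for(url: str) -> PurePosixPath:
    """Map a URL path onto the directory the web server actually serves."""
    relative = urlsplit(url).path.lstrip("/")
    return WEB_ROOT / relative


if __name__ == "__main__":
    url = robots_url("http://www.example.com/forums/viewtopic.php?t=5")
    print(url)
    print(ftp_path_for(url))
```

Because every URL maps to something inside public_html, a file sitting next to /etc and /mail is not reachable through any URL on this kind of host, which is why crawlers could not see it.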
 
tthome
Tue 8 Jun, 2004 01:06 pm
OK, I'm trying to figure out whether and how Google is crawling my site, so I did some research and found out that I could download these "raw" log files to my computer and view who has accessed the site. I imported the file into Excel and was able to see some very interesting info.

I see that the MSN bot crawls my site, and that it has hit a bunch of /forums/about-*.html files, which I know the SEO mod is supposed to help with.

I also see that there is another bot called GornKer Crawler that has also crawled these same /forums/about-*.html files.

Yet... when I look at what Google has done, it doesn't appear to be doing the same as MSN and GornKer.

Here is my robots.txt file

User-agent: *
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/images/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/common.php
Disallow: /forums/groupcp.php
Disallow: /forums/memberlist.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/profile.php
Disallow: /forums/privmsg.php
Disallow: /forums/viewonline.php
Disallow: /forums/faq.php
Disallow: forums/updates-topic.html*$

Does this look right and what can I do to help google crawl my site more?

Oh please, great CDK and Monger, or whoever knows, please help. I feel I'm on the cusp of search engine acceptance, but I don't have something quite right.

Here is an almost complete listing of what Googlebot touches:

GET /robots.txt HTTP/1.0 (it hits this a lot)
GET / HTTP/1.0
GET /robots.txt HTTP/1.0
GET / HTTP/1.0
GET /robots.txt HTTP/1.0
GET /cal/cal.php HTTP/1.0
GET /robots.txt HTTP/1.0
GET /main.htm HTTP/1.0
GET /contactinfo.htm HTTP/1.0
GET /contactinfo.htm HTTP/1.0
GET /robots.txt HTTP/1.0
GET / HTTP/1.0
GET /HC_web_tou.htm HTTP/1.0
GET /why.htm HTTP/1.0
GET /why.htm HTTP/1.0
GET /computing.htm HTTP/1.0
GET /why.htm HTTP/1.0
GET / HTTP/1.0
GET /aboutus.htm HTTP/1.0
GET /aboutus.htm HTTP/1.0
GET /forums HTTP/1.0 (I want it to tunnel deeper here; this is my forum root)
GET /HC_web_tou.htm HTTP/1.0
GET /HC_web_tou.htm HTTP/1.0
GET /HC_privacy.htm HTTP/1.0
GET /HC_privacy.htm HTTP/1.0
GET /reviews HTTP/1.0
GET /hometheater.htm HTTP/1.0

Am I even close to where I need to be? Or should the link behind GET /forums HTTP/1.0 be replaced with a more direct /forums/index.php link?
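The Excel filtering described in this post can also be scripted. A minimal Python sketch, assuming the raw logs use Apache's "combined" format (the timestamps posted later in this thread match that layout, but the trailing user-agent field is an assumption). The SAMPLE lines are invented for illustration, not taken from the real log, and matching on the "Googlebot" substring only trusts what a bot claims to be:

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def paths_requested_by(lines, agent_substring):
    """Count requested paths for log lines whose user agent contains agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample lines (documentation IPs, made-up requests) showing the format.
SAMPLE = [
    '192.0.2.1 - - [07/Jun/2004:01:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [07/Jun/2004:01:00:05 -0500] "GET /forums HTTP/1.0" 200 1024 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [07/Jun/2004:02:00:00 -0500] "GET /forums/about-1.html HTTP/1.0" 200 9000 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    for path, hits in paths_requested_by(SAMPLE, "googlebot").most_common():
        print(hits, path)
```

Running the same function with "msnbot" or another crawler's name gives the per-bot comparison made above without a spreadsheet.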
 
Craven de Kere
Tue 8 Jun, 2004 01:15 pm
I only skimmed your data, but I don't see what the problem is. If you want Google to spider your forums, well, you need to wait or improve your ranking in their algo.

From what you've mentioned, your setup does not prevent Google from spidering. So if Google's not indexing, it's most likely that they did not rank your site well enough to index any deeper. Maybe they'll hit it next time.

Remember, there are several Google crawls; maybe the deepbot hasn't hit you yet and the freshbot doesn't want to go deeper yet.

Post the Google IPs and I might be able to tell you which bot hit you up.
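One way to double-check that the robots.txt posted above does not block the forum pages is Python's standard urllib.robotparser. A minimal sketch, assuming www.example.com stands in for the real domain and about-1.html is a hypothetical example of the SEO mod's pages. Note that this parser does plain prefix matching and does not understand the * and $ wildcards some engines support:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt file posted above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/images/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/common.php
Disallow: /forums/groupcp.php
Disallow: /forums/memberlist.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/profile.php
Disallow: /forums/privmsg.php
Disallow: /forums/viewonline.php
Disallow: /forums/faq.php
Disallow: forums/updates-topic.html*$
"""

# Hypothetical domain and page names used only for this check.
SITE = "http://www.example.com"
PATHS = ["/forums/", "/forums/index.php", "/forums/about-1.html",
         "/forums/posting.php", "/forums/admin/index.php",
         "/forums/updates-topic.html"]


def build_parser(text):
    """Load rules from a string instead of fetching them with read()."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in PATHS:
        verdict = "allowed" if parser.can_fetch("Googlebot", SITE + path) else "blocked"
        print(f"{verdict:8} {path}")
```

In this check the forum index and the about-*.html pages come back allowed, which agrees with the reading that the setup does not prevent spidering. The last rule lacks a leading slash, so it never matches a path here, and updates-topic.html stays allowed.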
 
tthome
Tue 8 Jun, 2004 01:42 pm
Thanks, CDK... here are the IPs:

64.68.82.55 - - [07/Jun/2004:01:38:03 -0500]
64.68.82.55 - - [07/Jun/2004:01:38:04 -0500]
64.68.82.27 - - [07/Jun/2004:01:58:18 -0500]
64.68.82.27 - - [07/Jun/2004:01:58:18 -0500]
64.68.82.55 - - [07/Jun/2004:02:18:48 -0500]
64.68.82.159 - - [07/Jun/2004:02:30:39 -0500]
64.68.82.13 - - [07/Jun/2004:02:40:19 -0500]
64.68.82.13 - - [07/Jun/2004:02:40:19 -0500]
64.68.82.79 - - [07/Jun/2004:02:53:29 -0500]
64.68.82.135 - - [07/Jun/2004:03:09:46 -0500]
64.68.82.135 - - [07/Jun/2004:03:09:56 -0500]
64.68.82.37 - - [07/Jun/2004:03:11:47 -0500]
64.68.82.55 - - [07/Jun/2004:03:20:02 -0500]
64.68.82.144 - - [07/Jun/2004:03:28:55 -0500]
64.68.82.159 - - [07/Jun/2004:04:34:48 -0500]
64.68.82.165 - - [07/Jun/2004:04:39:54 -0500]
64.68.82.137 - - [07/Jun/2004:04:59:28 -0500]
64.68.82.144 - - [07/Jun/2004:05:01:24 -0500]
64.68.82.144 - - [07/Jun/2004:05:02:05 -0500]
64.68.82.136 - - [07/Jun/2004:05:22:37 -0500]
64.68.82.136 - - [07/Jun/2004:05:22:41 -0500]
64.68.82.137 - - [07/Jun/2004:06:44:40 -0500]
64.68.82.137 - - [07/Jun/2004:07:30:41 -0500]
64.68.82.18 - - [07/Jun/2004:07:37:49 -0500]
64.68.82.144 - - [07/Jun/2004:09:24:46 -0500]
64.68.82.174 - - [07/Jun/2004:13:10:37 -0500]
64.68.82.181 - - [07/Jun/2004:13:50:20 -0500]
64.68.82.27 - - [07/Jun/2004:16:02:13 -0500]
64.68.82.18 - - [07/Jun/2004:16:38:56 -0500]
64.68.82.201 - - [07/Jun/2004:23:32:05 -0500]


I don't know how to increase my rank, to be honest with you.
 
Craven de Kere
Tue 8 Jun, 2004 01:51 pm
All of those IPs are Google's freshbot. It usually just updates pages that are already indexed and deviates from the current index only slightly.
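The identification rests on the addresses themselves: every IP in the list shares the 64.68.82.x prefix. A minimal Python sketch of that sanity check using the standard ipaddress module. Treating 64.68.82.0/24 as the freshbot's range is an assumption drawn only from this log, not an official Google list, and the check cannot by itself tell one Google crawler from another:

```python
import ipaddress
from collections import Counter

# Assumption: the only range this thread ties to the freshbot is the /24
# that every posted address shares.
THREAD_RANGE = ipaddress.ip_network("64.68.82.0/24")

# The first five entries from the log excerpt posted above.
LOG_PREFIXES = [
    "64.68.82.55 - - [07/Jun/2004:01:38:03 -0500]",
    "64.68.82.55 - - [07/Jun/2004:01:38:04 -0500]",
    "64.68.82.27 - - [07/Jun/2004:01:58:18 -0500]",
    "64.68.82.27 - - [07/Jun/2004:01:58:18 -0500]",
    "64.68.82.55 - - [07/Jun/2004:02:18:48 -0500]",
]


def hits_per_ip(lines):
    """Count requests per client IP (the first field of each log line)."""
    return Counter(line.split()[0] for line in lines if line.strip())


def all_in_range(ips, network=THREAD_RANGE):
    """Return True if every address falls inside the given network."""
    return all(ipaddress.ip_address(ip) in network for ip in ips)


if __name__ == "__main__":
    hits = hits_per_ip(LOG_PREFIXES)
    print(hits.most_common())
    print(all_in_range(hits))
```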

If you wait until the next deepcrawl, you might get the deepcrawl bot going deep.

As for increasing your rankings: backlinks.
 
tthome
Tue 8 Jun, 2004 02:02 pm
Any tips on how to increase backlinks? I've heard of all these link farms, but I've also heard to stay away from them. I want to be legit when I do this and not get blackballed...
 
Craven de Kere
Tue 8 Jun, 2004 02:18 pm
Well, the only pure white-hat technique is to sit back and let it happen by itself. People are supposed to be so happy with your site that they start linking to it.

Now, getting there is hard to do. Most of my own sites gained popularity that way, but if they hadn't I'd have been just as happy.

Here are some things you can do:

1) Get into some of the large directories. DMOZ is crucial, as it's syndicated to thousands of sites; getting in there means getting thousands of backlinks. LookSmart is another famous one, but you might have to pay for it (Yahoo is another biggie).

I don't recommend paying unless you know SEO and SEM well; otherwise you will be wasting money.

2) If you already have an established site that is indexed, add backlinks from it to your new site.

3) Contact websites and offer link exchanges.

Those are the basic steps for white-hat techniques. Other techniques include hiring someone to develop backlinks for you.

Black-hat techniques involve:

Forum spam (this pisses people off; we combat it well here).

Guestbook spam. This means spamming guestbooks just to add links. This is so common that search engines are starting to ignore guestbook backlinks.

Referrer spam. This involves sending fake traffic, with your site as the referrer, to a website that publishes its referral traffic. This is a really pointless way of doing it, as the logs change. It only works at a large scale, which is only possible through automation.

Generating doorway pages. This basically means making a fake network of sites that link to yours. It's not worth the effort.

-----

I recommend that you avoid any but the white hat techniques.

Maybe pay someone to give you a backlink just to get you spidered quickly (e.g. I get my new sites spidered within 7 days by putting backlinks on my other sites). But maybe even this isn't worth it, as your initial pages are already indexed.

Google's cache of your forum dates from before you (or Helena, if you hired her) added my mod.

So when Google figures out that it can now index your pages, it might go apeshit and deep-index you.

My own advice is:

1) Get rid of the Flash intro. Using a Flash intro is basically to say "search engines, I do not need you. Go away".

Flash intros only make sense for sites that do not need or benefit from search engine traffic (e.g. graphics sites that do not need SE traffic).

Make the main.htm page the index. And remember, Flash intros are the antithesis of usability, good sense and moral decency.

2) Get rid of the frames. Don't use frames unless you know what you are doing and know what the cost/benefit is.

3) Get rid of the Flash navigation. And remember, Flash navigation is the antithesis of usability and SEO, and a general affront and internet pestilence.

Do that, and in a month or so Google should start indexing you more deeply.
 
tthome
Tue 8 Jun, 2004 02:26 pm
I'll definitely do that... thanks. I'll see what I can do to get some more backlinks.

Tim
 
tthome
Wed 9 Jun, 2004 10:31 am
CDK,

I took your advice this morning and removed the Flash intro from my site, and also added some nav links that match my Flash nav bar. If anyone would be willing to help me change the nav bar on my site from Flash to something more static to help with SEO, I'd appreciate it.

CDK, if you have a minute, could you check it out really quick to see what I've done? I know you'll know if it will work better.

thx for everything,

Tim
 
Craven de Kere
Wed 9 Jun, 2004 10:41 am
Your main logo is not linked.

Otherwise much, much better. Get rid of the Flash nav and you'll be fine.
 
tthome
Wed 9 Jun, 2004 10:48 am
Does having a Flash nav automatically keep Google from checking out the rest of the site, even though I have the same links at the top right in standard HTML?
 
Craven de Kere
Wed 9 Jun, 2004 10:53 am
No, but they index Flash poorly, and navigation should provide a flow for their indexing.

Function over form, right?
 
tthome
Wed 9 Jun, 2004 11:01 am
I would agree with this. I went ahead and linked the logo back to the main index.html page. I also noticed that the "alt" text for the image is my domain name. I assume this is a good thing to have my domain name in the "alt" text of the image?
 
tthome
Wed 9 Jun, 2004 11:08 am
CDK,

I have some other domain names that I have forwarded to my forums. I have these domains masked so they won't show any of the forum URLs.

What would I need to do to best utilize these other 6 domains that I have to promote my forum?

I saw on your able2know.net site that you help with site promotion. If you want, send me a PM on what that would cost me.

The help you've provided so far has been fantastic, and free to boot; I really appreciate it. If you can help boost me up the Google ranks, I'd be willing to pay for it, but like most people here my funds are tight and this side business I'm running isn't allowing me to "back the money truck" up. I'd be paying you out of my own pocket... OK, enough of the sob story.

Anyhow, if you can give me some direction/guidance on how I can use these other domains, I'll do whatever I have to do to promote my main forum.

thx,

Tim
 
Craven de Kere
Wed 9 Jun, 2004 11:10 am
tthome wrote:
I assume this is a good thing to have my domain name in the "alt" text of the image?


Not really (though not bad). See, anyone who knows your domain can already find you. What I'd do is use the main keyword query that you are targeting with your site.
 
Craven de Kere
Wed 9 Jun, 2004 11:16 am
tthome wrote:
I have some other domain names that I have forwarded to my forums. I have these domains masked so they won't show any of the forum URLs.

What would I need to do to best utilize these other 6 domains that I have to promote my forum?


The best thing you can do is make 6 forum installs that run off the same database tables (topics, members, etc.) but with different config tables and templates.

But that's pretty complicated.

Let me explain why your set up is not optimal:

phpBB uses cookies to authenticate. It can't authenticate across domains (for security reasons, you can't access one domain's cookie from another domain; if this were possible, people could steal your passwords and session info).

So basically, users who access the forums from domains other than the one specified in phpBB as the cookie domain will have login difficulties.
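The cookie limitation comes down to a check browsers make before sending a cookie: the host being requested must "domain-match" the cookie's domain. A minimal Python sketch of that rule as RFC 6265 (a cookie standard published after this thread) states it, simplified: it ignores host-only cookies and public-suffix checks. example.com and example.net are hypothetical stand-ins for phpBB's cookie domain and one of the extra domains, assuming the extra domain is what the browser actually requests:

```python
import ipaddress


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Cookie domain-matching as described in RFC 6265, section 5.1.3."""
    host = request_host.lower()
    domain = cookie_domain.lower()
    # A single leading dot in the Domain attribute is ignored (RFC 6265, 5.2.3).
    if domain.startswith("."):
        domain = domain[1:]
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # A host name matches a parent domain only at a label boundary.
        return host.endswith("." + domain)
    return False  # IP addresses must match exactly


if __name__ == "__main__":
    for host in ["example.com", "www.example.com", "example.net", "badexample.com"]:
        print(host, domain_matches(host, "example.com"))
```

A visitor on example.net never sends the example.com session cookie, so phpBB sees them as logged out on every request.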

Quote:
I saw on your able2know.net site that you help with site promotion. If you want, send me a PM on what that would cost me.


A2K net does lots of different promotional services (cost depends, of course, on what you want done). If you are interested, you can use the contact form there to contact me by email.

Quote:
The help you've provided so far has been fantastic, and free to boot; I really appreciate it. If you can help boost me up the Google ranks, I'd be willing to pay for it, but like most people here my funds are tight and this side business I'm running isn't allowing me to "back the money truck" up. I'd be paying you out of my own pocket... OK, enough of the sob story.


I hear ya. And all of my sites have been "out of pocket" things too. This one (Able2Know) used to cost me hundreds and hundreds a month (though now it's breaking even).

Increasing your PageRank would be a piece of cake, but quite frankly I recommend against paying just for PageRank: it's overrated, and because it's overrated you pay much more than it's worth.
 
tthome
Wed 9 Jun, 2004 11:29 am
I'll send you a quick email from your able2know.net site. Maybe you can help me on the things I just can't seem to figure out.

I'm really determined to do what I can on my own, but I know that sometimes it's better to pay a little for help and ease up on the workload and frustration.

Thanks CDK...I'm sure I'll have other questions.
 
Copyright © 2024 MadLab, LLC :: Terms of Service :: Privacy Policy :: Page generated in 0.03 seconds on 05/04/2024 at 11:38:23