Is Google ignoring my robots.txt file?



flyingpylon
01-12-2004, 12:03 PM
Looks like Google might have finally found my new Indy Racing Collectibles site. I have a whopping 42 pages indexed as of right now.

However, when I do the following Google search:

indy site:www.indy-racing-collectibles.com

I'm seeing pages listed that I specifically told it not to index.

The interesting thing is that the pages I did not want indexed show up without any title, description, url, or cached info. Does this just mean that Google knows about them and adds the url to the index but would never display them to someone in search results?

The reason I'm even concerned about this is because this is an AWS site, and I've taken pains to try to keep it somewhat "on topic" to start with until I can better gauge the impact of allowing search engines to run wild over millions of possible urls. If Google ignores the robots.txt file and crawls them anyway, then I might as well just open it up and let them have their way with me.

Mike
01-12-2004, 12:33 PM
Couldn't it only get to those URLs if you were linking to them somewhere?

flyingpylon
01-12-2004, 12:43 PM
Yes, they're linked from other pages. But I don't want them crawled or indexed. Am I misunderstanding what robots.txt is for? Honestly, I've never had a reason to use it on other sites.

s2kinteg916
01-12-2004, 01:11 PM
did you upload the robots file after you submitted to google?

sometimes it takes a while to go into effect...

i also have the same problem: 4 of my pages show PR4 even though they shouldn't be spidered. shouldn't they get a PR0?

michael_gersitz
01-12-2004, 07:31 PM
Disallow: /racing-books/nse/
Disallow: /racing-fan-shop/nse/
Disallow: /racing-magazines/nse/
Disallow: /racing-games/nse/
Disallow: /racing-videos-dvd/nse/
Disallow: /racing-videos-vhs/nse/
Disallow: /about-searching/
Disallow: /powersearches/
Disallow: /about-aws/
Disallow: /about-irc/
Disallow: /contact-irc/
Disallow: /terms-irc/
Disallow: /privacy-irc/
Disallow: /newsletter-sub-irc/


Which ones is it specifically going to?
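
One way to answer that question offline is to run the rules through a parser and see which paths they cover. This is only a rough sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line is an assumption (the post above shows only the Disallow lines), only a few of the rules are repeated, and the page names being tested are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few of the rules quoted above. The "User-agent: *" line is an
# assumption; the post shows only the Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /racing-books/nse/
Disallow: /powersearches/
Disallow: /about-aws/
"""

BASE = "http://www.indy-racing-collectibles.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="Googlebot"):
    """Return True if the rules above forbid `agent` from fetching `path`."""
    return not parser.can_fetch(agent, BASE + path)


# Hypothetical paths, chosen only to exercise the prefix matching.
for path in ["/", "/racing-books/", "/racing-books/nse/some-page.html", "/about-aws/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Note that this only tells you what a well-behaved crawler is permitted to fetch. It says nothing about whether a URL can still show up in a `site:` search as a bare link.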

s2kinteg916
01-12-2004, 09:32 PM
flying, has the ebay affiliate program been converting for you? i sent around 500 targeted clicks and got no bids or registrations at all

flyingpylon
01-13-2004, 07:01 AM
Michael-

Google is going (or did go) to almost all of the directories listed. I'm not so concerned about the bottom half of that list, those are just about/policy pages (though since I have links to them on every page, I was hoping to avoid losing PR). However, the top half of the list are pages that could (if all of the possible links are followed) result in hundreds of thousands of pages being crawled.

s2kinteg916-

It's possible that there was a day or so between the time I submitted to Google and the time I uploaded the robots.txt. But I'm fairly sure that these pages were crawled just within the last day or two. I suppose it's possible that Google hit the home page right away and created a list of urls to visit the next time and then didn't check robots.txt again.

I guess for now I will try not to panic and just see what happens over the next few days or week.

Regarding the eBay program and conversions, it's tough to convert without any traffic! I just launched the site a week ago. I'm not expecting much out of it - it's more just a way to display some pretty cool collectible items on my site, and if I get a few eBay nickels or registrations then that's okay with me. Longer term, I'll have to do some analysis to determine whether it's worth having it there. I'm really just at the very early stages of operating this site and I expect it may be a few years before it's where I want it to be.

Chris
01-13-2004, 09:07 AM
Google will see the links to the pages and add them as URLs it knows about but they shouldn't be listed in normal search results since Google will not actually crawl them.
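
To make the distinction concrete, here is a toy model of that behaviour in Python. It is a sketch under stated assumptions, not a description of how Google's crawler is actually built: the `crawl` function, the tiny link graph, and the `/racing-books/` page are all hypothetical, and the `User-agent: *` line is assumed.

```python
from urllib.robotparser import RobotFileParser


def crawl(start, links, parser, agent="Googlebot"):
    """Walk a toy link graph and return (fetched, known_only).

    `links` maps a URL to the URLs it links to. A disallowed URL is
    still recorded when a link to it is seen, but it is never fetched,
    so the links on that page are never followed.
    """
    fetched, known_only = [], []
    queue, seen = [start], {start}
    while queue:
        url = queue.pop(0)
        if not parser.can_fetch(agent, url):
            known_only.append(url)  # URL is known, content is never read
            continue
        fetched.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return fetched, known_only


SITE = "http://www.indy-racing-collectibles.com"

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /about-aws/"])  # assumed rules

# Hypothetical link graph: the home page links to one allowed page
# and one disallowed page.
links = {SITE + "/": [SITE + "/racing-books/", SITE + "/about-aws/"]}

fetched, known_only = crawl(SITE + "/", links, parser)
print("fetched:", fetched)
print("known but not fetched:", known_only)
```

In this model the `known_only` URLs are the ones that would appear without a title, description, or cached copy, because nothing was ever read from them.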

Best.Flash
01-13-2004, 09:33 AM
Originally posted by Chris
Google will see the links to the pages and add them as URLs it knows about but they shouldn't be listed in normal search results since Google will not actually crawl them.

I know the Toolbar usually causes more confusion than anything ;) With that in mind, I see that robots.txt-blocked pages do display their correct PR. In your opinion, can that be read as meaning they also leak PR from the pages they're linked from?

Chris
01-13-2004, 09:47 AM
It's probably a guessed PR.