Spiders and websites
Hey all,
How do spiders recognize that they are still on the same website? For example, big websites have a lot of servers/IPs, with articles hosted across many of them.
Do the crawlers only look at the URL to see if they are still on the same website?
Regards
darker
04-15-2006, 10:39 AM
I would think so, yes, because one IP can host many different domains as well, so the domain name is what matters.
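To make the "domain name is what matters" idea concrete, here is a minimal sketch of how a crawler might decide whether two URLs belong to the same site by comparing hostnames only, ignoring which IP serves them. The function name `same_site` is made up for illustration; real crawlers may well apply extra rules (for example, treating `www.example.com` and `example.com` as one site), which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def same_site(url_a: str, url_b: str) -> bool:
    """Return True when both URLs carry the same hostname.

    Hypothetical helper: only the hostname is compared, so the IP
    address behind the name plays no role. urlsplit() lowercases the
    hostname, which makes the comparison case-insensitive.
    """
    host_a = urlsplit(url_a).hostname
    host_b = urlsplit(url_b).hostname
    # A relative URL has no hostname, so it can never match here.
    return host_a is not None and host_a == host_b
```

Under this rule, two articles served from different machines still count as the same site as long as their URLs use the same hostname.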
Chris
04-15-2006, 12:39 PM
For the most part though a crawler doesn't care if it is on the same site or not. It is judging pages, not sites.
Would it be possible to fake a renowned website (e.g. cnn.com): write an article about a website I want to promote (with links, of course), and when the crawler sees that page, have a custom httpd display a fake URL confirming that the article is on CNN.com, for example?
So we would be able to have links from any website, as long as we spoof a page URL.
Regards
Masetek
04-15-2006, 05:07 PM
That sort of behavior will eventually lead to your site getting banned.
Chris
04-15-2006, 05:08 PM
That doesn't work; you cannot change where domains resolve to.
Would a spider keep crawling a website after a page that reloads automatically?
Automatic page reloading normally relies on client-side scripting like JavaScript. Since bots don't run these types of client-side scripts, the pages would not reload for them.
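As a rough illustration of why a script-driven reload is invisible to a bot that does not execute JavaScript, the sketch below parses a page with Python's standard `html.parser` and reports only what is present in the markup. The names `RefreshFinder` and `find_refresh` are hypothetical, and this is an assumption about how a simple crawler could behave, not a description of any particular search engine's bot. The one reload instruction such a parser can see is a declarative `<meta http-equiv="refresh">` tag; a `location.reload()` call inside a script is just text to it.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Hypothetical helper: records the content of a meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "refresh":
            self.refresh = attr.get("content")


def find_refresh(html: str):
    """Return the meta refresh content string, or None if there is none."""
    finder = RefreshFinder()
    finder.feed(html)
    return finder.refresh
```

Fed a page whose reload is done with `setTimeout(...)` and `location.reload()`, `find_refresh` returns `None`, because the script body is never executed.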
john190
04-17-2006, 06:54 AM
That is something that I wouldn't consider. It is likely to get your site banned, and it may even damage the site that you are faking, which is then likely to get you into trouble.