How To Stop Content Stealing?



ozgression
01-20-2005, 04:44 PM
What are the best ways to prevent people from using bots/software etc. to steal your website's content and reproduce it elsewhere? I realize I can't stop people copying and pasting every page of my site's content, but I would at the very least like to prevent "mass theft" of my content.

Cheers...

Chris
01-20-2005, 04:59 PM
1. Check User Agents and redirect the bad ones elsewhere.

No legitimate program allows user agent changing; if you can change your user agent, the program is theft-ware. It's like calling trojan horses remote administration tools.

Anyway, for those who do change the user agent, this check can't catch them (unless you can tell their user agent is fake, for instance if it is blank or just says "Mozilla" or something like that).

2. Check your logs for large numbers of requests from one IP, then check who owns the IP and its viewing behavior. If it looks suspicious and is not a real search engine, block it.

3. For repeat offenders with changing IPs, I've blocked entire chunks of IPs.

4. Use a script or something to monitor in real time which IPs are requesting which files, and when a single IP requests too many, block it.
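
Point 2 above can be sketched roughly like this. It is only a minimal sketch, assuming each access-log line starts with the client IP followed by a space (as in Apache's common and combined log formats). The function names, the log path, and the threshold of 500 are hypothetical, not anything from this thread.

```php
<?php
// Sketch only: count requests per client IP from access-log lines.
// Assumption: every line begins with the client IP, then a space.

function countRequestsByIp(array $logLines): array
{
    $counts = [];
    foreach ($logLines as $line) {
        $ip = strtok(trim($line), ' ');
        if ($ip === false || $ip === '') {
            continue; // skip blank lines
        }
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
    arsort($counts); // busiest IPs first
    return $counts;
}

// Return the IPs whose request count exceeds $threshold (a hypothetical cutoff).
function heavyHitters(array $counts, int $threshold): array
{
    $over = array_filter($counts, function ($n) use ($threshold) {
        return $n > $threshold;
    });
    return array_keys($over);
}

// Usage (the path and the cutoff are assumptions):
// $counts = countRequestsByIp(file('/var/log/apache2/access.log'));
// print_r(heavyHitters($counts, 500));
```

This only produces a list of candidates. Checking who owns each IP, and whether it is a real search engine, is still a manual step before blocking anything.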

ozgression
01-20-2005, 05:35 PM
Isn't there a way to ban user-agents/bots in robots.txt or something? Does this also work against unknown spiders etc?

Cheers...

Cutter
01-20-2005, 06:06 PM
Best solution: occasionally check Google for duped content. If you find anything, send a DMCA notice to their host and also to Google.

James
01-20-2005, 06:53 PM
@ozgression+robots.txt Some unknown ones ignore it.

r2d2
01-21-2005, 01:31 AM
Yes, I would have thought that if they were stealing content, they wouldn't really even look at robots.txt, never mind take notice of it.
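
For reference, a robots.txt rule that asks one crawler to stay out of the whole site looks roughly like the fragment below. "BadBot" is a hypothetical user-agent name used only for illustration. As the replies above point out, this is a request that well-behaved crawlers honor, not a block, so a ripper can simply ignore it.

```
User-agent: BadBot
Disallow: /
```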

Xander
01-21-2005, 03:27 AM
You can always check for already copied content with something like copyscape (http://copyscape.com/).

ASP-Hosting.ca
01-21-2005, 08:13 AM
http://copyscape.com/ seems like a good option indeed.

So what exactly do you do when you find a stolen copy of your content? Contact their webhost with a request to remove the stolen material? Get your lawyer to send them a letter with a request to remove the stolen content?

Has anybody actually done this?



Thanks,

Peter

Chris
01-21-2005, 08:15 AM
I have. I sent it to the host, and their site was down for a week or so while they removed it.

ASP-Hosting.ca
01-21-2005, 08:24 AM
Thanks Chris,

Did you use snail mail or email? Did you get a lawyer to draft the letter?



Peter

ASP-Hosting.ca
01-22-2005, 01:06 PM
Do you guys have a sample letter, which can be sent to the host in case of stolen copyrighted content?


Thanks,

Peter

Cutter
01-22-2005, 03:58 PM
Here's an example:

Search for kazaa (http://www.google.com/search?hl=en&q=kazaa&btnG=Google+Search) in Google. Scroll down to the bottom and you will see this message: "In response to a complaint we received under the Digital Millennium Copyright Act, we have removed 6 result(s) from this page. If you wish, you may read the DMCA complaint for these removed results."

You can read the exact letter sent to Google.

Here is Google's information they require for a DMCA notice: http://www.google.com/dmca.html

The ISP has to remove copyrighted content on request of the copyright holder or else they may be liable for it.

Mike
01-23-2005, 02:59 AM
Heh, never seen anything like that before on google.

Xander
01-23-2005, 03:51 AM
I find it strange to target Google to get results taken out. Why not just get the sites themselves removed? Is it illegal to link to a site that contains illegal items/copyright infringements?

r2d2
01-23-2005, 05:59 AM
Interesting, so that's the Kazaa copyright owners complaining about people copying their program? Kind of ironic really...

moonshield
01-23-2005, 12:33 PM
lol, is that what it says? Kazaa complained about copyright violation? lol

incka
01-23-2005, 01:11 PM
Someone had copied my entire debt management guide... Sent them a cease and desist and ordered them to pay a fine for my time.

James
01-23-2005, 01:57 PM
How much per hour did you charge? (if disclosable, of course)

Blue Cat Buxton
01-23-2005, 01:58 PM
Someone had copied my entire debt management guide... Sent them a cease and desist and ordered them to pay a fine for my time.

Did it work?

r2d2
01-23-2005, 02:01 PM
Ah, just checked and found someone has copied the whole of my house buying guide here: http://realestate.bizhat.com/guide/index.php

How do I find out who to email? I have looked at whois.sc: http://www.whois.sc/bizhat.com but couldn't find anywhere to go from there...

ozgression
01-23-2005, 02:54 PM
Did you try the contact form? http://www.bizhat.com/contact.php. Their host appears to be http://www.hosthat.com - contact them as well.

Also, how did you find them?

Cheers...

r2d2
01-23-2005, 03:58 PM
Googled a random bit of text from my site in quotes: http://www.google.com/search?q=%22insurance+against+gazumping.+GA+Property+Services+and+John+Charcol%22&num=100&hl=en&lr=&c2coff=1&safe=off&filter=0

Got my site and this other one, checked it out and saw they have copied the whole site...

Maybe I should just go to the host directly? I don't see why I should just let them take it down; I would rather their host took the whole site down and they had to sort it out...


Hmmm, offending site is 'BizHat', host is 'HostHat'. I guess they might be hosted by themselves... What would you do in this situation?
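
The quoted-phrase search described at the top of this post can be built from any sentence on your site. This is a small sketch. The function name is hypothetical, and the extra parameters from the URL above (num, filter and so on) are left out.

```php
<?php
// Sketch: build a Google exact-phrase search URL for a snippet of your own text.
// Wrapping the phrase in double quotes asks for an exact match.

function exactPhraseSearchUrl(string $phrase): string
{
    return 'http://www.google.com/search?q=' . urlencode('"' . $phrase . '"');
}

// Usage:
// echo exactPhraseSearchUrl('insurance against gazumping.');
// prints http://www.google.com/search?q=%22insurance+against+gazumping.%22
```

Picking a sentence that is unlikely to appear anywhere else keeps the results limited to your own site and any copies of it.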

ASP-Hosting.ca
01-23-2005, 04:30 PM
I would write directly to their dedicated hosting provider.




r2d2
01-23-2005, 04:41 PM
How could I find out who they are?

ozgression
01-23-2005, 04:44 PM
Could this be it - http://www.pwebtech.com ? http://www.whois.sc/65.98.61.140 (it's the IP from the BizHat.com whois.sc results).

Cheers...

r2d2
01-23-2005, 05:13 PM
Ah thanks ozgression, will have to remember that. So would it be best to email pwebtech's abuse email address?

Does anyone know where I could get a template letter from?

ASP-Hosting.ca
01-23-2005, 06:11 PM
Here is what you need:

OrgAbusePhone: +1-973-294-2021
OrgAbuseEmail: abuse@pwebtech.com

Use domainwhitepages.com; it's very good :).

I could use a template email too. Does anybody want to share, Chris maybe?


Peter
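
The abuse contact lines shown in this post can also be pulled out of raw whois output with a few lines of PHP. This is only a sketch. The function names are hypothetical, and querying whois.arin.net on port 43 assumes the IP is registered with ARIN; other registries use different servers and different field names.

```php
<?php
// Sketch: collect "OrgAbuse..." fields (e.g. OrgAbuseEmail) from whois text.

function extractAbuseContacts(string $whoisText): array
{
    $contacts = [];
    foreach (preg_split('/\R/', $whoisText) as $line) {
        if (preg_match('/^(OrgAbuse\w+):\s*(.+)$/', trim($line), $m)) {
            $contacts[$m[1]][] = trim($m[2]);
        }
    }
    return $contacts;
}

// Fetch raw whois text for an IP. The default server is an assumption (ARIN only).
function queryWhois(string $ip, string $server = 'whois.arin.net'): string
{
    $fp = fsockopen($server, 43, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("whois connection failed: $errstr");
    }
    fwrite($fp, $ip . "\r\n");
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? '' : $response;
}

// Usage (needs network access):
// print_r(extractAbuseContacts(queryWhois('65.98.61.140')));
```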

r2d2
01-23-2005, 06:17 PM
Yes, I saw the abuse email address on ozgression's whois link too. That domainwhitepages.com is cool though, will have to bookmark that.

Well, I was feeling nice for some reason, and used their contact page and asked them to remove it or I would take it up with their host... Let's see what happens...

ASP-Hosting.ca
01-23-2005, 06:39 PM
Let us know how it went...

Cutter
01-23-2005, 09:30 PM
The reason you have to go to Google is that sometimes you can't get a host to take the site down. If they are located in the US and they don't, they can get in big trouble. If they are in China or some other country, the laws may be a little bit different.

Chris
01-23-2005, 09:57 PM
No, I just sent an email and fax to their host's legal dept (it was Rackspace).

Chris
01-23-2005, 10:08 PM
You know, just Googling around, it's amazing how many of my articles have been copied....

r2d2
01-24-2005, 05:27 AM
Let us know how it went...

Just checked now, and the index page of the copy of my site now redirects to the homepage: http://realestate.bizhat.com/guide/index.php

Will check email later to see if they said anything...

ASP-Hosting.ca
01-24-2005, 07:26 AM
You know, just Googling around, it's amazing how many of my articles have been copied....

Are you going to deal with all of these? It can be really time consuming to do this on a regular basis...

James
01-24-2005, 08:39 AM
You know, just Googling around, it's amazing how many of my articles have been copied....

I don't think it's very amazing. Your sites are mostly pretty high-profile and easily found in search engines, and the infringers are probably just searching for the same thing your visitors are, then copying. Now, what'd be amazing is someone copying entries from my blog. (Note: it syndicates great big blog (http://greatbigblog.com), so the copyright Google thing would probably show that in the results.) Though in the past I have seen some small sites that have had their content stolen. Not even very high quality content, either.

incka
01-24-2005, 09:14 AM
I think there is some rule about enforcing your copyright or losing it, or maybe that is patents...

Cutter
01-24-2005, 11:00 AM
It's trademarks that you need to enforce.

That's the reason why companies are suing any website that might have their name in its domain, even if the site poses no threat at all to them. If some time in the future they want to get rid of a site that really is threatening their brand, but there are a bunch of other sites using their name in the domain, they will most likely lose the case because they haven't been enforcing their trademark.

moonshield
01-24-2005, 01:29 PM
the content thieves = the scum of the net

seafoam
02-17-2005, 04:56 PM
1. Check User Agents and redirect the bad ones elsewhere.

No legitimate program allows user agent changing; if you can change your user agent, the program is theft-ware. It's like calling trojan horses remote administration tools.

Anyway, for those who do change the user agent, this check can't catch them (unless you can tell their user agent is fake, for instance if it is blank or just says "Mozilla" or something like that).

How would you go about doing this? I've been looking for a way to prevent my site from being snaked by offline browsers/site grabbers.

Chris
02-17-2005, 08:04 PM
I do this:


// Run this before any output is sent, because it may issue a header() redirect.
$agent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');

// Lowercase substrings found in ripper / offline-browser user agents.
// The broad entries also cover longer names: "get" matches wget,
// "rip" matches strip, "track" matches httrack, "teleport" matches teleport pro.
// The old "Wg" and "Wget/" entries were dropped: they could never match
// because $agent is lowercased first.
$badAgents = array(
    "rip", "get", "icab", "lwp-request", "ninja", "reap", "subtract",
    "offline", "xaldon", "ecatch", "msiecrawler", "rocketwriter", "track",
    "teleport", "webzip", "extractor", "lepor", "copier", "disco",
    "capture", "anarch", "snagger", "downloader", "superbot", "block",
    "saver", "webdup", "webhook", "pavuk", "interarchy", "blackwidow",
    "w3mir", "plucker", "naver", "cherry",
);

$isRipper = false;
foreach ($badAgents as $needle) {
    if (strpos($agent, $needle) !== false) {
        $isRipper = true;
        break;
    }
}

if ($isRipper) {
    // Autoban (disabled): append a deny rule for this IP to .htaccess.
    // $LogFile = '/home/aspen0/public_html/.htaccess';
    // $logline = "deny from " . $_SERVER['REMOTE_ADDR'] . "\n";
    // $file = fopen($LogFile, "a");
    // flock($file, LOCK_EX);
    // fwrite($file, $logline);
    // flock($file, LOCK_UN);
    // fclose($file);
    header("Location: http://www.online-literature.com/banned/banned.php");
    exit;
}



The commented-out lines are where I made an autoban, so that if someone used a ripper, then turned it off or tried to change the user agent, they'd be banned in .htaccess.

My .htaccess files got huge though, and many people on shared IPs (like AOL users) were banned.
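
One way to keep an autoban list like this from growing forever is to let bans expire. That is a different technique from the autoban described above, not something from this thread, and the function names and the one-hour lifetime are hypothetical. It does not stop shared IPs from being caught, but it limits how long an innocent user stays banned.

```php
<?php
// Sketch: time-limited bans.
// $bans maps IP => Unix timestamp at which the ban was added.

function pruneExpiredBans(array $bans, int $now, int $ttlSeconds): array
{
    return array_filter($bans, function ($bannedAt) use ($now, $ttlSeconds) {
        return ($now - $bannedAt) < $ttlSeconds;
    });
}

// Render the remaining bans as "deny from" lines, the same rule style
// the commented-out autoban code writes to .htaccess.
function renderDenyRules(array $bans): string
{
    $rules = '';
    foreach (array_keys($bans) as $ip) {
        $rules .= "deny from $ip\n";
    }
    return $rules;
}

// Usage (one-hour lifetime is an assumption):
// $bans = pruneExpiredBans($bans, time(), 3600);
// echo renderDenyRules($bans);
```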

moonshield
02-18-2005, 02:05 PM
that is awesome, thanks a lot.

moonshield
02-22-2005, 04:52 PM
Chris, how does one go about implementing this script? Just Include it? That did not work very well for me.

Chris
02-22-2005, 06:21 PM
If you use the header() redirect, the PHP needs to be processed before any text/HTML is output.
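
A small guard makes that ordering problem visible instead of failing silently. This is a sketch only: the function name is hypothetical, while headers_sent() and header() are standard PHP functions.

```php
<?php
// Sketch: only redirect if no output has been sent yet.
// Include this at the very top of the page, before any HTML,
// whitespace, or echo; otherwise header() cannot send the redirect.

function redirectIfHeadersOpen(string $url): bool
{
    if (headers_sent($file, $line)) {
        // Too late: something was already output at $file:$line.
        error_log("Cannot redirect: output already started at $file:$line");
        return false;
    }
    header('Location: ' . $url);
    return true;
}

// Usage, assuming $isRipper was set by a user agent check:
// if ($isRipper && redirectIfHeadersOpen('http://www.online-literature.com/banned/banned.php')) {
//     exit;
// }
```

If the function returns false, the logged file and line show what was output before the include, which is usually a stray blank line or some HTML above the opening PHP tag.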

moonshield
02-22-2005, 06:30 PM
oh, okay that makes sense. thanks.