View Full Version : Google semantic recognition (or similar)
chromate
01-18-2005, 08:58 AM
I've just seen something very interesting!
I've just noticed that I'm listed #8 for "south beach diet". I haven't been listed for that term since I started my south beach diet site way back at the start of 2004.
This is despite TOTALLY gearing my site for the misspelled term "southbeach diet". I don't think I have a single backlink with the anchor text "south beach diet". The only time the term "south beach diet" appears on the page is once, right at the bottom.
Something else that has happened since the last update is that sites totally optimized for the normal spelling "south beach diet" have risen above me for the term "southbeach diet".
This is REALLY interesting and kinda says to me that Google is almost certainly using some sort of semantic recognition in its ranking algorithms that has associated the two terms as closely linked and meaning the same thing. I can't think of any other reason why this should happen.
Anyway, I'm off to optimize some of the on-page content to try and get a better ranking on the main term, now that I know I have a chance with it. So if you see more "south beach diet" terms, I've added them since writing this.
moonshield
01-18-2005, 01:57 PM
Nice. I have seen this myself.
chromate
01-18-2005, 02:52 PM
wanna elaborate? :)
nohaber
01-18-2005, 03:30 PM
chromate,
that's part of the stemming technology. It's very obvious when you search for "south beach diet" and Google bolds "southbeach". Basically, when a user enters a query, let's say "keyw1 keyw2 ... keywN", Google may choose to weigh in additional words beyond the ones the user typed, and the query turns into something like "(keyw1 OR keyw1variant OR keyw1variant2) keyw2 ..." etc.
Most variants carry less weight, unless they are synonyms. So if the original keyword's weight is 1, a variant's weight can be something like 0.5.
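To make that concrete, here's a toy sketch in Python of how such weighted OR-expansion could score a page (the expansion table and the 0.5 weight are made up for illustration, not Google's actual values):

# Hypothetical expansion table: keyword -> {variant: reduced weight}
VARIANTS = {
    "south": {"southbeach": 0.5},   # assumed weight, for illustration only
}

def expand_query(terms):
    """Turn ["south", "beach", "diet"] into one weighted OR-group per keyword."""
    groups = []
    for term in terms:
        group = {term: 1.0}                   # exact term: full weight
        group.update(VARIANTS.get(term, {}))  # stems/synonyms: reduced weight
        groups.append(group)
    return groups

def score_page(page_terms, groups):
    """For each OR-group, credit the best-weighted variant the page contains."""
    score = 0.0
    for group in groups:
        score += max((w for t, w in group.items() if t in page_terms),
                     default=0.0)
    return score

# A page optimized for the misspelling still collects partial credit:
page = {"southbeach", "diet", "recipes"}
print(score_page(page, expand_query(["south", "beach", "diet"])))  # 1.5

Under this toy model, a page built around "southbeach" still picks up partial credit on a "south beach diet" query, which would explain chromate's #8 ranking.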
I've written a small article on this some time ago here: http://www.seoguide.org/seo201-google-stemming.htm
chromate
01-18-2005, 03:46 PM
Ah right. Yeah, I never actually thought about stemming. I didn't think Google would use stemming to that extent. I thought it only covered short variants like plural/singular, Australia/Australian, etc. I didn't think it would stem across whole words. And indeed, it didn't seem to before, as this particular change has only occurred since the last update.
chromate
01-18-2005, 04:02 PM
Just out of interest, which do you think is more valuable: a) 100 keyword-rich anchor text backlinks providing an exact PR of 5, or b) one backlink providing the exact same PR of 5 and with the exact same anchor text?
i.e., is quantity important?
ozgression
01-18-2005, 04:12 PM
When it comes to anchor text, I believe quantity is important...
moonshield
01-18-2005, 04:13 PM
I have seen it in a lot of cases, but there isn't one in particular that pops into my head at the moment.
But as to the value question, I like to think that the one backlink is better. I'd rather have a nice big PR 5 link than a million stupid links that add up to 5. It's all about simplicity. But then again, if that link isn't stable and you lose it, you're quite up the creek.
chromate
01-18-2005, 04:35 PM
Well, forget the security aspect for now. I just wanted to know if anyone knew for certain whether quantity is important. From personal observation of "online dating" type searches, quantity does seem to be important.
Blue Cat Buxton
01-19-2005, 03:05 AM
For new sites, is there not some evidence that one big link is better and avoids any sandboxing?
Take Chris, for example. His new sites don't seem to suffer from sandboxing - he gets traffic almost immediately, but he links to them from a few high-PR sites (Chris will no doubt correct me if I'm wrong here).
Sites that need lots of links to get the same level of PR don't seem to get traffic as quickly.
This may of course be because Chris' sites just have more PR and so would rank better, or because lots of links take longer to feed into the rankings, but from what I have read on sandboxing, lots of new links may trigger the sandboxing effect.
nohaber
01-19-2005, 06:20 AM
Just out of interest, which do you think is more valuable: a) 100 keyword-rich anchor text backlinks providing an exact PR of 5, or b) one backlink providing the exact same PR of 5 and with the exact same anchor text?
Depends on LocalRank. If the single link has no LocalRank, then option a) is better, though it will take more time to take effect.
Still, anchor text from higher-PR links counts more than anchor text from low-PR links, but the difference is not proportional to the PR (if it were, then considering that nowadays most links do include the keywords, everything would come down to PR). So higher PR gives more weight to anchor text, but not so much more that buying high-PR non-relevant links pays off. And the relevancy of a high-PR link is determined by LocalRank (if it is relevant, then the page that links to you will itself be ranked high, with a good amount of PR).
If all anchors were created equal, then forum sig links would be king. Anchor spamming would also be quite easy.
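To put rough numbers on it, here's a toy Python model of that idea (the log dampening and the LocalRank multiplier are my assumptions, not Google's actual formula):

import math

def anchor_weight(pr, local_rank=1.0):
    # Anchor weight grows with the linking page's PR, but only sublinearly
    # (the log is an assumed dampening), and is scaled by a [0, 1]
    # LocalRank factor modelling how relevant the linking page is.
    return math.log1p(pr) * local_rank

# Option a): 100 low-PR keyword-rich anchors adding up to PR 5.
option_a = 100 * anchor_weight(1)                    # ~69.3
# Option b): one PR 5 link with the same anchor text.
option_b = anchor_weight(5)                          # ~1.8
# A bought, off-topic high-PR link gets cut down by low LocalRank.
option_b_irrelevant = anchor_weight(5, local_rank=0.1)

print(option_a, option_b, option_b_irrelevant)

Under this toy model, the hundred keyword-rich anchors accumulate far more anchor weight than the single link, which is why option a) wins when LocalRank doesn't kick in.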
For new sites, is there not some evidence that one big link is better and avoids any sandboxing?
There's no such thing as sandboxing.
New sites have a hard time getting top rankings for one very simple reason: there are too many pages that contain the keywords (because of the growth of the web and all the SEO). Just a forum with 10,000 posts, all linking to your home page with your keywords, adds 10,000 pages that contain every one of your keywords.
When a user submits a query, Google has to decide which pages "participate" in the calculation of the IR score. So it takes the list of all pages that contain every one of the searched keywords and picks the top N using some form of sorting (PR probably matters a lot here). Google does not care if it misses a page that would otherwise rank #453. It cares about the top ones, and it will rank the minimal number of pages that ensures solid top-20/top-30 SERPs and ignore the rest. Ranking all pages (although possible) is plain dumb, time- and resource-consuming. Bear in mind also that if Google uses LocalRank, LocalRank slows down response time by adding a final re-ranking phase. So, to compensate, Google might have decided to choose the top pages more aggressively in the first phase, meaning fewer and fewer pages get the chance to get in there.
How does Google decide which pages participate? Well, that's something only they can tell. I *guess* it keeps each inverted list sorted by a score based on PageRank and probably the single-keyword IR score. So when you search for, let's say, "search engine optimization", Google gets 3 huge lists:
all pages that contain "search" in the text and anchor texts
all pages that contain "engine" in the text and anchor texts
all pages that contain "optimization" in the text and anchor texts
Originally these lists were sorted by docID for quick merging, but that was when the web was small and people didn't SEO this way. Now, I believe they are presorted by PageRank x IR score (which makes answering single-keyword queries trivial). So the list for "search" contains all pages sorted by their relevance to "search" based on PR x IR score (no LocalRank here). The 3 lists are then scanned from the top down. It is safe to say that pages that should rank high for "search engine" should also be reasonably high up in the separate lists for "search" and "engine". While Google scans these lists, it asynchronously sends all candidates to the index servers, which calculate IR score x PR and return them, also asynchronously. Once a given number of pages is participating in the ranking (let's say 40,000, just as an example), Google stops considering the rest. They are "in the sandbox", in dumb SEO terms. When those pages finally gain more PR and IR score, and Google updates the inverted lists, they suddenly appear in the top 1000, and sometimes at the top of it :)
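Here's a rough Python sketch of that selection idea as I read it (the 40,000 cap, the scoring and the merge details are all guesses on my part, not anything Google has published):

import heapq

MAX_CANDIDATES = 40_000  # illustrative cap, as in the example above

def select_candidates(posting_lists, cap=MAX_CANDIDATES):
    """posting_lists: one list per query keyword, each of (score, doc_id)
    pairs already presorted descending by PR x single-keyword IR score."""
    seen_in = {}      # doc_id -> how many keyword lists it has appeared in
    candidates = []
    # Scan all the presorted lists highest-score-first.
    merged = heapq.merge(*posting_lists, reverse=True)
    for score, doc_id in merged:
        seen_in[doc_id] = seen_in.get(doc_id, 0) + 1
        # A page only participates once it has shown up in every list,
        # i.e. it contains all the query keywords.
        if seen_in[doc_id] == len(posting_lists):
            candidates.append(doc_id)
            if len(candidates) >= cap:
                break  # everything below this point never gets ranked
    return candidates

The point is just that everything below the cutoff never even enters the IR scoring, which from the outside looks exactly like a "sandbox".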
Sorry for the rant. To understand these things, just read Google's prototype paper carefully.
And yes, a high-PR link will get you into the top 1000 more quickly.
chromate
01-19-2005, 11:16 AM
There's no such thing as sandboxing.
New sites have a hard time getting top rankings for one very simple reason: there are too many pages that contain the keywords (because of the growth of the web and all the SEO).
I used to be skeptical about the whole sandbox thing until I found it hard to get new sites listed, from about halfway through 2004. I never had that problem before then, even on very competitive keywords such as "dating services". This is something that was obviously noticed by a lot of people all around the same time (March 2004, I think). So, unless there was a massive increase in competing pages for almost all competitive keywords around March 2004, I don't think your explanation suffices, to be honest.
From a lot of accounts, even pages with reasonably high PR (~7) would rank very low or not at all for a competitive keyword and then suddenly shoot up the SERPs after about 6 months.
I'm not saying that I think sandboxing in the common sense exists. But I do at least believe something has happened that affects how fast a site gets ranked on competitive keywords.
(I haven't read your post completely and fully understood it yet, but I will!) :)
Edit: Just read your post and see you offer an explanation of sorts. Interesting.
moonshield
01-19-2005, 01:59 PM
Maybe we should not do SEO... has anyone tried doing that?