12/21/2005

How Google Works

Google engineer Matt Cutts recently wrote an article for "Google's Newsletter for Librarians." The article, entitled "How does Google collect and rank results?", offers an excellent outline of Google's methodologies, which are in fact similar to those of other major search engines.

Below is a snippet of the article that focuses on relevancy and the importance of inbound links:

Now we have the set of pages that contain the user's query somewhere, and it's time to rank them in terms of relevance. Google uses many factors in ranking. Of these, the PageRank algorithm might be the best known. PageRank evaluates two things: how many links there are to a web page from other pages, and the quality of the linking sites. With PageRank, five or six high-quality links from websites such as www.cnn.com and www.nytimes.com would be valued much more highly than twice as many links from less reputable or established sites. But we use many factors besides PageRank. For example, if a document contains the words "civil" and "war" right next to each other, it might be more relevant than a document discussing the Revolutionary War that happens to use the word "civil" somewhere else on the page. Also, if a page includes the words "civil war" in its title, that's a hint that it might be more relevant than a document with the title "19th Century American Clothing." In the same way, if the words "civil war" appear several times throughout the page, that page is more likely to be about the civil war than if the words only appear once.
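
As an aside to the quoted passage, the link-counting idea can be sketched in a few lines of Python. This is the textbook power-iteration form of PageRank run on a made-up link graph, not Google's production system. The page names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: six filler pages link to two hubs, which makes the
# hubs "reputable". Each hub links to target_a. Four obscure pages that
# nobody links to each link to target_b.
toy_links = {f"filler{i}": ["hub1", "hub2"] for i in range(1, 7)}
toy_links.update({"hub1": ["target_a"], "hub2": ["target_a"]})
toy_links.update({f"obscure{i}": ["target_b"] for i in range(1, 5)})

ranks = pagerank(toy_links)
for page in ("target_a", "target_b"):
    print(page, round(ranks[page], 4))
```

In this toy graph, `target_a` ends up with a higher score than `target_b` even though it has half as many inbound links, because its two links come from pages that are themselves well linked.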

As a rule, Google tries to find pages that are both reputable and relevant. If two pages appear to have roughly the same amount of information matching a given query, we'll usually try to pick the page that more trusted websites have chosen to link to. Still, we'll often elevate a page with fewer links or lower PageRank if other signals suggest that the page is more relevant. For example, a web page dedicated entirely to the civil war is often more useful than an article that mentions the civil war in passing, even if the article is part of a reputable site such as Time.com.
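
To make the trade-off between relevance and reputation concrete, here is a rough Python sketch. It scores a page on the three on-page hints the article mentions (query words appearing next to each other, appearing in the title, and appearing repeatedly), then blends that with a reputation number. The function names, weights, and the 0 to 10 reputation scale are hypothetical; the article does not say how Google actually combines these signals.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens, terms):
    """Count how often `terms` appear next to each other, in order."""
    n = len(terms)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms)


def relevance_score(query, title, body):
    """Toy relevance score; every weight below is an illustrative assumption."""
    terms = tokenize(query)
    body_hits = count_phrase(tokenize(body), terms)
    score = 0.0
    if body_hits > 0:
        score += 2.0                  # query words appear right next to each other
    score += 0.5 * min(body_hits, 5)  # repeated mentions help, up to a cap
    if count_phrase(tokenize(title), terms) > 0:
        score += 3.0                  # the phrase appears in the title
    return score


def combined_score(relevance, reputation, relevance_weight=0.7):
    """Blend relevance with a link-based reputation score (weights assumed)."""
    return relevance_weight * relevance + (1.0 - relevance_weight) * reputation


# A page devoted to the topic, on a site with a modest reputation.
dedicated = relevance_score(
    "civil war",
    "The Civil War: A Timeline",
    "The civil war began in 1861. The civil war ended in 1865.",
)
# A passing mention on a highly reputable site.
passing = relevance_score(
    "civil war",
    "19th Century American Clothing",
    "Fashion changed after the civil war.",
)
# Both words appear, but never next to each other.
unrelated = relevance_score(
    "civil war",
    "The Revolutionary War",
    "The Revolutionary War disrupted civil life in the colonies.",
)

print(dedicated, passing, unrelated)
print(combined_score(dedicated, reputation=3.0))
print(combined_score(passing, reputation=9.0))
```

With these made-up weights, the dedicated page outranks the passing mention despite its lower reputation score. When two pages tie on relevance, the one with the higher reputation score comes out ahead.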