According to a filing with the U.S. Patent and Trademark Office, patent 7,158,961, Google is working on deploying a "similarity-engine." This similarity-engine compares documents and websites for redundancy.
A common problem for search engine users is receiving similar results during a search. Most website results returned will have either identical information or "roughly the same" information. With a similarity-engine in place, Google will be able to return the most relevant information while hiding or discarding repetitive data.
Google's patent filing claims:
From the search engine's perspective, one problem in cataloging the large number of available web pages is that multiple ones of the web documents are often identical or nearly identical. Separately cataloging similar documents is inefficient and can be frustrating for the user if, in response to a request, a list of nearly identical documents is returned. Accordingly, it is desirable for the search engine to identify documents that are similar or "roughly the same" so that this type of redundancy in search results can be avoided.
Google's similarity-engine project is not particularly earth-shattering. According to earlier reports, IBM, Hitachi and Visage Inc. are among the companies that have filed for similar inventions. In fact, more than 15 patents for similarity-engines have been filed over the last 10 years.
According to Google, the similarity-engine will be based on creating vectors and calculating their differences and sums. Using hashes and what Google calls "sketches," its engine will be able to compare differences in text as well as images. The similarity-engine will take an object, create a vector for it, and compare the vector to that of another object.
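To give a rough idea of how hashes, vector sums and sketches can fit together, here is a minimal Python sketch. It is an illustration only, not the method claimed in the filing: it assumes a generic bit-voting scheme in which every element of an object is hashed to 64 bits, each bit votes +1 or -1 into a vector of counters, and the signs of the summed vector become the object's sketch. The function names (`element_hash`, `sketch`, `similarity`, `shingles`), the 64-bit width and the three-word shingles are all hypothetical choices made for this example.

```python
import hashlib
from typing import Iterable, List, Tuple

BITS = 64  # assumed sketch width; the article does not give one


def element_hash(element: object) -> int:
    """Hash one discrete element to a 64-bit integer (illustrative choice of hash)."""
    data = repr(element).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(elements: Iterable[object]) -> int:
    """Sum +1/-1 votes per bit over all elements, then keep only the signs."""
    counts = [0] * BITS
    for element in elements:
        h = element_hash(element)
        for bit in range(BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, total in enumerate(counts):
        if total > 0:
            result |= 1 << bit
    return result


def similarity(a: int, b: int) -> float:
    """Fraction of sketch bits on which two objects agree (1.0 means identical sketches)."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / BITS


def shingles(text: str, width: int = 3) -> List[Tuple[str, ...]]:
    """Split text into overlapping word groups, used here as the discrete elements."""
    words = text.lower().split()
    return [tuple(words[i:i + width]) for i in range(max(1, len(words) - width + 1))]


if __name__ == "__main__":
    page_a = "Google is working on a similarity engine that compares documents"
    page_b = "Google is working on a similarity engine which compares web documents"
    page_c = "The weather this weekend is expected to be sunny and warm"
    print(similarity(sketch(shingles(page_a)), sketch(shingles(page_b))))
    print(similarity(sketch(shingles(page_a)), sketch(shingles(page_c))))
```

Because `sketch` accepts any sequence of elements, the same functions could be fed spreadsheet rows or image tiles instead of word groups. Two sketches that agree on nearly all bits suggest near-duplicate objects. The cut-off for treating two results as "roughly the same" would have to be tuned and is not specified here.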
Further into Google's filing, the search giant also describes the use of its similarity-engine in other applications. Besides web documents, the engine can be used to compare regular text documents, spreadsheets, presentations and other commonly used office productivity data.
"The concepts described could also be implemented based on any object that contains a series of discrete elements," the filing emphasized.
Source: DailyTech