Personalizing Search Results at Google



One thing most SEOs are aware of is that search results at Google are sometimes personalized for searchers, but it’s not something that I’ve seen much written about. When I came across a patent about personalizing search results, I wanted to dig in and see whether it could give us more insights.

The patent was an updated continuation patent, and I really like to look at these, because it is possible to compare changes to claims from an older version, to see if they could provide some details of how processes described in these patents have changed. Sometimes changes are spelled out in amazing detail, and sometimes they focus upon different concepts which may be in the original version of the patent, but weren’t necessarily focused upon so much.

One of the last continuation patents I looked at was one from Navneet Panda, in the post, Click a Panda: High Quality Search Results based on Duplicate Clicks and Visit Duration. In that one, we saw a shift in focus to involve more user behaviour data, like repeat clicks from the same user on a site, and the duration of a visit to a site.

Personalizing search results
Inventors: Paul Tucker
Assignee: GOOGLE INC.
US Patent: 9,734,211
Granted: August 15, 2017
Filed: February 27, 2015

Abstract

A system receives a search query from a user and performs a search of a corpus of documents, depending on the search query, to form a ranked set of search results. The system re-ranks the set of search results based on preferences of the user, or a group of users, and provides the re-ranked search results to the user.

The older version of this patent is Personalizing search results, which was filed on September 16, 2013, and was granted on March 10, 2015.

A continuation patent has rewritten claims that reflect how the patented process might have changed, while keeping the filing date of the original version of the patent.

I enjoy comparing the claims, because that is what generally changes in continuation patents. I noticed some significant changes from the older version to this newer version.

There is far more focus on “high quality” sites and “distrusted” documents in the new version of the patent, which can be seen in the first claim of the patent. It’s worth putting the old and the new first claims one after the other, and comparing the two.

The Old First Claim

1. A method comprising: identifying, by at least one of one or more server devices, a first set of documents related to a user, documents, in the first set of documents, being assigned weights which reflect a comparative quantification of an interest of the user in the documents in the first set of documents; receiving, by at least one of the one or more server devices, a search query from a client device connected to the user; identifying, by at least one of the one or more server devices and depending on the search query, a second set of documents, each document from the second set of documents having a respective score; ascertaining, by at least one of the one or more server devices, that a specific document, from the second set of documents, matches or links to one of the documents in the first set of documents; adjusting, by at least one of the one or more server devices, the respective score of the specific document, to form an adjusted score, based on the weight assigned to the one of the documents in the first set of documents; forming, by at least one of the one or more server devices, a list of documents in which documents from the second set of documents are ranked based on the respective scores, the specific document being ranked in the list based on the adjusted score; and providing, by at least one of the one or more server devices, the list of documents to the client device.
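To make the steps in that older claim easier to follow, here is a minimal Python sketch. It is only an illustration, not Google's implementation: the `Result` class, the `personalize` function, and the choice to multiply a score by the weight are all assumptions, since the claim only says the adjustment is "based on" the weight.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """A search result: the 'second set' of documents in the claim."""
    url: str
    score: float                                       # score from ordinary ranking
    links_to: set[str] = field(default_factory=set)    # outbound links


def personalize(results, user_weights):
    """Adjust scores using the user's weighted 'first set', then rank.

    user_weights maps a document URL to a weight reflecting the user's
    interest in it. Multiplying by the weight is an assumed adjustment.
    """
    adjusted = []
    for result in results:
        score = result.score
        # A result qualifies if it matches, or links to, a weighted document.
        related = user_weights.keys() & ({result.url} | result.links_to)
        for url in related:
            score *= user_weights[url]
        adjusted.append((score, result.url))
    # Rank by the (possibly adjusted) score, highest first.
    ranked = sorted(adjusted, key=lambda pair: pair[0], reverse=True)
    return [url for _, url in ranked]
```

A result that links to a page the user favors can move above results that originally scored higher, which is the behavior the claim describes.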

The New First Claim

This is newly granted this week:

1. A method, comprising: determining, by at least one of one or more server devices, preferences of a user or a group of users, wherein the preferences indicate a document bias set and weights assigned to the documents, wherein the weights comprise distrusted document weights; ascertaining, by at least one of the one or more server devices, a high quality document set obtained from a document ranking algorithm; creating, by at least one of the one or more server devices, an intersection set of documents including documents in both the document bias set and the high quality document set; receiving, by at least one of the one or more server devices, a search query from the user; performing, by at least one of the one or more server devices, a search of a corpus of documents, depending on the search query, to form a ranked set of search result documents; determining, by at least one of the one or more server devices, at least one link from the intersection set of documents to at least one document in the ranked set of search result documents, the at least one document not in the intersection set of documents; re-ranking, by at least one of the one or more server devices, the set of search result documents depending on the preferences of the user or the group of users, wherein re-ranking the set of search results comprises: identifying a link of the set of links from the intersection set of documents to the document of the set of search result documents, and based on identifying the link, adjusting a rank of the search result document depending on the weight assigned to the document in the document bias set from which the identified link originated; and providing, by at least one of the one or more server devices, the re-ranked search results to the user.

The changes I am seeing between these two first claims involve what are being called “distrusted document weights” from a “document bias set”, and showing pages from “a high quality document set.” The more recent claim makes it clearer that personalized results come from these two different sets of documents. It’s possible that it does not change how personalization really works, but the added clarity is good to see.
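The flow in the newer claim can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the patented implementation: the `rerank` function, the data shapes, the use of weights below 1.0 to stand in for “distrusted document weights”, and the multiplication used to adjust a score are all hypothetical choices made for the example.

```python
def rerank(ranked_results, links, bias_weights, high_quality):
    """Re-rank results using links from the intersection set.

    ranked_results: list of (url, score) pairs, best first.
    links: dict mapping a source URL to the set of URLs it links to.
    bias_weights: dict of URL -> weight for the document bias set;
        values below 1.0 model distrusted documents (assumed encoding).
    high_quality: set of URLs produced by a document ranking algorithm.
    """
    # Documents that are in both the bias set and the high quality set.
    intersection = bias_weights.keys() & high_quality

    reranked = []
    for url, score in ranked_results:
        # The claim covers results that are not in the intersection set.
        if url not in intersection:
            for source in sorted(intersection):
                if url in links.get(source, set()):
                    # Assumed adjustment: scale by the linking document's weight.
                    score *= bias_weights[source]
        reranked.append((url, score))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


results = [("news.example", 1.0), ("blog.example", 0.9), ("shop.example", 0.8)]
weights = {"trusted.example": 1.5, "distrusted.example": 0.5, "obscure.example": 3.0}
quality = {"trusted.example", "distrusted.example", "other.example"}
outlinks = {
    "trusted.example": {"blog.example"},
    "distrusted.example": {"news.example"},
    "obscure.example": {"shop.example"},
}
personalized = rerank(results, outlinks, weights, quality)
```

In this made-up data, `obscure.example` carries a large weight but has no effect, because it is not in the high quality set and so never makes it into the intersection set.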

The Purpose of these Personalizing Search Results Patents

We’re told that some sites are favored more than others, and some are disfavored more than others, and that these preferences are collected from a query or browser history to create a document bias set:

FIG. 1 illustrates an overview of the re-ranking of search results based on a user’s or group’s document or site preferences. In accordance with this aspect of the invention, a document bias set F 105 could be generated that indicates the user’s or group’s preferred and/or disfavored documents. Bias set F 105 may be automatically collected from a query or browser history of a user. Bias set F 105 may also be generated by human compilation, or editing of an automatically generated set. Bias set F 105 may include a set of documents shared, or developed, by a group that may further include a community of users of common interest. Document bias set F 105 may include one or more designated documents (e.g., documents a, b, x, y and z) with associated weights (e.g., w.sup.a.sub.F, w.sup.b.sub.F, w.sup.x.sub.F, w.sup.y.sub.F and w.sup.z.sub.F). The weights could be assigned to each document (e.g., documents a, b, x, y and z) according to a user’s, or group’s, relative preferences among documents of bias set F 105. For instance, bias set F 105 may include a user’s personal most-respected, or most-distrusted, document list, with the weights being assigned to each document in bias set F 105 according to a comparative quantification of the user’s preference among each of the documents of the set.

This mention of a document bias set appears in both the older and the more recent versions of this patent.

The patents also both refer to a high quality document set, and that is described in a way that seems to place a lot of focus on PageRank or a Hubs and Authority approach to rank:

A high quality document set L 110 can be obtained from any existing document ranking algorithm. Such document ranking algorithms might include a link-based ranking algorithm, such as, for instance, Google’s PageRank algorithm, or Kleinberg’s Hubs and Authorities ranking algorithm. The document ranking algorithm may provide a global ranking of document quality which might be used for ranking the results of searches performed by search engines. High quality document set L 110 could be derived from the highest-ranking documents in the web as ranked by an existing document ranking algorithm. In one implementation, for instance, set L 110 may include the top percentage of the documents globally ranked by an existing document ranking algorithm (e.g., the highest ranked 20 percent of documents). In an implementation with PageRank, set L 110 may comprise documents having PageRank scores greater than a threshold value (e.g., documents with PageRank scores greater than 10,000,000). Set L 110 may include multiple documents (e.g., documents m, n, o, p, x, y and z) with associated weights (e.g., weights W.sup.m.sub.L, W.sup.n.sub.L, W.sup.o.sub.L, W.sup.p.sub.L, W.sup.x.sub.L, W.sup.y.sub.L and W.sup.z.sub.L). The weights could be assigned to each document (e.g., documents m, n, o, p, x, y and z) based on a relative ranking of “quality” between the various documents of set L 110 produced from the document ranking algorithm.
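The two ways of choosing set L that the passage mentions, a top percentage of documents or a score threshold, might look something like this in Python. The function names and the shape of the `scores` dictionary are assumptions made for illustration, and the scores themselves would come from whatever ranking algorithm is in use.

```python
import math


def top_fraction(scores, fraction=0.20):
    """Return the highest-scoring fraction of documents (e.g., the top 20%).

    scores maps a document URL to its global ranking score.
    """
    keep = math.ceil(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def above_threshold(scores, threshold):
    """Return the documents whose score is greater than a threshold value."""
    return {doc for doc, score in scores.items() if score > threshold}
```

Either function returns a plain set of documents, which is all that is needed to intersect it with a document bias set.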

Personalized results served to a searcher are results that come from both the document bias set, and the high quality document set (as the patent claims, by an “intersection” between the two sets).

If you are considering how personalized search may work at Google, spending some time with this new patent may provide some insights. Knowing about how two different sets of files are involved in returning results is a good starting point.
