Citations behind the Google Brain Word Vector Approach

In October of 2015, members of the Google Brain team announced a new algorithm, described in the Search Engine Land article “Meet RankBrain: The Artificial Intelligence That’s Now Processing Google Search Results.” One of the Google Brain team members who gave Bloomberg News a long interview on RankBrain, Gregory S. Corrado, is a co-inventor on a patent that was granted this August, along with other members of the Google Brain team.

The SEM Post article “RankBrain: Everything We Know About Google’s AI Algorithm” tells us that RankBrain uses theories from Geoffrey Hinton involving thought vectors. The summary in the patent’s description tells us how a word vector approach might be used in such a system:

Embodiments of the subject matter can be implemented to realize at least one of the following advantages. Unknown words in sequences of words can be predicted if the surrounding words are known. Words surrounding a known word in a sequence of words can be predicted. Numerical representations of words in a vocabulary of words can be easily and effectively generated. The representations can reveal relationships and syntactic and semantic similarities between the words that they represent.

By using a word prediction system with a two-layer architecture and by parallelizing the training process, the word prediction system can be effectively trained on very large word corpuses, e.g., corpuses that contain on the order of 200 billion words, leading to higher quality numerical representations than the ones that are obtained by training systems on relatively smaller word corpuses. Further, words can be represented in very high-dimensional spaces, e.g., spaces which have on the order of 1000 dimensions, leading to higher quality representations than when words are represented in relatively lower-dimensional spaces. The time needed to train the word prediction system can be reduced.
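To make the point about representations revealing similarities between words a little more concrete, here is a minimal sketch of how two word vectors might be compared. Cosine similarity is a common choice for this, but the patent summary above does not name a specific measure, so treat it as my assumption. The three-dimensional vectors and the function names are made up for illustration; they are not taken from the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A trained system would learn vectors with far more dimensions from text.
toy_vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def most_similar(word, vectors):
    """Return the other word whose vector points in the closest direction."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(most_similar("car", toy_vectors))
```

With these toy numbers, "car" lands closer to "automobile" than to "banana" because their vectors point in nearly the same direction.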

So, an ambiguous or incomplete query that contains a few words could use those words to predict missing words that may be related. Those predicted words could then be used to return search results that the original words might have difficulties returning. The patent which describes this prediction process is:

Computing numerical representations of words in a high-dimensional space

Inventors: Tomas Mikolov, Kai Chen, Gregory S. Corrado and Jeffrey A. Dean
Assignee: Google Inc.
US Patent: 9,740,680
Granted: August 22, 2017
Filed: May 18, 2015


Methods, systems, and apparatus, including computer programs, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing every word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
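The steps in that abstract can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not the patent's implementation: it predicts a word from the average of its neighbors' vectors, uses a plain softmax classifier over the whole vocabulary, and trains with simple gradient descent on a single machine. The function name `train_word_vectors` and all of its parameters are hypothetical.

```python
import math
import random

def train_word_vectors(sentences, dim=8, window=2, epochs=200, lr=0.1, seed=0):
    """Train an embedding table and a softmax classifier to predict a word
    from the average of its neighbors' vectors; return word -> vector."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    # Embedding function parameters: one vector of length dim per word.
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # Classifier parameters: one weight vector of length dim per output word.
    out = [[0.0] * dim for _ in range(size)]

    for _ in range(epochs):
        for s in sentences:
            ids = [index[w] for w in s]
            for pos, target in enumerate(ids):
                ctx = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
                if not ctx:
                    continue
                # Layer 1: average the vectors of the surrounding words.
                h = [sum(emb[c][d] for c in ctx) / len(ctx) for d in range(dim)]
                # Layer 2: softmax scores over every word in the vocabulary.
                scores = [sum(out[v][d] * h[d] for d in range(dim)) for v in range(size)]
                top = max(scores)
                exps = [math.exp(x - top) for x in scores]
                total = sum(exps)
                # Gradient of the cross-entropy loss with respect to each score.
                errs = [exps[v] / total - (1.0 if v == target else 0.0) for v in range(size)]
                # Gradient for the averaged vector, computed before 'out' changes.
                grad_h = [sum(errs[v] * out[v][d] for v in range(size)) for d in range(dim)]
                for v in range(size):
                    for d in range(dim):
                        out[v][d] -= lr * errs[v] * h[d]
                for c in ctx:
                    for d in range(dim):
                        emb[c][d] -= lr * grad_h[d] / len(ctx)
    # Associate each word in the vocabulary with its trained vector.
    return {w: emb[index[w]] for w in vocab}

# Tiny made-up corpus, for illustration only.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vectors = train_word_vectors(sentences)
print(sorted(vectors))
```

A softmax over the full vocabulary is fine for a handful of words, but it would be slow for a large vocabulary, which is one reason papers on hierarchical language models show up in the citations below.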

One of the things I found really interesting about this patent was that it includes a number of citations. They looked worth reading: several were co-authored by the inventors of the patent, and others were written by people who are well known in the field of artificial intelligence or by people from Google. As soon as I saw them, I started hunting on the Web for them, and I was able to find copies of them. I will be reading through them, and I thought it would be helpful to share those links; that was the idea behind this article. It might be beneficial to read as many of these as possible. Let us know what you found interesting, if anything stands out to you.

Bengio and LeCun, “Scaling learning algorithms towards AI,” Large-Scale Kernel Machines, MIT Press, 41 pages, 2007. cited by applicant.

Bengio et al., “A neural probabilistic language model,” Journal of Machine Learning Research, 3:1137-1155, 2003. cited by applicant.

Brants et al., “Large language models in machine translation,” Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Language Learning, 10 pages, 2007. cited by applicant.

Collobert and Weston, “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning,” International Conference on Machine Learning, ICML, 8 pages, 2008. cited by applicant.

Collobert et al., “Natural Language Processing (Almost) from Scratch,” Journal of Machine Learning Research, 12:2493-2537, 2011. cited by applicant.

Dean et al., “Large Scale Distributed Deep Networks,” Neural Information Processing Systems Conference, 9 pages, 2012. cited by applicant.

Elman, “Finding Structure in Time,” Cognitive Science, 14, 179-211, 1990. cited by applicant.

Huang et al., “Improving Word Representations via Global Context and Multiple Word Prototypes,” Proc. Association for Computational Linguistics, 2012. cited by applicant.

Mikolov and Zweig, “Linguistic Regularities in Continuous Space Word Representations,” submitted to NAACL HLT, 6 pages, 2012. cited by applicant.

Mikolov et al., “Empirical Evaluation and Combination of Advanced Language Modeling Techniques,” Proceedings of Interspeech, 4 pages, 2011. cited by applicant.

Mikolov et al., “Extensions of recurrent neural network language model,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528-5531, May 22-27, 2011. cited by applicant.

Mikolov et al., “Neural network based language models for highly inflective languages,” Proc. ICASSP, 2009, 4 pages. cited by applicant.

Mikolov et al., “Recurrent neural network based language model,” Proceedings of Interspeech, 4 pages, 2010. cited by applicant.

Mikolov et al., “Strategies for Training Large Scale Neural Network Language Models,” Proc. Automatic Speech Recognition and Understanding, 2011. cited by applicant.

Mikolov, “RNNLM Toolkit,” Faculty of Information Technology (FIT) of Brno University of Technology [online], 2010-2012 [retrieved on Jun. 16, 2014]. Retrieved from the Internet: , 3 pages. cited by applicant.

Mikolov, “Statistical Language Models based on Neural Networks,” PhD thesis, Brno University of Technology, 133 pages, 2012. cited by applicant.

Mnih and Hinton, “A Scalable Hierarchical Distributed Language Model,” Advances in Neural Information Processing Systems 21, MIT Press, 8 pages, 2009. cited by applicant.

Morin and Bengio, “Hierarchical Probabilistic Neural Network Language Model,” AISTATS, 7 pages, 2005. cited by applicant.

Rumelhart et al., “Learning representations by back-propagating errors,” Nature, 323:533-536, 1986. cited by applicant.

Turian et al., “MetaOptimize / jobs / wordreprs /” [online], captured on Mar. 7, 2012. Retrieved from the Web using the Wayback Machine: , 2 pages. cited by applicant.

Turian et al., “Word Representations: A Simple and General Method for Semi-Supervised Learning,” Proc. Association for Computational Linguistics, 384-394, 2010. cited by applicant.

Turney, “Measuring Semantic Similarity by Latent Relational Analysis,” Proc. International Joint Conference on Artificial Intelligence, 2005. cited by applicant.

Zweig and Burges, “The Microsoft Research Sentence Completion Challenge,” Microsoft Research Technical Report MSR-TR-2011-129, 7 pages, Feb. 20, 2011. cited by applicant.
