Google’s search for meaning

by danjo

Sometimes it is hard to decide whether to take things seriously or just say ‘cool!’ and let it go. This article in New Scientist definitely balances on that line.

But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is.

To do this, it needs to build a word tree – a database of how words relate to each other. It might start with any two words to see how they relate to each other. For example, if it googles “hat” and “head” together it gets nearly 9 million hits, compared to, say, fewer than half a million hits for “hat” and “banana”. Clearly “hat” and “head” are more closely related than “hat” and “banana”.

To gauge just how closely, Vitanyi and Cilibrasi have developed a statistical indicator based on these hit counts that gives a measure of a logical distance separating a pair of words. They call this the normalised Google distance, or NGD. The lower the NGD, the more closely the words are related.
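
The excerpt does not spell out the formula, so here is a rough sketch of how such a distance could be computed from hit counts, following the definition in Cilibrasi and Vitanyi's published paper: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f is a hit count and N is the number of indexed pages. Only the two pair counts (roughly 9 million and half a million) come from the article. The single-word counts and the value of N below are made-up placeholders, and the function name `ngd` is just illustrative.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalised Google distance from raw hit counts.

    fx, fy -- hits for each word searched alone
    fxy    -- hits for both words searched together
    n      -- total number of pages indexed (an estimate)
    """
    # If any count is zero the logarithm is undefined; treat the
    # words as infinitely far apart (a convention assumed here).
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical values: NOT from the article, chosen only for illustration.
N = 8_000_000_000
HAT = 100_000_000
HEAD = 500_000_000
BANANA = 50_000_000

# Pair counts as reported in the article (rounded).
hat_head = ngd(HAT, HEAD, 9_000_000, N)
hat_banana = ngd(HAT, BANANA, 500_000, N)

print(f"NGD(hat, head)   = {hat_head:.3f}")
print(f"NGD(hat, banana) = {hat_banana:.3f}")
```

With these placeholder numbers, “hat” and “head” come out with a lower distance than “hat” and “banana”, which matches the article's point. Real values would depend on the actual single-word counts and on the index size at the time of the search.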