Monday, February 7, 2011

Collective Intelligence: How to Find a Needle in a Haystack

I came across "Does Bing copy Google Search? the Evidence". For me the timing was perfect: I was rereading my "collective intelligence" books for project-related work.

Programming Collective Intelligence: Building Smart Web 2.0 Applications
Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)

Google turned out to be simply lucky with their algorithm.

Anyone interested in collective intelligence inevitably comes across Google's PageRank algorithm, Amazon's recommendations, or the Netflix Prize.
Google's PageRank is very similar to finding the most relevant paper on a topic by looking for the most cited scholarly paper on that topic.
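To make the citation analogy concrete, here is a rough sketch of the PageRank idea as a simple power iteration. This is my own toy version, not Google's actual implementation: the function name `pagerank`, the damping factor of 0.85, the fixed 50 iterations, and the four-paper citation graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it cites."""
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page that cites nothing spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: papers A, B and D all cite C; C cites A.
citations = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A"]}
ranks = pagerank(citations)
most_cited = max(ranks, key=ranks.get)
print(most_cited, ranks)
```

In this made-up graph the most cited paper, C, comes out on top, and A outranks B and D because its single citation comes from the highly ranked C. That second effect is what separates this scheme from a plain citation count.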
Are all patterns interesting?
There is no guarantee that applying data mining techniques will actually produce meaningful results. That is the problem with machine learning: not all patterns are interesting.
That raises the question: what is an interesting pattern?
The continuum of Data, Information, Knowledge, and Wisdom is a familiar concept, but it is subjective. Look at the following analysis in Data Mining:
This raises some serious questions for data mining. You may wonder, "What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?"
To answer the first question, a pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.
Subjective interestingness measures find patterns interesting if they are unexpected and actionable.
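Of the four criteria, (2) is the one that is easiest to check mechanically. Here is a minimal sketch, assuming a pattern is an "if a basket contains X, it also contains Y" rule and using the rule's confidence (the fraction of baskets with X that also contain Y) as a stand-in for "degree of certainty". The baskets, the rule, and the 0.6 threshold are all invented for illustration.

```python
def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    matching = [b for b in baskets if antecedent <= b]  # <= is the subset test on sets
    if not matching:
        return 0.0
    return sum(1 for b in matching if consequent <= b) / len(matching)


# Hypothetical data: the rule is found on `train` and then checked on `test`.
train = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
test = [{"bread", "butter"}, {"bread", "jam"}, {"bread", "butter", "eggs"}, {"eggs"}]

rule = ({"bread"}, {"butter"})  # "bread implies butter"
train_conf = confidence(train, *rule)
test_conf = confidence(test, *rule)

# Assumed threshold: the pattern counts as "valid on new data" if it still
# holds in at least 60% of the relevant test baskets.
holds_on_new_data = test_conf >= 0.6
print(train_conf, test_conf, holds_on_new_data)
```

Nothing comparable exists for the other three criteria. Whether a pattern is understandable, useful, or novel depends on who is looking at it.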
Here is a relevant paper 
If you look at how interestingness is defined, it is all subjective. There was no reason why Google's algorithm should necessarily give the best results. Of all the algorithms out there, nothing guaranteed that the results of the PageRank algorithm would satisfy the interestingness criteria for searching web pages by keyword.
I think they just got lucky!

It is 1% inspiration, 99% perspiration
I strongly believe that innovation is like an iceberg. A publicly spectacular innovation is supported by an invisible foundation of infrastructure and raw repositories of ideas and enthusiasm. These books come closest to my conception of innovation.
Origins of Genius: Darwinian Perspectives on Creativity
Where Good Ideas Come From: The Natural History of Innovation

The reason Google is Google is their infrastructure:
Google File System
Google MapReduce
Straight from the horse's mouth
Here is a series of lectures by Google







