Sunday, February 27, 2011

Estimates? What estimates? Stupidity of software estimates.

I have to confess: I hate software estimates. Every now and then the issue crops up. I run into a bunch of idiots who believe software estimates have profound intrinsic meaning and that their job as a manager really means asking their direct reports for estimates and tracking them. They are generally still high on the management Kool-Aid they drank in their latest training, which tells them their job is to command, control (and coerce if need be). I am not talking about sizing software in general, but specifically about the obsessive-compulsive micro-managing kind who insist developers can and should give estimates for all the work they do.
The most dangerous kind have read something about Scrum and agile somewhere and just can't wait to drive a project so that they can claim in their upcoming review how they used all that goodness to make tons of money for the company. Make no mistake - their mind is still filled with the project management gunk they acquired over the years. Their tiny CPU has no capacity to process philosophical ideas, so they feel insecure and panicky. The result? The concept of a complex adaptive process never registered.
With estimates in their hand they are back in control again. Their self-worth is somehow still mysteriously tied to being able to 'drive' things - and indeed drive they do - drive me crazy.

Fortunately the world need not be so depressing - sometimes common sense prevails. Read this must-read article.
I can't be as funny as Linus. But I still hate software estimates (and the idiots who ask for them). So I have to speak - it is high time to de-program ourselves from all the HR and management school jargon and see the naked reality.

Naked Reality for the Long Term Planning 
Detailed estimates are irrelevant to deciding headcount cost. Nor can marketing predict the exact number of units sold.
Businesses make money by selling stuff and to make stuff they have to invest in equipment and hire people. If you know the cost then you can decide the price per unit, calculate profitability, prioritize work. Or so the theory goes ...
At first the argument seems reasonable. But underneath it is all hocus-pocus. Detailed developer estimates have nothing to do with the cost of developing software.
Suppose you are working on a software product and need to estimate costs for a year.  
  • Software is a labour-intensive industry. Headcount is a major cost. Salaries vary, and not just by skill level but for many other reasons. Sometimes people are tough negotiators, or you are simply in a tough spot and need someone badly. You must account for all that variation. Say 15% plus or minus. Sounds reasonable? Most would say yes. But wait - have you realized what that means? If you are calculating the cost for one man-year, your cost estimate is already off by about 8 working weeks, or two months (15% of 52 weeks). Now add overhead per person - office space, subscriptions, machines. Another source of variability in costs that you cannot precisely predict.
  • Now turn to marketing - how many units do they think they are going to sell? They won't give you a precise number. They will vary the price, and sales change month to month and state by state. There are sales commissions. Then there are promotions and marketing expenditure. Another big source of imprecision.
  • Now, software platforms move fast: there is iPhone, Android, tablets, Windows Phone 7 and RIA: the focus was Silverlight (I think it is HTML5 now, wait, they changed it back). You have Flash, but too bad Apple banned it. How about wasted effort on JavaFX? Ajax? jQuery? REST or SOAP? Then there is the cloud. And again we need integration with the Facebooks and Twitters. Point: you can't predict which technology you'll have to use a year from now. You don't know. How are you going to estimate?
  • So do you need 7 developers or 8? You say 8. Okay! Now choose. 8 = 2 Sr. SDEs and 6 juniors? Or 8 = 1 Sr. SDE, 1 architect, 3 web developers, 1 DBA, 2 server-side devs? Or one of ten more such choices...
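The salary arithmetic in the first bullet can be checked in a few lines. This is only an illustrative sketch: the 15% figure is the one used above, while the 52-week year and the function name are assumptions made for the example.

```python
# Illustrative arithmetic only. The 15% variation comes from the text;
# the 52-week year and the helper name are assumptions for this sketch.
WEEKS_PER_YEAR = 52


def uncertainty_in_weeks(man_years: float, variation: float) -> float:
    """Size of a +/- `variation` cost band, expressed in working weeks."""
    return man_years * WEEKS_PER_YEAR * variation


one_dev = uncertainty_in_weeks(1, 0.15)
team_of_8 = uncertainty_in_weeks(8, 0.15)

print(f"1 man-year:  +/- {one_dev:.1f} weeks")
print(f"8 man-years: +/- {team_of_8:.1f} weeks")
```

For a team of eight the band is already more than a man-year wide, before any developer has estimated a single task.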
Do you really believe your sweating over precise developer estimates matters in any way? Seriously - think about it. You have to paint the picture using a very, very broad brush.
So for long-term planning (6 months to 2 years) developer estimates and precise specs are irrelevant.
You say you don't control the headcount and don't define the marketing strategy. Fair enough! As long as you don't claim that your gathering and tracking of estimates is helping the big picture, I am okay.
Let's move on.           

Naked Reality for the Medium Term Planning  
You do not need to know all the details to estimate.
Now we are talking about the medium term: 1 to 6 months. You know how many people you have and what technology you need. You also have a fairly good idea of the features. Your manager is asking when you'll be done and able to release. What do you do next? Gather your developers and ask them to write up a design document and estimate every individual task? No! You don't need to do that. Let's go there step by step.
  • First, let me make sure you know that an estimate is not a single number. You are supposed to give a range of possible values. For practical purposes the beta distribution is used in PERT, critical path analysis, etc. We use 3 numbers [best case, worst case and normal]. Generally not every task will take its worst-case time; things generally average out. But don't be fooled into believing that you can just add up individual estimates to come up with the final estimate.
  • Next is understanding that we build software in components, and our tasks can reflect that. For example: we can generally tell that our software will need a web service, a database, a few stored procedures, a couple of web pages and a little bit of Ajax. We also know we need to implement batch processing with XML. We can describe the architecture of our software at a high level and make a guess at how complex each component will be without really going into too many details.
  • Now use something similar to function points to make a guesstimate, but only at a higher level of abstraction.
  • We should estimate in developer weeks. This way we can plan out how many short iterations we will need.
  • It is best to use a variant of the Delphi method to come up with better estimates.
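The three-number idea from the first step can be sketched with the usual PERT approximations: expected = (best + 4 × normal + worst) / 6 and standard deviation = (worst - best) / 6. The component names and week counts below are made up for illustration, and treating the components as independent (so that their variances add) is an assumption, not something a real project guarantees.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """A three-point estimate in developer weeks."""
    best: float
    normal: float
    worst: float

    @property
    def expected(self) -> float:
        # Standard PERT weighted mean.
        return (self.best + 4 * self.normal + self.worst) / 6

    @property
    def std_dev(self) -> float:
        # Standard PERT approximation of the spread.
        return (self.worst - self.best) / 6


# Hypothetical components and numbers, purely for illustration.
components = {
    "web service": ThreePoint(2, 3, 6),
    "database + stored procedures": ThreePoint(1, 2, 4),
    "web pages + Ajax": ThreePoint(1, 2, 5),
    "XML batch processing": ThreePoint(2, 4, 9),
}

total_expected = sum(c.expected for c in components.values())
# Assuming independent components, variances add; standard deviations do not.
total_std_dev = math.sqrt(sum(c.std_dev ** 2 for c in components.values()))
sum_of_worst = sum(c.worst for c in components.values())

print(f"expected total:     {total_expected:.1f} dev weeks")
print(f"rough spread (1 sd): {total_std_dev:.1f} dev weeks")
print(f"sum of worst cases: {sum_of_worst:.1f} dev weeks")
```

The combined range comes out well below the sum of the worst cases, which is the "things average out" point: the total is a range built from the component ranges, not a simple sum of single numbers.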
See! For medium-term planning you don't need a detailed design. You will be able to create a rough schedule without knowing the internals of each component, as long as you have a good idea of how complex each component is and can map that to dev weeks.

Naked Reality for the Short Term Planning
On a day-to-day basis software estimates are meaningless because we can't track anything with them. Don't sweat over it.
We spend a huge amount of time in an unstructured way - I would say at least 1/3 of the time each week - most of it learning and reading stuff.
Here are some examples. 
  • Understanding a new system: reading tutorials
  • Discovering APIs and trying out a small solution
  • Code review
  • Installing tools 
  • Software upgrade
  • Refactoring code
The list can go on and on. 
Here is another list: why our tasks themselves sometimes change
  • We discover issues with our approach
  • We hit blocking issues and need to work around them

Naked Reality for your Job Planning
If you are tracking projects and communicating status, then as a manager you are overhead.
Your salary as a manager comes from a business which in turn makes money by selling stuff.  
Not only do you yourself not produce any of that stuff, you also do not shape the vision. You won't write any code and can't make important decisions. Technical people interview your new hires, your manager fights for the budget. Do not give me that "growing and developing people" crap. Your reports are smarter than you and need a solid technical person to be a mentor - not you.
Possibly decent project management software and a self-organizing team should replace you. (Status reports can be generated automatically.)
That will save lots of money. 
If you are worried about saving your job then start adding some unique value.

Conclusion : I hate software estimates because they are meaningless and useless.

Monday, February 7, 2011

Collective Intelligence: How to Find a Needle in a Haystack

I came across "Does Bing copy Google Search? the Evidence". For me the timing was perfect. I was re-reading my "collective intelligence" books for project-related work.

Programming Collective Intelligence: Building Smart Web 2.0 Applications
Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)

Google turned out to be simply lucky with their algorithm.

Anyone interested in collective intelligence inevitably comes across Google's PageRank algorithm, Amazon's recommendations or the Netflix Prize.
Google's PageRank is very similar to the idea of finding the most relevant paper on a topic by searching for the most cited scholarly paper on that topic.
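To make the citation analogy concrete, here is a minimal sketch of the simplified textbook form of the idea, computed by power iteration. It is not Google's production algorithm; the tiny citation graph, the paper names and the function name are made up, and the damping factor of 0.85 is just the commonly quoted default.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each page (or paper) to the list of pages it links to
    (or cites). Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Everyone gets a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: A, B and D all cite C; C cites A.
citations = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A"]}
ranks = pagerank(citations)

for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{paper}: {score:.3f}")
```

The most cited paper, C, comes out on top, and A scores well above B and D only because the highly ranked C cites it. A citation from an important source counts for more than one from an obscure source.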
Are all patterns interesting?
Now there is no guarantee that applying data mining techniques will actually give you any meaningful data. That is the problem with machine learning. Not all patterns are interesting.
That raises the question: what is an interesting pattern?
The idea of a continuum of Data, Information, Knowledge and Wisdom is a familiar concept. But the concept is subjective. Look at the following analysis in Data Mining:
This raises some serious questions for data mining. You may wonder, "What makes a pattern interesting? Can a data mining system generate all of the interesting patterns? Can a data mining system generate only interesting patterns?"
To answer the first question, a pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.
Subjective interestingness measures find patterns interesting if they are unexpected and actionable.
Here is a relevant paper 
Now if you look at how interestingness is defined, it is all subjective. There was absolutely no reason why Google's algorithm should necessarily give the best results. Once again: of all the algorithms out there, there was no reason why the results of the PageRank algorithm should necessarily satisfy the interestingness criteria for searching web pages by keyword.
I think they just got lucky!

It is 1% inspiration, 99% perspiration
I strongly believe that innovation is like an iceberg. A publicly spectacular innovation is supported by an invisible foundation of infrastructure and raw repositories of ideas and enthusiasm. These books come by far the closest to my conception of innovation.
Origins of Genius: Darwinian Perspectives on Creativity
Where Good Ideas Come From: The Natural History of Innovation

The reason why Google is Google is their infrastructure.
Google File System, Google MapReduce
Straight from the horse's mouth
Here is a series of lectures by Google