Limitations of Hadoop
• Hadoop MapReduce and HDFS are still under active development, so APIs and behavior keep changing.
• The programming model is very restrictive: the lack of shared, central state makes many algorithms awkward to express.
• Joins of multiple datasets are tricky and slow: there are no indices, so the entire dataset is often copied across the network during the shuffle.
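The cost described above comes from the reduce-side join pattern: every record of both datasets is tagged, shuffled by the join key, and recombined in the reducer. A minimal plain-Python simulation of the map/shuffle/reduce phases (not actual Hadoop code; the `users`/`orders` data is invented for illustration):

```python
from collections import defaultdict

# Hypothetical inputs: (user_id, name) and (user_id, item) records.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

def map_phase(users, orders):
    # Tag each record with its source table so the reducer can
    # tell the two sides apart after they are mixed in the shuffle.
    for uid, name in users:
        yield uid, ("U", name)
    for uid, item in orders:
        yield uid, ("O", item)

def reduce_phase(pairs):
    # Shuffle: group all tagged values by the join key.
    # Note that EVERY record travels here, index or not.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: cross-product the two sides for each key.
    joined = []
    for uid, values in sorted(groups.items()):
        names = [v for tag, v in values if tag == "U"]
        items = [v for tag, v in values if tag == "O"]
        for name in names:
            for item in items:
                joined.append((uid, name, item))
    return joined

result = reduce_phase(map_phase(users, orders))
# → [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'lamp')]
```

Because there is no index, the full contents of both inputs are materialized and moved during the "shuffle" step, which is exactly why such joins are slow at scale.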
• Cluster management is hard: operations such as debugging, distributing software, and collecting logs across many nodes are laborious.
• There is still a single master, which requires careful administration, is a single point of failure, and may limit scaling.
• Managing multi-job workflows is not trivial when intermediate data must be kept between jobs.
• Optimal node configuration is not obvious, e.g. the number of mappers and reducers, or per-task memory limits.
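A sketch of where such knobs live, assuming Hadoop 2.x-style property names and a hypothetical `wordcount.jar` job; the values shown are illustrative assumptions, not recommendations (the mapper count is not set directly but falls out of the input split size):

```shell
# Tuning knobs passed at submission time; optimal values depend on
# the workload and hardware, which is exactly the configuration problem.
hadoop jar wordcount.jar WordCount \
  -D mapreduce.job.reduces=8 \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.reduce.memory.mb=4096 \
  input/ output/
```

Getting these wrong in either direction hurts: too few reducers serializes the reduce phase, too many produces tiny output files and scheduling overhead, and undersized memory limits cause task failures.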