Sqoop – Data Transfer Tool Between Relational Databases and HDFS

Sqoop is a tool for transferring data between relational databases and Hadoop. It became a top-level Apache project in March 2012. It can be used, for example, to import data from a relational or NoSQL database and populate tables for use with Hive or HBase.

Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop. Couchbase also provides a Couchbase-Hadoop connector by means of Sqoop.
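
As a rough illustration of the kind of import described above, the Python sketch below assembles a `sqoop import` command line. The helper name `build_sqoop_import`, the JDBC URL, the table name, and the paths are all hypothetical. The flags are standard Sqoop 1 import options, but check them against the Sqoop version you have installed. The sketch only builds and prints the command; it does not run Sqoop.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username,
                       password_file, num_mappers=4, hive_import=False):
    """Return the argument list for a `sqoop import` invocation (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the database password
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # HDFS directory for the imported files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if hive_import:
        # Also load the imported data into a Hive table.
        args.append("--hive-import")
    return args


# Hypothetical values, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/user/etl/orders",
    username="etl_user",
    password_file="/user/etl/.db_password",
    hive_import=True,
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could later be handed to `subprocess.run` without shell quoting problems.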

Limitations of Hadoop

• Hadoop MapReduce and HDFS are still under active development.

• The programming model is very restrictive: the lack of central data can be a hindrance.

• Joins of multiple datasets are tricky and slow: there are no indices, and often the entire dataset gets copied in the process.

• Cluster management is hard: operations such as debugging, distributing software, and collecting logs are difficult across a cluster.

• There is still a single master, which requires care and may limit scaling.

• Managing job flow isn’t trivial when intermediate data should be kept.

• The optimal configuration of nodes is not obvious, e.g. the number of mappers, the number of reducers, and memory limits.
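
To make the point about joins more concrete, here is a minimal pure-Python sketch of a reduce-side join, one common way to join two datasets in the MapReduce model. It does not use Hadoop itself; the function names and the sample data are hypothetical. Note that every record from both inputs is emitted and regrouped by key, because there is no index to look up only the matching rows.

```python
from collections import defaultdict


def map_phase(users, orders):
    """Tag every record with its source and emit it under the join key."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, amount in orders:
        yield user_id, ("order", amount)


def shuffle(pairs):
    """Group all emitted values by key, as the framework's shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Pair each user name with each of that user's orders (inner join)."""
    for key, values in sorted(groups.items()):
        names = [v for tag, v in values if tag == "user"]
        amounts = [v for tag, v in values if tag == "order"]
        for name in names:
            for amount in amounts:
                yield key, name, amount


# Hypothetical sample data: (user_id, name) and (user_id, amount).
users = [(1, "ann"), (2, "bob")]
orders = [(1, 30), (1, 12), (3, 99)]

emitted = list(map_phase(users, orders))
joined = list(reduce_phase(shuffle(emitted)))
print(len(emitted), joined)
```

All five input records pass through the shuffle even though only two joined rows come out, which is the copying cost the bullet above refers to.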

NoSQL Solutions for Big Data Problems

Key-Value Databases: Voldemort, Redis, Scalaris

Columnar Databases: HBase, Cassandra, Hypertable

Document Databases: MongoDB, CouchDB

Graph Databases: InfoGrid, Neo4j

Other Databases: Kyoto Cabinet, Berkeley DB

All of these NoSQL databases are designed to store very large volumes of data. They can be used together with a cache such as Memcached or Ehcache for a better user experience and faster processing.
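
One common way to combine a database with a cache is the cache-aside pattern, sketched below in Python. This is only an illustration under stated assumptions: plain dictionaries stand in for both the NoSQL store and the cache, and the class name `CacheAsideStore` is hypothetical. A real setup would replace the dictionaries with client libraries for the chosen database and for Memcached or Ehcache.

```python
class CacheAsideStore:
    """Read-through/invalidate-on-write wrapper around a slow backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for the NoSQL database
        self.cache = {}                     # dict standing in for Memcached/Ehcache
        self.backend_reads = 0              # counts how often the database is hit

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.backend_reads += 1
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then drop any stale cached copy.
        self.backing_store[key] = value
        self.cache.pop(key, None)


store = CacheAsideStore({"user:1": "ann"})
first = store.get("user:1")   # miss: goes to the backing store
second = store.get("user:1")  # hit: served from the cache
reads_after_two_gets = store.backend_reads
store.put("user:1", "anna")   # invalidates the cached entry
third = store.get("user:1")   # miss again: picks up the new value
print(first, second, third, store.backend_reads)
```

Invalidating on write rather than updating the cache keeps the sketch simple; the next read repopulates the entry from the database.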