Key Big Data Terms You Should Know

Given below is a list of key Big Data terms that you should know, each with a very brief explanation in simple language. I hope you find it useful.

1. Hadoop: System for processing very large data sets
2. HDFS or Hadoop Distributed File System: For storage of large volumes of data (key elements – DataNodes and the NameNode; the TaskTracker belongs to MapReduce, not HDFS)
3. MapReduce: Used for computation in Hadoop. Think of it as the assembly-level language of distributed computing
4. Pig: Developed at Yahoo. It is a higher-level data-flow language that runs on top of MapReduce
5. Hive: Higher-level language developed at Facebook with SQL-like syntax
6. Apache HBase: For real-time access to Hadoop data
7. Accumulo: Bigtable-style store similar to HBase (a separate project, not a fork of it) with added features like cell-level security
8. Avro: Data serialization format (in the same family as Protocol Buffers etc.)
9. Apache ZooKeeper: Distributed coordination system
10. HCatalog: Exposes the Hive metastore so that Pig and other tools can share the same table definitions
11. Oozie: Scheduling system developed by Yahoo
12. Flume: Log aggregation system
13. Whirr: For automating the deployment of Hadoop clusters on cloud services
14. Sqoop: For transferring structured data between relational databases and Hadoop
15. Mahout: Machine learning on top of MapReduce
16. Bigtop: Integrates multiple Hadoop sub-systems into one that works as a whole
17. Crunch: Java API that runs on top of MapReduce, for tedious tasks like joining and data aggregation
18. Giraph: Used for large-scale distributed graph processing
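
To make the MapReduce entry (item 3) a little more concrete, here is a minimal single-machine sketch of the idea in plain Python. It is not Hadoop code and does not use any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration, and the shuffle step stands in for the grouping that a real framework would perform across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (a real framework does this across nodes)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Tools like Pig, Hive and Crunch exist largely so that you do not have to write the map and reduce steps by hand for every job.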

Also, embedded below is an excellent TechTalk by Jakob Homan of LinkedIn on the subject explaining these tech terms.




  • © 2014 Harish Kotadia. All Rights Reserved.