• Big Data
- What is Big Data
- Characteristics of Big Data
- Problems with Big Data
- Handling Big Data
• Distributed Systems
- Introduction to Distributed Systems
- Problems with existing distributed systems in dealing with Big Data
- Requirements of a new approach
- HADOOP history
• HADOOP Core Concepts
- HDFS
- MapReduce
• HADOOP Cluster
- Install a pseudo-distributed cluster
- Install a multi-node cluster
- Introduction to HADOOP cluster configuration
- How the five daemons work
• NameNode
• JobTracker
• SecondaryNameNode
• TaskTracker
• DataNode
- Introduction to HADOOP Ecosystem projects
• Writing MapReduce programs
- Understanding HADOOP API
- Basic structure of a HADOOP MapReduce application
- Driver Code
- Mapper Code
- Reducer Code
- Eclipse integration with HADOOP for rapid application development
• Understanding ToolRunner
- More about ToolRunner
- Combiner
- Reducer
- configure and close methods
• Common MapReduce Algorithms
- Sorting
- Searching
- Indexing
- TF-IDF
- Word co-occurrence
• HADOOP Ecosystem
- Flume
- Sqoop
- Importing data from an RDBMS using Sqoop
- Hive
- Introduction to Hive
- Creating tables in Hive
- Running queries
- Pig
- Introduction to Pig
- Different modes of Pig
- When to use Hive and when to use Pig
- HBASE
- Basics of HBASE
• Advanced MapReduce Programming
- Developing custom Writable
- Developing custom WritableComparable
- Understanding input and output formats
• Introduction to Oozie
• Hands-on exercises for each concept