Hadoop Introduction
Session 1 - 4 Hours
• What is Big Data?
• Sources of Data
• Characteristics of Big Data
• Benefits of Big Data Analysis
• Challenges in Big Data Processing
• Why Hadoop for Big Data?
• Introduction to Hadoop
• Hadoop not good for …
• Hadoop Ecosystem
• Hadoop Installation
Hadoop Distributed File System (HDFS)
Session 1 - 4 Hours
• Hadoop Distributed File System (HDFS)
• HDFS Architecture
• Types of Nodes in HDFS
• Data Flow
• HDFS Block
• HDFS Federation
• HDFS High Availability (HA)
• HDFS Commands
• Hadoop Archives
• HDFS Accessibility
MapReduce Framework
Session 1 - 4 Hours
• MapReduce Introduction
• How does MapReduce work?
• MapReduce Program
• MapReduce Program Execution
• MapReduce Program Unit Testing
• Behind the Scenes: MapReduce
Session 2 - 4 Hours
• Hadoop Streaming
• Combiner
• Partitioner
• Counters
Hive
Session 1 - 4 Hours
• Hive Introduction
• Installing & Running Hive
• Hive Components
• Hive Metastore
• HiveQL
• Hive Data Model
Session 2 - 4 Hours
• Querying Data
• User-Defined Functions
Pig
Session 1 - 4 Hours
• Pig Introduction
• How Does Pig Work?
• Execution Types
• Running Pig Programs
• Pig Latin
Session 2 - 4 Hours
• User-Defined Functions
• Data Processing Operators
• Pig Best Practices
Additional Concepts
Session 1 - 4 Hours
• Introduction to HBase
• HBase Architecture
• HBase Practical
Session 2 - 4 Hours
• Introduction to Sqoop
• Importing Data into Hadoop Using Sqoop
• Introduction to Flume
• Practical Example with Flume