- Introduction 
- Big Data Introduction
 
 
- What is Big Data
 - Big Data Analytics
 - Big Data Challenges
 - Technologies for Big Data
 
- Hadoop Introduction
 
- What is Hadoop?
 - History of Hadoop
 - Basic Concepts
 - Future of Hadoop
 - The Hadoop Distributed File System
 - Anatomy of a Hadoop Cluster
 - Breakthroughs of Hadoop
 - Hadoop Distributions:
 - Apache Hadoop
 - Cloudera Hadoop (CDH)
 - Hortonworks Data Platform (HDP)
 - MapR Hadoop (MapR)
 
- Hadoop Daemon Processes
 
- NameNode
 - DataNode
 - Secondary NameNode/High Availability
 - JobTracker/ResourceManager
 - TaskTracker/NodeManager
 
- HDFS (Hadoop Distributed File System)
 
- Blocks and Input Splits
 - Data Replication
 - Hadoop Rack Awareness
 - Hadoop Cluster Architecture and Block Placement
 - Accessing HDFS
 - Java Approach
 - CLI Approach
 - HDFS basic file operations
 - Basic Administration commands
 
- Hadoop Installation Modes
 
- Local Mode
 - Pseudo-distributed Mode
 - Fully distributed mode
 
- YARN
 
- What is YARN
 - How YARN Works 
- ResourceManager
 - NodeManager
 - ApplicationMaster
 - Containers and Uber jobs
 
 - Advantages of YARN
 
- Hadoop Developer Tasks
 
- Writing a MapReduce Program
 
- Basic API Concepts
 - The Driver Class
 - The Mapper Class
 - The Reducer Class
 - The Combiner Class
 - The Partitioner Class
 - Examining a Sample MapReduce Program with several examples
 - Hadoop's Streaming API
 
- Hadoop Internals
 
- Record Reader
 - Record Writer
 - Role of Reporter
 - Output Collector
 - Counters
 - ToolRunner
 
- Advanced MapReduce Programming
 
- The Secondary Sort
 - Counting with Counters
 - Distributed Cache
 - Distributed Grep
 - Customized Input Formats and Output Formats
 - Map-Side Joins
 - Reduce-Side Joins
 
- Practical Development Tips and Techniques
 
- Strategies for Debugging MapReduce Code
 - Testing MapReduce Code Locally by Using LocalJobRunner
 - Testing with MRUnit
 - Writing and Viewing Log Files
 
- Data Input and Output
 
- Sequence Files
 - Avro
 - Parquet
 - Creating Custom Writable and WritableComparable Implementations
 - Saving Binary Data Using SequenceFile and Avro Data Files
 - Issues to Consider When Using File Compression
 
- Hadoop Ecosystems
 
- PIG
 
- Pig concepts
 - Pig vs MapReduce and Hive
 - Modes of running Pig
 - Pig Latin Programming
 - Pig Latin Programming in Eclipse
 - Pig UDFs
 - Pig Macros
 - Accessing Hive from Pig
 
- HIVE
 
- Hive concepts
 - Hive architecture
 - Managed tables and external tables
 - Complex data types
 - Partitioned tables
 - Bucketed tables
 - Joins in Hive
 - Multiple ways of inserting data in Hive tables
 - CTAS, views and alter tables
 - Performance Tuning in Hive
 - User-defined functions in Hive 
- Hive UDF
 - Hive UDAF
 - Hive UDTF
 
 
- SQOOP
 
- Sqoop concepts
 - Sqoop architecture
 - Connecting to RDBMS
 - Internal mechanism of import/export
 - Import data from Oracle/MySQL to Hive
 - Export data to Oracle/MySQL
 - Other Sqoop tools
 
- HBASE
 
- HBase concepts
 - ZooKeeper concepts
 - HBase Master and RegionServer architecture
 - File storage architecture
 - NoSQL vs SQL
 - Defining Schema and basic operations 
- DDLs
 - DMLs
 
 - Accessing data stored in HBase using clients such as the CLI and Java
 - HBase admin tasks
 - Hive and HBase integration
 - HBase use cases
 
- OOZIE
 
- Oozie concepts
 - Oozie architecture 
- Workflow engine
 - Coordinator engine
 
 - hPDL and XML for creating Workflows
 - Nodes in Oozie 
- Action nodes
 - Control nodes
 
 - Accessing Oozie jobs through the CLI and web console
 - Creating workflows in Oozie for: 
- HDFS file operations scripts
 - MapReduce programs
 - Pig scripts
 - Hive scripts
 - Sqoop Imports/Exports
 
 
- FLUME
 
- Flume Concepts
 - Flume architecture
 - Flume Agents 
- Source
 - Channel
 - Sink
 
 - Creating Flume Agent configurations
 - Executing Flume jobs
 
- KAFKA
 
- Kafka characteristics and salient features
 - Kafka setup on Windows and Linux
 - Kafka Architecture
 - Understanding real-time Kafka streaming
 - Producer and consumer APIs
 - Kafka Use cases
 
- IMPALA
 
- What is Impala
 - How Impala Works
 - Impala vs Hive
 - Impala's shortcomings
 - Impala hands-on
 
- ZOOKEEPER
 
- ZooKeeper Concepts
 - ZooKeeper Architecture
 - ZooKeeper CLI Operations
 - ZooKeeper as a service
 - ZooKeeper's role in a distributed environment
 
- Integrations
 
- Java and Hive integration
 - Java and HBase integration
 - Hive - HBase Integration
 
- Course Deliverables
 
- Workshop style coaching
 - Interactive approach
 - Course material
 - Hands-on practice exercises for each topic
 - Quiz at the end of each major topic
 - Tips and techniques on Certification Examinations
 - Linux concepts and basic commands on demand