Hadoop Course Content: Online Training - 30 hrs
Understanding Big Data
Introduction
Understanding Big Data
3V (Volume-Variety-Velocity) Characteristics
Structured and Unstructured Data
Application and use cases of Big Data
Limitations of Traditional Large Scale Systems
How a distributed way of computing is superior (cost and scale)
Installation / Setup / Configuration
Download and set up VMware for running Linux (CentOS or Ubuntu)
Download, install, and configure Hadoop 1.0.4
Download, install, and configure Hive 0.10 (stable version)
Download, install, and configure Sqoop 1.4.2
Download, install, and configure Oozie
Configurations for setting up the Hadoop cluster will be shared
HDFS (Hadoop Distributed File System)
HDFS Overview and Architecture
Data Replication
Safe Mode
Name Node
Checkpoint Node
Backup Node
Configuration Files
HDFS Data Flows
Read
Write
HDFS Commands
File System
Administrative
Advanced HDFS Features
HDFS Federation
HDFS High Availability
MapReduce Overview
Functional Programming Paradigms
Input and Output Formats
Hadoop Data Types
Input Splits
Shuffling
Sorting
Hadoop Streaming
Combiners
Partitioning
Configuration Files
Compression (Creating a sequence file and compressing the sequence file)
Distributed Cache
JVM Reuse
Standalone Mode
MR Algorithm and Data Flow
WordCount
MapReduce Architecture
Legacy MR
Next Generation MapReduce (aka YARN/MRv2)
Programming differences between Legacy MR and MRv2
MR Best Practices and Debugging
Fundamental MR Algorithms (Non-Graph)
Max Temperature
Higher Level Abstractions for MapReduce - 1
Pig Introduction
Pig Latin Language Constructs
Pig User Defined Functions
Pig Use Cases
Pig scripts for data analysis
Higher Level Abstractions for MapReduce - 2
Hive - Introduction
HiveQL
Hive User Defined Functions
Hive Use Cases
NoSQL Databases
NoSQL Concepts
Review of RDBMS
Need for NoSQL
Brewer's CAP Theorem
ACID vs BASE
Different Types of NoSQL Databases
Key Value
Columnar
Document
Graph
Columnar Databases
Hadoop Ecosystem
HBase - NoSQL Database
Sqoop
Oozie
- At the end of each session, the material for the day will be shared with the candidates, along with a few links and other resources on the topics covered.
- Exercises on each topic will be given for practice; the solutions will be discussed the next day before moving on to the next topic.
- Wherever possible, interview questions and scenario-based problems will be shared with the candidates to give insight into real-world environments.
- On completion of the course, a real-world POC (Proof of Concept) requirement will be given to the candidates so they can practice the Hadoop content end to end.
Optional:
For candidates who do not have prior Java (Core Java) knowledge, five sessions will be held on the Java concepts required for writing MapReduce programs.