HADOOP ADMINISTRATION
Module 1 – Introduction to Big Data & Hadoop
-
What is Big Data
-
The volume of data processed in everyday life today
-
What Hadoop is and why it is important
-
Hadoop comparison with traditional systems
-
Hadoop history
-
Hadoop main components and architecture
Module 2 – Hadoop Distributed File System (HDFS)
-
HDFS overview and design
-
HDFS architecture
-
HDFS file storage
-
HDFS Daemons and their importance
-
HDFS Daemon failure cases
-
Block Size
Module 3 – Planning your Hadoop cluster
-
Planning a Hadoop cluster and its capacity
-
Hadoop software and hardware configuration
-
HDFS Block replication and rack awareness
Module 4 – Hadoop Deployment
-
Different Hadoop deployment types
-
Hadoop distribution options
-
Hadoop competitors
-
Prerequisites of Hadoop Installation
-
SSH, Java
-
Hadoop installation (Single Node & Multi Node)
-
Configuration of Hadoop cluster
-
Parameters used in configuration
-
Distributed cluster architecture
Module 5 – Working with HDFS
-
HDFS Write Anatomy
-
HDFS Read Anatomy
-
Ways of accessing data in HDFS
-
CLI (Command Line Interface)
-
Web UI
-
Common HDFS operations and commands
-
DFSADMIN commands
Module 6 – MapReduce
-
What is MapReduce and why it is popular
-
The Big Picture of MapReduce
-
MapReduce process and terminology
-
MapReduce Daemons & their importance
-
MapReduce component failures and recoveries
-
Working with MapReduce
-
Monitoring Running Jobs
-
YARN Introduction
Module 7 – Installation & Management of the Hadoop Ecosystem
-
Pig
-
Hive
-
Hive metastore
-
HBase
-
Sqoop
-
Flume
-
Oozie
Module 8 – Getting Data from External Sources into the Hadoop Cluster
-
Sqoop
-
Importing data from RDBMS (MySQL & Oracle) to Hadoop Cluster
-
Exporting data to RDBMS from Hadoop Cluster
-
Flume
-
Importing logs from other sources
-
Twitter Analysis
Module 9 – Hadoop Cluster Configuration & Performance tuning
-
Hadoop configuration overview and important configuration files
-
Configuration parameters and values
-
HDFS parameters
-
MapReduce parameters
Module 10 – Hadoop Administration and Maintenance
-
Namenode/Datanode directory structures and files
-
File system image and Edit log
-
The Checkpoint Procedure
-
Namenode failure and recovery procedure
-
Safe Mode
-
Metadata and Data backup
-
Potential problems and solutions / what to look for
-
Commissioning & Decommissioning of nodes
-
Balancing
Module 11 – Hadoop Monitoring and Troubleshooting
-
Best practices of monitoring a Hadoop cluster
-
Using logs and stack traces for monitoring and troubleshooting
-
Using open-source tools to monitor a Hadoop cluster
-
Monitoring a Hadoop cluster using Cloudera Manager
Module 12 – Job Scheduling
-
How to schedule Hadoop Jobs on the same cluster
-
Default Hadoop FIFO Scheduler
-
Fair Scheduler and its configuration
-
Example of Job Scheduling using Oozie
Module 13 – High Availability, Federation, YARN and Security
-
Cloudera Security
-
Kerberos
-
TLS/SSL Encryption
-
CDH Encryption
-
Sentry