• Introduction to Hadoop
- Enterprise Data Trends @ Scale
- What is Big Data?
- A Market for Big Data
- Characteristics of Big Data: the 3 V's, 5 V's, and 7 V's of Big Data
- Most Common New Types of Data
- Moving from Causation to Correlation
- What is Hadoop? And Why Hadoop?
- Traditional Systems vs. Hadoop
- What is Hadoop 2.0?
- Overview of a Hadoop Cluster and Core components of Hadoop
- Different distributions of Hadoop
- Hadoop Use Case
- Lab exercise :- Log In to Your Cluster
• Hadoop Architecture
- Characteristics of Hadoop (Fault tolerance, Replication, Block size, Robustness)
- What Is a Node, Rack, Cluster, Data Center, and Data Hub?
- MapReduce Architecture
- HDFS Architecture
- Understanding Block Storage
- Demonstration: Understanding Block Storage
- The NameNode
- The DataNodes
- HDFS Clients
• Installing a Hadoop Cluster Using Cloudera Manager
- Minimum Hardware Requirements
- Minimum Software Requirements
- A Formidable Starter Cluster
- Lab exercise :- Setting up the Environment
- Lab exercise :- Installing Cloudera Manager/Ambari and CDH/HDP
- Lab exercise :- Adding Services to Cluster
• Configuring Hadoop
- Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, bigtop_utils, masters and slaves files)
- Configuration Considerations
- Deployment Layout
- Configuring Hadoop Ports
- Configuring HDFS
- What Does the File System Check Look For?
- Replication Factor
- Understanding Hadoop Logs
- What Is Cloudera Manager / Ambari?
- Configuration via Cloudera Manager/Ambari
- Management and Monitoring
- REST API and Thrift Server Overview
- Lab exercise :- Commissioning and Decommissioning of nodes
- Lab exercise :- Stopping and Starting CDH Services/HDP Services
- Lab exercise :- Using HDFS Commands, hadoop fsck Syntax, and the hadoop dfsadmin Command
• Ensuring Data Integrity
- Replication Placement
- Data Integrity - Writing Data
- Data Integrity - Reading Data
- Data Integrity - Block Scanning
- Running a File System Check
- What Does the File System Check Look For?
- hadoop fsck Syntax
- Data Integrity - File System Check: Commands & Output
- hadoop dfsadmin Command
- NameNode Information
- Changing the Replication Factor
- Lab exercise :- Verify Data with Block Scanner and fsck
• MapReduce and YARN
- MapReduce
- Understanding MapReduce
- What is YARN?
- YARN Architecture (RM, NM, AM, Container)
- Hadoop as a Next-Gen Platform
- Beyond MapReduce
- YARN Use Case
- Lifecycle of a YARN Application
- Configuring YARN
- Configuring MapReduce tools
- YARN application logs
- YARN CLI
- Lab exercise :- Troubleshooting a MapReduce Job