- Introduction
- Big Data Introduction
- What is Big Data
- Big Data Analytics
- Big Data Challenges
- Technologies for Big Data
- Hadoop Introduction
- What is Hadoop?
- History of Hadoop
- Basic Concepts
- Future of Hadoop
- The Hadoop Distributed File System
- Anatomy of a Hadoop Cluster
- Breakthroughs of Hadoop
- Hadoop Distributions:
- Apache Hadoop
- Cloudera Hadoop (CDH)
- Hortonworks Hadoop (HDP)
- MapR Hadoop (MapR)
- Hadoop Daemon Processes
- NameNode
- DataNode
- Secondary NameNode/High Availability
- JobTracker/ResourceManager
- TaskTracker/NodeManager
- HDFS (Hadoop Distributed File System)
- Blocks and Input Splits
- Data Replication
- Hadoop Rack Awareness
- Hadoop Cluster Architecture and Block Placement
- Accessing HDFS
- Java Approach
- CLI Approach
- HDFS basic file operations
- Basic Administration commands
- Hadoop Installation Modes
- Local Mode
- Pseudo-distributed Mode
- Fully Distributed Mode
- YARN
- What is YARN
- How YARN Works
- ResourceManager
- NodeManager
- ApplicationMaster
- Containers and Uber jobs
- Advantages of YARN
- Hadoop Developer Tasks
- Writing a MapReduce Program
- Basic API Concepts
- The Driver Class
- The Mapper Class
- The Reducer Class
- The Combiner Class
- The Partitioner Class
- Examining a Sample MapReduce Program with several examples
- Hadoop's Streaming API
- Hadoop Internals
- Record Reader
- Record Writer
- Role of Reporter
- Output Collector
- Counters
- ToolRunner
- Advanced MapReduce Programming
- The Secondary Sort
- Counting with Counters
- Distributed Cache
- Distributed Grep
- Customized Input Formats and Output Formats
- Map-Side Joins
- Reduce-Side Joins
- Practical Development Tips and Techniques
- Strategies for Debugging MapReduce Code
- Testing MapReduce Code Locally by Using LocalJobRunner
- Testing with MRUnit
- Writing and Viewing Log Files
- Data Input and Output
- Sequence Files
- Avro
- Parquet
- Creating Custom Writable and WritableComparable Implementations
- Saving Binary Data Using SequenceFile and Avro Data Files
- Issues to Consider When Using File Compression
- Hadoop Ecosystems
- PIG
- Pig concepts
- Pig vs MapReduce and Hive
- Modes of running Pig
- Pig Latin Programming
- Pig Latin Programming in Eclipse
- Pig UDFs
- Pig Macros
- Accessing Hive from Pig
- HIVE
- Hive concepts
- Hive architecture
- Managed tables and external tables
- Complex data types
- Partitioned tables
- Bucketed tables
- Joins in Hive
- Multiple ways of inserting data in Hive tables
- CTAS, views and alter tables
- Performance Tuning in Hive
- User defined functions in Hive
- Hive UDF
- Hive UDAF
- Hive UDTF
- SQOOP
- Sqoop concepts
- Sqoop architecture
- Connecting to RDBMS
- Internal mechanism of import/export
- Import data from Oracle/MySQL to Hive
- Export data to Oracle/MySQL
- Other Sqoop tools
- HBASE
- HBase concepts
- ZooKeeper concepts
- HBase Master and RegionServer architecture
- File storage architecture
- NoSQL vs SQL
- Defining Schema and basic operations
- DDLs
- DMLs
- Accessing data stored in HBase using clients such as the CLI and Java
- HBase admin tasks
- Hive and HBase integration
- HBase use cases
- OOZIE
- Oozie concepts
- Oozie architecture
- Workflow engine
- Coordinator engine
- hPDL and XML for creating workflows
- Nodes in Oozie
- Action nodes
- Control nodes
- Accessing Oozie jobs through the CLI and web console
- Creating workflows in Oozie for:
- HDFS file operations scripts
- MapReduce programs
- Pig scripts
- Hive scripts
- Sqoop Imports/Exports
- FLUME
- Flume Concepts
- Flume architecture
- Flume Agents
- Source
- Channel
- Sink
- Creating Flume Agent configurations
- Executing Flume jobs
- KAFKA
- Kafka characteristics and salient features
- Kafka setup on Windows and Linux
- Kafka Architecture
- Understanding real-time Kafka streaming
- Producer and consumer APIs
- Kafka Use cases
- IMPALA
- What is Impala
- How Impala Works
- Impala vs Hive
- Impala's shortcomings
- Impala hands-on
- ZOOKEEPER
- ZooKeeper Concepts
- ZooKeeper Architecture
- ZooKeeper CLI Operations
- ZooKeeper as a service
- ZooKeeper role in distributed environment
- Integrations
- Java and Hive integration
- Java and HBase integration
- Hive and HBase integration
- Course Deliverables
- Workshop style coaching
- Interactive approach
- Course material
- Hands-on practice exercises for each topic
- Quiz at the end of each major topic
- Tips and techniques on Certification Examinations
- Linux concepts and basic commands on demand