Introduction
- Big Data Introduction

What is Big Data
Big Data Analytics
Big Data Challenges
Technologies for Big Data

Hadoop Introduction

What is Hadoop?
History of Hadoop
Basic Concepts
Future of Hadoop
The Hadoop Distributed File System
Anatomy of a Hadoop Cluster
Breakthroughs of Hadoop
Hadoop Distributions:
Apache Hadoop
Cloudera Hadoop (CDH)
Hortonworks Hadoop (HDP)
MapR Hadoop (MapR)

Hadoop Daemon Processes

NameNode
DataNode
Secondary NameNode/High Availability
Job Tracker/Resource Manager
Task Tracker/Node Manager

HDFS (Hadoop Distributed File System)

Blocks and Input Splits
Data Replication
Hadoop Rack Awareness
Hadoop Cluster Architecture and Block Placement
Accessing HDFS
Java Approach
CLI Approach
HDFS basic file operations
Basic Administration commands

Hadoop Installation Modes

Local Mode
Pseudo-distributed Mode
Fully distributed mode

YARN

What is YARN
How YARN Works
- Resource Manager
- Node Manager
- Application Master
- Containers and Uber jobs
Advantages of YARN

Hadoop Developer Tasks

Writing a MapReduce Program

Basic API Concepts
The Driver Class
The Mapper Class
The Reducer Class
The Combiner Class
The Partitioner Class
Examining a Sample MapReduce Program with several examples
Hadoop's Streaming API
Examining a Sample MapReduce Program with several examples

Hadoop Internals

Record Reader
Record Writer
Role of Reporter
Output Collector
Counters
ToolRunner

Advanced MapReduce Programming

The Secondary Sort
Counting with Counters
Distributed Cache
Distributed Grep
Customized Input Formats and Output Formats
Map-Side Joins
Reduce-Side Joins

Practical Development Tips and Techniques

Strategies for Debugging MapReduce Code
Testing MapReduce Code Locally by Using LocalJobRunner
Testing with MRUnit
Writing and Viewing Log Files

Data Input and Output

Sequence Files
Avro
Parquet
Creating Custom Writable and Writable-Comparable Implementations
Saving Binary Data Using SequenceFile and Avro Data Files
Issues to Consider When Using File Compression

Hadoop Ecosystems

PIG

Pig concepts
Pig vs MapReduce and Hive
Modes of running Pig
Pig Latin Programming
Pig Latin Programming in Eclipse
Pig UDFs
Pig Macros
Accessing Hive from Pig

HIVE

Hive concepts
Hive architecture
Managed tables and external tables
Complex data types
Partitioned tables
Bucketed tables
Joins in Hive
Multiple ways of inserting data in Hive tables
CTAS, views and alter tables
Performance Tuning in Hive
User defined functions in Hive
- Hive UDF
- Hive UDAF
- Hive UDTF

SQOOP

Sqoop concepts
Sqoop architecture
Connecting to RDBMS
Internal mechanism of import/export
Import data from Oracle/MySQL to Hive
Export data to Oracle/MySQL
Other Sqoop tools

HBASE

HBase concepts
Zookeeper concepts
HBase Master and RegionServer architecture
File storage architecture
NoSQL vs SQL
Defining Schema and basic operations
- DDLs
- DMLs
Access data stored in HBase using clients like CLI, and Java
HBase admin tasks
Hive and HBase integration
HBase use cases

OOZIE

Oozie concepts
Oozie architecture
- Workflow engine
- Coordinator engine
hPDL and XML for creating workflows
Nodes in Oozie
- Action nodes
- Control nodes
Accessing Oozie jobs through CLI, and web console
Creating workflows in Oozie for:
- HDFS file operations scripts
- MapReduce programs
- Pig scripts
- Hive scripts
- Sqoop Imports/Exports

FLUME

Flume Concepts
Flume architecture
Flume Agents
- Source
- Channel
- Sink
Creating Flume Agent configurations
Executing Flume jobs

KAFKA

Kafka characteristics and salient features
Kafka setup on Windows and Linux
Kafka Architecture
Understanding real-time Kafka streaming
Producer and consumer APIs
Kafka Use cases

IMPALA

What is Impala
How Impala Works
Impala vs Hive
Impala's shortcomings
Impala Hands on

ZOOKEEPER

Zookeeper Concepts
Zookeeper Architecture
Zookeeper CLI Operations
Zookeeper as a service
Zookeeper role in distributed environment

Integrations

Java and Hive integration
Java and HBase integration
Hive - HBase Integration

Course Deliverables

Workshop style coaching
Interactive approach
Course material
Hands on practice exercises for each topic
Quiz at the end of each major topic
Tips and techniques on Certification Examinations
Linux concepts and basic commands on demand

About the Trainer

Veer Nagaraju

MS (Computer Science)

20 Years of Experience

He has over 23 years of diversified IT experience in Project/Program Management, Service Management, Application Development and Maintenance, ETL & Data Warehousing, and Education/Training.

He is a seasoned trainer in Hadoop, .NET and Oracle technologies. He has conducted many workshops on Project Management topics such as PMP certification and Microsoft Office Project. He has more than 6½ years of teaching experience, including 6 years on Hadoop Development, Hadoop Administration (CDH and HDP distributions), Spark (with Scala and Java), Kafka and Cassandra. He provides mentoring and consulting services on Big Data Analytics for individuals and companies.

He is a Hadoop consultant and coach, providing classroom, online and corporate training on Big Data & Hadoop Development, Spark (with Scala and Java), Kafka, Cassandra and Hadoop Administration (Cloudera and Hortonworks distributions). He has delivered corporate sessions in India and abroad for various MNCs such as IBM, CTS Kolkata, CTS Chennai, CISCO, Scope International, American Megatrends, HP Malaysia, HP Philippines, Amdocs Philippines, NetApp Singapore, PT XL Axiata Tbk, pt bank of the Indonesia Tbk, Informatica, Capgemini, Genpact, Siemens, American Express, Deloitte, Wipro, Collabera, EA Sports, Verizon USA through Collabera, UHG-Hyderabad, Nomura Mumbai, Fidelity Chennai, FAI-Hyderabad, FactSet India, Barclays Bank Pune, and M3BI Hyderabad.

He has published white papers on dealing with unstructured data in the transportation sector using SolrCloud and the Hadoop ecosystem. He provides Big Data & Hadoop and Spark training, mentoring and consulting services to FactSet and M3BI.

He has visited various engineering colleges, including Gudlavelluru Engineering College, VR Siddhartha Engg. College, a DST sponsored programme at PVP Siddhartha Institute of Technology, VKR & VNB and AGK College of Engineering - Gudivada, Dhanekula Engineering College, Andhra Loyola Institute of Engineering and Technology, Lakkireddy BalReddy College of Engineering, and NRI Institute of Technology, and conducted Hadoop workshops for faculty members and students.

He has handled large, medium and small mission-critical projects in domains such as Public Transportation, Finance, Banking/Credit Card Processing, e-Commerce, Content Management, and Healthcare & Health Insurance. He has served world-class clients and delivered training and solutions on Big Data & Hadoop, Spark (with Scala and Java), Hadoop Administration, Kafka, PHP, ASP, Oracle, SQL Server, Sybase, MySQL, ETL and Data Warehousing. He currently provides training and solutions on the following technologies: HDFS, MapReduce, YARN, Spark (with Scala and Java), Pig, Crunch, Hive, Impala, Sqoop, HBase, Oozie, Zookeeper, Mahout, Flume, Kafka, Solr, Storm, Cassandra, and Hadoop and Spark integrations.

He worked in the USA for 9 years and served Fortune 500 companies such as American Express, ProgressRail Services and the Indiana State Department of Health, as well as various medium and small companies.