This foundation course is designed to develop a basic understanding of Big Data storage and processing concepts for unstructured, semi-structured and structured data using the core technologies of Hadoop and Spark. Delivered over a period of 3 days, the course covers the basics of core Hadoop ecosystem components such as HDFS, Sqoop, Hive, HBase and MapReduce, as well as the fundamentals of the Apache Spark framework, including Spark core APIs such as RDDs. A basic understanding of Linux and the Java programming language is desirable as a prerequisite, but both can also be picked up during the course.
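To give a flavor of the RDD-based hands-on work, below is a minimal word-count sketch using Spark's Java RDD API; the class name, sample data and the local master setting are illustrative only and not taken from the course material.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class RddWordCount {
        public static void main(String[] args) {
            // "local[*]" is for illustration; on a real HDP/AWS cluster the
            // master would typically be provided by YARN at submit time.
            SparkConf conf = new SparkConf()
                    .setAppName("RddWordCount")
                    .setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Build an RDD from an in-memory list of sample lines.
                JavaRDD<String> lines = sc.parallelize(
                        Arrays.asList("big data basics", "big spark basics"));
                // Classic word count: split lines into words, pair each
                // word with 1, then sum the counts per word.
                lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                     .mapToPair(word -> new Tuple2<>(word, 1))
                     .reduceByKey(Integer::sum)
                     .collect()
                     .forEach(t -> System.out.println(t._1() + ": " + t._2()));
            }
        }
    }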
The course will be delivered as a mix of theory and hands-on sessions, with participants performing exercises on each topic alongside the instructor. By the end of the course, participants will be familiar with Big Data concepts and with the Hadoop and Spark tools and APIs mentioned above, building a foundation for advanced learning of Big Data frameworks and technologies. The course will be delivered on a Hortonworks Data Platform (HDP) cluster running on the AWS cloud.
Target audience - Database designers, developers, architects, data analysts, and data engineering and data science professionals.