PySpark 3.x - 30 hrs
- Understanding Data Lake
- Introduction to Spark
- Spark Architecture
- Spark RDD
- Apache Spark in Cloud - Databricks Community Edition & notebooks
- Apache Spark in Hadoop Ecosystem - Zeppelin notebook
- Apache Spark in Local Mode - Scala / PySpark shell
- Spark Execution model & Architecture
- How to Run Spark Programs
- Spark Single-Node and Multi-Node Cluster Setup
- Spark Execution Models & Cluster Managers
- Working with Notebooks in Cluster Mode
- Working with spark-submit
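The spark-submit topics above can be sketched as a single command line; the application file, resource sizes, and data paths below are hypothetical placeholders to adjust for your own project:

```shell
# Hypothetical example: submit a PySpark application to a YARN cluster.
# File names, paths, and resource sizes are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  my_app.py --input /data/in --output /data/out
```

For local experiments, replace `--master yarn --deploy-mode cluster` with `--master "local[*]"`.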
- Apache Spark using SQL - Getting Started
- Launching and using Spark SQL CLI
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Examples of Working with DataFrames
- DataFrames with the Spark SQL Shell
- Spark DataFrame
- Working with DataFrame Rows
- Working with DataFrame Rows and Unit Tests
- Working with DataFrame Rows and Unstructured Data
- Working with DataFrame Columns
- DataFrame Partitions and Executors
- Creating and using UDF
- Aggregations in DataFrames
- Windowing in DataFrames
- Grouping Aggregations in DataFrames
- DataFrame joins
- Internals of Spark Joins & Shuffle
- Optimizing joins
- Implementing Bucket joins
- Spark Transformations and Actions
- Spark Jobs, Stages & Tasks
- Understanding Execution plan
- Unit Testing in Spark
- Debugging Spark Driver and Executors
- Spark Application logs in cluster
- Assignment:
- Spark SQL Exercise
- Apache Spark using SQL - Pre-defined Function
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
- Apache Spark using SQL - Basic Transformations
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL - Inner
- Joining Tables using Spark SQL - Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
- Apache Spark using SQL - Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL - Local
- Loading Data Into Spark Metastore Tables using Spark SQL - HDFS
- Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise - Managed Spark Metastore Tables
- Apache Spark using SQL - DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
- Spark Streaming using Window Functions
- What are Discretized Streams?
- How to Create Discretized Streams
- Transformations on DStreams
- Transformation Operation
- Window Operations
- window
- countByWindow
- reduceByKeyAndWindow
- countByValueAndWindow
- Output Operations on DStreams
- foreachRDD
- SQL Operations
- Aggregating Dataframes
- Grouping Aggregations
- Windowing Aggregations
- Advanced PySpark
- Join Operations
- Stateful Transformations
- Checkpointing
- Accumulators
- Fault Tolerance
- DataFrame Joins and Column Name Ambiguity
- Outer Joins in DataFrames
- Internals of Spark Join and shuffle
- Optimizing your joins
- Implementing Bucket Joins
- Streaming Aggregates and State Store
- Incremental Aggregates and Update Mode
- Spark Streaming Output Modes
- Stateful vs. Stateless Aggregation
- Implementing Stateless Streaming Aggregation
- Time-Bound Stateful Tumbling Window Aggregation
- Watermarking and State Store Cleanup
- Sliding Window Aggregates
- Spark Structured Streaming
- Introduction to Structured Streaming
- Operations on Streaming DataFrames and Datasets
- Window Operations
- Handling Late Data and Watermarking
- Performance Tuning
- PySpark Streaming with Apache Kafka
- Integration with Kafka Text Lecture
- PySpark Streaming with Azure Databricks
- Spark Programming Model and Execution
- Execution Methods - How to Run Spark Programs?
- Spark Distributed Processing Model - How your program runs?
- Spark Execution Modes and Cluster Managers
- Summarizing Spark Execution Models - When to use What?
- Working with PySpark Shell - Demo
- Working with Notebooks in Cluster - Demo
- Working with Spark Submit
- Creating Spark Project Build Configuration
- Configuring Spark Project Application Logs
- Creating Spark Session
- Configuring Spark Session
- DataFrame Introduction
- DataFrame Partitions and Executors
- Spark Transformations and Actions
- Spark Jobs, Stages and Tasks
- Understanding your Execution Plan
- Unit Testing Spark Application
- There will be 4-5 use cases