This class is for anyone who is interested in learning about data engineering with PySpark, Hadoop, and Azure Cloud. It is especially well-suited for:
- Software engineers who want to transition to a data engineering role
- Data scientists who want to learn more about data engineering to build better data pipelines and models
- Business analysts who want to learn more about data engineering to make better data-driven decisions
What will the students learn in this class?
In this class, students will learn the following:
- What is data engineering and why is it important?
- Setting up a single-node cluster on Windows, including installation and configuration of Hadoop, Hive, MySQL, and Apache Spark
- Fundamental concepts of, and hands-on practice with, Big Data, Hadoop, HDFS, Hive, Python, and Azure Cloud
- Learn Apache Spark foundations and the Spark architecture
- Use PyCharm IDE for Spark development and debugging
- Learn Apache Spark (PySpark) programming from zero to intermediate level
- Learn how to source data in different formats, transform it into valuable data, and store it in various formats (see the PySpark sketch after this list)
- Gain complete knowledge of RDDs, DataFrames, Spark SQL, and the Spark Catalyst optimizer
- Handle exceptions, and debug and deploy Spark programs in Python
- Create an orchestration and transformation job in Azure DataFactory (ADF)
- Develop, execute, and monitor data flows using Azure Synapse
- Create big data pipelines using Databricks and Delta tables
- Work with big data in Azure Data Lake using Spark Pool
- Migrate on-premises SSIS jobs to ADF
- Integrate ADF with commonly used Azure services, such as Azure ML, Azure Logic Apps, and Azure Functions
- Run big data compute jobs within HDInsight and Azure Databricks
- Copy data from AWS S3 and Google Cloud Storage to Azure Storage using ADF's built-in connectors
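To give a feel for the hands-on work, here is a minimal PySpark sketch of the source-transform-store pattern covered in the course. The file paths and column names (sales.csv, amount, region) are illustrative assumptions, not course materials.

```python
# A minimal sketch: read raw CSV data, transform it with the DataFrame API
# and Spark SQL, and store the result as Parquet.
# Paths and column names here are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo-etl").getOrCreate()

# Source: read a CSV file, inferring the schema from the data
sales = spark.read.csv("data/sales.csv", header=True, inferSchema=True)

# Transform: keep valid rows and add a derived column
cleaned = (
    sales
    .filter(F.col("amount") > 0)
    .withColumn("amount_usd", F.round(F.col("amount") * 1.1, 2))
)

# The same data can also be queried with Spark SQL via a temporary view
cleaned.createOrReplaceTempView("sales_clean")
totals = spark.sql(
    "SELECT region, SUM(amount_usd) AS total_usd FROM sales_clean GROUP BY region"
)

# Store: write the aggregated result in Parquet format
totals.write.mode("overwrite").parquet("output/sales_by_region")

spark.stop()
```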
Is there anything the students need to bring to the class?
Students should bring a laptop with a minimum configuration of 8 GB RAM, an Intel i5 processor, and 100 GB of HDD/SSD storage.
Benefits of attending the free demo course
- By the end of this course, you will be able to build data engineering solutions using the Spark structured APIs in Python.
- You will also be able to use ADF as the main ETL and orchestration tool for your data warehouse or data platform projects.
- Learn about the latest trends and technologies in data engineering
- Get hands-on experience with PySpark, Hadoop, and Azure Cloud
- Ask questions and get expert advice from experienced instructors.
Register for the free demo course today!
To register for the free demo course, please visit