DATABRICKS DEVELOPER AND ADMIN: Azure / On-Prem
Time: 25 hrs (approx.)
Databricks Fundamentals
- Introduction to Databricks
- Databricks Terminology and Databricks Community
- Create a free Databricks account
- Introduction to the Databricks environment
- First steps with Databricks
Databricks Platforms
- Importing notebooks, language configuration and markdown
- Databricks File System (DBFS)
- Create, manipulate and visualize tables
- Databricks widgets
Databricks Utilities
- Databricks Utils for managing the file system and libraries
- Databricks Utils for notebooks, secrets and widgets (see the dbutils sketch after this list)
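The utilities above are exposed in notebooks through the dbutils object. A minimal sketch, assuming a notebook attached to a running cluster; the paths, widget name, and secret scope/key are placeholders:

# File system utilities
dbutils.fs.mkdirs("/tmp/demo")                         # create a DBFS folder
dbutils.fs.put("/tmp/demo/hello.txt", "hello", True)   # write a small file (overwrite=True)
display(dbutils.fs.ls("/tmp/demo"))                    # list folder contents

# Widgets: parameterize the notebook
dbutils.widgets.text("run_date", "2024-01-01", "Run Date")
run_date = dbutils.widgets.get("run_date")

# Secrets: read a value from a secret scope (scope and key are assumed to exist)
# token = dbutils.secrets.get(scope="demo-scope", key="demo-key")

# Notebook utilities: run another notebook and pass parameters (path is hypothetical)
# result = dbutils.notebook.run("./child_notebook", 600, {"run_date": run_date})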
ETL Approach in Databricks:
- Creating and saving DataFrames in Databricks
- Transformation and visualization of data in Databricks (see the sketch after this list)
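A small PySpark sketch of the DataFrame flow above (create, transform, save as Delta, visualize); the sample data and output path are illustrative only:

from pyspark.sql import functions as F

sales = spark.createDataFrame(
    [("2024-01-01", "A", 100.0), ("2024-01-02", "B", 250.0)],
    ["order_date", "product", "amount"],
)

# Transform: cast the date and derive a new column
enriched = (sales
            .withColumn("order_date", F.to_date("order_date"))
            .withColumn("amount_with_tax", F.round(F.col("amount") * 1.1, 2)))

# Save in Delta format and visualize in the notebook
enriched.write.format("delta").mode("overwrite").save("dbfs:/tmp/demo/sales_enriched")
display(enriched)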
Databricks installation using Azure:
- Introduction to Setup Databricks Environment using Azure
- Signup for Azure Portal
- Setup Azure Databricks using Azure Portal
- Launching Azure Databricks Environment
- Create Single Node Databricks Cluster
- Editing Databricks Clusters using Databricks UI
- Getting Started with Databricks Notebooks
- Create Databricks SQL Warehouse
- Increase Quota to Create Databricks SQL Warehouse Cluster
- Run Queries using Databricks SQL Warehouse
- Overview of Uploading Data using Databricks SQL Warehouse UI
- Review Data Explorer of Data Science and Engineering Environment
- Analyze Sales Data using Databricks Notebooks
- Terminate Databricks Data Science and Engineering Clusters
- Terminate Databricks SQL Warehouse Clusters
- Delete Azure Databricks Workspace
- Population Data Analytics Lab
Setup Databricks for SQL:
- Installing Databricks CLI using python3
- Configure Databricks CLI using Token and Profile
- Setup Git Repository for Material and Data Sets related to Databricks SQL Course
Databricks SQL
- Introduction to the Databricks SQL platform
- Run first SQL query using the Databricks SQL editor
- Introduction to Databricks SQL dashboards
- Overview of Databricks SQL Data Explorer to review Metastore Database and Tables
- Use Databricks SQL Editor to develop scripts or queries
- Review Metadata of Tables using Databricks SQL Platform
- Overview of loading data into retail_db tables
- Configure Databricks CLI to push data into Databricks Platform
- Copy JSON Data into DBFS using Databricks CLI
- Analyze JSON Data using Spark APIs
- Analyze Delta Table Schemas using Spark APIs
- Load Data from Spark Data Frames into Delta Tables
- Run Ad Hoc Queries using Databricks SQL Editor to validate data (see the sketch after this list)
- Overview of External Tables using Databricks SQL
- Using COPY Command to Copy Data into Delta Tables
- Manage Databricks SQL Endpoints
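A sketch of the JSON-to-Delta flow covered above, using Spark APIs from a notebook; the DBFS path is a placeholder and the order_status column is an assumption about the retail_db orders data:

spark.sql("CREATE DATABASE IF NOT EXISTS retail_db")

orders = spark.read.json("dbfs:/public/retail_db_json/orders")   # analyze the JSON data
orders.printSchema()                                             # review the inferred schema

(orders.write
       .format("delta")
       .mode("overwrite")
       .saveAsTable("retail_db.orders"))                         # load the DataFrame into a Delta table

# Ad hoc validation query - the same statement can be run from the Databricks SQL editor
spark.sql("SELECT order_status, count(*) AS order_count FROM retail_db.orders GROUP BY order_status").show()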
Managing Databases using Databricks SQL Warehouse:
- Review Databases using Databricks SQL Data Explorer
- Create Database or Schema using Databricks SQL
- Using IF NOT EXISTS while Creating Databases using Databricks SQL
- Listing or Showing Databases and Getting Metadata of Databases using Databricks
- Understand Default Location of Databricks SQL Database or Schema
- Create Database or Schema using Location in Databricks SQL Warehouse
- Drop Databases in Databricks SQL Warehouse
- Alter Database in Databricks SQL Warehouse
- Comments on Databases in Databricks SQL Warehouse (see the sketch after this list)
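The database-management statements above, issued through spark.sql() so they run from a notebook; the same SQL works unchanged in the Databricks SQL editor. Schema names and the location are placeholders:

spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db COMMENT 'Training database'")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db_ext LOCATION 'dbfs:/user/demo/demo_db_ext.db'")

spark.sql("SHOW DATABASES").show()                                    # list databases
spark.sql("DESCRIBE DATABASE EXTENDED demo_db").show(truncate=False)  # metadata, including the default location

spark.sql("ALTER DATABASE demo_db SET DBPROPERTIES ('purpose' = 'training')")
spark.sql("DROP SCHEMA IF EXISTS demo_db_ext CASCADE")                # drop the schema along with its tables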
Manage Delta Tables using Databricks SQL Warehouse:
- List Databases and Save Databricks SQL Script
- Create Table using Delta Format in Databricks SQL Warehouse
- Understand LOCATION and the USING Clause to specify File Format in Databricks SQL
- Create External Table using Delta Format in Databricks SQL Warehouse
- Drop External Table and Delete Folder in Databricks SQL Warehouse
- Overview of DML or CRUD Operations using Databricks SQL
- Insert Records into Databricks SQL Warehouse table
- Insert Multiple Records into Databricks SQL Warehouse table
- Update Existing Records in Databricks SQL Warehouse table
- Update Existing Records in Databricks SQL Warehouse table based on Null Values
- Delete Existing Records in Databricks SQL Warehouse table
- Cleanup Users Tables from Databricks SQL Warehouse Database or Schema (see the DML sketch after this list)
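A DDL and DML sketch for the Delta table topics above, via spark.sql(); the table, columns, and values are placeholders:

spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db")
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo_db.users (
    user_id INT, user_name STRING, city STRING
  ) USING DELTA
""")

spark.sql("INSERT INTO demo_db.users VALUES (1, 'Asha', 'Pune')")                         # single record
spark.sql("INSERT INTO demo_db.users VALUES (2, 'Ravi', 'Chennai'), (3, 'Meera', NULL)")  # multiple records

spark.sql("UPDATE demo_db.users SET city = 'Mumbai' WHERE user_id = 2")                   # update a record
spark.sql("UPDATE demo_db.users SET city = 'Unknown' WHERE city IS NULL")                 # update based on NULL values
spark.sql("DELETE FROM demo_db.users WHERE user_id = 3")                                  # delete a record
spark.sql("DROP TABLE IF EXISTS demo_db.users")                                           # cleanup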
Setup Dataset for Databricks SQL Views and the COPY Command:
- Create Folder in DBFS using Databricks CLI Commands
- Copy Files from Local File System into DBFS using Databricks CLI Commands
- Overwrite Files while Copying into DBFS using Databricks CLI Command
- Understand Course Catalog Data in the files uploaded to DBFS
- Options to Analyze Data using Databricks SQL Queries
- Run Select Queries using DBFS Path in FROM Clause
- Run Queries using Temporary Views in Databricks SQL
- Run Queries using External Tables in Databricks SQL (see the sketch after this list)
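A sketch of the three query options above, assuming the course-catalog JSON files have already been copied to the DBFS path shown (a placeholder):

files_path = "dbfs:/public/course_catalog"

# 1. Query the files directly by using the DBFS path in the FROM clause
spark.sql(f"SELECT * FROM json.`{files_path}` LIMIT 10").show()

# 2. Query through a temporary view
spark.read.json(files_path).createOrReplaceTempView("course_catalog_v")
spark.sql("SELECT count(*) AS record_count FROM course_catalog_v").show()

# 3. Query through an external table defined over the same files
spark.sql(f"CREATE TABLE IF NOT EXISTS course_catalog_ext USING JSON LOCATION '{files_path}'")
spark.sql("SELECT * FROM course_catalog_ext LIMIT 10").show()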
Queries to Process Values in JSON:
- Queries to Process Values in JSON String Columns
- Get Distinct and Count based on Key using Course Catalog Data
- Filter Data using Basic Databricks SQL Queries using Course Catalog Data
- Exploring Functions using Databricks SQL
- Understand Record Column Values in Course Catalog Table
- Processing JSON String Values using Databricks SQL Queries
- Process Instructors JSON Records using Databricks SQL Queries
- Create View for Instructors using Databricks SQL Queries (see the sketch after this list)
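A sketch of processing JSON string values as listed above; the sample record and field names are made-up stand-ins for the course-catalog data:

from pyspark.sql import Row

raw = spark.createDataFrame([
    Row(course='{"course_id": 1, "course_name": "Spark Basics", "instructors": [{"name": "Asha"}, {"name": "Ravi"}]}'),
])
raw.createOrReplaceTempView("course_catalog_raw")

# Extract simple values from the JSON string with get_json_object
spark.sql("""
  SELECT get_json_object(course, '$.course_id')   AS course_id,
         get_json_object(course, '$.course_name') AS course_name
  FROM course_catalog_raw
""").show()

# Parse the full string with from_json and explode the nested instructors array
# (the basis for an instructors view)
spark.sql("""
  WITH parsed AS (
    SELECT from_json(course,
             'course_id INT, course_name STRING, instructors ARRAY<STRUCT<name: STRING>>') AS c
    FROM course_catalog_raw
  )
  SELECT c.course_id, instructor.name AS instructor_name
  FROM parsed
  LATERAL VIEW explode(c.instructors) t AS instructor
""").show()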
Copy Data into Delta Tables in Databricks SQL Warehouse:
- Create Delta Table for Course Catalog Data Set
- Get File Names along with Data using Databricks SQL Queries
- Overview of Databricks SQL COPY Command
- Copy Data from single file into Delta Tables using Files
- Copy Data from multiple files into Delta Tables using Files
- Copy Data from multiple files into Delta Tables using Pattern
- Create Course Catalog Table in Databricks SQL Warehouse with additional Column
- Copy Data from Files using Queries into Delta Tables
- Validate Course Catalog Table in Bronze Layer (see the COPY INTO sketch after this list)
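A COPY INTO sketch for the topics above; the target table, source path, and pattern are placeholders, and the placeholder-table and schema-inference options assume a recent Databricks Runtime. COPY INTO is idempotent, so files that were already loaded are skipped on re-runs:

spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db")
spark.sql("CREATE TABLE IF NOT EXISTS demo_db.course_catalog_bronze")   # empty placeholder Delta table

spark.sql("""
  COPY INTO demo_db.course_catalog_bronze
  FROM 'dbfs:/public/course_catalog/'
  FILEFORMAT = JSON
  PATTERN = '*.json'
  FORMAT_OPTIONS ('inferSchema' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true')
""")

# Validate the bronze table after the load
spark.sql("SELECT count(*) AS loaded_rows FROM demo_db.course_catalog_bronze").show()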
Insert or Merge Query Results or Views into Delta Tables using Databricks SQL:
- Introduction to Insert or Merge Query Results or Views into Delta Tables using Databricks SQL
- Create Course Catalog and Instructors Tables using Databricks SQL
- Copy Data into Course Catalog Table from JSON Files using Databricks SQL
- Insert Query Results into Delta Table using Databricks SQL
- Exercise to Create Courses Table and Insert Data
- Copy Instructors Data into Course Catalog Table from new file
- Understand the Concept of Merge or Upsert in DML or CRUD Operations
- Develop Query to Get the latest Instructors Records from Course Catalog Table
- Overview of Merge Statement Syntax using Databricks SQL
- Merge Data into Instructors Table from Course Catalog using Databricks SQL
- Exercise to Merge Courses Data from Course Catalog into Courses Table (see the MERGE sketch after this list)
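A MERGE (upsert) sketch for the topics above; the source and target are built inline from small sample DataFrames so the example is self-contained, and all names are illustrative:

spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db")

# Target Delta table
spark.createDataFrame(
    [(1, "Asha"), (2, "Ravi")], ["instructor_id", "instructor_name"]
).write.format("delta").mode("overwrite").saveAsTable("demo_db.instructors")

# Source view standing in for the "latest instructors" query developed above
spark.createDataFrame(
    [(2, "Ravi K"), (3, "Meera")], ["instructor_id", "instructor_name"]
).createOrReplaceTempView("latest_instructors_v")

spark.sql("""
  MERGE INTO demo_db.instructors AS tgt
  USING latest_instructors_v AS src
    ON tgt.instructor_id = src.instructor_id
  WHEN MATCHED THEN UPDATE SET tgt.instructor_name = src.instructor_name
  WHEN NOT MATCHED THEN INSERT (instructor_id, instructor_name)
                        VALUES (src.instructor_id, src.instructor_name)
""")
spark.sql("SELECT * FROM demo_db.instructors ORDER BY instructor_id").show()

After the merge, instructor 2 is updated and instructor 3 is inserted, which is the upsert behaviour the exercises above target.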
Delta Lake Lab: Exercises
- Data Lakehouse Architecture
- Medallion Lakehouse architecture
- Delta Lake
- 1: Create Delta Table (SQL & Python)
- 2: Read & Write Delta Table
- 3: Update / Delete / Merge
- 4: Schema Validation
- 5: Time Travel
- 6: Convert a Parquet table to a Delta table
- 7: Generated Columns
- 8: Incremental ETL load
- 9: Incremental ETL load (@version property)
- Processing Nested XML file
- Processing Nested JSON file
- Delta Table - Time Travel and Vacuum (see the sketch after this list)
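A time-travel and VACUUM sketch for the lab topics above; the table name and Parquet path are placeholders, and the table is assumed to already have a few committed versions:

spark.sql("DESCRIBE HISTORY demo_db.sales_delta").show(truncate=False)        # list versions and operations

# Query earlier versions by version number or timestamp
spark.sql("SELECT count(*) FROM demo_db.sales_delta VERSION AS OF 0").show()
spark.sql("SELECT count(*) FROM demo_db.sales_delta TIMESTAMP AS OF '2024-01-01'").show()

# Lab 6: convert an existing Parquet folder to a Delta table in place
spark.sql("CONVERT TO DELTA parquet.`dbfs:/tmp/demo/sales_parquet`")

# Remove data files no longer referenced by the table (default retention is 7 days / 168 hours)
spark.sql("VACUUM demo_db.sales_delta RETAIN 168 HOURS")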
Databricks: Admin
- Manage User & Group
- Lab: Add User into Azure Active Directory
- Lab: Create Group
- Lab: Table Access Control
- Lab: Workspace, Cluster, Job Access
- Introduction to Azure Databricks Workspace.
- Databricks Clusters
- Databricks Pools
- Databricks Notebooks and magic commands
- Databricks CLI and DBFS management
- Administering Clusters via Terraform
Databricks Notebook - CI/CD using Azure DevOps
- Integrate Databricks notebooks with Git providers such as GitHub
- Configure Continuous Integration - build artifacts to be deployed to clusters
- Configure Continuous Delivery using DataThirst templates
- Run notebooks on Azure Databricks via Jobs
- Secure clusters via cluster policies and permissions
- Data Factory Linked Services
- Orchestrate notebooks via Data Factory
Databricks Cluster & Utilities Details:
- Navigate the Workspace
- Databricks Runtimes
- Clusters Part 1
- Clusters Part 2
- Notebooks
- Libraries
- Repos for Git integration
- Databricks File System (DBFS)
- DBUTILS
- Widgets
- Workflows
- Metastore - Setup External Metastore I
- Metastore - Setup External Metastore II
Structured Streaming using Databricks, Spark and Azure
- What is Spark Structured Streaming
- Data Source & Sink
- Lab: Rate & File Source
- Lab: Kafka Source
- Lab: Sink: Console, Memory, File & Custom
- Lab: Build Streaming ETL
- Lab: Setup Event Hub
- Lab: Event Hub Producer
- Lab: Integrate Event Hubs with Databricks
- Lab: Transformation
- Streaming ETL: Ingest into Azure storage (see the streaming sketch after this list)
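A minimal Structured Streaming sketch (rate source to Delta sink) in the spirit of the labs above; the Kafka and Event Hubs sources differ mainly in the format and connection options, and the paths here are placeholders:

from pyspark.sql import functions as F

stream = (spark.readStream
               .format("rate")             # built-in test source emitting (timestamp, value)
               .option("rowsPerSecond", 5)
               .load()
               .withColumn("is_even", F.col("value") % 2 == 0))

query = (stream.writeStream
               .format("delta")
               .outputMode("append")
               .option("checkpointLocation", "dbfs:/tmp/demo/rate_ckpt")   # required for recovery
               .start("dbfs:/tmp/demo/rate_delta"))

# query.stop()   # stop the stream when finished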
Deep Dive into Data Lakehouse, Delta Lake and Delta Tables:
- Understanding Data Warehouse, Data Lake and Data Lakehouse
- Databricks Lakehouse Architecture and Delta Lake
- Delta Tables
- Storing data in a Delta table, Databricks SQL and time travel
- Delta Table caching
- Delta Table partitioning
- Delta Table Z-ordering
- Where to go from here?
- Azure Databricks
- Why is Spark difficult? Why did Databricks evolve?
- Why Databricks in Cloud? Introduction to Azure Databricks
- How to save on Databricks demo costs
- Demo overview
- Understand Databricks tables and the file system
- Load CSV data in Azure blob storage
- Demo: Provision Databricks, Clusters and Notebooks
- Demo: Mount Data Lake to Databricks DBFS
- Creating Azure Free Account
- Azure Portal Overview
- Introduction to Azure Databricks
- Creating Azure Databricks Service
- Azure Databricks Architecture Overview
- Project Solution Databricks Notebooks
- Azure Databricks Cluster Types
- Azure Databricks Cluster Configuration
- Creating Azure Databricks Cluster
- Azure Databricks Cluster Pool
- Azure Databricks Notebooks Introduction
- Magic commands
- Databricks Utilities
- Databricks File System (DBFS)
- Databricks Mount overview
- Creating Azure Data Lake Storage Gen2
- Creating Azure Service Principal
- Mounting Azure Data Lake Storage Gen2
- Secret Scopes Overview
- Creating Secret Scope and Secrets in Key Vault
- Mounting Data Lake Using Secrets (see the mount sketch below)
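A sketch of mounting ADLS Gen2 with a service principal whose credentials are read from a secret scope, as covered above; the scope, key names, storage account, container, and mount point are all placeholders:

client_id     = dbutils.secrets.get(scope="formula1-scope", key="client-id")
client_secret = dbutils.secrets.get(scope="formula1-scope", key="client-secret")
tenant_id     = dbutils.secrets.get(scope="formula1-scope", key="tenant-id")

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@formula1dl.dfs.core.windows.net/",   # container@storage-account
    mount_point="/mnt/formula1dl/raw",
    extra_configs=configs,
)
display(dbutils.fs.ls("/mnt/formula1dl/raw"))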
- Project:
- Formula1 Data Overview
- Upload Formula1 Data to Data Lake
- Project Requirement Overview
- Solution Architecture Overview
- Data Ingestion - CSV
- Data Ingestion - JSON
- Data Ingestion - Multiple Files
- Databricks Workflows
- Filter & Join Transformations
- Aggregations
- Spark SQL - Databases/ Tables/ Views
- Spark SQL - Filters/ Joins/ Aggregations
- Incremental Load
- Data Loading Design Patterns
- Formula1 Project Scenario
- Formula1 Project Data Set-up
- Full Refresh Implementation
- Incremental Load - Method 1
- Incremental Load - Method 2
- Incremental Load Improvements - Assignment
- Incremental Load Improvements - Solution
- Incremental Load - Notebook Workflows
- Incremental Load - Race Results
- Incremental Load - Driver Standings
- Incremental Load - Constructor Standings (Assignment)
- Pitfalls of Data Lakes
- Data Lakehouse Architecture
- Read & Write to Delta Lake
- Updates and Deletes on Delta Lake
- Merge/ Upsert to Delta Lake
- History, Time Travel, Vacuum
- Delta Lake Transaction Log
- Convert from Parquet to Delta
- Data Ingestion - Circuits File
- Data Ingestion - Results File
- File Improvements
- Data Transformation - PySpark / Spark Scala / SQL
- Demo: Explore, Analyse, Clean, Transform and Load Data in Databricks
- Azure Databricks Clusters
- Azure Databricks other Important Components
- Databricks – Monitoring
- Use Case Discussion and Solution using Databricks: any two use cases will be covered during the training
- Building a solution architecture for a data engineering solution using Azure Databricks, Azure Data Lake Gen2, Azure Data Factory and Power BI
- Mounting Azure Storage in Databricks using secrets stored in Azure Key Vault
- Working with Databricks Tables, Databricks File System (DBFS) etc
- Using Delta Lake to implement a solution using Lakehouse architecture
- Creating dashboards to visualise the outputs
- Connecting to the Azure Databricks tables from Power BI
- Working with Databricks notebooks as well as using Databricks utilities, magic commands etc.
- Configure Azure Databricks logging via Log4j and the Spark listener library to a Log Analytics workspace
- Configure notebook deployment via Databricks Jobs.
- Configure CI/CD using Azure DevOps
- Delta Lake: Spark / Scala using Databricks
Detailed Discussion on Delta Lake - Spark / Scala:
- Introduction to Data Lake
- Key Features of Delta Lake
- Implementing incremental load pattern using delta lake
- Emergence of Data Lakehouse architecture and the role of Delta Lake
- Read, Write, Update, Delete and Merge to delta lake using both PySpark as well as SQL
- Create a table
- Write a table
- Read a table
- Schema validation
- Update table schema
- Table Metadata
- Delete from a table
- Update a Table
- Vacuum
- History
- Concurrency Control
- Optimistic concurrency control
- Migrate Workloads to Delta Lake
- Optimize Performance with File Management
- Auto Optimize
- Optimize Performance with Caching
- Delta and Apache Spark caching
- Cache a subset of the data (see the sketch at the end of this section)
- Isolation Levels
- Best Practices
- Working on multiple use cases
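A closing sketch for the file-management and caching topics above (OPTIMIZE with Z-ordering, Auto Optimize, and caching a subset of the data); the table, columns, and filter predicate are placeholders:

spark.sql("CREATE SCHEMA IF NOT EXISTS demo_db")
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo_db.sales_delta (
    order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE
  ) USING DELTA
  PARTITIONED BY (order_date)
""")

# Compact small files and co-locate rows by a frequently filtered column
spark.sql("OPTIMIZE demo_db.sales_delta ZORDER BY (customer_id)")

# Enable Auto Optimize on the table
spark.sql("""
  ALTER TABLE demo_db.sales_delta SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")

# Cache a subset of the data on the cluster's local storage (Databricks disk cache)
spark.sql("CACHE SELECT * FROM demo_db.sales_delta WHERE order_date >= '2024-01-01'")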