Become a Spark Developer and an Expert in the Big Data Domain.
Course Content:
Section 1: Unix
- Unix Fundamentals
- Architecture
- Process Management Commands
- File Compression Commands
- System Information Commands
- vi Editor
- Unix Shell Scripting
- Shell Variables
- Arithmetic Operators
- Logical and String Operators
- File Operators
- Control Structures
- Shell Substitution
Section 2: Hadoop
- Hadoop Architecture
- NameNode and DataNode
- Filesystem Namespace
- Data Replication
- FsImage and EditLog
- Checkpoint
- HDFS High Availability
- HDFS Commands using REST API
- YARN Architecture
- Scheduler and Applications Manager
- Node Manager
- YARN Schedulers - Capacity and Fair
- MapReduce 2.0
- Serialization and Deserialization
- Mapper, Reducer, Shuffle, Sort
- Partitioner
- Job Configuration
- Hive
- What is a Data Warehouse?
- How does Hive help with ETL?
- Hive Datatypes - Primitive and Collection
- Managed and External Tables
- Partitioning - Static and Dynamic (see the sketch after this section)
- Bucketing
- Hive UDFs
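The short sketch below illustrates two of the Hive ideas listed above, managed vs. external tables and static partitioning. It is written in Scala through Spark's Hive support (Spark itself is covered in Section 4) and assumes a Spark build with Hive support enabled; the table names, columns, and path (sales, ext_sales, /data/ext_sales) are placeholders, not course material.

```scala
import org.apache.spark.sql.SparkSession

object HiveTablesSketch {
  def main(args: Array[String]): Unit = {
    // A SparkSession with Hive support lets us run HiveQL from Scala
    val spark = SparkSession.builder()
      .appName("hive-tables-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Managed table: Hive owns both the metadata and the data files
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
        |PARTITIONED BY (sale_date STRING)""".stripMargin)

    // External table: Hive owns only the metadata; the files stay at LOCATION
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS ext_sales (id INT, amount DOUBLE)
        |LOCATION '/data/ext_sales'""".stripMargin)

    // Static partitioning: the partition value is spelled out in the statement
    spark.sql(
      """INSERT INTO sales PARTITION (sale_date = '2024-01-01')
        |VALUES (1, 99.5)""".stripMargin)

    spark.stop()
  }
}
```

Dropping the managed table would delete its data files as well, while dropping the external table would leave /data/ext_sales untouched; that ownership difference is the main reason to choose one over the other.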
Section 3: Spark Programming
- Scala Programming
- Imperative Style vs. Functional Style
- Scala Variables and Datatypes - Mutable, Immutable, Variable Type Inference
- Classes and Objects
- Access Modifiers
- Control Structures
- Closures, Operators, Strings and Arrays
- Scala Functions
- Traits
- Collections
- Lists, Tuples, Collection Functions, Map, ArrayBuffer, Exception Handling (see the sketch below)
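The minimal sketch below touches several of the Scala topics listed in this section: immutable vs. mutable variables with type inference, a class mixing in a trait, higher-order collection functions, tuples, maps, ArrayBuffer, and exception handling with Try. All names are illustrative only.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.{Failure, Success, Try}

// A trait defines behaviour that classes can mix in
trait Greeter {
  def greet(name: String): String = s"Hello, $name"
}

// A class with a constructor parameter, mixing in the trait
class Course(val title: String) extends Greeter

object ScalaBasicsSketch extends App {
  val immutableCount = 10        // val: immutable, Int is inferred
  var mutableCount   = 0         // var: mutable
  mutableCount += 1

  // Collections with higher-order functions (functional style)
  val nums    = List(1, 2, 3, 4, 5)
  val doubled = nums.map(_ * 2)             // List(2, 4, 6, 8, 10)
  val evens   = nums.filter(_ % 2 == 0)     // List(2, 4)
  val total   = nums.reduce(_ + _)          // 15

  // Tuples, Maps and a mutable ArrayBuffer
  val pair   = ("spark", 3)
  val scores = Map("alice" -> 90, "bob" -> 85)
  val buffer = ArrayBuffer(1, 2, 3)
  buffer += 4

  // Exception handling: Try wraps code that may throw
  Try("42".toInt) match {
    case Success(n) => println(s"Parsed $n")
    case Failure(e) => println(s"Failed: ${e.getMessage}")
  }

  val course = new Course("Spark")
  println(course.greet("student") + ", welcome to " + course.title)
  println(s"doubled=$doubled evens=$evens total=$total pair=$pair " +
          s"scores=$scores buffer=$buffer counts=($immutableCount, $mutableCount)")
}
```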
Section 4: Spark
- Core Spark Concepts
- Spark Architecture and Shell
- Creating RDDs
- Transformation and Actions on RDDs
- Spark Variables
- Data Partitioning
- Spark SQL, DataFrames and Datasets
- Spark Streaming
- Transformations and Actions on Streaming Data
- Kafka
- Installation of Kafka and ZooKeeper
- Messages and Batches
- Producer, Consumer, Topics
- Offsets and Consumer Groups
- Brokers and Clusters
- Writing Kafka Clients in Scala using the Kafka API (see the sketches after this section)
- Capstone Project using Spark/Scala/Kafka on a complex dataset
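To make the Section 4 topics concrete, here are two hedged sketches. The first builds an RDD, applies transformations and actions, and then queries a DataFrame with Spark SQL; it runs in local mode purely for illustration, and the data is made up.

```scala
import org.apache.spark.sql.SparkSession

object SparkCoreSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-core-sketch")
      .master("local[*]")                 // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // RDD: transformations are lazy, actions trigger execution
    val rdd     = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
    val squared = rdd.map(n => n * n)            // transformation
    val bigOnes = squared.filter(_ > 5)          // transformation
    println(bigOnes.collect().mkString(", "))    // action

    // DataFrame: structured data with a schema, queryable through Spark SQL
    val df = Seq(("alice", 90), ("bob", 85)).toDF("name", "score")
    df.createOrReplaceTempView("scores")
    spark.sql("SELECT name FROM scores WHERE score > 86").show()

    spark.stop()
  }
}
```

The second sketch publishes a message to Kafka from Scala with the standard producer API; the broker address (localhost:9092), topic (events), and record contents are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Records go to a topic; the key influences which partition they land in
      val record = new ProducerRecord[String, String]("events", "user-1", "clicked")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```

The capstone project combines exactly these pieces: Kafka feeding data into Spark jobs written in Scala.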