Introduction to Big Data
- Overview of Big Data technologies and their role in analytics
- Big Data challenges & solutions
- Data Science vs Data Engineering
- The four V's of Big Data (volume, velocity, variety, veracity)
Unix & Java
- Introduction to the UNIX shell
- Basic UNIX commands: create, copy, move, delete, etc.
- Basics of the Java programming language
- Architecture: JVM, JRE, JIT
- Control structures
- OOP concepts in Java
- String classes, arrays, and exception handling
- Collection Classes
Apache HDFS
- Understanding the problem statement and the challenges of storing such large data, to see the need for a distributed file system
- Understanding HDFS architecture to solve problems
- Understanding configuration and creating a directory structure to solve the given problem statement
- Setting up appropriate permissions to secure data for the appropriate users
- Setting up Java Development with HDFS libraries to use HDFS Java APIs
Apache MapReduce
- What is MapReduce.
- Input and output formats.
- Data types in MapReduce.
- Flow of MapReduce jobs.
- Word count in MapReduce.
- How to use custom input formats.
- Use case for structured data sets.
- Writing custom classes.
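As a rough illustration of the job flow and word count topics above, the following is a minimal sketch in plain Java. It does not use the Hadoop API: the class name `WordCountSketch` and the methods `map`, `shuffle`, `reduce`, and `wordCount` are hypothetical names chosen for this example, and everything runs in a single process. It only mirrors the three conceptual phases (map, shuffle, reduce) that a real MapReduce job distributes across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the MapReduce phases,
// not the Hadoop Mapper/Reducer API.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Drive the three phases over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data is big", "data is everywhere");
        // prints {big=2, data=2, everywhere=1, is=2}
        System.out.println(wordCount(lines));
    }
}
```

In an actual Hadoop job the map and reduce steps would be written as Mapper and Reducer classes, and the framework would handle the shuffle; this sketch keeps them as ordinary methods so the flow is visible in one file.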
Apache Hive
- What is Hive.
- Architecture of Hive.
- Tables in Hive with load functions.
- Query optimization.
- Partitioning and bucketing.
- Joins in Hive.
- Indexing in Hive.
- File formats in Hive.
- How to read JSON files in Hive.
Apache Sqoop
- What is Sqoop.
- Relation between SQL & Hadoop.
- Performing Sqoop Import.
- Incremental and Conditional Imports
- Performing Sqoop Export.
Apache Pig
- What are Pig and ETL.
- Introduction to Pig architecture.
- Introduction to Pig Latin.
- How to perform ETL on any kind of data (Pig eats everything)
- Use cases of Pig.
- Joins in Pig.
- Co-grouping in Pig.
Introduction to NoSQL Databases & Oozie
- What is HBase.
- Architecture of HBase.
- CRUD operations in HBase
- Retrieval of HBase data.
- Introduction to Apache Oozie (scheduler tool)
Introduction to Programming in Scala
- Basic data types and literals used
- List the operators and methods used in Scala
- Classes in Scala
- Traits in Scala.
- Control structures in Scala.
- Collections in Scala.
- Libraries in Scala.
Introduction to Spark
- Limitations of MapReduce in Hadoop
- Batch vs. Real-time analytics
- Application of stream processing
- Spark vs. the Hadoop ecosystem
Using RDD for Creating Applications in Spark
- Features of RDDs
- How to create RDDs
- RDD operations and methods
- Explain RDD functions and describe how to write programs that use them in Scala
Running SQL Queries Using Spark SQL
- Explain the importance and features of Spark SQL
- Describe methods to convert RDDs to DataFrames
- Explain concepts of Spark SQL
- Describe the concept of Hive integration
Spark ML Programming
- Explain the use cases and techniques of Machine Learning (ML)
- Describe the key concepts of Spark ML
- Explain the concepts of an ML dataset, an ML algorithm, and model selection via cross-validation