The Hadoop course by ThinkNod is designed around the latest developments in the industry.
JAVA FUNDAMENTALS
Java is a high-level programming language originally developed by Sun Microsystems and released in 1995. Java runs on a variety of platforms, such as Windows, Mac OS, and various versions of UNIX. This module takes you through a simple and practical approach to learning the Java programming language, and covers the essentials a candidate should know before beginning to learn Hadoop.
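As a taste of what the module covers, here is the standard minimal Java program (the class name HelloHadoop is arbitrary; any valid class name works):

    // A minimal Java program: the JVM looks for a main method with exactly this signature.
    public class HelloHadoop {
        public static void main(String[] args) {
            System.out.println("Hello, Hadoop!");
        }
    }

Save it as HelloHadoop.java, compile with "javac HelloHadoop.java", and run it with "java HelloHadoop".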
HADOOP FUNDAMENTALS
Hadoop is indispensable when it comes to processing big data! This module is your introduction to the Hadoop architecture, its file system (HDFS), its processing engine (MapReduce), and the many libraries and programming tools associated with Hadoop.
HDFS
The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS is a distributed file system that provides high-throughput access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data. HDFS is built to support applications with large data sets, including individual files that reach into the terabytes.
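To make this concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. It assumes a running cluster and the hadoop-client dependency; the NameNode address and file path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // fs.defaultFS would normally come from core-site.xml; hardcoded here for illustration.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path path = new Path("/user/demo/hello.txt");
            try (FSDataOutputStream out = fs.create(path)) {   // write a file into HDFS
                out.writeUTF("Hello, HDFS!");
            }
            try (FSDataInputStream in = fs.open(path)) {       // read it back
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }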
MAPREDUCE
MapReduce is a core component of the Apache Hadoop software framework. Hadoop enables resilient, distributed processing of massive unstructured data sets across commodity computer clusters, in which each node of the cluster includes its own storage. MapReduce serves two essential functions: it parcels out work to the various nodes within the cluster (the map step), and it organizes and reduces the results from each node into a cohesive answer to a query (the reduce step).
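The canonical illustration is word count, sketched below with the Hadoop Java API: the mapper parcels words out as (word, 1) pairs, and the reducer sums the counts for each word into the final answer. Input and output paths are supplied as arguments:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map step: emit (word, 1) for every word in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce step: sum the counts collected for each word across all nodes.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }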
SPARK
A new name has entered many of the conversations around big data recently. Some see the popular newcomer Apache Spark as a more accessible and more powerful replacement for Hadoop. Others recognize Spark as a powerful complement to Hadoop and other more established technologies, with its own set of strengths, quirks and limitations. Like other big data tools, Spark is powerful and capable, and it is particularly well suited to iterative, in-memory workloads such as machine learning and interactive analysis.
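For contrast with the MapReduce example above, here is the same word count as a minimal Spark sketch in Java. It runs in local mode for illustration; the input and output paths are placeholders:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Spark keeps intermediate data in memory, which is where much of its speed comes from.
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile(args[1]);
            sc.stop();
        }
    }

Note how the whole pipeline fits in a few chained transformations, compared with the separate mapper and reducer classes MapReduce requires.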
HIVE
Apache Hive is an open-source data warehouse system built on Hadoop for querying and analyzing large datasets stored in Hadoop files. Where Hadoop provides the framework for managing large datasets in a distributed computing environment, Hive adds SQL-like querying, indexing, metadata storage, built-in and user-defined functions, and more.
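A minimal sketch of querying Hive from Java over its standard JDBC interface. It assumes a running HiveServer2 and the hive-jdbc dependency; the host, credentials, and the sales table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Standard HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<database>
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection con = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }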
PIG
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig’s language layer currently consists of a textual language called Pig Latin.
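Here is a Pig Latin word count embedded in Java through the PigServer API (a hedged sketch; the input file and output directory names are placeholders):

    import org.apache.pig.PigServer;

    public class PigExample {
        public static void main(String[] args) throws Exception {
            // "local" runs Pig on the local machine; "mapreduce" would run on the cluster.
            PigServer pig = new PigServer("local");
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            pig.store("counts", "wordcount-out");  // writes the result to a directory
            pig.shutdown();
        }
    }

Each registerQuery call adds one Pig Latin statement; Pig itself decides how to parallelize the plan when the script runs.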
HBASE
HBase is an open-source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of the Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS (the Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. It provides a fault-tolerant way of storing large quantities of sparse data.
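A minimal sketch of the HBase Java client API, writing one cell and reading it back. It assumes a running HBase cluster with a table named "users" that has a column family "info"; both names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write one cell: row key "row1", column family "info", qualifier "name".
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read it back by row key.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            }
        }
    }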
SQOOP
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from the Hadoop file system back to relational databases.
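Sqoop is usually driven from the command line; the sketch below triggers the same kind of import programmatically via Sqoop 1.x's Java entry point. The connection string, credentials, table, and target directory are all placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent to the command line:
            //   sqoop import --connect jdbc:mysql://dbhost/shop --username demo \
            //                --table orders --target-dir /user/demo/orders
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://dbhost/shop",
                "--username", "demo",
                "--password", "secret",
                "--table", "orders",
                "--target-dir", "/user/demo/orders"
            };
            int exitCode = Sqoop.runTool(importArgs, new Configuration());
            System.exit(exitCode);
        }
    }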
YARN
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is one of the key features of Hadoop 2, the second generation of the Apache Software Foundation’s open-source distributed processing framework. Originally described by Apache as a redesigned resource manager, YARN is now characterized as a large-scale, distributed operating system for big data applications.
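A small sketch of YARN's Java client API, asking the ResourceManager which nodes are currently running and what resources they offer. It assumes a reachable cluster with yarn-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class YarnNodesExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new YarnConfiguration(); // reads yarn-site.xml from the classpath
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(conf);
            yarn.start();

            // Ask the ResourceManager for a report on every running node.
            for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
                System.out.println(node.getNodeId() + "  capacity=" + node.getCapability());
            }
            yarn.stop();
        }
    }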
MONGODB
MongoDB is an open source database that uses a document-oriented data model. MongoDB is one of several database types to arise in the mid-2000s under the NoSQL banner. Instead of using tables and rows as in relational databases, MongoDB is built on an architecture of collections and documents. Documents comprise sets of key-value pairs and are the basic unit of data in MongoDB. Collections contain sets of documents and function as the equivalent of relational database tables.
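A minimal sketch with the MongoDB Java driver, storing one document in a collection and retrieving it. It assumes a local mongod and the mongodb-driver-sync dependency; the database, collection, and field names are placeholders:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class MongoExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("course");
                MongoCollection<Document> students = db.getCollection("students");

                // A document is a set of key-value pairs; no table schema is required.
                students.insertOne(new Document("name", "Alice").append("track", "Hadoop"));

                Document found = students.find(eq("name", "Alice")).first();
                System.out.println(found.toJson());
            }
        }
    }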