The Hadoop course by ThinkNod is designed around the latest developments in the industry.
JAVA FUNDAMENTALS
Java is a high-level programming language originally developed by Sun Microsystems and released in 1995. Java runs on a variety of platforms, such as Windows, Mac OS, and various versions of UNIX. This module takes you through a simple and practical approach to learning the Java programming language, and covers the essentials a candidate should know before beginning to learn Hadoop.
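As a taste of what the module covers, here is the standard minimal Java program (the class name HelloHadoop is arbitrary; any valid class name works):

    // A minimal Java program: the JVM looks for a main method with exactly this signature.
    public class HelloHadoop {
        public static void main(String[] args) {
            System.out.println("Hello, Hadoop!");
        }
    }

Save it as HelloHadoop.java, compile with "javac HelloHadoop.java", and run it with "java HelloHadoop".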
HADOOP FUNDAMENTALS
Hadoop is indispensable when it comes to processing big data! This module is your introduction to the Hadoop architecture, its file system (HDFS), its processing engine (MapReduce), and the many libraries and programming tools associated with Hadoop.
HDFS
The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS is a distributed file system that provides high-throughput access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data. HDFS is built to support applications with large data sets, including individual files that reach into the terabytes.
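To make this concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. It assumes a running cluster and the hadoop-client dependency; the NameNode address and file path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // fs.defaultFS would normally come from core-site.xml; hardcoded here for illustration.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path path = new Path("/user/demo/hello.txt");
            try (FSDataOutputStream out = fs.create(path)) {   // write a file into HDFS
                out.writeUTF("Hello, HDFS!");
            }
            try (FSDataInputStream in = fs.open(path)) {       // read it back
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }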
MAPREDUCE
MapReduce is a core component of the Apache Hadoop software framework. Hadoop enables resilient, distributed processing of massive unstructured data sets across commodity computer clusters, in which each node of the cluster includes its own storage. MapReduce serves two essential functions: it parcels out work to the various nodes within the cluster (the map step), and it organizes and reduces the results from each node into a cohesive answer to a query (the reduce step).
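The canonical illustration is word count, sketched below with the Hadoop Java API: the mapper parcels words out as (word, 1) pairs, and the reducer sums the counts for each word into the final answer. Input and output paths are supplied as arguments:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map step: emit (word, 1) for every word in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce step: sum the counts collected for each word across all nodes.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }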
SPARK
A new name has entered many of the conversations around big data recently. Some see the popular newcomer Apache Spark as a more accessible and more powerful replacement for Hadoop. Others recognize Spark as a powerful complement to Hadoop and other more established technologies, with its own set of strengths, quirks and limitations. Like other big data tools, Spark is powerful and capable, and it is particularly well suited to iterative, in-memory workloads such as machine learning and interactive analysis.
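For contrast with the MapReduce example above, here is the same word count as a minimal Spark sketch in Java. It runs in local mode for illustration; the input and output paths are placeholders:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Spark keeps intermediate data in memory, which is where much of its speed comes from.
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
            counts.saveAsTextFile(args[1]);
            sc.stop();
        }
    }

Note how the whole pipeline fits in a few chained transformations, compared with the separate mapper and reducer classes MapReduce requires.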
HIVE
Apache Hive is an open-source data warehouse system built on Hadoop for querying and analyzing large datasets stored in Hadoop files. Where Hadoop provides the framework for managing large datasets in a distributed computing environment, Hive adds SQL-like querying, indexing, metadata storage, built-in and user-defined functions, and more.
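A minimal sketch of querying Hive from Java over its standard JDBC interface. It assumes a running HiveServer2 and the hive-jdbc dependency; the host, credentials, and the sales table are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Standard HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<database>
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection con = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }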
PIG
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. Pig’s language layer currently consists of a textual language called Pig Latin.
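Here is a Pig Latin word count embedded in Java through the PigServer API (a hedged sketch; the input file and output directory names are placeholders):

    import org.apache.pig.PigServer;

    public class PigExample {
        public static void main(String[] args) throws Exception {
            // "local" runs Pig on the local machine; "mapreduce" would run on the cluster.
            PigServer pig = new PigServer("local");
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            pig.store("counts", "wordcount-out");  // writes the result to a directory
            pig.shutdown();
        }
    }

Each registerQuery call adds one Pig Latin statement; Pig itself decides how to parallelize the plan when the script runs.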
HBASE
HBase is an open-source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It is developed as part of the Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS (the Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. It provides a fault-tolerant way of storing large quantities of sparse data.
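A minimal sketch of the HBase Java client API, writing one cell and reading it back. It assumes a running HBase cluster with a table named "users" that has a column family "info"; both names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write one cell: row key "row1", column family "info", qualifier "name".
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read it back by row key.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            }
        }
    }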
SQOOP
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from the Hadoop file system back to relational databases.
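Sqoop is usually driven from the command line; the sketch below triggers the same kind of import programmatically via Sqoop 1.x's Java entry point. The connection string, credentials, table, and target directory are all placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent to the command line:
            //   sqoop import --connect jdbc:mysql://dbhost/shop --username demo \
            //                --table orders --target-dir /user/demo/orders
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://dbhost/shop",
                "--username", "demo",
                "--password", "secret",
                "--table", "orders",
                "--target-dir", "/user/demo/orders"
            };
            int exitCode = Sqoop.runTool(importArgs, new Configuration());
            System.exit(exitCode);
        }
    }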
YARN
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is one of the key features of Hadoop 2, the second generation of the Apache Software Foundation’s open-source distributed processing framework. Originally described by Apache as a redesigned resource manager, YARN is now characterized as a large-scale, distributed operating system for big data applications.
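A small sketch of YARN's Java client API, asking the ResourceManager which nodes are currently running and what resources they offer. It assumes a reachable cluster with yarn-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class YarnNodesExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new YarnConfiguration(); // reads yarn-site.xml from the classpath
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(conf);
            yarn.start();

            // Ask the ResourceManager for a report on every running node.
            for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
                System.out.println(node.getNodeId() + "  capacity=" + node.getCapability());
            }
            yarn.stop();
        }
    }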
MONGODB
MongoDB is an open source database that uses a document-oriented data model. MongoDB is one of several database types to arise in the mid-2000s under the NoSQL banner. Instead of using tables and rows as in relational databases, MongoDB is built on an architecture of collections and documents. Documents comprise sets of key-value pairs and are the basic unit of data in MongoDB. Collections contain sets of documents and function as the equivalent of relational database tables.
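A minimal sketch with the MongoDB Java driver, storing one document in a collection and retrieving it. It assumes a local mongod and the mongodb-driver-sync dependency; the database, collection, and field names are placeholders:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class MongoExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("course");
                MongoCollection<Document> students = db.getCollection("students");

                // A document is a set of key-value pairs; no table schema is required.
                students.insertOne(new Document("name", "Alice").append("track", "Hadoop"));

                Document found = students.find(eq("name", "Alice")).first();
                System.out.println(found.toJson());
            }
        }
    }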