Different Data File Formats in Big Data

Raghunandana S K

17/03/2022

Overview

In this lesson I explain the different data file formats used in Big Data. They are widely used but rarely discussed. Anyone aspiring to be a Data Engineer, Data Analyst, or ML Engineer should be aware of this hidden magic in big data.

Why do we even need a data file format?

- Imagine storing all the information you receive in a random order. Would that help? No, it won't.

- Many companies receive tens to thousands of GB of data every day. If this data is not stored in a proper format, understanding it will be difficult.

- The more time you spend sorting through the data, the more opportunities the company misses to retain customers or generate more orders and revenue.

This is why data file formats are used in a data lake.

Different kinds of file formats present

We have CSV, JSON, ORC, Parquet, and Avro. These are the leading data file formats.

Which one to choose, and why, depends entirely on the company's use case.

Use-case 1:

If you are looking at total sales from a table, say orders, the query only needs to scan one column of that table: sale_amount.

Hence storing this table in a columnar format is the best way to go, because a columnar file keeps each column's values together and the query can skip the columns it does not need.

File formats to choose: ORC, Parquet
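
To make the idea concrete, here is a minimal sketch in plain Python. It is not how ORC or Parquet are implemented; ordinary lists and dicts stand in for the files, and the orders rows and their values are made up for illustration. It only shows why a total-sales query touches a single column once the data is laid out column by column.

```python
# Hypothetical orders table; the values are invented for illustration.
orders_rows = [
    {"user_id": 1, "item_name": "laptop", "sale_amount": 1200.0},
    {"user_id": 2, "item_name": "mouse", "sale_amount": 25.0},
    {"user_id": 1, "item_name": "keyboard", "sale_amount": 75.0},
]


def to_columnar(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales(columns):
    """Sum sale_amount; user_id and item_name are never read."""
    return sum(columns["sale_amount"])


orders_columns = to_columnar(orders_rows)
print(total_sales(orders_columns))  # prints 1300.0
```

In a real data lake the same effect comes from the file format itself: a columnar file lets the query engine read only the sale_amount column from disk.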

Use-case 2:

Suppose you are trying to understand customer behavior: what kind of items are customers ordering?

To gather this information you have to read row-level information spanning multiple columns, such as user_id, item_name, and sale_amount.

Here, storing the data in a row-based format is the right choice.

File formats to choose: CSV, JSON, Avro
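
As a rough sketch of the row-level case, the snippet below reads a small CSV (a row-based format) with Python's standard csv module and groups purchases by customer. The sample data and the helper name items_by_user are assumptions made for this example. Every row is consumed whole, which is the access pattern that suits row-based formats.

```python
import csv
import io
from collections import defaultdict

# Hypothetical orders data in CSV form; values are invented for illustration.
CSV_TEXT = """user_id,item_name,sale_amount
1,laptop,1200.0
2,mouse,25.0
1,keyboard,75.0
"""


def items_by_user(csv_text):
    """Group (item_name, sale_amount) pairs by user_id, reading whole rows."""
    purchases = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each record needs several fields from the same row.
        purchases[row["user_id"]].append(
            (row["item_name"], float(row["sale_amount"]))
        )
    return dict(purchases)


print(items_by_user(CSV_TEXT))
```

Note that CSV stores everything as text, so user_id stays a string here and sale_amount has to be converted explicitly.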

Conclusion

It's important to know that there is no one-size-fits-all format. The question you must ask is: "How will this data be used?" The file format should be chosen based on the answer.
