Here, I am going to cover what Data Science is, the skills a data scientist needs, and the general tasks a data scientist performs.
What is Data Science?
Data Science is an exciting discipline in which we take raw data and turn it into understanding, insight and knowledge. The amount of data is growing enormously, and data is everywhere; you can’t escape it. It comes from tweets, movies, likes and uploads on social media. It was projected that by 2020 the volume of data would be 44 times that of 2009. When data volumes were small, we managed with structured databases; as data grew, we built data warehouses; and now that data is growing enormously in the form of images, videos, audio and text, we use data lakes. Data lakes are used for big data, and I will talk more about them later.
What is the job of a Data Scientist?
Import data of different forms, which can be either structured or unstructured (I will discuss structured and unstructured data later); clean it (pre-process the data) so that it is ready to use; transform the data into the desired shape (to get insights from it); visualize the data (look for patterns in it using charts and other graphical tools); and build a model for it. Transformation, visualization and model building form a cycle that we repeat until we are confident we have found the right pattern. Lastly, we communicate these insights to the client. That is the job in a nutshell, and the course covers each step in detail.
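The import, clean, transform and visualize steps can be sketched end to end in a few lines. This is a minimal sketch using only the Python standard library, assuming a small hypothetical CSV of sales by region; the data, column names and function names are invented for illustration, the text bar chart stands in for a real plotting tool, and the modeling step is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical structured data (invented for illustration): sales per region.
RAW_CSV = """region,sales
north,120
south,80
north,
south,100
north,140
"""

def import_data(text):
    """Import: parse CSV text into a list of dicts (structured data)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: drop rows with a missing sales value and convert sales to float."""
    return [
        {"region": r["region"], "sales": float(r["sales"])}
        for r in rows
        if r["sales"].strip()
    ]

def transform(rows):
    """Transform: compute the average sales per region."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["sales"]
        counts[r["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}

def visualize(summary, scale=10):
    """Visualize: a crude text bar chart, one '#' per `scale` units of sales."""
    return "\n".join(
        f"{region:<6} {'#' * int(avg // scale)} {avg:.1f}"
        for region, avg in sorted(summary.items())
    )

summary = transform(clean(import_data(RAW_CSV)))
print(visualize(summary))
```

Running it prints one bar per region. In a real project, the `transform` and `visualize` steps would be revisited together with modeling until a useful pattern emerges.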
Put simply, the process we follow for model building is:
- Identify the features and data sources
- Collect data
- Use math to find patterns
- Build a product
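The "use math to find patterns" step can be as simple as fitting a straight line through the data. Below is a minimal sketch of ordinary least squares in plain Python; the spend and sales numbers are invented for illustration, and `fit_line` is a hypothetical helper, not part of any library.

```python
# Hypothetical data (invented for illustration): advertising spend vs. sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(spend, sales)
# Use the pattern found in the data to predict sales for a new spend value.
predicted = slope * 6.0 + intercept
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}; predicted at 6.0: {predicted:.2f}")
```

With these made-up numbers, the fit recovers a slope close to 2 and an intercept close to 1, which is the pattern hidden under the noise. For larger problems, libraries such as NumPy (`numpy.polyfit`) perform the same kind of fit.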
As a Data Scientist, you need to:
- Gain some understanding of the domain
- Define the problem statement well
- Express results in an easy-to-understand manner
To build a career in Data Science, you will need to work on the following skills:
- Statistics
- Programming
- Machine Learning
- Communication
- Understanding the customer's domain
- Asking the right questions of the data
General tasks of a Data Scientist
- Pre-process data to fix data issues like duplicates, missing values, etc.
- Visualize data to the extent possible for better understanding and to see basic patterns
- Identify what kind of problem it is (Prediction/Forecasting, Classification, Optimization and/or Managing Big Data)
- Identify appropriate modeling techniques and build models
- Analyze results and iterate as needed
- Visualize outputs and communicate them
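The first task, fixing duplicates and missing values, can be sketched in plain Python. The records below are invented, and `drop_duplicates` and `fill_missing` are hypothetical helpers written for this example. Filling numeric gaps with the median and text gaps with the most frequent value are just two common choices among several (dropping the incomplete rows is another). In practice, pandas offers `DataFrame.drop_duplicates` and `DataFrame.fillna` for the same job.

```python
import statistics

# Hypothetical customer records (invented for illustration); None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 1, "age": 34, "city": "Pune"},  # exact duplicate of the first record
    {"id": 3, "age": 28, "city": None},
    {"id": 4, "age": 40, "city": "Delhi"},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows, column, fill_value):
    """Return copies of rows with None in `column` replaced by `fill_value`."""
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]

cleaned = drop_duplicates(records)
# Median for a numeric column, most frequent value for a text column.
median_age = statistics.median([r["age"] for r in cleaned if r["age"] is not None])
top_city = statistics.mode([r["city"] for r in cleaned if r["city"] is not None])
cleaned = fill_missing(cleaned, "age", median_age)
cleaned = fill_missing(cleaned, "city", top_city)
```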