Students without any prior knowledge can opt for the course.
The course will cover the following:
R
Python
Statistics
Machine Learning
Tableau Visualization
1. Introduction
• What is Data Science
• Evolution of Analytics
• Data Science Components
• Data Scientist Skillset
• Types of Data Scientists
• Introduction to Machine Learning
• Data Science Process
2. Analytic Techniques using R
2.1 Introduction to R Programming
• When and Why to use R for Analytics
• Types of Objects in R
• Naming Conventions in R
• Creating Objects in R
• Data Structures in R
• Matrix, Data Frame, String, Vectors
• Understanding Vectors & Data Input in R
• Lists, Data Elements
• Creating Data Files using R
• Importing Data Files from Other Sources
• Know your Data
2.2 Data Manipulation & Exploration in R
• Sorting Data
• Sub-setting Data
• Selecting (Keeping) Variables
• Excluding (Dropping) Variables
• Selecting Observations and Selection using Subset Function
• Merging Data
• Adding Rows
• Data Type Conversion
• Built-In Numeric Functions
• Built-In Character Functions
• User Built Functions
• Control Structures
• Loop Functions
• Outlier & Missing Values
3. Advanced Excel
4. Statistical Concepts & Application
4.1 Descriptive Statistics
• Data Basics
• Observations, variables, and data matrices
• Types of variables
• Relationships between variables
• Central Tendency
• Measures of Central Tendency
- Arithmetic Mean / Average
- Mode
- Median
- Standard Deviation
- Variance
4.2 Data Visualization
• Bar Graph
• Pie Chart
• Box Plot
• Scatter Plot
• Histograms
• Bimodal & Multimodal Histograms
• Frequency Chart
• Line Charts
• Basic Statistics & Data Visualization in R
4.3 Probability Basics
• Notation and Terminology
• Unions and Intersections
• Conditional Probability and Independence
4.4 Probability Distributions
• Random Variable
• Probability Distributions
• Probability Mass Function
• Parameters vs. Statistics
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
• Standard Normal Distribution
• Central Limit Theorem
• Cumulative Distribution Function
• Probability Distributions in R
4.5 Probability Distributions Sampling
• Random Sampling
• Convenience Sampling
• Stratified Random Sampling
4.6 Inferential Statistics
• Hypothesis Testing
- Null Hypothesis
- Alternative Hypothesis
- Level of Significance
- P-Value, Normality
- Decision Criteria
• Tests of Hypothesis
- One Sample: Testing Population Mean
- Hypothesis in One Sample z-test
- Two Sample: Testing Population Mean
- One Sample t-test
- Two Sample t-test
- Paired t-test
- Hypothesis in Paired Samples t-test
- Chi-Square test
- Hypothesis in Chi-Square test
- F test, Hypothesis in F test
4.7 ANOVA (Analysis of Variance)
• Hypothesis in Analysis of Variance
• General setup of ANOVA
• Nonparametric Test and a Parametric Test
• Tests of Hypothesis using R
• Analysis of Variance Using R
5. Analytic Techniques using Python
5.1 Basics of Python Language
• When and Why to use Python for Analytics
• Introduction & Installation of Python
• Python Syntax
• Strings
• Lists and Dictionaries
• Loops
• Regular Expressions
5.2 Introduction to Pandas
• Selecting data from Pandas DataFrame
• Slicing and dicing using Pandas
• GroupBy / Aggregate
• Strings with Pandas
• Cleaning up messy data with Pandas
• Dropping Entries
• Selecting Entries
5.3 Data Manipulation using Pandas
• Data Alignment
• Sorting and Ranking
• Summary Statistics
• Missing values
• Merging data
• Concatenation
• Combining DataFrames
• Pivot
• Duplicates
• Binning
5.4 Visualization with Matplotlib
• Anatomy of a Matplotlib Plot
• Matplotlib Installation
• Matplotlib Basic Plots & Their Containers
• A Matplotlib Figure, its Components and Properties
• Axes and other graphical objects
• Pylab & Pyplot
• Scatter plots
• 2D Plots - Straight Lines & Curves
• Histograms
• Pie Charts
• Bar Graphs
• Box Plots
• Data for Matplotlib Plots
• Set up Title, Axes Labels, Legend, Layout
• Showing, Saving and Closing Your Plot
5.5 Scientific Libraries in Python
• NumPy
• Scikit-Learn
6. Machine Learning
6.1 Fundamentals of Machine Learning
• Overview & Terminologies
• What is Machine Learning?
• Why Learn?
• When is Learning required?
• Data Mining
• Application Areas and Roles
• Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
-- Supervised Learning --
Predictive Models
(Simple/Multiple/Logistic Regression)
7. Simple Linear Regression
• Correlation
• Regression
• Model Assumptions
• Estimation Process
• Least Squares Method
• The Coefficient of Determination
• Correlation and Regression Using R & Python
• Simple Linear Regression Assignments
8. Multiple Regression Analysis
• Introduction
• Design Requirements
• Assumptions
• Independence
• Normality, Homoscedasticity, Linearity
• Multiple Regression
• Formal Statement of the Model
• Estimating parameters of the model
• F-test for the overall fit of the model
• Multiple regression model Building
• Selecting the best Regression equation
• Examples/Use Cases
• Interpreting the Final Model
• Multicollinearity and its Diagnostics
• Examples/Use Cases
• Qualitative Independent Variables
• Indicator variables
• Interpretation of Regression Coefficients
• Examples/Use Cases
• Regression Diagnostics and Residual Analysis
• Multiple Linear Regression Using R & Python
• Multiple Regression Assignment
9. Logistic Regression Analysis
• Theory Behind Logistic Regression
- Assessing the Model and Predictors
• When and Why do we Use Logistic Regression?
- Binary
- Multinomial
• Interpreting Logistic Regression
• Assumptions
• Sample size requirements
• The logistic function & Interpretation
• Methods for including variables
• Computational method
10. Decision Trees
• Understanding the Concept
• Internal decision nodes
• Terminal leaves
• Tree induction: Construction of the tree
• Classification Trees
• Entropy
• Selecting Attribute
• Information Gain
• Partially learned tree
• Overfitting
• Causes for overfitting
• Overfitting Prevention (Pruning) Methods
• Reduced Error Pruning
• Decision trees - Advantages & Drawbacks
• Ensemble Models
• Decision Trees using Python
• Decision Trees Assignment
• Logistic Regression Model using R & Python
• Logistic Regression Assignment
12. Random Forests
• Introduction & Motivation
• Ensemble Methods - Bagging, Boosting & Random Forests
• Ensemble Classifiers
• Ensemble Models
• How Random Forests Work
• Gini Index
• Operation of Random Forest
• Random forest algorithm
• Common variables for random forests
• Random Forest - Practical Considerations
• Random Forest - Features, Advantages and Disadvantages
• Limitations of random forests
• Random Forest using Python
14. Bayesian Theory
• Axioms of Probability Theory
• Conditional Probability
• Independence
• Joint Distribution
• Bayes' Rule
• Bayesian Categorization
• Generative Probabilistic Models
• Naïve Bayes Generative Model
• Naïve Bayesian Categorization
• Example & Exercises
• Naïve Bayes Classifier using Python
15. K-Nearest Neighbor (K-NN)
• Non-parametric methods
• k-Nearest Neighbor Estimator
• How to Choose k or h
• Strengths and Weaknesses
• K-Nearest Neighbor using Python
16. Intro to Dimensionality Reduction
• Principal Components Analysis (PCA)
• PCA using Python
-- Unsupervised Learning --
17. K Means Clustering
• Parametric Methods Recap
• Clustering
• Direct Clustering Method
• Mixture densities
• Classes vs. Clusters
• Non-Hierarchical Clustering
• K-Means
• Distance Metrics
• K-Means Algorithm
• K-Means Objective
• Color Quantization
• Vector Quantization
• Encoding/Decoding
• Soft Clustering
• Expectation Maximization (EM)
• EM Algorithm
• Feature Selection vs Extraction
• Seed Choice
• Uses of Clustering
• Clustering as Pre-processing
• K-Means Clustering using Python
18. Time Series
• The Art of Forecasting
• Forecasting Approaches
• Qualitative Forecasting Methods
• Quantitative Forecasting Methods
• Time Series & its Components
- Trend
- Cyclical
- Seasonal
- Irregular
• Smoothing Methods
- Moving Average Method
- Exponential Smoothing Method
• Forecast Effect of Smoothing Coefficient
• Linear Time-Series Forecasting Model
• Forecast using Trend Models
• The Linear Trend Model
• Time Series Plot
• Seasonality Plot
• Trend Analysis
• Quadratic Time-Series Forecasting Model
• Quadratic Time-Series Model Relationships
• Quadratic Trend Model
• Exponential Time-Series Forecasting Model
• Exponential Weight
• Exponential Trend Model
• Autoregressive Modeling
• Time Series Data Plot
• Auto-correlation Plot
• Evaluating Forecasts
• Quantitative Forecasting Steps
• Forecasting Guidelines
• Pattern of Forecast Error
• Residual Analysis
• Time Series Using Python
19. Tableau
19.1 Tableau Fundamentals
- Uses of Tableau
- Tableau Installation
- Help Menu and Samples
- Connecting to data
- Dimensions and measures
- Tableau interfaces
- Single table & Multiple table
- Copy and Paste
19.2 Data Visualization with Tableau
- Features of Tableau
- Exporting Data
- Connecting Sheets
- Making Basic Visualization
- Making sense out of visuals
- Bullet Charts
- Dual Axis
- Reference Lines
- Pareto Charts
- Waterfall Charts
- Joins
- Other Advanced Techniques
19.3 Conditional Formatting & Scripting
- How to make Charts with Conditions
- Calculated Fields
- Drill Down
- Drill in Effects
- Date/Time Manipulations
19.4 Dashboard Integration
- Dashboard Designing
- Action Settings
- Linking Charts and Storytelling
- Capstone Project
- Interview Preparation
- Resume Building
- Post-Training Engagement