Principal component analysis - a dimension reduction technique

Ashish R.
08/12/2016

In simple words, principal component analysis (PCA) is a method of extracting important variables (in the form of components) from a large set of variables. It extracts a low-dimensional set of features from a high-dimensional data set with the aim of capturing as much information as possible. With fewer variables, visualization also becomes much more meaningful. This is why PCA is called a dimension reduction technique. PCA is most useful when dealing with high-dimensional data whose variables have significant correlation among them.
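
For concreteness, here is a minimal sketch of this idea (not part of the original lesson) using scikit-learn on a small synthetic data set; the data and variable names are purely illustrative.

```python
# A minimal sketch: reducing a correlated 5-variable dataset to 2 principal
# components with scikit-learn. The data here is synthetic and illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))                  # two underlying factors
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

X_std = StandardScaler().fit_transform(X)         # PCA is scale-sensitive
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)                      # low-dimensional scores

print(Z.shape)                                    # (200, 2)
print(pca.explained_variance_ratio_)              # share of variance captured
```

Because the five variables were generated from only two underlying factors, the two retained components capture nearly all of the variation.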

Principal components analysis is one of the simplest of the multivariate methods. The objective of the analysis is to take p variables (x1, x2, x3, ..., xp) and find linear combinations of these to produce transformed variables (z1, z2, z3, ..., zp) that are uncorrelated, ordered by their importance, and together describe the overall variation in the data set.

The lack of correlation means that the indices measure different "dimensions" of the data, and the ordering is such that var(z1) ≥ var(z2) ≥ var(z3) ≥ ... ≥ var(zp), where var(zi) denotes the variance of zi. The Z indices are then the principal components. When doing principal components analysis, there is always the hope that the variances of most of the indices will be low enough to be negligible. In that case, most of the variation in the full data set can be adequately described by the few Z variables whose variances are not negligible, and some degree of economy is achieved. For this reason it is also called a dimension reduction technique. Often a Z variable that explains a significant share of the variance has a dominant loading on a particular group of the original X variables, so that Z describes a specific quantitative or qualitative aspect of those X attributes. Such newly formed Z variables are then interpreted as latent factors, which is the connection to factor analysis.
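
A numeric sketch of these definitions, using only numpy and synthetic data (an assumption for illustration, not taken from the lesson): the z variables are built from the eigenvectors of the covariance matrix, their variances come out in decreasing order, and they are mutually uncorrelated.

```python
# The principal components z are linear combinations of x given by the
# eigenvectors of the covariance matrix, with var(z1) >= var(z2) >= ...
# and zero correlation between components.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # synthetic x1..x4
Xc = X - X.mean(axis=0)                                   # centre the data

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending order
order = np.argsort(eigvals)[::-1]                         # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                                          # z1..z4
print(np.var(Z, axis=0, ddof=1))                          # equals eigvals, decreasing
print(np.round(np.corrcoef(Z, rowvar=False), 3))          # ~identity: uncorrelated
```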

Principal components analysis does not always work, in the sense that a large number of original variables is not necessarily reduced to a small number of transformed variables. Indeed, if the original variables are uncorrelated, then the analysis achieves nothing. The best results are obtained when the original variables are very highly correlated, positively or negatively. In that case it is quite conceivable that, for example, 20 or more original variables can be adequately represented by two or three principal components. If this desirable state of affairs does occur, then the important principal components will be of some interest as measures of the underlying dimensions in the data. It will also be of value to know that there is a good deal of redundancy in the original variables, with most of them measuring similar things.
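
A quick illustration of this point, again on synthetic data (an assumed example, not from the lesson): when the variables are highly correlated, the first component captures almost everything; when they are uncorrelated, the variance stays spread out and PCA gains nothing.

```python
# Compare explained variance ratios for uncorrelated vs. highly correlated data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

uncorrelated = rng.normal(size=(300, 6))                       # independent columns
factor = rng.normal(size=(300, 1))                             # one shared factor
correlated = factor @ np.ones((1, 6)) + 0.05 * rng.normal(size=(300, 6))

for name, data in [("uncorrelated", uncorrelated), ("correlated", correlated)]:
    ratios = PCA().fit(data).explained_variance_ratio_
    print(name, np.round(ratios, 2))
# correlated: first component explains ~99%; uncorrelated: roughly 1/6 each
```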

Where is it used?

A multi-dimensional hyperspace is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, to score all observations based on a composite index, and to cluster similar observations together based on their multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the dimensionality of the d multivariate attributes to two or three dimensions.

PCA summarizes the variation in correlated multivariate attributes into a set of uncorrelated components, each of which is a particular linear combination of the original variables. The extracted uncorrelated components are called principal components (PCs) and are estimated from the eigenvectors of the covariance matrix of the original variables. Therefore, the objective of PCA is to achieve parsimony and reduce dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, and to summarize the data with little loss of information.
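
As an illustration of using PCA for visualization, here is a short sketch (an assumed example using scikit-learn's bundled iris data as a stand-in for arbitrary multivariate attributes, not something from the original lesson) that projects four attributes onto the first two principal components and plots the scores.

```python
# Project d-dimensional attributes onto the first two principal components
# and display them as a 2-D scatter plot.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)                     # 150 observations, 4 attributes
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(Z[:, 0], Z[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Multivariate attributes projected onto two principal components")
plt.show()
```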

A few use cases where PCA is used:
Survey data: Any kind of market survey data collected on a Likert scale (0-5, 0-10, etc.) can be used to derive principal components that describe a specific sentiment of the customers/participants in the survey. The principal components with eigenvalue > 1 are the important ones to retain (see the sketch after this list of use cases).

Market mix model: In developing a market mix model, usually 52-104 weeks of sales and marketing spend data, along with many brand image variables measured on a monthly/quarterly basis, are used to derive the contribution of the marketing spends to generating revenue. In the overall ROI calculation a mix model is developed: realized sales/revenue/pipeline sales are modeled with the help of many spend-related attributes and their various derived adstock values. In such a scenario PCA is used to reduce the overall dimension of the data.

Brand image: To create a brand image from many brand variables, PCA is often used to calculate a brand value index.

NPS score calculation: In the calculation of the NPS (Net Promoter Score) from customer survey data, PCA is often used to take the overall effect of all the considered variables into account.

CSAT score calculation: Similarly, PCA is used in CSAT (customer satisfaction) score calculation.
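
As referenced in the survey-data item above, here is a small sketch of the "eigenvalue > 1" rule on synthetic Likert-style responses (an assumed setup, not data from the lesson). The rule is normally applied to standardized responses, i.e. to the eigenvalues of the correlation matrix.

```python
# Retain the components whose eigenvalue exceeds 1 on standardized survey data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# 400 respondents x 10 questions on a 0-5 scale (synthetic, illustrative)
latent = rng.normal(size=(400, 2))
survey = np.clip(np.rint(2.5 + latent @ rng.normal(size=(2, 10))), 0, 5)

pca = PCA().fit(StandardScaler().fit_transform(survey))
eigenvalues = pca.explained_variance_
keep = int(np.sum(eigenvalues > 1))

print(eigenvalues.round(2))
print("components to retain:", keep)
```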

 
