Use Data Science To Find Credit Worthy Customers

14/07/2017

The k-nearest neighbor (KNN) classifier is one of the simplest classifiers to use, and is therefore widely used for classifying dynamic datasets. The walkthrough below shows how easy it is to classify credit-worthy vs credit-risk customers:

head(gc)

##   Default checkingstatus1 duration history purpose amount savings employ
## 1       0             A11        6     A34     A43   1169     A65    A75
## 2       1             A12       48     A32     A43   5951     A61    A73
## 3       0             A14       12     A34     A46   2096     A61    A74
## 4       0             A11       42     A32     A42   7882     A61    A74
## 5       1             A11       24     A33     A40   4870     A61    A73
## 6       0             A14       36     A32     A46   9055     A65    A73
##   installment status others residence property age otherplans housing
## 1           4    A93   A101         4     A121  67       A143    A152
## 2           2    A92   A101         2     A121  22       A143    A152
## 3           2    A93   A101         3     A121  49       A143    A152
## 4           2    A93   A103         4     A122  45       A143    A153
## 5           3    A93   A101         4     A124  53       A143    A153
## 6           2    A93   A101         4     A124  35       A143    A153
##   cards  job liable tele foreign
## 1     2 A173      1 A192    A201
## 2     1 A173      1 A191    A201
## 3     1 A172      2 A191    A201
## 4     1 A173      2 A191    A201
## 5     2 A173      2 A191    A201
## 6     1 A172      2 A192    A201

## Take a backup of the input data, in case the original data is required later

gc.bkup <- gc

##      duration.V1          amount.V1         installment.V1   
##  Min.   :-1.401713   Min.   :-1.070329   Min.   :-1.7636311  
##  1st Qu.:-0.738298   1st Qu.:-0.675145   1st Qu.:-0.8697481  
##  Median :-0.240737   Median :-0.337176   Median : 0.0241348  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.0000000  
##  3rd Qu.: 0.256825   3rd Qu.: 0.248338   3rd Qu.: 0.9180178  
##  Max.   : 4.237315   Max.   : 5.368103   Max.   : 0.9180178
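
The summary above has a mean of zero in every column, which is what z-score standardization produces. A minimal sketch of that step, assuming base R's `scale()` was used (the original preprocessing code is not shown, and the data frame `toy` below is a hypothetical stand-in built from the six rows printed earlier):

```r
# Hypothetical toy data: the first six rows of three numeric columns shown above
toy <- data.frame(
  duration    = c(6, 48, 12, 42, 24, 36),
  amount      = c(1169, 5951, 2096, 7882, 4870, 9055),
  installment = c(4, 2, 2, 2, 3, 2)
)

# scale() subtracts each column's mean and divides by its standard deviation
toy.scaled <- as.data.frame(scale(toy))

summary(toy.scaled)     # every column now has mean 0
sapply(toy.scaled, sd)  # and standard deviation 1
```

Standardizing matters for KNN because the distance between two customers would otherwise be dominated by `amount`, whose raw values are orders of magnitude larger than those of `installment`.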

## Let's predict on a test set of 100 observations. The rest are used as the training set.

set.seed(123) 
test

100 * sum(test.def == knn.1)/100  # For knn = 1

## [1] 68

100 * sum(test.def == knn.5)/100  # For knn = 5

## [1] 74

100 * sum(test.def == knn.20)/100 # For knn = 20

## [1] 81

## The proportions above show that K = 1 correctly classifies 68% of the test outcomes, K = 5 correctly classifies 74%, and K = 20 correctly classifies 81%.
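
The steps behind these accuracy figures (hold out 100 rows, fit KNN for several values of K, compare predictions with the true labels) can be sketched as follows. This is only a sketch on simulated data, assuming the `knn()` function from the `class` package; the data frame `sim`, its columns, and the helper `accuracy()` are hypothetical, so the percentages it prints will not match the ones above.

```r
library(class)  # provides knn()

set.seed(123)

# Hypothetical simulated data: two standardized predictors and a 0/1 label
n   <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$default <- factor(as.integer(sim$x1 + sim$x2 + rnorm(n) > 1))

# Hold out 100 randomly chosen rows as the test set; the rest is the training set
test      <- sample(n, 100)
train.x   <- sim[-test, c("x1", "x2")]
test.x    <- sim[test,  c("x1", "x2")]
train.lab <- sim$default[-test]
test.lab  <- sim$default[test]

# Percentage of test rows whose predicted label matches the true label
accuracy <- function(k) {
  pred <- knn(train.x, test.x, cl = train.lab, k = k)
  100 * mean(pred == test.lab)
}

acc <- sapply(c(1, 5, 20), accuracy)
names(acc) <- paste0("K=", c(1, 5, 20))
acc
```

Using `mean()` on the logical comparison gives the same percentage as `100 * sum(...)/100` when the test set has exactly 100 rows, and stays correct for any other test-set size.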

## We should also look at how the success rate changes as K increases.

table(knn.1, test.def)

##      test.def
## knn.1  0  1
##     0 54 11
##     1 21 14

## For K = 1, of the 65 customers predicted as good (row 0), 54, or 83%, really are good; this is the success rate. Let's look at K = 5 now.

table(knn.5, test.def)

##      test.def
## knn.5  0  1
##     0 62 13
##     1 13 12

## For K = 5, of the 75 customers predicted as good (row 0), 62, or 83%, really are good. Let's look at K = 20 now.

table(knn.20, test.def)

##       test.def
## knn.20  0  1
##      0 69 13
##      1  6 12

## For K = 20, of the 82 customers predicted as good (row 0), 69, or 84%, really are good.

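## It seems increasing K improves overall classification accuracy (68%, 74%, 81%), but it does not reduce the costliest error. It is worse to class a customer as good when it is bad than it is to class a customer as bad when it is good, and the tables show 11 bad customers classed as good for K = 1, against 13 for both K = 5 and K = 20.
## Judged by that error, K = 1 can be taken as the optimum K, at the price of wrongly rejecting more good customers (21, against 13 for K = 5 and 6 for K = 20).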
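
The per-table figures can be recomputed directly from the counts in the three tables printed above. A minimal sketch; the helper names `make.tab()` and `rates()` are hypothetical, and "success rate" is taken to mean the share of customers predicted as good (row `0`) who really are good:

```r
# Rebuild a 2x2 table from its counts (rows = predicted, columns = actual)
make.tab <- function(counts) {
  matrix(counts, nrow = 2, byrow = TRUE,
         dimnames = list(pred = c("0", "1"), actual = c("0", "1")))
}

# Counts copied from the table() outputs for K = 1, 5 and 20
tabs <- list(
  "K=1"  = make.tab(c(54, 11, 21, 14)),
  "K=5"  = make.tab(c(62, 13, 13, 12)),
  "K=20" = make.tab(c(69, 13,  6, 12))
)

rates <- function(tab) {
  c(pred.good    = sum(tab["0", ]),                        # customers predicted as good
    success.rate = 100 * tab["0", "0"] / sum(tab["0", ]),  # % of those who are truly good
    accuracy     = 100 * sum(diag(tab)) / sum(tab),        # % of all customers classified correctly
    missed.bad   = tab["0", "1"])                          # bad customers classed as good
}

# One column per K, one row per measure
round(sapply(tabs, rates), 1)
```

Keeping `missed.bad` next to `accuracy` makes the trade-off visible: a larger K can score higher overall while still letting as many, or more, bad customers through.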
## We can plot the data with the training set as hollow shapes and the predictions for the test set as filled shapes.
## The plot for K = 1 can be created as follows:

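# Training points: hollow symbols, coloured by installment rate (1 to 4)
plot(train.gc[, c("amount", "duration")],
     col = c(4, 3, 6, 2)[gc.bkup[-test, "installment"]],
     pch = c(1, 2)[as.numeric(train.def)],
     main = "Predicted Default, by 1 Nearest Neighbor", cex.main = .95)

# Test points: filled symbols, shaped by the K = 1 prediction.
# The fill colour must come from the test rows (test), not the training rows (-test).
points(test.gc[, c("amount", "duration")],
       bg = c(4, 3, 6, 2)[gc.bkup[test, "installment"]],
       pch = c(21, 24)[as.numeric(knn.1)], cex = 1.2, col = grey(.7))

legend("bottomright", pch = c(1, 16, 2, 17), bg = c(1, 1, 1, 1),
       legend = c("data 0", "pred 0", "data 1", "pred 1"),
       title = "default", bty = "n", cex = .8)

legend("topleft", fill = c(4, 3, 6, 2), legend = c(1, 2, 3, 4),
       title = "installment %", horiz = TRUE, bty = "n", col = grey(.7), cex = .8)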

Saumya Rajen Shah | 28/07/2017

Why didn't you use K-means instead? For KNN, you are supposed to have labels beforehand. What if you never know who was credit-worthy?
