Types of Data

02/07/2018

The data under primary consideration consist of a series of observations and measurements made on various subjects, patients, objects or other entities of interest. They might comprise, for example, the results of applying a battery of cognitive tests to a sample of patients with Alzheimer's disease, the taxonomic characteristics of bacteria, or the relative proportions of several constituents of different types of rock (or food). One particular type of multivariate data set involves the collection of repeated measures of the same characteristics over time. In a situation that might be termed doubly multivariate, we might have a multidimensional set of features assessed at each of several time points.

A typical multivariate data matrix, X, will have the form
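$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
$$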
where the typical element, $x_{ij}$, is the value of the jth variable for the ith individual. If there are several distinct groups of individuals, one of the variables might be a categorical variable with values of 1, 2, etc. to distinguish these groups. The number of individuals under investigation is n, and the number of observations taken on each of these n individuals is p. Table 1.1 gives a hypothetical example of such a multivariate data matrix. Here n = 10, p = 7 and, for example, $x_{34}$ = 135.
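
As a rough illustration of the notation, the sketch below stores Table 1.1 as a list of rows in Python and reads off n, p and a single element. The variable name `X`, the helper `element` and the use of `None` for "not known" entries are choices made for this example only, not something the text prescribes.

```python
# Table 1.1 as a list of rows (one row per individual).
# None stands in for an NK ("not known") entry; this is an assumption of the sketch.
X = [
    [1, "Male", 21, 120, "Yes", "Very good", 150],
    [2, "Male", 43, None, "No", "Very good", 160],
    [3, "Male", 22, 135, "No", "Average", 135],
    [4, "Male", 86, 150, "No", "Very poor", 140],
    [5, "Male", 60, 92, "Yes", "Good", 110],
    [6, "Female", 16, 130, "Yes", "Good", 110],
    [7, "Female", None, 150, "Yes", "Very good", 120],
    [8, "Female", 43, None, "Yes", "Average", 120],
    [9, "Female", 22, 84, "No", "Average", 105],
    [10, "Female", 80, 70, "No", "Good", 100],
]

n = len(X)     # number of individuals (rows)
p = len(X[0])  # number of columns recorded for each individual


def element(i, j):
    """Return x_ij using the 1-based indices of the text (Python lists are 0-based)."""
    return X[i - 1][j - 1]


print(n, p, element(3, 4))  # prints: 10 7 135
```

Note that the text's indices start at 1 while Python's start at 0, which is why the helper subtracts one from each index.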
In many cases, as in Table 1.1, the variables measured on each of the n individuals will be of different types, depending on whether they convey quantitative or merely qualitative information.

Table 1.1 Data matrix for a hypothetical example of 10 individuals

Individual  Gender  Age (yrs)  IQ   Depression  Health     Weight (lbs)
1           Male    21         120  Yes         Very good  150
2           Male    43         NK   No          Very good  160
3           Male    22         135  No          Average    135
4           Male    86         150  No          Very poor  140
5           Male    60         92   Yes         Good       110
6           Female  16         130  Yes         Good       110
7           Female  NK         150  Yes         Very good  120
8           Female  43         NK   Yes         Average    120
9           Female  22         84   No          Average    105
10          Female  80         70   No          Good       100
Note: NK = not known

The most common way of distinguishing these types is the following:
• Nominal - unordered categorical variables. Examples include treatment allocation, the gender of the respondent, hair colour, presence or absence of depression, and so on.
• Ordinal - where there is an ordering but no implication of equal distance between the different points of the scale. Examples include social class and self-perception of health (each coded from I to V, say), and educational level (no schooling, primary, secondary or tertiary education).
• Interval - where there are equal differences between successive points on the scale, but the position of zero is arbitrary. The classic example is the measurement of temperature using the Celsius or Fahrenheit scales. In some cases a variable such as a measure of depression, anxiety or intelligence, for example, might be treated as if it were interval-scaled when this, in fact, might be difficult to justify. We take a practical approach to such problems and frequently treat these variables as interval-scaled measures, but readers should always question whether this is a sensible thing to do and what implications a wrong decision might have.
• Ratio - the highest level of measurement, where one can investigate the relative magnitudes of scores as well as the differences between them, and where the position of zero is fixed. The classic example is the absolute measure of temperature (in Kelvin, for example), but other common ones include age (or any other time from a fixed event), weight and length.
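
The difference between interval and ratio scales can be checked numerically. The minimal Python sketch below (function and variable names are illustrative, not from the text) takes three temperatures and shows that differences between them stay equal whether expressed in Celsius or Fahrenheit, whereas the ratio of two temperatures changes with the choice of scale and is only interpretable on the Kelvin scale, where zero is fixed.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = (celsius_to_fahrenheit(t) for t in (low_c, mid_c, high_c))

# Interval property: equal steps in Celsius are also equal steps in Fahrenheit.
equal_steps_c = (mid_c - low_c) == (high_c - mid_c)
equal_steps_f = (mid_f - low_f) == (high_f - mid_f)

# Ratios depend on where zero sits, so they differ between the two interval scales.
ratio_c = mid_c / low_c  # 20 / 10
ratio_f = mid_f / low_f  # 68 / 50

# On the Kelvin (ratio) scale zero is fixed, so this ratio has a physical meaning.
ratio_k = celsius_to_kelvin(mid_c) / celsius_to_kelvin(low_c)

print(equal_steps_c, equal_steps_f)
print(round(ratio_c, 3), round(ratio_f, 3), round(ratio_k, 3))
```

Saying that 20 degrees Celsius is "twice as hot" as 10 degrees Celsius is therefore not justified: the same two temperatures give a different ratio in Fahrenheit, and a ratio of only about 1.035 in Kelvin.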

The qualitative information in Table 1.1 could have been presented in terms of numerical codes (as would often be the case in a multivariate data set) such that Gender = 1 for males and Gender = 2 for females, for example, or Health = 5 for very good and Health = 1 for very poor, and so on. But it is vital that both the user and the consumer of these data appreciate that the same numerical code (1, say) will convey completely different information, depending on the scale of measurement.
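
A small Python sketch of such a coding scheme is given below. The Gender codes and the end points of the Health scale follow the text; the intermediate Health codes, and in particular the label "Poor" (which does not appear in Table 1.1), are assumptions made to complete the example. The sketch summarises each coded variable only in ways that suit its scale: counts for the nominal Gender codes, and an order-based median for the ordinal Health codes.

```python
from collections import Counter
from statistics import median_low

# Codes from the text: Gender = 1 for males, 2 for females.
GENDER_CODES = {"Male": 1, "Female": 2}
# Health = 1 for very poor ... 5 for very good; the codes 2-4 and the
# label "Poor" are assumptions of this sketch.
HEALTH_CODES = {"Very poor": 1, "Poor": 2, "Average": 3, "Good": 4, "Very good": 5}

# Gender and Health columns of Table 1.1, individuals 1 to 10.
gender = ["Male"] * 5 + ["Female"] * 5
health = ["Very good", "Very good", "Average", "Very poor", "Good",
          "Good", "Very good", "Average", "Average", "Good"]

gender_coded = [GENDER_CODES[g] for g in gender]
health_coded = [HEALTH_CODES[h] for h in health]

# Nominal codes are labels only: counting them is meaningful, averaging is not.
gender_counts = Counter(gender_coded)

# Ordinal codes carry an order, so an order-based summary such as the
# (low) median is defensible; it always returns one of the observed codes.
median_health_code = median_low(health_coded)
health_labels = {code: label for label, code in HEALTH_CODES.items()}
median_health_label = health_labels[median_health_code]

print(dict(gender_counts))
print(median_health_code, median_health_label)
```

The code 1 appears in both coded variables, but it means "Male" in one and "Very poor" in the other, so the codes cannot be interpreted without knowing which variable, and which scale of measurement, they belong to.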

A further feature of Table 1.1 is that it contains missing values (NK). Age has not been recorded for individual number 7, and no IQ value is available for individuals 2 and 8. Missing observations arise for a variety of reasons, and it is essential to put some effort into discovering why an observation is missing. One explanation is that the observation might not apply to that individual. In a taxonomic study, for example, in which the investigator might wish to classify dinosaur fossils, 'wing length' might be an important variable. Dinosaurs without wings will have missing values for this variable! In other cases the measurement might be missing by accident or because the respondent either forgot or refused to provide the information. Occasionally, one might be able to obtain the information from elsewhere, or to repeat the measurement, and then replace the missing value with useful information.

Missing values can cause problems for many of the methods of analysis described in this text, particularly if there are a lot of them. Although there are many ways of dealing with missing-data problems (both valid and invalid!), these are, in general, beyond the scope of this text. One method with general applicability, however, is to impute ('estimate') the missing values from a knowledge of the data that are not missing. Such imputation methods range from the very simple (replace the missing value with the mean of the values from subjects with non-missing data, for example) to the technically complex (multiple imputation acknowledging the stochastic nature of the data) and are briefly described in Appendix B. However, one should always keep in mind that the imputed values are not real measurements. We do not get something for nothing! And if a substantial proportion of the individuals have large amounts of missing data, one should seriously question whether any form of statistical analysis is worth the bother.
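
The simplest of these methods, mean imputation, can be sketched in a few lines of Python using the IQ column of Table 1.1. This is only an illustration of the idea, not the procedure described in Appendix B; the variable names and the use of `None` for NK are assumptions of the example.

```python
from statistics import mean, variance

# IQ column of Table 1.1 for individuals 1 to 10; None marks an NK entry.
iq = [120, None, 135, 150, 92, 130, 150, None, 84, 70]

# Individuals (1-based) with a missing IQ value.
missing_individuals = [i + 1 for i, v in enumerate(iq) if v is None]

# Mean of the observed (non-missing) values.
observed = [v for v in iq if v is not None]
iq_mean = mean(observed)

# Mean imputation: replace every missing value with the observed mean.
iq_imputed = [iq_mean if v is None else v for v in iq]

print(missing_individuals)  # prints: [2, 8]
print(iq_mean)              # prints: 116.375
print(variance(observed) > variance(iq_imputed))  # prints: True
```

The last line shows one cost of this approach: the filled-in values sit exactly at the mean, so the mean is unchanged but the sample variance of the completed column is smaller than that of the observed values. The imputed entries add no new information, which is the point made above.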
