Types of Data

Rajendra M.
02/07/2018

The data under primary consideration here consist of a series of observations and measurements made on various subjects, patients, objects or other entities of interest. They might comprise the results of applying a battery of cognitive tests to a sample of patients with Alzheimer's disease, the taxonomic characteristics of bacteria, or the relative proportions of several constituents of different types of rock (or food), for example. One particular type of multivariate data set involves the collection of repeated measures of the same characteristics over time. In a situation that might be termed doubly multivariate, we might have a multidimensional set of features assessed at each of several time points.

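A typical multivariate data matrix, X, will have the form

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
$$

where the typical element, $x_{ij}$, is the value of the jth variable for the ith individual.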
If there are several distinct groups of individuals, one of the $x_{ij}$ might be a categorical variable with values of 1, 2, etc. to distinguish these groups. The number of individuals under investigation is n, and the number of observations taken on each of these n individuals is p. Table 1.1 gives a hypothetical example of such a multivariate data matrix. Here n = 10, p = 7 (counting the individual identifier as the first column) and, for example, $x_{34}$ = 135.
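
As a minimal sketch of this indexing convention, the snippet below stores the rows of Table 1.1 as a plain Python list and reads off n, p and the element for individual 3, column 4. The names `data_matrix` and `element` are illustrative, missing entries (NK) are represented by `None`, and the individual identifier is counted as the first of the seven columns so that the indices agree with the text.

```python
# Rows of Table 1.1; None stands for "NK" (not known).
# Columns: Individual, Gender, Age (yrs), IQ, Depression, Health, Weight (lbs)
data_matrix = [
    [1, "Male", 21, 120, "Yes", "Very good", 150],
    [2, "Male", 43, None, "No", "Very good", 160],
    [3, "Male", 22, 135, "No", "Average", 135],
    [4, "Male", 86, 150, "No", "Very poor", 140],
    [5, "Male", 60, 92, "Yes", "Good", 110],
    [6, "Female", 16, 130, "Yes", "Good", 110],
    [7, "Female", None, 150, "Yes", "Very good", 120],
    [8, "Female", 43, None, "Yes", "Average", 120],
    [9, "Female", 22, 84, "No", "Average", 105],
    [10, "Female", 80, 70, "No", "Good", 100],
]

n = len(data_matrix)      # number of individuals (rows)
p = len(data_matrix[0])   # number of variables (columns)


def element(i, j):
    """Return x_ij using the 1-based indices of the text."""
    return data_matrix[i - 1][j - 1]


print(n, p, element(3, 4))  # individual 3, 4th column (IQ)
```

Storing the rows in this order keeps the first index for the individual and the second for the variable, which is the same layout as the matrix X.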

Table 1.1 Data matrix for a hypothetical example of 10 individuals

Individual  Gender  Age (yrs)  IQ   Depression  Health     Weight (lbs)
1           Male    21         120  Yes         Very good  150
2           Male    43         NK   No          Very good  160
3           Male    22         135  No          Average    135
4           Male    86         150  No          Very poor  140
5           Male    60         92   Yes         Good       110
6           Female  16         130  Yes         Good       110
7           Female  NK         150  Yes         Very good  120
8           Female  43         NK   Yes         Average    120
9           Female  22         84   No          Average    105
10          Female  80         70   No          Good       100
Note: NK = not known

In many cases, as in Table 1.1, the variables measured on each of the n individuals will be of different types, depending on whether they are conveying quantitative or merely qualitative information. The most common way of distinguishing these types is the following:
• Nominal - unordered categorical variables. Examples include treatment allocation, the gender of the respondent, hair colour, presence or absence of depression, and so on.
• Ordinal - where there is an ordering but no implication of distance between the different points of the scale. Examples include social class and self-perception of health (each coded from I to V, say), and educational level (no schooling, primary, secondary or tertiary education).
• Interval - where there are equal differences between successive points on the scale, but the position of zero is arbitrary. The classic example is the measurement of temperature using the Celsius or Fahrenheit scales. In some cases a variable such as a measure of depression, anxiety or intelligence, for example, might be treated as if it were interval-scaled when this, in fact, might be difficult to justify. We take a practical approach to such problems and frequently treat these variables as interval-scaled measures, but readers should always question whether this is a sensible thing to do and what implications a wrong decision might have.
• Ratio - the highest level of measurement, where one can investigate the relative magnitudes of scores as well as the differences between them, because the position of zero is fixed. The classic example is the absolute measure of temperature (in Kelvin, for example), but other common ones include age (or any other time from a fixed event), weight and length.
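
The difference between the interval and ratio scales can be checked numerically. The sketch below (the function and variable names are illustrative) converts two Celsius readings to Kelvin: the difference between the readings is the same on both scales, but the ratio is only interpretable in Kelvin, where the position of zero is fixed.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (0 K is absolute zero)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on an interval scale and survive the shift of zero.
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios depend on where zero sits, so only the Kelvin ratio is interpretable.
ratio_c = warm_c / cold_c   # 2.0, yet 20 degrees C is not "twice as hot" as 10
ratio_k = warm_k / cold_k   # roughly 1.035

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```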

The qualitative information in Table 1.1 could have been presented in terms of numerical codes (as would often be the case in a multivariate data set), such that Gender = 1 for males and Gender = 2 for females, for example, or Health = 5 when very good and Health = 1 for very poor, and so on. But it is vital that both the user and the consumer of these data appreciate that the same numerical code (1, say) will convey utterly different information, depending on the scale of measurement.
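
To illustrate why the scale of measurement matters once categories are replaced by numbers, the sketch below codes Gender and Health for the ten individuals of Table 1.1. The coding dictionaries are assumptions made for illustration: the text fixes only Gender = 1 or 2 and the two end points of Health, so the intermediate Health codes, including a "Poor" level that does not occur in the table, are hypothetical.

```python
from collections import Counter
from statistics import median

# Assumed coding schemes; only the end points of Health come from the text.
GENDER_CODES = {"Male": 1, "Female": 2}                     # nominal scale
HEALTH_CODES = {"Very poor": 1, "Poor": 2, "Average": 3,
                "Good": 4, "Very good": 5}                  # ordinal scale

gender = ["Male"] * 5 + ["Female"] * 5
health = ["Very good", "Very good", "Average", "Very poor", "Good",
          "Good", "Very good", "Average", "Average", "Good"]

gender_coded = [GENDER_CODES[g] for g in gender]
health_coded = [HEALTH_CODES[h] for h in health]

# Nominal codes are labels only: counting them is meaningful,
# ordering or averaging them is not.
gender_counts = Counter(gender_coded)

# Ordinal codes carry order: a median or a comparison is meaningful,
# but the size of the gap between two codes is not.
median_health = median(health_coded)
first_healthier_than_fourth = health_coded[0] > health_coded[3]

# The same code, 1, means "Male" in one column and "Very poor" in the other.
same_code = GENDER_CODES["Male"] == HEALTH_CODES["Very poor"]

print(gender_counts, median_health, first_healthier_than_fourth, same_code)
```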

A further feature of Table 1.1 is that it contains missing values (NK). Age has not been recorded for individual number 7, and no IQ value is available for individuals 2 and 8. Missing observations arise for a variety of reasons, and it is essential to put some effort into discovering why an observation is missing. One explanation is that the measurement might not apply to that individual. In a taxonomic study, for example, in which the investigator might wish to classify dinosaur fossils, 'wing length' might be an essential variable; dinosaurs without wings will have missing values for this variable! In other cases the measurement might be missing by accident, or because the respondent either forgot or refused to provide the information. Occasionally, one might be able to obtain the information from elsewhere, or to repeat the measurement, and then replace the missing value with the recovered information.
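
A first practical step is simply to locate the missing entries. The sketch below does this for the Age and IQ columns of Table 1.1, with `None` standing in for NK; the helper name `missing_individuals` is illustrative. It only finds where values are missing and says nothing about why they are missing.

```python
# Age and IQ columns of Table 1.1, keyed by individual number;
# None stands for "NK" (not known).
age = {1: 21, 2: 43, 3: 22, 4: 86, 5: 60, 6: 16, 7: None, 8: 43, 9: 22, 10: 80}
iq = {1: 120, 2: None, 3: 135, 4: 150, 5: 92,
      6: 130, 7: 150, 8: None, 9: 84, 10: 70}


def missing_individuals(column):
    """Return the individuals whose value in this column is missing."""
    return [individual for individual, value in column.items() if value is None]


missing_age = missing_individuals(age)
missing_iq = missing_individuals(iq)

print("Age missing for:", missing_age)
print("IQ missing for:", missing_iq)
```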

Missing values can cause problems for many of the methods of analysis described in this text, particularly if there are many of them. Although there are many ways of dealing with missing-data problems (both valid and invalid!), these are, in general, beyond the scope of this text. One method with universal applicability, however, is to impute ('estimate') the missing values from a knowledge of the data that are not missing. Such imputation methods range from the very simple (replace the missing value with the mean of the values from subjects with non-missing data, for example) to the technically complex (multiple imputation, which acknowledges the stochastic nature of the data) and are briefly described in Appendix B. However, one should always keep in mind that imputed values are not real measurements. We do not get something for nothing! And if a substantial proportion of the individuals have large amounts of missing data, one should seriously question whether any form of statistical analysis is worth the bother.
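
The simplest of these imputation methods, replacing a missing value with the mean of the observed values, can be sketched as follows for the IQ column of Table 1.1. This is an illustration of that one method only, not a recommendation; `None` stands in for NK and the variable names are illustrative.

```python
from statistics import mean, variance

# IQ column of Table 1.1; None stands for "NK" (not known).
iq = [120, None, 135, 150, 92, 130, 150, None, 84, 70]

observed = [value for value in iq if value is not None]
observed_mean = mean(observed)

# Mean imputation: every missing entry gets the mean of the observed values.
imputed = [observed_mean if value is None else value for value in iq]

# The imputed values add no new information: the mean is unchanged by
# construction, and the sample variance shrinks, which overstates precision.
print("mean of observed values:", observed_mean)
print("variance before:", variance(observed), "after:", variance(imputed))
```

The drop in variance after imputation is one concrete sense in which imputed values are not real measurements.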
