Topic Modeling in Text Mining: LDA

Ashish R.

13/05/2017

Latent Dirichlet allocation (LDA)

Topic modeling is a method for unsupervised classification of text documents, similar to clustering on numeric data, which finds natural groups of items even when we’re not sure what we’re looking for. In hard clustering, one entity can belong to only one group, whereas in topic modeling a word can belong to multiple groups (topics) with varying levels of probability. The input to the model is a text document or a set of documents. The output of the model splits the documents into K groups and determines a topic for each group from the most important words associated with that group. The number of topics K, which is equivalent to the number of clusters in cluster analysis, has to be chosen in advance using heuristics about how many topics might be extracted from the documents. LDA treats each document as a mixture of topics, and each topic as a mixture of words. This allows documents to “overlap” with each other in terms of content, rather than being separated into discrete groups.
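To make the "mixture" idea concrete, here is a minimal sketch in plain Python. The topic names, words and probabilities are invented for illustration and do not come from a fitted model; `beta`, `theta` and `word_distribution` are hypothetical names. It only shows how a document's word probabilities follow from its topic mixture and each topic's word mixture.

```python
# Hypothetical numbers, chosen only for illustration (not a fitted model).
# beta[topic][word] = P(word | topic); each row sums to 1.
beta = {
    "sports":   {"match": 0.5, "team": 0.4, "vote": 0.1},
    "politics": {"match": 0.1, "team": 0.2, "vote": 0.7},
}
# theta[doc][topic] = P(topic | doc); each row sums to 1.
theta = {
    "doc_a": {"sports": 0.9, "politics": 0.1},
    "doc_b": {"sports": 0.3, "politics": 0.7},
}


def word_distribution(doc):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    vocab = sorted({w for words in beta.values() for w in words})
    return {
        w: sum(theta[doc][t] * beta[t].get(w, 0.0) for t in beta)
        for w in vocab
    }


for doc in theta:
    print(doc, {w: round(p, 3) for w, p in word_distribution(doc).items()})
```

Both documents use the same two topics, only in different proportions, which is what lets them "overlap" in content.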

As the output of an LDA model with K topics, the set of documents is segregated into K groups. Each key word (token) in a group receives a beta value, which in the usual LDA notation is the estimated probability of that word under the group's topic, and so describes how strongly the word is associated with the group. The larger the beta value, the more important the word is in that group. The top 6-10 words with the largest beta values are chosen to decide the topic depicted by that group of words. The topic label itself is decided by human judgment, by interpreting the meaning of those words in the context of the collected documents.
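Picking the highest-beta words per topic can be sketched as follows. The beta values below are made up for illustration, and `top_terms` is a hypothetical helper, not part of any library; with a real model the table would come from the fitted topic-word probabilities.

```python
def top_terms(beta, n=6):
    """Return the n words with the largest beta for each topic, largest first."""
    return {
        topic: sorted(words, key=words.get, reverse=True)[:n]
        for topic, words in beta.items()
    }


# Hypothetical beta table: beta[topic][word] = P(word | topic).
beta = {
    "topic_1": {"goal": 0.30, "team": 0.25, "coach": 0.20,
                "league": 0.15, "vote": 0.06, "tax": 0.04},
    "topic_2": {"vote": 0.35, "tax": 0.25, "party": 0.20,
                "team": 0.10, "goal": 0.06, "coach": 0.04},
}

for topic, words in top_terms(beta, n=3).items():
    print(topic, words)
```

A human reader would then look at each list and name the topic, for instance "sports" for the first and "politics" for the second.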

How to determine the number of topics from a set of documents

Hierarchical cluster analysis is performed on the words collected from the corpus to determine the number of clusters to form. Using a distance metric such as Levenshtein distance or Hamming distance, the distances among the words are computed and plotted in a dendrogram. The vertical axis of the dendrogram shows the chosen distance metric. Based on how the words group together, we decide what distance to use as the cutoff, which determines the appropriate number of groups to form from the set of documents. This is similar to hierarchical clustering of numeric data, where Euclidean distance is usually the default.
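The cutoff idea can be sketched in plain Python. This is only an illustration under stated assumptions: the word list is invented, single linkage is assumed because the lesson does not name a linkage rule, and `levenshtein` and `cluster_words` are hypothetical helper names. Stopping the merges at a cutoff distance corresponds to cutting the dendrogram horizontally at that height.

```python
def levenshtein(a, b):
    """Edit distance: fewest insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_words(words, cutoff):
    """Single-linkage agglomerative clustering, stopped at the cutoff distance."""
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        # Find the closest pair of clusters (minimum pairwise word distance).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(levenshtein(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break  # every remaining merge lies above the cut
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    return clusters


# Invented word list, for illustration only.
words = ["cluster", "clusters", "clustering", "topic", "topics", "token"]
for cutoff in (2, 3):
    groups = cluster_words(words, cutoff)
    print(cutoff, len(groups), groups)
```

Raising the cutoff merges more words and yields fewer groups, so the chosen cutoff directly sets the candidate number of groups.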
