UrbanPro

Learn C Language from the Best Tutors

  • Affordable fees
  • 1-1 or Group class
  • Flexible Timings
  • Verified Tutors

Search in

How do I write a program for a Fuzzy C-means/Fuzzy K-means algorithm in C/C++ for multi-dimensional data?

Asked by Last Modified  

Follow 1
Answer

Please enter your answer

Tutor

I will help you understand and implement K-Means Clustering Algorithm from scratch without using any Machine Learning libraries. We will further use this algorithm to compress an image. Here, I will implement this code in Python, but you can implement the algorithm in any other programming language of...
read more
I will help you understand and implement K-Means Clustering Algorithm from scratch without using any Machine Learning libraries. We will further use this algorithm to compress an image. Here, I will implement this code in Python, but you can implement the algorithm in any other programming language of your choice just by basically developing 4-5 simple functions. So now, first of all, what exactly is Clustering and in particular K-Means? As discussed in my blog on Machine Learning, Clustering is a type of unsupervised machine learning problem in which, we find clusters of similar data. K-means is the most widely used clustering algorithm. So basically, our task is to find those centers for the clusters around which our data points are associated. These centres of the Clusters are called centroids(K). Note that, these cluster centroids, may or may not belong to our dataset itself. Since our problem is to choose these points, let's first of all define the K-Means algorithm. 1. Randomly initialize K cluster centroids. 2. Repeat { Closest Assignment Step. Update Cluster Centroids. } So what does it actually mean? First of all, we'll randomly choose K number of points from our dataset {x(1),x(2),x(3)…,x(m)}, and initialize them as our cluster centroids {µ1, µ2, µ3…µk}. def ClusterInit(arr,K): print("Generating ",K," Clusters from Arr") s="Generating "+str(K)+" Clusters from Arr\n" a=random.sample(arr,K) return a Then, the next main step is the Closest Assignment Step. Here, For each point in dataset x(1),x(2),x(3)…,x(m), 1. Calculate it's distance from each centroid {µ1, µ2, µ3…µk}. 2. Select the index of the centroid closest to x(i), and assign it to c(i). def Closest(arr,Clusteroids): print("Computing Closest Clusteroids") indexes=[] count=1 for i in tqdm(arr): a="for "+str(count)+" element\n" temp =[] for j in Clusteroids: temp.append(norm(i,j)) indexes.append(temp.index(min(temp))) count+=1 print(indexes) return indexes Where, the distance between the centroids and a data point is calculated as a norm of the distance between two vectors. But for the simplicity sake, as distance is rather a relative feature, we'll simple calculate as a sum of the absolute values of the difference between the coordinates of them both. def norm(P1,P2): sum=0 for (i,j) in zip(P1,P2): sum+=(abs(i-j)) return sum Now moving ahead, next is the update centroid step. In this step, we select every data point associated with a particular cluster centroid and replace that particular centroid with the mean of all those associated data points. For this purpose, we'll refer to the C array which contains the index of the centroids associated with the particular data point. i.e. for k=1 to K, µk=mean of points assigned to cluster k. i.e. mean(x(i)) where i is such that c(i)=k. def ComputeMeans(arr,indexes,Clusteroids): newClus=[] print(len(arr)) print(len(indexes)) print(len(Clusteroids)) for i in range(len(Clusteroids)): z=[] for j in indexes: if i == j: z.append(arr[indexes.index(j)]) print(z) if len(z)==0: continue else: newClus.append(getmean(z)) for a in newClus: if str(newClus)==str(Clusteroids): return ("end K Means",newClus) return (None,newClus) Here, we can use numpy to calculate the column-wise mean, but the axis argument is quite confusing for me, so I devised my own function to calculate the column-wise mean. def getmean(z): temp=[] for j in range(len(z[0])): sum=0 for i in range(len(z)): sum+=z[i][j]; sum/=len(z) temp.append(int(sum)) return temp That's it ? THAT'S ALL there is to K-Means Clustering. So, the cluster centroids are the K centres for our K number of clusters. And the C vector contains the indexes of all centroid to which our X data sets are associated with. Note that, Size of C == length of the data set, and not the length of the features in the data set. However, there arises a question, what should be the ideal number of K, as we can see, K can take a value between 1 and m (length of our dataset). So, what should be the ideal value of K. To choose that, we need to first decide a cost function. J(µ1, µ2, µ3…. µk, c1, c2, c3…. cm) = ( ? || x(i) - µc(i) ||2 )/ m what it means is that, we are trying to find a particular pair of centroids and their associated clusters, such that, the average of the sum of their squared distance is minimum. So, now, if we choose a very low value of K, let's say 1, then the Cost Function J would have a very high value. Similarly, if we pick a very high value of K, let's say m (size of the data set), we get a very low value of J, which would in fact be zero. Moreover, it would cancel out the objective of clustering. So, what should be the ideal value of K? if we plot K à J, we get an arm shaped graph plot. And what we do is, look for the elbow point in that graph shaped plot. The corresponding value of K is considered to be the ideal no of clusters to be taken. Now, let's apply this algorithm, to compress an image. As, we know, K-Means helps us in locating those particular set of points, which are the centroids of clusters. How can it be applied for image compression? It's quite simple! We'll treat the image as an array of [R,G,B] values, and we'll find a particular set of values, around which, many other values are clustered around. Then we'll replace all the values in the clusters, with their particular set of centroids. and thus reduce the values of the number of colours used in the image. So let's device a function, which de-shapes the whole image in a set of array of RGB values. def deshape(img): arr=[] for i in img: for x in np.array(i,dtype="uint8").tolist(): print(x) arr.append(x) return arr Now, what we have done is covert a Height*Width*3 3D array into a [Height*Width]*3 2D array. So, we'll read an image using openCV, and then pass it to our deshape function to obtain an array of values, arr. Now, we'll decide the number of K, as well as the iterations. Using our value of K, we'll initialize our random Cluster Centroids and then pass it to our clustering function along with arr and no of iterations. img=(cv2.imread("112.jpeg")) arr=deshape(img) K=100 iterations=5 Clusteroids=ClusterInit(arr, K) print(arr[0]) print(Clusteroids) data=Clusetering(arr, Clusteroids, iterations) Now, our Clustering Function, will perform the Kmeans algorithm for the number of iterations and meanwhile keep on updating our clusteroids and indexes, simultaneously. When the number of iterations are over, we'll pass our arr, indexes and the cluster centroids to our compress functions, which would effectively, replace the respective points in our cluster with the value of the clusteroid. def Clusetering(arr,Clusteroids,iterations): for i in range(iterations): a=str(i)+"th Iteration\n" print(a) indexes=Closest(arr,Clusteroids) print("Computing means of clusteroids") a,Clusteroid=ComputeMeans(arr, indexes, Clusteroids) if(a=="end K means"): i=iterations Clusteroids=Clusteroid print("======================================================") compressed_data=Compress(arr,Clusteroids,indexes) return compressed_data def Compress(arr,Clusteroids,indexes): a=[] for i in indexes: a.append(Clusteroids[i]) return a Hence, the value returned by the Clustering function is a 2D array of compressed data. Which we'll now reshape back to a 3D array using the reshape function, and then display the image. def reshape(arr,r,c): img1=[] for i in range(len(arr)): if i==0: temp=[] elif (i%c)==0: img1.append(temp) temp=[] temp.append(arr[i]) return img1 img2=reshape(data,img.shape[0], img.shape[1]) cv2.imshow("Original",cv2.resize(img,(500,500))) cv2.imshow("Compressed",cv2.resize(np.array(img2,dtype="uint8"),(500,500))) cv2.waitKey(0) Result As, you can see, we have reduced the amount of colours used to just 100, and yet, we have maintained more than 95% visibility of our image. Below, attached is the code for this particular program. You can try it on your set of images and tweak the value of K and iterations to your convenience. P.S. the tqdm is a nice little tool you can use to view the iteration progress, which is quite useful since this program may take time for higher number of K. That's it from this blog, if there are any suggestions, or corrections, feel free to mention in the comment section. Also if you have any doubts, feel free to ask. References:- - Machine Learning by Andrew Ng, Coursera.org (among the best MOOCs). read less
Comments

Related Questions

Give the structure of 'C' programming. Also, give significance of math.h and string.h header files.
Hi, C structure is procedural programming. math.h and string .h will have a declerations of predefined methods. math.h ex: sqrt(),...etc. string.h ex: strlen(),strcpy,...etc. if you wanna brif let meknow. Thanks.
Ishwari
Which language is best, C, C++, Python or Java?
If you want to learn any languages C#, Java, ect. then must to learn C, C++ language as these are base or language to be strong in language skills. C++ is improved version of C language. C#, Java, J# and etc. which are improved version of C++.
Sribaghya
0 0
6
What are pointers in C language?
Hi Imran, Hope you are doing good. Are you looking for short answer or long answer? :) Pointers in C are variables that store memory addresses. They point to the location of another variable, allowing...
Imran
0 0
7
What is the meaning of \n and \t in C language?
is new line and \t is for tab
Priya
0 0
5
Is it necessary to mention the header files in the C programming language?
yaa !!!!! it is necessary because we need the data present in evey header file...like example we can do mathematical calculations with the help of math header file.
Hari

Now ask question in any of the 1000+ Categories, and get Answers from Tutors and Trainers on UrbanPro.com

Ask a Question

Related Lessons

Can we store different data types in a stack?
Yesterday, one of my Facebook friend asked me this question. My answer is "yes", and in this post I will discuss how could we do this.I am a great supporter of working with unions and I will be using union...

INTRODUCTION TO PROGRAMMING LANGUAGES
Language is a medium for communication. If we want to perform anything with another person, we can know the human language as a human being. Similarly, if we want to perform anything with a computer, we...

Why do pointers have a datatype?
Before we start with pointers you must know what is a variable and a datatype. int a; This is the basic line in every program in 'C' . It means that we are asking the compiler to give us 2 bytes of space...

Functions In C Programming
A C-language program is nothing but collection of Function, these are the building blocks of a ‘C’ program. Generally, a function mans a task. “Function is a...

Set 0 To Kth Bit In A Variable
The following code snippet Set 0 to Kth Bit in an variable #include int UnsetBitValue(int n, int k);int main(){ printf("%d\n\n",UnsetBitValue(255,6)); return 0;}// Set kth bit to zeroint UnsetBitValue(int...

Recommended Articles

Brilliant Academy is one of the reputed institutes for B.Tech tuition classes. This institute is specialised in delivering quality tuition classes for B.E, Engineering - all streams and Engineering diploma courses. Incorporated in 2012, Brillant Academy is a brainchild of Mr Jagadeesh. The main motto of the academy is to...

Read full article >

Lasya Infotech is a Hyderabad based IT training institute founded in 2016 by O Venkat. Believing in his innovation, passion and persistence and with a diverse blend of experience, he started his brainchild to deliver exemplary professional courses to aspiring candidates by honing their skills. Ever since the institute envisions...

Read full article >

Software Development has been one of the most popular career trends since years. The reason behind this is the fact that software are being used almost everywhere today.  In all of our lives, from the morning’s alarm clock to the coffee maker, car, mobile phone, computer, ATM and in almost everything we use in our daily...

Read full article >

Information technology consultancy or Information technology consulting is a specialized field in which one can set their focus on providing advisory services to business firms on finding ways to use innovations in information technology to further their business and meet the objectives of the business. Not only does...

Read full article >

Looking for C Language Classes?

Learn from the Best Tutors on UrbanPro

Are you a Tutor or Training Institute?

Join UrbanPro Today to find students near you
X

Looking for C Language Classes?

The best tutors for C Language Classes are on UrbanPro

  • Select the best Tutor
  • Book & Attend a Free Demo
  • Pay and start Learning

Learn C Language with the Best Tutors

The best Tutors for C Language Classes are on UrbanPro

This website uses cookies

We use cookies to improve user experience. Choose what cookies you allow us to use. You can read more about our Cookie Policy in our Privacy Policy

Accept All
Decline All

UrbanPro.com is India's largest network of most trusted tutors and institutes. Over 55 lakh students rely on UrbanPro.com, to fulfill their learning requirements across 1,000+ categories. Using UrbanPro.com, parents, and students can compare multiple Tutors and Institutes and choose the one that best suits their requirements. More than 7.5 lakh verified Tutors and Institutes are helping millions of students every day and growing their tutoring business on UrbanPro.com. Whether you are looking for a tutor to learn mathematics, a German language trainer to brush up your German language skills or an institute to upgrade your IT skills, we have got the best selection of Tutors and Training Institutes for you. Read more