What I Wish Everyone Knew About K-Means: Unsupervised Learning

K-Means is one of the simplest unsupervised learning methods. That raises a question: what is unsupervised learning? Let's start with a quick overview.

Unsupervised learning is the approach in which we train a model on data that has no labels (dependent variables), so the model has to find structure, such as groups of similar points, on its own.

A few key terms before we get to K-Means:

Features: the columns of the training data are its features. They are also called independent variables.

Labels: in supervised learning, the training data also contains labels, the output variables the model should predict. They are also called dependent variables, because their values depend on the independent variables.
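To make the idea concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It assumes the points are tuples of numbers and picks the initial centroids at random; the function name `kmeans` and its parameters are illustrative, not taken from any library. Notice that no labels are passed in: the algorithm only sees the features.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, the cluster ids can swap between runs even when the grouping is the same. Library implementations such as scikit-learn's `KMeans` handle this by running several initializations (`n_init`) and keeping the best result.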



Truly based on Data and Problem Statements

Machine learning has become a buzzword in the last few years. Everybody has heard of it and wants to try it, which is a good thing: the more people explore it, the more ideas emerge. But with so many algorithms and models available, choosing the right one for your problem statement becomes a problem in itself. Let us walk through some common steps for choosing an ML algorithm and see whether we can reduce the chance of a wrong selection.

Common and Simple Steps before ML models…


Use cases and innovation will make you experience visualization differently

Tableau is one of the most comprehensive and easy-to-use visualization tools professionals have worked with, and it is in high demand. I have already covered the basics and important topics in the blog linked below. Now it is time to look at more inventive ways of adapting Tableau's existing charts to your requirements so that you can produce different charts. You may be surprised at how easy it is: Tableau creates its built-in chart types in a single click, and with a little innovation we can turn them into the charts we need.

Tableau is capable of providing the favorite…

Overfitting | Less Data | Data Simulation — Solution is CV

Cross-validation is a technique for making model training more reliable with the data you have. It is also known as rotation estimation or out-of-sample testing. You will see shortly why!

Cross-validation, or simply CV, is also referred to as out-of-sample testing or rotation sampling. Suppose your model does not generalize well to the test data; in other words, it is overfitting. Often this means there is too little data for the model to learn from and converge. And the solution is…
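A minimal sketch of k-fold cross-validation in plain Python shows where the "rotation" name comes from: the data is split into k folds, and each fold takes one turn as the held-out validation set while the model trains on the remaining folds. The helper names (`k_fold_splits`, `cross_validate`, `fit_mean`, `mean_absolute_error`) and the mean-predictor baseline are hypothetical stand-ins for a real model and metric. The sketch does not shuffle the data, which you would usually do first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Rotate through the folds: fit on k-1 folds, score on the held-out fold."""
    scores = []
    for train, validation in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in validation], [ys[i] for i in validation]))
    return sum(scores) / len(scores), scores


# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean


def mean_absolute_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(ys)


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
average, per_fold = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_absolute_error)
print(per_fold, average)
```

Every sample is used for validation exactly once, so the averaged score reflects the whole dataset rather than one lucky or unlucky split. In practice, `KFold` and `cross_val_score` in scikit-learn's `sklearn.model_selection` module provide the same rotation with shuffling options.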

Visualization Factor to Understand Data

Tableau is one of the best visualization tools available. Many data analytics tools existed before it, but few offered so much depth for analyzing data visually and presenting it to suit the needs of the audience. Its drag-and-drop approach to building attractive visualizations that extract meaning from raw data feels revolutionary. What I like best about new technologies is that they keep evolving and improving. This tool has impact and a future, so it is worth learning.

There are many things and learning is…

Word to Vector: Understanding the Concept

NLP is a buzzword, and there are plenty of problem statements to experiment with. The deeper you go, the more insight you gain into the data, and new ways of exploring data can produce new solutions to existing problems. We already looked at feature engineering in NLP using TF-IDF and PMI in an earlier blog (link below). Now it is time to move on to the next steps in feature engineering.

Feature engineering in NLP can be thought of as vector compression. The idea is to get less sparse vectors and better performance. Other dimensionality reduction techniques, such as SVD, can be applied too…
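The paragraph above describes compressing sparse vectors into dense ones. As a rough illustration, here is a count-based sketch in Python with NumPy. This is not word2vec itself, which learns its vectors with a shallow neural network. The sketch builds a word co-occurrence matrix from a toy corpus, converts it to positive PMI (PPMI), and keeps the top singular directions from an SVD as dense word vectors. The function name, window size, number of dimensions, and corpus are assumptions chosen for illustration.

```python
import numpy as np


def ppmi_svd_vectors(sentences, window=1, dims=2):
    """Dense word vectors: a PPMI co-occurrence matrix compressed with truncated SVD."""
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    # Count how often each word appears within `window` positions of another word.
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[index[w], index[sent[j]]] += 1
    total = counts.sum()
    p_word = counts.sum(axis=1, keepdims=True) / total     # row marginals, shape (V, 1)
    p_context = counts.sum(axis=0, keepdims=True) / total  # column marginals, shape (1, V)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (p_word * p_context))
    # Positive PMI: keep positive associations; unseen pairs (log 0 = -inf) become 0.
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    # Truncated SVD: the top `dims` singular directions give the compressed vectors.
    u, s, _ = np.linalg.svd(ppmi)
    return vocab, u[:, :dims] * s[:dims]


corpus = [
    ["i", "like", "nlp"],
    ["i", "like", "deep", "learning"],
    ["i", "enjoy", "flying"],
]
vocab, vectors = ppmi_svd_vectors(corpus)
for word, vec in zip(vocab, vectors):
    print(word, vec.round(2))
```

Each word starts as a sparse row with one entry per vocabulary word and ends up as a dense vector of length `dims`. On a real corpus you would use a larger window and keep a few hundred dimensions.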

Developing Model and Tuning Steps For NLP

NLP stands for natural language processing, and it is one of the biggest buzzwords in the industry. Everybody wants to learn it and build expertise in this area. To start with, NLP is about processing text, and today’s world is full of text flowing in from everywhere. Processing this much data is very interesting, and it brings plenty of use cases, innovation, and ideas for applying machine learning.

From the above, we can see how important and vast text processing is. It becomes difficult precisely because it is so easy to implement and everybody can write simple lines of…

Master visualization to explore data in the first go

Python is one of the most popular languages for data analysis. It has libraries for visualizing and exploring your data and getting insights out of it. Matplotlib is one of the most commonly used libraries for data exploration. Every data analysis needs visualization for purposes such as finding outliers, density, sparsity, and trends and, more importantly, judging whether the data needs normalization.

Let us explore Matplotlib first and then look at other visualization libraries.

Plotting data and extracting first-level meaning from it is a must-learn art for data analytics and machine…
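Here is a minimal Matplotlib sketch of two of the checks mentioned above: a histogram to see the density and shape of a feature, and a box plot to spot outliers. The data is synthetic (normally distributed values plus one injected outlier), and the `explore` helper is a hypothetical name used only here.

```python
import io
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt


def explore(values):
    """Histogram for density and shape, box plot for outliers."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(9, 3.5))
    ax_hist.hist(values, bins=20, density=True, color="steelblue", edgecolor="white")
    ax_hist.set_title("Density")
    ax_hist.set_xlabel("value")
    # By default, points beyond 1.5 * IQR from the quartiles are drawn as separate markers.
    ax_box.boxplot(values)
    ax_box.set_title("Outliers")
    ax_box.set_ylabel("value")
    fig.tight_layout()
    return fig


rng = random.Random(42)
values = [rng.gauss(5.0, 1.0) for _ in range(200)] + [15.0]  # one injected outlier
fig = explore(values)

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # or fig.savefig("explore.png") to write a file
```

When working interactively, you can drop the `matplotlib.use("Agg")` line and call `plt.show()` to open the figure in a window. The injected value of 15.0 then appears as an isolated marker above the box.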

Learn This Library to Excel as a Data Scientist

PyTorch is a well-known open-source library for building neural networks and natural language processing solutions. It has many advantages over other libraries, but the most visible one is the way it processes data internally, which results in faster execution of operations across vectors. In other words, it is designed to process large amounts of numeric data as fast as possible.

PyTorch: open-source library + NLP + faster processing / execution

Another Library with advanced features and flexibility

Compared with TensorFlow, it offers equivalent libraries that perform at a similar level, with faster execution. …
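A small PyTorch sketch of the "faster processing across vectors" point: whole-tensor operations replace Python loops, and automatic differentiation is what makes the library convenient for neural networks. It assumes PyTorch is installed. The tensor sizes and values are arbitrary, and the sketch makes no speed claims of its own.

```python
import torch

torch.manual_seed(0)

# One library call handles a million elements; the loop runs in optimized native code.
x = torch.randn(1_000_000)
w = torch.randn(1_000_000)
dot = torch.dot(x, w)

# Autograd: PyTorch records operations on tensors that require gradients.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([4.0, 5.0, 6.0])
loss = (a * b).sum()          # 1*4 + 2*5 + 3*6 = 32
loss.backward()               # d(loss)/da = b
print(loss.item(), a.grad)    # 32.0 tensor([4., 5., 6.])

# Run the same code on a GPU when one is available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
y = x.to(device) * 2
```

The same tensor code runs unchanged on CPU or GPU, which is a large part of why PyTorch handles big numeric workloads well.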

Text to Vector Space: A Step Closer to Analysis and a Must for Data Science Projects

Natural language processing (NLP) can handle text and extract meaning from it based on the problem statement. The first question that arises is: how can a machine handle text? NLP is no different from other machine learning: it converts text into vectors of numbers so that meaning can be derived from the words. Once everything is in vector form, it is easy to compare and evaluate. This step is necessary because ML models require numeric data for training. …
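Here is a minimal sketch of the text-to-vector idea in pure Python, using bag-of-words counts and cosine similarity. Whitespace tokenization and raw counts are simplifying assumptions; real projects typically add TF-IDF weighting or learned embeddings. The helper names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    """Turn each document into a count vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["the cat sat on the mat", "the dog sat on the log", "stock prices fell sharply"]
vocab, vectors = bag_of_words(docs)
print(round(cosine_similarity(vectors[0], vectors[1]), 2))  # 0.75
print(cosine_similarity(vectors[0], vectors[2]))            # 0.0
```

Once the sentences are vectors, "how similar are these texts?" becomes simple arithmetic: the two animal sentences share several words and score 0.75, while the finance sentence shares none with them and scores 0.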

Laxman Singh

Machine Learning Engineer | Data Science | MTECH NUS, Singapore
