K-Means is one of the simplest unsupervised learning methods. So the first question is: what is unsupervised learning? Let's have a quick overview.
Unsupervised learning is the approach we use when the training data has no labels or dependent features.
A few keywords and their descriptions before we get into K-Means:
Features: the columns of the training data. They are also termed independent variables.
Labels: the output variables in the training data. These are also called dependent variables, because their outcome depends on the independent variables.
Supervised learning is where we have labelled data. When we don't have labelled data and the model has to learn the structure of the data on its own, that is unsupervised learning. …
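To make this concrete, here is a minimal sketch, assuming scikit-learn, of clustering unlabelled points with K-Means; the toy data and the choice of two clusters are illustrative assumptions, not part of the original article.

```python
# Minimal K-Means sketch: cluster unlabelled 2-D points with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled training data: only independent features, no labels.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned centroids
```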
NLP stands for Natural Language Processing. As the name suggests, it takes text as input and extracts meaning from it the way a human would. Two decades ago, scarcity of training data held NLP back. The world has since moved so fast that getting data, especially text, is very easy; the challenge now is handling huge amounts of it and interpreting it in the way most appropriate to the problem statement. An enormous amount of data flows through social media every day, and I won't quote figures here only to state the obvious: everybody knows it is huge. …
Time series analysis suits almost any transaction-related problem statement, which makes it both a buzzword and a must-learn skill, and a genuine challenge to understand and resolve. Drawing conclusions from time series problems is a major task for any ML engineer or data scientist. Famous use cases include cash flow analysis and forecasting, stock market prediction, and weather and electricity demand forecasting. In simple words, wherever we find seasonal or random variations, we can treat the problem as time series analysis. Here, I will go through four techniques for handling time series data, plus a fifth forecasting model so we can compare the forecasts produced by the different methods. …
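As a taste of this kind of technique, here is a hedged sketch of one classical method, Holt-Winters exponential smoothing via statsmodels; the toy monthly series and the additive trend/seasonality settings are illustrative assumptions, not the article's four techniques or its data.

```python
# Holt-Winters exponential smoothing on a toy seasonal series.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with an upward trend plus yearly seasonality,
# a stand-in for something like electricity demand.
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
values = np.linspace(100, 150, 48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
series = pd.Series(values, index=idx)

model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
print(model.forecast(6))  # forecast the next six months
```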
Algotrading is one of the buzzwords, and it is well understood to mean a machine trading on your behalf to make you a profit. I would describe it as a quantitative analysis system that reads past time series data for trades or currencies and, based on that, decides whether to buy, sell, or hold. Our use case is one where we have an initial amount, the previous price index, and today's price index, and our goal is to let the algorithm do the trading and hand us the profit at the end of 15 days.
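The loop below is a hypothetical sketch of that buy/sell/hold decision over 15 days; the simple momentum rule (buy when the index rises, sell when it falls) and the price series are my own illustrative assumptions, not the article's actual strategy.

```python
# Toy buy/sell/hold loop over 15 days of price-index data.
prices = [100, 102, 101, 105, 107, 106, 108, 110,
          109, 111, 115, 114, 116, 118, 120, 119]  # start + 15 days

cash, units = 10_000.0, 0.0  # initial amount, no open position
for prev, today in zip(prices, prices[1:]):
    if today > prev and cash > 0:      # rising vs. yesterday: buy with all cash
        units, cash = cash / today, 0.0
    elif today < prev and units > 0:   # falling vs. yesterday: sell everything
        cash, units = units * today, 0.0
    # otherwise: hold

profit = cash + units * prices[-1] - 10_000.0
print(f"Profit after 15 days: {profit:.2f}")
```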
…
Analysis, and the steps involved in it, are very important for reaching the right conclusion: they help derive meaning from the raw data that most of the world is striving for. This brings us to the different types of analysis and their respective importance in data analytics. Frankly, knowing them gives you the depth of understanding to apply them in a real environment and an edge over others.
This is the first stage of business analytics in the modern day, and it gives quite a good picture of the raw data. …
New technologies and programming languages are being introduced at a very rapid rate, and everybody wants to add a new leaf to their programming vocabulary. Once you start learning a new language, you will find plenty of advice and plenty of noise; you end up wasting a lot of time just getting started, unsure of whom to rely on. Learning a new language has become an art, and if we approach it in an organised manner, we can learn much faster than we otherwise would. Based on my own learning process over the years, I have come up with a few simple steps to start learning and gradually master that art. …
Python is a buzzword. Being the most popular modern-generation language, with one of the biggest developer communities and full machine learning capability, Python is a unique, must-learn programming language. On paper it looks like the perfect language with all the ingredients: libraries covering different use cases, UI frameworks like Flask and Django, machine learning libraries, and visualization libraries.
Virtual environment setup is very important when working on Python projects. It helps to separate and maintain each project's libraries and the versions used for a given problem statement. One of your projects may well run on a different version of Python, or on different versions of Python libraries; all of this can be handled simply by using a virtual environment. Maintaining one is simple and easy, and it provides flexibility, portability, and easy maintainability. …
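Here is a minimal sketch using Python's built-in venv module (available since Python 3.3); the directory name "env" is an arbitrary choice.

```python
# Create an isolated environment with its own interpreter and pip.
import venv

venv.create("env", with_pip=True)

# Activate it from a shell before installing project-specific libraries:
#   Linux/macOS:  source env/bin/activate
#   Windows:      env\Scripts\activate
```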
spaCy is an open-source Python library that offers a solid solution to NLP (Natural Language Processing) problem statements. It ships pretrained pipelines in three sizes: small, medium, and large. It provides CNN-based models for most core NLP steps, such as POS tagging, NER, and dependency parsing, and it is capable of handling multiple languages other than English.
The library is also very flexible for your use case: it allows you to custom-train an existing model, or to provide match patterns that overrule the existing models when tagging text, as the sketch below shows.
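A minimal sketch of those features, assuming the small English model is installed (python -m spacy download en_core_web_sm); the sample sentence and the pattern are arbitrary.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

for token in doc:            # POS tagging and dependency parsing
    print(token.text, token.pos_, token.dep_)

for ent in doc.ents:         # named entity recognition (NER)
    print(ent.text, ent.label_)

# Overruling the statistical model with hand-written patterns (spaCy v3 API):
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "ORG", "pattern": "StackOverflow"}])
```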
Extract data from…
StackOverflow is the popular Q&A site for programmers, providing useful information to nearly 5 million programmers worldwide through its database of questions and answers, not to mention the additional comments other programmers provide. Analysing that continuous flow of data and extracting useful meaning from the discussions is a milestone worth achieving: it gives insight into what is being discussed and shows how different technologies are being received. Keeping this in mind, I will try to explore this option with the goals and approach below.
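As a hedged sketch of one way to pull such data, here is a call to the public Stack Exchange REST API (v2.3) using requests; the parameter choices are illustrative, and the article's own extraction approach may differ.

```python
# Fetch the ten most recently active StackOverflow questions.
import requests

resp = requests.get(
    "https://api.stackexchange.com/2.3/questions",
    params={"site": "stackoverflow", "order": "desc",
            "sort": "activity", "pagesize": 10},
)
resp.raise_for_status()

for q in resp.json()["items"]:
    print(q["title"], q["tags"])
```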
Handling humongous streaming data and providing real-time analysis on it is almost revolutionary, and Python in combination with PySpark makes it possible with ease. Here we have the advantage of getting data through the API Twitter provides: it delivers the data in real time, we process it using RDDs, pass it on to display the counts, and then build a dashboard to analyse everything processed. It is interesting, full of learnings, and as simple to implement as the problem statement looks cumbersome to understand.
Technologies used: Python with PySpark, ZooKeeper, Kafka, Chart.js and Tableau
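A hedged sketch of the Spark side of such a pipeline, using the classic DStream API: it assumes tweets are being forwarded to a local TCP socket (port 5555 is an arbitrary choice) and keeps a running hashtag count; the Twitter, Kafka, and dashboard pieces are omitted.

```python
# Running hashtag counts over a socket stream of tweet text.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="TwitterHashtagCounts")
ssc = StreamingContext(sc, 10)     # process in 10-second batches
ssc.checkpoint("checkpoint")       # required by updateStateByKey

lines = ssc.socketTextStream("localhost", 5555)
counts = (lines.flatMap(lambda line: line.split(" "))
               .filter(lambda w: w.startswith("#"))
               .map(lambda w: (w.lower(), 1))
               .updateStateByKey(lambda new, total: sum(new) + (total or 0)))

counts.pprint()                    # print the updated counts each batch
ssc.start()
ssc.awaitTermination()
```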