As mentioned in my previous blog: Machine Learning Basics, we have a data set given and we know how our output should look like.
It can be classified into:
Regression Problems: The input variables are mapped to a continuous output function.
Classification Problems: The input variables are mapped to a discret output function.
Given – The size of house data and the
Objective 1: Predict the price based on size of the house.
Objective 2: Whether the house sells for more or less than 20k$?
Objective 1 is based on the regression problem whereas the Objective 2 is based on the classification problem.
Given – A data set of patients. Objective: Determine if patient has diabetes or not?
This is an example of classification problem.
In this the datasets does not have any labels or might have some. The objective is to find some structure in the data. It might be used to find that data lies in 2 or x different clusters.
Example 1: Google News
It goes and looks at thousands of new stories on web and groups them into cohesive new stories. So for instance a news as shown below: The news title (Hindu Marriage Bill) is one but it has clustered all the news stories (India Today, Business Standard, Radio Pakistan, Pakistan Today) referencing this story. So this is what unsupervised learning is, trying to find structures in the data. Once found, similar set of the data form a cluster.
Example 2: Imagine we have 10K different genes and aim is to find a way to group the genes that are somehow related by some variable or similar. In other words, aim is to form clusters based on some factor such as lifespan etc.
Few other Examples:
a) Stock Market: Predict the price of a share tomorrow? REGRESSION SUPERVISED
b) Weather Prediction: Whether it will rain tomorrow at 10Pm ? CLASSIFICATION SUPERVISED
c) Determine amount of rain that will occur tomorrow? REGRESSION SUPERVISED
d) Predict if a person will have diabetes in 3 years or no? Given the sample of DNA of a person. CLASSIFICATION SUPERVISED
e) Given an email, classify if its spam or not? CLASSIFICATION SUPERVISED
f) Given a database of customer data, discover the market segments. UNSUPERVISED