Business Solution With Machine Learning

Creating the best business start-up approach with data science and machine learning

Here, I present a self-devised business problem and my approach to solving the problem.

I created a hypothetical scenario in which a chef wishes to introduce the people of Toronto to Italian food but does not know which location would best suit his idea.

He contacted yours truly and laid out his problem. I framed the issue and devised what I considered the best way to solve it. Here it goes.

The Approach

The main objective was to implement a solution using data science and machine learning, and I settled on clustering with the K-Means model. Clustering would let me group and visualize the data points, in this case neighborhoods described by the venues located in them, and then find out which areas had as few Italian restaurants as possible.
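To make the clustering idea concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) on a single feature: the share of venues in each neighborhood that are Italian restaurants. The neighborhood names and numbers are made up for illustration, and `kmeans_1d` is a hypothetical helper written for this post, not the code behind the report. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
# Hypothetical data: share of venues in each neighborhood that are Italian
# restaurants. These names and numbers are made up for illustration.
italian_share = {
    "Neighborhood A": 0.00,
    "Neighborhood B": 0.01,
    "Neighborhood C": 0.02,
    "Neighborhood D": 0.05,
    "Neighborhood E": 0.06,
    "Neighborhood F": 0.12,
    "Neighborhood G": 0.14,
}


def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values with Lloyd's algorithm; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


names = list(italian_share)
labels, centroids = kmeans_1d([italian_share[n] for n in names], k=3)

# The cluster with the lowest centroid holds the fewest Italian restaurants.
sparsest = min(range(3), key=lambda c: centroids[c])
candidates = [n for n, lab in zip(names, labels) if lab == sparsest]
print(candidates)
```

The neighborhoods in the cluster with the lowest centroid are the candidates worth a closer look. A real analysis would likely cluster on more than one feature, which this sketch does not attempt.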

Finding such a place would give the enthusiastic chef a great advantage in introducing his Italian restaurant to a new population.

I then decided to list out the requirements to solve the problem:

  • List of neighborhoods in Toronto, Canada.
  • Latitude and Longitude coordinates of these neighborhoods.
  • Venue data related to Italian restaurants. This will help us find the neighborhoods that are most suitable to open an Italian restaurant.

Visualization of 3 clusters showing areas that belong to the same category

How I was going to get this data was another question. Here are the sources I settled on:

  • Scraping the list of Toronto neighborhoods, by postal code, from Wikipedia.
  • Getting the latitude and longitude coordinates of these neighborhoods via the Geocoder package.
  • Using the Foursquare API to get venue data for these neighborhoods.
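Once the venue data is collected, it has to be reduced to one number per neighborhood before clustering. The sketch below shows one way to do that, assuming each venue record is a (neighborhood, category) pair and that Italian restaurants carry the category label "Italian Restaurant". The records, the label, and the `italian_share` helper are all illustrative assumptions, not the data or code used for the report.

```python
from collections import Counter

# Hypothetical venue records as (neighborhood, venue category) pairs; real
# records would come from the venue data source described above.
venues = [
    ("Neighborhood A", "Coffee Shop"),
    ("Neighborhood A", "Park"),
    ("Neighborhood A", "Bakery"),
    ("Neighborhood A", "Sushi Restaurant"),
    ("Neighborhood B", "Italian Restaurant"),
    ("Neighborhood B", "Coffee Shop"),
    ("Neighborhood B", "Italian Restaurant"),
    ("Neighborhood B", "Bookstore"),
]


def italian_share(records, category="Italian Restaurant"):
    """Return {neighborhood: fraction of its venues matching `category`}."""
    total = Counter(hood for hood, _ in records)
    italian = Counter(hood for hood, cat in records if cat == category)
    return {hood: italian[hood] / total[hood] for hood in total}


shares = italian_share(venues)
# List neighborhoods from the lowest Italian-restaurant share to the highest.
for hood, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{hood}: {share:.2f}")
```

Using a share rather than a raw count keeps neighborhoods with many venues from looking saturated just because they are large.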

The Result

After all the data mining, data wrangling, separating the regions into 3 clusters, and examining those clusters, I got just the information I required.

Below is a link to the report I wrote from the information I got by examining the clusters.

Report

Thank you.