Creating the best business start-up approach with data science and machine learning
Here, I present a self-devised business problem and my approach to solving it.
I created a hypothetical scenario in which a chef wishes to introduce Italian food to a population in Toronto but does not know which location would be best for his idea.
He contacted yours truly and laid out his problem. I framed the issue and devised what I considered the best way to solve it. Here it goes.
The Approach
The main objective was to implement a solution using data science and machine learning, and I settled on clustering with the K-Means model. Clustering would let me group and visualize the data points, in this case locations and the venues found in their neighborhoods, and then find out which locations had as few Italian restaurants as possible.
Finding a place like this would be a great advantage for the enthusiastic chef, since he could introduce his Italian restaurant to a new population.
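To make the clustering idea concrete, here is a minimal, dependency-free sketch of K-Means on a single feature: the share of venues in each neighborhood that are Italian restaurants. The neighborhood names and numbers are made up for illustration, and `kmeans_1d` is a hypothetical helper rather than the code used in the project, which would more typically rely on a library implementation of K-Means.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster 1-D values into k groups with plain Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: the centers stopped moving
        centers = new_centers
    return labels, centers


# Hypothetical share of venues that are Italian restaurants, per neighborhood.
italian_share = {
    "Neighborhood A": 0.00,
    "Neighborhood B": 0.01,
    "Neighborhood C": 0.05,
    "Neighborhood D": 0.06,
    "Neighborhood E": 0.12,
    "Neighborhood F": 0.13,
}

names = list(italian_share)
labels, centers = kmeans_1d([italian_share[n] for n in names], k=3)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label} (center {centers[label]:.3f})")
```

Neighborhoods that land in the cluster with the lowest center are the ones with the fewest Italian restaurants relative to their other venues.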
I then decided to list out the requirements to solve the problem:
- List of neighborhoods in Toronto, Canada.
- Latitude and Longitude coordinates of these neighborhoods.
- Venue data related to Italian restaurants. This will help us find the neighborhoods that are most suitable to open an Italian restaurant.
Figure: Visualization of the 3 clusters, showing areas that belong to the same category.
How I was going to get this data was another question. Here are the solutions I settled on, with reference to where I would get the data:
- Scraping the Toronto neighborhoods, listed by postal code, from Wikipedia.
- Getting the latitude and longitude coordinates of these neighborhoods via the Geocoder package.
- Using the Foursquare API to get venue data for these neighborhoods.
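Once the venue data is collected, it has to be turned into one number per neighborhood before it can be clustered. The sketch below shows one way to do that, assuming the venues have already been fetched and reduced to (neighborhood, category) pairs. The records, the category names, and the `category_share` helper are all hypothetical, and no real API call is made.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) records, standing in for
# whatever the venue lookup returned for each neighborhood.
venues = [
    ("Neighborhood A", "Coffee Shop"),
    ("Neighborhood A", "Italian Restaurant"),
    ("Neighborhood A", "Park"),
    ("Neighborhood A", "Bakery"),
    ("Neighborhood B", "Coffee Shop"),
    ("Neighborhood B", "Pizza Place"),
    ("Neighborhood C", "Italian Restaurant"),
    ("Neighborhood C", "Italian Restaurant"),
]


def category_share(records, category):
    """Return, per neighborhood, the fraction of venues in the given category."""
    counts = defaultdict(Counter)
    for neighborhood, venue_category in records:
        counts[neighborhood][venue_category] += 1
    return {
        neighborhood: by_category[category] / sum(by_category.values())
        for neighborhood, by_category in counts.items()
    }


share_by_neighborhood = category_share(venues, "Italian Restaurant")
for neighborhood, share in share_by_neighborhood.items():
    print(f"{neighborhood}: {share:.2f}")
```

Using a share rather than a raw count keeps neighborhoods with many venues from looking saturated simply because they are busy.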
The Result
After all the data mining and data wrangling, separating the regions into 3 clusters, and examining those clusters, I had just the information I needed.
Below is a link to the report I wrote from what I learned by examining the clusters.
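Examining the clusters can be as simple as comparing the average Italian restaurant share in each one. The sketch below illustrates that step with made-up rows; the neighborhood names, cluster labels, shares, and the `least_served_cluster` helper are hypothetical and do not reproduce the actual findings.

```python
from statistics import mean

# Hypothetical clustering output: (neighborhood, cluster label, Italian share).
clustered = [
    ("Neighborhood A", 1, 0.12),
    ("Neighborhood B", 2, 0.00),
    ("Neighborhood C", 0, 0.05),
    ("Neighborhood D", 2, 0.01),
    ("Neighborhood E", 1, 0.13),
    ("Neighborhood F", 0, 0.06),
]


def least_served_cluster(rows):
    """Return the cluster with the lowest mean share, and its neighborhoods."""
    by_cluster = {}
    for name, label, share in rows:
        by_cluster.setdefault(label, []).append((name, share))
    # Average share per cluster; the smallest marks the candidate areas.
    averages = {
        label: mean(share for _, share in members)
        for label, members in by_cluster.items()
    }
    best = min(averages, key=averages.get)
    return best, [name for name, _ in by_cluster[best]]


best_cluster, candidates = least_served_cluster(clustered)
print(f"Cluster {best_cluster} has the fewest Italian restaurants: {candidates}")
```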
Thank you.