We developed a Python class that, given a list of NYC ZIP codes or neighborhood names, accesses Yelp's Fusion API, gathers data for those locations, and uses a K-Means model to cluster the results. The class also has functions to place the queried ZIP codes or neighborhoods into one of four clusters and plot the results on a map of NYC.

The data fields are described as follows.
String of ZIP code or text location query parameter being classified with the KNN model
Floating-point standardized ("s") value for the number of $ priced ("pr_1") establishments in the top 100 best matches for the search location
Floating-point standardized ("s") value for the number of $$ priced ("pr_2") establishments in the top 100 best matches for the search location, weighted ("w") by price level
Floating-point standardized ("s") value for the number of $$$ priced ("pr_3") establishments in the top 100 best matches for the search location, weighted ("w") by price level
Floating-point standardized ("s") value for the number of $$$$ priced ("pr_4") establishments in the top 100 best matches for the search location, weighted ("w") by price level
Floating-point standardized ("s") value for the total weighted price counts ("tot", "w") for the top 100 best matches for the search location
String for each of 278 ZIP codes across New York City boroughs

Notebooks: 02 - Yelp Fusion API, 03 - IRS Data, 04 - API Pull for Training Data
Our data was acquired via Yelp's Fusion API, using the yelpapi package (from yelpapi import YelpAPI). We set our Yelp Fusion API key and established an API connection, storing the data we gathered in a Pandas DataFrame. The search criteria included all categories (shops, restaurants, etc.), sorted by the "best match" option. For each of 278 NYC ZIP codes, we gathered prices and reviews for the top 100 businesses.

Notebooks: 05 - Yelp Cluster Gridsearch, 06 - K-Means Modeling
We grid-searched through three clustering models (K-Means, Agglomerative, and Hierarchical) and varied many hyperparameters (n_clusters, init, linkage_method, affinity). We analyzed silhouette score, inertia, and the number of observations in each cluster; K-Means yielded the best results. Using the elbow method, we found n_clusters=4 to be the best-balanced choice. Our final K-Means model used default parameters except for init='random' and random_state=42.

Notebooks: 01 - Yelp Affluence Demo, 07 - Yelp Class Development
Using this model, we developed a class for use by our client: the YelpAffluence_NYC class in yelpaffluence_nyc.py. After optimizing our K-Means model, we deployed a K-Nearest Neighbors classifier (n=15) to assign a label to our queried locations. Methods for the YelpAffluence_NYC object include tools to fit the model to NYC data, query Yelp's Fusion API for a list of locations, plot results in a bar graph by price level, and plot results on a map of NYC.
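The API pull described above could look roughly like the sketch below. It is not the project's actual code: the helper names fetch_top_businesses and to_rows are hypothetical, and the sketch assumes that a "top 100" pull is made as two pages of 50, since the Fusion business search returns at most 50 results per request. The client is expected to be a yelpapi YelpAPI object, whose search_query method passes keyword arguments through to the search endpoint.

```python
def fetch_top_businesses(client, location, total=100, page_size=50):
    """Collect up to `total` best-match businesses for one location.

    `client` is any object with a yelpapi-style search_query(**kwargs)
    method that returns a dict containing a "businesses" list.
    """
    businesses = []
    for offset in range(0, total, page_size):
        response = client.search_query(
            location=location,
            sort_by="best_match",
            limit=page_size,
            offset=offset,
        )
        page = response.get("businesses", [])
        businesses.extend(page)
        if len(page) < page_size:  # no more results for this location
            break
    return businesses[:total]


def to_rows(location, businesses):
    """Flatten API results into dicts that pandas.DataFrame can consume."""
    return [
        {
            "location": location,
            "name": b.get("name"),
            "price": b.get("price"),  # "$" to "$$$$"; absent for some businesses
            "rating": b.get("rating"),
            "review_count": b.get("review_count"),
        }
        for b in businesses
    ]


# Hypothetical usage (requires your own Fusion API key and network access):
# from yelpapi import YelpAPI
# client = YelpAPI(api_key)
# rows = to_rows("10001", fetch_top_businesses(client, "10001"))
```

Passing the client in as an argument keeps the paging logic testable without a live API key.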
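The weighted, standardized price features in the field descriptions above can be illustrated with a small sketch. Two assumptions are made here that the text does not confirm: "weighted by price level" is taken to mean the count multiplied by the number of dollar signs, and standardization is taken to be a z-score across locations (using the population standard deviation, as scikit-learn's StandardScaler does). The function and key names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

PRICE_LEVELS = {"$": 1, "$$": 2, "$$$": 3, "$$$$": 4}


def weighted_price_counts(prices):
    """Raw features for one location from its businesses' price strings."""
    # Businesses with no price listed (None) are skipped.
    counts = Counter(PRICE_LEVELS[p] for p in prices if p in PRICE_LEVELS)
    # Assumption: weight each count by its price level (level 1 is unchanged).
    feats = {f"pr_{lvl}": counts.get(lvl, 0) * lvl for lvl in (1, 2, 3, 4)}
    feats["tot"] = sum(feats.values())  # total weighted price count
    return feats


def standardize(rows):
    """Z-score every column across locations; constant columns become 0.0."""
    out = [dict() for _ in rows]
    for col in rows[0]:
        values = [r[col] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r_out, v in zip(out, values):
            r_out[col] = (v - mu) / sigma if sigma else 0.0
    return out


# Example: two locations with different price mixes.
raw = [
    weighted_price_counts(["$", "$$", "$$", None, "$$$$"]),
    weighted_price_counts(["$", "$", "$", "$$"]),
]
features = standardize(raw)
```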
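The modeling steps described above (an elbow search over n_clusters, a final K-Means fit, then a K-Nearest Neighbors classifier trained on the cluster labels) can be sketched with scikit-learn as follows. The data here is synthetic, generated with make_blobs as a stand-in for the real feature table, and "n=15" is assumed to mean n_neighbors=15. This is an illustration of the technique, not the contents of the project notebooks.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the standardized features (278 rows, 5 columns).
X, _ = make_blobs(n_samples=278, n_features=5, centers=4, random_state=0)

# Elbow method: record inertia and silhouette score for a range of k,
# then look for the k where inertia stops dropping sharply.
scores = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, init="random", random_state=42).fit(X)
    scores[k] = (km.inertia_, silhouette_score(X, km.labels_))

# Final clustering with the parameters named in the text.
kmeans = KMeans(n_clusters=4, init="random", random_state=42).fit(X)

# KNN learns the cluster labels so new locations can be assigned a cluster.
knn = KNeighborsClassifier(n_neighbors=15).fit(X, kmeans.labels_)
new_labels = knn.predict(X[:5])  # stand-in for freshly queried locations
```

Training a classifier on the K-Means labels is one way to label new locations without refitting the clustering each time a query comes in.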