Harsh Dhanuka Project

Projects and Academic Work

Skills put into action

To give a tangible view of my work, I've put together a portfolio of my recent academic and practical projects. Take a moment to explore them below.


Project 1 - Airbnb - Kaggle Competition

R, random forest, boosting, xgboost

MS. Applied Analytics (Columbia University)

Project 2 - FreshDirect LLC - Market Optimization

MS Excel, Pivot, IBM Watson

MS. Applied Analytics (Columbia University)


Secured 1st rank in class and 11th rank out of 464 participants in a school-wide Kaggle competition to build the best rental price prediction model for Airbnb, using a dataset of 40,000 observations across 90 features. The model had to be built in R, with techniques limited to Gradient Boosting, Ranger, Lasso, Ridge, Stepwise Selection, Regression, Bagging, and Bootstrapping.

Grade: A+

Click below for in-class presentation. (Subject to copyright)

Analyzed FreshDirect LLC (USA) and proposed a new market segmentation plan and a profitable supply chain optimization schedule. Using the provided data (sales figures, customer database, weekly trends, zip codes, items sold, delivery schedules, and more), performed a comprehensive analysis with Excel Pivot Tables, IBM Watson, and Tableau to identify the most profitable customer segments to target, build a geographical segmentation profile and heat map analysis, and suggest profit and delivery optimizations.

Grade: A+

Click below for in-class submission. (Subject to copyright)

Project 4 - NYPD - Murder Rate Research Proposal

Research Design, Hypothesis, Sampling, Population

MS. Applied Analytics (Columbia University)

Designed a detailed research study addressing the NYPD's concerns about increasing murder rates in New York City. The design covers the management dilemma, research questions, benefits of the research, the methodology and type of research study, population and sample selection, threats to validity, and analytical methodology.

Grade: A+

Click below for in-class submission. (Subject to copyright)


Project 5 - YouTube - New Revenue Model Proposal

MS PowerPoint, Canva

MS. Applied Analytics (Columbia University)

Created an entirely new ad-based revenue model for YouTube as part of the Storytelling with Data module. Analyzed top videos to identify the characteristics of good videos and modeled a new earnings section for the YouTube website called 'YouTube Preferred'. The data presented may not match actual figures.

Grade: A+

Click below for in-class presentation. (Subject to copyright)


Project 6 - Women's Clothing - Rating Prediction

R, Unsupervised Learning, Sentiment Analysis, Clustering

MS. Applied Analytics (Columbia University)

Secured 1st rank in class for an unsupervised learning project. Performed detailed analysis and predictive modeling in R on a dataset of Women's Clothing Reviews with 20,000 records and 11 variables. The techniques and models included word clouds, histograms, sentiment analysis, lexicon analysis, clustering analysis, TF and TF-IDF models, and predictions using tree and linear regression models.
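
For readers unfamiliar with TF-IDF, the weighting can be sketched in a few lines. This is an illustrative Python sketch only, not the R code used in the project; the function name `tf_idf` and the three sample reviews are assumptions made up for the example, and the plain log(N / document frequency) form is one common variant among several.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical reviews, invented for illustration.
reviews = [
    "love this dress",
    "this dress runs small",
    "love the soft fabric",
]
weights = tf_idf(reviews)
# Terms that appear in fewer reviews get a higher weight in their review.
top_term = max(weights[1], key=weights[1].get)
print(top_term, round(weights[1][top_term], 3))
```

A term that appears in every document receives a weight of zero under this variant, which is why common words contribute little to the resulting features.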

Grade: A+

Click below for in-class presentation. (Subject to copyright)


Project 8 - eBay Inc. - Market Expansion Analysis

Blue Ocean, ERRC, Strategy Analytics, MS Excel

MS. Applied Analytics (Columbia University)

Performed an in-depth case study analysis of eBay Inc.'s current business model. Built a strategic plan for a Multi-Modal Auction e-Platform for eBay Inc. using tools such as the Blue Ocean strategy approach and an ERRC model. Analyzed the competitive environment using SWOT, PESTLE, and Porter's Five Forces, and clearly defined the roadmap and path to implementation, adaptive planning, analytics application, and risk management for the suggested strategy.

Grade: A+

Click below for in-class submission. (Subject to copyright)

Project 7 - Google Play Store App - Rating Prediction

Python, Jupyter Notebook, pandas, numpy, sklearn, nltk

MS. Applied Analytics (Columbia University)

Built a ratings prediction model for the Google Play Store apps market, using a large dataset from Kaggle with fields such as size, date, free, installs, category, and others. Performed in-depth data cleaning, data transformation, exploratory analysis, sentiment analysis, and clustering to develop meaningful, actionable insights for app developers, and also suggested a future outlook for the app market.

Grade: A+

Click below for Jupyter Notebook file. (Subject to copyright)


Project 9 - Lyft Inc. - Market Expansion Analysis

Blue Ocean, ERRC, Strategy Canvas, Strategy Analytics, MS Excel

MS. Applied Analytics (Columbia University)


Built a new strategic roadmap for Lyft covering expansion into other areas, sectors, regions, and industries, along with revenue growth and customer satisfaction. Evaluated its current standing, the competitive environment, and the relevance of strategic frameworks such as SWOT, Porter's Five Forces, PESTLE, and the ERRC model, and suggested a timeline and implementation plan for the recommendation.

Grade: A+

Click below for in-class submission. (Subject to copyright)

Project 10 - Predicting Loan Defaults

Feature Engineering, EDA, Random Forest, h2o, Sampling, Lift

MS. Applied Analytics (Columbia University)


Project 11 - Healthcare Outlier Detection 1

Python Outlier Detection PyOD, kNN, PCA Clustering

MS. Applied Analytics (Columbia University)


Used a healthcare industry dataset of 163k rows with fields such as DRG, hospital name, location, average charges, Medicare payments, total discharges, and others. Performed an in-depth exploratory data analysis to understand each feature, followed by thorough feature engineering to build many new meaningful features to help in hospital fraud detection. Then performed clustering analysis using the PyOD modules to identify anomalous or potentially fraudulent clusters with the help of the average summary statistics table for each cluster.
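
The idea behind kNN-based outlier scoring can be sketched as follows. This is a minimal pure-Python illustration, not the PyOD code used in the project; the function name `knn_outlier_scores` and the five two-feature points are assumptions invented for the example. Each point is scored by the distance to its k-th nearest neighbour, so isolated points receive the largest scores.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if k < 1 or k >= len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical engineered features (two per hospital), invented for illustration.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 9.0)]
scores = knn_outlier_scores(points, k=2)

# The index with the largest score is the strongest outlier candidate.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

In practice the features would be standardized first, since a distance-based score is sensitive to the scale of each column.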

Grade: A+

Click below for in-class submission. (Subject to copyright)

Project 13 - ML Model Monitoring Dashboard Proposal for Loan Default Model built in Project 10
Model Performance Metrics, System Usage Indicators, Service Response Metrics, Production Cost Monitoring

Used a real loan default dataset (company name withheld) containing 80,000 rows of data over 89 variables to perform in-depth exploratory data analysis and feature engineering. Then built multiple random forest models using the H2O package to arrive at a stable and acceptable loan default prediction model. The final lift score was 3.01, with a Precision-Recall score of 0.50 and an AUC of 0.79.
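
For context on the lift metric, here is a minimal Python sketch of how lift in the top-scored fraction can be computed. It is illustrative only and is not the H2O code used in the project; the function name `lift_at_top`, the choice of fraction, and the toy labels and scores are all assumptions.

```python
def lift_at_top(y_true, scores, fraction=0.1):
    """Lift = default rate in the top-scored fraction / overall default rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal length")
    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    # Rank rows from highest to lowest predicted default probability.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(n * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Toy data, invented for illustration: 1 = default, 0 = no default.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
lift = lift_at_top(labels, scores, fraction=0.2)
print(round(lift, 2))
```

A lift above 1 means the top-scored group contains proportionally more defaults than the population as a whole, which is what makes the ranking useful for targeting reviews.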

Grade: A+

Click below for in-class submission. (Subject to copyright)

Project 12 - Healthcare Outlier Detection 2

Python Outlier Detection PyOD, Autoencoder, iForest Clustering

MS. Applied Analytics (Columbia University)


This project continues Project 11 and takes it a step further by exploring more PyOD modules, such as Autoencoder and Isolation Forest. I again performed clustering analysis using the PyOD modules to identify anomalous or potentially fraudulent clusters with the help of the average summary statistics table for each cluster.

Grade: A+

Click below for in-class submission. (Subject to copyright)

MS. Applied Analytics (Columbia University)


Built a draft of a Model Monitoring Dashboard, which is used to track usage, set thresholds, and set parameters to accept or reject a machine learning model and decide its validity and stability. The dashboard is built around 4 distinct segments: Model Performance, System Usage, Production Cost, and Service Response.
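
The accept/reject logic behind such a dashboard can be sketched as a set of threshold checks, one per segment. This is a hypothetical Python illustration, not the dashboard itself; the class `Threshold`, the function `review_model`, the metric names, and every limit and value below are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    name: str
    limit: float
    higher_is_better: bool

    def passes(self, value):
        """Check a metric value against this threshold's limit."""
        if self.higher_is_better:
            return value >= self.limit
        return value <= self.limit


def review_model(metrics, thresholds):
    """Return (accepted, failed_names); the model is accepted only if nothing fails."""
    failed = [t.name for t in thresholds if not t.passes(metrics[t.name])]
    return (not failed, failed)


# One hypothetical metric per dashboard segment.
thresholds = [
    Threshold("auc", 0.75, higher_is_better=True),                # Model Performance
    Threshold("daily_requests", 500, higher_is_better=True),      # System Usage
    Threshold("monthly_cost_usd", 500, higher_is_better=False),   # Production Cost
    Threshold("p95_latency_ms", 300, higher_is_better=False),     # Service Response
]
metrics = {
    "auc": 0.79,
    "daily_requests": 1200,
    "monthly_cost_usd": 450.0,
    "p95_latency_ms": 380.0,
}

accepted, failed = review_model(metrics, thresholds)
print(accepted, failed)
```

Returning the list of failed metrics alongside the decision keeps the result explainable, which matters when a dashboard is used to justify rejecting a model.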

Grade: A+

Click below for in-class submission. (Subject to copyright)

For more examples and a better understanding of my work, don't hesitate to reach out. Keep exploring to see my co-curricular activities.


Project 3 - Johnson & Johnson - Sales Analysis

MS Excel, MS Word

MS. Applied Analytics (Columbia University)

Designed a research study on the declining sales of Johnson & Johnson's baby powder. Analyzed market trends, past patterns, future forecasts, and supply chain stock-outs, and reported key profitability metrics. Further analyzed the marketing activities and loyalty programs and studied competitors' offerings to develop a more effective marketing campaign. Also analyzed the resignation of the CMO and succession planning.

Grade: A+

Click below for in-class submission. (Subject to copyright)
