
Saihiel Bakshi

Data Engineer, Scientist, Developer

Contact Me

About Me

I am a Data Specialist and Developer who is passionate about Big Data Engineering, Machine Learning, and Statistical Modeling. I graduated from the University of Toronto with a degree in Computer Science, Applied Statistics, and Mathematics. In my spare time I can often be found reading research papers or following current events.

I consider myself a self-starter and a forever-student, and I am always looking to learn more and further hone my skillset. I enjoy spending my time reading, programming, playing tennis, or solving challenging problems with unorthodox solutions!

I currently work as a Data Engineer at Springboard Data Management in Toronto, Canada.

Latest Projects

A selection of some of my projects:



Generating News Headlines Using Autoencoders

Writing a good news headline is an art: it requires a strong command of language to grab the reader's attention in a single sentence. In this project I created a model that takes an original news headline and generates a related but new one. This project focuses on the application of deep learning to natural language processing.

Find out more


Ensemble Model for Kaggle Competition on Beijing Pollution Data

This data analysis and regression project won me a private Kaggle competition held for University of Toronto students, with over 200 participating teams and an initially undisclosed dataset. I built a weighted ensemble combining a linear Generalised Additive Model (GAM) with an Extreme Gradient Boosted Tree Regressor (XGBoost).
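A weighted blend like this can be sketched with NumPy alone. The two prediction arrays below are synthetic stand-ins for the fitted GAM and boosted-tree outputs, and the grid search over the blend weight is an illustrative choice, not the competition code:

```python
import numpy as np

# Synthetic stand-ins for the two fitted models' validation predictions.
rng = np.random.default_rng(0)
y_val = rng.normal(size=200)                          # true targets
pred_gam = y_val + rng.normal(scale=0.5, size=200)    # "GAM" predictions (noisier)
pred_xgb = y_val + rng.normal(scale=0.3, size=200)    # "boosted tree" predictions

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Grid-search the blend weight on held-out data.
weights = np.linspace(0, 1, 101)
scores = [rmse(y_val, w * pred_gam + (1 - w) * pred_xgb) for w in weights]
best_w = weights[int(np.argmin(scores))]
blended = best_w * pred_gam + (1 - best_w) * pred_xgb
```

Because the grid includes the weights 0 and 1, the blend can never do worse on the validation set than either model alone.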

Find out more


Analytical Report on Sparse Group LASSO Research Paper

An analytical report on the Sparse Group Lasso method developed by Friedman et al. (2013). I also created a presentation summarizing my findings and analysis of the paper, and used R to reproduce its results and demonstrate the efficacy of the methodology using Monte Carlo simulations.
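For reference, the sparse group lasso penalty combines an ordinary lasso term with a group-lasso term. A minimal NumPy sketch of the penalty itself (the coefficients and group structure below are made up for illustration):

```python
import numpy as np

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Sparse group lasso penalty:
    alpha*lam*||beta||_1 + (1-alpha)*lam * sum_l sqrt(p_l)*||beta_l||_2,
    where p_l is the size of group l."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))                        # lasso part
    group_l2 = 0.0
    for g in groups:                                 # g: index array for one group
        idx = np.asarray(g)
        group_l2 += np.sqrt(idx.size) * np.linalg.norm(beta[idx])
    return alpha * lam * l1 + (1 - alpha) * lam * group_l2

# alpha=1 recovers the lasso; alpha=0 recovers the group lasso.
beta = np.array([1.0, -2.0, 0.0, 3.0])
groups = [np.array([0, 1]), np.array([2, 3])]
pen = sparse_group_lasso_penalty(beta, groups, lam=0.5, alpha=0.5)
```

The penalty yields sparsity both at the group level (whole groups zeroed out) and within groups, which is what the report's simulations probe.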

Find out more


Modelling Songs By Release Dates

Built Logistic Regression and K-Nearest Neighbour models to predict which century a song was released in, using the "YearPredictionMSD Data Set" from the Million Song Dataset. I built all my models from scratch using only Python and NumPy.
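A from-scratch k-nearest-neighbour classifier in the same spirit, using only NumPy. The toy data below is illustrative, not the YearPredictionMSD features:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest
    training rows under Euclidean distance, using only NumPy."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbours
    votes = y_train[nearest]                  # (n_test, k) array of labels
    # Majority vote per test row.
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy example: two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.05], [5.05, 5.05]])
preds = knn_predict(X_train, y_train, X_test, k=3)
```

The broadcasting trick computes all pairwise distances at once; for a dataset the size of YearPredictionMSD one would chunk the test set to keep memory bounded.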

Find out more

Other Projects

Automated Marking System for Grading SQL Assignments (Code Proprietary)

Built an automated marking system to grade SQL assignments for students at the University of Toronto. I also designed a partial marking system that leverages string similarity algorithms to assign partial marks to student submissions. The system is now being used by the University in subsequent database courses, and has increased the efficiency of the Computer Science department by reducing the number of hours Teaching Assistants spend marking assignments by 30%.
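The partial-marking idea can be sketched with Python's standard-library SequenceMatcher. The normalisation and thresholds below are illustrative guesses, not the proprietary scheme:

```python
from difflib import SequenceMatcher

def partial_mark(student_sql, reference_sql, full_marks=10.0, min_ratio=0.6):
    """Assign partial marks proportional to string similarity against a
    reference answer, after light whitespace/case normalisation.
    Thresholds are illustrative, not the production scheme."""
    norm = lambda s: " ".join(s.lower().split())
    ratio = SequenceMatcher(None, norm(student_sql), norm(reference_sql)).ratio()
    if ratio >= 0.999:        # effectively an exact match after normalisation
        return full_marks
    if ratio < min_ratio:     # too dissimilar to earn any credit
        return 0.0
    return round(full_marks * ratio, 1)

reference = "SELECT name FROM students WHERE gpa > 3.5 ORDER BY name"
exact = partial_mark("select name  from students where gpa > 3.5 order by name", reference)
```

In practice a grader would also compare query *results* against a reference database, with string similarity only breaking ties on non-executing submissions.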

Read More

Algorithm for Cleaning Client Data for American Express (Code Proprietary)

Designed a system for removing duplicate client data from massive overlapping databases. The algorithm was designed to handle cases where people's names were spelled differently, addresses were incomplete, or records contained many null values. I built the system in Java and Hive (Hadoop) using fuzzy matching algorithms. My system saved the department a $10,000 monthly subscription cost while providing the same services more accessibly.
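The matching idea can be sketched with standard-library string similarity. This greedy Python version is illustrative only: the field names and threshold are assumptions, and the production system ran in Java/Hive:

```python
from difflib import SequenceMatcher

def _key(record):
    """Concatenate the compared fields, lower-cased and whitespace-normalised;
    missing or null fields become empty strings."""
    return " ".join(" ".join(str(record.get(f) or "").lower().split())
                    for f in ("name", "address"))

def dedupe(records, threshold=0.85):
    """Greedy fuzzy deduplication: keep a record only if it is not
    sufficiently similar to any already-kept record."""
    kept = []
    for rec in records:
        key = _key(rec)
        if all(SequenceMatcher(None, key, _key(k)).ratio() < threshold for k in kept):
            kept.append(rec)
    return kept

clients = [
    {"name": "Jonathan Smith", "address": "12 King St W, Toronto"},
    {"name": "Jonathon Smith", "address": "12 King Street W, Toronto"},  # near-duplicate
    {"name": "Priya Rao", "address": None},                              # null field
]
unique = dedupe(clients)
```

The pairwise comparison here is quadratic; at American Express scale one would first block records (e.g. by phonetic name key) and only compare within blocks, which is what makes a Hadoop implementation worthwhile.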

Automated Foreign Currency Exchange Trading System (Code Proprietary)

Created a system that trades the forex market for my investors using strategies I developed and designed myself. The underlying model leverages Machine Learning techniques and performs time-series analysis on incoming live tick data. I scraped raw time-series data for every available tick from the last 20 years using MetaQuotes Language, applied a range of cleaning techniques and feature transformations, and created a backtesting application in Python. The system uses multiple Recurrent Neural Networks with LSTM cells and is hosted on AWS.
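A minimal sketch of the kind of backtesting loop described. The stub signals stand in for the RNN's output, and the flat per-trade spread cost is an assumption:

```python
import numpy as np

def backtest(prices, signals, spread=0.0001):
    """Vectorised tick backtest: positions in {-1, 0, +1}.
    P&L = position * next price change, minus a flat spread cost
    each time the position changes (an illustrative cost model)."""
    prices = np.asarray(prices, dtype=float)
    signals = np.asarray(signals, dtype=float)
    returns = np.diff(prices)                 # price change per tick
    pnl = signals[:-1] * returns              # position held into each change
    # A trade occurs whenever the position changes (including the first entry).
    trades = np.abs(np.diff(np.concatenate([[0.0], signals])))[:-1]
    return float(np.sum(pnl - trades * spread))

prices = [1.1000, 1.1005, 1.1003, 1.1010]
signals = [1, 1, 0, 0]    # long for the first two ticks, then flat
total = backtest(prices, signals)
```

A real backtester would also model slippage, partial fills, and bid/ask quotes rather than a single mid price, but the accounting skeleton is the same.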

Past Work Experience

Teaching Assistant, Databases - University of Toronto (Sept. 2019 - May 2020)

  • Led tutorials for 150+ students on SQL, database management systems and relational data modeling.
  • Identified a marking workflow that could be automated and built a Python system to do so, increasing the department's time-efficiency by almost 30%.
  • Invigilated and graded examinations and assignments based on Relational Databases concepts and SQL scripting/querying.

Data Scientist, Intern - Prolifics (May 2019 - Sept. 2019)

  • Explored various data sources, uncovered a business opportunity in the client's chat-bot requests using NLP, and deployed a sentiment analysis model that increased customer engagement by 4%.
  • Spearheaded the development of multiple machine learning models and experiments for client propensity modelling, customer behaviour analysis and target market identification.
  • Tech lead for the "Data Science in a Box" project, developed using AWS S3, Stitch, Snowflake, IBM Watson Studio, DataRobot and sklearn.
  • Presented actionable insights derived from tested hypotheses on diverse datasets to project managers and cross-functional teams.
  • Engaged with stakeholders throughout the organization to identify opportunities for leveraging machine learning and AI algorithms to drive business decisions.
  • Created a comparative analysis of automated machine learning solutions, such as DataRobot, IBM AutoAI and auto-sklearn.

Big Data Engineer, Intern - American Express (May 2017 - Sept. 2017)

  • Leveraged data insights derived from statistical tests to optimize a direct marketing campaign, which led to a 1.7% increase in the conversion rate of pre-approved clients.
  • Championed the development of big data algorithms centred around deduplication and entity resolution using Hadoop and Java.
  • Liaised with cross-functional teams across the enterprise to derive data-driven multi-cluster parallel solutions for business use cases deployed in Hadoop.
  • Developed and monitored end-to-end pipeline for data extraction, transformation and loading (ETL) jobs that sourced data from Hadoop.