About Me

Welcome! I'm Thomas Hur, a Computer Science & Economics double major at Binghamton University. Several years ago, while working as a campaign intern responsible for canvassing and data entry, I realized just how important data has become in the modern world. Since then, I've had a keen interest in working with and manipulating data, and I'm eager to learn more about how to do so.

So far, I have dabbled in several fields of Computer Science, including Data Science, Software Engineering, and basic web development. Currently, my goal is to challenge myself and gain more experience in the field, whether through projects, internships, or hackathons.

Some of my skills include:

  • Python
  • R
  • SQL & MySQL
  • C++ & Java
  • Apache Spark

Experience

    McAfee
    Data Analyst Intern
    McAfee
    May 2020 - Aug 2020
    • Segmented customers through K-Means Clustering to identify key app features and predict customer churn, retention, and value, potentially generating thousands in additional revenue through targeted advertising to new McAfee users
    • Delivered an end-to-end dashboard visualization with Tableau and SQL to provide stakeholders with App Store and Google Play metric data, reducing time spent manually browsing app store data by 90%
    • Leveraged AWS Athena/S3 Bucket, Azure Databricks, Python, and SQL to analyze the success of app campaign messaging

    Xaltius
    Data Science Intern
    Xaltius Tech Pte Ltd
    Jun 2019 - Aug 2019
    • Built several financial use cases with PySpark and Scala, including Kickstarter Success and Auditing, to market to customers
    • Constructed multiple models with algorithms such as Support Vector Machines and Gradient Boosting, and used MLflow and Cross Validation to achieve highly accurate models
    • Collaborated with marketing interns to design multiple presentations using Canva to showcase machine learning projects to consumers and businesses

    TakenMind
    Data Analytics Intern
    TakenMind Organization
    Oct 2018 - Dec 2018
    • Discovered key parameters that led to high employee turnover by implementing multiple machine learning models like SVM, Decision Trees, Random Forest, and Naive Bayes
    • Utilized multiple classification algorithms on the popular iris flower dataset to label each individual flower as either Iris setosa, Iris virginica or Iris versicolor

Projects




  • Recipe Recommender System

    Developed a recommender system for thewoksoflife.com using Python. The recommender system uses a custom-built Cosine Similarity algorithm that compares ingredients and other parameters, such as calories and average rating, to recommend a recipe to a user given specified parameters. To build it, I first developed a web scraper with the Requests, Pandas, and JSON libraries, storing 1450 recipes and 9 features in a CSV file. I then used regular expressions and KNN imputation to clean the data before visualizing it with the matplotlib, seaborn, and wordcloud libraries. Code can be found here.
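
    As a rough illustration of the cosine-similarity step, here is a minimal sketch. It is not the project's actual code: the recipe names, ingredient lists, and function names are made up for the example, and it compares ingredients only, leaving out calories and ratings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction; treat it as no match
    return dot / (norm_a * norm_b)


def ingredient_vector(ingredients, vocabulary):
    """One-hot encode an ingredient list against a fixed vocabulary."""
    present = set(ingredients)
    return [1.0 if item in present else 0.0 for item in vocabulary]


def recommend(query_ingredients, recipes, top_n=1):
    """Return the names of the top_n recipes most similar to the query."""
    vocabulary = sorted({item for items in recipes.values() for item in items})
    query = ingredient_vector(query_ingredients, vocabulary)
    scored = [
        (cosine_similarity(query, ingredient_vector(items, vocabulary)), name)
        for name, items in recipes.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical toy data, not scraped from any site.
RECIPES = {
    "fried rice": ["rice", "egg", "scallion", "soy sauce"],
    "egg drop soup": ["egg", "chicken stock", "scallion"],
    "mapo tofu": ["tofu", "pork", "chili bean paste"],
}

print(recommend(["rice", "egg", "soy sauce"], RECIPES))  # ['fried rice']
```

    Numeric parameters such as calories could be appended to the same vectors, but they would likely need scaling first so that they do not dominate the one-hot ingredient entries.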




  • Credit Card Fraud Detection

    Built multiple models, including Support Vector Machines and Decision Trees, to detect incidents of credit card fraud with 98.98% accuracy after applying K-Fold Cross Validation and undersampling. Employed Spark, Python, and SQL to build an IPython Notebook to clean and analyze data. Used a dataset from Kaggle located at https://www.kaggle.com/mlg-ulb/creditcardfraud. The dataset contains 284,807 transactions, with 28 anonymized features (to protect private information) and 3 given features: transaction data, time, and amount.

    Code can be found here.
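
    For readers unfamiliar with the two techniques named above, here is a minimal standard-library sketch of random undersampling and K-fold index splitting. It is only an illustration under my own assumptions, not the notebook's code: the function names and the toy data are hypothetical, and the project itself used Spark.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([positives, negatives], key=len)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
labels = [0] * 95 + [1] * 5
rows = [[float(i)] for i in range(100)]

balanced_rows, balanced_labels = undersample(rows, labels)
print(len(balanced_labels), sum(balanced_labels))  # 10 5

for train, test in k_fold_indices(len(balanced_labels), k=5):
    print(len(train), len(test))  # 8 2 for each of the five folds
```

    A common refinement is to undersample only the training portion of each fold, so that every test fold keeps the original class balance and the reported score reflects realistic data.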




  • India Audit

    Built pipelines using PySpark, MLflow, and Random Forest to estimate the probability that a firm in India was guilty of tax evasion, with 95.92% accuracy. Used SQL and Python to build an IPython Notebook for data cleaning and data visualization. In particular, focused on Random Forest's feature importance scores to determine the most important factors contributing to a company's risk of tax evasion. The dataset can be found at https://archive.ics.uci.edu/ml/datasets/Audit+Data. It contains 776 transactions and 23 features, including sector score.

    Code can be found here.
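
    To give a feel for how impurity-based feature importance works, here is a deliberately simplified sketch. It is not the PySpark pipeline: it assumes a "forest" of one-split trees (stumps) built on bootstrap samples, and the feature names, toy data, and function names are all hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_gain(rows, labels, feature):
    """Largest Gini decrease achievable by thresholding a single feature."""
    parent = gini(labels)
    best = 0.0
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best


def stump_forest_importances(rows, labels, n_trees=50, max_features=None, seed=0):
    """Normalized total Gini decrease credited to each feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    if max_features is None:
        max_features = max(1, round(n_features ** 0.5))
    totals = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap: draw as many rows as the data has, with replacement.
        sample = [rng.randrange(len(rows)) for _ in rows]
        s_rows = [rows[i] for i in sample]
        s_labels = [labels[i] for i in sample]
        # Each stump may only choose among a random subset of features.
        candidates = rng.sample(range(n_features), max_features)
        gains = {f: best_split_gain(s_rows, s_labels, f) for f in candidates}
        winner = max(gains, key=gains.get)
        totals[winner] += gains[winner]
    total = sum(totals)
    return [t / total if total else 0.0 for t in totals]


# Hypothetical toy data: only the first feature is related to the label.
FEATURES = ["risk_score", "noise_a", "noise_b"]
ROWS = [[i / 20, i % 2, (i * 7) % 5] for i in range(20)]
LABELS = [1 if i >= 10 else 0 for i in range(20)]

for name, score in zip(FEATURES, stump_forest_importances(ROWS, LABELS)):
    print(f"{name}: {score:.3f}")
```

    Real implementations grow much deeper trees and sum the impurity decrease over every split, but the idea is the same: features that produce cleaner splits more often receive higher scores.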