Resume

This is a summary of my resume. A more detailed PDF version can be downloaded from the icon at the top right of the page.

Basics

Name: Sam Voisin
Label: Data Scientist and Software Engineer
Email: samvoisin@protonmail.com
Url: https://www.samvoisin.com/
Summary: Skilled data scientist and software engineer with extensive experience in developing robust data pipelines, scalable architectures, and efficient algorithms. Proficient in integrating data science techniques into software solutions, with a strong foundation in statistical modeling, machine learning, and software design. Seeking a position to contribute to innovative data-driven projects.

Work

  • 2023.10 - Present
    Senior Data Scientist
    Tradewind Data Science, Chicago, IL
    Analyzed consumer trends and provided actionable insights for data-driven decision-making. Improved client revenue through advanced econometric modeling, optimizing pricing strategies.
  • 2022.03 - 2023.10
    Data Scientist
    Infinia ML, Durham, NC
    Developed flexible document processing pipelines for information extraction and classification. Designed and implemented large language model (LLM) infrastructure.
  • 2020.06 - 2022.03
    Data Scientist
    Geometric Data Analytics, Inc., Durham, NC
    Conducted research and development for clients including DARPA, NRL, and AFRL. Developed novel algorithms for oceanographic research, remote sensing, and pattern-of-life modeling. See Publications below.
  • 2019.04 - 2019.09
    Research Assistant
    Duke University, Durham, NC
    Crafted research goals, planned and executed experiments, designed data pipelines, and optimized MCMC samplers for Bayesian hierarchical regression models.
  • 2015.01 - 2018.06
    Analyst
    Ally Financial Services, Charlotte, NC
    Analyzed financial market data and business metrics to mitigate business risk. Automated data gathering and processing. Acted as program lead and mentor for department internship program.

Education

  • 2018.08 - 2020.05
    M.S.
    Duke University, Trinity College of Arts and Sciences
    Statistical Science
  • 2010.08 - 2014.12
    B.S.
    Clemson University, College of Business and Behavioral Science
    Financial Management

Awards

  • 2020.02.11
    University of South Carolina Big Data Health Science Conference 2020
    UofSC Big Data Health Science Center
    First place at UofSC Big Data Health Science Conference 2020 case study competition. The objective of the case study was to develop a platform to aid first responders in diagnosing chemical exposure. We developed an interpretable nearest-neighbors model to transparently diagnose exposure to a wide variety of chemical agents.

Publications

  • 2023.03.01
    Topological Simplification of Signals for Inference and Approximate Reconstruction
    2023 IEEE Aerospace Conference
    As Internet of Things (IoT) devices become both cheaper and more powerful, researchers are increasingly finding solutions to their scientific curiosities both financially and computationally feasible. When operating with restricted power or communications budgets, however, devices can only send highly compressed data. Such circumstances are common for devices placed away from electric grids that can only communicate via satellite, a situation particularly plausible for environmental sensor networks. These restrictions can be further complicated by potential variability in the communications budget, for example a solar-powered device needing to expend less energy when transmitting data on a cloudy day. We propose a novel, topology-based, lossy compression method well-equipped for these restrictive yet variable circumstances. This technique, Topological Signal Compression, allows sending compressed signals that utilize the entirety of a variable communications budget. To demonstrate our algorithm's capabilities, we perform entropy calculations as well as a classification exercise on increasingly topologically simplified signals from the Free Spoken Digit Dataset and explore the stability of the resulting performance against common baselines.
  • 2022.09.27
    Topological Feature Tracking for Submesoscale Eddies
    Geophysical Research Letters
    Current state-of-the-art procedures for studying modeled submesoscale oceanographic features have made a strong assumption of independence between features identified at different times. Therefore, all submesoscale eddies identified in a time series were studied in aggregate. Statistics from these methods are illuminating but oversample identified features and cannot determine the lifetime evolution of the transient submesoscale processes. To this end, the authors apply the Topological Feature Tracking (TFT) algorithm to the problem of identifying and tracking submesoscale eddies over time. TFT identifies critical points on a set of time-ordered scalar fields and associates those points between consecutive timesteps. The procedure yields tracklets which represent spatio-temporal displacement of eddies. In this way we study the time-dependent behavior of submesoscale eddies, which are generated by a 1-km resolution submesoscale-permitting model. We summarize the submesoscale eddy data set produced by TFT, which yields unique, time-varying statistics.
  • 2021.12.01
    [Whitepaper] Automation is All You Need: Faster Earth Systems Models with AI/ML
    US Department of Energy
    Tropical cyclones can induce extreme water cycle events through dramatic precipitation and storm surge. More reliable models of intensity will translate into better prediction of the impact of extreme events in large scale Earth systems simulations. We demonstrate and describe AI/ML methodologies for rapid assimilation of new, in situ data products.

Skills

  • Probability and Statistics
    Bayesian inference, statistical modeling, predictive modeling
  • Programming
    Python, SQL, R
  • Machine Learning
    Deep learning, natural language processing, computer vision, PyTorch
  • Software Development
    Object-oriented programming, Docker, Kubernetes, RESTful APIs, Agile methodology
  • Data Management
    PostgreSQL, Neo4j, PySpark