Resume
This is a summary of my resume. There is a more detailed version of my resume that you can download by clicking the PDF icon in the top right of the page.
Basics
Name | Sam Voisin |
Label | Data Scientist and Software Engineer |
Email | samvoisin@protonmail.com |
Url | https://www.samvoisin.com/ |
Summary | Skilled data scientist and software engineer with extensive experience in developing robust data pipelines, scalable architectures, and efficient algorithms. Proficient in integrating data science techniques into software solutions, with a strong foundation in statistical modeling, machine learning, and software design. Seeking a position to contribute to innovative data-driven projects. |
Work
-
2023.10 - Present
Senior Data Scientist
Tradewind Data Science, Chicago, IL
Analyzed consumer trends and provided actionable insights for data-driven decision-making. Improved client revenue through advanced econometric modeling, optimizing pricing strategies.
-
2022.03 - 2023.10
Data Scientist
Infinia ML, Durham, NC
Developed flexible document processing pipelines for information extraction and classification. Designed and implemented large language model (LLM) infrastructure.
-
2020.06 - 2022.03
Data Scientist
Geometric Data Analytics, Inc, Durham, NC
Research and development for clients including DARPA, NRL, and AFRL. Developed novel algorithms for oceanographic research, remote sensing, and pattern-of-life modeling. See publications.
-
2019.04 - 2019.09
Research Assistant
Duke University, Durham, NC
Crafted research goals, planned and executed experiments, designed data pipelines, and optimized MCMC samplers for Bayesian hierarchical regression models.
-
2015.01 - 2018.06
Analyst
Ally Financial Services, Charlotte, NC
Analyzed financial market data and business metrics to mitigate business risk. Automated data gathering and processing. Acted as program lead and mentor for department internship program.
Education
Awards
-
2020.02.11
University of South Carolina Big Data Health Science Conference 2020
UofSC Big Data Health Science Center
First place at UofSC Big Data Health Science Conference 2020 case study competition. The objective of the case study was to develop a platform to aid first responders in diagnosing chemical exposure. We developed an interpretable nearest-neighbors model to transparently diagnose exposure to a wide variety of chemical agents.
Publications
-
2023.03.01
Topological Simplification of Signals for Inference and Approximate Reconstruction
2023 IEEE Aerospace Conference
As Internet of Things (IoT) devices become both cheaper and more powerful, researchers are increasingly finding solutions to their scientific curiosities both financially and computationally feasible. When operating with restricted power or communications budgets, however, devices can only send highly-compressed data. Such circumstances are common for devices placed away from electric grids that can only communicate via satellite, a situation particularly plausible for environmental sensor networks. These restrictions can be further complicated by potential variability in the communications budget, for example a solar-powered device needing to expend less energy when transmitting data on a cloudy day. We propose a novel, topology-based, lossy compression method well-equipped for these restrictive yet variable circumstances. This technique, Topological Signal Compression, allows sending compressed signals that utilize the entirety of a variable communications budget. To demonstrate our algorithm's capabilities, we perform entropy calculations as well as a classification exercise on increasingly topologically simplified signals from the Free Spoken Digit Dataset and explore the stability of the resulting performance against common baselines.
-
2022.09.27
Topological Feature Tracking for Submesoscale Eddies
Geophysical Research Letters
Current state-of-the-art procedures for studying modeled submesoscale oceanographic features have made a strong assumption of independence between features identified at different times. Therefore, all submesoscale eddies identified in a time series were studied in aggregate. Statistics from these methods are illuminating but oversample identified features and cannot determine the lifetime evolution of the transient submesoscale processes. To this end, the authors apply the Topological Feature Tracking (TFT) algorithm to the problem of identifying and tracking submesoscale eddies over time. TFT identifies critical points on a set of time-ordered scalar fields and associates those points between consecutive timesteps. The procedure yields tracklets which represent spatio-temporal displacement of eddies. In this way we study the time-dependent behavior of submesoscale eddies, which are generated by a 1-km resolution submesoscale-permitting model. We summarize the submesoscale eddy data set produced by TFT, which yields unique, time-varying statistics.
-
2021.12.01
[Whitepaper] Automation is All You Need: Faster Earth Systems Models with AI/ML
US Department of Energy
Tropical cyclones can induce extreme water cycle events through dramatic precipitation and storm surge. More reliable models of intensity will translate into better prediction of the impact of extreme events in large scale Earth systems simulations. We demonstrate and describe AI/ML methodologies for rapid assimilation of new, in situ data products.
Skills
Probability and Statistics | Bayesian inference, Statistical modeling, Predictive modeling |
Programming | Python, SQL, R |
Machine Learning | Deep learning, Natural Language Processing, Computer Vision, PyTorch |
Software Development | Object-oriented programming, Docker, Kubernetes, RESTful APIs, Agile methodology |
Data Management | PostgreSQL, Neo4j, PySpark |