Data Scientist
I turn messy data into models people can trust.
A junior data scientist with a background in physics and business informatics. I work end to end, from hand-labeling data to deploying models for inference, with a bias toward results that hold up under scrutiny.
About
I care about the part of machine learning that's easy to skip — knowing when a model is genuinely better, not just different.
I came to data science from physics and business informatics. I like the full loop: framing the question, building the dataset when one doesn’t exist yet, and pressure-testing a result until it’s honest.
My main project, OIRseg, took a multi-class segmentation model from several hundred hand-drawn masks to a validated, deployed web app — Dice 0.916 on the primary class. That mix of careful labeling, honest evaluation, and actually shipping is the work I want more of.
Toolbox
Languages & Data
- Python
- SQL
- pandas
- Polars
- NumPy
- DuckDB
Machine Learning
- scikit-learn
- CatBoost
- statsmodels
- SciPy
Deep Learning & CV
- PyTorch
- segmentation-models-pytorch
- TensorFlow / Keras
- Segment Anything (SAM)
- ImageJ / Fiji
LLM & RAG
- RAG
- ChromaDB
- embeddings
- Anthropic API
Serving & Apps
- FastAPI
- Streamlit
- Hugging Face Spaces
Cloud & Data Eng
- GCP (BigQuery, Cloud Run)
- Azure (Synapse, ADLS)
- Databricks
- Spark
- MySQL
BI & Viz
- Looker Studio
- Tableau
- Plotly
MLOps & Quality
- Docker
- pytest
- ruff
- pre-commit
- Git / GitHub Actions
Selected Work
A few things I've built.
OIRseg — Retinal Image Segmentation
A multi-class U-Net (PyTorch) that segments and measures disease zones in retinal microscopy images. It reaches Dice 0.916 on the primary class and is deployed as a public web app.
PubMed RAG
Retrieval-augmented Q&A over 980+ PubMed abstracts with local embeddings and citation-grounded answers, served via CLI and FastAPI.
Classical ML Studies
House-price regression, a CatBoost mushroom-edibility classifier (Kaggle), and audio-feature song clustering for mood-based playlists.
Sentiment Analysis Pipeline
A production-style NLP pipeline with split train/predict modules, Docker packaging, and CI-enforced quality gates.
Let's work together.
Open to data science roles and collaborations. The fastest way to reach me is email.