Data Scientist
I turn messy data into models people can trust.
A junior data scientist with a background in business informatics. I work end to end, from hand-labeling data to deploying models for inference, and I care about results that hold up under scrutiny.
About
I care about the part of machine learning that's easy to skip: checking whether a model is genuinely better before I trust its score.
I came to data science from business informatics. I like the whole loop: framing the question, building a dataset when none exists yet, then testing the result until it holds up.
My main project, OIRseg, took a segmentation model from a few hundred hand-drawn masks to a validated web app, and the model scores Dice 0.916 on its primary class. I want more work like that, where careful labeling and honest evaluation lead to something that actually ships.
Toolbox
Languages & Data
- SQL
- NumPy
- Python
- Pandas
- Polars
- DuckDB
Machine Learning
- SciPy
- CatBoost
- statsmodels
- scikit-learn
Deep Learning & CV
- PyTorch
- ImageJ / Fiji
- TensorFlow / Keras
- segmentation-models-pytorch
LLM, RAG & Agents
- RAG
- Chunking
- LangChain
- Reranking
- Embeddings
- Fine-tuning
- Vector databases
- Multi-agent orchestration
Serving & Apps
- FastAPI
- Streamlit
- Hugging Face Spaces
Cloud & Data Eng
- Spark
- MySQL
- Databricks
- Azure (Synapse, ADLS)
- GCP (BigQuery, Cloud Run)
BI & Viz
- Plotly
- Tableau
- Power BI
- Looker Studio
MLOps & Quality
- ruff
- Docker
- pytest
- pre-commit
- Git / GitHub Actions
- Evaluation & observability
Selected Work
A few things I've built.
OIRseg — Retinal Image Segmentation
A multi-class U-Net (PyTorch) that measures disease zones in mouse retinal images, replacing hours of manual tracing per image. It scores Dice 0.916 on the primary class and ships as a Streamlit app on Hugging Face with a FastAPI endpoint.
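For readers less familiar with the metric: the Dice coefficient measures the overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect match). Below is a minimal NumPy sketch of per-class Dice, assuming integer label masks with one class id per pixel. The function name and the convention of scoring an absent class as 1.0 are illustrative assumptions, not OIRseg's actual evaluation code.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, target: np.ndarray, class_id: int) -> float:
    """Dice coefficient for one class of an integer label mask (hypothetical helper)."""
    p = pred == class_id    # boolean mask of pixels predicted as this class
    t = target == class_id  # boolean mask of pixels labeled as this class
    intersection = np.logical_and(p, t).sum()
    total = p.sum() + t.sum()
    # Assumed convention: a class absent from both masks counts as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


pred = np.array([[0, 1, 1], [0, 1, 0]])
target = np.array([[0, 1, 0], [0, 1, 1]])
print(dice_per_class(pred, target, class_id=1))  # 2 shared pixels out of 3 + 3
```

In a multi-class setting like this one, reporting Dice per class (rather than one pooled number) keeps a strong background class from masking weak performance on the smaller disease zones.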
Life Points — Nutrition & Activity Analytics
An offline-first PWA for logging activities and meals against daily calorie and macro targets. Its Insights tab smooths noisy weight logs with a 7-day moving average and forecasts when a goal weight will be reached using an energy-balance model.
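The two Insights calculations can be sketched in a few lines. This is a hedged Python illustration of the general technique, not the app's own code: the function names are hypothetical, and the 7,700 kcal per kg figure is a common rule of thumb assumed here for the energy content of body-weight change, not a value taken from the app.

```python
from __future__ import annotations

import math
from datetime import date, timedelta

# Assumed rule of thumb: ~7700 kcal per kg of body-weight change (real values vary).
KCAL_PER_KG = 7700.0


def moving_average(values: list[float], window: int = 7) -> list[float]:
    """Trailing mean over the last `window` entries; early entries use what is available."""
    smoothed = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def forecast_goal_date(current_kg: float, goal_kg: float,
                       daily_balance_kcal: float, today: date) -> date | None:
    """Project when goal_kg is reached at a constant daily energy balance.

    daily_balance_kcal is intake minus expenditure, so a deficit is negative.
    Returns None when the balance is zero or points away from the goal.
    """
    change_kg = goal_kg - current_kg
    if change_kg == 0:
        return today
    if daily_balance_kcal == 0:
        return None
    days = change_kg * KCAL_PER_KG / daily_balance_kcal
    if days <= 0:
        return None
    return today + timedelta(days=math.ceil(days))


weights = [80.0, 80.6, 79.8, 80.2, 79.9, 80.1, 79.6]
current = moving_average(weights)[-1]  # smoothed value, not the last raw weigh-in
print(forecast_goal_date(current, 78.0, -500.0, date(2024, 1, 1)))
```

Feeding the smoothed weight rather than the latest raw reading into the forecast means a single noisy weigh-in shifts the projected date only slightly.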
Classical ML Studies
Several classical ML studies from my data bootcamp, mostly built with scikit-learn. The house-price models went through four iterations of pipelines and feature engineering. Others include a CatBoost mushroom-edibility classifier from a Kaggle challenge and K-Means song clustering with PCA for mood-based playlists.
PubMed RAG
A privacy-first RAG system over 980+ PubMed abstracts on retinal disease. Embeddings run locally, so only the final question leaves the machine. A FastAPI service returns citation-grounded answers, and it plugs into OIRseg to interpret segmentation results against the literature.
Let's work together.
Open to data science roles and collaborations. The fastest way to reach me is email.