Hi
I'm a Machine Learning Engineer specializing in LLM evaluation and AI security. I'm a contributor to EleutherAI's lm-evaluation-harness, a published researcher (IEEE, Springer), and I've built production AI at scale (400M+ records/month) with monitoring, guardrails, and responsible-AI practice.
01 / Selected Work
02 / Open Source
Contributed InfiniteBench, a set of 11 long-context evaluation tasks (retrieval, code, math, novel QA, dialogue, EN/ZH), to EleutherAI's lm-evaluation-harness, the field-standard open-source LLM evaluation framework used for benchmarking, red-teaming, and safety evaluation. The contribution spans 799 lines across 15 files, with scoring matched to the official implementation.
Open-source fMRI brain-encoding toolkit extending Meta's TRIBE v2. Published on PyPI (cortexlab-toolkit) and HuggingFace, with a live demo, 143 tests, and 4 community contributors.
03 / Experience
Seed-Stage Health-Tech Startup (NDA)
04 / Toolkit
05 / Research
Peer-reviewed work in AI security and trust — deepfake detection, ML for cybersecurity, and threats in AI-integrated cloud systems.
06 / Writing
Notes on building ML systems, evaluating frontier models, and lessons from production.
Why off-the-shelf benchmarks aren't enough, and how I built a custom harness with consistency scoring, calibration, and chain-of-thought quality analysis.
RAG & Agents
Architecture decisions, FAISS optimization, and why agent evaluation is harder than agent building. Lessons from processing 2.3M figures, 890K tables, and 410K equations.
Research Tooling
Extending TRIBE v2 with streaming inference, modality attribution, brain-alignment benchmarking, and cognitive-load scoring, built into a released toolkit.
07 / Education
Stevens Institute of Technology
Hoboken, NJ
Swami Rama Himalayan University
Dehradun, India
08 / Contact
Graduating December 2026. Open to Applied AI, ML, and research-engineering roles, especially in LLM evaluation and AI security. Based in Hoboken, NJ; open to New York and remote.