Siddhant Rajhans | ML Engineer — LLM Evaluation & AI Security

01 / Selected Work

Explore My Latest Projects

CortexLab: Brain Encoding Toolkit

Multimodal fMRI brain-encoding toolkit built on Meta's TRIBE v2. Features GPU voxelwise ridge regression (Triton), causal modality-lesion analysis, brain-alignment benchmarking, a 3D brain viewer, and live inference from webcam, screen, or video. 143 tests, 4 contributors, published on PyPI and HuggingFace.

PyTorch · Triton · fMRI · 3D Viz · Live Inference | Code

Bloomberg Sentiment × FinBERT

Empirical test of whether Bloomberg's NEWS_SENTIMENT_DAILY_AVG can be replicated with open-source FinBERT on a 30-stock S&P 100 universe (2018-2026, 62.8K stock-day observations). Evaluated with multi-horizon information coefficients (IC), a long-short quintile backtest, and Fama-French three-factor regressions with Newey-West HAC standard errors. Replication fails (ρ = −0.26), and that's the point.

Bloomberg API · FinBERT · statsmodels · Fama-French · Newey-West | Code
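
For readers unfamiliar with the IC metric named in this card: an information coefficient is conventionally the Spearman rank correlation between a signal and subsequent returns. The sketch below is a minimal standard-library illustration under that assumption. The function names and toy numbers are hypothetical and are not taken from the project's code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(signal, forward_returns):
    """Spearman rank IC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(signal), average_ranks(forward_returns))


# Toy cross-section: sentiment scores and next-period returns for four stocks.
sentiment = [0.1, 0.4, 0.2, 0.9]
next_returns = [0.01, 0.03, 0.02, 0.05]
ic = rank_ic(sentiment, next_returns)
```

A multi-horizon version would repeat the same calculation with returns measured over several forward windows.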

DreamStudio: AI Cinematic Story Director

AI-powered cinematic story director. Speak naturally, point your camera, and watch scenes materialize as images, video, and music in real-time. Full-stack app with web, mobile, and Python backend.

Gemini Live · Imagen 4 · Veo 3.1 · Lyria 2 · Flutter · Python | Code

Messaging-Based Data Pipeline

End-to-end streaming pipeline processing 400M+ records/month with 65s latency (p95). Kappa architecture with event-time processing, watermarking, and fault tolerance built for bursty event ingestion.

Kafka · PySpark · Docker · Airflow · Hive · NiFi | Code
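
As a rough illustration of the event-time watermarking mentioned in this card, the sketch below tracks a watermark that trails the largest event timestamp seen so far by a fixed allowed lateness. This is similar in spirit to how streaming engines handle late data. The class name, the 30-second lateness, and the timestamps are all assumptions made for the example, not details of the pipeline itself.

```python
from dataclasses import dataclass


@dataclass
class WatermarkTracker:
    """Event-time watermark that trails the max seen timestamp (hypothetical helper)."""

    allowed_lateness_s: float
    max_event_time: float = float("-inf")

    @property
    def watermark(self):
        # Events older than this are considered too late to process normally.
        return self.max_event_time - self.allowed_lateness_s

    def observe(self, event_time):
        """Return True if the event is on time, False if it is behind the watermark."""
        on_time = event_time >= self.watermark
        # The watermark only moves forward: late events never pull it back.
        self.max_event_time = max(self.max_event_time, event_time)
        return on_time


tracker = WatermarkTracker(allowed_lateness_s=30)
# Event times in seconds, arriving out of order as they might in a bursty stream.
decisions = [tracker.observe(t) for t in (100, 80, 60, 130, 95)]
```

In a real pipeline, events that fall behind the watermark would typically be dropped or routed to a side output rather than just flagged.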

Multimodal RAG for Scientific Literature

Textual + multimodal RAG over 500K+ papers with a LangGraph agent that runs a grounding-validation step, flagging weakly-supported answers instead of returning them (a hallucination guardrail). FAISS retrieval with cross-encoder reranking; 94% recall on an internal evaluation set.

LangGraph · SciBERT · CLIP · TAPAS · FAISS · Grounding | Code
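
The card does not say how the grounding-validation step scores support, so the sketch below uses simple token overlap as a stand-in to show the general shape of "flag instead of return". The function names, the 0.7 threshold, and the example passages are hypothetical. A production check would more plausibly use an entailment or cross-encoder model.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(answer, passages):
    """Fraction of answer tokens found in at least one retrieved passage."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    evidence = set()
    for passage in passages:
        evidence |= tokens(passage)
    return len(answer_tokens & evidence) / len(answer_tokens)


def validate(answer, passages, threshold=0.7):
    """Withhold the answer and flag it when support falls below the threshold."""
    score = support_score(answer, passages)
    flagged = score < threshold
    return {"answer": None if flagged else answer, "score": score, "flagged": flagged}


retrieved = ["Ridge regression adds an L2 penalty to least squares."]
supported = validate("Ridge regression adds an L2 penalty", retrieved)
unsupported = validate("Lasso uses an L1 penalty", retrieved)
```

Here the first answer passes because every token appears in the retrieved passage, while the second is flagged because only two of its five tokens do.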

Bike Lane Sentinel

Computer vision system that monitors illegal vehicle encroachment into NYC bike lanes and automatically alerts NYC DOT. Real-time detection with interactive dashboard and evidence capture.

Computer Vision · TypeScript · React · Node.js · Real-time | Code

02 / Open Source

Open Source & Contributions

PR #3662 · Merged

EleutherAI lm-evaluation-harness

Contributed InfiniteBench — 11 long-context evaluation tasks (retrieval, code, math, novel QA, dialogue, EN/ZH) — to the field-standard open-source LLM evaluation framework used for benchmarking, red-teaming, and safety evaluation. 799 lines across 15 files, with scoring matched to the official implementation.

LLM Evaluation · Python · Long-Context · Benchmarks

PyPI · HuggingFace · Live Demo

CortexLab

Open-source fMRI brain-encoding toolkit extending Meta's TRIBE v2. Published on PyPI (cortexlab-toolkit) and HuggingFace, with a live demo, 143 tests, and 4 community contributors.

GitHub · PyPI · Live Demo

03 / Experience

Built Production Systems & Shipped Real Products.

Oct 2023 – Dec 2024

Machine Learning Engineer

Seed-Stage Health-Tech Startup (NDA)

Designed and shipped streaming ML pipelines on AWS (Kafka, Airflow, Docker) with the data engineering team, processing 400M+ healthcare records/month at <65s p95 latency.
Developed transformer-based NLP and multimodal retrieval pipelines for clinical notes, lab reports, and diagnostic documents, improving retrieval accuracy by 30%.
Led the retrieval and prompt-orchestration workstream of an agentic AI system powering adaptive, personalized clinical-facing interactions.
Built model monitoring and data validation checks tracking data drift, prediction quality, and data integrity across clinical ML workflows.

AWS · Kafka · Airflow · Docker · NLP · Agentic AI

Apr – Sep 2023

Software Development Engineer Intern

Seed-Stage Health-Tech Startup (NDA)

Contributed to developing backend services and APIs supporting ML-driven features, helping reduce average request latency by 40% by moving synchronous flows to async processing.
Helped improve ML model serving pipelines with caching and request batching, making clinical-facing features more responsive.
Assisted senior engineers in strengthening API security: input validation, authentication middleware, and audit logging for sensitive healthcare data.

ML Serving · APIs · Backend Optimization · Security

04 / Toolkit

Technologies and Tools I Work With

LLM Evaluation & AI Security

LLM Evaluation · lm-evaluation-harness · Red-teaming · Guardrails · Grounding / Faithfulness · Model Monitoring & Drift · Responsible AI (NIST AI RMF)

LLM, Agents & RAG

LangGraph · RAG · Agentic Orchestration · Tool Calling · Prompt Engineering · Benchmark Design

Core ML & Deep Learning

PyTorch · TensorFlow · Scikit-learn · Transformers · Hugging Face · CNNs

Data, Infra & MLOps

Python · SQL · Apache Spark · Kafka · FAISS · Postgres/Pgvector · AWS (SageMaker, S3) · Docker · Airflow · MLflow · W&B

05 / Research

Research Publications

Peer-reviewed work in AI security and trust — deepfake detection, ML for cybersecurity, and threats in AI-integrated cloud systems.

IEEE ICCE 2025

CNN-Based Detection Mechanism for Deepfake Image

A Mishra, S Rajhans, BB Gupta, KT Chui

View on Google Scholar

Springer, ICRTC 2025

The Evolving Landscape of Cloud Computing: AI Integration, Threats, Challenges and Security Concerns

P Singh, S Rajhans

View on Google Scholar

Springer, ICSPN 2023

Machine Learning and AI in Cybersecurity: Insights and Solutions

S Rajhans, A Mishra

View on Google Scholar

06 / Writing

From the Blog

Notes on building ML systems, evaluating frontier models, and lessons from production.

evaluation

Read all posts

07 / Education

Education & Credentials

2025–2026

MS in Machine Learning

Stevens Institute of Technology

Hoboken, NJ

2019–2023

B.Tech in Computer Science & Engineering

Swami Rama Himalayan University

Dehradun, India

Certifications

Weights & Biases · Training & Fine-tuning LLMs · Verify credential

Stanford / Coursera · Machine Learning · Verify credential

NVIDIA · Generative AI with Diffusion Models · Verify credential

IBM · Cloud Computing Diploma · Verify credential

08 / Contact

Let's build something.

Graduating December 2026. Open to Applied AI, ML, and research-engineering roles, especially in LLM evaluation and AI security. Based in Hoboken, NJ; open to New York and remote.

→ Get in touch · Download resume

From the Blog

Building an LLM Evaluation Framework from Scratch

Lessons from a Multimodal RAG System over 500K Papers

Building CortexLab on Meta's Brain-Encoding Model
