brihat.ai
selected work

Projects

Things I have built or am building on my own time.

Agent Judge Calibration

A study of how reliably LLM judges score agentic tool-use trajectories. Measures inter-judge agreement across a five-axis rubric and shows that agents with similar success rates can have very different failure modes.

LLM evaluation · agents · judges · research
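The project description does not name its agreement statistic. One common choice for measuring inter-judge agreement on a rubric axis is Cohen's kappa averaged over every pair of judges. The sketch below assumes each judge gives one categorical score per trajectory per axis. The function names and the scoring layout are hypothetical, not the project's code.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Chance-corrected agreement between two judges' scores on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty score lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if each judge scored at random with their own label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label everywhere
    return (observed - expected) / (1.0 - expected)

def mean_pairwise_kappa(scores_by_judge):
    """Average Cohen's kappa over every pair of judges for one rubric axis.

    scores_by_judge: hypothetical layout {judge_name: [score per trajectory]}.
    """
    pairs = list(combinations(scores_by_judge.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

Run once per axis to get five agreement numbers. Cohen's kappa treats scores as unordered labels, so for 1 to 5 scales a weighted kappa or Krippendorff's alpha may be the better fit.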

Medical Concept Features in Open-Weight LLMs

In progress

An interpretability project that trains sparse autoencoders on the residual stream of an open-weight model (Gemma-2-2B) to isolate features for medical concepts: drugs, diseases, procedures, and symptoms. Features are grounded against medical ontologies (UMLS, SNOMED CT, RxNorm), used to steer behavior on a medical QA benchmark, and probed for spurious correlates, such as a “diabetes” feature that actually tracks “age over 60.”

interpretability · SAEs · clinical NLP · safety
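As a rough sketch of the core mechanics, assuming the standard ReLU sparse autoencoder formulation with an L1 sparsity penalty: encode a residual-stream vector into feature activations, decode it back, and steer by adding a scaled decoder direction. The project may use a different variant (JumpReLU and TopK SAEs also exist). The dependency-free list arithmetic and all function names are illustrative only.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode a residual-stream vector into sparse features and reconstruct it.

    w_enc has one row per feature (d_sae x d_model); w_dec has one row per model
    dimension (d_model x d_sae), so column j of w_dec is feature j's direction.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [z + b for z, b in zip(matvec(w_enc, centered), b_enc)]
    features = [max(0.0, z) for z in pre]  # ReLU keeps only active features
    x_hat = [z + b for z, b in zip(matvec(w_dec, features), b_dec)]
    return features, x_hat

def sae_loss(x, features, x_hat, l1_coeff):
    """Squared reconstruction error plus an L1 penalty that encourages sparsity."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(f) for f in features)

def steer(x, w_dec, feature_idx, alpha):
    """Push a residual-stream vector along one feature's decoder direction."""
    return [xi + alpha * row[feature_idx] for xi, row in zip(x, w_dec)]
```

In a real run the vectors would be Gemma-2-2B residual activations (2,304 dimensions) and the weights would be trained tensors. The toy lists here only make the shapes and the loss explicit.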

DriftGuard

A drift-detection system that catches silent performance degradation in ML workloads running on Kubernetes. It scrapes training and inference metrics through Prometheus, compares a baseline window against recent windows by mean shift and slope, and classifies severity from P0 to P3. On the inference side, it models agent swarms (planner, executor, reviewer) and surfaces drift per agent, down to a loopiness score.

drift detection · Kubernetes · observability · agents
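A minimal sketch of the baseline-versus-recent comparison, assuming that mean shift is measured as a z-score against the baseline's spread and that slope is a least-squares fit over the recent window. The severity cut-offs, the slope limit, and the function names are hypothetical placeholders, not DriftGuard's actual thresholds.

```python
from statistics import fmean, pstdev

# Hypothetical z-score cut-offs for each severity level, most severe first.
SEVERITY_CUTOFFS = [("P0", 4.0), ("P1", 3.0), ("P2", 2.0), ("P3", 1.0)]

def slope(values):
    """Least-squares slope of a metric against sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den if den else 0.0

def classify_drift(baseline, recent, slope_limit):
    """Return (severity or None, mean-shift z-score, recent slope)."""
    spread = pstdev(baseline) or 1e-9  # avoid dividing by zero on a flat baseline
    z = abs(fmean(recent) - fmean(baseline)) / spread
    trend = slope(recent)
    for level, cutoff in SEVERITY_CUTOFFS:
        if z >= cutoff:
            return level, z, trend
    # A steep trend inside the recent window is flagged even before the mean moves far.
    if abs(trend) > slope_limit:
        return "P3", z, trend
    return None, z, trend
```

Checking slope as well as mean shift is what makes "silent" degradation visible early: a metric can slide steadily while its window average still looks normal.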

Exoplanet Atlas

An interactive explorer for the 6,000+ confirmed exoplanets in NASA's Exoplanet Archive. Browse and filter the catalog, fly through 3D orbit visualizations with a habitable-zone overlay, compare planets side by side, and dig into discovery statistics, guides, and games.

data viz · Three.js · astronomy · Next.js
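A habitable-zone overlay needs inner and outer orbital distances for each host star. A simple sketch uses the flux relation d = sqrt(L / S_eff), with luminosity derived from stellar radius and temperature via Stefan-Boltzmann scaling. The two S_eff values below are illustrative, Earth-like assumptions. Published habitable-zone models also vary S_eff with stellar temperature, and the site may do the same.

```python
import math

T_SUN = 5772.0  # K, IAU nominal solar effective temperature

# Illustrative effective-flux limits in units of Earth's insolation (assumed values).
S_EFF_INNER = 1.1
S_EFF_OUTER = 0.36

def luminosity_from_star(radius_rsun, teff_k):
    """Stefan-Boltzmann scaling: L/Lsun = (R/Rsun)^2 * (T/Tsun)^4."""
    return radius_rsun ** 2 * (teff_k / T_SUN) ** 4

def habitable_zone_au(luminosity_lsun):
    """Inner and outer habitable-zone edges in AU from d = sqrt(L / S_eff)."""
    return (math.sqrt(luminosity_lsun / S_EFF_INNER),
            math.sqrt(luminosity_lsun / S_EFF_OUTER))

def in_habitable_zone(semi_major_axis_au, luminosity_lsun):
    """True if a planet's orbit lies between the inner and outer edges."""
    inner, outer = habitable_zone_au(luminosity_lsun)
    return inner <= semi_major_axis_au <= outer
```

Because the edges scale with the square root of luminosity, a star four times as luminous as the Sun has a habitable zone twice as far out. The 3D overlay can be drawn as an annulus between those two radii.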

Cadenza

A music theory, ear training, and improvisation practice platform for guitarists. Interactive fretboard diagrams across 21 modes, Web Audio ear-training drills, a lick library with TAB, an AI improvisation coach, and a multi-layer engine that generates and scores chord progressions.

Web Audio · music theory · AI · Next.js
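Fretboard mode diagrams can be generated by rotating a parent scale's step pattern and marking every fret whose pitch class belongs to the result. Rotating the major, melodic minor, and harmonic minor scales yields 21 modes, which is one plausible reading of the count above, though the source does not say which modes Cadenza uses. This sketch assumes standard tuning and sharps-only spelling, and its names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone step patterns of three parent scales (each sums to 12).
PARENT_SCALES = {
    "major": [2, 2, 1, 2, 2, 2, 1],
    "melodic minor": [2, 1, 2, 2, 2, 2, 1],
    "harmonic minor": [2, 1, 2, 2, 1, 3, 1],
}
STANDARD_TUNING = [4, 9, 2, 7, 11, 4]  # open-string pitch classes, low E to high E

def mode_steps(parent, degree):
    """Rotate a parent scale's step pattern so it starts on `degree` (0-based)."""
    steps = PARENT_SCALES[parent]
    return steps[degree:] + steps[:degree]

def mode_notes(root, steps):
    """Spell a mode from a root pitch class (0 = C) and a step pattern."""
    pcs = [root]
    for step in steps[:-1]:
        pcs.append((pcs[-1] + step) % 12)
    return [NOTE_NAMES[pc] for pc in pcs]

def fretboard_positions(root, steps, frets=12):
    """(string, fret) pairs that sound a mode tone; string 0 is the low E."""
    wanted = {NOTE_NAMES.index(n) for n in mode_notes(root, steps)}
    return [(s, f) for s, open_pc in enumerate(STANDARD_TUNING)
            for f in range(frets + 1) if (open_pc + f) % 12 in wanted]

ALL_MODES = [(parent, degree) for parent in PARENT_SCALES for degree in range(7)]
```

For example, `mode_notes(2, mode_steps("major", 1))` spells D Dorian. Sharps-only spelling is a simplification: F Ionian comes out with A# rather than Bb.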

Invariant

A paper-centric community platform for physicists. It unifies metadata from arXiv and OpenAlex, gives every paper a page for summaries, discussion, and replication notes, and sends signal-driven weekly briefs. Features include LLM-generated summaries and entity extraction, hybrid keyword-and-vector search, and ORCID sign-in.

physics · full-stack · search · LLM
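Hybrid search has to merge a keyword ranking with a vector-similarity ranking. The source does not say how Invariant merges them. Reciprocal rank fusion is one common, tuning-light option, sketched here with its conventional constant k = 60. The function name and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so results are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["p1", "p2", "p3"]  # e.g. from a full-text index
vector_hits = ["p3", "p1", "p4"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF uses only ranks, so it avoids normalizing keyword scores and cosine similarities onto a common scale. Papers that appear near the top of both lists rise to the front.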