
Production AI Voice Agent Pipeline
End-to-end real-time voice agents built on Pipecat at sub-100ms latency — STT, LLM reasoning, and TTS in one streaming pipeline, with VoiceSHIELD-Small inline to neutralize adversarial inputs.
Building
I design production-ready AI systems across Voice AI, LLMs, RAG, and agentic pipelines. Co-author of an arXiv-published voice-security model reaching 99.16% accuracy at 90 ms latency.
I'm an AI/ML Engineer focused on the parts of machine learning that reach real users — real-time voice agents, retrieval-augmented assistants, and agentic systems that reason and act. I care about latency, robustness, and shipping models that hold up in production.
Currently pursuing an MSc in Data Science, I co-authored VoiceSHIELD-Small, a single-pass model for real-time malicious-speech detection published on arXiv. My work spans the full stack of applied AI: STT/TTS pipelines, LLM orchestration, RAG, and the evaluation discipline needed to trust what these systems do.
Bangalore, India
Technical Skills
Focus Areas

End-to-end real-time voice agents built on Pipecat at sub-100ms latency — STT, LLM reasoning, and TTS in one streaming pipeline, with VoiceSHIELD-Small inline to neutralize adversarial inputs.

Full-stack multimodal coach for 24/7 empathetic dialogue using Gemini (reasoning), Whisper (STT), and Murf.ai (TTS). Ranked Top 5 of 200+ teams at Christ University Hackathon 2025.

A RAG-powered assistant that retrieves and synthesizes cardiology literature for evidence-based decision support — Gemini Text-Embedding-004 + ChromaDB for semantic search with source attribution.

A conversational chatbot integrating LLM APIs with Chainlit to automate student enquiries about admissions, fees, scholarships, and hostel facilities.

Real-time facial recognition at 98% accuracy, deployed for a live student cohort. Cut manual attendance time by 90%, serving 200+ students per session.

Predicted customer churn at 85% accuracy (vs. 72% baseline), enabling targeted retention across a dataset of 7,000+ customers.
arXiv:2603.07708 [cs.SD] · Submitted June 2026
Abstract — We present a single-pass model that combines speech transcription and malicious-intent detection over the Whisper-small encoder, protecting voice AI systems from prompt injection, social engineering, and adversarial voice commands without a second model in the loop. The unified architecture reaches 99.16% threat-detection accuracy at 90 ms inference latency.
St. Aloysius Institute of Management & IT (AIMIT)
Specializing in Voice AI, LLMs, and Agentic Systems. CGPA 9.27
SDM College of Business Management
Advanced problem-solving for technical challenges. CGPA 8.44