Projects

Applied and research projects at Wadhwani AI and during my PhD at CVIT, IIIT Hyderabad.

Wadhwani AI

I lead or have led ML teams for the projects below. Currently: Oral Reading Fluency (ORF) and SEAP (ASR). Previously (2023–2025): Health Sentinel, Krishi 24/7, drug adherence from handwritten forms, and language technologies for low-resource languages.

Oral Reading Fluency (ORF)

2025 – Present

ORF (also deployed as Vachan Samiksha) listens to students read aloud and uses speech recognition to help teachers assess reading fluency at scale. ORF is deployed across multiple states in India, with millions of students assessed for reading fluency.
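As a rough illustration of how a speech recognition transcript can feed a fluency score, the sketch below aligns a transcript against the reference passage and computes words correct per minute, a common reading-fluency measure. It is a minimal sketch under stated assumptions, not the deployed ORF system: the function name, the lowercase whitespace tokenisation, and the use of Python's `difflib` for alignment are all illustrative choices.

```python
from difflib import SequenceMatcher


def words_correct_per_minute(reference, transcript, duration_seconds):
    """Hypothetical helper: count reference words that also appear, in order,
    in the ASR transcript, and scale the count to a per-minute rate."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    # Assumption: simple lowercase whitespace tokenisation is good enough here.
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()

    # Align the two word sequences; matching blocks are runs of words
    # that occur in both sequences in the same order.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())

    return correct, 60.0 * correct / duration_seconds


correct, wcpm = words_correct_per_minute(
    "the cat sat on the mat",   # passage the student was asked to read
    "the cat sit on mat",       # what the recogniser heard
    duration_seconds=30,
)
print(correct, wcpm)
```

A real system would also need to handle recogniser errors, self-corrections, and language-specific text normalisation, none of which this sketch attempts.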

SEAP — spoken English assessment and practice

2025 – Present

SEAP helps students—especially those from regional-language backgrounds—practice spoken English and build confidence. The system uses automatic speech recognition to evaluate fluency, pronunciation, and accuracy, providing structured feedback as learners record passages.

Health Sentinel — real-time disease surveillance

2023 – 2025

ML pipeline to detect potential disease outbreaks from health-related events reported in online news and media. The system extracts structured signals from large volumes of articles so public-health teams can review and act on emerging events faster than manual media monitoring alone. Deployed as part of India’s media disease surveillance (MDS) program.

Krishi 24/7 — media surveillance for agriculture

2023 – 2025

AI-driven monitoring of agriculture-related news and media (Krishi 24/7) to surface relevant articles and signals for agricultural programs. Combines NLP and information extraction across regional languages to help teams track themes across large, noisy news corpora.

Drug adherence from handwritten forms

2023 – 2025

Document understanding for public-health workflows: table understanding and OCR to extract drug-adherence information from handwritten forms. Reduces manual data entry and supports more reliable tracking in field programs.

Language technologies for low-resource languages

2023 – 2025

Development of OCR, speech-to-text, text-to-speech, and machine translation for an extremely low-resource language, supporting preservation and access to manuscripts and other cultural artefacts. Part of the programme to strengthen cultural resilience of Tibetan communities, in collaboration with the Department of Religion and Culture and Monlam Tibetan IT Research Center.

IIIT Hyderabad (CVIT)

PhD and graduate research (2013–2023) on document image understanding, OCR, and scene text.

Document Visual Question Answering

2019 – 2023

This project was conceived as a joint effort between CVIT, IIIT Hyderabad and CVC, UAB Barcelona. Its primary aim is to encourage the Document Analysis community to look beyond traditional document analysis tasks and build systems with true "Document Understanding" capabilities. Together with industry partners, we introduced multiple QA/VQA tasks on document images and ran open challenges for these tasks at leading computer vision and document analysis conferences. More details are available at docvqa.org.

Scene Text Understanding

2015 – 2019

CVIT has worked on scene text for several years and made significant contributions to scene text recognition before the deep learning wave. The IIIT5k scene text dataset is one of the most widely used datasets in this field. I joined this project and worked on unconstrained scene text recognition in a seq2seq framework. More details on our work can be found here. Scene text recognition models from this work are available on Bhashini, India's national language technology platform.

Indian Languages OCR

2013 – 2023

IIIT Hyderabad has been involved in developing OCR for Indian languages since the Government of India conceived the DLI project. I joined this group and contributed to OCR for Indian languages. Despite the many challenges Indian languages pose compared to their Latin counterparts, we achieved state-of-the-art recognition accuracy in 12+ Indian languages. A comprehensive technical report on the use of a CTC-based, segmentation-free approach for OCR of Indian languages is presented in this work. The approach transcribes text lines directly into sequences of Unicode characters, without first segmenting them. The OCRs developed as part of this effort are used in other related projects such as the Government of India project on information access from Indian language document collections, the Audiobooks project at IIIT Hyderabad, and Bhashini (printed-text OCR).
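The segmentation-free idea can be illustrated with greedy CTC decoding: a recogniser emits one score vector per image column (frame), and the decoder turns the best label per frame into a Unicode string by collapsing repeats and dropping a special blank label. This is a minimal sketch under stated assumptions, not the models described above: the function name, the toy alphabet, and the convention that index 0 is the blank are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_scores, alphabet):
    """Hypothetical helper: greedy (best-path) CTC decoding.

    frame_scores: one list of per-label scores for each frame of the text line.
    alphabet: maps a label index to a Unicode string; alphabet[BLANK] is unused.
    """
    # 1. Pick the highest-scoring label independently for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # 2. Collapse consecutive repeats, e.g. [1, 1, 0, 2] -> [1, 0, 2].
    collapsed = [label for label, _ in groupby(best_path)]

    # 3. Drop blanks; what remains is the transcription.
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


# Toy alphabet with three Devanagari consonants (illustrative only).
alphabet = ["", "क", "म", "ल"]
frames = [
    [0.1, 0.7, 0.1, 0.1],  # क
    [0.1, 0.6, 0.2, 0.1],  # क (repeat, collapsed)
    [0.8, 0.1, 0.1, 0.0],  # blank
    [0.1, 0.1, 0.7, 0.1],  # म
    [0.9, 0.0, 0.1, 0.0],  # blank
    [0.0, 0.1, 0.1, 0.8],  # ल
]
print(ctc_greedy_decode(frames, alphabet))
```

Because no character boundaries are ever predicted, the same decoding step works for scripts where segmenting a line into characters is difficult. A blank between two identical labels is what lets a genuinely doubled character survive the collapse step.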

Audio Books for the Visually Challenged

2013 – 2015

The project was an offshoot of the OCR project. We worked with the Speech lab at IIIT to make audio books in the DAISY standard for the visually challenged. An OCR+TTS workflow was set up, starting from document scanning. Our team developed web-based and desktop-based apps for audio book playback and deployed them at various schools for visually impaired children. More details on this effort can be found at Bhasha Audible.

Earlier

Router Security using raw sockets @ Cisco, Bangalore

Undergrad project, 2008

The goal was to develop an access control framework for router security. Access Control Lists (ACLs) filter network traffic by controlling whether routed packets are forwarded at the router. The decision to drop or forward a packet is based on the filter criteria set in the ACL.
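The forward-or-drop decision can be sketched as a first-match walk over an ordered rule list. This is a minimal illustration, not the framework built in the project: the `AclRule` fields, the `evaluate` function, and the example addresses are hypothetical, and the first-match ordering with a final implicit deny is assumed from common router ACL conventions.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL entry: an action plus the filter criteria it matches."""
    action: str                      # "permit" or "deny"
    source: str                      # source network in CIDR notation
    protocol: str = "any"            # "tcp", "udp", or "any"
    dest_port: Optional[int] = None  # None matches every port


def evaluate(rules, src_ip, protocol, dest_port):
    """Return the action of the first rule that matches the packet.

    Assumption: rules are checked top to bottom and a packet that matches
    no rule is dropped (implicit deny).
    """
    for rule in rules:
        if (ip_address(src_ip) in ip_network(rule.source)
                and rule.protocol in ("any", protocol)
                and rule.dest_port in (None, dest_port)):
            return rule.action
    return "deny"


rules = [
    AclRule("deny", "10.0.0.0/8", "tcp", 23),  # block telnet from 10.x.x.x
    AclRule("permit", "10.0.0.0/8"),           # allow everything else from 10.x.x.x
]
print(evaluate(rules, "10.1.2.3", "tcp", 23))
print(evaluate(rules, "10.1.2.3", "tcp", 80))
print(evaluate(rules, "192.168.1.1", "udp", 53))
```

Rule order matters in this scheme: placing the broad permit rule first would shadow the telnet deny rule, so more specific entries go earlier in the list.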