Research

Academic research, publications, and experimental projects

PolySpeech-HS: Multilingual Non-Autoregressive Text-to-Speec...

Speech Synthesis & Multilingual AI

Abstract

PolySpeech-HS is a multilingual, non-autoregressive text-to-speech (TTS) framework designed to address the linguistic diversity and real-time deployment challenges of Indian languages. It pairs a unified encoder-decoder architecture with lightweight hidden-state adapters, enabling efficient cross-lingual generalization while preserving language-specific prosodic nuances. Across six Indian languages, it achieves state-of-the-art performance: a mean opinion score (MOS) of 4.30, a mel-cepstral distortion (MCD) of 4.7 dB, and a real-time factor (RTF) of 0.13.
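
For readers unfamiliar with the two objective metrics quoted in the abstract, the sketch below shows how RTF and MCD are commonly computed. It is an illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, the frames are assumed to be already time-aligned, and the 0th (energy) cepstral coefficient is assumed to be removed beforehand. MOS is a subjective listener rating, so it has no formula here.

```python
import math


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def mel_cepstral_distortion(reference, synthesized) -> float:
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Each frame is a sequence of coefficients; alignment (for example by
    dynamic time warping) is assumed to have been done by the caller.
    """
    if not reference or len(reference) != len(synthesized):
        raise ValueError("inputs must be non-empty and have the same frame count")
    # Standard scaling constant: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref_frame, syn_frame in zip(reference, synthesized):
        squared_error = sum((r - s) ** 2 for r, s in zip(ref_frame, syn_frame))
        total += scale * math.sqrt(squared_error)
    return total / len(reference)
```

With 1.3 s of compute for 10 s of audio, `real_time_factor` gives roughly 0.13, which means synthesis runs several times faster than playback. Lower is better for both RTF and MCD.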

IEEE Transactions on Audio, Speech and Language Processing

2025

Vellore Institute of Technology

TTS

Non-Autoregressive

Hidden-State Adapters

Multilingual AI

Indian Languages

AMO-HSA

A Novel Data-Centric Transformer Fine-Tuning: A Modular Fram...

Large Language Models & Domain Adaptation

Abstract

A data-centric, hardware-light workflow for fine-tuning transformers that avoids costly LLM APIs. It automatically scrapes high-signal web content and converts it into Q&A pairs, then fine-tunes a GPT-2-Medium model (355M parameters) in about 7 minutes on a single RTX 3060. The fine-tuned model reaches 67.3% accuracy (+34% over the base model) with a median latency of 1.4 s and zero inference cost.
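
The tags for this entry list LoRA, which helps explain why a 355M-parameter model can be tuned quickly on a consumer GPU. The arithmetic below is a rough sketch of how few parameters LoRA trains. The rank of 8 and the choice of adapting only the fused attention projection are assumptions made for illustration; the abstract does not state the configuration actually used.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, n_layers: int) -> int:
    """Parameters added by LoRA when one weight matrix per layer is adapted.

    Each adapted matrix gains two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return n_layers * rank * (d_in + d_out)


# GPT-2 Medium architecture: hidden size 1024, 24 transformer blocks.
HIDDEN = 1024
LAYERS = 24
RANK = 8  # assumed for illustration; not taken from the abstract

# The fused query/key/value projection maps 1024 -> 3 * 1024.
added = lora_trainable_params(HIDDEN, 3 * HIDDEN, RANK, LAYERS)

# Share of the 355M base parameters that would receive gradients.
fraction = added / 355_000_000

print(f"trainable LoRA parameters: {added:,}")
print(f"share of base model: {fraction:.4%}")
```

Under these assumptions, well under 1% of the model's weights are updated, which keeps optimizer state and gradient memory small.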

IEEE Transactions on Computational Social Systems

2025

Vellore Institute of Technology

GPT-2

LoRA

8-bit Adam

Domain Adaptation

Next.js

Q&A Generation

Fine-tuning

Fine-Tuning Mistral 22B: The First Large Language Model for ...

Low-Resource Language Processing

Abstract

The first fine-tuned large language model engineered specifically for Assamese, a low-resource Indo-Aryan language spoken by approximately 15 million people. The work introduces the AssamText-750K dataset and a custom Unicode mapping system built exclusively for Assamese. It is presented as the first and only Assamese LLM backed by language-specific Unicode infrastructure, and it achieves a 20% average improvement across text generation fluency, sentiment analysis accuracy, and Assamese-to-English translation quality.
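
To give a sense of why Assamese needs script-aware handling, the sketch below profiles a string at the Unicode level. Assamese shares the Unicode Bengali block (U+0980 to U+09FF) with Bengali, and two letters in that block, U+09F0 and U+09F1, are specific to Assamese. This is not the paper's custom mapping system, whose design the abstract does not describe; the function name and the returned fields are hypothetical.

```python
import unicodedata

# The Unicode "Bengali" block covers both Bengali and Assamese text.
BENGALI_ASSAMESE_BLOCK = range(0x0980, 0x0A00)

# Letters encoded for Assamese: ra (U+09F0) and wa (U+09F1).
ASSAMESE_ONLY = {"\u09F0", "\u09F1"}


def script_profile(text: str) -> dict:
    """Count characters in the shared block and Assamese-specific letters."""
    # Normalize first so equivalent sequences are counted consistently.
    text = unicodedata.normalize("NFC", text)
    in_block = sum(1 for ch in text if ord(ch) in BENGALI_ASSAMESE_BLOCK)
    assamese_only = sum(1 for ch in text if ch in ASSAMESE_ONLY)
    return {
        "length": len(text),
        "in_block": in_block,
        "assamese_only": assamese_only,
    }
```

A nonzero `assamese_only` count is a simple signal that a passage is Assamese rather than Bengali, which a corpus-building pipeline could use as a coarse filter.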

IEEE Transactions on Neural Networks and Learning Systems

2025

Vellore Institute of Technology

Mistral 22B

LoRA

Unicode Mapping

Assamese NLP

Low-Resource Languages

AssamText-750K