
arXiv preprint

mmWave Radar Aware Dual-Conditioned GAN for Speech Reconstruction of Signals With Low SNR

Built a two-stage RAD-GAN pipeline to reconstruct intelligible full-band speech from noisy, band-limited mmWave captures (-5 dB to -1 dB), with a radar-tailored Multi-Mel Discriminator and Residual Fusion Gate. Even with limited data, no pre-trained modules, and no augmentations, the method outperformed SOTA approaches for this task.

2026

Regular microphones capture sound waves travelling through the air; mmWave radar sensors, on the other hand, transmit millimetre waves and measure the surface vibrations produced by speech from the reflected signal. These captures are then processed to recover intelligible speech. The catch is that radar captures tend to come from extremely noisy environments, which makes audio reconstruction a near-impossible task, especially at low SNR (Signal-to-Noise Ratio). Our work tackles this problem by introducing a novel two-stage Radar Aware Dual-Conditioned Generative Adversarial Network (RAD-GAN) pipeline that is specifically designed to handle the unique challenges of mmWave radar captures.
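
To put those SNR numbers in perspective, here is a small illustrative snippet (not from the paper, just the standard definition) showing how SNR in decibels is computed; at -5 dB the noise carries roughly three times the power of the speech signal itself.

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(signal.astype(np.float64) ** 2)
    p_noise = np.mean(noise.astype(np.float64) ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

# At -5 dB, P_noise / P_signal = 10**0.5 ≈ 3.16, i.e. the noise is
# noticeably louder than the speech we are trying to reconstruct.
```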

What makes our work stand out is that we designed a radar-tailored Multi-Mel Discriminator and a Residual Fusion Gate to effectively reconstruct intelligible full-band speech from noisy, band-limited mmWave captures with SNRs as low as -5 dB. Even with limited data, no pre-trained modules, and no augmentations, our method outperformed state-of-the-art approaches for this task.
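
The paper has the full architectural details, so what follows is only a rough, hypothetical sketch of what a gated residual fusion between two conditioning streams could look like in PyTorch. The module name echoes ours, but the wiring, shapes, and layer choices below are my own illustrative assumptions, not the implementation from the paper.

```python
import torch
import torch.nn as nn

class ResidualFusionGateSketch(nn.Module):
    """Illustrative gated residual fusion of two feature streams
    (e.g. a radar-conditioned path and an auxiliary speech path).
    A learned sigmoid gate decides, per channel and per frame, how much
    of the auxiliary stream to add on top of the main stream."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv over the concatenated streams produces the gate values
        self.gate = nn.Sequential(
            nn.Conv1d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Project the auxiliary stream before it is mixed in
        self.proj = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, main: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # main, aux: (batch, channels, time)
        g = self.gate(torch.cat([main, aux], dim=1))  # gate values in [0, 1]
        return main + g * self.proj(aux)              # residual path + gated auxiliary features
```

The intuition behind a gate like this is that the generator can fall back on its main path when the conditioning signal is unreliable at very low SNR, and lean on it more heavily when it is informative.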

This research has significant implications for the development of more robust and effective speech reconstruction techniques in noisy environments, which could have applications in various fields such as telecommunications, assistive technologies, and surveillance.

The work is currently under review for publication in Interspeech 2026, and we have also made a preprint available on arXiv for the community to access and build upon. We have also built a demo website with the output spectrograms, comparisons against baselines, and even audio samples, so that you can take a look at (and listen to) what the outputs of our models sound like! Check it out here.

This work was done by myself and two of my wingies, Deepan Roy and Jash Karani, along with the kind support of Prof. Sandeep Joshi. Looking back, this is easily the coolest thing I've managed to do over the course of my four years of undergrad.

Deepan and I had a little bit of experience dabbling with DL frameworks, but we had never built anything even remotely from scratch ourselves. Jash is an Electronics and Communications engineer who focused on the digital signal processing aspect of the work. I never thought I'd get to collaborate with someone from a completely different field than mine, that too on a project this technically dense.

We have a few more ideas on where to take this work, but for now we hope we can make it to Australia for Interspeech 2026!
