Marco Comunità, Huy Phan, Joshua D. Reiss
Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as six traditional sound synthesis methods. Our architectures reached realism scores as high as recorded samples, showing encouraging results for the task at hand.
Here we compare recorded samples with those synthesised by our implementations (WaveGAN and HiFi-WaveGAN). To prepare the synthesised walks, we did not curate or select the samples; we picked them at random.
Material | Real | WaveGAN | HiFi-WaveGAN |
---|---|---|---|
Carpet | |||
Deck | |||
Metal | |||
Pavement | |||
Rug | |||
Wood | |||
Wood Internal | |||
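
The uncurated selection described above can be illustrated with a minimal Python sketch. This is not the authors' code: the `pick_uncurated` helper, the file names, the number of steps per walk, and the seed are all hypothetical, chosen only to show what drawing samples uniformly at random, with no listening-based filtering, might look like.

```python
import random


def pick_uncurated(candidates, k, seed=None):
    """Draw k distinct items uniformly at random, with no manual curation.

    `seed` is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(candidates), k)


# Hypothetical file names standing in for generated footstep samples.
generated = [f"wood_{i:03d}.wav" for i in range(100)]

# Assumed walk length of 8 steps; the page does not state the real value.
walk = pick_uncurated(generated, k=8, seed=0)
print(walk)
```

Fixing the seed is a design choice for reproducibility only; omitting it gives a fresh random draw each time.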
Here we include samples from several "traditional" non-data-driven synthesis methods. By non-data-driven we mean methods that do not rely on a reference which they re-synthesise or learn from. Specifically: three procedural models (PM1, PM2, PM3) and Additive Synthesis (Add). PM1 and Add offer no control over shoe type or surface material. PM2 offers control over the surface material. PM3 offers control over both shoe type and surface material.
| PM1 - Fontana | PM2 - Farnell | PM3 - Nemisindo | Add - Verron |
|---|---|---|---|
| | Dirt | Concrete - Boots | |
| | Grass | Concrete - High Heels | |
| | Gravel | Concrete - Oxford | |
| | Snow | Concrete - Trainers | |
| | Wood | Dirt - Boots | |
| | | Dirt - High Heels | |
| | | Dirt - Oxford | |
| | | Dirt - Trainers | |
| | | Grass - Boots | |
| | | Grass - High Heels | |
| | | Grass - Oxford | |
| | | Grass - Trainers | |
| | | Gravel - Boots | |
| | | Gravel - High Heels | |
| | | Gravel - Oxford | |
| | | Gravel - Trainers | |
Here we include samples from several "traditional" data-driven synthesis methods. By data-driven we mean methods that rely on a reference which they re-synthesise or learn from. Specifically: Sinusoidal plus Stochastic (SPS) and Statistical Modelling (Stat). We also include the real recordings they re-synthesised or learned from.
Material - Shoe | Real | SPS | Stat |
---|---|---|---|
Carpet - Trainers | |||
Carpet - Women Shoes | |||
Concrete - Sandal | |||
Concrete - Soccer Shoes | |||
Concrete - Trainers | |||
Dirt - Sneakers | |||
Gravel - Boots | |||
Metal - N.A. | |||
Metal - N.A. | |||
Snow - Boots | |||
Tiles - Boots | |||
Wood - Boots | |||
Wood - Sneakers | |||
Woodboard - Boots | |||
@conference{comunit2022neural,
title={Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks},
author={Comunità, Marco and Phan, Huy and Reiss, Joshua D.},
booktitle={Audio Engineering Society Convention 152},
month={May},
year={2022},
url={http://www.aes.org/e-lib/browse.cfm?elib=21696}
}