Neural synthesis of footsteps sound effects with generative adversarial networks

Marco Comunità, Huy Phan, Joshua D. Reiss


Abstract


Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as six traditional sound synthesis methods. Our architectures reached realism scores as high as those of recorded samples, showing encouraging results for the task at hand.


Neural Synthesis Samples


Here we compare recorded samples with those synthesised by our two implementations, WaveGAN and HiFi-WaveGAN. The synthesised walks were not curated: the individual samples were picked at random.

Material Real WaveGAN HiFi-WaveGAN
Carpet
Deck
Metal
Pavement
Rug
Wood
Wood Internal

Traditional Synthesis Samples - Non-data-driven methods


Here we include samples from four "traditional" non-data-driven synthesis methods. By non-data-driven we mean methods that do not rely on a reference recording to re-synthesise or learn from. These are three procedural models (PM1, PM2, PM3) and additive synthesis (Add). PM1 and Add offer no control over shoe type or surface material, PM2 offers control over the surface material, and PM3 offers control over both shoe type and surface material.

PM1 - Fontana and Add - Verron (no surface or shoe control)

PM2 - Farnell (surface material)
Dirt
Grass
Gravel
Snow
Wood

PM3 - Nemisindo (surface material - shoe type)
Concrete - Boots
Concrete - High Heels
Concrete - Oxford
Concrete - Trainers
Dirt - Boots
Dirt - High Heels
Dirt - Oxford
Dirt - Trainers
Grass - Boots
Grass - High Heels
Grass - Oxford
Grass - Trainers
Gravel - Boots
Gravel - High Heels
Gravel - Oxford
Gravel - Trainers
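Of the non-data-driven methods above, additive synthesis (Add) is the easiest to illustrate in a few lines. The following is a minimal Python sketch of the general technique, not a reproduction of Verron's model: it builds a single footstep impact as a sum of exponentially decaying sinusoids, then repeats it at a fixed interval to form a short walk. The function names (`additive_impact`, `walk`) and the frequencies, amplitudes and decay rates in `EXAMPLE_PARTIALS` are hypothetical, chosen for illustration only.

```python
import math


def additive_impact(partials, duration_s=0.25, sample_rate=44100):
    """Synthesise one impact as a sum of exponentially decaying sinusoids.

    partials: list of (frequency_hz, amplitude, decay_per_s) tuples
    (hypothetical values, not taken from any published model).
    Returns a list of float samples, peak-normalised to 1.0.
    """
    n_samples = int(duration_s * sample_rate)
    out = [0.0] * n_samples
    for freq, amp, decay in partials:
        for n in range(n_samples):
            t = n / sample_rate
            # Each partial starts at zero phase and decays exponentially.
            out[n] += amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    peak = max(abs(x) for x in out)
    if peak > 0:
        out = [x / peak for x in out]
    return out


def walk(step, n_steps=4, interval_s=0.5, sample_rate=44100):
    """Repeat one impact at a fixed interval to form a short walk (hypothetical helper)."""
    hop = int(interval_s * sample_rate)
    out = [0.0] * (hop * (n_steps - 1) + len(step))
    for i in range(n_steps):
        start = i * hop
        for j, x in enumerate(step):
            # Steps are summed, so overlapping impacts (interval_s < duration) add up.
            out[start + j] += x
    return out


# Hypothetical partials: (frequency in Hz, amplitude, decay rate in 1/s).
# Illustrative values only, not measured from any real surface.
EXAMPLE_PARTIALS = [(180.0, 1.0, 30.0), (410.0, 0.6, 45.0), (950.0, 0.3, 80.0)]

step = additive_impact(EXAMPLE_PARTIALS)
footsteps = walk(step)
```

As with the Add samples on this page, the sketch has no explicit shoe or surface parameter: changing the character of the step means editing the list of partials by hand. It also omits noise components and step-to-step variation.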

Traditional Synthesis Samples - Data-driven methods


Here we include samples from two "traditional" data-driven synthesis methods. By data-driven we mean methods that rely on a reference recording, which they re-synthesise or learn from. These are sinusoidal plus stochastic modelling (SPS) and statistical modelling (Stat). We also include the real recordings they re-synthesised or learned from.

Material - Shoe Real SPS Stat
Carpet - Trainers
Carpet - Women Shoes
Concrete - Sandal
Concrete - Soccer Shoes
Concrete - Trainers
Dirt - Sneakers
Gravel - Boots
Metal - N.A.
Metal - N.A.
Snow - Boots
Tiles - Boots
Wood - Boots
Wood - Sneakers
Woodboard - Boots

Citation


                
    @conference{comunit2022neural,
        title={Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks},
        author={Comunità, Marco and Phan, Huy and Reiss, Joshua D.},
        booktitle={Audio Engineering Society Convention 152},
        month={May},
        year={2022},
        url={http://www.aes.org/e-lib/browse.cfm?elib=21696}
    }