Marco Comunità, Dan Stowell, Joshua D. Reiss
Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion and fuzz guitar effects. A novel dataset of processed electric guitar samples was assembled, with four sub-datasets consisting of monophonic or polyphonic samples and discrete or continuous settings values, for a total of about 250 hours of processed samples. Results were compared for networks trained and tested on the same or on a different sub-dataset. We found that discrete datasets could lead to equally high performance as continuous ones, whilst being easier to design, analyse and modify. Classification accuracy was above 80%, with confusion matrices reflecting similarities in the effects' timbre and circuit design. With parameter values between 0.0 and 1.0, the mean absolute error is in most cases below 0.05, while the root mean square error is below 0.1 in all cases but one.
To assemble our dataset we selected 13 overdrive, distortion and fuzz plugins designed to emulate some of the most iconic and widely used analogue guitar effect pedals. All the plugins have 2 or 3 controls and, regardless of the specific name adopted by the designer, the controls can be identified by their processing function: Level, Gain, Tone/Equalisation. For training and testing purposes, 4 sub-datasets were generated: Mono Discrete, Mono Continuous, Poly Discrete and Poly Continuous.
Designer | Plugin | Emulation of | Id |
---|---|---|---|
Audified | MultiDrive Pedal Pro | Ibanez TS808 | 808 |
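Audified | MultiDrive Pedal Pro | Ibanez TS9 | TS9 |
Audified | MultiDrive Pedal Pro | Boss BD2 | BD2 |
Audified | MultiDrive Pedal Pro | Boss OD1 | OD1 |
Audified | MultiDrive Pedal Pro | Boss SD1 | SD1 |
Audified | MultiDrive Pedal Pro | Boss DS1 | DS1 |
Audified | MultiDrive Pedal Pro | ProCo Rat | RAT |
Audified | MultiDrive Pedal Pro | MXR Distortion+ | DPL |
Audified | MultiDrive Pedal Pro | Arbiter Fuzz Face | FFC |
Audified | MultiDrive Pedal Pro | Electro-Harmonix Big Muff | BMF |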
Mercuriall | Greed Smasher | Mesa/Boogie Grid Slammer | MGS |
Analog Obsession | Pig Pie | Electro-Harmonix Russian Big Muff | RBM |
Zupaa | Vox Tone Bender | VTB |
The first two subsets (Mono Discrete, Poly Discrete) use a discrete set of combinations selected as the most common and representative settings a person might use. Values are drawn from Gain = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0] and Tone/Eq = [0.0, 0.2, 0.5, 0.8, 1.0]; the values used for each plugin are listed in the table below. Since the Level control has no effect on the output timbre, it was set to 1.0 for every combination. For the other two subsets (Mono Continuous, Poly Continuous), both the unprocessed samples and the settings values were drawn from a uniform distribution.
Id | Level | Gain | Tone/Eq |
---|---|---|---|
808 | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
TS9 | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
BD2 | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
OD1 | [1.0] | [0.2, 0.5, 0.8, 1.0] | --- |
SD1 | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
DS1 | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
RAT | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
DPL | [1.0] | [0.2, 0.5, 0.8, 1.0] | --- |
FFC | [1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] | --- |
BMF | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
MGS | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
RBM | [1.0] | [0.2, 0.5, 0.8, 1.0] | [0.0, 0.2, 0.5, 0.8, 1.0] |
VTB | [1.0] | [0.1, 0.2, 0.5, 0.8, 1.0] | --- |
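As a rough illustration of how the two kinds of settings could be generated, the sketch below enumerates the discrete grid from the table above and draws one continuous setting per call. The names `DISCRETE_GRID`, `discrete_settings` and `continuous_settings` are hypothetical, and the assumptions that continuous values lie in [0.0, 1.0] and that Level stays at 1.0 in the continuous subsets are ours, not stated above.

```python
from itertools import product
import random

GAIN = [0.2, 0.5, 0.8, 1.0]
TONE = [0.0, 0.2, 0.5, 0.8, 1.0]

# Per-plugin (gain values, tone values); None marks plugins without a Tone/Eq control.
DISCRETE_GRID = {
    "808": (GAIN, TONE), "TS9": (GAIN, TONE), "BD2": (GAIN, TONE),
    "OD1": (GAIN, None), "SD1": (GAIN, TONE), "DS1": (GAIN, TONE),
    "RAT": (GAIN, TONE), "DPL": (GAIN, None),
    "FFC": ([0.0, 0.2, 0.5, 0.8, 1.0], None),
    "BMF": (GAIN, TONE), "MGS": (GAIN, TONE), "RBM": (GAIN, TONE),
    "VTB": ([0.1, 0.2, 0.5, 0.8, 1.0], None),
}

def discrete_settings(plugin_id):
    """Return every Level/Gain/Tone combination listed for one plugin."""
    gains, tones = DISCRETE_GRID[plugin_id]
    if tones is None:
        return [{"level": 1.0, "gain": g} for g in gains]
    return [{"level": 1.0, "gain": g, "tone": t} for g, t in product(gains, tones)]

def continuous_settings(plugin_id, rng):
    """Draw one setting uniformly at random (assumed range: 0.0 to 1.0)."""
    _, tones = DISCRETE_GRID[plugin_id]
    setting = {"level": 1.0, "gain": rng.uniform(0.0, 1.0)}  # Level fixed: assumption
    if tones is not None:
        setting["tone"] = rng.uniform(0.0, 1.0)
    return setting

if __name__ == "__main__":
    total = sum(len(discrete_settings(p)) for p in DISCRETE_GRID)
    print(total)  # 198 discrete combinations across the 13 plugins
    print(continuous_settings("808", random.Random(0)))
```

With this grid, each plugin that has a Tone/Eq control contributes 20 combinations, and the plugins without one contribute 4 or 5.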
The classification (FxNet) and estimation (SetNetCond) networks are both based on 2 convolutional and 3 fully connected layers, with a batch normalisation layer after each hidden layer. SetNetCond is conditioned on the effect class to improve settings estimation: once FxNet has identified the plugin, that information is passed to SetNetCond. FxNet is trained with a cross-entropy loss, while SetNetCond is trained with a mean squared error loss.
Layer | Size | #Fmaps | Activation |
---|---|---|---|
Conv 2D | 5x5 | 6 | Linear |
Batch Norm | - | - | - |
Activation | - | - | ReLU |
Max Pool | 2x2 | - | - |
Conv 2D | 5x5 | 12 | Linear |
Batch Norm | - | - | - |
Activation | - | - | ReLU |
Max Pool | 2x2 | - | - |
Fully Connected | 120 | - | Linear |
Batch Norm | - | - | - |
Activation | - | - | ReLU |
Fully Connected | 60 | - | Linear |
Batch Norm | - | - | - |
Activation | - | - | ReLU |
Fully Connected | (1) | - | (2) |
Trainable Parameters: FxNet = ∼760k; SetNetCond = ∼1.5M
(1) Size: FxNet = #Plug-ins; SetNet = #Settings
(2) Activation: FxNet = Linear; SetNet = Tanh
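To make the layer table easier to check, the following sketch counts trainable parameters for the listed stack. It assumes a single-channel input, unpadded stride-1 convolutions, and two learnable parameters per batch-normalised feature; none of these details are stated above. The input shape `(128, 64)` is purely illustrative, so the total is not expected to match the ∼760k figure, and the conditioning path of SetNetCond is not modelled. The function name `count_params` is hypothetical.

```python
def conv_out(size, kernel=5):
    """Output length of an unpadded, stride-1 convolution (assumption)."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Output length of non-overlapping max pooling."""
    return size // window

def count_params(in_shape, n_outputs, in_channels=1):
    """Count trainable parameters for the conv/FC stack in the table."""
    height, width = in_shape
    channels = in_channels
    total = 0
    for fmaps in (6, 12):                               # two Conv 2D blocks
        total += 5 * 5 * channels * fmaps + fmaps       # 5x5 kernels + biases
        total += 2 * fmaps                              # batch norm scale + shift
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
        channels = fmaps
    features = channels * height * width                # flattened conv output
    for units in (120, 60):                             # two hidden FC layers
        total += features * units + units
        total += 2 * units                              # batch norm scale + shift
        features = units
    total += features * n_outputs + n_outputs           # output layer
    return total

if __name__ == "__main__":
    # 13 outputs, one per plugin, as for FxNet; the input shape is hypothetical.
    print(count_params((128, 64), n_outputs=13))  # 553417
```

Almost all parameters sit in the first fully connected layer, so the total is dominated by the size of the flattened convolutional output, which in turn depends on the input representation.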
FxNet accuracy (%)
Train \ Test | Mono Disc. | Mono Cont. | Poly Disc. | Poly Cont. |
---|---|---|---|---|
Mono Disc. | 86.3 | 83.1 | --- | --- |
Mono Cont. | 81.3 | 90.9 | --- | --- |
Poly Disc. | --- | --- | 88.4 | 89.4 |
Poly Cont. | --- | --- | 84.1 | 91.4 |
Figure: confusion matrices for all training/testing conditions on the 4 datasets (panels: Mono datasets, Poly datasets).
The 808 and TS9 are very similar and cause many misclassifications. When the model is retrained without TS9 samples, the accuracy improves noticeably: the 808 is no longer misclassified, and the other plugins are also classified more accurately.
SetNetCond accuracy (%) - We define accuracy as the percentage of samples for which all settings are estimated with an absolute error below 0.1.
Train \ Test | Mono Disc. | Mono Cont. | Poly Disc. | Poly Cont. |
---|---|---|---|---|
Mono Disc. | 80.06 | 68.56 | --- | --- |
Mono Cont. | 68.51 | 85.14 | --- | --- |
Poly Disc. | --- | --- | 90.75 | 75.74 |
Poly Cont. | --- | --- | 88.93 | 97.01 |
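The accuracy definition used in this table can be written out as a short sketch. The function name `settings_accuracy` and the example values are hypothetical; only the criterion (every setting of a sample estimated with absolute error below 0.1) comes from the definition above.

```python
def settings_accuracy(true_settings, predicted_settings, tolerance=0.1):
    """Percentage of samples whose settings are ALL within the tolerance."""
    hits = 0
    for truth, estimate in zip(true_settings, predicted_settings):
        # A sample counts only if every control (Gain, Tone/Eq, ...) is close enough.
        if all(abs(t - e) < tolerance for t, e in zip(truth, estimate)):
            hits += 1
    return 100.0 * hits / len(true_settings)

if __name__ == "__main__":
    # Made-up values: (gain, tone) pairs, plus one plugin with a gain control only.
    truth = [(0.2, 0.5), (0.8, 0.0), (0.5, 1.0), (1.0,)]
    estimate = [(0.25, 0.45), (0.8, 0.15), (0.45, 0.95), (0.95,)]
    print(settings_accuracy(truth, estimate))  # 75.0
```

In the example, the second sample is rejected because its tone estimate is off by 0.15 even though its gain estimate is exact, which is why this metric is stricter than a per-control accuracy.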
SetNetCond errors - For more detail, we also report the mean absolute error (MAE) and the root mean square error (RMSE) for each control.
Train set | Gain MAE (RMSE) | Tone/Eq MAE (RMSE) | Test set |
---|---|---|---|
Mono Disc. | 0.030 (0.061) | 0.039 (0.070) | Mono Disc. |
Mono Disc. | 0.064 (0.084) | 0.044 (0.080) | Mono Cont. |
Mono Cont. | 0.062 (0.096) | 0.067 (0.108) | Mono Disc. |
Mono Cont. | 0.033 (0.045) | 0.039 (0.072) | Mono Cont. |
Poly Disc. | 0.017 (0.033) | 0.024 (0.047) | Poly Disc. |
Poly Disc. | 0.055 (0.070) | 0.038 (0.062) | Poly Cont. |
Poly Cont. | 0.036 (0.063) | 0.036 (0.062) | Poly Disc. |
Poly Cont. | 0.020 (0.028) | 0.019 (0.034) | Poly Cont. |
Avg | 0.040 (0.060) | 0.038 (0.066) |
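For reference, the two error measures in the table can be computed per control as in the sketch below. The function names and the example values are hypothetical and are not taken from the dataset.

```python
import math

def mean_absolute_error(true_values, estimates):
    """Average magnitude of the estimation error."""
    return sum(abs(t - e) for t, e in zip(true_values, estimates)) / len(true_values)

def root_mean_square_error(true_values, estimates):
    """Square root of the average squared error; penalises large errors more."""
    squared = sum((t - e) ** 2 for t, e in zip(true_values, estimates))
    return math.sqrt(squared / len(true_values))

if __name__ == "__main__":
    # Made-up gain values for four samples.
    gain_true = [0.2, 0.5, 0.8, 1.0]
    gain_est = [0.25, 0.5, 0.7, 0.9]
    print(round(mean_absolute_error(gain_true, gain_est), 4))     # 0.0625
    print(round(root_mean_square_error(gain_true, gain_est), 4))  # 0.075
```

The RMSE is never smaller than the MAE, and the gap between them grows when a few estimates are far off, which is why both are reported side by side.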
Figure: boxplots for all training/testing conditions on the 4 datasets, with whiskers extending to 1.5 times the interquartile range (panels: Mono datasets, Poly datasets).
Figure: scatterplots for all training/testing conditions on the 4 datasets (panels: Mono datasets, Poly datasets).
Figure: mean error and skew for all training/testing conditions on the 4 datasets (panels: Mono datasets, Poly datasets).
To analyse the misclassifications reported in the confusion matrices we used an AXY listening test, in which a listener hears a reference A and must decide which of two samples, X or Y, matches it. The test compared 808 vs TS9, OD1 vs SD1 and DPL vs RAT, since samples from these pairs of effects are often misclassified. We wanted to understand whether listeners (8 in total) would encounter similar difficulties in trying to distinguish them. If listeners cannot reliably identify the sample that matches the reference, i.e., if the p-value is not below 0.05, then the null hypothesis cannot be rejected and it cannot be shown that there is a perceptible difference between X and Y. We use the 1-sided p-value for our evaluation but also include the 2-sided one. For each pair, 10 conditions compared the plugins of interest, while another 5 conditions were used to check that participants could reliably distinguish between samples with clearly perceivable differences.
808 vs TS9: in 9 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). Regardless of the settings, the 808 and TS9 seem to be indistinguishable.
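The p-values listed in the tables are consistent with an exact binomial test on the number of correct answers out of 8, with a chance probability of 0.5. The sketch below reproduces them under that reading. Taking the smaller of the two tails as the 1-sided value, and doubling it (capped at 1) for the 2-sided value, is our inference from the tabulated numbers and is not stated in the text; the function name `axy_p_values` is hypothetical.

```python
from math import comb

def axy_p_values(n_correct, n_listeners=8):
    """Exact binomial p-values for an AXY condition under pure guessing (p = 0.5)."""
    outcomes = 2 ** n_listeners
    # Probability of at least / at most this many correct answers by chance.
    upper_tail = sum(comb(n_listeners, k) for k in range(n_correct, n_listeners + 1)) / outcomes
    lower_tail = sum(comb(n_listeners, k) for k in range(0, n_correct + 1)) / outcomes
    one_sided = min(upper_tail, lower_tail)   # assumption: smaller tail is reported
    two_sided = min(1.0, 2.0 * one_sided)
    return one_sided, two_sided

if __name__ == "__main__":
    for n_correct in (8, 7, 6, 5, 4):
        one_sided, two_sided = axy_p_values(n_correct)
        print(n_correct, round(one_sided, 4), round(two_sided, 4))
    # 8 0.0039 0.0078
    # 7 0.0352 0.0703
    # 6 0.1445 0.2891
    # 5 0.3633 0.7266
    # 4 0.6367 1.0
```

With 8 listeners, only conditions where at least 7 listeners agree fall below the 0.05 threshold on the 1-sided value, so the test has limited power to detect subtle differences.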
A | X | Y | Correct | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | 1-sided p-value | 2-sided p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G61-61311-808-O5T8-20630 | G61-61311-808-O5T8-20630 | G61-61311-TS9-D5T8-20630 | X | Y | Y | X | Y | Y | X | X | X | 0.6367 | 1.0000 |
G73-50205-808-O5T5-20845 | G73-50205-BMF-S5T5-20845 | G73-50205-808-O5T5-20845 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G91-73609-808-O8T10-21135 | G91-73609-808-O8T10-21135 | G91-73609-TS9-O8T10-21135 | X | X | Y | Y | Y | Y | X | Y | Y | 0.1445 | 0.2891 |
G63-48108-808-O2T8-20679 | G63-48108-TS9-O2T8-20679 | G63-48108-808-O2T8-20679 | Y | X | Y | Y | X | X | X | Y | X | 0.3633 | 0.7266 |
G73-60405-808-O5T10-20871 | G73-60405-808-O5T10-20871 | G73-60405-TS9-O5T10-20871 | X | X | X | X | X | Y | X | X | X | 0.0352 | 0.0703 |
G83-49109-808-O10T10-20992 | G83-49109-RAT-D10T10-20992 | G83-49109-808-O10T10-20992 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G61-49109-808-O5T0-20602 | G61-49109-808-O5T0-20602 | G61-49109-TS9-O5T0-20602 | X | Y | X | Y | X | X | X | X | X | 0.1445 | 0.2891 |
G91-51206-808-O10T0-21080 | G91-51206-808-O10T0-21080 | G91-51206-TS9-O10T0-21080 | X | X | X | Y | X | Y | X | X | Y | 0.3633 | 0.7266 |
G71-48203-808-O10T2-20765 | G71-48203-BMF-S10T2-20765 | G71-48203-808-O10T2-20765 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G83-60310-808-O10T2-21019 | G83-60310-TS9-D10T2-21019 | G83-60310-808-O10T2-21019 | Y | X | Y | X | Y | Y | X | Y | Y | 0.3633 | 0.7266 |
G93-69605-808-O8T2-21209 | G93-69605-RAT-D8T2-21209 | G93-69605-808-O8T2-21209 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G93-56306-TS9-D2T10-21171 | G93-56306-TS9-D2T10-21171 | G93-56306-808-O2T10-21171 | X | X | X | Y | X | X | X | Y | X | 0.1445 | 0.2891 |
G71-55305-808-O8T5-20780 | G71-55305-808-O8T5-20780 | G71-55305-TS9-D8T5-20780 | X | Y | Y | Y | Y | Y | X | Y | X | 0.1445 | 0.2891 |
G63-60405-808-O2T2-20715 | G63-60405-808-O2T2-20715 | G63-60405-BD2-G2T2-20715 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
G91-57402-808-O10T10-21102 | G91-57402-808-O10T10-21102 | G91-57402-TS9-O10T10-21102 | X | Y | Y | Y | X | X | X | Y | X | 0.6367 | 1.0000 |
OD1 vs SD1: in 9 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). With Gain set to 0.2 or 0.5, the OD1 and SD1 seem to be indistinguishable.
A | X | Y | Correct | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | 1-sided p-value | 2-sided p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G81-47202-OD1-D2-20920 | G81-47202-SD1-D2T5-20920 | G81-47202-OD1-D2-20920 | Y | Y | Y | Y | X | Y | X | Y | Y | 0.1445 | 0.2891 |
G91-44104-OD1-D5-21065 | G91-44104-OD1-D5-21065 | G91-44104-DPL-D5-21065 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
G63-55400-OD1-D5-20710 | G63-55400-MGS-D5-20710 | G63-55400-OD1-D5-20710 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G91-59404-OD1-D2-21104 | G91-59404-RAT-D2F5-21104 | G91-59404-OD1-D2-21104 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G73-60405-OD1-D5-20871 | G73-60405-SD1-D5T5-20871 | G73-60405-OD1-D5-20871 | Y | X | X | X | X | Y | X | Y | X | 0.1445 | 0.2891 |
G83-52207-OD1-D2-21003 | G83-52207-RAT-D2F5-21003 | G83-52207-OD1-D2-21003 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G81-65506-OD1-D5-20963 | G81-65506-OD1-D5-20963 | G81-65506-SD1-D5T2-20963 | X | Y | Y | Y | X | X | X | Y | X | 0.6367 | 1.0000 |
G81-51206-OD1-D2-20924 | G81-51206-DPL-D2-20924 | G81-51206-OD1-D2-20924 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G81-59500-OD1-D2-20957 | G81-59500-OD1-D2-20957 | G81-59500-SD1-D2T5-20957 | X | Y | X | Y | X | Y | Y | Y | X | 0.3633 | 0.7266 |
G93-48203-OD1-D2-21155 | G93-48203-SD1-D2T2-21155 | G93-48203-OD1-D2-21155 | Y | Y | X | Y | Y | Y | X | Y | Y | 0.1445 | 0.2891 |
G73-46201-OD1-D2-20841 | G73-46201-OD1-D2-20841 | G73-46201-SD1-D2T5-20841 | X | X | Y | X | X | Y | X | X | Y | 0.3633 | 0.7266 |
G63-69510-OD1-D5-20733 | G63-69510-SD1-D5T2-20733 | G63-69510-OD1-D5-20733 | Y | Y | Y | X | X | Y | X | Y | Y | 0.3633 | 0.7266 |
G63-65506-OD1-D5-20729 | G63-65506-SD1-D5T5-20729 | G63-65506-OD1-D5-20729 | X | Y | Y | Y | X | Y | X | Y | X | 0.3633 | 0.7266 |
G63-60310-OD1-D2-20707 | G63-60310-SD1-D2T2-20707 | G63-60310-OD1-D2-20707 | X | Y | Y | X | Y | X | Y | Y | X | 0.3633 | 0.7266 |
G81-65410-OD1-D5-20954 | G81-65410-OD1-D5-20954 | G81-65410-SD1-D5T5-20954 | X | X | X | X | Y | X | X | X | X | 0.0352 | 0.0703 |
DPL vs RAT: in 5 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). In some cases, the DPL with Gain set to 1.0 seems to be indistinguishable from the RAT with Gain set to 0.5.
A | X | Y | Correct | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | 1-sided p-value | 2-sided p-value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
G63-61406-DPL-D10-20716 | G63-61406-DPL-D10-20716 | G63-61406-RBM-T10S10-20716 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
G63-65601-DPL-D10-20737 | G63-65601-DPL-D10-20737 | G63-65601-RAT-D2F0-20737 | X | X | X | X | X | X | X | Y | X | 0.0352 | 0.0703 |
G63-50205-DPL-D10-20689 | G63-50205-RBM-D10-20689 | G63-50205-DPL-D10-20689 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G91-67603-DPL-D10-21129 | G91-67603-DPL-D10-21129 | G91-67603-808-D10T10-21129 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
G73-52207-DPL-D10-20847 | G73-52207-808-D10-20847 | G73-52207-DPL-D10-20847 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G93-57307-DPL-D10-21172 | G93-57307-DPL-D10-21172 | G93-57307-RBM-T8S10-21172 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
G91-55400-DPL-D10-21100 | G91-55400-DPL-D10-21100 | G91-55400-RAT-D5F0-21100 | X | X | X | Y | X | X | X | X | X | 0.0352 | 0.0703 |
G91-61502-DPL-D10-21115 | G91-61502-RAT-D5F0-21115 | G91-61502-DPL-D10-21115 | Y | Y | Y | Y | X | Y | Y | X | Y | 0.1445 | 0.2891 |
G63-50300-DPL-D10-20697 | G63-50300-RAT-D5F0-20697 | G63-50300-DPL-D10-20697 | Y | X | Y | Y | Y | Y | Y | X | Y | 0.1445 | 0.2891 |
G73-68604-DPL-D10-20896 | G73-68604-RAT-D2F0-20896 | G73-68604-DPL-D10-20896 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G61-50110-DPL-D10-20603 | G61-50110-RAT-D5F0-20603 | G61-50110-DPL-D10-20603 | Y | Y | Y | X | Y | Y | X | Y | Y | 0.1445 | 0.2891 |
G93-76612-DPL-D10-21216 | G93-76612-DPL-D10-21216 | G93-76612-RAT-D5F0-21216 | X | X | X | X | Y | X | X | Y | X | 0.1445 | 0.2891 |
G61-65506-DPL-D10-20651 | G61-65506-RAT-D5F0-20651 | G61-65506-DPL-D10-20651 | Y | Y | Y | Y | Y | Y | Y | Y | Y | 0.0039 | 0.0078 |
G71-64409-DPL-D10-20797 | G71-64409-DPL-D10-20797 | G71-64409-RAT-D5F0-20797 | X | Y | X | Y | X | Y | X | Y | X | 0.6367 | 1.0000 |
G61-61406-DPL-D10-20638 | G61-61406-DPL-D10-20638 | G61-61406-RAT-D5F0-20638 | X | X | X | X | X | X | X | X | X | 0.0039 | 0.0078 |
@article{comunità2021guitar,
title={Guitar Effects Recognition and Parameter Estimation with Convolutional Neural Networks},
author={Comunità, Marco and Stowell, Dan and Reiss, Joshua D.},
journal={Journal of the Audio Engineering Society},
year={2021},
volume={69},
number={7/8},
pages={594-604},
doi={},
month={July}}