Guitar Effects Recognition and Parameter Estimation
with Convolutional Neural Networks

Marco Comunità, Dan Stowell, Joshua D. Reiss

Abstract


Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion and fuzz guitar effects. A novel dataset of processed electric guitar samples was assembled, with four sub-datasets consisting of monophonic or polyphonic samples and discrete or continuous settings values, for a total of about 250 hours of processed samples. Results were compared for networks trained and tested on the same or on a different sub-dataset. We found that discrete datasets could lead to equally high performance as continuous ones, whilst being easier to design, analyse and modify. Classification accuracy was above 80%, with confusion matrices reflecting similarities in the effects' timbre and circuit design. With parameter values between 0.0 and 1.0, the mean absolute error is in most cases below 0.05, while the root mean square error is below 0.1 in all cases but one.


Dataset


To assemble our dataset we selected 13 overdrive, distortion and fuzz plugins designed to emulate some of the most iconic and widely used analogue guitar effect pedals. All the plugins have 2 or 3 controls and, regardless of the specific name adopted by the designer, the controls can be identified by their processing function: Level, Gain and Tone/Equalisation. For training and testing purposes, 4 sub-datasets were generated: Mono Discrete, Mono Continuous, Poly Discrete and Poly Continuous.

Designer         | Plugin               | Emulation of                      | Id
Audified         | MultiDrive Pedal Pro | Ibanez TS808                      | 808
                 |                      | Ibanez TS9                        | TS9
                 |                      | Boss BD2                          | BD2
                 |                      | Boss OD1                          | OD1
                 |                      | Boss SD1                          | SD1
                 |                      | Boss DS1                          | DS1
                 |                      | ProCo Rat                         | RAT
                 |                      | MXR Distortion+                   | DPL
                 |                      | Arbiter Fuzz Face                 | FFC
                 |                      | Electro-Harmonix Big Muff         | BMF
Mercuriall       | Greed Smasher        | Mesa/Boogie Grid Slammer          | MGS
Analog Obsession | Pig Pie              | Electro-Harmonix Russian Big Muff | RBM
Zupaa            | Vox                  | Tone Bender                       | VTB

The first two subsets (Mono Discrete, Poly Discrete) use a discrete set of combinations selected as the most common and representative settings a player might use: Gain = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0], Tone/Eq = [0.0, 0.2, 0.5, 0.8, 1.0]. Since the Level control has no effect on the output timbre, it was fixed at 1.0 for every combination. For the remaining two subsets (Mono Continuous, Poly Continuous), both the unprocessed samples and the settings values were drawn from a uniform distribution.

Id Level Gain Tone/Eq
808 [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
TS9 [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
BD2 [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
OD1 [1.0] [0.2, 0.5, 0.8, 1.0] ---
SD1 [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
DS1 [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
RAT [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
DPL [1.0] [0.2, 0.5, 0.8, 1.0] ---
FFC [1.0] [0.0, 0.2, 0.5, 0.8, 1.0] ---
BMF [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
MGS [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
RBM [1.0] [0.2, 0.5, 0.8, 1.0] [0.0, 0.2, 0.5, 0.8, 1.0]
VTB [1.0] [0.1, 0.2, 0.5, 0.8, 1.0] ---
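The enumeration of discrete settings combinations can be sketched as follows. This is illustrative Python, not the authors' actual data-generation code; the function name and tuple layout are our assumptions.

```python
from itertools import product

def discrete_settings(gains, tones, level=1.0):
    """Enumerate all (Level, Gain, Tone/Eq) combinations for one plugin.

    Level is fixed at 1.0 because it does not affect the output timbre.
    Plugins without a Tone/Eq control can pass tones=[None].
    """
    return [(level, g, t) for g, t in product(gains, tones)]

# Example: the 808 uses 4 Gain values and 5 Tone/Eq values -> 20 combinations.
combos_808 = discrete_settings([0.2, 0.5, 0.8, 1.0], [0.0, 0.2, 0.5, 0.8, 1.0])
```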


Architecture


The classification (FxNet) and estimation (SetNetCond) networks are both based on 2 convolutional and 3 fully connected layers, with batch normalisation at each hidden level. SetNetCond is conditioned on the effect class to improve the settings estimation: once FxNet identifies the plugin, its prediction is passed to SetNetCond. FxNet is trained with a cross-entropy loss, while SetNetCond is trained with a mean square error (MSE) loss.

Layer Size #Fmaps Activation
Conv 2D 5x5 6 Linear
Batch Norm - - -
Activation - - ReLU
Max Pool 2x2 - -
Conv 2D 5x5 12 Linear
Batch Norm - - -
Activation - - ReLU
Max Pool 2x2 - -
Fully Connected 120 - Linear
Batch Norm - - -
Activation - - ReLU
Fully Connected 60 - Linear
Batch Norm - - -
Activation - - ReLU
Fully Connected (1) - (2)

Trainable Parameters: FxNet = ∼760k; SetNetCond = ∼1.5M
(1) Output size: FxNet = #Plug-ins; SetNetCond = #Settings
(2) Output activation: FxNet = Linear; SetNetCond = Tanh

 

Results: Classification


FxNet accuracy (%)

Train \ Test | Mono Disc. | Mono Cont. | Poly Disc. | Poly Cont.
Mono Disc.   | 86.3       | 83.1       | ---        | ---
Mono Cont.   | 81.3       | 90.9       | ---        | ---
Poly Disc.   | ---        | ---        | 88.4       | 89.4
Poly Cont.   | ---        | ---        | 84.1       | 91.4

Confusion matrices for all training/testing conditions on the 4 datasets.

Mono datasets:

Poly datasets:

808 and TS9 are very similar and cause many misclassifications. Retraining the model without TS9 samples improves accuracy considerably: the 808 is no longer misclassified, and other plugins are also classified more accurately.

 

Results: Estimation


SetNetCond accuracy (%) - We define accuracy as the percentage of samples for which all settings are estimated with an absolute error below 0.1.

Train \ Test | Mono Disc. | Mono Cont. | Poly Disc. | Poly Cont.
Mono Disc.   | 80.06      | 68.56      | ---        | ---
Mono Cont.   | 68.51      | 85.14      | ---        | ---
Poly Disc.   | ---        | ---        | 90.75      | 75.74
Poly Cont.   | ---        | ---        | 88.93      | 97.01
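A minimal sketch of this accuracy metric (illustrative Python, not the authors' code): a sample counts as correct only if every one of its settings is within 0.1 of the ground truth.

```python
def settings_accuracy(predictions, targets, tolerance=0.1):
    """Percentage of samples whose settings are ALL within `tolerance`
    absolute error of the ground truth (values in [0.0, 1.0])."""
    hits = sum(
        all(abs(p - t) < tolerance for p, t in zip(pred, true))
        for pred, true in zip(predictions, targets)
    )
    return 100.0 * hits / len(predictions)

# One sample fully within tolerance, one with a Gain error of 0.2 -> 50%.
acc = settings_accuracy([[0.5, 0.5], [0.0, 0.9]], [[0.55, 0.45], [0.2, 0.9]])
```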

SetNetCond errors - For further detail, we also report the mean absolute error (MAE) and root mean square error (RMSE).

Train set  | Test set   | Gain MAE (RMSE) | Tone/Eq MAE (RMSE)
Mono Disc. | Mono Disc. | 0.030 (0.061)   | 0.039 (0.070)
Mono Disc. | Mono Cont. | 0.064 (0.084)   | 0.044 (0.080)
Mono Cont. | Mono Disc. | 0.062 (0.096)   | 0.067 (0.108)
Mono Cont. | Mono Cont. | 0.033 (0.045)   | 0.039 (0.072)
Poly Disc. | Poly Disc. | 0.017 (0.033)   | 0.024 (0.047)
Poly Disc. | Poly Cont. | 0.055 (0.070)   | 0.038 (0.062)
Poly Cont. | Poly Disc. | 0.036 (0.063)   | 0.036 (0.062)
Poly Cont. | Poly Cont. | 0.020 (0.028)   | 0.019 (0.034)
Avg        |            | 0.040 (0.060)   | 0.038 (0.066)
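The two error metrics can be sketched as follows (illustrative Python over flat lists of predicted and true settings values in [0.0, 1.0]; function names are ours).

```python
import math

def mae(pred, true):
    # Mean absolute error: average magnitude of the estimation error.
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def rmse(pred, true):
    # Root mean square error: penalises large individual errors more.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))
```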

 


Boxplots for all training/testing conditions on the 4 datasets (whiskers extend to 1.5 the interquartile range).

Mono datasets:

Poly datasets:

 


Scatterplots for all training/testing conditions on the 4 datasets.

Mono datasets:

 

 

Poly datasets:

 

 


Mean error and skew for all training/testing conditions on the 4 datasets.

Mono datasets:

 

 

Poly datasets:

 

Results: Listening Test


To analyse the misclassifications reported in the confusion matrices we used an AXY test comparing 808 vs TS9, OD1 vs SD1 and DPL vs RAT, since samples from these effects are frequently confused by the network. We wanted to understand whether listeners (8 in total) would encounter similar difficulties in distinguishing them. If the reference cannot be identified reliably, i.e., if the p-value is not low (p ≥ 0.05), then the null hypothesis cannot be rejected and it cannot be proven that there is a perceptible difference between X and Y. We use the 1-sided p-value for our evaluation but also report the 2-sided one. 10 conditions compared the plugins of interest, while another 5 conditions were used to check that participants could reliably distinguish between samples with clearly perceivable differences.
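The reported p-values are consistent with an exact binomial test against chance (0.5) over the 8 listeners. A quick sketch (function names are ours, not from the paper):

```python
from math import comb

def one_sided_p(correct, n=8):
    """P(at least `correct` right answers out of n under pure guessing)."""
    return sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n

def two_sided_p(correct, n=8):
    """Doubled smaller tail, capped at 1.0."""
    lower_tail = 1 - one_sided_p(correct + 1, n)  # P(at most `correct`)
    return min(1.0, 2 * min(one_sided_p(correct, n), lower_tail))
```

For example, 8/8 correct gives a 1-sided p of 0.5^8 ≈ 0.0039, matching the rows below where every participant identified the reference.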

808 vs TS9

In 9 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). Regardless of the settings, the two effects appear to be indistinguishable.

A X Y Correct P1 P2 P3 P4 P5 P6 P7 P8 1-sided p-value 2-sided p-value
G61-61311-808-O5T8-20630 G61-61311-808-O5T8-20630 G61-61311-TS9-D5T8-20630 X Y Y X Y Y X X X 0.6367 1.0000
G73-50205-808-O5T5-20845 G73-50205-BMF-S5T5-20845 G73-50205-808-O5T5-20845 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G91-73609-808-O8T10-21135 G91-73609-808-O8T10-21135 G91-73609-TS9-O8T10-21135 X X Y Y Y Y X Y Y 0.1445 0.2891
G63-48108-808-O2T8-20679 G63-48108-TS9-O2T8-20679 G63-48108-808-O2T8-20679 Y X Y Y X X X Y X 0.3633 0.7266
G73-60405-808-O5T10-20871 G73-60405-808-O5T10-20871 G73-60405-TS9-O5T10-20871 X X X X X Y X X X 0.0352 0.0703
G83-49109-808-O10T10-20992 G83-49109-RAT-D10T10-20992 G83-49109-808-O10T10-20992 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G61-49109-808-O5T0-20602 G61-49109-808-O5T0-20602 G61-49109-TS9-O5T0-20602 X Y X Y X X X X X 0.1445 0.2891
G91-51206-808-O10T0-21080 G91-51206-808-O10T0-21080 G91-51206-TS9-O10T0-21080 X X X Y X Y X X Y 0.3633 0.7266
G71-48203-808-O10T2-20765 G71-48203-BMF-S10T2-20765 G71-48203-808-O10T2-20765 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G83-60310-808-O10T2-21019 G83-60310-TS9-D10T2-21019 G83-60310-808-O10T2-21019 Y X Y X Y Y X Y Y 0.3633 0.7266
G93-69605-808-O8T2-21209 G93-69605-RAT-D8T2-21209 G93-69605-808-O8T2-21209 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G93-56306-TS9-D2T10-21171 G93-56306-TS9-D2T10-21171 G93-56306-808-O2T10-21171 X X X Y X X X Y X 0.1445 0.2891
G71-55305-808-O8T5-20780 G71-55305-808-O8T5-20780 G71-55305-TS9-D8T5-20780 X Y Y Y Y Y X Y X 0.1445 0.2891
G63-60405-808-O2T2-20715 G63-60405-808-O2T2-20715 G63-60405-BD2-G2T2-20715 X X X X X X X X X 0.0039 0.0078
G91-57402-808-O10T10-21102 G91-57402-808-O10T10-21102 G91-57402-TS9-O10T10-21102 X Y Y Y X X X Y X 0.6367 1.0000

OD1 vs SD1

In 9 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). For Gain set to 0.2 or 0.5, the two effects appear to be indistinguishable.

A X Y Correct P1 P2 P3 P4 P5 P6 P7 P8 1-sided p-value 2-sided p-value
G81-47202-OD1-D2-20920 G81-47202-SD1-D2T5-20920 G81-47202-OD1-D2-20920 Y Y Y Y X Y X Y Y 0.1445 0.2891
G91-44104-OD1-D5-21065 G91-44104-OD1-D5-21065 G91-44104-DPL-D5-21065 X X X X X X X X X 0.0039 0.0078
G63-55400-OD1-D5-20710 G63-55400-MGS-D5-20710 G63-55400-OD1-D5-20710 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G91-59404-OD1-D2-21104 G91-59404-RAT-D2F5-21104 G91-59404-OD1-D2-21104 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G73-60405-OD1-D5-20871 G73-60405-SD1-D5T5-20871 G73-60405-OD1-D5-20871 Y X X X X Y X Y X 0.1445 0.2891
G83-52207-OD1-D2-21003 G83-52207-RAT-D2F5-21003 G83-52207-OD1-D2-21003 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G81-65506-OD1-D5-20963 G81-65506-OD1-D5-20963 G81-65506-SD1-D5T2-20963 X Y Y Y X X X Y X 0.6367 1.0000
G81-51206-OD1-D2-20924 G81-51206-DPL-D2-20924 G81-51206-OD1-D2-20924 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G81-59500-OD1-D2-20957 G81-59500-OD1-D2-20957 G81-59500-SD1-D2T5-20957 X Y X Y X Y Y Y X 0.3633 0.7266
G93-48203-OD1-D2-21155 G93-48203-SD1-D2T2-21155 G93-48203-OD1-D2-21155 Y Y X Y Y Y X Y Y 0.1445 0.2891
G73-46201-OD1-D2-20841 G73-46201-OD1-D2-20841 G73-46201-SD1-D2T5-20841 X X Y X X Y X X Y 0.3633 0.7266
G63-69510-OD1-D5-20733 G63-69510-SD1-D5T2-20733 G63-69510-OD1-D5-20733 Y Y Y X X Y X Y Y 0.3633 0.7266
G63-65506-OD1-D5-20729 G63-65506-SD1-D5T5-20729 G63-65506-OD1-D5-20729 X Y Y Y X Y X Y X 0.3633 0.7266
G63-60310-OD1-D2-20707 G63-60310-SD1-D2T2-20707 G63-60310-OD1-D2-20707 X Y Y X Y X Y Y X 0.3633 0.7266
G81-65410-OD1-D5-20954 G81-65410-OD1-D5-20954 G81-65410-SD1-D5T5-20954 X X X X Y X X X X 0.0352 0.0703

DPL vs RAT

In 5 out of 10 cases the participants could not reliably distinguish between X and Y (p > 0.05). In some cases, the DPL with Gain set to 1.0 appears to be indistinguishable from the RAT with Gain set to 0.5.

A X Y Correct P1 P2 P3 P4 P5 P6 P7 P8 1-sided p-value 2-sided p-value
G63-61406-DPL-D10-20716 G63-61406-DPL-D10-20716 G63-61406-RBM-T10S10-20716 X X X X X X X X X 0.0039 0.0078
G63-65601-DPL-D10-20737 G63-65601-DPL-D10-20737 G63-65601-RAT-D2F0-20737 X X X X X X X Y X 0.0352 0.0703
G63-50205-DPL-D10-20689 G63-50205-RBM-D10-20689 G63-50205-DPL-D10-20689 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G91-67603-DPL-D10-21129 G91-67603-DPL-D10-21129 G91-67603-808-D10T10-21129 X X X X X X X X X 0.0039 0.0078
G73-52207-DPL-D10-20847 G73-52207-808-D10-20847 G73-52207-DPL-D10-20847 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G93-57307-DPL-D10-21172 G93-57307-DPL-D10-21172 G93-57307-RBM-T8S10-21172 X X X X X X X X X 0.0039 0.0078
G91-55400-DPL-D10-21100 G91-55400-DPL-D10-21100 G91-55400-RAT-D5F0-21100 X X X Y X X X X X 0.0352 0.0703
G91-61502-DPL-D10-21115 G91-61502-RAT-D5F0-21115 G91-61502-DPL-D10-21115 Y Y Y Y X Y Y X Y 0.1445 0.2891
G63-50300-DPL-D10-20697 G63-50300-RAT-D5F0-20697 G63-50300-DPL-D10-20697 Y X Y Y Y Y Y X Y 0.1445 0.2891
G73-68604-DPL-D10-20896 G73-68604-RAT-D2F0-20896 G73-68604-DPL-D10-20896 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G61-50110-DPL-D10-20603 G61-50110-RAT-D5F0-20603 G61-50110-DPL-D10-20603 Y Y Y X Y Y X Y Y 0.1445 0.2891
G93-76612-DPL-D10-21216 G93-76612-DPL-D10-21216 G93-76612-RAT-D5F0-21216 X X X X Y X X Y X 0.1445 0.2891
G61-65506-DPL-D10-20651 G61-65506-RAT-D5F0-20651 G61-65506-DPL-D10-20651 Y Y Y Y Y Y Y Y Y 0.0039 0.0078
G71-64409-DPL-D10-20797 G71-64409-DPL-D10-20797 G71-64409-RAT-D5F0-20797 X Y X Y X Y X Y X 0.6367 1.0000
G61-61406-DPL-D10-20638 G61-61406-DPL-D10-20638 G61-61406-RAT-D5F0-20638 X X X X X X X X X 0.0039 0.0078

Citation


@article{comunità2021guitar,
    title={Guitar Effects Recognition and Parameter Estimation with Convolutional Neural Networks},
    author={Comunità, Marco and Stowell, Dan and Reiss, Joshua D.},
    journal={Journal of the Audio Engineering Society},
    year={2021},
    volume={69},
    number={7/8},
    pages={594--604},
    doi={},
    month={July}
}