SFERA Archive of Research Products of the University of Ferrara


Neural architecture search via standard machine learning methodologies

Franchini G.; Ruggiero V.; Porta F.; Zanni L.

2023

Abstract

In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation involving a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the possible adoption of variance reduction techniques. The main contribution of the paper is an automatic Machine Learning technique that sets these hyperparameters so that a measure of the CNN performance is optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN from its behaviour after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples consist of a limited number of hyperparameter configurations together with the corresponding CNN performance measures obtained after only a few steps of the training process, while the label of each input sample is the performance obtained after a complete training of the CNN. This dataset is used as the training set for Support Vector Regression and/or Random Forest models, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process.
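The prediction step described above is a supervised regression from early-training observations to final performance. The sketch below illustrates it with scikit-learn's `SVR` and `RandomForestRegressor`. The feature layout (four hyperparameters followed by the accuracies after the first three epochs), the synthetic data generator `make_synthetic_dataset` and all numeric settings are hypothetical assumptions chosen for illustration; they are not the authors' dataset or configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_dataset(n_configs, n_early_epochs=3, seed=0):
    """Hypothetical stand-in for the real dataset (NOT the paper's data).

    Each row: CNN/optimizer hyperparameters followed by the validation
    accuracy after each of the first `n_early_epochs` epochs.
    Target: accuracy after complete training.
    """
    rng = np.random.default_rng(seed)
    n_filters = rng.choice([16, 32, 64], size=n_configs)
    kernel_size = rng.choice([3, 5, 7], size=n_configs)
    log_lr = rng.uniform(-4.0, -1.0, size=n_configs)  # log10 of the steplength
    batch_size = rng.choice([32, 64, 128], size=n_configs)
    # Arbitrary smooth "final accuracy", for illustration only (assumption).
    final = (0.6 + 0.05 * np.log2(n_filters / 16)
             - 0.05 * (log_lr + 2.5) ** 2
             + rng.normal(0.0, 0.01, n_configs))
    # Early accuracies rise toward the final value along a noisy curve.
    early = np.column_stack([
        final * (1.0 - np.exp(-(e + 1) / 2.0)) + rng.normal(0.0, 0.01, n_configs)
        for e in range(n_early_epochs)
    ])
    X = np.column_stack([n_filters, kernel_size, log_lr, batch_size, early])
    return X, final


X, y = make_synthetic_dataset(200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

models = {
    # SVR is sensitive to feature scales, so standardise the features first.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.005)),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}
errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Mean absolute error between predicted and "fully trained" accuracy.
    errors[name] = float(np.mean(np.abs(model.predict(X_test) - y_test)))
    print(f"{name}: MAE on held-out configurations = {errors[name]:.4f}")
```

On real data, `X` would be built from short training runs of actual CNN configurations and `y` from the same configurations trained to completion; the held-out error gives a rough idea of how far the early-iteration behaviour can be trusted.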
Furthermore, through a probabilistic exploration of the hyperparameter space, we are able to find, at quite a low cost, the setting of the CNN hyperparameters that provides the optimal performance. The results of extensive numerical experiments carried out on CNNs, together with the application of our performance predictor to NAS-Bench-101, indicate that the proposed methodology for hyperparameter setting is very promising.
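The abstract does not detail the probabilistic exploration strategy; one simple instance is random search, sketched below under that assumption. Each sampled configuration is only partially trained, and a predictor, rather than a full training run, scores it. The search space `SEARCH_SPACE` and the functions `partial_training` and `predict_final` are hypothetical placeholders: in practice the former would run a few epochs of real CNN training and the latter would be a fitted SVR or Random Forest model.

```python
import math
import random

# Hypothetical search space mirroring the hyperparameters named in the abstract.
SEARCH_SPACE = {
    "n_filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "log10_steplength": (-4.0, -1.0),  # tuple = continuous range
    "batch_size": [32, 64, 128],
    "variance_reduction": [False, True],
}


def sample_config(rng):
    """Draw one configuration: uniform over each list, uniform over each range."""
    config = {}
    for name, domain in SEARCH_SPACE.items():
        if isinstance(domain, tuple):
            config[name] = rng.uniform(*domain)
        else:
            config[name] = rng.choice(domain)
    return config


def partial_training(config, n_epochs=3):
    """Hypothetical placeholder for training the CNN for only a few epochs.
    Returns a toy learning curve; replace with a real short training run."""
    quality = (0.6 + 0.05 * math.log2(config["n_filters"] / 16)
               - 0.05 * (config["log10_steplength"] + 2.5) ** 2)
    return [quality * (1.0 - math.exp(-(e + 1) / 2.0)) for e in range(n_epochs)]


def predict_final(config, early_curve):
    """Hypothetical placeholder for the trained SVR/RF predictor: here it
    simply extrapolates the last early accuracy with a fixed factor."""
    return early_curve[-1] / (1.0 - math.exp(-len(early_curve) / 2.0))


def random_search(n_samples, seed=0):
    """Random search (assumed exploration strategy) scored by the predictor."""
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(n_samples):
        config = sample_config(rng)
        score = predict_final(config, partial_training(config))
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best, score = random_search(500)
print(best, round(score, 4))
```

The cost of such a search is dominated by the short training runs, so far more configurations can be screened than could be trained to completion; the most promising candidates can then be trained fully to confirm the prediction.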


Year of publication: 2023
DOI: https://dx.doi.org/10.3934/MINE.2023012
Journal title: MATHEMATICS IN ENGINEERING
All authors: Franchini, G.; Ruggiero, V.; Porta, F.; Zanni, L.
Appears in item types: 03.1 Journal article

Files in this item:

File	Description	Type	License	Access	Size	Format
MINE (1).pdf	Pre-print	Pre-print	NOT PUBLIC - Private/restricted access	Archive managers only	668.21 kB	Adobe PDF
10.3934_mine.2023012.pdf	Publisher's full text	Full text (publisher's version)	Creative Commons	Open access	621.83 kB	Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2480932
