SFERA: Archive of Research Products of the University of Ferrara

A stochastic gradient method with variance control and variable learning rate for Deep Learning

Franchini G.; Porta F.; Ruggiero V.; Trombini I.; Zanni L.

2024

Abstract

In this paper we study a stochastic gradient algorithm that increases the mini-batch size according to a predefined rule and automatically adjusts the learning rate by means of a monotone or non-monotone line search procedure. The mini-batch size is incremented at a suitable a priori rate throughout the iterative process so that the variance of the stochastic gradients is progressively reduced. The a priori rate is not subject to restrictive assumptions, which allows for a slow increase in the mini-batch size. The learning rate, in turn, can vary non-monotonically throughout the iterations, as long as it is appropriately bounded. Convergence results for the proposed method are provided for both convex and non-convex objective functions. Moreover, it can be proved that the algorithm enjoys a global linear rate of convergence on strongly convex functions. The low per-iteration cost, the limited memory requirements and the robustness with respect to the hyperparameter settings make the suggested approach well suited for implementation within the deep learning framework, including on GPGPU-equipped architectures. Numerical results on training deep neural networks for multiclass image classification show promising behaviour of the proposed scheme compared with similar state-of-the-art competitors.
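The two ingredients named in the abstract, a mini-batch size that grows at an a priori rate and a learning rate set by a line search, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `sgd_growing_batch`, the slow geometric growth schedule, the Armijo constants, the upper bound `alpha_max` and the noise-free least-squares test problem are all assumptions made for illustration, and only the monotone line search variant is shown.

```python
import numpy as np

def sgd_growing_batch(X, y, n_iters=60, batch0=4, growth=1.05,
                      alpha_max=1.0, shrink=0.5, c=1e-4, seed=0):
    """Least-squares SGD with a pre-scheduled batch size and Armijo backtracking.

    Hypothetical illustration: schedule and constants are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    alpha = alpha_max
    batch_sizes = []
    for k in range(n_iters):
        # A priori schedule (assumed): slow geometric growth, capped at n.
        m = min(n, int(np.ceil(batch0 * growth ** k)))
        idx = rng.choice(n, size=m, replace=False)
        Xb, yb = X[idx], y[idx]

        def loss(v):
            r = Xb @ v - yb
            return 0.5 * np.mean(r ** 2)

        g = Xb.T @ (Xb @ w - yb) / m      # mini-batch gradient
        f0, gnorm2 = loss(w), g @ g
        # Monotone Armijo backtracking on the sampled loss.
        while loss(w - alpha * g) > f0 - c * alpha * gnorm2 and alpha > 1e-10:
            alpha *= shrink
        w = w - alpha * g
        # Let the step grow again at the next iteration, but keep it bounded.
        alpha = min(alpha_max, alpha / shrink)
        batch_sizes.append(m)
    return w, batch_sizes

# Small synthetic problem (assumed data, not from the paper).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w, sizes = sgd_growing_batch(X, y)
print(sizes[0], sizes[-1], np.linalg.norm(w - w_true))
```

Because the learning rate is allowed to increase again after each accepted step, it varies non-monotonically across iterations while staying below `alpha_max`, which mirrors the boundedness requirement stated in the abstract.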

Year of publication: 2024
DOI: https://dx.doi.org/10.1016/j.cam.2024.116083
Journal title: JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS
All authors: Franchini, G.; Porta, F.; Ruggiero, V.; Trombini, I.; Zanni, L.
Appears in categories: 03.1 Journal article

Files in this product:

File	Size	Format
1-s2.0-S0377042724003327-main.pdf (open access; publisher's full text, publisher's version; license: Creative Commons)	717.12 kB	Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2557052
