Controllability of neural ODEs for data classification
Author:
Publication date:
Publisher:
Servicio de Publicaciones de la Universidad de Oviedo
Citation:
Physical description:
Abstract:
In this work, we explore the capacity of neural ordinary differential equations (ODEs) for supervised learning from a control perspective. Specifically, we rely on the property of simultaneous controllability and explicitly construct the controls that achieve it as piecewise constant functions in time. First, we analyze the expressivity of the model for cluster-based classification by estimating the number of neurons required to classify a set of N points. We consider a worst-case scenario in which these points are independently sampled from U([0,1]^d). Assuming only that the initial points are in general position, we propose an algorithm that classifies clusters of d points simultaneously, employing O(N/d) neurons. Second, we examine the impact of the architecture, determined by the width p and the depth L, on interpolating a set of N pairs of points. Our findings reveal a balance in which L scales as O(1 + N/p). For the autonomous model, with constant controls (L = 0), we relax the problem to approximate controllability of N pairs of points, establishing an explicit error decay with respect to p. Finally, we extend the problem to the approximate control of measures in the Wasserstein space, finding another balance between p and L.
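For reference, here is a minimal LaTeX sketch of the model the abstract refers to. It assumes the neural ODE formulation with piecewise constant controls that is standard in this line of work; the specific notation (width p as the number of neurons, depth L as the number of control switches, activation sigma) is an assumption made for illustration, not taken from the record itself.

% A minimal, compilable sketch (assumption: standard neural ODE form with
% width p and piecewise constant controls switching L times on [0, T]).
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
The state $x(t) \in \mathbb{R}^d$ evolves according to
\begin{equation*}
  \dot{x}(t) = W(t)\,\sigma\bigl(A(t)\,x(t) + b(t)\bigr),
  \qquad t \in (0, T), \qquad x(0) = x_0,
\end{equation*}
where $W(t) \in \mathbb{R}^{d \times p}$, $A(t) \in \mathbb{R}^{p \times d}$,
and $b(t) \in \mathbb{R}^{p}$ are the controls, $p$ is the width (number of
neurons), and $\sigma$ is an activation applied componentwise. Piecewise
constant controls with $L$ switches mean
\begin{equation*}
  (W, A, b)(t) = (W_k, A_k, b_k)
  \quad \text{for } t \in [t_k, t_{k+1}), \qquad
  0 = t_0 < t_1 < \dots < t_{L+1} = T,
\end{equation*}
so depth $L = 0$ recovers the autonomous model with constant controls.
\end{document}

Under this reading, simultaneous controllability amounts to choosing the finitely many triples (W_k, A_k, b_k) so that the flow at time T maps each of the N data points to its prescribed target, which is how the depth-width balance L = O(1 + N/p) quoted in the abstract should be understood.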
ISBN:
Link to related resource: