## On computational barriers in optimisation and the paradoxes of deep learning

#### Dr. Anders Hansen (Applied Functional and Harmonic Analysis group, Cambridge Centre for Analysis, Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge, UK.)

Frank Adams 1,

Linear and semidefinite programming (LP, SDP), regularisation with Basis Pursuit and Lasso, as well as neural networks in deep learning have seen great success in mathematics, learning, statistics and data science. The success of LP is traditionally described by the fact that it is in P for rational inputs. However, in the list of problems for the 21st century Smale calls for "[Computational] models which process approximate inputs and which permit round-off computations". Hence, we will discuss the following paradox: It is impossible to design accurate algorithms for the above problems when given inaccurate, yet well-conditioned input data with arbitrarily small inaccuracies. The paradox implies the following failure for any algorithm: For fixed dimensions and any small accuracy parameter epsilon > 0, one can choose any time T and find an input such that the runtime > T without having reached epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt to achieve an epsilon accurate solution. The largest epsilon for which this failure happens is called the Breakdown-epsilon. Typically the Breakdown-epsilon > 1/2 even for randomised algorithms when the input is bounded by one and well-conditioned. Moreover, it is impossible to design an algorithm to check whether the algorithm will fail on a given input. This is in fact strictly harder than solving the original problem. Thus, e.g. even if given an oracle for solving LP accurately, one cannot determine if a given algorithm for solving LP produces nonsense on its input.

The paradox implies two surprises: (1) In mathematics, statistics and learning, one computes non-computable problems on a daily basis. (2) In order to mathematically characterise this phenomenon, there is an intricate complexity theory for, paradoxically, non-computable problems. In particular, the key to explaining the success (and failure) of many algorithms in real-world scenarios lies in this rich classification theory, where sparsity turns out to be a key ingredient.

The above result leads to the paradoxes of deep learning: (1) One cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima. Moreover, (2) one can have 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set.