RELU DEEP NEURAL NETWORKS AND LINEAR FINITE ELEMENTS

Juncai He, Lin Li, Jinchao Xu, Chunyue Zheng

Journal of Computational Mathematics, 2020, Vol. 38, Issue 3: 502-527. DOI: 10.4208/jcm.1901-m2018-0160

Abstract

In this paper, we investigate the relationship between deep neural networks (DNNs) with the rectified linear unit (ReLU) activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions arising from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of any CPWL function in FEM. We theoretically establish that at least two hidden layers are needed in a ReLU DNN to represent any linear finite element function in Ω ⊆ Rd when d ≥ 2. Consequently, for d = 2, 3, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. We then give a detailed account of how a general CPWL function in Rd can be represented by a ReLU DNN with at most ⌈log2(d + 1)⌉ hidden layers, together with an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNNs and FEM, we theoretically argue that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present numerical results for using ReLU DNNs to solve a two-point boundary value problem, demonstrating the potential of applying DNNs to the numerical solution of partial differential equations.
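The DNN representation of FEM nodal basis functions mentioned above can be illustrated in one dimension: on uniform nodes, the linear finite element hat function is exactly a one-hidden-layer ReLU network with three neurons. The sketch below (an illustrative example, not code from the paper; the node spacing h and node positions are assumed) checks the identity numerically.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hat (nodal basis) function with nodes x_{i-1} = 0, x_i = h, x_{i+1} = 2h,
# written exactly as a ReLU combination:
#   phi(x) = (1/h) * [ relu(x) - 2*relu(x - h) + relu(x - 2h) ]
h = 0.5

def hat_relu(x):
    return (relu(x) - 2.0 * relu(x - h) + relu(x - 2.0 * h)) / h

def hat_direct(x):
    # The same basis function in its standard piecewise-linear form.
    return np.where((x >= 0) & (x <= h), x / h,
           np.where((x > h) & (x <= 2 * h), (2 * h - x) / h, 0.0))

xs = np.linspace(-1.0, 2.0, 301)
assert np.allclose(hat_relu(xs), hat_direct(xs))  # the two forms agree everywhere
print(hat_relu(np.array([h]))[0])  # peak value 1.0 at the node x_i
```

Any 1D linear finite element function, as a linear combination of such hat functions, is therefore itself a shallow ReLU network; the paper's contribution concerns how this picture changes in dimension d ≥ 2, where two hidden layers become necessary.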

Key words

Finite element method / Deep neural network / Piecewise linear function

Cite this article

Download Citations
Juncai He, Lin Li, Jinchao Xu, Chunyue Zheng. RELU DEEP NEURAL NETWORKS AND LINEAR FINITE ELEMENTS. Journal of Computational Mathematics, 2020, 38(3): 502-527 https://doi.org/10.4208/jcm.1901-m2018-0160


Funding

This work is partially supported by the Beijing International Center for Mathematical Research, the Elite Program of Computational and Applied Mathematics for PhD Candidates of Peking University, NSFC Grant 91430215, and NSF Grants DMS-1522615 and DMS-1819157.
