Since their introduction in the period of 2017-2019, physics-informed neural networks (PINNs) have become a very popular area of research in the scientific machine learning (SciML) community [1,2]. PINNs are used to solve ordinary and partial differential equations (PDEs) by representing the unknown solution field with a neural network, and finding the weights and biases (parameters) of the network by minimizing a loss function based on the governing differential equation. For example, the original PINNs approach penalizes the sum of pointwise errors of the governing PDE, whereas the Deep Ritz method minimizes an “energy” functional whose minimum enforces the governing equation [3]. Another alternative is to discretize the solution with a neural network, then construct the weak form of the governing equation using adversarial networks [4] or polynomial test functions [5,6]. Regardless of the choice of physics loss, neural network discretizations have been successfully used to analyze a number of systems governed by PDEs. From the Navier-Stokes equations [7] to conjugate heat transfer [8] and elasticity [9], PINNs and their variants have proven themselves a worthy addition to the computational scientist’s toolkit.
As is the case with all machine learning problems, an essential ingredient of obtaining robust and accurate solutions with PINNs is hyperparameter tuning. The analyst has much freedom in constructing a solution method—as discussed above, the choice of physics loss function is not unique, nor is the optimizer, technique of boundary condition enforcement, or neural network architecture. For example, while ADAM has historically been the go-to optimizer for machine learning problems, there has been a surge of interest in second-order Newton-type methods for physics-informed problems [10,11]. Other studies have compared techniques for enforcing boundary conditions on the neural network discretization [12]. Best practices for the PINNs architecture have primarily been investigated through the choice of activation functions. To combat the spectral bias of neural networks [13], sinusoidal activation functions have been used to better represent high-frequency solution fields [14,15]. In [16], a number of standard activation functions were compared on compressible fluid flow problems. Activation functions with partially learnable features were shown to improve solution accuracy in [17]. While most PINNs rely on multi-layer perceptron (MLP) networks, convolutional networks were investigated in [18] and recurrent networks in [19].
The studies referenced above are far from an exhaustive list of works investigating the choice of hyperparameters for physics-informed training. However, these studies demonstrate that the loss function, optimizer, activation function, and basic class of network architecture (MLP, convolutional, recurrent, etc.) have all received attention in the literature as interesting and important components of the PINN solution framework. One hyperparameter that has seen comparatively little scrutiny is the size of the neural network discretizing the solution field. In other words, to the best of our knowledge, there are no published works that ask the following question: how many parameters should the physics-informed network consist of? While this question is in some sense obvious, the community’s lack of interest in it is not surprising—there is no price to pay in solution accuracy for an overparameterized network. In fact, overparameterized networks can provide beneficial regularization of the solution field, as is seen with the phenomenon of double descent [20]. Furthermore, in the context of data-driven classification problems, overparameterized networks have been shown to lead to smoother loss landscapes [21]. Because solution accuracy provides no incentive to drive the size of the network down, and because the optimization problem may actually favor overparameterization, many authors use very large networks to represent PDE solutions.
While the accuracy of the solution only stands to gain by increasing the parameter count, the computational cost of the solution does scale with the network size. In this study, we take three examples from the PINNs literature and show that networks with orders of magnitude fewer parameters are capable of satisfactorily reproducing the results of the larger networks. The conclusion from these three examples is that, in the case of low-frequency solution fields, small networks can obtain accurate solutions with decreased computational cost. We then provide a counterexample, where regression to a complex oscillatory function continues to benefit from increases in the network size. Thus, our suggestion is as follows: a PINN should have as few parameters as possible, but no fewer.
The first three examples are inspired by problems taken from the PINNs literature. In these works, large networks are used to obtain the PDE solution, where the size of the network is measured by the number of parameters. While different network architectures may perform differently with different parameter counts, we use this metric as a proxy for network complexity, independent of the architecture. In our examples, we incrementally decrease the parameter count of a multilayer perceptron network until the error with a reference solution begins to increase. This point represents a lower limit on the network size for the particular problem, and we compare the number of parameters at this point to the number of parameters used in the original paper. In each case, we find that the networks from the literature are overparameterized by at least an order of magnitude. In the fourth example, we solve a regression problem to show how small networks can fail to represent oscillatory fields, which acts as a caveat to our findings.
The phase field model of fracture is a variational approach to fracture mechanics, which simultaneously finds the displacement and damage fields by minimizing a suitably defined energy functional [22]. Our study is based on the one-dimensional example problem given in [23], which uses the Deep Ritz method to determine the displacement and damage fields that minimize the fracture energy functional.

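A standard one-dimensional AT2-type form of this functional, written here with unit elastic stiffness and unit fracture toughness for concreteness (the normalization used in [23] may differ), is
\[ \Pi\Big(u(x), \alpha(x)\Big) = \Pi^u + \Pi^{\alpha} = \frac{1}{2}\int_0^1 (1-\alpha)^2 \Big(\frac{du}{dx}\Big)^2 dx + \frac{1}{2}\int_0^1 \Big(\frac{\alpha^2}{\ell} + \ell \Big(\frac{d\alpha}{dx}\Big)^2\Big) dx,\]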
where \( x \) is the spatial coordinate, \( u(x) \) is the displacement, \( \alpha(x)\in[0,1] \) is the crack density, and \( \ell \) is a length scale determining the width of smoothed cracks. The energy functional comprises two components \( \Pi^u \) and \( \Pi^{\alpha} \), which are the elastic and fracture energies respectively. As in the cited work, we take \(\ell=0.05\). The displacement and phase field are discretized with a single neural network \( N: \mathbb R \rightarrow \mathbb R^2 \) with parameters \(\boldsymbol \theta\). The problem is driven by an applied tensile displacement on the right end, which we denote \( U \). Boundary conditions are built into the two fields with
\[ \begin{bmatrix}
u(x ; \boldsymbol \theta) \\ \alpha (x; \boldsymbol \theta)
\end{bmatrix} = \begin{bmatrix} x(1-x) N_1(x;\boldsymbol \theta) + Ux \\ x(1-x) N_2(x; \boldsymbol \theta)
\end{bmatrix} ,\]
where \(N_i \) refers to the \(i\)-th output of the network and the Dirichlet boundary conditions on the crack density are used to suppress cracking at the edges of the domain. In [23], a four hidden-layer MLP network with a width of 50 is used to represent the two solution fields. If we neglect the bias at the final layer, this corresponds to \(7850\) trainable parameters. For all of our studies, we use a two-hidden layer network with hyperbolic tangent activation functions and no bias at the output layer, as, in our experience, these networks suffice to represent any solution field of interest. If both hidden layers have width \(M\), the total number of parameters in this network is \(M^2+5M\). When \(M=86\), we obtain \(7826\) trainable parameters. In the absence of an analytical solution, we use this as the large network reference solution to which the smaller networks are compared.
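For concreteness, the sketch below shows one way to set up this architecture and the hard boundary conditions, assuming a PyTorch implementation; it is a minimal illustration rather than a definitive implementation.

```python
import torch
import torch.nn as nn

class TwoLayerMLP(nn.Module):
    """Two hidden layers of width M with tanh activations and no bias on the output layer."""
    def __init__(self, in_dim, out_dim, width):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, out_dim, bias=False),
        )

    def forward(self, x):
        return self.net(x)

def fields(x, net, U):
    """Displacement and crack density with the boundary conditions built in."""
    N = net(x)
    u = x * (1.0 - x) * N[:, 0:1] + U * x        # u(0) = 0, u(1) = U
    alpha = x * (1.0 - x) * N[:, 1:2]            # alpha(0) = alpha(1) = 0
    return u, alpha

net = TwoLayerMLP(1, 2, 86)
print(sum(p.numel() for p in net.parameters()))  # M**2 + 5M = 7826 for M = 86
```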
To generate the solution fields, we minimize the total potential energy using ADAM optimization with a learning rate of \(1 \times 10^{-3}\). Total fracture of the bar is observed around \(U=0.6\). As in the paper, we compute the elastic and fracture energies over a range of applied displacements, where ADAM is run for \(3500\) epochs at each displacement increment to obtain the solution fields. These “loading curves” are used to compare the performance of networks of different sizes. Our experiment is conducted with \(8\) different network sizes, each comprising \(20\) increments of the applied displacement to build the loading curves. See Figure 1 for the loading curves computed with the different network sizes. Only when there are \(|\boldsymbol \theta|=14\) parameters, which corresponds to a network of width \(2\), do we see a divergence from the loading curves of the large network reference solution. The smallest network that performs well has \(50\) parameters, which is \(157\times\) smaller than the network used in the paper. Figure 2 confirms that this small network is capable of approximating the discontinuous displacement field, as well as the localized damage field.
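Continuing the sketch above (reusing `net` and `fields`), the loading curves can be generated by minimizing the energy at each displacement increment. The energy routine below is a quadrature of the AT2-type functional assumed earlier, so it is a stand-in rather than the exact functional from [23], and the displacement range is illustrative.

```python
import torch

def at2_energy(u, alpha, x, ell=0.05):
    """Collocation quadrature of the (assumed) AT2-type functional with unit stiffness/toughness."""
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    dalpha = torch.autograd.grad(alpha.sum(), x, create_graph=True)[0]
    elastic = 0.5 * ((1 - alpha) ** 2 * du ** 2).mean()
    fracture = 0.5 * (alpha ** 2 / ell + ell * dalpha ** 2).mean()
    return elastic + fracture

def solve_increment(net, fields, energy, x, U, epochs=3500, lr=1e-3):
    """Minimize the energy at a fixed applied displacement U with ADAM."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        u, alpha = fields(x, net, U)
        energy(u, alpha, x).backward()
        opt.step()

# Build the loading curves by sweeping the applied displacement (range illustrative).
x = torch.linspace(0.0, 1.0, 1000).reshape(-1, 1).requires_grad_(True)
for U in torch.linspace(0.0, 0.8, 20):
    solve_increment(net, fields, at2_energy, x, float(U))
    # record the elastic and fracture energies here to plot the loading curves
```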

Figure 1: The loading curves agree with the reference solution (grey) for all but the network with \(14\) trainable parameters. Measuring performance in this way, the large network is overparameterized by a factor of \(157\). All figures in this article are by the author.

Figure 2: A network with \(|\boldsymbol \theta|=50\) parameters can represent the discontinuous displacement field as well as the narrow band of damage. This example suggests that small networks perform well even on problems with localized features.
We now study the effect of network size on a common model problem from fluid mechanics. Burgers’ equation is frequently used to test numerical solution methods because of its nonlinearity and tendency to form sharp features. The viscous Burgers’ equation with homogeneous Dirichlet boundaries is given by
\[\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \quad u(x,0) = u_0(x), \quad u(-1,t)=u(1,t) = 0,\]
where \(x\in[-1,1]\) is the spatial domain, \(t\in[0,T]\) is the time coordinate, \(u(x,t)\) is the velocity field, \(\nu\) is the viscosity, and \(u_0(x)\) is the initial velocity profile. In [24], a neural network discretization of the velocity field is used to obtain a solution to the governing differential equation. Their network contains \(3\) hidden layers with \(64\) neurons in each layer, corresponding to \(8576\) trainable parameters. Again, we use a two hidden-layer network that has \(5M+M^2\) trainable parameters where \(M\) is the width of each layer. If we take \(M=90\), we obtain \(8550\) trainable parameters in our network. We take the solution from this network to be the reference solution \(u_{\text{ref}}(x,t)\), and compute the discrepancy between velocity fields from smaller networks. We do this with an error function given by
\[ E \Big( u(x,t)\Big)= \frac{\int_{\Omega}| u(x,t) - u_{\text{ref}}(x,t)| \, d\Omega}{\int_{\Omega}| u_{\text{ref}}(x,t)| \, d\Omega},\]
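One way to evaluate this error is to sample both velocity fields on the same uniform grid in \((x,t)\) and replace the integrals with sums, since the uniform quadrature weights cancel between numerator and denominator. A minimal sketch:

```python
import numpy as np

def relative_l1_error(u, u_ref):
    """Relative L1 discrepancy of two fields sampled on the same uniform (x, t) grid."""
    return np.sum(np.abs(u - u_ref)) / np.sum(np.abs(u_ref))
```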
where \(\Omega = [-1,1] \times [0,T]\) is the computational domain. To solve Burgers’ equation, we adopt the standard PINNs approach and minimize the squared error of the governing equation:
\[ \underset{\boldsymbol \theta}{\text{argmin }} L(\boldsymbol \theta), \quad L(\boldsymbol \theta) = \frac{1}{2} \int_{\Omega} \Big(\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2}\Big)^2 d\Omega.\]
The velocity field is discretized with the help of an MLP network \(N (x,t;\boldsymbol \theta)\), and the boundary and initial conditions are built-in with a distance function-type approach [25]:
\[ u(x,t;\boldsymbol \theta) = (1+x)(1-x)\Big(t N(x,t; \boldsymbol \theta) + u_0(x)(1-t/T)\Big). \]
In this problem, we take the viscosity to be \(\nu=0.01\) and the final time to be \(T=2\). The initial condition is given by \(u_0(x) = -\sin(\pi x)\), which leads to the well-known shock pattern at \(x=0\). We run ADAM optimization for \(1.5 \times 10^{4}\) epochs with a learning rate of \(1.5 \times 10^{-3}\) to solve the optimization problem at each network size. By sweeping over \(8\) network sizes, we again look for the parameter count at which the solution departs from the reference solution. Note that we verify our reference solution against a spectral solver to ensure the accuracy of our implementation. See Figure 3 for the results. All networks with \(|\boldsymbol \theta|\geq 150\) parameters show approximately equal performance. As such, the original network is overparameterized by a factor of \(57\).
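The residual loss and the constrained ansatz above can be assembled with automatic differentiation. The sketch below is an illustrative PyTorch implementation using the settings just described, with a fixed set of random collocation points standing in for the integral over \(\Omega\).

```python
import torch
import torch.nn as nn

def velocity(xt, net, u0, T):
    """Hard-constrained velocity: zero at x = -1 and x = 1, equal to u0(x) at t = 0."""
    x, t = xt[:, 0:1], xt[:, 1:2]
    return (1 + x) * (1 - x) * (t * net(xt) + u0(x) * (1 - t / T))

def residual_loss(xt, net, u0, T, nu):
    """Mean squared residual of u_t + u*u_x - nu*u_xx at the collocation points."""
    xt = xt.clone().requires_grad_(True)
    u = velocity(xt, net, u0, T)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return 0.5 * ((u_t + u * u_x - nu * u_xx) ** 2).mean()

# Settings from the text: nu, T, the initial condition, learning rate, and epoch count.
nu, T = 0.01, 2.0
u0 = lambda x: -torch.sin(torch.pi * x)
net = nn.Sequential(nn.Linear(2, 90), nn.Tanh(), nn.Linear(90, 90), nn.Tanh(),
                    nn.Linear(90, 1, bias=False))            # M = 90: 8550 parameters
xt = torch.rand(10_000, 2) * torch.tensor([2.0, T]) - torch.tensor([1.0, 0.0])
opt = torch.optim.Adam(net.parameters(), lr=1.5e-3)
for _ in range(15_000):
    opt.zero_grad()
    residual_loss(xt, net, u0, T, nu).backward()
    opt.step()
```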

Figure 3: The viscous Burgers’ equation forms a shock at \(x=0\) which decays with time. The network with \(|\boldsymbol \theta|=50\) parameters fails to accurately resolve the shock pattern. All networks larger than this show approximately equal performance in terms of error with the reference solution.
In this example, we consider the nonlinearly elastic deformation of a cube under a prescribed displacement. The strain energy density of a 3D hyperelastic solid is given by the compressible Neohookean model [26] as
\[\Psi\Big( \mathbf{u}(\mathbf{X}) \Big) = \frac{\ell_1}{2}\Big( I_1 - 3 \Big) - \ell_1 \ln J + \frac{\ell_2}{2} \Big( \ln J \Big)^2 ,\]
where \(\ell_1\) and \(\ell_2\) are material properties which we take as constants. The strain energy makes use of the following definitions:
\[ \mathbf{F} = \mathbf{I} + \frac{\partial \mathbf{u}}{\partial \mathbf{X}}, \qquad I_1 = \mathbf{F} : \mathbf{F}, \qquad J = \det(\mathbf{F}),\]
where \(\mathbf{u}\) is the displacement field, \(\mathbf{X}\) is the position in the reference configuration, and \(\mathbf F\) is the deformation gradient tensor. The displacement field is obtained by minimizing the total potential energy, given by
\[ \Pi\Big( \mathbf{u}(\mathbf{X}) \Big) = \int_{\Omega} \Big( \Psi\Big( \mathbf{u}(\mathbf{X}) \Big) - \mathbf{b} \cdot \mathbf{u} \Big) \, d\Omega - \int_{\partial \Omega} \mathbf{t} \cdot \mathbf{u} \, dS,\]
where \(\Omega\) is the undeformed configuration of the body, \(\mathbf{b}\) is a volumetric force, and \(\mathbf{t}\) is an applied surface traction. Our investigation into the network size is inspired by [27], in which the Deep Ritz method is used to obtain a minimum of the hyperelastic total potential energy functional. However, we opt to use the Neohookean model of the strain energy, as opposed to the Lopez-Pamies model they employ. As in the cited work, we take the undeformed configuration to be the unit cube \(\Omega=[0,1]^3\) and we subject the cube to a uniaxial strain state. To enforce this strain state, we apply a displacement \(U\) in the \(X_3\) direction on the top surface of the cube. Roller supports, which zero only the \(X_3\) component of the displacement, are applied on the bottom surface. All other surfaces are traction-free, which is enforced weakly by the chosen energy functional. The boundary conditions are satisfied automatically by discretizing the displacement as
\[ \begin{bmatrix}
u_1(\mathbf{X}; \boldsymbol \theta) \\
u_2(\mathbf{X}; \boldsymbol \theta) \\
u_3(\mathbf{X}; \boldsymbol \theta)
\end{bmatrix} = \begin{bmatrix}
X_3 N_1(\mathbf{X}; \boldsymbol \theta)\\
X_3 N_2(\mathbf{X}; \boldsymbol \theta)\\
\sin(\pi X_3) N_3(\mathbf{X}; \boldsymbol \theta) + UX_3
\end{bmatrix},\]
where \( N_i\) is the \(i\)-th component of the network output. In the cited work, a six hidden-layer network of width \(40\) is used to discretize the three components of the displacement field. This corresponds to \(8480\) trainable parameters. Given that the network is a map \( N: \mathbb R^3 \rightarrow \mathbb R^3\), a two hidden-layer network of width \(M\) has \(8M+M^2\) trainable parameters when no bias is applied at the output layer. Thus, if we take \( M=88 \), our network has \(8448\) trainable parameters. We will take this network architecture to be the large network reference.
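The constrained displacement field and the Neohookean energy can be assembled with automatic differentiation. The sketch below is an illustrative PyTorch implementation using a Monte Carlo estimate of the energy over the unit cube; the implementation in [27] differs.

```python
import torch
import torch.nn as nn

def displacement(X, net, U):
    """Ansatz above: the displacement vanishes on the bottom face (X3 = 0), u3 = U on the top face."""
    N = net(X)
    X3 = X[:, 2:3]
    return torch.cat([X3 * N[:, 0:1],
                      X3 * N[:, 1:2],
                      torch.sin(torch.pi * X3) * N[:, 2:3] + U * X3], dim=1)

def total_energy(X, net, U, l1=1.0, l2=0.25):
    """Neohookean strain energy integrated over the unit cube (no body force or traction)."""
    X = X.clone().requires_grad_(True)
    u = displacement(X, net, U)
    rows = [torch.autograd.grad(u[:, i].sum(), X, create_graph=True)[0] for i in range(3)]
    grad_u = torch.stack(rows, dim=1)               # grad_u[:, i, j] = du_i / dX_j
    F = torch.eye(3) + grad_u                       # deformation gradient
    I1 = (F * F).sum(dim=(1, 2))                    # F : F
    J = torch.linalg.det(F)
    psi = 0.5 * l1 * (I1 - 3.0) - l1 * torch.log(J) + 0.5 * l2 * torch.log(J) ** 2
    return psi.mean()                               # unit volume: the mean approximates the integral

net = nn.Sequential(nn.Linear(3, 88), nn.Tanh(), nn.Linear(88, 88), nn.Tanh(),
                    nn.Linear(88, 3, bias=False))   # M = 88: 8448 parameters
X = torch.rand(4096, 3)                             # collocation points in the unit cube
print(total_energy(X, net, U=0.1))
```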
In [27], the relationship between the normal component of the first Piola-Kirchhoff stress tensor \(\mathbf{P}\) in the direction of the applied displacement and the corresponding component of the deformation gradient \(\mathbf{F}\) was computed to verify their Deep Ritz implementation. Here, we study the relationship between this tensile stress and the applied displacement \(U\). The first Piola-Kirchhoff stress tensor is obtained with the strain energy density as
\[ \mathbf{P} = \frac{ \partial \Psi}{\partial \mathbf{F}} = \ell_1( \mathbf{F} - \mathbf{F}^{-T} ) + \ell_2 \mathbf{F}^{-T}\ln J.\]
Given the unit cube geometry and the uniaxial stress/strain state, the deformation gradient is given by
\[ \mathbf{F} = \begin{bmatrix}
1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1+U
\end{bmatrix}.\]
With these two equations, we compute the tensile stress \(P_{33}\) and the strain energy as a function of the applied displacement to be
\[ \begin{aligned} P_{33} &= \ell_1\Big( 1 + U - \frac{1}{1+U}\Big) + \ell_2\frac{\ln(1+U)}{1+U}, \\ \Pi &= \frac{\ell_1}{2}\Big(2+(1+U)^2-3\Big) - \ell_1 \ln(1+U) + \frac{\ell_2}{2}\Big(\ln(1+U)\Big)^2. \end{aligned}\]
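These closed-form expressions are straightforward to evaluate; for instance, with the material parameters used below (\(\ell_1=1\), \(\ell_2=0.25\)) and an illustrative range of applied displacements:

```python
import numpy as np

def analytic_P33(U, l1=1.0, l2=0.25):
    return l1 * (1 + U - 1 / (1 + U)) + l2 * np.log(1 + U) / (1 + U)

def analytic_energy(U, l1=1.0, l2=0.25):
    return 0.5 * l1 * (2 + (1 + U) ** 2 - 3) - l1 * np.log(1 + U) + 0.5 * l2 * np.log(1 + U) ** 2

U = np.linspace(0.0, 0.5, 10)     # illustrative load steps
print(analytic_P33(U))            # reference loading curve for the tensile stress
print(analytic_energy(U))         # reference loading curve for the strain energy
```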
These analytical solutions can be used to verify our implementation of the hyperelastic model, as well as to gauge the performance of different size networks. Using the neural network model, the tensile stress and the strain energy are computed at each applied displacement with:
\[ P_{33} = \int_{\Omega} \Big[ \ell_1( \mathbf{F} - \mathbf{F}^{-T} ) + \ell_2 \mathbf{F}^{-T}\ln J \Big]_{33} \, d\Omega, \quad \Pi = \int_{\Omega} \Psi\Big( \mathbf{u}(\mathbf{X})\Big) \, d\Omega,\]
where the displacement field is constructed from parameters obtained from the Deep Ritz method. To compute the stress, we average over the entire domain, given that we expect a constant stress state. In this example, the material parameters are set at \(\ell_1=1\) and \(\ell_2=0.25\). We iterate over \(8\) network sizes and take \(10\) load steps at each size to obtain the stress and strain energy as a function of the applied displacement. See Figure 4 for the results. All networks exactly reproduce the strain energy and stress loading curves. This includes even the network of width \(2\), with only \(20\) trainable parameters. Thus, the original network has \(424\times\) more parameters than necessary to represent the results of the tensile test.

Figure 4: Loading curves for the Neohookean hyperelastic solid. Even the smallest network in our test leads to accurate predictions of the stress and strain energy.
In the fourth and final example, we solve a regression problem to show the failure of small networks to fit high-frequency functions. The one-dimensional regression problem is given by
\[ \underset{\boldsymbol \theta}{\text{argmin }} L(\boldsymbol \theta), \quad L(\boldsymbol \theta) = \frac{1}{2}\int_0^1\Big( v(x) - N(x;\boldsymbol \theta) \Big)^2 dx,\]
where \( N\) is a two hidden-layer MLP network and \(v(x)\) is the target function. In this example, we take \(v(x)=\sin^5(20\pi x)\). We iterate over \(5\) different network sizes and report the converged loss value \(L\) as an error measure. We train using ADAM optimization for \(5 \times 10^4\) epochs with a learning rate of \(5 \times 10^{-3}\). See Figure 5 for the results. Unlike the previous three examples, the target function is sufficiently complex that large networks are required to represent it. The converged error decreases monotonically with the parameter count. We also time the training procedure at each network size, and note that the run time (in seconds) grows with the parameter count. This example illustrates that representing oscillatory functions requires larger networks, and that the parameter count drives up the cost of training.
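A sketch of this experiment (our illustrative setup; the swept widths are examples, and the collocation average stands in for the integral):

```python
import torch
import torch.nn as nn

x = torch.rand(2000, 1)                         # collocation points on [0, 1]
v = torch.sin(20 * torch.pi * x) ** 5           # target function

for M in [5, 10, 20, 40, 80]:                   # illustrative network widths
    net = nn.Sequential(nn.Linear(1, M), nn.Tanh(), nn.Linear(M, M), nn.Tanh(),
                        nn.Linear(M, 1, bias=False))
    opt = torch.optim.Adam(net.parameters(), lr=5e-3)
    for _ in range(50_000):
        opt.zero_grad()
        loss = 0.5 * ((v - net(x)) ** 2).mean()  # collocation estimate of the loss above
        loss.backward()
        opt.step()
    print(M, sum(p.numel() for p in net.parameters()), loss.item())
```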

Figure 5: Because of the complexity of the target function, the converged error monotonically decreases with the parameter count, indicating that small networks are not sufficiently expressive. This contrasts with our findings from the PINNs problems, in which the solution fields were not oscillatory.
While tuning hyperparameters governing the loss function, optimization process, and activation function is common in the PINNs community, it is less common to tune the network size. With three example problems taken from the literature, we have shown that very small networks often suffice to represent PDE solutions, even when there are discontinuities and/or other localized features. See Table 1 for a summary of our results on the possibility of using small networks. To qualify our findings, we then presented the case of regression to a high-frequency target function, which required a large number of parameters to fit accurately. Thus, our conclusions are as follows: solution fields which do not oscillate can often be represented by small networks, even when they contain localized features such as cracks and shocks. Because the cost of training scales with the number of parameters, smaller networks can expedite training for physics-informed problems with non-oscillatory solution fields. In our experience, such solution fields appear regularly in practical problems from heat conduction and static solid mechanics. By shrinking the size of the network, these problems and others represent opportunities to render PINN solutions more computationally efficient, and thus more competitive with traditional approaches such as the finite element method.
Table 1: Summary of the overparameterization observed in the three examples taken from the literature.

| Problem | Overparameterization factor |
| --- | --- |
| Phase field fracture [23] | \(157 \times\) |
| Burgers’ equation [24] | \(57 \times\) |
| Neohookean hyperelasticity [27] | \(424 \times\) |
[1] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, December 2018. arXiv:1708.07469 [q-fin].
[2] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, February 2019.
[3] Weinan E and Bing Yu. The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, September 2017. arXiv:1710.00211 [cs].
[4] Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak Adversarial Networks for High-dimensional Partial Differential Equations. Journal of Computational Physics, 411:109409, June 2020. arXiv:1907.08272 [math].
[5] Reza Khodayi-Mehr and Michael M. Zavlanos. VarNet: Variational Neural Networks for the Solution of Partial Differential Equations, December 2019. arXiv:1912.07443 [cs].
[6] E. Kharazmi, Z. Zhang, and G. E. Karniadakis. Variational Physics-Informed Neural Networks For Solving Partial Differential Equations, November 2019. arXiv:1912.00873 [cs].
[7] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. NSFnets (Navier-Stokes Flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics, 426:109951, February 2021.
[8] Shengze Cai, Zhicheng Wang, Sifan Wang, Paris Perdikaris, and George Em Karniadakis. Physics-Informed Neural Networks for Heat Transfer Problems. Journal of Heat Transfer, 143(060801), April 2021.
[9] Min Liu, Zhiqiang Cai, and Karthik Ramani. Deep Ritz method with adaptive quadrature for linear elasticity. Computer Methods in Applied Mechanics and Engineering, 415:116229, October 2023.
[10] Sifan Wang, Ananyae Kumar Bhartari, Bowen Li, and Paris Perdikaris. Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective, September 2025. arXiv:2502.00604 [cs].
[11] Jorge F. Urban, Petros Stefanou, and Jose A. Pons. Unveiling the optimization process of physics informed neural networks: How accurate and competitive can PINNs be? Journal of Computational Physics, 523:113656, February 2025.
[12] Conor Rowan, Kai Hampleman, Kurt Maute, and Alireza Doostan. Boundary condition enforcement with PINNs: a comparative study and verification on 3D geometries, December 2025. arXiv:2512.14941 [math].
[13] Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, and Aaron Courville. On the Spectral Bias of Neural Networks, May 2019. arXiv:1806.08734 [stat].
[14] Mirco Pezzoli, Fabio Antonacci, and Augusto Sarti. Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses. In Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023, pages 2177–2184, January 2024.
[15] Shaghayegh Fazliani, Zachary Frangella, and Madeleine Udell. Enhancing Physics-Informed Neural Networks Through Feature Engineering, June 2025. arXiv:2502.07209 [cs].
[16] Duong V. Dung, Nguyen D. Song, Pramudita S. Palar, and Lavi R. Zuhal. On The Choice of Activation Functions in Physics-Informed Neural Network for Solving Incompressible Fluid Flows. In AIAA SCITECH 2023 Forum. American Institute of Aeronautics and Astronautics. Eprint: https://arc.aiaa.org/doi/pdf/10.2514/6.2023-1803.
[17] Honghui Wang, Lu Lu, Shiji Song, and Gao Huang. Learning Specialized Activation Functions for Physics-informed Neural Networks. Communications in Computational Physics, 34(4):869–906, June 2023. arXiv:2308.04073 [cs].
[18] Zhao Zhang, Xia Yan, Piyang Liu, Kai Zhang, Renmin Han, and Sheng Wang. A physics-informed convolutional neural network for the simulation and prediction of two-phase Darcy flows in heterogeneous porous media. Journal of Computational Physics, 477:111919, March 2023.
[19] Pu Ren, Chengping Rao, Yang Liu, Jianxun Wang, and Hao Sun. PhyCRNet: Physics-informed Convolutional-Recurrent Network for Solving Spatiotemporal PDEs. Computer Methods in Applied Mechanics and Engineering, 389:114399, February 2022. arXiv:2106.14103 [cs].
[20] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, August 2019. Publisher: Proceedings of the National Academy of Sciences.
[21] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, February 2020. arXiv:1806.07572 [cs].
[22] B. Bourdin, G. A. Francfort, and J-J. Marigo. Numerical experiments in revisited brittle fracture. Journal of the Mechanics and Physics of Solids, 48(4):797–826, April 2000.
[23] M. Manav, R. Molinaro, S. Mishra, and L. De Lorenzis. Phase-field modeling of fracture with physics-informed deep learning. Computer Methods in Applied Mechanics and Engineering, 429:117104, September 2024.
[24] Xianke Wang, Shichao Yi, Huangliang Gu, Jing Xu, and Wenjie Xu. WF-PINNs: solving forward and inverse problems of Burgers equation with steep gradients using weak-form physics-informed neural networks. Scientific Reports, 15(1):40555, November 2025. Publisher: Nature Publishing Group.
[25] N. Sukumar and Ankit Srivastava. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Computer Methods in Applied Mechanics and Engineering, 389:114333, February 2022.
[26] Javier Bonet and Richard D. Wood. Nonlinear Continuum Mechanics for Finite Element Analysis. Cambridge University Press, Cambridge, 2 edition, 2008.
[27] Diab W. Abueidda, Seid Koric, Rashid Abu Al-Rub, Corey M. Parrott, Kai A. James, and Nahil A. Sobh. A deep learning energy method for hyperelasticity and viscoelasticity. European Journal of Mechanics – A/Solids, 95:104639, September 2022. arXiv:2201.08690 [cs].