Partial differential equations discovery with EPDE framework: Application for real and synthetic data

doi:10.1016/j.jocs.2021.101345

Journal of Computational Science

Volume 53, July 2021, 101345

https://doi.org/10.1016/j.jocs.2021.101345

Highlights

• Framework for partial differential equation discovery is described.
• Evolutionary algorithm works with sparse regression to achieve a concise PDE model.
• Neural network vs. finite-difference approach is considered.

Abstract

Data-driven methods provide model creation tools for systems where the application of conventional analytical methods is restricted. The proposed method involves the data-driven derivation of a partial differential equation (PDE) for process dynamics, which supports process simulation and study. The paper describes the methods that are used within the EPDE (Evolutionary Partial Differential Equations) partial differential equation discovery framework [1]. The framework involves a combination of evolutionary algorithms and sparse regression. Such an approach is more versatile than other commonly used data-driven partial differential equation derivation methods because it makes fewer assumptions about the resulting equation. This paper highlights the algorithm features that allow processing of noisy data, a setting that resembles the algorithm's real-world applications. This paper is an extended version of the ICCS-2020 conference paper [2].

Introduction

The ability to simulate complex processes despite a lack of knowledge about the system's underlying structure can be vital for developing models in such fields of science as biology, medicine, materials technology, and metocean studies. In contrast to deterministic physics-based models, which are developed by applying conservation laws to the studied process, data-driven modeling (DDM) builds complete models from measurement fields that describe the process, using statistics and machine learning algorithms. Moreover, on some occasions, DDM can enhance existing physics-based models with supplementary expressions or refined weight values [3]. In fluid dynamics and hydrometeorology, the development of surrogate models is the most common application of data-driven algorithms.

This paper focuses on methods of data-driven differential equation discovery. Differential equations can, in some cases, be interpreted by an expert either in the application field or in differential equations. Moreover, the well-developed methods of mathematical physics for differential equation analysis may help interpret the equations. In most cases, current algorithms apply sparse regression to a prescribed library of differential terms [4], [5]. The second popular approach is neural network algorithms for differential equation discovery [6], [7], [8].
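To make the "sparse regression in a prescribed library" idea concrete, the following is a minimal sketch and not the EPDE algorithm. It assumes a synthetic solution of the heat equation, a hand-picked library of four candidate terms, and sequentially thresholded least squares as the sparse regressor; the function name `thresholded_least_squares`, the threshold value, and the library contents are illustrative assumptions.

```python
import numpy as np

# Synthetic field: a sum of two decaying modes that solves u_t = u_xx exactly.
t = np.linspace(0.0, 1.0, 101)
x = np.linspace(0.0, 2.0 * np.pi, 201)
T, X = np.meshgrid(t, x, indexing="ij")
u = np.exp(-T) * np.sin(X) + np.exp(-4.0 * T) * np.sin(2.0 * X)

# Finite-difference derivatives on the grid.
u_t = np.gradient(u, t, axis=0)
u_x = np.gradient(u, x, axis=1)
u_xx = np.gradient(u_x, x, axis=1)

# Prescribed library of candidate terms (an illustrative choice).
names = ["u", "u_x", "u_xx", "u*u_x"]
library = np.stack([u, u_x, u_xx, u * u_x], axis=-1)

# Drop points near the boundary, where one-sided differences are less accurate.
inner = (slice(2, -2), slice(2, -2))
A = library[inner].reshape(-1, len(names))
b = u_t[inner].ravel()


def thresholded_least_squares(A, b, threshold=0.1, n_iter=10):
    """Refit by least squares, zeroing weights below the threshold each pass."""
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < threshold
        w[small] = 0.0
        if small.all():
            break
        w[~small] = np.linalg.lstsq(A[:, ~small], b, rcond=None)[0]
    return w


weights = thresholded_least_squares(A, b)
terms = [f"{w:.3f}*{n}" for w, n in zip(weights, names) if w != 0.0]
print("u_t =", " + ".join(terms))
```

On this noise-free grid the fit is expected to keep only the `u_xx` term with a weight close to 1. The result depends entirely on the library containing the right term, which is the assumption that library-based methods make about the resulting equation.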

We consider discovered models as the surrogate models that could be applied to the hydrometeorological examples. Various approaches to surrogate modeling are described below, including differential equations discovery.

The modern surrogate models tend to belong to one of three major groups [9]:

• Data-driven empirical approximations of the deterministic model outputs. These models use conclusions obtained with the statistical or machine learning tools (response surfaces, kriging) applied to the data.
• Reduced-order models, which are based on the projection of the model's main equations to a subspace with reduced dimensionality, using various orthogonal decompositions.
• Multifidelity models: simplifications of representing the complex physics of the model's process by omitting the less significant subprocesses or increasing the model's scale. In some cases, the experimental setup requires applying models with different fidelity levels to evaluate multiple scales of processes or modeling ensemble [10], [11].

In this research, we are interested in developing a new approach that belongs to the first class of models. However, applications in the natural sciences require the model to be robust and to work in high-dimensional space to handle spatio-temporal and other types of variability. Moving from the single spatial dimension usually considered in the literature to higher spatial dimensions requires the algorithm to handle exponentially growing noise levels.

In previous work [12], we described the EPDE (Evolutionary Partial Differential Equations) approach, which provides a flexible yet efficient tool for data-driven equation derivation. This work increases the problem's difficulty by introducing higher-dimensional cases and high-magnitude noise in the data.

This version extends the conference paper [2] and introduces a series of experiments that allow a better comparison of the EPDE framework with its analogs. The modular structure of the algorithm, briefly described in Section 6, makes it possible, for example, to use a differentiation scheme other than finite differences. We demonstrate this with neural networks and automatic differentiation in Section 7.

This paper is organized as follows: Section 2 briefly introduces the existing surrogate modeling approaches. Section 3 describes the problem of data-driven PDE discovery, and Section 4 describes the practical realization. In Section 5, numerical examples on synthetic and real data are shown. Section 6 presents the additions to the method described in the previous article [12], which allow dealing with higher-dimensional data-driven PDE discovery. Section 7 illustrates the modular structure with experiments that replace the differentiation module with a neural network approximation. Section 8 concludes the paper.

Section snippets

Related work

The first examples of data-driven surrogate modeling in hydrometeorology appeared in its earliest stages, with the understanding that the contemporary full-scale models required computational power inaccessible to many research teams. The original approaches were based on pattern scaling, that is, the extension of the present trend obtained from the ensemble of full-scale models [13], [14]. The statistical emulation on the base of an ensemble of pre-computed deterministic models has

Problem statement

The class of problems that the described EPDE algorithm can solve can be summarized as follows: the process, which involves a scalar field $u$, occurs in the area $\Omega$ and is governed by the partial differential equation (1). However, there is no a priori information about the dynamics of the process except that some form of PDE can describe it (for simplicity, we consider the case of a temporally varying 2D field, even though the problem could be formulated for an arbitrary field). In recent

Method description

In this section, the details of the evolutionary method of partial differential equation derivation are described. The proposed method involves a combination of evolutionary algorithms and sparse regression to detect the equation structure. The sparse regression aims to construct the set of equation terms, while the evolutionary algorithm is focused on selecting significant terms from the created set and calculating the weights that will be present in the resulting equation. At first, we introduce the

Synthetic data

a) Wave equation. The analysis of the algorithm performance is carried out on synthetic data. This simplification can show the result's response to various types and magnitudes of noise, which are generally unknown for measurement data. As in the previous studies, the solution of the wave equation with two spatial variables Eq. (14), where $t$ is time, $x$, $y$ are spatial coordinates, $u$ is the studied function (for example, small out-of-plane membrane displacement), and $α_{1} = α_{2} = 1$ was taken as the synthetic

EPDE framework description

The framework, which encompasses the described method, is designed to allow the user to customize the algorithm's significant elements while providing a default pipeline and the necessary tools for differential equation discovery. The setup of the equation discovery experiment requires the selection of functions (tokens) that form the pool from which the algorithm creates the candidate equations. The main element that has to be defined is obtaining the function values on the set of processed points

Neural networks approximation with automatic differentiation

This section is dedicated to changing the differentiation method. The proposed algorithm has a modular structure. Thus, we may replace the differentiation algorithm, based on finite differences or analytical differentiation of polynomials, with a neural network approximation followed by automatic differentiation.
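The idea of an interchangeable differentiation module can be sketched as follows. This is an illustration under stated assumptions and does not use the EPDE API: the two functions `finite_difference` and `local_polynomial` are hypothetical stand-ins for the two baseline options named above, and the test signal, noise level, window size and polynomial degree are arbitrary. A neural network approximation with automatic differentiation is not shown; it would plug in as a third function with the same signature.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Noisy samples of u(x) = sin(x); the exact derivative is cos(x).
x = np.linspace(0.0, 2.0 * np.pi, 400)
u_noisy = np.sin(x) + rng.normal(scale=0.01, size=x.size)


def finite_difference(x, u):
    """Module 1: finite differences applied directly to the samples."""
    return np.gradient(u, x)


def local_polynomial(x, u, half_window=20, degree=3):
    """Module 2: fit a polynomial in a sliding window, differentiate it analytically."""
    du = np.empty_like(u)
    for i in range(x.size):
        lo, hi = max(0, i - half_window), min(x.size, i + half_window + 1)
        poly = Polynomial.fit(x[lo:hi], u[lo:hi], degree)
        du[i] = poly.deriv()(x[i])
    return du


def derivative_error(differentiate):
    """Root-mean-square error of a differentiation module against cos(x)."""
    return np.sqrt(np.mean((differentiate(x, u_noisy) - np.cos(x)) ** 2))


err_fd = derivative_error(finite_difference)
err_poly = derivative_error(local_polynomial)
print(f"finite differences RMSE: {err_fd:.4f}")
print(f"local polynomial RMSE:  {err_poly:.4f}")
```

Because both modules share the signature `(x, u) -> du`, the rest of a discovery pipeline does not need to know which one is used. On noisy input the smoothed polynomial derivative is expected to be considerably more accurate than raw finite differences, which amplify the noise by a factor on the order of the inverse grid step.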

Conclusion

The proposed method has proven to be suitable for the data-driven derivation of equations that can model various physical processes. The robustness of the algorithm to noise in the input data, provided by improved data preprocessing, makes the framework applicable to real-world problems. Even in cases of substantial noise in the input data, the resulting equations had the correct structure and, therefore, can correctly describe the studied system. Other notable points about the

Declaration of interests

None.

Acknowledgements

This research is financially supported by The Russian Scientific Foundation, Agreement #19-71-00150.

Mikhail Maslyaev is an engineer and PhD student at the Nature System Simulation lab, National Centre for Cognitive Research, ITMO University, Russia. Mikhail is working on his thesis on an evolutionary algorithm for PDE discovery. Mikhail is the creator and maintainer of the EPDE framework repository.

References (26)

T. Qin et al., Data driven governing equations approximation using deep neural networks, J. Comput. Phys. (2019)
J. Berg et al., Data-driven discovery of PDEs in complex datasets, J. Comput. Phys. (2019)
I. Knowles et al., On the recovery of multiple flow parameters from transient head data, J. Comput. Appl. Math. (2004)
K. Hornik et al., Multilayer feedforward networks are universal approximators, Neural Networks (1989)
NSS Team, Fedot E* algorithms, https://github.com/ITMO-NSS-team/FEDOT.Algs...
M. Maslyaev et al., Data-driven partial differential equations discovery approach for the noised multi-dimensional data
J. Berg, K. Nyström, Neural network augmented inverse problems for PDEs, arXiv preprint arXiv:1712.09685 (2017)....
H. Schaeffer et al., Learning partial differential equations via data discovery and sparse optimization, Proc. Royal Soc. A: Math. Phys. Eng. Sci. (2017)
S.H. Kang, W. Liao, Y. Liu, Ident: Identifying differential equations with numerical time evolution, arXiv preprint...
Z. Long et al., PDE-net: Learning PDEs from data, International Conference on Machine Learning (2018)
M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, J. Mach. Learn. Res. (2018)
M.J. Asher et al., A review of surrogate models and their application to groundwater modeling, Water Resour. Res. (2015)
M.P. Rumpfkeil et al., Multi-fidelity surrogate models for flutter database generation, Comput. Fluids (2020)


Dr. Alexander Hvatov is a senior researcher at the Nature System Simulation lab, National Centre for Cognitive Research, ITMO University, Russia. Alex is mostly interested in classical mathematical models such as partial differential equations and in mathematical physics methods. Another direction of Alex's research is wave propagation in periodic structures.

Dr. Anna Kalyuzhnaya is the head of the Nature System Simulation lab, National Centre for Cognitive Research, ITMO University, Russia, and an assistant professor. Anna is mostly interested in statistical and machine learning methods applied in different fields of the natural and social sciences.

^☆: The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
