
Combining neural computation and genetic programming for observational causality detection and causal modelling

Artificial Intelligence Review

Abstract

A methodology has been developed to determine the causal relations between time series and to derive the set of equations describing the interacting systems. The proposed techniques are completely data driven and are based on ensembles of Time Delay Neural Networks (TDNNs) and Symbolic Regression (SR) via Genetic Programming (GP). With regard to the detection of causal influences and the identification of graphical causal networks, the developed tools perform better than those reported in the literature. For example, the TDNN ensembles can cope with evolving systems, non-Markovianity, feedback loops and multicausality. In turn, on the basis of the information derived from the TDNN ensembles, SR via GP allows identifying the set of equations, i.e. the detailed model of the interacting systems. Numerical tests and real-life examples from various disciplines prove the power and versatility of the developed tools, which are capable of handling tens of time series and even images. The excellent results obtained emphasize the importance of recording the time evolution of signals, which would allow a much better understanding of many issues, ranging from the physical to the social and medical sciences.


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Author information


Corresponding author

Correspondence to Riccardo Rossi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Theoretical interpretation of the indicators for causality detection

To analyse the properties of the statistical indicators introduced in Sect. 3, let us denote by y the time series under study, which can be represented by the following generic function:

$${y}_{i}=f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}}\right)+\epsilon$$

where i indicates the time axis, \({x}_{j}\) is the j-th driver of the time series, and \(\epsilon\) represents all the stochastic processes influencing the measurements (for example random noise), assumed additive and independent. The causality detection procedure consists of calculating the prediction error in two cases: first using all the expected drivers (“all”), and secondly removing from the network one driver, identified by the index “k”. The prediction using all the drivers is indicated by yp,all, while the prediction without the k-th driver is indicated by yp,k. With this notation, the overall prediction errors can be written as follows:

$${E}_{all}=\frac{1}{N}\sum {\left({y}_{i}-{y}_{p,all}\right)}^{2}=\frac{1}{N}\sum {\left(f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\right)+\epsilon -f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\right)\right)}^{2}=\frac{1}{N}\sum {\epsilon }^{2}$$
$${E}_{k}=\frac{1}{N}\sum {\left({y}_{i}-{y}_{p,k}\right)}^{2}=\frac{1}{N}\sum {\left(f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\right)+\epsilon -f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}}\right)\right)}^{2}=\frac{1}{N}\sum {\left(f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\right)-f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}}\right)\right)}^{2}+\frac{1}{N}\sum {\epsilon }^{2}$$

From the previous equations, it is evident that \({E}_{k}\) is always larger than or equal to \({E}_{all}\). If \({E}_{k}={E}_{all}\), excluding \({x}_{k}\) from the list of regressors does not lead to any prediction degradation, and therefore \({x}_{k}\) cannot have a causal influence on y. On the contrary, if \({E}_{k}>{E}_{all}\), including \({x}_{k}\) improves the prediction, and therefore \({x}_{k}\) can be considered to have a causal influence on y.

In the proposed methodology, \({E}_{k}\) and \({E}_{all}\) are estimated with TDNN ensembles. The motivation for this choice resides in the fact that TDNNs are universal approximators and therefore make it possible to avoid a priori assumptions about the mathematical form of the function f (contrary to classical causality detection algorithms, such as the various versions of Granger causality detection). Their ensembles also significantly improve the statistical estimators of the prediction errors, i.e. the median value (more robust than the average against fluctuations due to outliers) and the standard deviation of the errors across the ensemble.
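
As an illustration of this step, the following Python sketch estimates the two prediction errors with a small ensemble of feed-forward networks on lagged inputs. It is a minimal stand-in for the TDNN ensembles, not the authors' implementation: the lag depth, network size, ensemble size and train/test split are all illustrative assumptions.

```python
# Minimal sketch of the error-estimation step with an ensemble of
# feed-forward networks on lagged inputs (a stand-in for the TDNNs;
# all hyperparameters below are illustrative assumptions).
import numpy as np
from sklearn.neural_network import MLPRegressor

def lagged_matrix(series_list, lags):
    """Stack lags 1..lags of each series into a regressor matrix."""
    n = len(series_list[0])
    cols = [s[lags - l: n - l] for s in series_list for l in range(1, lags + 1)]
    return np.column_stack(cols)

def ensemble_errors(y, drivers, lags=3, n_members=20, seed=0):
    """Median and spread of the out-of-sample MSE over the ensemble."""
    X = lagged_matrix([y] + drivers, lags)
    target = y[lags:]
    half = len(target) // 2          # simple split; the paper's
    errs = []                        # resampling scheme may differ
    for m in range(n_members):
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=seed + m)
        net.fit(X[:half], target[:half])
        errs.append(np.mean((target[half:] - net.predict(X[half:])) ** 2))
    return np.median(errs), np.std(errs)
```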

With the estimators proposed in Sect. 3, two major aspects of causality can be tackled. The first is causality detection, which is a binary problem (does \({x}_{k}\) cause y?); this task is accomplished with a hypothesis test on the difference of means. The corresponding indicator Z is calculated as follows:

$$Z=\frac{{E}_{k}-{E}_{all}}{\sqrt{\frac{{\sigma }_{k}^{2}+{\sigma }_{all}^{2}}{2}}}$$

If \(Z\ge {Z}_{threshold}\), the null hypothesis (that \({x}_{k}\) has no causal influence on y) can be rejected with a significance level α depending on the value of Z; otherwise the null hypothesis cannot be rejected.
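
A minimal helper for this test is sketched below, assuming the ensemble statistics computed as above; the threshold value 1.64, corresponding to a one-sided significance level of roughly 0.05, is an illustrative choice, not necessarily the paper's.

```python
# Significance test on the ensemble error statistics. Z >= 1.64 rejects
# the null at roughly alpha = 0.05 (one-sided); the paper may adopt a
# different threshold.
import numpy as np

def z_indicator(E_k, sigma_k, E_all, sigma_all):
    return (E_k - E_all) / np.sqrt((sigma_k ** 2 + sigma_all ** 2) / 2.0)

def is_causal(E_k, sigma_k, E_all, sigma_all, z_threshold=1.64):
    return z_indicator(E_k, sigma_k, E_all, sigma_all) >= z_threshold
```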

The second objective is causality quantification, which has been tackled with the \({R}_{\sigma }\) indicator, defined as:

$${R}_{\sigma }=\frac{{E}_{k}}{{E}_{all}}=\frac{\sum {\left(f\left({y}_{i-{\Delta }_{y}},{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-{\Delta }_{{x}_{k}}}\right)-f\left({y}_{i-{\Delta }_{y}},{x}_{j,i-{\Delta }_{{x}_{j}}}\right)\right)}^{2}+\sum {\epsilon }^{2}}{\sum {\epsilon }^{2}}\quad \text{(A1)}$$

In some complex cases, equation (A1) can be difficult to evaluate. The problem can often be circumvented by using the Taylor expansion with respect to \({x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\):

$$f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}},{x}_{k,i-1},\dots ,{x}_{k,i-{\Delta }_{{x}_{k}}}\right)=f\left({y}_{i-1},\dots ,{y}_{i-{\Delta }_{y}},{x}_{j,i-1},\dots ,{x}_{j,i-{\Delta }_{{x}_{j}}}\right)+\sum {\left[\frac{\partial f}{\partial {x}_{k,t}}\right]}_{0}\Delta {x}_{k,t}$$

One then obtains:

$${R}_{\sigma }=\frac{{E}_{k}}{{E}_{all}}=\frac{\sum {\left(\sum {\left[\frac{\partial f}{\partial {x}_{k,t}}\right]}_{0}\Delta {x}_{k,t}\right)}^{2}+\sum {\epsilon }^{2}}{\sum {\epsilon }^{2}}\quad \text{(A2)}$$

From the above results, it is clear that the higher the value of the indicator, the stronger the influence of \({x}_{k}\) on the time series \(y\).
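
The detection and quantification steps can be chained as follows, on a purely synthetic toy pair in which \({x}_{k}\) drives y by construction; the snippet relies on the helpers sketched above and all numerical values are illustrative.

```python
# End-to-end toy run of detection (Z) and quantification (R_sigma),
# using ensemble_errors and z_indicator from the sketches above.
import numpy as np

rng = np.random.default_rng(1)
xk = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.6 * y[t - 1] + 0.4 * xk[t - 1] + 0.1 * rng.normal()

E_all, s_all = ensemble_errors(y, [xk])   # all drivers included
E_k, s_k = ensemble_errors(y, [])         # candidate driver removed
print("Z =", z_indicator(E_k, s_k, E_all, s_all))
print("R_sigma =", E_k / E_all)           # > 1 signals a causal influence
```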

In some cases, the above relations provide an immediate interpretation of the methodology. In the following, we report various cases that have also been analysed in the main text of the paper and can be addressed directly with equation (A1).

The first example is the “Coupled AR model”, described by the following equations:

$$\begin{array}{c}x\left(t\right)=0.5x\left(t-1\right)+0.2y\left(t-1\right)+{\epsilon }_{x}(t)\\ y\left(t\right)=Cx\left(t-1\right)+0.7y\left(t-1\right)+{\epsilon }_{y}(t)\end{array}$$

where the objective consists of calculating the influence of x on y. With the ensemble, it is possible to derive:

$${E}_{all}=\frac{1}{N}\sum {\left(y-{y}_{p,all}\right)}^{2}=\frac{1}{N}\sum {\epsilon }_{y}^{2}$$
$${E}_{k}=\frac{1}{N}\sum {\left(y-{y}_{p,k}\right)}^{2}=\frac{1}{N}\sum {C}^{2}{\left(x\left(t-1\right)-\overline{x\left(t-1\right)}\right)}^{2}+\frac{1}{N}\sum {\epsilon }_{y}^{2}$$

Then, the \({R}_{\sigma }\) indicator becomes:

$${R}_{\sigma }=\frac{\frac{1}{N}\sum {C}^{2}{\left(x\left(t-1\right)-\overline{x\left(t-1\right)}\right)}^{2}+\frac{1}{N}\sum {\epsilon }_{y}^{2}}{\frac{1}{N}\sum {\epsilon }_{y}^{2}}=1+\frac{{C}^{2}{\sigma }_{x}^{2}}{{\sigma }_{{\epsilon }_{y}}^{2}}$$

Consequently:

$$\begin{array}{c}{R}_{\sigma }=1 \leftrightarrow C=0\\ {R}_{\sigma }>1\leftrightarrow \left|C\right|>0\end{array}$$

Therefore, any value of \({R}_{\sigma }\) significantly higher than one indicates a causal link from x to y.
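
This prediction can be checked numerically. The sketch below simulates the coupled AR model and estimates \({R}_{\sigma }\) with ordinary least squares standing in for the TDNN ensemble (justified here because the model is linear); N, C and the seed are illustrative choices. Note that refitting the restricted model lets the regression absorb part of the x contribution through the correlated y(t−1) history, so the numerical value sits slightly below the analytic mean-substitution value.

```python
# Numerical check of the analytic R_sigma for the coupled AR model.
# OLS stands in for the TDNN ensemble (the model is linear, so OLS
# recovers f in the large-N limit); all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, C = 200_000, 0.3
x, y = np.zeros(N), np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t - 1] + 0.2 * y[t - 1] + rng.normal()
    y[t] = C * x[t - 1] + 0.7 * y[t - 1] + rng.normal()

target = y[1:]

def mse(*regressors):
    X = np.column_stack(list(regressors) + [np.ones(len(target))])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.mean((target - X @ beta) ** 2)

E_all = mse(x[:-1], y[:-1])   # both drivers included
E_k = mse(y[:-1])             # x removed, model refitted
# The refit partially absorbs x through the correlated y(t-1) history,
# so the estimate sits slightly below the mean-substitution value.
print("R_sigma (numerical):", E_k / E_all)
print("R_sigma (analytic): ", 1 + C**2 * np.var(x))
```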

The second example is the Lorenz trivariate case introduced in Sect. 3:

System X:

$$\begin{array}{c}\frac{d{x}_{1}}{dt}=\sigma ({x}_{2}-{x}_{1})\\ \frac{d{x}_{2}}{dt}=r{x}_{1}-{x}_{2}-{x}_{1}{x}_{3}+{c}_{21}{y}_{2}^{2}+{c}_{31}{z}_{2}^{2}\\ \frac{d{x}_{3}}{dt}={x}_{1}{x}_{2}-b{x}_{3}\end{array}$$

System Y:

$$\begin{array}{c}\frac{d{y}_{1}}{dt}=\sigma ({y}_{2}-{y}_{1})\\ \frac{d{y}_{2}}{dt}=r{y}_{1}-{y}_{2}-{y}_{1}{y}_{3}+{c}_{12}{x}_{2}^{2}+{c}_{32}{z}_{2}^{2}\\ \frac{d{y}_{3}}{dt}={y}_{1}{y}_{2}-b{y}_{3}\end{array}$$

System Z:

$$\begin{array}{c}\frac{d{z}_{1}}{dt}=\sigma ({z}_{2}-{z}_{1})\\ \frac{d{z}_{2}}{dt}=r{z}_{1}-{z}_{2}-{z}_{1}{z}_{3}+{c}_{13}{x}_{2}^{2}+{c}_{23}{y}_{2}^{2}\\ \frac{d{z}_{3}}{dt}={z}_{1}{z}_{2}-b{z}_{3}\end{array}$$

where, for the sake of simplicity, the prediction of the system variables themselves, and not of an embedded signal, is used. In this case, the \({R}_{\sigma }\) indicators for all the systems can be written as:

$${R}_{\sigma }\left(Y\to X\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{21}\left({y}_{2}^{2}-\overline{{y}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{x1}}^{2}+{\sigma }_{{\epsilon }_{x2}}^{2}+{\sigma }_{{\epsilon }_{x3}}^{2}}=1+\frac{{c}_{21}^{2}{\sigma }_{{y}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{x1}}^{2}+{\sigma }_{{\epsilon }_{x2}}^{2}+{\sigma }_{{\epsilon }_{x3}}^{2}}$$
$${R}_{\sigma }\left(Z\to X\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{31}\left({z}_{2}^{2}-\overline{{z}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{x1}}^{2}+{\sigma }_{{\epsilon }_{x2}}^{2}+{\sigma }_{{\epsilon }_{x3}}^{2}}=1+\frac{{c}_{31}^{2}{\sigma }_{{z}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{x1}}^{2}+{\sigma }_{{\epsilon }_{x2}}^{2}+{\sigma }_{{\epsilon }_{x3}}^{2}}$$
$${R}_{\sigma }\left(X\to Y\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{12}\left({x}_{2}^{2}-\overline{{x}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{y1}}^{2}+{\sigma }_{{\epsilon }_{y2}}^{2}+{\sigma }_{{\epsilon }_{y3}}^{2}}=1+\frac{{c}_{12}^{2}{\sigma }_{{x}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{y1}}^{2}+{\sigma }_{{\epsilon }_{y2}}^{2}+{\sigma }_{{\epsilon }_{y3}}^{2}}$$
$${R}_{\sigma }\left(Z\to Y\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{32}\left({z}_{2}^{2}-\overline{{z}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{y1}}^{2}+{\sigma }_{{\epsilon }_{y2}}^{2}+{\sigma }_{{\epsilon }_{y3}}^{2}}=1+\frac{{c}_{32}^{2}{\sigma }_{{z}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{y1}}^{2}+{\sigma }_{{\epsilon }_{y2}}^{2}+{\sigma }_{{\epsilon }_{y3}}^{2}}$$
$${R}_{\sigma }\left(X\to Z\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{13}\left({x}_{2}^{2}-\overline{{x}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{z1}}^{2}+{\sigma }_{{\epsilon }_{z2}}^{2}+{\sigma }_{{\epsilon }_{z3}}^{2}}=1+\frac{{c}_{13}^{2}{\sigma }_{{x}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{z1}}^{2}+{\sigma }_{{\epsilon }_{z2}}^{2}+{\sigma }_{{\epsilon }_{z3}}^{2}}$$
$${R}_{\sigma }\left(Y\to Z\right)=1+\frac{\frac{1}{N}\sum {\left({c}_{23}\left({y}_{2}^{2}-\overline{{y}_{2}^{2}}\right)\right)}^{2}}{{\sigma }_{{\epsilon }_{z1}}^{2}+{\sigma }_{{\epsilon }_{z2}}^{2}+{\sigma }_{{\epsilon }_{z3}}^{2}}=1+\frac{{c}_{23}^{2}{\sigma }_{{y}_{2}^{2}}^{2}}{{\sigma }_{{\epsilon }_{z1}}^{2}+{\sigma }_{{\epsilon }_{z2}}^{2}+{\sigma }_{{\epsilon }_{z3}}^{2}}$$

where the results are perfectly in line with the expected ones. For example, when particularised for case 6 of Table 1 of the main text, the following values are assigned to the constants \({c}_{ij}\):

$${c}_{12}={c}_{23}={c}_{31}=0,\qquad {c}_{21},{c}_{32},{c}_{13}\ne 0$$

The \({R}_{\sigma }\) indicators then become:

\({R}_{\sigma }\left(Y\to X\right)>1\); \({R}_{\sigma }\left(Z\to X\right)=1\); \({R}_{\sigma }\left(X\to Y\right)=1\); \({R}_{\sigma }\left(Z\to Y\right)>1\); \({R}_{\sigma }\left(X\to Z\right)>1\); \({R}_{\sigma }\left(Y\to Z\right)=1\)

The extension to all the other trivariate cases is immediate.
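
A sketch of how this trivariate configuration can be generated numerically is given below, using SciPy's integrator. The values of σ, r and b are the classical chaotic Lorenz settings; the coupling strengths and initial conditions are illustrative assumptions, not values taken from the paper.

```python
# Trivariate Lorenz configuration for case 6 (only c21, c32, c13 active),
# integrated with SciPy. Coupling strengths are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

sigma, r, b = 10.0, 28.0, 8.0 / 3.0
c21, c32, c13 = 0.1, 0.1, 0.1      # active couplings: Y->X, Z->Y, X->Z

def rhs(t, s):
    x1, x2, x3, y1, y2, y3, z1, z2, z3 = s
    return [sigma * (x2 - x1), r * x1 - x2 - x1 * x3 + c21 * y2**2,
            x1 * x2 - b * x3,
            sigma * (y2 - y1), r * y1 - y2 - y1 * y3 + c32 * z2**2,
            y1 * y2 - b * y3,
            sigma * (z2 - z1), r * z1 - z2 - z1 * z3 + c13 * x2**2,
            z1 * z2 - b * z3]

# Slightly asymmetric initial conditions keep the three systems distinct.
sol = solve_ivp(rhs, (0.0, 100.0), 1.0 + 0.01 * np.arange(9),
                t_eval=np.linspace(0.0, 100.0, 20_000), rtol=1e-8)
x2, y2, z2 = sol.y[1], sol.y[4], sol.y[7]   # series fed to the detector
```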

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Murari, A., Rossi, R. & Gelfusa, M. Combining neural computation and genetic programming for observational causality detection and causal modelling. Artif Intell Rev 56, 6365–6401 (2023). https://doi.org/10.1007/s10462-022-10320-3

