Jekyll2021-10-29T11:48:44-07:00https://vboussange.github.io/feed.xmlVictor BoussangePhD candidateVictor Boussangebvictor@ethz.chhttps://landecology.ethz.ch/the-group.htmlParameter Inference in Dynamical Systems2021-09-01T00:00:00-07:002021-09-01T00:00:00-07:00https://vboussange.github.io/posts/2021/09/parameter-inference<p align="center">
<img src="/images/misc/param_inference.png" alt="Effect of topology of neutral diversity" style="max-width:400px !important;" />
</p>
<figcaption> Calibrating a system of differential equations on data for the project "<a href="https://vboussange.github.io/research#developping-numerical-schemes-for-solving-high-dimensional-non-local-nonlinear-pdes">What can Economics learn from Biology?</a>". </figcaption>
<p><em>One of the challenges modellers face in biological sciences is to calibrate models in order to match as closely as possible observations and gain predictive power. This can be done via direct measurements through experimental design, but this process is often costly, time consuming and even sometimes not possible.
Scientific machine learning addresses this problem by applying optimisation techniques originally developed within the field of artificial intelligence to mechanistic models, allowing to infer parameters directly from observation data.
In this blog post, I shall explain the basics of this approach, and how the Julia ecosystem has efficiently embedded such techniques into ready to use packages. This promises exciting perspectives for modellers in all areas of environmental sciences.</em></p>
<blockquote>
<p>๐ง This is Work in progress ๐ง</p>
</blockquote>
<p>Dynamical systems are models that allow to reproduce, understand and forecast systems. They connect the time variation of the state of the system to the fundamental processes we believe driving it, that is</p>
\[\begin{equation*}
\text{ time variation of } ๐_t = \sum \text{processes acting on } ๐_t
\end{equation*}\]
<p>where $๐_t$ denotes the state of the system at time $t$. This translates mathematically into</p>
\[\begin{equation}
\partial_t(๐_t) = f_\theta( ๐_t )
\end{equation}\]
<p>where the function $f_\theta$ captures the ensembles of the processes considered, and depend on the parameters $\theta$.</p>
<p>Eq. (1) is a Differential Equation, that can be integrated with respect to time to obtain the state of the system at time $t$ given an initial state $๐_{t_0}$.</p>
\[\begin{equation}
๐_t = ๐_{t_0} + \int_0^t f_\theta( ๐_s ) ds
\end{equation}\]
<p>Dynamical systems have been used for hundreds of years and have successfully captured e.g. the motion of planets (<a href="https://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion#Second_law">second law of Kepler</a>), <a href="https://en.wikipedia.org/wiki/RC_circuit#Natural_response">the voltage in an electrical circuit</a>, population dynamics (<a href="https://en.wikipedia.org/wiki/LotkaโVolterra_equations">Lotka Volterra equations</a>) and morphogenesis (<a href="https://en.wikipedia.org/wiki/Turing_pattern">Turing patterns</a>)โฆ</p>
<p>Such models can be used to <strong>forecast the state of the system in the future</strong>, or can be used in the sense of <strong>virtual laboratories</strong>. In both cases, one of the requirement is that they <strong>reproduce patterns</strong> - at least at a qualitative level. To do so, the modeler needs to find the true parameter combination $\theta$ that correspond to the system under consideration. And this is tricky! In this post we adress this challenge.</p>
<h2 id="model-calibration">Model calibration</h2>
<blockquote>
<p>How to determine $\theta$ so that $\text{simulations} \approx \text{empirical data}$?</p>
</blockquote>
<p>The best way to do that is to design an experiment!</p>
<iframe src="https://giphy.com/embed/0DYipdNqJ5n4GYATKL" width="480" height="360" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe>
<p><a href="https://giphy.com/gifs/BTTF-back-to-the-future-bttf-one-0DYipdNqJ5n4GYATKL"></a></p>
<p>When possible, measuring directly the parameters in a controlled experiment with e.g. physical devices is a great approach. This is a very powerful scientific method, used e.g. in <a href="https://en.wikipedia.org/wiki/General_circulation_model">global circulation models</a> where scientists can measure the water viscosity, the change in water density with respect to temperature, etcโฆ Unfortunately, such direct methods are often not possible considering other systems!</p>
<p>An opposite approach is to infer the parameters undirectly with the empirical data available.</p>
<h3 id="parameter-exploration">Parameter exploration</h3>
<p>One way to find right parameters is to perform parameter exploration, that is, slicing the parameter space and running the model for all parameter combinations chosen. Comparing the simulation results to the empirical data available, one can elect the combination with the higher explanatory power.</p>
<p>But as the parameter space becomes larger (higher number of parameters) this becomes tricky. Such problem is often refered to as the <a href="https://en.wikipedia.org/wiki/Curse_of_dimensionality">curse of dimensionality</a></p>
<p>Feels very much like being lost in a giant maze. We need more clever technique to get out!</p>
<h3 id="a-machine-learning-problem">A Machine Learning problem</h3>
<p>In artificial intelligence, people try to predict a variable $y$ from predictors $x$ by finding suitable parameters $\theta$ of a parametric function $F_\theta$ so that</p>
\[\begin{equation}
y = F_\theta(x)
\end{equation}\]
<p>For example, in computer vision, this function might be designed for the specific task of labelling images, such as for instance</p>
<p>$F_\theta ($ <img src="/images/misc/cat.png" alt="" /> $) \to \{\text{cat}, \text{dog}\}$</p>
<p>Usually people use neural networks, as they are good approximators for high dimensional function (see the <a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem">Universal approximation theorem</a>).</p>
\[\begin{equation*}
F_\theta \equiv NN_\theta
\end{equation*}\]
<p>One should really see neural networks as functions ! For example, feed forward neural networks are mathematically described by matrix multiplications</p>
\[\begin{equation*}
NN_\theta (x) = A x + b
\end{equation*}\]
<p><em>(without activation function)</em></p>
<p><strong>Notice that Eq. (2) is similar to Eq. (3)</strong>! Indeed one can think of $๐_0$ as the analogous to $x$ - i.e. the predictor - and $๐_t$ as the variable $y$ to predict.</p>
\[\begin{equation*}
๐_t = F_\theta(๐_{t_0})
\end{equation*}\]
<p>where \(F_\theta (๐_{t_0}) \equiv ๐_{t_0} + \int_0^t f_\theta( ๐_s ) ds .\)</p>
<p>With this perspective in mind, techniques developed within the field of Machine Learning - to find suitable parameters $\theta$ that best predict $y$ - become readily available to reach our specific needs: model calibration!</p>
<h2 id="parameter-inference">Parameter inference</h2>
<p>The general strategy to find a suitable neural network that can perform the tasks required is to โtrainโ it, that is, to find the parameters $\theta$ so that its predictions are accurate.</p>
<p>In order to train it, one โscoresโ how good a combination of parameter $\theta$ performs. A way to do so is to introduce a โ<strong>Loss function</strong>โ</p>
\[\begin{equation*}
L(\theta) = (F_\theta(x) - y_{\text{empirical}})^2
\end{equation*}\]
<p>One can then use an optimisation method to find a local minima (and in the best scenario, the global minima) for $L$.</p>
<h3 id="gradient-descent">Gradient descent</h3>
<p>You ready?</p>
<iframe src="https://giphy.com/embed/l2Je5HLxfeuppOkuc" width="480" height="270" frameborder="0" class="giphy-embed" allowfullscreen=""></iframe>
<p><a href="https://giphy.com/gifs/toferra-ski-skiing-into-the-mind-l2Je5HLxfeuppOkuc"></a></p>
<p>Gradient descent and stochastic gradient descent are โiterative optimisation methods that seek to find a local minimum of a differentiable functionโ (<a href="https://en.wikipedia.org/wiki/Gradient_descent">Wikipedia</a>). Such methods have become widely used with the development of artifical intelligence.</p>
<p>Those methods are used to compute iteratively $\theta$ using the sensitivity of the loss function to changes in $\theta$, denoted by $\partial_\theta L(\theta)$</p>
\[\begin{equation*}
\theta^{i+1} = \theta^{(i)} - \lambda \partial_\theta L(\theta)
\end{equation*}\]
<p>where $\lambda$ is called the learning rate.</p>
<h2 id="in-practice">In practice</h2>
<p>The sensitivity with respect to the parameters $\partial_\theta L(\theta)$ is in practice obtained by differentiating the code (<a href="https://en.wikipedia.org/wiki/Automatic_differentiation">Automatic Differentiation</a>).</p>
<p>For some programming languages this can be done automatically, with low computational cost. In particular, <a href="https://github.com/FluxML/Flux.jl">Flux.jl</a> allows to efficiently obtain the gradient of any function written in the wonderful language <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/Julia_prog_language.svg/1200px-Julia_prog_language.svg.png" width="50" />.</p>
<p>The library <a href="https://github.com/SciML/DiffEqFlux.jl">DiffEqFlux.jl</a> based on <a href="https://github.com/FluxML/Flux.jl">Flux.jl</a> implements differentiation rules (<a href="https://juliadiff.org/ChainRulesCore.jl/stable/">custom adjoints</a>) to obtain even more efficiently the sensitivity of a loss function that depends on the numerical solution of a differential equation. That is, <a href="https://github.com/SciML/DiffEqFlux.jl">DiffEqFlux.jl</a> precisely allows to do parameter inference in dynamical systems. Go and check it out!</p>Victor Boussangebvictor@ethz.chhttps://landecology.ethz.ch/the-group.htmlCalibrating a system of differential equations on data for the project "What can Economics learn from Biology?".Highly dimensional non-local non-linear PDEs2021-05-01T00:00:00-07:002021-05-01T00:00:00-07:00https://vboussange.github.io/posts/highdimpde<p align="center">
<img src="https://www.researchgate.net/publication/333656913/figure/fig3/AS:771545693700098@1560962236380/The-phenotypic-diversity-of-daylily-cultivars-flower-colors-Note-The-cultivars.jpg" style="max-width:400px !important;" />
</p>
<figcaption> Phenotypic diversity in Hemerocallis. Plants can be caracterised by many different traits, all of which can be assigned numerical values: Flower colour, Specific Leaf Area (SLA), seed mass, Plant nitrogen fixation capacity, Leaf shape, Flower sex, plant woodiness. Source: <a href="https://doi.org/10.1371/journal.pone.0216460">H Cui et al. 2019</a>.
</figcaption>
<p>Non-local nonlinear PDEs arise from a variety of models in physics, engineering, finance and biology.</p>
<p>In general, non-local models provide more accurate predictions since they are generalisations of their local counterparts. Yet they lead to further complications in obtaining approximation solutions. Non-local nonlinear PDEs can generally not be solved analytically in practical cases, and it is one of the most challenging issues in applied mathematics and numerical analysis to design and analyze approximation methods.</p>
<p>The Deep Splitting method uses a stochastic representation of the PDE under consideration, which allows to train a Neural Network that approximate the solution of the equation. This technique breaks the <strong>curse of dimensionality</strong>, in the sense that its computational cost is dramatically reduced compared to standard schemes such as the Finite Element Method.</p>
<h3 id="where-do-such-equations-arise">Where do such equations arise?</h3>
<p>In <strong>finance</strong>, non-local PDEs arise e.g. in jump-diffusion models for the pricing of derivatives, where underlying stochastic processes experience large jumps. Nonlinearities occur when considering e.g. large investor, where the agent policy affects the assets prices, considering default risks, transactions costs or Knightian uncertainty.</p>
<p>In <strong>economics</strong>, non-local nonlinear PDEs arise e.g. in evolutionary game theory with the so-called replicator mutator equation considering infinite strategy spaces, or in growth model where consumption is non-local.</p>
<p>In <strong>biology</strong>, non-local nonlinear PDEs appear e.g. in models of morphogeneis and cancer evolution, or in models of gene regulatory networks. They are also present in many models of evolution, for example in population genetics with the non-local Fisher KPP equation, or in quantitative genetics where populations are structured with quantitative traits.</p>
<h3 id="why-is-it-important-to-solve-such-equations-in-high-dimensions">Why is it important to solve such equations in high dimensions</h3>
<p>In financial modelling, the dimensionality of the problem corresponds to the <strong>number of financial assets</strong> (such as stocks, commodities, exchange rates and interest rates) in the involved portfolio. In evolutionary dynamics, it relates to the dimension of the <strong>strategy space</strong>. In biology, it usually corresponds to the dimension of the <strong>phenotypic space</strong> and / or of the <strong>geographical space</strong> that represent the population.</p>Victor Boussangebvictor@ethz.chhttps://landecology.ethz.ch/the-group.htmlPhenotypic diversity in Hemerocallis. Plants can be caracterised by many different traits, all of which can be assigned numerical values: Flower colour, Specific Leaf Area (SLA), seed mass, Plant nitrogen fixation capacity, Leaf shape, Flower sex, plant woodiness. Source: H Cui et al. 2019.