Data Assimilation
=================

What is Data Assimilation?
--------------------------

Data Assimilation is a family of methods that combine **high-fidelity observations** with **low-fidelity model output** in order to improve the latter. The goal is to estimate the state of the system, and to predict its future evolution, in the **best possible way**.

.. figure:: _figures/DA_methods-1.png
   :width: 700
   :align: center
   :alt: DA methods

   Available data assimilation methods.

|

Sequential Data Assimilation: Kalman Filter
-------------------------------------------

**Initialization**: an initial estimate of :math:`\mathbf{x}^a_0` is provided.

At every data assimilation cycle :math:`k`:

1. **Prediction / forecast step**: the forecast state :math:`\mathbf{x}_k^f = \mathcal{M}_{k-1:k} \left( \mathbf{x}_{k-1}^a \right)` is given by the forward model :math:`\mathcal{M}_{k-1:k}` of our system.
2. **Correction / analysis step**: the state is corrected into :math:`\mathbf{x}_k^a`, based on the forecast state :math:`\mathbf{x}_k^f` and a high-fidelity observation :math:`\mathbf{y}_k`, through the so-called *Kalman gain* matrix :math:`\mathbf{K}_k`, which minimizes the error covariance matrix :math:`\mathbf{P}_k^a` of the updated state (a minimal code sketch of one cycle follows the figure below).

.. figure:: _figures/KF_algorithm-1.png
   :width: 500
   :align: center
   :alt: Kalman Filter algorithm scheme.

   Kalman Filter algorithm scheme.

|

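To make the two steps concrete, below is a minimal NumPy sketch of one linear Kalman Filter cycle. It is an illustration only: the function name ``kalman_cycle`` and the concrete operators ``M``, ``H``, ``Q``, ``R`` of the usage example are assumptions made for this page, not part of a specific solver.

.. code-block:: python

   import numpy as np

   def kalman_cycle(x_a, P_a, y, M, H, Q, R):
       """One prediction/correction cycle of the linear Kalman Filter.

       x_a, P_a : previous analysis state and its error covariance
       y        : high-fidelity observation at the current instant
       M, H     : forward-model and observation operators (matrices)
       Q, R     : model-error and observation-error covariances
       """
       # Prediction / forecast step: propagate the state and its covariance.
       x_f = M @ x_a
       P_f = M @ P_a @ M.T + Q
       # Correction / analysis step: the Kalman gain minimises the
       # error covariance P_a of the updated state.
       K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
       x_a = x_f + K @ (y - H @ x_f)
       P_a = (np.eye(x_f.size) - K @ H) @ P_f
       return x_a, P_a

   # Hypothetical 2-state example: position and velocity, position observed.
   dt = 0.1
   M = np.array([[1.0, dt], [0.0, 1.0]])
   H = np.array([[1.0, 0.0]])
   Q, R = 1e-4 * np.eye(2), np.array([[1e-2]])
   x, P = kalman_cycle(np.zeros(2), np.eye(2), np.array([0.05]), M, H, Q, R)

The explicit storage and propagation of ``P`` in this sketch is exactly what becomes intractable for CFD-sized states, which motivates the ensemble methods below.
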
The drawbacks of the Kalman Filter are:

1. The Kalman Filter (and likewise the Extended Kalman Filter, EKF) only works for moderate deviations from linearity and Gaussianity, while our forward model is strongly non-linear:

   .. math::

      \mathcal{M}_{k-1:k} \rightarrow \textrm{Navier-Stokes equations (non-linear)}

2. The error covariance matrix :math:`\mathbf{P}_k` is high-dimensional, its size being set by the :math:`n_{\textrm{vars}} \times n_{\textrm{cells}}` degrees of freedom of the state:

   .. math::

      \left[ \mathbf{P}_k \right]_{n_{\textrm{vars}} \times n_{\textrm{cells}}, \, n_{\textrm{vars}} \times n_{\textrm{cells}}}

The possible alternatives to the KF are:

1. Particle filter: cannot yet be applied to very high-dimensional systems.
2. Ensemble Kalman Filter (EnKF), which propagates an ensemble of :math:`m` realisations:

   a. High-dimensional systems :math:`\rightarrow` it avoids the explicit definition of :math:`\mathbf{P}_k`.
   b. Non-linear models :math:`\rightarrow` Monte-Carlo realisations.
   c. But it underestimates :math:`\mathbf{P}_k^a` :math:`\rightarrow` inflation and localisation are needed.

Ensemble Kalman Filter with extended state
------------------------------------------

The system state :math:`\mathbf{x}_k` is as follows:

.. math::

   \left[ \mathbf{x}_k \right]_{n_{\textrm{vars}} \times n_{\textrm{cells}} + n_\theta, \, m} = \begin{bmatrix} \mathbf{u}_k \\ \theta_k \end{bmatrix}

where:

- :math:`\mathbf{u}_k` is the state matrix of the domain (it can contain velocities, pressure, temperature...) at time instant :math:`k`.
- :math:`\theta_k` are the model coefficients we want to infer at time instant :math:`k`.

The observation data :math:`\mathbf{y}_k` are defined as

.. math::

   \left[ \mathbf{y}_k \right]_{n_{\textrm{obs}}, m} \rightarrow \mathcal{N} \left( y_k, \mathbf{R}_k \right)

i.e. the observation is replicated over the :math:`m` ensemble members with Gaussian perturbations of covariance :math:`\mathbf{R}_k`. Observation data may be built using:

1. Values of velocity, pressure, temperature, and more.
2. Instantaneous global force coefficients :math:`C_D`, :math:`C_L`, :math:`C_f`, ...

.. figure:: _figures/EnKF_algorithm-1.png
   :width: 500
   :align: center
   :alt: Ensemble Kalman Filter algorithm scheme.

   Ensemble Kalman Filter algorithm scheme.

|

The following matrices are needed to build the EnKF algorithm (a sketch of the resulting analysis step follows the list):

1. Observation covariance matrix :math:`\mathbf{R}_k`: we assume Gaussian, non-correlated uncertainty for the observations, where :math:`\sigma_{i,k}` is the standard deviation of variable :math:`i` at time instant :math:`k`.

   .. math::

      \left[ \mathbf{R}_k \right]_{n_{\textrm{obs}}, n_{\textrm{obs}}} = \textrm{diag} \left( \sigma^2_{i,k} \right)

2. Sampling matrix :math:`\mathcal{H} \left( \mathbf{x}^f_k \right)`: the projection of the model state onto the locations of the observations.

   .. math::

      \left[ \mathcal{H} \left( \mathbf{x}^f_k \right) \right]_{n_{\textrm{obs}}, m}

3. Anomaly matrices of the system state :math:`\mathbf{x}^f_k` and of the sampling matrix :math:`\mathcal{H} \left( \mathbf{x}^f_k \right) = \mathbf{s}^f_k`. They measure the deviation of each realisation from the ensemble mean.

   .. math::

      \mathbf{X}_k = \frac{\mathbf{x}_k - \overline{\mathbf{x}_k}}{\sqrt{m-1}} \quad , \quad \mathbf{S}_k = \frac{\mathbf{s}_k - \overline{\mathbf{s}_k}}{\sqrt{m-1}}

   .. math::

      \textrm{with} \quad \left[ \mathbf{X} \right]_{n_{\textrm{vars}} \times n_{\textrm{cells}} + n_\theta, \, m} \quad , \quad \left[ \mathbf{S} \right]_{n_{\textrm{obs}}, m}

4. Kalman gain matrix:

   .. math::

      \mathbf{K} = \mathbf{X} \mathbf{S}^\top \left( \mathbf{S} \mathbf{S}^\top + \mathbf{R} \right)^{-1}

   .. math::

      \textrm{with} \quad \left[ \mathbf{K}_k \right]_{n_{\textrm{vars}} \times n_{\textrm{cells}} + n_\theta, \, n_{\textrm{obs}}}

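Putting matrices 1-4 together, the sketch below shows one possible EnKF analysis step in NumPy. It follows the anomaly and gain definitions above; the stochastic (perturbed-observations) variant used to update the ensemble, as well as the function and variable names, are illustrative assumptions rather than a definitive implementation.

.. code-block:: python

   import numpy as np

   def enkf_analysis(Xf, Sf, y, R, rng):
       """One EnKF analysis step for an ensemble of m members.

       Xf  : forecast ensemble (extended state u and theta), shape (n_state, m)
       Sf  : sampled ensemble H(x_f), shape (n_obs, m)
       y   : observation vector, shape (n_obs,)
       R   : observation covariance matrix, shape (n_obs, n_obs)
       rng : NumPy random generator for the observation perturbations
       """
       m = Xf.shape[1]
       # Anomaly matrices: deviation of each realisation from the ensemble mean.
       X = (Xf - Xf.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
       S = (Sf - Sf.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
       # Kalman gain K = X S^T (S S^T + R)^{-1}, without ever forming P_k.
       K = X @ S.T @ np.linalg.inv(S @ S.T + R)
       # Replicate y over the members with N(0, R) perturbations, then correct.
       Y = y[:, None] + rng.multivariate_normal(np.zeros(y.size), R, size=m).T
       return Xf + K @ (Y - Sf)

Inflation and localisation, mentioned above as remedies for the underestimation of :math:`\mathbf{P}_k^a`, would be applied on top of this basic step.
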
Multigrid Ensemble Kalman Filter (MGEnKF)
-----------------------------------------

Multifidelity Ensemble Kalman Filter (MFEnKF)
---------------------------------------------

The MFEnKF is built using three sources of information:

1. The principal members, denoted as :math:`X_i`
2. The control members, denoted as :math:`\hat{U}_i`
3. The ancillary members, denoted as :math:`U_i`

Control members :math:`\hat{U}_i` are a projection of the principal members :math:`X_i` through the projection operator :math:`\mathbf{\Phi}_r^\ast`:

.. math::

   \hat{U}_i = \mathbf{\Phi}_r^\ast \left( X_i \right)

The MFEnKF methodology is sketched below, from [popov_multifidelity_2021]:

.. image:: SCHEMES_MFENKF.png
   :width: 50%

The Kalman gain :math:`\tilde{K}` is computed from the three ensemble types as follows.

Ensemble means
^^^^^^^^^^^^^^

The means are taken over the :math:`N_X` principal/control members and the :math:`N_U` ancillary members (:math:`1_N` denotes the vector of :math:`N` ones):

.. math::

   \begin{aligned}
   \tilde{\mu}_X &= N_X^{-1} E_X 1_{N_X}, \\
   \tilde{\mu}_{\hat{U}} &= N_X^{-1} E_{\hat{U}} 1_{N_X}, \\
   \tilde{\mu}_U &= N_U^{-1} E_U 1_{N_U}, \\
   \tilde{\mu}_{H(X)} &= N_X^{-1} E_{H(X)} 1_{N_X}, \\
   \tilde{\mu}_{H(\Phi_r \hat{U})} &= N_X^{-1} E_{H(\Phi_r \hat{U})} 1_{N_X}, \\
   \tilde{\mu}_{H(\Phi_r U)} &= N_U^{-1} E_{H(\Phi_r U)} 1_{N_U}
   \end{aligned}

Ensemble anomaly matrices
^^^^^^^^^^^^^^^^^^^^^^^^^

.. math::

   \begin{aligned}
   A_X &= \left( E_X - \tilde{\mu}_X 1_{N_X}^\top \right) \left( N_X - 1 \right)^{-\frac{1}{2}}, \\
   A_{\hat{U}} &= \left( E_{\hat{U}} - \tilde{\mu}_{\hat{U}} 1_{N_X}^\top \right) \left( N_X - 1 \right)^{-\frac{1}{2}}, \\
   A_U &= \left( E_U - \tilde{\mu}_U 1_{N_U}^\top \right) \left( N_U - 1 \right)^{-\frac{1}{2}}, \\
   A_{H(X)} &= \left( E_{H(X)} - \tilde{\mu}_{H(X)} 1_{N_X}^\top \right) \left( N_X - 1 \right)^{-\frac{1}{2}}, \\
   A_{H(\Phi_r \hat{U})} &= \left( E_{H(\Phi_r \hat{U})} - \tilde{\mu}_{H(\Phi_r \hat{U})} 1_{N_X}^\top \right) \left( N_X - 1 \right)^{-\frac{1}{2}}, \\
   A_{H(\Phi_r U)} &= \left( E_{H(\Phi_r U)} - \tilde{\mu}_{H(\Phi_r U)} 1_{N_U}^\top \right) \left( N_U - 1 \right)^{-\frac{1}{2}}
   \end{aligned}

Ensemble covariances
^^^^^^^^^^^^^^^^^^^^

.. math::

   \begin{aligned}
   \tilde{\Sigma}_{X,H(X)} &= A_X A_{H(X)}^\top \\
   \tilde{\Sigma}_{\hat{U},H(\Phi_r \hat{U})} &= A_{\hat{U}} A_{H(\Phi_r \hat{U})}^\top \\
   \tilde{\Sigma}_{X,H(\Phi_r \hat{U})} &= A_X A_{H(\Phi_r \hat{U})}^\top \\
   \tilde{\Sigma}_{\hat{U},H(X)} &= A_{\hat{U}} A_{H(X)}^\top \\
   \tilde{\Sigma}_{U,H(\Phi_r U)} &= A_U A_{H(\Phi_r U)}^\top \\
   \tilde{\Sigma}_{H(X),H(X)} &= A_{H(X)} A_{H(X)}^\top \\
   \tilde{\Sigma}_{H(\Phi_r \hat{U}),H(\Phi_r \hat{U})} &= A_{H(\Phi_r \hat{U})} A_{H(\Phi_r \hat{U})}^\top \\
   \tilde{\Sigma}_{H(X),H(\Phi_r \hat{U})} &= A_{H(X)} A_{H(\Phi_r \hat{U})}^\top \\
   \tilde{\Sigma}_{H(\Phi_r \hat{U}),H(X)} &= A_{H(\Phi_r \hat{U})} A_{H(X)}^\top \\
   \tilde{\Sigma}_{H(\Phi_r U),H(\Phi_r U)} &= A_{H(\Phi_r U)} A_{H(\Phi_r U)}^\top
   \end{aligned}

Kalman gain intermediate covariances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. math::

   \begin{aligned}
   \tilde{\Sigma}_{Z, H(Z)} &= \tilde{\Sigma}_{X,H(X)} + \frac{1}{4} \Phi_r \tilde{\Sigma}_{\hat{U},H(\Phi_r \hat{U})} - \frac{1}{2} \tilde{\Sigma}_{X,H(\Phi_r \hat{U})} - \frac{1}{2} \Phi_r \tilde{\Sigma}_{\hat{U},H(X)} + \frac{1}{4} \Phi_r \tilde{\Sigma}_{U,H(\Phi_r U)} \\
   \tilde{\Sigma}_{H(Z), H(Z)} &= \tilde{\Sigma}_{H(X),H(X)} + \frac{1}{4} \tilde{\Sigma}_{H(\Phi_r \hat{U}),H(\Phi_r \hat{U})} - \frac{1}{2} \tilde{\Sigma}_{H(X),H(\Phi_r \hat{U})} - \frac{1}{2} \tilde{\Sigma}_{H(\Phi_r \hat{U}),H(X)} + \frac{1}{4} \tilde{\Sigma}_{H(\Phi_r U),H(\Phi_r U)}
   \end{aligned}

Computing Kalman Gain
^^^^^^^^^^^^^^^^^^^^^

.. math::

   \tilde{K} = \tilde{\Sigma}_{Z, H(Z)} \left( \tilde{\Sigma}_{H(Z), H(Z)} + \tilde{\Sigma}_{\eta,\eta} \right)^{-1}

where :math:`\tilde{\Sigma}_{\eta,\eta}` is the observation-error covariance (the analogue of :math:`\mathbf{R}_k` in the EnKF). A sketch assembling this gain from the anomaly matrices is given at the end of the section.

Update and total variate
^^^^^^^^^^^^^^^^^^^^^^^^

Each variate can be updated:

.. math::

   \begin{aligned}
   X^a &= X^f + \tilde{K} \left( y - H(X^f) \right) \\
   \hat{U}^a &= \hat{U}^f + \mathbf{\Phi}_r^\ast \tilde{K} \left( y - H(\Phi_r \hat{U}^f) \right) \\
   U^a &= U^f + \mathbf{\Phi}_r^\ast \tilde{K} \left( y - H(\Phi_r U^f) \right)
   \end{aligned}

The total variate :math:`Z` represents the combined prior and posterior knowledge through the linear control variate technique:

.. math::

   \begin{aligned}
   Z^f &= X^f - S \left( \hat{U}^f - \overline{U^f} \right) \\
   Z^a &= X^a - S \left( \hat{U}^a - \overline{U^a} \right)
   \end{aligned}

where :math:`S = \Phi_r / 2`.

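To close the section, here is a minimal sketch assembling the MFEnKF gain :math:`\tilde{K}` from the anomaly matrices defined above, mirroring the two intermediate covariances term by term. All argument names are assumptions made for the example, and :math:`\Phi_r` is treated as an explicit matrix, which may not hold for a matrix-free reduced-order model.

.. code-block:: python

   import numpy as np

   def mfenkf_gain(A_X, A_Uhat, A_U, A_HX, A_HUhat, A_HU, Phi_r, R_eta):
       """Assemble the MFEnKF gain from the ensemble anomaly matrices.

       A_X,    A_HX    : principal anomalies and the sampled images H(X)
       A_Uhat, A_HUhat : control anomalies and the sampled images H(Phi_r U_hat)
       A_U,    A_HU    : ancillary anomalies and the sampled images H(Phi_r U)
       Phi_r           : prolongation operator, reduced space -> full space
       R_eta           : observation-error covariance (Sigma_eta_eta)
       """
       # Cross-covariance of the total variate Z with its sampled image H(Z).
       Sig_Z_HZ = (A_X @ A_HX.T
                   + 0.25 * Phi_r @ (A_Uhat @ A_HUhat.T)
                   - 0.50 * (A_X @ A_HUhat.T)
                   - 0.50 * Phi_r @ (A_Uhat @ A_HX.T)
                   + 0.25 * Phi_r @ (A_U @ A_HU.T))
       # Covariance of the sampled total variate H(Z).
       Sig_HZ_HZ = (A_HX @ A_HX.T
                    + 0.25 * (A_HUhat @ A_HUhat.T)
                    - 0.50 * (A_HX @ A_HUhat.T)
                    - 0.50 * (A_HUhat @ A_HX.T)
                    + 0.25 * (A_HU @ A_HU.T))
       return Sig_Z_HZ @ np.linalg.inv(Sig_HZ_HZ + R_eta)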