.. _solver: Solvers ======= In this section, we describe first the general solver strategy, how the chemical network equations are solved. Followed by the available optimizers. General solver strategy ----------------------- The chemical network equations are solved by using numerical optimization methods. Currently the AdamW and the AdamaxW algorithms are implemented. Different optimizers could be added. .. _Nwalkers: Using multiple walkers ^^^^^^^^^^^^^^^^^^^^^^ For most initial conditions the optimizer is able to find the global solution within 100'000 to 200'000 steps. But sometimes, the solver can find only a local minimum. Since the solver takes only a fraction of a second to run, it is most of the time sufficient to use multiple walkers to avoid being stuck in a local minimum. As default we use 100 walkers. .. _Nbest: Using multiple iterations ^^^^^^^^^^^^^^^^^^^^^^^^^ To further improve the results, multiple iterations can be used. For that the best :literal:`nBestSolutions` are used to generate a new set of walkers, by add some perturbation to the best results. Optimizers ---------- In the following all available optimizers are described. More optimizers could be added to the code. AdamW ^^^^^ An iteration of the AdamW optimizer includes these steps: .. math:: \eta = \eta \cdot 0.9999 \\ m_t = \beta_1 m_{t - 1} + (1 - \beta_1) g_t \\ v_t = \beta_2 v_{t - 1} + (1 - \beta_2) (g_t \cdot g_t) \\ \hat{m}_t = \frac{m_t}{1 - \beta_1^t} \\ \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\ \theta_{t + 1} = \theta_t - \eta \frac{\hat{m}_t}{ \sqrt{\hat{v}_t} + \epsilon} - \eta \lambda_{reg} \theta_t \\ \beta_1^t = \beta_1^t \cdot \beta1 \\ \beta_2^t = \beta_2^t \cdot \beta2 with - :math:`\eta`: learning rate, user parameter (default = 0.4). - :math:`\lambda_{reg}`: Weight decay, user parameter (default = 0.0001). - :math:`\beta_1 = 0.9` - :math:`\beta_2 = 0.999` - :math:`\epsilon = 10^{-6}` AdamaxW ^^^^^^^ An single iteration of the AdamW optimizer includes these steps: .. math:: \eta = \eta \cdot 0.9999 \\ m_t = \beta_1 m_{t - 1} + (1 - \beta_1) g_t \\ v_t = max(\beta_2 v_{t - 1}, |g_t|) \\ \theta_{t + 1} = \theta_t - \eta \frac{m_t}{ v_t + \epsilon} - \eta \lambda_{reg} \theta_t \\ \beta_1^t = \beta_1^t \cdot \beta1 \\ \beta_2^t = \beta_2^t \cdot \beta2 with - :math:`\eta`: learning rate, user parameter (default = 0.4). - :math:`\lambda_{reg}`: Weight decay, user parameter (default = 0.0001). - :math:`\beta_1 = 0.9` - :math:`\beta_2 = 0.999` - :math:`\epsilon = 10^{-6}` .. https://optimization.cbe.cornell.edu/index.php?title=AdamW .. https://arxiv.org/pdf/1711.05101v3