Solvers

This section first describes the general solver strategy, that is, how the chemical network equations are solved, and then the available optimizers.

General solver strategy

The chemical network equations are solved with numerical optimization methods. Currently, the AdamW and AdamaxW algorithms are implemented; further optimizers could be added.

Using multiple walkers

For most initial conditions, the optimizer finds the global solution within 100,000 to 200,000 steps. Sometimes, however, the solver finds only a local minimum. Because a single solver run takes only a fraction of a second, running multiple walkers is usually sufficient to avoid getting stuck in a local minimum. By default, 100 walkers are used.
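The effect of multiple walkers can be illustrated with a minimal Python sketch. It assumes that each walker is an independent optimizer run from its own random starting point. A toy one-dimensional loss and plain gradient descent stand in for the chemical network equations and the real optimizer, and the names loss, grad, descend, and run_walkers are hypothetical, not part of the code base.

```python
import random


def loss(x):
    # Toy loss (assumption): global minimum near x = -1, local minimum near x = +1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    # Analytic derivative of the toy loss.
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, n_steps=2000, lr=0.01):
    # Plain gradient descent, used here only as a stand-in optimizer.
    for _ in range(n_steps):
        x -= lr * grad(x)
    return x


def run_walkers(n_walkers=100, seed=0):
    # Start every walker at a random position and optimize each one independently.
    rng = random.Random(seed)
    starts = [rng.uniform(-2.0, 2.0) for _ in range(n_walkers)]
    finals = [descend(x0) for x0 in starts]
    # The walker with the lowest loss is taken as the solution.
    best = min(finals, key=loss)
    return finals, best


finals, best = run_walkers()
print(f"best walker: x = {best:.4f}, loss = {loss(best):.4f}")
```

In this toy problem, walkers that start to the right of the barrier end in the local minimum near x = +1, while the best walker ends in the global minimum near x = -1.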

Using multiple iterations

To further improve the results, multiple iterations can be used. In each iteration, the best nBestSolutions results are used to generate a new set of walkers by adding a perturbation to them.
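A possible shape of this iteration loop is sketched below in Python. The details are assumptions made for illustration: the perturbation is Gaussian with width sigma, the unperturbed best solutions are kept in the new walker set, and a toy loss with plain gradient descent replaces the real problem and optimizer. The names iterate_walkers, n_best, and sigma are hypothetical; n_best plays the role of nBestSolutions.

```python
import random


def loss(x):
    # Toy loss (assumption): global minimum near x = -1, local minimum near x = +1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, n_steps=500, lr=0.01):
    # Plain gradient descent, used here only as a stand-in optimizer.
    for _ in range(n_steps):
        x -= lr * grad(x)
    return x


def iterate_walkers(n_walkers=20, n_best=3, n_iterations=3, sigma=0.1, seed=0):
    rng = random.Random(seed)
    walkers = [rng.uniform(-2.0, 2.0) for _ in range(n_walkers)]
    history = []
    for _ in range(n_iterations):
        # Optimize all walkers and rank the results by loss.
        finals = sorted((descend(x) for x in walkers), key=loss)
        best = finals[:n_best]
        history.append(loss(best[0]))
        # New walker set: the best solutions plus perturbed copies of them.
        walkers = best + [rng.choice(best) + rng.gauss(0.0, sigma)
                          for _ in range(n_walkers - n_best)]
    return best[0], history


best, history = iterate_walkers()
print(f"best solution: x = {best:.4f}, loss per iteration: {history}")
```

Keeping the unperturbed best solutions in the walker set means the best loss cannot get worse from one iteration to the next, apart from rounding.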

Optimizers

All available optimizers are described below. More optimizers could be added to the code.

AdamW

An iteration of the AdamW optimizer includes these steps:

\[ \begin{align}\begin{aligned}\begin{split}\eta = \eta \cdot 0.9999 \\ m_t = \beta_1 m_{t - 1} + (1 - \beta_1) g_t \\ v_t = \beta_2 v_{t - 1} + (1 - \beta_2) (g_t \cdot g_t) \\ \hat{m}_t = \frac{m_t}{1 - \beta_1^t} \\ \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\ \theta_{t + 1} = \theta_t - \eta \frac{\hat{m}_t}{ \sqrt{\hat{v}_t} + \epsilon} - \eta \lambda_{reg} \theta_t \\\end{split}\\\begin{split}\beta_1^t = \beta_1^t \cdot \beta_1 \\ \beta_2^t = \beta_2^t \cdot \beta_2\end{split}\end{aligned}\end{align} \]

with

\(\eta\): learning rate, user parameter (default = 0.4).

\(\lambda_{reg}\): weight decay, user parameter (default = 0.0001).

\(\beta_1 = 0.9\)

\(\beta_2 = 0.999\)

\(\epsilon = 10^{-6}\)
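The update rule above can be written out as a short Python sketch that follows the listed steps term by term for a list of parameters. The function name adamw_minimize and the callback grad are hypothetical. Initializing both moments to zero and the running powers to \(\beta_1\) and \(\beta_2\) (that is, t = 1 at the first step) is also an assumption, since the text does not state the initial values.

```python
import math


def adamw_minimize(grad, theta, n_steps=1000, eta=0.4, lam=1e-4,
                   beta1=0.9, beta2=0.999, eps=1e-6, decay=0.9999):
    """Run n_steps AdamW iterations; grad(theta) returns the gradient as a list."""
    theta = list(theta)
    m = [0.0] * len(theta)   # first moment, assumed to start at zero
    v = [0.0] * len(theta)   # second moment, assumed to start at zero
    b1t, b2t = beta1, beta2  # running powers beta_1^t and beta_2^t for t = 1
    for _ in range(n_steps):
        eta *= decay         # learning-rate decay, eta = eta * 0.9999
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - b1t)   # bias-corrected first moment
            v_hat = v[i] / (1.0 - b2t)   # bias-corrected second moment
            # Gradient step plus decoupled weight decay on the old theta.
            theta[i] -= eta * m_hat / (math.sqrt(v_hat) + eps) + eta * lam * theta[i]
        b1t *= beta1
        b2t *= beta2
    return theta


# Toy usage: minimize (x - 3)^2 + (y + 1)^2 starting from the origin.
result = adamw_minimize(lambda th: [2.0 * (th[0] - 3.0), 2.0 * (th[1] + 1.0)],
                        [0.0, 0.0], n_steps=20000)
print(result)
```

Because of the bias correction, the first step moves each parameter by roughly \(\eta\) regardless of the gradient magnitude, which is why the default learning rate of 0.4 gives large initial steps.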

AdamaxW

A single iteration of the AdamaxW optimizer includes these steps:

\[ \begin{align}\begin{aligned}\begin{split}\eta = \eta \cdot 0.9999 \\ m_t = \beta_1 m_{t - 1} + (1 - \beta_1) g_t \\ v_t = \max(\beta_2 v_{t - 1}, |g_t|) \\\end{split}\\\begin{split}\theta_{t + 1} = \theta_t - \eta \frac{m_t}{ v_t + \epsilon} - \eta \lambda_{reg} \theta_t \\\end{split}\\\begin{split}\beta_1^t = \beta_1^t \cdot \beta_1 \\ \beta_2^t = \beta_2^t \cdot \beta_2\end{split}\end{aligned}\end{align} \]

with

\(\eta\): learning rate, user parameter (default = 0.4).

\(\lambda_{reg}\): weight decay, user parameter (default = 0.0001).

\(\beta_1 = 0.9\)

\(\beta_2 = 0.999\)

\(\epsilon = 10^{-6}\)
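As with AdamW, the AdamaxW steps can be written out as a minimal Python sketch. It implements the displayed update literally, without bias correction. The running powers \(\beta_1^t\) and \(\beta_2^t\) do not enter the displayed parameter update, so the sketch leaves them out. The function name adamaxw_minimize, the callback grad, and the zero initialization of \(m\) and \(v\) are assumptions.

```python
def adamaxw_minimize(grad, theta, n_steps=1000, eta=0.4, lam=1e-4,
                     beta1=0.9, beta2=0.999, eps=1e-6, decay=0.9999):
    """Run n_steps AdamaxW iterations; grad(theta) returns the gradient as a list."""
    theta = list(theta)
    m = [0.0] * len(theta)   # first moment, assumed to start at zero
    v = [0.0] * len(theta)   # decayed running maximum of |g|, assumed to start at zero
    for _ in range(n_steps):
        eta *= decay         # learning-rate decay, eta = eta * 0.9999
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = max(beta2 * v[i], abs(gi))
            # Gradient step plus decoupled weight decay on the old theta.
            theta[i] -= eta * m[i] / (v[i] + eps) + eta * lam * theta[i]
    return theta


# Toy usage: minimize (x - 3)^2 + (y + 1)^2 starting from the origin.
result = adamaxw_minimize(lambda th: [2.0 * (th[0] - 3.0), 2.0 * (th[1] + 1.0)],
                          [0.0, 0.0], n_steps=20000)
print(result)
```

Compared with AdamW, the second moment is replaced by a decayed running maximum of the gradient magnitude, so a single large gradient keeps the step size small for many subsequent iterations.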