# An Inverse QSAR Method Based on a Two-Layered Model and Integer Programming


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Preliminary

**Graphs.** Given a graph G, let $V(G)$ and $E(G)$ denote the sets of vertices and edges, respectively. For a subset ${V}^{\prime}\subseteq V(G)$ (resp., ${E}^{\prime}\subseteq E(G)$) of a graph G, let $G-{V}^{\prime}$ (resp., $G-{E}^{\prime}$) denote the graph obtained from G by removing the vertices in ${V}^{\prime}$ (resp., the edges in ${E}^{\prime}$), where all edges incident to a vertex in ${V}^{\prime}$ are also removed in $G-{V}^{\prime}$. The rank $\mathrm{r}(G)$ of a graph G is defined to be the minimum size $|F|$ of an edge subset $F\subseteq E(G)$ such that $G-F$ contains no cycle. A path with two end-vertices u and v is called a $u,v$-path. An edge $e={u}_{1}{u}_{2}$ in a connected graph G is called a bridge if the graph $G-e$ obtained from G by removing edge e is not connected, i.e., $G-e$ consists of two connected graphs ${G}_{i}$ containing vertex ${u}_{i}$, $i=1,2$. For a cyclic graph G (a graph containing at least one cycle), an edge e is called a core-edge if it lies on a cycle of G or is a bridge $e={u}_{1}{u}_{2}$ such that each of the connected graphs ${G}_{i}$, $i=1,2$, of $G-e$ contains a cycle. A vertex incident to a core-edge is called a core-vertex of G.
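These definitions can be sketched directly, assuming a graph is given as a vertex list and an edge list of pairs. The rank is computed through the identity $\mathrm{r}(G)=|E(G)|-|V(G)|+c(G)$, where $c(G)$ is the number of connected components (the cyclomatic number), and core-edges are found by a brute-force bridge test. The helper names are hypothetical, and the quadratic bridge test is chosen for clarity rather than speed.

```python
from collections import defaultdict


def reach(start, edges):
    """Vertices reachable from `start` using the given edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def rank(vertices, edges):
    """r(G) = |E| - |V| + (number of connected components)."""
    remaining, comps = set(vertices), 0
    while remaining:
        comps += 1
        remaining -= reach(next(iter(remaining)), edges)
    return len(edges) - len(vertices) + comps


def has_cycle(side, edges):
    """True if the subgraph induced on `side` by `edges` contains a cycle."""
    sub = [(u, v) for u, v in edges if u in side and v in side]
    return rank(side, sub) > 0


def core_edges(vertices, edges):
    """Edges on a cycle, plus bridges whose two sides both contain a cycle."""
    core = []
    for i, (u, v) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]   # remove e by position (multigraph-safe)
        side_u = reach(u, rest)
        if v in side_u:                    # e lies on a cycle
            core.append((u, v))
            continue
        side_v = reach(v, rest)            # e is a bridge: G - e = G_1 + G_2
        if has_cycle(side_u, rest) and has_cycle(side_v, rest):
            core.append((u, v))
    return core
```

For two triangles joined by an edge with one pendant edge attached, the rank is 2, the joining bridge is a core-edge, and the pendant edge is not.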

**Two-layered Model.** Let G be an unrooted graph. For an integer $\rho \ge 0$, which we call a branch-parameter, a two-layered model of G is a partition of G into an "interior" and an "exterior" as follows. We call a vertex $v\in V(G)$ (resp., an edge $e\in E(G)$) of G an exterior-vertex (resp., exterior-edge) if ht$(v)<\rho $ (resp., if e is incident to an exterior-vertex), where ht$(v)$ denotes the height of v. We denote the sets of exterior-vertices and exterior-edges by ${V}^{\mathrm{ex}}(G)$ and ${E}^{\mathrm{ex}}(G)$, respectively, and let ${V}^{\mathrm{int}}(G)=V(G)\backslash {V}^{\mathrm{ex}}(G)$ and ${E}^{\mathrm{int}}(G)=E(G)\backslash {E}^{\mathrm{ex}}(G)$. We call a vertex in ${V}^{\mathrm{int}}(G)$ (resp., an edge in ${E}^{\mathrm{int}}(G)$) an interior-vertex (resp., interior-edge). The set ${E}^{\mathrm{ex}}(G)$ of exterior-edges forms a collection of connected graphs, each of which is regarded as a rooted tree T rooted at the vertex $v\in V(T)$ with the maximum ht$(v)$; we call T a ρ-fringe-tree (or simply a fringe-tree). Let ${\mathcal{T}}^{\mathrm{ex}}(G)$ denote the set of fringe-trees in G. The interior of G is defined to be the subgraph $({V}^{\mathrm{int}}(G),{E}^{\mathrm{int}}(G))$ of G. Note that every core-vertex (resp., core-edge) in G is an interior-vertex (resp., interior-edge) of G. Figure 2 illustrates an example of a graph G such that ${V}^{\mathrm{int}}=\{{u}_{1},{u}_{2},\dots ,{u}_{28}\}$, ${V}^{\mathrm{ex}}=\{{w}_{1},{w}_{2},\dots ,{w}_{19}\}$ and ${\mathcal{T}}^{\mathrm{ex}}(G)=\{{T}_{1},{T}_{2},\dots ,{T}_{8}\}$ for the branch-parameter $\rho =2$.
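The partition can be sketched as follows. This section does not restate how ht$(v)$ is computed, so the sketch assumes, for illustration only, that ht$(v)$ is the round in which v is removed when leaf-vertices (degree-1 vertices) are stripped repeatedly, with ht$(v)=\infty$ for vertices that are never removed. Under that assumption, ${V}^{\mathrm{ex}}$, ${E}^{\mathrm{ex}}$, the interior, and the roots of the ρ-fringe-trees follow directly from the definitions above. Function names are hypothetical.

```python
import math
from collections import defaultdict


def heights(n, edges):
    """ASSUMED ht(v): round in which v is stripped as a leaf; math.inf if never."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {v: len(adj[v]) for v in range(n)}
    ht = {v: math.inf for v in range(n)}
    alive = set(range(n))
    rnd = 0
    leaves = [v for v in alive if deg[v] == 1]
    while leaves:
        for v in leaves:              # strip all current leaf-vertices at once
            ht[v] = rnd
            alive.discard(v)
        for v in leaves:              # update degrees of surviving neighbours
            for w in adj[v]:
                if w in alive:
                    deg[w] -= 1
        rnd += 1
        leaves = [v for v in alive if deg[v] == 1]
    return ht


def two_layered(n, edges, rho):
    """Split G into interior and exterior for branch-parameter rho."""
    ht = heights(n, edges)
    v_ex = {v for v in range(n) if ht[v] < rho}
    v_int = set(range(n)) - v_ex
    e_ex = [(u, v) for u, v in edges if u in v_ex or v in v_ex]
    e_int = [(u, v) for u, v in edges if u not in v_ex and v not in v_ex]
    # each fringe-tree is attached to the interior at its root vertex
    roots = {x for e in e_ex for x in e if x not in v_ex}
    return v_int, v_ex, e_int, e_ex, roots
```

For a six-membered ring with a two-atom side chain on one ring vertex and a one-atom side chain on another, $\rho=2$ puts both side chains in the exterior and yields two fringe-trees rooted at the ring.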

#### 2.1.1. Modeling of Chemical Compounds

#### 2.1.2. Introducing Descriptors of Feature Vectors

- ${\mathrm{dcp}}_{1}\left(G\right)$: the number $n(G)=|V(G)|$ of vertices in G.
- ${\mathrm{dcp}}_{2}\left(G\right)$: the number $|{V}^{\mathrm{int}}\left(G\right)|$ of interior-vertices in G.
- ${\mathrm{dcp}}_{3}\left(G\right)$: the average $\overline{\mathrm{ms}}\left(G\right)$ of mass${}^{*}$ over all non-hydrogen atoms in G, i.e., $\overline{\mathrm{ms}}\left(G\right)\triangleq {\sum}_{v\in V\left(G\right)}{\mathrm{mass}}^{*}\left(\alpha \left(v\right)\right)/n\left(G\right)$.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=3+d,d\in [1,4]$: the number ${\mathrm{dg}}_{d}\left(G\right)$ of interior-vertices of degree d in G.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=7+d,d\in [1,4]$: the number ${\mathrm{dg}}_{d}^{\mathrm{int}}\left(G\right)$ of interior-vertices of interior-degree ${deg}_{({V}^{\mathrm{int}},{E}^{\mathrm{int}})}\left(v\right)=d$ in the interior $({V}^{\mathrm{int}},{E}^{\mathrm{int}})$ of G.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=12+d,d\in [0,3]$: the number ${\mathrm{hydg}}_{d}\left(G\right)$ of vertices in G of hydro-degree ${deg}_{\mathrm{hyd}}\left(v\right)=d$.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=14+m$, $m\in [2,3]$: the number ${\mathrm{bd}}_{m}^{\mathrm{int}}\left(G\right)$ of interior-edges with bond multiplicity m in G, i.e., ${\mathrm{bd}}_{m}^{\mathrm{int}}\left(G\right)\triangleq |\{e\in {E}^{\mathrm{int}}\mid \beta \left(e\right)=m\}|$.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=17+{\left[\mathrm{a}\right]}^{\mathrm{int}}$, $\mathrm{a}\in {\mathsf{\Lambda}}^{\mathrm{int}}\left({D}_{\pi}\right)$: the frequency ${\mathrm{na}}_{\mathrm{a}}^{\mathrm{int}}\left(G\right)$ of chemical element $\mathrm{a}$ in the set of interior-vertices in G.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=17+|{\mathsf{\Lambda}}^{\mathrm{int}}\left({D}_{\pi}\right)|+{\left[\mathrm{a}\right]}^{\mathrm{ex}}$, $\mathrm{a}\in {\mathsf{\Lambda}}^{\mathrm{ex}}\left({D}_{\pi}\right)$: the frequency ${\mathrm{na}}_{\mathrm{a}}^{\mathrm{ex}}\left(G\right)$ of chemical element $\mathrm{a}$ in the set of exterior-vertices in G.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=17+|{\mathsf{\Lambda}}^{\mathrm{int}}\left({D}_{\pi}\right)|+|{\mathsf{\Lambda}}^{\mathrm{ex}}\left({D}_{\pi}\right)|+\left[\gamma \right]$, $\gamma \in {\mathsf{\Gamma}}^{\mathrm{int}}\left({D}_{\pi}\right)$: the frequency ${\mathrm{ec}}_{\gamma}\left(G\right)$ of edge-configuration $\gamma $ in the set of interior-edges $e\in {E}^{\mathrm{int}}$ in G.
- ${\mathrm{dcp}}_{i}\left(G\right)$, $i=17+|{\mathsf{\Lambda}}^{\mathrm{int}}\left({D}_{\pi}\right)|+|{\mathsf{\Lambda}}^{\mathrm{ex}}\left({D}_{\pi}\right)|+|{\mathsf{\Gamma}}^{\mathrm{int}}\left({D}_{\pi}\right)|+\left[\psi \right]$, $\psi \in \mathcal{F}\left({D}_{\pi}\right)$: the frequency ${\mathrm{fc}}_{\psi}\left(G\right)$ of fringe-configuration $\psi $ in the set of $\rho $-fringe-trees in G.
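The offsets in the last four items determine where each element, edge-configuration and fringe-configuration descriptor sits in the feature vector. The sketch below assumes that the numberings ${[\mathrm{a}]}^{\mathrm{int}}$, ${[\mathrm{a}]}^{\mathrm{ex}}$, $[\gamma]$ and $[\psi]$ start at 1 and follow the order of the given lists, so that ${\mathrm{dcp}}_{1}$ to ${\mathrm{dcp}}_{17}$ are the fixed descriptors and the vector length is $K=17+|{\mathsf{\Lambda}}^{\mathrm{int}}({D}_{\pi})|+|{\mathsf{\Lambda}}^{\mathrm{ex}}({D}_{\pi})|+|{\mathsf{\Gamma}}^{\mathrm{int}}({D}_{\pi})|+|\mathcal{F}({D}_{\pi})|$. The function name and key format are hypothetical.

```python
def descriptor_layout(lam_int, lam_ex, gamma_int, fringe):
    """Map each descriptor key to its 1-based index i of dcp_i; also return K.

    The order of each input list fixes the (assumed) numbering [a]^int,
    [a]^ex, [gamma] and [psi], each starting at 1.
    """
    layout = {}
    base = 17  # dcp_1 .. dcp_17 are the fixed graph-level descriptors
    for k, a in enumerate(lam_int, start=1):
        layout[("na_int", a)] = base + k
    base += len(lam_int)
    for k, a in enumerate(lam_ex, start=1):
        layout[("na_ex", a)] = base + k
    base += len(lam_ex)
    for k, g in enumerate(gamma_int, start=1):
        layout[("ec", g)] = base + k
    base += len(gamma_int)
    for k, psi in enumerate(fringe, start=1):
        layout[("fc", psi)] = base + k
    return layout, base + len(fringe)
```

With two interior elements, one exterior element, one edge-configuration and two fringe-configurations, the vector has $K=23$ entries and the last fringe-configuration descriptor is ${\mathrm{dcp}}_{23}$.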

#### 2.2. Specifying Target Chemical Graphs

- R1 **Removal of all $\rho $-fringe-trees:** The interior ${H}^{\mathrm{int}}=({V}^{\mathrm{int}}(G),{E}^{\mathrm{int}}(G))$ of G is obtained by removing the non-root vertices of each $\rho $-fringe-tree $T\in {\mathcal{T}}^{\mathrm{ex}}\left(G\right)$. Figure 4 illustrates the interior ${H}^{\mathrm{int}}$ of the chemical graph G in Figure 2 with $\rho =2$.
- R2 **Removal of some leaf paths:** We call a $u,v$-path Q in ${H}^{\mathrm{int}}$ a leaf path if vertex v is a leaf-vertex of ${H}^{\mathrm{int}}$ and each internal vertex of Q has degree 2 in ${H}^{\mathrm{int}}$, where we regard Q as rooted at vertex u. A connected subgraph S of the interior ${H}^{\mathrm{int}}$ of G is called a cyclical-base if S is obtained from ${H}^{\mathrm{int}}$ by removing the vertices in $V\left({Q}_{u}\right)\backslash \left\{u\right\},u\in X$ for a subset X of interior-vertices and a set $\{{Q}_{u}\mid u\in X\}$ of leaf $u,v$-paths ${Q}_{u}$ such that no two paths ${Q}_{u}$ and ${Q}_{{u}^{\prime}}$ share a vertex. Figure 5a illustrates a cyclical-base $S={H}^{\mathrm{int}}-{\bigcup}_{u\in X}(V\left({Q}_{u}\right)\backslash \left\{u\right\})$ of the interior ${H}^{\mathrm{int}}$ for the set $\{{Q}_{{u}_{5}}=({u}_{5},{u}_{24}),{Q}_{{u}_{18}}=({u}_{18},{u}_{25},{u}_{26},{u}_{27}),{Q}_{{u}_{22}}=({u}_{22},{u}_{28})\}$ of leaf paths in Figure 4.
- R3 **Contraction of some pure paths:** A path in S is called pure if each internal vertex of the path has degree 2. Choose a set $\mathcal{P}$ of pure paths in S such that no two paths share vertices except for their end-vertices. A graph ${S}^{\prime}$ is called a contraction of S (with respect to $\mathcal{P}$) if ${S}^{\prime}$ is obtained from S by replacing each pure $u,v$-path in $\mathcal{P}$ with a single edge $a=uv$, where ${S}^{\prime}$ may contain multiple edges between the same pair of adjacent vertices. Figure 5b illustrates a contraction ${S}^{\prime}$ obtained from S by contracting each $u,v$-path ${P}_{a}\in \mathcal{P}$ into a new edge $a=uv$, where ${a}_{1}={u}_{1}{u}_{2},{a}_{2}={u}_{1}{u}_{3},{a}_{3}={u}_{4}{u}_{7},{a}_{4}={u}_{10}{u}_{11}$, and ${a}_{5}={u}_{11}{u}_{12}$, and $\mathcal{P}=\{{P}_{{a}_{1}}=({u}_{1},{u}_{13},{u}_{2}),{P}_{{a}_{2}}=({u}_{1},{u}_{14},{u}_{3}),{P}_{{a}_{3}}=({u}_{4},{u}_{15},{u}_{16},{u}_{7}),{P}_{{a}_{4}}=({u}_{10},{u}_{17},{u}_{18},{u}_{19},{u}_{11}),{P}_{{a}_{5}}=({u}_{11},{u}_{20},{u}_{21},{u}_{22},{u}_{12})\}$ is the set of pure paths in Figure 5a.
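Process R3 can be sketched on an edge list, as a minimal illustration rather than the authors' implementation: each chosen pure path, given as a vertex sequence, is checked for degree-2 internal vertices and replaced by one edge between its end-vertices. The helper name is hypothetical, the sketch assumes S itself has no parallel edges, and the result is kept as a list so that parallel edges of the contraction ${S}^{\prime}$ survive.

```python
from collections import Counter


def contract(edges, paths):
    """Replace each pure u,v-path (a vertex sequence) in `paths` by one edge uv."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    remove, new_edges = set(), []
    for p in paths:
        for x in p[1:-1]:
            if deg[x] != 2:
                raise ValueError(f"vertex {x} has degree {deg[x]}; path is not pure")
        for a, b in zip(p, p[1:]):
            remove.add(frozenset((a, b)))  # path edges disappear with the internal vertices
        new_edges.append((p[0], p[-1]))    # the new edge a = uv
    kept = [e for e in edges if frozenset(e) not in remove]
    return kept + new_edges
```

Contracting two pure paths with the same end-vertices produces two parallel edges, which is why ${S}^{\prime}$ may be a multigraph.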

- Each edge $e=uv\in {E}_{(\ge 2)}$ is replaced with a $u,v$-path ${P}_{e}$ of length at least 2;
- Each edge $e=uv\in {E}_{(\ge 1)}$ is replaced with a $u,v$-path ${P}_{e}$ of length at least 1 (equivalently, e is either used directly or replaced with a $u,v$-path ${P}_{e}$ of length at least 2);
- Each edge $e\in {E}_{(0/1)}$ is either used or discarded; and
- Each edge $e\in {E}_{(=1)}$ is always used directly.
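A minimal sketch of what the four edge classes permit for the length $|E({P}_{e})|$ of the path that replaces e, with length 0 standing for discarding e. The ranges for ${E}_{(0/1)}$ and ${E}_{(=1)}$ follow the notational convention for ${\ell}_{\mathrm{LB}}(e)$ and ${\ell}_{\mathrm{UB}}(e)$ given in the specification; the dictionary keys and the optional cap `l_ub` are hypothetical stand-ins for an instance-specific upper bound.

```python
import math

# Allowed lengths |E(P_e)| per edge class (0 means e is discarded).
# E_(>=2) and E_(>=1) are bounded above only by an instance-specific l_UB(e).
CLASS_RANGE = {
    ">=2": (2, math.inf),
    ">=1": (1, math.inf),
    "0/1": (0, 1),
    "=1": (1, 1),
}


def length_ok(edge_class, length, l_ub=math.inf):
    """True if an edge of the given class may become a path of this length."""
    lo, hi = CLASS_RANGE[edge_class]
    return lo <= length <= min(hi, l_ub)
```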

- Lower and upper bounds ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}\in {\mathbb{Z}}_{+}$ on the number of interior-vertices of a target chemical graph G.
- For each edge $e=u{u}^{\prime}\in {E}_{(\ge 2)}\cup {E}_{(\ge 1)}$,
  - a lower bound ${\ell}_{\mathrm{LB}}\left(e\right)$ and an upper bound ${\ell}_{\mathrm{UB}}\left(e\right)$ on the length $|E({P}_{e})|$ of a pure $u,{u}^{\prime}$-path ${P}_{e}$. (For notational convenience, set ${\ell}_{\mathrm{LB}}\left(e\right):=0$, ${\ell}_{\mathrm{UB}}\left(e\right):=1$ for $e\in {E}_{(0/1)}$, and ${\ell}_{\mathrm{LB}}\left(e\right):=1$, ${\ell}_{\mathrm{UB}}\left(e\right):=1$ for $e\in {E}_{(=1)}$.)
  - a lower bound ${\mathrm{bl}}_{\mathrm{LB}}\left(e\right)$ and an upper bound ${\mathrm{bl}}_{\mathrm{UB}}\left(e\right)$ on the number of leaf paths ${Q}_{v}$ attached at internal vertices v of a pure $u,{u}^{\prime}$-path ${P}_{e}$.
  - a lower bound ${\mathrm{ch}}_{\mathrm{LB}}\left(e\right)$ and an upper bound ${\mathrm{ch}}_{\mathrm{UB}}\left(e\right)$ on the maximum length $|E({Q}_{v})|$ of a leaf path ${Q}_{v}$ attached at an internal vertex $v\in V\left({P}_{e}\right)\backslash \{u,{u}^{\prime}\}$ of a pure $u,{u}^{\prime}$-path ${P}_{e}$.

- For each vertex $v\in {V}_{\mathrm{C}}$,
  - a lower bound ${\mathrm{bl}}_{\mathrm{LB}}\left(v\right)$ and an upper bound ${\mathrm{bl}}_{\mathrm{UB}}\left(v\right)$ on the number of leaf paths ${Q}_{v}$ attached to v, where $0\le {\mathrm{bl}}_{\mathrm{LB}}\left(v\right)\le {\mathrm{bl}}_{\mathrm{UB}}\left(v\right)\le 1$.
  - a lower bound ${\mathrm{ch}}_{\mathrm{LB}}\left(v\right)$ and an upper bound ${\mathrm{ch}}_{\mathrm{UB}}\left(v\right)$ on the length $|E({Q}_{v})|$ of a leaf path ${Q}_{v}$ attached to v.

- For each edge $e=u{u}^{\prime}\in {E}_{\mathrm{C}}$, a lower bound ${\mathrm{bd}}_{m,\mathrm{LB}}\left(e\right)$ and an upper bound ${\mathrm{bd}}_{m,\mathrm{UB}}\left(e\right)$ on the number of edges with bond-multiplicity $m\in [2,3]$ in the $u,{u}^{\prime}$-path ${P}_{e}$, where we regard ${P}_{e}$, $e\in {E}_{(0/1)}\cup {E}_{(=1)}$, as the single edge e.

- Lower and upper bounds ${n}_{\mathrm{LB}},{n}^{*}\in {\mathbb{Z}}_{+}$ on the number of vertices in G, where ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}\le {n}_{\mathrm{LB}}\le {n}^{*}$.
- Subsets $\mathcal{F}\left(v\right)\subseteq \mathcal{F}\left({D}_{\pi}\right),v\in {V}_{\mathrm{C}}$ and ${\mathcal{F}}_{E}\subseteq \mathcal{F}\left({D}_{\pi}\right)$ of chemical rooted trees with height at most $\rho $, where we require that every $\rho $-fringe-tree ${T}_{v}$ rooted at a vertex $v\in {V}_{\mathrm{C}}$ (resp., at an internal vertex v not in ${V}_{\mathrm{C}}$) in G belongs to $\mathcal{F}\left(v\right)$ (resp., ${\mathcal{F}}_{E}$). Let ${\mathcal{F}}^{*}:={\mathcal{F}}_{E}\cup {\bigcup}_{v\in {V}_{\mathrm{C}}}\mathcal{F}\left(v\right)$, and let ${\mathsf{\Lambda}}^{\mathrm{ex}}$ denote the set of chemical elements assigned to non-root vertices over all chemical rooted trees in ${\mathcal{F}}^{*}$.
- A subset ${\mathsf{\Lambda}}^{\mathrm{int}}\subseteq {\mathsf{\Lambda}}^{\mathrm{int}}\left({D}_{\pi}\right)$, where we require that every chemical element $\alpha \left(v\right)$ assigned to an interior-vertex v in G belongs to ${\mathsf{\Lambda}}^{\mathrm{int}}$. Let $\mathsf{\Lambda}:={\mathsf{\Lambda}}^{\mathrm{int}}\cup {\mathsf{\Lambda}}^{\mathrm{ex}}$, and let ${\mathrm{na}}_{\mathrm{a}}\left(G\right)$ (resp., ${\mathrm{na}}_{\mathrm{a}}^{\mathrm{int}}\left(G\right)$ and ${\mathrm{na}}_{\mathrm{a}}^{\mathrm{ex}}\left(G\right)$) denote the number of vertices (resp., interior-vertices and exterior-vertices) v such that $\alpha \left(v\right)=\mathrm{a}$ in G.
- A set ${\mathsf{\Lambda}}_{\mathrm{dg}}^{\mathrm{int}}\subseteq \mathsf{\Lambda}\times [1,4]$ of chemical symbols and a set ${\mathsf{\Gamma}}^{\mathrm{int}}\subseteq {\mathsf{\Gamma}}^{\mathrm{int}}\left({D}_{\pi}\right)$ of edge-configurations $(\mu ,\xi ,m)$ with $\mu \le \xi $, where we require that the edge-configuration $\mathrm{ec}\left(e\right)$ of each interior-edge e in G belongs to ${\mathsf{\Gamma}}^{\mathrm{int}}$. We do not distinguish $(\mu ,\xi ,m)$ and $(\xi ,\mu ,m)$.
- Define ${\mathsf{\Gamma}}_{\mathrm{ac}}^{\mathrm{int}}$ to be the set of adjacency-configurations ${\mathsf{\Gamma}}_{\mathrm{ac}}^{\mathrm{int}}:=\{(\mathrm{a},\mathrm{b},m)\mid (\mathrm{a}d,\mathrm{b}{d}^{\prime},m)\in {\mathsf{\Gamma}}^{\mathrm{int}}\}$. Let ${\mathrm{ac}}_{\nu}^{\mathrm{int}}\left(G\right),\nu \in {\mathsf{\Gamma}}_{\mathrm{ac}}^{\mathrm{int}}$, denote the number of interior-edges e such that $\mathrm{ac}\left(e\right)=\nu $ in G.
- Subsets ${\mathsf{\Lambda}}^{*}\left(v\right)\subseteq \{\mathrm{a}\in {\mathsf{\Lambda}}^{\mathrm{int}}\mid \mathrm{val}\left(\mathrm{a}\right)\ge 2\}$, $v\in {V}_{\mathrm{C}}$, where we require that every chemical element $\alpha \left(v\right)$ assigned to a vertex $v\in {V}_{\mathrm{C}}$ in the seed graph belongs to ${\mathsf{\Lambda}}^{*}\left(v\right)$.
- Lower and upper bound functions ${\mathrm{na}}_{\mathrm{LB}},{\mathrm{na}}_{\mathrm{UB}}:\mathsf{\Lambda}\to [1,{n}^{*}]$ and ${\mathrm{na}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{na}}_{\mathrm{UB}}^{\mathrm{int}}:{\mathsf{\Lambda}}^{\mathrm{int}}\to [1,{n}^{*}]$ on the number of vertices (resp., interior-vertices) v such that $\alpha \left(v\right)=\mathrm{a}$ in G.
- Lower and upper bound functions ${\mathrm{ns}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{ns}}_{\mathrm{UB}}^{\mathrm{int}}:{\mathsf{\Lambda}}_{\mathrm{dg}}^{\mathrm{int}}\to [1,{n}^{*}]$ on the number of interior-vertices v such that $\mathrm{cs}\left(v\right)=\mu $ in G.
- Lower and upper bound functions ${\mathrm{ac}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{ac}}_{\mathrm{UB}}^{\mathrm{int}}:{\mathsf{\Gamma}}_{\mathrm{ac}}^{\mathrm{int}}\to {\mathbb{Z}}_{+}$ on the number of interior-edges e such that $\mathrm{ac}\left(e\right)=\nu $ in G.
- Lower and upper bound functions ${\mathrm{ec}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{ec}}_{\mathrm{UB}}^{\mathrm{int}}:{\mathsf{\Gamma}}^{\mathrm{int}}\to {\mathbb{Z}}_{+}$ on the number of interior-edges e such that $\mathrm{ec}\left(e\right)=\gamma $ in G.
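The projection from edge-configurations to adjacency-configurations and the lower/upper bound checks can be sketched as follows. The sketch assumes a chemical symbol $\mu$ is encoded as an (element, degree) pair, so an edge-configuration is `((a, d), (b, d2), m)`; the two elements are sorted because $(\mu,\xi,m)$ and $(\xi,\mu,m)$ are not distinguished. Function names are hypothetical, and in an MILP formulation such bounds would enter as constraints rather than be checked after the fact.

```python
from collections import Counter


def ac_of(ec):
    """Project ((a, d), (b, d2), m) onto the adjacency-configuration (a, b, m)."""
    (a, _), (b, _), m = ec
    a, b = sorted((a, b))  # (mu, xi, m) and (xi, mu, m) are not distinguished
    return (a, b, m)


def gamma_ac(gamma_int):
    """Gamma_ac^int derived from a collection of edge-configurations Gamma^int."""
    return {ac_of(g) for g in gamma_int}


def ac_counts(interior_ecs):
    """ac_nu^int(G): number of interior-edges per adjacency-configuration nu."""
    return Counter(ac_of(g) for g in interior_ecs)


def within_bounds(counts, lb, ub):
    """Check lb[k] <= counts[k] <= ub[k] for every key k of the bound functions."""
    return all(lb[k] <= counts.get(k, 0) <= ub[k] for k in lb)
```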

#### 2.3. Examples of Specification

## 3. Results

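… ($\mathrm{K}_{\mathrm{OW}}$), boiling point ($\mathrm{B}_{\mathrm{P}}$), melting point ($\mathrm{M}_{\mathrm{P}}$), flash point (closed cup) ($\mathrm{F}_{\mathrm{P}}$), lipophilicity ($\mathrm{L}_{\mathrm{P}}$), and solubility ($\mathrm{S}_{\mathrm{L}}$), using data provided by HSDB from PubChem [29] for $\mathrm{K}_{\mathrm{OW}}$, $\mathrm{B}_{\mathrm{P}}$, $\mathrm{M}_{\mathrm{P}}$, and $\mathrm{F}_{\mathrm{P}}$, by figshare [30] for $\mathrm{L}_{\mathrm{P}}$, and by MoleculeNet [31] for $\mathrm{S}_{\mathrm{L}}$.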

**Results on Phase 1.**

**Stage 1.** We set the graph class $\mathcal{G}$ to be the set of all chemical graphs with any graph structure, and set the branch-parameter $\rho $ to 2. For each property $\pi \in \{\mathrm{K}_{\mathrm{OW}},\mathrm{B}_{\mathrm{P}},\mathrm{M}_{\mathrm{P}},\mathrm{F}_{\mathrm{P}},\mathrm{L}_{\mathrm{P}},\mathrm{S}_{\mathrm{L}}\}$, we first selected a set $\mathsf{\Lambda}$ of chemical elements and then collected a data set ${D}_{\pi}$ of chemical graphs over $\mathsf{\Lambda}$. To construct ${D}_{\pi}$, we eliminated chemical compounds that have at most three carbon atoms, that contain a charged element such as ${\mathrm{N}}^{+}$, or that contain an element $\mathrm{a}\in \mathsf{\Lambda}$ whose valence differs from our setting of the valence function $\mathrm{val}$.
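The elimination rules for ${D}_{\pi}$ can be sketched as a filter. The record layout (per-atom element, formal charge and observed valence, hydrogens excluded) and the valence table `VAL` are assumptions made for illustration; the valence function $\mathrm{val}$ used in the experiments is not given in this section.

```python
from dataclasses import dataclass

# ASSUMED valence setting, for illustration only.
VAL = {"C": 4, "N": 3, "O": 2, "S": 2, "Cl": 1}


@dataclass
class Compound:
    """Hypothetical record with one entry per non-hydrogen atom."""
    elements: list
    charges: list
    valences: list  # sum of bond multiplicities, including bonds to hydrogen


def keep(c, allowed):
    """Return True if compound c survives the elimination rules for D_pi."""
    if c.elements.count("C") <= 3:      # at most three carbon atoms
        return False
    if any(q != 0 for q in c.charges):   # charged element such as N+
        return False
    # every element must lie in Lambda and match the assumed valence val(a)
    return all(a in allowed and VAL.get(a) == v
               for a, v in zip(c.elements, c.valences))
```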

- $\mathsf{\Lambda}$: the set of selected chemical elements (hydrogen atoms are added at the final stage);
- $|{D}_{\pi}|$: the size of the data set ${D}_{\pi}$ over $\mathsf{\Lambda}$ for property $\pi $;
- $|{\mathsf{\Gamma}}^{\mathrm{int}}\left({D}_{\pi}\right)|$: the number of different edge-configurations of interior-edges over the compounds in ${D}_{\pi}$;
- $|\mathcal{F}({D}_{\pi})|$: the number of non-isomorphic chemical rooted trees in the set of all 2-fringe-trees in the compounds in ${D}_{\pi}$;
- $[\underline{n},\overline{n}]$: the minimum and maximum values of $n\left(G\right)$ over the compounds G in ${D}_{\pi}$; and
- $[\underline{a},\overline{a}]$: the minimum and maximum values of the property value $a\left(G\right)$ of $\pi $ over the compounds G in ${D}_{\pi}$.
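Counting $|\mathcal{F}({D}_{\pi})|$ requires deciding when two 2-fringe-trees are the same chemical rooted tree. One standard way, sketched here as an assumption rather than the authors' procedure, is an AHU-style canonical string: each vertex is encoded by its element followed by the sorted encodings of its children, each prefixed by the bond multiplicity to that child. The tree encoding (a dict from vertex to element and child list) is hypothetical, and hydrogens are omitted.

```python
def canon(tree, root):
    """Canonical string of a rooted tree with element labels and bond orders.

    tree maps each vertex to (element, [(child, bond_multiplicity), ...]).
    Sorting the children's strings makes the result independent of child
    order, so rooted-isomorphic trees receive identical strings.
    """
    elem, children = tree[root]
    parts = sorted(f"{m}{canon(tree, c)}" for c, m in children)
    return f"{elem}(" + ",".join(parts) + ")"


def count_distinct(trees):
    """Number of non-isomorphic rooted trees among (tree, root) pairs."""
    return len({canon(t, r) for t, r in trees})
```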

**Stage 2.** We used the new feature function, which consists of the descriptors defined in Section 2.1 (including those for fringe-configurations), and let ${f}_{\mathrm{fc}}$ denote this feature function.

**Stage 3.** Let $\eta :{\mathbb{R}}^{K}\to \mathbb{R}$ be a prediction function for a property function $a:D\to \mathbb{R}$ with a feature function $f:D\to {\mathbb{R}}^{K}$ for a data set D of chemical graphs. We define the coefficient of determination ${\mathrm{R}}^{2}(f,\eta ,D)$ of a prediction function $\eta $ over a data set D to be

$$\mathrm{R}^{2}(f,\eta,D)\triangleq 1-\frac{\sum_{G\in D}\left(a(G)-\eta(f(G))\right)^{2}}{\sum_{G\in D}\left(a(G)-\tilde{a}\right)^{2}},\qquad \tilde{a}=\frac{1}{|D|}\sum_{G\in D}a(G).$$

For each property $\pi \in \{\mathrm{K}_{\mathrm{OW}},\mathrm{B}_{\mathrm{P}},\mathrm{M}_{\mathrm{P}},\mathrm{F}_{\mathrm{P}},\mathrm{L}_{\mathrm{P}},\mathrm{S}_{\mathrm{L}}\}$ and each architecture ${A}_{j}$, $j\in [1,10]$, we constructed five prediction functions to evaluate performance by cross-validation as follows. We randomly partitioned the data set ${D}_{\pi}$ into five subsets ${D}_{\pi}^{(i)}$, $i\in [1,5]$, and for each training set ${D}_{\pi}\backslash {D}_{\pi}^{(i)}$ constructed an ANN $\mathcal{N}(j,i)$ and its prediction function ${\eta}_{\mathcal{N}(j,i)}$ using the feature function ${f}_{\mathrm{fc}}$. We used scikit-learn version 0.23.2 with Python 3.8.5, MLPRegressor, and the ReLU activation function to construct each ANN $\mathcal{N}(j,i)$. We evaluated each resulting prediction function ${\eta}_{\mathcal{N}(j,i)}$ by the coefficient of determination ${\mathrm{R}}^{2}({f}_{\mathrm{fc}},{\eta}_{\mathcal{N}(j,i)},{D}_{\pi}^{(i)})$ on the test set ${D}_{\pi}^{(i)}$. For each property $\pi $, let t-${\mathrm{R}}_{\mathrm{cv}}^{2}(j)$ denote the average of ${\mathrm{R}}^{2}({f}_{\mathrm{fc}},{\eta}_{\mathcal{N}(j,i)},{D}_{\pi}^{(i)})$ over all $i\in [1,5]$ in the cross-validation for architecture ${A}_{j}$.
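The evaluation protocol can be sketched in plain Python, assuming the feature vectors $f(G)$ and property values $a(G)$ are already available as lists. The learner is passed in as a hypothetical callable `fit`; in the setting described above it would wrap scikit-learn's `MLPRegressor`, which is left out here to keep the sketch dependency-free. The random five-way split shown is one possible choice, not necessarily the split used in the experiments.

```python
import random


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (a~ = mean of a(G))."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def five_fold(n, seed=0):
    """Randomly partition indices 0..n-1 into five disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::5] for i in range(5)]


def cv_r2(X, y, fit, seed=0):
    """t-R^2_cv for one architecture: mean test R^2 over the five folds.

    `fit(X_train, y_train)` is a hypothetical callable returning a predictor.
    """
    scores = []
    for test in five_fold(len(y), seed):
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r2([y[i] for i in test], [predict(X[i]) for i in test]))
    return sum(scores) / len(scores)
```

A predictor that reproduces every test value exactly scores 1, while one that always predicts the test-set mean scores 0.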

- $\mathsf{\Lambda}$: the set of selected chemical elements (hydrogen atoms are added at the final stage);
- L-time: the average time (s) to construct an ANN over all $10\times 5=50$ ANNs;
- t-${\mathrm{R}}_{\mathrm{cv}}^{2}$ (best): the best value of t-${\mathrm{R}}_{\mathrm{cv}}^{2}\left(j\right)$ over all architectures ${A}_{j}$, $j\in [1,10]$;
- t-${\mathrm{R}}_{\mathrm{max}}^{2}$: the maximum of ${\mathrm{R}}^{2}({f}_{\mathrm{fc}},{\eta}_{\mathcal{N}(j,i)},{D}_{\pi}^{\left(i\right)})$ over all $j\in [1,10],i\in [1,5]$; and
- Arch.: the architecture ${A}_{j}$, $j\in [1,10]$, that attains t-${\mathrm{R}}_{\mathrm{max}}^{2}$. An architecture $(K,p,1)$ (resp., $(K,{p}_{1},{p}_{2},1)$) consists of an input layer with K nodes, a hidden layer with p nodes (resp., two hidden layers with ${p}_{1}$ and ${p}_{2}$ nodes, respectively), and an output layer with a single node, where K equals the number of descriptors in the feature vector.
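For reference, the layer notation maps directly onto scikit-learn's `MLPRegressor`, whose `hidden_layer_sizes` parameter lists only the hidden layers, and the number of trainable parameters of a fully connected network follows from the layer sizes. The helper names below are hypothetical, and the count assumes one bias per non-input node.

```python
def hidden_layers(arch):
    """Hidden-layer sizes of (K, p, 1) or (K, p1, p2, 1), e.g. for hidden_layer_sizes."""
    return tuple(arch[1:-1])


def mlp_params(arch):
    """Weights plus biases of a fully connected ANN with layer sizes `arch`."""
    return sum(a * b + b for a, b in zip(arch, arch[1:]))
```

For example, an architecture $(10,5,1)$ has $10\cdot 5+5$ parameters into the hidden layer and $5\cdot 1+1$ into the output node, 61 in total.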

**An Additional Experiment in Stage 3.** We conducted an additional experiment to compare our new feature function ${f}_{\mathrm{fc}}$ with the feature function ${f}_{\mathrm{ec}}$ based on edge-configurations in the previous method [27], which was designed within the same framework. Note that the previous feature vector ${f}_{\mathrm{ec}}\left(G\right)$ can be defined only for a cyclic graph G, whereas our feature vector ${f}_{\mathrm{fc}}\left(G\right)$ is defined for an arbitrary graph G. For each property $\pi \in \{\mathrm{K}_{\mathrm{OW}},\mathrm{B}_{\mathrm{P}},\mathrm{M}_{\mathrm{P}},\mathrm{F}_{\mathrm{P}},\mathrm{L}_{\mathrm{P}},\mathrm{S}_{\mathrm{L}}\}$, we set $\mathsf{\Lambda}:=\left\{\mathrm{C},\mathrm{O},\mathrm{N},\mathrm{S},\mathrm{Cl}\right\}$ and collected a data set ${\tilde{D}}_{\pi}$ of chemical cyclic graphs from the data set ${D}_{\pi}$ of all chemical graphs over $\mathsf{\Lambda}$ used in the previous experiment. For each of the feature functions ${f}_{\mathrm{ec}}$ and ${f}_{\mathrm{fc}}$, we constructed five prediction functions with the same set of ten architectures ${A}_{j}$, $j\in [1,10]$, on the data set ${\tilde{D}}_{\pi}$ of chemical cyclic graphs, in the same manner as in the previous experiment.

- $|{\tilde{D}}_{\pi}|$, $|{D}_{\pi}|$: the size of the data set ${\tilde{D}}_{\pi}$ of cyclic graphs (resp., ${D}_{\pi}$ of all chemical graphs) for property $\pi $;
- t-${\mathrm{R}}_{\mathrm{cv}}^{2}$ (ave.): the average of ${\mathrm{R}}^{2}(f,{\eta}_{\mathcal{N}(j,i)},{D}^{\left(i\right)})$ over all $j\in [1,10],i\in [1,5]$ for $f={f}_{\mathrm{ec}},{f}_{\mathrm{fc}}$ and $D={\tilde{D}}_{\pi},{D}_{\pi}$; and
- t-${\mathrm{R}}_{\mathrm{cv}}^{2}$ (best): ${\max}_{j\in [1,10]}\{$the average of ${\mathrm{R}}^{2}(f,{\eta}_{\mathcal{N}(j,i)},{D}^{\left(i\right)})$ over all $i\in [1,5]\}$.

… and $\mathrm{F}_{\mathrm{P}}$ (resp., $\mathrm{B}_{\mathrm{P}}$, $\mathrm{M}_{\mathrm{P}}$, and $\mathrm{F}_{\mathrm{P}}$). Recall that our new feature function ${f}_{\mathrm{fc}}$ is defined for arbitrary graphs, so in the learning stage we can use a larger data set than with ${f}_{\mathrm{ec}}$. This advantage is reflected in the experiment. We conjecture that ${f}_{\mathrm{fc}}$ yields a better prediction function for $\mathrm{B}_{\mathrm{P}}$ (resp., $\mathrm{F}_{\mathrm{P}}$) because the data set becomes considerably larger, from $|{\tilde{D}}_{\pi}|=224$ to $|{D}_{\pi}|=425$ (resp., from $|{\tilde{D}}_{\pi}|=218$ to $|{D}_{\pi}|=399$), nearly doubling in size.

**Results on Phase 2.**

- (a)
- ${I}_{\mathrm{a}}=({G}_{\mathrm{C}},{\sigma}_{\mathrm{int}},{\sigma}_{\mathrm{ce}})$: The instance used in Section 2.2 to explain the target specification.
- (b)
- ${I}_{\mathrm{b},i}=({G}_{\mathrm{C}}^{i},{\sigma}_{\mathrm{int}}^{i},{\sigma}_{\mathrm{ce}}^{i})$, $i=1,2,3,4$: An instance for inferring chemical graphs with rank at most 2. In the four instances ${I}_{\mathrm{b},i}$, $i=1,2,3,4$, the following specifications in $({\sigma}_{\mathrm{int}},{\sigma}_{\mathrm{ce}})$ are common.
- Set $\mathsf{\Lambda}:=\left\{\mathrm{C},\mathrm{N},\mathrm{O}\right\}$, set ${\mathsf{\Lambda}}_{\mathrm{dg}}^{\mathrm{int}}$ to be the set of all possible symbols in $\mathsf{\Lambda}\times [1,4]$, and set ${\mathsf{\Gamma}}^{\mathrm{int}}$ to be the set of all possible edge-configurations. Set ${\mathsf{\Lambda}}^{*}\left(v\right):=\mathsf{\Lambda}$, $v\in {V}_{\mathrm{C}}$.
- The lower bounds ${\ell}_{\mathrm{LB}}$, ${\mathrm{bl}}_{\mathrm{LB}}$, ${\mathrm{ch}}_{\mathrm{LB}}$, ${\mathrm{bd}}_{2,\mathrm{LB}}$, ${\mathrm{bd}}_{3,\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{LB}}^{\mathrm{int}}$, ${\mathrm{ac}}_{\mathrm{LB}}^{\mathrm{int}}$, ${\mathrm{ec}}_{\mathrm{LB}}^{\mathrm{int}}$ are all set to be 0.
- The upper bounds ${\ell}_{\mathrm{UB}}$, ${\mathrm{bl}}_{\mathrm{UB}}$, ${\mathrm{ch}}_{\mathrm{UB}}$, ${\mathrm{bd}}_{2,\mathrm{UB}}$, ${\mathrm{bd}}_{3,\mathrm{UB}}$, ${\mathrm{na}}_{\mathrm{UB}}$, ${\mathrm{na}}_{\mathrm{UB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{UB}}^{\mathrm{int}}$, ${\mathrm{ac}}_{\mathrm{UB}}^{\mathrm{int}}$, ${\mathrm{ec}}_{\mathrm{UB}}^{\mathrm{int}}$ are all set to be an upper bound ${n}^{*}$ on $n\left({G}^{*}\right)$.
- For each property $\pi $, let $\mathcal{F}\left({D}_{\pi}\right)$ denote the set of 2-fringe-trees in the compounds in ${D}_{\pi}$, and select a subset ${\mathcal{F}}_{\pi}^{i}\subseteq \mathcal{F}\left({D}_{\pi}\right)$ with $|{\mathcal{F}}_{\pi}^{i}|=45-5i$, $i\in [1,5]$. For each instance ${I}_{\mathrm{b},i}$, set ${\mathcal{F}}_{E}:=\mathcal{F}\left(v\right):={\mathcal{F}}_{\pi}^{i}$, $v\in {V}_{\mathrm{C}}$.

Instance ${I}_{\mathrm{b},1}$ is given by the rank-1 seed graph ${G}_{\mathrm{C}}^{1}$ in Figure 9a, and instances ${I}_{\mathrm{b},i}$, $i=2,3,4$, are given by the rank-2 seed graphs ${G}_{\mathrm{C}}^{i}$, $i=2,3,4$, in Figure 9b–d.
- (i)
- For instance ${I}_{\mathrm{b},1}$, select as a seed graph the monocyclic graph ${G}_{\mathrm{C}}^{1}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(\ge 2)}\cup {E}_{(\ge 1)})$ in Figure 9a, where ${V}_{\mathrm{C}}=\{{u}_{1},{u}_{2}\}$, ${E}_{(\ge 2)}=\left\{{a}_{1}\right\}$ and ${E}_{(\ge 1)}=\left\{{a}_{2}\right\}$. Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:=0,{\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=12$ and ${n}_{\mathrm{LB}}:={n}^{*}:=38$. We include a linear constraint $\ell \left({a}_{1}\right)\le \ell \left({a}_{2}\right)$ as part of the side constraint.
- (ii)
- For instance ${I}_{\mathrm{b},2}$, select as a seed graph the graph ${G}_{\mathrm{C}}^{2}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(\ge 2)}\cup {E}_{(\ge 1)}\cup {E}_{(=1)})$ in Figure 9b, where ${V}_{\mathrm{C}}=\{{u}_{1},{u}_{2},{u}_{3},{u}_{4}\}$, ${E}_{(\ge 2)}=\{{a}_{1},{a}_{2}\}$, ${E}_{(\ge 1)}=\left\{{a}_{3}\right\}$ and ${E}_{(=1)}=\{{a}_{4},{a}_{5}\}$. Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:={\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=30$ and ${n}_{\mathrm{LB}}:={n}^{*}:=50$. We include a linear constraint $\ell \left({a}_{1}\right)\le \ell \left({a}_{2}\right)$.
- (iii)
- For instance ${I}_{\mathrm{b},3}$, select as a seed graph the graph ${G}_{\mathrm{C}}^{3}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(\ge 2)}\cup {E}_{(\ge 1)}\cup {E}_{(=1)})$ in Figure 9c, where ${V}_{\mathrm{C}}=\{{u}_{1},{u}_{2},{u}_{3},{u}_{4}\}$, ${E}_{(\ge 2)}=\left\{{a}_{1}\right\}$, ${E}_{(\ge 1)}=\{{a}_{2},{a}_{3}\}$ and ${E}_{(=1)}=\{{a}_{4},{a}_{5}\}$. Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:={\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=30$ and ${n}_{\mathrm{LB}}:={n}^{*}:=50$. We include linear constraints $\ell \left({a}_{1}\right)\le \ell \left({a}_{2}\right)+\ell \left({a}_{3}\right)$ and $\ell \left({a}_{2}\right)\le \ell \left({a}_{3}\right)$.
- (iv)
- For instance ${I}_{\mathrm{b},4}$, select as a seed graph the graph ${G}_{\mathrm{C}}^{4}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(\ge 2)}\cup {E}_{(\ge 1)}\cup {E}_{(=1)})$ in Figure 9d, where ${V}_{\mathrm{C}}=\{{u}_{1},{u}_{2},{u}_{3},{u}_{4}\}$, ${E}_{(\ge 1)}=\{{a}_{1},{a}_{2},{a}_{3}\}$ and ${E}_{(=1)}=\{{a}_{4},{a}_{5}\}$. Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:={\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=30$ and ${n}_{\mathrm{LB}}:={n}^{*}:=50$. We include linear constraints $\ell \left({a}_{2}\right)\le \ell \left({a}_{1}\right)+1$, $\ell \left({a}_{2}\right)\le \ell \left({a}_{3}\right)+1$ and $\ell \left({a}_{1}\right)\le \ell \left({a}_{3}\right)$.

- (c)
- ${I}_{\mathrm{c}}=({G}_{\mathrm{C}},{\sigma}_{\mathrm{int}},{\sigma}_{\mathrm{ce}})$: An instance aimed to infer a chemical graph ${G}^{\u2020}$ such that the core of ${G}^{\u2020}$ is equal to the core of ${G}_{A}$ and the frequency of each edge-configuration in the non-core of ${G}^{\u2020}$ is equal to that of ${G}_{B}$. We use chemical compounds CID 24822711 and CID 59170444 in Figure 10a,b for ${G}_{A}$ and ${G}_{B}$, respectively.Set a seed graph ${G}_{\mathrm{C}}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(=1)})$ to be the core of ${G}_{A}$.Set $\mathsf{\Lambda}:=\left\{\mathrm{C},\mathrm{N},\mathrm{O}\right\}$, and set ${\mathsf{\Lambda}}_{\mathrm{dg}}^{\mathrm{int}}$ to be the set of all possible chemical symbols in $\mathsf{\Lambda}\times [1,4]$.Set ${\mathsf{\Gamma}}^{\mathrm{int}}:={\mathsf{\Gamma}}_{A}^{\mathrm{int}}\cup {\mathsf{\Gamma}}_{B}^{\mathrm{int}}$ and ${\mathsf{\Lambda}}^{*}\left(v\right):=\left\{{\alpha}_{A}\left(v\right)\right\}$, $v\in {V}_{\mathrm{C}}$.Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:=min\{{\mathrm{n}}^{\mathrm{int}}\left({G}_{A}\right),{\mathrm{n}}^{\mathrm{int}}\left({G}_{B}\right)\}$, ${\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=max\{{\mathrm{n}}^{\mathrm{int}}\left({G}_{A}\right),{\mathrm{n}}^{\mathrm{int}}\left({G}_{B}\right)\}$,${n}_{\mathrm{LB}}:=min\{n\left({G}_{A}\right),n\left({G}_{B}\right)\}-10$ and ${n}^{*}:=max\{n\left({G}_{A}\right),n\left({G}_{B}\right)\}+5$.Set lower bounds ${\ell}_{\mathrm{LB}}$, ${\mathrm{bl}}_{\mathrm{LB}}$, ${\mathrm{ch}}_{\mathrm{LB}}$, ${\mathrm{bd}}_{2,\mathrm{LB}}$, ${\mathrm{bd}}_{3,\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{LB}}^{\mathrm{int}}$ and ${\mathrm{ac}}_{\mathrm{LB}}^{\mathrm{int}}$ to be 0.Set upper bounds ${\ell}_{\mathrm{UB}}$, ${\mathrm{bl}}_{\mathrm{UB}}$, ${\mathrm{ch}}_{\mathrm{UB}}$, ${\mathrm{bd}}_{2,\mathrm{UB}}$, ${\mathrm{bd}}_{3,\mathrm{UB}}$, ${\mathrm{na}}_{\mathrm{UB}}$, 
${\mathrm{na}}_{\mathrm{UB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{UB}}^{\mathrm{int}}$ and ${\mathrm{ac}}_{\mathrm{UB}}^{\mathrm{int}}$ to be ${n}^{*}$.Set ${\mathrm{ec}}_{\mathrm{LB}}^{\mathrm{int}}\left(\gamma \right)$ to be the number of core-edges in ${G}_{A}$ with $\gamma \in {\mathsf{\Gamma}}^{\mathrm{int}}$ and ${\mathrm{ec}}_{\mathrm{UB}}^{\mathrm{int}}\left(\gamma \right)$ to be the number interior-edges in ${G}_{A}$ and ${G}_{B}$ with edge-configuration $\gamma $.Let ${\mathcal{F}}_{B}^{\left(p\right)},p\in [1,2]$ denote the set of chemical rooted trees r-isomorphic p-fringe-trees in ${G}_{B}$.Set ${\mathcal{F}}_{E}:=\mathcal{F}\left(v\right):={\mathcal{F}}_{B}^{\left(1\right)}\cup {\mathcal{F}}_{B}^{\left(2\right)}$, $v\in {V}_{\mathrm{C}}$.
- (d)
- ${I}_{\mathrm{d}}=({G}_{\mathrm{C}}^{1},{\sigma}_{\mathrm{int}},{\sigma}_{\mathrm{ce}})$: An instance aimed at inferring a chemical monocyclic graph ${G}^{\dagger}$ such that the frequency vector of edge-configurations in ${G}^{\dagger}$ is obtained by merging those of ${G}_{A}$ and ${G}_{B}$. We use the chemical monocyclic compounds CID 10076784 and CID 44340250 in Figure 10c,d for ${G}_{A}$ and ${G}_{B}$, respectively. Set the seed graph to be the monocyclic seed graph ${G}_{\mathrm{C}}^{1}=({V}_{\mathrm{C}},{E}_{\mathrm{C}}={E}_{(\ge 2)}\cup {E}_{(\ge 1)})$ with ${V}_{\mathrm{C}}=\{{u}_{1},{u}_{2}\}$, ${E}_{(\ge 2)}=\left\{{a}_{1}\right\}$ and ${E}_{(\ge 1)}=\left\{{a}_{2}\right\}$ in Figure 9a. Set $\mathsf{\Lambda}:=\left\{\mathrm{C},\mathrm{N},\mathrm{O}\right\}$, ${\mathsf{\Lambda}}_{\mathrm{dg}}^{\mathrm{int}}:={\mathsf{\Lambda}}_{\mathrm{dg},A}^{\mathrm{int}}\cup {\mathsf{\Lambda}}_{\mathrm{dg},B}^{\mathrm{int}}$ and ${\mathsf{\Gamma}}^{\mathrm{int}}:={\mathsf{\Gamma}}_{A}^{\mathrm{int}}\cup {\mathsf{\Gamma}}_{B}^{\mathrm{int}}$. Set ${\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}}:=\min\{{\mathrm{n}}^{\mathrm{int}}\left({G}_{A}\right),{\mathrm{n}}^{\mathrm{int}}\left({G}_{B}\right)\}$, ${\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}:=\max\{{\mathrm{n}}^{\mathrm{int}}\left({G}_{A}\right),{\mathrm{n}}^{\mathrm{int}}\left({G}_{B}\right)\}$, ${n}_{\mathrm{LB}}:=\min\{n\left({G}_{A}\right),n\left({G}_{B}\right)\}$ and ${n}^{*}:=\max\{n\left({G}_{A}\right),n\left({G}_{B}\right)\}$. Set the lower bounds ${\ell}_{\mathrm{LB}}$, ${\mathrm{bl}}_{\mathrm{LB}}$, ${\mathrm{ch}}_{\mathrm{LB}}$, ${\mathrm{bd}}_{2,\mathrm{LB}}$, ${\mathrm{bd}}_{3,\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}$, ${\mathrm{na}}_{\mathrm{LB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{LB}}^{\mathrm{int}}$ and ${\mathrm{ac}}_{\mathrm{LB}}^{\mathrm{int}}$ to be 0. Set the upper bounds ${\ell}_{\mathrm{UB}}$, ${\mathrm{bl}}_{\mathrm{UB}}$, ${\mathrm{ch}}_{\mathrm{UB}}$, ${\mathrm{bd}}_{2,\mathrm{UB}}$, ${\mathrm{bd}}_{3,\mathrm{UB}}$, ${\mathrm{na}}_{\mathrm{UB}}$, ${\mathrm{na}}_{\mathrm{UB}}^{\mathrm{int}}$, ${\mathrm{ns}}_{\mathrm{UB}}^{\mathrm{int}}$ and ${\mathrm{ac}}_{\mathrm{UB}}^{\mathrm{int}}$ to be ${n}^{*}$. For each edge-configuration $\gamma \in {\mathsf{\Gamma}}^{\mathrm{int}}$, let ${\mathit{x}}_{A}^{*}\left(\gamma \right)$ (resp., ${\mathit{x}}_{B}^{*}\left(\gamma \right)$) denote the number of interior-edges with $\gamma $ in ${G}_{A}$ (resp., ${G}_{B}$), and set ${\mathit{x}}_{\mathrm{min}}^{*}\left(\gamma \right):=\min\{{\mathit{x}}_{A}^{*}\left(\gamma \right),{\mathit{x}}_{B}^{*}\left(\gamma \right)\}$, ${\mathit{x}}_{\mathrm{max}}^{*}\left(\gamma \right):=\max\{{\mathit{x}}_{A}^{*}\left(\gamma \right),{\mathit{x}}_{B}^{*}\left(\gamma \right)\}$, ${\mathrm{ec}}_{\mathrm{LB}}^{\mathrm{int}}\left(\gamma \right):=\lfloor (3/4){\mathit{x}}_{\mathrm{min}}^{*}\left(\gamma \right)+(1/4){\mathit{x}}_{\mathrm{max}}^{*}\left(\gamma \right)\rfloor $ and ${\mathrm{ec}}_{\mathrm{UB}}^{\mathrm{int}}\left(\gamma \right):=\lceil (1/4){\mathit{x}}_{\mathrm{min}}^{*}\left(\gamma \right)+(3/4){\mathit{x}}_{\mathrm{max}}^{*}\left(\gamma \right)\rceil $. Set ${\mathcal{F}}_{E}:=\mathcal{F}\left(v\right):={\mathcal{F}}_{A}\cup {\mathcal{F}}_{B}$, $v\in {V}_{\mathrm{C}}$.
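The merged bounds of instance ${I}_{\mathrm{d}}$ reduce to simple arithmetic on the two edge-configuration frequency vectors. The following is a minimal sketch, assuming each frequency vector is already available as a mapping from an edge-configuration label to its count of interior-edges; the labels `g1`, `g2`, `g3` and the function names are hypothetical. The floor and ceiling are evaluated in exact integer arithmetic, and a configuration that occurs in only one of ${G}_{A}$, ${G}_{B}$ is treated as having count 0 in the other, matching ${\mathsf{\Gamma}}^{\mathrm{int}}={\mathsf{\Gamma}}_{A}^{\mathrm{int}}\cup {\mathsf{\Gamma}}_{B}^{\mathrm{int}}$.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def merged_interval(a: int, b: int) -> Tuple[int, int]:
    """Return (min, max) of two counts, as used for n_LB^int/n_UB^int and n_LB/n*."""
    return min(a, b), max(a, b)


def merged_edge_bounds(x_a: Counter, x_b: Counter) -> Dict[Hashable, Tuple[int, int]]:
    """Map each edge-configuration gamma in Gamma_A ∪ Gamma_B to (ec_LB(gamma), ec_UB(gamma))."""
    bounds = {}
    for gamma in set(x_a) | set(x_b):
        # A Counter returns 0 for a configuration absent from one of the graphs.
        x_min, x_max = merged_interval(x_a[gamma], x_b[gamma])
        # floor((3/4) x_min + (1/4) x_max), computed exactly with integers
        ec_lb = (3 * x_min + x_max) // 4
        # ceil((1/4) x_min + (3/4) x_max), computed exactly with integers
        ec_ub = (x_min + 3 * x_max + 3) // 4
        bounds[gamma] = (ec_lb, ec_ub)
    return bounds


# Hypothetical edge-configuration counts, for illustration only.
x_A = Counter({"g1": 4, "g2": 1})
x_B = Counter({"g1": 8, "g3": 2})
print(sorted(merged_edge_bounds(x_A, x_B).items()))
# [('g1', (5, 7)), ('g2', (0, 1)), ('g3', (0, 2))]
```

Weighting the minimum by 3/4 in the lower bound and the maximum by 3/4 in the upper bound keeps each interval strictly inside $[{\mathit{x}}_{\mathrm{min}}^{*},{\mathit{x}}_{\mathrm{max}}^{*}]$ (up to rounding), so the inferred graph is pushed toward a genuine mixture of the two frequency vectors.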

K_{OW} in Table 3, and an ANN $\mathcal{N}$ constructed in Stage 3 contains 109 input nodes that correspond to the descriptors for the fringe-configuration. However, the set of input nodes for the fringe-configuration is reduced to a set of $|{\mathcal{F}}^{*}|=40$ input nodes when we formulate an MILP for solving instance ${I}_{\mathrm{b},1}$, which reduces the number of integer variables.
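One way to see why such a reduction can be safe is sketched below, under the stated assumption (ours, not taken from the text) that in every graph admissible for the instance the descriptor of a chemical rooted tree outside ${\mathcal{F}}^{*}$ is fixed to 0. Under that assumption, the corresponding input nodes contribute nothing to the first layer of the ANN, so their weight columns can be dropped without changing any pre-activation value. The toy layer, the index list `keep` and the helper names are hypothetical.

```python
from typing import List, Sequence


def pre_activation(weights: List[List[float]], bias: List[float], x: Sequence[float]) -> List[float]:
    """First-layer pre-activations W x + b of a fully connected layer."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, bias)]


def restrict_columns(weights: List[List[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the weight columns of the input nodes listed in `keep`."""
    return [[row[j] for j in keep] for row in weights]


# Toy layer with 5 fringe-tree descriptors. Indices 1 and 4 stand for rooted
# trees outside F*, whose descriptor values are assumed to be 0.
W = [[1, 2, 3, 4, 5], [-1, 0, 2, 1, -3]]
b = [1, -1]
x_full = [2, 0, 1, 3, 0]
keep = [0, 2, 3]
x_reduced = [x_full[j] for j in keep]

print(pre_activation(W, b, x_full))                             # [18, 2]
print(pre_activation(restrict_columns(W, keep), b, x_reduced))  # [18, 2]
```

In an MILP formulation each retained input node typically needs its own integer count variable, so removing 69 of the 109 fringe-configuration inputs for ${I}_{\mathrm{b},1}$ directly shrinks the model.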

- $\mathsf{\Lambda}$: the set of non-hydrogen chemical elements for inferring a target graph;
- $|{\mathsf{\Gamma}}^{\mathrm{int}}|$: the number of different edge-configurations of interior-edges for inferring a target graph;
- $|{\mathcal{F}}^{*}|$: the number of different chemical rooted trees in the set ${\mathcal{F}}^{*}={\mathcal{F}}_{E}\cup {\bigcup}_{v\in {V}_{\mathrm{C}}}\mathcal{F}\left(v\right)$; and
- $[{\mathrm{n}}_{\mathrm{LB}}^{\mathrm{int}},{\mathrm{n}}_{\mathrm{UB}}^{\mathrm{int}}]$, $[{n}_{\mathrm{LB}},{n}^{*}]$: the lower and upper bounds on ${\mathrm{n}}^{\mathrm{int}}\left({G}^{\dagger}\right)$ and $n\left({G}^{\dagger}\right)$, respectively, for inferring a target graph ${G}^{\dagger}$.

**Stage 4.** To solve an MILP in Stage 4, we used CPLEX version 12.10. Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 show the results of Stages 4 and 5, where we denote the following:

- $[\underline{a},\overline{a}]$: the minimum and maximum values of $a\left(G\right)$ for property $\pi $ over the compounds G in ${D}_{\pi}$ in Table 3;
- $[\underline{y},\overline{y}]$: $\underline{y}$ (resp., $\overline{y}$) denotes the minimum (resp., maximum) target value y with $\underline{a}\le y\le \overline{a}$ such that the MILP instance for the target value ${y}^{*}=y$ is feasible (i.e., admits a target chemical graph ${G}^{\dagger}$). To determine the minimum and maximum target values $\underline{y}$ and $\overline{y}$, we solved a large number of MILP instances. Note that the MILP instance may be infeasible for some values y within the range $[\underline{y},\overline{y}]$;
- ${y}^{*}$: a target value in $[\underline{y},\overline{y}]$ for a property $\pi $;
- #v: the number of variables in the MILP in Stage 4;
- #c: the number of constraints in the MILP in Stage 4;
- IP-time: the time (s) to solve the MILP in Stage 4;
- n: the number $n\left({G}^{\dagger}\right)$ of non-hydrogen atoms in the chemical graph ${G}^{\dagger}$ inferred in Stage 4; and
- ${\mathrm{n}}^{\mathrm{int}}$: the number ${\mathrm{n}}^{\mathrm{int}}\left({G}^{\dagger}\right)$ of interior-vertices in the chemical graph ${G}^{\dagger}$ inferred in Stage 4.
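The determination of $[\underline{y},\overline{y}]$ can be pictured as a scan over candidate target values. The sketch below is hedged: `is_feasible` is a hypothetical oracle standing in for "build the MILP with ${y}^{*}=y$ and ask the solver whether it is feasible", and the candidate grid is an assumption, not the grid used in the experiments. Every candidate is tested rather than bisecting on the boundary, because, as noted above, feasibility need not form a single interval in y.

```python
from typing import Callable, Iterable, Optional, Tuple


def feasible_target_range(
    candidates: Iterable[float], is_feasible: Callable[[float], bool]
) -> Optional[Tuple[float, float]]:
    """Return (y_min, y_max) over the candidates whose MILP instance is feasible.

    Each candidate is checked individually: an instance can be infeasible for
    some y strictly between y_min and y_max, so a bisection search that assumes
    a single feasible interval could report a wrong boundary.
    """
    feasible = [y for y in candidates if is_feasible(y)]
    if not feasible:
        return None
    return min(feasible), max(feasible)


def toy_oracle(y: float) -> bool:
    # Hypothetical stand-in for solving the MILP with target value y* = y:
    # feasible on [-2, 3] except for the isolated infeasible value y = 0.
    return -2 <= y <= 3 and y != 0


print(feasible_target_range(range(-5, 6), toy_oracle))  # (-2, 3)
```

The returned pair plays the role of $(\underline{y},\overline{y})$; the isolated infeasible value inside the range illustrates why the table reports a range that may still contain infeasible targets.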

K_{OW} in Table 7.

L_{P} in Table 11.

**Stage 5.** We computed chemical isomers ${G}^{*}$ of each target chemical graph ${G}^{\dagger}$ inferred in Stage 4. When the number of all chemical isomers exceeds 100, we execute the algorithm so that it generates only up to 100 chemical isomers of ${G}^{\dagger}$. The algorithm can evaluate a lower bound on the total number of all chemical isomers of ${G}^{\dagger}$ without generating all of them.

- DP-time: the running time (s) to execute the dynamic programming algorithm in Stage 5 to compute a lower bound on the number of all chemical isomers ${G}^{*}$ of ${G}^{\dagger}$ and generate all (or up to 100) chemical isomers ${G}^{*}$;
- G-LB: a lower bound on the number of all chemical isomers ${G}^{*}$ of ${G}^{\dagger}$; and
- #G: the number of all (or up to 100) chemical isomers ${G}^{*}$ of ${G}^{\dagger}$ generated in Stage 5.
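The cap of 100 generated isomers can be illustrated with a lazily evaluated generator. This is a minimal sketch only: `toy_isomers` is a hypothetical generator standing in for the isomer-generation step, and the dynamic programming that computes G-LB is not reproduced here. Because `itertools.islice` stops pulling items after the cap, the remaining isomers are never produced, and the reported count equals the total when it is at most 100 and 100 otherwise.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def generate_up_to(isomers: Iterable[T], cap: int = 100) -> List[T]:
    """Collect at most `cap` items from a lazily evaluated isomer generator."""
    return list(islice(isomers, cap))


def toy_isomers(total: int) -> Iterator[str]:
    # Hypothetical generator standing in for the isomer-generation step.
    for k in range(total):
        yield f"isomer-{k}"


print(len(generate_up_to(toy_isomers(16))))     # 16
print(len(generate_up_to(toy_isomers(10**9))))  # 100
```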

## 4. Discussions and Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

ANN | artificial neural network |

MILP | mixed integer linear programming |

## References


**Figure 2.** An illustration of a chemical graph G, where for $\rho =2$, the exterior-vertices are ${w}_{1},{w}_{2},\dots ,{w}_{19}$ and the interior-vertices are ${u}_{1},{u}_{2},\dots ,{u}_{28}$.

**Figure 3.** (**a**) An illustration of a seed graph ${G}_{\mathrm{C}}$ where the vertices in ${V}_{\mathrm{C}}$ are depicted with gray squares, the edges in ${E}_{(\ge 2)}$ are depicted with dotted lines, the edges in ${E}_{(\ge 1)}$ are depicted with dashed lines, the edges in ${E}_{(0/1)}$ are depicted with gray bold lines, and the edges in ${E}_{(=1)}$ are depicted with black solid lines. (**b**) A set $\mathcal{F}=\{{\psi}_{1},{\psi}_{2},\dots ,{\psi}_{11}\}\subseteq \mathcal{F}\left({D}_{\pi}\right)$ of 11 chemical rooted trees ${\psi}_{i},i\in [1,11]$, where the root of each tree is depicted with a black circle.

**Figure 5.** (**a**) A cyclical-base $S={H}^{\mathrm{int}}-{\bigcup}_{u\in \{{u}_{5},{u}_{18},{u}_{22}\}}(V\left({Q}_{u}\right)\backslash \left\{u\right\})$ of the interior ${H}^{\mathrm{int}}$ in Figure 4; (**b**) A contraction ${S}^{\prime}$ of S for a pure path set $\mathcal{P}=\{{P}_{{a}_{1}},{P}_{{a}_{2}},\dots ,{P}_{{a}_{5}}\}$ in (**a**), where a new edge obtained by contracting a pure path is depicted with a thick line.

**Figure 6.** An illustration of a graph ${H}^{*}$ that is obtained from the seed graph ${G}_{\mathrm{C}}$ in Figure 3 under the interior-specification ${\sigma}_{\mathrm{int}}$ in Table 1, where the vertices newly introduced by pure paths ${P}_{{a}_{i}}$ and leaf paths ${Q}_{{v}_{i}}$ are depicted with white squares and circles, respectively.

**Figure 7.** Illustration of a set ${\mathcal{G}}^{*}=\{{G}_{1},{G}_{2},{G}_{3},{G}_{4}\}$ of four flavonoids, a seed graph ${G}_{\mathrm{C}}$, and a set ${\mathcal{F}}^{*}=\{{\psi}_{1},{\psi}_{2},{\psi}_{3},{\psi}_{4}\}$ of chemical rooted trees for $\rho =2$: (**a**) fisetin ${G}_{1}$; (**b**) luteolin ${G}_{2}$; (**c**) aurone ${G}_{3}$; (**d**) chalcone ${G}_{4}$; (**e**) ${G}_{\mathrm{C}}=({V}_{\mathrm{C}},{E}_{\mathrm{C}})$; (**f**) ${\mathcal{F}}^{*}={\mathcal{F}}_{E}\cup {\bigcup}_{v\in {V}_{\mathrm{C}}}\mathcal{F}\left(v\right)$.

**Figure 8.** Illustration of a set ${\mathcal{G}}^{*}=\{{G}_{1},{G}_{2},{G}_{3}\}$ of three dibenzodiazepine atypical antipsychotics, a seed graph ${G}_{\mathrm{C}}$ and a set ${\mathcal{F}}^{*}=\{{\psi}_{1},{\psi}_{2},\dots ,{\psi}_{8}\}$ of chemical rooted trees for $\rho =2$: (**a**) clozapine ${G}_{1}$; (**b**) quetiapine ${G}_{2}$; (**c**) olanzapine ${G}_{3}$; (**d**) ${G}_{\mathrm{C}}=({V}_{\mathrm{C}},{E}_{\mathrm{C}})$; (**e**) ${\mathcal{F}}^{*}={\mathcal{F}}_{E}\cup {\bigcup}_{v\in {V}_{\mathrm{C}}}\mathcal{F}\left(v\right)$.

**Figure 9.** An illustration of seed graphs: (**a**) A monocyclic graph ${G}_{\mathrm{C}}^{1}$; (**b**) A rank-2 cyclic graph ${G}_{\mathrm{C}}^{2}$ with two vertex-disjoint cycles; (**c**) A rank-2 cyclic graph ${G}_{\mathrm{C}}^{3}$ with two edge-disjoint cycles sharing a vertex; (**d**) A rank-2 cyclic graph ${G}_{\mathrm{C}}^{4}$ with three cycles.

**Figure 10.** An illustration of chemical compounds for instances ${I}_{\mathrm{c}}$ and ${I}_{\mathrm{d}}$: (**a**) ${G}_{A}$: CID 24822711; (**b**) ${G}_{B}$: CID 59170444; (**c**) ${G}_{A}$: CID 10076784; (**d**) ${G}_{B}$: CID 44340250.

**Figure 11.** (**a**) ${G}^{\dagger}$ inferred from ${I}_{\mathrm{c}}$ with ${y}^{*}=3.0$ of K_{OW}; (**b**) ${G}^{\dagger}$ inferred from ${I}_{\mathrm{d}}$ with ${y}^{*}=1.6$ of L_{P}.

$\mathit{\pi}$ | $\mathsf{\Lambda}$ | $\lvert{\mathit{D}}_{\mathit{\pi}}\rvert$ | $\lvert{\mathsf{\Gamma}}^{\mathbf{int}}({\mathit{D}}_{\mathit{\pi}})\rvert$ | $\lvert\mathcal{F}({\mathit{D}}_{\mathit{\pi}})\rvert$ | $[\underline{\mathit{n}},\overline{\mathit{n}}]$ | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ |
---|---|---|---|---|---|---|
K_{OW} | C,O,N | 644 | 24 | 109 | [4, 58] | [−7.53, 13.45] |
K_{OW} | C,O,N,S,Cl | 837 | 31 | 142 | [4, 73] | [−7.53, 13.45] |
B_{P} | C,O,N | 358 | 21 | 91 | [4, 30] | [−11.70, 470.0] |
B_{P} | C,O,N,S,Cl | 425 | 23 | 114 | [4, 30] | [−11.70, 470.0] |
M_{P} | C,O,N | 448 | 22 | 94 | [4, 122] | [−185.3, 300.0] |
M_{P} | C,O,N,S,Cl | 548 | 26 | 118 | [4, 122] | [−185.3, 300.0] |
F_{P} | C,O,N | 348 | 20 | 85 | [4, 66] | [−82.99, 300.0] |
F_{P} | C,O,N,S,Cl | 399 | 24 | 107 | [4, 66] | [−82.99, 300.0] |
L_{P} | C,O,N | 592 | 27 | 71 | [6, 60] | [−3.62, 6.84] |
L_{P} | C,O,N,S,Cl | 779 | 32 | 78 | [6, 74] | [−3.62, 6.84] |
S_{L} | C,O,N | 640 | 25 | 111 | [4, 55] | [−9.33, 1.11] |
S_{L} | C,O,N,S,Cl | 847 | 31 | 144 | [4, 55] | [−11.60, 1.11] |

$\mathit{\pi}$ | $\mathsf{\Lambda}$ | L-Time | t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (Best) | t-${\mathbf{R}}_{\mathbf{max}}^{2}$ | Arch. |
---|---|---|---|---|---|
K_{OW} | C,O,N | 0.7 | 0.959 | 0.983 | (156,10,10,1) |
K_{OW} | C,O,N,S,Cl | 0.7 | 0.947 | 0.968 | (199,20,10,1) |
B_{P} | C,O,N | 3.5 | 0.858 | 0.923 | (135,30,20,1) |
B_{P} | C,O,N,S,Cl | 3.3 | 0.821 | 0.899 | (163,10,1) |
M_{P} | C,O,N | 3.8 | 0.784 | 0.893 | (139,40,1) |
M_{P} | C,O,N,S,Cl | 4.1 | 0.796 | 0.880 | (170,10,10,1) |
F_{P} | C,O,N | 1.1 | 0.750 | 0.874 | (128,40,1) |
F_{P} | C,O,N,S,Cl | 1.8 | 0.707 | 0.853 | (157,10,10,1) |
L_{P} | C,O,N | 0.5 | 0.868 | 0.908 | (121,30,1) |
L_{P} | C,O,N,S,Cl | 0.7 | 0.861 | 0.892 | (137,20,10,1) |
S_{L} | C,O,N | 0.7 | 0.870 | 0.913 | (159,30,1) |
S_{L} | C,O,N,S,Cl | 0.9 | 0.870 | 0.903 | (201,30,20,1) |

**Table 5.** Results of prediction functions by ${f}_{\mathrm{ec}}$ and ${f}_{\mathrm{fc}}$ in data set ${\tilde{D}}_{\pi}$ of cyclic graphs and by ${f}_{\mathrm{fc}}$ in data set ${D}_{\pi}$ of all graphs.

$\mathit{\pi}$ | $\lvert{\tilde{\mathit{D}}}_{\mathit{\pi}}\rvert$ | ${\mathit{f}}_{\mathbf{ec}}$ on ${\tilde{\mathit{D}}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (ave.) | ${\mathit{f}}_{\mathbf{ec}}$ on ${\tilde{\mathit{D}}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (Best) | ${\mathit{f}}_{\mathbf{fc}}$ on ${\tilde{\mathit{D}}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (ave.) | ${\mathit{f}}_{\mathbf{fc}}$ on ${\tilde{\mathit{D}}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (Best) | $\lvert{\mathit{D}}_{\mathit{\pi}}\rvert$ | ${\mathit{f}}_{\mathbf{fc}}$ on ${\mathit{D}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (ave.) | ${\mathit{f}}_{\mathbf{fc}}$ on ${\mathit{D}}_{\mathit{\pi}}$: t-${\mathbf{R}}_{\mathbf{cv}}^{2}$ (Best) |
---|---|---|---|---|---|---|---|---|
K_{OW} | 580 | 0.952 | 0.959 | 0.950 | 0.954 | 837 | 0.944 | 0.947 |
B_{P} | 224 | 0.688 | 0.718 | 0.680 | 0.693 | 425 | 0.809 | 0.821 |
M_{P} | 348 | 0.668 | 0.694 | 0.712 | 0.736 | 548 | 0.776 | 0.796 |
F_{P} | 218 | 0.435 | 0.476 | 0.574 | 0.623 | 399 | 0.688 | 0.707 |
L_{P} | 776 | 0.832 | 0.842 | 0.853 | 0.861 | 779 | 0.854 | 0.861 |
S_{L} | 638 | 0.851 | 0.863 | 0.853 | 0.861 | 847 | 0.860 | 0.870 |

Instance | $\mathsf{\Lambda}$ | $\lvert{\mathsf{\Gamma}}^{\mathbf{int}}\rvert$ | $\lvert{\mathcal{F}}^{*}\rvert$ | $[{\mathbf{n}}_{\mathbf{LB}}^{\mathbf{int}},{\mathbf{n}}_{\mathbf{UB}}^{\mathbf{int}}]$ | $[{\mathit{n}}_{\mathbf{LB}},{\mathit{n}}^{*}]$ |
---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | C,O,N | 10 | 11 | [30,50] | [20,28] |
${I}_{\mathrm{b},1}$ | C,O,N | 28 | 40 | [38,38] | [6,6] |
${I}_{\mathrm{b},2}$ | C,O,N | 28 | 35 | [50,50] | [30,30] |
${I}_{\mathrm{b},3}$ | C,O,N | 28 | 30 | [50,50] | [30,30] |
${I}_{\mathrm{b},4}$ | C,O,N | 28 | 25 | [50,50] | [30,30] |
${I}_{\mathrm{c}}$ | C,O,N | 8 | 12 | [46,46] | [24,24] |
${I}_{\mathrm{d}}$ | C,O,N | 7 | 8 | [40,45] | [18,18] |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−7.53, 13.45] | [−7.0, 13.4] | 3.2 | 7663 | 9162 | 3.9 | 35 | 24 |
${I}_{\mathrm{b},1}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 9894 | 6626 | 17.5 | 38 | 7 |
${I}_{\mathrm{b},2}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 11,514 | 8934 | 14.0 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 11,318 | 8926 | 24.6 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 11,122 | 8918 | 22.0 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 7867 | 8630 | 2.1 | 49 | 32 |
${I}_{\mathrm{d}}$ | [−7.53, 13.45] | [−7.5, 13.4] | 3.0 | 5395 | 6899 | 5.2 | 45 | 23 |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−11.70, 470.0] | [352, 470] | 411 | 7583 | 8982 | 2.7 | 42 | 25 |
${I}_{\mathrm{b},1}$ | [−11.70, 470.0] | [−11, 470] | 229 | 9816 | 6449 | 2.7 | 38 | 7 |
${I}_{\mathrm{b},2}$ | [−11.70, 470.0] | [−11, 470] | 229 | 11,436 | 8757 | 9.1 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−11.70, 470.0] | [−11, 470] | 229 | 11,240 | 8749 | 11.0 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−11.70, 470.0] | [−11, 470] | 229 | 11,044 | 8741 | 24.0 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−11.70, 470.0] | [170, 470] | 320 | 7575 | 8450 | 25.9 | 49 | 33 |
${I}_{\mathrm{d}}$ | [−11.70, 470.0] | [151, 470] | 310 | 5315 | 6719 | 4.4 | 43 | 23 |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−185.3, 300.0] | [55, 300] | 177.5 | 7602 | 9023 | 16.1 | 41 | 24 |
${I}_{\mathrm{b},1}$ | [−185.3, 300.0] | [−180, 300] | 60 | 9833 | 6487 | 2.3 | 38 | 9 |
${I}_{\mathrm{b},2}$ | [−185.3, 300.0] | [−185, 300] | 57.4 | 11,453 | 8795 | 44.7 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−185.3, 300.0] | [−185, 300] | 57.4 | 11,257 | 8787 | 10.5 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−185.3, 300.0] | [−185, 300] | 57.4 | 11,061 | 8779 | 93.9 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−185.3, 300.0] | [253, 300] | 260.0 | 7580 | 6172 | 24.0 | 41 | 33 |
${I}_{\mathrm{d}}$ | [−185.3, 300.0] | [−75, 299] | 58 | 5110 | 4050 | 104.6 | 45 | 23 |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−82.99, 300.0] | [98, 300] | 199 | 7459 | 8696 | 1.6 | 35 | 22 |
${I}_{\mathrm{b},1}$ | [−82.99, 300.0] | [−82, 300] | 109 | 9694 | 6166 | 1.4 | 38 | 8 |
${I}_{\mathrm{b},2}$ | [−82.99, 300.0] | [−82, 300] | 109 | 11,314 | 8474 | 8.7 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−82.99, 300.0] | [−82, 300] | 109 | 11,118 | 8466 | 25.8 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−82.99, 300.0] | [−82, 300] | 109 | 10,922 | 8458 | 8.5 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−82.99, 300.0] | [250, 300] | 275 | 7667 | 8170 | 60.9 | 47 | 34 |
${I}_{\mathrm{d}}$ | [−82.99, 300.0] | [54, 300] | 177 | 5193 | 6436 | 2.0 | 45 | 23 |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 7597 | 9008 | 1.9 | 39 | 23 |
${I}_{\mathrm{b},1}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 9836 | 6481 | 2.9 | 38 | 8 |
${I}_{\mathrm{b},2}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 11,456 | 8789 | 21.1 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 11,260 | 8781 | 20.4 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 11,064 | 8773 | 24.2 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 7801 | 8476 | 1.1 | 47 | 32 |
${I}_{\mathrm{d}}$ | [−3.6, 6.84] | [−3.6, 6.8] | 1.6 | 5335 | 6754 | 4.3 | 45 | 23 |

Instance | $[\underline{\mathit{a}},\overline{\mathit{a}}]$ | $[\underline{\mathit{y}},\overline{\mathit{y}}]$ | ${\mathit{y}}^{*}$ | #v | #c | IP-Time | n | ${\mathbf{n}}^{\mathbf{int}}$ |
---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 7674 | 9186 | 2.4 | 41 | 23 |
${I}_{\mathrm{b},1}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 9906 | 6650 | 22.3 | 38 | 12 |
${I}_{\mathrm{b},2}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 11,526 | 8958 | 15.2 | 50 | 30 |
${I}_{\mathrm{b},3}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 11,330 | 8950 | 16.2 | 50 | 30 |
${I}_{\mathrm{b},4}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 11,134 | 8942 | 122.7 | 50 | 30 |
${I}_{\mathrm{c}}$ | [−9.33, 1.11] | [−9.3, −2.0] | −5.6 | 7874 | 8648 | 1.2 | 54 | 33 |
${I}_{\mathrm{d}}$ | [−9.33, 1.11] | [−9.3, −3.0] | −6.1 | 5402 | 6917 | 8.1 | 43 | 23 |

Instance | K_{OW} DP-Time | K_{OW} G-LB | K_{OW} #G | L_{P} DP-Time | L_{P} G-LB | L_{P} #G | B_{P} DP-Time | B_{P} G-LB | B_{P} #G |
---|---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | 0.031 | 16 | 16 | 0.164 | 128 | 100 | 0.164 | $1.4\times {10}^{5}$ | 100 |
${I}_{\mathrm{b},1}$ | 0.149 | $2.8\times {10}^{5}$ | 100 | 0.148 | $2.0\times {10}^{10}$ | 100 | 0.162 | $4.4\times {10}^{5}$ | 100 |
${I}_{\mathrm{b},2}$ | 44.1 | $3.9\times {10}^{10}$ | 100 | 118 | 900 | 100 | 171 | 6 | 6 |
${I}_{\mathrm{b},3}$ | 27.2 | 20 | 20 | 80.2 | 6 | 6 | 28.6 | 7 | 7 |
${I}_{\mathrm{b},4}$ | 0.166 | 6000 | 100 | 73 | 12 | 12 | 142 | 5 | 5 |
${I}_{\mathrm{c}}$ | 0.166 | 6000 | 100 | 0.168 | 288 | 100 | 0.168 | $4.0\times {10}^{5}$ | 100 |
${I}_{\mathrm{d}}$ | 22.3 | $8.3\times {10}^{10}$ | 100 | 1.44 | $3.2\times {10}^{8}$ | 100 | 1.7 | $9.7\times {10}^{9}$ | 100 |

Instance | F_{P} DP-Time | F_{P} G-LB | F_{P} #G | M_{P} DP-Time | M_{P} G-LB | M_{P} #G | S_{L} DP-Time | S_{L} G-LB | S_{L} #G |
---|---|---|---|---|---|---|---|---|---|
${I}_{\mathrm{a}}$ | 0.057 | 32 | 32 | 0.165 | 256 | 100 | 0.165 | 1024 | 100 |
${I}_{\mathrm{b},1}$ | 0.164 | $3.1\times {10}^{6}$ | 100 | 0.166 | $1.4\times {10}^{6}$ | 100 | 0.163 | $4.5\times {10}^{5}$ | 100 |
${I}_{\mathrm{b},2}$ | 28.8 | 720 | 100 | 8.26 | $2.4\times {10}^{10}$ | 100 | 1.07 | $5.6\times {10}^{9}$ | 100 |
${I}_{\mathrm{b},3}$ | 72.2 | 27 | 27 | 51.9 | 1 | 1 | 46.5 | 1680 | 100 |
${I}_{\mathrm{b},4}$ | 40.3 | 20 | 20 | 125 | $6.1\times {10}^{7}$ | 100 | 7.01 | $1.1\times {10}^{8}$ | 100 |
${I}_{\mathrm{c}}$ | 0.169 | $1.1\times {10}^{5}$ | 100 | 0.173 | 6048 | 100 | 0.168 | 120 | 100 |
${I}_{\mathrm{d}}$ | 0.057 | 32 | 32 | 0.17 | $4.2\times {10}^{8}$ | 100 | 0.165 | 1024 | 100 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Shi, Y.; Zhu, J.; Azam, N.A.; Haraguchi, K.; Zhao, L.; Nagamochi, H.; Akutsu, T.
An Inverse QSAR Method Based on a Two-Layered Model and Integer Programming. *Int. J. Mol. Sci.* **2021**, *22*, 2847.
https://doi.org/10.3390/ijms22062847
