T approximate methods that can quite accurately estimate the ab initio alignment probabilities fairly efficiently. Such approximate methods will be the subject of a related study (Ezawa, unpublished; draft manuscripts available [40, 41]).
Conclusions
To the best of our knowledge, this is the first study to theoretically dissect the ab initio calculation of alignment probabilities under a genuine stochastic evolutionary model, which describes the evolution of an entire sequence via insertions and deletions (indels) along the time axis. The model handled here extends the most general evolutionary model proposed previously, i.e., the general form of the “substitution/insertion/deletion models” [21]. It should be noted that we did not impose any unnatural restrictions, such as the prohibition of overlapping indels. Nor did we assume, before proving it, that the probability factorizes into a product of column-to-column transition probabilities or of block-wise contributions. The essential tool introduced in this study was the operator representation of indels. This enabled us to shift the focus from the trajectory of sequence states (as in [21]) to the series of indel events, and to define local-history-set (LHS) equivalence classes of indel histories. Moreover, the operator representation facilitated the adaptation of the time-dependent perturbation expansion (e.g., [29, 30, 33]), which enabled us to express the probability of an alignment as a sum of probabilities over all alignment-consistent indel histories. Then, under the most general set of indel rate parameters, we exploited the LHS equivalence classes and found a “sufficient and nearly necessary” set of conditions on the indel rate parameters and exit rates under which the ab initio alignment probabilities can be factorized to provide a kind of generalized HMM.
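The idea of summing probabilities over event histories with a time-dependent perturbation expansion can be illustrated numerically on a much simpler system. This sketch uses a hypothetical three-state continuous-time Markov chain. Its states stand for sequence lengths 0, 1 and 2, and its rates are illustrative only; it is not the indel model of this study.

The sketch splits the rate matrix Q into two parts:
- a diagonal part D, which holds minus the exit rates;
- an off-diagonal part R, which holds the jump rates.

It then computes P_n(t), the probability of ending in each state after exactly n events. These terms satisfy dP_n/dt = P_n D + P_{n−1} R with P_0(t) = exp(tD), which is equivalent to P_n(t) = ∫_0^t P_{n−1}(τ) R exp((t−τ)D) dτ. The helper names (`split_rates`, `event_count_terms`) and the choice of classical fourth-order Runge-Kutta integration are our own assumptions. Because every term is nonnegative and the partial sums are bounded by 1, the sum over n converges to exp(tQ), whose rows each sum to 1.

```python
def split_rates(q):
    """Split a rate matrix Q into D (diagonal: minus the exit rates) and R (off-diagonal jump rates)."""
    size = len(q)
    d = [[q[i][j] if i == j else 0.0 for j in range(size)] for i in range(size)]
    r = [[q[i][j] if i != j else 0.0 for j in range(size)] for i in range(size)]
    return d, r


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def event_count_terms(q, t, max_events, steps=200):
    """Return [P_0(t), ..., P_N(t)] with N = max_events.

    Integrates dP_0/dt = P_0 D and dP_n/dt = P_n D + P_{n-1} R jointly with
    RK4, starting from P_0(0) = I and P_n(0) = 0 for n >= 1.
    """
    size = len(q)
    d, r = split_rates(q)
    terms = [[[float(i == j) for j in range(size)] for i in range(size)]]
    terms += [[[0.0] * size for _ in range(size)] for _ in range(max_events)]
    h = t / steps

    def deriv(ps):
        out = []
        for n, p in enumerate(ps):
            dp = matmul(p, d)  # waiting without an event (decay by exit rates)
            if n > 0:  # feed-in from histories with one event fewer
                feed = matmul(ps[n - 1], r)
                dp = [[dp[i][j] + feed[i][j] for j in range(size)] for i in range(size)]
            out.append(dp)
        return out

    def shift(ps, ks, c):
        return [[[p[i][j] + c * k[i][j] for j in range(size)] for i in range(size)]
                for p, k in zip(ps, ks)]

    for _ in range(steps):
        k1 = deriv(terms)
        k2 = deriv(shift(terms, k1, h / 2))
        k3 = deriv(shift(terms, k2, h / 2))
        k4 = deriv(shift(terms, k3, h))
        terms = [[[p[i][j] + h / 6 * (a[i][j] + 2 * b[i][j] + 2 * c[i][j] + e[i][j])
                   for j in range(size)] for i in range(size)]
                 for p, a, b, c, e in zip(terms, k1, k2, k3, k4)]
    return terms


# Hypothetical toy rates: length 0 gains a residue at rate 1.0; length 1 loses one
# at rate 0.5 or gains one at rate 1.0; length 2 is capped and loses one at rate 2.0.
Q = [[-1.0, 1.0, 0.0],
     [0.5, -1.5, 1.0],
     [0.0, 2.0, -2.0]]
terms = event_count_terms(Q, t=1.0, max_events=20)
total = [[sum(p[i][j] for p in terms) for j in range(3)] for i in range(3)]
print([sum(row) for row in total])  # each entry should be close to 1.0
```

The mass lost by truncation equals the probability of more than `max_events` events. Raising `max_events` shrinks that loss at the cost of runtime.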
We also showed that quite a wide variety of indel models could satisfy this set of conditions. Such models include not only the “long
Methods
Methodological details of this study are described in the Supplementary methods in Additional file 1, or in the Supplementary appendix in Additional file 2.
Endnotes
1 In a sense, the models implemented in genuine sequence evolution simulators (e.g., [26–28]) could also be considered special cases of this evolutionary model.
2 This poses no problem here, because the summation is always bounded from above (by a number no greater than unity) and all terms in the summation are nonnegative, which guarantees that the expansion converges.
3 However, the two manuscripts differ in some aspects, e.g., their starting points. In [32], we described our model from scratch as a continuous-time Markov model and only briefly discussed its relationship with the general SID model [21]. In contrast, this manuscript explicitly presents our model as an extension of the general SID model, because doing so places our model in the historical context of studies on indel evolutionary models.
4 It may be worth mentioning, though, that the operator notation allows flexible representations of general state changes (by mutations). In the vector-matrix notation, this is possible only via abstract mutation matrices (and state vectors); concrete matrices (and vectors) can at most describe some fixed, specific indel histories.
Ezawa BMC Bioinformatics (2016) 17:Page 23 of
In Eq.(R1.1), () and 0(), respectively, are the residues before and after the substitution. sI in Eq.(R1.2) denotes the inserted subsequence.
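Endnote 4 and Eq.(R1.2) concern representing indels as operators that act on a sequence. The Python sketch below is only a rough illustration of that idea, under several assumptions:
- Each indel is a hypothetical `Deletion` or `Insertion` object with 1-based position arguments. This convention is ours, not the notation of the main text.
- Only the ancestry of each site is tracked, not residue identities, so an inserted subsequence is represented just by its length.
- A history is applied in time order.

The example shows two histories that perform the same local indels in opposite orders, and therefore with different position arguments, yet yield the same descendant row of the pairwise alignment. This is loosely the kind of redundancy that the LHS equivalence classes organize. The classes themselves are defined more carefully in the main text.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Tuple, Union

Sequence = Tuple[str, ...]


@dataclass(frozen=True)
class Deletion:
    """Hypothetical deletion operator: removes sites start..end (1-based, inclusive)."""
    start: int
    end: int

    def apply(self, seq: Sequence, _fresh: Iterator[str]) -> Sequence:
        if not 1 <= self.start <= self.end <= len(seq):
            raise ValueError(f"{self} out of range for length {len(seq)}")
        return seq[: self.start - 1] + seq[self.end:]


@dataclass(frozen=True)
class Insertion:
    """Hypothetical insertion operator: inserts `length` new sites after site `after`
    (0 means before the first site)."""
    after: int
    length: int

    def apply(self, seq: Sequence, fresh: Iterator[str]) -> Sequence:
        if not 0 <= self.after <= len(seq) or self.length < 1:
            raise ValueError(f"{self} out of range for length {len(seq)}")
        new_sites = tuple(next(fresh) for _ in range(self.length))
        return seq[: self.after] + new_sites + seq[self.after:]


Event = Union[Deletion, Insertion]


def apply_history(ancestor_length: int, history: List[Event]) -> Sequence:
    """Apply indel operators in time order to an ancestor whose sites are a1..aL."""
    seq: Sequence = tuple(f"a{i}" for i in range(1, ancestor_length + 1))
    fresh = (f"i{k}" for k in count(1))  # labels for newly inserted sites
    for event in history:
        seq = event.apply(seq, fresh)
    return seq


def alignment_signature(descendant: Sequence) -> Sequence:
    """Ancestral label of each descendant site, with inserted sites shown as '-'."""
    return tuple(site if site.startswith("a") else "-" for site in descendant)


# Same local indels in opposite orders; the insertion's position argument
# differs because the earlier deletion shifts the coordinates.
history_a = [Deletion(2, 3), Insertion(3, 2)]
history_b = [Insertion(5, 2), Deletion(2, 3)]
sig_a = alignment_signature(apply_history(6, history_a))
sig_b = alignment_signature(apply_history(6, history_b))
print(sig_a)           # ('a1', 'a4', 'a5', '-', '-', 'a6')
print(sig_a == sig_b)  # True
```

This signature does not record where deleted ancestral columns sit relative to inserted ones. A full alignment would need a convention for that placement.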