Teschendorff et al. BMC Bioinformatics (2016) 17:Page 6 of
differentially methylated CpGs (DMCs) between the normal samples from women who remained disease-free and the normal samples from women who progressed to CIN2+, as assessed using moderated t-tests (Table 1, Fig. 2a), in agreement with our previous observation [13]. We next compared all five DV algorithms in their ability to detect differentially variable CpGs (DVCs) between the same two phenotypes. We observed marked differences: J-DMDV and DiffVar did not identify any DVCs at genome-wide significance, in stark contrast to iEVORA and GAMLSS, which identified many DVCs (Table 1, Fig. 2a). On the other hand, when we compared normal to CIN2+ samples, or normal samples to cervical cancers, we observed many DMCs, and all DV algorithms were sensitive enough to identify DVCs (Table 1). The algorithms performed similarly in a second data set, which measured DNA methylation (this time with Illumina 450k beadarrays) in over 300 samples. These comprised 50 normal breast tissue samples from healthy women, 42 matched pairs of normal-adjacent tissue and breast tumor, and an additional 263 unmatched breast cancers (Methods). In this independent set we also could not detect any DMCs between the normal cells from healthy women and the normal cells adjacent to breast cancers (Table 1, Fig. 2b). Of the two DV algorithms that identified no DVCs in the cervical smear analysis, J-DMDV again identified none in this set, while DiffVar identified fewer than GAMLSS or iEVORA (Table 1, Fig. 2b).
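Bartlett's test (BT), one of the DV approaches compared here, asks whether two groups share the same variance regardless of their means. This is why a CpG can be a DVC without being a DMC. The sketch below is minimal, uses only the standard library, and rests on these assumptions:

- Each CpG is compared between exactly two groups, so the statistic is referred to a chi-squared distribution with 1 degree of freedom.
- Benjamini-Hochberg FDR control stands in for the genome-wide significance criterion.
- The function names, the dictionary data layout and the toy beta values are all hypothetical.

This is not the iEVORA, GAMLSS, J-DMDV or DiffVar implementation used in the study.

```python
import math
from statistics import variance


def bartlett_two_groups(x, y):
    """Bartlett's test for equal variances in two groups.

    Returns (statistic, p_value). With k = 2 groups the statistic is
    approximately chi-squared with k - 1 = 1 degree of freedom under H0.
    """
    groups = [x, y]
    k = len(groups)
    ns = [len(g) for g in groups]
    if min(ns) < 2:
        raise ValueError("each group needs at least two samples")
    n_total = sum(ns)
    # Sample variances (n - 1 denominator) for each group.
    variances = [variance(g) for g in groups]
    if min(variances) <= 0:
        raise ValueError("each group needs a positive variance")
    # Pooled variance: weighted mean of the group variances.
    pooled = sum((n - 1) * v for n, v in zip(ns, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    # Clamp tiny negative values caused by floating-point rounding.
    stat = max(numerator / correction, 0.0)
    # Survival function of chi-squared(1): P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def detect_dvcs(normal, progressed, fdr=0.05):
    """Return CpG ids whose variance differs between groups at the given FDR.

    normal, progressed: dict mapping a (hypothetical) CpG id to beta values.
    """
    cpg_ids = sorted(normal)
    pvalues = [bartlett_two_groups(normal[c], progressed[c])[1] for c in cpg_ids]
    qvalues = benjamini_hochberg(pvalues)
    return [c for c, q in zip(cpg_ids, qvalues) if q < fdr]


if __name__ == "__main__":
    # Toy data: "cg_dv" keeps roughly the same mean but gains variance;
    # "cg_flat" has identical spread in both groups.
    normal = {
        "cg_dv": [0.10, 0.11, 0.09, 0.10, 0.12, 0.08],
        "cg_flat": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50],
    }
    progressed = {
        "cg_dv": [0.02, 0.20, 0.05, 0.18, 0.10, 0.03],
        "cg_flat": [0.51, 0.49, 0.50, 0.52, 0.48, 0.50],
    }
    print(detect_dvcs(normal, progressed))  # prints ['cg_dv']
```

In the toy data, the "cg_dv" group means are nearly identical (0.10 versus about 0.097). A mean-based test would therefore see little difference, while the variance-based test flags the CpG.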
In agreement with the cervical study, when we compared the normal samples from healthy women to breast cancers, we observed that most sites in the genome were DMCs as well as DVCs, and that every DV algorithm could identify DVCs (Table 1).
DVCs pinpoint epigenetic field defects which progress to invasive cancer
The increased sensitivity of iEVORA and GAMLSS to detect DVCs in pre-neoplastic lesions does not necessarily mean that these DVCs are biological features relevant to the carcinogenic process. However, if DVCs detected between normal and pre-neoplastic lesions exhibit progressive changes in neoplasia and invasive cancer, this would support their biological relevance. We therefore compared all the algorithms in their ability to detect CpG sites whose DNAm changes in pre-neoplastic lesions progress further in neoplasia and/or invasive cancer (Methods). In the context of cervical carcinogenesis, progression was assessed using two independent data sets profiling normal and CIN2+ samples, as well as a data set profiling normal samples and invasive cervical cancers [14]. DVCs selected and ranked using iEVORA, Bartlett’s test (BT) or GAMLSS were more likely to undergo further significant DNAm changes (preserving directionality) in CIN2+ and cervical cancer. This held in comparison with features selected using t-tests or either of the other DV algorithms, J-DMDV and DiffVar (Fig. 3, Additional file 1: Figures S1-S2). iEVORA was more robust than BT and GAMLSS, attaining positive predictive values (PPVs) of over 25 % for CIN2+ and over 60 % for cervical cancer across a wider range of top-ranked DVCs (Fig. 3). iEVORA also outperformed all other DV algorithms in the context of breast carcinogenesis, where progression was assessed by comparing the 50 normal breast samples from healthy women to the 305 breast cancers.
In most cases, iEVORA achieved PPVs for breast cancer of around 80 % or higher, in stark contrast to BT and GAMLSS, whose PPVs never exceeded 40 %.
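The PPVs discussed above can be sketched as the percentage of the top-k ranked DVCs that also show a significant DNAm change, in the same direction, in the later-stage comparison. The minimal Python sketch below rests on these assumptions:

- Each CpG has a discovery-set direction: +1 for a gain of DNAm, -1 for a loss.
- Each CpG has a validation result, given as a (significant, direction) pair.
- The function name, inputs and toy values are hypothetical.

The exact definition used for Fig. 3 is given in the Methods and is not reproduced here.

```python
def ppv_at_top_k(ranked_cpgs, discovery_sign, validation, top_ks):
    """Percentage of the top-k ranked CpGs that validate in the later stage.

    ranked_cpgs: CpG ids ordered from most to least significant in discovery.
    discovery_sign: CpG id -> +1 (DNAm gain) or -1 (DNAm loss) in discovery.
    validation: CpG id -> (is_significant, sign) in the later-stage comparison.
    top_ks: numbers of top-ranked CpGs at which to evaluate the PPV.
    """
    if not ranked_cpgs:
        raise ValueError("ranked_cpgs must not be empty")
    ppv = {}
    for k in top_ks:
        if k < 1:
            raise ValueError("k must be at least 1")
        top = ranked_cpgs[:k]
        # A hit must be significant AND preserve the discovery direction.
        hits = sum(
            1
            for c in top
            if validation[c][0] and validation[c][1] == discovery_sign[c]
        )
        ppv[k] = 100.0 * hits / len(top)
    return ppv


if __name__ == "__main__":
    ranked = ["cg1", "cg2", "cg3", "cg4"]
    signs = {"cg1": 1, "cg2": 1, "cg3": -1, "cg4": 1}
    later_stage = {
        "cg1": (True, 1),   # significant, same direction: counted
        "cg2": (True, -1),  # significant, direction flipped: not counted
        "cg3": (True, -1),  # significant, same direction: counted
        "cg4": (False, 1),  # not significant: not counted
    }
    print(ppv_at_top_k(ranked, signs, later_stage, [1, 2, 4]))
    # prints {1: 100.0, 2: 50.0, 4: 50.0}
```

Evaluating this over increasing k traces a PPV curve. Under this reading, a method whose PPV stays high across a wider range of k is the more robust one in the sense the text describes.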