Tion simulation was constructed. For 1000 instances of a perfectly periodic synthetic sequence of length N = 150 with 10 bp periodicity degraded between 1 and 50 (20 instances per unit increment), i.e. the true positives, and 1000 randomly permuted instances of the same

All methods were originally implemented in MATLAB Version 2008a. The methods for period estimation and significance testing were then independently implemented in Python and Pyrex, a language that generates C code, which is compiled into dynamically loaded Python extensions. The compiled versions substantially improve compute performance. The source code has been contributed to the open-source genome biology toolkit PyCogent [43] and is available from the subversion repository. All genomic analyses were conducted using the Python implementation. All scripts used are available on request from the authors.

Biological data

Yeast genome sequence coordinates for nucleosome-associated DNA were obtained from Lee et al. [37], whose procedure we briefly summarise. This data set was generated by analysis of a micrococcal nuclease (MNase) digestion of whole yeast genomic chromatin that had been subjected to cross-linking of histones to DNA. The resulting purified DNA fragments were then hybridized to an Affymetrix probe array with a 4 bp resolution. A Hidden Markov Model was used to detect regions corresponding to 'well-positioned' nucleosomes, defined as spanning 31-38 probes, or 'fuzzy' nucleosomes, defined as spanning 39 probes. Linker regions were defined as those spanning between identified nucleosome positions. Coordinates for the well-positioned, fuzzy and linker regions were downloaded from http://chemogenomics.stanford.edu/supplements/03nuc/datasets.html (dataset S5).
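Because the downloaded regions vary in length, they are normalised to a fixed 150 bp window and filtered for independence, as the following paragraph describes. A minimal sketch of that step is given here for illustration only; it is not the authors' script. The function names `resize_symmetric` and `drop_overlapping`, the half-open `(start, end)` coordinate convention, the handling of regions longer than 150 bp by symmetric trimming, and the choice to test for overlap after resizing are all assumptions made for this example. Coordinates are taken to lie on a single chromosome.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) on one chromosome.
Interval = Tuple[int, int]


def resize_symmetric(start: int, end: int, target: int = 150) -> Interval:
    """Adjust an interval about its centre so it is exactly `target` bp long.

    Shorter intervals are expanded equally in the 5' and 3' directions.
    Longer intervals are trimmed symmetrically (an assumption: the text
    only mentions expansion). Odd differences put the extra base on the
    3' side. Chromosome ends are not checked here.
    """
    delta = target - (end - start)
    new_start = start - delta // 2
    return new_start, new_start + target


def drop_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Keep only intervals that overlap no other interval in the list."""
    ordered = sorted(intervals)
    keep = []
    max_prev_end = None  # furthest end among all earlier intervals
    for i, (s, e) in enumerate(ordered):
        overlaps_prev = max_prev_end is not None and max_prev_end > s
        overlaps_next = i + 1 < len(ordered) and ordered[i + 1][0] < e
        if not (overlaps_prev or overlaps_next):
            keep.append((s, e))
        max_prev_end = e if max_prev_end is None else max(max_prev_end, e)
    return keep


if __name__ == "__main__":
    # Made-up coordinates, purely illustrative.
    regions = [(1000, 1140), (1100, 1252), (2000, 2148), (5000, 5150)]
    resized = [resize_symmetric(s, e) for s, e in regions]
    independent = drop_overlapping(resized)
    for s, e in independent:
        print(s, e, e - s)
```

Resizing before the overlap test means two regions that were disjoint in the source data can still be discarded if their 150 bp windows collide; testing on the original coordinates instead would be an equally plausible reading of the text.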
Since these regions differed in length, and the statistical power of the period estimation methods is sensitive to length, we modified these sequence coordinates such that sequence fragments from each class were all exactly 150 bp long. Specifically, the sequence coordinates from Lee et al. were adjusted by equal symmetric expansion (in the 5′ and 3′ directions) until the coordinates were exactly 150 bp long. Only sequence coordinates that were independent (did not overlap with any other coordinates) were used. The genomic sequences corresponding to these coordinates were downloaded from http://www.ebi.ac.uk/ huber/yetia/ yetiadata/SGD-0508/. The total numbers of sequences in each class were: 31557 well-positioned nucleosomes; 41770 fuzzy nucleosomes; and 10465 linker regions. Mouse genomic sequences were obtained from Ensembl release 58.

Epps et al. Biology Direct 2011, 6:21 http://www.biology-direct.com/content/6/1/

the periodicity profile determined using either the autocorrelation, discrete Fourier transform or Hybrid method. Our comments concerning the use of exploratory period estimation for confirmatory purposes have been revised to clarify their meaning.

Reviewers’ comments

Reviewer’s report

Prof Tomas Radivoyevitch, Case Western Reserve University, Ohio

It would be nice if a new Figure 1 was created to give the overall organizational structure of the methods. As it stands, my default inclination is to think of the exploratory estimation methods as being “feature extractions” in pattern recognition (i.e. a data dimension reduction step). But I am not sure of this. Does any filtering happen in the frequency domain to reduce the dimensionality of the data before the statistical tests are appli.