Correlizer


Revealing the Mysteries of Genome Organization

Genomes are fantastic keepers of genetic information and are the outcome of evolutionary replication, mutation and selection. Genomes organize functions from the cellular level, via the organismic level, up to the complex basis of mind. In human cells, the genetic information controlling most processes, from the cellular level through embryogenesis to cognitive ability, is held in a diploid set of 23 DNA molecules (chromosomes). Combined, they consist of ~3 × 10^9 base pairs (bp), stored in ~2.80 GB of data. This whole genome, whose added molecular length totals ~2 m, is kept in comparatively small cell nuclei with typical diameters of ~10 µm, or volumes of ~500 µm³. The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional architectural organization of genomes remain largely unresolved.
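As a rough plausibility check of the figures above, the following sketch recomputes the data size and the nuclear volume. It assumes one byte of storage per base pair and a spherical nucleus; neither assumption is stated in the text, and the variable names are illustrative only.

```python
import math

# Figures quoted in the text
base_pairs = 3e9          # ~3 x 10^9 bp
diameter_um = 10.0        # typical nuclear diameter in micrometres

# Assumption (not stated in the text): one byte per base pair
bytes_per_base = 1
size_gib = base_pairs * bytes_per_base / 2**30

# Assumption (not stated in the text): the nucleus is a sphere
volume_um3 = 4.0 / 3.0 * math.pi * (diameter_um / 2.0) ** 3

print(f"storage: {size_gib:.2f} GiB")
print(f"nuclear volume: {volume_um3:.0f} cubic micrometres")
```

Under these assumptions the storage comes out at about 2.79 GiB and the volume at about 524 µm³, in line with the rounded values of ~2.80 GB and ~500 µm³ quoted above.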

Correlizer has been set up to unravel these mysteries. We found long-range power-law correlations on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10^6 to 3.0 × 10^7 bp, ranging from Archaea and Bacteria to Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behavior: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (divided into two submaxima for Arabidopsis thaliana and Drosophila melanogaster), and often a region of one or more second maxima from 10^5 to 3 × 10^5 bp. Within this multi-scaling behavior, an additional fine structure is present. It is attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding.

Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales.
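The effect of reshuffling can be illustrated with a toy example. The sketch below is not the Correlizer method, whose correlation measure is not specified here. It assumes a simple two-state block model (alternating GC-rich and AT-rich blocks) and uses a plain autocorrelation of a GC indicator as a stand-in. All function names and parameters are hypothetical, and the toy model does not reproduce power-law behavior; it only shows that block structure creates correlations at a distance and that shuffling removes them.

```python
import random

def gc_indicator(seq):
    """Map a DNA string to 1.0 for G/C and 0.0 for A/T."""
    return [1.0 if base in "GC" else 0.0 for base in seq]

def autocorrelation(values, lag):
    """Normalized autocorrelation of a numeric series at the given lag."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    n = len(values) - lag
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n
    return cov / var

def block_sequence(n_blocks, block_len, rng):
    """Toy block model (an assumption): blocks alternate between 80% and 20% GC."""
    seq = []
    for b in range(n_blocks):
        p_gc = 0.8 if b % 2 == 0 else 0.2
        for _ in range(block_len):
            seq.append(rng.choice("GC") if rng.random() < p_gc else rng.choice("AT"))
    return "".join(seq)

rng = random.Random(0)  # fixed seed for reproducibility
original = block_sequence(n_blocks=200, block_len=100, rng=rng)

# "Mutation by reshuffling": same base composition, positions randomized
bases = list(original)
rng.shuffle(bases)
shuffled = "".join(bases)

lag = 20
c_orig = autocorrelation(gc_indicator(original), lag)
c_shuf = autocorrelation(gc_indicator(shuffled), lag)
print(f"lag {lag}: block sequence {c_orig:.3f}, shuffled {c_shuf:.3f}")
```

In this toy model the block sequence keeps a clearly positive correlation at a lag of 20 bp, while the shuffled sequence, which has identical base composition, falls to roughly zero.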

In summary, we found that genomes show a complex sequential organization related closely to their three-dimensional organization, which has resulted in entirely new insights into genomes. This is important for the general understanding of genomes, from which new diagnostics and treatments can then be developed. With modern high-throughput sequencing techniques, entire genomes are now sequenced on a daily basis at very low cost. By now, thousands of completely sequenced genomes fill the public databases. Consequently, the analysis of genetic sequences is becoming more and more important as well.

Here we ask for your support: by donating your computer resources to Correlizer, you help to reveal these mysteries further and thus lay the foundation for better diagnosis and treatment of diseases.

