Structural Variation in the Human Genome

January 14, 2008

Back in mid-December I attended a talk by Evan Eichler entitled “Structural Variation in the Human Genome”. Evan gave an excellent talk focusing on his lab’s recent work on identifying the regions of the genome that change in structure and content very quickly; in short, on identifying length variations within the human genome. His talk broke down into two major components:

  1. Identify duplicated regions associated with phenotypes
  2. Catalog normal structural variation in the genome

I’ll go into more detail on both below the fold …

In the first half of his talk he focused on recent work that looks at duplicated regions of the human genome. First, he defines a duplicated region as one that is >90% identical over >1 kb. Using this definition, approximately 4-5% of the human genome is composed of duplicated regions. The majority are interspersed, in contrast to the mouse, where most are tandem. These duplications drive unequal cross-overs. [Ref: She et al. (2004)]
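To make the definition concrete, here is a minimal sketch of the two thresholds applied to a list of pairwise alignments. The `Alignment` record, its field names, and the sample values are all made up for illustration; they are not from the talk or from the lab's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment within a genome."""
    name: str
    length_bp: int           # aligned length in base pairs
    percent_identity: float  # sequence identity over the aligned length


def is_duplicated_region(aln, min_identity=90.0, min_length_bp=1000):
    """Apply the definition from the talk: >90% identical over >1 kb."""
    return aln.percent_identity > min_identity and aln.length_bp > min_length_bp


# Invented example alignments, purely to exercise the filter.
alignments = [
    Alignment("a", 5200, 97.3),  # long and similar: counts as a duplication
    Alignment("b", 800, 99.0),   # similar but shorter than 1 kb
    Alignment("c", 12000, 85.0), # long but too divergent
]

duplicated = [aln.name for aln in alignments if is_duplicated_region(aln)]
print(duplicated)
```

Only alignment "a" passes both cutoffs; the other two each fail exactly one of them.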

A number of diseases have been linked to recurrent rearrangements. With this in mind, they identified 130 candidate hotspots, built a custom array against these, and investigated them with arrayCGH. They then used idiopathic mental retardation (IMR) as their disease of focus and tested a population which was previously determined to be karyotype normal. From this exploratory setup, they identified a couple of recurrent events which correspond to sets of individuals having similar phenotypic manifestations of IMR. (Interestingly, one of these appears to be specific to European populations.) The 4 microdeletions they identified explain roughly 6-7% of IMR. [Refs: Sharp et al. (2005); Locke et al. (2006)]

Do these deletions give rise to new transcripts? They identify focal points (cores) for each chromosome by looking at duplication blocks. These cores are enriched for fast-evolving annotated genes (Great Ape Specific Gene Families). [Ref: Jiang et al. (2007)]

In the second half of his talk he focused on his lab’s related efforts at cataloging the normal structural variation within the genome. Some normal copy number variants are associated with a propensity for certain diseases. Approximately 615 Mb show copy number variation by arrayCGH, which would amount to roughly 20% of the genome. This methodology lacks resolution, though, so it is unlikely that 20% of the genome really varies in copy number. In addition, arrayCGH cannot detect certain structural variations because of its dependence on the reference genome.
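The 20% figure follows from dividing the 615 Mb by the size of the genome. A quick check, assuming a haploid genome size of about 3,100 Mb (my round number, not one given in the talk):

```python
# 615 Mb is the arrayCGH figure quoted in the talk.
CNV_MB = 615

# ASSUMPTION: approximate haploid human genome size in Mb (not from the talk).
GENOME_MB = 3100

fraction = CNV_MB / GENOME_MB
print(f"{fraction:.1%}")  # prints 19.8%
```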

Ideally, one would want to sequence multiple individual human genomes. An initial look at this methodology is described in Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., Olson, M.V., Eichler, E.E. (2005). Fine-scale structural variation of the human genome. Nature Genetics, 37(7), 727-732. DOI: 10.1038/ng1562. My notes on Eichler’s talk get sparse here, but over the holidays I read over this paper. It is a short, concise description of the big picture of structural variation that their fosmid paired-end sequencing approach identified. There are a couple of very nice summary figures in the paper. They note three things about their discovered variants: (1) over half of the variants map to segmentally duplicated regions (which by themselves represent only 5% of the genome); (2) more than 85% of the variants they detect do not overlap those from previous studies, possibly owing to differences in method of ascertainment; and finally (3) insertions outnumber deletions, perhaps indicating that haploinsufficiency is less tolerated than trisomy in evolving populations.
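The basic idea behind paired-end mapping can be sketched in a few lines: fosmid inserts have a fairly tight size distribution, so if the two end sequences of a fosmid map to the reference much farther apart than a fosmid can span, the sampled genome is missing sequence there (a deletion relative to the reference), and if they map much closer together, the sampled genome carries extra sequence (an insertion). The function name and the 32 kb / 48 kb cutoffs below are illustrative assumptions on my part around a nominal 40 kb insert, not values taken from my notes on the talk.

```python
def classify_end_pair(ref_span_bp, min_span_bp=32_000, max_span_bp=48_000):
    """Classify one fosmid by the distance between its mapped end sequences.

    ref_span_bp: distance between the two ends when placed on the reference.
    The cutoffs are ASSUMED illustrative bounds on a normal fosmid insert.
    """
    if ref_span_bp > max_span_bp:
        # Ends are too far apart on the reference: the sample lacks sequence.
        return "deletion"
    if ref_span_bp < min_span_bp:
        # Ends are too close on the reference: the sample has extra sequence.
        return "insertion"
    return "concordant"


# Invented spans, purely to exercise the three outcomes.
for span in (40_000, 61_000, 22_000):
    print(span, classify_end_pair(span))
```

In practice one would want several independent fosmids supporting the same call before believing it, since a single discordant clone can easily be a mapping or cloning artifact.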

In the conclusion to his talk, Eichler gave the following list of common features of structural variants: they are recurrent; they are subject to selective constraint; they are non-randomly distributed; their association with disease may be under-ascertained because of their association with duplications; and they may be positively selected for, to create new genes and offset the inherent load.

He also mentioned the need for a methodology which can distinguish 15 versus 16 copies of a region, even if the individual copies are >95% identical. Interestingly, a similar desire was expressed by Huntington Willard in a recent talk on centromere and satellite DNA assembly. Clearly there is a need here which is difficult to meet, but meeting it would open up a number of different (largely uncharted) territories.
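A bit of arithmetic shows why 15 versus 16 is so much harder than, say, 1 versus 2. This sketch assumes a dosage-style measurement whose signal is simply proportional to copy number (my simplification, not something stated in the talk); the function name is made up.

```python
def relative_signal_increase(copies):
    """Fractional increase in signal when going from `copies` to `copies + 1`,

    ASSUMING the signal is directly proportional to copy number.
    """
    return (copies + 1) / copies - 1


for n in (1, 2, 15):
    print(f"{n} -> {n + 1} copies: {relative_signal_increase(n):.1%} more signal")
```

Going from 1 to 2 copies doubles the signal, while going from 15 to 16 adds under 7%, which is easily lost in measurement noise.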
