Monday 29 October 2012

RAD-seq for Next-Generation Phylogenetics

After decades of relying on mitochondrial DNA or universal multi-copy ribosomal DNA regions for taxonomy, phylogeny and phylogeography in plant and animal taxa, it is now the turn of nuclear genes. In recent years, single-copy or low-copy genes have been increasingly employed to resolve species-level or even deeper relationships. Some genes (or portions of them) were chosen for their rapid evolutionary rate, others for their capacity to reveal hybridization, introgression, allopolyploidization or adaptation. Developing these markers requires identifying genes whose evolutionary rate suits the phylogenetic level in question, and comparing the nuclear phylogeny with the phylogeny inferred from the uniparentally inherited mitochondrial DNA.

Today, I needed this hung on the wall next to my lab bench (from www.sugarscientist.com)
The drawbacks are: a) the development of low-copy or single-copy nuclear markers relies heavily on the availability of genomic resources for the group in question; b) single nuclear genes may not be informative enough in cases of recent speciation, so multiple loci must be concatenated to resolve relationships among species; c) during marker selection, paralogous genes (i.e. duplicated genes, gene families) must be excluded to avoid confusing gene genealogies with species phylogenies. Several protocols have been proposed for the selection of single-copy nuclear markers from genomic data. The process, however, is a pain in the neck. This is because marker development is almost always based on genomic data from a limited, nearly insignificant, number of taxa compared to the real number of species in the group. Hence the rounds of failed PCR, the redesigned, degenerate primers, the years of postdoc effort poured into a single project, hence the missing data….
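To make the "concatenation" and "missing data" points concrete, here is a minimal Python sketch of how several aligned loci end up in one supermatrix. Everything in it is hypothetical: the taxon names, the toy sequences and the helper names `concatenate` and `missing_fraction` are made up for illustration, and real projects would use dedicated alignment tools.

```python
# Toy supermatrix builder: one dict per locus, {taxon: aligned sequence}.
# Taxa, sequences and function names are illustrative assumptions only.

def concatenate(loci, missing="?"):
    """Join aligned loci per taxon, padding absent loci with `missing`."""
    taxa = sorted({taxon for locus in loci for taxon in locus})
    parts = {taxon: [] for taxon in taxa}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon whose PCR failed for this locus gets a block of '?'.
            parts[taxon].append(locus.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def missing_fraction(matrix, missing="?"):
    """Share of cells in the supermatrix that are missing data."""
    total = sum(len(seq) for seq in matrix.values())
    return sum(seq.count(missing) for seq in matrix.values()) / total


locus1 = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTT"}
locus2 = {"sp1": "GGC", "sp3": "GAC"}  # amplification failed for sp2

supermatrix = concatenate([locus1, locus2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print("missing:", round(missing_fraction(supermatrix), 3))
```

Every locus that fails to amplify in a taxon becomes a block of `?` characters, which is how a few stubborn primers turn into a sizeable share of missing data once many loci are stacked together.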




Next-generation sequencing (NGS) technology has revolutionized multiple scientific fields by delivering data quickly, cost-effectively and in such quantities that our computers can barely handle them anymore. These data are first-class sources for quick and inexpensive marker development (among other goodies, obviously). Most importantly, however, NGS coupled with third-party protocols has moved traditional population genetics and phylogenetics into population-level genomics and phylogenomics, respectively.
Generating and scoring loci with NGS (McCormack et al. 2012)
One of these rapidly emerging techniques is “Restriction site Associated DNA sequencing”, otherwise known as “RAD-seq”. Briefly, RAD-seq makes use of Illumina or SOLiD high-throughput next-generation sequencing and involves three simple steps: a) cut the genome of an individual with a restriction enzyme, b) ligate the restriction fragments to a modified adapter containing a unique identifying sequence (a barcode) and c) sequence the ends of the resulting fragments using NGS. Fragments from multiple individuals can be pooled together and sequenced on a single Illumina lane. The reads can then be separated bioinformatically, and many biologically relevant SNPs and genetic loci can be identified in a single experiment, even in species with no reference genome available. So far, this is a first-class, highly promising approach for next-generation population genomics, SNP discovery, genotyping and genetic mapping, but what about systematics, phylogeography and phylogenetics? Apparently it is taking a while for the method to take root and win people over in these fields, mainly because of, yet again, the difficulty of assessing orthology from such short DNA fragments. Most important, however, appear to be the group’s age and the extent of evolutionary divergence among lineages. Basically, the odds of identifying a sufficient number of orthologous restriction sites across species within a target group decrease as the age of that group increases.
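The three steps, and the reason older groups share fewer sites, can be sketched in a few lines of Python. This is a toy model under loud assumptions: the two "genomes", the barcodes, the 12-base read length and all function names are invented for illustration; only the forward strand downstream of each cut is modelled; and `site_retention` assumes every base of the recognition site changes independently with the same probability, which real genomes do not do. The only real-world fact used is that EcoRI recognizes GAATTC and cuts after the G.

```python
# Toy in-silico RAD-seq: digest, barcode, pool, demultiplex.
# Genomes, barcodes and function names are hypothetical illustrations.

SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts between G and AATTC
READ_LEN = 12     # toy read length


def rad_tags(genome, site=SITE, cut_offset=CUT_OFFSET, read_len=READ_LEN):
    """Step (a)+(c): reads starting at each cut site, forward strand only."""
    tags = []
    pos = genome.find(site)
    while pos != -1:
        start = pos + cut_offset
        tag = genome[start:start + read_len]
        if len(tag) == read_len:  # skip sites too close to the sequence end
            tags.append(tag)
        pos = genome.find(site, pos + 1)
    return tags


def pool_reads(genomes, barcodes):
    """Step (b): prefix each individual's tags with its barcode and pool them."""
    pool = []
    for name, genome in genomes.items():
        pool.extend(barcodes[name] + tag for tag in rad_tags(genome))
    return pool


def demultiplex(pool, barcodes):
    """Separate pooled reads by barcode; barcodes must share one length."""
    by_barcode = {bc: name for name, bc in barcodes.items()}
    bc_len = len(next(iter(by_barcode)))
    reads = {name: [] for name in barcodes}
    for read in pool:
        name = by_barcode.get(read[:bc_len])
        if name is not None:  # unknown barcodes are dropped
            reads[name].append(read[bc_len:])
    return reads


def site_retention(divergence, site_len=len(SITE)):
    """Chance a site is intact if each base differs with prob. `divergence`."""
    return (1.0 - divergence) ** site_len


flank1, flank2 = "TTGACCGTAGGCATTT", "AGGCTTAACGGTCA"
genomes = {
    "ind_A": "ACGT" + SITE + flank1 + SITE + flank2,
    # ind_B carries one SNP in the first flank and a mutated second site.
    "ind_B": "ACGT" + SITE + "TTGATCGTAGGCATTT" + "GAGTTC" + flank2,
}
barcodes = {"ind_A": "ACTG", "ind_B": "GTCA"}

reads = demultiplex(pool_reads(genomes, barcodes), barcodes)
for name, tags in reads.items():
    print(name, tags)
for d in (0.01, 0.05, 0.20):
    print(f"divergence {d:.2f}: site retained with p = {site_retention(d):.3f}")
```

In this toy data the second individual yields a read that differs by one SNP at the shared locus and yields nothing at the second locus, because a single substitution destroyed the restriction site. The more divergent two lineages are, the more often that second case happens.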
Relevant literature
