The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and there is growing appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees, but it is computationally intensive, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary-statistics-based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies from multilocus data sets. In this paper, we propose two methods based on summary statistics of coalescence times: species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC). We show that both methods are statistically consistent under the multispecies coalescent model. Because STAR uses only the ranks of coalescences, it is robust to variable substitution rates along the branches of gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when substitution rates among lineages are highly variable. Two real genomic data sets analyzed with the two methods yielded species trees consistent with previous results.
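STEAC reduces a set of gene trees to one summary per species pair, the coalescence time averaged across loci, and then builds a tree from that distance matrix. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes one sampled allele per species, gene trees already summarized as pairwise coalescence times, and UPGMA as the distance method (one common choice; the clustering step used in the paper may differ). The names `pair`, `times`, `average_coalescence_times`, and `upgma` are hypothetical, and the coalescence times are made up. STAR would feed the same clustering step with averaged coalescence ranks instead of times; its ranking scheme is not reproduced here.

```python
from itertools import combinations
from statistics import mean


def pair(a, b):
    """Unordered species pair, used as a dictionary key."""
    return frozenset((a, b))


def average_coalescence_times(gene_trees):
    """STEAC-style distances: mean coalescence time of each species pair across loci.

    Each gene tree is a dict {pair(a, b): coalescence time} (one allele per species).
    """
    return {p: mean(tree[p] for tree in gene_trees) for p in gene_trees[0]}


def upgma(dist, taxa):
    """Average-linkage (UPGMA) clustering; returns a rooted nested-tuple topology."""
    clusters = [(t, frozenset([t])) for t in taxa]  # (topology, member species)

    def linkage(i, j):
        # Mean distance over all species pairs spanning the two clusters
        return mean(dist[pair(a, b)] for a in clusters[i][1] for b in clusters[j][1])

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=lambda ij: linkage(*ij))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def times(ab, ac, ad, bc, bd, cd):
    """Hypothetical helper: pairwise coalescence times for one 4-species gene tree."""
    keys = [pair("A", "B"), pair("A", "C"), pair("A", "D"),
            pair("B", "C"), pair("B", "D"), pair("C", "D")]
    return dict(zip(keys, (ab, ac, ad, bc, bd, cd)))


gene_trees = [
    times(1.0, 3.0, 3.0, 3.0, 3.0, 2.0),  # ((A,B),(C,D))
    times(2.0, 2.0, 3.5, 1.5, 3.5, 3.5),  # ((A,(B,C)),D), discordant with the others
    times(0.8, 2.5, 2.5, 2.5, 2.5, 1.2),  # ((A,B),(C,D))
]
species_tree = upgma(average_coalescence_times(gene_trees), ["A", "B", "C", "D"])
print(species_tree)  # (('A', 'B'), ('C', 'D'))
```

Averaging across loci lets the two concordant gene trees outweigh the discordant one, which is the intuition behind using pairwise summary statistics rather than any single gene tree.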
The Human Genome Project has recently been complemented by whole-genome assessment sequences of 32 mammals and 24 nonmammalian vertebrate species suitable for comparative genomic analyses. Anticipating a precipitous drop in costs and an increase in sequencing efficiency, with concomitant development of improved annotation technology, we propose to create a collection of tissue and DNA specimens for 10 000 vertebrate species specifically designated for whole-genome sequencing in the very near future. For this purpose, we, the Genome 10K Community of Scientists (G10KCOS), will assemble and allocate a biospecimen collection of some 16 203 representative vertebrate species spanning the evolutionary diversity of living mammals, birds, nonavian reptiles, amphibians, and fishes (ca. 60 000 living species). In this proposal, we present precise counts of the 16 203 species with specimens presently tagged and stipulated for DNA sequencing by the G10KCOS. DNA sequencing has ushered in a new era of investigation in the biological sciences, allowing us to embark for the first time on a truly comprehensive study of vertebrate evolution, the results of which will touch nearly every aspect of vertebrate biological enquiry.
The zebra finch has long been an important model system for the study of vocal learning, vocal production, and behavior. With the imminent sequencing of its genome, the zebra finch is now poised to become a model system for population genetics. Using a panel of 30 noncoding loci, we characterized patterns of polymorphism and divergence among wild zebra finch populations. Continental Australian populations displayed little population structure, exceptionally high levels of nucleotide diversity (pi = 0.010), a rapid decay of linkage disequilibrium (LD), and a high population recombination rate (rho ≈ 0.05), all of which suggest an open and fluid genomic background that could facilitate adaptive variation. By contrast, substantial divergence between the Australian and Lesser Sunda Island populations (K_ST = 0.193), together with reduced genetic diversity (pi = 0.002) and higher levels of LD in the island population, suggests a strong but relatively recent founder event, which may have contributed to speciation between these populations as envisioned under founder-effect speciation models. Consistent with this hypothesis, we find that under a simple quantitative genetic model both drift and selection could have contributed to the observed divergence in six quantitative traits. In both Australian and Lesser Sundas populations, diversity at Z-linked loci was significantly lower than at autosomal loci. Our analysis provides a quantitative framework for studying the role of selection and drift in shaping patterns of molecular evolution in the zebra finch genome.
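Nucleotide diversity values such as pi = 0.010 summarize the average number of differences per site between two randomly drawn sequences. The following is a minimal sketch of that calculation, assuming a gap-free alignment of equal-length sequences with no missing data; `nucleotide_diversity` is a hypothetical name, the toy alignment is made up, and this is not the study's analysis pipeline.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs.

    Assumes a gap-free alignment of equal-length sequences (no missing data).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length


alignment = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACGTAC"]  # toy data
print(nucleotide_diversity(alignment))  # 0.1
```

Six pairs with 6 differences in total give a mean of 1 difference per pair, or 0.1 per site over 10 sites.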
Pseudoautosomal regions (PARs) shared by avian Z and W sex chromosomes are typically small homologous regions within which recombination still occurs and are hypothesized to share the properties of autosomes. We capitalized on the unusual structure of the sex chromosomes of emus, Dromaius novaehollandiae, which consist almost entirely of PAR shared by both sex chromosomes, to test this hypothesis. We compared recombination, linkage disequilibrium (LD), GC content, and nucleotide diversity between pseudoautosomal and autosomal loci derived from 11 emu bacterial artificial chromosome (BAC) clones that were mapped to chromosomes by fluorescent in situ hybridization. Nucleotide diversity (pi = 4N_e mu) was not significantly lower in pseudoautosomal loci (14 loci, 1.9 ± 2.4 × 10^-3) than in autosomal loci (8 loci, 4.2 ± 6.1 × 10^-3). By contrast, pseudoautosomal sequences showed higher recombination per site within BAC-end sequences (rho = 4N_e c; pseudoautosomal, 3.9 ± 6.9 × 10^-2; autosomal, 2.3 ± 3.7 × 10^-2) and slightly lower average LD (D'; pseudoautosomal, 4.2 ± 0.2 × 10^-1; autosomal, 4.7 ± 0.5 × 10^-1). We also report evidence of deviation from a simple neutral model in the PAR and in autosomal loci, possibly caused by departures from demographic equilibrium, such as population growth. This study provides a snapshot of the population genetics of avian sex chromosomes at an early stage of differentiation.
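The LD values above are reported as D', Lewontin's normalized coefficient: D = p_AB - p_A p_B divided by its maximum possible magnitude given the allele frequencies. The following is a minimal sketch for two biallelic sites, assuming phased two-locus haplotypes are available; the study's own estimation procedure (for example, how unphased diploid data were handled) is not described here and may differ. `d_prime` is a hypothetical name and the haplotype counts are made up.

```python
def d_prime(haplotypes):
    """|D'| for two biallelic sites from phased haplotypes.

    Each haplotype is a 2-tuple: (allele at site 1, allele at site 2).
    """
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both sites must be biallelic in the sample")
    a, b = alleles_1[0], alleles_2[0]  # reference alleles (arbitrary choice)
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D and on the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


haplotypes = [("A", "B")] * 4 + [("a", "b")] * 4 + [("A", "b"), ("a", "B")]
print(round(d_prime(haplotypes), 3))  # 0.6
```

Here p_A = p_B = 0.5 and p_AB = 0.4, so D = 0.15 and D_max = 0.25, giving D' = 0.6.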
With the publication of the draft chicken genome and the recent production of several BAC clone libraries from non-avian reptiles and birds, it is now possible to undertake more detailed comparative genomic studies in Reptilia. Of interest in particular are the genomic events that transformed the large, repeat-rich genomes of mammals and non-avian reptiles into the minimalist chicken genome. We have used paired BAC end sequences (BESs) from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta) and emu (Dromaius novaehollandiae) to investigate patterns of sequence divergence, gene and retroelement content, and microsynteny between these species and chicken.
From a total of 11,967 curated BESs, we mapped 725, 773, and 2597 sequences from alligator, turtle, and emu, respectively, to sites in the draft chicken genome using a stringent BLAST protocol. Most commonly, sequences mapped to a single site in the chicken genome. Of 1675, 1828, and 2936 paired BESs obtained for alligator, turtle, and emu, respectively, 34 (alligator, 2%), 24 (turtle, 1.3%), and 479 (emu, 16.3%) pairs mapped with high confidence to a single chicken chromosome, in the correct orientation and with BAC-sized intermarker distances; these include 25 emu pairs mapping to the chicken Z chromosome. By determining the insert sizes of a subset of BAC clones from these three species, we also found a significant correlation between intermarker distances in alligator and turtle and those in chicken, with slopes as expected from the ratio of the genome sizes.
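The three criteria for a conserved paired hit (same chicken chromosome, correct orientation, BAC-sized intermarker distance) can be expressed as a simple filter. This is a hedged sketch, not the study's protocol: the `Hit` record, `is_consistent_pair`, the chromosome labels, and the `min_dist`/`max_dist` thresholds are hypothetical placeholders, since the actual BLAST settings and distance cutoffs are not given here. It assumes the two ends of an intact clone map to opposite strands facing each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one BAC-end BLAST hit on a chicken chromosome."""
    chrom: str
    start: int   # leftmost coordinate of the hit
    strand: str  # "+" or "-" relative to the chicken assembly


def is_consistent_pair(end1, end2, min_dist=10_000, max_dist=500_000):
    """True if two end hits from one clone look like an intact, BAC-sized span.

    min_dist and max_dist are placeholder thresholds, not the study's cutoffs.
    """
    if end1.chrom != end2.chrom:
        return False  # ends fall on different chicken chromosomes
    left, right = sorted((end1, end2), key=lambda h: h.start)
    # Ends of an intact clone point toward each other: "+" on the left, "-" on the right
    if (left.strand, right.strand) != ("+", "-"):
        return False
    return min_dist <= right.start - left.start <= max_dist


pairs = [
    (Hit("chr1", 100_000, "+"), Hit("chr1", 240_000, "-")),      # consistent
    (Hit("chr1", 100_000, "+"), Hit("chr2", 240_000, "-")),      # different chromosomes
    (Hit("chrZ", 500_000, "-"), Hit("chrZ", 560_000, "+")),      # wrong orientation
    (Hit("chr4", 1_000_000, "+"), Hit("chr4", 9_000_000, "-")),  # too far apart
]
consistent = [p for p in pairs if is_consistent_pair(*p)]
print(f"{len(consistent)}/{len(pairs)} pairs consistent")  # 1/4 pairs consistent
```

Dividing the number of consistent pairs by the number of paired BESs gives percentages of the kind reported above (for example, 479 of 2936 emu pairs, 16.3%).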
Our results suggest that a large number of small-scale chromosomal rearrangements and deletions in the lineage leading to chicken have drastically reduced the number of syntenies detectable between the chicken genome and the alligator, turtle, and emu genomes. They also imply that small deletions occurring widely throughout the genomes of reptilian and avian ancestors led to the ~50% reduction in genome size observed in birds compared with reptiles. We have also mapped and identified likely gene regions in hundreds of new BAC clones from these species.
In this review, we describe the history of amniote sex determination as a classic example of Darwinian evolution. We suggest that evolutionary changes in sex determination provide a foundation for understanding important aspects of chromosome and genome organization that otherwise appear haphazard in their origins and contents. Species with genotypic sex determination often possess heteromorphic sex chromosomes, whereas species with environmental sex determination lack them. Through a series of mutations followed by selection at key genes, sex-determining mechanisms have turned over many times throughout the amniote lineage. As a consequence, amniote genomes have undergone gains or losses of sex chromosomes. We review the genomic and ecological contexts in which either temperature-dependent or genotypic sex determination has evolved. Once genotypic sex determination emerges in a lineage, viviparity and heteromorphic sex chromosomes become more likely to evolve. For example, in extinct marine reptiles, genotypic sex determination apparently led to viviparity, which in turn facilitated their pelagic radiation. Sex chromosomes comprise genome regions that differ from autosomes in recombination rate, mutation rate, levels of polymorphism, and the presence of sex-determining and sexually antagonistic genes. In short, many aspects of amniote genome complexity, life history, and adaptive radiation appear contingent on evolutionary changes in sex-determining mechanisms.
A locus that we name SubA was discovered during large-scale sequencing and characterization of a bacterial artificial chromosome library from an emu, Dromaius novaehollandiae. This locus yields a significantly negative Tajima's D in emus and is conserved across emu, chicken, mouse, and human. Expression of SubA orthologs has been reported in human ovaries and mouse testes but remains unknown in emus. The locus was physically mapped by fluorescent in situ hybridization to a pair of microchromosomes in emus, as previously reported for chicken. By characterizing emu SubA in this article, we aim to improve current descriptions of the cascade of genes associated with avian sex differentiation. Future work will examine the expression of SubA in ratites, other birds, and nonavian reptiles.
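Tajima's D contrasts two estimates of the population mutation parameter: pi (the mean number of pairwise differences) and Watterson's estimate S/a1 based on the number of segregating sites S. An excess of rare variants inflates S relative to pi and pushes D below zero. The following is a minimal sketch of Tajima's standard formula, assuming a gap-free alignment of at least four sequences; `tajimas_d` is a hypothetical name and the alignment is made up. It computes only the statistic: judging whether D is *significantly* negative requires a null distribution (for example, coalescent simulations), which is not shown.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D from a gap-free alignment of n >= 4 equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("D is undefined with no segregating sites")
    # pi: mean number of pairwise differences (not divided by sequence length)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima's variance formula
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


alignment = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA"]  # one singleton site
print(round(tajimas_d(alignment), 4))  # -0.6124
```

A single singleton gives pi = 0.5 but S/a1 = 6/11, so D is negative, the direction expected under an excess of rare variants.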
Although the power of multilocus data in estimating species trees is apparent, it is also clear that the analytical methodologies for doing so are still maturing. For example, of the methods currently available for estimating species trees from multilocus data, the Bayesian method introduced by Liu and Pearl (2007; BEST) is the only one that provides nodal support values. Using gene sequences from five nuclear loci, we explored two analytical methods (deep coalescence and BEST) to reconstruct the species tree of the five primary Manacus OTUs: M. aurantiacus, M. candei, M. vitellinus, populations of M. manacus from west of the Andes (M. manacus (w)), and populations of M. manacus from east of the Andes (M. manacus (e)). Both BEST and deep coalescence supported a sister relationship between M. vitellinus and M. manacus (w). A lower-probability tree from the BEST analysis and one of the most parsimonious deep coalescence trees also supported a sister relationship between M. candei and M. aurantiacus. Because hybrid zones connect the distributions of most Manacus species, we examined the potential influence of post-divergence gene flow on the sister relationship of the parapatrically distributed M. vitellinus and M. manacus (w). An isolation-with-migration (IM) analysis found relatively high levels of gene flow between M. vitellinus and M. manacus (w). Whether this gene flow is obscuring a true sister relationship between M. manacus (w) and M. manacus (e) remains unclear, pointing to the need for more detailed models that accommodate multispecies, multilocus DNA sequence data.
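The deep coalescence criterion scores a candidate species tree by the number of extra gene lineages that must persist through its branches, summed over loci, and prefers the tree with the smallest total. The following is a minimal sketch of that cost, not the software used in this study. It assumes rooted, fully resolved trees written as nested tuples with one sampled allele per OTU (real Manacus data include several individuals per OTU), and it scores candidates by brute force, which is feasible only for a handful of taxa. `leaf_set`, `clades_with_parents`, and `extra_lineages` are hypothetical names, and the trees are made up.

```python
def leaf_set(node):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades_with_parents(tree):
    """(leaf set, parent's leaf set or None) for every node, root included."""
    result, stack = [], [(tree, None)]
    while stack:
        node, parent = stack.pop()
        here = leaf_set(node)
        result.append((here, parent))
        if not isinstance(node, str):
            stack.extend((child, here) for child in node)
    return result


def extra_lineages(gene_tree, species_tree):
    """Deep coalescence (extra lineage) cost of one gene tree in a species tree.

    With one allele per species, a gene node sits at or below a species node
    exactly when its leaf set is a subset of that node's clade.
    """
    gene_nodes = clades_with_parents(gene_tree)
    cost = 0
    for clade, parent in clades_with_parents(species_tree):
        if parent is None:
            continue  # lineages above the species root eventually coalesce; no cost
        # Count gene lineages crossing the top of this species branch
        k = sum(1 for g, g_parent in gene_nodes
                if g <= clade and (g_parent is None or not g_parent <= clade))
        cost += k - 1
    return cost


gene_trees = [(("A", "B"), "C"), (("A", "C"), "B"), (("A", "B"), "C")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
scores = {c: sum(extra_lineages(g, c) for g in gene_trees) for c in candidates}
print(min(scores, key=scores.get))  # (('A', 'B'), 'C')
```

Here the candidate totals are 1, 2, and 3 extra lineages, so the tree matching the majority of gene trees is the most parsimonious under this criterion.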
Multilocus analysis of phylogeography and population history is a powerful tool for understanding the origin, dispersal, and geographic structure of species over time and space. Using 36 genetic markers (29 newly developed anonymous nuclear loci, six introns, and one mitochondrial locus, amounting to over 15 kb per individual), we studied the population structure and demographic history of the red-backed fairy wren Malurus melanocephalus, a small passerine distributed across northern and eastern Australia on both sides of the Carpentarian barrier. Analysis of the anonymous loci revealed large amounts of genetic diversity (pi = 0.016 ± 0.01; average number of SNPs per locus = 48; total number of SNPs = 1395), and neither nuclear nor mitochondrial gene trees showed evidence of reciprocal monophyly among the Cape York (CY), Eastern Forest (EF), and Top End (TE) populations. Although traditional taxonomy links the TE and CY populations to the exclusion of EF, we found that the CY population is genetically closer to the EF population, consistent with predicted area cladograms for this region. Multilocus coalescent analysis suggests that the CY population separated from the other two regions approximately 0.27 million years ago, and significant gene flow between the EF and CY populations (~2 migrants per generation) suggests geographic continuity in eastern Australia. By contrast, gene flow between the CY and TE populations has been dampened by divergence across the Carpentarian barrier.
Characterization of reptilian genomes is essential for understanding the overall diversity and evolution of amniote genomes, because reptiles, which include birds, constitute a major fraction of the amniote evolutionary tree. To better understand the evolution and diversity of genomic characteristics in Reptilia, we conducted comparative analyses of online sequence data from Alligator mississippiensis (alligator) and Sphenodon punctatus (tuatara) as well as genome size and karyological data from a wide range of reptilian species. At the whole-genome and chromosomal tiers of organization, we find that reptilian genome size distribution is consistent with a model of continuous gradual evolution, while genomic compartmentalization, as manifested in the number of microchromosomes and macrochromosomes, appears to have undergone early rapid change. At the sequence level, the third genomic tier, we find that exon size in Alligator is distributed in a pattern matching that of exons in Gallus (chicken), especially in the 101–200 bp size class. A small spike in the fraction of exons in the 301 bp–1 kb size class is also observed for Alligator, but more so for Sphenodon. For introns, we find that members of Reptilia have a larger fraction of introns within the 101 bp–2 kb size class and a lower fraction of introns within the 5–30 kb size class than do mammals. These findings suggest that the mode of reptilian genome evolution varies across three hierarchical levels of the genome, a pattern consistent with a mosaic model of genomic evolution.
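Comparisons of exon and intron size distributions like those above amount to binning feature lengths into size classes and comparing the fraction in each class across taxa. The following is a minimal sketch of that binning step, assuming a list of feature lengths in bp is already available. Only the 101–200 bp and 301 bp–1 kb exon classes come from the text; the other boundaries, the name `size_class_fractions`, and the exon lengths are illustrative assumptions.

```python
from collections import Counter

# (lower, upper) bounds in bp, inclusive; None means no upper bound.
# Only 101-200 bp and 301 bp-1 kb come from the text; the rest are illustrative.
EXON_CLASSES = [(1, 100), (101, 200), (201, 300), (301, 1000), (1001, None)]


def size_class_fractions(lengths, classes=EXON_CLASSES):
    """Fraction of features (e.g., exons) whose length falls in each size class."""
    counts = Counter()
    for length in lengths:
        for lower, upper in classes:
            if length >= lower and (upper is None or length <= upper):
                counts[(lower, upper)] += 1
                break  # classes do not overlap, so stop at the first match
    return {c: counts[c] / len(lengths) for c in classes}


exon_lengths = [50, 120, 150, 180, 250, 400, 1500, 90]  # made-up lengths
for (lower, upper), frac in size_class_fractions(exon_lengths).items():
    label = f"{lower}-{upper} bp" if upper else f">{lower - 1} bp"
    print(f"{label}: {frac:.3f}")
```

Running the same function on exon or intron lengths from two taxa (for example, Alligator versus Gallus) gives directly comparable per-class fractions.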