Avian influenza A virus (an orthomyxovirus) is a zoonotic pathogen whose natural reservoir is entirely in birds. The influenza virus genome consists of eight segments of single-stranded RNA, giving it high potential for reassortment when strains co-infect a host. Two segments encode the surface antigens hemagglutinin (H), which mediates host-cell entry, and neuraminidase (N), which releases new virions from infected cells. At present, 16 H and 9 N subtypes are known, for a total of 144 possible influenza subtypes, each with potentially different host susceptibility. With >10,000 bird species occupying nearly every terrestrial and aquatic habitat, there are few places on Earth where birds cannot be found. The avian immune system differs from that of humans in several important features, including asynchronous B and T lymphocyte systems and a polymorphic multigene immune complex, but little is known about the immunogenetics of the response to pathogens. Postbreeding dispersal, migration and naturally high vagility mean that wild birds could act as vectors, transmitting highly pathogenic variants great distances from the original sources of infection.
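The subtype count in the paragraph above is a simple product: each of the 16 H subtypes can pair with each of the 9 N subtypes. The following minimal Python sketch enumerates the labels. The function name `subtype_labels` is hypothetical, and the sketch counts possible combinations only, not subtypes actually observed in birds.

```python
from itertools import product

def subtype_labels(n_h: int = 16, n_n: int = 9) -> list[str]:
    """Return every H/N combination label, e.g. 'H5N1', for n_h H and n_n N subtypes."""
    # product() pairs every H number with every N number: n_h * n_n labels in total.
    return [f"H{h}N{n}" for h, n in product(range(1, n_h + 1), range(1, n_n + 1))]

labels = subtype_labels()
print(len(labels))  # 16 * 9 = 144
```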
The vast majority of phylogenetic models focus on resolving gene trees, even though the species phylogenies in which gene trees are embedded are of primary interest. We analyze a Bayesian model for estimating species trees that accounts for the stochastic variation expected among gene trees from multiple unlinked loci sampled from a single species history under a coalescent process. Applying the model to a 106-gene data set from yeast shows that the set of gene trees recovered is much smaller when the analysis statistically acknowledges the shared but unknown species tree from which they are sampled than when the history of each locus is treated independently of an overarching species tree. The analysis also yields a concentrated posterior distribution for the yeast species tree whose mode is congruent with the tree from concatenated genes, and it achieves this with fewer than half the loci required by the concatenation method. Using simulations, we show that, with large numbers of loci, highly resolved species trees can be estimated under conditions in which concatenation of sequence data will positively mislead phylogeny, and even when the proportion of gene trees matching the species tree is <10%. However, when gene tree/species tree congruence is high, species trees can be resolved with just two or three loci. These results make accessible an alternative paradigm for combining data in phylogenomics, one that focuses attention on the singularity of species histories and away from the idiosyncrasies and multiplicities of individual gene histories.
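The simplest case shows why gene trees can disagree with the species tree: three species with topology ((A,B),C). Under the multispecies coalescent, the probability that one locus's gene tree matches the species tree is 1 - (2/3) exp(-T), where T is the length of the (A,B) internal branch in coalescent units. The following minimal Python sketch checks that formula by simulation. It assumes one sampled gene copy per species, a constant population size along the internal branch, and no recombination within loci. The function names are hypothetical, and this is not the authors' Bayesian species-tree model.

```python
import math
import random

def concordance_probability(internal_branch: float) -> float:
    """Analytic P(gene tree matches species tree) for three taxa ((A,B),C).

    internal_branch is measured in coalescent units (generations / 2N for a
    diploid population of size N).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)

def simulate_concordance(internal_branch: float, replicates: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the coalescent."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(replicates):
        # Lineages A and B coalesce at rate 1 while inside the internal branch.
        if rng.expovariate(1.0) < internal_branch:
            matches += 1
        # Otherwise all three lineages reach the root population, where each
        # of the three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            matches += 1
    return matches / replicates

for t in (0.05, 0.5, 2.0):
    print(t, round(concordance_probability(t), 3), simulate_concordance(t, 20000))
```

With only three taxa the matching probability cannot fall below 1/3. Proportions below 10%, such as those in the simulations described above, arise only with more taxa and short internal branches.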
The Andean uplift played important roles in the historical diversification of Neotropical organisms, both by producing new high-elevation habitats that could be colonized and by isolating organisms on either side of the mountains. Here, we present a molecular phylogeny of Thamnophilus antshrikes, a clade of 30 species whose collective distribution spans nearly all lowland habitats in tropical South America, the eastern foothills of the Andes, and the tepuis of northern South America. Our goal was to examine the role of the Andes in the diversification of lowland and foothill species. Using parsimony and Bayesian ancestral state reconstructions of a three-state distribution character (lowland-restricted, lowland-to-highland, highland-restricted), we found that the Andes were colonized twice independently, and the tepuis once, from lowland-restricted ancestors. Over the entire evolutionary history of Thamnophilus, the highest transition rates were between highland-restricted and lowland-to-highland distributions, with extremely low rates into and out of lowland-restricted distributions. This pattern suggests that lowland-restricted distributions are limited not by physiological constraints but by other forces, such as competition. These results highlight the need for additional comparative studies to elucidate the processes associated with the colonization of high-elevation habitats and the differentiation of populations within them.
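Bayesian ancestral state reconstructions like these rest on a continuous-time Markov model. In that model, a rate matrix Q governs changes among the three distribution states. The probability of moving from state i to state j along a branch of length t is the (i, j) entry of exp(Qt).

The following minimal pure-Python sketch computes those branch transition probabilities. The rates are hypothetical, chosen only to mimic the qualitative pattern described above, and are not the estimated values. The helper names are also hypothetical.

```python
import math

STATES = ("lowland-restricted", "lowland-to-highland", "highland-restricted")

def mat_mult(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_expm(m, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    # Halve the matrix s times so its norm is at most 0.5, where the series converges fast.
    s = math.ceil(math.log2(norm / 0.5)) if norm > 0.5 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mult(term, scaled)]  # scaled^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
        result = mat_mult(result, result)
    return result

def rate_matrix(rates):
    """Build Q from off-diagonal rates {(i, j): q_ij}; diagonals make each row sum to 0."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), r in rates.items():
        q[i][j] = r
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

def transition_probabilities(q, t):
    """P(t) = exp(Qt); entry [i][j] is P(state j at time t | state i at time 0)."""
    return mat_expm([[x * t for x in row] for row in q])

# Hypothetical rates: slow exchange with lowland-restricted, fast exchange between
# the two highland-using states, no direct lowland <-> highland-restricted jump.
Q = rate_matrix({(0, 1): 0.01, (1, 0): 0.01, (1, 2): 1.0, (2, 1): 1.0})
for row, state in zip(transition_probabilities(Q, 1.0), STATES):
    print(f"{state:>20}: " + "  ".join(f"{p:.3f}" for p in row))
```

A full reconstruction would combine these branch probabilities over the whole tree (for example, with Felsenstein's pruning algorithm) and, in a Bayesian analysis, sample the rates rather than fix them.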
Avian genomes are small and streamlined compared with those of other amniotes by virtue of having fewer repetitive elements and less non-coding DNA(1,2). This condition has been suggested to represent a key adaptation for flight in birds, reducing the metabolic costs associated with large genome and cell sizes(3,4). However, the evolution of genome architecture in birds, or any other lineage, is difficult to study because genomic information is often absent for long-extinct relatives. Here we use a novel Bayesian comparative method to show that bone-cell size correlates well with genome size in extant vertebrates, and then use this relationship to estimate the genome sizes of 31 species of extinct dinosaur, including several species of extinct birds. Our results indicate that the small genomes typically associated with avian flight evolved in the saurischian dinosaur lineage between 230 and 250 million years ago, long before this lineage gave rise to the first birds. By comparison, ornithischian dinosaurs are inferred to have had much larger genomes, which were probably typical of ancestral Dinosauria. Using comparative genomic data, we estimate that genome-wide interspersed mobile elements, a class of repetitive DNA, made up 5–12% of total genome size in the saurischian dinosaur lineage but 7–19% in ornithischian dinosaurs, suggesting that repetitive elements became less active in the saurischian lineage. These genomic characteristics should be added to the list of attributes previously considered avian but now thought to have arisen in non-avian dinosaurs, such as feathers(5), pulmonary innovations(6), and parental care and nesting.
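The inference described above has two steps. First, fit a relationship between cell size and genome size in living species. Then, read off genome sizes for fossils whose cell size can be measured.

The following minimal sketch illustrates that logic with a simple log-log ordinary least-squares line. This is a swapped-in technique: it ignores phylogenetic non-independence among species and is not the Bayesian comparative method used in the study. All numbers and function names are hypothetical.

```python
import math

def fit_log_log(cell_sizes, genome_sizes):
    """Ordinary least-squares fit of log10(genome size) on log10(cell size)."""
    xs = [math.log10(c) for c in cell_sizes]
    ys = [math.log10(g) for g in genome_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x  # (slope, intercept)

def predict_genome_size(cell_size, slope, intercept):
    """Back-transform the fitted line to predict genome size for one cell size."""
    return 10 ** (intercept + slope * math.log10(cell_size))

# Hypothetical calibration data (cell size and genome size in arbitrary units);
# illustrative numbers only, not measurements from the study.
cells = [40.0, 80.0, 150.0, 300.0, 600.0]
genomes = [1.1, 1.9, 3.2, 5.6, 10.5]
slope, intercept = fit_log_log(cells, genomes)
print(round(slope, 2), round(predict_genome_size(120.0, slope, intercept), 2))
```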
We report results of a megabase-scale phylogenomic analysis of the Reptilia, the sister group of mammals. Large-scale end-sequence scanning of genomic clones of a turtle, alligator, and lizard reveals diverse, mammal-like landscapes of retroelements and simple sequence repeats (SSRs) not found in the chicken. Several global genomic traits, including distinctive phylogenetic lineages of CR1-like long interspersed elements (LINEs) and a paucity of A-T rich SSRs, characterize turtle and archosaur genomes, whereas higher frequencies of tandem repeats and a lower global GC content reveal mammal-like features in Anolis. Nonavian reptile genomes also possess a high frequency of diverse and novel 50-bp unit tandem duplications not found in chicken or mammals. The frequency distributions of approximately 65,000 8-mer oligonucleotides suggest that rates of DNA-word frequency change are an order of magnitude slower in reptiles than in mammals. These results suggest a diverse array of interspersed repeats and SSRs in the common ancestor of amniotes, and a genomic conservatism and gradual loss of retroelements in reptiles that culminated in the minimalist chicken genome. The sequences reported in this paper have been deposited in the GenBank database (accession nos. CZ250707–CZ257443 and DX390731–DX389174).
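DNA-word frequencies like those compared above are simple to compute: slide a window of length k along the sequence and count each word. With k = 8 there are 4^8 = 65,536 possible words, which is the "approximately 65,000" figure.

The following minimal Python sketch computes word frequencies and GC content. The function names are hypothetical. The sketch ignores reverse complements and skips any window containing an ambiguous base; the study may have handled both differently.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq: str, k: int = 8) -> dict[str, float]:
    """Relative frequency of every possible DNA word of length k (4**k words)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows containing N or gaps
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}

def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

read = "ACGTACGTTTGCAAACGTACGGGCTA"  # hypothetical sequence
freqs = kmer_frequencies(read)
print(len(freqs))  # 65536 possible 8-mers
print(round(gc_content(read), 3))
```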
We used ancient DNA analysis of seven museum specimens of the endangered North American ivory-billed woodpecker (Campephilus principalis) and three specimens of the species from Cuba to document their degree of differentiation and their relationships to other Campephilus woodpeckers. Analysis of these mtDNA sequences reveals that the Cuban and North American ivory-bills, along with the imperial woodpecker (Campephilus imperialis) of Mexico, form a monophyletic group and are roughly equidistant genetically, suggesting that each lineage may be a separate species. Application of both internal and external rate calibrations indicates that the three lineages split more than one million years ago, in the Mid-Pleistocene. We can thus exclude the hypothesis that Native Americans introduced North American ivory-billed woodpeckers to Cuba. Our sequences of all three woodpeckers also provide an important DNA barcoding resource for identifying non-invasive samples or remains of these critically endangered and charismatic woodpeckers.
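Under a strict molecular clock, a pairwise distance d accumulates along both descendant lineages, so the split time is T = d / (2r), where r is the per-lineage rate. That rate can come from an internal calibration (a dated node within the tree) or an external one (a rate borrowed from other taxa).

The following minimal Python sketch shows both steps. It assumes distances that are already corrected for multiple hits. The function names and all numbers are hypothetical and are not the study's data.

```python
def calibrate_rate(calibration_distance: float, calibration_age_myr: float) -> float:
    """Per-lineage rate r from a dated split: d = 2 * r * T, so r = d / (2T)."""
    return calibration_distance / (2.0 * calibration_age_myr)

def divergence_time(pairwise_distance: float, rate_per_lineage: float) -> float:
    """Strict-clock split time T = d / (2r); the factor 2 counts both descendant lineages."""
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate_per_lineage)

# Hypothetical numbers for illustration only (not the study's data):
r = calibrate_rate(calibration_distance=0.08, calibration_age_myr=4.0)  # about 0.01 subs/site/Myr
print(divergence_time(pairwise_distance=0.03, rate_per_lineage=r))  # in Myr
```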
The tuatara (Sphenodon punctatus) is of "extraordinary biological interest" as the most distinctive surviving reptilian lineage (Rhynchocephalia) in the world. To provide a genomic resource for understanding genome evolution in reptiles, and as part of a larger project to produce genomic resources for various reptiles (evogen.jgi.doe.gov/second_levels/BACs/our_libraries.html), we constructed a large-insert bacterial artificial chromosome (BAC) library from a male tuatara. The library consists of 215 424 individual clones with an empirically determined average insert size of 145 kb, yielding a genomic coverage of approximately 6.3x. A BAC-end sequencing analysis of 121 420 bp of sequence revealed a genomic GC content of 46.8%, among the highest observed thus far for vertebrates. It also identified several short interspersed repetitive elements (mammalian interspersed repeat-type repeats) and long interspersed repetitive elements, including chicken repeat 1 (CR1) elements. Finally, as a quality control measure, the arrayed library was screened with probes corresponding to two conserved noncoding regions of the candidate sex-determining gene DMRT1 and to the DM domain of the related DMRT2 gene. A deep-coverage contig spanning nearly 300 kb was generated, supporting the depth and utility of the library for exploring tuatara genomics.
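The coverage figure follows from the library statistics. Coverage c equals the number of clones times the mean insert size divided by the genome size, so the reported numbers imply a tuatara genome of roughly 5 Gb.

The following short Python sketch works through that arithmetic. It also computes the standard Clarke–Carbon estimate of the chance that any given locus is present in the library. The function names are hypothetical. The formula assumes clones are placed uniformly at random, which real libraries only approximate.

```python
import math

def implied_genome_size(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size G implied by coverage c = N * L / G, i.e. G = N * L / c."""
    return n_clones * mean_insert_bp / coverage

def prob_locus_cloned(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon probability that a given locus lies in at least one clone:
    P = 1 - (1 - f)**N with f = L / G, approximately 1 - exp(-c)."""
    f = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones

genome = implied_genome_size(215_424, 145_000, 6.3)
print(f"implied genome size: {genome / 1e9:.2f} Gb")
print(f"P(locus represented): {prob_locus_cloned(215_424, 145_000, genome):.4f}")
print(f"Poisson approximation: {1 - math.exp(-6.3):.4f}")
```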
Molecular phylogenetic methods are based on alignments of nucleotide or amino acid sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but it also requires methods to help manage large datasets.
Here we explore the phylogenetic signal present in molecular data using genomic signatures, defined as the set of frequencies of short oligonucleotides in a DNA sequence. Although the signature violates many of the standard assumptions of traditional phylogenetic analyses – in particular the explicit statements of homology inherent in character matrices – it permits the analysis of very long sequences, even unalignable ones. It is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods with those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S rRNA, for which alignments are often ambiguous in some regions. We also apply this method to a multigene data set of 33 genes for 9 bacterial species and one archaeal species, as well as to the whole genomes of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to those of traditional methods, comparing signatures for the sequences in the bacterial example identified putative candidates for horizontal gene transfer.
The signature method is therefore a fast tool for exploring phylogenetic data. It is useful not only as a pretreatment for discovering new sequence relationships but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis.
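The signature idea in the paragraphs above can be sketched in a few lines. Each sequence is reduced to its vector of short-oligonucleotide frequencies, and the vectors are compared directly, which is why unalignable or very long sequences pose no problem.

The following minimal Python sketch assumes k = 3 and a Euclidean distance between signatures. Both choices and the function names are hypothetical and may differ from the measure used in the study. The resulting matrix could be passed to any distance-based tree-building method, such as neighbor joining.

```python
import math
from itertools import product

def signature(seq: str, k: int = 3) -> list[float]:
    """Genomic signature: relative frequencies of all 4**k oligonucleotides."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases are skipped
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]

def signature_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between two signatures; no alignment is needed."""
    return math.dist(signature(a, k), signature(b, k))

def distance_matrix(seqs: dict[str, str], k: int = 3) -> dict[tuple[str, str], float]:
    """Pairwise signature distances, ready for a distance-based tree method."""
    return {(p, q): signature_distance(seqs[p], seqs[q], k) for p in seqs for q in seqs}

# Hypothetical sequences of unequal length; they need not be alignable.
seqs = {
    "taxon1": "ATGGCGTACGTTAGCATCGATCGGCTA",
    "taxon2": "ATGGCGTACGTTAGCTTCGATCGGCTAGGA",
    "taxon3": "TTTTAAAATATATTTAAATTTATAAT",
}
for pair, d in sorted(distance_matrix(seqs).items()):
    if pair[0] < pair[1]:
        print(pair, round(d, 3))
```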
Multilocus genealogical approaches are still uncommon in phylogeography and historical demography, fields which have been dominated by microsatellite markers and mitochondrial DNA, particularly for vertebrates. Using 30 newly developed anonymous nuclear loci, we estimated population divergence times and ancestral population sizes of three closely related species of Australian grass finches (Poephila) distributed across two barriers in northern Australia. We verified that substitution rates were generally constant both among lineages and among loci, and that intralocus recombination was uncommon in our dataset, thereby satisfying two assumptions of our multilocus analysis. The reconstructed gene trees exhibited all three possible tree topologies and displayed considerable variation in coalescent times, yet this information provided the raw data for maximum likelihood and Bayesian estimation of population divergence times and ancestral population sizes. Estimates of these parameters were in close agreement with each other regardless of statistical approach and our Bayesian estimates were robust to prior assumptions. Our results suggest that black-throated finches (Poephila cincta) diverged from long-tailed finches (P. acuticauda and P. hecki) across the Carpentarian Barrier in northeastern Australia around 0.6 million years ago (mya), and that P. acuticauda diverged from P. hecki across the Kimberley Plateau–Arnhem Land Barrier in northwestern Australia approximately 0.3 mya. Bayesian 95% credibility intervals around these estimates strongly support Pleistocene timing for both speciation events, despite the fact that many gene divergences across the Carpentarian region clearly predated the Pleistocene. Estimates of ancestral effective population sizes for the basal ancestor and long-tailed finch ancestor were large (about 521,000 and about 384,000, respectively). 
Although the uncertainty around the population-size estimates is considerable, these are the first such estimates for birds that take multiple sources of variance into account.
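Coalescent programs of this kind typically estimate mutation-scaled parameters: θ = 4Neμ for population size and τ = μT for divergence time. Both must be converted using an assumed mutation rate μ and generation time.

The following minimal sketch shows that conversion. It assumes a diploid autosomal locus and this common parameterization; software packages differ in how they scale these parameters. The values of μ and generation time and the function names are hypothetical. Any uncertainty in μ propagates directly into both converted quantities.

```python
def effective_population_size(theta: float, mu_per_site_per_gen: float) -> float:
    """Diploid Ne from theta = 4 * Ne * mu (theta per site, mu per site per generation)."""
    return theta / (4.0 * mu_per_site_per_gen)

def divergence_time_years(tau: float, mu_per_site_per_gen: float, generation_years: float) -> float:
    """Years since divergence from tau = mu * T (T in generations), times generation length."""
    return tau / mu_per_site_per_gen * generation_years

# Hypothetical values for illustration only (not the study's estimates or rates):
mu = 1e-9  # substitutions per site per generation
print(round(effective_population_size(theta=0.002, mu_per_site_per_gen=mu)))
print(round(divergence_time_years(tau=0.0005, mu_per_site_per_gen=mu, generation_years=2.0)))
```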
Hitchhiking phenomena and genetic recombination have important consequences for a variety of fields for which birds are model species, yet we know virtually nothing about naturally occurring rates of recombination or the extent of linkage disequilibrium in birds. We took advantage of a previously sequenced cosmid clone from Red-winged Blackbirds (Agelaius phoeniceus) bearing a highly polymorphic Mhc class II gene, Agph-DAB1. We used it to measure the extent of linkage disequilibrium across approximately 40 kb of genomic DNA and to determine whether non-coding nucleotide diversity was elevated as a result of physical proximity to a target of balancing selection. Application of coalescent theory predicts that the hitchhiking effect is enhanced by the larger effective population size of blackbirds compared with humans, despite the presumably higher rates of recombination in birds. We surveyed sequence polymorphism at three Mhc-linked loci located 1.5–40 kb away from Agph-DAB1. Nucleotide diversity at these loci was indistinguishable from that at three presumably unlinked, non-coding introns (β-actin intron 2, β-fibrinogen intron 7 and rhodopsin intron 2). Linkage disequilibrium, as measured by Lewontin's D′, was found only across a few hundred base pairs within any given locus and was not detectable among any Mhc-linked loci. Estimates of the per-site recombination rate (ρ) derived from three different analytical methods suggest that the amount of recombination in blackbirds is up to two orders of magnitude higher than in humans. This discrepancy cannot be explained entirely by the larger effective population size of blackbirds relative to humans. In addition, the estimated number of recombination events per mutation frequently exceeds 1, as in Drosophila, again much higher than estimates in humans.
Although the confidence limits of the blackbird estimates themselves span an order of magnitude, these data suggest that in blackbirds the hitchhiking effect for this region is negligible. They also suggest that the per-site, per-individual recombination rate may be high, resembling that of Drosophila more than that of humans.
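Lewontin's D′ is the linkage-disequilibrium measure used above. It rescales the raw disequilibrium D = p_AB - p_A·p_B by the largest value D could take given the allele frequencies, so that |D′| ranges from 0 to 1.

The following minimal Python sketch computes |D′| for two biallelic sites with phased haplotypes. The input format and function name are hypothetical.

```python
def lewontin_d_prime(haplotypes: list[str]) -> float:
    """|D'| for two biallelic sites; each haplotype is a two-character string,
    e.g. 'AB' = allele A at site 1 and allele B at site 2."""
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both sites must be biallelic")
    a, b = alleles_1[0], alleles_2[0]  # reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # raw disequilibrium D
    if d == 0:
        return 0.0
    # D_max depends on the sign of D and on the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max

print(lewontin_d_prime(["AB", "AB", "ab", "ab"]))  # complete LD: 1.0
print(lewontin_d_prime(["AB", "Ab", "aB", "ab"]))  # linkage equilibrium: 0.0
```

|D′| equals 1 whenever one of the four two-site haplotypes is absent from the sample, so it can be inflated in small samples. The correlation measure r² is a common complement.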