molecular evolution - washington university...

Molecular Evolution Justin Fay Center for Genome Sciences Department of Genetics 4515 McKinley Ave. Rm 4305 [email protected]


Page 1: Molecular Evolution - Washington University Geneticsgenetics.wustl.edu/.../04/SP2017_Molecular_evolution.pdf · 2017-04-10 · Molecular Evolution Justin Fay Center for Genome Sciences

Molecular Evolution

Justin Fay

Center for Genome Sciences

Department of Genetics

4515 McKinley Ave. Rm 4305

[email protected]


Molecular evolution is the study of the causes and effects of evolutionary changes in molecules.

Phylogenetics
Divergence times
Comparative genomics (mutation and selection)

Species 1 GGCAGTGACATTTTCTAACGCGAAGGTACTT
Species 2 GGCAGCGCCATTTTCTAATGCGAGGGTACTT
Species 3 GGCAGCGCCATTGTCTAATGCGAGGGTACTT
          ***** * **** ***** **** *******

Archaea
Human-chimp-Neanderthal
Ultraconserved sequences
ENCODE, FOXP2


Origins of Molecular Evolution

Insulin was the first protein to be sequenced (1955), work for which Fred Sanger received the Nobel Prize. The cytochrome c protein sequence followed (Margoliash et al. 1961).

The sequencing of the same proteins from different species established a number of key principles of molecular evolution:

1. Most proteins are highly conserved, and the changes that do occur tend not to fall within functionally important sites. For example, human diabetics were treated with insulin purified from pigs and cows.

2. The rate of amino acid substitution is constant across phylogenetic lineages.

Molecular clock - the rate of amino acid or nucleotide substitution is constant per year across phylogenetic lineages (Zuckerkandl and Pauling 1962). The idea was controversial, but it revolutionized phylogenetics and set the stage for the neutral theory.

Neutral theory or neutral mutation random drift hypothesis - the vast majority of mutations that become polymorphic in a population and fixed between species are not driven by Darwinian selection but are neutral or nearly neutral with respect to fitness (Kimura 1968; King and Jukes 1969). The neutral theory is dead; long live the neutral theory.


Not all amino acid changes are equal

Grantham's Distance – carbon-composition, polarity, volume, weight


Amino Acid Substitution Models

PAM (Point Accepted Mutation, 1966) matrix was developed by Margaret Dayhoff. The PAM1 matrix estimates what rate of substitution would be expected if 1% of the amino acids had changed. (Global alignments)

BLOSUM (BLOck SUbstitution Matrix, 1992) was developed by Henikoff and Henikoff. PAM did not model sequence changes over long evolutionary time scales well, since these are not well approximated by compounding the small changes that occur over short time scales. The probabilities used in the matrix calculation are computed from "blocks" of conserved sequences found in multiple protein alignments. Sequences with percent identity above a chosen threshold are clustered and down-weighted; the threshold is 62% for BLOSUM62, the matrix used for BLASTP. (Local alignments)


Nucleotide Substitution Models

Nucleotide substitution models correct for multiple hits

Purines: A, G
Pyrimidines: C, T

Jukes and Cantor (JC69) Model (1969)

Assumptions of the JC model:
1) Equal base frequencies
2) Equal mutation rates between the bases
3) Constant mutation rate
4) No selection


Jukes Cantor Model

p = 3/31 = 0.097
K = 0.104 substitutions per site
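The numbers on this slide can be reproduced with the standard Jukes-Cantor distance, K = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. A minimal Python sketch follows; the function name `jc69_distance` is an illustrative choice and does not come from the slides.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

p = 3 / 31                      # 3 differences in 31 aligned sites
K = jc69_distance(p)
print(f"p = {p:.3f}, K = {K:.3f} substitutions per site")
```

The corrected distance K is slightly larger than p because some sites are expected to have changed more than once.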


Other nucleotide substitution models

Model   Assumption        Free parameters   Reference
JC69    A=G=C=T, ts=tv    1                 Jukes & Cantor 1969
K80     A=G=C=T           2                 Kimura 1980
F81     ts=tv             4                 Felsenstein 1981
HKY85                     5                 Hasegawa, Kishino & Yano 1985
GTR     unequal rates     9                 Tavaré 1986


Difference between mutation rate and substitution rate.

[Figure: population frequency of mutations over time]

Mutation rate: the chance of a mutation occurring in each generation or cell division (does NOT depend on selection)

Substitution rate: the frequency at which mutations become fixed within a population (depends on selection)

Substitution rate = mutation rate * fixation probability * time

Fixation probability depends on selection


Substitution Rates with Selection

No selection: The substitution rate between two species is K = 2µt.

Selection:

[Figure: S. cerevisiae and S. paradoxus separated by divergence time t]

$$P = \frac{1 - e^{-4 N_e s q}}{1 - e^{-4 N_e s}}$$

Substitution rate = mutation rate * fixation probability * time

The substitution rate for neutral mutations = 2Nµ * 1/2N * t = µt

The substitution rate for adaptive mutations = 2Nµ * 2s * t = 4Nsµt for 4Ns > 1
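As a rough numerical check of the two approximations above, the fixation probability can be evaluated directly. This is a sketch that assumes a new mutation starts at frequency q = 1/(2N) and that N_e = N; the function name and the parameter values are illustrative and are not taken from the slides.

```python
import math

def fixation_probability(n_e, s, q):
    """Fixation probability P = (1 - exp(-4*Ne*s*q)) / (1 - exp(-4*Ne*s))."""
    if s == 0:
        return q  # neutral limit of the formula
    # expm1(x) = exp(x) - 1, which stays accurate when x is close to zero
    return math.expm1(-4 * n_e * s * q) / math.expm1(-4 * n_e * s)

N = 1000                 # hypothetical population size
q = 1 / (2 * N)          # initial frequency of a single new mutation

print(fixation_probability(N, 0.0, q))    # neutral: equals 1/(2N)
print(fixation_probability(N, 0.01, q))   # adaptive, 4Ns = 40: close to 2s
```

With s = 0 the probability is the starting frequency 1/(2N), and for 4Ns well above 1 it approaches 2s, which is where the factors 1/2N and 2s in the two substitution-rate expressions come from.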


Rapidly Evolving Genes (dN/dS)

Detecting selection using the nucleotide substitution rate

Synonymous change - mutation that does not change the amino acid sequence of a protein.

Nonsynonymous change - mutation that changes the amino acid sequence of a protein.

Table 1. The genetic code.

Codon AA    Codon AA    Codon AA     Codon AA
TTT   Phe   TCT   Ser   TAT   Tyr    TGT   Cys
TTC   Phe   TCC   Ser   TAC   Tyr    TGC   Cys
TTA   Leu   TCA   Ser   TAA   Stop   TGA   Stop
TTG   Leu   TCG   Ser   TAG   Stop   TGG   Trp

CTT   Leu   CCT   Pro   CAT   His    CGT   Arg
CTC   Leu   CCC   Pro   CAC   His    CGC   Arg
CTA   Leu   CCA   Pro   CAA   Gln    CGA   Arg
CTG   Leu   CCG   Pro   CAG   Gln    CGG   Arg

ATT   Ile   ACT   Thr   AAT   Asn    AGT   Ser
ATC   Ile   ACC   Thr   AAC   Asn    AGC   Ser
ATA   Ile   ACA   Thr   AAA   Lys    AGA   Arg
ATG   Met   ACG   Thr   AAG   Lys    AGG   Arg

GTT   Val   GCT   Ala   GAT   Asp    GGT   Gly
GTC   Val   GCC   Ala   GAC   Asp    GGC   Gly
GTA   Val   GCA   Ala   GAA   Glu    GGA   Gly
GTG   Val   GCG   Ala   GAG   Glu    GGG   Gly

dN or Ka = the nonsynonymous substitution rate = # nonsynonymous changes / # nonsynonymous sites.

dS or Ks = the synonymous substitution rate = # synonymous changes / # synonymous sites.

Interpretation of dN/dS ratios (assuming synonymous sites are neutral):

dN/dS = 1: No constraint on protein sequence, i.e. nonsynonymous changes are neutral.

dN/dS < 1: Functional constraint on the protein sequence, i.e. nonsynonymous mutations are deleterious.

dN/dS > 1: Change in the function of the protein sequence, i.e. nonsynonymous mutations are adaptive.
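The definitions of dN and dS can be illustrated with a simplified counting scheme in the spirit of the Nei-Gojobori method. This is a sketch under stated assumptions: no correction for multiple hits, codons that differ at more than one position are skipped, changes to or from stop codons are counted as nonsynonymous, and the two toy sequences and all function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (same content as Table 1); "*" marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon):
    """Sum over positions of the fraction of single-base changes that keep the amino acid."""
    total = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total

def pn_ps(seq1, seq2):
    """Uncorrected proportions (pN, pS) for two aligned, in-frame coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    syn = nonsyn = 0
    for c1, c2 in zip(codons1, codons2):
        # Only codons with exactly one difference are classified here.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODE[c1] == CODE[c2]:
                syn += 1
            else:
                nonsyn += 1
    # Average the site counts of the two sequences.
    s_sites = (sum(map(synonymous_sites, codons1)) +
               sum(map(synonymous_sites, codons2))) / 2
    n_sites = len(seq1) - s_sites
    return nonsyn / n_sites, syn / s_sites

pn, ps = pn_ps("TTTGGGAAA", "TTCGGGAGA")   # hypothetical toy sequences
print(f"pN = {pn:.3f}, pS = {ps:.3f}, pN/pS = {pn / ps:.3f}")
```

In the toy pair, TTT to TTC is synonymous (Phe) and AAA to AGA is nonsynonymous (Lys to Arg). Because there are far more nonsynonymous than synonymous sites, one change of each kind gives a ratio below 1.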


Rapidly Evolving Genes

Nayak et al. 2005

dN increased by positive selection
dN decreased by negative selection
Problem: dN may be influenced by both and still be less than dS


BRCA1 sliding window Ka/Ks analysis


Branch Model (dN/dS) (rate heterogeneity)

15 copies in human
Copy number varies in other primates

Johnson et al. 2001


Site Model (dN/dS)

● Positive selection on the egg receptor (VERL) for abalone sperm lysin.

● VERL and lysin act as a lock and key for fertilization.

● Co-evolution by sexual selection, conflict or microbial attack.

Galindo et al. 2003

Sites – methods
Maximum Parsimony (Suzuki)
Maximum Likelihood (PAML, HyPhy)


Codon models

α_s = synonymous rate
β_s = nonsynonymous rate
R = tv/ts
π_ny = frequency of target nucleotide n in codon y


Models for the Evolution of Transcription Factor Binding Sites

● Sequence ~ binding affinity (Schneider et al. 1986, Berg and von Hippel 1987)

● Binding affinity ~ fitness (Gerland and Hwa 2002, Sengupta et al. 2002)

● Fitness ~ substitution rate (Moses et al. 2004)

Kimura 1962

Bulmer 1991

Moses et al. 2004


Molecular Evolution (Comparative Genomics)

1. Conservation

Annotation of genes, regulatory sequences and other functional elements

Functional sequences will remain conserved across distantly related species whereas non-functional sequences will accumulate changes

2. Divergence

Evolution of genes, regulatory sequences and other functional elements

Species-specific functional sequences

Functional sequences with new or modified functions


Conserved sequences

Human-Mouse conservation

Species   Conserved*   Conserved noncoding (non-repetitive aligned)   Reference
Humans    3-8%         21%                                            Waterston et al. (2002)
Worms     18-37%       18%                                            Shabalina & Kondrashov (1999)
Flies     37-53%       40-70%                                         Andolfatto (2005)
Yeast     47-68%       30-40%                                         Chin et al. (2005), Doniger et al. (2005)

*Siepel et al. (2005)


Deletion and expression assays of conserved noncoding sequences

Pennacchio et al. 2006 Yun et al. 2012


Scan for positively selected genes using the branch-site model

Kosiol et al. (2014)


Models of molecular evolution

Key Assumptions:

➔ Tree is correct
➔ Alignments are correct
➔ Sites are independent
➔ Stationarity and time reversibility
➔ Mutational & selection parameters


Phylogenetics Methods

Table 1. Number of possible rooted and unrooted trees.

Number of sequences   Number of rooted trees   Number of unrooted trees
2                     1                        1
3                     3                        1
4                     15                       3
6                     945                      105
10                    34,459,425               2,027,025
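The counts in Table 1 follow from the standard formulas for bifurcating trees with n labeled tips: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees, where !! is the double factorial. A short Python sketch, with function names chosen here for illustration:

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 labeled sequences."""
    return double_factorial(2 * n - 3)

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 2 labeled sequences."""
    return double_factorial(2 * n - 5)

for n in (2, 3, 4, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The rapid growth of these numbers is why exhaustive tree search becomes impractical beyond a handful of sequences.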

Taxonomists have long debated phylogenetic methods.

There are many types of methods:

Character state methods (also called cladistic methods), like parsimony.

Distance or similarity based methods (also called phenetic methods), like UPGMA.

Maximum likelihood and Bayesian Methods.

Parsimony (non-parametric) and Maximum likelihood (parametric) are both used when phylogeny is critical.

Software:

PAUP
PHYLIP
MEGA
MrBayes

[Figure: tree relating sequences A, B, C and D]

Table 2. Distance matrix.

Sequence   A       B       C
A
B          d(AB)
C          d(AC)   d(BC)
D          d(AD)   d(BD)   d(CD)

Each d is the distance (substitution rate) between pairs of sequences
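A distance matrix like Table 2 is the input to distance-based methods such as UPGMA. The sketch below is a minimal UPGMA that returns only the tree topology as a nested string, without branch lengths. The distances are hypothetical values invented for illustration, and the function name and data layout are assumptions rather than anything from the slides.

```python
def upgma(names, dist):
    """Cluster sequences by UPGMA; dist maps (name1, name2) pairs to distances."""
    sizes = {name: 1 for name in names}            # cluster label -> number of tips
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        a, b = sorted(min(d, key=d.get))           # closest pair of clusters
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:                            # size-weighted average distance
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        # Drop every distance that still refers to the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        sizes[merged] = na + nb
    return next(iter(sizes))

# Hypothetical distances for four sequences A-D.
distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], distances))
```

UPGMA assumes a constant rate across lineages (a molecular clock), so it can recover the wrong topology when rates differ among branches.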


Gene trees vs Species trees

1. Orthology
2. Independence (no concerted evolution or horizontal transfer)

Orthologs are genes created by speciation events.
Paralogs are genes created by duplication events.
Homologs are genes that are similar because of shared ancestry.

[Figure: gene tree with speciation and duplication nodes for genes in Species 1 and Species 2]

Orthologues and paralogues can be distinguished by i) synteny or ii) phylogeny.


Gene Conversion and Horizontal Gene Transfer

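[Figure: gene trees for the histone genes HHT1, HHF1, HHT2 and HHF2 at Locus 1 (Chr02) and Locus 2 (Chr14), comparing no conversion (true phylogeny) with gene conversion]

[Figure: species tree showing horizontal gene transfer, vertebrate to bacteria and bacteria to vertebrate]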


Alignment Accuracy & Coverage

Pollard et al. 2004

[Figure panels: no indels vs. indels; no constraint vs. constraint]


Alignment differences gp120 HIV/SIV

ClustalW alignment vs. PRANK alignment (phylogeny aware)

EDFLEKVL

E--DFLEKV--L

Two substitutions vs. two indels


Detection of positive selection depends on the alignment

Markova-Raina and Petrov (2011)


Mutation rate variation

● Transitions vs. Transversions – transitions occur twice as often as transversions

● CpG - Spontaneous deamination of 5-methylcytosine results in thymine and ammonia, 20x higher rate of transition

● 28% of mutations are transitions at CpG sites but only 3.5% of sites are CpG

● Genomic position (5-10%)

● Age, sex (2-10 fold)

● Repeats (polynucleotides, microsatellites)

● Transcription, chromatin


Types of Mutations - WGS

Single nucleotide
Transpositions
Duplications
Insertion/Deletion
Rearrangement

G/C to A/T 2.9-fold higher than reverse! Predicts 74% AT content
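The 74% figure follows from a two-state mutation equilibrium. If u is the G/C to A/T rate and v the reverse rate, the flows balance when AT * v = GC * u, which gives an equilibrium AT fraction of u / (u + v). A small Python sketch, assuming mutation is the only force acting (no selection or biased gene conversion); the function name is illustrative:

```python
def equilibrium_at(ratio):
    """Equilibrium AT fraction when G/C->A/T mutations are `ratio` times more frequent than A/T->G/C."""
    return ratio / (1 + ratio)

print(f"{equilibrium_at(2.9):.0%}")
```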


Substitution rate as a function of GC content


Biased Gene Conversion

AT to GC bias

Recombination occurs in hotspots
Recombination hotspots evolve rapidly
Biased gene conversion occurs in bursts (non-equilibrium)


Recombination and predicted equilibrium GC frequency


Codon Bias


Measures of Codon Bias

CAI – Codon Adaptation Index, based on the usage of each codon relative to the most abundant codon for the same amino acid

Fop – frequency of the optimal codon

ENC – effective number of codons based on the deviation from equal usage
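To make the CAI definition concrete, the sketch below computes relative adaptiveness values (each codon's count divided by the count of the most used synonymous codon in a reference set) and takes their geometric mean over the codons of a gene. The reference counts are hypothetical numbers invented for illustration, only two amino acids are included, and the function names are assumptions.

```python
import math

def relative_adaptiveness(ref_counts, families):
    """w for each codon: its reference count divided by the highest count in its synonymous family."""
    w = {}
    for family in families:
        top = max(ref_counts[codon] for codon in family)
        for codon in family:
            w[codon] = ref_counts[codon] / top
    return w

def cai(codons, w):
    """Geometric mean of w over the codons of a gene (codons without a w value are skipped)."""
    logs = [math.log(w[codon]) for codon in codons if codon in w]
    return math.exp(sum(logs) / len(logs))

# Hypothetical codon counts from a set of highly expressed genes.
ref_counts = {"AAA": 30, "AAG": 70, "GAA": 80, "GAG": 20}
families = [("AAA", "AAG"), ("GAA", "GAG")]     # Lys and Glu codons
w = relative_adaptiveness(ref_counts, families)

print(cai(["AAG", "GAA"], w))   # only preferred codons
print(cai(["AAA", "GAG"], w))   # only non-preferred codons
```

A gene that uses only the preferred codons scores 1, and the score drops as less-used synonymous codons appear. Real implementations also need a rule for codons with zero counts in the reference set, which this sketch does not handle.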

Explanation of Codon Bias

Bias towards GC-ending codons that is not found in adjacent noncoding regions

Correlates with highly expressed genes

Correlates with tRNA abundance

Explanations: translational accuracy/speed, protein misfolding


Codon Bias is correlated with Synonymous Substitution Rate


Codon Bias correlation depends on distance


Correlomics

r (Interactions ~ Fitness) = 0.15, P = 3.4 x 10^-13

r (Fitness ~ Evolutionary rate) = -0.13, P = 4.3 x 10^-7

r (Interactions ~ Evolutionary rate) = -0.24, P = 0.002


Spurious (strong) correlations


Significance and effect size

Statistical significance (a low P value) measures how certain we are that a given effect exists.

Effect size measures the magnitude of an effect.

r = 0.10, P < 1e-16

A squared correlation coefficient below 0.1 (r < 0.3) means the effect is pretty much non-existent, regardless of how low the P value is.
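The gap between significance and effect size can be seen numerically. The sketch below uses the Fisher z-transformation with a normal approximation to get a two-sided P value for a correlation; the sample size of 10,000 is a hypothetical choice, and the function name is illustrative.

```python
import math

def correlation_summary(r, n):
    """Return (r squared, approximate two-sided P value) for correlation r from n observations."""
    r_squared = r ** 2
    z = math.atanh(r) * math.sqrt(n - 3)            # Fisher z statistic
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return r_squared, p_value

r2, p = correlation_summary(0.10, 10_000)           # n is a hypothetical sample size
print(f"r^2 = {r2:.2f}, P = {p:.1e}")
```

With a large enough sample, a correlation that explains only 1% of the variance still produces a P value far below 1e-16.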

Claus Wilke, UT-Austin (Blog 2013)


Gene expression predicts the rate of evolution


Polymorphisms vs Divergence

P ( SNP | conserved amino acid )

P ( SNP | conserved transcription factor binding site )


Methods for Predicting Human Disease Mutations

SIFT: Ng P, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11: 863-874.

PolyPhen: Sunyaev S, Ramensky V, Koch I, Lathe W, Kondrashov A et al. (2001) Prediction of deleterious human alleles. Hum Mol Genet 10: 591-597.

Method     True positives   False positives
SIFT       69%              20%
PolyPhen   69%              9%

[Figure: overlap of disease mutations and conserved sites]

2.2% of human disease alleles are WT in mouse


Likelihood Ratio Test

human        GYCF G AQEQ
chimp        GYCF G AQEQ
orangutan    GYCF G AQEQ
rhesus       GYCF G AQEQ
bushbaby     GYCF G VQEQ
treeshrew    GYCF G VQEQ
rat          GYCF G VQEQ
mouse        GYCF G VQEQ
squirrel     GYCF G VQEQ
guineapig    GYCF G VQEQ
dog          GYCF G IQEQ
cat          GYCF G VQEQ
horse        GYCF G VQEQ
cow          GYCF G VQEQ
microbat     GYCF G VQEQ
armadillo    GYCF G VQEQ
opossum      GYCF G VAEQ
platypus     GYGF G EQEQ
frog         GFCF G ETKQ
tetraodon    GCCF G NLEE
stickleback  GYCF G DGEE
medaka       GYCF G DLEE
zebrafish    GYCF G DLEE

Clade labels: Placentals, Non-placental mammals, Chicken, Frog, Fish

32 vertebrate species
18,993 alignments
dS = 12.2 subs/site
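A likelihood ratio test compares the fit of a null model with that of a more general alternative. The sketch below shows the generic calculation for nested models that differ by one free parameter, using the chi-square distribution with 1 degree of freedom. The log-likelihood values are hypothetical, and this is not presented as the specific codon-level test used on these slides.

```python
import math

def lrt_p_value(lnl_null, lnl_alt):
    """Return (test statistic, P value) for nested models differing by one parameter."""
    stat = max(2 * (lnl_alt - lnl_null), 0.0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = lrt_p_value(-100.0, -98.08)    # hypothetical log-likelihoods
print(f"2*delta(lnL) = {stat:.2f}, P = {p:.3f}")
```

A statistic near 3.84 corresponds to the conventional 5% threshold for one degree of freedom.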


Tons of Deleterious Mutations

Chun and Fay (2009)


Most Deleterious SNPs are Rare


Three Methods Applied to Venter

Method     Tested (%)     Deleterious (%)
SIFT       5,401 (72%)    890 (16%)
PolyPhen   6,746 (90%)    555 (8.2%) probably, 768 (11%) possibly
LRT        5,645 (75%)    796 (14%)

7,534 High Quality NSN SNPs in Venter Genome


Disturbing Overlap Among Three Methods

[Venn diagram: overlap among LRT, PolyPhen and SIFT predictions; region percentages 28%, 5%, 6%, 10%, 30%, 3% and 18%]

7,534 NSN SNPs in Venter Genome

1,735 SNPs predicted deleterious by any one of the three methods


Human disease associated SNPs

Chen et al. 2010

21,429 disease-associated SNPs (2,113 publications)

5,270 in HapMap3


Deleterious SNPs in Coding and Noncoding Sequences

Doniger et al. (2008)

Class                    Method                                    Deleterious SNPs   False discovery rate
Coding (n = 15,378)      Codon method (LRT)                        1,472              1%
Coding (n = 15,378)      SIFT                                      970                NA
Noncoding (n = 20,714)   Phylonet motif                            1,643              6%
Noncoding (n = 20,714)   Transcription factor binding site model   383                20%

776 (47%)

163 (9%)


Deleterious SNPs are Associated with Allele-Specific Expression

[Figure: allele-specific expression example with A and T alleles]

Doniger et al. (2008)

Class                       Tested   Significant   Percent
Phylonet Motif Only         56       23            41%
Binding Site Only           25       9             36%
Phylonet and Binding Site   28       5             18%
Not Conserved               72       14            19%

[Venn diagram: Phastcons, Binding Site and Phylonet; region counts 2, 3, 146, 3, 3 and 9]


Example: Disruption of a Phylonet motif in GPB2

Doniger et al. (2008)


Conservation of GWAS SNPs

Dudley et al. (2012)

High-confidence


GWAS SNPs: OR vs. Conservation

Dudley et al. (2012)


Binding Site Turnover and Rewiring

Binding Site Turnover Rewiring


Two Problems: False Positives & False Negatives

[Figure: for close (sensu stricto) and distant species, BLAST hits to S. cerevisiae Gene 1 and Gene 2 flank the syntenic intergenic region that corresponds to the S. cerevisiae intergenic region]


No Significant Sequence Homology Between Distantly Related Syntenic Intergenic Regions

[Figure: percent identity with S. cerevisiae (y-axis, 0 to 90) by species (x-axis: Spar, Smik, Skud, Sbay, Scas, Cgla, Kpol, Zrou, Kthe, Kwal, Sklu, Klac, Agos); observed vs. randomized]

Venkataram (unpublished)


Significant Yet Depleted Conservation of Binding Sites in Distantly Related Species

[Figure: fraction of binding sites found (y-axis, 0 to 0.7) by species (x-axis: Scer, Spar, Smik, Skud, Sbay, Scas, Cgla, Kpol, Zrou, Kthe, Kwal, Sklu, Klac, Agos); observed vs. randomized]


Binding site turnover

Wratten et al. (2006)

M. domestica vs. D. melanogaster

Sequence divergence ~ Regulatory divergence


[Figure: MLS1 promoter showing Mig1, Sip4 and Abf1 sites, the TATA box and nucleosomes]

Bergen et al. (2016)


Bergen et al. (2016)


Bergen et al. (2016)