ESSE
Reference
Quintero, I., Landis, M. J., Jetz, W., & Morlon, H. (2022). The build-up of the present-day tropical diversity of tetrapods. Proceedings of the National Academy of Sciences, 2023. 120 (20) e2220672120. https://doi.org/10.1073/pnas.2220672120
Example
In this example we run some random data for 50 species across 2 areas, where each area has a specific covariate that affects speciation and we also allow for 2 hidden states.
Open Julia and load the Tapestree package:
using TapestreeSpecify the path to the phylogenetic tree (in a format that ape::read.tree() can read):
tree_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "tree_50.tre")Specify state data. Data should be a .txt file where each row is a species, first the species name that matches the tree tip labels and each subsequent column specifying presence or absence in a given area with 0 or 1, respecitvely. Open st2_data.txt in the data folder to see an example for 2 areas.
states_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "st2_data.txt")Specify covariate data $y = f(x)$. Data should be a .txt file where the first column is time $x$ in backward fashion (the present is $0$ and the past is $> 0$), and the subsequent columns are the respective time covariates for each area $f(x)$. If there is only one covariate, the same is used across all areas, if not, the number of covariates should match the specific model. More than one covariate per area is allowed, and in the case of covariates affecting colonization rates, they should match the number of possible colonization parameters between all areas. Open env_data_2.txt in the data folder to see an example for 2 covariates for 2 areas.
envdata_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "env_data_2.txt")Specify output MCMC file (homedir() is an alias to your home folder)
out_file = *(homedir(),"...")Specify the (optional) output file (homedir() is an alias to your home folder) for the node marginal probabilities.
out_states = *(homedir(),"...")Run the esse() (ESSE: Environmental and State dependent Speciation and Extinction) model, with covariates affecting speciation rates:
esse(tree_file, out_file, 2, envdata_file = envdata_file,
states_file = states_file, out_states = out_states, cov_mod = ("s",))If one would like to make the covariates affect other rates, such as dispersal, in addition to speciation rates, one would specify the following covariate model cov_mod = ("s","g"). Note however that this has not been validated. Moreover, covariate effect on extinction is non retrievable from extant-only phylogenetic trees.
Parallel MC3
It is encouraged to use Metropolis coupled MCMC (MC3) for more robust convergence (the posterior surfaced is highly peaked).
Load the Distributed package, set the number of processors for Julia, and make Tapestree available to all (see the Distributed package for more information). Below we add 3 processors.
using Distributed
addprocs(3)
@everywhere using TapestreeSet parameter constraints
To constrain parameters to be equal to one another or to fix them to be 0, it is necessary to create equalities between parameters and pass them as an argument. In general parameters are specified as follows: "<parameter name>_<area>_<hidden state>"
- Speciation is "lambda" (e.g. speciation rate for area A and hidden state 0:
lambda_A_0) - Local extinction is "loss" (e.g. local extinction rate for area B and hidden state 1:
loss_B_1) - colonization is "gain" (e.g. gain rate from A -> B and hidden state 1:
gain_AB_1) - the effect of the covariate is "beta_<effect parameter name>" (e.g. effect of first covariate on speciation in area A and hidden state 0:
beta_lambda_1_A_0- here the 1 after the lambda is because is the first covariate) - transition between hidden states is "q" (e.g. transition rate from hidden state 0 -> 1 and hidden state 1:
q_01).
You have to make sure that the given constraints apply to the specification of model. In the following example, we constraint local and global extinction rates to be the same across 2 areas and 2 hidden states , and constrain the hidden states transition rates.
cpar = ("q_01 = q_10",
"loss_A_0 = mu_A_0",
"loss_B_0 = mu_B_0",
"loss_A_1 = mu_A_1",
"loss_B_1 = mu_B_1")We now run esse, using MC3 with Metropolis-Hastings MCMC (here using 3 parallel chains).
esse(tree_file, out_file, 2,
envdata_file = envdata_file,
states_file = states_file,
out_states = out_states,
constraints = cpar,
cov_mod = ("s",),
ncch = 3,
parallel = true,
niter = 5_000,
nthin = 100,
dt = 0.8,
nburn = 1_000,
mc = "mh",
node_ps = (true, 100))For optional (keyword) arguments, see below
Function Documentation
Tapestree.ESSE.esse — Functionesse(tree_file ::String,
out_file ::String,
h ::Int64;
states_file ::String = "NaN",
envdata_file::String = "NaN",
cov_mod ::NTuple{M,String} = ("",),
node_ps ::Tuple{Bool,Int64} = (true, 10),
out_states ::String = "",
constraints ::NTuple{N,String} = (" ",),
mvpars ::NTuple{O,String} = (" ",),
niter ::Int64 = 10_000,
nthin ::Int64 = 10,
nburn ::Int64 = 200,
tune_int ::Int64 = 100,
nswap ::Int64 = 10,
ncch ::Int64 = 1,
parallel ::Bool = ncch > 1,
dt ::Float64 = 0.2,
ntakew ::Int64 = 100,
winit ::Float64 = 2.0,
scale_y ::NTuple{2,Bool} = (true, false),
algorithm ::String = "pruning",
mc ::String = "slice",
λpriors ::Float64 = .1,
μpriors ::Float64 = .1,
gpriors ::Float64 = .1,
lpriors ::Float64 = .1,
qpriors ::Float64 = .1,
βpriors ::NTuple{2,Float64} = (0.0, 10.0),
hpriors ::Float64 = .1,
optimal_w ::Float64 = 0.8,
tni ::Float64 = 1.0,
obj_ar ::Float64 = 0.6,
screen_print::Int64 = 5,
Eδt ::Float64 = 1e-3,
ti ::Float64 = 0.0,
ρ ::Array{Float64,1} = [1.0]) where {M,N,O}Run geographic esse. See tutorial for how these files should be specified.
Arguments
tree_file::String: full path to tree file.out_file::String: full path to write MCMC output.h::Int64: number of hidden states.states_file ::String = "NaN": full path to states file. If"NaN", no
observed states are used (only hidden states).
envdata_file::String = "NaN": full path to covariates file. If"NaN", no
covariates are used (i.e., constant rates).
cov_mod::NTuple{M,String} = ("",): specifies which rates are affected by
covariates: s for speciation, e for extinction, and g for colonization. More than 1 is possible.
node_ps::Tuple{Bool,Int64} = (true, 10): first index specifies if posterior
marginal probabilities for nodes should be computed, second index the number of iterations to be computed.
out_states::String = "": full path to write node probabilities output.constraints::NTuple{N,String} = (" ",): constraints for the model
parameters.
mvpars::NTuple{O,String} = (" ",): which parameters should be multivariate
when using slice sampling for better convergence.
niter::Int64 = 10_000: number of iterations.nthin::Int64 = 10: frequency at which to record MCMC state.nburn::Int64 = 200: number of iterations to discard as burn-in.tune_int::Int64 = 100: number of iterations duringnburnto tune proposal
window for MH.
nswap::Int64 = 10: every iteration to try to swap chain likelihoods in MC3.ncch::Int64 = 1: number of chains.parallel::Bool = false: if parallel run.dt::Float64 = 0.2: temperature for MC3.ntakew::Int64 = 100: number of iterations fromnburnto tune the window
for slice sampling.
winit::Float64 = 2.0: initial window for slice sampling.scale_y::NTuple{2,Bool} = (true, false): first index if scale covariatesy
to [0,1], second, if scale covariates y all together between [0,1].
algorithm::String = "pruning": likelihood algorithm betweenpruning
(recommended) or flow.
mc::String = "slice": which samplingslice(slice-sampling) or
mh (metropolis-hasting).
λpriors::Float64 = 0.1: rate of Exponential prior for speciation.μpriors::Float64 = 0.1: rate of Exponential prior for global extinction.gpriors::Float64 = 0.1: rate of Exponential prior for colonization.lpriors::Float64 = 0.1: rate of Exponential prior for local extinction.qpriors::Float64 = 0.1: rate of Exponential prior for hidden state
transitions.
βpriors::NTuple{2,Float64} = (0.0, 10.0): mean and variance of Normal
prior for effect of covariates.
hpriors::Float64 = 0.1: rate of Exponential prior for differences between
hidden states.
optimal_w::Float64 = 0.8: optimal window.tni::Float64 = 1.0: initial tuning for rates.obj_ar::Float64 = 0.23: objective acceptance rate.screen_print::Int64 = 5: seconds to wait to update screen log.Eδt::Float64 = 1e-3: for flow algorithm.ti::Float64 = 0.0: for flow algorithm.ρ::Array{Float64,1} = [1.0]: sampling fraction for each state (each area and
widespread).
Returned values
- Array of the mcmc parameters.
esse(tv ::Dict{Int64,Array{Float64,1}},
ed ::Array{Int64,2},
el ::Array{Float64,1},
x ::Array{Float64,1},
y ::Array{Float64,L},
cov_mod ::NTuple{M,String},
out_file ::String,
h ::Int64;
constraints ::NTuple{N,String} = (" ",),
mvpars ::NTuple{O,String} = ("lambda = beta",),
niter ::Int64 = 10_000,
nthin ::Int64 = 10,
nburn ::Int64 = 200,
ncch ::Int64 = 1,
ntakew ::Int64 = 100,
winit ::Float64 = 2.0,
scale_y ::NTuple{2,Bool} = (true, false),
algorithm ::String = "pruning",
λpriors ::Float64 = .1,
μpriors ::Float64 = .1,
gpriors ::Float64 = .1,
lpriors ::Float64 = .1,
qpriors ::Float64 = .1,
βpriors ::NTuple{2,Float64} = (0.0, 10.0),
hpriors ::Float64 = .1,
optimal_w ::Float64 = 0.8,
screen_print::Int64 = 5,
Eδt ::Float64 = 1e-3,
ti ::Float64 = 0.0,
ρ ::Array{Float64,1} = [1.0]) where {L,M,N,O}Wrapper for running a esse model from simulations.
Tapestree.ESSE.simulate_sse — Functionsimulate_sse(λ ::Array{Float64,1},
μ ::Array{Float64,1},
l ::Array{Float64,1},
g ::Array{Float64,1},
q ::Array{Float64,1},
t ::Float64;
δt ::Float64 = 1e-4,
ast ::Int64 = 0,
nspp_max ::Int64 = 100_000,
retry_ext ::Bool = true,
rejectel0 ::Bool = true,
verbose ::Bool = true,
rm_ext ::Bool = true,
states_only::Bool = false,
start ::Symbol = :crown)Simulate tree according to the geographic esse model. The number of areas and hidden states is inferred from the parameter vectors, but they must be consistent between them and with the covariates. See tutorial for an example.
Arguments
λ::Array{Float64,1}: rates for within-area and between-area speciation.μ::Array{Float64,1}: per-area extinction rates when it leads to global
extinction.
l::Array{Float64,1}: per-area extinction rates when it leads to local
extinction.
g::Array{Float64,1}: colonization rates between areas.q::Array{Float64,1}: transition rates between hidden states.t::Float64: simulation time.δt::Float64 = 1e-4: time step to perform simulations. Smaller more precise
but more computationally expensive.
ast::Int64 = 0: initial state.0specifies random sampling based on the
input parameters.
nspp_max::Int64 = 100_000: maximum number of species allowed to stop
simulation.
retry_ext::Bool = true: automatically restart simulation if simulation goes
extinct.
rejectel0::Bool = true: reject simulations where there are edges of 0
length.
verbose::Bool = true: print messages.rm_ext::Bool = true: remove extinct taxa from output.states_only::Bool = false: if only return tip states (faster).start::Symbol = :crown: ifcrown, starts after a speciation event with
two lineages, if stem, starts with one lineage.
Returned values
- Dictionary with tip number and corresponding state.
- Array with parent -> daughter edges.
- Array with edge lengths.
- Number of maximum species.
simulate_sse(λ ::Array{Float64,1},
μ ::Array{Float64,1},
l ::Array{Float64,1},
g ::Array{Float64,1},
q ::Array{Float64,1},
β ::Array{Float64,1},
t ::Float64,
x ::Array{Float64,1},
y ::Array{Float64,N},
cov_mod ::Tuple{Vararg{String}};
δt ::Float64 = 1e-4,
ast ::Int64 = 0,
nspp_max ::Int64 = 100_000,
retry_ext ::Bool = true,
rejectel0 ::Bool = true,
verbose ::Bool = true,
rm_ext ::Bool = true,
states_only::Bool = false,
start ::Symbol = :crown) where {N}Simulate tree according to the geographic sse model. The number of areas and hidden states is inferred from the parameter vectors, but they must be consistent between them and with the covariates. See tutorial for an example.
Arguments
λ::Array{Float64,1}: rates for within-area and between-area speciation.μ::Array{Float64,1}: per-area extinction rates when it leads to global
extinction.
l::Array{Float64,1}: per-area extinction rates when it leads to local
extinction.
g::Array{Float64,1}: colonization rates between areas.q::Array{Float64,1}: transition rates between hidden states.β::Array{Float64,1}: per-area effect of covariates.t::Float64: simulation time.x::Array{Float64,1}: times where the covariateyis sampled.y::Array{Float64,N}: value of covariates, i.e.,f(x). Can be multivariate.cov_mod::Tuple{Vararg{String}}: specifies which rates are affected by
covariates: s for speciation, e for extinction, and g for colonization. More than 1 is possible.
δt::Float64 = 1e-4: time step to perform simulations. Smaller more precise
but more computationally expensive.
ast::Int64 = 0: initial state.0specifies random sampling based on the
input parameters.
nspp_max::Int64 = 100_000: maximum number of species allowed to stop
simulation.
retry_ext::Bool = true: automatically restart simulation if simulation goes
extinct.
rejectel0::Bool = true: reject simulations where there are edges of 0
length.
verbose::Bool = true: print messages.rm_ext::Bool = true: remove extinct taxa from output.states_only::Bool = false: if only return tip states (faster).start::Symbol = :crown: ifcrown, starts after a speciation event with
two lineages, if stem, starts with one lineage.
Returned values
- Dictionary with tip number and corresponding state.
- Array with parent -> daughter edges.
- Array with edge lengths.
- Number of maximum species.