ESSE

Reference

Quintero, I., Landis, M. J., Jetz, W., & Morlon, H. (2022). The build-up of the present-day tropical diversity of tetrapods. Proceedings of the National Academy of Sciences, 2023. 120 (20) e2220672120. https://doi.org/10.1073/pnas.2220672120

Example

In this example we run some random data for 50 species across 2 areas, where each area has a specific covariate that affects speciation and we also allow for 2 hidden states.

Open Julia and load the Tapestree package:

using Tapestree

Specify the path to the phylogenetic tree (in a format that ape::read.tree() can read):

tree_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "tree_50.tre")

Specify state data. Data should be a .txt file where each row is a species, first the species name that matches the tree tip labels and each subsequent column specifying presence or absence in a given area with 0 or 1, respecitvely. Open st2_data.txt in the data folder to see an example for 2 areas.

states_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "st2_data.txt")

Specify covariate data $y = f(x)$. Data should be a .txt file where the first column is time $x$ in backward fashion (the present is $0$ and the past is $> 0$), and the subsequent columns are the respective time covariates for each area $f(x)$. If there is only one covariate, the same is used across all areas, if not, the number of covariates should match the specific model. More than one covariate per area is allowed, and in the case of covariates affecting colonization rates, they should match the number of possible colonization parameters between all areas. Open env_data_2.txt in the data folder to see an example for 2 covariates for 2 areas.

envdata_file = joinpath(dirname(pathof(Tapestree)), "..", "data", "env_data_2.txt")

Specify output MCMC file (homedir() is an alias to your home folder)

out_file = *(homedir(),"...")

Specify the (optional) output file (homedir() is an alias to your home folder) for the node marginal probabilities.

out_states = *(homedir(),"...")

Run the esse() (ESSE: Environmental and State dependent Speciation and Extinction) model, with covariates affecting speciation rates:

esse(tree_file, out_file, 2, envdata_file = envdata_file, 
  states_file = states_file, out_states = out_states, cov_mod = ("s",))

If one would like to make the covariates affect other rates, such as dispersal, in addition to speciation rates, one would specify the following covariate model cov_mod = ("s","g"). Note however that this has not been validated. Moreover, covariate effect on extinction is non retrievable from extant-only phylogenetic trees.

Parallel MC3

It is encouraged to use Metropolis coupled MCMC (MC3) for more robust convergence (the posterior surfaced is highly peaked).

Load the Distributed package, set the number of processors for Julia, and make Tapestree available to all (see the Distributed package for more information). Below we add 3 processors.

using Distributed
addprocs(3)
@everywhere using Tapestree

Set parameter constraints

To constrain parameters to be equal to one another or to fix them to be 0, it is necessary to create equalities between parameters and pass them as an argument. In general parameters are specified as follows: "<parameter name>_<area>_<hidden state>"

Speciation is "lambda" (e.g. speciation rate for area A and hidden state 0: lambda_A_0)
Local extinction is "loss" (e.g. local extinction rate for area B and hidden state 1: loss_B_1)
colonization is "gain" (e.g. gain rate from A -> B and hidden state 1: gain_AB_1)
the effect of the covariate is "beta_<effect parameter name>" (e.g. effect of first covariate on speciation in area A and hidden state 0: beta_lambda_1_A_0 - here the 1 after the lambda is because is the first covariate)
transition between hidden states is "q" (e.g. transition rate from hidden state 0 -> 1 and hidden state 1: q_01).

You have to make sure that the given constraints apply to the specification of model. In the following example, we constraint local and global extinction rates to be the same across 2 areas and 2 hidden states , and constrain the hidden states transition rates.

cpar = ("q_01 = q_10",
        "loss_A_0 = mu_A_0",
        "loss_B_0 = mu_B_0",
        "loss_A_1 = mu_A_1",
        "loss_B_1 = mu_B_1")

We now run esse, using MC3 with Metropolis-Hastings MCMC (here using 3 parallel chains).

esse(tree_file, out_file, 2,
     envdata_file = envdata_file,
     states_file  = states_file, 
     out_states   = out_states,
     constraints  = cpar,
     cov_mod      = ("s",),
     ncch         = 3,
     parallel     = true,
     niter        = 5_000,
     nthin        = 100,
     dt           = 0.8,
     nburn        = 1_000, 
     mc           = "mh",
     node_ps      = (true, 100))

For optional (keyword) arguments, see below

Function Documentation

Tapestree.ESSE.esse — Function

esse(tree_file   ::String,
     out_file    ::String,
     h           ::Int64;
     states_file ::String            = "NaN",
     envdata_file::String            = "NaN",
     cov_mod     ::NTuple{M,String}  = ("",),
     node_ps     ::Tuple{Bool,Int64} = (true, 10),
     out_states  ::String            = "",
     constraints ::NTuple{N,String}  = (" ",),
     mvpars      ::NTuple{O,String}  = (" ",),
     niter       ::Int64             = 10_000,
     nthin       ::Int64             = 10,
     nburn       ::Int64             = 200,
     tune_int    ::Int64             = 100,
     nswap       ::Int64             = 10,
     ncch        ::Int64             = 1,
     parallel    ::Bool              = ncch > 1,
     dt           ::Float64          = 0.2,
     ntakew      ::Int64             = 100,
     winit       ::Float64           = 2.0,
     scale_y     ::NTuple{2,Bool}    = (true, false),
     algorithm   ::String            = "pruning",
     mc          ::String            = "slice",
     λpriors     ::Float64           = .1,
     μpriors     ::Float64           = .1,
     gpriors     ::Float64           = .1,
     lpriors     ::Float64           = .1,
     qpriors     ::Float64           = .1,
     βpriors     ::NTuple{2,Float64} = (0.0, 10.0),
     hpriors     ::Float64           = .1,
     optimal_w   ::Float64           = 0.8,
     tni         ::Float64           = 1.0,
     obj_ar      ::Float64           = 0.6,
     screen_print::Int64             = 5,
     Eδt         ::Float64           = 1e-3,
     ti          ::Float64           = 0.0,
     ρ           ::Array{Float64,1}  = [1.0]) where {M,N,O}

Run geographic esse. See tutorial for how these files should be specified.

Arguments

tree_file::String: full path to tree file.
out_file::String: full path to write MCMC output.
h::Int64: number of hidden states.
states_file ::String = "NaN": full path to states file. If "NaN", no

observed states are used (only hidden states).

envdata_file::String = "NaN": full path to covariates file. If "NaN", no

covariates are used (i.e., constant rates).

cov_mod::NTuple{M,String} = ("",): specifies which rates are affected by

covariates: s for speciation, e for extinction, and g for colonization. More than 1 is possible.

node_ps::Tuple{Bool,Int64} = (true, 10): first index specifies if posterior

marginal probabilities for nodes should be computed, second index the number of iterations to be computed.

out_states::String = "": full path to write node probabilities output.
constraints::NTuple{N,String} = (" ",): constraints for the model

parameters.

mvpars::NTuple{O,String} = (" ",): which parameters should be multivariate

when using slice sampling for better convergence.

niter::Int64 = 10_000: number of iterations.
nthin::Int64 = 10: frequency at which to record MCMC state.
nburn::Int64 = 200: number of iterations to discard as burn-in.
tune_int::Int64 = 100: number of iterations during nburn to tune proposal

window for MH.

nswap::Int64 = 10: every iteration to try to swap chain likelihoods in MC3.
ncch::Int64 = 1: number of chains.
parallel::Bool = false: if parallel run.
dt::Float64 = 0.2: temperature for MC3.
ntakew::Int64 = 100: number of iterations from nburn to tune the window

for slice sampling.

winit::Float64 = 2.0: initial window for slice sampling.
scale_y::NTuple{2,Bool} = (true, false): first index if scale covariates y

to [0,1], second, if scale covariates y all together between [0,1].

algorithm::String = "pruning": likelihood algorithm between pruning

(recommended) or flow.

mc::String = "slice": which sampling slice (slice-sampling) or

mh (metropolis-hasting).

λpriors::Float64 = 0.1: rate of Exponential prior for speciation.
μpriors::Float64 = 0.1: rate of Exponential prior for global extinction.
gpriors::Float64 = 0.1: rate of Exponential prior for colonization.
lpriors::Float64 = 0.1: rate of Exponential prior for local extinction.
qpriors::Float64 = 0.1: rate of Exponential prior for hidden state

transitions.

βpriors::NTuple{2,Float64} = (0.0, 10.0): mean and variance of Normal

prior for effect of covariates.

hpriors::Float64 = 0.1: rate of Exponential prior for differences between

hidden states.

optimal_w::Float64 = 0.8: optimal window.
tni::Float64 = 1.0: initial tuning for rates.
obj_ar::Float64 = 0.23: objective acceptance rate.
screen_print::Int64 = 5: seconds to wait to update screen log.
Eδt::Float64 = 1e-3: for flow algorithm.
ti::Float64 = 0.0: for flow algorithm.
ρ::Array{Float64,1} = [1.0]: sampling fraction for each state (each area and

widespread).

Returned values

Array of the mcmc parameters.

source

esse(tv          ::Dict{Int64,Array{Float64,1}},
     ed          ::Array{Int64,2}, 
     el          ::Array{Float64,1}, 
     x           ::Array{Float64,1},
     y           ::Array{Float64,L}, 
     cov_mod     ::NTuple{M,String},
     out_file    ::String,
     h           ::Int64;
     constraints ::NTuple{N,String}  = (" ",),
     mvpars      ::NTuple{O,String}  = ("lambda = beta",),
     niter       ::Int64             = 10_000,
     nthin       ::Int64             = 10,
     nburn       ::Int64             = 200,
     ncch        ::Int64             = 1,
     ntakew      ::Int64             = 100,
     winit       ::Float64             = 2.0,
     scale_y     ::NTuple{2,Bool}    = (true, false),
     algorithm   ::String            = "pruning",
     λpriors     ::Float64           = .1,
     μpriors     ::Float64           = .1,
     gpriors     ::Float64           = .1,
     lpriors     ::Float64           = .1,
     qpriors     ::Float64           = .1,
     βpriors     ::NTuple{2,Float64} = (0.0, 10.0),
     hpriors     ::Float64           = .1,
     optimal_w   ::Float64           = 0.8,
     screen_print::Int64             = 5,
     Eδt         ::Float64           = 1e-3,
     ti          ::Float64           = 0.0,
     ρ           ::Array{Float64,1}  = [1.0]) where {L,M,N,O}

Wrapper for running a esse model from simulations.

source

Tapestree.ESSE.simulate_sse — Function

simulate_sse(λ          ::Array{Float64,1},
             μ          ::Array{Float64,1},
             l          ::Array{Float64,1},
             g          ::Array{Float64,1},
             q          ::Array{Float64,1},
             t          ::Float64;
             δt         ::Float64 = 1e-4,
             ast        ::Int64   = 0,
             nspp_max   ::Int64   = 100_000,
             retry_ext  ::Bool    = true,
             rejectel0  ::Bool    = true,
             verbose    ::Bool    = true,
             rm_ext     ::Bool    = true,
             states_only::Bool    = false, 
             start      ::Symbol  = :crown)

Simulate tree according to the geographic esse model. The number of areas and hidden states is inferred from the parameter vectors, but they must be consistent between them and with the covariates. See tutorial for an example.

Arguments

λ::Array{Float64,1}: rates for within-area and between-area speciation.
μ::Array{Float64,1}: per-area extinction rates when it leads to global

extinction.

l::Array{Float64,1}: per-area extinction rates when it leads to local

extinction.

g::Array{Float64,1}: colonization rates between areas.
q::Array{Float64,1}: transition rates between hidden states.
t::Float64: simulation time.
δt::Float64 = 1e-4: time step to perform simulations. Smaller more precise

but more computationally expensive.

ast::Int64 = 0: initial state. 0 specifies random sampling based on the

input parameters.

nspp_max::Int64 = 100_000: maximum number of species allowed to stop

simulation.

retry_ext::Bool = true: automatically restart simulation if simulation goes

extinct.

rejectel0::Bool = true: reject simulations where there are edges of 0

length.

verbose::Bool = true: print messages.
rm_ext::Bool = true: remove extinct taxa from output.
states_only::Bool = false: if only return tip states (faster).
start::Symbol = :crown: if crown, starts after a speciation event with

two lineages, if stem, starts with one lineage.

Returned values

Dictionary with tip number and corresponding state.
Array with parent -> daughter edges.
Array with edge lengths.
Number of maximum species.

source

simulate_sse(λ          ::Array{Float64,1},
             μ          ::Array{Float64,1},
             l          ::Array{Float64,1},
             g          ::Array{Float64,1},
             q          ::Array{Float64,1},
             β          ::Array{Float64,1},
             t          ::Float64,
             x          ::Array{Float64,1},
             y          ::Array{Float64,N},
             cov_mod    ::Tuple{Vararg{String}};
             δt         ::Float64 = 1e-4,
             ast        ::Int64   = 0,
             nspp_max   ::Int64   = 100_000,
             retry_ext  ::Bool    = true,
             rejectel0  ::Bool    = true,
             verbose    ::Bool    = true,
             rm_ext     ::Bool    = true,
             states_only::Bool    = false,
             start      ::Symbol  = :crown) where {N}

Simulate tree according to the geographic sse model. The number of areas and hidden states is inferred from the parameter vectors, but they must be consistent between them and with the covariates. See tutorial for an example.

Arguments

λ::Array{Float64,1}: rates for within-area and between-area speciation.
μ::Array{Float64,1}: per-area extinction rates when it leads to global

extinction.

l::Array{Float64,1}: per-area extinction rates when it leads to local

extinction.

g::Array{Float64,1}: colonization rates between areas.
q::Array{Float64,1}: transition rates between hidden states.
β::Array{Float64,1}: per-area effect of covariates.
t::Float64: simulation time.
x::Array{Float64,1}: times where the covariate y is sampled.
y::Array{Float64,N}: value of covariates, i.e., f(x). Can be multivariate.
cov_mod::Tuple{Vararg{String}}: specifies which rates are affected by

covariates: s for speciation, e for extinction, and g for colonization. More than 1 is possible.

δt::Float64 = 1e-4: time step to perform simulations. Smaller more precise

but more computationally expensive.

ast::Int64 = 0: initial state. 0 specifies random sampling based on the

input parameters.

nspp_max::Int64 = 100_000: maximum number of species allowed to stop

simulation.

retry_ext::Bool = true: automatically restart simulation if simulation goes

extinct.

rejectel0::Bool = true: reject simulations where there are edges of 0

length.

verbose::Bool = true: print messages.
rm_ext::Bool = true: remove extinct taxa from output.
states_only::Bool = false: if only return tip states (faster).
start::Symbol = :crown: if crown, starts after a speciation event with

two lineages, if stem, starts with one lineage.

Returned values

Dictionary with tip number and corresponding state.
Array with parent -> daughter edges.
Array with edge lengths.
Number of maximum species.

source