Insane tree data access and processing functions

Basic tree information

Basic information about the tree, such as the number of tips, the number of extinct tips, the number of fossils, the tree height (duration of the tree) and the tree length (sum of all branch lengths), can be obtained using, respectively:

ntips(tree)
ntipsextinct(tree)
nfossils(tree)
treeheight(tree)
treelength(tree)
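
To make these definitions concrete, here is a minimal sketch on a toy binary tree. The type ToyTree and the functions toy_ntips, toy_height and toy_length are hypothetical names made up for this illustration; they are not part of Tapestree, whose own functions are the ones listed above.

```julia
# Toy binary tree (hypothetical, not a Tapestree type): each node stores
# its own branch length `e` and either no daughters or two daughters.
struct ToyTree
    e::Float64
    d1::Union{ToyTree, Nothing}
    d2::Union{ToyTree, Nothing}
end
ToyTree(e::Real) = ToyTree(e, nothing, nothing)   # convenience constructor for tips

istip(t::ToyTree) = t.d1 === nothing

# Number of tips: count the terminal nodes.
toy_ntips(t::ToyTree) = istip(t) ? 1 : toy_ntips(t.d1) + toy_ntips(t.d2)

# Tree height: longest path from the start of the stem to any tip.
toy_height(t::ToyTree) =
    istip(t) ? t.e : t.e + max(toy_height(t.d1), toy_height(t.d2))

# Tree length: sum of all branch lengths.
toy_length(t::ToyTree) =
    istip(t) ? t.e : t.e + toy_length(t.d1) + toy_length(t.d2)

toy = ToyTree(1.0, ToyTree(2.0), ToyTree(0.5, ToyTree(1.5), ToyTree(1.0)))

n = toy_ntips(toy)    # three tips
h = toy_height(toy)   # stem (1.0) plus the longest path below it (2.0)
l = toy_length(toy)   # 1.0 + 2.0 + 0.5 + 1.5 + 1.0
```

The height only follows the longest root-to-tip path, whereas the length accumulates every branch, so the length is never smaller than the height.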

Tree vector statistics

Julia makes it simple to compute statistics across a vector of trees. For example, using the Statistics package, we can estimate the average number of extinct species across the tree vector tv with:

using Statistics 

mean(ntipsextinct, tv)

See the documentation of mean for more details, but basically, mean and many other functions in Julia accept a function as their first argument and apply it to each element before reducing. In this case, we count the number of extinct tips in each tree in tv, and then average over them.
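
The same function-first pattern can be tried without any trees. In the sketch below, toy_tv is a made-up stand-in for a tree vector and length stands in for a per-tree summary such as ntipsextinct; only mean, map, sum and maximum from Julia itself are used.

```julia
using Statistics

# Stand-in for a tree vector: each inner vector plays the role of one tree,
# and `length` plays the role of a per-tree summary function.
toy_tv = [[1, 2], [1, 2, 3, 4], [1, 2, 3]]

# Apply `length` to each element, then average the results.
m = mean(length, toy_tv)

# Equivalent two-step version that allocates the intermediate vector.
m2 = mean(map(length, toy_tv))

# Other reductions accept a function as first argument in the same way.
s  = sum(length, toy_tv)
mx = maximum(length, toy_tv)
```

The function-first form avoids building the intermediate vector, which can matter for long posterior tree vectors.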

Tree labels and obtaining subtrees

For labelled trees, one can extract the tip labels using tiplabels. Moreover, we can create subclades based on a vector of tips, where the subclade is the smallest subtree that contains all the tips. For example, for a vector tip_vector holding Strings that correspond to the tip labels in the tree of type sT_label, we can use

subclade(tree, tip_vector)

However, most of the time we want to extract subclades from other types of trees. Since these do not hold label information but should be ordered in the same way as the sT_label tree, one has to use both.

Warning

If you change the order of either the sT_label tree or the single tree or vector of trees of other types, this will not work.

Thus, if we want the subclades that contain the tips in tip_vector, we can use

subclade(tv, tree, tip_vector, true)

where the last argument states whether to return the stem tree (true) or the crown tree (false).
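
The idea of a minimum subclade can be sketched on a toy labelled tree. ToyNode, toy_tips and toy_subclade below are hypothetical helpers written for this illustration and say nothing about how subclade is implemented in Tapestree. The sketch returns the crown version of the clade; a stem version would additionally keep the branch leading to that node.

```julia
# Toy labelled tree (hypothetical): internal nodes carry an empty label.
struct ToyNode
    label::String
    children::Vector{ToyNode}
end
ToyNode(label::String) = ToyNode(label, ToyNode[])   # tip constructor

# Collect tip labels in left-to-right order.
toy_tips(n::ToyNode) =
    isempty(n.children) ? [n.label] : reduce(vcat, toy_tips.(n.children))

# Descend while a single daughter still contains every requested tip;
# the node where this stops is the smallest subtree holding all of them.
function toy_subclade(n::ToyNode, wanted::Vector{String})
    for c in n.children
        ctips = toy_tips(c)
        if all(in(ctips), wanted)
            return toy_subclade(c, wanted)
        end
    end
    return n
end

toy_tree = ToyNode("", [
    ToyNode("", [ToyNode("A"), ToyNode("B")]),
    ToyNode("", [ToyNode("C"),
                 ToyNode("", [ToyNode("D"), ToyNode("E")])]),
])

sc = toy_subclade(toy_tree, ["C", "E"])
sc_tips = toy_tips(sc)
```

Requesting C and E also brings in D, because the smallest subtree containing both requested tips includes every descendant of their common ancestor.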

Lineage and Diversity through time (LTT & DTT)

One can also estimate the Lineage Through Time (LTT), or, for a data augmented tree (or a vector of trees), the Diversity Through Time (DTT) using

ltt(tree)

To be clear, the LTT is usually used to describe the accumulation of reconstructed lineages (those that have been sampled) while DTT is used to describe estimated diversity (sampled and unsampled lineages). Thus, the result is either LTT or DTT simply depending on the tree you use as input.

We can also estimate the LTT for a tree vector tv using

ltt(tv)
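
Conceptually, a lineages-through-time curve is a step function that changes by one at every speciation or extinction event. The sketch below builds such a curve from a made-up list of events. toy_ltt and the event data are hypothetical and only illustrate the idea; the structure returned by ltt in Tapestree may differ.

```julia
# Hypothetical events, as (time since the start of the tree, change in lineages):
# +1 for a speciation, -1 for an extinction.
events = [(2.5, +1), (1.0, +1), (3.0, -1), (4.0, +1)]

# Accumulate the changes in chronological order, starting from `n0` lineages.
function toy_ltt(events, n0::Int)
    sorted = sort(events; by = first)
    times  = [0.0]
    counts = [n0]
    for (t, change) in sorted
        push!(times, t)
        push!(counts, counts[end] + change)
    end
    return times, counts
end

times, counts = toy_ltt(events, 1)
```

Each entry of counts gives the number of lineages from the matching entry of times until the next event.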

Trees with diffusion information (e.g., BDD, FBDD, DBM)

Estimating posterior average rates along the tree

Of particular interest is the estimation of posterior average rates along the reconstructed tree. Since the data augmented (unsampled) lineages change between different iterations of the algorithm, we obtain lineage-specific instantaneous rate distributions only for the reconstructed (observed) part of the trees (the tree we used as input). Consequently, we first need to remove the data augmented lineages from all the trees in the posterior tree vector:

tv0 = remove_unsampled(tv)

We can then estimate the average tree using

tm = imean(tv0)

We can also estimate any quantile tree, for instance, for the $0.25$ quantile tree:

t025 = iquantile(tv0, 0.25)

Clearly, these resulting trees can then be further scrutinized as with any other tree in INSANE.
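
Because the trees in tv0 share the same topology, summarizing them amounts to summarizing each position across posterior samples. The sketch below shows that idea with plain vectors, assuming each made-up posterior sample stores a rate at the same three positions. It is an illustration of pointwise summaries only, not a description of how imean or iquantile work internally.

```julia
using Statistics

# Hypothetical posterior: three samples, each with rates at the same 3 positions.
posterior = [[0.10, 0.20, 0.30],
             [0.20, 0.40, 0.50],
             [0.30, 0.60, 0.10]]

# Pointwise mean: average position by position across samples.
pointwise_mean = sum(posterior) ./ length(posterior)

# Pointwise 0.25 quantile: gather each position across samples, then summarize.
pointwise_q25 = [quantile([s[i] for s in posterior], 0.25)
                 for i in eachindex(first(posterior))]
```

This only makes sense when every sample has the same layout, which is why the data augmented lineages have to be removed first.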

Attribute wrappers

For convenience, Tapestree provides the following tree attribute wrappers:

  • birth: To obtain speciation rates (i.e., x -> exp.(lλ(x)))
  • logbirth: To obtain the logarithm of speciation rates (i.e., x -> lλ(x))
  • death: To obtain extinction rates (i.e., x -> exp.(lμ(x)))
  • logdeath: To obtain the logarithm of extinction rates (i.e., x -> lμ(x))
  • turnover: To obtain turnover rates, that is, extinction over speciation (i.e., x -> exp.(lμ(x) .- lλ(x)))
  • diversification: To obtain net diversification rates, that is, speciation minus extinction (i.e., x -> exp.(lλ(x)) .- exp.(lμ(x)))
  • trait: To obtain trait values (i.e., x -> xv(x))
  • logtrait: To obtain the logarithm of trait values (i.e., x -> log.(xv(x)))
  • traitrate: To obtain trait evolutionary rates (i.e., x -> exp.(lσ2(x)))
  • logtraitrate: To obtain the logarithm of trait evolutionary rates (i.e., x -> lσ2(x))
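
The arithmetic behind the rate wrappers can be checked with plain vectors. In the sketch below, log_lambda and log_mu are made-up log-rates at three points along a branch, standing in for what lλ(x) and lμ(x) would return; no Tapestree function is called.

```julia
# Hypothetical log-speciation and log-extinction rates at three points.
log_lambda = log.([0.4, 0.5, 0.8])
log_mu     = log.([0.1, 0.25, 0.2])

birth_rates = exp.(log_lambda)    # speciation rates
death_rates = exp.(log_mu)        # extinction rates

# Turnover: a difference of logs is the log of the ratio, so this is μ / λ.
turnover_rates = exp.(log_mu .- log_lambda)

# Net diversification: λ - μ, computed on the natural scale.
diversification_rates = exp.(log_lambda) .- exp.(log_mu)
```

Working on the log scale keeps rates positive after exponentiation, which is why the wrappers exponentiate before returning natural-scale rates.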

Other data access and averaging functions

The value of function f at the tips of the tree and any fossil samples can be obtained using the tipget function. For example, to obtain the speciation rates for sampled species from a data augmented tree treeda (any tree output when running inference), use

tipget(treeda, tree, birth)

where tree is the labelled tree used as input (of type sT_label or sTf_label). This function returns a dictionary of labels pointing to the specific value returned by f.
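
Once the values are in a dictionary keyed by label, standard Julia tools apply. The sketch below uses a hand-written dictionary as a stand-in for such a result, assuming String labels and one Float64 value per tip; the labels and numbers are invented and the exact type returned by tipget may differ.

```julia
# Hypothetical stand-in for a label => value dictionary.
tip_rates = Dict("sp_A" => 0.31, "sp_B" => 0.12, "sp_C" => 0.47)

# Look up a single species by its label.
rate_B = tip_rates["sp_B"]

# Rank species from the highest to the lowest value.
ranked  = sort(collect(tip_rates); by = last, rev = true)
fastest = first(ranked).first

# Smallest and largest value across tips.
low, high = extrema(values(tip_rates))
```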

A common need is to obtain the posterior value of function f for each species. This can be done by first following Estimating posterior average rates along the tree and, assuming the resulting posterior average tree is named tm, then using

tipget(tm, tree, f)

to get any attribute returned by f (e.g., speciation rates, extinction rates, traits, trait rates; see Attribute wrappers for the available functions).

To obtain the range (i.e., extrema) of the output of function f on tree, for example the minimum and maximum speciation rates, use

irange(tree, birth)

If one wants to sample, recursively, some function at regular intervals along a tree, one can use sample. For example, if we want to sample speciation rates every $0.1$ time units, we can use

sample(tv, birth, 0.1)

Note

Here we are sampling along each branch of the tree in recursive order, not sampling across lineages through time.

To extract the output of function f across all contemporary lineages at regular time points in a given tree, we use time_rate. For example, if we want the cross-lineage extinction rates of a tree of type iTbd sampled every $0.5$ time units, we would use

time_rate(tree, death, 0.5)

which returns a vector of vectors, where each element corresponds to one sampled time and holds the rates (in this case extinction rates) of all contemporary lineages at that time.
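
A vector of vectors of this shape is easy to summarize with broadcasting. The sketch below uses invented numbers as a stand-in for such an output, with one inner vector per sampled time and a varying number of lineages; it does not call time_rate itself.

```julia
using Statistics

# Hypothetical stand-in: one inner vector per sampled time, holding the
# rate of every lineage alive at that time.
rates_through_time = [[0.10], [0.10, 0.14], [0.12, 0.16, 0.08]]

# Number of contemporary lineages at each sampled time.
n_lineages = length.(rates_through_time)

# Mean rate across lineages at each sampled time.
mean_rate = mean.(rates_through_time)

# Minimum and maximum rate across lineages at each sampled time.
rate_range = extrema.(rates_through_time)
```

The inner vectors differ in length because the number of lineages changes through time, so a matrix would not be a natural container here.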

Finally, a convenience wrapper to extract information recursively from a tree is trextract. For example, if we want all branch lengths for a tree, we can use

trextract(tree, e)

Below are some functions to obtain data from trees.

Full documentation

Tapestree.INSANE.ntipsextinct - Function
ntipsextinct(tree::T) where {T <: iTree}

Return the number of extinct nodes for tree.

ntipsextinct(Ξ::Vector{T}) where {T <: iTree}

Return the number of extinct nodes in Ξ.

Tapestree.INSANE.treeheight - Function
treeheight(tree::T) where {T <: iTree}

Return the tree height of tree.

treeheight(tree::T) where {T <: Union{iTf, iTpbd}}

Return the tree height of tree.

treeheight(tree::T, nd::Int64) where {T <: iTree}

Return the tree height of tree.

treeheight(tree::T, nd::Int64) where {T <: Union{iTf, iTpbd}}

Return the tree height of tree.

Tapestree.INSANE.treelength - Function
treelength(tree::T) where {T <: iTree}

Return the branch length sum of tree.

treelength(tree::T, ets::Vector{Float64}) where {T <: Union{iTf, iTpbd}}

Return the branch length sum of tree at different epochs, initialized at l.

treelength(Ξ::Vector{T}) where {T <: iTree}

Return the branch length sum of Ξ.

treelength(Ξ  ::Vector{T},
           ets::Vector{Float64},
           bst::Vector{Float64},
           eix::Vector{Int64}) where {T <: iTf}

Return the branch length sum of tree at different epochs, initialized at l.

Tapestree.INSANE.ltt - Function
ltt(tree::T) where {T <: iTree}

Returns number of species through time.

ltt(tree::Vector{T}) where {T <: iTree}

Returns number of species through time for a tree vector.

ltt(tree::T, tor::Float64) where {T <: iTree}

Returns number of species through time for a tree vector.

Tapestree.INSANE.tipget - Function
tipget(treeda::T, tree::D, f::Function) where {T <: iTree, D <: Tlabel}

Return function f for tips or fossils in treeda with labels from tree.

Tapestree.INSANE.time_rate - Function
time_rate(tree::T, f::Function, δt::Float64) where {T <: iT}

Extract values from f function at times sampled every δt across the tree.

Tapestree.INSANE.subclade - Function
subclade(tree::iTree, ix::Int64)

Return the minimum stem subclade according to recursive position ix.

subclade(trees::Vector{T},
         ltree::sT_label,
         tips ::Vector{String},
         stem ::Bool) where {T <: iTree}

Return the minimum subclade that includes tip labels in tips.

subclade(tree::sT_label, tips::Vector{String})

Return the minimum subclade that includes tip labels in tips.

subclade(tree::iTree,
         ltree::sT_label,
         tips ::Vector{String},
         stem ::Bool)

Return the minimum subclade that includes tip labels in tips.

Tapestree.INSANE.lλ - Function
lλ(tree::T) where {T <: iT}

Return the logarithm of the speciation rate (speciation completion in a protracted model).

Insane tree manipulation functions

Two important manipulation functions are remove_extinct and remove_unsampled. The first removes extinct lineages and can be applied to a tree or a tree vector using

remove_extinct(tree)

Similarly, as shown above, one can remove the unsampled lineages (all the data augmented lineages) from a single tree or a vector of trees using

remove_unsampled(tree)

Note

remove_extinct and remove_unsampled are different. First, when performing simulations, the tree is not fixed, which means that running remove_unsampled will remove the whole tree. You would have to fix the tree beforehand, which can be done using fixtree!(tree). Second, if the sampling fraction is not $1$, remove_unsampled will also remove extant lineages that were not sampled, while remove_extinct will only remove extinct lineages.

For fossil trees, one can remove all fossils using

remove_fossils(tree)

or make a given tree a fossil by using

fossilize!(tree)

which turns only that specific tree into a fossil (not its recursive daughters).

Full documentation

Tapestree.INSANE.reorder! - Function
reorder!(tree::T) where {T <: iTree}

Reorder daughter branches according to number of tips, with daughter 1 always having more than daughter 2.

reorder!(tree::T, treeda::D) where {T <: iTree, D <: iTree}

Reorder data augmented tree treeda according to tree.

Tapestree.INSANE.remove_extinct - Function
remove_extinct(tree::T) where {T <: iTree}

Remove extinct tips from iTce.

remove_extinct(treev::Vector{T}) where {T <: iTree}

Remove extinct taxa for a vector of trees.

Tapestree.INSANE.remove_unsampled - Function
remove_unsampled(tree::T) where {T <: iTree}

Remove unsampled tips from iTree.

remove_unsampled(treev::Vector{T}) where {T <: iTree}

Remove unsampled taxa for a vector of trees.