Evo-Improvising: 2015

Friday, October 16, 2015

Dave Matthews Band in Zürich

I discovered Dave Matthews Band when I was doing my Bachelors in Mexico City, back in 2007-2008. In particular certain concert with Tim Reynolds has been my favorite show in youtube for years, and specially during my time in Montpellier (2014), it was by far my favorite music for working. Therefore, back in February or March (2015) when I saw that they were coming to Zurich, I immediately decided to go. Sadly, all my friends that like the band were in Mexico, but I bought a couple of tickets anyway (expensive ones, 80CHF each, but back then I had a Swiss salary...).

"Crush" was always my favorite. That chicken is indeed sooo good.

The concert was yersterday (15th of October 2015) and it was extremely, extremely good. Excluding the 2011 concert of Pearl Jam in Mexico City, I think this one is now my favorite ever.

Pearl Jam in Mexico City (Foro Sol) in 2011. Also a concert very close to my heart.

It's clear they have tons of experience and careful planning with lights and order of songs. They take you from extreme energy and loudness to soft-relaxing tunes no problem. They played songs like "Don't drink the water", "When the World Ends", "What would you say", and "Space between". The band also played a few new songs, from which I was particularly pleased with one that (I think) it's called something like "Black and blue".

"What would you say" in Zurich, taken by somebody with a much better seat than me lol. The cartoon on the background was very cool!

To my great happiness, they did play "Crush"!! My cellphone run out of memory exactly then, so I only caught the beginning, and it was actually the end that was a-m-a-z-i-n-g. Very heavy in violin! I love it, I will be looking for it in youtube like crazy.

The start of "Crush" *o* in Zurich (my lousy video)!

I was thrilled that Tim Reynolds was there, and he did some solos with his electric guitar. His playing is so intense that I was constantly expecting him to catch on fire spontaneously, so gooooood >_<! I was also very impressed by the other members of the band, all super talented and experienced. Boyd Tinsley (violinist) is a rock star completely, amazing solos haha. Also the incredible Jeff Cofin (saxophonist)! So much stamina and versatility! At some point he and the trumpeter Rashawn Ross played something like in a competition, super cool! The bass Stefan Lessard did some relaxing solos too. The drummer Carter Beauford was always smiling and having a great time. At then end of the concert he threw away MANY drumsticks to the fans up front (I was so envious...). And after that, Zurich people made so much noise that they came back for an encore (it took them a while, I thought they wouldn't...), which include "You and me". All and all, the concert lasted three hours!

Absolutely worth it, despite the gloomy weather in Zurich these days :).

Wednesday, July 29, 2015

When we fail MrBayes (my version)

In the last few weeks I’ve been extensively working with basic molecular phylogenetics again. My previous experience with this topic has been mostly positive, with reasonably well behaved datasets (I think). However, thanks to the reviewers of the paper that I’m trying to publish, I came to realize that we, the community of random mortal little researchers working with non-model and typically diverse organisms, are doing things rather wrong.

I feel that phylogenetics (like many other topics in science nowadays) have become so complex in theoretical terms, but at the same time so accessible through easy-peasy PCR and friendly programs like RAxML, that the end-users are just blindly applying default values and getting quick (and dirty) phylogenies for their favorite organisms. Maybe I'm exaggerating, and people for the most part know what they are doing, but I don’t think this is reflected on the papers published.

An old post (the original "When we fail MrBayes, Part II") from 2009 written by Dr. Dan Rabosky (whom I had the pleasure to personally meet in a course he was giving, although I was just one among 30 other grad students) truly opened my eyes. I think the blog in question is not longer working, which is a terrible shame since one can quickly see that the people talking there are “elite” in phylogenetic methods. Anyway, the point for me with that particular post was a central comment:

“On the whole, the results of this survey do not encourage me. Is our research community doing enough to diagnose convergence failure in MCMC analyses? How severe is this problem? Maybe I’m making a mountain out of a molehill here based on my own experience with a few poorly-behaved datasets. But looking at the literature, it is hard to convince myself that most studies are adequately diagnosing convergence problems, and I can’t help but feel a bit unsettled by all of this.”

Frankly, I don’t think this issue has improved now, in 2015. I see the papers that I cite about corals or fungi, and really...we say very little. Ironically (or maybe not), the papers that are most explicit about their actual practices are in very low impact journals that are relevant for the very few people that care about these groups (e.g. The Lichenologist). I think this is related to the fact than in the NGS era you cannot publish a plain old phylogeny anymore. It's as if in our frenzy of spending tons of money for billions of reads, we've forgotten than 1) you need still to collect very difficult samples of obscure species and field work is still expensive, and 2) sometimes the best thing you can get is a bloody nuclear ribosomal marker. So... yeah.

In his post, Rabosky argues that we should for sure be using AWTY ("Are We There Yet?"... I know, it's awesome..), a “system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference”. Unlike Tracer, AWTY actually looks at convergence in topology, the one thing that most of us care about!! On my own I tried to understand AWTY’s output based on the original paper and Rabosky et al’s comments in that post. At this point I think I kinda get it, but it’s just me being self-thought.

In any case, my current coral dataset is not well behaved and that gives me headaches. I did the relevant analyses for the paper back in 2013, and truth is, the MCMC chains were a disaster:

Funny enough, no matter what I did, the topology was always the same (and the same one as the RAxML tree)…Years later I came back to the data set, I run MrBayes with the same data block, and PUM! The horrible jumps are gone (and the topology was still the same)! It took me forever to understand what was the difference, silly me.

As it turns out, back in 2013 I was using MrBayes v. 3.2.1 in an old iMac. Now I’ve been using the new MrBayes 3.2.4 x64 (in a cluster). Maybe it’s just me, but among the million things that MrBayes authors might be working on, the one thing that was different and relevant for my study was the prior for the branch lengths! The new prior, described by Rannala et al (2012), seems to be definitely better than the old exponential distribution with rate of 10.0 (and therefore an expected mean branch length of 0.1). I look at my current tree and clearly the branches are not even close to 0.1, they are much shorter. I suspect that was messing up the estimates of the model of molecular evolution...

In any case, the real lessons here are:

- No, you cannot just go ahead and just use the default temperature and default priors so easily.

- We should reaaaally be reporting things like EES, PSRF, average standard deviation of split frequencies values, along with Tracer and AWTY plots.

There is more to life than the model chosen by JModelTest, people!

Anyway, check out Rannala et al’s paper. Even if you are not really familiar with the actual mathematics (I’m not, for one thing), you will learn a thing or two.

Another terrific paper that I recently encountered is Marchall et al. (2006). It explicitly talks about working with MrBayes, which I loved!

Rannala, B., Zhu, T., and Yang, Z. (2012). Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Molecular Biology and Evolution, 29(1):325-335.

Marchall, D. C., Simon, C., and Buckley, T. R. (2006) Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors. Systematic Biology, 55(6):993-1003.

Wednesday, January 7, 2015

How to collapse unsupported branches of trees into multichotomies using Ape in R

To collapse the unsupported branches of a phylogenetic tree into polytomies sounds actually pretty basic, but I couldn't find a satisfactory way (or code) around, except for this. There the user brice.server makes a suggestion, but it doesn't show any code for it.

So in here I put my own attempt, which I think is rather clumsy, but oh well...

First, I got a bit familiar with a phylo object, by reading for example this. Pay special attention to their explanation of tree$edge. There goes the code:

library(ape) # Load the Ape library
setwd("Path/to/my/trees") # Add here your path to the tree file

I was working with the output of RAxML, which is a Newick tree. It can be read like this:

tree <- read.tree("My_RAxML_tree.tre")

Or if you're working with a nexus file, like for example from MrBayes, you can call it:

tree <- read.nexus("My_MrBayes_tree.con")[[1]] # Extract only the first tree

Remember that in MrBayes, to get your consensus tree, you have to type conformat=simple when using sumt. Otherwise, ape won't see the branch support values (although FigTree will). So do (for example):

MrBayes > sumt relburnin = yes burninfrac = 0.25 conformat=simple

Anyway, back to getting rid of the unsupported branches:

# ==== Prune branches that have low support ====
# Which nodes have low support?
Badnodes <- which(as.numeric(tree$node.label) < 70) + length(tree$tip.label)

# The number of those nodes are equal to the position in node.label plus the number of species (the first n nodes)

# The branch lengths INDEXES of each node are tree$edge[,1], and tree$edge[,2] are the nodes names

Notice that in here I considered unsupported a branch with bootstrap values that were less than 70. You could instead use 0.95 for posterior probabilities, for instance.

# The actual indexes of the nodes with low support in tree$edge
Badnodes_indexes <- c()
for(node in Badnodes){
  Badnodes_indexes <- c(Badnodes_indexes, which(tree$edge[,2] == node))
}

I'm sure there is a much smarter way of doing that last part than a for loop, but I was lazy to check (suggestions?)

tree$edge.length[Badnodes_indexes] <- 0 # Make the branch length of the bad nodes = 0
tree_multi <- di2multi(tree) #Use di2multi function to convert the branches with lenght = 0, to multichotomies

Finally, save your tree in a Newick file:

write.tree(tree_multi, file = "My_edited_tree.tre")