Evo-Improvising

Friday, October 16, 2015

Dave Matthews Band in Zürich

I discovered Dave Matthews Band when I was doing my Bachelors in Mexico City, back in 2007-2008. In particular certain concert with Tim Reynolds has been my favorite show in youtube for years, and specially during my time in Montpellier (2014), it was by far my favorite music for working. Therefore, back in February or March (2015) when I saw that they were coming to Zurich, I immediately decided to go. Sadly, all my friends that like the band were in Mexico, but I bought a couple of tickets anyway (expensive ones, 80CHF each, but back then I had a Swiss salary...).

"Crush" was always my favorite. That chicken is indeed sooo good.

The concert was yersterday (15th of October 2015) and it was extremely, extremely good. Excluding the 2011 concert of Pearl Jam in Mexico City, I think this one is now my favorite ever.

Pearl Jam in Mexico City (Foro Sol) in 2011. Also a concert very close to my heart.

It's clear they have tons of experience and careful planning with lights and order of songs. They take you from extreme energy and loudness to soft-relaxing tunes no problem. They played songs like "Don't drink the water", "When the World Ends", "What would you say", and "Space between". The band also played a few new songs, from which I was particularly pleased with one that (I think) it's called something like "Black and blue".

"What would you say" in Zurich, taken by somebody with a much better seat than me lol. The cartoon on the background was very cool!

To my great happiness, they did play "Crush"!! My cellphone run out of memory exactly then, so I only caught the beginning, and it was actually the end that was a-m-a-z-i-n-g. Very heavy in violin! I love it, I will be looking for it in youtube like crazy.

The start of "Crush" *o* in Zurich (my lousy video)!

I was thrilled that Tim Reynolds was there, and he did some solos with his electric guitar. His playing is so intense that I was constantly expecting him to catch on fire spontaneously, so gooooood >_<! I was also very impressed by the other members of the band, all super talented and experienced. Boyd Tinsley (violinist) is a rock star completely, amazing solos haha. Also the incredible Jeff Cofin (saxophonist)! So much stamina and versatility! At some point he and the trumpeter Rashawn Ross played something like in a competition, super cool! The bass Stefan Lessard did some relaxing solos too. The drummer Carter Beauford was always smiling and having a great time. At then end of the concert he threw away MANY drumsticks to the fans up front (I was so envious...). And after that, Zurich people made so much noise that they came back for an encore (it took them a while, I thought they wouldn't...), which include "You and me". All and all, the concert lasted three hours!

Absolutely worth it, despite the gloomy weather in Zurich these days :).

Wednesday, July 29, 2015

When we fail MrBayes (my version)

In the last few weeks I’ve been extensively working with basic molecular phylogenetics again. My previous experience with this topic has been mostly positive, with reasonably well behaved datasets (I think). However, thanks to the reviewers of the paper that I’m trying to publish, I came to realize that we, the community of random mortal little researchers working with non-model and typically diverse organisms, are doing things rather wrong.

I feel that phylogenetics (like many other topics in science nowadays) have become so complex in theoretical terms, but at the same time so accessible through easy-peasy PCR and friendly programs like RAxML, that the end-users are just blindly applying default values and getting quick (and dirty) phylogenies for their favorite organisms. Maybe I'm exaggerating, and people for the most part know what they are doing, but I don’t think this is reflected on the papers published.

An old post (the original "When we fail MrBayes, Part II") from 2009 written by Dr. Dan Rabosky (whom I had the pleasure to personally meet in a course he was giving, although I was just one among 30 other grad students) truly opened my eyes. I think the blog in question is not longer working, which is a terrible shame since one can quickly see that the people talking there are “elite” in phylogenetic methods. Anyway, the point for me with that particular post was a central comment:

“On the whole, the results of this survey do not encourage me. Is our research community doing enough to diagnose convergence failure in MCMC analyses? How severe is this problem? Maybe I’m making a mountain out of a molehill here based on my own experience with a few poorly-behaved datasets. But looking at the literature, it is hard to convince myself that most studies are adequately diagnosing convergence problems, and I can’t help but feel a bit unsettled by all of this.”

Frankly, I don’t think this issue has improved now, in 2015. I see the papers that I cite about corals or fungi, and really...we say very little. Ironically (or maybe not), the papers that are most explicit about their actual practices are in very low impact journals that are relevant for the very few people that care about these groups (e.g. The Lichenologist). I think this is related to the fact than in the NGS era you cannot publish a plain old phylogeny anymore. It's as if in our frenzy of spending tons of money for billions of reads, we've forgotten than 1) you need still to collect very difficult samples of obscure species and field work is still expensive, and 2) sometimes the best thing you can get is a bloody nuclear ribosomal marker. So... yeah.

In his post, Rabosky argues that we should for sure be using AWTY ("Are We There Yet?"... I know, it's awesome..), a “system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference”. Unlike Tracer, AWTY actually looks at convergence in topology, the one thing that most of us care about!! On my own I tried to understand AWTY’s output based on the original paper and Rabosky et al’s comments in that post. At this point I think I kinda get it, but it’s just me being self-thought.

In any case, my current coral dataset is not well behaved and that gives me headaches. I did the relevant analyses for the paper back in 2013, and truth is, the MCMC chains were a disaster:

Funny enough, no matter what I did, the topology was always the same (and the same one as the RAxML tree)…Years later I came back to the data set, I run MrBayes with the same data block, and PUM! The horrible jumps are gone (and the topology was still the same)! It took me forever to understand what was the difference, silly me.

As it turns out, back in 2013 I was using MrBayes v. 3.2.1 in an old iMac. Now I’ve been using the new MrBayes 3.2.4 x64 (in a cluster). Maybe it’s just me, but among the million things that MrBayes authors might be working on, the one thing that was different and relevant for my study was the prior for the branch lengths! The new prior, described by Rannala et al (2012), seems to be definitely better than the old exponential distribution with rate of 10.0 (and therefore an expected mean branch length of 0.1). I look at my current tree and clearly the branches are not even close to 0.1, they are much shorter. I suspect that was messing up the estimates of the model of molecular evolution...

In any case, the real lessons here are:

- No, you cannot just go ahead and just use the default temperature and default priors so easily.

- We should reaaaally be reporting things like EES, PSRF, average standard deviation of split frequencies values, along with Tracer and AWTY plots.

There is more to life than the model chosen by JModelTest, people!

Anyway, check out Rannala et al’s paper. Even if you are not really familiar with the actual mathematics (I’m not, for one thing), you will learn a thing or two.

Another terrific paper that I recently encountered is Marchall et al. (2006). It explicitly talks about working with MrBayes, which I loved!

Rannala, B., Zhu, T., and Yang, Z. (2012). Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference. Molecular Biology and Evolution, 29(1):325-335.

Marchall, D. C., Simon, C., and Buckley, T. R. (2006) Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors. Systematic Biology, 55(6):993-1003.

Wednesday, January 7, 2015

How to collapse unsupported branches of trees into multichotomies using Ape in R

To collapse the unsupported branches of a phylogenetic tree into polytomies sounds actually pretty basic, but I couldn't find a satisfactory way (or code) around, except for this. There the user brice.server makes a suggestion, but it doesn't show any code for it.

So in here I put my own attempt, which I think is rather clumsy, but oh well...

First, I got a bit familiar with a phylo object, by reading for example this. Pay special attention to their explanation of tree$edge. There goes the code:

library(ape) # Load the Ape library
setwd("Path/to/my/trees") # Add here your path to the tree file

I was working with the output of RAxML, which is a Newick tree. It can be read like this:

tree <- read.tree("My_RAxML_tree.tre")

Or if you're working with a nexus file, like for example from MrBayes, you can call it:

tree <- read.nexus("My_MrBayes_tree.con")[[1]] # Extract only the first tree

Remember that in MrBayes, to get your consensus tree, you have to type conformat=simple when using sumt. Otherwise, ape won't see the branch support values (although FigTree will). So do (for example):

MrBayes > sumt relburnin = yes burninfrac = 0.25 conformat=simple

Anyway, back to getting rid of the unsupported branches:

# ==== Prune branches that have low support ====
# Which nodes have low support?
Badnodes <- which(as.numeric(tree$node.label) < 70) + length(tree$tip.label)

# The number of those nodes are equal to the position in node.label plus the number of species (the first n nodes)

# The branch lengths INDEXES of each node are tree$edge[,1], and tree$edge[,2] are the nodes names

Notice that in here I considered unsupported a branch with bootstrap values that were less than 70. You could instead use 0.95 for posterior probabilities, for instance.

# The actual indexes of the nodes with low support in tree$edge
Badnodes_indexes <- c()
for(node in Badnodes){
  Badnodes_indexes <- c(Badnodes_indexes, which(tree$edge[,2] == node))
}

I'm sure there is a much smarter way of doing that last part than a for loop, but I was lazy to check (suggestions?)

tree$edge.length[Badnodes_indexes] <- 0 # Make the branch length of the bad nodes = 0
tree_multi <- di2multi(tree) #Use di2multi function to convert the branches with lenght = 0, to multichotomies

Finally, save your tree in a Newick file:

write.tree(tree_multi, file = "My_edited_tree.tre")

Tuesday, October 7, 2014

How to use reads2snp with ALR files

Now that I'm back in Montpellier, I remembered that I wanted to write something about how to use an interesting program called reads2snp (Tsagkogeorga et al. 2012; Gayral et al. 2013, both open access :D)*.

"reads2snp is a SNP and genotype caller: it predicts the genotype of distinct individuals at distinct positions of a set of sequences based on read mapping / read counts."

The program was developed in the framework of the project PopPhyl, "in which transcriptome data are used to characterize the population genomics of non-model organisms." Unlike other programs, reads2snp makes no prior hypothesis regarding the population frequency of alleles, and it deals with biases in gene expression and hidden paralogy.

You can download it from PopPhyl's website here (at the time of writing, the available version was 2.0 march 2014). Once you unzip the file you already have the binaries for linux (there is also something for mac, but I won't be concerned with it, I guess it's similar). You can directly put the binary in your PATH, or use it as it is:

$ ./reads2snp_2.0.64.bin -v
reads2snp version = 2.0 march 2014

which prints the version. In the following I will assume the binary is in your PATH.

To see the options of the program, just type the name of the program, that I renamed to be just reads2snp:

$ reads2snp

Actually all you need to know is already in the README.md file that comes with the distribution. Unlike the older versions, as input you can give either BAM files or the so-called ALR files, which is a non-standard format explained at the end of the README.md. Because I need the ALR files in particular (the format has some information that I want), I used another program called bam2alr that you can get here.

To transform a single BAM file into an ALR (for just one individual)

$ bam2alr -r genome.fas sorted.bam > genome.alr

Or if you have several (sorted) bam files for different individuals you put their names in a single file. In my example all my files end with ".sorted.bam", so:

$ ls *.sorted.bam > bams

And then do

$ bam2alr -r genome.fas -b bams > genomes.alr

Now, from older versions to the new one, there are fewer options because I think they were trying to do the program simpler to use (and maybe less intimidating), while keeping the parameters that they normally use for PopPhyl as defaults.

So finally, to run it I do

$ reads2snp -alr genomes.alr -par 1 -opt newton -min 20 -th2 0.01 -tol -fis 0 -pre 0.001 th1 0.95 >> genomes.output

Notice that actually most of the parameters that I use are the defaults (since I'm following PopPhyl specifications) but I would like to make some comments. Either way you could just run it like this (it's the same):

$ reads2snp -alr genomes.alr -min 20 >> genomes.output

So the parameters:
[-par 0|1|2] In previous versions, -par didn't have options. These new version has three, tho, 0:no paraclean 1:stringent 2:fast. We definitely want Paraclean (the program involved in finding the hidden paralogs), and it seems that using option 2 is not worthy (I suspect they even took the option out).

[-opt bfgs|newton] The ML optimization method used by Paraclean. I think I tried at some point with the bfgs option in older versions and it didn't work for me. Probably now it works, but newton is the default for a reason.

[-min integer] The minimal number of reads to call a genotype. This is the only parameter that I don't keep as default (10). I guess I like to have some more confidence on my genotypes ;).

[-th2 float] Paraclean LRT p-val threshold use to be -thr in older versions.

[-acc|-tol|-for] multi alleles: accepted|tolerated|forbidden

[-fis positive float] assumed FIS value, the inbreeding coefficient of an individual (I) relative to the subpopulation (S). Notice that this assumes diploid individuals and Hardy-Weinberg equilibrium.

[-pre] precision on parameters; default=0.001

[-th1 float] the genotype posterior probability threshold (default=0.95) use to be -gpv in older versions.

* This program is NOT the same as discoSnp, which use to have a similar name (read2SNPs)

Saturday, October 4, 2014

Going to the beach in Montpellier

Today I went to one of the beaches near Montpellier. Just so I don't forget how to, I want just to describe how to get there.

These are instructions to get to Grand Travers beach on a Saturday. The bus times change depending on the day. Check this website for the time tables.

First, I took the Tram line 1 (the blue one) towards Odysseum. I got down at the station before Odysseum: Place de France.

There, I looked for a bus stop (there are at least two, the one towards Odysseum but on the other side of the tram line). The bus that I was looking for is number 106, and the ticket is 1.60€ one way. I took it at 11:30am. The normal transport card from Montpellier doesn't work for this.

Then I just waited for the stop that I was interested in (you have to hit Stop or the bus won't stop!). I think the bus can go all the way to La Grande Motte, which is supposed to be a nice beach, but I went down before that in the Grand Travers.

According to this website, Grand Travers is not so great, but I arrived in the morning (around 11:10am) in a cloudy day and it was pretty good. A couple of families here and there. The day got super good later, and then a lot of people appeared, but still, it was nice. The particular spot next to the bus stop is not close to any building, which can be good or bad depending on your needs. I think there are some restaurant somewhere, but I didn't look for them. There was one boy selling ice cream at some point. I would anyway recommend to bring your own food. That's what a lot of families and myself did.

I returned early, taking the bus at 16:22. But there are a few other buses later. Be careful because the buses don't run very often.

Saturday, September 6, 2014

War in the world of lichens

The quote of the day:

"(...) in fact there is war in the world of lichens. Following a brief consideration of parasitism and other forms of symbiosis, this war will be illustrated under the headings of algal slaves, exploiters of two kingdoms, alien invaders, cosy niche seekers and take-over specialists."

Richardson, D.H.S. (1999) War in the world of lichens: parasitism and symbiosis as exemplified by lichens and lichenicolous fungi, Mycol. Res. 103(6): 641-650.

Thursday, August 28, 2014

Careless lichenology

Today I was reading about taxonomy and lichens. I found something that I think is very fitting:

"(...) as stated by Santesson (1954), 'in no other group of plants is the taxonomic work carried out more carelessly than in lichenology'."

Honegger R (1983) The Ascus Apex in Lichenized Fungi IV. Baeomyces and Icmadophila in Comparison with Cladonia (Lecanorales) and the Non-Lichenized Leotia (Helotiales), Lichenologist 14: 205-217

I think the modern lichenologist are no longer like that, tho.