Category Archives: Project Updates

installing macqiime

Within the span of 1 week, I set up my new super-powerful Mac Pro, we got all of the ZEN sequence data back, and QIIME version 1.9 is live! I also posted my IPython notebook for a basic QIIME analysis.

Quite a confluence of events…

Anyway… I’m christening my new machine with QIIME.

Notes on macqiime install:

1. I went through the installation instructions, including the optional add-ons with no glitches here:

2. I ignored AmpliconNoise because I do not use 454 data.

3. I could not get Topiary Explorer to work. At first, there was a problem with the security, but I figured out how to add exceptions, but then it still didn’t work, and the error message said: “Unable to launch application.” Then, I clicked on the Details button, and I think this describes the problem:

Caused by: Relative path in absolute URI: file://topiaryexplorer1.0.jar

But, I’m not sure how to fix it, so I decided to move on and come back to Topiary Explorer when/if I need it.

4. In the bit about installing R, I noticed this:

Please note that even if you installed R and these libraries previously for MacQIIME 1.8.0, you should still upgrade to/install the latest version of R, 3.1.2, and re-install all these R packages to get everything working.

And, that’s how I learned that QIIME 1.9 was out there. BUT, it doesn’t look like macqiime has been updated, so it installed QIIME 1.8 instead. Maybe that’s because it’s a “release candidate” at this point? Anyway, I’ll have to go back and update QIIME somehow. Macqiime appears to be working. See below for the output of -t

System information
Platform:    darwin
Python version:    2.7.3 (default, Dec 19 2012, 09:12:08)  [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Python executable:    /macqiime/bin/python

Dependency versions
PyCogent version:    1.5.3
NumPy version:    1.7.1
matplotlib version:    1.1.0
biom-format version:    1.3.1
qcli version:    0.1.0
QIIME library version:    1.8.0
QIIME script version:    1.8.0
PyNAST version (if installed):    1.2.2
Emperor version:    0.9.3
RDP Classifier version (if installed):    rdp_classifier-2.2.jar
Java version (if installed):    Not installed.

QIIME config values
blastmat_dir:    None
sc_queue:    all.q
topiaryexplorer_project_dir:    None
pynast_template_alignment_fp:    /macqiime/greengenes/core_set_aligned.fasta.imputed
cluster_jobs_fp:    /macqiime/QIIME/bin/
pynast_template_alignment_blastdb:    None
assign_taxonomy_reference_seqs_fp:    /macqiime/greengenes/gg_13_8_otus/rep_set/97_otus.fasta
torque_queue:    friendlyq
template_alignment_lanemask_fp:    /macqiime/greengenes/lanemask_in_1s_and_0s
jobs_to_start:    1
cloud_environment:    False
qiime_scripts_dir:    /macqiime/QIIME/bin/
denoiser_min_per_core:    50
working_dir:    /tmp/
python_exe_fp:    /macqiime/bin/python
temp_dir:    /tmp/
blastall_fp:    blastall
seconds_to_sleep:    60
assign_taxonomy_id_to_taxonomy_fp:    /macqiime/greengenes/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
FAIL: test_ampliconnoise_install (__main__.QIIMEDependencyFull)
AmpliconNoise install looks sane.
Traceback (most recent call last):
File “/macqiime/QIIME/bin/”, line 392, in test_ampliconnoise_install
“$PYRO_LOOKUP_FILE variable is not set. See %s for help.” % url)
AssertionError: $PYRO_LOOKUP_FILE variable is not set. See for help.

Ran 35 tests in 0.456s

FAILED (failures=1)

Fungal ITS Taxonomy Problem: SOLVED (for now)

The past couple weeks (maybe months?) I’ve been struggling with analyzing some fungal ITS data that we have for our Edge Effects side project. No one in our lab really specializes in fungal barcoding (or fungal anything) so we became sheep and followed the mainstream path. We amplified the ITS region, between the small subunit and large subunits of RNA, which was to our knowledge the “chosen one” for fungal barcoding, using ITS1F and ITS2 primers. ITS appears to serves its purpose in terms of detailed classification (family/genus taxonomic levels) but it is definitely not a perfect barcode – for one ITS reads cannot be aligned (perhaps due to too much variation between reads, insertions, deletions, length variation, etc) which makes the reads useless by themselves for phylogenetic approaches.

Before this particular dataset fell into my hands, it was in Jenna’s and when issues with the ITS dataset arose, she turned to twitter for answers (part 1 and part 2). The conclusion – due to our desire for phylogenetic analysis it is highly likely that future fungal analysis will not be done using ITS as ultimately we care more about phylogeny than taxonomy.

That is great – but we still have our ITS dataset, what do we do with it?

I essentially did what they do here in this tutorial which I of course found after figuring out what to do from scratch. I used the UNITE ITS database to cluster my forward unmerged reads into OTUs in QIIME using UCLUST. I also used UCLUST to assign taxonomy (because it was the default option). I then did some basic filtering using and to remove singletons, mitochondria, chloroplasts and unassigned (at kingdom level) taxa. This is where things began to go wrong (if they weren’t already wrong to start with).

I summarized my biom table using biom summarize-table and I saw this:

Counts/Sample summary:
Min: 0.0
Max: 838.0
Median: 30.000
Mean: 100.512
Std. dev.: 201.292

What happened to all my sequences?? Better yet, are there even fungi on seagrass? Is what we are seeing the result of low fungal biomass????

Let the investigation begin. I decided to look at what my biom table looked like before I filtered out the unassigned reads. This is what I saw.

Min: 14.0
Max: 48889.0
Median: 2847.000
Mean: 6001.653
Std. dev.: 9287.627

Now, that looks a bit better… except that the “unassigned” reads could be anything (seagrass, jellyfish, bacteria, fungi, sponges, etc). Since we want to do a “fungal” analysis this just won’t do. So to investigate further, I downloaded NCBI’s nucleotide “nt” database. Approx ~4250 OTU’s in my dataset were classified as “Unassigned” so I pulled these out and locally blasted them against the “nt” database to get some idea of their taxonomy. What I found was that my “Unassigned” OTUs were seagrass, jellyfish, bacteria, sponges and lots and lots of uncultured fungi. Of my ~4250 OTU’s, ~3250 hit something in the “nt” database and ~700 of hit something with >70% identity over >70% of the query OTU length.  So there are obviously fungi (or fungi-like sequences) in my dataset that aren’t being identified using the method for taxonomic assignment I’ve been using (UCLUST & UNITE).

On a whim while writing this blog post about the dreary nature of ITS, I took a second look at the earlier mentioned tutorial. On the surface, it looks identical to what I did with my dataset (reassuring), but I then noticed they were using a mysterious parameter file. Perhaps this parameter file was filled with rainbows, pixie dust and unicorns that could solve all my fungal problems? To investigate further, I downloaded and took a peak at this mythical parameter file. Cue dramatic music. Low and behold, they are using the “blast” method for taxonomic assignment over UCLUST. So I thought what the heck, I’ll try anything at this point to make this fungal data usable, let’s give it a go. Of course (because this is how my life seems to be going recently) using the “blast” method of taxonomic assignment worked like magic. My new biom table summary (and this is after removing OTUs with “No blast hit”) looks like this:

Min: 7.0
Max: 47199.0
Median: 2344.000
Mean: 5441.093
Std. dev.: 9146.485

According to the log file, using the “blast” method 4717 sequences were inspected and only 1796 could not be identified. This is a huge improvement from before where ~4250 were “Unassigned”. I will note here, that upon investigating the blast assigned taxonomies, I do see a lot of unidentified fungi so this solution might not work for you if you care about specific taxonomy. I still have to analyze this new biom table which since I can’t use phylogenetic approaches will be its own hurdle, but at least I have enough truly “fungal” data to analyze now. Thinking back on all of my struggles, I am so incredibly angry that one silly QIIME parameter was what was keeping me from moving forward. Even before this I was wary of what the default QIIME script options meant for my data, but moving forward I’ll be even more vigilant in my choice of programs and parameters. This entire situation is equal parts ridiculous, embarrassing, frustrating and dumb luck. Perhaps, the craziest part is that had I not decided to write a blog post about my problems with ITS, I would never have found the solution to this particular problem. I can’t be the only one to ever have had this issue – is this some well kept mycologist secret method to ITS success? My hope is that by writing this blog post, I can save others from weeks (or months) of mental anguish over poor quality ITS taxonomic classification when the answer is hidden (or not so hidden) away in a silly parameter file.