At last, all the Potomac River Seagrass samples have had their DNA extracted via MoBio Powersoil kits (reagents, tubes, and pipettes, oh my!). All the samples showed promising amounts of DNA going into PCR. Unfortunately, after doing the first 12 samples and finding only 4 that amplified, I’m a little discouraged. I still have the rest of the ~60 samples to go through, so hopefully there will be enough bacterial DNA to amplify in the rest of the samples to do some meaningful analyses. I suspect most of the DNA extracted was non-bacterial though, so it might be hard getting sequences from these plants.
We had a meeting today to update each other on our projects. Here are my brief notes from the meeting:
Seagrass meeting 2/27/15:
It’s bigger, it’s better, it’s Biogeography 2!
About a year ago I started an Intra-plant biogeography project. Limited in scope, this project’s primary aim was to determine how much variation there was in the microbial communities across a single plant in “high resolution.” The goal was to determine whether it mattered where our ZEN collaborators cut their samples from along the roots and leaves.
The general project was this: Cut a plant into about 50 strategically chosen pieces and look at the community variation across the surface.
We got some really interesting results which I presented in a poster at the 2014 Lake Arrowhead Microbial Genomics Conference.
One thing that always bothered me about these results was that they were for only one plant. I didn’t know if the cool patterns I was seeing were normal or a fluke. That’s where Biogeography 2 comes in: it’s a continuation of the first project but with more replicates (five, to be precise), all collected at the same time and from the same place. In the coming weeks I’ll be processing these samples and updating you on the progress.
This week’s update:
This week I was finally able to dissect (read: mutilate) the plants, and now we can begin extracting DNA from the samples. Here are some pictures of plants prior to dissection.
For a plant that withstands daily tidal forces, seagrass is surprisingly delicate when taken out of water. The plants crumble when they dry out, so I try to section them as fast as possible.
Sample preparation includes painstakingly disentangling these roots from each other and from the shoots without breaking them (about a two-hour process per plant).
No word from the QIIME forum about my problem, so I asked Twitter for help.
So far, everyone thinks it has something to do with my path, BUT 1) macqiime sets the PYTHONPATH variable, and 2) the package it’s looking for exists in both macqiime python and anaconda.
OK, so it’s fixed now. There may be another way to deal with it, but what I did was install ipython notebook into the macqiime python folder, using get-pip.py to install pip, and then pip install ipython[notebook], and then comment out the line in my .bash_profile that points to the anaconda version of ipython.
Marisano James actually did all of the work for me, so I asked him to summarize:
“When anaconda was installed, it added a path to its own ipython in the .bash_profile. Then, no matter what python was running, it would wind up using the anaconda version of ipython, which didn’t have the same settings as the system Python. I wound up renaming the anaconda folder (so it could no longer be found), and then commenting out the added line in ~/.bash_profile. Just commenting out the line in the ~/.bash_profile is sufficient, but I didn’t know anaconda’s ipython was being called until I effectively removed its folder. If you run into this problem, be sure to open a new terminal after commenting out the offending anaconda ipython line so it will be able to use the updated PATH.”
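To see the kind of shadowing Marisano describes, it can help to list every directory on the PATH that provides a given executable, in search order. Here is a minimal sketch; the function names and the throwaway directories in `demo()` are made up for illustration and only stand in for the anaconda and macqiime folders.

```python
import os
import stat
import tempfile


def find_on_path(name, path_string):
    """Return each directory in path_string that holds an executable
    called `name`, in the order the shell would search them."""
    hits = []
    for directory in path_string.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


def demo():
    """Build two hypothetical bin folders, each with its own 'ipython'."""
    root = tempfile.mkdtemp()
    fake_anaconda = os.path.join(root, "anaconda", "bin")
    fake_macqiime = os.path.join(root, "macqiime", "bin")
    for directory in (fake_anaconda, fake_macqiime):
        os.makedirs(directory)
        exe = os.path.join(directory, "ipython")
        with open(exe, "w") as handle:
            handle.write("#!/bin/sh\n")
        os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)
    # The anaconda folder is listed first, mimicking the added .bash_profile line.
    path_string = os.pathsep.join([fake_anaconda, fake_macqiime])
    return find_on_path("ipython", path_string), fake_anaconda


if __name__ == "__main__":
    # On a real machine: which folders offer an 'ipython', and which wins?
    print(find_on_path("ipython", os.environ.get("PATH", "")))
```

The first entry in the returned list is the one the shell will actually run, so if it is not the folder you expected, that is the line to look for in ~/.bash_profile.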
1. Downloaded the notebook from here:
2. immediate failure:
!validate_mapping_file.py -m $mapping_file
Traceback (most recent call last):
  File "/macqiime/QIIME/bin/validate_mapping_file.py", line 14, in <module>
    from qiime.util import parse_command_line_parameters, get_options_lookup,\
  File "/macqiime/lib/python2.7/site-packages/qiime/util.py", line 26, in <module>
    import gzip
  File "/Users/Jenna/anaconda/lib/python2.7/gzip.py", line 10, in <module>
    import io
  File "/Users/Jenna/anaconda/lib/python2.7/io.py", line 51, in <module>
    import _io
ImportError: dlopen(/Users/Jenna/anaconda/lib/python2.7/lib-dynload/_io.so, 2): Symbol not found: __PyInt_AsInt
  Referenced from: /Users/Jenna/anaconda/lib/python2.7/lib-dynload/_io.so
  Expected in: dynamic lookup
3. Because my computer is so shiny and new, I don’t have any Microsoft applications installed (yet?). I’m using “Numbers,” which I’ve never used before and which does not have tab-delimited format as an export option. I’m hoping that the problem was that I was trying to use a .csv file for my mapping file. So, I converted it:
perl -pi -e 's/,/\t/g' ZEN.csv
and tried again. Nope!
4. Tried googling this:
ImportError: dlopen Symbol not found
I cannot understand what I see there, but it does convince me that it’s probably an anaconda problem.
tried googling this:
Expected in: dynamic lookup anaconda
I felt pretty hopeful when the first hit was this:
but, that was an unresolved issue. I did notice that the issue might actually be with lib-dynload, whatever the hell that is.
So, googled this:
and ended up here:
something in there made it click that the path to python should be via macqiime, and you can see in the traceback that it starts off with the macqiime python, but then switches to the anaconda python. I always feel better about asking the community a question if I have some sense of what might be going on. So, I’ll post it on the QIIME forum now.
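The switch between Pythons can be pulled out of a traceback mechanically. This is a rough sketch, not anything QIIME provides: it takes the File lines from the traceback above and reports each distinct install prefix (the part before /lib/python). The function name is my own.

```python
import re

# File lines copied from the traceback earlier in this post.
TRACEBACK = """\
  File "/macqiime/QIIME/bin/validate_mapping_file.py", line 14, in <module>
  File "/macqiime/lib/python2.7/site-packages/qiime/util.py", line 26, in <module>
  File "/Users/Jenna/anaconda/lib/python2.7/gzip.py", line 10, in <module>
  File "/Users/Jenna/anaconda/lib/python2.7/io.py", line 51, in <module>
"""


def python_roots(traceback_text):
    """Return the distinct Python install prefixes seen in a traceback,
    in order of first appearance."""
    roots = []
    for path in re.findall(r'File "([^"]+)"', traceback_text):
        # Keep only paths that live inside a lib/pythonX.Y tree.
        match = re.match(r"(.+?)/lib/python\d", path)
        if match and match.group(1) not in roots:
            roots.append(match.group(1))
    return roots


if __name__ == "__main__":
    print(python_roots(TRACEBACK))  # ['/macqiime', '/Users/Jenna/anaconda']
```

More than one prefix in that list is the warning sign: modules from two different Python installations are being mixed in a single process.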
While I was writing the post, I thought to try to run the command from the command line instead of the notebook, and it worked, so that really helps narrow things down…
ALSO, while posting it, I realized that I haven’t updated to QIIME 1.9 yet. Blach.
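A side note on the .csv conversion from earlier in this post: swapping every comma for a tab also splits any field that contains a comma inside quotes. A slightly safer sketch uses Python’s csv module instead. The sample header and values below are invented for illustration and are not from the real ZEN mapping file.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-separated text,
    keeping quoted fields (which may contain commas) intact."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical two-column mapping file with a comma inside a quoted field.
    sample = '#SampleID,Description\nS1,"leaf, tip"\n'
    print(csv_to_tsv(sample))
```

With the blanket substitution, the quoted description would have been broken into two columns; here it stays as one.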
Yesterday, I installed macqiime, and today, I have just a few minutes to install IPython notebook.
Installation instructions are here:
pip install "ipython[notebook]"
This didn’t work because I don’t have pip. Instructions for installing pip:
It says that pip should be included with Python 2.7.9 and later, and (according to the handy qiime config output from yesterday) I know that I have 2.7.3.
It seems dumb to just install pip, because I’ll probably need to install lots of other things as well. I should probably upgrade to a later version of python, but in my experience, upgrading python breaks all sorts of things. However, I have nothing to lose, since this is a brand new machine, so I installed the Anaconda distribution of python.
…and now pip works and I have IPython notebook!
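For what it’s worth, the version rule behind all this is easy to check in code. A small sketch, assuming the documented cutoffs (pip is bundled starting with Python 2.7.9 on the 2.x line and 3.4 on the 3.x line); the function name is mine.

```python
import sys


def bundles_pip(version):
    """Return True if a (major, minor, micro) Python version is new
    enough to ship with pip by default."""
    major, minor, micro = version
    if major == 2:
        return (minor, micro) >= (7, 9)
    if major == 3:
        return minor >= 4
    return False


if __name__ == "__main__":
    print(bundles_pip((2, 7, 3)))               # the macqiime Python: False
    print(bundles_pip(sys.version_info[:3]))    # whatever is running this script
```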
Within the span of 1 week, I set up my new super-powerful Mac Pro, we got all of the ZEN sequence data back, and QIIME version 1.9 is live! I also posted my IPython notebook for a basic QIIME analysis: http://jennomics.github.io/QIIMEbyJennomics/
Quite a confluence of events…
Anyway… I’m christening my new machine with QIIME.
Notes on macqiime install:
1. I went through the installation instructions, including the optional add-ons with no glitches here:
2. I ignored AmpliconNoise because I do not use 454 data.
3. I could not get Topiary Explorer to work. At first, there was a problem with the security, but I figured out how to add exceptions, but then it still didn’t work, and the error message said: “Unable to launch application.” Then, I clicked on the Details button, and I think this describes the problem:
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file://topiaryexplorer1.0.jar
But, I’m not sure how to fix it, so I decided to move on and come back to Topiary Explorer when/if I need it.
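One possible reading of that error: with two slashes after file:, the text that follows is treated as a host name rather than a path, so the URI ends up with no path at all. The sketch below uses Python’s URL parser purely as an illustration (the error itself came from Java, whose parser may differ in details), and the three-slash path is a hypothetical location, not where the jar actually lives.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI and report which part the parser treats as host vs. path."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


if __name__ == "__main__":
    # The URI from the error message: the jar name lands in the host slot.
    print(describe("file://topiaryexplorer1.0.jar"))
    # A hypothetical absolute form: empty host, real path.
    print(describe("file:///hypothetical/dir/topiaryexplorer1.0.jar"))
```

I have not tested whether pointing the launcher at an absolute path actually fixes Topiary Explorer, so treat this as an explanation of the message rather than a fix.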
4. In the bit about installing R, I noticed this:
Please note that even if you installed R and these libraries previously for MacQIIME 1.8.0, you should still upgrade to/install the latest version of R, 3.1.2, and re-install all these R packages to get everything working.
And, that’s how I learned that QIIME 1.9 was out there. BUT, it doesn’t look like macqiime has been updated, so it installed QIIME 1.8 instead. Maybe that’s because it’s a “release candidate” at this point? Anyway, I’ll have to go back and update QIIME somehow. Macqiime appears to be working. See below for the output of print_qiime_config.py -t
Python version: 2.7.3 (default, Dec 19 2012, 09:12:08) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
Python executable: /macqiime/bin/python
PyCogent version: 1.5.3
NumPy version: 1.7.1
matplotlib version: 1.1.0
biom-format version: 1.3.1
qcli version: 0.1.0
QIIME library version: 1.8.0
QIIME script version: 1.8.0
PyNAST version (if installed): 1.2.2
Emperor version: 0.9.3
RDP Classifier version (if installed): rdp_classifier-2.2.jar
Java version (if installed): Not installed.
QIIME config values
FAIL: test_ampliconnoise_install (__main__.QIIMEDependencyFull)
AmpliconNoise install looks sane.
Traceback (most recent call last):
  File "/macqiime/QIIME/bin/print_qiime_config.py", line 392, in test_ampliconnoise_install
    "$PYRO_LOOKUP_FILE variable is not set. See %s for help." % url)
AssertionError: $PYRO_LOOKUP_FILE variable is not set. See http://qiime.org/install/install.html#ampliconnoise-install-notes for help.
Ran 35 tests in 0.456s
The past couple weeks (maybe months?) I’ve been struggling with analyzing some fungal ITS data that we have for our Edge Effects side project. No one in our lab really specializes in fungal barcoding (or fungal anything), so we became sheep and followed the mainstream path. We amplified the ITS region, which lies between the small-subunit and large-subunit rRNA genes and was to our knowledge the “chosen one” for fungal barcoding, using the ITS1F and ITS2 primers. ITS appears to serve its purpose in terms of detailed classification (family/genus taxonomic levels), but it is definitely not a perfect barcode. For one, ITS reads cannot be aligned (perhaps due to too much variation between reads: insertions, deletions, length variation, etc.), which makes the reads useless by themselves for phylogenetic approaches.
Before this particular dataset fell into my hands, it was in Jenna’s, and when issues with the ITS dataset arose, she turned to Twitter for answers (part 1 and part 2). The conclusion: because we want phylogenetic analyses, it is highly likely that future fungal work will not use ITS, as ultimately we care more about phylogeny than taxonomy.
That is great – but we still have our ITS dataset, what do we do with it?
I essentially did what they do in this tutorial, which I of course found only after figuring out what to do from scratch. I used the UNITE ITS database to cluster my forward unmerged reads into OTUs in QIIME using UCLUST. I also used UCLUST to assign taxonomy (because it was the default option). I then did some basic filtering using filter_taxa_from_otu_table.py and filter_otus_from_otu_table.py to remove singletons, mitochondria, chloroplasts and unassigned (at kingdom level) taxa. This is where things began to go wrong (if they weren’t already wrong to start with).
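To make the filtering logic concrete, here is a plain-Python sketch of the same idea. It is not how the QIIME scripts are implemented; the data layout (dictionaries keyed by OTU ID), the function name, and the toy OTUs are all assumptions for illustration.

```python
def filter_otus(table, taxonomy, min_total=2,
                organelles=("mitochondria", "chloroplast")):
    """Drop singletons, organelle OTUs, and OTUs unassigned at the top rank.

    table:    {otu_id: {sample_id: count}}
    taxonomy: {otu_id: "rank1; rank2; ..."} lineage strings
    """
    kept = {}
    for otu_id, counts in table.items():
        if sum(counts.values()) < min_total:
            continue  # singleton: seen only once across all samples
        lineage = taxonomy.get(otu_id, "Unassigned").lower()
        top_rank = lineage.split(";")[0].strip()
        if top_rank == "unassigned":
            continue  # no kingdom-level assignment
        if any(term in lineage for term in organelles):
            continue  # mitochondrial or chloroplast sequence
        kept[otu_id] = counts
    return kept


if __name__ == "__main__":
    # Toy table with made-up OTUs and counts.
    table = {
        "OTU1": {"S1": 10, "S2": 3},
        "OTU2": {"S1": 1, "S2": 0},
        "OTU3": {"S1": 50, "S2": 40},
        "OTU4": {"S1": 7, "S2": 2},
    }
    taxonomy = {
        "OTU1": "k__Fungi; p__Ascomycota",
        "OTU2": "k__Fungi; p__Basidiomycota",
        "OTU3": "Unassigned",
        "OTU4": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    }
    print(sorted(filter_otus(table, taxonomy)))
```

In a sketch like this it is easy to see how the last filter can swallow most of a dataset: everything hinges on how many OTUs the taxonomy step left as “Unassigned.”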
I summarized my biom table using biom summarize-table and I saw this:
Std. dev.: 201.292
What happened to all my sequences?? Better yet, are there even fungi on seagrass? Is what we are seeing the result of low fungal biomass????
Let the investigation begin. I decided to look at what my biom table looked like before I filtered out the unassigned reads. This is what I saw.
Std. dev.: 9287.627
Now, that looks a bit better… except that the “unassigned” reads could be anything (seagrass, jellyfish, bacteria, fungi, sponges, etc.). Since we want to do a “fungal” analysis, this just won’t do. So to investigate further, I downloaded NCBI’s nucleotide “nt” database. Approximately 4250 OTUs in my dataset were classified as “Unassigned,” so I pulled these out and BLASTed them locally against the “nt” database to get some idea of their taxonomy. What I found was that my “Unassigned” OTUs were seagrass, jellyfish, bacteria, sponges and lots and lots of uncultured fungi. Of my ~4250 OTUs, ~3250 hit something in the “nt” database and ~700 of those hit something with >70% identity over >70% of the query OTU length. So there are obviously fungi (or fungi-like sequences) in my dataset that aren’t being identified using the method for taxonomic assignment I’ve been using (UCLUST & UNITE).
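For anyone wanting to apply a similar identity and coverage cutoff, here is one way it could be sketched. This assumes BLAST tabular output with a custom column order of qseqid, sseqid, pident, qstart, qend, qlen; that column choice, the function name, and the example hits are my assumptions, not a record of the exact commands used here.

```python
def passing_queries(blast_lines, min_identity=70.0, min_coverage=70.0):
    """Return query IDs with at least one hit above both thresholds.

    Each line is tab-separated: qseqid, sseqid, pident, qstart, qend, qlen.
    Coverage is the aligned span of the query as a percent of its length.
    """
    passing = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        qseqid, sseqid, pident, qstart, qend, qlen = line.split("\t")
        span = abs(int(qend) - int(qstart)) + 1
        coverage = 100.0 * span / int(qlen)
        if float(pident) > min_identity and coverage > min_coverage:
            passing.add(qseqid)
    return passing


if __name__ == "__main__":
    # Made-up hits: OTU_a passes, OTU_b fails on coverage, OTU_c on identity.
    hits = [
        "OTU_a\tsubj1\t92.5\t1\t240\t250",
        "OTU_b\tsubj2\t99.0\t1\t100\t250",
        "OTU_c\tsubj3\t65.0\t1\t250\t250",
    ]
    print(sorted(passing_queries(hits)))
```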
On a whim while writing this blog post about the dreary nature of ITS, I took a second look at the earlier mentioned tutorial. On the surface, it looks identical to what I did with my dataset (reassuring), but I then noticed they were using a mysterious parameter file. Perhaps this parameter file was filled with rainbows, pixie dust and unicorns that could solve all my fungal problems? To investigate further, I downloaded and took a peek at this mythical parameter file. Cue dramatic music. Lo and behold, they are using the “blast” method for taxonomic assignment over UCLUST. So I thought what the heck, I’ll try anything at this point to make this fungal data usable, let’s give it a go. Of course (because this is how my life seems to be going recently) using the “blast” method of taxonomic assignment worked like magic. My new biom table summary (and this is after removing OTUs with “No blast hit”) looks like this:
Std. dev.: 9146.485
According to the log file, using the “blast” method 4717 sequences were inspected and only 1796 could not be identified. This is a huge improvement over before, when ~4250 were “Unassigned”. I will note here that, upon investigating the blast-assigned taxonomies, I do see a lot of unidentified fungi, so this solution might not work for you if you care about specific taxonomy. I still have to analyze this new biom table, which will be its own hurdle since I can’t use phylogenetic approaches, but at least I have enough truly “fungal” data to analyze now. Thinking back on all of my struggles, I am so incredibly angry that one silly QIIME parameter was what was keeping me from moving forward. Even before this I was wary of what the default QIIME script options meant for my data, but moving forward I’ll be even more vigilant in my choice of programs and parameters. This entire situation is equal parts ridiculous, embarrassing, frustrating and dumb luck. Perhaps the craziest part is that had I not decided to write a blog post about my problems with ITS, I would never have found the solution to this particular problem. I can’t be the only one ever to have had this issue. Is this some well-kept mycologist secret method to ITS success? My hope is that by writing this blog post, I can save others from weeks (or months) of mental anguish over poor-quality ITS taxonomic classification when the answer is hidden (or not so hidden) away in a silly parameter file.
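Since a single line in a parameter file made all the difference, it may be worth reading these files programmatically before a run. The sketch below assumes QIIME 1’s usual `script_name:option value` layout; the tutorial’s actual file is not reproduced here, so the one-line example is my guess at what the relevant entry looks like, and the function name is made up.

```python
def parse_params(text):
    """Parse QIIME-style parameter text into {script: {option: value}}.

    Blank lines and anything after a '#' are ignored.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        fields = line.split(None, 1)          # key, then optional value
        script, option = fields[0].split(":", 1)
        value = fields[1].strip() if len(fields) > 1 else ""
        params.setdefault(script, {})[option] = value
    return params


if __name__ == "__main__":
    # Assumed example entry, not copied from the tutorial's file.
    example = "# taxonomy settings\nassign_taxonomy:assignment_method blast\n"
    print(parse_params(example))
```

Printing the parsed settings next to the script defaults makes it much harder to miss a difference like uclust versus blast.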
Right now, we have a lot of Zostera marina microbiome samples from around the world. So, pairing that with the ZEN data, we should have a pretty nice ecological/biogeographical story, and hopefully we will soon have a postdoc to help us address questions about community assembly with those data.
Now, my attention is turning more earnestly towards the big evolutionary questions, and how to obtain the data we need to answer those. For the most part, because we are plugged in to a nice network of seagrass researchers, I don’t feel like getting all of the seagrass species that we’ll need is going to be too difficult. However, we are kinda lost when it comes to the fresh and brackish water and terrestrial relatives. I wouldn’t say that I’m panicking about it yet, but I am starting to feel like the right way to tackle the problem of collecting those samples is going to be to do it myself. In order to ask for help in collecting those samples, I’d have to:
1. Make a list of target species.
2. Find out their ranges and who is likely to have access to samples.
3. Contact the person/s who might be willing to go grab a sample for me.
4. Wait (hope) for that person to get back to me.
5. Send sampling supplies, and hope that they will be able to freeze samples for me (because the Zymo buffer is pretty sucky.)
6. Wait for the samples to be collected and returned to me.
The problem is that doing this for all of the target species could take forever. I have control over 1-3, but absolutely none over 4-6.
In the past, I’ve had success taking epic road trips and meeting researchers along the way who were willing to help me collect local species. So, I’m thinking that an approach like that might work here.
Here’s a list of the taxa of which I’d like to have representatives, and a tree of them below. I’ll start compiling range and contact information.
-Baidellia (or Baldellia)
The Seagrass Microbiome Project is looking for a Postdoc!
Postdoctoral Position in Microbial Ecology and Evolution
Jessica Green at the University of Oregon (http://pages.uoregon.edu/green/) is currently seeking a postdoctoral researcher to collaborate on the Seagrass Microbiome Project (http://seagrassmicrobiome.org). Applicants should have a Ph.D. in a biological, computational, mathematical, or statistical field and strong writing skills. The ideal candidate will have experience developing and applying models to understand the ecology, evolution, and/or function of complex systems. Experience in the analysis of environmental sequence data is highly desirable, but not required.
The successful candidate will have the opportunity to creatively and independently tackle one or more of the science questions outlined in the Seagrass Microbiome Project grant proposal (https://seagrassmicrobiome.org/2014-grant-proposal/), funded by the Gordon and Betty Moore Foundation. The successful candidate will interact regularly with team members Jonathan Eisen (http://phylogenomics.wordpress.com), Jay Stachowicz (http://www-eve.ucdavis.edu/stachowicz/stachowicz.shtml), and Jenna Lang (http://jennomics.com/) at the University of California, Davis through weekly tele-conferencing and also through regular visits to the UC Davis campus. At the University of Oregon, the candidate will benefit from ongoing microbiome research programs including the Microbial Ecology and Theory of Animals Center for Systems Biology (http://meta.uoregon.edu/) and the Biology and Built Environment Center (http://biobe.uoregon.edu/).
The position is available for 1 year with the possibility for renewal depending on performance. The start date is flexible. Please email questions regarding the position to Jessica Green (firstname.lastname@example.org).
A complete application will consist of the following materials:
(1) a brief cover letter explaining your background and career interests
(2) CV (including publications)
(3) names and contact information for three references
Submit materials to email@example.com. Subject: Posting 14431
To ensure consideration, please submit applications by March 10, 2015, but the position will remain open until filled.
Women and minorities encouraged to apply. We invite applications from qualified candidates who share our commitment to diversity.
The University of Oregon is an equal opportunity, affirmative action institution committed to cultural diversity and compliance with the ADA. The University encourages all qualified individuals to apply, and does not discriminate on the basis of any protected status, including veteran and disability status.