Welcome to the web home of the Stoltzfus research group at The Institute Formerly Known as CARB.

About Us

The Computational and Analytical Molecular Evolution Lab (camel) is the home of the Stoltzfus group at IBBR. We use our minds and our computers to address issues in evolutionary genetics, molecular evolution, and bioinformatics. We are especially interested in one particular scientific challenge, which is to understand the role of mutation as an evolutionary cause, and one particular technological (in some respects) challenge, which is to improve the interoperability of software and data resources used by evolutionary researchers.

Phylotastic - making the Tree of Life useful (really) for scientists

An analysis of the re-use of phylogenetic trees in the research literature (Stoltzfus, et al., in progress; see the story on it) showed some notable patterns. Trees aren't archived very often, and so they aren't re-used very often. We expected that.

But we found something we didn't expect: a case of re-use so frequent that, in a targeted sample of 40 phylogeny-related studies, we found 5 studies that use the same tree from the same source, "Phylomatic" (Webb & Donoghue). Phylomatic provides the APG (Angiosperm Phylogeny Group) "megatree" with ~100K nodes for flowering plants, and provides operations for pruning and grafting. The result is that users with a list of, e.g., 285 plant species can go to Phylomatic and get a species tree for their precise set of 285 species.
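To make the pruning idea concrete, here is a minimal sketch of cutting a large tree down to a user's species list. It is not Phylomatic's implementation. The nested-tuple tree representation, the function names `prune` and `to_newick`, and the six-genus toy "megatree" are all assumptions made for illustration.

```python
def prune(tree, keep):
    """Return the subtree induced by the taxa in `keep`, or None if none remain.

    A tree is either a leaf name (str) or a tuple of child subtrees.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    # Prune each child, dropping branches that contain no wanted taxa.
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Toy stand-in for a megatree; the topology is illustrative only.
megatree = ((("Rosa", "Malus"), "Quercus"),
            (("Solanum", "Nicotiana"), "Helianthus"))
wanted = {"Rosa", "Quercus", "Solanum"}

print(to_newick(prune(megatree, wanted)) + ";")  # ((Rosa,Quercus),Solanum);
```

The only design choice worth noting is that internal nodes left with a single child are collapsed, so the result contains exactly the requested taxa and no leftover unary nodes.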

Best practices for scientific programmers - top ten

Today I'm teaching a session on "Best Practices" in a "Programming for Biologists" course.  My course materials are online (feedback from other instructors is welcome).  I'll start out with my "top ten" list:

  • Interface, interface, interface
  • Modularize
  • Write code to be understood
  • Write tests and trap errors
  • Stamp your output
  • Use revision control
  • Make use of prior art
  • Create an installable package
  • Make your project open source
  • Set up a project management infrastructure

The first 3 are universally important, not just for scientists. For scientists without formal training, I would stress the practice of designing interfaces before writing the "guts" of the code. To stress the importance of interfaces, we have an exercise to write the skeleton of a script that has only 1 line of bioinformatics (going to NCBI to get something), and all the rest is interface, including command-line options, a help message, internal documentation, and output translation.
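A skeleton in the spirit of that exercise might look like the sketch below. It is not the actual course material. The option names, the "tab" output format, and the example accession are assumptions, and the single bioinformatics step is assumed to be a FASTA request to NCBI's E-utilities `efetch` service.

```python
#!/usr/bin/env python3
"""Fetch one sequence record from NCBI and print it in the requested format.

Almost everything here is interface: options, help text, documentation,
and output translation. Only fetch_fasta() does any bioinformatics.
"""
import argparse
import sys
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint; the defaults below are illustrative.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_parser():
    """Define the command-line interface, including the help message."""
    parser = argparse.ArgumentParser(
        description="Fetch a sequence record from NCBI by accession.")
    parser.add_argument("accession", nargs="?",
                        help="NCBI accession, e.g. NM_000518")
    parser.add_argument("-d", "--db", default="nucleotide",
                        help="NCBI database to query (default: %(default)s)")
    parser.add_argument("-f", "--format", choices=["fasta", "tab"],
                        default="fasta",
                        help="output format (default: %(default)s)")
    return parser


def fetch_fasta(accession, db):
    """The one line of bioinformatics: ask NCBI for a FASTA record."""
    query = urllib.parse.urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"})
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def translate_output(fasta_text, fmt):
    """Translate a FASTA record into the requested output format."""
    if fmt == "fasta":
        return fasta_text.strip()
    header, *seq_lines = fasta_text.strip().splitlines()
    # "tab" format: identifier, a tab, then the sequence on one line.
    return header.lstrip(">").split()[0] + "\t" + "".join(seq_lines)


def main(argv=None):
    """Parse options, fetch the record, and print it; return an exit status."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.accession is None:
        parser.print_help()
        return 0
    try:
        record = fetch_fasta(args.accession, args.db)
    except OSError as err:  # urllib's URLError is a subclass of OSError
        print(f"error: could not reach NCBI: {err}", file=sys.stderr)
        return 1
    if not record.startswith(">"):
        print("error: NCBI did not return a FASTA record", file=sys.stderr)
        return 1
    print(translate_output(record, args.format))
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)
```

Saved under a hypothetical name such as `get_seq.py`, the script would be run as `python get_seq.py NM_000518 --format tab`, and `python get_seq.py --help` prints the usage message that argparse builds from the option definitions. The parser and the output translation can be written and tested before the fetch works at all, which is the point of designing the interface first.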

To prioritize the others, we have to take into account the typical conditions of scientific programming.  In my experience, the typical scientific software product:  
