rOpenSci | USGS App Contest

USGS App Contest

Many US federal agencies are now running app competitions to highlight their web services (see here), and hopefully get people to build cool stuff using government data (see Data.gov for more). See here for a nice list of the US government’s web services.

One of these agencies was the United States Geological Survey (USGS). They opened up an app competition and [we won best overall app! Check out our app called TaxaViewer here: http://glimmer.rstudio.com/ropensci/usgs_app/. We were directed to use one or more of their web services, including mashing up with other web services. Of the USGS web services, we only used ITIS, but included 4 web services from other providers.

The motivation behind our app

At rOpenSci, we want to create tools to facilitate open science. Making an app that replicates nearly exactly what you can do in R on your own machine, in a replicable fashion, is what we were aiming for. Although we could have made an app in a modern web framework like Rails or Django, we tried to implement an app appropriate for our target user base. Our TaxaViewer app is a web app, but we see it largely as a way to get users to see the power of using web APIs to do science. There is a lot of data out there - we want to show scientists that they can get that data as part of a reproducible science workflow - and it doesn’t have to be hard!

Tools we used to make the app

  • Shiny, from RStudio, is a framework to build web applications using R as the backend. It’s not just R code though - you can include CSS, Javascript, etc. This was the framework for our app - so yes, the backend is indeed R.
  • ITIS - Integrated Taxonomic Information Database API - The ITIS API provides access to a large data store of taxonomic names, synonyms, higher taxonomy, etc, and exposes a lot of different API methods. You can interact with the ITIS API via our R pacakge taxize - read more about taxize below.
  • Phylotastic TNRS API - This comes from the Phylotastic/iPlant, and allows you to query against a database of plant names, mammal names, and the NCBI names database.
  • GBIF - Global Biodiversity Information Facility API - You can interact with the GBIF API via our R pacakge gbif - read more about rgbif below.
  • Phylomatic API - The phylomatic API allows you to query a series of different master plant phylogenies. Essentially, you send a species list, and it searches for matches in a master tree, and then prunes the tree to only give you the phylogeny of the species you sent. This sort of functionality is only easily available for plants right now (thus only plant queries in our ‘Phylogeny’ tab in the app), but should be coming shortly to other taxonomic groups thanks to Phylotastic and Open Tree of Life.
  • GISD - Global Invasive Species Database - There is little available for thorough information available via web APIs for information on invasive status of species. However the GISD does have some information, which you can access via our R package taxize.

Future plans

the TaxaViewer app

We plan to improve upon the current form of the app in a few ways.

  • Improved stability and performance: We will host it on our own server soon, where we can control performance.
  • Improved interface: The interface could use some work, including the input box (it’s too small).
  • Downloads: We can easily allow folks to download data from any of the tabs in the app - something to add.
  • Autocomplete for the species names input box: Using typeahead.js is a super easy to implement tool for allowing users to have feedback on possible options. We’ll attempt to incorporate into our Shiny app.
  • Your ideas?

taxize/ITIS

Soon we will be integrating the ability to easily (with one parameter) use local SQL to query your local copy of the ITIS database instead of calling the ITIS web API. This should result in vastly faster queries, without leaving the comfort of your R command line (i.e., you don’t have to know SQL :)). You can preview these some of these changes in the sql branch of taxize here.

BISON

BISON is a service just released from the USGS, that serves up United States species occurrence data. They have an API too here. They are a node of GBIF, so there will be some overlap in data available between BISON and GBIF - perhaps there will be some data available in BISON that is not available in GBIF? BISON does serve up maps; I think GBIF only serves up the data itself. We may utilize BISON web services if we think people will use it through R.