rOpenSci | emldown - From machine readable EML metadata to a pretty documentation website

emldown - From machine readable EML metadata to a pretty documentation website

How do you get the maximum value out of a dataset? Data is most valuable when it can easily be shared, understood, and used by others. This requires some form of metadata that describes the data. While metadata can take many forms, the most useful metadata is that which follows a standardized specification. The Ecological Metadata Language (EML) is an example of such a specification originally developed for ecological datasets. EML describes what information should be included to describe the data, and what format that information should be represented in.

There are many benefits to writing metadata according to a specification like EML:

  • Understand data: thorough metadata allows you, your collaborators, and anyone else to learn more about what the data represents, when/where/how it was collected, and how it can be used.
  • Discover data: data repositories like the Knowledge Network for Biocomplexity use EML to allow users to search for datasets based on things like creator, location, date, or species.
  • Integrate data: with structured, machine-readable metadata describing exactly what the fields in a dataset represent, it is possible to combine multiple datasets, even if the datasets themselves have different structure.

Despite these benefits, it’s not always fun to write standardized metadata. Although there’s a very good package for helping you create the EML, rOpenSci’s EML package, documenting the data can be quite tedious. Furthermore, before you share the data on a public repository that enforces EML, the only prize you get is a happy conscience, which isn’t very tangible. In our unconf project, we created immediate gratification for EML users: a package that transforms the non-human readable EML file into a pretty documentation website for any dataset!

What’s EML, exactly, and why can’t humans read it?

As we mentioned above, EML is a metadata standard originally created for the ecologists. In practice it’s a set of XML schema documents, telling you what you need to document (e.g. the dataset creator, geographic coverage of the data, etc.) and how to build those documents (format of the XML). The EML R package provides tools to create and read EML without needing to learn about XML thanks to helper functions and extensive documentation. Although its name contains the word ecological, one can use EML for documenting datasets from any field. One of our team members uses it for documenting epidemiological datasets, because everything she wants to document is present in the standard.

After creating EML documentation for your data, you get an XML document that’s, well… not very human readable.

raw eml

In the EML package there’s a function called eml_view relying on the listviewer package to produce an interactive view of the XML in the Viewer pane of say, RStudio, which allows one to check some things quickly but which is far from being a user-friendly representation of the metadata. Our goal with emldown was to bridge this gap and provide a quick and easy way to take EML and convert it to a more user-friendly web page.

What does emldown do with your EML?

After installing the package from Github, you can apply it to your EML…

devtools::install_github("ropenscilabs/emldown")
library("emldown")
render_eml(path_to_my_eml)

and you get something like this!

emldown

This format is much more likely to make you and your collaborators happy because it’s more engaging and easier to explore to find useful information. Note how little effort you needed to invest into making it! Viewing your metadata in this way makes it easy to read, and easy to spot any errors.

The resulting website is based on Bootstrap and has some interactive components:

demo1

Geographic information turns into a map, made using leaflet:

demo2

Right now, we are able to capture some of the most common parts of Ecological Metadata Language, including the Title, Abstract, Authors, Keywords, Coverage (where in space and time the samples were taken), the Data Tables and Units associated with these. Over time we plan to add support for additional components of the EML specification.

demo3

You can see a live example of a website created with emldown here

How can you contribute?

Use the package to transform the EML you have lying around on your PC into a pretty website, and if you find a bug while doing so we’ll be happy to tackle it! Report any issue or feature request here, or feel free to contribute with code.

We’re very happy to have been able to create this working package in two days (and thankful for that opportunity, thanks rOpenSci!); and we not-so-secretly hope that it will contribute to making writing good metadata more attractive and therefore more common.