rOpenSci | Databases

Databases

Work with Databases From R

Archive and Unarchive Databases Using Flat Files

Carl Boettiger
Description

Flat text files provide a robust, compressible, and portable way to store tables from databases. This package provides convenient functions for exporting tables from relational database connections into compressed text files and streaming those text files back into a database without requiring the whole table to fit in working memory.
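
The export-and-stream-back idea described above can be sketched in a language-neutral way. The following is a minimal illustration in Python using only the standard library and SQLite; it is not this package's API, and the names `archive_table`, `restore_table`, and `chunk_size` are hypothetical. It assumes a tab-separated, gzip-compressed file per table, and it restores columns without declared types, so values come back as text.

```python
import csv
import gzip
import sqlite3
from itertools import islice


def _quote(identifier):
    # Quote an SQL identifier so table/column names cannot break the statement.
    return '"' + identifier.replace('"', '""') + '"'


def archive_table(conn, table, path, chunk_size=1000):
    """Stream one table into a gzip-compressed TSV file, chunk_size rows at a time."""
    cursor = conn.execute(f"SELECT * FROM {_quote(table)}")
    header = [col[0] for col in cursor.description]
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        while True:
            # Only chunk_size rows are held in memory at any moment.
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)


def restore_table(conn, table, path, chunk_size=1000):
    """Stream a compressed TSV file back into a table, chunk_size rows at a time."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        columns = ", ".join(_quote(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        # Assumption: untyped columns are acceptable, so values are stored as text.
        conn.execute(f"CREATE TABLE IF NOT EXISTS {_quote(table)} ({columns})")
        insert = f"INSERT INTO {_quote(table)} ({columns}) VALUES ({placeholders})"
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            conn.executemany(insert, chunk)
    conn.commit()
```

Reading and writing in fixed-size chunks is what keeps memory use bounded regardless of table size; a production tool would also need to preserve column types and NULLs, which this sketch does not.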

dittodb
CRAN

A Test Environment for Database Requests

Jonathan Keane
Description

Testing and documenting code that communicates with remote databases can be painful. Although the interaction from R is usually relatively simple (e.g. data frames passed to and from a database), such code relies on a separate service and the data stored there, so tests can be difficult to set up, unsustainable in a continuous integration environment, or impossible without replicating an entire production cluster. This package addresses that by letting you record your database interactions and then play them back while testing (or in other contexts), all without needing to spin up or have access to the database your code would typically connect to.
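
The record-then-replay pattern described above can be illustrated with a small sketch. This is Python with the standard library only, not this package's API; the classes `RecordingConnection` and `ReplayConnection`, the `query` method, and the fixture file naming are all hypothetical. It assumes query results are JSON-serializable and that a fixture is identified by a hash of the SQL text and its parameters.

```python
import hashlib
import json
import os
import sqlite3


def _fixture_path(fixture_dir, sql, params):
    # Derive a stable file name from the query text and its parameters.
    key = json.dumps([sql, list(params)])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(fixture_dir, f"query-{digest}.json")


class RecordingConnection:
    """Runs queries against a live connection and saves each result as a fixture."""

    def __init__(self, conn, fixture_dir):
        self.conn = conn
        self.fixture_dir = fixture_dir
        os.makedirs(fixture_dir, exist_ok=True)

    def query(self, sql, params=()):
        rows = [list(row) for row in self.conn.execute(sql, params).fetchall()]
        path = _fixture_path(self.fixture_dir, sql, params)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(rows, handle)
        return rows


class ReplayConnection:
    """Answers queries from saved fixtures; no database is contacted."""

    def __init__(self, fixture_dir):
        self.fixture_dir = fixture_dir

    def query(self, sql, params=()):
        path = _fixture_path(self.fixture_dir, sql, params)
        if not os.path.exists(path):
            raise LookupError(f"no recorded fixture for query: {sql!r}")
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
```

In a test suite, code under test would receive a `ReplayConnection` in place of the live one, so the tests run without network access or credentials; a query that was never recorded fails loudly rather than silently returning nothing.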


NoSQL Database Connector

Scott Chamberlain
Description

Simplified document database manipulation and analysis, with support for many NoSQL databases: document databases (Elasticsearch, CouchDB, MongoDB), key-value databases (Redis), and (with limitations) SQLite/json1.

virtuoso
CRAN Peer-reviewed

Interface to Virtuoso using ODBC

Carl Boettiger
Description

Provides users with a simple and convenient mechanism to manage and query a Virtuoso database using the DBI (Database Interface) compatible ODBC (Open Database Connectivity) interface. Virtuoso is a high-performance “universal server” that can act as a relational database, supporting standard Structured Query Language (SQL) queries, while also supporting data that follows the Resource Description Framework (RDF) model for Linked Data. RDF data can be queried using SPARQL (SPARQL Protocol and RDF Query Language), a graph-based query language that supports semantic reasoning. This allows users to leverage the performance of local or remote Virtuoso servers using popular R packages such as DBI and dplyr, while also providing a high-performance solution for working with large RDF triplestores from R. The package also provides helper routines to install, launch, and manage a Virtuoso server locally on Mac, Windows, and Linux platforms, driving the standard interactive installers from the R command line. By automatically handling these setup steps, the package can make Virtuoso considerably faster and easier for most users to deploy in a local environment. Another key feature is bulk import of triples from common serializations with a single intuitive command. Bulk import can be tens to hundreds of times faster than comparable imports using existing R tools, including the rdflib and redland packages.


Client for the Comprehensive Knowledge Archive Network (CKAN) API

Scott Chamberlain
Description

Client for the CKAN API (https://ckan.org/). Includes interfaces to the CKAN APIs for searching, listing, and showing packages, organizations, and resources. In addition, provides an interface to the datastore API.

Scientific use cases
  1. White, L., & Santy, S. (2018). DataDepsGenerators.jl: making reusing data easy by automatically generating DataDeps.jl registration code. Journal of Open Source Software, 3(31), 921. https://doi.org/10.21105/joss.00921

General Purpose Interface to Elasticsearch

Scott Chamberlain
Description

Connect to Elasticsearch, a NoSQL database built on the Java Virtual Machine. Interacts with the Elasticsearch HTTP API (https://www.elastic.co/elasticsearch/), including functions for setting connection details to Elasticsearch instances, loading bulk data, and searching for documents with both HTTP query variables and JSON-based body requests. In addition, elastic provides functions for interacting with the APIs for indices, documents, nodes, and clusters, an interface to the cat API, and more.


Connector to CouchDB

Scott Chamberlain
Description

Provides an interface to the NoSQL database CouchDB (http://couchdb.apache.org). Methods are provided for managing databases within CouchDB, including creating, deleting, updating, and transferring them, and for managing documents within databases. One can connect to a local CouchDB instance or to remote CouchDB databases such as Cloudant. Documents can be inserted directly from vectors, lists, data.frames, and JSON. Targeted at CouchDB v2 or greater.

datastorr

Simple Data Versioning

Rich FitzJohn
Description

Simple data versioning using GitHub to store data.

Scientific use cases
  1. Falster, D. S., FitzJohn, R. G., Pennell, M. W., & Cornwell, W. K. (2019). Datastorr: a workflow and package for delivering successive versions of “evolving data” directly into R. GigaScience, 8(5). https://doi.org/10.1093/gigascience/giz035
sparqldsl
Staff maintained

SPARQL DSL Client

Scott Chamberlain
Description

SPARQL DSL Client.

MonetDBLite

In-Process Version of MonetDB

Hannes Mühleisen
Description

An in-process version of MonetDB, a SQL database designed for analytical tasks. Similar to SQLite, the database runs entirely inside the R shell.

rrlite
Peer-reviewed

R Bindings to rlite

Rich FitzJohn
Description

R bindings to rlite. rlite is a “self-contained, serverless, zero-configuration, transactional redis-compatible database engine. rlite is to Redis what SQLite is to SQL.”
