sburns.org

strong opinions,
loosely held

Intro to the REDCap API

As far as I can tell, there isn’t a tutorial on the internet about how to use the REDCap API. So here goes…

REDCap is an advanced web-based application for securely storing and retrieving tabular data. In simple terms, it can be thought of as a web-based spreadsheet, though it is much more than that. It provides an Application Programming Interface (API), which means external software can programmatically download data from and upload data into REDCap Projects. This tutorial assumes working knowledge of REDCap. When all else fails, please consult your site’s API help page, which for Vanderbilt is here.

Because the API is based on simple HTTP requests, any programming language with an HTTP library can use the REDCap API. I’m going to demonstrate simple API usage in python using the wonderful requests library.

To use the REDCap API, you must know the following:

The API URL for your site’s REDCap installation. For Vanderbilt, this URL is https://redcap.vanderbilt.edu/api/.
The API token for your Project. A token is generated by the REDCap administrators and connects your user account to a particular REDCap Project. Therefore, if you have API access to many Projects, you will have many tokens to manage.

Basic Usage

Every call to the REDCap API is an HTTP POST request with specific parameters in the payload. The token parameter is always required, as it tells the API which Project you’re requesting a response from. Next, the content parameter declares the type of request you’re making. Finally, you may want to include the format parameter, which tells the API what format you want the response in. According to the REDCap API documentation the default is XML, but I generally prefer getting JSON-formatted responses, as that format can be easily converted to actual in-memory objects like lists, dictionaries, strings, etc.

So let’s begin by making the simplest request: exporting the Project’s Metadata (AKA Data Dictionary).

from requests import post
# Two constants we'll use throughout
TOKEN = '8E66DB6844D58E990075AFB51658A002'
URL = 'https://redcap.vanderbilt.edu/api/'

payload = {'token': TOKEN, 'format': 'json', 'content': 'metadata'}

response = post(URL, data=payload)
print(response.status_code)
200

A few things to talk about here:

At least at Vanderbilt, don’t forget the trailing slash at the end of the API URL string. Your site may differ, but if you mess up the URL, nothing will work and you’ll probably get 501 “Method Not Implemented” responses.
Under no circumstances should you ever publicize your Project token. This is like publishing the password you use to log in to REDCap, which you would never do. In this instance, however, the token is from a dummy project I use for testing. There’s no real data and definitely no PHI in it, so I’m not super worried.

But just to be clear:

Under no circumstances should you publicize your project token(s)!

You’ve been warned. (If you do publicize them for whatever reason, don’t fret. Just delete those tokens through the web app ASAP and request new ones.)
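One way to keep a token out of your scripts (and out of version control) is to read it from an environment variable. This is only a sketch: the variable name REDCAP_API_TOKEN and the helper load_token are made up for this example and are not something REDCap or requests defines.

```python
import os


def load_token(env_var='REDCAP_API_TOKEN'):
    """Return the API token stored in an environment variable.

    The variable name is a hypothetical convention for this example.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError('Set the %s environment variable first' % env_var)
    return token
```

After setting the variable in your shell, TOKEN = load_token() can stand in for the hard-coded constant used in this post.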

With that out of the way, the API accepted our request and returned data with a ‘200’ status, which means “everything is peachy” in HTTP.

Now let’s examine our metadata a bit. The .json() method I’m going to use just decodes the response (every language’s JSON library will work a bit differently, though).

metadata = response.json()
print("This project has %d fields" % len(metadata))
print("")
print("field_name (type) ---> field_label")
print("---------------------------")
for field in metadata:
    print("%s (%s) ---> %s" % (field['field_name'], field['field_type'], field['field_label']))
print("")
print('Every field has these keys: %s' % ', '.join(sorted(metadata[0].keys())))

This project has 11 fields

field_name (type) ---> field_label
---------------------------
study_id (text) ---> Study ID
first_name (text) ---> First Name
last_name (text) ---> Last Name
dob (text) ---> Date of Birth
sex (dropdown) ---> Gender
address (notes) ---> Street, City, State, ZIP
phone_number (text) ---> Phone number
file (file) ---> File
foo_score (text) ---> Test score for Foo test
bar_score (text) ---> Test score for Bar test
image_path (text) ---> image_path

Every field has these keys: branching_logic, custom_alignment, field_label, field_name, field_note, field_type, form_name, identifier, matrix_group_name, question_number, required_field, section_header, select_choices_or_calculations, text_validation_max, text_validation_min, text_validation_type_or_show_slider_number

The returned JSON decodes to a list of dict objects (python’s name for hash tables). There are 11 fields in this project, and for each one we print the field_name (the “machine” name for a field) along with its type and the field_label (the human-readable description). Finally, I print all of the keys from the first field so we can see everything that comes with each field.

For all intents and purposes, this data structure is what we get when we manually download the Data Dictionary from our project, just in a slightly more machine-readable format.
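Since the decoded metadata is just a list of dicts, it’s easy to reshape. As a small sketch, here is one way to index it by field_name. The two sample entries are trimmed-down copies of fields shown above, and fields_by_name is a name I made up for the example.

```python
# Two trimmed-down entries standing in for the decoded metadata list
sample_metadata = [
    {'field_name': 'study_id', 'field_type': 'text', 'field_label': 'Study ID'},
    {'field_name': 'sex', 'field_type': 'dropdown', 'field_label': 'Gender'},
]


def fields_by_name(metadata):
    """Map each field_name to its metadata dict for quick lookups."""
    return dict((field['field_name'], field) for field in metadata)


lookup = fields_by_name(sample_metadata)
print(lookup['sex']['field_type'])  # prints: dropdown
```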

Data Export

Here’s the fun part. Just tweak the request payload a little and we’ll download all of the data from our project:

payload['content'] = 'record'
payload['type'] = 'flat' # we want each row to contain the entire record
response = post(URL, data=payload)
data = response.json()

Voilà, we’ve just downloaded all of the data from our project. Let’s examine it.

print("This project has %d records" % len(data))

print("Each record has the following keys: %s." % ', '.join(data[0].keys()))
print("")
print("But our metadata structure has the following fields: %s!" % ', '.join(f['field_name'] for f in metadata))
print("")

This project has 3 records
Each record has the following keys: phone_number, first_name, last_name, image_path, dob, demographics_complete, foo_score, sex, study_id, file, address, imaging_complete, testing_complete, bar_score.

But our metadata structure has the following fields: study_id, first_name, last_name, dob, sex, address, phone_number, file, foo_score, bar_score, image_path!

You’d be wrong to assume the fields we get from exporting the data match the field_names from the metadata structure. This is because the REDCap API also returns the status of every form for a particular record. These fields are always called [form name]_complete, where [form name] is the lowercased, underscore-separated version of the form name you see in the web application. (You would be correct to assume the fields from an export are a superset of the fields from the metadata structure.)
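To see the difference concretely, here is a rough sketch that picks out the exported keys that have no entry in the metadata. The helper name form_status_fields is my own, and the sample data just reuses the field names printed above with empty values.

```python
def form_status_fields(record, metadata):
    """Return the exported keys that are not defined in the metadata."""
    defined = set(field['field_name'] for field in metadata)
    return sorted(key for key in record if key not in defined)


# Names copied from the export output shown above
metadata_names = ['study_id', 'first_name', 'last_name', 'dob', 'sex', 'address',
                  'phone_number', 'file', 'foo_score', 'bar_score', 'image_path']
exported_keys = metadata_names + ['demographics_complete', 'imaging_complete',
                                  'testing_complete']

sample_metadata = [{'field_name': name} for name in metadata_names]
sample_record = dict.fromkeys(exported_keys, '')

extra = form_status_fields(sample_record, sample_metadata)
print(extra)
```

For this sample, the leftover keys are exactly the three [form name]_complete fields.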

We can examine a particular record like so:

record = data[0]
for field_name, value in record.items():
    print("%s: %s" % (field_name, value))

phone_number: (615) 555-1234
first_name: Billy Bob
last_name: blah blah
image_path: /path/to/image
dob: 2000-01-01
demographics_complete: 2
foo_score: 100
sex: 1
study_id: 1
file: [document]
address: 123 Main Street, Anytown USA 23456
imaging_complete: 2
testing_complete: 2
bar_score: 2

Pretty neat. Within the payload that you send to the API, you can specify parameters that limit the response to specific records, fields, forms, or events (if your Project is longitudinal), and that choose whether multiple-choice fields come back as raw values or human-readable labels. Experimenting with these calls is left to the reader.
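As a starting point for that experimenting, here is a hedged sketch of a payload that asks for only some records and fields. The parameter names (records, fields, rawOrLabel) are the ones I know from the REDCap API documentation, but how list values are encoded has differed between REDCap versions: the indexed keys used here (records[0], fields[0], ...) are an assumption, and some installations have accepted comma-separated strings instead. Check your site’s API help page. The helper name build_export_payload is made up for this example.

```python
def build_export_payload(token, records=None, fields=None, raw_or_label='raw'):
    """Build a record-export payload limited to some records and fields."""
    payload = {
        'token': token,
        'content': 'record',
        'format': 'json',
        'type': 'flat',
        'rawOrLabel': raw_or_label,  # 'raw' codes or 'label' text
    }
    # Assumption: list values are sent as indexed keys (records[0], records[1], ...)
    for key, values in (('records', records), ('fields', fields)):
        for i, value in enumerate(values or []):
            payload['%s[%d]' % (key, i)] = value
    return payload


filtered_payload = build_export_payload(
    'YOUR_TOKEN', records=['1'], fields=['study_id', 'foo_score'])
```

You would then send filtered_payload with post(URL, data=filtered_payload), just like the unfiltered export.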

Importing new data

Even fancier than exporting current data from the Project is updating records through the API. This payload looks a little different, though. We’ve got to encode the data that we want to import and attach it to the payload.

from json import dumps # the function we'll need to make a json-string of our new data

updated_record = data[0]
# Update a particular field
updated_record['foo_score'] = '100'

# We have to pass a list of records to the REDCap API, so we dump our new record inside a list,
# using compact separators for the JSON string
to_import_json = dumps([updated_record], separators=(',',':'))
payload['data'] = to_import_json

response = post(URL, data=payload)
print(response.json()['count'])

Real quickly:

We updated a field from the first record.
We made a json-formatted string of this data structure (after packing it into a list because that’s what the API wants).
We attached this data to the data field of the payload and made the request to the API.
By default when importing data, the API responds with a dict containing the key count, which is the number of records you imported. Here we imported one record.

You might be wondering to yourself, how did the API know which record to update? That information is specified in the study_id field because study_id is the primary key of the Project, which is by definition the first field in the metadata (take this opportunity to look back and see that study_id was in fact the first field).
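Because the primary key is simply the first field in the metadata, you can look it up instead of hard-coding study_id. A minimal sketch, with a made-up helper name (find_record) and made-up sample records:

```python
def find_record(records, metadata, record_id):
    """Return the record whose primary-key field equals record_id, or None."""
    primary_key = metadata[0]['field_name']  # the first field is the primary key
    for record in records:
        if record[primary_key] == record_id:
            return record
    return None


# Invented sample data, shaped like the export and metadata above
sample_metadata = [{'field_name': 'study_id'}, {'field_name': 'first_name'}]
sample_records = [
    {'study_id': '1', 'first_name': 'Billy Bob'},
    {'study_id': '2', 'first_name': 'Jane'},
]
match = find_record(sample_records, sample_metadata, '2')
```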

Note, we formatted the incoming data as json because that was the format we specified in the format parameter of the payload. You could just as easily import data formatted as CSV or XML if you change that parameter.
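For illustration, here is roughly what building a CSV import might look like using the standard csv module (written for Python 3). Treat it as a sketch: records_to_csv is a name I invented, 'YOUR_TOKEN' is a placeholder, and I haven’t verified this exact payload against a live server.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of record dicts into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_data = records_to_csv([{'study_id': '1', 'foo_score': '100'}],
                          ['study_id', 'foo_score'])

# Same import payload as before, but with format switched to csv
csv_payload = {'token': 'YOUR_TOKEN', 'content': 'record', 'format': 'csv',
               'type': 'flat', 'data': csv_data}
```

The header row carries the field names, so the primary key column (study_id here) still tells the API which record each row belongs to.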

Exporting and Importing data are the two most important methods of the API. You can also download, upload and delete files stored in file fields per record but doing this is different for every HTTP library so I’ll let you figure it out for your programming language :)

That brings us to the end of how to use the REDCap API generally. I’ve implemented everything above in python, but you’re free to use whatever language you like as long as it has an HTTP library.

That being said, python is a fantastic language with great libraries for high-level data manipulation like pandas, low-level data structures like NumPy, and scientific computing like SciPy. Python is also very popular in web development communities, so there are web frameworks like Django and Flask in case you want to build websites or applications. If you need to do some advanced task, there probably exists a python package to help you on your way. It’s a great platform to build all sorts of tools.

Using the REDCap API in Python Applications

To make it easier to use the REDCap API from within python scripts and applications, I wrote PyCap. I’ll assume a Mac OS X or Linux environment, though all of this should work on Windows. This section assumes working knowledge of the shell and the python language.

First, we must install the package. In a shell:

$ pip install PyCap

If you don’t have pip installed, this will work (though you really should install it, since easy_install is considered deprecated by much of the python community):

$ easy_install PyCap

You may notice another package, requests, is installed as well.

With installation out of the way, let’s start writing python. We’ll begin with importing the package. The two main classes your scripts and applications should use are the Project class and the RedcapError exception.

from redcap import Project, RedcapError

(As long as this import doesn’t fail, you installed PyCap correctly).

Connecting to REDCap Projects

Just like above, you’ll need to know your API token and URL for your site.

project = Project(URL, TOKEN)

for field in project.metadata:
    print("%s (%s) ---> %s" % (field['field_name'], field['field_type'], field['field_label']))

study_id (text) ---> Study ID
first_name (text) ---> First Name
last_name (text) ---> Last Name
dob (text) ---> Date of Birth
sex (dropdown) ---> Gender
address (notes) ---> Street, City, State, ZIP
phone_number (text) ---> Phone number
file (file) ---> File
foo_score (text) ---> Test score for Foo test
bar_score (text) ---> Test score for Bar test
image_path (text) ---> image_path

When you create a Project, PyCap automatically exports the metadata from your project. It does so partly to set up a few nice attributes on the object, but more importantly, if the metadata request works, you know the URL and token are correct and can be trusted to work later on.

All of the methods the API provides are available. To demonstrate what we did above, consider the following:

metadata = project.export_metadata()
data = project.export_records()
data[0]['first_name'] = 'Billy Bob'
response = project.import_records(data)
print(response['count'])

In these 5 lines, we:

Made an export metadata request (by default in JSON format) and automatically decoded the response.
Made a data export request (again, by default in JSON format) and got back the decoded data.
Tweaked a single field of the first record.
Imported the new data.
Printed how many records were imported.

All of the HTTP request machinery, making sure the payload is correct, and encoding and decoding the JSON is handled for you. I wrote PyCap because I think most people just want their data and shouldn’t have to know HTTP to make it happen. Trust me, I made a lot of mistakes in building this library. You should use it so you don’t have to waste your time.

File downloads/uploads/deletions

I didn’t really go through file actions above because every HTTP library is going to deal with files differently. If you use PyCap, file operations are super simple:

record = '1'
field = 'file'
contents, headers = project.export_file(record, field)
print(contents)
print(headers['name'])

Just some data, you know.
data.txt

Obviously, the most important piece of returned data is the file contents. In the web application, the filename you see for this particular record/field is what comes through in headers['name']. So if you want to save it to your local hard drive, it’s easy to keep the same name.

with open(headers['name'], 'wb') as f:
    f.write(contents)

Just FYI, wb mode works for text files too, and it’s what you need when you download something like a stored PDF, where contents is a binary data string.
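If you don’t know ahead of time whether a download is text or binary, one option is to always write bytes. The helper below is a sketch of that idea. save_download is a hypothetical name, and the fake PDF bytes simply stand in for whatever export_file hands back.

```python
import os
import tempfile


def save_download(contents, filename, directory):
    """Write downloaded file contents to directory/filename and return the path."""
    if not isinstance(contents, bytes):
        contents = contents.encode('utf-8')  # turn text into bytes so 'wb' always works
    # basename() drops any directory part that might be hiding in the filename
    path = os.path.join(directory, os.path.basename(filename))
    with open(path, 'wb') as f:
        f.write(contents)
    return path


# Fake PDF bytes standing in for real downloaded contents
saved_path = save_download(b'%PDF-1.4 fake', 'report.pdf', tempfile.mkdtemp())
```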

Let’s say we want to upload a new file to that record. A little more complicated, but still pretty easy.

# First write a new file
with open(headers['name'], 'w') as f:
    f.write('Yeah, I decided to change the contents of the file')

new_fname = 'new_data.txt'
with open(headers['name'], 'rb') as f:
    response = project.import_file(record, field, new_fname, f)

# just to check...
contents, headers = project.export_file(record, field)
print(contents)

Yeah, I decided to change the contents of the file

And if you really want to delete a file from REDCap, that too is possible. Warning: there is no undo button for this :)

response = project.delete_file(record, field)

There is more documentation for PyCap here.

Feedback/Questions/Comments

Any feedback about this tutorial is greatly appreciated. There isn’t much on the internet about this, so I hope you find it helpful in your work with REDCap. Feel free to open an issue on this post on GitHub.

July 22, 2013
