As far as I can tell, there isn't a tutorial on the internet about how to use the REDCap API. So here goes...
REDCap is an advanced web-based application for securely storing and retrieving tabular data. In simple terms, it can be thought of as a web-based spreadsheet, though it is much more than that. It provides an Application Programming Interface (API), which means external software can programmatically download data from and upload data into REDCap Projects. This tutorial assumes working knowledge of REDCap. When all else fails, please consult your site's API help page, which is here for Vanderbilt.
Because the API is based on simple HTTP requests, any programming language with an HTTP library can use the REDCap API. I'm going to demonstrate simple API usage in python using the wonderful requests library.
To use the REDCap API, you must know the following:
Every call to the REDCap API is an HTTP POST request with specific parameters in
the payload. The
token parameter is always required, as this tells the API from
which Project you're requesting a response. Next, the
content parameter is
used to declare the type of request you're making. Finally, you may want to set the
format field as well, as this tells the API in what format you want
the response. It defaults to returning a CSV string, but I generally prefer
getting json-formatted responses as that format can be
easily converted to actual in-memory objects like lists, dictionaries and strings.
So let's begin by making the simplest request: exporting the Project's Metadata (AKA Data Dictionary).
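Here's a minimal sketch of what such a request could look like. The function names are mine, and the URL and token in the usage comment are placeholders you'd swap for your own site's values. I pass the posting function in as an argument (normally requests.post) only so the sketch can be tried out without a network connection.

```python
def build_metadata_payload(token):
    """Return the POST payload for a metadata (Data Dictionary) export."""
    return {
        "token": token,         # identifies the Project; treat it like a password
        "content": "metadata",  # the type of request we're making
        "format": "json",       # response format (the API defaults to CSV)
    }


def export_metadata(post, api_url, token):
    """POST the payload and return the decoded JSON response.

    `post` is any callable with the same interface as requests.post.
    """
    response = post(api_url, data=build_metadata_payload(token))
    response.raise_for_status()  # raises on 4xx/5xx status codes
    return response.json()


# Typical use (needs the requests library and real credentials):
#   import requests
#   API_URL = "https://redcap.example.edu/api/"  # placeholder URL
#   API_TOKEN = "YOUR_API_TOKEN"                 # placeholder token
#   metadata = export_metadata(requests.post, API_URL, API_TOKEN)
```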
A few things to talk about here:
But just to be clear:
You've been warned. (If you do publicize them for whatever reason, don't fret. Just delete those tokens through the web app ASAP and request new tokens).
With that out of the way, the API accepted our request and returned data with a '200' status, which means "everything is peachy" in HTTP.
Now let's examine our metadata a bit. The
.json() method I'm going to use just
decodes the response (every language's JSON library will work a bit differently).
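To make the decoding step concrete, here's a rough sketch that works on an already-downloaded JSON string. The sample string is a made-up, heavily abbreviated stand-in for a real metadata response, and summarize_metadata is just a name I picked.

```python
import json


def summarize_metadata(metadata):
    """Build a short text summary from a decoded metadata list."""
    lines = [
        f"This project has {len(metadata)} fields",
        "field_name (type) ---> field_label",
        "-" * 27,
    ]
    for field in metadata:
        lines.append(
            f"{field['field_name']} ({field['field_type']}) ---> {field['field_label']}"
        )
    if metadata:
        # Every field carries the same keys, so the first one is representative.
        keys = ", ".join(sorted(metadata[0]))
        lines.append(f"Every field has these keys: {keys}")
    return "\n".join(lines)


# Abbreviated, invented sample; a real response has many more keys per field.
raw = '[{"field_name": "study_id", "field_type": "text", "field_label": "Study ID"}]'
metadata = json.loads(raw)  # a list of dicts
print(summarize_metadata(metadata))
```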
This project has 11 fields
field_name (type) ---> field_label
---------------------------
study_id (text) ---> Study ID
first_name (text) ---> First Name
last_name (text) ---> Last Name
dob (text) ---> Date of Birth
sex (dropdown) ---> Gender
address (notes) ---> Street, City, State, ZIP
phone_number (text) ---> Phone number
file (file) ---> File
foo_score (text) ---> Test score for Foo test
bar_score (text) ---> Test score for Bar test
image_path (text) ---> image_path
Every field has these keys: branching_logic, custom_alignment, field_label, field_name, field_note, field_type, form_name, identifier, matrix_group_name, question_number, required_field, section_header, select_choices_or_calculations, text_validation_max, text_validation_min, text_validation_type_or_show_slider_number
The returned json decodes to a list of
dict objects (python's name for hash
tables). We see that there are 11 fields in this project. We print out a mapping of
field_name (the "machine" name for a field) along with its type and
field_label (the human-readable description). Finally, I just print out
all of the keys from the first field so we can look at all of the data that
comes with each field.
For all intents and purposes, this data structure is what we get when we manually download the Data Dictionary from our project, just in a slightly more machine-readable format.
Here's the fun part. Just tweak the request payload a little and we'll download all of the data from our project:
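As a sketch, the tweak amounts to changing the content parameter. I've also included the type parameter (flat, one row per record) to be explicit, though you may not need it. As before, the posting function is passed in so this can be exercised offline; in real use it would be requests.post.

```python
def build_records_payload(token):
    """Return the POST payload for exporting every record in the Project."""
    return {
        "token": token,
        "content": "record",  # ask for records instead of metadata
        "format": "json",
        "type": "flat",       # one row per record
    }


def export_records(post, api_url, token):
    """POST the payload and return the records as a list of dicts."""
    response = post(api_url, data=build_records_payload(token))
    response.raise_for_status()  # raises on 4xx/5xx status codes
    return response.json()


# Typical use (needs the requests library and real credentials):
#   import requests
#   records = export_records(requests.post, API_URL, API_TOKEN)
#   print(f"This project has {len(records)} records")
```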
Voilà, we've just downloaded all of the data from our project. Let's examine it.
This project has 3 records
Each record has the following keys: phone_number, first_name, last_name, image_path, dob, demographics_complete, foo_score, sex, study_id, file, address, imaging_complete, testing_complete, bar_score.
But our metadata structure has the following fields: study_id, first_name, last_name, dob, sex, address, phone_number, file, foo_score, bar_score, image_path!
You'd be wrong to assume the fields we get from exporting the data match the
field_names from the metadata structure. This is because the REDCap API also
returns the status of all of the forms for a particular record. These fields are named
[form name]_complete where
[form name] is the lowercased &
underscore-replaced version of the form names you see in the web-application. (You
would be correct to assume the fields from an export are a superset of the
fields from the metadata structure.)
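If you want to predict the keys of an exported record, something like the sketch below gets you close. The naming rule here is my simplification (lowercase, spaces to underscores); REDCap may treat punctuation or other characters in form names differently, so check against a real export.

```python
def completion_field(form_label):
    """Guess the status field name for a form, e.g. 'Demographics'."""
    return form_label.strip().lower().replace(" ", "_") + "_complete"


def expected_export_keys(field_names, form_labels):
    """Metadata field names plus one [form name]_complete key per form."""
    return set(field_names) | {completion_field(label) for label in form_labels}


fields = ["study_id", "first_name", "foo_score"]
forms = ["Demographics", "Testing", "Imaging"]
keys = expected_export_keys(fields, forms)
print(sorted(keys))
print(set(fields) <= keys)  # exported keys are a superset of the metadata fields
```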
We can examine a particular record like so:
phone_number: (615) 555-1234
first_name: Billy Bob
last_name: blah blah
image_path: /path/to/image
dob: 2000-01-01
demographics_complete: 2
foo_score: 100
sex: 1
study_id: 1
file: [document]
address: 123 Main Street, Anytown USA 23456
imaging_complete: 2
testing_complete: 2
bar_score: 2
Pretty neat. Within the
payload that you send to the API, you can specify
parameters that will limit the response to just include specific records,
fields, forms, events (if your Project is longitudinal) and whether to get the
raw value or the human-readable label in multiple-choice fields. Experimenting with these calls is
left to the reader.
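As a starting point for that experimenting, here's a sketch of a payload builder. The parameter names (records, fields, forms, events, rawOrLabel) are the API's, but the indexed-key encoding of lists (records[0], records[1], ...) is my assumption about what your REDCap version accepts in a form-encoded POST, so confirm it on your site's API help page.

```python
def build_filtered_payload(token, records=None, fields=None, forms=None,
                           events=None, raw_or_label="raw"):
    """Return a record-export payload limited to the given subsets."""
    if raw_or_label not in ("raw", "label"):
        raise ValueError("raw_or_label must be 'raw' or 'label'")
    payload = {
        "token": token,
        "content": "record",
        "format": "json",
        "rawOrLabel": raw_or_label,
    }
    filters = (("records", records), ("fields", fields),
               ("forms", forms), ("events", events))
    for name, values in filters:
        # Assumed encoding: one indexed key per list item, e.g. records[0]=1
        for index, value in enumerate(values or []):
            payload[f"{name}[{index}]"] = value
    return payload


payload = build_filtered_payload(
    "YOUR_API_TOKEN",  # placeholder token
    records=["1", "2"],
    fields=["study_id", "sex"],
    raw_or_label="label",
)
print(payload)
```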
Even fancier than exporting current data from the Project is updating records through the API. This payload looks a little different, though. We've got to encode the data that we want to import and attach it to the payload.
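A rough sketch of building such a payload is below. The record values are invented for illustration, and the field names assume the example project used in this post.

```python
import json


def build_import_payload(token, records):
    """Return the POST payload for importing a list of record dicts."""
    return {
        "token": token,
        "content": "record",
        "format": "json",             # the format of the data we're sending
        "type": "flat",               # one row per record
        "data": json.dumps(records),  # the records, encoded as a JSON string
    }


# Invented example record; study_id identifies which record to update.
to_import = [{"study_id": "1", "first_name": "Billy Bob"}]
payload = build_import_payload("YOUR_API_TOKEN", to_import)  # placeholder token
print(payload["data"])

# Sending it works like any other call, e.g. with requests:
#   response = requests.post(API_URL, data=payload)
```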
We encoded the record as JSON, put it in the data field of the payload and made the request to the API.
The response contains a count. This number is how many records you imported. You can see here that we imported one record.
You might be wondering to yourself, how did the API know which record to update?
That information is specified in the
study_id field because
study_id is the
primary key of the Project, which is by definition the first field in the
metadata (take this opportunity to look back and see that
study_id was in fact
the first field).
Note, we formatted the incoming data as json because that was the format we
specified in the
format parameter of the payload. You could just as easily
import data formatted as CSV or XML if you change that parameter.
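For instance, a CSV version of the data string could be produced with python's csv module. This is only a sketch: it assumes every record has the same keys, and the example record is invented.

```python
import csv
import io


def records_to_csv(records):
    """Encode a non-empty list of record dicts as a CSV string."""
    fieldnames = list(records[0])  # column order follows the first record
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


to_import = [{"study_id": "1", "first_name": "Billy Bob"}]
payload = {
    "token": "YOUR_API_TOKEN",  # placeholder token
    "content": "record",
    "format": "csv",            # must match the encoding of the data field
    "data": records_to_csv(to_import),
}
print(payload["data"])
```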
Exporting and Importing data are the two most important methods of the API. You
can also download, upload and delete files stored in
file fields per record
but doing this is different for every HTTP library so I'll let you figure it out
for your programming language :)
That brings us to the end of how to use the REDCap API generally. I've implemented everything above in python, but you're free to use whatever language you like as long as it has an HTTP library.
That being said, python is a fantastic language with great libraries for high-level data manipulation like pandas, low-level data structures like NumPy, and scientific libraries like SciPy. Python is also very popular in web development communities, so there are web frameworks like Django and Flask in case you want to build websites or applications. If you need to do some advanced task, there probably exists a python package to help you on your way. It's a great platform to build all sorts of tools.
To make it easier to use the REDCap API from within python scripts and
applications, I wrote
PyCap. I'll assume a
Mac OS X or Linux environment, though all of this should work on Windows. It
assumes working knowledge of the shell and the python language.
First, we must install the package. In a shell:
$ pip install PyCap
If you don't have
pip installed, this will work (you really should though,
easy_install is considered deprecated by much of the python community):
$ easy_install PyCap
You may notice another package,
requests, is installed as well.
With installation out of the way, let's start writing python. We'll begin with
importing the package. The two main classes your scripts and applications should
use are the
Project class and the
(As long as this import doesn't fail, you installed PyCap correctly.)
Just like above, you'll need to know your API token and URL for your site.
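Here's a small sketch of creating a Project and listing its fields. The helper takes the project as an argument so it can be tried with any object that has a metadata attribute; the URL and token in the usage comment are placeholders.

```python
def describe_fields(project):
    """Return one 'field_name (type) ---> field_label' line per field."""
    return [
        f"{field['field_name']} ({field['field_type']}) ---> {field['field_label']}"
        for field in project.metadata  # PyCap stores the exported metadata here
    ]


# Typical use (needs PyCap installed and real credentials):
#   from redcap import Project
#   API_URL = "https://redcap.example.edu/api/"  # placeholder URL
#   API_TOKEN = "YOUR_API_TOKEN"                 # placeholder token
#   project = Project(API_URL, API_TOKEN)
#   print("\n".join(describe_fields(project)))
```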
study_id (text) ---> Study ID
first_name (text) ---> First Name
last_name (text) ---> Last Name
dob (text) ---> Date of Birth
sex (dropdown) ---> Gender
address (notes) ---> Street, City, State, ZIP
phone_number (text) ---> Phone number
file (file) ---> File
foo_score (text) ---> Test score for Foo test
bar_score (text) ---> Test score for Bar test
image_path (text) ---> image_path
When you create a
Project, PyCap automatically exports the metadata from your
project. First, it does so to setup a few nice attributes on the object but more
importantly, if the metadata request works correctly, the URL and token are
correct and can be trusted to work later on.
All of the methods the API provides are available. To demonstrate what we did above, consider the following:
In these 5 lines, we:
All of the HTTP request machinery, making sure the payloads are correct, and encoding and decoding the JSON responses are handled for you. I wrote PyCap because I think most people just want their data and shouldn't have to know HTTP to make it happen. Trust me, I made a lot of mistakes in building this library. You should use it so you don't have to waste your time.
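To give a feel for it, here's a sketch of an export-then-update round trip written against a PyCap Project. The helper name is mine, the field names assume the example project in this post, and I'm assuming import_records returns a dict with a count key, which is what I'd expect from its default settings.

```python
def update_first_name(project, study_id, new_first_name):
    """Change first_name on one record and return how many records were imported."""
    records = project.export_records()  # list of dicts, one per record
    to_import = [
        # Send only the primary key plus the field we want to change.
        {"study_id": record["study_id"], "first_name": new_first_name}
        for record in records
        if str(record["study_id"]) == str(study_id)
    ]
    if not to_import:
        raise KeyError(f"no record with study_id {study_id!r}")
    response = project.import_records(to_import)
    return response["count"]


# Typical use (needs PyCap installed and real credentials):
#   from redcap import Project
#   project = Project(API_URL, API_TOKEN)
#   update_first_name(project, "1", "William")
```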
I didn't really go through file actions above because every HTTP library is going to deal with files differently. If you use PyCap, file operations are super simple:
Just some data, you know.
data.txt
Obviously, the most important returned data is the file contents. In the
web-application, the filename you see for this particular record/field is what comes back in
headers['name']. So if you want to save it to your local hard
drive, it's easy to keep the same name.
Just FYI, if you download a stored PDF,
contents will be the binary data
string and you'll want to open the file in binary mode.
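Here's a sketch of saving an exported file under the name REDCap reports. It assumes export_file returns a (contents, headers) pair as described above; the helper name and the basename safety check are my own additions.

```python
import os


def save_exported_file(project, record, field, directory="."):
    """Export a stored file and write it to `directory` under its REDCap name."""
    contents, headers = project.export_file(record=record, field=field)
    # Keep only the final path component so a strange name can't escape directory.
    filename = os.path.basename(headers["name"])
    path = os.path.join(directory, filename)
    # Binary data (a PDF, say) must be written in binary mode.
    mode = "wb" if isinstance(contents, bytes) else "w"
    with open(path, mode) as handle:
        handle.write(contents)
    return path


# Typical use (needs PyCap installed and real credentials):
#   from redcap import Project
#   project = Project(API_URL, API_TOKEN)
#   save_exported_file(project, record="1", field="file")
```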
Let's say we want to upload a new file to that record. A little more complicated, but still pretty easy.
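A sketch of the upload is below, using PyCap's import_file with the record, the field, a file name and an open file object. The helper name and the path in the usage comment are placeholders of mine.

```python
import os


def upload_file(project, record, field, path):
    """Upload the file at `path` into a file field on the given record."""
    # Open in binary mode so any kind of file is sent unchanged.
    with open(path, "rb") as file_object:
        return project.import_file(
            record, field, os.path.basename(path), file_object
        )


# Typical use (needs PyCap installed and real credentials):
#   from redcap import Project
#   project = Project(API_URL, API_TOKEN)
#   upload_file(project, "1", "file", "/path/to/data.txt")  # placeholder path
```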
Yeah, I decided to change the contents of the file
And if you really want to delete a file from REDCap, that too is possible. Warning: there is no undo button for this :)
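Since there's no undo, I'd wrap the call in something that makes you say you mean it. This is just a sketch: delete_file is PyCap's method, while the confirm flag and the helper name are my own invention.

```python
def delete_stored_file(project, record, field, confirm=False):
    """Delete the file stored in `field` for `record`, but only if confirmed."""
    if not confirm:
        # Refuse by default because the deletion can't be reversed.
        raise ValueError("pass confirm=True to permanently delete the file")
    return project.delete_file(record, field)


# Typical use (needs PyCap installed and real credentials):
#   from redcap import Project
#   project = Project(API_URL, API_TOKEN)
#   delete_stored_file(project, "1", "file", confirm=True)
```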
There is more documentation for PyCap here.
Any feedback about this tutorial is greatly appreciated. There isn't much on the internet about this, so I hope you find it helpful in your work with REDCap. Feel free to open an issue on this post on GitHub.