The Denoise Benchmark (DNB) for task-based fMRI

Questions, comments? E-mail Kendrick Kay

News

2013/12/17 - Paper in Frontiers published (see below).
2013/07/03 - Release of version 1.0.

History of major changes

2013/07/03 - Version 1.0.

Introduction

The Denoise Benchmark (DNB) is an architecture for testing and comparing denoising methods for task-based fMRI. The performance metric is cross-validation accuracy, whereby a denoising method is evaluated according to how accurately its estimate of task-related responses predicts held-out data. The underlying idea is that if a denoising method genuinely removes (or accounts for) noise in the data, then the estimate of the signal (i.e. the task-related component of the data that we are interested in) should be more accurate. DNB is written in MATLAB and consists of three main components:

fMRI data (21 datasets available)
Code framework for automatic evaluation of denoising methods
Implementations of several denoising methods

The DNB is described in the following paper:

Kay, K.N., Rokem, A., Winawer, J., Dougherty, R.F., & Wandell, B.A. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Frontiers in Neuroscience (2013).

In this paper, we present a denoising method, GLMdenoise, and use DNB to evaluate GLMdenoise and other denoising methods. If you use DNB in your research, please cite the above paper.

If you have data that you would like to add to the DNB, please contact me. Also, if you have a denoising method that you would like to evaluate using the DNB, you can either try it yourself using the materials available here, or I would be happy to take the method and run it through my existing setup.

Getting started

Download code (latest tagged version 1.01)
Download code (latest development version)
Clone code from GitHub

After downloading and unzipping the files, launch MATLAB and change to the directory containing the files. You can then run the example scripts by typing "example1" or "example2". More information on the example scripts is provided below.

Example scripts

The following are links to HTML pages produced by the MATLAB 'publish' command.

Denoise Benchmark API

The DNB requires that different denoising methods conform to a common API (Application Programming Interface). Any denoising method that conforms to this API can be automatically evaluated in the DNB.

A denoising method is defined to be a function that accepts three inputs:

A dataset struct containing the fMRI data and associated information (see Format of the data).
A directory location to which the function is allowed to write figures and/or output. Omit the trailing slash.
A column vector of dimensions time x 1 with an HRF estimate. It is up to the denoising method to decide whether to use this HRF estimate.

The function should return a model, which is defined to be a function that accepts a single input:

A cell vector of design matrices that are each time x conditions. The number of conditions should be the same across design matrices, but the number of time points may differ across design matrices.

This function should return a cell vector of matrices with the predicted time-series. Each matrix should be X x Y x Z x time (with the number of time points matching the corresponding design matrix).

All of the denoising methods provided in the DNB (i.e. DNBmethod_*.m) conform to the API described above.
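
As an illustration of this API, the following is a minimal sketch of a method that performs no denoising at all: it convolves each design matrix with the supplied HRF, fits ordinary least-squares beta weights, and returns a model that predicts time-series for new design matrices. Everything in it is hypothetical and not part of the DNB: the names (convolve, tomatrix, voldims, predictruns, makemodel, method) are made up for the example, the dataset is synthetic, the HRF is assumed to be sampled at the TR, and baseline and drift terms are ignored for brevity. Only the field names 'data' and 'design' are taken from the description above.

```matlab
% Hypothetical sketch of a method conforming to the DNB API (not part of the DNB).
% Assumptions: the HRF is sampled at the TR; no baseline or drift terms are modeled.

% Helper function handles (hypothetical names).
convolve    = @(design,hrf) cellfun(@(d) filter(hrf,1,full(d)),design,'UniformOutput',false);
tomatrix    = @(data) cellfun(@(v) reshape(double(v),[],size(v,4))',data,'UniformOutput',false);
voldims     = @(v) [size(v,1) size(v,2) size(v,3)];   % keeps a singleton Z dimension
fitbetas    = @(X,Y) cat(1,X{:}) \ cat(1,Y{:});       % conditions x voxels
predictruns = @(design,hrf,betas,dims) cellfun( ...
                @(d) reshape((filter(hrf,1,full(d))*betas)',[dims size(d,1)]), ...
                design,'UniformOutput',false);
makemodel   = @(betas,hrf,dims) @(design) predictruns(design,hrf,betas,dims);

% The method: three inputs in, a model (function handle) out.
% The beta weights are fitted once, when the method is called.
% figuredir is accepted but unused in this sketch.
method = @(dataset,figuredir,hrf) makemodel( ...
  fitbetas(convolve(dataset.design,hrf),tomatrix(dataset.data)), ...
  hrf,voldims(dataset.data{1}));

% Synthetic stand-in for a dataset: 2 runs, 2 x 2 x 1 voxels, 40 time points, 2 conditions.
rng(0);
hrf = [0; 0.5; 1; 0.5; 0];
design = zeros(40,2);
design(5:10:end,1)  = 1;                          % onsets of condition 1
design(10:10:end,2) = 1;                          % onsets of condition 2
truebetas = [1 2 3 4; 4 3 2 1];                   % conditions x voxels
signal = filter(hrf,1,design)*truebetas;          % time x voxels
ts1 = signal + 0.01*randn(40,4);
ts2 = signal + 0.01*randn(40,4);
dataset = struct();
dataset.design = {design design};
dataset.data   = {reshape(ts1',[2 2 1 40]) reshape(ts2',[2 2 1 40])};

% Usage: obtain the model, then predict time-series for a set of design matrices.
model = method(dataset,'figures',hrf);
pred  = model(dataset.design);                    % cell vector, each X x Y x Z x time
```

In practice, the same logic would more naturally live in a function file with the same three-input signature, alongside the provided DNBmethod_*.m files.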

Format of the data (datasetNN.mat)

The fMRI data provided in the DNB comprise 21 independent datasets. For convenience, the data have already undergone pre-processing, which consists of the following steps: removal of the first five volumes of each run (to avoid initial magnetization effects), slice time correction (sinc interpolation), motion correction (rigid-body alignment as estimated by SPM), and spatial undistortion based on fieldmap measurements (custom code). The raw fMRI data are available upon request.

The 'datasetNN.mat' file contains the data corresponding to dataset NN (where NN ranges from 01 to 21). The following variables are contained in each dataset file:

'data' - This is the fMRI data. The dimensions of the data for one run are X x Y x Z x time, and the data for multiple runs are given as a cell vector like { data1 data2 data3 ... }. Voxels for which a complete set of data is not available (due to head motion) have been set such that their time-series consist of all 0s.
'design' - This is the experimental design. The dimensions of the design for one run are time x conditions, and the design for multiple runs is given as a cell vector like { design1 design2 design3 ... }. In each design matrix, values are either 0 or 1, where 1s indicate the onset of a given condition.
'stimdur' - The duration of a condition in seconds (e.g. 3). (In each dataset, all conditions have the same duration.)
'tr' - The sampling interval of the data (i.e. the TR) in seconds (e.g. 2).
'voxelsize' - A 3-element vector with the voxel size in millimeters.
'meanvol' - A matrix of dimensions X x Y x Z containing the mean across all volumes in 'data'.
'brainmask' - A matrix of dimensions X x Y x Z containing a binary brain mask generated by FSL's BET utility.
'motionparameters' - These are the motion parameter estimates obtained in the motion correction procedure. The dimensions of the estimates for one run are time x 6 (three columns for the translation parameters, three columns for the rotation parameters), and the estimates for multiple runs are given as a cell vector like { motion1 motion2 motion3 ... }.
'runsets' - A vector of dimensions 1 x runs consisting of positive integers. Runs labeled with the same number comprise a run set (i.e., a group of runs such that all conditions in the experiment are presented at least once). The cross-validation procedure used in the DNB involves leave-one-run-set-out cross-validation.
'runtypes' - A vector of dimensions 1 x runs consisting of positive integers. Runs labeled with the same number are of the same type (i.e., the same set of conditions is presented in each of these runs). Having this information may be useful for balancing a bootstrapping procedure (indeed this is what occurs in GLMdenoise).
'hrf' - This contains HRF estimates. On each cross-validation iteration, the DNB passes an HRF estimate to the denoising method being evaluated, and the method has the option to make use of the estimate (see DNBevaluatemethod.m for details). The dimensions of the HRF estimates are time x cross-validation iterations. The HRF estimates were obtained using the GLMdenoise method. The 'hrf' field is useful because it lets us ensure that all of the denoising methods provided in the DNB repository make use of the same HRF estimate, so that any differences in cross-validation performance across methods reflect differences in the accuracy of the beta weight estimates.
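
To make the role of 'runsets' concrete, here is a small sketch of how leave-one-run-set-out splits can be derived from such a vector. The labels below are hypothetical, and the loop is only an illustration of the idea; the actual procedure is the one implemented in DNBevaluatemethod.m.

```matlab
% Hypothetical 'runsets' vector for six runs grouped into three run sets.
runsets = [1 1 2 2 3 3];

sets = unique(runsets);
trainruns = cell(1,numel(sets));
testruns  = cell(1,numel(sets));
for k = 1:numel(sets)
  testruns{k}  = find(runsets == sets(k));   % the held-out run set
  trainruns{k} = find(runsets ~= sets(k));   % all remaining runs
  fprintf('iteration %d: train on runs %s, test on runs %s\n', ...
          k,mat2str(trainruns{k}),mat2str(testruns{k}));
end
```

With these labels, the first iteration trains on runs 3 to 6 and tests on runs 1 and 2, and every run is held out exactly once.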

In some datasets, physiological data were collected (cardiac and respiratory monitoring). For these datasets (namely, 14–21), the following additional variables are available in the dataset file:

'dataRETRO1' - This has the same format as 'data' except that RETROICOR regressors (8 regressors) have been regressed out from the data prior to subsequent pre-processing (slice time correction, motion correction, spatial undistortion).
'dataRETRO2' - This has the same format as 'data' except that RETROICOR regressors (8 regressors) and RVHRCOR regressors (4 regressors) have been regressed out from the data prior to subsequent pre-processing (slice time correction, motion correction, spatial undistortion).

Note that in the regression procedure, the physiological regressors are orthogonalized with respect to constant, linear, and quadratic terms before they are projected out from each run.
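
The orthogonalization step can be sketched as follows for a single run and a single slice. This is an illustration on synthetic data, not the DNB code: the variable names are hypothetical, and the particular polynomial basis (powers of a time axis running from -1 to 1) is an assumption.

```matlab
% Sketch: orthogonalize nuisance regressors with respect to polynomial terms,
% then project them out of the data (synthetic data, hypothetical names).
rng(0);
n = 100;                                   % number of time points in the run
t = linspace(-1,1,n)';
P = [ones(n,1) t t.^2];                    % constant, linear, and quadratic terms
R = randn(n,8);                            % stand-in for 8 physiological regressors
Y = randn(n,5) + repmat(100 + 3*t,1,5);    % time x voxels, with baseline and drift

% Remove from the regressors whatever the polynomial terms can explain.
Rorth = R - P*(P\R);

% Project the orthogonalized regressors out of the data.
Yclean = Y - Rorth*(Rorth\Y);
```

Because the regressors are orthogonalized first, the projection leaves the constant, linear, and quadratic components of the data untouched.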

Format of the physiological data (datasetNN_physio.tar)

The RETROICOR and RVHRCOR denoising methods have already been applied to the fMRI data provided in the DNB (see description above). However, we provide the raw physiological data and the raw fMRI data for those wishing to try their own methods based on physiological data.

The 'datasetNN_physio.tar' file, after untarring, contains the following files:

physio - A directory with the raw physiological data
physio.mat - A .mat file with 'regressors' which contains the RETROICOR and RVHRCOR regressors. The dimensions for one run are time x regressors x slices, and the regressors for multiple runs are embedded in a cell vector.
rawdata.mat - A .mat file with 'data' which contains the raw fMRI data (no pre-processing has been applied).

Results of the Denoise Benchmark

We used the Denoise Benchmark (DNB) to evaluate a number of different denoising methods. The implementation of these methods is provided in the DNB repository (i.e., DNBmethod_*.m).

The results of the evaluation are provided by the DNBresults/results.mat file, and figures that illustrate these results are provided by the DNBresults/figures.tar file. Both of these files can be downloaded using DNBdownloadresults.m.

The following variables are contained in the 'results.mat' file:

'allR2' - This is a cell matrix of dimensions datasets x methods. Each element contains cross-validated R² values achieved by a given method on a given dataset. Some entries are [] because the RETROICOR and RVHRCOR methods cannot be run on datasets that lack accompanying physiological data.
'denoisemethods' - This is a cell vector of strings that indicate the names of different denoising methods. For example, the string 'GLMomnibus' is valid because there is a function called DNBmethod_GLMomnibus.m in the DNB repository.
'denoisenames' - This is a cell vector of strings with human-readable names for the denoising methods.
'meanvols' - This is a cell vector of dimensions 1 x datasets with the mean across the volumes in each dataset (see Format of the data).
'brainmasks' - This is a cell vector of dimensions 1 x datasets with a binary brain mask for each dataset (see Format of the data). The masks are used in voxel selection (see below).
'voxelselections' - This is a cell vector of dimensions 1 x datasets. Each element is a binary mask indicating the voxels used to compare the performance of different denoising methods. A voxel is included in the mask if it (1) is in the brain mask (see 'brainmasks'), (2) has a cross-validated R² greater than 0% under any of the denoising methods after spatial smoothing of the R² maps (this is useful for diminishing the impact of "speckles"), and (3) has a cross-validated R² greater than 0% under any of the denoising methods being compared.
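
The voxel selection criteria and the per-method summary can be sketched as follows for a single dataset. This is an illustration under stated assumptions rather than the code used to produce results.mat: the R² maps are synthetic, each map is assumed to be an X x Y x Z volume in percent, and the 3 x 3 x 3 box kernel used for smoothing is a placeholder, since the smoothing kernel is not specified here.

```matlab
% Sketch of voxel selection for one dataset (synthetic data, hypothetical names).
rng(0);
dims = [8 8 4];
brainmask = false(dims);
brainmask(2:7,2:7,:) = true;                     % stand-in for a brain mask

% Stand-ins for one row of 'allR2': one R2 map per method; [] marks a method
% that could not be run on this dataset.
R2 = {10*randn(dims), 10*randn(dims) + 5, []};

kernel = ones(3,3,3)/27;                         % assumed smoothing kernel
anypositive         = false(dims);
anypositivesmoothed = false(dims);
for m = 1:numel(R2)
  if isempty(R2{m})
    continue;                                    % skip methods without results
  end
  anypositive         = anypositive | (R2{m} > 0);
  anypositivesmoothed = anypositivesmoothed | (convn(R2{m},kernel,'same') > 0);
end

% Criteria (1), (2), and (3) combined.
voxelselection = brainmask & anypositivesmoothed & anypositive;

% Median cross-validated R2 over the selected voxels, per method (NaN if not run).
medians = nan(1,numel(R2));
for m = 1:numel(R2)
  if ~isempty(R2{m})
    medians(m) = median(R2{m}(voxelselection));
  end
end
```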

The 'figures.tar' file, after untarring, contains the following files:

datasetNN_brainmask.png - the binary brain mask
datasetNN_meanvol.png - the mean volume
datasetNN_voxelselection.png - the voxels selected for comparing performance levels
datasetNN_GLM*.png - cross-validated R² for different denoising methods
scatter/*.png - scatter plots comparing cross-validated R² values for pairs of denoising methods
summary*.png - bar charts showing the median cross-validated R² achieved by each method on each dataset

Code to generate the figures is provided in the example2.m script (see Example scripts). The figures generated by example2.m are written to the DNB directory and will not interfere with the figures contained in figures.tar, since that file resides in the DNBresults directory.

Implementation notes

The DNB includes implementations of several denoising methods. Here are some notes regarding these implementations:

Several methods call GLMdenoise. In these calls, we set the 'seed' option to 0 so that all of the methods use the same random number generation seed. As a result, the bootstrap sampling of runs is identical across methods, which minimizes variability in results due to the bootstrap sampling. We also set the 'wantpercentbold' option to 0 so that units are unchanged; this is necessary for predicting the held-out data (which are in raw units).
GLMdenoise optimizes the shape of the HRF based on the available data. To ensure that performance differences across denoising methods reflect only the accuracy of the beta weights (and not differences in the HRF used), the other denoising methods use the HRF estimate provided by GLMdenoise (see the 'hrf' variable).