Models of BOLD responses in visual cortex
Kendrick Kay (with contributions from Jon Winawer, Ariel Rokem, Aviv Mezer, and Brian Wandell)
Questions, comments? E-mail Kendrick

Introduction

We want to understand how visual stimuli are represented in visual cortex. To that end, we design stimuli, measure responses in visual cortex, and attempt to build quantitative models that predict responses from the stimulus. Building models is important as models formalize our intuitions and help summarize and interpret experimental measurements. Moreover, given that different types of stimuli are typically studied in isolation, models are useful for relating different stimuli to one another.

We believe that a key determinant of the value of a model is its accuracy, that is, how well and for what range of stimuli the model predicts responses. A model that accurately predicts responses to a small, limited set of stimuli has limited value. Conversely, a model that can predict responses to a large, diverse set of stimuli but has only modest accuracy also has limited value.

Cross-validation is critical when evaluating the accuracy of a model. When evaluating models on a fixed set of data, more complex models (e.g., models with many free parameters) will inevitably fit the data better than simpler models. However, such models may be simply overfitting the noise in the data, and there is no guarantee that such models will generalize well to new data. Cross-validation provides unbiased assessments of model accuracy.

What we provide

On this web site, we provide data and models from recent experiments we have conducted. These experiments are described in two manuscripts:

Kay, K.N., Winawer, J., Mezer, A., & Wandell, B.A. Compressive spatial summation in human visual cortex. Journal of Neurophysiology (2013).

Kay, K.N., Winawer, J., Rokem, A., Mezer, A., & Wandell, B.A. A two-stage cascade model of BOLD responses in human visual cortex. PLoS Computational Biology (2013).

We provide:

  1. Stimuli that we used to probe visual responses

  2. Measurements of BOLD responses to these stimuli

  3. MATLAB code implementing the models we propose

We welcome efforts to scrutinize our models, to improve our models, and to consider whether alternative models (including models developed in psychophysics, computer vision, and the theoretical literature) better account for the experimental measurements we have made. We hope the open exchange of data and code will spur further modeling efforts and lead to the development of improved models.

Terms of use: The content provided by this web site is licensed under a Creative Commons Attribution 3.0 Unported License. You are free to share and adapt the content as you please, under the condition that you cite the appropriate manuscripts.

Notes on our data and models

Type of data: Our data are collected using functional magnetic resonance imaging (fMRI) in human subjects. As a measurement technique, fMRI has the advantage of broad coverage, providing simultaneous measurements of activity in multiple visual areas. However, fMRI has limited spatial and temporal resolution and is complementary to other types of measurements such as single-cell electrophysiology and optical imaging. We hope that as the field progresses, models of visual responses will be able to flexibly describe responses at many levels of measurement.

Stimulus selection: Given the large space of possible visual stimuli, the choice of stimuli to use in an experiment is critical. We use both artificial and naturalistic stimuli, as each type of stimulus has advantages and disadvantages. From our perspective, the major advantage of artificial stimuli is ease of interpretation, which is especially valuable to the process of model development. Naturalistic stimuli, on the other hand, have special status as the type of stimulus whose representation is arguably the most important to understand. An omnibus approach that includes a diversity of stimuli (both artificial and naturalistic) seems wise. And of course, the more stimuli and data collected, the better.

Modeling approach: The models we propose are parametric nonlinear models with few parameters. These models are highly constrained and incorporate specific nonlinear computations. The philosophy behind this approach is that because data are limited, parameters are precious and we should incorporate very specific computations into models. This approach can be contrasted with large-scale linearized models that consist of a large number of regressors in a linear model (and often involve regularization to control overfitting). These models are flexible and can describe a variety of different computations. The philosophy behind this approach is to attempt to allow the data to inform the model as to what types of computations are necessary. Different approaches have advantages and disadvantages. Ultimately, the goal is to develop models that have high predictive power and good interpretability; exactly what modeling approaches will help achieve this goal is an empirical question.

Getting started

Download all of the files. After downloading and unzipping the files, launch MATLAB and change to the directory containing the files. You can then run the example scripts described below.
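
As a rough sketch, a MATLAB session might look like the following (the download location is hypothetical, and the example scripts may handle path setup themselves):

  cd('/path/to/socmodel');      % change to the directory containing the unzipped files
  addpath(genpath('code'));     % make the knkutils functions available (see 'code' below)
  stimuli_example;              % run one of the example scripts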

Join the socmodel discussion group. This group is for people interested in the content provided on this web site.

Description of files

code - This directory contains a copy of the "knkutils" github code repository. This code repository is required to run our models, but is not required for accessing the stimuli and the responses.

conimages - This directory contains thumbnails of the spatial masks (i.e. contrast images) that were used to construct some of the stimuli. These thumbnails are generated by the 'stimuli_example.m' script.

cssmodel_example.m (or raw file) - An example script illustrating the code that implements the CSS model.

dataset01.mat, dataset02.mat, dataset03.mat, dataset04.mat, dataset05.mat - Files containing different datasets. The contents of the files are detailed in a later section.

dataset01_benchmark.mat, dataset02_benchmark.mat, dataset03_benchmark.mat, dataset04_benchmark.mat, dataset05_benchmark.mat - Files containing the performance of the CSS model on datasets 1 and 2 and the performance of the SOC model on datasets 3–5. The contents of the files are detailed in a later section.

dataset_example.m (or raw file) - An example script that loads in the third dataset and plots some basic inspections of the data.

html - This directory contains the outputs of the various scripts that we provide, generated through the MATLAB 'publish' command.

README - A brief note pointing back to this web site.

socmodel_example.m (or raw file) - An example script illustrating the code that implements model fitting for the SOC model.

socmodel_simulation.m (or raw file) - An example script illustrating the code that implements simulations of the SOC model.

stimuli_example.m (or raw file) - An example script that loads in the stimuli and plots some visualizations.

stimuli.mat - A file containing the stimuli. The contents of this file are detailed in a later section.

thumbnails - This directory contains thumbnails of the first frame of each stimulus. These thumbnails are generated by the 'stimuli_example.m' script.

Stimuli

General characteristics of stimulus set 1. All stimuli were high-contrast grayscale images. Stimuli were constructed at a resolution of 600 pixels x 600 pixels. The stimuli occupied a field-of-view of 21–30 degrees of retinal angle (the specific field-of-view for each dataset is given below).

Stimulus set 1. This stimulus set consists of 69 stimuli. Each stimulus consists of 30 frames. (The reason for using multiple frames is to average out unwanted variability. For example, if a stimulus is designed to stimulate a specific region of the visual field with a given type of noise pattern, each frame contains a different randomly generated noise pattern, so that variability due to any particular noise pattern is averaged out.) Stimuli were presented in random order. On each trial, the 30 frames from a given stimulus were sequentially presented over a duration of 3 seconds (each frame lasted for 1/10 s). Successive stimuli were separated by temporal gaps of at least 5 seconds. Subjects performed a central fixation task while stimuli were presented. Here is an example movie illustrating what subjects saw during the experiment.

General characteristics of stimulus sets 2–3. All stimuli were band-pass filtered grayscale images. Stimuli were constructed at a resolution of 256 pixels x 256 pixels (this is relevant to the band-pass filtering step, as discussed below), but were ultimately upsampled to 800 pixels x 800 pixels for the purposes of stimulus presentation. The stimuli occupied a field-of-view of 12.7 degrees of retinal angle (on average across subjects; the specific field-of-view for each dataset is given below).

Stimulus set 2. This stimulus set consists of 156 stimuli. Each stimulus consists of multiple frames, typically 9. Stimuli were presented in random order. On each trial, 9 frames drawn from a given stimulus were sequentially presented over a duration of 3 seconds (duty cycle was 1/6 s ON, 1/6 s OFF, etc.). Successive stimuli were separated by temporal gaps of at least 5 seconds. Subjects performed a central fixation task while stimuli were presented. Here is an example movie illustrating what subjects saw during the experiment.

Stimulus set 3. This stimulus set has the same general characteristics as stimulus set 2, except that the stimuli in this set consist of 35 distinct object images. Only one frame was associated with each stimulus, so on a given trial, the same object image was flashed 9 times over a duration of 3 seconds. Here is an example movie illustrating what subjects saw during the experiment.

Contents of 'stimuli.mat' (a brief usage sketch follows the list of variables):

'images' contains the raw stimulus frames for stimulus sets 1, 2, and 3 (concatenated). The 'images' variable is a cell vector of dimensions 1 x 260 (since 69+156+35=260). Each entry corresponds to one stimulus, and each stimulus consists of one or more frames. For stimulus set 1 (1 through 69), all entries consist of 30 distinct frames. For stimulus set 2 (70 through 225), all entries consist of 9 distinct frames except for entry #174 which consists of 7 distinct frames. For stimulus set 3 (226 through 260), all entries consist of a single frame. The resolution of the images in stimulus set 1 is 600 pixels x 600 pixels; the resolution of the images in stimulus sets 2–3 is 800 pixels x 800 pixels. For all images, the format is uint8; the range of values is [0,254]; and the background has a value of 127.

'conimages' contains the spatial masks (contrast images) used in the generation of some of the stimuli. There is a direct correspondence between 'conimages' and 'images'. The 'conimages' variable is a cell vector of dimensions 1 x 260. Each entry gives the mask that was used to generate the corresponding stimulus in 'images'. The indices of the stimuli that have associated masks are 1:69, 70:138, and 185:208. For the other indices, the entry in 'conimages' is simply the empty matrix. The resolution of the contrast images in stimulus set 1 is 600 pixels x 600 pixels; the resolution of the contrast images in stimulus sets 2–3 is 256 pixels x 256 pixels. For all contrast images, the format is double; and the range of values is [0,1] where a value of X indicates that at that pixel, the underlying stimulus, weighted by X, was blended with the gray background, weighted by 1-X.

'bpfilter' contains the band-pass filter used to generate some of the stimuli in stimulus sets 2–3. The filter is a matrix of dimensions 21 x 21 and was used in stimulus construction, at which point the image resolution was 256 pixels x 256 pixels.
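
To illustrate, here is a minimal sketch of loading and visualizing the stimuli (it assumes that the frames within each entry of 'images' are stacked along the third dimension, and that each entry of 'conimages' is a single mask):

  load('stimuli.mat');                  % loads 'images', 'conimages', and 'bpfilter'
  frame = images{1}(:,:,1);             % first frame of the first stimulus (600 x 600, uint8)
  figure; imagesc(frame,[0 254]); colormap(gray); axis image; title('stimulus 1, frame 1');
  mask = conimages{1};                  % corresponding contrast image (values in [0,1])
  figure; imagesc(mask,[0 1]); colormap(gray); axis image; title('contrast image 1');
  % blending at each pixel: displayed value is approximately mask*pattern + (1-mask)*127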

Responses

The datasets provided here are taken from the manuscripts Compressive spatial summation in human visual cortex (datasets 1–2) and A two-stage cascade model of BOLD responses in human visual cortex (datasets 3–5). The data are fMRI measurements in human visual cortex at 3T using gradient-echo, isotropic 2.5-mm voxels, and a TR (sampling rate) of approximately 1.3 s (1.323751 s in datasets 1–2, 1.337702 s in datasets 3–5). Each dataset has a dimensionality of 64 voxels x 64 voxels x 21 voxels (datasets 1–2) or 64 voxels x 64 voxels x 22 voxels (datasets 3–5). The data have been pre-processed, including slice time correction, motion correction, and spatial undistortion based on fieldmap measurements. Additionally, the data have been analyzed using a GLM time-series model that includes a flexible HRF (hemodynamic response function), beta weights representing the amplitude of the BOLD response to each stimulus, polynomial regressors, and regressors that remove global BOLD fluctuations unrelated to the stimulus. To obtain error bars, 30 bootstraps of the GLM model were performed. We selected for further analysis voxels for which the GLM model has positive cross-validated R2 (it is these voxels that are provided in the data files). Voxels are identified using indices into the 3D volume; for example, voxel 66 is located at matrix location (2,2,1).
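
For example, converting between voxel indices and 3D matrix locations can be done with standard MATLAB indexing functions (the dimensions below apply to datasets 1–2):

  [i,j,k] = ind2sub([64 64 21],66)      % gives i = 2, j = 2, k = 1, i.e. location (2,2,1)
  idx = sub2ind([64 64 21],2,2,1)       % gives 66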

Dataset 1 (subject A). This dataset consists of responses to stimulus set 1 (69 stimuli). The data were collected in one scan session and each stimulus was presented 5 times. The field-of-view was 24.7°.

Dataset 2 (subject B). This dataset consists of responses to stimulus set 1 (69 stimuli). The data were collected in one scan session and each stimulus was presented 5 times. The field-of-view was 23.5°.

Dataset 3 (subject B). This dataset consists of responses to stimulus set 2 (156 stimuli). The data were collected over two scan sessions and each stimulus was presented 6 times. The field-of-view was 12.5°.

Dataset 4 (subject C). This dataset consists of responses to stimulus set 2 (156 stimuli). The data were collected in one scan session and each stimulus was presented 3 times. The field-of-view was 12.7°.

Dataset 5 (subject C). This dataset consists of responses to stimulus set 3 (35 stimuli). The data were collected in one scan session and each stimulus was presented 10 times. The field-of-view was 12.7°.

Contents of 'dataset01.mat', 'dataset02.mat', etc. (a brief usage sketch follows the list of variables):

'tr' is the TR in seconds.

'vxs' is voxels x 1 with the indices into the 3D volume.

'vxsselect' is as follows:

(datasets 1–2) 'vxsselect' is voxels x 1 with logical values (0/1) indicating the voxel selection group used in the manuscript. The column indicates voxels in V1, V2, V3, V3AB, hV4, VO-1, VO-2, LO-1, LO-2, TO-1, and TO-2 that have a positive mean beta weight (i.e. mean(betamn,2) > 0).

(datasets 3–4) 'vxsselect' is voxels x 3 with logical values (0/1) indicating various voxel selection groups used in the manuscript. The first column indicates voxels in V1, V2, V3, and hV4 that have a positive mean beta weight (i.e. mean(betamn,2) > 0). The second column indicates, of the voxels satisfying the first column, the top 10 voxels in each area (V1, V2, V3, hV4) in terms of GLM R2 (see 'glmr2'). The third column indicates, of the voxels satisfying the first column, those voxels for which at least 90% of the receptive field (as determined through fitting a simple spatial model) is contained within the stimulus bounds.

(dataset 5) 'vxsselect' is the empty matrix (not applicable).

'hrfs' is voxels x time x bootstraps with different bootstraps of HRF estimates. The first time point coincides with stimulus onset and subsequent time points occur at the data sampling rate (TR). Each HRF is scaled such that the peak of the HRF is 1.

'hrfmn' is voxels x time with final HRF estimates. These are obtained by taking the median across the bootstraps in 'hrfs'.

'hrfse' is voxels x time with error bars on HRF estimates. These are obtained by calculating half of the 68% range of the bootstraps in 'hrfs'.

'betas' is voxels x amplitudes x bootstraps with different bootstraps of beta weights. Beta weights have been divided by the mean signal level in each voxel and multiplied by 100, thereby converting beta weights into units of percent BOLD change.

'betamn' is voxels x amplitudes with final beta weights. These are obtained by taking the median across the bootstraps in 'betas'.

'betase' is voxels x amplitudes with error bars on beta weights. These are obtained by calculating half of the 68% range of the bootstraps in 'betas'.

'glmr2' is voxels x 1 with the cross-validated R2 of the GLM time-series model. The values are percentages, with a maximum of 100.

'roi' is voxels x 1 with integers from 1 to 12 indicating the ROI assignment (e.g. V1). ROI assignment was determined based on separate experiments.

'roilabels' is a 1 x 12 cell vector of strings with ROI names.

'meanvol' is 64 x 64 x [21 or 22] with the mean of all of the functional volumes in the dataset (after pre-processing). The values reflect raw scanner units. Note that voxels for which a complete set of data is not available (e.g. due to motion) are assigned a value of 0 in 'meanvol'.
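
To illustrate how these variables fit together, here is a rough sketch that loads one of the datasets, selects voxels, and reconstructs the summary quantities from the bootstraps (the percentile-based calculation, which uses prctile from the Statistics Toolbox, is one way to implement 'half of the 68% range'; the code accompanying the papers may differ in details):

  load('dataset03.mat');                        % loads the variables described above
  ok = find(vxsselect(:,1));                    % V1/V2/V3/hV4 voxels with positive mean beta weight
  vx = ok(1);                                   % pick the first such voxel
  hrf   = median(hrfs(vx,:,:),3);               % should match hrfmn(vx,:)
  hrfer = diff(prctile(hrfs(vx,:,:),[16 84],3),[],3)/2;   % should approximate hrfse(vx,:)
  figure; errorbar(betamn(vx,:),betase(vx,:),'.');        % beta weights with error bars
  xlabel('Stimulus number'); ylabel('BOLD response (% change)');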

Additional notes: Datasets 2 and 3 come from the same subject but are not co-registered to one another. Datasets 4 and 5 come from the same subject and are co-registered to one another. Thus, data from these two datasets can be treated as reflecting the same voxels. However, since the two datasets were collected in different scan sessions, there may be across-session instabilities (e.g. changes in the overall gain of the beta weights).

Compressive spatial summation (CSS) model

The CSS model takes a contrast image (i.e. an image representing the location of contrast in the visual field), computes a weighted sum of the contrast image using an isotropic 2D Gaussian, and applies a static power-law nonlinearity. The nonlinearity is typically compressive, hence the name of the model. There are two parameters controlling the position of the Gaussian, one parameter controlling the size of the Gaussian, a gain parameter, and an exponent parameter.
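
Schematically, the computation for a single contrast image can be written as follows (this is only a sketch based on the description above, with placeholder values for the image and parameters; see 'cssmodel_example.m' for the actual implementation, which may differ in details such as how the Gaussian is normalized):

  res = 100;                                        % resolution of the contrast image (example value)
  conimage = rand(res,res);                         % placeholder contrast image
  x0 = 50; y0 = 50; sd = 10; g = 1; n = 0.2;        % example parameter values (pixel units)
  [xx,yy] = meshgrid(1:res,1:res);
  G = exp(-((xx-x0).^2 + (yy-y0).^2)/(2*sd^2));     % isotropic 2D Gaussian
  response = g * sum(G(:).*conimage(:))^n;          % weighted sum, then power-law nonlinearity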

Example script. In the 'cssmodel_example.m' script, we show how to use the provided code to fit the CSS model to an example voxel from dataset 1. (Note that the data that are fit are beta weights derived from the time-series data, not the original time-series data. However, the code can be easily adapted to fit time-series data (i.e. for fitting population receptive field (pRF) models in the style of Dumoulin & Wandell, 2008). If you are interested in this, feel free to contact me.)

Second-order contrast (SOC) model

The SOC model starts with a grayscale image (luminance values), applies Gabor filters as a way of computing local contrast, computes second-order contrast within the spatial extent of an isotropic 2D Gaussian, and applies a static power-law nonlinearity. There are two parameters controlling a divisive-normalization operation performed on the outputs of the Gabor filters, two parameters controlling the position of the Gaussian, one parameter controlling the size of the Gaussian, a gain parameter, an exponent parameter, and a parameter that controls the strength of the second-order contrast computation. The SOC model can be viewed as an extension of the CSS model, with the primary addition being the computation of second-order contrast. Whereas the CSS model explains only how the location and size of the stimulus relate to the response, the SOC model is more general, explaining how an arbitrary grayscale image relates to the response.
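
Schematically, the stage following the contrast computation can be sketched as below, where 'con' stands in for the output of the Gabor-filtering and divisive-normalization stage, G is an isotropic 2D Gaussian as in the CSS sketch above, and c, g, n are the second-order contrast strength, gain, and exponent (again, this is only a sketch with placeholder values; consult the paper and 'socmodel_example.m' for the exact formulation):

  res = 100; [xx,yy] = meshgrid(1:res,1:res);
  con = rand(res,res);                              % placeholder output of the contrast stage
  x0 = 50; y0 = 50; sd = 10; g = 1; n = 0.2; c = 0.95;   % example parameter values
  G = exp(-((xx-x0).^2 + (yy-y0).^2)/(2*sd^2));     % isotropic 2D Gaussian
  cbar = sum(G(:).*con(:));                         % Gaussian-weighted mean contrast
  soc  = sum(G(:).*(con(:) - c*cbar).^2);           % second-order contrast within the Gaussian
  response = g * soc^n;                             % gain and power-law nonlinearity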

Example script. In the 'socmodel_example.m' script, we show how to use the provided code to fit the SOC model to an example voxel from dataset 3.

Benchmark performance

To facilitate comparison with other models, we provide .mat files with fits (and cross-validated predictions) of our models to the data provided. The .mat files also provide R2 values that summarize model performance. (Note that these R2 values are computed with respect to 0, which is sensible given that beta weights reflect evoked activity above the baseline signal level; see papers for details.)
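
Concretely, computing R2 with respect to 0 means that the residual sum of squares is compared against the sum of squares of the data themselves, rather than against the variance around the data mean. A sketch with placeholder values:

  data = [1.2 0.8 0.5];                             % placeholder beta weights for one voxel
  pred = [1.0 0.9 0.4];                             % placeholder model predictions
  r2 = 100 * (1 - sum((data - pred).^2) / sum(data.^2))   % R2 computed with respect to 0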

For datasets 1 and 2, we provide the fit (and cross-validated prediction) of the CSS model for each voxel in each dataset. The data that are fit (and cross-validated) are those contained in the 'betamn' variable, and there are 69 data points for each voxel. For datasets 3 and 4, we provide the fit (and cross-validated prediction) of the SOC model for the best 50 voxels in each of areas V1–V4 in each dataset (best in terms of GLM cross-validation accuracy). The data that are fit (and cross-validated) are the first 99 (out of 156) data points contained in the 'betamn' variable (these are the data points considered in the SOC paper). For dataset 5, we take the SOC model, as fit to the 99 data points from dataset 4, and generate a prediction of the responses in dataset 5. The data that are predicted are those contained in the 'betamn' variable, and there are 35 data points for each voxel.

Contents of 'dataset01_benchmark.mat', 'dataset02_benchmark.mat', etc. (a brief usage sketch follows the list of variables):

'modelfit' is voxels x amplitudes with the fit of the model.

'modelfitr2' is voxels x 1 with R2 values between the model fit ('modelfit') and the data ('betamn').

'modelpred' is voxels x amplitudes with the cross-validated prediction of the model.

'modelpredr2' is voxels x 1 with R2 values between the model prediction ('modelpred') and the data ('betamn'). Since these R2 values are cross-validated, they can be used for comparing different models (even if the models have different numbers of parameters, flexibility, etc.).

'xvalscheme' is cases x amplitudes with the resampling scheme used to compute 'modelpred'. Each row corresponds to a distinct resampling case; 1s indicate data points used for training and -1s indicate data points used for testing (prediction). (For the CSS model on datasets 1 and 2, predictions are obtained using leave-one-out cross-validation, i.e., a single beta weight is held out, the model is fit to the remaining beta weights, the prediction of the fitted model is computed for the held-out beta weight, etc. For the SOC model on datasets 3 and 4, predictions are obtained using five-fold cross-validation, with random selection of folds.)

'vxsselect' is voxels x 1 with logical values (0/1) indicating voxels for which modeling results are provided.

Note that the 'dataset05_benchmark.mat' file includes only the 'modelpred', 'modelpredr2', and 'vxsselect' variables, since these data were only predicted and not fit. Also, note that for dataset 5, a non-negative scale factor was applied to the predicted responses before computing R2 values in order to compensate for across-session instabilities (see SOC paper for details).
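
As a usage sketch, the benchmark results can be compared against the data as follows (the last line simply illustrates the leave-one-out format of 'xvalscheme' for datasets 1–2; the file itself already contains the scheme that was actually used):

  load('dataset01.mat','betamn');
  load('dataset01_benchmark.mat');                  % loads the variables described above
  vx = find(vxsselect,1);                           % first voxel with modeling results
  figure; scatter(betamn(vx,:),modelpred(vx,:));
  xlabel('Measured response (%)'); ylabel('Cross-validated CSS prediction (%)');
  title(sprintf('modelpredr2 = %.1f%%',modelpredr2(vx)));
  scheme = ones(69,69) - 2*eye(69);                 % leave-one-out format: 1 = training, -1 = testing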

Simulations of the SOC model

The 'socmodel_example.m' script (described above) concerns the fitting of the SOC model and includes details specific to that endeavor. However, suppose we already know the parameters of the SOC model and we want to simulate responses of the SOC model to some stimuli. In this case, different code machinery is appropriate.

We provide code that performs simulations of the SOC model. One aspect of the code is that an entire map of outputs is produced (as opposed to a single output). This is because we assume, for simplicity and convenience, that a single instance of the SOC model is repeated systematically at different positions in the visual field (similar to a convolution).
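
Conceptually, producing a map of outputs amounts to evaluating the model with the Gaussian centered at each position in turn. A brute-force sketch with placeholder inputs (this is not the interface of the provided code, which is implemented more efficiently):

  res = 50;                                         % small resolution to keep the loop fast
  con = rand(res,res);                              % placeholder contrast image
  sd = 5; n = 0.12; c = 0.99; g = 1;                % example parameter values (pixel units)
  [xx,yy] = meshgrid(1:res,1:res);
  respmap = zeros(res,res);
  for x0 = 1:res
    for y0 = 1:res
      G = exp(-((xx-x0).^2 + (yy-y0).^2)/(2*sd^2));
      G = G/sum(G(:));                              % normalize the Gaussian weights
      cbar = sum(G(:).*con(:));                     % Gaussian-weighted mean contrast
      respmap(y0,x0) = g * sum(G(:).*(con(:) - c*cbar).^2)^n;
    end
  end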

Example script. In the 'socmodel_simulation.m' script, we show how to use the provided code to simulate responses of the SOC model to some example stimuli.

Typical parameter values. The script uses parameter values similar to the typical values found in V3. For completeness, we report here the typical parameter values found in each of V1, V2, V3, and hV4: sd = 0.9308, 1.0738, 1.4671, 2.1242; n = 0.1814, 0.1285, 0.1195, 0.1152; and c = 0.9276, 0.9928, 0.9941, 0.9472. (These values correspond to what is shown in the SOC paper.)
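
For convenience, these values can be entered directly in MATLAB (columns correspond to V1, V2, V3, and hV4):

  sd = [0.9308 1.0738 1.4671 2.1242];
  n  = [0.1814 0.1285 0.1195 0.1152];
  c  = [0.9276 0.9928 0.9941 0.9472];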

Note on implementation. The implementation we provide (specifically, socmodel.m) has not been optimized for memory efficiency. In particular, it may be difficult to run the code for high-resolution images. Please contact me if you encounter this problem.