Sunday, December 22, 2024

Predicting behavior from image features: Insights from cortical tuning

In a recent study published in Nature Communications, researchers investigate whether the human occipital-temporal cortex (OTC) co-represents the semantic and affective content of visual stimuli to guide behavior.

Study: Occipital-temporal cortical tuning to semantic and affective features of natural images predicts associated behavioral responses.

The neural basis of responding to stimuli

Recognizing and responding to emotionally salient stimuli is crucial for evolutionary success, as it aids survival and reproductive behaviors. Adaptive responses vary by context, such as different avoidance strategies for a large bear as compared to a weak animal, or distinct approach responses for infants and potential mates.

While emotional stimuli activate various brain regions, including the amygdala and OTC, the neural mechanisms that contribute to these behavioral choices remain unclear. Thus, further research is needed to clarify how the integrated representation of semantic and affective features in the OTC translates into specific and context-dependent behavioral responses.

About the study 

The study protocol was approved by the University of California, Berkeley Committee for the Protection of Human Subjects, and participants provided informed consent. Data were collected from six healthy adults with a mean age of 24 and normal or corrected vision.

Study participants viewed 1,620 natural images obtained from the International Affective Picture System (IAPS), the Lotus Hill image set, and internet searches. Four raters sorted the images into 23 semantic categories.

Each participant completed six functional magnetic resonance imaging (fMRI) sessions: one to obtain retinotopy scans and five for the main task, during which images were projected onto a screen. Each image was presented for one second, with a three-second interval between images. Estimation scans involved pseudo-random image presentations with null trials, while validation scans used controlled sequences.

After scanning, the study participants rated the valence of each image as negative, neutral, or positive and rated how arousing they found it on a nine-point scale. The fMRI data were collected on a 3 Tesla Siemens TIM Trio scanner and pre-processed using MATLAB and Statistical Parametric Mapping version 8 (SPM8). Pre-processing included converting images to Neuroimaging Informatics Technology Initiative (NIfTI) format, cleaning time-series data, realignment, and slice-timing correction.

Design matrices were constructed for data modeling, and feature weights were estimated with L2-penalized (ridge) regression. Models were validated by voxel-wise prediction accuracy, and principal components analysis (PCA) was used to identify patterns of co-tuning to image features.

Study findings 

The current study utilized a multi-feature encoding modeling approach to investigate how natural image semantic and affective features are represented in the brain. The experimental stimuli included 1,620 images varying widely in semantic categories and affective content.

Ridge regression was used to fit multi-feature encoding models to fMRI data acquired as subjects viewed these images. Six subjects each completed fifty fMRI scans over six two-hour sessions, with thirty training scans used for model estimation and twenty test scans for validation.
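
To make the fitting step concrete, the following is a minimal sketch of ridge (L2-penalized) regression in closed form, assuming NumPy. The function name `fit_ridge`, the penalty value `alpha`, and the simulated data are illustrative assumptions rather than the authors' code; the article does not describe how the study chose its regularization strength.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge solution W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_timepoints, n_features) design matrix
    Y: (n_timepoints, n_voxels) responses, one column per voxel
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

# Simulated stand-in for estimation data (hypothetical sizes).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))
W_true = rng.standard_normal((5, 3))
Y_train = X_train @ W_true + 0.1 * rng.standard_normal((200, 3))

W_hat = fit_ridge(X_train, Y_train, alpha=1.0)   # one weight column per voxel
W_shrunk = fit_ridge(X_train, Y_train, alpha=1e6)  # heavy penalty shrinks weights
```

Because every voxel shares the same design matrix, a single solve estimates the weights for all voxels at once; larger `alpha` values pull the weights toward zero.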

The Combined Semantic, Valence, and Arousal (CSVA) model described each image using a combination of semantic categories, valence, arousal judgments, and additional compound features. Moreover, fMRI data from model estimation runs were concatenated, and ridge regression was used to fit the CSVA model to each subject’s blood oxygen level dependent (BOLD) data.
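
As a rough illustration of what a combined feature space can look like, the sketch below builds one row per image from a semantic category, a valence rating, an arousal rating, and category-by-valence compound indicators. The function `build_features`, the three-category toy example, and the specific choice of compound features are assumptions made for illustration; the article does not spell out the exact features of the CSVA model.

```python
import numpy as np

def build_features(category, valence, arousal, n_categories):
    """Build a toy combined feature matrix (hypothetical layout).

    category: integer labels in [0, n_categories)
    valence:  integers in {-1, 0, 1} (negative, neutral, positive)
    arousal:  numeric ratings, e.g. on a nine-point scale
    """
    category = np.asarray(category)
    valence = np.asarray(valence)
    arousal = np.asarray(arousal, dtype=float)

    semantic = np.eye(n_categories)[category]   # one-hot category
    val_onehot = np.eye(3)[valence + 1]         # one-hot valence
    # Compound features: one indicator per (category, valence) pair.
    compound = (semantic[:, :, None] * val_onehot[:, None, :]).reshape(
        len(category), -1
    )
    return np.hstack([semantic, val_onehot, arousal[:, None], compound])

# Three toy images: columns are 3 semantic + 3 valence + 1 arousal + 9 compound.
X = build_features([0, 2, 1], [-1, 0, 1], [7.0, 3.0, 5.0], n_categories=3)
```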

Voxel-wise weights were estimated for each model feature and applied to the values of feature regressors for images viewed during validation scans to generate predicted BOLD time-courses for each voxel. These predicted time-courses were correlated with observed validation BOLD time-courses to obtain estimates of model prediction accuracy.
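
The validation step described above can be sketched as follows, assuming NumPy: estimated weights are applied to the validation feature matrix, and the predicted time-course of each voxel is correlated with the observed one. The function name `prediction_accuracy` and the simulated data are hypothetical.

```python
import numpy as np

def prediction_accuracy(X_val, Y_val, W):
    """Pearson correlation between predicted and observed time-courses, per voxel."""
    Y_pred = X_val @ W
    Yp = Y_pred - Y_pred.mean(axis=0)
    Yo = Y_val - Y_val.mean(axis=0)
    numerator = (Yp * Yo).sum(axis=0)
    denominator = np.sqrt((Yp ** 2).sum(axis=0) * (Yo ** 2).sum(axis=0))
    return numerator / denominator

# Simulated validation data (hypothetical sizes): 50 time points, 4 features, 2 voxels.
rng = np.random.default_rng(0)
X_val = rng.standard_normal((50, 4))
W = rng.standard_normal((4, 2))
Y_val = X_val @ W + 0.5 * rng.standard_normal((50, 2))

r = prediction_accuracy(X_val, Y_val, W)  # one correlation per voxel
```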

The CSVA model was found to accurately predict validation BOLD time-courses across the OTC. Additionally, the model outperformed simpler models containing only semantic features or only valence and arousal features.

Comparison using a bootstrap procedure revealed that the CSVA model outperformed the valence-by-arousal and semantic-only models at both the group and individual levels. The superiority of the CSVA model was particularly apparent in OTC regions with known semantic selectivity, such as the occipital face area (OFA) and fusiform face area (FFA).
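
One common way to run such a comparison is a paired bootstrap over per-voxel prediction accuracies, sketched below with NumPy. This is an assumption about the general technique, not a reconstruction of the study's procedure, which the article does not detail; `bootstrap_mean_diff` and the simulated accuracies are hypothetical.

```python
import numpy as np

def bootstrap_mean_diff(r_full, r_reduced, n_boot=2000, seed=0):
    """Bootstrap a 95% interval for the mean accuracy difference between two models.

    r_full, r_reduced: per-voxel prediction accuracies for the same voxels.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(r_full) - np.asarray(r_reduced)
    # Resample voxels with replacement, keeping each voxel's pair together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
    return diff.mean(), ci_low, ci_high

# Simulated accuracies: the fuller model is better by about 0.05 on average.
rng = np.random.default_rng(1)
r_reduced = rng.uniform(0.0, 0.3, size=500)
r_full = r_reduced + rng.normal(0.05, 0.02, size=500)

mean_diff, ci_low, ci_high = bootstrap_mean_diff(r_full, r_reduced)
```

An interval that excludes zero suggests the fuller model predicts better. Resampling voxels treats them as independent, which neighboring voxels are not, so a real analysis would need a more careful resampling unit.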

Variance partitioning techniques showed that many voxels responsive to the full CSVA model maintained significant prediction accuracies when only the variance explained by interactions between semantic category and affective features was retained. Furthermore, coding stimulus affective features improved model fit significantly more for animate stimuli than for inanimate stimuli.
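
The idea behind variance partitioning can be illustrated with nested models: the variance unique to a feature set is the gain in explained variance when that set is added to the others. The sketch below uses ordinary least squares and in-sample R² on simulated data for brevity, which is a simplification; the function `r_squared` and all variables are hypothetical, and the study's own procedure may differ.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residual = y - X1 @ beta
    return 1.0 - residual.var() / y.var()

# Simulated voxel whose response depends on a semantic-by-valence interaction.
rng = np.random.default_rng(0)
n = 400
semantic = rng.integers(0, 2, size=n).astype(float)  # e.g. animate vs. inanimate
valence = rng.standard_normal(n)
interaction = semantic * valence
y = 0.5 * semantic + 0.5 * valence + 1.0 * interaction + 0.5 * rng.standard_normal(n)

main_effects = np.column_stack([semantic, valence])
full_model = np.column_stack([semantic, valence, interaction])

r2_main = r_squared(main_effects, y)
r2_full = r_squared(full_model, y)
unique_interaction = r2_full - r2_main  # variance explained only by the interaction
```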

PCA of the CSVA model feature weights revealed consistent patterns of OTC tuning to stimulus animacy, valence, and arousal across subjects. The top three principal components (PCs) accounted for significantly more variance than stimulus features alone, and their structure was consistent across subjects. These PCs represented dimensions including stimulus animacy, arousal, and valence, with spatial transitions in tuning across subjects showing distinct cortical patches responding selectively.
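
A minimal sketch of PCA applied to a voxel-by-feature weight matrix is shown below, using the singular value decomposition in NumPy. The function `pca_weights`, the matrix sizes, and the simulated low-rank weights are assumptions for illustration only.

```python
import numpy as np

def pca_weights(W, n_components=3):
    """PCA of a (n_voxels, n_features) weight matrix via SVD.

    Returns the principal axes in feature space, each voxel's projection
    onto them, and the fraction of variance each component explains.
    """
    Wc = W - W.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Wc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    components = Vt[:n_components]
    scores = Wc @ components.T
    return components, scores, explained[:n_components]

# Simulated weights driven by three latent tuning dimensions plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 3))
loadings = rng.standard_normal((3, 10))
weights = latent @ loadings + 0.1 * rng.standard_normal((1000, 10))

components, scores, explained = pca_weights(weights)
```

Each row of `components` describes a direction of co-tuning across features, and `scores` can be mapped back onto cortex to see where voxels fall along that direction.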

OTC tuning to the affective and semantic features of emotional images predicted behavioral responses and explained more variance in those behaviors than low-level image structure or simpler models.

Conclusions 

Using voxel-wise modeling of fMRI data from subjects viewing over 1,600 emotional images, the researchers found that many OTC voxels represented both semantic categories and affective values, especially for animate stimuli. A separate group of participants identified behaviors suited to each image.

Regression analyses showed that OTC tuning to these combined features predicted behaviors better than tuning to either feature type alone or to low-level image structure, suggesting that the OTC efficiently processes behaviorally relevant information.

Journal reference:

  • Abdel-Ghaffar, S.A., Huth, A.G., Lescroart, M.D. et al. (2024). Occipital-temporal cortical tuning to semantic and affective features of natural images predicts associated behavioral responses. Nature Communications. doi:10.1038/s41467-024-49073-8
