Prior Reward Conditioning Dampens Hippocampal and Striatal Responses during an Associative Memory Task

Offering reward during encoding typically leads to better memory [Adcock, R. A., Thangavel, A., Whitfield-Gabrieli, S., Knutson, B., & Gabrieli, J. D. E. Reward-motivated learning: Mesolimbic activation precedes memory formation. Neuron, 50, 507–517, 2006]. Whether such memory benefit persists when tested in a different task context remains, however, largely understudied [Wimmer, G. E., & Buechel, C. Reactivation of reward-related patterns from single past episodes supports memory-based decision making. Journal of Neuroscience, 36, 2868–2880, 2016]. Here, we ask whether reward at encoding leads to a generalized advantage across learning episodes, a question of high importance for everyday-life applications, from education to patient rehabilitation. Although we confirmed that offering monetary reward increased responses in the ventral striatum and pleasantness judgments for pictures used as stimuli, this immediate beneficial effect of reward did not carry over to a subsequent and different picture–location association memory task during which no reward was delivered. If anything, a trend for impaired memory accuracy was observed for the initially high-rewarded pictures as compared to low-rewarded ones. In line with this trend in behavioral performance, fMRI activity in reward (i.e., ventral striatum) and memory (i.e., hippocampus) circuits was reduced during the encoding of new associations using previously highly rewarded pictures (compared to low-reward pictures). These neural effects extended to new pictures from the same, previously highly rewarded semantic category. Twenty-four hours later, delayed recall of associations involving originally highly rewarded items was accompanied by decreased functional connectivity between the hippocampus and two brain regions implicated in value-based learning, the ventral striatum and the ventromedial PFC.
We conclude that acquired reward value elicits a downward value-adjustment signal in the human reward circuit when reactivated in a novel nonrewarded context, with a parallel disengagement of memory–reward (hippocampal–striatal) networks, likely to undermine new associative learning. Although reward is known to promote learning, here we show how it may subsequently hinder hippocampal and striatal responses during new associative memory formation.


INTRODUCTION
Reward is a powerful tool to guide learning. Immediate reward typically drives various forms of learning (Kringelbach & Berridge, 2016; Seitz, Kim, & Watanabe, 2009; Singer & Frank, 2009; Bouton, 2007), including improving relearning after forgetting (Miendlarzewska, Ciucci, Cannistraci, Bavelier, & Schwartz, 2018). These effects of reward on learning are mediated by interactions between reward and memory networks (Bartra, McGuire, & Kable, 2013; Jocham, Klein, & Ullsperger, 2011; Lisman & Grace, 2005). Reward may in turn become associated with specific contexts in which it was delivered (Loh et al., 2016; Rigoli, Friston, & Dolan, 2016; Palminteri, Khamassi, Joffily, & Coricelli, 2015; Nakahara, Itoh, Kawagoe, Takikawa, & Hikosaka, 2004). In particular, midbrain dopamine neurons can represent context-dependent prediction error (Nakahara et al., 2004), whereas the representation of value in the human ventral striatum and medial/orbital PFC integrates value-relevant information (Bartra et al., 2013) from memory (the hippocampus), current emotional state (the amygdala), and cognitive goals (PFC; Samanez-Larkin & Knutson, 2015; Haber & Knutson, 2010). Whether such context dependency of reward representation could impair, rather than promote, subsequent learning remains unclear (Wimmer, Braun, Daw, & Shohamy, 2014). On the one hand, evoking the memory of a rewarding episode is usually associated with positive feelings and may restore a state of reward motivation. For example, effects of extrinsic incentives have been shown to "spill over" and lead to various forms of associative generalization (Miendlarzewska, Bavelier, & Schwartz, 2016), including increased response speed and vigor, as in Pavlovian-to-instrumental transfer.
This and related processes of memory reactivation have been documented to engage the hippocampus and the surrounding cortices in the medial temporal lobe together with the dopaminergic reward circuit, in particular the ventral striatum and the substantia nigra and ventral tegmental area (SN/VTA; Dudai, Karni, & Born, 2015; Cohen et al., 2014; Dudai, 2012). Retrieving a rewarding memory episode may also yield a preference for an object explicitly or implicitly associated with that episode (Hütter, Kutzner, & Fiedler, 2013; De Houwer, Thomas, & Baeyens, 2001). In addition, according to the Penumbra hypothesis (Lisman, Grace, & Duzel, 2011), the presence of dopamine at hippocampal synapses, which may be triggered by recalling a rewarded memory, can enhance new memory formation (Atherton, Dupret, & Mellor, 2015; Thomas, 2015; Redondo & Morris, 2011; Wittmann et al., 2005). In this view, past reward should facilitate learning of new information.
On the other hand, presenting a stimulus for which reward was expected but is no longer offered may induce disappointment. Whereas such a shift in motivation may be particularly detrimental in an educational context, most studies reporting undermining effects of changing reward contexts or contingencies have not involved learning tasks (Ma, Jin, Meng, & Shen, 2014;Chib et al., 2012;Murayama, Matsumoto, Izuma, & Matsumoto, 2010). Similarly, when value-associated information appears in a novel context, participants tend to make economically suboptimal choices because of the lingering of the remembered value: When switched to a new context, participants' choices reflect option values with reference to the previously available alternative options and not in line with objective reward probabilities (Klein, Ullsperger, & Jocham, 2017).
Here, we used fMRI to probe the effects of reward conditioning on subsequent nonrewarded learning of object-location associations (a task known to engage the hippocampus; Bridge & Voss, 2014;Manelis, Reder, & Hanson, 2012;Takashima et al., 2009;Sommer, Rose, Gläscher, Wolbers, & Büchel, 2005). We hypothesized that initially rewarded stimuli trigger a reevaluation process when presented in a different task context in the absence of reward, which would hinder, rather than facilitate, the formation of new associations with the reward-related stimuli. We tested this hypothesis by examining the effects of reward conditioning on the early phase of a subsequent nonrewarded associative learning task, that is, at a time when reward was most likely to still exert lingering effects. We observed that, in such a new, nonrewarded task context after reward conditioning, the ventral striatum and hippocampus were relatively deactivated at encoding of associations with previously highly (vs. low-) rewarded stimuli, and that memory recall of those associations was poorer when tested 24 hr later. Because striatal regions display the characteristics of a prediction error that is used to update the relative value of options, a distinctive feature of this study was to assess whether such prediction error signaling would impair associative learning in a new context in which reward was no longer offered. Moreover, because memory retrieval typically triggers not only the reactivation of specific information about the stimulus but also about similar stimuli in memory (e.g., Horner, Bisby, Bush, Lin, & Burgess, 2015), we also tested whether reward-biased learning would affect stimuli that were semantically related to those previously paired with reward value.
In a two-step procedure, we first conditioned pictures from two distinct semantic categories with two levels of reward (high and low, respectively). In a subsequent object-location associative learning task, we tested how participants learned to associate these pictures to locations in the absence of reward. In this associative learning task, we also included a set of related but new pictures to simultaneously test whether any delayed effect of reward conditioning might transfer to nonconditioned but semantically related pictures.

METHODS

Participant Details
Twenty-five participants took part in the experiment. Data from five participants were not used because of technical problems during acquisition (n = 2), excessive motion (total displacement > 3 mm; n = 1), and noncompliance with the learning task (n = 2). Thus, data from 20 participants were included in the analyses (13 women; age: mean = 24.35 years, SD = 3.71, range = 19-34 years). None of the participants reported a history of neurological, psychiatric, or medical disorders or any current medical problems, and all had normal or corrected-to-normal visual acuity. In addition, all had normal-range scores on the French versions of the Beck Depression Inventory (Beck, Steer, & Brown, 1996; group mean score = 5.5, SD = 4.3) and state anxiety as measured by the State-Trait Anxiety Inventory (Spielberger, 1983; group mean score = 36.9, SD = 7.5). All participants were students of the University of Geneva recruited by advertisements, and all provided written informed consent for participation. The study protocol was approved by the Ethics Committee of the Geneva University Hospitals, which abides by Helsinki principles.

Stimuli
Stimuli used were photographs obtained from an Internet search engine and belonging to two broad semantic categories: the sea and the savanna. The pictures were selected from a large picture dataset (n = 150) based on ratings performed by an independent group of 10 participants. Ratings were performed on five different 5-point Likert scales assessing emotional valence, arousal, familiarity, and also how interesting the content and visual composition of the pictures were.
The conditioning task used 80 pictures (40 from the sea category and 40 from the savanna category). Twenty of these pictures were used for a pleasantness task and 36 for the associative learning task (see below), while the remaining 24 were fillers to further strengthen category-specific conditioning. Another 36 "new" pictures (18 from the sea and 18 from the savanna category) were not presented during conditioning but only used during the associative learning task, to test for transfer of conditioning effects. Thus, the stimuli used in the associative learning task formed four separate lists of 18 photos (two lists for each semantic category), each containing the same number of exemplars from the following subcategories: single animal, multiple animals, vehicles, landscapes, human activity, and objects.
All pictures were high-resolution photographs scaled to 512 × 512 pixels. The mean luminosity of each picture was equalized to the overall mean using an in-house MATLAB script. Apparent contrast was calculated by dividing the standard deviation of the luminance values by the mean luminance of each filtered picture. Luminance values were obtained using ImageJ (Schneider, Rasband, & Eliceiri, 2012). An ANOVA performed on these metrics demonstrated that apparent contrasts did not differ significantly between the lists. Spatial frequencies of all pictures were analyzed in eight different bands using the discrete wavelet transform (Delplanque, N'diaye, Scherer, & Grandjean, 2007). Using MANOVA, we determined that the prepared lists did not differ in terms of spatial frequencies. Pictures were presented on a 1024 × 1280 screen and viewed by the participants through a mirror mounted on the head coil.
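The luminance equalization and apparent-contrast steps described above can be sketched as follows. The original in-house MATLAB script was not published, so this Python version (operating on hypothetical 2-D luminance arrays) is only an illustrative approximation of the described computations:

```python
import numpy as np

def equalize_mean_luminance(img, target_mean):
    """Rescale an image so that its mean luminance equals `target_mean`
    (one simple way to equalize luminosity across a stimulus set)."""
    img = np.asarray(img, dtype=float)
    return img * (target_mean / img.mean())

def apparent_contrast(img):
    """Apparent contrast as described in the text:
    standard deviation of luminance values / mean luminance."""
    img = np.asarray(img, dtype=float)
    return img.std() / img.mean()
```

Note that this SD/mean ratio is invariant under the multiplicative rescaling above, so equalizing mean luminance this way leaves apparent contrast unchanged; an additive luminance shift, by contrast, would alter it.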

Experimental Design
The experiment consisted of two fMRI scanning sessions performed 24 hr apart. The first session was preceded by an instructional training session and comprised two successive tasks: a reward conditioning task followed by a nonrewarded object-location learning task (three cycles). The second session comprised a delayed recall test for the locations of pictures learned on the previous day (Figure 1A) as well as an anatomical scan and a proton-density scan.

Figure 1. (A) The experiment comprised three tasks: reward conditioning, followed by a nonrewarded associative learning task and delayed recall. The learning task was composed of three learning cycles, each with two encoding and two recall runs for the picture-location associations, for a total of 72 pictures (36 per run). Finally, 24 hr later, the participants performed a delayed recall test for the learned picture-location associations. (B) Example of a trial in reward conditioning. The task was to categorize the picture as semantically related to the sea or the savanna. For each trial, one category was consistently rewarded with 10 points (0.50 CHF, HR) and the other with 1 point (0.05 CHF, LR) for a correct response. (C) Trials in the picture-location association learning task. In an encoding trial, a picture appeared in the middle of the screen and moved toward one of six locations on the screen. The participants' task was to memorize the position of each picture. In a recall trial, the participants indicated the remembered location of the picture with one of six buttons on the response pads held in both hands. (D) On every trial of the delayed recall test, the participants were additionally asked to indicate their response confidence (0 = guessing, 3 = confident) and to answer a source memory (temporal context) question ("Which run?" 1 | 2 | I don't know).
Reward conditioning task. The first task was a reward conditioning procedure with 80 unique trials in which participants gained points for correctly assigning a picture to "sea" or "savanna" (40 trial-unique pictures per category) by a button press (Figure 1B). One picture category was associated with high potential reward (HR = 10 points), whereas the other yielded low reward (LR = 1 point). The assignment of a semantic category to a given reward level was counterbalanced across participants. The correspondence of the buttons (left/right and sea/savanna) stayed constant throughout the task for a given participant and was counterbalanced across participants and across reward-category assignments. Participants were informed that points would be converted into real money added to their monetary compensation (10 points = 0.50 CHF) and that the maximum amount they could win in the task was 440 points (22 CHF).
Each trial of the reward conditioning task began with a fixation cross, followed by a cue (a color photograph from either the sea or savanna semantic category) presented at the center of the screen for 1.5 sec (Figure 1B). After a variable time interval (mean = 2.5 sec, min = 1.5 sec, max = 3.5 sec), a response screen appeared for 1.5 sec, during which participants categorized the preceding picture by pressing, with their right hand, the left or right button to select one of the semantic categories written on the left or right part of the screen. Next, a feedback display was presented indicating whether the cue yielded an HR (10 points) or an LR (1 point). The HR feedback was a smiling piggy bank with animated golden coins falling into it; the LR feedback was a sad-looking piggy bank with one silver coin falling into it. Participants were told that successful categorization of a picture in one of the categories (either sea or savanna) would always be associated with an HR whereas the other would always yield an LR, that no reward would be given for incorrect responses, and that this reward scheme would not change during the task. Participants collected points if they responded correctly while the response screen was on, but not if they responded incorrectly, too early, or too late, in which cases corresponding feedback was provided ("wrong button," "too early," or "too late"). The conditioning task lasted about 18 min. No more than four trials of the same category appeared in a row. Intermediate feedback with accumulated points appeared four times during the task (after every 20 trials) with the message "You won xx points. Try to win some more!". The final score, converted into CHF, appeared at the end of the task.
To obtain a behavioral index of conditioning strength, participants rated the pleasantness of a subset of 20 pictures (10 from each of the semantic categories) presented one at a time. They moved a cursor on a horizontal scale from "unpleasant" (left) to "pleasant" (right) using the button box. This measure was collected twice: before and after the conditioning task. The distance of the final placement of the cursor from the center of the screen was used as the dependent measure in the computation of a behavioral conditioning index for the HR and LR pictures separately (i.e., the average rating after minus before conditioning). This part of the task was not scanned, and response time was unlimited. These 20 pictures were not used in the subsequent associative learning task. Because of extreme pleasantness scores (>2.5 SDs below the group mean), data from one participant were excluded from the pleasantness analysis (but retained for all other data analyses).
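Concretely, the conditioning index described above reduces to a difference of mean cursor distances per reward level. A minimal sketch (variable names and values are illustrative, not real data):

```python
import numpy as np

def conditioning_index(ratings_before, ratings_after):
    """Behavioral conditioning index for one reward level (HR or LR):
    mean cursor distance from screen center (positive = toward
    "pleasant") after conditioning minus the mean before it."""
    return float(np.mean(ratings_after) - np.mean(ratings_before))

# Hypothetical pleasantness shift for HR pictures of one participant.
hr_index = conditioning_index([0.1, 0.3, -0.2], [0.5, 0.6, 0.1])
```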
Nonrewarded object-location associative learning task. The second task consisted of three cycles of object-location association learning, each composed of successive encoding and recall runs. Each cycle contained two such encoding-recall runs, each consisting of 36 trials, for a total of 72 unique picture-location associations repeated in each cycle. Because our focus was on the portion of the task where the effects of conditioning were still lingering, our analysis was performed on the first cycle of the task only.
There were four conditions: 18 pictures that had been HR conditioned in the first task (ConHR), 18 that had been LR conditioned (ConLR), and 18 new pictures from each semantic category that formed the transfer conditions (transfer high-reward [TrHR] and transfer low-reward [TrLR]). The pictures were presented in a semirandomized order within an encoding or recall run. In rapid event-related fMRI designs, activation of the reward system in the HR condition could spill over to the next low-rewarded trial. Therefore, to isolate and potentiate the likelihood of seeing a differential effect of the HR and LR conditions on memory, HR and LR trials were blocked: Pictures from one condition were grouped into mini-blocks of nine consecutive trials, mini-blocks were distributed within each cycle, and the order of pictures in a mini-block was randomized from cycle to cycle and between encoding and testing. Such a structure (single presentations of the individual stimuli in a cycle and randomization of the stimuli within a mini-block) allowed us to benefit fully from the advantages of blocked presentations (i.e., potentiating the differential effect of the levels of reward) while preventing possible adaptation effects over the course of a mini-block.
One cycle was composed of four fMRI runs (two for encoding and two for recall; Figure 1A), and each fMRI run contained one mini-block of each of the four conditions, for a total of 36 pictures per run. This temporal separation was necessary because it is difficult to memorize more than 36 associations.
Locations were assigned to the pictures such that, for a given condition, three pictures were associated with one location, evenly distributed across the two blocks. Care was taken that the subcategories (i.e., animals, human activity, vehicles) were shuffled across the possible screen locations so as to prevent the participants from inadvertently relying on location subcategory patterns. At each encoding/recall run, the order of pictures within the blocks changed. The condition order within the blocks (e.g., ConHR, TrLR, ConLR, TrHR) changed across the runs and cycles. However, at both immediate and delayed recall, blocks within a run appeared in the same order as during the corresponding encoding run. That way, we ensured that a similar amount of time had passed from encoding to recall of each association.
Participants' task was to observe and memorize the placement of each picture on the screen (one of six fixed positions) during the encoding phase and to indicate the remembered position with one of six response buttons during the recall phase (Figure 1C). Participants were explicitly asked to give a response on every trial, even if unsure. We suggested that participants form stories as an example of an encoding strategy (such as "people on the beach went North-East"); however, no formal control over strategies was applied. They were also told that the object-location assignment was random and that, although the order of trials differed between the cycles, each picture-location association was unique and remained the same throughout the task.
Participants were trained outside the scanner to learn to associate the screen locations (six dots) with buttons of the response boxes they held in both hands on a version of the task with a separate set of black-and-white drawings (not used in the main task). Once in the scanner, the same training version of the task was repeated to facilitate the visuomotor mapping of the picture locations onto the motor response in a supine position.
Object-location delayed recall. Twenty-four hours after the first session, participants came back to the laboratory to perform a delayed memory test that lasted about 25 min (Figure 1A). Each trial required three successive responses: an object-location decision, a confidence rating for that response, and a source judgment (Figure 1D). For the source memory question, we used the temporal separation between the first and second runs of 36 pictures in the previous day's learning task to test participants' memory of the temporal context in which an association had been presented. At delayed recall, all trials were shuffled randomly.
At the beginning of each trial, the picture was displayed at the center of the screen with six white dots indicating the possible locations. For the first object-location response, as during the first session, participants had 1 sec to deliberate and 2 sec to respond by selecting a location using one of the six buttons. The screen subsequently changed to display "How confident are you?" with four response options: "Confident = 3 | Rather certain = 2 | Somewhat sure = 1 | Guessing = 0." Participants had 3 sec to respond using their right-hand button box. Finally, the screen changed to display the source memory question "Which run of the study phase?" with three response options: "First | Second | Don't know" for 5 sec (Figure 1D). The "don't know" source option was offered to reduce potential contamination by guessing on the source decision, as has been implemented in similar studies (e.g., Duarte, Henson, Knight, Emery, & Graham, 2010). Each trial lasted on average 11.5 sec. Confidence responses were used as a covariate in the analyses of the location responses (see the fMRI data analysis sections below). Source memory responses are not reported in detail here. On average, about 2% of trials were excluded from analysis because of the lack of a timely response on the location question.
All tasks were conducted inside the MRI scanner with continuous MRI data acquisition throughout the tasks, except for the pleasantness rating, which was not scanned. Once in the MRI scanner, participants were given noisedampening earplugs and headphones as well as four-button MRI-compatible response boxes (Current Designs Inc.) in their right and left hand.
At the end of the procedure, participants were debriefed and asked for their personal preference about the semantic categories and for the category associated with HR in the conditioning task. All 20 participants correctly remembered the HR category, and 14 reported having no general preference between the sea and the savanna.

Psychometric Questionnaires
After the termination of the experiment, participants filled out the Behavioral Inhibition and Activation Scale (Carver & White, 1994). The three Behavioral Activation Scale (BAS) subscales include items related to the pursuit of appetitive goals (BAS drive), the inclination to seek out new rewarding situations (BAS fun seeking), and positive affect/excitability (BAS reward responsiveness [RR]; e.g., "When good things happen to me, it affects me strongly"). Because the subscale RR of the BAS was found to correlate with the connectivity between striatal and sensory regions (DelDonno et al., 2017) and with shorter RTs in conditions of HR motivation (Chaillou, Giersch, Hoonakker, Capa, & Bonnefond, 2017), we included it as a covariate in our analyses.

MRI Data Acquisition Parameters
A 3-T whole-body MRI scanner (TIM Trio) with the product 32-channel head coil was used in the experiment. Earplugs were used to attenuate scanner noise, and head movement was restricted using memory foam pillows. Functional images were acquired using a multiplexed EPI sequence (Feinberg et al., 2010) with repetition time (TR) = 650 msec, echo time (TE) = 30 msec, flip angle = 50°, 36 slices, 64 × 64 pixels, 3 × 3 mm voxel size, and 3.9-mm slice spacing. The multiband acceleration factor was 4, and parallel acquisition technique was not used. A high-resolution structural T1 scan and a proton-density weighted scan were acquired at the end of the second scanning session. Structural images were acquired with a T1-weighted 3-D sequence (magnetization prepared rapid gradient echo; TR = 1900 msec, TE = 2.27 msec, flip angle = 9°, parallel acquisition technique factor = 2, 256 × 256 × 192 voxels, 1 × 1 × 1 mm voxel size).

Statistical Analyses

fMRI Data Preprocessing
EPI images were preprocessed using SPM software SPM8 (Wellcome Trust Centre for Neuroimaging) implemented in MATLAB R2012a (The MathWorks, Inc.). To avoid T1 saturation effects, image acquisition for each run started after 10 dummy volumes had been recorded. Functional images were spatially realigned to the mean of the images, coregistered to the anatomical scan, spatially normalized to the standard MNI EPI template, and spatially smoothed with an isotropic 8-mm FWHM Gaussian kernel (Friston et al., 1994).
Inspection of motion parameters obtained after image realignment using ArtRepair (Mazaika, Hoeft, Glover, & Reiss, 2009) revealed that all but one participant's total motion was less than 3 mm (n = 1 was excluded from analysis). Selected participants' (n = 20) structural volumes were normalized to the MNI T1 template before creating a mean T1 image used for visualization in reported figures.
Separate first-level models were built for the conditioning task, the first cycle of the nonrewarded object-location learning task, and the delayed recall of the learning task. Functional data were analyzed by convolving the onset of each event with a hemodynamic response function. The six movement parameters estimated during realignment were also included to capture residual (linear) movement artifacts. The sets of voxel values obtained from the different contrasts constituted maps of t statistics. The individual summary statistical images were used in a second-level analysis corresponding to a flexible factorial design. Above-threshold activations using SPM 12's default whole-brain FWE correction at p < .05 with a minimum cluster size of 5 contiguous voxels are reported. Where noted, ROI-based analyses with small volume correction (SVC) were carried out to complement the whole-brain FWE-corrected results.
The general linear model (GLM) for the conditioning task included the cue and feedback onsets, separately for HR and LR. For the nonrewarded object-location learning task, the GLM included eight event types, corresponding to the four conditions (ConHR, ConLR, TrHR, and TrLR) for the encoding and recall phases, separately. The same recall conditions were modeled for the delayed recall test. Each regressor modeled the BOLD activity corresponding to the onset of a picture, with the addition of a parametric modulator representing time (linearly descending with trial count). We included this time modulation because the initial effect of reward conditioning is expected to undergo extinction as the items are repeatedly presented without reward.
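The linearly descending time modulator can be sketched as below. The exact ramp is not specified in the text beyond "linearly descending with trial count," so the unit-range linear form is an assumption; mean-centering follows SPM's convention for parametric modulators:

```python
import numpy as np

def descending_time_modulator(n_trials):
    """Parametric modulator that descends linearly with trial count.

    Early trials, where lingering effects of reward conditioning should
    be strongest, receive the largest weights. The vector is
    mean-centered (as SPM conventionally does for parametric
    modulators), keeping it orthogonal to the condition's
    main-effect regressor.
    """
    ramp = np.linspace(1.0, 0.0, n_trials)  # descends across trials
    return ramp - ramp.mean()
```

In SPM itself this corresponds to attaching the vector as a pmod to the condition's onsets rather than building the regressor by hand.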
The recall regressors also included a second modulator, that is, the scaled Euclidean distance to target (DTT) for the particular response. Euclidean DTT in pixels was simply divided by the absolute maximum possible DTT. As a result of the scaling, the modulator takes values between 0 and 1, where 0 is a correct answer and 1 is the absolute maximum error. Group effects were investigated using separate second-level flexible factorial models for encoding and recall, with correction for condition variance nonsphericity.
The participant-level covariate RR (z-scored RR; subscale of BAS) was included in the second-level fMRI analyses of data acquired in the first session because we found it had a significant influence on the behavioral results. The focus of the analysis of Cycle 1 of the learning task was on the effect of reward level, with critical contrasts between conditions of HR and LR (ConHR + TrHR > ConLR + TrLR).
First-level GLM of the data from the second session (acquired 24 hr later) included the following regressors: location recall with two parametric modulators, DTT and response confidence (per each condition), and the onset of the source memory question.

Regions of Interest
Building on the literature describing the interactions between reward learning and memory, we selected three ROIs to examine in detail the interaction of the reward system with the spatial learning system. Specifically, we focused on the bilateral hippocampus and its interaction with the SN/VTA (Ripollés et al., 2016; Adcock, Thangavel, Whitfield-Gabrieli, Knutson, & Gabrieli, 2006) and bilateral ventral striatum (Wimmer, Daw, & Shohamy, 2012). Bilateral hippocampus masks (left and right) were defined from WFU-Pickatlas v3.0.5 (Maldjian, Laurienti, Kraft, & Burdette, 2003). A bilateral ventral striatum ROI was defined based on the online meta-analysis tool Neurosynth.org (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011) for studies associated with "reward anticipation" (reverse inference). The image was thresholded with FDR correction at p < .01; visualized in the bspmview toolbox (v. 20151217, Bob Spunt, California Institute of Technology) at t > 5, with a minimum cluster size of 40; and then smoothed with robust smoothing (default settings) to isolate the clusters. The bilateral ventral striatum cluster contained 97 voxels and was saved as a binary image matching the size of the fMRI contrast images. An anatomically defined SN/VTA area was manually delineated in MRIcron (Rorden, Karnath, & Bonilha, 2007) on the group mean of the proton-density images, where it can be distinguished from surrounding structures as a bright stripe (after Schott et al., 2006), and saved as a binary mask (64 voxels). These ROIs were used for the functional connectivity analyses. We also extracted signal change from the mesolimbic ROIs defined above (ventral striatum and SN/VTA) to test for a link between activation of these regions during the conditioning task and the subsequent encoding phase of the nonrewarded learning task. The same ROIs were used for SVC in the main linear contrasts when indicated; SVC was applied using FWE cluster correction.

Functional Connectivity
All connectivity analyses were carried out using the CONN Toolbox v.17f (Whitfield-Gabrieli & Nieto-Castanon, 2012) implemented in SPM 12 (www.fil.ion.ucl.ac.uk/spm/), using the ROI masks described in the Regions of Interest section (left and right hippocampus, bilateral ventral striatum, SN/VTA) as seed regions. This toolbox permits computation of temporal correlations of BOLD signals between selected ROIs, or between selected ROIs and other voxels in the brain, and has been used in earlier functional connectivity studies of reward processing (Alba-Ferrara, Müller-Oehring, Sullivan, Pfefferbaum, & Schulte, 2015; Peciña & Berridge, 2013). Generalized psychophysiological interaction (gPPI; task-modulation effects) analyses were performed using smoothed functional images modeled as zero-duration events with onsets at picture display (in the encoding runs of the nonrewarded object-location task from Cycle 1) and at the picture location question (at delayed recall) for each of the four conditions. This analysis used the hemodynamic-response-function-convolved impulse time series, bandpass-unfiltered, as parametric/linear modulators of the connectivity between two ROIs or voxels. The time series of activity from a seed ROI were extracted and entered into the original GLMs to compute a correlation and to test whether this correlation changed as a function of a specific contrast. Briefly, gPPI allows for an analysis of task-associated connectivity without the two-condition constraint of traditional PPI analysis by controlling for the main effects of any number of conditions across the scanning session in a single model. "Task-associated" connectivity can therefore be analyzed independently of task-associated effects on the BOLD response. We used the CONN toolbox's recommended FDR seed-level correction for the ROI-to-ROI and seed-to-voxel analyses reported in the paper.
gPPI analyses were performed to model voxels whose covariance with the hippocampus, ventral striatum, and SN/VTA was influenced by reward (High, Low) and picture status condition (Conditioned, Transfer). Individual participants' motion parameters and main effects of task condition were modeled as nuisance covariates. Reported connectivity values are the Fisher-transformed correlation coefficients extracted for each condition at the second level. For display purposes, we extracted connectivity values for each participant for each condition in the ROI-to-ROI analysis on data from the recall phase of Cycle 1 (Figure 7). To illustrate mean group connectivity values for clusters detected in the seed-to-voxel analysis of data from Day 2, we extracted connectivity values from an F contrast of all recall conditions within the obtained clusters using an inclusive mask (Figure 7B).
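The Fisher transform applied to the extracted correlation coefficients is simply the inverse hyperbolic tangent; a minimal illustration on synthetic seed/target time series:

```python
import numpy as np

rng = np.random.default_rng(2)
seed = rng.normal(size=150)
target = 0.5 * seed + rng.normal(size=150)   # a region correlated with the seed

r = np.corrcoef(seed, target)[0, 1]          # Pearson correlation coefficient
z = np.arctanh(r)                            # Fisher z transform
print(round(r, 2), round(z, 2))
```

The transform makes the correlation values approximately normally distributed, which is why the toolbox reports them at the second level.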

Behavioral Data Analysis
For the conditioning task, only successful trials (with a response executed in time) were included in the analysis. Misses and incorrect responses constituted only 0.56% of all trials.
For the object-location learning task, performance was assessed as RTs and distance to target (DTT) using trial-wise data. For correlations, the mean DTT of a given category per subject was used.
Euclidean distance was calculated as d(n, 0) = sqrt((x_n − x_0)² + (y_n − y_0)²), where n refers to the location on the screen (in pixels) selected by the participant and 0 is the target location. Consequently, there were six possible values the DTT could take: five values for incorrect responses and zero for a correct response (distance of zero).
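The DTT computation can be sketched as follows (the pixel coordinates are hypothetical; a correct response yields a distance of zero):

```python
import math

def distance_to_target(chosen_xy, target_xy):
    """Euclidean distance (in pixels) between the chosen and target locations."""
    dx = chosen_xy[0] - target_xy[0]
    dy = chosen_xy[1] - target_xy[1]
    return math.hypot(dx, dy)

# A correct response has a distance of zero
print(distance_to_target((300, 200), (300, 200)))   # 0.0
print(distance_to_target((300, 200), (600, 600)))   # 500.0
```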
To verify learning performance, we measured memory accuracy [correct responses / total number of trials] for each condition and cycle. The average group performance was well above the chance level of 16.67% throughout the task. Note that n = 1 participant who failed to show improvement from Cycle 1 to Cycle 3 was excluded from all analyses (accuracy of about 25% across the three cycles; reported in Table 1 below). RTs for the conditioning and associative learning tasks were log10-transformed before analysis. Misses were excluded from data analysis (0% in the reward conditioning task, 1.6% of all responses in the associative learning task). Responses with RT < 100 msec were regarded as impulsive and excluded from behavioral data analysis (3.9% of responses in the reward conditioning task, 0.002% of responses in the associative learning task). RT and DTT data were analyzed at the single-trial level using a linear mixed model (Baayen & Milin, 2010) implemented in SPSS v.22 (IBM SPSS Statistics for Windows; IBM Corp., released 2013), using the restricted maximum likelihood estimation method. A linear mixed model (also known as a random effects model) accounts for within-participant correlation of repeated measurements with the inclusion of a random intercept for participants. In addition, and unlike traditional repeated-measures analyses, mixed models can handle unbalanced data (e.g., unequal trials per condition) and trial-level covariates (such as confidence ratings) and do not require normally distributed data. All linear mixed models included the random factor Participant and fixed factors Reward (high, low) and Picture Status (conditioned, transfer). RR trait scores were z-scored at the group level and included as a participant-level covariate in behavioral data and fMRI analyses of the associative learning task.
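The exclusion and transformation steps can be sketched as follows (a minimal illustration with hypothetical trials; the actual analysis was performed in SPSS):

```python
import math

def preprocess_rts(trials, min_rt_ms=100):
    """Drop misses (no RT) and impulsive responses (RT < 100 msec),
    then log10-transform the RTs of the retained trials."""
    kept = [t for t in trials
            if t["rt_ms"] is not None and t["rt_ms"] >= min_rt_ms]
    return [dict(t, log_rt=math.log10(t["rt_ms"])) for t in kept]

trials = [
    {"rt_ms": 450, "reward": "high"},
    {"rt_ms": 80,  "reward": "low"},    # impulsive -> excluded
    {"rt_ms": None, "reward": "high"},  # miss -> excluded
]
clean = preprocess_rts(trials)
print(len(clean), round(clean[0]["log_rt"], 3))   # 1 2.653
```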
For the analysis of RT data in the associative learning task at immediate and delayed recall, we additionally separated the trials into those correctly and incorrectly recalled, thus adding one fixed factor Correctness (correct, incorrect). DTT data from the delayed test were analyzed excluding responses reported as guesses (∼12.5%), but with a covariate confidence for the retained nonguessed responses.

Behavior
The experiment began with a reward conditioning task in which monetary rewards of two levels (HR = 10 points, 0.50 CHF; LR = 1 point, 0.05 CHF) were offered for correct picture-categorization responses (Figure 1B). Reward conditioning success was assessed via pleasantness ratings performed before and after conditioning on a visual analog scale for a subset of pictures (10 for each reward level) and confirmed that preference scores changed as a function of reward level (linear mixed model with fixed factor Reward [high, low] and random factor Participant: F(1, 378) = 4.42, p = .036; Figure 2A). We used the average difference score (postconditioning minus preconditioning preference; Figure 2A), measured on a visual analog scale from "unpleasant" to "pleasant," as a behavioral index of each participant's conditioning strength and found that it correlated positively with individual trait RR, a subscale of the BAS (Carver & White, 1994; Pearson's r = .45, p = .043). There was no main effect of Reward on RTs in the conditioning task (linear mixed model with fixed factor Reward [high, low]: F(1, 1507) = 0.123, p = .726), but the random factor Participant was significant (Wald's Z = 3.023, p = .003), pointing to significant inter-participant differences. Indeed, when we included RR as a covariate, we found that participants with higher trait RR classified pictures faster in the conditioning task than those with lower trait RR (a main effect of the covariate RR on RTs: F(1, 9.5) = 19.4, p = .002), but no significant interaction with the factor Reward (RR × Reward: F(1, 743.1) = 2.275, p = .132).

fMRI-Reward Conditioning
Because trait RR was found to significantly interact with behavioral performance in the conditioning task, we systematically included it as a covariate in our second-level fMRI models to account for potential individual modulations in the effects of reward on the nonrewarded object-location associative learning task. Only data from Cycle 1 and delayed recall 24 hr later are reported in this paper. Comparing HR to LR picture presentation (main regressors) revealed robust activation clusters within the mesocorticolimbic dopaminergic reward circuit, including the SN/VTA, bilateral ventral striatum, left superficial amygdala, and ventromedial PFC (vmPFC; Figure 2B). During the presentation of the reward feedback, significant activation was observed mostly in the visual occipital cortex, likely reflecting perceptual differences between the reward feedback types: HR was presented as four moving coins falling into a piggy bank, whereas LR was represented as one coin falling. At a lower threshold (p < .001 uncorrected), activation was also detected in the vmPFC (corrected with an SVC using a functionally defined mask; reported in Table 2). We conclude that the conditioning task successfully induced reward learning, with a higher anticipatory reward response for the HR compared to LR pictures.

Effects of Reward Conditioning on Subsequent Nonrewarded Associative Learning Task
In the learning task that followed the conditioning procedure, the participants learned to assign one spatial location (of six possible locations) to pictures that had been reward conditioned (18 ConHR and 18 ConLR) as well as to new pictures that belonged to the same two semantic categories (referred to as transfer pictures; 18 TrHR and 18 TrLR). We henceforth refer to this factor with two levels (Conditioned and Transfer) as "picture status." The task was composed of alternating encoding (memorize) and recall (respond) runs (Figure 1A).

Behavior-Associative Learning Task
Main analyses. Consistent with the effect observed in the conditioning task, trait RR interacted with recall performance for both DTT and RT measures. Specifically, we found that individuals with high RR responded faster and more accurately on trials with pictures previously conditioned with HR compared to LR (negative correlation between RR and mean DTT for ConHR > ConLR: Spearman's ρ = −0.46, p = .041; negative correlation between RR and RTs for ConHR > ConLR: Spearman's ρ = −0.4174, p = .0671; no effects were found when comparing transfer pictures, DTT TrHR > TrLR: ρ = −0.15, p = .52; RTs: p = .68). Consequently, RR was used as a covariate in second-level fMRI analyses.
RT data were analyzed with a linear mixed model including fixed factors Reward [high, low], Picture Status [conditioned, transfer], and Correctness [on target, off target], the covariate RR, and all two- and three-way interactions of the main factors. We found a triple interaction of Picture Status × Correctness × RR, F(1, 1365.789) = 5.164, p = .023. There was also a trend for an interaction between Picture Status × Correctness, F(1, 1372.14) = 3.146, p = .076, because of faster RTs for correct responses to previously conditioned pictures. The main effect of Reward was not significant, F(1, 1377.453) = 2.458, p = .117, nor was the interaction Reward × RR, F(1, 1319.929) = 0.637, p = .425 (p values for other effects > .385). We next split the analysis following the interaction pattern and found that, for directly conditioned but not transfer pictures, the RR covariate was significant.
Behavioral effects of trial position within mini-blocks. To avoid any progressive change (increase or decrease) of the reward effect within a mini-block, we made sure that each individual picture was presented only once during Cycle 1, thus preventing any habituation effect from the repetition of identical pictures. Yet, blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) over the course of one mini-block.
To ensure that the reward effect was not significantly affected by the succession of pictures of the same reward condition within a mini-block, we analyzed the behavioral data considering the trial position in mini-block at encoding in Cycle 1 (thereafter "Trial Position") as a covariate.
For the recall data from Cycle 1, the analysis of RT data included fixed factors Reward (high, low) and Picture Status (conditioned, transfer), with Trial Position as a covariate. In summary, these results suggest that recall performance (RT and DTT) in Cycle 1 did not interact with trial position within the mini-block during encoding in Cycle 1.

fMRI-Associative Learning Task
HR vs. LR. Comparing brain activity elicited by previously high- versus low-rewarded pictures (ConHR + TrHR > ConLR + TrLR) at encoding of picture-location associations yielded no significant voxels, even at a lenient threshold (p < .001 uncorrected). Interestingly, the opposite contrast (ConHR + TrHR < ConLR + TrLR) revealed robust activation in the bilateral ventral striatum and in the right hippocampus (Figure 3 and Table 3).
Figure 3. Effects of reward conditioning on brain activation in a subsequent nonrewarded object-location learning task. (A) Activation of the right hippocampus and the bilateral ventral striatum during encoding of previously LR conditioned and semantically related transfer pictures versus HR conditioned and transfer pictures. Whole-brain contrast (ConLR + TrLR > ConHR + TrHR) corrected with FWE at p < .05. (B) Mean parameter estimates ± 95% confidence interval extracted from the activation clusters shown in A, presented for visualization purposes only. Con = previously conditioned; Tr = transfer.
We also compared signal extracted from the a priori defined reward-related ROIs (ventral striatum and SN/VTA) during the conditioning task and the subsequent first encoding of the nonrewarded learning task. In both regions, response to HR feedback during conditioning (contrast HR > LR feedback; Table 2) correlated negatively with the response of the same region during the subsequent learning of locations for the same pictures (ventral striatum: Spearman's ρ = −0.52, p = .017; SN/VTA: Spearman's ρ = −0.63, p = .0036). These results suggest that the more a participant's reward areas responded to high monetary reward during the conditioning task, the lower was their activation when subsequently learning the location of HR conditioned pictures during the nonrewarded object-location task.
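A between-region correlation of this kind can be computed as follows (synthetic values stand in for the extracted ROI responses; the participant count and effect size here are hypothetical):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n = 21  # hypothetical number of participants

# Per-participant betas: HR > LR feedback response during conditioning,
# and the same region's response during subsequent encoding (negatively related)
conditioning_response = rng.normal(size=n)
encoding_response = -0.6 * conditioning_response + rng.normal(scale=0.5, size=n)

rho, p = spearmanr(conditioning_response, encoding_response)
print(round(rho, 2), p < 0.05)
```

Spearman's ρ, being rank-based, is robust to outliers in small between-subject samples, which is presumably why it was preferred here over Pearson's r.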
Reward effects in subsequent encoding cycles of the associative learning task. We performed additional exploratory whole-brain analyses to assess the reward effect in subsequent encoding cycles, beyond Cycle 1. Comparing HR and LR for conditioned and transfer pictures revealed a strong decrease of activity in the ventral striatum for Cycle 1 relative to the same contrast in Cycle 2 (Figure 4A). No significant activation was found when directly comparing the reward effect in Cycle 2 versus Cycle 3, even at p < .001 uncorrected (Figure 4B). Together, these results indicate that, as expected, the reward-value-related modulation of activity predominated at the beginning of the task as compared with subsequent repetitions of the cycles.
The main effects of reward in Cycle 2 and in Cycle 3 were evaluated separately (Table 4). In Cycle 2, we did not observe any activation for HR < LR for either conditioned or transfer pictures at the corrected threshold (FWE, p < .05); there was some inferior temporal gyrus activation at p < .01 uncorrected, as illustrated in Figure 5A. In Cycle 3, HR < LR (Conditioned + Transfer) at p(FWE) < .05 revealed a 7-voxel activation in the left hippocampal area (Figure 5B). As mentioned in the Methods section, all other fMRI analyses of the nonrewarded associative learning task focus solely on Cycle 1.
Picture status during the associative learning task. We examined the effects of prior reward on the encoding of object-location associations separately for conditioned and transfer pictures. We first conducted a whole-brain FWE analysis. For high versus low conditioned pictures, only the bilateral ventral striatum was significantly less activated, and for high versus low transfer pictures, only the right hippocampus was significant (Figure 6 and Table 5). Because these effects were in a similar direction (HR < LR) for conditioned and transfer pictures in both areas (see Figure 3B), the net effect of reward was stronger when pooling the conditioned and transfer conditions together. Indeed, in a second analysis step using an SVC with the a priori defined anatomical bilateral hippocampal mask on the contrast comparing high versus low conditioned pictures (see Methods), we confirmed a cluster in the right hippocampus (Table 5; Figure 6A, left). Applying the same approach with the a priori bilateral functional ventral striatum mask for the transfer pictures (TrHR < TrLR), we identified a cluster of activation (Table 5; Figure 6B, right). Overall, both regions showed similar trends toward reduced activity for HR versus LR for each picture status condition.
Functional connectivity during the associative learning task. Motivated by previous research reporting stronger connectivity during encoding of reward-paired stimuli, we performed task-dependent functional connectivity (gPPI) analyses during encoding of the picture-location associations. Coupling of the seed region SN/VTA with the ventral striatum and the left hippocampus was reduced for previously HR compared to LR pictures (Figure 7; clusters reported at whole-brain cluster p(FWE) < .05).
Effects of trial position within mini-blocks on reward-related activation during the associative learning task.
As blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) over the course of one mini-block, we investigated time-dependent changes in reward effects within a mini-block for the fMRI data. Thus, to test whether the overall decrease in fMRI activity for previously rewarded pictures might be partly because of trial position within a mini-block, we performed a control analysis that included separate regressors for the first three trials, middle three trials, and last three trials (separately for the directly conditioned HR trials, the TrHR trials, the directly conditioned LR trials, and the TrLR trials). Because this new design incorporated a temporal dimension (here, three successive triplets of trials), we did not include an additional parametric modulator by time. Next, we extracted the parameter estimates from the ventral striatum ROI (Figure 8). A repeated-measures ANOVA performed on these values, with Reward, Picture Status, and Trial Position as factors, replicated the original findings of less activation for high- than low-rewarded pictures (with the Greenhouse-Geisser correction for nonsphericity; main effect of Reward: F(1, 19) = 7.039, p = .016; no effect of Picture Status: F(1, 19) = 0.001, p = .973; no Reward × Picture Status interaction: F(1, 19) = 0.856, p = .366). Critically, there was no interaction of Trial Position × Reward, F(1.582, 30.06) = 1.46, p = .247. Together, these results demonstrate that the average decrease in activity for HR compared to LR pictures was not because of some dynamic change in prediction error that would unfold over the course of the mini-blocks.
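The triplet split used in this control analysis can be sketched as follows (a toy numpy illustration that assumes nine trials per mini-block, as implied by the first/middle/last-three split; the parameter estimates are hypothetical):

```python
import numpy as np

def triplet_means(betas):
    """Average parameter estimates over the first, middle, and last
    three trials of a nine-trial mini-block."""
    betas = np.asarray(betas).reshape(3, 3)   # 3 triplets x 3 trials
    return betas.mean(axis=1)

# Hypothetical per-trial ventral striatum betas for one mini-block
block = [1.2, 1.0, 1.1, 0.9, 0.8, 1.0, 0.7, 0.9, 0.8]
print(triplet_means(block))   # one mean per trial-position level
```

These per-triplet means, computed per participant and condition, would then enter the Reward × Picture Status × Trial Position repeated-measures ANOVA.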

Effects of Reward Conditioning at 24-hr Delayed Memory Test
Behavior
Participants returned to the laboratory exactly 24 hr later and were asked to recall the learned picture locations from the nonrewarded associative learning task. On each recall trial, they also indicated the confidence of their response, including an option "guessing." The proportion of "guess" responses did not differ between the picture categories (on average, guesses constituted 12.2 ± 10% [mean ± SD] of all responses; ANOVA with factors Reward and Picture Status). DTT was analyzed excluding "guess" responses and including a trial-level covariate of level of confidence. We found that responses for HR pictures tended to produce higher DTT (i.e., worse memory performance). This was the case for both originally conditioned HR and TrHR pictures. In summary, none of these additional behavioral analyses disclosed any significant interaction between trial position within the mini-block during encoding in Cycle 1 and subsequent delayed memory recall.

fMRI Connectivity Analysis-24-hr Delayed Memory Test
In the functional connectivity analysis of fMRI data acquired while participants were encoding the associations with previously HR pictures (vs. LR), we found reduced coupling of the seed region SN/VTA with the ventral striatum and the left hippocampus. Consequently, we tested the SN/VTA and the left hippocampus as seeds for the functional connectivity analysis during recall on Day 2. We performed a functional seed-to-voxel connectivity analysis (gPPI) on a priori selected hippocampal seed regions. During recall 24 hr after learning, functional connectivity with the left hippocampus was significantly lower for recall of HR versus LR picture locations (ConHR + TrHR < ConLR + TrLR) in two clusters: the vmPFC (peak coordinates: x = 8, y = 32, z = −12; k = 121, cluster p(FDR) = .014) and the right nucleus accumbens (peak: x = 8, y = 10, z = −16; k = 137, cluster p(FDR) = .014; Figure 9). The same analysis for the right hippocampus seed revealed a main effect of reward in the contrast HR < LR (Con + Tr). Connectivity with the right hippocampus was lower for the right orbitofrontal cortex (OFC; k = 172; x = 36, y = 62, z = −4; cluster p(FWE) = .004) and the inferior temporal gyrus (k = 161; x = 54, y = −46, z = −20; cluster p(FWE) = .006) during recall of high compared with low value associations. Note that connectivity was significantly lower for conditions with worse associative memory performance (i.e., HR), and, as in the behavioral data, the effect was general for both the originally rewarded stimuli and those that were semantically related. The SN/VTA seed yielded no significant voxels in the comparison of ConHR versus ConLR pictures at the adopted threshold of cluster-based p(FDR) < .05.

DISCUSSION
Research on how extrinsic incentives affect learning and decision-making has revealed an intriguing paradox. Although much experimental evidence shows that activation of the reward circuit enhances learning and recognition memory (Igloi, Gaggioni, Sterpenich, & Schwartz, 2015; Gruber, Gelman, & Ranganath, 2014; Murty & Adcock, 2014; Adcock et al., 2006; Wittmann et al., 2005), several studies have pointed to situations in which offering monetary reward reduces performance, an effect accompanied by reduced activation of the ventral striatum (Murayama et al., 2010).
Our study contributes to deciphering the complex relationship between the timing of reward and its effects in facilitating or interfering with the establishment of new associations in memory, a question of practical importance for domains such as education. In other words, would a previously rewarded item, when presented again in a different learning context, facilitate or hinder that novel learning? Here, we hypothesized that initially rewarded stimuli trigger a reevaluation process when presented in a different task context in the absence of reward, which would hinder, rather than facilitate, the formation of new associations with the reward-related stimuli. We tested this hypothesis by assessing the effects of reward conditioning on a subsequent nonrewarded associative learning task at a time when reward was most likely to still exert lingering effects. We report a negative reward prediction error-like signal in the ventral striatum, paralleled by reduced hippocampus activity during the encoding of these new associations, an effect that correlated with trait RR. At the behavioral level, we found a trend for worse memory formation for previously highly rewarded (HR) pictures compared to LR pictures 24 hr after picture-location learning, despite an increased preference for ConHR pictures right after the conditioning task. Furthermore, we found indications that these neural and memory effects may affect not only the specific reward-associated items but also new, never-conditioned pictures from the same semantic category.
One previous behavioral study demonstrated a reward-driven enhancement of recognition memory that emerged 24 hr after encoding pictures from the same semantic category (Patil, Murty, Dunsmoor, Phelps, & Davachi, 2017) that had not been directly paired with performance-dependent reward. Here, we report an opposite effect, presumably because of a crucial difference in the timing of the reward association. In contrast to our design, the nonrewarded task for which a memory enhancement was observed by Patil and colleagues was administered before the reward association phase. The enhancement of recognition memory was thus a retroactive effect of reward that affected the consolidation process through postencoding interactions between the reward and memory systems. Comparing the study by Patil et al. and ours emphasizes that the timing of reward association may determine the direction of interaction between the ventral striatum, the VTA, and the hippocampus, and its consequences on delayed recall accuracy.

Reactivation of Reward Memory in a New Context Leads to Adjustment of Associated Reward Values
First, at encoding during the novel unrewarded picture-location association task, presentation of a previously HR-conditioned picture (now in a different context that no longer predicts reward) resulted in decreased BOLD signal in the ventral striatum and the hippocampus (Figure 3). Human participants have been shown to rely on relative rather than absolute values, with their choices and neural value representations reflecting contextual adaptation to previously learned values in the striatum (Klein et al., 2017) as well as in the SN/VTA (Hétu, Luo, D'Ardenne, Lohrenz, & Montague, 2017). Thus, the associated value signal in the striatum displays the characteristics of a relative value prediction error. We found a similar relative value prediction error-like signal in the ventral striatum. In our task, the consequence of such memory-related relative value coding may have contributed to poorer learning of the novel picture-location associations. Because the ventral striatum has been shown to reflect the current value of a stimulus (Levy & Glimcher, 2012; Jocham et al., 2011; Lim, O'Doherty, & Rangel, 2011), we may interpret this result as a devaluation of the stimulus mediated by the ventral striatum and its functional connectivity with the SN/VTA (Figure 7), suggesting the involvement of the mesocorticolimbic dopaminergic pathway (Hauser et al., 2009).
Figure 9. Functional brain connectivity during delayed (24-hr) recall of the object-location task. (A) Reduced functional connectivity during picture-location recall 24 hr after learning for stimuli from the previously HR-conditioned category. Activity in the seed region in the left hippocampus (CA) is significantly less correlated with both the vmPFC (+8, +32, −12) and the ventral striatum (+8, +10, −16) for the contrast ConHR + TrHR < ConLR + TrLR. (B) Data represent the group average of Fisher-transformed correlation coefficients extracted for each condition for each participant at the second level for the ventral striatum and vmPFC clusters. Error bars are ±95% confidence intervals. * denotes p < .05 for the main effect of reward.
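The negative prediction error-like signal discussed in this section can be illustrated with a toy Rescorla-Wagner update (a didactic sketch, not a model fitted in this study): a cue whose value was acquired under reward produces a negative prediction error the moment the reward is withheld.

```python
def td_update(value, reward, alpha=0.1):
    """One Rescorla-Wagner step: the prediction error drives value adjustment."""
    pe = reward - value          # negative when an expected reward is omitted
    return value + alpha * pe, pe

# Value acquired across repeated rewarded presentations of the cue...
v = 0.0
for _ in range(50):
    v, _ = td_update(v, reward=1.0)

# ...then the same cue appears in a context with no reward
v_new, pe = td_update(v, reward=0.0)
print(round(pe, 2))   # -0.99, a negative prediction error; value is revised down
```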

Downward Value Adjustment Spreads to Semantically Similar Stimuli
In the current experiment, we found that the ventral striatum and hippocampus signaled a negative reward prediction error signal for pictures from the entire semantic category (sea or savanna) that had been paired with a high level of reward (Figures 3 and 9).
Previous studies reported that the ventral striatum prediction error during reward learning incorporates information related not only to the predictive cue (the specific rewarded pictures in our case) but also to perceptually similar cues (encoded by the hippocampus), accounting for generalization of reward expectation (Aberg, Doell, & Schwartz, 2015; Gerraty, Davidow, Wimmer, Kahn, & Shohamy, 2014; Kahnt, Park, Burke, & Tobler, 2012). The degree of perceptual generalization at a transfer test depends on dopaminergic transmission and correlates with activation in the hippocampus (Kahnt & Tobler, 2016). Our results suggest that fMRI signal related to reward memory in the ventral striatum and the hippocampus could also represent a semantic (rather than purely perceptual) dimension of generalization, analogous to the generalization previously reported in the fear learning domain (Dunsmoor, White, & LaBar, 2011). Moreover, these effects also influenced the connectivity between the SN/VTA and the ventral striatum (Figure 7), which could indicate a role of the dopaminergic circuit in value generalization (Bunzeck, Dayan, Dolan, & Duzel, 2010).

Downward Value Adjustment May Impede Subsequent Associative Memory Formation
At odds with the literature showing a positive modulation of memory by reward, we found that learning of new associations with previously highly rewarded material led, if anything, to worse rather than better memory, and that the interaction between the hippocampus and the ventral striatum appeared to mediate this effect. Specifically, we found decreased functional connectivity between the ventral striatum and the left hippocampus during encoding of picture-location associations with previously highly rewarded pictures (Figure 7), an effect that also extended to the inferior temporal gyrus. In addition, delayed recall of associations with pictures carrying a history of HR, 24 hr after encoding, was accompanied by decreased functional connectivity between the left hippocampus and the vmPFC as well as the ventral striatum (Figure 9), structures of the brain's automatic valuation system (Lebreton, Jorge, Michel, Thirion, & Pessiglione, 2009). These results suggest impaired new memory formation when signal in value-related brain areas decreases because of relative cue-value adjustment, resulting in a decrease in memory accuracy after night-time consolidation, the time when an effect of memory modulation by reward has been shown to emerge (Igloi et al., 2015; Javadi, Tolat, & Spiers, 2015; Adcock et al., 2006). We postulate that the magnitude of dopaminergic ventral striatum activation modulated hippocampal activity, consistent with the role of dopamine in memory modulation (Miendlarzewska et al., 2016; Thomas, 2015; Lisman et al., 2011; Lisman & Grace, 2005). To summarize, although the brain revealed lower activation in the reward circuit for HR history stimuli (compared to LR stimuli) on both days of the experiment, the behavioral trend emerged only after 24 hr.
Such delayed effects are consistent with a large body of evidence supporting that memory consolidation mechanisms, including the lasting modulation of memory retention by reward, may require sleep (Igloi et al., 2015;Javadi et al., 2015).

Limitation
One limitation relates to the design of the study, in which HR and LR trials were grouped into mini-blocks. This structure was chosen to minimize the possibility that activation of the reward system in response to an HR trial would spill over to an immediately following LR trial. Yet, blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) during a mini-block. We therefore analyzed the time course of ventral striatum activity within mini-blocks and found no evidence that the average decrease in activity for HR compared to LR pictures could be because of dynamic changes (e.g., in prediction error) unfolding over the course of the mini-blocks.

Conclusion
Despite the potentially important implications for learning in an educational context, the consequences of past reward on new memory formation have not received much attention in cognitive neuroscience. Our findings go beyond previous human studies of this phenomenon to show that the downward value adjustment after discontinuation of rewards may generalize to semantically related stimuli and that this undermining effect can impact both encoding and postconsolidation recall activity of the hippocampus.
We found that, although participants reported increased subjective preference for ConHR pictures, they showed no learning improvement and reduced ventral striatum and hippocampal activation for these same pictures as well as for semantically similar new exemplars (compared to LR pictures) during the subsequent object-location task and at recall 24 hr later.
Our study provides neural evidence to help explain the potentially paradoxical undermining effects of monetary incentives on subsequent performance that have been observed in some situations in human volunteers (Murayama et al., 2010). We show that the removal of the incentive leads to a reduction in relative reward (a negative prediction error) in the ventral striatum.