Development of a temperature-controlled phantom for magnetic resonance quality assurance of diffusion, dynamic, and relaxometry measurements

Purpose: Diffusion-weighted (DW) and dynamic contrast-enhanced magnetic resonance imaging (MRI) are increasingly applied for the assessment of functional tissue biomarkers for diagnosis, lesion characterization, or for monitoring of treatment response. However, these techniques are vulnerable to the influence of various factors, so there is a necessity for a standardized MR quality assurance procedure utilizing a phantom to facilitate the reliable estimation of repeatability of these quantitative biomarkers arising from technical factors (e.g., B1 variation) affecting acquisition on scanners of different vendors and field strengths. The purpose of this study is to present a novel phantom designed for use in quality assurance for multicenter trials, and the associated repeatability measurements of functional and quantitative imaging protocols across different MR vendors and field strengths. Methods: A cylindrical acrylic phantom was manufactured containing 7 vials of polyvinylpyrrolidone (PVP) solutions of different concentrations, ranging from 0% (distilled water) to 25% w/w, to create a range of different MR contrast parameters. Temperature control was achieved by equilibration with ice-water. Repeated MR imaging measurements of the phantom were performed on four clinical scanners (two at 1.5 T, two at 3.0 T; two vendors) using the same scanning protocol to assess the long-term and short-term repeatability. The scanning protocol consisted of DW measurements, inversion recovery (IR) T1 measurements, multiecho T2 measurement, and dynamic T1-weighted sequence allowing multiple variable flip angle (VFA) estimation of T1 values over time. For each measurement, the corresponding calculated parameter maps were produced. On each calculated map, regions of interest (ROIs) were drawn within each vial and the median value of these voxels was assessed. 
For the dynamic data, the autocorrelation function and its variance were calculated; for the assessment of the repeatability, the coefficients of variation (CoV) were calculated. Results: For both field strengths across the available vendors, the apparent diffusion coefficient (ADC) at 0 °C ranged from (1.12 ± 0.01) × 10−3 mm2/s for pure water to (0.48 ± 0.02) × 10−3 mm2/s for the 25% w/w PVP concentration, presenting a minor variability between the vendors and the field strengths. T2 and IR-T1 relaxation time results demonstrated variability between the field strengths and the vendors across the different acquisitions. Moreover, the T1 values derived from the VFA method


INTRODUCTION
Magnetic resonance imaging (MRI) is a powerful noninvasive imaging modality in oncology. Functional information on tissue structure gained from diffusion-weighted (DW) and dynamic contrast-enhanced (DCE) sequences is increasingly used to yield observations beyond lesion size and location. DW-MRI gives signal contrast derived from random motion of water molecules in biological tissues and depends on tissue structure; the derived apparent diffusion coefficient (ADC) is also modulated by the presence of macromolecules, interactions with cell membranes, and flow within vessels. 1, 2 The sensitivity of this technique to water diffusion properties of tissue is generated by application of gradient pulses of varying amplitudes, separations, and durations, summarized by the parameter known as the b-value. In practice b-values are selected not only by consideration of the anatomical region being evaluated but also by the system capabilities and even the investigator's preferences.
DCE-MRI uses the kinetics of an administered exogenous contrast agent to assess characteristics of tumor vasculature. Pharmacokinetic modeling of the signal intensity curve over repeated T1-weighted scans [utilizing a rapid T1-weighted gradient-echo acquisition method as used for the variable flip angle (VFA) measurements] with sufficient time resolution gives parameters such as the forward transfer constant (Ktrans) and rate constant (kep) between extracellular extravascular space and plasma, and the fractional volumes of extracellular extravascular space (ve) and blood plasma (vp) per unit volume of tissue. Model-independent parameters can also be obtained from the DCE-MRI measurement, such as the initial area under the gadolinium curve over 60 s after arrival of contrast agent (IAUGC60) and the precontrast longitudinal relaxation time (precontrast T1). Precontrast T1 is calculated using the VFA method with one or more (precontrast) volumes from the dynamic scan and a matched volume acquired with a distinct (usually lower) flip angle. 3
Although DW-MRI (Refs. 4-6) and DCE-MRI are valuable modalities in functional imaging, 7-10 they are susceptible to the influence of various factors including scanner type, field strength, hardware specifications, and software implementation. Understanding the repeatability and the variability of these measurements is critical if these modalities are to be used in a quantitative manner for the prediction or monitoring of treatment response in clinical trials, particularly in pediatric trials where the relative rarity of disease requires participation of multiple centers. A standardized MR quality assurance procedure with a test object, or "phantom," applied to various scanners of different vendors and field strengths with a repeatability estimate, would facilitate the reliable use of these quantitative results as biomarkers.
An ideal phantom must provide reliable and reproducible multiparameter measurements without any temperature dependence. Ideally, phantoms 11 should be made of materials that (a) provide values relevant to physiological ranges, (b) are easily prepared, inexpensive, stable over time, and nontoxic, and (c) provide reliable and reproducible values. There is no currently available phantom that satisfies all of these requirements. Pierpaoli et al. 12 first introduced solutions of polyvinylpyrrolidone (PVP) in water as potential phantoms for DW-MRI measurements, though without temperature control. PVP is an organic polymer, and by varying its concentration in solution, a range of desirable T1 and ADC values can be created without the addition of any paramagnetic metal ions. Additionally, PVP solutions are known to be chemically stable over a long period of time, in contrast to the sucrose solutions previously used for quality assurance of diffusion measurements. 13,14 Malyarenko et al. 15 used ice-water as a universal temperature control fluid in diffusion measurements, removing the need to adjust for the known temperature dependence of T1 and ADC values. Subsequently, Boss et al. 16 proposed an ice-water temperature-controlled phantom with PVP solutions for the quality assurance of diffusion measurements providing reliable and repeatable values relevant to the physiological range. For DCE-MRI, different dynamic phantoms have been presented for specific applications; 17 a general-purpose phantom providing a physiological range of T1 values for assessment of stability with controlled temperature would be ideal for deriving comparative metrics across different MR vendors.
A standardized MR quality assurance procedure, combining a protocol for performing multiple functional MRI measurements (such as DW and DCE-MRI) in a limited time with a well-designed phantom of multiple temperature-controlled solutions with a suitable range of multiple MR contrast parameters, will provide a valuable resource for imaging protocol development and optimization within multicenter trials. The aim of this study is therefore to present such a phantom and the repeatability measurements of functional (DW and DCE-MRI) and quantitative (T1 and T2) imaging protocols across different MR vendors and field strengths as a quality assurance tool in multicenter clinical trials.

2.A. Phantom preparation
Polyvinylpyrrolidone [PVP, (C6H9NO)x, Sigma-Aldrich] with a mean molar mass of 55 000 g/mol was used for the generation of gels with different MR characteristics. Materials for construction of the phantom are inexpensive and readily available, and construction itself is approximately one day's labor. PVP in powder form was dissolved in 320 ml of distilled water inside a water bath in order to prepare the mixture under constant temperature conditions (55 °C). The solution was continuously stirred until the PVP was fully dissolved and then left in the water bath for 45 min to stabilize, before being removed and left to cool to room temperature (18 °C). Different amounts of this stock solution were diluted in distilled water to produce PVP solutions with concentrations ranging from 2.5% to 25% w/w. Each solution was degassed using He in order to remove any dissolved oxygen. All the solutions were produced under the same conditions, to reduce systematic error in the MR measurements. In the initial preparation, containing only PVP and distilled water, bacterial growth was observed four months following production; the whole procedure was thus repeated with the addition of sodium azide (5 mg) to the initial PVP mixture to prevent this growth. The different concentrations were transferred to 60 ml vials (diameter 27 mm), and these were sealed using their appropriate caps and paraffin film.
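The dilution step above is a simple mass balance, which can be sketched as follows. This is only an illustration: the stock concentration (40% w/w) and the 60 g batch size are hypothetical values chosen for the example, since the text does not state the PVP mass in the stock solution, and the function name is not from the source.

```python
def dilution_masses(target_frac, stock_frac, total_mass_g):
    """Return (stock_mass_g, water_mass_g) needed to make `total_mass_g` of
    solution at `target_frac` (w/w) from a stock at `stock_frac` (w/w).

    Mass balance on the solute: stock_mass * stock_frac = total_mass * target_frac.
    """
    if not 0.0 <= target_frac <= stock_frac <= 1.0:
        raise ValueError("need 0 <= target_frac <= stock_frac <= 1")
    stock_mass = total_mass_g * target_frac / stock_frac
    return stock_mass, total_mass_g - stock_mass

# Hypothetical stock concentration and batch size (assumptions, not from the text).
STOCK_FRAC = 0.40
BATCH_G = 60.0

# The six PVP concentrations used in the phantom (% w/w).
for percent in (2.5, 5, 10, 15, 20, 25):
    stock_g, water_g = dilution_masses(percent / 100.0, STOCK_FRAC, BATCH_G)
    print(f"{percent:>4}% w/w: {stock_g:6.2f} g stock + {water_g:6.2f} g water")
```

Working by mass rather than volume avoids having to know the density of each PVP solution.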
Six different PVP solutions with concentrations 2.5%, 5%, 10%, 15%, 20%, and 25% w/w, together with one vial containing distilled water (0%, with sodium azide) and a localization rod were fixed inside a custom-built cylindrical acrylic phantom, with an inner diameter of 18 cm and height of 19 cm. The vial with the distilled water was positioned in the center of the phantom and the PVP vials arranged such that when viewed axially, PVP concentration decreased counterclockwise (Fig. 1).
Temperature control of the phantom was achieved by filling the cylinder with ice-water 1 h before scanning. Following a 45-min equilibration time, more ice was added to replenish the amount melted. Before transferring the phantom to the MR scanner room, the temperature was measured to verify that the phantom contents were at 0 °C.

2.B. MRI acquisition
MR imaging measurements were performed on four clinical scanners, two at 1.5 T, denoted scanners A (MAGNETOM Avanto, Siemens Healthcare, Erlangen, Germany) and B (MAGNETOM Aera, Siemens Healthcare, Erlangen, Germany) and two at 3.0 T, denoted C (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany) and D (Achieva, Philips Healthcare, Best, The Netherlands), using the appropriate modified protocol for each case. The scanning protocol included DW-MRI, inversion recovery T1 measurements (IR-T1), a multiecho T2 measurement, and a dynamic scan as used for DCE-MRI allowing variable flip angle estimation of T1 (VFA-T1) with successive dynamic volumes. The scanning parameters for these scans on each system are presented in Tables I-IV. For DW-MRI, four separate successive measurements were acquired and stored separately to avoid signal averaging in the scanner and allow investigation of measurement stability. The flip angles for the VFA measurement were determined as optimal for the expected range of T1 (500-1400 ms) in the phantom at 1.5 T. 18,19 The phantom was placed on the scanner bed with its central axis parallel to the z-axis of the magnet for axial image acquisition. To ensure the reproducible positioning of the phantom on the bed across the repeated measurements, the internal localization rod was used as a localization reference. A plastic wedge was used to slightly raise one end of the phantom, in order to force any bubbles to the top of the phantom and remove them from the field of view (FoV).

FIG. 1. Phantom construction with PVP gels (left), calculated axial T2 map (center, T2 range from 510 to 1406 ms), and calculated axial T1 map (right, T1 range from 520 to 1415 ms) of the phantom at 1.5 T with the selected ROIs; corresponding PVP concentrations are given in % w/w.

TABLE I. Sequence parameters for acquisition of diffusion-weighted imaging measurements.
The phantom was scanned three times on each scanner in separate imaging sessions to assess both long-term and short-term repeatability of DW, DCE, T1, and T2 measurements, using paired scans separated by approximately 1 month and 2-24 h (with complete removal and repositioning of the phantom and routine scanning performed in between), respectively. In each case, the scanning duration was 50 min, during which period the temperature of the phantom components was between 0 and 1 °C.

2.C. Data analysis
All calculations of functional parameters were performed on a voxel-by-voxel basis, with no image smoothing, using in-house software (ADEPT and MRIW, Institute of Cancer Research, London). For all calculated parameter maps, large circular regions of interest (ROIs) (area 138-250 mm2) were drawn within each vial in the phantom, and the median value for the voxels in each ROI recorded. ADC and T2 maps were calculated using a monoexponential model for all b-value and echo time images, respectively [Eqs. (1) and (2)]. T1 maps were calculated using the images acquired at different inversion times (IR-T1) according to Eq. (3) (valid given the long TR of 10 s) and then separately using the variable flip angle method, using a combination of the multiple-averaged data at low flip angle with each of the images in the dynamic series at higher flip angle (VFA-T1) according to Eq. (4),

$$\frac{S_n}{\sin\theta_n} = \exp\left(-\frac{TR}{T_1}\right)\frac{S_n}{\tan\theta_n} + M_0\left[1-\exp\left(-\frac{TR}{T_1}\right)\right], \qquad (4)$$

where $S_n$ is the signal acquired at flip angle $\theta_n$, TR is the repetition time, and $M_0$ is the equilibrium signal.

TABLE II. Sequence parameters for acquisition of inversion recovery T1 (IR-T1) measurements.

To confirm good signal-to-noise ratios (SNRs) for derived parameters from the diffusion-weighted and T2-weighted measurements, SNRs were estimated for the vial with the largest ADC at highest b-value and for the vial with the shortest T2 at longest echo for the 4 scanners using the average signal within the ROI divided by the variance of noise from a corresponding ROI positioned in empty space (devoid of artefacts) on the same image.
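As a minimal sketch of the monoexponential model used for the ADC and T2 maps, the following Python fits S(x) = S0 exp(−x · rate) by linear least squares on the log-signal, where x is the b-value (rate = ADC) or the echo time (rate = 1/T2). The function name, the b-values, and the synthetic signal are illustrative assumptions; the in-house software may fit the model differently (for example, by nonlinear least squares).

```python
import numpy as np

def fit_monoexponential(x, signals):
    """Fit S(x) = S0 * exp(-x * rate) by linear least squares on log(S).

    Returns (S0, rate). For diffusion data x is the b-value and rate is the
    ADC; for multiecho data x is the echo time and rate is 1/T2.
    """
    x = np.asarray(x, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    slope, intercept = np.polyfit(x, log_s, 1)
    return float(np.exp(intercept)), float(-slope)

# Hypothetical b-values (s/mm^2); the protocol values belong to Table I.
b_values = np.array([0.0, 100.0, 500.0, 900.0])
# Noise-free synthetic signal using the ADC reported for water at 0 degC.
true_adc = 1.12e-3  # mm^2/s
signals = 1000.0 * np.exp(-b_values * true_adc)

s0, adc = fit_monoexponential(b_values, signals)
print(f"S0 = {s0:.1f}, ADC = {adc:.3e} mm^2/s")
```

In a voxel-by-voxel analysis the same fit would be applied to each voxel's signal across the b-value (or echo time) images, and the ROI median taken from the resulting map.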
From the dynamic data, it is also possible to calculate the autocorrelation (AC) function between each T1 measurement and its successor, as well as the variance of the T1 measurement across the whole dynamic series, yielding the T1 noise factor (T1NF), Eq. (5), 20 and T1 SNR, Eq. (6), 21 where σ_T1 and σ_S are the standard deviations (s.d.) of T1 and the signal, respectively, S_0 is the initial signal intensity, and T_S is the total scan time.
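The next-value autocorrelation described above can be sketched as follows, using the usual sample autocorrelation (mean-subtracted products normalized by the total sum of squares). The exact estimator used in the study is not stated, so this definition, the function name, the 2 s sampling interval, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def autocorrelation(series, lag=1):
    """Sample autocorrelation of a 1D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    if not 1 <= lag < x.size:
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    x = x - x.mean()
    denom = np.sum(x * x)
    if denom == 0.0:
        return 0.0  # constant series: no variation to correlate
    return float(np.sum(x[:-lag] * x[lag:]) / denom)

# Hypothetical dynamic series: 150 time points sampled every 2 s.
t = np.arange(150) * 2.0
rng = np.random.default_rng(0)

# A stable series: constant signal plus uncorrelated noise.
stable = 100.0 + rng.normal(0.0, 1.0, t.size)
# An unstable series: a slow periodic drift with a 38 s period.
periodic = 100.0 * (1.0 + 0.075 * np.sin(2.0 * np.pi * t / 38.0))

print("stable   lag-1 AC:", autocorrelation(stable))
print("periodic lag-1 AC:", autocorrelation(periodic))
```

A stable series gives a value near zero, whereas a slowly varying signal gives a value near one; evaluating the function over a range of lags yields the correlogram, whose peaks indicate the period of any oscillation.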

2.D. Statistical analysis
The short-term and long-term repeatability of each MR contrast parameter were assessed using the repeated measurements coefficient of variation (CoV, expressed as percentage) between the second and the third imaging session (time separation between 2 and 24 h, with complete removal and repositioning of the phantom) and between the first and the second imaging session (time separation more than 1 month), respectively. For the calculation of the CoVs, a log-normal distribution was assumed and logarithms of the parameters were used when calculating the various statistics. 22,23 First, the sample variance (V) of the logarithmic difference of the two compared imaging sessions was calculated, and then the CoV (%) was assessed by the following equation:

3. RESULTS

3.A. Apparent diffusion coefficient values
For each vendor, four individual diffusion-weighted measurements were acquired, the SNR of the highest b-value image on the central vial (0% w/w PVP solution) was estimated [ranged from 147.2 to 3155.8 (signal/noise)], and the mean signal intensity at each voxel across all images for each b-value was calculated for the production of the ADC maps (Fig. 2). As presented in Table I, the field of view of scanner D was different from that of the other scanners, reflecting current clinical protocol variation. The ADC values of each vial were estimated using each of the four individual measurements (nonaveraged, acquired within a single imaging session) and using the mean of these measurements. The percentage deviations of the ADC estimates in each vial between the mean and each individual measurement were less than 1%, indicating low intrinsic measurement noise. Moreover, the percentage deviations between the ADC estimates of each slice and the corresponding estimates of all the slices were also less than 1%, showing negligible variation through the slices. Consequently, the ADC values reported are the mean values at each of the four acquisitions (nonaveraged measurements) on each scanner for one imaging session, from the central slice (Table V). At both 1.5 and 3.0 T across both vendors, the ADC values ranged from (1.12 ± 0.01) × 10−3 mm2/s for pure water to (0.48 ± 0.02) × 10−3 mm2/s for the 25% w/w PVP concentration, exhibiting an anticipated decrease with the increase in PVP concentration. The differences in ADC values between the vendors and the field strengths were small.

3.B. IR-T 1 and T 2 relaxation times
The mean calculated IR-T1 and T2 relaxation times of each vial are presented in Tables VI and VII. In both cases, the relaxation times varied across the different acquisitions showing some variance with field strength and vendor, although generally a good agreement was found across the scanners for each gel concentration. T2 relaxation times derived from 1.5 T data showed variation when compared to the corresponding values from 3.0 T data, whereas for the majority of the vials, the IR-T1 relaxation times at 3.0 T were longer than the 1.5 T values. The SNR of the T2-weighted images was estimated by drawing an ROI on the vial with the shortest T2 within the longest-TE image and was found to be satisfactory at 492-1960 (signal/noise).

3.C. VFA-T 1 and dynamic measurement stability
The T1 values derived from the VFA method, VFA-T1, are presented in Table VIII and show a large variation from those derived by the (gold-standard) inversion recovery method, with larger variance across scanners as well as for repeated measurements on each scanner. Scanner C gives consistently higher T1 estimates for VFA than for IR-T1, with the other scanners in general agreement. The standard deviations seen in the VFA-T1 estimate across each individual ROI (Table IX) are relatively small, indicating a stable dynamic series of images, with the main source of variation being in T1 estimation across separate scanning sessions. In all cases, the average mean/(standard deviation) of each T1 estimate within the ROIs was greater than 20 (Table X).
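For orientation, a two-point VFA T1 estimate can be sketched with the standard spoiled gradient-echo signal model and its linearized form, S/sin θ = E1 · S/tan θ + M0 (1 − E1) with E1 = exp(−TR/T1). This is a generic illustration and not the authors' implementation: the function names, TR, and flip angles are hypothetical values chosen for the example (the protocol parameters are those of the sequence tables), and the model assumes perfect spoiling and exact flip angles.

```python
import numpy as np

def spgr_signal(m0, t1, tr, flip_deg):
    """Steady-state spoiled gradient-echo signal (TE effects ignored)."""
    theta = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(theta) * (1.0 - e1) / (1.0 - e1 * np.cos(theta))

def vfa_t1_two_point(s_low, s_high, flip_low_deg, flip_high_deg, tr):
    """Estimate T1 from two flip angles via the linearized SPGR relation.

    The slope of S/sin(theta) against S/tan(theta) equals E1 = exp(-TR/T1).
    """
    a_low, a_high = np.deg2rad(flip_low_deg), np.deg2rad(flip_high_deg)
    y_low, y_high = s_low / np.sin(a_low), s_high / np.sin(a_high)
    x_low, x_high = s_low / np.tan(a_low), s_high / np.tan(a_high)
    e1 = (y_high - y_low) / (x_high - x_low)
    return float(-tr / np.log(e1))

# Hypothetical acquisition parameters (assumptions for this sketch only).
TR_MS = 5.0
FLIP_LOW, FLIP_HIGH = 3.0, 18.0
TRUE_T1_MS = 1000.0

s_low = spgr_signal(1000.0, TRUE_T1_MS, TR_MS, FLIP_LOW)
s_high = spgr_signal(1000.0, TRUE_T1_MS, TR_MS, FLIP_HIGH)
t1_est = vfa_t1_two_point(s_low, s_high, FLIP_LOW, FLIP_HIGH, TR_MS)
print(f"Estimated T1 = {t1_est:.1f} ms")
```

Because E1 is very close to 1 when TR is much shorter than T1, small errors in the signals or in the achieved flip angles translate into large errors in T1, which is one reason the VFA estimates can deviate from inversion recovery values.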
The next-value autocorrelation of the dynamic series, measuring stability of the signal over the dynamic time course, was observed to be close to 0 (range −0.06 to 0.02) for all PVP concentrations in three out of four scanners, with the final scanner (Scanner D, 3.0 T) returning values consistently higher and indicating instability in the signal over time (Table XI). Examination of one such example reveals a periodic variation in signal of around 15% (Fig. 3), and the correlogram shows a clear periodicity of around 38 s.

TABLE VII. T2 (ms) relaxation times (median ± s.d. of each ROI) of each PVP solution in the phantom measured across the different scanners.

3.D. Repeatability measurements
The short-term and long-term repeatability estimates, assessed by calculating the coefficient of variation for each case, are given for each parameter in Table XII. Excellent repeatability of 5% or lower, often below 1%, was observed across all scanners for both long-term and short-term comparisons of ADC, T2, and IR-T1 values. The short-term repeatability was almost always better than the long-term for these parameters.
Parameters derived from the dynamic data show consistently higher CoV, with values as high as 17% and 10% for long-term and short-term repeatability, respectively. While short-term CoVs are generally smaller than long-term, the difference is often minor and suggests no fundamental difference between repetitions of measurements on the different timescales. Moreover, for each vendor the dependence of the ADC estimates on the morning and afternoon session was estimated in order to investigate any effect related to the time of the scanning session. In all cases, the ADC CoVs between the morning and the afternoon measurement were less than 2%.
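The repeatability calculation outlined in the statistical analysis (log-transform the paired values, then take the sample variance V of the differences) can be sketched as below. The final conversion from V to a percentage is an assumption here: the sketch uses one common log-normal convention, CoV = 100 × sqrt(exp(V/2) − 1), in which V/2 is taken as the within-session variance on the log scale. The function name and the example values are hypothetical.

```python
import numpy as np

def paired_log_cov_percent(session_a, session_b):
    """Repeatability CoV (%) from paired measurements under a log-normal model.

    V is the sample variance of ln(a) - ln(b). The conversion below,
    100 * sqrt(exp(V / 2) - 1), is an assumed convention for this sketch.
    """
    a = np.asarray(session_a, dtype=float)
    b = np.asarray(session_b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need two equally sized sessions with >= 2 values")
    v = np.var(np.log(a) - np.log(b), ddof=1)
    return float(100.0 * np.sqrt(np.expm1(v / 2.0)))

# Hypothetical ADC medians (x 10^-3 mm^2/s) for seven vials in two sessions.
session_1 = [1.12, 1.05, 0.98, 0.85, 0.72, 0.60, 0.48]
session_2 = [1.13, 1.04, 0.98, 0.86, 0.71, 0.60, 0.49]
print(f"CoV = {paired_log_cov_percent(session_1, session_2):.2f}%")
```

Because the sample variance is taken about the mean difference, a uniform scaling between sessions does not contribute to this estimate; only vial-to-vial scatter in the differences does.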

DISCUSSION
This study presents a new phantom containing multiple compartments of PVP solutions at ice-water temperatures for the quality assurance of functional and quantitative imaging across different MRI platforms. The preparation of PVP solutions at different concentrations provides a physiologically relevant range of ADC and T1 values, along with a range of T2 values, that can be measured to assess reliability, repeatability, and reproducibility in a multicenter setting using a standardized acquisition. The design of the phantom combines the features of existing alternative phantoms, including a chemically stable gel with user-modifiable MR properties, multiple compartments allowing the coverage of a wide MR parameter range, and known and reproducible temperature, to create a new phantom with all the conferred advantages.
There are several limitations in this study; first, scanners from only two different vendors were used, and this could be usefully expanded in future studies. Second, measurements were performed at both field strengths only for one vendor, allowing investigation of the influence of field strength only for this particular vendor. Finally, the DCE-MRI protocols in this study are not directly comparable, each implementation being a typical protocol for that platform, and thus allowing only assessment of their repeatability and variability. While this is not ideal, the primary scope of this study is not to compare scanner performance, but the presentation of a phantom for the quality assurance of functional and quantitative imaging protocols allowing assessment of their reproducibility across multicenter studies.

TABLE XI. Autocorrelation with the standard deviation of each calculation of dynamic data signal intensity with succeeding point.
The ADC values of this phantom are consistent with a variety of biological tissues (brain, bone marrow, and tumor), delivering a relevant physiological range, although the ADC range available is limited by the temperature control at 0 °C, and some tissues have larger ADC values (e.g., kidney). The excellent repeatability of the ADC measurements for both long and short term, assessed via calculation of the CoVs, indicates a reliable quality assurance procedure. Previously published repeatability values for ADC measurements in phantoms 12,15,16,24,25 are in agreement with our values.
The calculated ADC values across the MR scanners showed differences between the vendors, in line with another phantom study. 25 In contrast to the study by Lavdas et al., 25 the ADC values reported in this study did not show any dependence on the field strength; the use of 7 different solutions in this study, compared to only 3, increases the confidence of the comparison. Malyarenko et al. 15 report repeatability and reproducibility of a single ADC value in an ice-water phantom across different vendors and field strengths. With 3.0 T MR scanners becoming increasingly common, the utilization of a standardized quality assurance procedure across scanners of different field strength and vendor is crucial for quality assurance in multicenter studies. This study presents not only repeatable DW measurements of multiple temperature-controlled solutions but also quantitative T1, T2, and DCE-MRI measurements across two different vendors and field strengths.
The estimated values for IR-T1 and T2 relaxation times were comparable with the corresponding values of biological tissues and fluids, respectively. 26 It is both practical and convenient to acquire the T1 and T2 measurements under the same temperature control as required for reliable estimation of ADC, since these are also known to be sensitive to temperature. 27,28 It is worth noting that T1 values showed a dependence on the field strength, of approximately 40 ms/T for all vials (calculated as the average difference across field strengths for all the scanners), and this result is in agreement with the documented increase in T1 values observed at 3.0 T by De Bazelaire. 26 Moreover, the good short-term and long-term repeatability (less than 5%) of these parameters using the protocol described demonstrates the reliability and repeatability of these measurements with this phantom.
Parameters derived from the dynamic scans, as used for DCE-MRI measurements, clearly demonstrate the difficulty of estimating T1 relaxation times using the variable flip angle measurements 29 and suggest that such data may only reliably provide relative T1 values. The two-point VFA measurement works well for a single TR/T1 combination, 3 but there is an inherent difficulty in accurate T1 determination across a range of T1 values; a multiple-flip angle measurement has been shown to improve T1 accuracy but only at the expense of time. 30,31 In the context of biomarkers for response, however, repeatability and stability of measurements are critical in being able to confidently identify changes in T1 relaxation times, in clinical applications where the change may be small. The autocorrelation functions for the dynamic data showed acceptable stability across the varying PVP concentrations (with corresponding T1 range) for all scanners over the acquisitions, with the exception of one scanner where the quality assurance procedure identified a periodic signal variation that would compromise the DCE-MRI study, which was not observed from a simple examination of T1 standard deviation across the series. The CoVs of dynamic parameters were consistently higher than those for ADC, IR-T1, and T2 values, which reflects the compromise in data quality necessary when acquiring at high temporal resolution, though the CoVs returned were shown to be acceptable (<10%) for both long-term and short-term repeatability.
In conclusion, the combination of a novel PVP phantom, with multiple compartments to give a physiologically relevant range of ADC and T1 values, together with the simplicity and reproducibility of ice-water temperatures, allows reliable quality assurance measurements that can be used to measure agreement of MR scanners, which will be critical in the context of multicenter functional imaging studies that contain diffusion-weighted imaging, dynamic contrast-enhanced imaging, and T1 and T2 measurements.