Verification of the accuracy of a photon dose‐calculation algorithm

An extensive set of measured data was developed for the purpose of verifying the accuracy of a photon dose-calculation algorithm. Dose distributions from a linear accelerator were measured using an ion chamber in a water phantom and thermoluminescent dosimeters in a heterogeneous anthropomorphic phantom. Test cases included square fields, rectangular fields, fields having different source-to-surface distances, wedged fields, irregular fields, obliquely incident fields, asymmetrically collimated fields with wedges, multileaf collimator-shaped fields, and two heterogeneous density cases. The data set was used to validate the photon dose-calculation algorithm in a commercial radiation treatment planning system. The treatment planning system calculated photon doses to within the American Association of Physicists in Medicine (AAPM) Task Group 53 (TG-53) criteria for 99% of points in the buildup region, 90% of points in the inner region, 88% of points in the outer region, and 93% of points in the penumbra. For the heterogeneous phantoms, calculations agreed with measurements to within ±3%. The monitor unit tests revealed that the 18-MV open square fields, oblique incidence, oblique incidence with wedge, and mantle field test cases did not meet the TG-53 criteria but were within ±2.5% of measurements. It was concluded that (i) the photon dose-calculation algorithm used by the treatment planning system did not meet the TG-53 criteria 100% of the time; (ii) some of the TG-53 criteria may need to be modified; and (iii) the generally stated goal of accuracy in dose delivery of within 5% cannot be met in all situations using this beam model in the treatment planning system. PACS number(s): 87.53.–j, 87.66.–a


INTRODUCTION
Before implementing a radiation treatment planning system in the clinic, the dose-calculation algorithm must be validated using rigorous, clinically relevant criteria. The algorithm must accurately calculate dose distributions for a variety of clinical beam configurations. Verifying the accuracy of the dose computation requires a comprehensive set of test cases. Although dose-calculation algorithms can generally calculate dose distributions for a radiation beam at normal incidence on a water phantom, their accuracy in a variety of clinical situations may be questionable. Some such situations include oblique incidence of the radiation beam (e.g., tangential breast irradiation) or multiple density heterogeneities (e.g., lung irradiation).
The American Association of Physicists in Medicine (AAPM) Radiation Therapy Committee Task Group 23 (TG-23) 1 developed a test package for verifying the accuracy of photon-beam dose-calculation algorithms. Data for the test cases were acquired for two beam energies from two clinical linear accelerators: a 4-MV x-ray beam from a Clinac-4 (Varian Oncology Systems, Palo Alto, CA), and an 18-MV x-ray beam from a Therac-20 (Atomic Energy of Canada, Ltd., Kanata, Ontario, Canada). Although TG-23 used 13 test cases for algorithm verification, several clinically significant situations were not included. For example, the TG-23 cases were developed at a time when three-dimensional (3D) radiation treatment planning was just beginning; consequently, TG-23 did not include test cases that examined issues present in 3D treatment planning such as 3D density heterogeneities. Inclusion of 3D test cases is essential because the patient cannot always be modeled as a two-dimensional (2D) object. In fact, many clinics no longer use 2D treatment planning. Additionally, when treating the thoracic region, some beams have to traverse bone and then lung tissue. The differences in material composition in this region can significantly affect dose-calculation algorithms because electronic equilibrium is not established at the interfaces. The representation of inhomogeneous media in the TG-23 cases is only two-dimensional.
Another shortcoming of the TG-23 report is that the wedged-field case presented therein was that of a 45° wedge. Although this is a clinically valid situation, we have found that a 60° wedge has been more difficult to model than a 45° wedge. 2 Moreover, the wedged beam is presented at normal incidence on the phantom, while a common clinical use of wedged beams is at an oblique incidence. The TG-23 report also presented a significant example of an irregular field: the "hockey stick" field. Another clinically relevant but more complex irregular field is the mantle field used to treat Hodgkin's disease. Although TG-23 covered oblique incidence, the clinical range of obliquities was not explored. The range of obliquities commonly encountered in current radiation therapy includes obliquities greater than 45°, such as the angles encountered when treating breast cancer using tangential fields. Additionally, asymmetric collimation was not addressed. Finally, the TG-23 data set did not include absolute dose determination, which could be used for monitor unit (MU) verification. A more recent study has extended the TG-23 data set to include some of these additional capabilities. 3
Another group that addressed algorithm verification was the Electron Collaborative Working Group (ECWG). 4 The ECWG's experiments were designed to test the fundamental characteristics of electron dose-calculation algorithms, as well as the accuracy of these algorithms in clinical situations. Measurements were performed for situations including variation of energy, source-to-skin distance (SSD), electron applicator, field shaping, and irregular surfaces and heterogeneities (1D, 2D, and 3D) using air, lung, and bone substitutes. A useful outcome of the ECWG work was that the data were formatted and made available for distribution so that other institutions could apply them to verify their electron dose-calculation algorithms. The ECWG data set was recently reviewed and expanded by Boyd et al. 5
Given that several clinically significant situations were not included in the 13 TG-23 test cases, the AAPM Radiation Therapy Committee Task Group 53 (TG-53) report 6 suggested several photon dose-calculation verification situations. The test cases generated in the present study as well as the methodology used in generating these test cases evolved from the TG-23 and TG-53 work.
The purpose of this study is to generate a data set that could be used for evaluating photon dose-calculation algorithms used in contemporary treatment planning systems. To achieve this goal, several revisions were made to the data set described in the TG-23 project. First, the test cases pertaining to inhomogeneous media were 3D instead of 2D. Additional test cases were needed; these included oblique incidence with a wedged field, significant asymmetric (half beam) collimation, a mantle field, a field defined using multileaf collimators (MLCs), a 3D representation of the lung with a tissue-bone interface, and a neck phantom with a tissue-air interface. The accuracy of treatment planning system MU calculations was also assessed. 7
Test cases representing 12 different clinical setups were included in the data set to verify the accuracy of the photon dose-calculation algorithm. These setups included open square and rectangular fields, extended SSDs, wedged fields, irregular fields, short SSDs, oblique incidence, as well as the cases described in the previous paragraph. Also, the data set contained measured data, including fractional depth dose (FDD) curves, sagittal and transverse beam dose profiles, total scatter factors, and point doses in the heterogeneous case.
This data set was then applied to a commonly used photon-beam dose-calculation algorithm with goals of (i) validating the determinations of the parameters used in the beam model, and (ii) evaluating the accuracy of the dose calculated by the model in various clinically relevant situations.
The data set was developed specifically for photon dose-calculation verification. Data from the open square field test case can be used to generate a beam model in the treatment planning system. The other test cases can then be used to verify the photon dose-calculation algorithm under the particular clinical configurations. It is our intention that the data set developed in this work be appropriately formatted for distribution in the same manner as the ECWG data set.

A. Measurement of the data set
In all of the test cases, with the exception of the heterogeneity cases, the FDD curves and profiles in the x and y directions were measured in a water phantom (Wellhöfer, Schwarzenbruck, Germany). Unless otherwise noted, the profile depths were 1.2, 4.0, 10.0, and 20.0 cm for 6-MV beams, and 3.2, 6.0, 10.0, and 20.0 cm for 18-MV beams. These depths were chosen so that the majority of the calculated profiles would not have to be interpolated from the calculated dose matrix. In each case, the shallowest depth was selected so that the depth would be in the buildup region if the field were small, and close to d_max if the field were large.
Data were acquired using a Clinac 2100C linear accelerator (Varian Medical Systems, Inc., Palo Alto, CA). Two 0.1-cm³ ion chambers (Model No. N23323, PTW-Freiburg, Freiburg, Germany) were employed to acquire data in the water phantom. Off-axis profiles not passing through the central ray of the beam were measured at a distance equal to 80% of the half-width of the radiation field at the particular depth and in the positive direction (coordinate system discussed below). For example, the off-axis profiles measured for the 5 cm × 5 cm open field test case at a depth of 10 cm were +2.2 cm away from the central axis. While the actual location of the off-axis profile is somewhat arbitrary, selection of this location places the profile in a region in which the beam is still reasonably flat, but may exhibit significant differences from a profile passing through the central axis of the beam. Verification of MU calculations was also performed.
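The +2.2 cm offset quoted for the 5 cm × 5 cm field can be reproduced with simple beam-divergence geometry. The following is a minimal sketch, assuming that the field size is defined at 100 cm from the source and that the field edge diverges linearly with distance; the function name and its parameters are illustrative and are not taken from the study.

```python
def off_axis_profile_offset(field_width_cm, depth_cm, ssd_cm=100.0,
                            fraction=0.8, definition_distance_cm=100.0):
    """Offset of the off-axis profile from the central axis, in cm.

    Assumption: the field width is defined at definition_distance_cm from
    the source and scales linearly with distance (pure geometric divergence).
    """
    # Project the field width from the definition plane to the measurement depth.
    width_at_depth = field_width_cm * (ssd_cm + depth_cm) / definition_distance_cm
    # The profile is placed at a fixed fraction (80%) of the half-width.
    return fraction * width_at_depth / 2.0


# 5 cm x 5 cm field, 100-cm SSD, 10-cm depth: half-width 2.75 cm, 80% of it is 2.2 cm.
print(round(off_axis_profile_offset(5.0, 10.0), 2))
```

Under these assumptions the projected half-width at a depth of 10 cm is 2.75 cm, and 80% of that value gives the 2.2 cm stated in the text.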
Thermoluminescent dosimeters (TLDs) were used to measure the absolute dose in the heterogeneous phantoms. Each TLD consisted of approximately 25 mg of TLD-100 powder (Harshaw Chemical Co., Solon, OH). The cylindrical active volume of the detector was 2 mm in diameter by 3 mm in length. To calibrate the TLD, a set of reference TLDs was irradiated at a specified reference dose at the depth of the maximum dose in a 10 cm × 10 cm field for each energy. Three readings were obtained at each measurement point.
The coordinate system for this study was defined as follows: The origin was located at the intersection of the central axis of the beam with the surface of the phantom. For eight of the ten cases, the origin was the machine isocenter. Facing the gantry, the x axis pointed to the observer's right, the z axis pointed upward, and the y axis was chosen so that the coordinate system would be right-handed. These coordinates were consistent with the specifications in the International Electrotechnical Commission 61217 document. 8
Unless otherwise noted, all profiles in the water phantom were normalized to the FDD at a depth of 10 cm for the particular field size and clinical setup. Consequently, the central axis FDD at a depth of 10 cm was equal to 1.00. It was necessary to select a point for normalization in order that absolute comparisons be made, and the selection of the depth at 10 cm was somewhat arbitrary. Total scatter factors were then referenced to an ion-chamber reading at a depth of 10 cm for a 10 cm × 10 cm field.

B. Dose-calculation algorithm
The photon dose-calculation algorithm evaluated in this study is the convolution/superposition algorithm that was introduced by Mackie et al. 9 and extended by Papanikolaou et al. 10 to polyenergetic spectra. The implementation of the dose-calculation algorithm in the particular commercial treatment planning system (Pinnacle³; ADAC Laboratories, Milpitas, CA) has been described previously. 2 The set of beam model parameters used in the clinic was employed in all open field test cases. However, the wedge models had to be commissioned specifically for this study because the clinical model uses dynamic wedges, while the measured test cases use physical wedges. (At the time this study was initiated, the version of the treatment planning system did not support dose calculations using dynamic wedges.) Consequently, parameters appropriate to the physical wedges had to be determined. These parameters were obtained using guidelines recommended by Starkschall et al. 2 Table I displays the beam parameters for the 18-MV open field models.
All dose calculations were performed on version 4.2 of the treatment planning system. A 4-mm dose grid was used in each test case because this is the grid typically used for calculations in this institution. With the exception of the oblique test cases, all test cases were calculated using the water phantom option provided in the treatment planning system. This option replaces the actual CT data set with a unit density data set and a constant SSD equal to the SSD along the central axis of the beam. Doses in the oblique test cases and the anthropomorphic phantom test cases were computed with the heterogeneity correction option in the treatment planning system, which used the actual CT data set along with a table that converted CT voxel values to electron densities.

C. Data comparison and presentation
After dose computation, files containing dose matrices were stripped of unwanted text and formatted so that they could be imported into a commercial image-manipulation and data-analysis software system (IDL, Research Systems Inc., Boulder, CO). Profiles were overlaid to compare the computed and measured data. The maximum deviation in each part of the radiation field was tabulated for each test case. Monitor unit comparisons were included in another table.
The report on quality assurance of radiation treatment planning produced by the AAPM TG-53 6 specified acceptance criteria in terms of the percent difference and distance difference. Consequently, these criteria were used for comparing the data in this study. Percent differences were calculated as the difference between the dose calculated at a data point and the dose measured at the same point multiplied by 100%. Because the measured and calculated data were normalized to the same value, no reference value for the percent difference was needed. To calculate the distance differences in high-dose-gradient regions (primarily beam penumbra), we identified the measured data points that bracket the calculated data point value. The coordinate of the point with the same dose as that of the calculated point was interpolated between these two measured data point coordinates. The distance between the calculated and measured points was calculated. Only those calculated points lying within the boundaries established by the TG-53 definition of penumbra, that is, points within 0.5 cm from the edge of the beam or beam modifier, were tested as penumbra points.
Because of the size of the data set, data analysis and presentation methods were used that minimized the number of plots while allowing appropriate analysis of the data. Profile overlays, which are plots of the calculated and measured profiles on the same axes, identified regions of discrepancy. In regions of a high-dose gradient, a percent difference between calculated and measured dose values is not clinically significant. In these regions, the distance difference was used instead.
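The two comparison metrics described in this section can be sketched in a few lines of Python. This is an illustration only, not the IDL code used in the study: it assumes linear interpolation between adjacent measured points, and when the calculated dose is bracketed in more than one place along the profile it takes the nearest crossing. The sample profile values are hypothetical.

```python
def percent_difference(calc_dose, meas_dose):
    """Dose difference in percent; both doses share the same normalization."""
    return (calc_dose - meas_dose) * 100.0


def distance_difference(x_calc, calc_dose, meas_x, meas_dose):
    """Distance (cm) from a calculated point to the nearest measured position
    that has the same dose, or None if no measured pair brackets calc_dose."""
    points = list(zip(meas_x, meas_dose))
    best = None
    for (x0, d0), (x1, d1) in zip(points, points[1:]):
        # Skip flat segments and segments that do not bracket the calculated dose.
        if d0 == d1 or not (min(d0, d1) <= calc_dose <= max(d0, d1)):
            continue
        # Linearly interpolate the coordinate at which the measured dose equals calc_dose.
        x_same_dose = x0 + (calc_dose - d0) * (x1 - x0) / (d1 - d0)
        distance = abs(x_calc - x_same_dose)
        if best is None or distance < best:
            best = distance
    return best


# Hypothetical measured penumbra (position in cm, dose normalized to 1.00).
meas_x = [4.0, 4.5, 5.0, 5.5, 6.0]
meas_dose = [0.95, 0.80, 0.50, 0.20, 0.05]
print(round(percent_difference(0.97, 0.95), 2))
print(round(distance_difference(5.2, 0.35, meas_x, meas_dose), 3))
```

In this sketch, a point would be scored with `percent_difference` in the low-gradient regions and with `distance_difference` only when it lies within the penumbra as defined by TG-53.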

A. Test case 1: Water phantom, 100-cm SSD, open square field
This case tested the ability of the photon dose-calculation algorithm to reproduce the dose distribution in a configuration similar to the configuration used to measure the original input data. Data were obtained for the 6- and 18-MV photon beams with fields of 5 cm × 5 cm and 25 cm × 25 cm. Four profiles were measured for each depth, two passing through the central axis and two passing through a specified off-axis position. Figure 1 illustrates the beam's-eye-view (BEV) orientation and locations of the profiles measured for all open square fields.

B. Test case 2: Water phantom, extended SSD (125 cm), open square field
The extended SSD setup tested the ability of the treatment planning system to predict the increase in penumbra width and change in depth dose due to the increased distance from the source. Data were obtained for the 6- and 18-MV beams using 8 cm × 8 cm and 20 cm × 20 cm fields at an SSD of 125 cm.

C. Test case 3: Water phantom, 100-cm SSD, open rectangular field
This case tested the ability of the dose-calculation engine to compute the dose in an elongated rectangular field based on data input from square fields. The dose distributions in rectangular fields of 5 cm × 25 cm and 25 cm × 5 cm were measured for the 6- and 18-MV beams at an SSD of 100 cm. Figure 2 illustrates the BEV locations of the profiles for this case.

D. Test case 4: Water phantom, 100-cm SSD, wedged square fields
This case tested the ability of the algorithm to reproduce dose distributions in wedged fields. The wedge angles used were 45° and 60°. Each wedge was oriented in the x direction with the thin end pointing to gantry right. The 45° and 60° wedges were chosen because they were the most difficult to model as their thickness maximized beam hardening and wedge scatter. These data were acquired at an SSD of 100 cm for the 6- and 18-MV beams at field sizes of 6 cm × 6 cm and 20 cm × 20 cm for the 45° wedge, and 15 cm × 15 cm for the 60° wedge.

E. Test case 5: Water phantom, 100-cm SSD, mantle field
To treat Hodgkin's lymphoma and certain other cancers, very irregularly shaped fields are used to conform the beam to the target site and spare critical structures. A mantle field, which is used in the treatment of Hodgkin's lymphoma, was used to test the algorithm's accuracy for irregularly shaped fields. Data were obtained at an SSD of 100 cm for the 6- and 18-MV beams with a collimator setting of 30 cm × 30 cm. All data were normalized to the FDD at a depth of 10 cm from the surface of the water phantom with the mantle field block in place. Figure 3 illustrates the locations of the profiles.

F. Test case 6: Water phantom, open square field, isocentric setup
Although beam data are normally acquired at a fixed SSD of 100 cm, most patients are treated isocentrically, using SSDs that are less than the source-to-axis distance (SAD). Because of this, the dose-calculation algorithm was tested using an isocentric setup. Dose distributions for a 6-MV beam were measured at an SSD of 90 cm with an 11.1 cm × 11.1 cm collimator setting, while dose distributions for an 18-MV beam were measured at an SSD of 80 cm with a 12.5 cm × 12.5 cm collimator setting. These configurations corresponded to isocenter depths of 10 and 20 cm, respectively.
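The geometry of the two isocentric setups can be checked with a short calculation. The sketch below assumes an SAD of 100 cm and a collimator setting defined at the isocenter; under those assumptions both setups project to a field of about 10 cm × 10 cm at the phantom surface. That surface field size is an inference from the geometry rather than a statement in the text, and the function names are illustrative.

```python
SAD_CM = 100.0  # assumed source-to-axis distance


def isocenter_depth_cm(ssd_cm):
    """Depth of the isocenter below the phantom surface."""
    return SAD_CM - ssd_cm


def field_side_at_surface_cm(collimator_setting_cm, ssd_cm):
    """Field side projected to the surface, assuming the collimator
    setting is defined at the isocenter and the beam diverges linearly."""
    return collimator_setting_cm * ssd_cm / SAD_CM


# (energy label, SSD in cm, collimator setting in cm) for the two setups in the text.
for energy, ssd, setting in [("6 MV", 90.0, 11.1), ("18 MV", 80.0, 12.5)]:
    depth = isocenter_depth_cm(ssd)
    side = field_side_at_surface_cm(setting, ssd)
    print(f"{energy}: isocenter depth {depth:.0f} cm, surface field side {side:.2f} cm")
```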

G. Test case 7: Water phantom, 100-cm SSD, open square field, oblique incidence
This case tested the ability of the algorithm to calculate the dose for an obliquely incident beam. In general, dose distributions for oblique incidence should differ from those for normal incidence because an obliquely incident beam causes different amounts of scatter from different parts of the phantom than does a normally incident beam. 11 For this test, the gantry angles were 330° and 305° for each energy in a 10 cm × 10 cm field at an SSD of 100 cm. All profiles were measured either perpendicular or parallel to the surface. In addition, data for the 305° gantry angle were normalized to the FDD at a depth of 4 cm from the surface of the phantom for the particular field size and energy, while those for the 330° gantry angle were normalized to the FDD at a depth of 6 cm. Figure 4 illustrates a side view of the oblique incidence setup.

H. Test case 8: Water phantom, 100-cm SSD, asymmetric jaws (half beam and 45° wedge)
This case, which was not included in previous test sets, is another test of the ability of the dose-calculation algorithm to produce an accurate dose distribution using a nonstandard, although common, beam configuration. Photon-beam dose distributions of 6 and 18 MV at an SSD of 100 cm were measured with an asymmetrically collimated 10 cm × 20 cm field and a 45° wedge. The wedge was oriented in the x direction with the toe pointing to gantry left, and the FDD was measured beginning at a point +5.2 cm in the x direction. All of the data were normalized to the FDD at a depth of 10 cm from the surface of the phantom and +5.2 cm away from the central axis in the x direction for the particular energy. Figure 5(a) illustrates the locations of the profiles for this case, while Fig. 5(b) illustrates the side view of the irradiation setup.

I. Test case 9: Water phantom, 100-cm SSD, wedged field, oblique incidence
In the irradiation of certain sites, for example, breast and vocal cords, oblique incidence is compensated for by the use of wedges. In this case, a 45° wedge was used with a gantry angle of 315°. The wedge was oriented in the x direction with the toe pointing to gantry left, and a 10 cm × 10 cm field was used at each energy. All of the data were normalized to the FDD at a depth of 5 cm from the surface of the phantom for the particular field size and energy. All profiles were measured either perpendicular or parallel to the surface.

J. Test case 10: Water phantom, 100-cm SSD, MLC field
Dose-calculation algorithms generally make approximations in modeling the leaves of a multileaf collimator (MLC). For example, they may not model interleaf leakage or the rounded leaf edges. This case tested the ability of the photon dose-calculation algorithm to predict the dose under the MLC leaves and the leakage through them. Measurements for an 80-leaf Varian MLC were obtained using the following collimator settings: x1 = 6.0, y1 = 5.0, x2 = 12.0, and y2 = 19.2. The shape of the field was a right triangle, with the hypotenuse at gantry right. Figure 6 illustrates the locations of the profiles for this case.

K. Test case 11: Heterogeneous medium (lung phantom)
In treating lung cancer, the radiation beam has to travel through soft tissue, bone (ribs), and lung tissue. In this case, measurements were made in a heterogeneous anthropomorphic phantom (Rando® phantom; Radiology Support Devices, Inc., Long Beach, CA) to simulate the lung configuration. An 18-MV beam was employed with a 26 cm × 12 cm field, and 500 MU were delivered at an SSD of 100 cm. External spot markers (Beekley Corp., Bristol, CT) were placed on the phantom surface to allow accurate, reproducible positioning. Also, measurements were taken using TLD-100 powder at specific points in the lungs; the powder was encapsulated in polyethylene plugs that fit into predrilled holes in the phantom. The TLD reader (Harshaw Chemical Co.) was used to measure the absorbed dose for each TLD measurement. At the end of the measurement session, three TLD standards were irradiated to 520 MU in a water phantom under calibration conditions (100-cm SSD, 10 cm × 10 cm field, d_max) to provide a reference dose of 520 cGy for comparison with phantom measurements. Appropriate corrections for TLD energy dependence, fading, and nonlinearity (nonlinearity of TLD defined for doses between 0 and 600 cGy) were applied to all TLD readings. Three sets of TLD measurements were performed to assess the precision of measurement. Figure 7 shows the locations of the measurement points within the phantom.
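The conversion from TLD readings to dose, and the precision estimate from the three repeated measurement sets, might be computed along the following lines. This is a sketch under stated assumptions: the energy-dependence, fading, and nonlinearity corrections are lumped into a single multiplicative factor, the standard error of the mean is taken as the sample standard deviation divided by the square root of the number of measurements, and every numerical reading below is hypothetical. Only the 520-cGy reference dose comes from the text.

```python
from math import sqrt
from statistics import mean, stdev


def tld_dose_cgy(reading, standard_readings, reference_dose_cgy, correction=1.0):
    """Convert a TLD reading to dose using standards irradiated to a known dose.

    Assumption: all corrections (energy dependence, fading, nonlinearity)
    are folded into one multiplicative factor.
    """
    return reading / mean(standard_readings) * reference_dose_cgy * correction


def mean_and_sem(values):
    """Mean and standard error of the mean of repeated measurements."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical reader outputs (arbitrary units) for three standards and one point.
standards = [10.4, 10.5, 10.3]
point_readings = [8.9, 9.1, 9.0]
doses = [tld_dose_cgy(r, standards, reference_dose_cgy=520.0) for r in point_readings]
dose_mean, dose_sem = mean_and_sem(doses)
print(f"dose = {dose_mean:.1f} cGy, standard error of the mean = {dose_sem:.1f} cGy")
```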

L. Test case 12: Heterogeneous medium (neck phantom)
To treat certain head and neck cancers, the radiation beam must pass consecutively through tissue, bone, and air, with a nonequilibrium condition present in the bone-air region. In this case, neck treatment was simulated using the Rando® phantom. A 10 cm × 14 cm field was used with a 6-MV beam, and 500 MU were delivered at an SSD of 100 cm. The reference dose for this test case was 460 cGy. Figure 8 displays the locations of the measured data points in the neck phantom.

M. MU verification
Current treatment planning systems may offer the option of calculating MUs, thus relating the dose distributions to the actual machine output. The methods by which the treatment planning systems relate dose distributions to machine output vary widely. For example, one commercial treatment planning system uses calibrated machine output obtained when the machine was originally commissioned as the starting point for MU calculations. In this method, the physicist enters the measured output at a specified reference point (usually a depth of 10 cm) for a reference field size (usually 10 cm × 10 cm), and for a reference distance (for example, 100-cm SAD). Instead of normalizing the detector readings to the reading obtained under the reference conditions at the time of each set of measurements, we compared calculations of the total scatter factor (TSF) 11 rather than the absolute number of MUs. The TSF is defined to be the output at the dose normalization point divided by the output at a 10-cm depth for a 10 cm × 10 cm field. Using the TSF for absolute dose determination removes the daily variation of the machine output from the measured data.
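The TSF definition above amounts to a simple ratio, and the comparison of calculated and measured values to a percent discrepancy. The sketch below is illustrative: expressing the discrepancy relative to the measured TSF is an assumption, and the readings and doses used in the example are hypothetical rather than values from this study.

```python
def total_scatter_factor(output_at_normalization_point, output_at_reference):
    """TSF: output at the dose normalization point divided by the output
    at a 10-cm depth for a 10 cm x 10 cm field (same energy, same MU)."""
    return output_at_normalization_point / output_at_reference


def percent_discrepancy(calculated_tsf, measured_tsf):
    """Assumed convention: discrepancy expressed relative to the measurement."""
    return (calculated_tsf - measured_tsf) / measured_tsf * 100.0


# Hypothetical electrometer readings (measured) and planning-system doses (calculated),
# both for 100 MU.
measured = total_scatter_factor(21.42, 20.00)
calculated = total_scatter_factor(106.1, 100.0)
print(f"measured TSF {measured:.3f}, calculated TSF {calculated:.3f}, "
      f"discrepancy {percent_discrepancy(calculated, measured):+.2f}%")
```

Because both numerator and denominator are acquired in the same session, day-to-day output variation cancels in the ratio, which is the motivation given in the text for comparing TSFs instead of absolute MUs.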
To test MU calculations, the TSF in the water phantom was measured at the normalization points for each of the ten water-phantom test cases. TSFs were obtained by referencing the electrometer reading at the particular normalization point to the electrometer reading at a depth of 10 cm for a 10 cm × 10 cm field at an SSD of 100 cm for each energy for 100 MU. To extract these TSFs from the commercial radiation treatment planning system, 100 MUs were prescribed for each test case, and the absolute dose was recorded and then divided by the dose for a 10 cm × 10 cm collimator setting for each energy.
Figure 9 shows the results of the comparison of the calculated and measured photon doses for an 18-MV 25 cm × 25 cm field, 100-cm SSD, at a depth of 3.2 cm in a transverse plane. In this comparison, the computed tails of the dose profile are much flatter than the measured tails. The TG-53 tolerance of ±2% for the outer region is not met for the points at ±13.6 cm from the central axis. The TG-53 tolerance of ±2% for the inner region is also not met for the point at −12.4 cm from the central axis.
Figure 10 illustrates the plot of a profile for an 18-MV 5 cm × 25 cm field, 100-cm SSD, at a depth of 3.2 cm in a sagittal plane. The calculated doses in the inner region away from the central axis are underestimated because of the way the radial dependence of the incident photon fluence was modeled.
Figure 11 is a plot of an 18-MV 30 cm × 30 cm mantle field profile passing through the off-axis point (0 cm, 12.7 cm) at a depth of 6 cm in the x direction. Discrepancies between calculation and measurement were observed in two regions. First, the calculated doses outside the field (in both the beam penumbra and under the block) underestimated the measured dose with dose differences beyond the TG-53 criteria. Second, doses to the shoulders were also underestimated.
Figure 12 illustrates a dose profile for a 6-MV field, asymmetrically collimated to a 10 cm × 20 cm field, at a depth of 1.2 cm with a 45° wedge inserted in the x direction with the toe pointing toward gantry left. One of the collimator jaws is placed at the central axis of the beam. The TG-53 criteria are normally specified as a percent of the central ray normalization dose. However, doses in this test case could not be normalized to a point on the central axis of the beam because the central axis lay in a region of high-dose gradient. TG-53 criteria were extended to this test case by establishing a normalization point in the approximate center of the radiation field. The depth of the profile was 1.2 cm and the profile passed through the central axis in a transverse plane. On the right-hand side, doses in the tails were underestimated more towards the edge of the field.
Figure 13 is a plot of a profile of an 18-MV 10 cm × 10 cm beam incident at an angle of 45° with a 45° wedge measured at a depth of 3.2 cm passing through the central axis in a transverse plane. As with the beam shown in Fig. 12, doses in this test case were not normalized to a point on the central axis of the beam. However, the TG-53 criteria were extended to this test case by establishing a normalization point in a region of high- and low-dose gradient. All calculated profiles then agreed well with the measured data.
Figure 14 is a plot of a 6-MV MLC-shaped field at a depth of 1.2 cm passing through the central axis in a transverse plane. This profile was in the buildup region. The treatment planning system matched the shape of the measurements fairly well. It should be noted, however, that the TG-53 tolerances are high in the buildup region, namely ±20%.

A. Water phantom test cases
Comparing calculations to measurements for all points in this study, we found the treatment planning system calculated photon doses to within the AAPM TG-53 criteria for 99% of points in the buildup region, 90% of points in the inner region, 88% of points in the outer region, and 93% of points in the penumbra.
Table II summarizes the results of the monitor unit testing process. The numbers in the cells are the total scatter factors for each test situation. A noteworthy trend is seen in the table. Specifically, when modifiers or blocks were applied to the beam, the treatment planning system consistently underestimated the total scatter factor. The discrepancies in monitor units for the 18-MV 5 cm × 5 cm and 18-MV 25 cm × 25 cm beams also did not meet the TG-53 criterion of ±0.5%. However, these criteria do not include the errors in determining the absolute dose under standard calibration conditions in their tolerance figures for the absolute dose at the normalization point. The criteria also do not provide for errors in determining the total scatter factor in their estimate for acceptable agreement. In addition, the errors in monitor units for rectangular fields exceeded the TG-53 tolerance of ±0.5%. The error in monitor units for the mantle field also did not meet the TG-53 criterion for blocked fields of ±1%. The error in monitor units for the oblique incidence field exceeded the TG-53 criterion for external surface variations of ±0.5%. The error in monitor units for the last test case that exceeded the TG-53 criteria was the oblique incidence with a wedge. Here, the TG-53 criterion for wedges of ±2% was used because an explicit criterion for an obliquely incident field with a wedge does not exist.
Table III shows the measured and calculated doses in the anthropomorphic lung phantom test case. The last column in the table, the standard error of the mean, demonstrates the precision of the TLD measurements.
It is interesting to note that doses to all of the points except one were underestimated by the treatment planning system. The calculated dose to Point 7 deviated the most from the measurements. However, this point was in the penumbra of the beam. Of the points within the beam, the dose at Point 5 deviated the most from the measurements. This could be due to the underestimation of scatter from nearby bone or to the fact that this point had the highest standard error of the mean. In the presence of heterogeneities with significantly different average atomic numbers, such as lung and bone, electron transport should be dealt with explicitly. 12 If we apply the TG-53 criteria, all dose discrepancies were within the specified limits of ±7% in the outer region, ±7% in the inner region, or 7 mm in the penumbra. A previous study verifying calculations from this commercial radiation therapy treatment planning system against Monte Carlo-generated dose distributions on treatment plans found all calculations were within ±2.6% of the Monte Carlo-generated data. 13 In fact, a previous study conducted within this institution comparing treatment plans for large-breasted patients and measurements obtained by thermoluminescent dosimetry (TLD) in an anthropomorphic phantom found the calculated doses to all points were within ±3% of the measured doses. 14
Table IV displays the measured and calculated doses for the anthropomorphic neck phantom test case. Doses to all test points were overestimated by the treatment planning system except for Point 5, which was located in the spinal cord. This dose underestimation is consistent with the previous test case point that was close to bone. Point 3, which lies on the left edge of the spinal cord, exhibited the largest dose discrepancy. However, the standard error of the mean for this measurement was the largest of all the measured data points. Again, there was good agreement between the calculated dose values and the measured dose values.

DISCUSSION
The discrepancies between calculations and measurements were caused primarily by deficiencies in the beam model. For small, square open fields (5 cm × 5 cm), the calculated shoulders and tails underestimated the measured data. The underestimation resulted because parameters that described the finite source size and stray scatter from the head had to be modified so that monitor unit calculations would closely match clinical data, thus compromising the accuracy of calculations in the shoulders and tails. 7 For large, square open fields (25 cm × 25 cm), calculations overestimated measurements in the tails, because the parameter that described stray scatter from the head was also modified so that monitor unit calculations would closely match clinical data. Inaccuracies in modeling scatter were also evident in the effect of modifiers or blocks on the accuracy of monitor unit calculations. A possible remedy to the extrafocal radiation problem is to use a dual-source photon beam model. 15
Calculated profiles along the long axis of elongated fields (5 cm × 25 cm or 25 cm × 5 cm) underestimated measurements in the shoulder region, while calculated profiles along the short axis overestimated measurements. These inaccuracies occurred because of the manner in which the radial distribution of the in-air fluence was modeled. Specifically, the incident photon fluence was assumed to increase linearly with the distance from the central axis until a certain boundary, beyond which the fluence was assumed to be flat. Thus, two parameters specified the incident fluence: a cone angle, which described the rate of increase in the fluence as the off-axis distance increased; and a cone radius, which described the point at which the fluence profile became flat. 16 In the treatment planning system, all rectangular fields were modeled with cone angles and cone radii for the equivalent square-field size.
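The two-parameter fluence description above can be illustrated with a short sketch. This is not the treatment planning system's implementation: the function name, the normalization to 1.0 on the central axis, and the expression of the cone angle as a fractional slope per centimeter are assumptions made for illustration, and the numerical values are hypothetical.

```python
def relative_fluence(off_axis_cm, cone_slope_per_cm, cone_radius_cm):
    """Relative in-air fluence at a given off-axis distance.

    Assumed form: fluence rises linearly from 1.0 on the central axis
    and stays flat beyond the cone radius.
    """
    # Distances beyond the cone radius are clamped, so the profile is flat there.
    r = min(abs(off_axis_cm), cone_radius_cm)
    return 1.0 + cone_slope_per_cm * r


# Hypothetical parameters: 1% rise per cm, flat beyond 7 cm.
for r_cm in (0.0, 3.5, 7.0, 12.5):
    print(r_cm, relative_fluence(r_cm, 0.01, 7.0))
```

With a cone radius of 7 cm, every point farther than 7 cm from the axis receives the same fluence as the point at 7 cm, which is the cutoff behavior discussed for the long axis of elongated fields.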
In commissioning this beam, the cone radius was taken to be field-size dependent to match calculation with measurement. A more realistic beam model, however, would have a cone radius independent of the field size. For the 5 cm × 25 cm field, the equivalent square is 8.3 cm × 8.3 cm. The cone radius that should have been used for this field was the one for a 25 cm × 25 cm field. Similarly, the cone radius that should have been used for the profiles acquired in the x direction for this setup was the cone radius for a 5 cm × 5 cm field. Consequently, the cone radius of 7 cm, which would have been appropriate for an 8.3 cm × 8.3 cm field, resulted in a cutoff of the fluence increase at too small a radius for the 25-cm width of the 5 cm × 25 cm field.
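The 8.3 cm equivalent square quoted above is consistent with the common area-to-perimeter rule, in which the side of the equivalent square is 2ab/(a+b) for an a × b rectangle. The text does not state which method the treatment planning system used, so the sketch below is an assumption, and the function name is hypothetical.

```python
def equivalent_square_side(width_cm, length_cm):
    """Side of the equivalent square by the area-to-perimeter rule.

    side = 4 * area / perimeter = 2 * a * b / (a + b)
    """
    return 2.0 * width_cm * length_cm / (width_cm + length_cm)


# The 5 cm x 25 cm field discussed in the text.
print(round(equivalent_square_side(5.0, 25.0), 1))  # 8.3
```

The example shows why a single equivalent-square cone radius is a compromise: the 8.3 cm value lies far from both the 5 cm and the 25 cm dimensions of the actual field.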
Calculations in blocked fields underestimated measurements both in the tails and in the shoulders, as seen in Fig. 11. The underestimation of dose in the tails may be due to inaccurate modeling of the attenuation and scatter from the block, while the underestimation of dose in the shoulders may also be due to inaccurate modeling of the fluence profile within the field.
Calculations in wedged fields underestimated measurements in the tails on the side of the heel of the wedge and in the shoulder near the toe of the wedge. These discrepancies were due to the symmetric nature of the radially dependent parameters, such as the Gaussian height parameter, which accounts for additional head scatter and modifies the calculated dose in both tails, and the cone angle, which accounts for the profile of the in-air fluence. In the case of a wedge, the relative dose profile is not radially symmetric, resulting in a compromise when selecting the cone radius and cone angle. Moreover, the beam model does not directly account for wedge-generated scatter. One remedy to this situation is to include the wedge in the calculation volume, as in the extended phantom model. 17 The beam model also does not address differential hardening from the wedge. Consequently, calculated depth doses tend to underestimate measurements at deeper depths and overestimate measurements at shallower depths. Calculated doses outside the field yet under MLC leaves were underestimated because interleaf leakage was not modeled.
Ion chamber measurements indicate that doses to most of the calculated points are acceptable according to the TG-53 criteria. The sources of the deviations from the criteria were identified. TLD measurements indicated that the treatment planning system accurately predicted doses in heterogeneous media to within ±3%. However, the generally stated goal of dose delivery accuracy to within 5% was not met in all situations with this beam model. Clinically, the greatest difficulty is posed by rectangular fields, where the dose in the inner region of the beam was underestimated by as much as 9.75% in some cases. Also, the monitor unit calculations for the oblique incidence cases show deviations around 2.4%, which is considered borderline acceptable in a clinical context.
To compare calculated and measured doses, the TG-53 report divided the beam into several regions: the buildup region, the inner region, the penumbra, and the outer region. The tolerances for the buildup region range from ±20% for open fields at standard SSD to ±50% for wedged fields. The present study found only six points out of 4138 points exceeding the TG-53 criteria for the buildup region. All six points occurred in the MLC-shaped field test case. The errors that were typically encountered were less than ±20%. According to the TG-53 report, dose acceptability criteria were based on the collective expectations of the members of the task group and were not to be used as goals or requirements for any particular situation. The present work indicated that the TG-53 dose acceptability criteria for the buildup region are too forgiving and may require adjustment. Furthermore, the buildup region might be construed as a region of a high-dose gradient and a distance criterion might be used rather than a dose criterion. The criteria cited by Venselaar et al. 18 of 10-15% or 2-3 mm might be more appropriate here.
A shortcoming of the TG-53 report may be in how the various regions are defined. For example, the TG-53 report defines the penumbra as the region from 0.5 cm inside to 0.5 cm outside the beam/modifier edge. However, this definition does not allow for broadening of the penumbra with depth. This leads to a definition of the penumbra that may not encompass the entire high-dose-gradient portion of the beam. For example, as was seen in Fig. 9, several points in the calculated 18-MV 25 cm × 25 cm beam failed to meet the TG-53 tolerance of ±2% for the outer and inner regions. According to the TG-53 definitions, these points should lie in either the outer or inner regions, but such assignment is questionable because the points are located in a region of steep dose gradient. Consequently, the penumbra of the beam might be better defined by a criterion based on slope. For example, the penumbra for a square, open field could be defined to be in the region where the magnitude of the slope is ≥3% per mm, as was suggested by Venselaar et al. 18 Such a definition must also ensure that the slope search occurs in a region containing the 50% isodose level to prevent the definition from satisfying the slope limits in the center of the field or at other locations that do not include a beam/modifier edge.
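A slope-based penumbra definition of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in this study or by Venselaar et al.: the function name is hypothetical, the profile is assumed to be sampled at increasing positions in millimeters with dose normalized in percent, and the slope is estimated by simple finite differences between adjacent samples.

```python
def slope_based_penumbra(positions_mm, dose_percent,
                         min_slope=3.0, edge_level=50.0):
    """Return (start_mm, end_mm) intervals where |slope| >= min_slope (%/mm)
    and the interval contains the edge_level (e.g., 50%) dose."""
    regions = []
    run_start = None  # index of the first sample of the current steep run
    n = len(positions_mm)
    for i in range(n - 1):
        dx = positions_mm[i + 1] - positions_mm[i]
        slope = (dose_percent[i + 1] - dose_percent[i]) / dx
        steep = abs(slope) >= min_slope
        if steep and run_start is None:
            run_start = i
        # Close the run when the gradient flattens or the profile ends.
        if run_start is not None and (not steep or i == n - 2):
            run_end = i + 1 if steep else i
            segment = dose_percent[run_start:run_end + 1]
            # Keep only steep runs that bracket the field-edge dose level.
            if min(segment) <= edge_level <= max(segment):
                regions.append((positions_mm[run_start], positions_mm[run_end]))
            run_start = None
    return regions


# Hypothetical half-profile across a field edge.
x_mm = [0, 2, 4, 6, 8, 10, 12, 14]
dose = [100, 99, 97, 80, 50, 20, 5, 3]
print(slope_based_penumbra(x_mm, dose))
```

The check against the 50% level implements the safeguard described above: a steep run that never crosses the field-edge dose level, such as a gradient inside the field, is not reported as penumbra.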
Lastly, the TG-53 criteria do not specify tolerances for regions of electronic disequilibrium in heterogeneous media. Goals in the buildup region for heterogeneous media also need to be defined; otherwise one would be unable to judge whether the algorithm is predicting dose acceptably in these situations.
Although this study was performed on a software version (Version 4.2) that has since been superseded, the same analysis can be performed on newer versions of the software as well as on other radiation treatment planning software.

CONCLUSIONS
We have generated a measured data set for verifying photon dose calculations. In contrast to previous data sets, this set includes measured TSFs, and measurements in an anthropomorphic phantom. The effects of oblique incidence with a wedged field, asymmetric collimation with a wedged field, mantle-field irradiation, and use of an MLC were also studied.
This data set was designed so that it could be used for general verification of photon dose-calculation algorithms. Indeed, the first test case can be used to generate the beam model, and the subsequent test cases can be used to validate the dose-calculation accuracy under various situations. The data set is available on request for anyone wishing to verify their beam model. Further information on obtaining the data set can be found on the Radiological Physics Center web site (http://rpc.mdanderson.org).