A systematic approach to statistical analysis in dosimetry and patient-specific IMRT plan verification measurements
 Songbing Qin^{2},
 Miao Zhang^{1},
 Sung Kim^{1},
 Ting Chen^{1},
 Leonard H Kim^{1},
 Bruce G Haffty^{1} and
 Ning J Yue^{1}
https://doi.org/10.1186/1748-717X-8-225
© Qin et al.; licensee BioMed Central Ltd. 2013
Received: 27 June 2013
Accepted: 22 September 2013
Published: 30 September 2013
Abstract
Purpose
In the presence of random uncertainties, the radiation dose delivered to a patient is likely to follow a statistical distribution. The expected dose and variance of this distribution are unknown and are most likely not equal to the planned value, since current treatment planning systems cannot exactly model and simulate the treatment machine. Relevant clinical questions are 1) how to quantitatively estimate the expected delivered dose and extrapolate the expected dose to the treatment dose over a treatment course and 2) how to evaluate the treatment dose relative to the corresponding planned dose. This study presents a systematic approach to address these questions and applies this approach to patient-specific IMRT (PSIMRT) plan verifications.
Methods
The expected delivered dose in the patient and its variance are quantitatively estimated using the Student's t-distribution and the chi distribution, respectively, based on pretreatment QA measurements. Relationships between the expected dose and the delivered dose over a treatment course and between the expected dose and the planned dose are quantified with mathematical formalisms. The requirement and evaluation of the pretreatment QA measurement results are also quantitatively related to the desired treatment accuracy and to the to-be-delivered treatment course itself. The developed methodology was applied to PSIMRT plan verification procedures for both QA result evaluation and treatment quality estimation.
Results
Statistically, the pretreatment QA measurement process was dictated not only by the corresponding plan but also by the delivered dose deviation, number of measurements, treatment fractionation, potential uncertainties during patient treatment, and desired treatment accuracy tolerance. For the PSIMRT QA procedures, in theory, more than one measurement had to be performed to evaluate whether the to-be-delivered treatment course would meet the desired dose coverage and treatment tolerance.
Conclusion
By acknowledging and considering the statistical nature of multi-fractional delivery of radiation treatment, we have established a quantitative methodology to evaluate the PSIMRT QA results. Both the statistical parameters associated with the QA measurement procedure and treatment course need to be taken into account to evaluate the QA outcome and to determine whether the plan is acceptable and whether additional measures should be taken to reduce treatment uncertainties. The result from a single QA measurement without the appropriate statistical analysis can be misleading. When the required number of measurements is comparable to the planned number of fractions and the variance is unacceptably high, action must be taken to either modify the plan or adjust the beam delivery system.
Keywords
Dose measurement, IMRT QA, Uncertainty, Statistical analysis
Introduction
Successful radiation treatment depends on precise calibration of the treatment machine and on the machine’s accuracy and precision in delivering that particular treatment plan. Protocols have been established to standardize the treatment machine calibration process in order to improve the accuracy of radiation dosimetry [1–5]. Similarly, various treatment quality assurance (QA) protocols and recommendations have been established and followed in many radiation treatment centers [6–10]. These protocols and recommendations typically involve dosimetric measurements, which must then be correctly interpreted in order to ensure proper radiation delivery and patient safety.
One of the major goals of radiation treatments is to deliver the desired dose coverage to the target volume. The dose coverage is determined based on dose computations by a treatment planning system (TPS) and is hopefully achieved by the radiation treatment machine. In the presence of random uncertainties arising from various components of the treatment machine, and given an infinite number of deliveries, the delivered doses would exhibit a statistical distribution with an expected variance and expected mean value. The expected mean delivered value is most likely not equal to the corresponding planned dose because current treatment planning systems do not perfectly model treatment machines. Furthermore, since a treatment course consists of a finite number of fractions, the mean value of the delivered doses over the treatment course may well differ from the expected mean value from an infinite number of fractions. Though the treatment goal is often stated simply as a desired dose (planned dose) delivered to a patient, it would be more accurate to state the goal as delivering a mean dose over a treatment course to a patient within a certain confidence interval (e.g., 95% confidence interval of 3%) around the desired dose. It is generally accepted that the dose delivered to a patient should be within 5% of the desired dose with a 95% confidence level [11, 12], and its precision is affected by uncertainties in every step of the radiation treatment process. The goal of this paper is to focus on the radiation delivery step and to present an approach to estimate and evaluate whether the to-be-delivered doses over a treatment course meet the treatment goal during radiation delivery.
For more complicated treatment deliveries such as IMRT, patient-specific IMRT (PSIMRT) plan verification QA is usually performed. The purpose of PSIMRT QA is to verify that the computed dose distribution of a plan is accurate by conducting measurements in (typically) homogeneous phantoms. If the PSIMRT result is deemed acceptable according to certain criteria, the implicit assumption is that the plan, when delivered to the patient via the same delivery system, will deliver similarly acceptable doses. The ultimate aim of PSIMRT QA is to ensure plan integrity at treatment and agreement (within a certain tolerance) between delivered and planned dose over a course of treatment. In PSIMRT QA, even after the integrity of plan transfer and treatment machine performance is thoroughly inspected and verified, the existence of measurement uncertainties is well-documented [13, 14], and investigations have been conducted to incorporate those uncertainties into IMRT treatment delivery and planning [15, 16]. With these uncertainties, it is almost certain that repeated PSIMRT QA measurements will produce a statistical distribution. Therefore, it may be inappropriate to draw a conclusion from PSIMRT QA based on a single measurement. For example, under conditions of correct plan transfer, normal machine function, and proper measurement equipment, it is relatively common for an initial PSIMRT QA measurement to fail to meet preset criteria [17, 18] and then pass on repeat measurement, and a decision must be made whether the plan is acceptable or not. According to general statistical theory, the final decision should not completely ignore the initial failure even if subsequent measurements are acceptable. The decision should be based on statistical analysis of the QA measurements, including both the failed and passing results, and on the expected treatment goal.
Furthermore, it is intuitive that a treatment course of fewer fractions, e.g., stereotactic body radiotherapy (SBRT), requires a higher standard in the distribution of its QA results. Therefore, the ultimate goal of PSIMRT QA should be threefold: (1) most importantly, to verify the integrity of plan transfer from the TPS to the treatment unit and to identify major discrepancies such as beam modeling errors and accelerator/MLC performance problems, (2) to check the deliverability of the plan, and (3) to evaluate whether the variation of the plan delivery would be within the statistical tolerance of the treatment prescription based on the fractionation scheme and the measured variation.
In general, there are two types of errors: systematic errors and random errors. Systematic errors are normally caused by inaccuracy in a system or a tendency to consistently be off from a predicted value. Random errors are unpredictable, unknown, and fluctuating variations. Several studies have demonstrated that a finite number of fractions leads to residual errors in the total doses delivered to the patient [19–21]. According to statistical theory, for a dose quantity subject to random errors, the expected value and expected standard deviation (SD) of its statistical distribution are unknown but can be estimated using results from a number of repeated and independent measurements. In many cases, the measured mean and standard deviation are directly used as the expected value and expected standard deviation, respectively. However, substitution of a measured mean value for an expected value is scientifically meaningful only if a confidence interval and its corresponding confidence level for the substitution are clearly specified. The confidence interval and confidence level, in turn, are highly dependent on the number and variance of measurements. With the expected value and expected SD of the dose quantity statistically determined, based on the "finite-sample distribution theory" of a given statistic, the mean value distribution of the dose quantity delivered over a limited number of fractions can be statistically estimated and is highly dependent on the number of fractions.
Therefore, theoretically, the evaluation of PSIMRT plan verification QA results (and other similar dosimetric measurement procedures) should not be based on a preselected value or a single observation of pass/fail. Rather it should be based on a statistical approach incorporating the number and variance of measured results, the associated accuracy confidence interval and level, and related treatment details such as number of fractions, uncertainties during treatment, and desired dosimetric tolerance. Additionally, uncertainties exist in the measurement equipment and measurement setups. These uncertainties should also be carefully analyzed and taken into consideration for the evaluation of measurement results.
The current study attempts to build a closed and complete statistical model and to extend the scope of this improved statistical model and method to another realm where such a method would be beneficial: dosimetry and PSIMRT plan verification QA measurements.
Materials and methods
In the subsequent sections, unless otherwise stated, the term dose or dose value refers to the dose at a specific point in the patient or phantom.
For clarification, a few notations are first defined:

PD_{ X }: Percent Difference between the expected value and mean value of subject X;

P: Probability of the subject of interest;

N: Number of fractions of a treatment course

n: Number of pretreatment QA measurements

R: the expected dose value at a point in QA measurement

σ^{ 2 }: the expected dose variance at a point in QA measurement

${\overline{R}}_{n}$: the mean of measured results for dose at a point from n pretreatment QA measurements

${\overline{\sigma}}_{n}^{2}$: the variance of measured results for dose at a point from n pretreatment QA measurements

${\overline{R}}_{N}$: the mean dose quantity at a point delivered over N fractions of treatment

${\sigma}_{N}^{2}$: the variance of the mean dose quantity at a point delivered over N fractions of treatment
For complicated treatments and plans, such as those involving IMRT and requiring higher delivery accuracy, the questions are 1) how to estimate the to-be-delivered dose over a treatment course and its deviation from the planned dose, and 2) how to use pretreatment QA measurements to infer these dose estimates for the to-be-delivered treatment and evaluate the QA outcome accordingly.
Assuming all machine components and the plan transfer are within specification, in a typical PSIMRT plan verification QA procedure, the current practice is to conduct measurement and compare the measured result to the corresponding planned value, typically in a homogeneous phantom. If the difference is smaller than certain preset criteria, the IMRT treatment design is deemed acceptable. As discussed above, the delivered radiation dose most likely exhibits a statistical distribution if repeated, even using the same IMRT plan and radiation treatment machine. Furthermore, the average dose and its deviation over a treatment course also present a statistical distribution and vary with the number of treatment fractions. Thus, a more scientific approach to PSIMRT plan verification QA procedures is not to simply compare the QA results with the corresponding plan value(s) but to adopt a systematic approach to conduct statistical analysis on the QA results, taking into account treatment details such as the number of fractions and desired deviation tolerance.
General formalism for measurement statistical analysis
Assuming a dose quantity has random uncertainties and its value follows a certain statistical distribution if it is delivered an infinite number of times, there exist an expected value R and an expected variance σ^{ 2 } for this dose quantity. There are two types of statistical estimations for this dose quantity: 1) estimation of the expected R and σ^{ 2 } by conducting n measurements and 2) estimation of the mean value and variance of the dose quantity when it is to be delivered a limited number of times N. While pretreatment measurements of a dose quantity fall into the first type, the second type is analogous to the estimation of the dose delivered to a patient over a treatment course. In this study, it is assumed that the statistical distribution of a dose quantity follows a Normal Distribution.
Estimation of the percent difference between the expected and QA measured results
where $\overline{R}_{n}=\frac{\sum_{i=1}^{n}R_{i}}{n}$ and $\overline{\sigma}_{n}^{2}=\frac{\sum_{i=1}^{n}\left(R_{i}-\overline{R}_{n}\right)^{2}}{n-1}$ are the mean value and the variance of the n measurement results, respectively, and Γ(x) is the Gamma function, which can be expressed as $\Gamma\left(x\right)=\int_{0}^{\infty}y^{x-1}e^{-y}\,dy$. It should be noted that the expected variance σ^{ 2 } is different from $\overline{\sigma}_{n}^{2}$. Whereas $\overline{\sigma}_{n}^{2}$ is a measured quantity, the expected variance is $\sigma^{2}=\lim_{n\to\infty}\overline{\sigma}_{n}^{2}$.
In Eq. 4, $\frac{R-\overline{R}_{n}}{\overline{R}_{n}}\times 100\%$, denoted as PD_{ R }, is the percent difference between the expected result for the procedure under consideration and the mean value of the measurement results. From the equation, it is apparent that the probability distribution of PD_{ R } is independent of the expected variance σ^{ 2 } and can be determined with the parameters of a particular measurement process, such as the number, the mean and the variance of the measured results.
where $A=\frac{{\overline{R}}_{n}\sqrt{n}}{{\overline{\sigma}}_{n}}\times \mathit{y\%}$.
Equation (5) indicates that the probability of measurement accuracy is dependent on the number (n), mean (${\overline{R}}_{n}$) and deviation (${\overline{\sigma}}_{n}$) of the measurement results and can be computed directly from the measurement process itself.
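As a rough numerical illustration of Eq. 5, the sketch below assumes that the statistic $(\overline{R}_{n}-R)/(\overline{\sigma}_{n}/\sqrt{n})$ follows a Student's t-distribution with n-1 degrees of freedom, and that the probability of PD_{ R } lying within ±y% is the integral of that density from -A to A. The function names (`t_pdf`, `simpson`, `prob_mean_within`) are illustrative and not part of the original study; only the Python standard library is used.

```python
import math

def t_pdf(t, dof):
    """Student's t probability density with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(math.pi * dof) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def simpson(f, a, b, steps=20000):
    """Composite Simpson integration of f over [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def prob_mean_within(measurements, y_percent):
    """Probability that the expected value lies within y% of the measured mean."""
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = sum(measurements) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in measurements) / (n - 1))
    if sd == 0:
        raise ValueError("zero spread: the t-statistic is undefined")
    # A = mean * sqrt(n) / sd * y%  (integration limit used with Eq. 5)
    A = mean * math.sqrt(n) / sd * y_percent / 100.0
    return simpson(lambda t: t_pdf(t, n - 1), -A, A)

if __name__ == "__main__":
    print(prob_mean_within([1.706, 1.695], 2.0))
```

With the two readings 1.706 Gy and 1.695 Gy, this gives a probability of roughly 0.90 for a 2% interval. Small differences from published values are to be expected when the inputs are rounded readings.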
It should be emphasized that the expected value is unknown unless there exists zero uncertainty. What are known from a measurement procedure are the measurement results and their distribution.
Estimation of the expected standard deviation
The quantity $Q=\frac{\overline{\sigma}_{n}\sqrt{n-1}}{\sigma}$ follows the "Chi Distribution" with n-1 degrees of freedom, with a probability density of $f_{Q}\left(t\right)=\frac{t^{\left(n-1\right)-1}e^{-t^{2}/2}}{2^{\frac{n-1}{2}-1}\Gamma\left(\frac{n-1}{2}\right)}$ for t ≥ 0.
It follows that $\sigma=E\left(\overline{\sigma}_{n}\right)\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\sqrt{\frac{n-1}{2}}$, where $E\left(\overline{\sigma}_{n}\right)$ is the expected value of $\overline{\sigma}_{n}$. σ can therefore be estimated with the term $\overline{\sigma}_{n}\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\sqrt{\frac{n-1}{2}}$.
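The correction factor in this estimate can be evaluated directly with the Gamma function. The following is a minimal sketch; the function name `estimate_expected_sigma` is illustrative, and the inputs are simply a measured standard deviation and the number of measurements.

```python
import math

def estimate_expected_sigma(sd_n, n):
    """Estimate the expected standard deviation from the measured one.

    sd_n : standard deviation of the n measurement results
    n    : number of measurements (must be at least 2)
    """
    if n < 2:
        raise ValueError("at least two measurements are needed")
    # Factor Gamma((n-1)/2) / Gamma(n/2) * sqrt((n-1)/2) from the chi distribution.
    factor = math.gamma((n - 1) / 2) / math.gamma(n / 2) * math.sqrt((n - 1) / 2)
    return sd_n * factor

if __name__ == "__main__":
    for n in (2, 3, 5, 10, 30):
        print(n, estimate_expected_sigma(1.0, n))
```

The factor is about 1.25 for n = 2 and approaches 1 as n grows, so the correction matters mainly when only a few measurements are available.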
Probability determination of the percent difference between the expected measurement value and a given value
As stated previously, the dose from the treatment plan (given value) is most likely not equal to the expected dose delivered by the treatment machine.
where a and b are two values to be determined.
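The probability described in this subsection can be sketched numerically. Because the expressions for a and b are not reproduced above, the sketch assumes they are the bounds on the t-statistic $(\overline{R}_{n}-R)/(\overline{\sigma}_{n}/\sqrt{n})$ that correspond to the expected value R lying within ±z% of the given value. That assumption, and the names `t_pdf`, `simpson` and `prob_expected_within_given`, are illustrative rather than taken from the original study.

```python
import math

def t_pdf(t, dof):
    """Student's t probability density with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(math.pi * dof) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def simpson(f, a, b, steps=20000):
    """Composite Simpson integration of f over [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def prob_expected_within_given(measurements, given, z_percent):
    """Probability that the expected value is within z% of `given` (e.g. plan dose)."""
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = sum(measurements) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in measurements) / (n - 1))
    if sd == 0:
        raise ValueError("zero spread: the t-statistic is undefined")
    scale = sd / math.sqrt(n)
    # Assumed limits: R in [given*(1-z), given*(1+z)] mapped onto the t-statistic.
    a = (mean - given * (1 + z_percent / 100.0)) / scale
    b = (mean - given * (1 - z_percent / 100.0)) / scale
    return simpson(lambda t: t_pdf(t, n - 1), a, b)

if __name__ == "__main__":
    print(prob_expected_within_given([1.706, 1.695], 1.702, 3.0))
```

Unlike the symmetric interval around the measured mean, the limits here are generally asymmetric, because the given value and the measured mean need not coincide.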
Estimation of the dose delivered to a patient over a treatment course
It can be inferred from Eqn. 9 that for a treatment course with a large number of fractions the mean value of a dose quantity delivered to the patient is likely to be close to the expected value even if the dose delivery uncertainty of one fraction (standard deviation σ) is large. However, for a treatment course of few fractions, such as SBRT treatment, dose delivery uncertainty must be reduced to ensure that the dose delivered to the patient is close to the expected value.
For example, suppose that the mean dose delivered to a patient over a treatment course is required to be within 3% of the expected value with a 95% confidence level; then $2\sigma_{N}=\frac{2\sigma}{\sqrt{N}}=3\%$ and $\sigma=1.5\%\times\sqrt{N}$. If the number of fractions is 30, the treatment quality can be maintained as long as the delivery uncertainty (standard deviation) is within 8.2%. However, if the number of fractions is 3, as in many SBRT treatments, the delivery uncertainty must be reduced to 2.6% to ensure treatment quality. Assuming that the uncertainty σ_{ others } from all other sources is 2%, the machine delivery uncertainty should be controlled below $\sqrt{\left(8.2\%\right)^{2}-\left(2\%\right)^{2}}=8.0\%$ to achieve the 3% criterion for the 30-fraction treatment, while for an SBRT treatment of 3 fractions, the machine delivery uncertainty must be controlled below $\sqrt{\left(2.6\%\right)^{2}-\left(2\%\right)^{2}}=1.7\%$, a much stricter requirement than for the conventional treatment of 30 fractions. If σ_{ others } is higher than 2.6%, there is no way to achieve the expected treatment accuracy for the SBRT treatments, and measures have to be taken to reduce σ_{ others }.
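The arithmetic in this example can be checked with a short script. It is a sketch under the same assumptions as the text: a two-standard-deviation interval stands for the 95% confidence level, and the machine and other uncertainties add in quadrature. The function names are illustrative.

```python
import math

def max_delivery_sigma(tolerance_pct, n_fractions, k=2.0):
    """Largest per-fraction standard deviation (in %) such that k standard
    deviations of the course mean stay within the tolerance."""
    return tolerance_pct / k * math.sqrt(n_fractions)

def max_machine_sigma(total_sigma_pct, sigma_others_pct):
    """Machine share of the uncertainty budget, assuming quadrature addition.
    Returns None when the other sources alone already exceed the budget."""
    if sigma_others_pct >= total_sigma_pct:
        return None
    return math.sqrt(total_sigma_pct ** 2 - sigma_others_pct ** 2)

if __name__ == "__main__":
    for n_fractions in (30, 3):
        total = max_delivery_sigma(3.0, n_fractions)
        machine = max_machine_sigma(total, 2.0)
        print(n_fractions, round(total, 2), round(machine, 2))
```

Returning `None` when σ_{ others } exceeds the total budget mirrors the remark that no machine performance can then meet the requirement.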
where δ is the percent difference between a given dose value (e.g., the planned dose value) and the expected mean dose value delivered over N fractions. δ is different from Δ, which is the percent difference between a given dose value and the expected dose value in a single delivery.
where $F\left(t\right)=\int_{-\infty}^{+\infty}\frac{\sqrt{N}}{\sqrt{\sigma_{other}^{2}+\sigma_{m}^{2}}\sqrt{2\pi}}e^{-\frac{N\left(t-R\right)^{2}}{2\sigma^{2}}}\frac{\Gamma\left(n/2\right)\frac{\overline{\sigma}_{n}}{\sqrt{n}}\,dR}{\sqrt{\pi\left(n-1\right)}\,\Gamma\left(\left(n-1\right)/2\right)}\left(1+\frac{\left(\frac{\overline{R}_{n}-R}{\overline{\sigma}_{n}/\sqrt{n}}\right)^{2}}{n-1}\right)^{-n/2}$, and where z% ≥ 0 is the desired accuracy relative to the given value.
Using the formalisms presented in sections a) through d), probabilities can be quantitatively derived for relationships among the expected measurement value, a given value, and the expected treatment mean dose over a treatment course using the information readily available from the QA measurement procedure, assuming there are no additional patient treatment uncertainties. However, in clinical treatments, the value of σ_{ others } needs to be carefully estimated based on various factors such as immobilization devices, IGRT devices, etc.
Application of the above-derived analytical method to dosimetric and clinical PSIMRT plan verification QA measurements
As a demonstration, the above-derived equations were used to analyze a clinical PSIMRT plan verification QA procedure and to predict the expected treatment mean dose over a treatment course. The QA measurements were carried out using four 0.05 cc Exradin Model A1SL ionization chambers and a Standard Imaging 8-channel TomoElectrometer (Standard Imaging, Middleton, WI) on a Tomotherapy Hi-Art™ machine (Tomotherapy-Accuray, Madison, WI). The four chambers were positioned inside a "Cheese Phantom" (Tomotherapy-Accuray, Madison, WI) at four locations of various dose gradients. The measurements were repeated multiple times. Each of the measurements was independent (meaning that the measurements did not affect each other), and the equipment was identical for each measurement. Temperature and pressure were corrected for all the measurements.
The variance in a QA measurement procedure is attributed to the variances of both machine delivery and measurement instruments. However, during a treatment course the variance of the measurement instruments is absent, while patient setup uncertainty can also contribute to the treatment variance. To simulate the patient setup variance to a limited extent, some of the QA measurements presented in the following section were conducted by resetting the phantom position and then taking measurements. This simulation could underestimate the degree of uncertainty in patient setup since, unlike a phantom, a patient's body is generally not rigid and may also experience intrafractional motion.
Results
The results presented in this study were measured after careful examinations of the machine and measurement instruments to ensure they functioned normally and within specification. If large deviations in the results were observed, investigations were also conducted on the machines and instruments to verify their working integrity.
Statistical analysis of the accuracy of dose QA measurement results
Table 1. Patient-specific IMRT QA absolute dose measurement analysis

| Meas. num (n) | Measurement result (Gy) | Mean value (Gy, 1 to n) | Result standard deviation (Gy, 1 to n) | Estimated dose delivery standard deviation (Gy) | Probability for the mean to be within 2% accuracy (1 to n) | Probability for the mean to be within 3% accuracy (1 to n) |
| --- | --- | --- | --- | --- | --- | --- |
| a. Chamber 1: corresponding plan dose was 1.702 Gy | | | | | | |
| 1 | 1.706 | 1.706 | - | - | - | - |
| 2 | 1.695 | 1.700 | 0.008 | 0.010 | 89.63% | 93.05% |
| 3 | 1.695 | 1.699 | 0.006 | 0.007 | 98.85% | 99.50% |
| 4 | 1.689 | 1.696 | 0.007 | 0.008 | 99.76% | 99.93% |
| 5 | 1.689 | 1.695 | 0.007 | 0.007 | 99.96% | 99.98% |
| b. Chamber 2: corresponding plan dose was 0.546 Gy | | | | | | |
| 1 | 0.513 | 0.513 | - | - | - | - |
| 2 | 0.507 | 0.510 | 0.005 | 0.006 | 80.50% | 86.76% |
| 3 | 0.504 | 0.508 | 0.005 | 0.006 | 93.11% | 96.77% |
| 4 | 0.504 | 0.507 | 0.005 | 0.005 | 97.94% | 99.33% |
| 5 | 0.502 | 0.506 | 0.005 | 0.005 | 99.25% | 99.83% |
The statistical results were presented for the measurement accuracy analysis. The measurement accuracy was defined as the percent difference between the expected dose value (not the plan value) and the measurement mean value. The analysis estimated the expected dose delivered from the machine assuming that the measurement instruments had minimal uncertainties.
In the tables, columns 1 to 7 are, respectively, the measurement sequence number n (n = 1, 2, …, 6), the measurement raw data value R_{ n }, the mean value $\overline{R}_{n}=\frac{\sum_{i=1}^{n}R_{i}}{n}$, the standard deviation $\overline{\sigma}_{n}=\sqrt{\frac{\sum_{i=1}^{n}\left(R_{i}-\overline{R}_{n}\right)^{2}}{n-1}}$, the estimated expected dose delivery standard deviation, the probability P_{ n }(PD_{ R } ≤ 2%), and the probability P_{ n }(PD_{ R } ≤ 3%) computed from Eqn. 5. P_{ n }(PD_{ R } ≤ 2%) and P_{ n }(PD_{ R } ≤ 3%) were the probabilities of the measurement accuracy being better than 2% and 3%, respectively, if their corresponding mean values were used for the expected value after n measurements.
Using the measurement results from Chamber 1 as an example, the systematic analysis for this particular measurement process can be described as follows. According to Eqns. 1–4, and intuitively, it is impossible to evaluate the accuracy using just the very first measurement reading of 1.706 Gy. After the second reading was taken (1.695 Gy), the mean value of the two readings was 1.700 Gy. Using Eqn. 5, the probability of the percent difference between this mean value and the expected measurement value being less than 2% was 89.63%. In other words, there was an 89.63% probability that the accuracy of this mean value was better than 2%. With a third measurement (1.695 Gy), that probability (or confidence level) increased to 98.85%. As more readings were taken, this probability increased. Assuming that the accuracy requirement for this particular measurement process was 2% and the confidence level requirement was 95%, it is apparent that in this particular case 3 measurements had to be conducted to achieve the confidence level needed to substitute the measured mean value for the expected value. If the required accuracy was relaxed to 3% at the same 95% confidence level, the probability after 2 measurements was 93.05% (column 7 of Table 1), still just short of the requirement, and it reached 99.50% with the third measurement.
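The sequential reasoning in this example (keep measuring until the confidence requirement is met) can be sketched as a small search. The sketch assumes the Student's t form of Eq. 5 with n-1 degrees of freedom; the function names are illustrative, and the readings are the rounded Chamber 1 values from Table 1, so results agree with the tabulated probabilities only approximately.

```python
import math

def t_pdf(t, dof):
    """Student's t probability density with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(math.pi * dof) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def simpson(f, a, b, steps=20000):
    """Composite Simpson integration of f over [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def min_measurements(readings, y_percent, confidence):
    """Smallest n (>= 2) at which the mean of the first n readings is within
    y% of the expected value with at least the given confidence, else None."""
    for n in range(2, len(readings) + 1):
        subset = readings[:n]
        mean = sum(subset) / n
        sd = math.sqrt(sum((r - mean) ** 2 for r in subset) / (n - 1))
        if sd == 0:
            continue  # zero spread: the t-statistic is undefined, keep measuring
        A = mean * math.sqrt(n) / sd * y_percent / 100.0
        if simpson(lambda t: t_pdf(t, n - 1), -A, A) >= confidence:
            return n
    return None

if __name__ == "__main__":
    chamber1 = [1.706, 1.695, 1.695, 1.689, 1.689]
    print(min_measurements(chamber1, 2.0, 0.95))
```

The search depends on the order in which readings were taken, which matches the clinical situation where measurements accumulate one at a time.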
Similar analysis can be performed for Chamber 2 measurements.
Table 2. Patient-specific IMRT QA absolute dose measurement analysis

| Meas. num (n) | Measurement result (Gy) | Mean value (Gy, 1 to n) | Result standard deviation (Gy, 1 to n) | Estimated dose delivery standard deviation (Gy) | Probability for the mean to be within 3% accuracy (1 to n) | Probability for the mean to be within 5% accuracy (1 to n) |
| --- | --- | --- | --- | --- | --- | --- |
| a. Chamber 1: corresponding plan dose was 1.702 Gy | | | | | | |
| 1 | 1.689 | 1.689 | - | - | - | - |
| 2 | 1.706 | 1.698 | 0.012 | 0.015 | 89.62% | 93.73% |
| 3 | 1.550 | 1.648 | 0.086 | 0.097 | 57.74% | 76.25% |
| 4 | 1.700 | 1.661 | 0.075 | 0.081 | 72.60% | 88.76% |
| 5 | 1.678 | 1.665 | 0.065 | 0.069 | 83.87% | 95.41% |
| b. Chamber 2: corresponding plan dose was 0.546 Gy | | | | | | |
| 1 | 0.502 | 0.502 | - | - | - | - |
| 2 | 0.466 | 0.484 | 0.025 | 0.032 | 43.55% | 59.61% |
| 3 | 0.457 | 0.475 | 0.024 | 0.027 | 59.52% | 77.70% |
| 4 | 0.465 | 0.473 | 0.020 | 0.022 | 75.01% | 90.16% |
| 5 | 0.453 | 0.469 | 0.019 | 0.021 | 82.01% | 94.62% |
Statistical analysis of the differences between measurement dose and corresponding plan dose
Table 3. Patient-specific IMRT QA absolute dose measurement analysis

| Meas. num (n) | Percent difference between the mean and plan dose (%, 1 to n) | Probability for the mean to be within 3% of plan value (1 to n) | Probability for the mean to be within 5% of plan value (1 to n) |
| --- | --- | --- | --- |
| a. Chamber 1: corresponding plan dose was 1.702 Gy. No QA phantom resetup between each measurement. | | | |
| 1 | 0.232 | - | - |
| 2 | 0.095 | 93.05% | 95.81% |
| 3 | 0.205 | 99.49% | 99.84% |
| 4 | 0.341 | 99.93% | 99.97% |
| 5 | 0.423 | 99.98% | 99.98% |
| b. Chamber 2: corresponding plan dose was 0.546 Gy. No QA phantom resetup between each measurement. | | | |
| 1 | 5.960 | - | - |
| 2 | 6.551 | 3.28% | 9.96% |
| 3 | 6.939 | 0.70% | 3.12% |
| 4 | 7.132 | 0.10% | 0.70% |
| 5 | 7.319 | 0.02% | 0.17% |
| c. Chamber 1: corresponding plan dose was 1.702 Gy. The QA phantom was resetup between each measurement. | | | |
| 1 | 0.751 | - | - |
| 2 | 0.259 | 89.57% | 93.72% |
| 3 | 3.153 | 39.74% | 65.16% |
| 4 | 2.388 | 55.53% | 81.86% |
| 5 | 2.193 | 65.01% | 90.52% |
| d. Chamber 2: corresponding plan dose was 0.546 Gy. The QA phantom was resetup between each measurement. | | | |
| 1 | 8.065 | - | - |
| 2 | 11.327 | 4.76% | 8.88% |
| 3 | 12.963 | 1.69% | 3.35% |
| 4 | 13.432 | 0.39% | 0.85% |
| 5 | 14.164 | 0.07% | 0.18% |
It should be reemphasized that the plan dose is not at all necessarily equal to the expected dose delivered from the machine.
Without taking the phantom resetup into account (Table 3a), at the location of Chamber 1, two measurements were adequate to have high confidence (95.81%) that the expected machine delivery dose would be within 5% of the corresponding plan dose; after three measurements, one would have high confidence (99.49%) that it would be within 3% of the plan dose. However, with the uncertainty caused by the phantom resetup taken into account, although the mean value of the five measurements was well within 3% (2.2%) of the plan dose, the confidence level (65.01%) was fairly low for the expected delivery dose being within 3% of the plan dose. The confidence level was only 90.52% for the expected delivery dose to be within 5% of the plan dose after 5 measurements.
These results demonstrate that simple comparison of measured mean value to a given value is insufficient to draw a statistically meaningful conclusion about the difference between the expected measurement value and the given value. Appropriate statistical analysis has to be conducted.
As shown in Table 3b and 3d, it was statistically impossible for the expected delivery dose at the location of Chamber 2 to be within 5% of the corresponding plan dose. A different physical quantity such as Gamma Index [22] needs to be used for the QA outcome evaluation.
Statistical estimation of the difference between plan dose and the average dose delivered over a treatment course
Table 4. Statistical estimation of the difference between planned dose and the average dose delivered over a treatment course

| Number of fractions (N) | Probability for the mean to be within 1% of plan value | Probability for the mean to be within 3% of plan value | Probability for the mean to be within 5% of plan value |
| --- | --- | --- | --- |
| a. Estimation at the point corresponding to Chamber 1: corresponding plan dose was 1.702 Gy. No QA phantom resetup between each measurement. The mean measured value used was 1.684 Gy. | | | |
| 36 | 20.23% | 100.00% | 100.00% |
| 3 | 40.59% | 100.00% | 100.00% |
| b. Estimation at the point corresponding to Chamber 2: corresponding plan dose was 0.546 Gy. No QA phantom resetup between each measurement. The mean measured value used was 0.503 Gy. | | | |
| 36 | 0.00% | 0.00% | 0.00% |
| 3 | 0.00% | 0.00% | 0.00% |
| c. Estimation at the point corresponding to Chamber 1: corresponding plan dose was 1.702 Gy. The QA phantom was resetup between each measurement. The mean measured value used was 1.669 Gy. | | | |
| 36 | 8.21% | 94.12% | 100.00% |
| 3 | 23.89% | 65.58% | 90.16% |
| d. Estimation at the point corresponding to Chamber 2: corresponding plan dose was 0.546 Gy. The QA phantom was resetup between each measurement. The mean measured value used was 0.471 Gy. | | | |
| 36 | 0.00% | 0.00% | 0.00% |
| 3 | 0.00% | 0.00% | 0.00% |
| e. Estimation at the point corresponding to Chamber 1: corresponding plan dose was 1.702 Gy. The QA phantom was resetup between each measurement and additional 5% uncertainty was included. The mean measured value used was 1.669 Gy. | | | |
| 36 | 18.55% | 84.00% | 99.80% |
| 3 | 18.69% | 52.33% | 76.72% |
| f. Estimation at the point corresponding to Chamber 2: corresponding plan dose was 0.546 Gy. The QA phantom was resetup between each measurement and additional 5% uncertainty was included. The mean measured value used was 0.471 Gy. | | | |
| 36 | 0.00% | 0.00% | 0.00% |
| 3 | 0.01% | 0.06% | 0.40% |
As expected, at the location of Chamber 2, regardless of whether a conventional or SBRT treatment course was to be delivered, it was nearly impossible for the average treatment dose to be within 5% of the corresponding plan dose (Table 4b, d and f).
At the location corresponding to Chamber 1, for the conventional treatment course, even at the largest deviation (Table 4e), the average treatment dose was still almost certain to be within 5% of the plan dose (confidence level of 99.8%). With the deviation from the machine delivery alone, it was almost guaranteed that the treatment would be within 3% of the plan dose (Table 4a, the confidence level was 100%). If the resetup uncertainty was added (Table 4c), that confidence level dropped to 94%. If the additional 5% clinical uncertainty was further taken into account (Table 4e), the confidence level to be within 3% decreased to 84%, indicating there was a need to reduce clinical uncertainty to ensure treatment quality. For the assumed SBRT treatment course, the uncertainty caused by the phantom resetup alone brought the confidence levels for the 3% and 5% criteria from 100% down to 65.58% and 90.16%, respectively (Table 4a and c). The additional 5% clinical uncertainty decreased the confidence levels even further to 52.33% and 76.72%, respectively. These results indicated that, to ensure SBRT treatment quality as planned, more stringent requirements need to be applied to minimize every source of uncertainty.
From Table 4a, c and e, it is interesting to note that, with the same amount of treatment uncertainty, the confidence levels for achieving a high level of treatment accuracy (e.g., within 1% of the plan dose) were higher for the SBRT treatment course than for the conventional treatment course. This can be explained by the fact that the standard deviation of the average delivery dose in an SBRT treatment course is larger than that of a conventional treatment course (according to Eqn. 11), and the larger standard deviation leads to a broader distribution that may be more likely to span the plan dose.
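The effect described above can be illustrated with a minimal sketch. It is not the paper's calculation: it assumes the course-average dose is normally distributed with a standard deviation of σ/√N for N independent fractions (Eqn. 11 is not reproduced here), and the 5% per-fraction standard deviation and the function name `prob_within_tolerance` are hypothetical choices for illustration. The mean offset of about -1.94% corresponds to the 1.669 Gy mean measurement against the 1.702 Gy plan dose at Chamber 1. The resulting numbers are not expected to match Table 4.

```python
from math import sqrt
from statistics import NormalDist


def prob_within_tolerance(offset_pct, sigma_pct, n_fractions, tol_pct):
    """Probability that the course-average dose lies within +/- tol_pct of plan.

    Assumption: the course average is normal with mean offset_pct (relative
    to plan) and standard deviation sigma_pct / sqrt(n_fractions).
    """
    course_avg = NormalDist(mu=offset_pct, sigma=sigma_pct / sqrt(n_fractions))
    return course_avg.cdf(tol_pct) - course_avg.cdf(-tol_pct)


# Hypothetical inputs: -1.94% mean offset (1.669 Gy vs 1.702 Gy plan)
# and an assumed 5% per-fraction standard deviation.
for tol in (1.0, 3.0, 5.0):
    p36 = prob_within_tolerance(-1.94, 5.0, 36, tol)
    p3 = prob_within_tolerance(-1.94, 5.0, 3, tol)
    print(f"within {tol:.0f}%: 36 fractions {p36:.3f}, 3 fractions {p3:.3f}")
```

Under these assumptions, the tight 1% window is more likely to be met by the broader 3-fraction distribution, whereas the 5% window favors the narrower 36-fraction distribution.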
Discussion
This study establishes a model that takes random radiation delivery errors into account. Even after the random errors are taken into account, the expected measurement result may still differ from the corresponding planning quantity. This difference is likely caused by systematic errors that originate from different sources, such as planning algorithm inaccuracy, machine calibration deviation, etc. These systematic errors can potentially be significant.
One of the major purposes of dosimetry measurements is to identify the expected value of the subject of measurement. For certain measurement procedures such as PSIMRT plan verification QA, this expected value of measurement may then be compared to a given value from the treatment plan. The expected value can only be quantified from measurement results, such as the mean and standard deviation, by using statistical concepts like probability, confidence level, and confidence interval. Although a single measurement may provide a numerical value for the subject of measurement, its statistical relevance and significance are impossible to define. To obtain statistically meaningful results, at least two independent measurements must be performed.
From a statistical perspective, the number of measurements should not be predetermined. What should be predetermined are the desired confidence level and interval, based on the required dosimetric accuracy. The number of required measurements then depends on the measurement variance and the chosen confidence interval and level. According to the results presented in Results Section 3.1, smaller measurement deviations require fewer measurements.
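As a rough illustration of how the required number of measurements follows from the deviation and the chosen interval, the sketch below searches for the smallest n whose two-sided 95% Student t interval half-width, t·s/√n, fits inside a target half-width. This is a simplified reading of the idea rather than the paper's formalism: the percent standard deviation is treated as known in advance, the function name `required_measurements` is hypothetical, and the critical values are standard t-table entries limited to 10 degrees of freedom.

```python
from math import sqrt

# Two-sided 95% Student t critical values, indexed by degrees of freedom
# (standard table values).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def required_measurements(sd_pct, half_width_pct):
    """Smallest n whose 95% interval half-width t * s / sqrt(n) fits the target.

    sd_pct is treated as an already known percent standard deviation
    (a simplification). Returns None if more than 11 measurements are needed.
    """
    for n in range(2, 12):
        half_width = T_CRIT_95[n - 1] * sd_pct / sqrt(n)
        if half_width <= half_width_pct:
            return n
    return None


# Hypothetical deviations evaluated against a 3% half-width.
for sd in (0.3, 1.0, 2.0):
    print(f"sd {sd}% -> n = {required_measurements(sd, 3.0)}")
```

In practice the standard deviation is itself estimated from the measurements, so the search would be repeated as each new measurement updates that estimate.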
Percent accuracy tolerances have been recommended for radiation treatment beam calibrations and PSIMRT plan verification QA [6, 8, 9]. During those procedures, the current practice is to take measurement(s) and compare the mean values to corresponding desired values (e.g., IMRT planned dose values or 1 cGy/MU in the case of machine calibration measurements). Decisions are then made based on whether the values are within tolerance. As demonstrated in the previous sections, this type of decision-making may be flawed, since the confidence level of such accuracy depends strongly on the measured variance and the number of measurements. The evaluation should therefore be based not only on a simple comparison of the mean values to the given values but also on measurement details such as measurement deviations and number of measurements. Moreover, it is anticipated that the doses delivered according to an IMRT treatment plan vary from fraction to fraction and exhibit a statistical distribution even if all machine components function within specifications. The standard deviation of this distribution may be influenced by several factors, such as machine delivery variation, patient setup variation, patient organ motion and body contour change, etc. On the other hand, IMRT treatment consists of a limited number of fractions, sometimes only a few (e.g., Stereotactic Body Radiotherapy). The standard deviation of the expected average dose over a treatment course depends not only on the standard deviation of individual treatments but also strongly on the number of fractions (Eqn. 9). Thus, evaluation of a QA measurement outcome should also take into consideration the details of the treatment course for which the QA is performed.
For example, as shown in Table 3c, after five QA measurements, the percent difference between the average QA result and the corresponding plan dose was less than 3% (2.19%). Using the common practice of direct comparison with a preset tolerance of 3%, one would likely conclude that the QA result was acceptable. However, based on the statistical analysis, the confidence level was only 65% for the expected dose to be within 3% of the plan dose in an individual dose delivery, making it difficult to decide whether the plan was indeed acceptable for patient treatment. On the other hand, if the treatment course consisted of 36 fractions, the confidence level was 94% for the expected average treatment dose to be within 3% of the plan dose (Table 4c). In this case, the QAed plan could be deemed acceptable for this treatment course. Conversely, if the treatment course consisted of only 3 fractions, the confidence level was only 65.6% for the expected average dose to be within 3% of the plan dose over this hypofractionated course. In this latter case, whether the QAed plan should be used for treatment might become questionable, and action might be required to either modify the plan or adjust the beam delivery system to ensure treatment quality.
In a typical PSIMRT plan verification QA procedure, more than one point of interest is evaluated by measurement and compared to a corresponding plan value. Thus, the decision-making process is actually more difficult than in the cases presented in this study. Nevertheless, the basic principle still holds: a single measurement showing agreement or disagreement with the corresponding plan value cannot be used to draw a definitive conclusion about whether the PSIMRT plan verification QA passes or fails.
The purpose of PSIMRT plan verification QA is to verify the accuracy and precision of plan delivery. The current standard for PSIMRT plan verification QA verifies only the accuracy of plan delivery without providing statistical details. The ultimate goal of the plan QA is to ensure that the average dose delivered over a treatment course is within a desired tolerance of the plan dose. According to the proposed method, determining the accuracy of an IMRT plan requires multiple measurements and information about the treatment course itself. The standard deviation in Eqns. 10 and 11 should contain the contributions from various uncertainty sources, such as machine delivery, patient setup, anatomic motion and deformation, day-to-day machine variations, etc. Unfortunately, the only component that conventional QA measurements can detect relatively accurately is the machine plan delivery variation. Therefore, a more reasonable way of evaluating a QA outcome may be as follows. First, a desired accuracy tolerance (e.g., the percent dose difference between plan and average delivery dose) with a specified confidence level is decided upon for the treatment course to be QAed. Second, a percent standard deviation (uncertainty) is estimated for the clinical patient treatment based on the patient anatomy study and motion evaluation (e.g., 4DCT for motion analysis), day-to-day machine stability, estimated patient setup variation, etc. Third, two QA measurements are performed and the results are analyzed using Eqns. 9–13. Fourth, if the resulting confidence level does not meet the specified confidence level for the desired accuracy, additional QA measurements are conducted and the results are analyzed until no improvements in the confidence level are seen. Fifth, if the confidence level still does not meet the specified level, either the clinical patient treatment uncertainty needs to be further reduced or the plan needs to be revised.
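The third and fourth steps can be sketched in simplified form as follows. This is an illustration under stated assumptions, not the paper's Eqns. 9–13: it considers only the Student t distribution of the mean of the QA readings and ignores the clinical uncertainty and the number of fractions. The readings, the helper names `t_cdf` and `confidence_within_tolerance`, and the 3% tolerance are hypothetical.

```python
from math import atan, cos, gamma, pi, sqrt
from statistics import mean, stdev


def t_cdf(t, dof, steps=1000):
    """Student t CDF via Simpson's rule after substituting x = sqrt(dof)*tan(theta)."""
    theta_max = atan(t / sqrt(dof))
    h = theta_max / steps
    total = 1.0 + cos(theta_max) ** (dof - 1)  # endpoint terms
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * cos(i * h) ** (dof - 1)
    norm = gamma((dof + 1) / 2) / (sqrt(pi) * gamma(dof / 2))
    return 0.5 + norm * total * h / 3


def confidence_within_tolerance(doses, plan, tol):
    """Confidence that the expected dose lies within plan * (1 +/- tol).

    Assumes independent, normally distributed readings with nonzero spread.
    """
    n = len(doses)
    std_err = stdev(doses) / sqrt(n)
    upper = (plan * (1 + tol) - mean(doses)) / std_err
    lower = (plan * (1 - tol) - mean(doses)) / std_err
    return t_cdf(upper, n - 1) - t_cdf(lower, n - 1)


# Hypothetical QA readings in Gy, evaluated after each added measurement.
readings = [1.690, 1.705, 1.698, 1.710, 1.700]
for n in range(2, len(readings) + 1):
    level = confidence_within_tolerance(readings[:n], plan=1.702, tol=0.03)
    print(n, round(level, 3))
```

Measurements would continue until the printed level reaches the chosen target or stops improving, at which point the fifth step applies.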
In the analysis throughout the current study, measurement equipment uncertainty was not taken into account. If the equipment random uncertainty (denoted as σ_{equip}) is known, the measurement deviation for the subject of interest can be approximated as ${\overline{\sigma}}_{n}=\sqrt{{\overline{\sigma}}_{\mathit{measurement}}^{2}-{\sigma}_{\mathit{equip}}^{2}}$. If the equipment is found to contribute systematic errors, these errors should be identified and corrected for.
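A minimal numerical sketch of this correction follows, assuming the equipment and delivery contributions are independent so that their variances add and the equipment term can be removed in quadrature. The function name `delivery_sd` and the example values are hypothetical.

```python
from math import sqrt


def delivery_sd(measured_sd, equip_sd):
    """Remove a known equipment random uncertainty in quadrature.

    Assumes the two contributions are independent, so variances add.
    """
    if equip_sd > measured_sd:
        raise ValueError("equipment uncertainty exceeds the measured deviation")
    return sqrt(measured_sd ** 2 - equip_sd ** 2)


# Hypothetical example: 1.0% measured deviation, 0.6% equipment uncertainty.
print(f"{delivery_sd(1.0, 0.6):.2f}%")
```

The guard reflects the fact that an equipment uncertainty larger than the measured deviation would make the subtraction meaningless and would suggest that one of the two estimates is wrong.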
Dose distributions in conformal radiation treatments (e.g., IMRT and 3DCRT) can exhibit significant variations. Although the presented method is applicable to any point inside a phantom/patient, the derived statistical results most likely vary from location to location. Thus, a more comprehensive three-dimensional approach is required to analyze the dose coverage inside a patient. This three-dimensional analytical approach, taking many points into consideration, is beyond the scope of the current study and is subject to further investigation.
QA devices are now available that simultaneously measure delivered doses at many different points. In principle, it is reasonable to use the measurement results at numerous points to derive the delivery variance, assuming that the dose measurement equipment is perfect. Unfortunately, the currently available measurement equipment has its own intrinsic limitations. Depending on the measurement region (e.g., high dose gradient regions vs. low dose gradient regions), the variance introduced by the measurement equipment can differ between locations. Careful analysis is required to use the measurement results at numerous locations for this purpose. On the other hand, the measurement results obtained from many comparable patient QA measurements can be useful to estimate the variance. The method and analysis presented in this study require two conditions: 1) the measurements are independent and 2) the measurement results are normally distributed. The first condition is easily satisfied since one measurement does not affect the others, while the second condition is still an assumption, though a generally accepted one. If the measurement of a subject of interest is proven to have something other than a normal distribution, the results and conclusions from the current study are not applicable.
The analysis presented above assumed that the difference between the dose calculated for a QA phantom and the dose actually delivered by that machine to the phantom is similar to the difference between the dose calculated for a patient and the dose delivered to the same rigidly positioned patient by the same machine. This assumption is approximately true given that the phantom materials are similar to patient tissues. However, it should be emphasized again that other errors exist, such as uncertainties in CT numbers, anatomical changes between simulation and treatment and during treatment, beam calibration variation, etc. [18], which may lead to additional overall dose delivery uncertainty.
We acknowledge that, although we believe our method to be scientifically sound, it clearly will add to the workload of medical physicists, and its practicality remains to be evaluated. Furthermore, in many centers, pretreatment dosimetric verification is not carried out for every patient but only in a limited number of complicated cases. Therefore the feasibility of implementing the methods described in this work to assess PSIMRT QA results, although valid from a theoretical/methodological point of view, would have to be evaluated in the clinical scenario and perhaps combined with an approach based on population systematic and random errors (van Herk, 2004) [23]. On the other hand, if the QA passing tolerance and the delivery process uncertainty are established in a department, then with the developed method one QA measurement result should yield a probability value for the QA to pass. If the probability value is higher than a certain acceptable level (e.g., 95%), no additional measurement is needed. The results of the current study were based on the assumptions that there were no human errors and that the user’s equipment was in good condition during measurements. If the measurement deviation is larger than usual, equipment malfunction must be ruled out. Rote adherence to this statistical method and approach without careful examination and analysis could lead to serious errors. It was also noticed that some of the measurement data presented in this study exhibited a certain nonrandomness. This could be coincidence, since all equipment involved was carefully evaluated and underwent an adequate warm-up process before measurements.
Conclusions
By acknowledging and considering the statistical nature of multifractional delivery of radiation treatment, we have established and demonstrated a quantitative methodology to evaluate the PSIMRT QA results. Both the statistical parameters associated with QA measurement procedures and the treatment course itself need to be taken into account to evaluate the QA outcome and to determine whether the plan is acceptable and whether additional measures should be taken to reduce treatment uncertainties. The result from a single QA measurement without statistical analysis can be misleading. When the required number of measurements is comparable to the planned number of fractions and the variance is unacceptably high, action must be taken to either modify the plan or adjust the beam delivery system.
References
1. Gerbi BJ, Higgins PD, Khan FM, Antolak JA, Herman MG, Deibel FC, Followill DS, Huq MS, Mihailidis DN, Yorke ED, Hogstrom KR: Task group report: recommendations for clinical electron beam dosimetry: supplement to the recommendations of task group 25. Med Phys 2009, 36: 3239–3279. 10.1118/1.3125820
2. Almond PR, Biggs BJ, Coursey BM, Hanson WF, Huq MS, Nath R, Rogers DWO: AAPM’s TG-51 protocol for clinical reference dosimetry of high-energy photon and electron beams. Med Phys 1999, 26: 1847–1870. 10.1118/1.598691
3. Khan FM, Doppke KP, Hogstrom KR, Kutcher GJ, Nath R, Prasad SC, Purdy JA, Rozenfeld M, Werner BL: Clinical electron-beam dosimetry: report of AAPM radiation therapy committee task group No. 25. Med Phys 1991, 18: 73–109. 10.1118/1.596695
4. Schulz RJ, Almond PR, Cunningham JR, Holt JG, Loevinger R, Suntharalingam N, Wright KA, Nath R, Lempert GD: A protocol for the determination of absorbed dose from high-energy photon and electron beams. Med Phys 1983, 10: 741–771.
5. International Atomic Energy Agency: Absorbed Dose Determination in External Beam Radiotherapy: An International Code of Practice for Dosimetry Based on Standards of Absorbed Dose to Water, Technical Reports Series No. 398. Vienna: IAEA; 2000.
6. Kutcher G, Coia L, Gillin M, Hanson WF, Leibel S, Morton RJ, Palta JR, Purdy JA, Reinstein LE, Svensson GK, Weller M, Wingfield L: Comprehensive QA for radiation oncology: report of AAPM radiation therapy committee task group 40. Med Phys 1994, 21: 581–618.
7. Das IJ, Zhu TC, Cheng CW, Watts RJ, Ahnesjo A, Gibbons J, Li XA, Lowenstein J, Mitra RK, Simon WE: Accelerator beam data commissioning equipment and procedures: report of the TG-106 of the therapy physics committee of the AAPM. Med Phys 2008, 35: 4186–4215. 10.1118/1.2969070
8. Ezzell GA, Burmeister JW, Dogan N, LoSasso TJ, Mechalakos JG, Mihailidis D, Molineu A, Palta JR, Ramsey CR, Salter BJ, Shi J, Xia P, Yue NJ, Xiao Y: IMRT commissioning: multiple institution planning and dosimetry comparisons, a report from AAPM task group 119. Med Phys 2009, 36: 5359–5373. 10.1118/1.3238104
9. Klein EE, Hanley J, Bayouth J, Yin FF, Simon W, Dresser S, Serago C, Aguirre F, Ma L, Liu C, Sandin C, Holms T: Task group 142 report: quality assurance of medical accelerators. Med Phys 2009, 36: 4197–4212. 10.1118/1.3190392
10. James H, Beavis A, Budgell G, Clark C, Convery D, Mott J, Dearnaley D, Perry R, Scrase C: Guidance for the Clinical Implementation of Intensity Modulated Radiation Therapy, IPEM Report 96 2008. York, UK: Institute of Physics and Engineering in Medicine; 2008.
11. Svensson GK, Baily NA, Loevinger R, Morton RJ, Moyer RF, Purdy JA, Shalek RJ, Wootton P, Wright KA: Physical Aspects of Quality Assurance in Radiation Therapy, AAPM REPORT No. 13, International Standard Book Number: O883 184575. New York, NY: The American Institute of Physics, Inc.; 1984.
12. International Commission on Radiation Units and Measurements: Determination of absorbed dose in a patient irradiated by beams of x or gamma rays in radiotherapy procedures, ICRU Report 24. Oxford, UK: Journal of the ICRU, Oxford University Press; 1976.
13. Li JS, Lin T, Chen L, Price RA Jr, Ma CM: Uncertainties in IMRT dosimetry. Med Phys 2010, 37(6): 2491–500. 10.1118/1.3413997
14. Sánchez-Doblado F, Hartmann GH, Pena J, Capote R, Paiusco M, Rhein B, Leal A, Lagares JI: Uncertainty estimation in intensity-modulated radiotherapy absolute dosimetry verification. Int J Radiat Oncol Biol Phys 2007, 68(1): 301–10. 10.1016/j.ijrobp.2006.11.056
15. Jin H, Palta J, Suh TS, Kim S: A generalized a priori dose uncertainty model of IMRT delivery. Med Phys 2008, 35(3): 982–96. 10.1118/1.2837290
16. Jin H, Palta JR, Kim YH, Kim S: Application of a novel dose-uncertainty model for dose-uncertainty analysis in prostate intensity-modulated radiotherapy. Int J Radiat Oncol Biol Phys 2010, 78(3): 920–8. 10.1016/j.ijrobp.2010.01.063
17. Palta JR, Kim S, Li JG, Liu C: Tolerance limits and action levels for planning and delivery of IMRT. Intensity-Modulated Radiation Therapy: The State of The Art, AAPM 2003, Medical Physics Monograph No. 29: 593–612. Madison, WI, USA: Medical Physics Publishing; 2003.
18. Palta JR, Jin H, Kim S: Developing a rationale for tolerance values and action levels for the performance of external beam planning and delivery systems. Uncertainties in External Beam Radiation Therapy, AAPM 2011, Medical Physics Monograph No. 35. Madison, WI, USA: Medical Physics Publishing.
19. van Herk M, Witte M, van der Geer J, Schneider C, Lebesque JV: Biologic and physical fractionation effects of random geometric errors. Int J Radiat Oncol Biol Phys 2003, 57(5): 1460–71. 10.1016/j.ijrobp.2003.08.026
20. Leong J: Implementation of random positioning error in computerized radiation treatment planning systems as a result of fractionation. Phys Med Biol 1987, 32: 327–334. 10.1088/0031-9155/32/3/002
21. Lujan AE, Ten Haken RK, Larsen EW, Balter JM: Quantization of setup uncertainties in 3D dose calculations. Med Phys 1999, 26: 2397–2402. 10.1118/1.598756
22. Low DA, Harms WB, Mutic S, Purdy JA: A technique for the quantitative evaluation of dose distributions. Med Phys 1998, 25(5): 656–661. 10.1118/1.598248
23. van Herk M: Errors and margins in radiotherapy. Semin Radiat Oncol 2004, 14(1): 52–64. 10.1053/j.semradonc.2003.10.003
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.