Volume 44, Issue 4 p. 1590-1601
Special Report

Development and testing of a database of NIH research funding of AAPM members: A report from the AAPM Working Group for the Development of a Research Database (WGDRD)

Brendan Whelan

Corresponding Author

Radiation Physics Laboratory, University of Sydney, Sydney, NSW, 2006 Australia

Ingham Institute for Applied Medical Research, Liverpool, NSW, 2170 Australia

Author to whom correspondence should be addressed. Electronic mail: [email protected].
Eduardo G. Moros

H. Lee Moffitt Cancer Center, Tampa, FL, 33612 USA

Rebecca Fahrig

Stanford University, Palo Alto, CA, 94305 USA

James Deye

National Cancer Institute, Bethesda, MD, 20892 USA

Thomas Yi

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205 USA

Michael Woodward

American Association of Physicists in Medicine, Alexandria, VA, 22314 USA

Paul Keall

Radiation Physics Laboratory, University of Sydney, Sydney, NSW, 2006 Australia

Jeff H. Siewerdsen

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205 USA

First published: 11 January 2017

Abstract

Purpose

To produce and maintain a database of National Institutes of Health (NIH) funding of the American Association of Physicists in Medicine (AAPM) members, to perform a top-level analysis of these data, and to make these data (hereafter referred to as the AAPM research database) available for the use of the AAPM and its members.

Methods

NIH-funded research dating back to 1985 is available for public download through the NIH ExPORTER website, and AAPM membership information dating back to 2002 was supplied by the AAPM. To link these two sources of data, a data mining algorithm was developed in MATLAB. The false-positive rate was manually estimated based on a random sample of 100 records, and the false-negative rate was assessed by comparing against 99 member-supplied PI_ID numbers. The AAPM research database was queried to produce an analysis of trends and demographics in research funding dating from 2002 to 2015.

Results

A total of 566 PI_ID numbers were matched to AAPM members. False-positive and -negative rates were respectively 4% (95% CI: 1–10%, N = 100) and 10% (95% CI: 5–18%, N = 99). Based on analysis of the AAPM research database, in 2015 the NIH awarded $USD 110M to members of the AAPM. The four NIH institutes which historically awarded the most funding to AAPM members were the National Cancer Institute, National Institute of Biomedical Imaging and Bioengineering, National Heart Lung and Blood Institute, and National Institute of Neurological Disorders and Stroke. In 2015, over 85% of the total NIH research funding awarded to AAPM members was via these institutes, representing 1.1% of their combined budget. In the same year, 2.0% of AAPM members received NIH funding for a total of $116M, which is lower than the historic mean of $120M (in 2015 USD).

Conclusions

A database of NIH-funded research awarded to AAPM members has been developed and tested using a data mining approach, and a top-level analysis of funding trends has been performed. Current funding of AAPM members is lower than the historic mean. The database will be maintained by members of the Working Group for the Development of a Research Database (WGDRD) on an annual basis, and is available to the AAPM, its committees, working groups, and members for download through the AAPM electronic content website. A wide range of questions regarding financial and demographic funding trends can be addressed by these data. This report has been approved for publication by the AAPM Science Council.

1 Introduction and background

The American Association of Physicists in Medicine (AAPM) is the representative body of most medical physicists in the USA, and is the principal organization promoting the professional practice, educational activities, and research endeavors in the field of Medical Physics. At the time of writing, the AAPM had approximately 8350 members working across hospitals, universities, and industry. Broadly speaking, medical physicists perform three important roles in a modern health care system: firstly, to ensure the optimal and safe performance of a variety of sophisticated therapeutic and diagnostic machines, systems, and processes; secondly, to educate trainees in both medical physics and clinical medicine with respect to medical physics principles, techniques, and technology; and thirdly, to perform research ranging from basic science and technology development to the invention of new techniques and procedures and the translation of new scientific findings into clinical practice. With research representing a critical aspect of the state and future of medical physics, and with the National Institutes of Health (NIH) representing a major source of research support in these areas, this paper reports on the development of a database of NIH research funding of AAPM members.

Performing scientific research requires funding sufficient to support resources and personnel. In areas like healthcare, the innovation and improvements that result from research are in the national interest;1 as such, many nations dedicate a substantial proportion of their annual budget to medical research.2 In the USA, approximately 2.5% to 3% of the federal budget is dedicated to scientific and medical research. The main federal funding body for healthcare research in the USA is the National Institutes of Health (NIH), which received approximately 0.75% of the total US budget in 2016 (equating to approx. 32 billion USD).3 This money is allocated to medical researchers through a competitive peer review process.4 Through this process, researchers (including many AAPM members) apply for and are awarded research funding of various amounts and with various degrees of success. Exactly how much a researcher receives, whether this proportion is changing with time, which research areas are funded each year, and which AAPM members are receiving funding are all questions that have been difficult to answer in the past, even though such information is in principle available in the public domain.

Obtaining this information is important to both the AAPM, which must ensure that its members continue to perform clinically relevant and nationally competitive research, and the NIH, which should aim to ensure that the discipline that gave the world CT, MRI, ultrasound, radiation therapy, medical lasers, etc. (to name but a few5) continues to attract funding commensurate with the potential of this research to positively impact human health. The purpose of this paper is twofold: (1) to present the development and testing of a research database of AAPM members awarded research funding by the NIH, and (2) to present a top-level analysis of trends in this funding. Although the NIH records dating back to 1985 are publicly available, it is not straightforward to extract the data associated with a given field or organization. This work presents a data mining approach that can extract funding records associated with an input list of AAPM members from data available from the NIH RePORTER tool.6 This report was developed as part of the activities of the AAPM Working Group for the Development of a Research Database (WGDRD) with approval of the AAPM Science Council. The AAPM research database is available for AAPM members to download through the AAPM website.

2 Methods

The process for creating the AAPM research database is shown in Fig. 1. Each stage involved in creating the database is described in detail below.

Figure 1. The basic process used to generate the AAPM research database. This process is described in detail in Section 2.B. [Color figure can be viewed at wileyonlinelibrary.com]

2.A. Input data

This work utilizes two databases: NIH RePORTER records and AAPM membership records. The AAPM data extend back to 2002 and also contain a “member since” field that extends back to 1962. Each AAPM member is identified by a unique “status_id” number, and First, Last, and Middle names are stored in separate fields (note that throughout this manuscript, searchable fields are indicated by this formatting). All fields in the AAPM data are shown in Table 1.

Table 1. Structure of the AAPM data. The shaded entries represent the fields that are queried at least once in the present work. Note that the last two fields show the time period each record is valid (e.g., 2002–2006)

| Field | Example |
| --- | --- |
| Status_ID (internal unique identifier) | 4134 |
| Title prefix | Dr |
| First name | Jane |
| Middle name/initial | A |
| Last name | Doe |
| Title Suffix | Jnr. |
| Date of Birth | 12-30-1969 |
| Job title | Asst. Professor |
| Organization | University of Excellence |
| Department | Medical Physics |
| Highest academic degree | PhD |
| Gender | F |
| Email | [email protected] |
| Member since | 2001 |
| Dues category | Full |
| Dues sub category | Full |
| Phone Number | 555-555-5555 |
| Extension? | None |
| Fax number | 555-555-5556 |
| Fellow | Y |
| Charter member | N |
| Active from | 2002 |
| Active till | 2015 |

NIH RePORTER funding records are publicly available for download and can be queried online.6, 7 A brief outline of these data is shown in Table 2; a more detailed description of the contents of these records is available online.8 The data extend back to 1985, although funding amount is only recorded from 2000 onward. Each principal investigator (PI) is assigned a seven- or eight-digit “PI_ID” number, which typically remains constant throughout their career (occasional exceptions do exist, as discussed in Section 4). PI names are stored in a single field, in the format “Last, First”. The last name is straightforward to interpret; the first, however, can include multiple names as well as middle initials. In this work, we split the “first” string at spaces and took the first resulting substring to be the first name of that investigator. From 2006 onward, the NIH has supported a multi-PI application model, under which a single record in the NIH data can have more than one PI. In this case, all PI names are stored in the same field, separated by semicolons. The PI who submits the grant and is responsible for communication with the NIH is designated the “contact” PI. The PI name field might then read: “PIname1; PIname2 (contact); PIname3”, where each name follows the “Last, First” convention described above. The same structure is applied to the PI_ID field.
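The field layout described above can be illustrated with a short parsing sketch. This is not the MATLAB code used in this work; it is a minimal Python illustration that assumes the two fields arrive as plain strings in exactly the “PIname1; PIname2 (contact); PIname3” layout, and the names `Investigator` and `parse_investigators` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Investigator(NamedTuple):
    pi_id: str
    last_name: str
    first_name: str
    is_contact: bool


def _strip_contact(token: str) -> Tuple[str, bool]:
    """Remove a trailing '(contact)' marker and report whether it was present."""
    token = token.strip()
    marker = "(contact)"
    if token.lower().endswith(marker):
        return token[: -len(marker)].strip(), True
    return token, False


def parse_investigators(pi_names: str, pi_ids: str) -> List[Investigator]:
    """Split the PI name and PI_ID fields into one entry per investigator.

    Assumption: both fields list investigators in the same order.
    """
    names = [t for t in pi_names.split(";") if t.strip()]
    ids = [t for t in pi_ids.split(";") if t.strip()]
    if len(names) != len(ids):
        raise ValueError("name and ID fields list different numbers of PIs")
    single_pi = len(names) == 1  # a single-PI record has no '(contact)' marker
    result = []
    for raw_name, raw_id in zip(names, ids):
        name, name_flag = _strip_contact(raw_name)
        pi_id, id_flag = _strip_contact(raw_id)
        # "Last, First Middle": the last name precedes the comma.
        last, _, rest = name.partition(",")
        # The first space-separated string after the comma is the first name.
        parts = rest.split()
        first = parts[0] if parts else ""
        result.append(
            Investigator(pi_id, last.strip(), first, single_pi or name_flag or id_flag)
        )
    return result
```

For example, `parse_investigators("Smith, John A; Doe, Jane Q (contact)", "1234567; 7682827 (contact)")` yields two entries, the second of which is marked as the contact PI with first name "Jane".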

Table 2. Relevant fields from the NIH records. Fields that are queried at least once in this work are shaded gray. Other fields shown here were not queried, but may be useful in future work. Note that a complete description of NIH records can be found online.8

| Field | Example |
| --- | --- |
| Application_ID | 8913171 |
| Activity | R01 |
| ARRA_Funded? | No |
| IC_Name | National Cancer Institute |
| NIH_spending_Cats | Cardiovascular; Heart Disease; Lung; Neurosciences; Rare Diseases; |
| Org_City | Dubbo |
| Org_Country | USA |
| Org_Department | Biomedical Engineering |
| Org_name | The University of Dubbo |
| Org_state | CA |
| PI_IDs | 7682827 |
| PI_Names | Doe, Jane |
| ProgramOfficerName | Steven Stevens |
| Suffix | S1 |
| Direct_cost_amt | 100 |
| Indirect_cost_amt | 100 |
| Total_cost | 200 |

In order to extract records relating to research grants held by AAPM members from the NIH funding records, the unique AAPM identifier “Status_ID” must be associated with the unique NIH identifier “PI_ID”. However, there is no straightforward means to do this; therefore a data mining approach was developed and is described below.

2.B. Data mining algorithm

While multiple common fields exist in each dataset, there is no unique field that would allow simple, reliable, and unambiguous information linkage. Therefore, the approach taken was to query multiple non-unique fields and consider the combined net evidence before making a decision. For example, querying according to last name and institution provides far better discriminatory evidence than last name alone. Extraction of grants is a three-stage process, broadly outlined in Fig. 1 and described in detail below.

2.B.1. Initial filtration and processing of NIH data

The first stage is an initial query of the NIH data, in which all grants from PIs with the same last name and first initial as an AAPM member (e.g., “smith, j”) are extracted, processed, and written to a separate Excel spreadsheet. The purpose of this is threefold: (1) it cuts down the amount of data that must be subjected to more detailed analysis downstream; (2) it preprocesses the (occasionally messy) NIH data into a consistent and easy-to-read format to avoid downstream errors; and (3) it splits up multi-PI grants into records that can be individually queried. This last step warrants further description. For each multi-PI grant, the “contact” PI_ID and PI name are extracted, and the other PIs are deleted (see Section 2.A for a description of the data format). This does not discard any useful data, because we rely on the institution field downstream and the NIH stores that field only for the “contact” PI. Moreover, at this stage we are only trying to find PI_ID numbers; once found, they are still used to query multi-PI grants, including those on which the member is not the “contact” PI, as described in Section 2.B.3. At the end of this step, the data are reduced from ~900 MB to ~10 MB and comprise ~222,000 records in the format: last name, first name, PI_ID, financial year of record, PI institute, and project name.
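The initial filter can be sketched as a set lookup on a (last name, first initial) key. The following Python fragment is illustrative only and is not the code used in this work; the dictionary keys `last_name`, `first_name`, and `pi_id` and the function names are hypothetical, and the normalization (lower-casing and keeping letters only) is an assumption based on the preprocessing summarized in Table 3.

```python
def name_key(last_name, first_name):
    """Build a lower-case (last name, first initial) key, e.g. ('smith', 'j').

    Assumption: spaces, dashes, and other non-letters are dropped from the
    last name before comparison.
    """
    last = "".join(ch for ch in last_name.lower() if ch.isalpha())
    first = first_name.strip().lower()
    return (last, first[:1])


def prefilter(nih_records, members):
    """Keep only NIH records whose contact PI shares a key with some member."""
    member_keys = {name_key(m["last_name"], m["first_name"]) for m in members}
    return [
        rec
        for rec in nih_records
        if name_key(rec["last_name"], rec["first_name"]) in member_keys
    ]
```

Building the set of member keys once keeps the pass over the NIH records linear in the number of records, which matters when the raw data are several hundred megabytes.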

2.B.2. Associate AAPM members with their PI_ID

The next step is to apply a series of more detailed tests to the data from the first stage (Section 2.B.1) to associate AAPM members with their NIH PI_ID. To do this, we consider the net evidence provided by comparisons between the NIH record and the AAPM record, querying first name, first initial, last name, and institute. In cases for which the PI changes institution during the period of a grant project, both institutions are listed within the NIH record, and the TOTAL_COST that year is the sum of the records from each institution. In addition, the funding body recorded in the NIH data is checked. The tests used are explained in Table 3, and the combined test results that trigger a match are described in Table 4. This process was undertaken for each year of NIH records from 1985 to 2015. For each year, the algorithm reads all recorded NIH funding and all AAPM members who were active within 1 year of the year being tested. For example, if the year 2000 was being tested, AAPM members active from 1999 to 2001 (inclusive) would be included in the testing process. Because the PI_ID does not (usually) change from year to year, this approach gives multiple opportunities to make a positive match. In any single year, errors in either dataset can confound the test results; analyzing multiple years substantially reduces the impact of these errors.
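The one-year tolerance on membership can be written as an interval-overlap check. The sketch below is an illustration, not the code used in this work: it assumes the “Active from” and “Active till” fields of Table 1 are available as integer years, and the function name `active_near` is hypothetical.

```python
def active_near(active_from, active_till, year, window=1):
    """True if the membership period overlaps [year - window, year + window]."""
    return active_from - window <= year <= active_till + window
```

Under this reading, a member active from 2002 to 2015 is a candidate for NIH records from 2001 through 2016.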

Table 3. Description of the tests used to determine whether each record should be matched to an AAPM member

| Test | Preprocessing (AAPM data) | Preprocessing (NIH data) | Criteria for positive result (“T”) |
| --- | --- | --- | --- |
| Last name | Last name already has its own field. Spaces, dashes, etc. are removed from the string. | Names are stored as Last, First Middle. Last name is extracted, then spaces, dashes, etc. are removed from the string. | Case-insensitive string match |
| First name | First name already has its own field. | First name is taken as the first space-separated string after the comma (see above). | Case-insensitive string match |
| First initial | First character of first name. | First character of first name. | Case-insensitive string match |
| Institute | (1) Common institute abbreviations are “unfolded”; for example, “UCLA” becomes “UCLA University of California Los Angeles”. (2) All common words such as “university” are removed from the name (Table 5). (3) The remaining string is separated into words using the space character. | Same three steps as for the AAPM data. | All words between the two data sets are compared using a lower-case string match. Only one match between the two sets of words is required for a positive result. |
| Funding body | NA | Funding body is stored in the “IC_NAME” field of the NIH data. | Funding body is one of: “National Cancer Institute”, “National Heart, Lung and Blood Institute”, “National Institute of Biomedical Imaging and Bioengineering”, “National Institute of Neurological Disorders and Stroke”. |
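The institute test in Table 3 can be sketched as a set intersection. The Python fragment below is a minimal illustration rather than the code used in this work: the abbreviation table holds only the single “UCLA” example given in Table 3, the common-word list is copied from Table 5, stripping of stray commas and semicolons is an added assumption, and the function names are hypothetical.

```python
# Only the UCLA entry comes from the report; a working table would need more.
ABBREVIATIONS = {"ucla": "UCLA University of California Los Angeles"}

# Common words removed before matching (Table 5, lower-cased).
COMMON_WORDS = {
    "university", "univ", "univ.", "of", "the", "institute", "hospital",
    "medical", "center", "college", "school", "cancer", "therapy", "centre",
    "nan", "na", "for", "city", "physics", "program", "national", "health",
    "inc.", "specialists", "centers", "ctr", "ctr.", "oncology", "royal",
    "and", "&", "medicine", "at", "hospitals", "clinic", "inc", "state",
}


def institute_words(name):
    """Return the set of distinguishing lower-case words in an institute name."""
    words = set()
    for token in name.split(" "):
        token = token.strip(",;").lower()  # assumption: drop stray punctuation
        if not token:
            continue
        # Unfold known abbreviations into their long form.
        unfolded = ABBREVIATIONS.get(token, token)
        # The unfolded form may contain several space-separated words.
        words.update(w.lower() for w in unfolded.split(" "))
    # Remove the common words that carry no discriminating information.
    return words - COMMON_WORDS


def institute_match(aapm_name, nih_name):
    """Positive if at least one distinguishing word is shared."""
    return bool(institute_words(aapm_name) & institute_words(nih_name))
```

Because a single shared word is enough, two different institutions that share a place name would also pass this test, which is why it is only ever used in combination with the name tests in Table 4.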

Table 4. The test results required to identify a positive match between an NIH record and an AAPM member. Three inclusion instances are currently supported; these could be extended in the future

| Inclusion instance | Last Name | First Name | First Initial | Institute | Funding Body | Manual review |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | T | T | T | T | T | F |
| 2 | T | F | T | T | T | F |
| 3 | T | T | T | T | F | T |

The results from each year are stored in separate sheets of a Microsoft Excel workbook, along with the binary indicators for each of the test results. Based on these test results, the PI_ID associated with each record is either included or discarded according to the inclusion criteria in Table 4. After the final year, all results are analyzed and consolidated into a list of unique PI_ID numbers. When multiple matches for the same PI_ID number are found, the binary flags shown in Table 4 are updated, and the best case (determined by the binary flags) is kept. Also at this stage, a list of PI_ID corrections from previous iterations is read: known false positives are discounted, while previously identified “manual review” numbers are unflagged.

Two lists are defined before running this part of the code. The first is a list of words not to include in the institute match test. The second is a list of “constrained names” (Table 5). This is a list of very common last names that were observed to cause a large number of false positives (often because they also tended to share a first initial). If the last name is a “constrained name”, then inclusion instance 2 (which allows a match if the first name does not match) is not used, and the entry is automatically flagged for manual review. Both lists have been developed based on repeated running of the code and identification of failure points; both can also be very quickly updated as required for future use.

Table 5. The list of common words removed before institution matching and a list of last names that have been identified as producing a high number of false positives. For these names, inclusion instance 2 (Table 4) is excluded. Both lists can be easily updated and maintained in the code

| List | Entries |
| --- | --- |
| Words removed from institution matching | university, univ, univ., of, the, institute, hospital, medical, center, college, school, cancer, therapy, centre, center, NaN, na, for, city, physics, program, national, health, inc., specialists, centers, ctr, ctr., oncology, royal, and, &, medicine, at, hospitals, clinic, inc, inc., state |
| Constrained names | kim, lee, chen, zhang, huang, wang |
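The inclusion logic of Table 4, together with the constrained-name rule, can be expressed as a small decision function. This Python sketch is illustrative and is not the code used in this work; the names `Decision` and `decide` are hypothetical. It also encodes one particular reading of the constrained-name rule, namely that a constrained name disables inclusion instance 2 and that any constrained-name entry included by another instance is flagged for manual review.

```python
from typing import NamedTuple

CONSTRAINED_NAMES = {"kim", "lee", "chen", "zhang", "huang", "wang"}  # Table 5


class Decision(NamedTuple):
    include: bool
    manual_review: bool


def decide(last_name, last_ok, first_ok, initial_ok, institute_ok, funding_ok):
    """Combine the binary test results of Table 3 into an inclusion decision."""
    constrained = last_name.strip().lower() in CONSTRAINED_NAMES
    # Every inclusion instance requires last name, first initial, and institute.
    if not (last_ok and initial_ok and institute_ok):
        return Decision(False, False)
    if first_ok and funding_ok:
        # Instance 1: all tests positive.
        return Decision(True, constrained)
    if funding_ok and not first_ok:
        # Instance 2: first name differs; not used for constrained names
        # (assumed here to mean the record is not included).
        if constrained:
            return Decision(False, False)
        return Decision(True, False)
    if first_ok and not funding_ok:
        # Instance 3: funding body outside the four listed institutes,
        # so the match is always sent to manual review.
        return Decision(True, True)
    return Decision(False, False)
```

Keeping the three instances as separate branches mirrors the rows of Table 4, so a further inclusion instance could be added as one more branch.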

Before continuing to the next step, the returned list of PI_IDs is subjected to manual QA. Firstly, any entries flagged for manual review are checked. Entries can be flagged for manual review for several reasons: the entry was included based on inclusion instance 3, the last name was a constrained name, the PI_ID was matched to more than one person, or the investigator had multiple PI_IDs assigned to them. When errors are found in the PI_ID list, they are deleted from the final list and stored on a separate worksheet so they do not appear again. Similarly, when entries that were flagged for manual review pass inspection, they are added to an “unflag” list so they are not flagged again in subsequent iterations. In addition to the review of flagged entries, further QA is carried out on this list to estimate false-positive and -negative rates. This is described in Section 2.C.

2.B.3. Using the PI_ID list to extract AAPM grant records

Using the list of PI_IDs described above, the NIH data were again queried, and all grants from those PIs were extracted and stored in an Excel workbook, with grants from each stored on a separate sheet. Note that we now query the full NIH dataset, and not the filtered intermediate data that were used to obtain the list of PI_IDs in the second stage (Section 2.B.2). When a positive match is found, the NIH data are appended with additional data that are only available in the AAPM records: Date of Birth, Gender, and the unique AAPM identifier Status_ID, which allows each grant to be linked back to AAPM records. Grants are recorded by the expenditure each year; for example, a grant running from 2001 to 2003 will have a record in each of those years. Multiyear projects can easily be consolidated if desired, since each NIH project has a unique ID, APPLICATION_ID.

Occasionally, a single project has multiple entries in the NIH database in a single year. There are a number of reasons why this occurs: most commonly, the grant has been appended or supplemented, which means that some change was made to the project. In this work, duplicate records are treated as follows: if funding information exists for both records, then the records are combined and the total funding updated. If funding information exists for one record but not for the other, the empty record is deleted. If funding information exists for neither, then one is deleted and one kept. In this way, both overall funding and number of grants awarded are as accurate as possible.
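The three duplicate-handling rules can be sketched as a single pass over the records. This Python fragment is an illustration rather than the code used in this work: the keys `project_key`, `fiscal_year`, and `total_cost` are hypothetical stand-ins for whichever NIH fields identify a project within a year, and a missing funding amount is assumed to be represented as `None`.

```python
def consolidate(records):
    """Merge records that share a project and year, following the three rules."""
    merged = {}
    for rec in records:
        key = (rec["project_key"], rec["fiscal_year"])
        kept = merged.get(key)
        if kept is None:
            # First record seen for this project and year.
            merged[key] = dict(rec)
        elif rec["total_cost"] is not None:
            if kept["total_cost"] is None:
                # Only the new record has funding: it replaces the empty one.
                merged[key] = dict(rec)
            else:
                # Both have funding: combine into one record and sum the totals.
                kept["total_cost"] += rec["total_cost"]
        # Otherwise the new record has no funding information and is dropped,
        # which also covers the case where neither record has funding.
    return list(merged.values())
```

Each project then contributes exactly one record per year, so counting records gives the number of grants and summing `total_cost` gives the yearly funding.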

In the case of multi-PI grants, two additional databases are created: one for the case where the “contact” PI is an AAPM member, and another where the “contact” PI is not an AAPM member but an AAPM member is listed elsewhere as an investigator. Unless otherwise stated, the results presented below are the combined data from single-PI grants and multi-PI grants where the “contact” investigator is an AAPM member. In other words, multi-PI grants where the lead investigator is not an AAPM member are excluded from most of the analysis below (to get a sense of the amount of funding attributed to each case, see Fig. 6).

2.C. False positives and false negatives

The above process is by no means infallible; some records will be missed (false negatives) while others will be incorrectly included (false positives). For the data to be useful, it is important to have a method for estimating the false-positive and false-negative rates. As both false positives and false negatives are binary quantities (true/false), confidence intervals for the results were assessed using binomial statistics.9

To assess the false-positive rate, a list of 100 PI_IDs was randomly extracted with replacement from the total list of ~600, and each of these was entered into the “Principal Investigator (PI)/Project Leader” field of the NIH RePORTER website with all available years selected.6 Based on the returned grants, PI name, and institution, each record was examined manually to determine whether or not this PI_ID should legitimately be associated with an AAPM member. This process was repeated numerous times throughout the development of the algorithm to identify failure points and improve the algorithm. Each time this was performed, false-positive results were stored on a separate spreadsheet and excluded from the next iteration of the algorithm. Although it could be argued that such a process biases the final false-positive results presented below, in the future fewer than 20 new PI_ID numbers are expected each year, and these can and should be quickly checked manually in the manner described above to ensure the ongoing fidelity of the database.

False negatives are somewhat harder to detect than false positives, as by definition one is dealing with data that have not been detected. Therefore, a secondary source of data is required against which comparisons can be made. Fortunately, we had at our disposal a list of 99 AAPM member-supplied PI numbers, which were obtained as part of an alternative approach to obtain the NIH funding data. By assessing how many of these PI IDs our algorithm could detect, we were able to obtain a measure of the false-negative rate. This was also repeated throughout the algorithm development, and the results were used to fine tune the algorithm.

3 Results

3.A. Accuracy of PI_ID Data

The data mining algorithm identified 554 PI_ID numbers associated with the AAPM since 1985, of which 272 were active members at the time of writing. Manual testing of this list of numbers as described above resulted in a false-positive rate of 4%. A binomial fit to these data results in a 95% confidence interval of 1–10%, with N = 100. Comparison with the member-supplied PI_ID numbers yielded a false-negative rate of 10%, with a 95% confidence interval of 5–18%, N = 99. Note that after testing, the 10 false negatives were added to the list of PI_IDs used to generate the rest of the data in this report, while the four false positives identified were removed, such that 560 PI_IDs were used to generate the results reported below.
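The confidence intervals quoted above can be approximately reproduced with an exact binomial interval. The report does not state which binomial interval was used, so the Clopper–Pearson interval in the sketch below is an assumption; the function names are hypothetical, and the code is an illustration rather than the analysis code used in this work. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x events in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2
    )
    return lower, upper
```

Calling `clopper_pearson(4, 100)` and `clopper_pearson(10, 99)` should give bounds that round to intervals consistent with the 1–10% and 5–18% reported here.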

Note that while the exportable NIH data extend back to 1985, the online records only extend back to 2000; funding amount is likewise only available from 2000 onwards. As such, data prior to the year 2000 cannot easily be tested manually for false positives. Also, recall that the AAPM data only extend back to the year 2002, so data prior to this cannot easily be tested for false negatives. Therefore, the above results can only be considered valid for data after the year 2002. Although we present data before this, we caution that data prior to 2002 should be interpreted with the above information in mind. Finally, the AAPM-supplied data that were used to guide algorithm development and benchmark results were supplied between 2015 and 2016, meaning that the algorithm may be more accurate for later years than earlier years.

3.B. Analysis of funding data

In the following section, a top-level analysis of NIH funding of members of the AAPM is presented. First, some general points. Wherever a funding amount is shown, it is in USD, and no adjustment for inflation has been made unless otherwise stated. Where adjustment for inflation is made, it is to 2015 USD using the Biomedical Research and Development Price Index.10 Where box plots are used, the “box” represents the 25th and 75th percentiles of the data, and points lying outside q3 + 1.5 × (q3 − q1) are classed as outliers, where “q” represents data quartiles. Although no error bars are shown on the plots, they should be interpreted bearing in mind the false-positive and -negative rates outlined above (4% and 10%, respectively).

Figure 2 shows the yearly expenditure of grants held by AAPM members each year. In this plot, a grant running for 3 years would have one data point in each year, corresponding to the “TOTAL_COST” column of the NIH data in that year. The red line indicates the mean value. In 2015, the mean expenditure each year was $447k. In the year 2000, the mean expenditure each year was $328k, which is equivalent to $522k in 2015 dollars, suggesting this metric has for the most part kept pace with inflation.

Figure 2. Box plots of AAPM member grant amount each year. Note that two grants larger than $3.5 million are not shown in this figure; they occurred in 2014 ($7.86 million) and 2015 ($7 million). [Color figure can be viewed at wileyonlinelibrary.com]

Figure 3 shows the age of members receiving funding in each year; a slightly increasing trend (roughly 5 years of age over 30 years) is arguably apparent. We note that there do appear to be some errors in these member-entered data. For instance, some grants were apparently awarded to PIs at the age of 14. In the data shown, any grants whose PI age was less than 18 have been discarded on the assumption that this is a data entry artifact.

Figure 3. Box plots of the age of NIH-funded members versus time. The line plot represents the mean. Note that some of the outliers in these data are likely a result of incorrectly entered member data. [Color figure can be viewed at wileyonlinelibrary.com]

The number of grants awarded to members of the AAPM each year, as well as the number awarded to male and female members, are shown in Fig. 4. In 2015, 11% of grants held by AAPM members were held by female members. In the same year, 22% of all AAPM members were female, suggesting that males are twice as likely to hold research funding compared to females.
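One way to make the comparison above explicit is to divide each group's share of grants by its share of the membership and take the ratio. The working below is a back-of-envelope sketch that assumes every member is recorded as either male or female and that the number of grants held is a reasonable proxy for the likelihood of holding funding.

```latex
\[
\frac{\text{male grant share} / \text{male member share}}
     {\text{female grant share} / \text{female member share}}
= \frac{0.89 / 0.78}{0.11 / 0.22}
\approx \frac{1.14}{0.50}
\approx 2.3
\]
```

Under these assumptions the ratio is a little over two, in line with the statement in the text.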

Figure 4. The total number of grants awarded to AAPM members each year, and total grants awarded to male and female PIs. [Color figure can be viewed at wileyonlinelibrary.com]

Figure 5 shows the total funding awarded to AAPM members, compared to the “funding pool”. This is defined as the total budget of the top four funding agencies providing research grants to members of the AAPM over all years, outlined in Fig. 7 and Table 6. These four funding agencies (NCI, NIBIB, NHLBI, and NINDS) represent approximately 81% of the total funding granted to members of the AAPM since such records were available. Some features of this graph can be explained by the political and economic climate at that time. In 2009 and 2010, a large spike appears in the total AAPM funds. This is due to the American Recovery and Reinvestment Act (ARRA).11 Note that the ARRA-allocated funding is not included in the NIH budget, which explains why the NIH budget does not show the same trend during these years. In 2013, there is a sharp dip in the NIH funding pool. This is due to a budget sequestration.12 The mean level of funding attracted by AAPM members between 2002 and 2015 was $120 million in 2015 USD, with a peak of $143 million in 2009 (this includes ARRA funding).

Figure 5. Total funds allocated to members of the AAPM, and “funding pool” (defined as the total budget of the top four funding agencies for AAPM members: the National Cancer Institute, the National Heart, Lung, and Blood Institute, the National Institute of Biomedical Imaging and Bioengineering, and the National Institute of Neurological Disorders and Stroke; see Fig. 7). Also shown is inflation-adjusted funding in 2015 dollars. [Color figure can be viewed at wileyonlinelibrary.com]
Table 6. The extent to which different organizations awarded funding to AAPM members in 2015

| Institute abbreviation | Full name | Number of AAPM member grants funded in 2015 (% total) | Total funding of AAPM member grants in 2015 in USD millions (% total) | 2015 budget (USD billions) | Percent of budget allocated to AAPM member grants |
| --- | --- | --- | --- | --- | --- |
| NCI | National Cancer Institute | 124 (52%) | 60.9 (53%) | 4.95 | 1.2 |
| NIBIB | National Institute of Biomedical Imaging and Bioengineering | 60 (25%) | 26.1 (23%) | 0.3 | 8.7 |
| NHLBI | National Heart, Lung, and Blood Institute | 19 (8%) | 8.4 (7%) | 2.9 | 0.7 |
| NINDS | National Institute of Neurological Disorders and Stroke | 12 (5%) | 3.6 (3%) | 1.6 | 0.2 |
| NIDDK | National Institute of Diabetes and Digestive and Kidney Diseases | 2 (1%) | 0.6 (1%) | 1.7 | 0.04 |
| NIAMS | National Institute of Arthritis and Musculoskeletal and Skin Diseases | 3 (1%) | 1.0 (1%) | 0.5 | 0.2 |

To give a better insight into grant funds held by members of the AAPM as a percentage of the available funds, Fig. 6 shows the proportion of the “funding pool” that was allocated to members of the AAPM in each year. Again, the apparent spike in 2009 and 2010 is due to the ARRA funding, which does not appear in the formal NIH budget documents. Encouragingly, in 2015 the proportion of the funding pool awarded to AAPM members was the highest it has been since ARRA funding.

Figure 6. Percentage of the available funding pool awarded to members of the AAPM each year. From 2006 onward, multi-PI grants are also plotted, separated into cases where the AAPM member was the “contact” PI and cases where they were not (“multi-PI-other”). [Color figure can be viewed at wileyonlinelibrary.com]

Figure 7 shows the extent to which the various national institutes comprising the NIH have awarded grant funding to members of the AAPM. This graph shows funding data across all available years; the top four funding agencies have been the National Cancer Institute, the National Institute of Biomedical Imaging and Bioengineering, the National Heart, Lung, and Blood Institute, and the National Institute of Neurological Disorders and Stroke. The budget for these four funding bodies is used to define the “funding pool” in Figs. 5 and 6. For information on the abbreviations and for the amount of funding awarded to AAPM members, see Table 6.

Figure 7. Relative amount of grant funding that different national institutes have awarded to members of the AAPM. For abbreviations, see Table 6. [Color figure can be viewed at wileyonlinelibrary.com]

Figure 8 shows the different grant types which AAPM members have been awarded; we see that funding has most prevalently been awarded through the R01 project grant mechanism. An explanation of the different funding types can be found in a previous study.13

Figure 8. Funding mechanisms (grant types) for AAPM members. For an explanation of the abbreviations, see Ref. 13. [Color figure can be viewed at wileyonlinelibrary.com]

Finally, Fig. 9 shows the percentage of AAPM members who were a listed investigator on at least one NIH-funded research project each year (including PIs who were listed on grants where the contact PI was not an AAPM member). Also shown is the total membership of the AAPM. It can be seen that as the membership of the AAPM has grown, the proportion of members receiving NIH funding has decreased, implying that proportionally less member effort is being directed toward federally funded research projects.

Figure 9. Percentage of AAPM members listed as an investigator on at least one NIH-funded grant (including grants in which a non-AAPM member was the primary PI) compared to the number of AAPM members. [Color figure can be viewed at wileyonlinelibrary.com]

4 Discussion

In this work, we have developed and tested a data mining algorithm that extracts PI_ID numbers from NIH records based on tests of name, institution, funding body, and temporal correlation. The utility of the algorithm has been demonstrated using AAPM membership data. We have tested the resultant PI_ID list for false positives and false negatives, obtaining rates of 4% and 10%, respectively. Using this PI_ID list, we have extracted an estimate of NIH funding granted to AAPM members going back to 1985, although the accuracy of the data is likely to be degraded for years prior to 2002. We then presented a top-level analysis of the resultant “AAPM member research grant funding database”. Importantly, the input data required to extract this information are quite simple. This means the code could be quickly adjusted to produce similar databases for other groups of medical scientists, who would presumably also be interested in such data. The highlighted columns in Table 1 show the data that are needed to adapt this algorithm to other groups.
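
To make the four kinds of test concrete, the following is a minimal sketch of how name, institution, funding body, and temporal checks might be combined to propose PI_IDs for a member. It is not the Matlab implementation used in this work: the record fields, the list of plausible institutes, and the rule of requiring a name match plus at least two supporting tests are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Institutes treated as plausible funders in this sketch (an assumption).
PLAUSIBLE_INSTITUTES = {"NCI", "NIBIB", "NHLBI", "NINDS"}


@dataclass(frozen=True)
class Member:
    first_name: str
    last_name: str
    institution: str
    member_years: frozenset  # years in which the person was an AAPM member


@dataclass(frozen=True)
class NihRecord:
    pi_id: str
    pi_first_name: str
    pi_last_name: str
    institution: str
    institute: str  # abbreviation of the funding institute
    fiscal_year: int


def name_matches(member, record):
    """Name test: same last name and same first initial, ignoring case."""
    same_last = member.last_name.strip().lower() == record.pi_last_name.strip().lower()
    same_initial = (member.first_name.strip()[:1].lower()
                    == record.pi_first_name.strip()[:1].lower())
    return same_last and same_initial


def supporting_tests_passed(member, record):
    """Count how many of the institution, funding-body, and temporal tests pass."""
    tests = (
        member.institution.strip().lower() == record.institution.strip().lower(),
        record.institute in PLAUSIBLE_INSTITUTES,
        record.fiscal_year in member.member_years,
    )
    return sum(tests)


def candidate_pi_ids(member, records, min_support=2):
    """Return PI_IDs whose records match the name and pass enough supporting tests."""
    return {r.pi_id for r in records
            if name_matches(member, r)
            and supporting_tests_passed(member, r) >= min_support}
```

A stricter or looser `min_support` trades false positives against false negatives, which is the balance the manual testing described above is meant to assess.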

The data generated in this work are available to AAPM members through the AAPM website. This database has a number of uses. The information could help the AAPM board, councils, working groups, task groups, and members to:
  1. Provide general information about research activities
  2. Understand the magnitude and breadth of AAPM member research activities
  3. Determine trends of overall funding, and trends within specific research areas
  4. Identify funding opportunities where members have been successful in the past
  5. Track the success of AAPM members in responding to specific research initiatives of the NIH
  6. Understand the demographics of successful researchers and identify and address areas with disparities
  7. Identify speakers for the AAPM annual meeting and chapter meetings
  8. Select reviewers for the AAPM annual meeting
  9. Lobby grant-funding bodies for increased consideration of medical physics applications
  10. Perform strategic planning
For non-AAPM members the database could allow:
  1. Media personnel to contact domain experts
  2. Grant funding agencies to seek appropriate reviewers
  3. Related societies (e.g., the American Institute of Physics) to solicit speakers for meetings

The approach we have taken appears novel with respect to analysis of NIH records. Although multiple authors have published on the overall NIH budget, we could not find any other publications in which funding records were extracted for a given national (or international) cohort of investigators. A previous study14 attempted to extract information on radiation oncology funding based on the “department” field of the NIH records, identifying 26 medical physics grants in 2013. A significant limitation of this approach, as acknowledged by the authors, is that the “department” field is often left blank; for instance, in the year 2013, 47% of all NIH records had no recorded department.

There are several limitations of the current algorithm and initial results, and a variety of means by which the analysis could be improved. Probably the foremost of these is that the data used to develop the algorithm and the data used to detect false negatives were the same. It is important to note that the algorithm described in this paper is primarily logic driven rather than data driven. As such, the data used in the development phase will not have as strong an impact on the results as they would in, say, a machine learning approach. Nevertheless, it is likely that the true false-negative rate is somewhat higher than the 10% identified in this work, as there are probably failure points that have not yet been identified. However, identifying 89 of 99 member-supplied PI_IDs gives us confidence that the obtained results are representative of the truth. The detected false-positive rate of 4% is also encouraging. Although this was achieved after multiple iterations of manual curation of the data, we believe it is possible to maintain or improve this in future iterations. Regarding future improvements to the algorithm: there are a number of data fields we have not queried that could provide additional discriminatory evidence. These include the department name, study section name, and the grant title itself. However, given that the false-positive and false-negative rates outlined in this work are fairly good, further improvements to the algorithm at this point, while certainly worthwhile, may yield diminishing returns.
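
The two error rates quoted in this work follow from simple counts, as the sketch below shows. The identifiers are placeholders standing in for real PI_IDs, the function names are hypothetical, and the count of 4 incorrect records is inferred from the reported 4% rate on a sample of 100.

```python
def false_negative_rate(known_ids, detected_ids):
    """Fraction of known (member-supplied) PI_IDs that the algorithm failed to find."""
    known = set(known_ids)
    missed = known - set(detected_ids)
    return len(missed) / len(known)


def false_positive_rate(n_sampled, n_incorrect):
    """Fraction of a manually checked sample judged to be wrongly attributed."""
    return n_incorrect / n_sampled


# Counts from the text: 89 of 99 member-supplied PI_IDs were identified,
# and a 4% error rate was found in a random sample of 100 records.
known = {f"id{i}" for i in range(99)}      # placeholder identifiers
detected = {f"id{i}" for i in range(89)}   # the 89 that were recovered
fn = false_negative_rate(known, detected)
fp = false_positive_rate(100, 4)
print(f"false-negative rate: {fn:.1%}, false-positive rate: {fp:.1%}")
```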

Regarding the use of the PI_ID number to extract grants: according to the NIH, the PI_ID is a unique number that remains constant throughout a PI's career.8 While we did not find any exceptions to the “uniqueness” claim, there are several instances in which one PI does have multiple PI_IDs. In this work, all PI_IDs were kept in such instances, although the records are flagged for review. These cases all occurred in the early NIH records before funding information was available (i.e., before the year 2000). Another limitation is that in this work, once a PI_ID number is associated with a member of the AAPM, it stays that way for all time. So, for instance, if an investigator was at one point an AAPM member before changing fields, all subsequent grants would be incorrectly attributed to the AAPM membership. We suspect that such cases are rare, but it is an area that can be improved upon in future work. Finally, at the present time, the code cannot process names containing characters outside the standard alphanumeric set (such as umlauts and commas), which means a number of names are skipped each year. This is another clear area in which to improve, although it did not result in any detected false negatives in the current analysis.
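
One possible way to handle such names in future work is to normalize them before comparison. The sketch below is a suggestion rather than part of the current code: it uses Python's standard `unicodedata` module to fold accented letters to their base form and replaces punctuation with spaces. The function name is hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Fold accents, replace punctuation with spaces, and upper-case a name."""
    # NFKD decomposition splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks (e.g., the two dots of an umlaut).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit (commas, hyphens, ...) with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped)
    # Upper-case and collapse runs of whitespace.
    return " ".join(cleaned.upper().split())


print(normalize_name("Müller"))
print(normalize_name("García-López, Ana"))
```

Letters that have no decomposition are kept as they are, so a production version would probably need an explicit transliteration table for those cases.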

The original motivation for this work was to capture data relating to NIH funding awarded to AAPM members as PIs. Figure 6 shows how much funding members of the AAPM are awarded each year as a proportion of total available funding. For the total funding, we have used the combined budget of the top four funding institutes associated with the AAPM research community (Fig. 7), since this should provide a more relevant baseline than the overall NIH budget. From 2000 to 2007, the proportion of funding captured by AAPM members was trending upward. Between 2007 and 2014, this proportion either dropped or stayed about the same (depending on whether one includes multi-PI grants on which AAPM members were not the contact PI). This analysis is complicated somewhat by the presence of the ARRA funding in 2009 and 2010, and the multi-PI grant model introduced in 2006. Regarding the latter, it can be seen that the multi-PI model is being utilized with increasing frequency by AAPM members, while the proportion of single-PI grants is steadily falling.
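
The split between single-PI grants, multi-PI grants with an AAPM contact PI, and multi-PI grants with another contact PI can be sketched as follows. The sketch assumes that the PI_IDs field of a grant record is a semicolon-separated string in which the contact PI of a multi-PI grant is marked with “(contact)”; this format, and the function names, are assumptions that should be checked against the actual NIH records before use.

```python
def parse_pi_ids(field):
    """Split a PI_IDs string into (list of all PI_IDs, contact PI_ID or None)."""
    all_ids, contact_id = [], None
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue  # skip the empty piece after a trailing semicolon
        is_contact = token.lower().endswith("(contact)")
        pi_id = token[: -len("(contact)")].strip() if is_contact else token
        all_ids.append(pi_id)
        if is_contact:
            contact_id = pi_id
    if contact_id is None and len(all_ids) == 1:
        contact_id = all_ids[0]  # a sole PI is implicitly the contact
    return all_ids, contact_id


def classify_grant(field, member_ids):
    """Return 'single-PI', 'multi-PI-contact', 'multi-PI-other', or None."""
    all_ids, contact_id = parse_pi_ids(field)
    if not any(pi_id in member_ids for pi_id in all_ids):
        return None  # no AAPM member is listed on this grant
    if len(all_ids) == 1:
        return "single-PI"
    return "multi-PI-contact" if contact_id in member_ids else "multi-PI-other"
```

Summing grant funding within each label per year would give the three series of the kind plotted in Fig. 6.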

Our analysis shows that both the total amount of funding and the proportion of members receiving funding are low compared to historic levels, and that when inflation is taken into account, total member funding was lower in 2015 than in 2003 (Figs. 5 and 9). The AAPM is hardly alone in suffering such effects; the NIH budget itself is not keeping pace with inflation, a fact that has attracted much discussion.15-17 Total and proportional funding of AAPM members did increase in 2014 and 2015 after hitting a 12-year low in 2013 (Figs. 5 and 6). Despite this, there is still cause for concern. Research drives scientific advances, and with science among the three pillars of the AAPM mission, the Association should consider the ramifications of its membership engaging less and less in research (Fig. 9). If science is to remain a pillar of the AAPM, then the Association should recognize the downward trend in research leadership among its membership and prioritize means by which to better promote and sustain research as a vital aspect of medical physics. One aspect of this is to ensure that scientists who perform research in our core areas find an appropriate home in the AAPM and are not lost to other organizations. Another is to ensure that the AAPM remains aligned with important and competitive research fields pertaining to physics in medicine, and does not limit itself to traditional areas. Finally, it is important that medical physicists participate in the definition of NIH funding priorities and in the NIH grant review process, both to stimulate funding opportunities for medical physics researchers and to recognize high-quality research grants in medical physics.

Another important finding of this work is that among AAPM members, males were found to be about twice as likely to hold research funding as females. This inequality is not unique to the AAPM, and the data in this analysis are representative of a more widespread gender disparity in research grant funding. A previous study18 gives an excellent overview of issues surrounding women's application and success rates with NIH funding. The root cause deserves further consideration. For example, are females less successful in the NIH review process, or do they not submit as many grant applications? The data in this work cannot definitively answer this question with respect to the AAPM membership, but based on other studies conducted in the field of biomedical research, it appears that the number of applications submitted by females is a major factor.18 The analysis therefore suggests that to address this inequality, the AAPM must consider developing mechanisms that better encourage and support female members applying for research funding.

It is important to note that this analysis captures only one source of research funding of AAPM members. Other important sources include other federal funding agencies (NSF, DOE, etc.) and industry research collaboration. The latter is likely to be particularly important, and particularly difficult to capture. It would appear that private industries invest approximately twice as much money as the NIH in biotechnology research;19, 20 however, it is not clear how much of this is outsourced as opposed to performed in-house. Capturing meaningful data in this regard is likely to prove challenging, but may be possible in part by parsing the conflict of interest declarations required on journal publications and on abstracts submitted to the AAPM Annual Meeting each year.

As a final point, AAPM members with an interest in research and/or analytics pertaining to our Association should recognize the important role that they play in improving the accuracy of the data presented here. AAPM members who have at any point received research funding from the NIH should register their PI_ID within their AAPM member profile information. The PI_ID tag is the single most important identifier in correlating member data to NIH funding data. Note that one's PI_ID is not the same as one's eRA Commons name. To look up one's PI_ID, a person must log in to their eRA Commons account, go to Personal Profile, then view “Name and ID”. The PI_ID can be added to one's AAPM member profile by logging into the AAPM website, going to one's member profile page, and clicking the link entitled “funding”.

In this work, we have presented an algorithm that can extract NIH funding records associated with an input list of researchers. The resultant database is a sustainable resource that can be queried by the AAPM board, councils, working groups, task groups, and members to address a wide range of questions pertaining broadly to research funding and grantsmanship. Future work could include analysis of specific subspecialties and organizations attracting funding as well as specific grant mechanisms (e.g., via the FOA Number) for which medical physicists are successful in securing funding. For example, if the AAPM member directory better codified “primary role” within member profiles (e.g., Clinical and Research, currently left to free text) in a manner similar to that currently done for “Specialty” (e.g., % Diagnostic Radiology, % Radiation Oncology, etc.), then analysis of extramural funding could be correlated with such designations. The AAPM research database is available for AAPM members to download through the AAPM website and will be maintained on an annual basis by members of the Working Group for the Development of a Research Database (WGDRD).

Acknowledgments

The authors acknowledge all previous and current members of the Working Group for the Development of a Research Database (WGDRD). In addition, we acknowledge Ivaylo Mihaylov (University of Miami Health System), Paul Kinahan (University of Washington), and Shayna Knazic, Farhana Khan, and Tammy Conquest (AAPM). The authors also acknowledge financial support from the American Association of Physicists in Medicine (AAPM).

    Conflicts of interest

    The authors are members or employees of the AAPM, and financial support was provided by the AAPM.