NIH Toolbox

The NIH Toolbox®, for the assessment of neurological and behavioral function, is a multidimensional set of brief, royalty-free measures that researchers and clinicians can use to assess cognitive, sensory, motor and emotional function in people aged 3–85. This suite of measures can be administered to study participants in two hours or less, in a variety of settings, with a particular emphasis on measuring outcomes in longitudinal epidemiologic studies and prevention or intervention trials. The battery has been normed and validated across the lifespan in subjects aged 3–85, so assessment methods and results can be compared across existing and future studies. The NIH Toolbox can monitor neurological and behavioral function over time and measure key constructs across developmental stages.[1][2]


In 2004, the 15 Institutes, Centers and Offices at the National Institutes of Health that support neuroscience research formed a coalition called the NIH Blueprint for Neuroscience Research.[3] The goal of the NIH Blueprint is to develop new tools, resources, and training opportunities to accelerate the pace of discovery in neuroscience research. Because the research community had long sought standard instruments to measure cognitive and emotional health, in 2006 the NIH Blueprint awarded a contract to develop an innovative approach to meet this need. Under the leadership of principal investigator Richard C. Gershon, a team of more than 300 scientists from nearly 100 academic institutions was charged with developing a set of tools to enhance data collection in large cohort studies and to advance the neurobehavioral research enterprise.[4][5]

Test batteries

The NIH Toolbox divides tests into four aspects of neural function, called "domain batteries":

  • Cognition
  • Sensation
  • Motor
  • Emotion

Impact on neurological research

Prior to the NIH Toolbox, many studies collected information on aspects of neural function, but with little uniformity among the measures used to capture these constructs.[6] Moreover, capturing information on all four domains within a single study was costly in terms of time and subject burden. Custom measures could not easily be compared across studies, and assessments were typically limited to cognitive variables. Expensive equipment and per-subject royalty fees were often required, and time-consuming measures usually required highly trained administrators.

With the NIH Toolbox, researchers can assess function using a common metric and can “crosswalk” among measures, supporting the pooling and sharing of large data sets. The NIH Toolbox will support scientific discovery by bringing a common language to research questions – both with respect to the primary study aims and to those arising from secondary data analyses. The four batteries provide researchers with measures that have minimal subject burden and cost.[4] The NIH Toolbox battery of measures will be used by the Human Connectome Project (HCP) to understand the relationship between brain connectivity and behavior.[7] Standardized measures are easily compared across studies. Measures are validated against “gold standard” instruments and easily incorporate multiple areas of neurological functioning. The NIH Toolbox requires only inexpensive equipment and no royalties, and per-subject costs are low (limited to the taste and olfaction assessments). It offers brief, psychometrically sound measures that can be administered with minimal expertise.

Selection of domains and sub-domains

Initial literature and database reviews and a Request for Information of NIH-funded researchers identified the sub-domains for inclusion in the NIH Toolbox, existing measures relevant to the project goals, and criteria for instrument selection. NIH Project Team members, external content experts, and contract scientists met at a follow-up consensus meeting to discuss potential sub-domains along with the criteria affecting instrument selection, creation, and norming. Additional expert interviews were undertaken to gather more detailed information from clinical and scientific experts to help further refine the list of possible sub-domains. A second consensus group meeting was held and results directed the selection of the sub-domains within each core domain area to be measured in the final NIH Toolbox.

Selection of measures

More than 1,400 existing measures were identified and evaluated for inclusion in the NIH Toolbox. The selection criteria included a measure’s applicability across the life span, psychometric soundness, brevity, ease of use, applicability in diverse settings and with different groups, and lack of intellectual property constraints. There was also a preference for instruments that were already validated and normed for use with individuals between 3 and 85 years old. Results of the instrument selection process greatly facilitated the drafting of plans to develop the NIH Toolbox measures.


Validation studies were conducted for all NIH Toolbox measures to ensure that these research tools met rigorous scientific standards. Studies were conducted across the entire age range, typically included 450–500 subjects, and were statistically compared against “gold standard” measures wherever available.[8][9] For tests scored using item response theory approaches, calibration samples generally included several thousand participants, ensuring robust models. In total, data were collected from more than 16,000 subjects as part of field-test, calibration and validation activities.[10]


NIH Toolbox conducted a national standardization study in both English and Spanish to allow for normative comparisons on each assessment. A sample of 4,859 participants, ages 3–85 – representative of the U.S. population based on gender, race/ethnicity, and socioeconomic status – was administered all of the NIH Toolbox measures at sites around the country. NIH Toolbox normative scores are now available for each year of age from 3 through 17, as well as for ages 18–29, 30–39, 40–49, 50–59, 60–69, 70–79, and 80–85, allowing for targeted, accurate comparisons of any research study participant group against the U.S. population.

Advanced measurement techniques

The NIH Toolbox measures use several advanced approaches in item development, test construction, and scoring. Two of these are item response theory (IRT) and computer adaptive testing (CAT).[11][12] IRT allows tests to be brief, yet still precise and valid.[13] Using IRT methodology, sets of items are calibrated along a continuum that covers the full range of the construct to be measured. This calibrated set of items enables the creation of CAT, a specialized type of computer-based testing that enables frequent assessments and immediate feedback with minimal burden on participants and precise evaluation at the individual level.[11]
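
The interplay between a calibrated item bank and adaptive item selection can be illustrated with a minimal sketch. It assumes a one-parameter (Rasch) model, a maximum-information selection rule, and a grid-search ability estimate with a standard normal prior. These are common textbook choices and are not necessarily the models or rules used by the NIH Toolbox itself. The item bank, the function names, and the `answer_fn` callback are all hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def estimate_theta(responses):
    """Posterior-mode ability estimate by grid search over [-4, 4].

    responses is a list of (difficulty, correct) pairs. A standard normal
    prior (an assumption of this sketch) keeps the estimate finite when
    every answer so far is correct or every answer is incorrect.
    """
    best_theta, best_logpost = 0.0, -math.inf
    for step in range(-400, 401):
        theta = step / 100.0
        logpost = -0.5 * theta * theta  # log of the normal prior, up to a constant
        for b, correct in responses:
            p = rasch_prob(theta, b)
            logpost += math.log(p if correct else 1.0 - p)
        if logpost > best_logpost:
            best_theta, best_logpost = theta, logpost
    return best_theta

def run_cat(item_bank, answer_fn, max_items=5):
    """Administer up to max_items items, always choosing the most informative one.

    item_bank maps item id -> calibrated difficulty (hypothetical values).
    answer_fn(item_id, difficulty) returns True for a correct response.
    """
    theta = 0.0                      # start at the population mean
    remaining = dict(item_bank)
    responses, administered = [], []
    while remaining and len(administered) < max_items:
        # Pick the unused item with the highest information at the current estimate.
        item_id = max(remaining, key=lambda i: item_information(theta, remaining[i]))
        b = remaining.pop(item_id)
        responses.append((b, answer_fn(item_id, b)))
        administered.append(item_id)
        theta = estimate_theta(responses)  # update ability after each response
    return theta, administered

if __name__ == "__main__":
    bank = {"easy": -2.0, "medium": 0.0, "hard": 2.0}
    theta, order = run_cat(bank, lambda item_id, b: True, max_items=3)
    print(order, round(theta, 2))
```

Because each item is chosen to be most informative at the current ability estimate, a short adaptive test can reach a precision that would otherwise require a longer fixed-form test, which is the property the paragraph above attributes to IRT-based CAT.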

Early childhood use

NIH Toolbox measure development focused special attention on assessing young children,[14] to ensure that all tests given are developmentally appropriate for ages 3–7. A special team of early childhood assessment consultants was engaged to provide testing guidelines for the very young, to offer input on measure development, and to review all NIH Toolbox measures to ensure they fit the needs of young children.[15][16]

Assessment Center

NIH Toolbox measures were originally administered using Assessment Center, a free, browser-based research management software application where users could access, practice, and then administer NIH Toolbox measures. Assessment Center enabled researchers to create study-specific websites for capturing participant data securely. Studies could include measures within the Assessment Center library as well as custom measures created or entered by the researcher.[12][17][18]

Assessment Center is no longer available, and the NIH Toolbox has since moved to an iPad app, available via the App Store. The iPad version requires a yearly subscription.

References

  1. Gershon RC, Cella D, Fox NA, Havlik RJ, Hendrie HC, Wagster MV. Assessment of neurological and behavioural function: the NIH Toolbox. Lancet Neurol. 2010;9(2):138-139.
  2. Neurology, March 2013; 80(11 Supplement 3)
  4. National Institutes of Health. NIH Toolbox is open. A new set of tools to help scientists measure the ways we think, move, feel and sense the world is ready for use in studies.... NIH Record newsletter, October 26th 2012
  5. Talan, Jamie. New NIH Toolbox Rolled Out for Standardized Behavioral and Clinical Assessment Measures. Neurology Today. 2012; 12(21):7
  6. Pilkonis PA, Choi SW, Salsman JM, et al. Assessment of self-reported negative affect in the NIH Toolbox. Psychiatry Res. 2012 Epub ahead of print.
  7. "Components of the Human Connectome Project - Behavioral Testing - Connectome". 
  8. Zelazo, PD, Anderson, JE, Richler, J, et al. NIH Toolbox Cognition Battery (CB): Measuring executive function and attention. Mono. Soc. Res. Child Dev. 2012;78(4):16-33.
  9. Zelazo, PD, Anderson, JE, Richler, J, et al. NIH Toolbox Cognition Battery (CB): Validation of executive function measures in adults. J. Int. Neuropsychol. Soc. 2013;20(6):620-629.
  10. Rine R, Roberts D, Corbin BA, et al. A new portable tool to screen vestibular and visual function in children and adults: NIH Toolbox. J. Rehabil. Res. Dev. 2012;49(2):209-220.
  11. Gershon R. Understanding Rasch Measurement: Computer Adaptive Testing. J. Appl. Meas. 2005;6(1):109-127.
  12. Gershon RC, Cook K. Use of Computer Adaptive Testing in the Development of Machine Learning Algorithms. Pain Med. 2011;12(10):1450-1452.
  13. Cella D, Chang CH, Heinemann AW. Item Response Theory (IRT): Applications in quality of life measurement, analysis, and interpretation. In: Mesbah M, Cole B, Lee MLT, eds. Statistical Methods for Quality of Life Studies: Design, Measurements, and Analysis. Boston, MA: Kluwer Academic Publishers; 2002:169-186.
  14. Dalton P, Mennella JA, Cowart BJ, Maute C, Pribitkin EA, Reilly JS. Evaluating the Prevalence of Olfactory Dysfunction in a Pediatric Population. Ann. N. Y. Acad. Sci. 2009;1170(1):537-542.
  15. Bauer P, Leventon J, Varga N. Neuropsychological Assessment of Memory in Preschoolers. Neuropsychol. Rev. 2012;Epub ahead of print
  16. McClelland MM, Cameron CE. Self-regulation and academic achievement in elementary school children. New Dir. Child Adolesc. Dev. 2011;2011(133):29-44.
  17. Gershon R, Rothrock NE, Hanrahan RT, Jansky LJ, Harniss M, Riley W. The development of a clinical outcomes survey research application: Assessment CenterSM. Qual. Life Res. 2010;19(5):677-685.
  18. Gershon R, Cella D, Rothrock N, Hanrahan RT, Bass M. The Use of PROMIS and Assessment Center to Deliver Patient-Reported Outcome Measures in Clinical Research. J. Appl. Meas. 2010;11(3):304-314.
