Investigating item response theory model performance in the context of evaluating clinical outcome assessments in clinical trials.
| Title: | Investigating item response theory model performance in the context of evaluating clinical outcome assessments in clinical trials. |
|---|---|
| Authors: | Ayasse ND; Clinical Outcome Assessment Program, Critical Path Institute, Tucson, AZ, USA. cayasse@c-path.org.; Coon CD; Clinical Outcome Assessment Program, Critical Path Institute, Tucson, AZ, USA. |
| Source: | Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation [Qual Life Res] 2025 Apr; Vol. 34 (4), pp. 1125-1136. Date of Electronic Publication: 2024 Dec 12. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: Springer Netherlands Country of Publication: Netherlands NLM ID: 9210257 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1573-2649 (Electronic) Linking ISSN: 09629343 NLM ISO Abbreviation: Qual Life Res Subsets: MEDLINE |
| Imprint Name(s): | Publication: 2005- : Netherlands : Springer Netherlands; Original Publication: Oxford, UK : Rapid Communications of Oxford, Ltd, c1992- |
| MeSH Terms: | Psychometrics*/methods ; Outcome Assessment, Health Care*/methods ; Clinical Trials as Topic* ; Models, Statistical*; Humans ; Quality of Life |
| Abstract: | Purpose: Item response theory (IRT) models are an increasingly popular method choice for evaluating clinical outcome assessments (COAs) for use in clinical trials. Given common constraints in clinical trial design, such as limits on sample size and assessment lengths, the current study aimed to examine the appropriateness of commonly used polytomous IRT models, specifically the graded response model (GRM) and partial credit model (PCM), in the context of how they are frequently used for psychometric evaluation of COAs in clinical trials. Methods: Data were simulated under varying sample sizes, measure lengths, response category numbers, and slope strengths, as well as under conditions that violated some model assumptions, namely, unidimensionality and equality of item slopes. Model fit, detection of item local dependence, and detection of item misfit were all examined to identify conditions where one model may be preferable or results may contain a degree of bias. Results: For unidimensional item sets and equal item slopes, the PCM and GRM performed similarly, and GRM performance remained consistent as slope variability increased. For not-unidimensional item sets, the PCM was somewhat more sensitive to this unidimensionality violation. Looking across conditions, the PCM did not demonstrate a clear advantage over the GRM for small sample sizes or shorter measure lengths. Conclusion: Overall, the GRM and the PCM each demonstrated advantages and disadvantages depending on underlying data conditions and the model outcome investigated. We recommend careful consideration of the known, or expected, data characteristics when choosing a model and interpreting its results. (© 2024. The Author(s), under exclusive licence to Springer Nature Switzerland AG.) |
| Competing Interests: | Declarations. Competing interests: Authors N.D.A. and C.D.C. declare they have no known relevant financial or non-financial interests to disclose. |
| Contributed Indexing: | Keywords: Clinical outcome assessments (COAs); Clinical trials; Item response theory; Patient-reported outcomes (PROs); Psychometric validation |
| Entry Date(s): | Date Created: 20241212 Date Completed: 20250409 Latest Revision: 20250409 |
| Update Code: | 20260130 |
| DOI: | 10.1007/s11136-024-03873-z |
| PMID: | 39666253 |
| Database: | MEDLINE |
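
The abstract contrasts two polytomous IRT models: the GRM, which lets each item have its own slope, and the PCM, which assumes a common slope across items. The sketch below shows how category probabilities are commonly written for each model and how item responses could be simulated from them. It is an illustration only and not the authors' simulation code. The function names (`grm_probs`, `pcm_probs`, `simulate`), the parameter values, and the standard normal trait distribution are all assumptions made for this example.

```python
import math
import random


def grm_probs(theta, slope, thresholds):
    """Graded response model: category probabilities for one item.

    thresholds must be increasing; returns len(thresholds) + 1 values.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 and 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def pcm_probs(theta, steps, slope=1.0):
    """Partial credit model: category probabilities for one item.

    slope is shared by every item (the equal-slopes assumption).
    """
    # Running sums of slope * (theta - step); category 0 has exponent 0.
    exponents = [0.0]
    for d in steps:
        exponents.append(exponents[-1] + slope * (theta - d))
    peak = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(n_persons, item_params, prob_fn, seed=0):
    """Draw one response per person and item (hypothetical helper).

    item_params holds one tuple per item; it is unpacked into prob_fn
    after theta. Latent traits are drawn from N(0, 1) as an assumption.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for params in item_params:
            probs = prob_fn(theta, *params)
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return data


# Illustrative parameters only: three 4-category items per model.
grm_items = [(0.8, [-1.0, 0.0, 1.0]),
             (1.5, [-0.5, 0.5, 1.5]),
             (2.2, [-1.5, -0.5, 0.5])]
pcm_items = [([-1.0, 0.0, 1.0],),
             ([-0.5, 0.5, 1.5],),
             ([-1.5, -0.5, 0.5],)]

grm_data = simulate(200, grm_items, grm_probs, seed=1)
pcm_data = simulate(200, pcm_items, pcm_probs, seed=1)
```

Varying `n_persons`, the number of items, the number of thresholds, and the spread of the GRM slopes corresponds loosely to the sample size, measure length, response category, and slope conditions named in the Methods. Fitting the models and computing fit statistics would need a dedicated IRT package and is outside this sketch.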