Quantitative studies

Click on the links below for information about specific critical appraisal tools for the types of studies listed.

 

Randomized controlled trial (RCT)

Cochrane Risk of Bias (RoB)

Study designs: RCT
Number of items: 6 domains of bias and 7 items (selection bias, performance bias, detection bias, attrition bias, reporting bias, and other bias).
Rating: Yes, no, unclear
Validity: Tool developed by the Cochrane Collaboration’s methods groups, which gathered 16 experts (statisticians, epidemiologists, and review authors) and used informal consensus and email iterations.

Studies on concurrent validity:
• Hartling, L., Ospina, M., Liang, Y., Dryden, D. M., Hooton, N., Krebs Seida, J., et al. (2009). Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. British Medical Journal, 339, b4012.
• Armijo-Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18.
Reliability: Studies on interrater reliability:
• Armijo-Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18.
• Armijo-Olivo, S., Ospina, M., da Costa, B. R., Egger, M., Saltaji, H., Fuentes, J., et al. (2014). Poor Reliability between Cochrane Reviewers and Blinded External Reviewers When Applying the Cochrane Risk of Bias Tool in Physical Therapy Trials. PLoS ONE, 9(5), e96920.
• Hartling, L., Hamm, M. P., Milne, A., Vandermeer, B., Santaguida, P. L., Ansari, M., et al. (2013a). Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. Journal of Clinical Epidemiology, 66(9), 973-981.
• Vale, C. L., Tierney, J. F., & Burdett, S. (2013). Can trial quality be reliably assessed from published reports of cancer trials: evaluation of risk of bias assessments in systematic reviews. British Medical Journal, 346, f1798.
Other information: https://www.riskofbias.info/welcome/rob-2-0-tool
Main references: Sterne, J. A. C., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., et al. (2019). RoB 2: a revised tool for assessing risk of bias in randomised trials. British Medical Journal, 366, l4898.

PEDro

Study designs: RCT
Number of items: 11
Rating: Yes, no
Validity: Tool adapted from the Delphi List tool.

Studies on validity:
• Armijo-Olivo, S., da Costa, B. R., Cummings, G. G., Ha, C., Fuentes, J., Saltaji, H., & Egger, M. (2015). PEDro or Cochrane to Assess the Quality of Clinical Trials? A Meta-Epidemiological Study. PloS one, 10(7), e0132634-e0132634.
• Aubut, J.-A. L., Marshall, S., Bayley, M., & Teasell, R. W. (2013). A comparison of the PEDro and Downs and Black quality assessment tools using the acquired brain injury intervention literature. NeuroRehabilitation, 32(1), 95-102.
• Bhogal, S. K., Teasell, R. W., Foley, N. C., & Speechley, M. R. (2005). The PEDro scale provides a more comprehensive measure of methodological quality than the Jadad Scale in stroke rehabilitation literature. Journal of Clinical Epidemiology, 58(7), 668-673.
• de Morton, N. A. (2009). The PEDro scale is a valid measure of the methodological quality of clinical trials: a demographic study. Australian Journal of Physiotherapy, 55(2), 129-133.
• Yamato, T. P., Maher, C., Koes, B., & Moseley, A. (2017). The PEDro scale had acceptably high convergent validity, construct validity, and interrater reliability in evaluating methodological quality of pharmaceutical trials. Journal of Clinical Epidemiology, 86, 176-181.
Reliability: Studies on interrater reliability:
• Foley, N. C., Bhogal, S. K., Teasell, R. W., Bureau, Y., & Speechley, M. R. (2006). Estimates of quality and reliability with the physiotherapy evidence-based database scale to assess the methodology of randomized controlled trials of pharmacological and nonpharmacological interventions. Physical Therapy, 86(6), 817-824.
• Maher, C. G., Sherrington, C., Herbert, R. D., Moseley, A. M., & Elkins, M. (2003). Reliability of the PEDro Scale for Rating Quality of Randomized Controlled Trials. Physical Therapy, 83(8), 713-721.
• Moseley, A., Sherrington, C., Herbert, R. and Maher, C. (1999). Reliability of a scale for measuring the methodological quality of clinical trials. Proceedings of the Cochrane Colloquium, Rome, October 1999.
• Yamato, T. P., Maher, C., Koes, B., & Moseley, A. (2017). The PEDro scale had acceptably high convergent validity, construct validity, and interrater reliability in evaluating methodological quality of pharmaceutical trials. Journal of Clinical Epidemiology, 86, 176-181.
Other information: https://www.pedro.org.au/english/downloads/pedro-scale/
Main references: Sherrington, C., Herbert, R., Maher, C., & Moseley, A. (2000). PEDro. A database of randomized trials and systematic reviews in physiotherapy. Manual Therapy, 5(4), 223-226.
 

Non-randomized studies

ROBINS-I (Risk Of Bias In Non-randomised Studies – of Interventions)

Study designs: Non-randomized studies of interventions
Number of items: 34 signalling questions on 7 domains of bias (confounding, selection of participants into the study, classification of the interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of the reported result).
Rating: Yes, probably yes, no, probably no, no information
Validity: Tool developed through expert consensus meetings of the Cochrane Review group. The preliminary version was piloted within the working groups (Sterne et al., 2016).
Reliability: Studies on interrater reliability:
• Couto, E., Pike, E., Torkilseng, E. B., & Klemp, M. (2015). Inter-rater reliability of the Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions (ACROBAT-NRSI). Paper presented at the 2015 Cochrane Colloquium Vienna.
• Losilla, J.-M., Oliveras, I., Marin-Garcia, J. A., & Vives, J. (2018). Three risk of bias tools lead to opposite conclusions in observational research synthesis. Journal of Clinical Epidemiology, 101, 61-72.
Other information: https://www.riskofbias.info/welcome/home
Main references: Sterne, J. A., Hernán, M. A., Reeves, B. C., Savović, J., Berkman, N. D., Viswanathan, M., et al. (2016). ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. British Medical Journal, 355, i4919.

ROBANS (Risk of Bias Assessment tool for Non-randomized Studies)

Study designs: Non-randomized studies of interventions
Number of items: 6 domains for risk of bias.
Rating: Low, high, unclear risk of bias
Validity: Tool developed from a literature review and advice from experts. Correlations with another tool (MINORS), effect size, conflicts of interest, funding sources, and journal impact factors were calculated. Also, 8 experts completed a 7-point Likert scale survey (measuring the discrimination power, number of response options, existence of redundant items, need for subjective decisions, wide applicability, presence of adequate instructions, clarity and simplicity, and comprehensiveness) (Kim et al., 2013).
Reliability: Three raters appraised 39 studies. The agreement ranged from fair (k=0.35) to substantial (k=0.74) (Kim et al., 2013).
Other information: N/A
Main references: Kim, S. Y., Park, J. E., Lee, Y. J., Seo, H.-J., Sheen, S.-S., Hahn, S., et al. (2013). Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity. Journal of Clinical Epidemiology, 66(4), 408-414.
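Many of the reliability results quoted in this guide are reported as Cohen's kappa, which corrects the raw percentage of rater agreement for the agreement expected by chance alone. As a minimal illustrative sketch (not part of any tool listed here), kappa for two raters can be computed as follows:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters judging the same set of items.

    ratings_a, ratings_b: equal-length sequences of category labels,
    e.g. the "yes"/"no"/"unclear" judgments used by many of the tools.
    """
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: proportion of items where the raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

On this scale, values around 0.21-0.40 are conventionally read as fair agreement and 0.61-0.80 as substantial, which is how a range such as the ROBANS result of k=0.35 to k=0.74 is labelled above.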

EPHPP (Effective Public Health Practice Project quality assessment tool)

Study designs: Tool for appraising different designs of intervention studies for public health services.
Number of items: 20 questions on 8 categories (selection bias, study design, confounders, blinding, data collection and methods, withdrawals and drop-outs, intervention integrity, analysis).
Rating: Different scales
Validity: Tool developed from a review of available instruments, feedback from 6 experts, and comparison with another instrument (Thomas et al., 2004).

Reliability: Studies on interrater reliability:
• Armijo‐Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18.
Other information: https://merst.ca/ephpp/
Main references:
• Thomas, B., Ciliska, D., Dobbins, M., & Micucci, S. (2004). A process for systematically reviewing the literature: providing the research evidence for public health nursing interventions. Worldviews on Evidence‐Based Nursing, 1(3), 176-184.
• Thomas, H. (2003). Quality assessment tool for quantitative studies. Effective Public Health Practice Project. McMaster University, Toronto.

DIAD (Design and Implementation Assessment Device)

Study designs: Intervention studies
Number of items: 4 global questions, 8 composite questions, and 32-34 design and implementation questions.
Rating: Different scales
Validity: A preliminary version was reviewed by 14 research methodologists (Valentine & Cooper, 2008). Input on the tool was also sought during a public meeting and through the web.
Reliability: Five raters participated in a pilot study in which 12 studies were appraised (Valentine & Cooper, 2008). The results were: 47% of ratings were in complete agreement, 28% were in good agreement, 13% were considered disagreements, and 12% were categorized as bad disagreements.
Other information: N/A
Main references: Valentine, J. C., & Cooper, H. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: The Study Design and Implementation Assessment Device (Study DIAD). Psychological Methods, 13(2), 130-149.

SAQOR (Systematic Appraisal of Quality for Observational Research)

Study designs: Observational studies
Number of items: 19 items on 5 categories (sample, control/comparison group, quality of measurement(s) and outcome(s), follow-up, and distorting influences).
Rating: Yes, no, unclear, N/A
Validity: SAQOR was adapted from existing tools in consultation with advisory committee members, experts in epidemiology, and the literature on observational studies. The tool was revised and adjusted based on feasibility testing with several studies selected at random (Ross et al., 2011).
Reliability: Two raters appraised 82 studies. The authors mention that a research team not involved in the tool development assessed inter-rater reliability and achieved over 80% agreement (Ross et al., 2011).
Other information: N/A
Main references: Ross, L., Grigoriadis, S., Mamisashvili, L., Koren, G., Steiner, M., Dennis, C. L., et al. (2011). Quality assessment of observational studies in psychiatry: an example from perinatal psychiatric research. International Journal of Methods in Psychiatric Research, 20(4), 224-234.

EAI (Epidemiological Appraisal Instrument)

Study designs: Tool for epidemiological studies, including cohort (prospective and retrospective), intervention (randomized and non-randomized), case-control, cross-sectional, and hybrid (e.g. nested case-control) designs.
Number of items: 43 items on five categories (reporting, subject/record selection, measurement quality, data analysis, generalization of results).
Rating: Yes (2), partial (1), no or unable to determine (0), not applicable
Validity: Tool developed from epidemiological principles and existing checklists. The pilot version was discussed in several meetings among the research team over a period of six months. The members of the research team evaluated two articles each (degree of agreement = 59%) and further refined the tool and instructions. Testing of the EAI demonstrated results comparable to those obtained with the Downs and Black (1998) checklist (Genaidy et al., 2007).
Reliability: 25 students were asked to appraise one paper with the EAI. The degree of agreement of each rater with the team leader (an expert in epidemiology) was calculated; the average overall degree of agreement was 59% and the Spearman correlation coefficient was 0.66. Also, the internal consistency was calculated for each scale and compared to the values found in the first part of the pilot study. Two raters appraised 15 papers and weighted kappa values ranged from 0.80 to 1.00 (Genaidy et al., 2007).
Other information: N/A
Main references: Genaidy, A., Lemasters, G., Lockey, J., Succop, P., Deddens, J., Sobeih, T., et al. (2007). An epidemiological appraisal instrument–a tool for evaluation of epidemiological studies. Ergonomics, 50(6), 920-960.

QUIPS (Quality In Prognosis Studies tool)

Study designs: Prognosis studies
Number of items: 6 bias domains
Rating: Yes, partly, no, unsure
Validity: Fourteen working group members, including epidemiologists, statisticians, and clinicians, collaborated in tool development through a modified Delphi approach and nominal group techniques. The tool was discussed and refined during two in-person workshops. Forty-three research teams provided feedback on the QUIPS through a structured Web-based survey (Hayden et al., 2013).
Reliability: Interrater agreement was reported by 9 review teams on 205 studies and varied between 70% and 89.5% (median, 83.5%). The kappa statistic for independent rating of QUIPS items was reported by 9 review teams on 159 studies and varied from 0.56 to 0.82 (median, 0.75) (Hayden et al., 2013).
Other information: N/A
Main references: Hayden, J. A., van der Windt, D. A., Cartwright, J. L., Côté, P., & Bombardier, C. (2013). Assessing bias in studies of prognostic factors. Annals of Internal Medicine, 158(4), 280-286.

Q-Coh (Quality of Cohort studies)

Study designs: Cohort studies
Number of items: 26 items and 7 inferences on 7 domains (representativeness, comparability of the groups at the beginning of the study, quality of the exposure measure, maintenance of the comparability during the follow-up time, quality of the outcome measure, attrition, and statistical analyses).
Rating: Different scales
Validity: Tool developed from a systematic review of critical appraisal tools for non-randomized studies. The pilot version was applied to 3 studies by 3 raters. The agreement between raters on the global quality and external ratings was moderate (k=0.41). The authors found an inverse association between the external ratings and the number of domains (Jarde et al., 2013).
Reliability: Three raters appraised 21 articles and the agreement ranged from fair to substantial (k=0.60 to 0.87) (Jarde et al., 2013).

Other reliability studies:
• Losilla, J.-M., Oliveras, I., Marin-Garcia, J. A., & Vives, J. (2018). Three risk of bias tools lead to opposite conclusions in observational research synthesis. Journal of Clinical Epidemiology, 101, 61-72.
Other information: N/A
Main references: Jarde, A., Losilla, J.-M., Vives, J., & Rodrigo, M. F. (2013). Q-Coh: a tool to screen the methodological quality of cohort studies in systematic reviews and meta-analyses. International Journal of Clinical and Health Psychology, 13(2), 138-146.

NOS (Newcastle Ottawa Scale)

Study designs: Case-control and cohort studies
Number of items: 8 items for case-control studies and 8 items for cohort studies
Rating: Different scales
Validity: This tool was developed through a collaboration between the Universities of Newcastle, Australia, and Ottawa, Canada. The clarity and completeness of the items were reviewed by experts in the field (Wells et al., 2000).

Studies on validity:
• Cook, D. A., & Reed, D. A. (2015). Appraising the Quality of Medical Education Research Methods: The Medical Education Research Study Quality Instrument and the Newcastle–Ottawa Scale-Education. Academic Medicine, 90(8), 1067-1076.
• Lo, C. K.-L., Mertz, D., & Loeb, M. (2014). Newcastle-Ottawa Scale: comparing reviewers’ to authors’ assessments. BMC Medical Research Methodology, 14(1), 1.
• Stang, A. (2010). Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. European Journal of Epidemiology, 25(9), 603-605.
Reliability: Studies on reliability:
• Cook, D. A., & Reed, D. A. (2015). Appraising the Quality of Medical Education Research Methods: The Medical Education Research Study Quality Instrument and the Newcastle–Ottawa Scale-Education. Academic Medicine, 90(8), 1067-1076.
• Hartling, L., Milne, A., Hamm, M. P., Vandermeer, B., Ansari, M., Tsertsvadze, A., & Dryden, D. M. (2013). Testing the Newcastle Ottawa Scale showed low reliability between individual reviewers. Journal of Clinical Epidemiology, 66(9), 982-993.
• Lo, C. K.-L., Mertz, D., & Loeb, M. (2014). Newcastle-Ottawa Scale: comparing reviewers’ to authors’ assessments. BMC Medical Research Methodology, 14(1), 1.
• Margulis, A. V., Pladevall, M., Riera-Guardia, N., Varas-lorenzo, C., Hazell, L., Berkman, N. D., et al. (2014). Quality assessment of observational studies in a drug-safety systematic review, comparison of two tools: the Newcastle–Ottawa scale and the RTI item bank. Clinical Epidemiology, 6, 359-368.
• Oremus, M., Oremus, C., Hall, G. B., McKinnon, M. C., & the ECT and Cognition Systematic Review Team. (2012). Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales. BMJ Open, 2(4), e001368.
Other information: http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp
Main references: Wells, G., Shea, B., O'Connell, D., Peterson, J., Welch, V., Losos, M., et al. (2000). The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Retrieved April 16, 2016, from http://www.ohri.ca/programs/clinical_epidemiology/nosgen.pdf.

MINORS (Methodological Index for Non-Randomized Studies)

Study designs: Non-randomized studies
Number of items: 12
Rating: 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate)
Validity: Tool developed based on the findings from a survey of 90 experts who were asked to rate, on a 7-point scale, the ability of each item to assess the quality of a study. Discriminant validity was tested (Slim et al., 2003).
Reliability: Inter-rater reliability was assessed by having 2 raters appraise 80 studies; kappa ranged from 0.56 to 1.00 across items. Internal consistency, assessed with Cronbach's alpha, was considered good by the authors (0.73). Test-retest reliability was assessed by having 30 articles scored twice by the same raters (2-month interval); kappa ranged from 0.59 to 1.00 across items (Slim et al., 2003).
Other information: N/A
Main references: Slim, K., Nini, E., Forestier, D., Kwiatkowski, F., Panis, Y., & Chipponi, J. (2003). Methodological index for non‐randomized studies (MINORS): development and validation of a new instrument. ANZ Journal of Surgery, 73(9), 712-716.

MEVORECH (Methodological Evaluation of Observational Research)

Study designs: Tool for observational studies of risk factors of chronic diseases.
Number of items: 6 criteria for external validity, 13 for internal validity, and 2 aspects of causality.
Rating: Different response choices
Validity: Tool developed based on a literature review of observational nontherapeutic studies and of tools for quality assessment of observational studies. Face/content and discriminant validity were tested by experts (Shamliyan et al., 2011).
Reliability: Interrater reliability was pilot tested by experts (Shamliyan et al., 2011).
Other information: N/A
Main references: Shamliyan, T. A., Kane, R. L., Ansari, M. T., Raman, G., Berkman, N. D., Grant, M., et al. (2011). Development quality criteria to evaluate nontherapeutic studies of incidence, prevalence, or risk factors of chronic diseases: pilot study of new checklists. Journal of Clinical Epidemiology, 64(6), 637-657.

MORE (Methodological Evaluation of Observational Research)

Study designs: Tool developed for observational studies of incidence or prevalence of chronic diseases
Number of items: 6 criteria for external validity and 5 for internal validity
Rating: Different response choices
Validity: Face/content and discriminant validity were tested by experts (Shamliyan et al., 2011).
Reliability: Interrater reliability was pilot tested by experts (Shamliyan et al., 2011).
Other information: N/A
Main references: Shamliyan, T. A., Kane, R. L., Ansari, M. T., Raman, G., Berkman, N. D., Grant, M., et al. (2011). Development quality criteria to evaluate nontherapeutic studies of incidence, prevalence, or risk factors of chronic diseases: pilot study of new checklists. Journal of Clinical Epidemiology, 64(6), 637-657.

RTI-Item Bank (Research Triangle Institute – Item Bank)

Study designs: Tool developed to appraise the quality of studies examining the outcomes of interventions or exposures (cohort, case-control, case-series, and cross-sectional studies).
Number of items: 29 items on 12 domains (background/context, sample definition and selection, interventions/exposure, outcomes, creation of treatment groups, blinding, soundness of information, follow-up, analysis comparability, analysis outcome, interpretation, and presentation and reporting).
Rating: Different scales
Validity: The tool was developed from a literature review of existing tools, from which 60 items were selected. Sixteen experts provided input on each item. Then, nine potential users participated in cognitive testing of the readability, sufficiency, and appropriateness of the questions. Content validity was tested with seven raters who rated how essential each item was (Viswanathan & Berkman, 2012).
Reliability: Twelve raters appraised 10 studies. The mean percent agreement between raters was 66% (range: 56% to 90%) (Viswanathan & Berkman, 2012).
Other information: N/A
Main references: Viswanathan, M., & Berkman, N. D. (2012). Development of the RTI item bank on risk of bias and precision of observational studies. Journal of Clinical Epidemiology, 65(2), 163-178.
 

Single case experiment design

RoBiNT (Risk of Bias in N-of-1 Trials)

Study designs: Single-case experimental design (or n-of-1 trial)
Number of items: 15
Rating: 0, 1, or 2
Validity: The SCED, the RoBiNT's predecessor, was developed from items generated from a literature review on the key features of single-case methodology. The tool's content validity and utility were empirically tested against 85 published single-subject reports (Tate et al., 2008).
Reliability: The inter-rater reliability of the RoBiNT was tested with 2 experienced raters and 2 novice raters appraising 20 papers. The agreement for the total score was excellent, both for experienced raters (overall ICC = 0.90) and novice raters (overall ICC = 0.88) (Tate et al., 2013).
Other information: This tool is an update of the SCED (Single-Case Experimental Design Scale).
Main references:
• Tate, R. L., Perdices, M., Rosenkoetter, U., Wakim, D., Godbee, K., Togher, L., et al. (2013). Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychological Rehabilitation, 23(5), 619-638.
• Tate, R. L., McDonald, S., Perdices, M., Togher, L., Schultz, R., & Savage, S. (2008). Rating the methodological quality of single-subject designs and n-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation, 18(4), 385-401.
 

Case series

IHE QA (Institute of Health Economics Quality Assessment)

Study designs: Case series studies with a before-and-after comparison
Number of items: 20
Rating: Yes, partial/unclear, no
Validity: The tool was developed from the findings of a 4-round e-Delphi study with seven health technology assessment researchers. 105 studies were identified and six raters each appraised 35 studies. A principal component analysis was conducted to examine the interrelationships among the criteria and identify clusters of criteria (Guo et al., 2016; Moga et al., 2012).
Reliability: The preliminary version was used by three raters who appraised 13 studies. Moderate to substantial agreement was found (Moga et al., 2012). The final version was used by two raters who appraised seven studies (results not reported) (Guo et al., 2016).
Other information: https://www.ihe.ca/publications/ihe-quality-appraisal-checklist-for-case-series-studies
Main references:
• Guo, B., Moga, C., Harstall, C., & Schopflocher, D. (2016). A principal component analysis is conducted for a case series quality appraisal checklist. Journal of Clinical Epidemiology, 69, 199-207.e192.
• Moga, C., Guo, B., Schopflocher, D., & Harstall, C. (2012). Development of a quality appraisal tool for case series studies using a modified Delphi technique. Edmonton, AB: Institute of Health Economics.

Instrument for Evaluating the Quality of Case Series Studies in Chinese Herbal Medicine

Study designs: Tool to assess the quality of case series studies on herbal medicines.
Number of items: 13 items on 4 factors (study aims and design, description of treatment protocol, description of methods and therapeutic/side-effects, and conduct of the study)
Rating: 0 or 1
Validity: Tool developed from a Delphi study with 7 experts. Five raters piloted the tool with 12 studies and commented on the wording and sequence. Using factorial analysis (PCA with varimax rotation), four factors were identified (Yang et al., 2009).
Reliability: Twenty raters appraised 35 studies. The internal consistency and interrater reliability were good (Cronbach's alpha between 0.80 and 0.85 and ICC of 0.904) (Yang et al., 2009).
Other information: N/A
Main references: Yang, A. W., Li, C. G., Da Costa, C., Allan, G., Reece, J., & Xue, C. C. (2009). Assessing quality of case series studies: development and validation of an instrument by herbal medicine CAM researchers. The Journal of Alternative and Complementary Medicine, 15(5), 513-522.
 

Diagnostic accuracy studies

QAREL (Quality Appraisal tool for studies of diagnostic RELiability)

Study designs: Studies of diagnostic reliability
Number of items: 11
Rating: Yes, no, unclear, N/A
Validity: Tool developed based on epidemiologic principles, existing quality appraisal checklists, and the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) resources. Three experts in diagnosis research provided feedback throughout the development of the tool (Lucas et al., 2010).
Reliability: Three reviewers independently appraised 29 articles. The agreement ranged from fair (k=0.27) to good (k=0.92) across the items (Lucas et al., 2013).
Other information: N/A
Main references:
• Lucas, N., Macaskill, P., Irwig, L., Moran, R., Rickards, L., Turner, R., et al. (2013). The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL). BMC Medical Research Methodology, 13(1), 111.
• Lucas, N. P., Macaskill, P., Irwig, L., & Bogduk, N. (2010). The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). Journal of Clinical Epidemiology, 63(8), 854-861.

QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)

Study designs: Diagnostic accuracy studies
Number of items: 4 key domains of bias (patient selection, index test, reference standard, and flow and timing). Each domain has a set of signalling questions to help reach the judgments regarding bias and applicability.
Rating: Low, high, or unclear risk of bias.
Validity: The scope of the tool was defined by a group of 9 experts in diagnosis research. Then, four reviews were conducted to inform the topics discussed during a face-to-face consensus meeting with 24 experts. The tool was refined through piloting using online questionnaires (Whiting et al., 2011).
Reliability: Pairs of reviewers piloted the tool in 5 reviews and interrater reliability varied considerably (Whiting et al., 2011).
Other information: Previous version (QUADAS) developed in 2003; QUADAS-2 developed in 2011.
Main references:
QUADAS-2:
• Whiting, P. F., Rutjes, A. W., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., et al. (2011). QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine, 155(8), 529-536.

QUADAS:
• Hollingworth, W., Medina, L. S., Lenkinski, R. E., Shibata, D. K., Bernal, B., Zurakowski, D., et al (2006). Interrater reliability in assessing quality of diagnostic accuracy studies using the QUADAS tool: a preliminary assessment. Academic Radiology, 13(7), 803-810.
• Mann, R., Hewitt, C. E., & Gilbody, S. M. (2009). Assessing the quality of diagnostic studies using psychometric instruments: applying QUADAS. Social Psychiatry and Psychiatric Epidemiology, 44(4), 300.
• Whiting, P., Rutjes, A. W., Reitsma, J. B., Bossuyt, P. M., & Kleijnen, J. (2003). The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Medical Research Methodology, 3(25), 1-13.
• Whiting, P. F., Weswood, M. E., Rutjes, A. W., Reitsma, J. B., Bossuyt, P. N., & Kleijnen, J. (2006). Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Medical Research Methodology, 6(9), 1-8.

PROBAST (Prediction model Risk Of Bias ASsessment Tool)

Study designs: Diagnostic and prognostic prediction model studies
Number of items: 20 signalling questions on 4 domains (participants, predictors, outcome, and analysis – 2 to 9 in each domain)
Rating: Yes, probably yes, probably no, no, no information
Validity: The tool was developed from a Delphi study with 38 experts and a literature review. The tool was piloted and refined during workshops at conferences and with graduate students, as well as with 50 review groups (Wolff et al., 2019).
Reliability: N/A
Other information: http://development.probast.org
Main references:
• Moons, K. G., Wolff, R. F., Riley, R. D., Whiting, P. F., Westwood, M., Collins, G. S., et al. (2019). PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Annals of Internal Medicine, 170(1), W1-W33.
• Wolff, R. F., Moons, K. G. M., Riley, R. D., Whiting, P. F., Westwood, M., Collins, G. S., et al. (2019). PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Annals of Internal Medicine, 170(1), 51-58.
 

Cross-sectional

AXIS tool (Appraisal tool for Cross-Sectional Studies)

Study designs: Cross-sectional studies
Number of items: 20
Rating: Yes, no, do not know
Validity: The tool was developed from a literature review on critical appraisal tools for cross-sectional studies. It was piloted with researchers involved in a systematic review, in journal clubs, and in research meetings. A Delphi study with experts was conducted on the important components to include in the tool (Downes et al., 2016).
Reliability: N/A
Other information: N/A
Main references: Downes, M. J., Brennan, M. L., Williams, H. C., & Dean, R. S. (2016). Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open, 6(12), e011458.
 

Other

MERSQI (Medical Education Research Study Quality Instrument)

Study designs: Tool developed in the field of medical education and designed for experimental, quasi-experimental, and observational studies.
Number of items: 10 items on 6 domains (study design, sampling, type of data (subjective or objective), validity, data analysis, and outcomes)
Rating: A maximal score of 3 for each domain.
Validity: Tool developed from a literature review and from discussion and revision among the authors. The tool's dimensionality was examined using factorial analysis (PCA with orthogonal rotation). Criterion validity was tested by comparing MERSQI scores with global quality ratings (1 to 5) from two independent experts on 50 studies. The associations between MERSQI scores and journal impact factors and citation rates were also measured. Total MERSQI scores were associated with expert quality ratings, 3-year citation rate, and journal impact factor. In multivariate analysis, MERSQI scores were independently associated with study funding of $20,000 or more and previous medical education publications by the first author (Reed et al., 2007).
Reliability: Pairs of raters appraised 210 papers, and each study was reappraised by the same rater between 3 and 5 months after the first rating. The ICC ranges for interrater and test-retest reliability were 0.72 to 0.98 and 0.78 to 0.998, respectively. The Cronbach alpha (internal consistency) for the overall MERSQI was 0.6 (Reed et al., 2007).
Other information: N/A
Main references: Reed, D. A., Cook, D. A., Beckman, T. J., Levine, R. B., Kern, D. E., & Wright, S. M. (2007). Association between funding and quality of published medical education research. Journal of the American Medical Association, 298(9), 1002-1009.
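Several entries above report internal consistency as Cronbach's alpha, which relates the variances of individual items to the variance of the total score. As a minimal illustrative sketch (not code from any of the studies cited here), alpha can be computed from per-item scores like this:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items scored across the same studies.

    item_scores: one inner list per item; each inner list holds that
    item's score for every appraised study (all the same length).
    """
    k = len(item_scores)
    n = len(item_scores[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Sum of the variances of the individual items.
    sum_item_vars = sum(sample_var(item) for item in item_scores)
    # Variance of each study's total score summed over all items.
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / sample_var(totals))
```

Items that vary together push alpha toward 1, while unrelated items pull it toward 0, which is why a value such as the 0.73 reported for MINORS is read as good consistency and the 0.6 for the overall MERSQI as more modest.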