COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error of outcome measurement instrument

user manual

Version 1.0 dated January 2021

Lidwine B Mokkink
Maarten Boers
Cees van der Vleuten
Donald L Patrick
Jordi Alonso
Lex M Bouter
Henrica CW de Vet
Caroline B Terwee

Contact
LB Mokkink, PhD
Amsterdam UMC, Vrije Universiteit Amsterdam, Department of Epidemiology and Data Science
Amsterdam Public Health research institute
De Boelelaan 1117, 1081 BT Amsterdam
The Netherlands
Website: www.cosmin.nl
E-mail: [email protected]
The development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error was part of the VENI programme with project number 91617098, funded by ZonMw (The Netherlands Organisation for Health Research and Development).
Table of Contents

Foreword
1. Background information
1.1 COSMIN initiative and steering committee
1.2 How to cite this manual
1.3 Development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error
1.4 Definitions of reliability and measurement error
1.5 Focus of the COSMIN Risk of Bias tool
1.6 The structure of the COSMIN Risk of Bias tool
1.7 The “worst-score-counts” method
1.8 Relevance of the research question
1.9 Using the COSMIN Risk of Bias tool in a systematic review
1.10 Expertise required for using the tool
1.11 Using the COSMIN Risk of Bias tool to assess studies on PROMs or ObsROMs
1.12 A Risk of Bias tool is not a study design checklist, nor a reporting guideline
2. Part A. Understanding how a study informs us about the reliability and measurement error of an outcome measurement instrument
2.1 Components of outcome measurement instruments
2.2 Extracting the elements of a comprehensive research question
2.3 Example of how to use Part A of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)
3. Part B. Assessing the risk of bias of a study on reliability or measurement error
3.1 Elaboration on standards for studies on reliability
3.2 Elaboration on standards for studies on measurement error
3.3 Example of how to use Part B of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)
4. Using the COSMIN Risk of Bias tool in a systematic review of outcome measurement instruments
4.1 The eleven-step procedure for conducting a systematic review of ClinROMs, PerFOMs, or laboratory values
Appendix 1. Data Extraction table of relevant information for each included study in a systematic review
Appendix 2. Risk of Bias ratings per standard per study
Appendix 3. Example of a Flow-chart
Appendix 4. Example of reporting table on characteristics of the included measurement instruments
Appendix 5. Example of reporting table on characteristics of the study populations
Appendix 6. Overview Table of quality and results of studies on reliability and measurement error
Appendix 7. Summary of Findings Tables for Reliability and Measurement error
References
Foreword

The COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error was developed to transparently and systematically assess the methodological quality of studies on reliability and measurement error of all types of outcome measurement instruments. It is an extended version of the COSMIN Risk of Bias checklist for the boxes reliability and measurement error for PROMs (1). It was developed for clinician-reported outcome measures (ClinROMs) (including e.g. readings based on imaging modalities and ratings based on observations), performance-based outcome measurement instruments (PerFOMs), or biomarkers – also called laboratory values (2, 3). These measurement instruments are more complex than PROMs, as not only patients are involved, but also professionals, and sometimes (complex) devices. Specifically in studies on reliability and measurement error these additional sources of variation complicate the design of these studies and may influence their quality.

As different sources of variation can play a role, different studies can be conducted to assess the reliability or measurement error of an outcome measurement instrument. To assess the quality of such a study, one should understand (1) how the results of a published study on reliability or measurement error inform us about the reliability and measurement error of the outcome measurement instrument under study, and (2) whether we can trust the result found in the study by assessing the risk of bias of the study. These two steps are reflected in the new COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments (4).

The quality assessment of a study on reliability or measurement error can be conducted in the context of a systematic review of outcome measurement instruments. In such a review all measurement properties are considered, the quality of each study is assessed, the results of the studies are extracted, and per measurement property an overall conclusion is drawn about the quality of the instrument based on all available evidence for each measurement instrument. Subsequently, the quality of the evidence is graded, taking the number, quality, and (consistency of) results of the studies into account. A recommendation for the most suitable instrument is made, based on quality, feasibility and interpretability of each instrument. As this is not an easy task to perform, we encourage the use of systematic and transparent methods when conducting such systematic reviews.

We developed the COSMIN methodology for conducting systematic reviews of PROMs (5), including the COSMIN Risk of Bias checklist (1, 6). When conducting a systematic review of other types of outcome measurement instruments, such as ClinROMs, PerFOMs, or laboratory values, this newly developed COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error can be incorporated into the COSMIN methodology. In this manual we will explain how this new tool should be used.
1. Background information

1.1 COSMIN initiative and steering committee

The COSMIN initiative aims to improve the selection of health measurement instruments both in research and clinical practice by developing tools for selecting the most suitable instrument for a given situation. COSMIN is an international initiative consisting of a multidisciplinary team of researchers with expertise in epidemiology, psychometrics, and qualitative research, and in the development and evaluation of outcome measurement instruments in the field of health care, as well as in performing systematic reviews of outcome measurement instruments.

This tool was developed in a Delphi study (4). The steering committee of this study consisted of:
Lidwine B Mokkink
Maarten Boers
Cees van der Vleuten
Donald L Patrick
Jordi Alonso
Lex M Bouter
Henrica CW de Vet
Caroline B Terwee

We are very grateful to all the panelists of this study, who provided us with many helpful and critical comments and arguments (in alphabetical order): M.A. D’Agostino, Dorcas Beaton, Sophie van Belle, Sandra Beurskens, Kristie Bjornson, Jan Boehnke, Patrick Bossuyt, Don Bushnell, Stefan Cano, Saskia le Cessie, Alessandro Chiarotto, Mike Clark, Jon Deeks, Iris Eekhout, Jim Farnsworth II, Oke Gerke, Sabine Goldhahn, Robert M. Gow, Philip Griffiths, Cristian Gugiu, Jean-Benoit Hardouin, Desirée van der Heijde, I-Chan Huang, Ellen Janssen, Brian Jolly, Lars Konge, Jan Kottner, Brittany Lapin, Hanneke van der Lee, Mariska Leeflang, Nancy Mayo, Sue Mallett, Joy C. MacDermid, Geert Molenberghs, Holger Muehlan, Koen Neijenhuijs, Raymond Ostelo, Laura Quinn, Dennis Revicki, Jussi Repo, Johannes B. Reitsma, Anne W. Rutjes, Mohsen Sadatsafavi, David Streiner, Matthew Stephenson, Berend Terluin, Zyphanie Tyack, Werner Vach, Gemma Vilagut Saiz, Marc K. Walton, Matthijs Warrens, and Daniel Yee Tak Fong.
1.2 How to cite this manual

This manual accompanies the tool developed in the Delphi study. Please refer to the article when using the manual of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error.

LB Mokkink, M Boers, CPM van der Vleuten, LM Bouter, J Alonso, DL Patrick, HCW de Vet, CB Terwee. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Medical Research Methodology. 2020;20(293).

1.3 Development of the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error

This COSMIN tool was developed in a Delphi study comprising three rounds. For more information about the methods of this study, we refer to Mokkink et al. 2020. In this Delphi study we reached consensus on how to formulate a comprehensive research question for studies on reliability and measurement error, on components of outcome measurement instruments (which are the potential sources of variation relevant in studies on reliability and measurement error), and on standards to assess the quality of a study on reliability and measurement error of ClinROMs, PerFOMs, or laboratory values. Based on those results, we developed the COSMIN Risk of Bias tool, which comprises two parts: 1) seven elements that make up a comprehensive research question of the study, which informs us on how the reliability and measurement error of the outcome measurement instrument was studied, and 2) standards on design requirements and preferred statistical methods of studies on reliability and measurement error, which can be used to assess the quality of the study.

1.4 Definitions of reliability and measurement error
Reliability and measurement error are important measurement properties of outcome measurement instruments. Reliability and measurement error are determined based on the same study design and data collection, but with different statistical methods. These measurement properties are therefore related, but distinct.

Reliability is defined as the proportion of the total variance in the measurement which is due to true differences between patients (7). It refers to the extent to which an instrument is able to distinguish between patients; a reliability study investigates the extent to which different sources of variation influence the measurement. This gives direction for how to improve the measurement, for example by standardization or restriction of the source of variation. Reliability can be calculated with an Intra-class Correlation Coefficient (ICC), a Generalizability Coefficient or with a kappa. Reliability parameters are expressed as a proportion and lie between 0 and 1.

Measurement error is defined as the systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured (7). It refers to how close the scores of repeated measurements in stable patients are; such studies investigate the absolute deviation of the scores or the amount of error of repeated measurements in stable patients. In case of categorical outcomes it is also called ‘agreement’. For continuous outcomes measurement error is expressed in the measurement units of the measurement instrument with a Standard Error of Measurement (SEM) or Limits of Agreement (LoA). For categorical outcomes agreement is expressed as percentage total agreement or percentage specific (e.g. positive and negative) agreement.

1.5 Focus of the COSMIN Risk of Bias tool

We focus on outcome measurement instruments, defined as instruments used to monitor the health status of (a group of) people over time, for example in a clinical trial or in clinical practice.
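Returning briefly to the definitions in section 1.4, the measurement error parameters named there can be illustrated with a small sketch. It assumes the simplest possible design (two repeated measurements per stable patient) and uses commonly applied textbook estimates: the 95% Limits of Agreement as the mean difference plus or minus 1.96 times the standard deviation of the differences, and an SEM derived from that standard deviation divided by the square root of 2, which ignores systematic differences between occasions. The function names and the example scores are hypothetical and are not part of the COSMIN tool.

```python
import math
import statistics


def measurement_error_continuous(first, second):
    """SEM and 95% Limits of Agreement from two measurements per patient.

    Assumes stable patients and one score per patient per occasion.
    """
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)   # systematic difference between occasions
    sd_diff = statistics.stdev(diffs)    # sample SD of the difference scores
    sem = sd_diff / math.sqrt(2)         # SEM that ignores systematic differences
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return sem, loa


def percentage_total_agreement(first, second):
    """Percentage of patients given the same category on both measurements."""
    same = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * same / len(first)


# Hypothetical scores for four stable patients, each measured twice.
sem, loa = measurement_error_continuous([10, 12, 14, 16], [11, 11, 15, 17])
agreement = percentage_total_agreement(
    ["mild", "severe", "mild", "mild"],
    ["mild", "severe", "severe", "mild"],
)
```

Both the SEM and the LoA come out in the units of the instrument, whereas the agreement figure is a percentage, which mirrors the distinction between continuous and categorical outcomes made in section 1.4.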
Several types of measurement instruments exist, such as patient-reported outcome measures (PROMs); observer-reported outcome measures (ObsROMs; i.e. proxy measures); clinician-reported outcome measurement instruments (ClinROMs) (including e.g. readings based on imaging modalities and ratings based on observations); performance-based outcome measurement instruments (PerFOMs); and biomarker outcomes – also called laboratory values (2). The COSMIN Risk of Bias tool to assess reliability and measurement error is specifically developed for ClinROMs, PerFOMs, and laboratory values (see Table 1 for examples). These outcome measurement instruments typically require involvement of one or more professionals to operate equipment or tools, to give instructions to the patient (e.g. to perform a task or action) or to come to a score through their clinical expertise (e.g. after observing a patient or an image).

An outcome measurement instrument comprises the whole measurement procedure to come to a score, including issues such as materials, communication (e.g. instructions and motivating patients in case of a performance-based test), clinical judgment, and performing a task. All issues relevant for reliable and valid measurement should be described in the measurement protocol of an outcome measurement instrument.
Table 1. Examples of ClinROMs, PerFOMs, and laboratory values

Clinician-reported outcome measurement instruments (ClinROMs)
- Clinician-reported rating of the severity of a disease or condition. For example, the Hamilton Anxiety Rating Scale to assess the severity of anxiety symptoms comprises 14 items that are scored by a clinician (8).
- A Global Assessment of the severity of a condition, scored e.g. on a single-item Visual Analogue Scale by a health-care professional.
- Result of clinical examination of (patho)physiology, such as blood pressure or a count of swollen joints.
- Clinical reading of device-based results (often imaging), such as power Doppler ultrasonography to assess cardiac structure, function and hemodynamics (echocardiography) (9), or MRI used to evaluate cartilage defect size, depth, and subchondral bone in order to assess chondral and osteochondral lesions at the knee (10).

Performance-based outcome measurement instruments (PerFOMs)
- A performance-based walking test (e.g. the timed 25-foot walk test (11)), in which a professional instructs a patient to walk 25 feet at his own comfortable pace with or without a walking aid. Time needed to cover 25 feet is measured by the professional.

Laboratory value or biomarker
- Laboratory value such as HbA1c (glycated haemoglobin) measured by the turbidimetric inhibition immunoassay (TINIA) (12).

Different versions or operationalizations of outcome measurement instruments

To measure a specific construct, different versions of a measurement instrument may exist. For example, the Doloplus is a clinical tool for behavioural pain assessment in cognitively impaired patients, and is administered e.g. by the attending nurse. The original Doloplus-1 contained 15 items, while the Doloplus-2 contains 10 items (13).

A measurement instrument (i.e. the measurement protocol) can be operationalized in many different ways, and each operationalization could be considered a different version. For example, the specific equipment used to measure the range of motion (ROM) can differ, e.g., a simple universal goniometer (14) or an electromagnetic 3-dimensional tracking system (15). The location to be measured can differ, e.g., the neck (14) or the shoulder (16). The background of the professional involved can differ, e.g., a rheumatologist or a radiologist who conducts the measurement, and these raters may have had different levels of training (17).

In principle, we consider each version of an outcome measurement instrument or each different operationalization of the measurement protocol as a separate measurement instrument, until evidence is provided (e.g. testing of measurement invariance, or reliability) that the versions perform similarly.
1.6 The structure of the COSMIN Risk of Bias tool

The COSMIN Risk of Bias tool comprises two parts. Part A helps to understand how the results of a published study inform us about the reliability or measurement error of the outcome measurement instruments under study. Part B helps to assess whether we can trust the result obtained in the study by assessing the risk of bias of the study.

Part A

For a good understanding of how the results of a study inform us about the reliability and measurement error of the instrument, a good understanding of the design of the study and its corresponding comprehensive research question is needed. In Part A we describe the seven elements that we recommend to be extracted, and that together can be used to construct a comprehensive research question for each analysis. In addition, Part A of the tool contains an overview of the components of outcome measurement instruments. These components are the potential sources of variation that can either be studied (i.e. varied across the repeated measurements), or are kept or assumed to be stable (i.e. standardized).

Part B

Next, we developed two boxes with standards for studies on reliability and for studies on measurement error, respectively. As in the COSMIN Risk of Bias checklist for PROMs (1), standards refer to design requirements and preferred statistical methods of studies on measurement properties. For example, ‘reliability and measurement error should be assessed in patients that are assumed to be stable’; or ‘measurement error should be assessed with the standard error of measurement or with the limits of agreement’. The standards are stated as questions: e.g. ‘were patients stable in the interim period on the construct to be measured?’.

We refer to ‘preferred’ statistical methods. We mean by ‘preferred’ that these statistical methods are appropriate to use when evaluating reliability or measurement error of outcome measurement instruments, and are commonly used. Other methods may be appropriate to use as well (for example bi-factor models or Multi-Trait Multi-Method (MTMM) analyses, or newly developed methods). It is not our intention to comprehensively describe all possible statistical methods, but rather to describe the adequate methods that are commonly used in the literature. It is up to the user of the COSMIN tool how studies using these less commonly used methods are assessed.

1.7 The “worst-score-counts” principle

Each standard in a box is scored on a four-point scale, i.e. ‘very good’, ‘adequate’, ‘doubtful’, and ‘inadequate’; see chapter 3 for more information. As in the COSMIN Risk of Bias checklist for PROMs (1), we use the worst-score-counts method (18) to come to a rating for the quality of the study on reliability or measurement error.
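As a rough illustration of the worst-score-counts method: as we understand it, the overall rating of a box is the lowest rating given to any of its standards. The sketch below assumes exactly that reading. The function name `overall_rating` and the example ratings are hypothetical and are not part of the tool itself.

```python
# Four-point scale from the manual, ordered from best to worst.
SCALE = ["very good", "adequate", "doubtful", "inadequate"]


def overall_rating(standard_ratings):
    """Return the worst (lowest) rating among the standards of one box."""
    if not standard_ratings:
        raise ValueError("at least one rated standard is needed")
    # A higher index in SCALE means a worse rating, so the maximum index wins.
    return max(standard_ratings, key=SCALE.index)


# Hypothetical ratings for the standards of one box in one study.
example = ["very good", "adequate", "doubtful", "very good"]
box_rating = overall_rating(example)
```

Under this assumption a single ‘doubtful’ standard is enough to make the whole study ‘doubtful’, however many other standards were rated ‘very good’.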
1.8 Relevance of the research question

While many different research questions concerning the reliability or measurement error of an outcome measurement instrument can be investigated, the relevance of a study is not called into question when using this tool. The relevance of a study refers to different aspects:

- Choice of the potential source(s) of variation that has been varied over the repeated measurements.
- Choice of the target population of patients and professionals (when applicable) of the study.
- Choice of how the measurement protocol was executed, when applicable.
- Choice of evaluating the specific measurement property, either reliability or measurement error. Often only reliability is reported, while the measurement error can be calculated using the same data.

When using this COSMIN Risk of Bias tool, these aspects will be extracted from the design of the study (in Part A). However, no judgement will be given about the appropriateness of the choices made. The choices made in the research question and study design by the researchers determine the interpretation and generalizability of the results.

1.9 Using the COSMIN Risk of Bias tool in a systematic review

The COSMIN Risk of Bias tool is developed to assess the quality of a published study. One application of the COSMIN Risk of Bias tool is to assess the quality of studies when conducting a systematic review on measurement instruments. COSMIN developed a systematic methodology for conducting systematic reviews of PROMs (5). It consists of a 10-step procedure, in which the COSMIN Risk of Bias checklist (1) (containing standards for all nine measurement properties) can be applied to the studies to assess the quality of each study. To use the COSMIN methodology for conducting systematic reviews of other types of instruments – that is: other than PROMs – we advise replacing boxes 6 (Reliability) and 7 (Measurement error) with the COSMIN Risk of Bias tool to assess the quality of studies on reliability and measurement error of outcome measurement instruments. More information about how to conduct a systematic review using the new COSMIN Risk of Bias tool can be found in chapter 4.
1.10 Expertise required for using the tool

Assessing the quality of a study on reliability and measurement error, i.e. for use in a systematic review on the quality of outcome measurement instruments, is quite complex and time consuming, and it requires expertise within the research team on several aspects. We recommend that at least one of the team members should have expertise on the construct to be measured, e.g. to understand what appropriate time intervals are between repeated measurements; on the measurement instruments, e.g. to understand what concomitant sources of variation could be (and these should be restricted or standardized – see element 2 in Part A); on the patient population, e.g. to understand whether patients were stable between repeated measurements or whether subgroups of patients can be considered in one study. A clinical expert might combine these areas of expertise. A methodological expert with expertise on the theory of reliability and measurement error should be part of the team, e.g. to understand whether the design is appropriately analyzed (e.g. standard 7).

1.11 Using the COSMIN Risk of Bias tool to assess studies on PROMs or ObsROMs

This new COSMIN Risk of Bias tool is developed specifically for ClinROMs, PerFOMs, and laboratory values. However, it can also be used to assess the quality of studies on reliability or measurement error of PROMs or observer-reported outcome measures (ObsROMs; i.e. observations made, appraised, and recorded by a person other than the patient who does not require specialized professional training (2), e.g. proxy measures). For these two types of instruments, however, the tool may seem unnecessarily complex. The first step in the tool (i.e. understanding how the results inform us on the quality of the measurement instrument under study) is often obvious, as the aim of reliability studies of PROMs and ObsROMs is most often to assess test-retest reliability or measurement error of the whole measurement instrument (as these measurement instruments can only be taken in one go, and the only potential source of variance is occasion). The second step in the tool (assessing the quality of the study using the standards) will lead to the same rating as using the standards of the Risk of Bias checklist for PROMs.

The standards on design requirements in both tools are partly the same. However, the new types of outcome measurement instruments for which we adapted the COSMIN checklist (i.e. ClinROMs, PerFOMs and laboratory values) require additional standards, which are not usually applicable for PROMs and ObsROMs. (If such a standard is applicable in a specific study, it could be rated using the ‘other flaws’ standard in the Risk of Bias checklist for PROMs.) The response options for standards on preferred statistical methods in the new tool are somewhat differently formulated, but will lead to the same rating as the PROM Risk of Bias checklist.
1.12 A Risk of Bias tool is not a study design checklist, nor a reporting guideline

This COSMIN Risk of Bias tool is developed to assess the quality (i.e. risk of bias) of a published study on reliability or measurement error. This tool is not developed as a design checklist or a reporting guideline. When designing or reporting a study on reliability or measurement error, additional items are relevant to consider or report. For example, the sample size of patient samples and the number of raters or repeated measurements are important in the design of a study, and when reporting, specific results are required, such as the variance components, 95% confidence intervals around ICCs, the marginals when reporting kappas, or additional assumptions.
2. Part A. Understanding how a study informs us about the reliability and measurement error of an outcome measurement instrument

In general, the design of a study on reliability and measurement error is about repeated measurement in stable patients. Each measurement is accompanied by some error. This error is caused by sources of variation, such as the equipment used, the professionals involved, and other components of measurement instruments. For example, the score on an instrument can be influenced by how the rater motivates the patient, how the machine was set up, or by the occasion (e.g. first and second occasion, day of the week, time of the day). In chapter 2.1 we systematically describe all components of outcome measurement instruments, which are the potential sources of variation of an outcome measurement instrument.

Many different sources of variation can affect the measurement, and each of them can be studied using a different study design. Each study design answers a different research question, and each research question gives specific information about the quality of the measurement instrument. To understand how a study can inform us about the quality of an outcome measurement instrument we describe in chapter 2.2 seven elements of a comprehensive research question.

Part A of the tool contains the overviews of the components of outcome measurement instruments (for outcome measurement instruments that do not involve biological sampling, and those that involve biological sampling, respectively), and the seven elements of a comprehensive research question. In chapter 2.3 we provide an example in which we show how to use Part A of the tool, by applying it to a paper by Skeie (19). In chapter 2.2 we will use this example, too (among other examples).

2.1 Components of outcome measurement instruments

All measurement instruments consist of components, such as equipment and preparatory actions. We developed two taxonomies of components of outcome measurement instruments, one for outcome measurement instruments that do not involve biological sampling (i.e. ClinROMs and PerFOMs) (see Table 2), and one for those that do (i.e. the laboratory values, such as blood or urine tests, tissue biopsy) (see Table 3).
Table 2. Components of outcome measurement instruments that do not involve biological sampling

Component: Equipment
Elaboration: All equipment necessary in the preparation, the administration, and the assignment of scores of the outcome measurement instrument
Examples: Questionnaire forms, computers, tablet, pen and paper; stair steps of a specific height; device or tools (such as stopwatch, probe, tube); ultrasound machine, ultrasound gels, MRI scanner; software.

Component: Preparatory actions preceding raw data collection by professionals, patients, and others (if applicable)
Elaboration 1: General preparatory actions, such as required expertise or training for professionals to prepare, administer, store or assign the scores
Examples: Training, education or experience required, certification.
Elaboration 2: Specific preparatory actions for each measurement, such as:
- preparations of equipment, environment, storage by professionals [a]
  Examples: Preparation of equipment: calibration of device/equipment, adjust settings of the machine. Preparation of the environment: light conditions, room temperature, humidity, specific length of a walking track. Preparation for storage: design database and logbook.
- preparations of the patient [b] by the professional
  Examples: Provide general and preparatory instructions for the patients, such as explaining the tasks/action that need to be performed including time schedule, safety issues and side effects; instructions on diet (e.g. use of caffeine), clothing (e.g. comfortable shoes, no jewelry, glasses or devices), performance during tests (e.g. perform a task as usual; try to walk as fast as you can; lie as calm as possible); set some training or perform a familiarization session. Attaching electrodes to the body, injection with radioactive substance or contrast dye, positioning the patient, applying ultrasound gel.
- preparations undertaken by the patients
  Examples: Listening to and understanding the instructions provided; adherence to the preparatory instructions such as fasting, resting, taking medication, bowel preparation, exercising, shaving.

Component: Collection of raw data
Elaboration: All actions undertaken by patient and professional(s) to collect the data, before any data processing
Examples: The patient completing questions at home, or at the hospital; or performing the tasks; the rater observing or timing the performance; switching the imaging device on and off; positioning and moving the ultrasound probe.

Component: Data processing and storage
Elaboration: All actions undertaken on the raw data to store it in a usable (electronic) form for later data manipulation (such as score assignment or statistical analysis)
Examples: The digitally converted signal of a specific body MRI scan, which is temporarily stored in the K-space, is sent to an image processor where a mathematical formula (i.e. Fourier transformation) is applied, leading to an image which is displayed on a monitor and saved on a computer. Other examples: answers of question items are recorded on e.g. paper forms and stored, or Likert scale format response options are converted into a 0-4 score and directly entered in a computer database. Performance of data quality checks, e.g. double entry or validation checks on the stored/entered data.

Component: Assignment of the score(s)
Elaboration: Methods used to convert processed data into a score [c] that constitutes the outcome measurement instrument.
Examples: A calculation of a mathematical formula or the application of a scoring algorithm (e.g. a set of rules to be followed) to the processed data; a clinician selects the specific images and judges the severity and quantity of e.g. lesions on the set of images or compares it to a reference; scores adjusted for e.g. missing data or patients using devices such as mobility aids.

[a] Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons.
[b] In the COSMIN methodology we use the word ‘patient’. However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, clinician, or the relevant body structure.
[c] The score can be further used or interpreted, by converting a score to another scale, metric or classification. For example, a continuous score is classified into an ordinal score (e.g. mild/moderate/severe), a score is dichotomized into below or above a normal value, patients are classified as responder to the intervention (e.g. when their change is larger than the Minimal Important Change (MIC) value).
Table 3. Components of outcome measurement instruments that involve biological sampling

Component: Equipment
Elaboration: All equipment used in the preparation, the administration, and the determination of the values of the outcome measurement instrument
Examples: Collection tools, such as a venepuncture set or biopsy tool; material containers, such as for blood plasma (EDTA or heparin tube), for tissue (container for frozen specimens for immunofluorescence, jar filled with formalin), for urine collection (sterile, screw-top container), for standard microscopic tissue evaluation (fluid or tissue for culture (sterile jar)); laboratory equipment such as centrifuges, cabinets, and chromatography systems, computers, software.

Component: Preparatory actions preceding sample collection by professionals, patients, and others (if applicable)
Elaboration: 1. General preparatory actions, such as required expertise or training for professionals to prepare, administer, store and determine the value
Examples: Training, education or experience required, certification.
Elaboration: 2. Specific preparatory actions for each measurement, such as preparations of equipment, environment, and storage by professionals (a), and preparation of the patient (b) by the professional
Examples: Preparation of equipment: calibration of device/equipment, adjusting settings of the machine. Preparation of the environment: light conditions, room temperature, humidity. Preparation of storage: set-up of all equipment for storage. Providing general and preparatory instructions to the patients, such as explaining the measurement procedure including safety issues and side effects; instructions on diet; insertion and withdrawal of a catheter into a blood vessel.
Elaboration: Preparatory actions undertaken by the patients
Examples: Listening to and understanding the instructions provided; adherence to the preparatory instructions such as fasting, resting, taking medication, exercising, shaving, washing of hands.

Component: Collection of biological sample
Elaboration: All actions undertaken to collect the biological sample, before any sample processing
Examples: Taking a blood sample or tissue biopsy, collection of a sample of urine ‘mid-stream’ in a container.

Component: Biological sampling processing and storage
Elaboration: All actions undertaken to be able to preserve, transport, and store the biological sample for determination; and, if applicable, further actions undertaken on the stored sample to be able to conduct the determination of the biological sample
Examples: Initial reaction of material to reagent in container (e.g. anticoagulation by heparin). Blood is decomposed (by gravity) into plasma and blood cells, and stored at a specific temperature. Tissue is snap frozen by immersion in liquid nitrogen, or fixed in formalin and embedded in/processed to paraffin for long-term storage. Blood is collected in a tube containing an aqueous solution of the tetra-sodium salt of ethylene-diamine-tetra-acetic acid (EDTA) and mixed with air to lyse the erythrocytes and convert hemoglobin to oxyhemoglobin. Cut sections or prepare a smear on a slide; tissues are stained by immunofluorescent markers specific for certain surface antigens. Screw the lid of the urine container shut, put it in a sealed plastic bag and store it in the fridge at around 4 degrees Celsius, for max. 24 hours.

Component: Determination of the value of the biological sample
Elaboration: Methods used for counting or quantifying the amount of the substance or entity of interest (c)
Examples: The absorbance of oxyhemoglobin at 540 nm through spectrophotometry quantifies the hemoglobin concentration in the sample. The presence of the marker on the cell surface is detected and quantified by fluorescence signal intensity. Rater observes each slide and counts positive cells in an area. A calculation or the application of a mathematical formula to the prepared sample.
(a) Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons.
(b) In the COSMIN methodology we use the word ‘patient’. However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, clinician, or relevant body structure.
(c) The value can be further processed into a clinical score, if applicable, by a linear or semi-quantitative conversion. For example, a continuous score is classified into an ordinal score (e.g. mild/moderate/severe), a score is dichotomized into below or above a normal value, or patients are classified as responders to treatment (e.g. when their change is larger than the Minimal Important Change (MIC) value). As no noise will occur from this conversion, this is not a potential source of variance, but rather an interpretation of the value. Therefore we do not include this phase in the components for outcome measurement instruments that involve biological materials.
2.2 Extracting the elements of a comprehensive research question

Before we can comprehensively assess the information in a study on the reliability or measurement error of an instrument, we need to fully understand the design of the study and reformulate the research question into what we call a ‘comprehensive research question’. Often the published research question is not specific enough to rate the adequacy of the study design. For example, if the stated aim of a study is to assess inter-rater reliability of an instrument, it is clear that raters will be varied. However, without further information it is not clear whether the interest is in the inter-rater reliability of the whole measurement procedure (e.g. by different clinicians), or only in the reliability of a part of the measurement procedure (e.g. only the assignment of the score based on an image). To get a complete picture, we recommend extracting seven elements from the publication that together can form the ‘comprehensive research question’ (see Table 4). Note that one article can contain multiple questions, each requiring an extraction of the seven elements.

Table 4. Elements of a comprehensive research question.
1 the name of the outcome measurement instrument
2 the version of the outcome measurement instrument or way of operationalization of the measurement protocol
3 the construct measured by the measurement instrument
4 a specification whether one is interested in a reliability parameter (i.e. a relative parameter such as, for continuous outcomes, an ICC, Generalizability coefficient φ, or Kappa κ) or a parameter of measurement error (i.e. an absolute parameter expressed in the unit of measurement, e.g. SEM, LoA or SDC; or for categorical outcomes expressed as agreement or misclassification, e.g. the percentage specific agreement)
5 a specification of the components of the measurement instrument that will be repeated (especially when only part of the measurement instrument is repeated, e.g. only assignment of the score based on the same images)
6 a specification of the source(s) of variation that will be varied (e.g. time or occasion, the (level of expertise of) professionals, the machines, or other components of the measurement)
7 a specification of the patient population studied
ICC = Intraclass correlation coefficient; SEM = standard error of measurement; LoA = Limits of Agreement; SDC = smallest detectable change.
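The seven elements of Table 4 can be kept together in a small record while extracting them from a publication. The sketch below is only an illustration: the class name, field names and the helper `missing_elements` are hypothetical and not part of the COSMIN tool, and the example values are partly borrowed from the Nine Hole Peg Test example and partly invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComprehensiveResearchQuestion:
    """One record per research question; field names are illustrative only."""
    instrument_name: str                 # element 1
    version_or_operationalization: str   # element 2
    construct: str                       # element 3
    measurement_property: str            # element 4: reliability or measurement error
    components_repeated: str             # element 5
    sources_of_variation: List[str]      # element 6
    patient_population: str              # element 7

    def missing_elements(self) -> List[str]:
        """Return the names of elements that were left empty during extraction."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical, partially completed extraction (placeholder values).
question = ComprehensiveResearchQuestion(
    instrument_name="Nine Hole Peg Test",
    version_or_operationalization="",    # not yet extracted
    construct="Finger dexterity",
    measurement_property="reliability",
    components_repeated="whole measurement",
    sources_of_variation=["raters"],
    patient_population="",               # not yet extracted
)
print(question.missing_elements())
```

Because one article can contain multiple questions, a reviewer would create one such record per question.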
Elaboration on the elements of a comprehensive research question

Element 1. The name of the outcome measurement instrument

The name of the instrument should be exactly specified. Sometimes this is readily apparent, e.g. the 6-minute Walking test (6MWT) or the Nine Hole Peg Test (NHPT). In some cases, a measurement protocol involves multiple measurement instruments (e.g. the Multiple Sclerosis Functional Composite (MSFC) includes the Timed 25-Foot Walk test, the Nine Hole Peg Test, and the Paced Auditory Serial Addition Test (11)), while in other cases (e.g. imaging) there may not yet be a clear name. Note that the name of the machine is not the name of the outcome measurement instrument; often a machine can be used to measure a variety of parameters (e.g. Grey scale ultrasound [to measure] synovial thickening (synovial hypertrophy) or Doppler ultrasound [to measure] increased blood flow (synovial hyperemia) (19)), or a pathological entity can be measured by different types of images (for example, enthesitis measured by ultrasound (17) or by MRI (20)). We recommend including the type of measurement (e.g. ultrasound) in combination with the entity measured as the name of the score (e.g. ultrasound enthesitis score).

Element 2. The version of the outcome measurement instrument or way of operationalization of the measurement protocol

Details on the version and operationalization of the outcome measurement instrument should be extracted. Details on the specific version refer to e.g. the length of the task (e.g. the 2-, 6- or 12-minute walking test (21)), the number of items included in the version (e.g. Doloplus-1 or Doloplus-2 (13)), or the language used (the English (21) or Dutch version (22) of the 6-minute walk test). Choices in how the measurement protocol was operationalized may affect the measurement, and should thus be made explicit. Specifically, the components that are potential sources of variation need to be listed, for example specific characteristics of the equipment used (e.g. brand and type of the machine) and characteristics of the professionals involved in the measurement (e.g. background and experience). The taxonomy of the components of measurement instruments (see chapter 2.1) can be used for this.

Element 2 refers to components known or expected to influence the score that are not the object of study. To eliminate the influence of these potential sources of variation on the scores obtained, these components should have been restricted or standardized in the study. For example, if it is expected that different types or brands of machines may interfere with the score, only one type and brand of machine is used (and reported). In the study by Skeie et al. (2015) only the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe was used (19); in other words, the brand and type of machine and probe were standardized. Moreover, chiropractors with respectively 4 and 8 years of experience in diagnostic ultrasound for the musculoskeletal system, and with a postgraduate diploma in diagnostic ultrasound, were involved in the measurements (19). Thus, the background of the raters was restricted to a specific profession (i.e. chiropractors) with a specific duration of expertise (4/8 years in diagnostic ultrasound), having received specific training.

In addition, in some cases the instrument procedure requires multiple readings, and a summary statistic (usually the mean, but sometimes the median, maximum or minimum) is calculated as or used to assign the final score (i.e. the result of the measurement). A well-known example is blood pressure measurement in the clinic (see footnote 1). How the measurement is taken should be specified, as it is needed to assess standard 7 (see chapter 3).

For people familiar with the terminology of the Generalizability Theory, the version or the way of operationalization of the measurement instrument refers to the facets of stratification, where patients (i.e. the object of measurement) are nested in a facet (23).
Element 3. The construct measured by the measurement instrument

To identify exactly which outcome measurement instrument was studied, we recommend extracting the construct measured, unless it is clear from the given name. The construct refers to what is being measured, i.e. the ‘aspect of health’. It is also referred to as the ‘concept of interest’ or the ‘intended objective to be measured’. When the measurement instrument does not have a name, identifying the construct can help to fully characterize the outcome measurement instrument (which we also recommend mentioning in the name, i.e. element 1). Table 5 provides some examples. Note that a study on reliability or measurement error does not provide information about whether the construct is indeed being measured; for that, validity and accuracy studies are needed.
1 To measure blood pressure, the technician first palpates the radial artery, inflates the cuff until the pulse disappears, inflates an extra 20-30 mm Hg, and then slowly deflates until the pulse reappears. The pressure is noted, and the measurement begins: first, the stethoscope is placed on the brachial artery just medial and above the cubital fold. Then the cuff is reinflated. The pressure is quickly increased to 30 mm Hg above the previous reading, and then slowly deflated until the pulse sounds are detected (systolic blood pressure, measured in 2 mm increments), then further deflated until the sounds disappear (diastolic blood pressure). The cuff is fully deflated, then inflated again to repeat the measurement.
Table 5. Examples of elements 1, 2, and 3.

Element 1 (name): Nine hole peg test (24)
Element 2 (version/operationalization): A wooden or plastic board with 9 holes (10 mm diameter, 15 mm depth), placed apart by 32 mm (25)
Element 3 (construct): Finger dexterity

Element 1 (name): Ultrasound enthesitis score
Element 2 (version/operationalization): Sonography images obtained by experienced sonographers using the Esaote Technos MPX machine
Element 3 (construct): Enthesitis

Element 1 (name): HbA1c value based on immune-turbidimetry (12)
Element 2 (version/operationalization): Turbidimetric inhibition immunoassay (TINIA), including 2 reagents (i.e. anti-HbA1c antibody (R1), and buffer/polyhapten reagent (R2)); tetradecyltrimethylammonium bromide (TTAB) is the detergent; Roche/Hitachi cobas c systems.
Element 3 (construct): HbA1c (glycated haemoglobin)
Element 4. Specification of the measurement property of interest

When the measurement property of interest is reliability, the study will report relative parameters such as an ICC, Generalizability coefficient φ, or Kappa κ. When the measurement property of interest is measurement error, the study will report absolute parameters, either expressed in the unit of measurement, such as SEM, LoA or SDC, or expressed as agreement or misclassification, e.g. the percentage specific agreement.

We recommend using the COSMIN terminology to determine whether a study assessed reliability or measurement error, regardless of the terms used in the article, because confusion persists about the correct application of these terms. For example, when a particular article states that ‘reliability’ was assessed, but the standard error of measurement (SEM) or the limits of agreement are reported, the result of that study should be considered as evidence for measurement error (26). When an author states to have evaluated ‘agreement between raters’ using the kappa statistic, the result of this study refers to the reliability of the outcome measurement instrument (27).
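The rule described above (classify by the parameter that was reported, not by the term the authors used) can be sketched as a simple lookup. This is an illustrative helper, not part of the COSMIN tool: the function name and the spelling of the keys are assumptions, and the lists contain only the parameters named in this section.

```python
# Parameters named in this section, grouped by measurement property.
RELIABILITY_PARAMETERS = {"icc", "generalizability coefficient", "kappa"}
MEASUREMENT_ERROR_PARAMETERS = {"sem", "loa", "sdc", "percentage specific agreement"}


def measurement_property(reported_parameter: str) -> str:
    """Classify a study result by the statistic reported, ignoring the authors' label."""
    key = reported_parameter.strip().lower()
    if key in RELIABILITY_PARAMETERS:
        return "reliability"
    if key in MEASUREMENT_ERROR_PARAMETERS:
        return "measurement error"
    # Anything else needs the reviewer's own judgement.
    return "unclassified"


# An article may say it assessed 'reliability' but report the SEM:
print(measurement_property("SEM"))
# An article may say it assessed 'agreement between raters' but report kappa:
print(measurement_property("Kappa"))
```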
Element 5. Specification of the components of the measurement instrument that will be repeated (Figure 1)

It should be extracted whether the interest of the study is in the reliability or measurement error of the whole measurement procedure (see Figure 1, study A), or only in part of the measurement procedure (see Figure 1, study B). For example, based on a static image that was made once for a patient, only the assignment of the score was repeated; or the performance of a task by each patient was videotaped, and only the last component (i.e. assignment of the scores) is repeated.

Figure 1. Which part of the measurement is repeated.

Element 6. Specification of the components of the measurement instrument that will be varied

The component of the measurement instrument that is being varied across the measurements is the main focus of the study. Examples are time or occasion (test-retest, or intra-rater), the professionals (inter-rater), or the machines (inter-machine or inter-device) (28). For example, in Figure 1 raters are varied: rater A conducts the first measurement and rater B conducts the second measurement for each patient.
In the design of the study one or more sources can be considered. For example, both the machine and the rater who conducts the whole measurement are varied across the repeated measurements (see Figure 2, study A). The taxonomies of components of measurement instruments (see chapter 2.1) can be used to consider various potential sources of variation.

Figure 2. Designs in which components are varied across repeated measurements

Alternatively, the researchers can assume that a component (e.g. preparation or assignment of the score) is ‘stable’, in other words, that the rater who prepares the measurement or who assigns the score will not introduce error in this part of the measurement (indicated in grey in Figure 2, studies B and C), and investigate only the influence of the other components, e.g. equipment, preparation, collection of raw data, and data processing and storage.

In the designs shown in Figures 1 and 2 we assume that all patients were measured this way. This is called a crossed design (29). However, so-called nested designs are possible, too (see Figure 3). In these designs, some of the patients are measured following measurement conditions A and other patients are measured using measurement conditions B. In Figure 3 a nested inter-rater reliability design is shown, where some of the patients are measured first by rater A and next by rater B (i.e. measurement condition A), while other patients are measured first by rater C and next by rater D (i.e. measurement condition B), etc. These designs are appropriate to use, and in the calculation of the ICC this could be taken into account, for example by calculating variance components per measurement condition and next pooling these variance components (weighted by sample size) across the measurement conditions (e.g. (30)), or by using a one-way random effects model (31).
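As a rough illustration of the pooling approach mentioned above, the sketch below takes variance components that have already been estimated per measurement condition, pools them with weights proportional to the number of patients, and computes an ICC from the pooled values. It is a minimal sketch under stated assumptions: the function name, the input layout and the numbers are hypothetical, all non-patient variance is lumped into one error term, and the exact weighting used in reference (30) may differ.

```python
from typing import List, Tuple


def pooled_icc(conditions: List[Tuple[int, float, float]]) -> float:
    """Pool per-condition variance components and return an ICC.

    Each condition is (n_patients, patient_variance, error_variance).
    Weights are proportional to n_patients (an assumption of this sketch).
    """
    total_n = sum(n for n, _, _ in conditions)
    if total_n == 0:
        raise ValueError("at least one patient is required")
    pooled_patient = sum(n * vp for n, vp, _ in conditions) / total_n
    pooled_error = sum(n * ve for n, _, ve in conditions) / total_n
    # ICC as the proportion of total variance that is due to patients.
    return pooled_patient / (pooled_patient + pooled_error)


# Hypothetical variance components for two rater pairs (conditions A and B).
example = [(10, 8.0, 2.0), (30, 4.0, 4.0)]
print(round(pooled_icc(example), 3))
```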
Figure 3. Nested inter-rater reliability design.

For people familiar with the terminology of the Generalizability Theory, the components that are being varied across measurements are called the random or fixed facets of Generalizability (23).

Element 7. Patient population

The reliability depends on the homogeneity or heterogeneity of the study population. Therefore, the sample (and its subgroups) included in the study should be extracted and assessed by the user of this tool. In the study by Skeie et al. (2015) the recruited sample consisted of low back pain patients, patients with other spinal complaints, but also of pain-free subjects. This latter group could have increased the variance between patients and, subsequently, influenced the results (i.e. increased the ICC) of the reliability study.
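Why a more heterogeneous sample raises the ICC can be seen from a simplified variance-ratio form of the coefficient. The notation below is an assumption of this sketch (the exact ICC formula depends on the model chosen): $\sigma^2_{p}$ is the variance between patients and $\sigma^2_{error}$ collects all other variance.

```latex
\[
\mathrm{ICC} = \frac{\sigma^2_{p}}{\sigma^2_{p} + \sigma^2_{error}}
\]
% Hypothetical numbers: with \sigma^2_{error} = 4,
% a homogeneous sample with \sigma^2_{p} = 4 gives ICC = 4 / (4 + 4) = 0.50,
% a heterogeneous sample with \sigma^2_{p} = 12 gives ICC = 12 / (12 + 4) = 0.75.
```

With the error variance unchanged, adding pain-free subjects to a patient sample increases $\sigma^2_{p}$ and therefore the ICC, although the instrument itself measures no more precisely.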
In the COSMIN methodology we use the word patient. However, sometimes the study population of interest consists of healthy individuals, body structures (e.g. joints, kidneys), clinicians or caregivers. In these cases, the word patient should be read as e.g. healthy person or caregiver.

For people familiar with the terminology of the Generalizability Theory, the patient population refers to the object of measurement or the facets of differentiation (23).
2.3 Example of how to use Part A of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)

In this chapter we provide an example of how to use Part A of the COSMIN tool, using a paper by Skeie et al. (19). To get a full understanding of the study, we recommend first reading the introduction and methods section of the paper. In this paper four different studies are described. Here we use the first two substudies, and provide a summary of these two studies.

In this paper, the lumbar multifidus muscle (LMM) thickness score (study 1) and contraction score (study 2) were investigated by ultrasound. The measurement proceeds as follows: a patient is asked to lie down in a specific position, and the probe is placed on a very specific body part. This yields an on-screen image. Subsequently, a marker is placed on a specific structure (i.e. the apex of the facet joint) identified on the image. In study 1, a still image is recorded, and the first rater places the second marker on another specific structure (i.e. processus mammillaris) on this image, and measures the distance between the markers with the calliper software. The distance between the two markers corresponds to the thickness of the LMM. The first rater repeats the second marker placement and distance measurement on the still image twice, for a total of three measurements. The patient leaves. Next, based on the very same still image (with only the first marker visible), a second rater places the second marker on the screen and measures the distance a total of three times. Next, all data is transferred to a separate paper by rater 1, who calculates a mean value per patient per rater. This mean value is the LMM thickness score. The repeated placement of the second marker on the still image and application of the calliper tool to measure the distance between the two markers is part of one measurement (19). This procedure is depicted in Figure 3, study 1.

Figure 3. Study designs of Skeie et al.
In study 2, for each patient each of the raters independently generated one image of the LMM in the resting state and one image of the LMM in the contracted state. Using a split-screen of the two still images of both states, each rater measured thickness (i.e. calliper-assessed distance between the markers) of the two states three times. Next, rater 1 transferred the data to a separate paper and calculated mean values of the thickness of each state. Next, rater 1 calculated the ‘LMM contraction score’ as the exact change in thickness (contracted LMM minus resting LMM) (19). This procedure is depicted in Figure 3, study 2.
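The score assignment described for study 2 (the mean of three readings per state, then contracted minus resting) amounts to a small calculation. The sketch below only illustrates that arithmetic: the function name and the readings are hypothetical, and no unit is implied.

```python
from statistics import mean
from typing import Sequence


def lmm_contraction_score(resting: Sequence[float], contracted: Sequence[float]) -> float:
    """Mean contracted thickness minus mean resting thickness for one rater and patient."""
    return mean(contracted) - mean(resting)


# Hypothetical readings by one rater for one patient (three per state).
resting_readings = [24.0, 25.0, 26.0]
contracted_readings = [28.0, 29.0, 30.0]
print(lmm_contraction_score(resting_readings, contracted_readings))
```

The LMM thickness score of study 1 is the same calculation without the subtraction, i.e. the mean of the three readings on the still image.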
Based on the thorough elaboration of the study performed and described by Skeie and colleagues, we extract the elements of a comprehensive research question.
Table 6. Example of how to use Part A of the COSMIN Risk of Bias tool based on the study by Skeie (19).

Element 1. Name of the instrument
Instruction: Alternatively: type of instrument and parameter
Study 1: Ultrasound measurement of the lumbar multifidus muscle (LMM) thickness score
Study 2: Ultrasound measurement of the LMM contraction score

Element 2. Version or way of operationalization
Instruction: All relevant components that are known or expected to influence the score, and which are standardized or restricted (facet of stratification (23))
Study 1:
Equipment: Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe.
Preparatory actions: two chiropractors with respectively 4 and 8 years of experience in diagnostic ultrasound for the musculoskeletal system, with a postgraduate diploma in diagnostic ultrasound; still on-screen images were obtained with the subjects in a prone position with a pillow placed under the abdomen to flatten the lumbar lordosis.
Preparation: An image was generated on-screen and a marker was placed on the image on the mamillary process of the level to be measured.
Unprocessed data collection: The second marker was placed on the on-screen image, and the distance was computed by the calliper software. This part was repeated three times.
Data processing and storage: Data is transferred to a separate paper by rater 1.
Assignment of the score: Rater 1 calculated a mean value per patient per rater.
Study 2:
Preparation: In resting position, an image was generated on-screen and a marker was placed on the image on the mamillary process of the level to be measured. Next, in contracted state (LMM contraction was induced by a contralateral arm lifting task), an image was generated on-screen, too, and a marker was placed on the image.
Unprocessed data collection: Based on the split-screen of both images, the second marker was placed on each image, and the distance (per image) was calculated by the calliper software. This part was repeated three times.
Data processing and storage: Data is transferred to a separate paper by rater 1.
Assignment of the score: Rater 1 calculates a mean value per patient per rater for both states. Next, the rater calculated the ‘LMM contraction score’ as the exact change in thickness (contracted LMM minus resting LMM).

Element 3. Construct
Instruction: Description of what is being measured
Study 1: LMM thickness
Study 2: LMM contraction, which is the change in LMM thickness between contracted and resting state (contracted LMM minus resting LMM).

Element 4. Measurement property
Study 1: Reliability and measurement error
Study 2: Reliability and measurement error

Element 5. Components that will be repeated
Instruction: Either the whole measurement (i.e. all components) or the assignment of the score (i.e. last component)
Study 1: The whole measurement will be repeated. However, the focus of interest is on the unprocessed data collection: placing of the second marker on the on-screen image (mean of three times).
Study 2: The whole measurement will be repeated. However, the focus of interest is on the preparation (i.e. preparation and generation of images in the resting and contracted states, and the placing of the first marker), and on the unprocessed data collection (placing of the second marker on the on-screen image (mean of three times)).

Element 6. Source(s) of variation varied
Instruction: Component which is varied across the measurements (i.e. focus of analysis; facet of generalizability (23))
Study 1: Raters (n=2; inter-rater reliability)
Study 2: Raters (n=2; inter-rater reliability)

Element 7. Patient population
Instruction: (i.e. facet of differentiation (23))
Study 1 and Study 2: LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects (n=30 in each experiment, total n=120)
Based on the extracted information, a comprehensive research question can be formulated as:

Study 1: What is the inter-rater reliability of the data collection phase of the lumbar multifidus muscle (LMM) thickness score, based on the mean of three distances marked with the calliper software on a still image of the ultrasound measurement, measured using the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe by post-graduate experienced chiropractors, in LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects?

Study 2: What is the inter-rater reliability of the preparing, generating, and data collection phases of the lumbar multifidus muscle (LMM) contraction score, based on the mean of three distances marked with the calliper software on an on-screen image in resting and in contracted state of the ultrasound measurement, measured using the Medison Accuvix V10 ultrasound scanner with a 3–7 MHz curvilinear probe by post-graduate experienced chiropractors, in LBP patients, patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, and pain-free subjects?

Please note that we do not recommend always reporting the research question as one long question like this. We do, however, consider it very useful to describe all this information clearly, e.g. in the methods section of a paper.
3. Part B. Assessing the risk of bias of a study on reliability or measurement error

Part B of the COSMIN Risk of Bias tool contains two boxes with standards that can be used to determine whether the result of a study on reliability or measurement error, respectively, can be trusted. Standards refer to the design requirements of the study or to the preferred statistical methods. Standards 1 to 5 in both boxes refer to design requirements. These standards are the same for studies on reliability and for studies on measurement error, as the same design can be used for assessing both measurement properties. Three standards refer to the preferred statistical methods for studies on reliability and two standards refer to the preferred statistical methods for studies on measurement error.

In the COSMIN Risk of Bias tool, we included standards concerning the preferred statistical methods that are appropriate to use when evaluating reliability or measurement error of outcome measurement instruments (see also section 1.6). Other methods may be appropriate to use as well (for example bi-factor models or Multi-Trait Multi-Method (MTMM) analyses, or newly developed methods). It is not our intention to comprehensively describe all possible statistical methods, but rather to describe the adequate methods that are commonly used in the literature.

Each box also contains a standard asking if there were any other important methodological flaws that are not covered by the other standards (standard 6), but that may have led to biased results or conclusions. Some flaws are rather uncommon and therefore do not justify a separate standard. In chapter 3.1 we provide several examples of these flaws.

Each standard will be scored on a four-point rating system (i.e. ‘very good’, ‘adequate’, ‘doubtful’, or ‘inadequate’) in line with the COSMIN Risk of Bias checklist for Patient-Reported Outcome Measures (PROMs) (1). Subsequently, the lowest rating given in a box determines the final rating, i.e. the quality of the study (this is called the worst-score-counts method (18) to determine the risk of bias). Sometimes a response option is indicated in grey, meaning that the response option is not applicable for the standard, and users should choose between the other options. Finally, some standards can be rated as ‘not applicable’.

In general, a standard on a design requirement is rated as ‘very good’ when evidence or convincing arguments were provided that the standard is met; ‘adequate’ when it is assumable, although not explicitly described, that the standard is met; ‘doubtful’ when it is unclear whether the standard is met; and ‘inadequate’ when there is evidence that the standard is not met (18). A standard about preferred statistical methods is in general rated as ‘very good’ when a preferred method was optimally used; ‘adequate’ when the preferred method was used, but not optimally; ‘doubtful’ when it is unclear if a preferred method was used; and ‘inadequate’ when the statistical methods used are considered inadequate.

The boxes for reliability and measurement error, respectively, can be found here. Below, an elaboration of each standard is described for reliability (chapter 3.1) and measurement error (chapter 3.2). In chapter 3.3 we provide an example for rating the box on reliability in the study by Skeie, which was also used as an example in chapter 2.3.
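The worst-score-counts rule described above can be sketched in a few lines. This is an illustrative helper rather than an official COSMIN implementation: the function name is hypothetical, and leaving ‘not applicable’ ratings out of the comparison is an assumption of this sketch.

```python
from typing import List

# Ratings ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def box_rating(standard_ratings: List[str]) -> str:
    """Return the lowest rating among the applicable standards in one box."""
    # Assumption: standards rated 'not applicable' do not affect the box rating.
    applicable = [r for r in standard_ratings if r != "not applicable"]
    if not applicable:
        raise ValueError("no applicable standards were rated")
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical ratings for the standards in one box.
ratings = ["very good", "adequate", "not applicable", "doubtful", "very good"]
print(box_rating(ratings))
```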
3.1ElaborationonstandardsforstudiesonreliabilityTheboxonreliabilitycontainsfivestandardsaboutdesignrequirements,onestandards‘otherflaws’andthreestandardsaboutpreferredstatisticalmethods.Foreachstandardwegivesuggestionsforhowtoratethestandard.Standard1.Stabilityofthepatient verygood adequate doubtful inadequate NA
Werepatientsstableinthetimebetweentherepeatedmeasurementsontheconstructtobemeasured?
Yes(evidenceprovided)
Reasonstoassumestandardwasmet
Unclear No(evidenceprovided)
Notapplicable
Elaboration: Patients should be stable with regard to the construct to be measured between the repeated measurements. When an intervention such as surgery or medication is given in the interim period, it is likely that (many of) the patients have changed on the construct to be measured. In other words, they are not stable, and the standard should be rated as ‘inadequate’. When the aim is to assess the reliability of the assignment of the score, e.g. using static images or videos of the performance of a task as object of interest (see Figure 1 study 2, page 24), this standard is not applicable, as the images and videos were acquired only once. Furthermore, the measurement can interfere with the stability of the patient. For example, there should be enough time between repeated measurements for patients to recover from experienced pain or fatigue and to return to their initial state. If not, the standard should be rated as ‘doubtful’, as it is unclear whether the patients are stable on the construct to be measured. When evidence or convincing arguments are provided that the patients were stable, the standard is scored ‘very good’.

Standard 2. Time interval
Was the time interval between the measurements appropriate?
very good: Yes
doubtful: Doubtful, OR time interval not stated
inadequate: No
Elaboration: The time interval between the measurements must be appropriate. The definition of “appropriate” depends on the construct to be measured and the study population. The time interval should be long enough to prevent recall bias of previous scores in case of intra-rater reliability, and short enough to ensure that patients have not changed on the construct to be measured. For example, synovitis can change in a few days, while a change in cartilage or bone status would take a few months.
Standard 3. Similar measurement conditions
Were the measurement conditions similar for the measurements, except for the condition being evaluated as a source of variation?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met, OR change was unavoidable
doubtful: Unclear
inadequate: No (evidence provided)
Elaboration: Each repeated measurement should be conducted with the same measurement protocol, except for the source of variation that was intentionally varied, i.e. element 6 of the comprehensive research question (see chapter 2.2). For example, if the aim was to understand the variation due to different raters (i.e. inter-rater reliability), only the raters should be varied. Other concomitant sources of variation (i.e. element 2 of the comprehensive research question, see chapter 2.2) should be kept similar. Was the study up to standard? Were all equipment, preparatory actions, the environmental conditions (e.g. temperature), and methods of processing the same in both measurements? For example, when the patients are very likely to show a learning effect (for example on a performance-based test), the absence of a familiarization session should yield a rating of doubtful or inadequate on this standard, as the first measurement can then be considered to be the familiarization session, and the measurement conditions are not the same. A description of similarity of the measurement conditions of the repeated measurements can be considered as evidence.

Standard 4. Administration of measurements
In instruments that do not involve biological sampling, the administration refers to the components ‘Collection of raw data’ and ‘Data processing and storage’ (see chapter 2.1). In instruments involving biological sampling, it refers to the components ‘Collection of biological sampling’ and ‘Biological sampling processing and storage’ (see chapter 2.1).
Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met
doubtful: Unclear
inadequate: No (evidence provided)
Elaboration: All measurements should be administered by the professional(s) involved without them having knowledge of the scores or values of other repeated measurements on the same outcome measurement instrument. This means that the measurements should all be administered without knowledge of the prior (e.g. in case of an intra-rater reliability study) or other (e.g. in case of an inter-rater reliability study) score(s) or value(s) on the instrument of interest.
The rating of this standard is rather subjective. For example, if in a study the raters independently administered the measurement, and none were involved in the care of the patients (making it very unlikely that the raters received additional information on the patients, including knowledge of the score(s) of other repeated measurements), this can be considered as ‘evidence provided’, and the rating is ‘very good’. When the other score is known to the professional while administering the repeated measurement, it may influence the way the measurement is administered. For example, with a severe score obtained with an imaging technique, the repeated measurement can be administered more carefully, and more time can be used to look at the patient. If it is known that this was the case, the rating is ‘inadequate’. When there is no explicit description, but it seems very unlikely that the raters knew the scores or values of other repeated measurements, it can be rated as ‘adequate’, or ‘doubtful’. In some situations this standard is not applicable, i.e. when the administration (i.e. collection of the raw material or biological sample, data or sampling processing and storage) is not repeated in the study, but only the assignment of the score or the determination of the value (for examples, see chapter 2.2, element 5 of the comprehensive research question, or Figure 1 study 2).

Standard 5. Assignment of the score or determination of the biological value
Did the professional(s) assign scores or determine values without knowledge of the scores or values of other repeated measurement(s) in the same patients?
very good: Yes (evidence provided)
adequate: Reasons to assume standard was met
doubtful: Unclear
inadequate: No (evidence provided)
Elaboration: The scores on all measurements should be assigned, or values should be determined, by the professional(s) involved without them having knowledge of the scores or values of other repeated measurements. This means that assigning a score to a measurement or determining the value of a biological sample should be done without knowledge of the prior (e.g. in case of an intra-rater reliability study) or other (e.g. in case of an inter-rater reliability study) score(s) or value(s) on the instrument of interest. Although part of the determination of the value of a biological sample can be an automatic step, human action may be required for this determination. An example is a urine pH level test to measure the acidity or alkalinity of urine, where the color of the strip is interpreted by the professional. The rating is similar to that explained for standard 4.
Standard 6. Other important flaws
Were there any other important flaws in the design or statistical methods of the study?
very good: No
doubtful: Minor other methodological flaws
inadequate: Yes
Elaboration: This standard is included because there might be uncommon design flaws that are not covered by other standards but that may cause additional risk of bias. Below, some examples are provided.

When various professionals are involved in the measurement instrument, and one of the professionals is the attending physician of the patient, this physician has (much) more information about the patient than the other professionals. In some situations, depending on the aim of the study and the specific construct to be measured, this could be considered a flaw because of the influence on the scores obtained.

In the previous chapter we saw in the example of Skeie that part of the sample comprised healthy persons, whereas the authors were ultimately interested in these measurements in low back pain patients (19). This will increase the variance between patients, and thereby the results of the study (i.e. the ICC or G coefficient). Depending on where this study sits in the development of the instrument, this could be deemed proper (when the full range of the scores is not yet known) or an important flaw (when the purpose is to determine the reliability of measurement in the clinical setting of low back pain).

A final example refers to the use of the ICC model for average scores. Although discussed under standard 7 for reliability, it may be that the ICC for the mean score of the measurements is reported, whereas in clinical practice the single score is used. Depending on the purpose of the study this can be proper (when the mean score is going to be used in future research) or an important flaw (when the study is aimed at demonstrating reliability in clinical practice, where the single score is used).
It is up to the user of the COSMIN Risk of Bias tool whether a flaw is considered minor (and is rated as ‘doubtful’) or important (and is rated as ‘inadequate’). The scores of the other flaws are included in the overall score/rating based on the worst score counts principle.
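The worst score counts principle can be illustrated with a minimal Python sketch. The function name `overall_rating` and the choice to skip standards rated ‘NA’ are assumptions made for illustration; they are not part of the COSMIN tool itself.

```python
# Ratings ordered from best to worst, as used in the COSMIN boxes.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def overall_rating(ratings):
    """Return the worst rating among the applicable standards.

    Assumption for this sketch: standards rated "NA" are ignored.
    """
    applicable = [r for r in ratings if r != "NA"]
    if not applicable:
        return "NA"
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(applicable, key=RATING_ORDER.index)


# Hypothetical box: one 'doubtful' standard determines the overall rating.
print(overall_rating(["NA", "very good", "very good", "doubtful", "adequate"]))
```

A single ‘doubtful’ or ‘inadequate’ standard therefore lowers the rating of the whole box, however well the other standards were met.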
Standard 7. Preferred statistical methods for continuous scores
For continuous scores: was an intraclass correlation coefficient (ICC) calculated?
very good: ICC calculated; the model or formula was described, and matches the study design and the data
adequate: ICC calculated but model or formula was not described or does not optimally match the study design, OR Pearson or Spearman correlation coefficient calculated WITH evidence provided that no systematic difference between measurements has occurred
doubtful: Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic difference between measurements has occurred, OR WITH evidence provided that systematic difference between measurements has occurred
Elaboration: For continuous scores the intraclass correlation coefficient (ICC) is preferred to evaluate reliability. ICCs are a family of statistical parameters, including Generalizability (G) coefficients and Decision (D) coefficients. To get a ‘very good’ rating, the ICC model used in the reliability study should match the study design (and the aim) of the study that is being assessed. Therefore, the model or formula of the ICC or G coefficient used should be described. It should be clear, e.g., whether a crossed or nested design was used (see also page 25/26), or whether a one-way random effects model, or a two- or three-way random or mixed effects model was used. Next, it should be compared to the study design using the extracted information from Part A, and it should be determined whether the ICC or G coefficient used indeed matches the study design.

The ICC based on the two-way mixed effects model of consistency (31) (also referred to as ICC model 3.1 (32)), and the Pearson or Spearman correlation coefficient, do not take a systematic difference between the repeated measurements into account. They are therefore considered less appropriate, as they can lead to overestimating the reliability. Therefore, based on information about a systematic difference between the source of variation considered (e.g. raters), either ‘adequate’ (when no or very little systematic difference occurred) or ‘doubtful’ (when there was a systematic difference between e.g. the raters) can be rated.

When the study was designed to investigate a specific source of variation (e.g. inter-rater), and the systematic differences between this source of variation in the repeated measurements were taken into account in the formula (for example, by using the ICC random effects model for agreement (31), also referred to as Model 2.1 (32), or the φ coefficient (see e.g. (23))), the study can be rated as ‘very good’. When a study is designed without any specific source of variation being considered, the appropriate ICC model is a one-way random effects model (31). In this situation the use
of a one-way random effects model can be rated as ‘very good’, while the use of other models can be rated as ‘adequate’.

Next, the ICC can be calculated for a single measurement or an average measurement (31). If a single measurement is normally used in clinical practice or trials (and not the average score of multiple measurements, as is done for a blood pressure measurement), the ICC for single measures should have been calculated. The ICC average refers to the reliability of the averaged score of the measurements, and thus to the use of the averaged score on repeated measurements. When the ICC for average measures is reported in a situation in which usually a single measurement is taken, we recommend rating this standard as ‘adequate’, as the model does not optimally match the design of the study. However, in this situation we also recommend rating standard 6 (i.e. other flaws) as ‘doubtful’ or even ‘inadequate’ (see also the example at standard 6).

Moreover, to get a ‘very good’ rating, the described ICC or G coefficient model or formula should match the data. If there is a (known) problem with the normal distribution of the data (normality) which is not properly taken into account, the study could be rated as ‘adequate’ instead of ‘very good’. It is impossible to describe all other flaws here; therefore it is up to the user of the COSMIN Risk of Bias tool to decide how the identified flaw should be scored. The relevant question in this regard is how certain and how large the influence on the study result is.

Standard 8. Preferred statistical methods for ordinal scores
For ordinal scores: was a (weighted) kappa calculated?
very good: Kappa calculated; the weighting scheme was described, and matches the study design and the data
adequate: Kappa calculated, but weighting scheme not described or does not optimally match the study design
Elaboration: To assess reliability for ordinal scores, Cohen’s kappa (33-35) is considered the preferred statistical parameter. No better alternative is known (4, 36). Information on the specific kappa used should be described in terms of whether a weighting scheme was used and which scheme was used. Unweighted kappa considers any misclassification equally inappropriate. However, a misclassification between two adjacent categories may be less erroneous than a misclassification between categories that are further apart. A weighted kappa takes this into account (e.g. using linear or quadratic weights (37)). If the goal of the study was to consider any misclassification as equally important, and it was stated that the unweighted kappa was used, this standard can be rated as ‘very good’. However, in other situations (e.g. misclassification of categories more
apart from each other is a bigger problem than misclassification of adjacent categories) a specific weighting scheme is preferred. If unweighted kappa was calculated in that case, the standard could be rated as ‘adequate’.

Standard 9. Preferred statistical methods for dichotomous or nominal scores
For dichotomous/nominal scores: was kappa calculated for each category against the other categories combined?
very good: Kappa calculated for each category against the other categories combined
Elaboration: A study on reliability of an outcome measurement instrument with dichotomous or nominal scores gets a ‘very good’ score when an unweighted kappa was calculated for each category against the other categories (33).
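The calculation of an unweighted kappa for each category against the other categories combined can be sketched as follows, assuming two raters who each scored the same patients once. The function names and the example scores are hypothetical; the kappa formula is the standard unweighted Cohen's kappa.

```python
def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of scores."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of agreement.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from the marginal proportions.
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_per_category(rater_a, rater_b):
    """Kappa of each category against all other categories combined."""
    categories = sorted(set(rater_a) | set(rater_b))
    return {
        c: cohen_kappa([a == c for a in rater_a], [b == c for b in rater_b])
        for c in categories
    }


# Hypothetical nominal scores of six patients by two raters.
scores_a = ["A", "A", "B", "B", "C", "C"]
scores_b = ["A", "A", "B", "C", "C", "C"]
print(kappa_per_category(scores_a, scores_b))
```

Each category is dichotomized (this category versus all others), so the result shows for which categories the raters can and cannot be distinguished reliably.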
3.2 Elaboration on standards for studies on measurement error

Standards 1 to 6 of the box for standards for studies on measurement error are the same as for studies on reliability. For an elaboration on each of the standards, please see above.

Standard 7. Preferred statistical methods for continuous scores
For continuous scores: was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), Limits of Agreement (LoA) or Coefficient of Variation (CV) calculated?
very good: SEM, SDC, LoA or CV calculated; the model or formula for the SEM/SDC is described; it matches the reviewer constructed research question and the data
adequate: SEM, SDC, LoA or CV calculated, but the model or formula is not described or does not optimally match the reviewer constructed research question, and evidence provided that no systematic difference has occurred
doubtful: SEM consistency, SDC consistency, or LoA or CV calculated, without knowledge about systematic difference or with evidence provided that systematic difference has occurred
inadequate: SEM calculated based on Cronbach’s alpha, OR using SD from another population
Elaboration: For continuous scores, preferred measures for the measurement error of a single score are the SEM, LoA or the Coefficient of Variation (CV); the SDC is preferred as a measure for change scores. Different formulas can be used to calculate these various measures. Therefore, we will first describe their formulas. Subsequently, we will explain the standard for studies using SEM and SDC derived from variance components analyses. Next, we will discuss LoA, SEM and SDC using the SD difference. We will explain when ignoring the influence of the source of variation is appropriate. Last, we will discuss some other methods used, including the CV.

Measures that take all error into account, including the systematic difference between repeated measurements, based on a one-way or two-way effects model, are:
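$$\mathrm{SEM}_{\text{one-way}} = \sqrt{\sigma^2_{\text{error}}} \qquad (1)$$

$$\mathrm{SEM}_{\text{agreement}} = \sqrt{\sigma^2_{\text{o}} + \sigma^2_{\text{residual}}} \qquad (2)$$

$$\mathrm{SDC}_{\text{agreement}} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}_{\text{agreement}} = 1.96 \cdot \sqrt{2} \cdot \sqrt{\sigma^2_{\text{o}} + \sigma^2_{\text{residual}}} \qquad (3)$$

Measures that do not take the systematic difference between repeated measurements into account:

$$\mathrm{SEM}_{\text{consistency}} = \sqrt{\sigma^2_{\text{residual}}} \qquad (4)$$

$$\mathrm{SDC}_{\text{consistency}} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}_{\text{consistency}} = 1.96 \cdot \sqrt{2} \cdot \sqrt{\sigma^2_{\text{residual}}} \qquad (5)$$

$$\mathrm{SEM}_{\text{consistency}} = \frac{SD_{\text{difference}}}{\sqrt{2}} \qquad (6)$$

$$\mathrm{SDC}_{\text{consistency}} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}_{\text{consistency}} = 1.96 \cdot \sqrt{2} \cdot \frac{SD_{\text{difference}}}{\sqrt{2}} \qquad (7)$$

$$\mathrm{LoA} = \bar{d} \pm 1.96 \cdot SD_{\text{difference}} \qquad (8)$$

$$\mathrm{SDC}_{\text{consistency}} = 1.96 \cdot SD_{\text{difference}} \qquad (9)$$

Here $\sigma^2_{\text{o}}$ is the variance due to systematic differences between the source of variation that is varied (e.g. raters), $\sigma^2_{\text{residual}}$ is the residual variance, $\bar{d}$ is the mean difference between the repeated measurements, and $SD_{\text{difference}}$ is the standard deviation of these differences.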
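As an illustration of the measures based on variance components (formulas 2 to 5, which the text below refers to as SEM agreement, SDC agreement, SEM consistency and SDC consistency), the following Python sketch estimates the components from a fully crossed design in which every patient is scored once by every rater. The two-way ANOVA estimators, the function names and the example data are assumptions made for this sketch; studies may use other models or estimation methods.

```python
import math


def variance_components(scores):
    """Estimate variance components from a patients x raters table.

    scores: list of rows, one row per patient, one column per rater
    (two-way crossed design, one score per cell; an assumption here).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ms_patients = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_raters = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_residual = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return {
        "patients": max((ms_patients - ms_residual) / k, 0.0),
        "raters": max((ms_raters - ms_residual) / n, 0.0),  # systematic part
        "residual": ms_residual,
    }


def measurement_error(scores):
    """SEM and SDC for agreement (all error) and consistency (residual only)."""
    vc = variance_components(scores)
    sem_agreement = math.sqrt(vc["raters"] + vc["residual"])
    sem_consistency = math.sqrt(vc["residual"])
    factor = 1.96 * math.sqrt(2)
    return {
        "sem_agreement": sem_agreement,
        "sdc_agreement": factor * sem_agreement,
        "sem_consistency": sem_consistency,
        "sdc_consistency": factor * sem_consistency,
    }


# Hypothetical data: the second rater always scores exactly 5 points higher.
example = [[10, 15], [20, 25], [30, 35], [40, 45]]
print(measurement_error(example))
```

In this constructed example the consistency measures are zero because the only disagreement is a constant offset between the raters, while the agreement measures are clearly larger than zero. This is the difference that matters when the results are to be generalized beyond the specific raters.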
To get a ‘very good’ rating, the formula used should match the study design (and the aim) of the study that is being assessed. Therefore, it should be clear what the aim is, and which measure or which formula was used in the study being assessed.

Measurement error derived from variance components analyses (formulas 1-5)
The specific model used should be clearly described, e.g. whether a one-way random effects model, or a two- or three-way random or mixed effects model was used, and whether all error (except for the variance due to variation between patients) was included in the calculation of the measurement error, or whether the systematic error between the source of variation that is being varied in the design is ignored (i.e. as occurs when calculating SEM consistency for single scores (formula 4) and SDC consistency for change scores (formula 5)). Next, it should be compared to the study design using the extracted information about the comprehensive research question (see Part A of the tool), and it should be determined whether the method used indeed matches the study design. In other words, when the aim of the study was to assess the measurement error of a single score of any measurement taken in clinical practice or trials, the aim is to generalize the results beyond (e.g.) the specific raters involved in the study. In this case, the systematic error between raters should be taken into account; the raters (in this example) should be considered random; and all error should be taken into account (i.e. formulas 1-3) to match the design of the study (and this is rated ‘very good’). If in this case (with the aim to generalize beyond the specific raters) the SEM consistency (formula 4) or SDC consistency (formula 5) was calculated (i.e. ignoring a systematic
difference between raters), evidence should be provided that no (or only very small) systematic difference has occurred between the raters. In case of no or very small differences the standard can be rated as ‘adequate’, as the SEM agreement (formula 2) and SEM consistency (formula 4), or SDC agreement (formula 3) and SDC consistency (formula 5), will be the same or very close. If it is unclear whether systematic differences occurred (because it was not reported), the standard is rated as ‘doubtful’.

Measurement error derived from the SD difference (formulas 6-9)
The measurement error of a single score or a change score can also be calculated using the SD difference. This refers to the standard deviation of the difference of the scores on the repeated measurements (38, 39). In a Bland and Altman plot two repeated measurements per patient are plotted: on the x-axis the mean score of the two measurements, and on the y-axis the difference between the repeated measurements (39). Although the plot is designed in such a way that systematic differences can easily be seen (i.e. the line of the mean difference in scores, and the limits of agreement located asymmetrically around zero), the systematic difference is disregarded when the SDC is calculated from these limits (resulting in the SDC consistency). Therefore, if a (large) systematic error between the repeated measurements occurred, while the aim of the study is to generalize beyond the specific source of variation (e.g. raters), the standard should be rated as ‘doubtful’, as the study underestimates the measurement error.

When is a measure of consistency (formulas 4-9) appropriate?
Sometimes, the source of variation that is being varied across the measurements is considered to be fixed in a study. This means that the aim of the study is not to generalize beyond the specific study objects included in the study. For example, in a study only two raters are considered (e.g. the raters Myrthe and Brechtje), and the aim of the study is to determine whether these two raters will come to equal scores (e.g. because they will be the only two raters involved in the measurements for a specific trial). If a systematic error occurs between Myrthe and Brechtje (e.g. Myrthe systematically scores 5 points higher than Brechtje), the scores obtained in the trial can easily be adjusted by subtracting 5 points from each measurement obtained by Myrthe. In this study, the source of variation ‘rater’ is deemed irrelevant (31), as the systematic difference will be adjusted later on when the instrument is used by either Myrthe or Brechtje. In this specific situation, the SEM consistency, SDC consistency or the limits of agreement match the aim and design of the study, so it can be rated as ‘very good’. However, these results cannot be generalized to other raters, as ‘rater’ was considered fixed. Therefore, the study is less relevant in other situations, especially when there is a systematic difference between the raters.
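The measures derived from the SD difference can be sketched in a few lines of Python. The sketch assumes two repeated measurements per patient and uses the standard relations SEM = SD difference / √2 and SDC = 1.96 · √2 · SEM; the function name and the scores are hypothetical, and only the rater names are taken from the example above.

```python
import math
import statistics


def sd_difference_measures(first, second):
    """Limits of agreement, SEM and SDC from paired repeated measurements."""
    differences = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(differences)   # systematic difference
    sd_diff = statistics.stdev(differences)    # SD of the differences
    sem = sd_diff / math.sqrt(2)
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "sem_consistency": sem,
        "sdc_consistency": 1.96 * math.sqrt(2) * sem,
        "loa": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Hypothetical scores of five patients.
brechtje = [50, 62, 47, 58, 66]
myrthe = [52, 61, 49, 60, 65]
# Same scores, but now Myrthe systematically scores 5 points higher.
myrthe_shifted = [score + 5 for score in myrthe]

print(sd_difference_measures(brechtje, myrthe))
print(sd_difference_measures(brechtje, myrthe_shifted))
```

Adding a constant 5 points to one rater shifts the mean difference and the limits of agreement, but leaves the SD difference, and therefore the SEM and SDC calculated from it, unchanged. This is why these measures are appropriate only when the rater is considered fixed or when no systematic difference occurred.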
Measurement error calculated using the formula SD × √(1 − ICC)
There is another formula which is sometimes used to calculate the SEM from the ICC: SEM = SD × √(1 − ICC) (40). The standard deviation refers to the pooled SD of the sample, that is, of SD test and SD retest. Using this formula is only justified if the data for ICC and SD are derived from the same study. When the SD is based on another population, this is considered inadequate, as the SD of this other population may be smaller, and subsequently the measurement error is smaller. Moreover, sometimes Cronbach’s alpha is inserted in the formula instead of the ICC. This is considered inadequate, as this measure is based on one full-scale measurement where items are considered as the repeated measurements, instead of at least two full-scale measurements using the total score in the calculation of the SEM. Often Cronbach’s alpha is higher than ICCs based on repeated measurements, thus leading to smaller SEM values. By rating this inadequate, the result of this study can still be considered; however, it is considered to be less trustworthy. Moreover, Cronbach’s alpha is sometimes used inadequately, because it is calculated for a scale that is not unidimensional, or that is based on a formative model. In such cases Cronbach’s alpha cannot be interpreted. Other parameters that are based on single measurements, such as the person separation index (or other IRT-based measurement error measures) or Omega, are not covered by measurement error according to the COSMIN taxonomy, but by internal consistency.

The Coefficient of variation
The coefficient of variation (CV) is also a parameter of measurement error. It is often used in physics and to present the measurement error of a device. When developing a new device, the measurement error is assessed by measuring a fixed sample many (e.g. 50) times. The SD of these measurements is the standard error of measurement. Often the measurement error increases with higher values. For these situations the CV is a suitable measure, as the CV expresses the SD as a percentage of the mean value: in formula, CV = SD / mean. Usually, it is expressed as a percentage; for example, the measurement error is 2% of the measured value. The assumption underlying the CV is that it is constant over all values of the mean, so that the SD is e.g. 2% of the mean value, regardless of a mean value of 10 or 100 or 1000. In a Bland and Altman plot we had a contrary assumption, i.e. that the SD of the difference is constant over the mean values on the x-axis. If the differences are lower for small values and higher for large values, the horizontal lines of the limits of agreement give a wrong value: too large for the small values and too small for the large mean values. In that case one should transform the data. Often a natural logarithm or base-10 logarithm transformation is used. This has the advantage that the limits of agreement can be directly expressed in coefficients of variation (41).
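The two formulas of this part, SEM = SD × √(1 − ICC) and CV = SD / mean, can be sketched as follows. The function names and the numbers are hypothetical, and the pooled SD is computed here as the root of the mean of the two variances, which assumes that test and retest have the same sample size.

```python
import math
import statistics


def pooled_sd(sd_test, sd_retest):
    """Pooled SD of test and retest (assumes equal sample sizes)."""
    return math.sqrt((sd_test ** 2 + sd_retest ** 2) / 2)


def sem_from_icc(sd, icc):
    """SEM = SD * sqrt(1 - ICC); SD and ICC should come from the same study."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def coefficient_of_variation(values):
    """CV = SD / mean of repeated measurements of one fixed sample."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical study: SD of 10 at test and retest, ICC of 0.84.
print(sem_from_icc(pooled_sd(10.0, 10.0), 0.84))
# Hypothetical device: three repeated measurements of a fixed sample.
print(coefficient_of_variation([98.0, 100.0, 102.0]))
```

The SEM function makes the sensitivity of this formula visible: a smaller SD (for example, one taken from a more homogeneous population) or a higher coefficient (for example, Cronbach's alpha instead of the ICC) directly yields a smaller SEM, which is why both are rated ‘inadequate’.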
Standard 8. Preferred statistical methods for dichotomous, nominal, or ordinal scores
For dichotomous/nominal/ordinal scores: was the percentage specific (e.g. positive and negative) agreement calculated?
very good: % specific agreement calculated
adequate: % agreement calculated
Elaboration: Kappa is often considered a measure of agreement; however, kappa is a measure of reliability (42). An appropriate parameter of measurement error (also called agreement) of dichotomous/nominal/ordinal scores is the proportion of specific agreement (42-44). It is a measure that expresses the agreement separately for each category of the score, that is, agreement on positive and on negative ratings in case the score is dichotomous.
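For a dichotomous score rated by two raters, the proportions of specific agreement can be sketched as follows. The sketch uses the commonly used definitions: positive agreement = 2a / (2a + b + c) and negative agreement = 2d / (2d + b + c), where a and d are the numbers of patients rated positive and negative by both raters, and b and c are the numbers of discordant ratings. The function name and the counts are hypothetical.

```python
def specific_agreement(both_pos, a_only_pos, b_only_pos, both_neg):
    """Overall, positive and negative agreement from a 2x2 table of counts."""
    discordant = a_only_pos + b_only_pos
    total = both_pos + discordant + both_neg

    def proportion(concordant):
        denominator = 2 * concordant + discordant
        # Undefined when the category was never used by either rater.
        return 2 * concordant / denominator if denominator else float("nan")

    return {
        "overall": (both_pos + both_neg) / total,
        "positive": proportion(both_pos),
        "negative": proportion(both_neg),
    }


# Hypothetical study of 100 patients scored by two raters.
print(specific_agreement(both_pos=40, a_only_pos=5, b_only_pos=5, both_neg=50))
```

Unlike the overall percentage agreement, the two specific proportions show whether disagreement is concentrated in the positive or in the negative ratings.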
3.3 Example of how to use Part B of the COSMIN Risk of Bias tool to assess the quality of a study by Skeie et al. (2015)

In this chapter we provide an example of how to use the COSMIN tool, Part B, using again the paper by Skeie et al. (19). To fully understand the explanation in Table 7, we recommend first reading the introduction and method section of the paper, and the summary provided at page 27/28. In this paper four different studies are described. Here we use the first two substudies.

Table 7. Example of how to use Part B of the COSMIN Risk of Bias tool based on the study by Skeie (19).
Standards on design requirements for Reliability and Measurement error

1. Were patients stable in the time between the repeated measurements on the construct to be measured?
Rating study 1: NA (measurements were based on a still image).
Rating study 2: Very good. Measurements were conducted in succession.

2. Was the time interval between the repeated measurements appropriate?
Rating study 1: NA
Rating study 2: Very good. The time interval (i.e. the second rater started immediately after the first had completed the procedure) has probably not influenced the scores.

3. Were the measurement conditions similar for the repeated measurements, except for the condition being evaluated as a source of variation?
Rating study 1: Very good
Rating study 2: Very good

4. Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients?
Rating study 1: Very good. None of the previous scores were available.
Rating study 2: Very good. None of the previous scores were available.

5. Did the professional(s) assign the scores or determine the values without knowledge of the scores or values of other repeated measurement(s) in the same patients?
Rating study 1: Very good. None of the previous scores were available.
Rating study 2: Very good. None of the previous scores were available.

6. Were there any other important flaws in the design or statistical methods of the study?
Rating study 1, for reliability: Doubtful. 5 of 30 persons (see Table 1 of the paper) were pain-free subjects, which could have substantially increased the variation between the patients, and subsequently the ICC.
Rating study 2, for reliability: Very good (in this study no pain-free persons were included, see Table 1 of the paper).
For measurement error: Very good. Heterogeneity of the sample is considered less of a problem, as the variation between patients is not included in the parameter.
Standards on preferred statistical methods for Reliability

7. For continuous scores: was an Intraclass Correlation Coefficient (ICC) calculated?
Rating: Adequate. ICC two-way mixed single measures (3.1) and two-way mixed average measures (3.2) were calculated. This is the ICC consistency, which does not take the systematic error between raters into account. The study aims to generalize beyond the raters involved; therefore, the raters should not be considered fixed, and the ICC model does not optimally match the research aim and design. Based on the mean of the measurements provided in Table 2, we can conclude that no systematic difference between the raters occurred. The ICC two-way mixed average measures (3.2) refers to the practice in which two raters would measure each patient (with triple placement of second marker), and both final scores were averaged. As this will not be common practice, we will ignore this ICC. The repetition of part of the measurement is already part of one measurement.

8. For ordinal scores: was a (weighted) Kappa calculated?
Rating study 1: Not applicable
Rating study 2: Not applicable

9. For dichotomous/nominal scores: was Kappa calculated for each category against the other categories combined?
Rating study 1: Not applicable
Rating study 2: Not applicable

Final Risk of Bias rating, Reliability studies
Rating study 1: Doubtful
Rating study 2: Adequate

Standards on preferred statistical methods for Measurement error

7. For continuous scores: was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), Limits of Agreement (LoA) or Coefficient of Variation (CV) calculated?
Rating: Adequate, as the limits of agreement were calculated, while the aim was to generalize beyond the raters included in this study, and probably there was no systematic difference between the raters.

8. For dichotomous/nominal/ordinal scores: was the percentage specific (e.g. positive and negative) agreement calculated?
Rating study 1: Not applicable
Rating study 2: Not applicable

Final Risk of Bias rating, study on Measurement error
Rating study 1: Adequate
Rating study 2: Adequate
4. Using the COSMIN Risk of Bias tool in a systematic review of outcome measurement instruments
Researchers and clinicians who are deciding on the most suitable outcome measurement instrument for use in their study can often choose from multiple different instruments. The selection should be based on the evidence of the quality of the outcome measurement instruments (i.e. reliability, validity, and responsiveness), as well as on aspects of feasibility and interpretability. A high-quality systematic review on outcome measurement instruments gives a clear overview of all aspects that are important for making this choice. Understanding the quality of the studies and the quality of the measurement instrument under study is a challenging task, specifically for researchers and clinicians who are less familiar with the methodology to evaluate all measurement properties. Therefore, in 2018, we (the COSMIN initiative) published a thorough methodology to conduct a systematic review of PROMs (5). It consisted of a ten-step procedure to summarize the available evidence per measurement property per included PROM, to draw conclusions on each measurement property per PROM, and subsequently to give recommendations on the most suitable PROM for a given purpose, including also feasibility and interpretability aspects. This methodology also includes the COSMIN Risk of Bias checklist to assess the quality of studies on measurement properties of PROMs (1), including standards for design requirements and preferred statistical methods organized in boxes per measurement property.

To perform a systematic review on the quality of ClinROMs, PerFOMs and laboratory values, the same methodology can be used. However, we recommend some adaptations. Two aspects of the COSMIN methodology for systematic reviews of PROMs are different for ClinROMs, PerFOMs or laboratory values: the recommendation to use different boxes for reliability and measurement error, and the addition of a new step.

The new boxes
In systematic reviews of ClinROMs, PerFOMs or laboratory values the COSMIN Risk of Bias checklist for PROMs (1) can be used, although the boxes for reliability and measurement error should be replaced with the COSMIN Risk of Bias tool to assess the quality of a study on reliability or measurement error (4). Standards for most of the remaining measurement properties (i.e. content validity, internal consistency, construct validity, criterion validity and responsiveness) developed for PROMs can be used for other types of measurement instruments as well. Some measurement properties are only relevant for multi-item instruments based on a reflective model (i.e. structural validity and internal consistency). For some other measurement properties only the final score or value of a measurement instrument is considered (i.e. hypotheses testing
for construct validity, criterion validity and responsiveness). The quality of studies on these measurement properties is assessed similarly for all types of outcome measurement instruments, and the existing boxes from the COSMIN Risk of Bias checklist for PROMs can be used.

An additional step
In a reliability study or a study on measurement error of a PROM the focus of interest is usually on the quality of the PROM as it is being used in clinical practice (analyzed using a one-way random effects model), or on the test-retest reliability (using a two-way random effects model of agreement). However, the focus of interest in a reliability study of other types of measurement instruments is much more diverse. As explained in chapter 2, there are many potential sources of variation (i.e. many different ways to operationalize the components of outcome measurement instruments) that could be the focus of interest in a study on reliability. Each result of all those studies tells you something about the quality of the instrument (and gives suggestions for improvement of the measurement by standardizing or restricting the source of variation which showed the largest error). Based on an overview of all these studies, a best-evidence measurement protocol can be recommended. In a COSMIN review of ClinROMs, PerFOMs or laboratory values, an additional step is needed in the ten-step procedure (see Figure 3), specifically in the assessment of reliability and measurement error. To interpret the results of studies included in a systematic review properly, you need to decide how the results of the study you want to assess inform you about the quality of the measurement instrument. Therefore, we separated the assessment of reliability and measurement error from the other measurement properties.

Change in the methodology
Based on our experience using the methodology, we decided to remove step 8 (which was 'Evaluate interpretability and feasibility') from the methodology. Aspects of interpretability and feasibility are only extracted (and summarized) rather than evaluated. Therefore, this step is irrelevant in the methodology. However, we consider it very useful to have a separate step on data extraction. Once you have included all the studies in a review, we recommend that you first extract all necessary information from an article, before assessing the risk of bias and the quality of the instrument. Relevant information to be extracted refers to characteristics of the included measurement instruments, information on feasibility and interpretability, characteristics of the studies, and the results of the study.
Consequently, the step numbers deviate from the step numbers presented in the original 10-step procedure of the COSMIN methodology to conduct a systematic review of PROMs (5).
Figure 3. Eleven-step procedure for conducting a systematic review on any type of outcome measurement instrument
4.1 The eleven-step procedure for conducting a systematic review of ClinROMs, PerFOMs, or laboratory values
Below, a summary is given of the eleven-step procedure. In the user manual of the COSMIN methodology for systematic reviews of PROMs (45) a thorough explanation of each step is provided. Only the steps that are different for a review of outcome measurement instruments other than PROMs are described here in detail. Please note that the step numbers have changed.
The methodology of a systematic review of outcome measurement instruments is subdivided into three parts (A, B, and C) (5).
Steps 1-4: Perform the literature search
Steps 1-4 are standard procedures when performing systematic reviews, and are in agreement with existing guidelines for reviews (46, 47): formulating the specific aim of the review and the eligibility criteria, performing the literature search, and selecting relevant publications.
In the research question and eligibility criteria, four key elements should be included: 1) the construct; 2) the population; 3) the type(s) of instruments; and 4) the measurement properties of interest.
In the search strategy we recommend also using these key elements, except for the type of instruments, as we are not aware of highly sensitive search blocks for different types of measurement instruments. Search filters for different constructs may be found at https://blocks.bmi-online.nl/. When using the search filter for finding studies on measurement properties (48) of ClinROMs, PerFOMs and laboratory values, we recommend using additional search terms for finding studies using Generalizability theory. This string, developed with the help of a clinical librarian, can be added to the search filter with the Boolean "OR".
PubMed search string for finding studies using Generalizability theory:
G-theory[tiab] OR "G theory"[tiab] OR "generalizability theory"[tiab] OR "generalisability theory"[tiab]
EMBASE search string for finding studies using Generalizability theory:
'g-theory':ti,ab OR 'g theory':ti,ab OR 'generalizability theory':ti,ab OR 'generalisability theory':ti,ab
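Adding the Generalizability theory string to an existing filter can be sketched as follows. This is a minimal illustration, not part of the COSMIN materials: the function names and the placeholder arguments are hypothetical, the spacing inside the quoted terms is an assumption about the original string, and the actual measurement-properties filter (48) is not reproduced here.

```python
# PubMed string for Generalizability theory from this manual
# (spacing inside the quoted terms is an assumption).
G_THEORY_PUBMED = (
    'G-theory[tiab] OR "G theory"[tiab] OR '
    '"generalizability theory"[tiab] OR "generalisability theory"[tiab]'
)


def add_g_theory_terms(measurement_filter: str,
                       g_theory: str = G_THEORY_PUBMED) -> str:
    """Join a measurement-properties filter and the G-theory string with OR."""
    return f"({measurement_filter}) OR ({g_theory})"


def build_query(construct: str, population: str, measurement_filter: str) -> str:
    """Combine the key elements with AND; the extended filter is one element.

    The type of instrument is left out, as recommended in the text.
    """
    extended = add_g_theory_terms(measurement_filter)
    return f"({construct}) AND ({population}) AND ({extended})"
```

Wrapping every block in parentheses keeps the Boolean "OR" from binding to terms outside the filter.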
Step 5: Data extraction
Once you have included all relevant articles, you check per article which measurement properties were evaluated (and subsequently decide which COSMIN boxes need to be completed for the specific article). When reading through the article at this point, we recommend that you extract all information from the article about the characteristics of the included measurement instruments (for suggestions of characteristics see Appendix 4), including aspects of feasibility and interpretability (see below). Interpretability is defined as the degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to a quantitative score or change in scores of an outcome measurement instrument (7). Both the interpretability of single scores and the interpretability of change scores are informative to report in a systematic review. The interpretation of single scores can be outlined by providing information on the distribution of scores in the study population or other relevant subgroups, as it may reveal clustering of scores, and it can indicate floor and ceiling effects. The interpretability of change scores can be enhanced by reporting M(C)IC values. However, there is an ongoing debate about how these values should be assessed.
Feasibility is defined as the ease of application of the measurement instrument in its intended context of use, given constraints such as time or money (49). Aspects of feasibility are, for example, completion time, cost of an instrument, length of the instrument, and type and ease of administration. Feasibility applies to both the patients and the professionals who are involved in the measurement. The concept 'feasibility' is related to the concept 'clinical utility': feasibility refers to a measurement instrument, whereas clinical utility refers to an intervention (50).
Interpretability and feasibility are not measurement properties because they do not refer to the quality of an outcome measurement instrument. However, they are considered important aspects for a well-considered selection of an outcome measurement instrument.
Steps 6-9: Evaluate the measurement properties
Steps 6-9 concern the evaluation of the nine measurement properties of the included outcome measurement instruments. In these steps, per measurement property, data are extracted on the characteristics of the studies and the result of each study, the risk of bias of the included studies is rated using the COSMIN Risk of Bias standards, and the results of the studies are rated by applying the criteria for good measurement properties. Subsequently, all evidence is summarized, and the quality of all available evidence per measurement property per measurement instrument is graded using a modified GRADE approach.
Characteristics of the studies refer to the characteristics of the included patient populations and the population of included professionals (for suggestions of characteristics see Appendix 5). For specific recommendations for extracting information on the results of studies on reliability and measurement error, see 'Extracting information' under step 8.
In step 6 the content validity is assessed. In step 7 the internal structure (structural validity, internal consistency and cross-cultural validity/measurement invariance) is assessed. As the assessment of reliability and measurement error requires an additional step (i.e. understanding how the results of a study inform you about the reliability or measurement error of an outcome measurement instrument), these two measurement properties are now assessed in a separate step, i.e. step 8, apart from the assessment of the measurement properties criterion validity, hypotheses testing for construct validity, and responsiveness (i.e. step 9).
Step 6. Evaluate content validity
In step 6 content validity is evaluated. In the current standards and criteria for assessing content validity of PROMs (6), emphasis is placed on the relevance, comprehensiveness, and comprehensibility of the PROM for the construct, target population, and intended context of use. In this assessment the development of the PROM is also considered, specifically the item elicitation phase and the results from the pilot-testing phase. The assessment of content validity of other types of instruments may be different, and more research is needed to develop standards and criteria for other types of measurement instruments.
Assessing the content validity of measurement instruments that include multiple items, whether based on a reflective or a formative model, can lean heavily on the standards and criteria for PROMs. However, because professionals are involved in the measurement, the three aspects of content validity (i.e. relevance, comprehensiveness, and comprehensibility) should be asked of the professionals. Depending on the construct of interest, these aspects could be asked of patients, too, for example for PerFOMs, or ClinROMs about symptoms or severity of conditions.
For the assessment of content validity of measurement instruments that consist of a single parameter (e.g. imaging-based parameters, or laboratory values), other aspects are likely more relevant. For example, you should judge whether it makes sense that the measurement instrument indeed measures the construct it purports to measure, based on theory and medical knowledge, and based on the claims by the manufacturer. In addition, the unit of measurement should match the construct to be measured. For example, a 6-minute walk test, expressed as the distance covered over a time of 6 minutes, measures walking capacity rather than physical functioning (51). As currently no standards and criteria for content validity of such instruments exist, face validity (which is a rather subjective judgment about whether the content of the instrument indeed looks like an adequate reflection of the construct to be measured) could be assessed by the reviewer.
Step 7. Evaluate the internal structure
In step 7 the internal structure (structural validity, internal consistency and cross-cultural validity/measurement invariance) is assessed. This step is only relevant when the measurement instrument is a multi-item instrument based on a reflective model. The standards (1) and criteria (5) provided for systematic reviews of PROMs can be used.
Step 8. Evaluate reliability and measurement error
Next, in step 8 reliability and measurement error are assessed. In chapters 2 and 3 we have explained how to assess the quality of each study on reliability and measurement error.
In a systematic review, per study, you should first extract information about the elements of a comprehensive research question (see chapter 2), the specific ICC model or formula, and the results of each study. Next, you should assess the study quality using the standards (see chapter 3), and assess the results of each study by comparing the results against the criteria for good measurement properties (5). Subsequently, you should summarize all evidence for reliability and for measurement error, respectively, and grade the quality of the evidence using the modified GRADE approach (5). Based on this overview, you can recommend the best-evidence measurement protocol for a specific measurement instrument.
Extracting information
In Appendix 1 we provide an example of a data extraction table. First, we recommend extracting the seven elements of a comprehensive research question, and the research question as stated by the authors in the article. Based on the elements, you can subsequently formulate a comprehensive research question. Next, we recommend extracting the information about the key elements of the review, i.e. the construct, population, type of measurement instrument, and measurement properties of interest. The construct to be measured (element 3 of a comprehensive research question) and the specific measurement properties (element 4 of a comprehensive research question) are already extracted, so only the target population and the type of measurement instrument remain to be extracted. The target population refers to the target population of the specific study. In the example of Skeie et al. (19), the target population was patients with low-back pain. This can be different from the study population (i.e. the sample used) as extracted in item 7, or (slightly) different from the target population of the review (e.g. a broader population). In the study of Skeie, not only patients with low-back pain were included, but also patients with other spinal complaints such as mid back pain, neck pain, and/or extremity pain, or even pain-free subjects. The type of measurement instrument refers to whether the instrument under study is a ClinROM, PerFOM, laboratory value, PROM or ObsROM.
Last, we recommend extracting information about the statistics: the model or formula used, the result, and, if applicable, its 95% confidence interval. If available, we recommend extracting the variance components, or the SD sample or SD difference (see also chapter 3.2 for more explanation). For ordinal or dichotomous data we recommend extracting the raw numbers in the cells plus marginal totals.
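One reason to extract the raw cell counts is that agreement statistics can then be recomputed by the reviewer. The sketch below illustrates this for a dichotomous score. It uses the standard textbook formulas for Cohen's kappa and for positive and negative specific agreement, not a procedure prescribed by this manual, and the function name and cell labels are hypothetical.

```python
def agreement_from_counts(a: int, b: int, c: int, d: int) -> dict:
    """Agreement statistics from a 2x2 table of two ratings.

    a = both ratings positive, d = both ratings negative,
    b = rating 1 positive and rating 2 negative, c = the reverse.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    observed = (a + d) / n
    # Chance-expected agreement, computed from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else None
    pos_den = 2 * a + b + c
    neg_den = 2 * d + b + c
    return {
        "observed_agreement": observed,
        "kappa": kappa,
        "positive_agreement": 2 * a / pos_den if pos_den else None,
        "negative_agreement": 2 * d / neg_den if neg_den else None,
    }
```

With the raw counts available, the same table can also be re-analysed per category when a study reported only an overall statistic.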
Risk of Bias assessment
The next step in the review is to assess the quality of each study, using Part B of the Risk of Bias tool to assess reliability and measurement error (as described in chapter 3). We recommend using the worst-score-counts method to come to an overall rating per study. In Appendix 2 we provide an example of a table to organize these ratings. We recommend that each study is assessed by two independent reviewers, and that they come to consensus.
Comparison against the criteria for good measurement properties
Each result of each single study on reliability or measurement error is now compared against the criteria for good measurement properties (5). As no criteria for the unweighted Kappa and CV were provided in the guidelines for systematic reviews of PROMs, we added these missing criteria (see Table 8). Criteria for % specific agreement are difficult to set, because they are, just like sensitivity and specificity, highly dependent on the situation. As a rule of thumb, 80% might be used.
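The worst-score-counts method can be sketched as taking the lowest rating given to any applicable standard. The snippet below is an illustration only: the function name is hypothetical, the four rating labels are the ones used in this manual, and treating `None` as "not applicable" is an assumption made for the example.

```python
# Ratings ordered from best to worst, as used in this manual.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(ratings):
    """Return the overall study rating: the worst rating among the standards.

    Entries that are None are treated as 'not applicable' and skipped
    (an assumption for this sketch).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable standards were rated")
    # list.index raises ValueError for an unknown label, which flags typos.
    return max(applicable, key=RATING_ORDER.index)
```

For example, a study rated "very good" on every design standard but "doubtful" on the statistical method would receive an overall rating of "doubtful".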
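Table 8. Extended criteria for good reliability and measurement error (adapted from Prinsen et al. (5))
| Measurement property | Rating | Criteria |
| --- | --- | --- |
| Reliability | + | ICC or (weighted) Kappa ≥ 0.70 |
| Reliability | ? | ICC or (weighted) Kappa not reported |
| Reliability | - | ICC or (weighted) Kappa < 0.70 |
| Measurement error | + | SDC or LoA or CV*√2*1.96 < M(C)IC ¹; % specific agreement > 80% ² |
| Measurement error | ? | MIC not defined |
| Measurement error | - | SDC or LoA or CV*√2*1.96 > M(C)IC ¹; % specific agreement < 80% ² |
¹ The M(C)IC value may come from another study. ² Sometimes a higher percentage is more appropriate; when substantiated, such a threshold can be used instead.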
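Applying the criteria in Table 8 to a single study result can be sketched as below. The function names are hypothetical. Table 8 does not say how to rate an SDC that is exactly equal to the M(C)IC, so rating that case as insufficient is an assumption of this sketch.

```python
from typing import Optional


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """Rate a reliability result: '+' if >= 0.70, '-' if lower, '?' if not reported."""
    if icc_or_kappa is None:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"


def rate_measurement_error(sdc: float, mic: Optional[float]) -> str:
    """Rate a measurement error result by comparing the SDC with the M(C)IC.

    Returns '?' when no M(C)IC is available. An SDC equal to the M(C)IC
    is rated '-' here, which is an assumption: Table 8 leaves it undefined.
    """
    if mic is None:
        return "?"
    return "+" if sdc < mic else "-"
```

The M(C)IC passed to `rate_measurement_error` may come from another study, in line with the first footnote of Table 8.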
Summarizing the evidence
To come to an overall conclusion on the reliability or the measurement error of an outcome measurement instrument, one should first decide whether the results from multiple studies can be combined. You should take two aspects into account in this decision. 1) Do the results refer to the same information (i.e. to the same underlying comprehensive research question)? Results from different designs (i.e. different components were varied across the repeated measurements) give you different information about the reliability of an instrument, and therefore cannot simply be summarized. 2) Are the results consistent, that is, are all results either sufficient (+) or insufficient (-)? In case of inconsistency in results, we recommend searching for reasons for this inconsistency, e.g. different designs or statistical models, different populations, or different backgrounds of raters. Subsequently, subgroups of studies can be summarized.
To summarize the evidence, you can either qualitatively summarize the results (e.g. describe the range of the results) or quantitatively pool the results. In reliability studies, only the point estimate of an ICC or Cohen's kappa is used to conclude whether the specific measurement instrument has sufficient reliability (e.g. in the criteria that we propose above). Therefore, it is not necessary to pool the data to obtain a more precise point estimate.
The measurement error refers to the absolute deviation of the score from the 'true' score, or the amount of error in the score. The point estimate of the measurement error parameter refers to this deviation or error, and therefore it is used to know how precisely the measurement instrument is able to measure a patient. To come to a more precise point estimate of the measurement error, the parameters obtained in studies with the same design (i.e. that have the same underlying comprehensive research question) can be pooled, when the confidence intervals are also reported (these can be obtained using the sample size (39) or bootstrapping methods (52)).
Note that you should only summarize or pool parameters of measurement error that were derived from the same study design and the same model or formula. For example, the SEM consistency (either formula 4 or 8, chapter 3.2) and SEM agreement (formula 2, chapter 3.2) should not be combined. However, SEM consistency using either formula 4 or 6 (chapter 3.2) can be combined as they will lead to the same result, and the SDC consistency using either formula 5, 7, or 9 (chapter 3.2) can be combined. The same results are found when using either the SEM one-way random effects model (formula 1, chapter 3.2) or SEM agreement (formula 2, chapter 3.2). This is because all sources of variance (apart from the variance between patients) are taken into account in both formulas. Therefore, these parameters can be combined.
Handling inconsistent results
If the results of studies with the same underlying research question are inconsistent (e.g. both sufficient and insufficient results are found), explanations for the inconsistency should first be explored. For example, slightly different study populations or methods may have been used. If an explanation is found, subgroups of studies (e.g. based on the same study population, or in which the same source of variation is varied) can be summarized. The overall conclusion for (e.g.) reliability can subsequently be drawn per subgroup. When the explanation is found in the quality of the studies (i.e. very good and adequate studies lead to a different overall rating than doubtful and inadequate studies), the doubtful and inadequate quality studies may be reported but ignored in this step, so that only very good and adequate quality studies are decisive in determining the overall rating when ratings are inconsistent. This should be explained in the manuscript.
If studies with the same underlying research question show inconsistent results, and no explanation can be found, one can conclude that the results are inconsistent.
We refer to the User manual of the COSMIN methodology for systematic reviews of PROMs for more information.
Rate the quality of the summarized result
If multiple studies can be qualitatively summarized (e.g. the range of results) or quantitatively pooled, the overall result can again be compared to the criteria for good measurement properties (see Table 8); you can then conclude that the outcome measurement instrument has either sufficient (+) or insufficient (-) reliability or measurement error. Otherwise, you should conclude that the results are inconsistent (±) or indeterminate (?). For more information, we refer to the User manual of the COSMIN methodology for systematic reviews of PROMs.
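The decision between the four overall ratings can be sketched as follows, using the consistency rule stated in this section (all results sufficient, or all results insufficient). The function name is hypothetical. Ignoring indeterminate single-study ratings when at least one determinate rating exists is an assumption of this sketch; the PROM user manual should be consulted for the full procedure.

```python
def overall_rating(study_ratings):
    """Combine single-study ratings ('+', '-', '?') into an overall rating.

    Only studies answering the same comprehensive research question
    should be passed in together.
    """
    determinate = [r for r in study_ratings if r in ("+", "-")]
    if not determinate:
        return "?"  # nothing could be rated against the criteria
    if all(r == "+" for r in determinate):
        return "+"
    if all(r == "-" for r in determinate):
        return "-"
    return "±"  # unexplained mix of sufficient and insufficient results
```

A "±" outcome is the point at which explanations for the inconsistency should be explored and, if one is found, the function can be applied again per subgroup.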
Grading the quality of the evidence using the modified GRADE approach
After summarizing or pooling all evidence per outcome measurement instrument for reliability and for measurement error, and rating the summarized or pooled results against the criteria for good measurement properties, the next step is to grade the quality of the evidence. The quality of the evidence refers to the confidence that the summarized or pooled result is trustworthy. We developed a modified GRADE (Grading of Recommendations Assessment, Development, and Evaluation) approach to grade the evidence as high, moderate, low or very low (5), based on 1) risk of bias (i.e. the methodological quality of the studies), 2) inconsistency (i.e. unexplained inconsistency of results across studies), 3) imprecision (i.e. total sample size of the available studies), and 4) indirectness (i.e. evidence from different populations than the population of interest in the review). This procedure is described in the User manual of the COSMIN methodology for systematic reviews of PROMs (5, 45).
Draw conclusion on 'best-evidence measurement protocol'
The results of reliability studies with their specific designs inform you whether a source of variation (for example the training of a rater, or the specific machine used) importantly affects the score (i.e. the measurement). If possible, this source of variation should be standardized or restricted in future measurements. By looking at all evidence for the various sources of variation, you can now draw conclusions about how to standardize and restrict the measurement, and describe this best-evidence measurement protocol.
Step 9. Evaluate criterion validity, hypotheses testing for construct validity, and responsiveness
In step 9 criterion validity, hypotheses testing for construct validity, and responsiveness are assessed. The standards (1) and criteria (5) provided for systematic reviews of PROMs can be used.
Steps 10-11: Select the outcome measurement instrument
Steps 10 and 11 concern formulating recommendations (step 10) and reporting the systematic review (step 11).
Step 10. Formulate recommendations
The goal of a systematic review of measurement instruments is to get an overview of all available evidence on the quality of outcome measurement instruments that measure a specific construct in a defined patient population. Based on this overview, and taking aspects of feasibility and interpretability into account, we recommend that you formulate your recommendations about the most suitable outcome measurement instrument. To come to an evidence-based and fully transparent recommendation, we recommend categorizing the included measurement instruments into three categories. Per type of measurement instrument you can conclude which instrument(s) are recommended (category A), promising (category B), or insufficient (category C) and should no longer be used.
Category (A):
We recommend using different definitions of category (A), depending on the structure of the measurement instrument:
Multi-item reflective: evidence for sufficient content validity (any level), AND sufficient internal consistency (at least low quality, meaning also sufficient structural validity)
Multi-item formative: evidence for sufficient content validity (any level)
Single item (single parameter) (no gold standard): sufficient face validity (rated by e.g. the review team), AND evidence for sufficient reliability (any level)
Single item (gold standard available): evidence for sufficient criterion validity, AND evidence for sufficient reliability (any level)
Category (B): outcome measurement instrument not categorized as 'A' or 'C'.
Category (C): outcome measurement instrument with high quality evidence for an insufficient measurement property.
Step 11. Report the systematic review
In accordance with the PRISMA Statement (53, 54), we recommend reporting the following information:
(1) the search strategy (for example on a website or in the (online) supplemental materials to the article at issue), and the results of the literature search and selection of the studies and measurement instruments, displayed in the PRISMA flow diagram (including the final number of articles and the final number of measurement instruments included in the review) (Appendix 3);
(2) the characteristics of the included measurement instruments, including aspects of feasibility and interpretability (Appendix 4);
(3) the characteristics of the studies, including the characteristics of the included patient populations and the population of included professionals (Appendix 5);
(4) the methodological quality ratings of each study per measurement property per measurement instrument (i.e. very good, adequate, doubtful, inadequate), the results of each study, and the accompanying ratings of the results based on the criteria for good measurement properties (sufficient (+) / insufficient (-) / indeterminate (?)). In the User Manual for conducting systematic reviews of PROMs (45) an example is provided. In Appendix 6 we provide examples specifically for columns on reliability and measurement error. The table could be published, for example, as an appendix or as supplemental material on the website of the journal only;
(5) a Summary of Findings (SoF) table per measurement property, including the pooled or summarized results of the measurement properties, its overall rating (i.e. sufficient (+) / insufficient (-) / inconsistent (±) / indeterminate (?)), and the grading of the quality of evidence (high, moderate, low, very low). In the User Manual for conducting systematic reviews of PROMs (45) an example is provided. In Appendix 7 we provide examples specifically for columns on reliability and measurement error. These SoF tables (i.e. one per measurement property) will ultimately be used in providing recommendations for the selection of the most appropriate PROM for a given purpose or a particular context of use.
Appendix 1. Data extraction table of relevant information for each included study in a systematic review.
| Extraction item | Instruction | Study 1 | Study 2 |
| --- | --- | --- | --- |
| Elements of a comprehensive research question | | | |
| 1. Name of the instrument | Alternatively: type of instrument and parameter | | |
| 2. Version or way of operationalization | All relevant components that are known or expected to influence the score, and which are standardized or restricted (facet of stratification (23)) | Equipment: ; Preparatory actions: ; Unprocessed data/sample collection: ; Data processing and storage: ; Assignment of the score/determination of the value: | Equipment: ; Preparatory actions: ; Unprocessed data/sample collection: ; Data processing and storage: ; Assignment of the score/determination of the value: |
| 3. Construct | Description of what is being measured | | |
| 4. Measurement property | Reliability and/or measurement error | | |
| 5. Components that will be repeated | e.g. whole measurement (i.e. all components) or some of the components | | |
| 6. Source(s) of variation varied | Components which are varied across the measurements (i.e. focus of analysis; facet of generalizability (23)) | | |
| 7. Patient population | (i.e. facet of differentiation (23)) | | |
| The research question | | | |
| Published research question | As formulated by the authors | | |
| Comprehensive research question | As formulated by the reviewer | | |
| Additional key elements of the research aim of the review | | | |
| Target population | Description of the population to which the authors want to generalize | | |
| Types of measurement instrument | e.g. ClinROM, PerFOM, laboratory value, PROM or ObsROM | | |
| Statistical information and results | | | |
| Model or formula used | Statistical model | | |
| Result | e.g. results (95% CI) of ICC, kappa, SEM, LoA and systematic difference | | |
| Variance components | All reported variance components | | |
| Apply criteria for good measurement property* | sufficient (+), insufficient (-), or indeterminate (?) | | |
* Although this is a rating, and not data extraction, we include it here, as the information required to make the rating is extracted here.
Appendix 2. Risk of Bias ratings per standard per study
| Risk of Bias rating | Study 1: rater 1 | Study 1: rater 2 | Study 1: consensus |
| --- | --- | --- | --- |
| Design requirements | | | |
| 1 Stability of the patients | | | |
| 2 Time interval | | | |
| 3 Similarity of measurement condition | | | |
| 4 Administration without knowledge of scores or values | | | |
| 5 Score assignment or determination of values without knowledge of the scores or values | | | |
| 6 Other important flaws | | | |
| Statistical methods | | | |
| 7 For continuous scores: ICC | | | |
| 8 For ordinal scores: Kappa | | | |
| 9 For dichotomous/nominal scores: Kappa for each category against the other categories combined? | | | |
| Final rating | | | |
Appendix 3. Example of a flow chart
Appendix 4. Example of reporting table on characteristics of the included measurement instruments.
| Name (reference to first article) | Construct | Intended context of use | Best-evidence measurement protocol | Target population | Type of measurement instrument | Feasibility aspects | Interpretability aspects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LMM thickness (19) | Thickness of resting muscle | Evaluation | Training diagnostic ultrasound. Specific instructions for patient, and probe positions. | Patients with low back pain | Ultrasound | | Mean score in mix of pain patients was 27.9 mm (±3.2) |
| LMM contraction (19) | Comparison of the thickness of resting muscle with that of activated muscle | Evaluation | Training diagnostic ultrasound. Specific instructions for patient, and probe positions. | Patients with low back pain | Ultrasound | | Mean score in mix of pain patients ranges from 1.3 mm (±1.7) to 3.5 mm (±2.6) |
Other characteristics which may be extracted are: conceptual model used, recommended by standardization initiatives, full copy available, fit for purpose (diagnostic, prognostic, evaluation).
Aspects of feasibility are, for example, completion time, licensing information and costs of an instrument, and type and ease of administration. Feasibility applies to both the patients and the professionals who are involved in the measurement. Reporting this information in a separate table may be considered.
Aspects of interpretability refer to 1) interpretability of single scores (e.g. information on the distribution of scores in the study population or other relevant subgroups, and floor and ceiling effects), and 2) interpretability of change scores (i.e. M(C)IC values).
Appendix 5. Example of reporting table on characteristics of the study populations.
| Measurement instrument | Reference | Measurement property assessed | Patient population: sample size | Patient population: patient characteristics | Professional population: sample size | Professional population: characteristics of professionals | Response rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LMM contraction | (19) Study 2 | Reliability, measurement error | 30 | 47% female, age mean (SD) 37 (±12); LBP n=20; neck/mid back pain n=5; extremity pain n=1; pain free n=4 | 2 | Chiropractors experienced in diagnostic ultrasound for the musculoskeletal system, i.e. 4 and 8 years resp., with a postgraduate diploma in diagnostic ultrasound. Before the study, both developed the protocol of diagnostic ultrasound that was applied in this study. | |
| | (19) Study 3 | Reliability | 30 | 50% female, age mean (SD) 38 (±11); LBP n=23; neck/mid back pain n=7 | 2 | | |
| | (19) Study 4 | Reliability, measurement error | 30 | 43% female, age mean (SD) 40 (±11); LBP n=20; neck/mid back pain n=6; extremity pain n=3; pain free n=1 | 2 | | |
| B | 1 | | | | | | |
| | 2 | | | | | | |
Patient characteristics refer to, e.g. age, gender, disease characteristics (diagnosis, disease duration, disease severity), setting, and geographical location.
Rater characteristics may refer to, e.g. professional background, specific training received, or years of experience.
Appendix 6. Overview table of quality and results of studies on reliability and measurement error.
| Measurement instrument (MI) (ref) | Type of MI | Reliability: n | Reliability: study quality | Reliability: result (rating) | Measurement error: n | Measurement error: study quality | Measurement error: result (rating) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LMM contraction score (study 2) (19) | Ultrasound | 30 | Adequate | 0.97 (0.92-0.98) | 30 | Adequate | LoA [−0.94; 1.22 mm] |
| LMM contraction score (study 3) (19) | Ultrasound | 30 | Adequate | 0.94 (0.88-0.97) | | | |
| LMM contraction score (study 4) (19) | Ultrasound | 30 | Adequate | 0.97 (0.94-0.99) | 30 | Adequate | LoA [−1.32; 1.25 mm] |
| LMM contraction score (ref) | | | | | | | |
| LMM contraction score (ref) | | | | | | | |
| Pooled or summary result (overall rating) | | 90 | | 0.94-0.97 (+) | 90 | | SDC consistency = 1.08; 1.29 ᵃ |
ᵃ Calculated from LoA.
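The SDC values in the summary row can be reproduced from the reported limits of agreement. The sketch below assumes that the SDC is taken as half the width of the LoA interval, i.e. the distance from the mean difference to either limit. That assumption matches the numbers in this table, but the manual's own formulas are in chapter 3.2 and are not reproduced here. The function name is hypothetical.

```python
def loa_summary(lower: float, upper: float) -> dict:
    """Summarize limits of agreement (LoA) given as [lower; upper].

    Assumes symmetric limits around the mean difference, so that
    half the interval width corresponds to the SDC.
    """
    if upper < lower:
        raise ValueError("upper limit must not be below the lower limit")
    return {
        "mean_difference": (upper + lower) / 2,  # systematic difference
        "sdc": (upper - lower) / 2,              # half-width of the interval
    }


# LoA values reported for study 2 and study 4 in this appendix.
study_2 = loa_summary(-0.94, 1.22)
study_4 = loa_summary(-1.32, 1.25)
```

For study 2 the half-width is 1.08 mm and for study 4 it is 1.285 mm, which the table reports as 1.29.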
Appendix 7. Summary of Findings Tables for Reliability and Measurement error.

Based on the studies on reliability described by Skeie (19)

| Reliability | Summary result | Overall rating | Quality of evidence |
| --- | --- | --- | --- |
| Ultrasound measurement of the LMM contraction score – best-evidence measurement protocol: rater, day and active motor tasks performed before measurement were not of influence | Range ICC: 0.94-0.97 | Sufficient | High (two studies of adequate quality) |
| Measurement instrument B – | | | |
Based on the studies on measurement error described by Skeie (19)

| Measurement error | Summary result | Overall rating | Quality of evidence |
| --- | --- | --- | --- |
| Ultrasound measurement of the LMM contraction score – best-evidence measurement protocol: rater, day and active motor tasks performed before measurement were not of influence | Range SDC_consistency: 1.08-1.29; MIC = not assessed | ? | |
| Measurement instrument B – | | | |
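The overall ratings in the two Summary of Findings tables (Sufficient for an ICC range of 0.94-0.97, and "?" where the MIC was not assessed) can be sketched as a small decision rule. This is a minimal illustration, not the tool's own procedure: it assumes the criteria for good measurement properties from the COSMIN guideline for systematic reviews (5), namely ICC ≥ 0.70 for a sufficient reliability rating and SDC < MIC for a sufficient measurement error rating. The function names and the handling of SDC exactly equal to MIC are assumptions made here.

```python
from typing import Optional


def rate_reliability(icc: Optional[float], threshold: float = 0.70) -> str:
    """Rate a reliability result: '+' sufficient, '-' insufficient, '?' indeterminate.

    Assumption: an ICC at or above the threshold (default 0.70) is sufficient;
    a missing ICC cannot be rated.
    """
    if icc is None:
        return "?"
    return "+" if icc >= threshold else "-"


def rate_measurement_error(sdc: float, mic: Optional[float]) -> str:
    """Rate a measurement error result by comparing the SDC with the MIC.

    Assumption: SDC < MIC is sufficient; without a MIC the result is
    indeterminate. SDC equal to MIC is treated as insufficient here.
    """
    if mic is None:
        return "?"
    return "+" if sdc < mic else "-"


# Lowest ICC in the reported range, and the largest SDC with no MIC available.
print(rate_reliability(0.94))
print(rate_measurement_error(1.29, None))
```

Rating the lowest ICC of the range is a conservative choice for this sketch; a review team may prefer to rate each study result separately before summarising.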
References
1. Mokkink LB, de Vet HCW, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, et al. COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures. Qual Life Res. 2018;27(5):1171-9.
2. Walton MK, Powers JA, Hobart J, et al. Clinical outcome assessments: A conceptual foundation – Report of the ISPOR Clinical Outcomes Assessment Emerging Good Practices Task Force. Value Health. 2015;18:741-52.
3. Powers JH, 3rd, Patrick DL, Walton MK, Marquis P, Cano S, Hobart J, et al. Clinician-Reported Outcome Assessments of Treatment Benefit: Report of the ISPOR Clinical Outcome Assessment Emerging Good Practices Task Force. Value Health. 2017;20(1):2-14.
4. Mokkink LB, Boers M, van der Vleuten CPM, Bouter LM, Alonso J, Patrick DL, et al. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Medical Research Methodology. 2020;20(293).
5. Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147-57.
6. Terwee CB, Prinsen CAC, Chiarotto A, Westerman MJ, Patrick DL, Alonso J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159-70.
7. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737-45.
8. Hamilton M. The assessment of anxiety states by rating. Br J Med Psychol. 1959;32(1):50-5.
9. Douglas PS, DeCara JM, Devereux RB, Duckworth S, Gardin JM, Jaber WA, et al. Echocardiographic imaging in clinical trials: American Society of Echocardiography Standards for echocardiography core laboratories: endorsed by the American College of Cardiology Foundation. J Am Soc Echocardiogr. 2009;22(7):755-65.
10. Jungmann PM, Welsch GH, Brittberg M, Trattnig S, Braun S, Imhoff AB, et al. Magnetic Resonance Imaging Score and Classification System (AMADEUS) for Assessment of Preoperative Cartilage Defect Severity. Cartilage. 2017;8(3):272-82.
11. Fischer JS, Jak AJ, Kniker JE, Rudick RA, Cutter G. Multiple Sclerosis Functional Composite (MSFC). Administration and scoring manual. 2001.
12. Genc S, Omer B, Aycan-Ustyol E, Ince N, Bal F, Gurdol F. Evaluation of turbidimetric inhibition immunoassay (TINIA) and HPLC methods for glycated haemoglobin determination. J Clin Lab Anal. 2012;26(6):481-5.
13. Holen JC, Saltvedt I, Fayers PM, Hjermstad MJ, Loge JH, Kaasa S. Doloplus-2, a valid tool for behavioural pain assessment? BMC Geriatr. 2007;7:29.
14. Farooq MN, Mohseni Bandpei MA, Ali M, Khan GA. Reliability of the universal goniometer for assessing active cervical range of motion in asymptomatic healthy persons. Pak J Med Sci. 2016;32(2):457-61.
15. Jordan K, Haywood KL, Dziedzic K, Garratt AM, Jones PW, Ong BN, et al. Assessment of the 3-dimensional Fastrak measurement system in measuring range of motion in ankylosing spondylitis. J Rheumatol. 2004;31(11):2207-15.
16. Correll S, Field J, Hutchinson H, Mickevicius G, Fitzsimmons A, Smoot B. Reliability and Validity of the Halo Digital Goniometer for Shoulder Range of Motion in Healthy Subjects. Int J Sports Phys Ther. 2018;13(4):707-14.
17. D'Agostino MA, Aegerter P, Jousse-Joulin S, Chary-Valckenaere I, Lecoq B, Gaudin P, et al. How to evaluate and improve the reliability of power Doppler ultrasonography for assessing enthesitis in spondylarthritis. Arthritis Rheum. 2009;61(1):61-9.
18. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651-7.
19. Skeie EJ, Borge JA, Leboeuf-Yde C, Bolton J, Wedderkopp N. Reliability of diagnostic ultrasound in measuring the multifidus muscle. Chiropr Man Therap. 2015;23:15.
20. Mathew AJ, Ostergaard M. Magnetic Resonance Imaging of Enthesitis in Spondyloarthritis, Including Psoriatic Arthritis-Status and Recent Advances. Front Med (Lausanne). 2020;7:296.
21. Butland RJ, Pang J, Gross ER, Woodcock AA, Geddes DM. Two-, six-, and 12-minute walking tests in respiratory disease. Br Med J (Clin Res Ed). 1982;284(6329):1607-8.
22. de Jong K, et al. Richtlijnen 6-minutes timed walking test. 2000.
23. Bloch R, Norman G. Generalizability theory for the perplexed: a practical introduction and guide: AMEE Guide No. 68. Med Teach. 2012;34(11):960-92.
24. Feys P, Lamers I, Francis G, Benedict R, Phillips G, LaRocca N, et al. The Nine-Hole Peg Test as a manual dexterity performance measure for multiple sclerosis. Mult Scler. 2017;23(5):711-20.
25. Mathiowetz V, Weber K, Kashman N, Volland G. Adult norms for the Nine Hole Peg Test of finger dexterity. Occup Particip Health. 1985;5:24-38.
26. Arvidsson Lindvall M, Anderzen-Carlsson A, Appelros P, Forsberg A. Validity and test-retest reliability of the six-spot step test in persons after stroke. Physiother Theory Pract. 2020;36(1):211-8.
27. Romani J, Giavedoni P, Roe E, Vidal D, Luelmo J, Wortsman X. Inter- and Intra-rater Agreement of Dermatologic Ultrasound for the Diagnosis of Lobular and Septal Panniculitis. J Ultrasound Med. 2020;39(1):107-12.
28. Gellhorn AC, Carlson MJ. Inter-rater, intra-rater, and inter-machine reliability of quantitative ultrasound measurements of the patellar tendon. Ultrasound Med Biol. 2013;39(5):791-6.
29. Brennan RL. Generalizability Theory. New York: Springer-Verlag; 2001.
30. Govaerts MJ, van der Vleuten CP, Schuwirth LW. Optimising the reproducibility of a performance-based assessment test in midwifery education. Adv Health Sci Educ Theory Pract. 2002;7(2):133-45.
31. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996;1:30-46.
32. Shrout PE, Fleiss JL. Intraclass Correlations: Uses in assessing rater reliability. Psychological Bulletin. 1979;86:420-8.
33. Kraemer HC, Periyakoil VS, Noda A. Kappa coefficients in medical research. Tutorial in biostatistics. Statistics in Medicine. 2002;21:2109-29.
34. Cohen J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin. 1968;70:213-20.
35. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37-46.
36. Vach W. The dependence of Cohen's kappa on the prevalence does not matter. J Clin Epidemiol. 2005;58(7):655-61.
37. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement. 1973;33:613-9.
38. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135-60.
39. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-10.
40. de Vet HC, Terwee CB, Mokkink L, Knol DL. Measurement in Medicine. Cambridge: Cambridge University Press; 2011.
41. Euser AM, Dekker FW, le Cessie S. A practical approach to Bland-Altman plots and variation coefficients for log transformed variables. J Clin Epidemiol. 2008;61(10):978-82.
42. de Vet HC, Mokkink LB, Terwee CB, Hoekstra OS, Knol DL. Clinicians are right not to like Cohen's kappa. BMJ. 2013;346:f2125.
43. de Vet HC, Dikmans RE, Eekhout I. Specific agreement on dichotomous outcomes can be calculated for more than two raters. J Clin Epidemiol. 2017.
44. de Vet HCW, Mullender MG, Eekhout I. Specific agreement on ordinal and multiple nominal outcomes can be calculated for more than two raters. J Clin Epidemiol. 2018;96:47-53.
45. Mokkink LB, de Vet HC, Prinsen CA, Patrick DL, Alonso J, Bouter LM, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) - user manual. 2018. Available from: www.cosmin.nl.
46. Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011. Available from: www.handbook.cochrane.org.
47. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Reviews. 2013. Available from: http://methods.cochrane.org/sdt/handbook-dta-reviews.
48. Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18(8):1115-23.
49. Boers M, Kirwan JR, Tugwell P, Beaton D, Bingham CO, III, Conaghan PG, et al. The OMERACT handbook. OMERACT; 2015.
50. Smart A. A multi-dimensional model of clinical utility. International Journal for Quality in Health Care. 2006;18(5):377-82.
51. Stratford PW, Kennedy D, Pagura SM, Gollish JD. The relationship between self-report and performance-related measures: questioning the content validity of timed tests. Arthritis Rheum. 2003;49(4):535-40.
52. Efron B. Better bootstrap confidence intervals. Journal of the American Statistical Association. 1987;82(397):171-85.
53. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.
54. Peterson DA, Berque P, Jabusch HC, Altenmuller E, Frucht SJ. Rating scales for musician's dystonia: the state of the art. Neurology. 2013;81(6):589-98.