Journal of Intensive and Critical Care Open Access

  • ISSN: 2471-8505
  • Journal h-index: 14
  • Journal CiteScore: 2.54
  • Journal Impact Factor: 3.4
  • Average acceptance to publication time (5-7 days)
  • Average article processing time (30-45 days) Less than 5 volumes 30 days
    8 - 9 volumes 40 days
    10 and more volumes 45 days
Reach us +32 25889658

Review Article - (2016) Volume 2, Issue 2

Accuracy of Modified Early Warning Scores for Predicting Mortality in Hospital: A Systematic Review and Meta-analysis

Sunee Suwanpasu1* and Youwanuch Sattayasomboon2

1Nursing Department, King Chulalongkorn Memorial Hospital, Bangkok 10330, Thailand

2Faculty of Public Health, Mahidol University, Bangkok 10400, Thailand

*Corresponding Author:
Sunee Suwanpasu
Nursing Department, King Chulalongkorn Memorial Hospital
Bangkok 10330, Thailand
E-mail: suneesuwanpasu@gmail.com

Received: April 26, 2016; Accepted: April 28, 2016; Published: May 05, 2016

Citation: Suwanpasu S, Sattayasomboon Y. Accuracy of Modified Early Warning Scores for Predicting Mortality in Hospital: A Systematic Review and Meta-analysis. J Intensive & Crit Care 2016, 2:2.

Visit for more related articles at Journal of Intensive and Critical Care

Abstract

Background: Patients at risk of rapid deterioration and critical illness often have preceding changes in their physiological parameters. Use of Modified Early Warning Score (MEWS) allows distinguishing points-of-care of patients who are at increased risk of serious adverse outcomes.

Objectives: To evaluate the prognostic accuracy of the MEWS to identify patients at risk of in-hospital death. Methods: The review process conformed to the recommendation form five steps to conduct a systematic review. Relevant studies from January 2000 to December 2015 were obtained from electronic databases. Standards for Reporting of Diagnostic accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies instrument (QUADAS-2) were used to assess quality and bias of individual studies. MedCalc statistic software was used to merge the diagnostic accuracy data of recruited studies. The prognostic accuracy of predicting for in-hospital death was pooled on data from the diagnostic odds ratio and area under The ROC Curve (AUC) analysis.

Results: A total of 402 citations were identified yielding 16 studies for inclusion in this systematic review. Studies were statistically significant heterogeneous in terms of age and sample size. For predicting in-hospital death, high risk group of MEWS that get the threshold equal 4 or more and equal 5 or more had the Diagnostic odds ratio (DOR) of 14.278 (95% Confidence Interval [CI] 12.185 to 16.730, I2=56.59%]) and 3.28 (95%CI:2.489 to 4.323, I2=48.64%). On pooled AUC analysis, there was a trend for MEWS to estimate fair at the discriminative power of test. AUC of MEWS > 4 was 0.778 (95% CI : 0.715 to 0.841, I2=89.54%) and of MEWS > 5 was 0.646 (95% CI : 0.611 to 0.682, I2=49.69%).

Conclusions: The result showed a robust positive trend to predict in-hospital death. MEWS equal 4 or greater may be a favored threshold to alert to call for immediate appropriate action in hospitalized patients.

Keywords

Sensitivity; Specificity; Morbidity; Prognostic accuracy

Introduction

Patients at risk of severe illness and unanticipated serious adverse outcomes often have tended to precede changes in their physiological parameters [1-3]. Delays in the recognition and treatment of these changes increase the risk of in-hospital death [4]. Early recognition of them is important not only in order to facilitate treatment, but also so that decisions can be made as to whether admission to critical care and cardiopulmonary resuscitation are appropriate. Calculation of early warning system scores is standard practice in many hospitals to predict clinical deterioration. Screening tools, such as the Modified Early Warning Score (MEWS), have been demonstrated to have some utility in identifying these patients particularly among general medical and surgical patients [3,5].

The basis of a MEWS as being low risk (MEWS 0-1), intermediate risk (MEWS 2-3), high risk (4 to 5), or highest risk (MEWS > 6) of serious deterioration, nursing staff was instructed to alert appropriate medical staff if the MEWS threshold was 4 to 5 indicating deterioration and the need for greater concern and a score of 6 or higher, meaning the patient was experiencing serious changes in condition that called for immediate action [6]. A recent study found a statistically significant decrease in mortality after MEWS implementation [7]. They found that death per adult admission decreased from 1.4 to 1.2% (p<0.0001) in one hospital and 1.5 to 1.3% (p< 0.0001) in the other hospital. Moon A et al. also found that patients who had undergone cardiopulmonary resuscitation had significant decrease in in-hospital mortality at the two hospitals from 52 to 42% (p<0.05) and from 70 to 40% (p<0.0001), respectively [7].

MEWS was significantly higher in non-survivor than survivor group. In emergency medical patients, median score of MEWS in non-survivors was 2-5 and survivors were 1-2 [8-11]. Mean of MEWS in non-survivors was about 3.5+1.7 and survivors were 2.3 + 1.7 [12]. Moreover, in general internal medicine department, mean of MEWS in non-survivors was 4.5-6.3 and survivors were 3.2-4.2. [13,14]. In a poor outcome as death or admission to the ICU, mean of MEWS was 5.6 + 2.5 and good outcomes were 3.3 + 2.3 [15]. In predicting patient mortality a MEWS threshold being 4 or greater had a sensitivity of 70.6% and a specificity of 37.8%, and a MEWS score of 5 or more had a sensitivity of 58.8% and a specificity of 56.2% [16].

In an effort to improve patient safety, there have been a number of studies analyses the use of MEWS to recognize the potential for clinical deterioration in ED and ward patients. However, the screening scoring has also been demonstrated to have uncertain thresholds used to identify patients in at risk to deteriorating and alert for actions. The fact is little is known exactly point of threshold of MEWS tools to predict hospitalized death should be. The aim of the systematic review and meta-analysis was to evaluate and provide robust evident on the prognostic accuracy (sensitivity, specificity, AUROC) of MEWS to in-hospital death for hospitalized patients.

Methods

This study followed five steps to conducting a systematic review [17]. The MedCalc statistic software was used for meta-analysis.

Framing of questions

Our key study questions were devised on the basis of the acronym PICO (patient, intervention or exposure, comparator, and outcome). They met the following criteria: (a) patients: Adult patients (aged 18 years or over) requiring hospitalization because of acute medical or surgical reasons, (b) intervention or exposure: MEWS instrumentation, (c) comparator: high risk of in-hospital death (MEWS > 4 or > 5) compared with low risk of in-hospital death (MEWS < 4 or < 5) (d) outcome: unplanned in-hospital death as shown in Table 1 (Table 1). The main research question was thus, how many is prognostic accuracy for MEWS tool to predict the deterioration in term of unplanned in-hospital death?

Population Adult patients (aged 18 years or over) requiring hospitalization because of acute medical or surgical reasons
Intervention or exposure The Modified Early Warning Score (MEWS) instrumentation
Comparator group Criteria of high risk of in-hospital death (MEWS ≥ 4 or ≥ 5) compared with low risk of in-hospital death (MEWS<4 or <5)
Outcome or endpoint Unplanned in-hospital death is the main outcomes of interest.

Table 1: PICO: eligibility criteria for considering studies for this review.

Identifying relevant publications

Predefined criteria were applied to select the final list of articles to be included in the review. The articles had to describe a study that provided the prognostic accuracy of MEWS. We included studies in which researchers reported on unplanned in-hospital death were reported, and if diagnostic accuracy of tests for MEWS scoring system were available, or were derivable from the data presented. If several tests were studied simultaneously, data from each were extracted separately, and if patients who obstetric and psychiatric conditions could be excluded. We also excluded before surgical procedure and before discharge from operating room studies, editorials, single cases and case series, studies that published only abstracts, letters, or commentaries, or if they were part of duplicate populations.

Selection of relevant databases and search terms

An electronic search of published reports was undertaken to identify studies published between January 2000 to December 2015 were obtained from electronic databases in English language journals. An electronic database search was conducted using the following databases: PubMed (121 papers), Clinical key (33 papers) Ovid MEDLINE (53 papers), CINAHL plus with full text, EBSCO host (50 papers), Scopus, Science Direct, and Web of Science (145 papers). A variety of key words were used to conduct multiple searches. We started by searching the terms in each column of the PICO question, linked by “OR,” and then combined the results of each search and retrieved only articles that contained Medical Subject Headings (MeSH) terms (207), Cumulative Index to Nursing and Allied Health Literature (CINAHL) headings (35 papers), Title-Abstr- Key, and Topic (160 papers). Additional references were obtained from the bibliographies of review articles and original papers. The search was limited to manuscripts published from January 2010 to December 2015, English language and human participants (Figure 1). The selected papers were then independently assessed for inclusion by Suwanpasu and Sattayasomboon (S.S. and S.Y). Discrepancies were discussed and agreement was achieved by consensus.

criticalcare-PRISMA-Flow

Figure 1: PRISMA Flow Diagram.

Assessing the quality of studies

The methodological quality was assessed using The Standards for Reporting of Diagnostic Accuracy (STRAD) checklist [18]. This is a reporting guideline that includes a checklist of 25 items evaluated the studies to contribute to the completeness and transparency of reporting of diagnostic accuracy studies. Each item was scored as completely reported (score=2), partly reported (score=1) or not reported (score=0). Equal weights were given to all items. A manuscript could score a maximum possible score of 50 if all items were fully reported; conversely a study not reporting any item would score 0. The final number of studies meeting our inclusion criteria was 16 reported more than 50% of the STARD items (Table 2). Decisions to include or exclude a study and the information retrieved were compared between the two authors. Discrepancies were discussed and agreement was achieved by consensus.

Study Finlay et al. [23] Ghanem-Zoubi et  al. [13] Cattermole et al. [15] Ho et al. [26] Adrijevic et al. [24] Dundar et al. [11] Vorwerk et al. [14] Wheeler et al. [16] Gardner-Thorpe et al. [27] Subbe et al. [5] Geier et al. [21] Stark et al. [22] Cooksley et al. [28] Bulut et al. [10] Eick et al. [12] Armagan et al. [25]
Title/ 1. A study of diagnostic accuracy? 0 0 0 0 0 0 1 2 2 0 2 0 0 0 0 0
Abstract/
Keywords
Introduction 2. Study aims: estimating or comparing accuracy 2 2 2 2 2 2 1 1 2 1 2 0 0 2 2 2
Methods 3.Inclusion and exclusion criteria, setting and locations 2 0 2 2 2 2 2 1 2 2 0 1 2 2 0 0
Participants 4. Recruitment 2 2 2 2 2 2 2 1 2 2 1 1 2 2 1 1
  5.Sampling 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
  6. Data collection 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Test methods 7. Reference standard 2 2 2 2 2 2 2 2 1 1 2 1 2 2 2 1
  8. Technical specifications of material and methods 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  9. Definition of and rationale for the units, cut-offs and/or categories 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  10. Expertise 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0
  11. Blind  to the results 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0
Statistical methods 12. Diagnostic accuracy and quantify uncertainty 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  13. Test reproducibility 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Results 14. Beginning and end dates of recruitment. 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Participants 15. Clinical and demographic characteristics 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  16. Flow diagram 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0
Test results 17.Time interval 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
  18. Distribution of severity of disease 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
  19. A cross tabulation 0 0 0 0 0 0 0 0 1 1 0 2 0 0 0 0
  20. Any adverse events 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Estimates 21. Estimates of diagnostic accuracy and statistical uncertainty 2 2 2 2 2 2 2 2 0 0 1 1 1 1 1 1
  22. Missing data were handled. 2 0 1 1 1 1 1 1 1 1 1 1 2 1 1 1
  23. Subgroups analysis 0 2 2 2 0 1 1 2 2 2 2 2 0 0 2 0
  24. Test reproducibility 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Discussion 25. Discuss the clinical applicability 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Total score   32 30 33 33 31 36 36 32 33 32 31 29 28 30 29 25

Table 2: STARD checklist for reporting of studies of diagnostic accuracy of MEWS.

Data extraction strategy

Pertinent data from the selected studies were evaluated and extracted independently by two of us (SS and YS), using standardized spreadsheets for quality and extracted relevant data. Discrepancies were resolved by consensus. Information extracted included reference data (first author, journal, and institution), publication year, number of patients, mean age, proportion of male patients, design, MEWS threshold, and prognostic accuracy (True Positive [TP], False Positive [FP], False Negative [FN], True Negative [TN], sensitivity, specificity, Positive Predictive Value [PPV], Negative Predictive Value [NPV], Positive Likelihood Ratio [LR+], Negative Likelihood Ratio [LR-], area under the receiver operating characteristic curve [AUROC], and Diagnostic Odds Ratio [DOR]. A number of patients with SSS, unplanned CA and in-hospital death were recorded. This information is shown in Table 3. Papers in languages other than English, case reports, commentaries, review papers, letters to editors were excluded.

Number Author Year AUC SE DOR Design Population Men Age Setting CC MEWS cut-off TI TP FN FP TN TC Sensitivity Specificity PPV NPV LR+ LR-
1 Finlay et al.
[23]
2014 0.82 (0.82 to 0.83) 0.005 14.4813 (12.2901 to17.0633) Retrospective 32472     electronic medical records general > 4 2346 307 2039 310 29816 30126 0.498 0.936 13.07% 98.97% 7.8 0.54
2 Cattermole et al.
[15]
2009 0.754 (0.703 to 0.799) 0.023 6.3137 (3.1845 to12.5178 Prospective 330 195 61.3+ 20.6 ED General > 4 105 31 74 14 211 225 0.69 0.74 29.52% 93.77% 2.65 0.4
3 Ho et al.[26] 2013 0.71 0.03 2.9109 (1.9161 to 4.4223 Retrospective 1024   61.4+ 18.1 ED General > 4 311 53 258 47 666 713 0.53 0.721 17.04% 93.41% 1.9 0.65
4 Andrijevic et al. [24] 2014     5.913 Prospective 101 76 63+ 11.8 ED CAP > 5 35 17 18 9 57 66 0.654 0.76 48.57% 86.36% 2.72 0.46
5 Dundar et al.[11] 2015 0.891 (0.844 to 0.937) 0.023 22.4824 (11.8409 to 42.6874) Prospective 671 375 75+11 ED Med-surg > 4 110 42 68 15 546 561 0.74 0.89 39.25% 97.34% 6.76 0.29
6 Vorwerk et al.[14] 2008 0.72 (0.67 to 0.77) 0.026 3.76 (2.11 to 6.71) Retrospective 307 158 69.7 (67.5 to 71.8) ED sepsis > 5 148 52 96 20 139 159 0.722 0.592 35.37% 87.42% 1.77 0.47
7 Wheeler et al.[16] 2013 0.59 (0.51 to 0.68) 0.046 1.8313 Prospective 302 155 39.5+15.9 ED General > 5 140 30 110 21 141 162 0.588 0.562 21.43% 87.00% 1.343 0.733
8 Geier et al. [21] 2013 0.642(0.517 to 0.768) 0.064   Prospective 151 82 68.3+ 18 ED suspect sepsis > 5                        
9 Armagan et al.[25] 2008     24 (9.093 to 63.3455) Prospective 309 183 57.1+ 15.3 ED Medicicine > 4 106 40 66 5 198 203 0.889 0.75 37.74% 97.54% 3.56 0.15
10 Subbe et al. [3] 2003     4.2814 (2.8134 to 6.5154) Prospective 1695 45 64+ 19 General Medicine >5 133 37 96 129 1433 1562 0.223 0.937 27.82% 91.74% 3.55 0.83
11 Gardner-throepe et al.[27] 2006       Prospective 334 165 58.6+ 19.2 Surgrery Surgery > 4 58 4 54 0 276 276 1 0.86 6.90% ####### 6.11 0
12 Stark et al.[22] 2015     9.9048 (2.4334 to 40.3164) Retrospective 62 40 62 General Surgical > 4 46 32 14 3 13 16 0.91 0.48 71.00% 80.00% 1.75 0.19
13 Cooksley et al.[28] 2012 0.6 0.021   Retrospective 840 430   Ward Onco > 4                        
14 Bulut et al. [10] 2015 0.630 (0.608 to 0.651) 0.107 3.837 (2.358 to 6.243) Prospective 2000 1039 61.41+ 18.92 ED general > 5                        
15 Ghanem et al.[13] 2011 0.6947 (0.65 to 0.73) 0.021   Prospective 1072   74.7+16.1 General   > 4                        
16 Eick et al.[12] 2015 0.706 (0.667 to 0.750) 0.022 1.28 (1.22 to 1.35) Prospective 5730 3125 61.2+ 17.7 ED general                          

Table 3: Studies of diagnostic accuracy.

Risk of bias in individual studies

Outcome reporting bias within eligible studies was reported quantitatively using QUADAS-2 assessment [19]. Data were extracted by the lead author for patient selection, index test, reference standard, and flow and timing. The information was independently checked against full-text articles by the second author and agreement was reached for the accuracy of all studies. Study bias was assessed on a scale from 1 to 3 [1=low risk, 2=unclear risk, 3=High risk. Higher score with QUADAS-2 of the selected articles suggest more likely to represent bias than lower score (Table 4).

Study Risk of Bias Applicability concerns
Patient selection Index test Reference standard Flow and time Patient selection Index test Reference standard
1. Finlay et al. [23] 2014 1 3 1 1 2 1 1
2. Ghanem-Zoubi et al.[13] 2011 3 1 1 1 1 1 1
3. Cattermole et al.[15] 2009 1 3 1 1 1 1 1
4. Ho et al.[3] 2013 1 1 2 1 1 1 1
5. Andrijevic et al.[24] 2014 1 1 1 1 1 1 1
6. Dundar et al.[11] 2015 1 1 1 1 1 1 1
7. Vorwerk et al.[14] 2009 2 1 1 1 1 1 1
8. Wheeler et al.[16] 2013 2 1 1 1 1 1 1
9. Gardner-Thorpe et al. [27] 2006 1 1 2 1 1 1 1
10. Subbe et al.[3] 2003 1 1 2 1 1 1 1
11. Geier et al.[21] 2013 2 1 1 1 3 1 1
12. Stark et al.[22] 2015 1 1 2 1 1 1 1
13. Cooksley et al.[28] 2012 1 1 1 1 3 1 1
14. Bulut et al.[10] 2014 1 1 1 1 1 1 1
15. Eick et al.[12] 2015 2 1 1 1 2 1 1
16. Armagan et al.[25] 2008 3 1 2 1 1 1 1

Table 4: QUADAS-2 results in diagnostic accuracy review of MEWS.

Summary measures and synthesis of results

We considered three issues: individual study setting number of patients, mean age, design, adverse outcomes prevalence, design, and prognostic test properties (sensitivity, specificity, PPV, NPV, LR+, LR-, DOR, and AUC). To compute meta-analysis summary estimates when more than one study assessed the same index test at the same threshold for the same or similar outcomes at the same follow-up interval, we combined eligible trials’ data using MedCalc statistic software. A fixed-effects model was initially used in this systematic review because we found homogeneity across the study population. A random-effects model was applied only if statistical heterogeneity existed. We assessed statistical heterogeneity using the Cochran Q test and by calculating –squared and I2values (I2>75%considered high level of heterogeneity) [20]. When heterogeneity was substantial (I2>75%), we investigated the sources of heterogeneity by determining the effect of important modifiers: sample details (type and quantity), study design and risk for bias, and the effect of the imputed data.

Risk of bias across studies

Publication bias was assessed by the use of a funnel plot (Funnel plot Generator alpha), which is used primarily as a visual aid for detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size. This suggests the possibility of either publication bias or a systematic difference between smaller and larger studies.

Results

Search results

For among 549 titles and abstracts and 147 duplicates were identified and removed after exporting the results in to reference management database Endnote®. Studies were examined starting with an appraisal of titles and abstracts. A preliminary review of the titles and abstracts resulted in the identification of 170 articles were selected for full-text review (See Figure 1-systematic review flow diagram). In the next stage, printed copies of the remaining publications were read by either Suwanpasu or Sattayasomboon (S.S. or S.Y). A consensus was made on those that met the aforementioned criteria. Furthermore, 76 studies were excluded and 94 papers remained. An additional 36 studies were excluded after examining them carefully. Of these 58 papers, we included 27 providing on diagnostic accuracy of MEWS scoring system. The methodological quality of the 17 included articles was assessed using The Standards for Reporting of Diagnostic Accuracy (STRAD) checklist. The final number of studies meeting our inclusion criteria was 16. Decisions to include or exclude a study and the information retrieved were compared between the two authors. Discrepancies were discussed and agreement was achieved by consensus.

Characteristics of included studies

Papers include in the final review originated form six countries, Australia [14], Germany [12,21], Israel [13], United States of America [22,23], Hong Kong [15], Serbia [24], Malawi [16], Turkey [10,11,25], Singapore [26], and United Kingdom [5,27,28] and were published between 2010-2015. Twelve studies were prospective study and five studies were retrospective study. The mean score of STRAD was 31.25 + 2.817 with a range 25-36 (out of a maximum of 50) (Table 2). Only two studies fully reported at least 70% of the checklist items. The use of the MeSH heading ‘sensitivity and specificity” were identified only 18.8% (3/16) of all studies. 11/16 (68%) state estimating diagnostic accuracy or comparing accuracy between tests or across participant groups in research question or study aims. Almost studies (15/16) state the reference standard and its rationale. Ten of sixteen (62.5%) fully described the inclusion and exclusion criteria, setting and locations where data were collected (Item 3). All of the articles mentioned data was collection planned before the index test and reference standard were performed or after (Item 6). Information about masking of the test readers was reported in only two of the included publications (12.5%). (Item 11) Methods for calculating test reproducibility and any adverse events were not reported of the publications (Item 13 and 20). Only 6.3% (2/16) of the manuscripts had a flow chart describing flow of patients within a study. Studies were significantly heterogeneous in terms of age and sample size.

In term of risk of bias and applicability of primary diagnostic accuracy studies, QUADAS-2 was applied. (Table 4) Ten of sixteen (62.5%) were low risk of bias with patient selection. Almost of studies were low risk of bias with index test (87.5%) and reference standard (68.8%). For, risk of bias of applicability concerns, 75% (12/16) were low risk of bias with patient selection and all studies were low risk of bias with Index test, reference standard and flow and time.

Weighted pooled results

The diagnostic test performance for individual studies is outlined in table 3. In pooled data weighted by the number of patients of patients with in-hospital death and without in-hospital death in each study (Table 5), MEWS using threshold > 4 showed the Diagnostic odds ratio (DOR) of 14.278 (95% Confidence Interval [CI] 12.185 to 16.730, I2=56.59%]) compared with MEWS using threshold < 4 (Figure 2). In addition, MEWS using threshold > 5 showed the Diagnostic odds ratio (DOR) of 3.28 (95% CI : 2.489 to 4.323, I2=48.64%) for predicting in-hospital death (Figure 3).

criticalcare-Flow-Diagram

Figure 2: PRISMA Flow Diagram.

criticalcare-Diagnostic-odds-ratio

Figure 3: Diagnostic odds ratio of MEWS threshold equal 5 or greater.

MEWS threshold More than 4 or greater   More than 5or greater  
  Fix effects
(95%CI)
Random effects
(95%CI)
p Fix effects
(95%CI)
Random effects
(95%CI)
p
DOR 14.287 (12.185-16.73) 14.139
(9.169-21.804)
<0.001 3.280
(2.489-4.323)
3.262
(2.171-4.901)
<0.001
AUC 0.801
(0.785-0.818)
0.778
(0.715-0.841)
<0.001 0.801
(0.785-0.818)
0.778
(0.715-0.841)
<0.001

Table 5: Prognostic accuracy of MEWS threshold.

Summary receiver operation characteristic analysis for each diagnostic test

In a pooled AUROC analysis for MEWS (Table 5) with using threshold > 4, there was a trend of AUROC in-hospital death to estimate sufficient at the discriminative power of test for in-hospital death (AUROC 0.778 [95% CI : 0.715 to 0.841, I2=89.54%], but there was statistically significant demonstrated heterogeneity (Figure 4). For MEWS using threshold > 5 to estimate at the discriminative power of test for in-hospital death was 0.646 (95% CI : 0.611 to 0.682, I2=49.69%) (Figure 5). Therefore, overall the MEWS using threshold > 4 was generally more diagnostic accuracy than using threshold > 5 for patients who had an in-hospital death.

criticalcare-Area-under

Figure 4: Area under the receiver operating characteristic curve of MEWS threshold equal 4 or greater.

criticalcare-receiver-operating

Figure 5: Area under the receiver operating characteristic curve of MEWS threshold equal or greater.

Discussion

The systematic review aimed to identify the prognostic accuracy of screening instruments of the Modified Early Warning Score (MEWS) to detect risk of in-hospital death. Sixteen studies implementing using two triggering critical score approaches as 4 and greater or 5 and greater if there was in-hospital death. The first step in critical appraisal of a test is a comprehensive literature search to identify relevant studies. Only three studies did not explicitly use the terms ‘diagnostic accuracy’ or ‘sensitivity and specificity’ in their titles or abstracts [16,21,27]. This is likely to limit a quick electronic search for evidence base of a diagnostic test. To evaluate the risk of bias and applicability of primary diagnostic accuracy studies, the selection of patients have introduced bias related to lack of exclusion criteria [25,27]. Risk of bias of index test relates data did not contain the simplified alert/voice/pain/ unresponsive (A/V/P/U), computation of MEWS used appropriate mapping of Glasgow Coma score [15,23]. Moreover, Risk of bias of reference standard dues to the potential influence of prior knowledge on the interpretation of the reference standard, using difference threshold as reference standard [5,22,25-27].

However, a larger number of studies being identified for this review compares with previous systematic reviews in this subject area [1]. Studies reported that there is high discriminative performance for predicting in-hospital death. A MEWS threshold of 5 or more, a sensitivity of 58.8-68% was associated with specificity of 56.2-68% [16,22] and accuracy 68% [22]. Lowering the MEWS triggering score for 5 to 4 (of possible 14) increased the sensitivity (70.6-91%) and accuracy (73%), but decreased specificity (37.8-48%) [16,22]. The significant difference were found between patients who died and alive during hospital, the averages of the MEWS scores for patients who died during hospital was 3.5-4.5, and survival was 2.3-3.3 [12,13,15].

Our studies found that MEWS using threshold of 4 or more show the pooled diagnostic odds ratio (DOR) of 14.278 (95% Confidence Interval [CI] 12.185 to 16.730, I2=56.59%]) compared with MEWS using threshold less than 4. In addition, MEWS using threshold of 5 or more showed the pooled diagnostic odds ratio (DOR) of 3.28 (95% CI : 2.489 to 4.323, I2=48.64%) compared with MEWS using threshold less than 5. With the same sensitivity of the test, DOR increases with the increase of the test specificity. The value of a DOR ranges from 0 to infinity, with higher values indicating better discriminatory test performance [29].

The AUC for in-hospital death was 0.778 with MEWS using threshold of 4 or more as compared with 0.646 with the MEWS using threshold of 4 or more. The area under the curve (AUC) helps us estimate how high the discriminative power of a test is. The performance of a test for which several cutoffs are available can be expressed by means of ROC analysis [29]. The AUC takes values between 0 and 1, which higher values indicating better test performance. Thus, MEWS using threshold > 4 was generally more diagnostic accuracy than using threshold > 5, for patients who had an in-hospital death. There is a pair of diagnostic sensitivity and specificity values for individual cut-off. The results support that patients were classified on the basic of MEWS as being low risk (MEWS 0-2), intermediate risk (MEWS 3-4) or high risk MEWS > 4 of serious deteriorations in association with a call out algorithm in a useful and appropriate risk-management that should be implemented for general in-patients.

Limitations

This mate-analysis has limitations. First, this study was limited by the inclusion/exclusion criteria and the search terms employed. In addition, relevant studies may have been omitted due to applied search limitation and reviver decision-making for inclusion/ exclusion for final review incorporation. Second, relevant studies may have been missed despite using comprehensive search strategies and the assistance of an information specialist to search multiple databases. Third, we identified a small number of studies with heterogeneous populations that measured different outcomes, which limited direct comparison of threshold of risk stratification tools. Finally, the meta-analysis of individual studies demonstrates significant statistical heterogeneity, even when assessing the same criteria of MEWS for the same outcome on similar patient populations. This heterogeneity is partly due to inconsistent definitions for outcomes and variable methods of measuring and/or obtaining the outcomes measures. In addition, some studies recruited patients solely from the ED, while others included ED patients after admission. Some articles lacked sufficient details to reconstruct 2×2 tables. QUADAS-2 assessment indicates several forms of potential bias, including spectrum bias and incorporation bias, since outcome assessors sometimes lacked blinding to the index test results. Another limitation is that a lack of sufficiently similar prognostic studies existed to perform meta-analysis for the instruments and outcomes. The relative importance of the outcome is undefined, but likely not equal.

Conclusion

Accurate and reliable identification of early deterioration among hospitalized adult patients is essential to be used to risk stratify them do accurately distinguish high risk subsets and should be used by nurses. Independent risk stratification may provide clinicians with additional information to guide clinical decision-making, but further evaluation is required. Implementation of a risk stratification tool can improve processes and outcomes of care for patients. The tool needs to have sufficient predictive power to provide clinicians with confidence to use the results to guide decision-making. The first two evaluation criteria are satisfied to varying degrees by the tools identified in our review. However, prior to implementation, an impact analysis demonstrating evidence that risk stratification changes physician behavior and improves patient outcomes is needed

References