Planned Home vs Hospital Birth: A Meta-Analysis Gone Wrong
Carl A. Michal, PhD; Patricia A. Janssen, PhD; Saraswathi Vedam, SciD; Eileen K. Hutton, PhD; Ank de Jonge, PhD
Posted: 04/01/2011
Authors and Disclosures
Author(s)
Carl A. Michal, PhD
Associate Professor, Department of Physics and Astronomy, The University of British Columbia, Vancouver, British Columbia, Canada
Disclosure: Carl A. Michal, PhD, has disclosed no relevant financial relationships.
Patricia A. Janssen, PhD
Associate Professor, School of Population and Public Health, The University of British Columbia, Vancouver, British Columbia, Canada
Disclosure: Patricia A. Janssen, PhD, has disclosed no relevant financial relationships.
Saraswathi Vedam, SciD
Associate Professor & Director, Division of Midwifery, University of British Columbia, Vancouver, British Columbia; Senior Consultant, Division of Research, Midwives Alliance of North America, Washington, DC; Chair, Home Birth Section, Division of Standards and Practice, American College of Nurse-Midwives, Silver Spring, Maryland
Disclosure: Saraswathi Vedam, SciD, has disclosed no relevant financial relationships.
Eileen K. Hutton, PhD
Director, Midwifery Education Program, McMaster University, Hamilton, Ontario, Canada
Disclosure: Eileen K. Hutton, PhD, has disclosed no relevant financial relationships.
Ank de Jonge, PhD
Senior Midwife Researcher, Department of Midwifery Science, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
Disclosure: Ank de Jonge, PhD, has disclosed no relevant financial relationships.
A Flawed Analysis
The highly charged debate over the safety of home birth was inflamed by the publication of a meta-analysis by Joseph R. Wax and coworkers,
[1] which concluded that "less medical intervention during planned home birth is associated with a tripling of the neonatal mortality rate." The statistical analysis upon which this conclusion was based was deeply flawed, containing many numerical errors, improper inclusion and exclusion of studies, mischaracterization of cited works, and logical impossibilities. In addition, the software tool used for nearly two thirds of the meta-analysis calculations contains serious errors that can dramatically underestimate confidence intervals (CIs), and this resulted in at least 1 spuriously statistically significant result. Despite the publication of statements and commentaries querying the reliability of the findings,
[2-6] this faulty study now forms the evidentiary basis for an American College of Obstetricians and Gynecologists Committee Opinion,
[7] meaning that its results are being presented to expectant parents as the state-of-the-art in home birth safety research.
In this article we describe in detail numerous mistakes in design, methodology, and reporting in the Wax meta-analysis that place clinicians and patients at risk for being misinformed.
Paradoxical Results
The main conclusion of the analysis by Wax and coworkers, that planned home and planned hospital births exhibit similar perinatal mortality rates but that home births are characterized by 2-3 times higher neonatal death rates,
[1] is drawn from data that are self-contradictory. The mortality rates reported in the paper are reproduced in Table 1. CIs for these proportions were not provided in the article.
Table 1: Perinatal and Neonatal Death Rates Reported by Wax and Colleagues
| Outcome | Planned Home Birth (%) | Planned Hospital Birth (%) |
| Perinatal death: all | 0.07 | 0.08 |
| Perinatal death: nonanomalous | 0.07 | 0.08 |
| Neonatal death: all | 0.20 | 0.09 |
| Neonatal death: nonanomalous | 0.15 | 0.04 |
Adapted from Wax JR, et al. Am J Obstet Gynecol. 2010;203:243:e1-e8.[1]
Wax and colleagues defined perinatal death as stillbirth of at least 20 weeks or 500 g, or death of a liveborn infant within 28 days of birth. Neonatal deaths are defined as deaths of liveborn infants within 28 days of delivery.
[1] With the definitions chosen by these investigators, neonatal deaths are a subset of perinatal deaths, so within the same population a neonatal death rate cannot exceed the corresponding perinatal death rate. As can be seen in Table 1, however, the investigators' results show that for planned home births, the neonatal death rates are actually far higher than the corresponding perinatal death rates. According to the investigators' definitions, these results are impossible. The problem is not unique to the planned home birth statistics: the neonatal death rate for all hospital births is also greater than the corresponding perinatal death rate. These paradoxical results arise from the dramatic differences in outcomes among the included studies, as will be described. It is clear, however, that the perinatal and neonatal death results cannot possibly represent comparable populations.
Because the perinatal death statistics are drawn from more than 500,000 births, whereas the neonatal death statistics are drawn from fewer than 50,000 births (and for many other reasons described below), the neonatal death statistics in the study by Wax and colleagues cannot be defended.
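The subset argument can be checked mechanically. The following is a minimal sketch that uses only the rates reproduced in Table 1; the dictionary layout and the function name are illustrative choices, not anything taken from the Wax paper.

```python
# Death rates (%) as reproduced in Table 1.
TABLE_1 = {
    ("home", "all"): {"perinatal": 0.07, "neonatal": 0.20},
    ("home", "nonanomalous"): {"perinatal": 0.07, "neonatal": 0.15},
    ("hospital", "all"): {"perinatal": 0.08, "neonatal": 0.09},
    ("hospital", "nonanomalous"): {"perinatal": 0.08, "neonatal": 0.04},
}


def impossible_rows(table):
    """Return the groups whose neonatal rate exceeds their perinatal rate.

    If neonatal deaths are a subset of perinatal deaths in the same
    population, no group should ever be returned.
    """
    return sorted(
        key for key, rates in table.items()
        if rates["neonatal"] > rates["perinatal"]
    )


if __name__ == "__main__":
    for place, group in impossible_rows(TABLE_1):
        print(f"Inconsistent: {place} births, {group}")
```

Three of the four groups fail the check, and only the nonanomalous hospital group is internally consistent.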
Numerical Errors
The results of the meta-analysis are presented in 2 tables: (1) for maternal outcomes, and (2) for neonatal outcomes.
[1] For each outcome, Wax and associates provided the number of studies used in the calculation for that outcome, the number of births reporting that outcome, the total number of births in the included studies, and the summary odds ratio (OR) and 95% CI for the OR. Lists of which studies were included for each of the outcomes were not originally provided but have subsequently been made available.
[8]
In attempting to reproduce some of the results, we found numerous numerical errors. In Table 2, we reproduce Wax and colleagues' table of neonatal outcomes, adding a column indicating which studies were used for each. Numerical errors are evident in every row. Many of these errors are minor, but several are large, off by factors of 2 or more. In 1 instance (all perinatal deaths), even the number of included studies was incorrect. For another (large for gestational age), essentially every number is wrong.
Table 2. A Reproduction of the Neonatal Results Table
Many of the ORs and CIs have been calculated incorrectly. In some cases, this was the result of errors apparently made in the extraction of data from the original studies. For example, we point to the study of Pang and coworkers
[14] from which, to obtain results found in the summary table, the investigators must have counted 13 nonanomalous neonatal deaths in the home birth group. However, from Table 4 in that paper, it is clear that only 12 deaths should have been included.
Another example of an error in data extraction is in the all neonatal deaths outcome, where, again to reproduce the results in the supplemental table, the study by Janssen and colleagues
[15] must have included a neonatal death in the hospital group. The only hospital death mentioned in that report was a stillbirth, not a neonatal death.
A third example of incorrect data extraction may be found, again in the all neonatal death outcome. In the study by Koehler, Solomon, and Murphy,
[13] from which Wax and coworkers apparently included no deaths, 1 of the home birth deaths fit the definition of neonatal death.
In all 3 of these cases, the studies should not have been included in these outcomes at all.
A fourth example of incorrect data extraction is found in the perineal laceration outcome for which, to reproduce the results in the summary table, the report by Janssen and colleagues
[19] apparently included only first- and second-degree lacerations rather than all perineal lacerations.
Both the investigators and peer reviewers ought to have been concerned that the direction and magnitude of the ORs for a variety of outcomes were illogical. Examples include postdates, for which the occurrence frequencies of 2.1% and 2.2% make the provided OR of 1.87 seem very unlikely, and newborn ventilation, for which the frequencies of 3.7% and 4.7% similarly make the OR of 1.12 seem unlikely. Several of the denominators appearing in the tables should also have raised concerns. For example, large for gestational age and newborn ventilation both have denominators of 13,525 for the home birth groups, in the first case arising from 4 studies, but from only 3 studies in the other. The denominator 10,701 appears for hospital births for both postdates and newborn ventilation, arising from 3 studies in the first case and 4 studies in the other.
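A rough plausibility check of the kind described above compares a reported summary OR with the crude OR implied by the two pooled frequencies. The sketch below is only illustrative: a crude OR need not equal a meta-analytic summary OR, and because the text here does not say which percentage belongs to which group, both orderings are computed. The function name is hypothetical.

```python
def crude_odds_ratio(p_first, p_second):
    """Odds ratio implied by two proportions (first group vs second group)."""
    return (p_first / (1 - p_first)) / (p_second / (1 - p_second))


# Frequencies quoted in the text; group assignment is not stated there,
# so each pair is evaluated in both directions.
POSTDATES = (0.021, 0.022)      # reported summary OR: 1.87
VENTILATION = (0.037, 0.047)    # reported summary OR: 1.12

if __name__ == "__main__":
    for name, (p, q) in {"postdates": POSTDATES,
                         "newborn ventilation": VENTILATION}.items():
        print(name,
              round(crude_odds_ratio(p, q), 2),
              round(crude_odds_ratio(q, p), 2))
```

For postdates, both crude values lie within about 5% of 1, which is far from the reported 1.87. A gap of that size does not prove an error on its own, but it is the sort of discrepancy that should prompt a recheck of the extracted data.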
A Faulty Computational Tool
In the methods section of the article, Wax and coworkers state that the random effects analyses were performed with "an online meta-analysis calculator from the University of Pittsburgh (http://www.pitt.edu/~super1/lecture/lec1171/meta5.doc)." This is a mischaracterization: visiting this Web address does not open an online calculator but results in a download of a Microsoft® Word document containing an embedded spreadsheet. The file is distributed as part of an online course in epidemiology.
Close inspection of the spreadsheet (retrieved on January 28, 2011) reveals several serious errors. The consequences of these errors are that:
- The CI provided most likely underestimates the true CI, often dramatically;
- The summary OR is in general incorrect;
- The results of the analysis can appear to provide a statistically significant positive or negative result when it should not (this has in fact occurred in Wax and colleagues' article in at least 1 outcome); and
- The calculated results depend on the order in which the studies are entered into the spreadsheet.
These errors have been confirmed by the spreadsheet's creator.
[20] Refer to the appendix for details.
This spreadsheet appears to have been used to calculate results for 13 of the 21 outcomes in the paper (the investigators fail to state whether it was used for electronic fetal monitoring, but it does appear to have been used).
All of the results calculated on the basis of the spreadsheet are numerically incorrect.
The article contains at least 1 outcome for which the statistical significance of the result is incorrect as a result of using the spreadsheet. For perineal lacerations, the result of an OR of 0.76 (95% CI: 0.72-0.81) would have been an OR of 1.03 (95% CI: 0.70-1.51) if a correct computational tool had been used, and very different conclusions would be drawn for this outcome. The error in data extraction associated with this outcome does not alter the finding that the use of the spreadsheet results in the wrong conclusion being drawn. We have not attempted to reproduce most of the maternal outcome results, but we expect that similarly serious errors remain.
Selective and Mistaken Inclusion/Exclusion
A number of errors are apparent in the inclusion of studies. The inclusion of de Jonge and associates
[17] in the all perinatal death statistic is erroneous, because that article plainly states that all children with congenital abnormalities were excluded. This study should not have been included in nonanomalous perinatal death statistics either, because the statistics provided include only intrapartum and neonatal deaths up to 7 days. This time period is strikingly different from Wax and colleagues' definition of perinatal death. This study, which contributes more than 95% of the births used for the perinatal death rates, therefore, does not provide data that are compatible with Wax and colleagues' definitions for those outcomes. It is unclear why Wax and colleagues chose to exclude this study from the calculations for neonatal mortality but include the study for perinatal mortality. If that study were removed from the calculations for the 2 outcomes for which it was erroneously included, the total number of births included in the meta-analysis would have been reduced from nearly 550,000 to just 65,000. This dramatic reduction in the size of the dataset would have significantly reduced the impact of any findings of the meta-analysis. On the other hand, if Wax and colleagues had defined perinatal death and neonatal death according to definitions used by de Jonge and associates,
[17] the conclusions for these outcomes would have been quite different.
In addition, a statement in the text cites 6 of the studies
[9,11,13-15,18] as examining neonatal deaths. This appears to mischaracterize 3
[9,13,15] of these articles. One of these
[15] makes clear that it does not provide neonatal death rates compatible with the authors' definition (see the footnote to Table 5 in that paper). This paper should not have been cited at this point in the text and should not have been included in the calculation.
The list of studies used for the nonanomalous neonatal death outcome included 6 of the 7 references from the all neonatal death outcome, dropping only the study by Janssen and colleagues.
[15] It is truly remarkable that the Janssen study was included in the all neonatal death outcome rather than the nonanomalous neonatal death outcome, because it specifically excluded births of infants with congenital anomalies. This study was also included in the all perinatal death outcome, where, in addition to the fact that it excluded infants with congenital anomalies, the death statistics provided are incompatible with Wax's definition of perinatal death.
It appears that the study by Ackermann-Liebrich and colleagues
[9] should not have been included in the neonatal death outcomes, because deaths reported in this study are referred to as perinatal death rates rather than neonatal death rates, and perinatal was not defined in that work. The study by Koehler and colleagues
[13] similarly reports perinatal deaths (undefined) rather than neonatal deaths. Definitions of perinatal death vary dramatically. In fact in the United States, the National Vital Statistics Reports provide data using 2 different definitions of perinatal death rates:
- Definition 1: infant deaths of < 7 days and fetal deaths > 28 weeks; and
- Definition 2: infant deaths of < 28 days and fetal deaths > 20 weeks.
In 2005, these 2 rates differed by a factor of 1.6 (6.64 vs 10.73 per 1000).
[21]
The paper by Pang and coworkers,
[14] on the other hand, presents a completely different problem for inclusion. This article, which alone provides more than half of the neonatal deaths but just one third of the births, suffers from a number of serious flaws and has been thoroughly critiqued elsewhere.
[22] One principal flaw is that it includes an unknown number of unplanned home births. Pang and colleagues
[14] acknowledge this limitation of their study, and mention that previous studies show that neonatal mortality among unplanned home births is high, 73-120 per 1000 live births.
Pang and colleagues attempted to reduce the inclusion of unplanned home births by limiting data to uncomplicated pregnancies and deliveries of > 34 weeks' gestation with a midwife, nurse, or physician listed as attendant or certifier on the birth certificate. These criteria are an unreliable proxy for the true planning status; unplanned low-risk births would have been included by Pang and colleagues' criteria because many unplanned home births would have a physician, nurse attendant, or certifier.
[22] According to Wax and colleagues, "An estimated 75% of low-risk singleton home births appear to be planned home deliveries."
[1] This statement implies that about 25% of low-risk singleton home births in the United States are unplanned. One would expect then that as many as 1500 of the 6133 home births reported by Pang and colleagues
[14] could have been unplanned. A further indication that unplanned home births are included in the study by Pang and colleagues is the fact that 7.6% of home births in that study were reported as having been attended by physicians, yet during the study period not a single physician in Washington state was known to offer home birth services.
[22] Given that Wax and colleagues' stated goal is to compare outcomes of planned home vs planned hospital births, it is extraordinary and incomprehensible that the study by Pang and colleagues was included.
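The arithmetic behind the "as many as 1500" estimate is simple enough to restate. This is a back-of-the-envelope sketch only: applying the 25% unplanned share to the sample of Pang and colleagues is a rough assumption, not a count taken from that study, and the variable names are illustrative.

```python
HOME_BIRTHS_PANG = 6133        # home births reported by Pang and colleagues
PLANNED_SHARE = 0.75           # share quoted from Wax and colleagues
PHYSICIAN_SHARE = 0.076        # home births reported as physician attended

# Upper-end estimate of unplanned births if the 25% share applied here.
possibly_unplanned = HOME_BIRTHS_PANG * (1 - PLANNED_SHARE)

# Births listed with a physician attendant, although no physician in
# Washington state was known to offer home birth services.
physician_attended = HOME_BIRTHS_PANG * PHYSICIAN_SHARE

if __name__ == "__main__":
    print(round(possibly_unplanned), round(physician_attended))
```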
In summary, at least 4
[9,13-15] of the 7 studies used to calculate the neonatal death outcomes appear to have been included inappropriately, and the vast majority of the births included in the perinatal death outcomes are from studies that did not provide statistics compatible with Wax and colleagues' definition of perinatal death.
Finally, it is surprising that the 2009 study by Janssen and colleagues
[19] was not included in the nonanomalous perinatal death outcome, because it does appear to provide adequate information to be included in this row. Similarly, the study by Lindgren and colleagues
[18] appears to provide adequate information to be included in both the all perinatal death and nonanomalous perinatal death outcomes. Koehler, Solomon, and Murphy
[13] also describe perinatal mortality, although difficulties are associated with their definition of perinatal mortality.
In reviewing the 12 cited studies, we have found a variety of definitions of perinatal mortality and frequent omission of complete descriptions of which deaths are and are not reported. This makes it very challenging to combine studies of perinatal mortality in any meaningful way. It is very surprising that Wax and associates did not mention this limitation at all.
With respect to other reported outcomes, we have not completed an exhaustive search for improperly included and excluded studies but have found some additional exclusions. For example, the study by Hutton, Reitsma, and Kaufman
[12] and Janssen and colleagues' 2002 study
[15] were not included in the perineal laceration outcome, and the latter was also not included in the ≥ third-degree laceration outcome.
For a study in which the main results arise from distinctions between precisely defined categories, such as perinatal vs neonatal death and nonanomalous vs all newborns, the issue of improper inclusion/exclusion is of utmost importance, and we have described many specific examples where studies were included or excluded incorrectly.
More Methodological and Reporting Errors
Invalid Statistical Test
Wax and colleagues begin their discussion by remarking on the robustness of the neonatal death statistics, supported by the homogeneity of the observation across studies.
[1] Homogeneity is said in the methods section to have been assessed with the Breslow-Day test. This test is not, however, valid for any of the perinatal or neonatal death outcomes. The user guide for SAS® 9.2 (SAS, Cary, NC), which the investigators claim to have used, states: "For the Breslow-Day test to be valid, the sample size should be relatively large in each stratum, and at least 80% of the expected cell counts should be greater than 5."
[23] These criteria are not met for any of the mortality outcomes. The ORs for the individual, included studies range in some cases from 0 to infinity. It is not at all obvious that the studies are statistically homogeneous.
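The rule of thumb quoted from the SAS user guide can be illustrated with a short check. Everything in this sketch is an assumption made for illustration: the strata are hypothetical counts chosen to resemble a rare outcome (they are not data from any cited study), and expected counts are computed under simple row-by-column independence as a stand-in for the expected counts used by the test itself.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts for the 2x2 table [[a, b], [c, d]] under independence."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [r * col / n for r in rows for col in cols]


def meets_rule_of_thumb(strata, threshold=5.0, fraction=0.8):
    """True if at least `fraction` of all expected cells exceed `threshold`."""
    cells = [e for table in strata for e in expected_counts(*table)]
    return sum(e > threshold for e in cells) / len(cells) >= fraction


# Hypothetical strata: (deaths home, survivors home, deaths hospital, survivors hospital).
RARE_OUTCOME = [(2, 998, 1, 999), (0, 500, 3, 2997)]
# Hypothetical strata for a common outcome, for contrast.
COMMON_OUTCOME = [(50, 950, 40, 960), (30, 470, 200, 2800)]

if __name__ == "__main__":
    print(meets_rule_of_thumb(RARE_OUTCOME), meets_rule_of_thumb(COMMON_OUTCOME))
```

With only a handful of deaths per study, the death cells have expected counts well below 5, so roughly half of all cells fail the criterion however large the studies are.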
Association and Causation Conflated
Wax and colleagues claim that "less medical intervention during planned home birth is associated with a tripling of the neonatal mortality rate."
[1] This is the sole conclusion offered in the abstract. Although it may be unintentional, the discussion in the paper implies that the reasons for an increase in neonatal mortality are derived from the studies that were included in the meta-analysis. However, the discussion of causes of neonatal mortality focuses on findings from studies that were not included in the meta-analysis, including studies that mix high-risk with low-risk cases.
[24-27] Of the studies that are included in the meta-analysis, none associates rates of intervention with rates of neonatal mortality.
Any discussion of causation for elevated neonatal death rates for planned home births compared with planned hospital births is particularly specious in light of the paradoxical nature of the results it attempts to explain -- the results reproduced in Table 1 above. Furthermore, as part of their discussion of causation, Wax and colleagues claim that planned home births were characterized by a greater proportion of deaths attributed to respiratory distress and failed resuscitation. No data are provided in support of this claim, but 4
[11,13,14,18] of the 12 primary articles are cited. However, not a single death in the home birth group in the study by Woodcock and associates
[11] was attributed to respiratory distress or failed resuscitation. In the study by Lindgren and associates,
[18] 1 of 2 home birth fatalities is attributed to asphyxia, whereas 4 of 7 in the hospital group list asphyxia in the cause of death. Koehler, Solomon, and Murphy
[13] reported 1 death of an infant who had no onset of spontaneous respiration; in this study, the hospital birth comparison group consisted of only 67 births with no deaths reported. It is very difficult to see how these 3 studies could be interpreted to support the claim made by Wax and colleagues.
The entire discussion of causation is further undermined by the numerous numerical errors, and issues of inclusion and exclusion described above.
Errors in the Abstract
The abstract states that the results revealed less frequent assisted newborn ventilation in planned home births. However, this is inconsistent with the body of the article, where the result is not statistically significant but trends towards increased frequency. The spuriously statistically significant result for perineal laceration produced by the faulty spreadsheet results in another outcome that is incorrectly reported in the abstract. Significant additional errors in the abstract are associated with the mistaken inclusion/exclusion issues already described.
Shifting Numbers
Following a post-publication investigation of the study initiated by the American Journal of Obstetrics and Gynecology, [28] Wax published a supplement containing forest plots and summary tables.
[8] The summary ORs and CIs for 3 of the reported outcomes (nonanomalous neonatal death, postdates, and prematurity) differ from their values in the originally published paper. Although none of these changes alters the direction of the reported result or its statistical significance, it is very surprising that Wax made no mention of these changes. None of the 3 updated outcomes yet provides correct values; for postdates and prematurity the faulty calculator was used, whereas the nonanomalous neonatal death outcome suffers from data extraction and mistaken inclusion errors.
Differences Among Studies
The group of studies included in this meta-analysis presents a number of additional statistical problems. Most, but not all, of the studies restricted inclusion to low-risk births. Most (by population), but not all, of the studies restricted home births to those attended by certified or licensed midwives. Most (by population), but not all, of the studies included only midwives operating in jurisdictions where midwives offering home birth services are well integrated into the greater healthcare system. All but a single study restricted home births to those that were planned.
Wax and coworkers make little mention of any of these complications, and it would seem that any conclusions made on the basis of combined results from such a disparate set of conditions would not be relevant to any parent planning a birth. Given these complexities, decisions would be better made on the basis of the subset of studies that are relevant to the conditions at hand.
Conclusion
The debate over the safety of home birth is deeply divided and emotionally charged. Reliable information is required to allow productive debate and informed decisions. In an era of evidence-based medicine, it is incomprehensible that medical society opinion can be formulated on research that does not hold to the most basic standards of methodological rigor.
Appendix. An Analysis of "Meta5.doc," The Computational Tool Used For Random-Effects Meta-Analysis
The random effects calculations in the study by Wax and colleagues made use of a meta-analysis calculator implemented in a spreadsheet that was embedded in a Microsoft® Word document.
{1} The spreadsheet is based upon formulae found in Petitti's meta-analysis text.
{2} The formulae in question are in Table 7-7 on page 102 and on pages 116-117.
Specifically, the errors in the spreadsheet are:
- In cell W10, which contains Petitti's D value (Δ^2 in DerSimonian and Laird's notation{3}), any negative value for D should be replaced with 0. The spreadsheet, however, contained no logic to replace negative values. In cases where D is negative, this could dramatically alter the weightings of the datasets.
- In the calculation of the adjusted weights (w_i*), which should be given by w_i* = 1/(D + 1/w_i), the spreadsheet cell reference to the cell containing D was entered not as an absolute cell reference (eg $W$10) but as a relative cell reference (W10), so that it referred to the (possibly incorrect, due to error 1) D value in cell W10, for only the first study. For subsequent studies, the value taken to be D was whatever value happened to be in cells W11, W12, W13, etc. As a consequence of the layout of the spreadsheet, those cells are generally blank, returning values of 0.
- In the calculation of the CI limits, rather than employing the sum of the adjusted variances (variances*), the sum of the raw variances was used. To correct this, the references to cell H10 in cells I10 and J10 should be replaced by 1/sqrt(U14).
- The spreadsheet provides a negative value for the Q statistic. Because Q is a weighted sum-of-squares, this cannot be correct. Q should have been taken from cell S14.
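The corrections to the first three errors listed above can be sketched in a few lines. This is an illustrative sketch of the standard method-of-moments formulas, not a transcription of the spreadsheet or of Petitti's tables; the function names and the example weights are hypothetical, and all quantities are on the log OR scale.

```python
import math


def between_study_variance(q, weights):
    """D from Q and the fixed-effect weights, truncated at 0 (error 1)."""
    k = len(weights)
    total = sum(weights)
    scale = total - sum(w * w for w in weights) / total
    return max(0.0, (q - (k - 1)) / scale)


def adjusted_weights(weights, d):
    """w_i* = 1/(D + 1/w_i), with the same D applied to every study (error 2)."""
    return [1.0 / (d + 1.0 / w) for w in weights]


def ci_half_width(adj_weights, z=1.96):
    """CI half-width from the sum of the adjusted weights (error 3)."""
    return z / math.sqrt(sum(adj_weights))


if __name__ == "__main__":
    raw = [10.0, 20.0, 30.0]                    # hypothetical fixed-effect weights
    d = between_study_variance(12.0, raw)       # hypothetical Q value
    print(d, adjusted_weights(raw, d), ci_half_width(adjusted_weights(raw, d)))
```

When Q is smaller than its degrees of freedom, D is truncated to 0 and the adjusted weights reduce to the fixed-effect weights; otherwise every study's weight shrinks and the interval widens.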
It is possible that the spreadsheet will be corrected. The original version of the spreadsheet can be found on the Internet archive "Wayback Machine."
After making the described corrections, the spreadsheet appears to implement correctly Petitti's description of the DerSimonian-Laird method. However, the results provided by the spreadsheet still show minor deviations from their expected values. This is the result of a discrepancy between Petitti's algorithm and DerSimonian and Laird's paper. In particular, Petitti makes use of the Mantel-Haenszel variance and OR in calculating the adjusted weights w_i*. These do not appear to agree with the variance (s_i^2) and weighted (natural log of) OR (y_w) indicated for OR calculations in DerSimonian and Laird's original paper.
As an example of the possible consequences of using the spreadsheet, we consider a random effects model calculation combining the 2 datasets shown below.
| | Disease (exposed) | Disease (nonexposed) | Nondiseased (exposed) | Nondiseased (nonexposed) |
| Study 1 | 920 | 480 | 1216 | 1588 |
| Study 2 | 160 | 172 | 235 | 157 |
The spreadsheet provides a random effects OR of 0.64, with 95% confidence bounds of 0.55-0.73. Simply exchanging the order of the 2 studies changes the result to OR = 2.48 (95% CI, 2.15-2.85).
The correct result from a DerSimonian-Laird random effects calculation
{1} is OR, 1.26 (95% CI, 0.32-4.92). The correct result in this example was calculated using the rmeta package
{4} for the R statistical analysis environment
{5} and verified by hand calculations.
These analyses are shown in the forest plot in the Figure. Clearly, either of the 2 incorrect results (shown in blue and red) would lead to incorrect conclusions -- both results spuriously suggest statistical significance, although the conclusion on the direction of the effect depends on the order in which the studies are entered into the spreadsheet.
Figure. Forest plot showing ORs and CIs for a random-effects meta-analysis of the example data sets. The correct result is represented by the long thin black diamond. The red and blue diamonds represent the 2 possible incorrect results produced by the faulty spreadsheet.
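The example can be approximately reproduced by hand. The sketch below is an assumption-laden illustration rather than the rmeta code or the spreadsheet's formulas: it uses inverse-variance (Woolf) weights on the log OR scale, and the `buggy_reference` flag emulates only the relative-cell-reference error (D reaching the first study and 0 reaching the rest). Because the spreadsheet follows Petitti's Mantel-Haenszel quantities and contains further errors, only the ORs of the emulation are meaningful, not its CIs.

```python
import math

# Counts from the example table: disease (exposed), disease (nonexposed),
# nondiseased (exposed), nondiseased (nonexposed).
STUDIES = [(920, 480, 1216, 1588), (160, 172, 235, 157)]


def log_or_and_weight(a, b, c, d):
    """Log odds ratio and inverse-variance weight for one 2x2 table."""
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, 1 / variance


def random_effects_or(studies, buggy_reference=False):
    """Return (OR, lower, upper) from a DerSimonian-Laird style calculation."""
    ys, ws = zip(*(log_or_and_weight(*s) for s in studies))
    total = sum(ws)
    fixed = sum(w * y for w, y in zip(ws, ys)) / total
    q = sum(w * (y - fixed) ** 2 for w, y in zip(ws, ys))
    scale = total - sum(w * w for w in ws) / total
    big_d = max(0.0, (q - (len(ws) - 1)) / scale)   # truncated at zero
    if buggy_reference:
        # Emulation of the relative reference: later studies see a blank cell.
        ds = [big_d] + [0.0] * (len(ws) - 1)
    else:
        ds = [big_d] * len(ws)
    adj = [1 / (d_i + 1 / w) for d_i, w in zip(ds, ws)]
    pooled = sum(w * y for w, y in zip(adj, ys)) / sum(adj)
    half = 1.96 / math.sqrt(sum(adj))
    return math.exp(pooled), math.exp(pooled - half), math.exp(pooled + half)


if __name__ == "__main__":
    print("correct:", random_effects_or(STUDIES))
    print("bug, study 1 first:", random_effects_or(STUDIES, True)[0])
    print("bug, study 2 first:", random_effects_or(STUDIES[::-1], True)[0])
```

Under these assumptions the correct path gives roughly OR 1.26 (95% CI, 0.32-4.92), while the emulated bug gives ORs close to the 0.64 and 2.48 reported for the spreadsheet, depending only on which study is entered first.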
References
1. Basu A. Calculation of summary effects using random effects method. Available at: http://www.pitt.edu/~super1/lecture/lec1171/meta5.doc Accessed January 28, 2011.
2. Petitti DB. Meta-Analysis, Decision Analysis, and Cost-Effectiveness Analysis. 2nd Ed. Oxford University Press; New York; 2000.
3. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177-188.
4. Lumley T. rmeta: Meta-analysis. September 29, 2009. Available at: http://cran.r-project.org/web/packages/rmeta/index.html Accessed January 26, 2011.
5. R. The R Project for Statistical Computing. Available at: http://www.r-project.org Accessed January 26, 2011.