Background: While healthcare utilization data are useful for post-marketing surveillance of drug safety in pregnancy, the start of pregnancy and gestational age at birth are often incompletely recorded or missing. Our objective was to develop and validate a claims-based algorithm for estimating gestational age among live births.

Methods: Using the Medicaid Analytic eXtract (MAX) linked to birth certificates in three states, we developed four candidate algorithms based on: preterm codes; preterm or post-term codes; timing of prenatal care; and prediction models using conventional regression and machine-learning approaches with a broad range of pre-specified and empirically selected predictors. We assessed algorithm performance based on mean squared error (MSE) and the proportion of pregnancies with estimated gestational age within 1 and 2 weeks of the gold standard, defined as the clinical or obstetric estimate of gestation on the birth certificate. We validated the best performing algorithms against medical records in a nationwide sample. We quantified the misclassification of selected drug exposure scenarios that results from using estimated gestational age, expressed as positive predictive value (PPV), sensitivity, and specificity.

Results: Among 114,117 eligible pregnancies, the random forest model with all predictors emerged as the best performing algorithm: MSE 1.5; 84.8% within 1 week and 96.3% within 2 weeks, with similar performance in the nationwide validation cohort. For all exposure scenarios, PPVs were >93.8%, sensitivities >94.3%, and specificities >99.4%.

Conclusions: We developed a highly accurate algorithm for estimating gestational age among live births in the nationwide MAX data, further supporting the value of these data for drug safety surveillance in pregnancy.

Copyright © 2022 Wolters Kluwer Health, Inc. All rights reserved.
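The performance measures named in the Methods (MSE, proportion within 1 and 2 weeks, and PPV, sensitivity, and specificity for exposure classification) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy data, and the 13-week first-trimester window are all assumptions made for illustration, and the abstract does not say how the exposure scenarios were defined.

```python
import math

# Assumption for illustration only: the abstract does not define the exposure window.
FIRST_TRIMESTER_DAYS = 13 * 7


def mean_squared_error(estimated, reference):
    """Mean squared difference between estimated and reference gestational age."""
    return sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference)


def proportion_within(estimated, reference, weeks):
    """Share of pregnancies whose estimate is within `weeks` of the reference."""
    hits = sum(abs(e - r) <= weeks for e, r in zip(estimated, reference))
    return hits / len(reference)


def exposed_in_first_trimester(delivery_day, ga_weeks, dispensing_day):
    """Hypothetical exposure rule: dispensing falls in the first 13 weeks,
    counted from a pregnancy start derived as delivery date minus gestational age."""
    start = delivery_day - ga_weeks * 7
    return start <= dispensing_day < start + FIRST_TRIMESTER_DAYS


def _ratio(numerator, denominator):
    """Return NaN instead of raising when a denominator is empty."""
    return numerator / denominator if denominator else math.nan


def exposure_metrics(estimated_flags, reference_flags):
    """PPV, sensitivity, and specificity of exposure status based on estimated
    gestational age, taking the reference-based status as truth."""
    pairs = list(zip(estimated_flags, reference_flags))
    tp = sum(e and r for e, r in pairs)
    fp = sum(e and not r for e, r in pairs)
    fn = sum(not e and r for e, r in pairs)
    tn = sum(not e and not r for e, r in pairs)
    return {
        "ppv": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy data (invented): gestational age in weeks, dates as day counts on an arbitrary scale.
reference_ga = [39, 40, 36, 38, 41]
estimated_ga = [39, 39, 38, 38, 40]
delivery_day = [300, 300, 300, 300, 300]
dispensing_day = [30, 120, 130, 200, 20]

mse = mean_squared_error(estimated_ga, reference_ga)
within_1 = proportion_within(estimated_ga, reference_ga, 1)
within_2 = proportion_within(estimated_ga, reference_ga, 2)

reference_flags = [
    exposed_in_first_trimester(d, ga, rx)
    for d, ga, rx in zip(delivery_day, reference_ga, dispensing_day)
]
estimated_flags = [
    exposed_in_first_trimester(d, ga, rx)
    for d, ga, rx in zip(delivery_day, estimated_ga, dispensing_day)
]
metrics = exposure_metrics(estimated_flags, reference_flags)

print(mse, within_1, within_2, metrics)
```

In this toy example the third pregnancy is estimated two weeks too long, which shifts the derived pregnancy start earlier and moves a dispensing out of the assumed first-trimester window. That single false negative lowers sensitivity while PPV and specificity stay at 1.0, which is the kind of downstream misclassification the exposure metrics are meant to capture.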