Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing

Kristen Barbour, Dale C. Hesdorffer, Niu Tian, Elissa G. Yozawitz, Patricia E. McGoldrick, Steven Wolf, Tiffani L. McDonough, Aaron Nelson, Tobias Loddenkemper, Natasha Basma, Stephen B. Johnson, Zachary M. Grinspan

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Objective: Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability.

Methods: Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic–clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions of risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was evaluated by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance by <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than home data. To demonstrate how small revisions can improve generalizability, we removed three “boilerplate” standard text phrases from away notes and repeated performance.

Results: We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data.

Significance: Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.
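The workflow described in the abstract (match a risk factor with a regular expression, strip boilerplate text, then score against chart review) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the pattern `GTC_PATTERN`, the phrase in `BOILERPLATE`, and all function names are hypothetical, and the F-measure is assumed to be the harmonic mean of sensitivity and positive predictive value, which the abstract does not specify.

```python
import re

# Hypothetical pattern for one risk factor (generalized tonic-clonic seizures).
# The expressions used in the study are not given in the abstract.
GTC_PATTERN = re.compile(
    r"\b(?:generali[sz]ed\s+tonic[\s\-\u2013]*clonic|GTCS?|grand\s+mal)\b",
    re.IGNORECASE,
)

# Hypothetical boilerplate phrase; the three phrases removed in the study
# are not listed in the abstract.
BOILERPLATE = [
    "Call 911 for any generalized tonic-clonic seizure lasting more than 5 minutes.",
]


def strip_boilerplate(note, phrases=BOILERPLATE):
    """Remove exact copies of known standard-text phrases from a note."""
    for phrase in phrases:
        note = note.replace(phrase, " ")
    return note


def has_risk_factor(note, pattern=GTC_PATTERN):
    """Return True if the pattern matches anywhere in the note."""
    return pattern.search(note) is not None


def evaluate(predicted, actual):
    """Score predictions against chart-review labels (lists of booleans)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    # Assumed definition: harmonic mean of sensitivity and PPV.
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f_measure": f_measure}


def is_generalizable(home, away, threshold=0.10):
    """Apply the abstract's criterion: every measure drops by < threshold."""
    return all(home[name] - away[name] < threshold for name in home)
```

In this sketch a note that mentions a risk factor only inside standard text counts as a false positive until `strip_boilerplate` is applied, which lowers positive predictive value. That is one plausible way boilerplate could produce the kind of errors the abstract reports in away data.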

Original language: English (US)
Pages (from-to): 1209-1220
Number of pages: 12
Journal: Epilepsia
Volume: 60
Issue number: 6
DOIs: https://doi.org/10.1111/epi.15966
State: Published - Jun 1 2019

Fingerprint

Natural Language Processing
Electronic Health Records
Sudden Death
Epilepsy
Counseling
Patient-Centered Care
Seizures
Physicians
Mortality

Keywords

  • electronic health records
  • generalized tonic–clonic seizures
  • natural language processing
  • patient education
  • sudden unexpected death in epilepsy

ASJC Scopus subject areas

  • Neurology
  • Clinical Neurology

Cite this

Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing. / Barbour, Kristen; Hesdorffer, Dale C.; Tian, Niu; Yozawitz, Elissa G.; McGoldrick, Patricia E.; Wolf, Steven; McDonough, Tiffani L.; Nelson, Aaron; Loddenkemper, Tobias; Basma, Natasha; Johnson, Stephen B.; Grinspan, Zachary M.

In: Epilepsia, Vol. 60, No. 6, 01.06.2019, p. 1209-1220.


Barbour, K, Hesdorffer, DC, Tian, N, Yozawitz, EG, McGoldrick, PE, Wolf, S, McDonough, TL, Nelson, A, Loddenkemper, T, Basma, N, Johnson, SB & Grinspan, ZM 2019, 'Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing', Epilepsia, vol. 60, no. 6, pp. 1209-1220. https://doi.org/10.1111/epi.15966
Barbour, Kristen ; Hesdorffer, Dale C. ; Tian, Niu ; Yozawitz, Elissa G. ; McGoldrick, Patricia E. ; Wolf, Steven ; McDonough, Tiffani L. ; Nelson, Aaron ; Loddenkemper, Tobias ; Basma, Natasha ; Johnson, Stephen B. ; Grinspan, Zachary M. / Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing. In: Epilepsia. 2019 ; Vol. 60, No. 6. pp. 1209-1220.
@article{2a10b4cc488c41e6bff8a69daf6bc511,
title = "Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing",
abstract = "Objective: Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability. Methods: Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic–clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions of risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was evaluated by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance by <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than home data. To demonstrate how small revisions can improve generalizability, we removed three “boilerplate” standard text phrases from away notes and repeated performance. Results: We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61{\%}) in away data. 
Significance: Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.",
keywords = "electronic health records, generalized tonic–clonic seizures, natural language processing, patient education, sudden unexpected death in epilepsy",
author = "Kristen Barbour and Hesdorffer, {Dale C.} and Niu Tian and Yozawitz, {Elissa G.} and McGoldrick, {Patricia E.} and Steven Wolf and McDonough, {Tiffani L.} and Aaron Nelson and Tobias Loddenkemper and Natasha Basma and Johnson, {Stephen B.} and Grinspan, {Zachary M.}",
year = "2019",
month = "6",
day = "1",
doi = "10.1111/epi.15966",
language = "English (US)",
volume = "60",
pages = "1209--1220",
journal = "Epilepsia",
issn = "0013-9580",
publisher = "Wiley-Blackwell",
number = "6",

}

TY - JOUR

T1 - Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing

AU - Barbour, Kristen

AU - Hesdorffer, Dale C.

AU - Tian, Niu

AU - Yozawitz, Elissa G.

AU - McGoldrick, Patricia E.

AU - Wolf, Steven

AU - McDonough, Tiffani L.

AU - Nelson, Aaron

AU - Loddenkemper, Tobias

AU - Basma, Natasha

AU - Johnson, Stephen B.

AU - Grinspan, Zachary M.

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Objective: Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability. Methods: Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic–clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions of risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was evaluated by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance by <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than home data. To demonstrate how small revisions can improve generalizability, we removed three “boilerplate” standard text phrases from away notes and repeated performance. Results: We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data. 
Significance: Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.

KW - electronic health records

KW - generalized tonic–clonic seizures

KW - natural language processing

KW - patient education

KW - sudden unexpected death in epilepsy

UR - http://www.scopus.com/inward/record.url?scp=85066128201&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066128201&partnerID=8YFLogxK

U2 - 10.1111/epi.15966

DO - 10.1111/epi.15966

M3 - Article

C2 - 31111463

AN - SCOPUS:85066128201

VL - 60

SP - 1209

EP - 1220

JO - Epilepsia

JF - Epilepsia

SN - 0013-9580

IS - 6

ER -