Streamlining social media information retrieval for public health research with deep learning (2024)

Volume 31 Issue 7 July 2024

Yining Hua, MS

Department of Epidemiology, Harvard Chan School of Public Health, Boston, MA 02115, United States

Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02145, United States


Jiageng Wu, MS

School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310058, China


Shixu Lin, BS

School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310058, China


Minghui Li, BS

School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310058, China


Yujie Zhang, BS

School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310058, China


Dinah Foer

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02145, United States


Siwen Wang, MD

Department of Epidemiology, Harvard Chan School of Public Health, Boston, MA 02115, United States


Peilin Zhou, MS

Thrust of Data Science and Analytics, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, 511458, China


Jie Yang, PhD

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02120, United States

Corresponding authors: Jie Yang, PhD, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA 02120, United States (jyang66@bwh.harvard.edu) and Li Zhou, MD, PhD, Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School, 399 Revolution Drive, Suite 777, Somerville, Boston, MA 02145, United States (lzhou@bwh.harvard.edu)


Li Zhou, MD, PhD

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02145, United States


Journal of the American Medical Informatics Association, Volume 31, Issue 7, July 2024, Pages 1569–1577, https://doi.org/10.1093/jamia/ocae118

Published: 08 May 2024

Article history
Received: 24 January 2024
Revision received: 28 March 2024
Editorial decision: 28 April 2024
Accepted: 07 May 2024
Corrected and typeset: 23 May 2024


Cite as:

Yining Hua, Jiageng Wu, Shixu Lin, Minghui Li, Yujie Zhang, Dinah Foer, Siwen Wang, Peilin Zhou, Jie Yang, Li Zhou, Streamlining social media information retrieval for public health research with deep learning, Journal of the American Medical Informatics Association, Volume 31, Issue 7, July 2024, Pages 1569–1577, https://doi.org/10.1093/jamia/ocae118


Abstract

Objective

Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword matching. This study develops a system that streamlines the curation of colloquial medical dictionaries. As a proof of concept, we demonstrate the pipeline by curating a Unified Medical Language System (UMLS)-colloquial symptom dictionary from COVID-19-related tweets.

Methods

We used COVID-19-related tweets posted from February 1, 2020, to April 30, 2022. The pipeline includes three modules: a named entity recognition module that detects symptoms in tweets; an entity normalization module that aggregates the detected entities; and a mapping module that iteratively maps entities to UMLS concepts. A random sample of 500 entities was drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare our dictionary with a predefined lexicon from previous research.
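The three modules can be illustrated in miniature. This is a minimal sketch under stated assumptions, not the authors' implementation: the deep learning NER module is replaced by a hypothetical list of already-detected mentions (`raw_mentions`), normalization is reduced to lowercasing, punctuation removal, and whitespace collapsing, and the iterative mapping uses two invented rules (space-insensitive matching, and stripping one leading word from a hypothetical `FILLERS` list). The concept labels stand in for UMLS concept identifiers and are not real CUIs. `validation_sample` mirrors the random draw of 500 dictionary entries for manual review.

```python
import random
import re
from collections import Counter


def normalize(mention: str) -> str:
    """Lowercase a raw mention, drop punctuation, and collapse whitespace."""
    mention = re.sub(r"[^\w\s]", "", mention.lower())
    return re.sub(r"\s+", " ", mention).strip()


# Hypothetical leading intensifiers that the mapping rules may strip.
FILLERS = {"really", "so", "bad", "severe"}


def variants(expr: str) -> set[str]:
    """Lookup forms: the space-stripped form, plus the form without one leading filler."""
    forms = {expr.replace(" ", "")}
    tokens = expr.split()
    if len(tokens) > 1 and tokens[0] in FILLERS:
        forms.add(" ".join(tokens[1:]))
    return forms


def map_to_concepts(expressions, seed_map: dict[str, str]) -> dict[str, str]:
    """Map expressions to concepts in passes until a pass adds nothing new."""
    mapped = dict(seed_map)
    pending = set(expressions) - mapped.keys()
    while pending:
        # Index every known expression by its exact and space-stripped forms.
        index = {}
        for known, concept in mapped.items():
            index[known] = concept
            index[known.replace(" ", "")] = concept
        new = {}
        for expr in pending:
            for form in variants(expr):
                if form in index:
                    new[expr] = index[form]
                    break
        if not new:  # no progress: leave the remaining expressions unmapped
            break
        mapped.update(new)  # new mappings can anchor matches in the next pass
        pending -= new.keys()
    return mapped


def validation_sample(dictionary: dict[str, str], k: int = 500, seed: int = 0):
    """Draw up to k (expression, concept) pairs for manual accuracy review."""
    items = sorted(dictionary.items())
    return random.Random(seed).sample(items, min(k, len(items)))


# Hypothetical NER output for a handful of tweets.
raw_mentions = ["Head  ache!!", "bad headache", "BAD HEADACHE",
                "really bad headache", "Can't smell anything"]
counts = Counter(normalize(m) for m in raw_mentions)  # aggregate duplicate mentions
seed_terms = {"headache": "Headache", "short of breath": "Dyspnea",
              "loss of smell": "Anosmia"}  # illustrative labels, not real CUIs
dictionary = map_to_concepts(counts, seed_terms)
sample = validation_sample(dictionary)
```

With these toy inputs, "really bad headache" is mapped only in the second pass, after "bad headache" was mapped in the first; this is the sense in which repeated passes can reach expressions a single lookup would miss. "cant smell anything" stays unmapped because neither hypothetical rule links it to a seed term.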

Results

We identified 498,480 unique symptom entity expressions in the tweets. Pre-processing reduced this number to 18,226. The final dictionary contains 38,175 unique symptom expressions that map to 966 UMLS concepts (accuracy = 95%). The symptom distribution analysis found that our dictionary detects more symptoms than the predefined lexicon and is effective at identifying psychiatric disorders such as anxiety and depression, which predefined lexicons often miss.
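A symptom frequency comparison of this kind can be sketched as counting, for each concept, how many tweets mention at least one of its expressions under each lexicon. The tweets, both lexicons, and the concept labels below are hypothetical, and whole-word regular-expression matching is an assumption rather than the paper's stated matching procedure.

```python
import re
from collections import Counter


def concept_frequencies(tweets, lexicon):
    """Count, per concept, the tweets that mention at least one of its expressions."""
    # Whole-word matching, so "cough" does not also match inside "coughing".
    patterns = {expr: re.compile(r"\b" + re.escape(expr) + r"\b") for expr in lexicon}
    freq = Counter()
    for tweet in tweets:
        text = tweet.lower()
        hits = {lexicon[expr] for expr, pat in patterns.items() if pat.search(text)}
        freq.update(hits)  # each concept counted at most once per tweet
    return freq


# Hypothetical tweets and lexicons (expression -> concept label).
tweets = [
    "Day 3 of covid, can't stop coughing",
    "So anxious about my test results",
    "Fever and cough again",
    "Feeling down and hopeless lately",
]
predefined = {"fever": "Fever", "cough": "Cough"}
curated = {**predefined, "coughing": "Cough", "anxious": "Anxiety",
           "feeling down": "Depression"}

before = concept_frequencies(tweets, predefined)
after = concept_frequencies(tweets, curated)
for concept in sorted(set(before) | set(after)):
    print(f"{concept:<12} predefined={before[concept]} curated={after[concept]}")
```

With these toy inputs the predefined lexicon misses "coughing" and has no entries for the anxiety- and depression-related expressions, so only the curated lexicon reports the Anxiety and Depression concepts.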

Conclusions

This study advances public health research by implementing a novel, systematic pipeline for curating symptom lexicons from social media data. The final lexicon's high accuracy, validated by medical professionals, underscores the potential of this methodology to reliably interpret and categorize large volumes of unstructured social media data and turn them into actionable medical insights across diverse languages and regions.

Keywords: public health, social media, information retrieval, named entity recognition, named entity normalization, COVID-19, deep learning

© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)

Issue Section: Research and Application
