Proceedings chapter
Open access

Cluster analysis of low-dimensional medical concept representations from Electronic Health Records

Published in: Health Information Science. HIS 2022, p. 313–324
Presented at 11th International Conference on Health Information Science (HIS 2022), Online, 28–30 October 2022
Publisher: Cham : Springer Nature
  • Lecture Notes in Computer Science; 13705
First online date: 2022-10-25

The study of existing links among different types of medical concepts can support research on optimal pathways for the treatment of human diseases. Here, we present a clustering analysis of learned medical concept representations generated from MIMIC-IV, an open dataset of de-identified digital health records. Patient trajectory information was extracted in chronological order to generate more than 500k sequence-like data structures, which were fed to a word2vec model to automatically learn concept representations. As a result, we obtained concept embeddings that describe diagnoses, procedures, and medications in a continuous low-dimensional space. A quantitative evaluation shows that the extracted embeddings have significant power in predicting the exact labels of diagnoses, procedures, and medications for a given patient trajectory, achieving top-10 and top-30 accuracy over 47% and 66%, respectively, for all the dimensions evaluated. Moreover, clustering analyses of medical concepts after dimensionality reduction with the t-SNE and UMAP techniques show that similar diagnoses (and procedures) are grouped together, matching the categories of ICD-10 codes. However, the distribution by categories is not as evident if PCA or SVD is employed, indicating that the relationships among concepts are highly non-linear. This highlights the importance of non-linear models, such as those provided by deep learning, to capture the complex relationships of medical concepts.
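The top-10 and top-30 accuracy figures quoted above can be read as the share of patient trajectories whose true label appears among the k highest-ranked predictions. The abstract does not spell out the exact evaluation procedure, so the following is only a minimal sketch under that assumption; the function name `top_k_accuracy`, the toy rankings, and the ICD-10-style labels are illustrative and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of cases whose true label is among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("ranked_predictions and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Made-up rankings for four hypothetical trajectories (best guess first).
ranked = [
    ["I10", "E11", "N18"],
    ["J18", "I50", "I10"],
    ["E11", "K21", "I10"],
    ["N18", "I50", "J18"],
]
# Made-up true labels, one per trajectory.
truth = ["I10", "I10", "F32", "I50"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
top3 = top_k_accuracy(ranked, truth, k=3)
print(top1, top2, top3)
```

Because a larger k only widens the set of accepted predictions, top-k accuracy can never decrease as k grows, which is consistent with the top-30 figure being higher than the top-10 figure.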

  • Electronic health records
  • Patient trajectory
  • Embeddings
  • Clustering
  • Representation learning
Citation (ISO format)
JAUME SANTERO, Fernando et al. Cluster analysis of low-dimensional medical concept representations from Electronic Health Records. In: Health Information Science. HIS 2022. Online. Cham : Springer Nature, 2022. p. 313–324. (Lecture Notes in Computer Science) doi: 10.1007/978-3-031-20627-6_29
Main files (1)
Proceedings chapter (Accepted version)

Technical information

Creation: 09/28/2022 10:16:00 AM
First validation: 09/28/2022 10:16:00 AM
Update time: 03/16/2023 8:53:40 AM
Status update: 03/16/2023 8:53:39 AM
Last indexation: 05/06/2024 11:59:46 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva