Authors
Yongjun Zhu, Erjia Yan, Fei Wang
Publication date
2017/12
Journal
BMC Medical Informatics and Decision Making
Volume
17
Pages
1-8
Publisher
BioMed Central
Description
Background
Understanding semantic relatedness and similarity between biomedical terms has a great impact on a variety of applications such as biomedical information retrieval, information extraction, and recommender systems. The objective of this study is to examine word2vec’s ability to derive semantic relatedness and similarity between biomedical terms from large publication data. Specifically, we focus on the effects of recency, size, and section of biomedical publication data on the performance of word2vec.
Methods
We download abstracts of 18,777,129 articles from PubMed and 766,326 full-text articles from PubMed Central (PMC). The datasets are preprocessed and grouped into subsets by recency, size, and section. Word2vec models are trained on these subsets. Cosine similarities between biomedical terms obtained from the word2vec …
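The cosine similarity mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `cosine_similarity`, the dictionary `toy_vectors`, and the three-dimensional vectors in it are hypothetical and invented for illustration, whereas real word2vec embeddings are learned from the corpus and typically have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: made-up values, not trained vectors).
toy_vectors = {
    "aspirin": [0.9, 0.1, 0.3],
    "ibuprofen": [0.8, 0.2, 0.4],
    "femur": [0.1, 0.9, 0.2],
}

# Terms whose vectors point in similar directions score closer to 1.
sim_drugs = cosine_similarity(toy_vectors["aspirin"], toy_vectors["ibuprofen"])
sim_mixed = cosine_similarity(toy_vectors["aspirin"], toy_vectors["femur"])
```

With these made-up vectors, the two drug terms score higher than the drug and anatomy pair. That is the kind of ranking a term-similarity evaluation would compare against human judgments.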
Total citations
[Chart: citations per year, 2017-2024]