Report for: Simplifying drug package leaflets written in Spanish by using word embedding


Title	Simplifying drug package leaflets written in Spanish by using word embedding
Published in	Journal of Biomedical Semantics, September 2017
DOI	10.1186/s13326-017-0156-7
Pubmed ID	28962645
Authors	Isabel Segura-Bedmar, Paloma Martínez
Abstract	Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines. Pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems understanding the sections describing posology (dosage quantity and prescription), contraindications and adverse drug reactions. The ultimate goal of this work is to provide an automatic approach that helps these companies write drug package leaflets in easy-to-understand language. Natural language processing has become a powerful tool for improving patient care and advancing medicine because it makes it possible to automatically process the large amount of unstructured information needed for patient care. However, to the best of our knowledge, no research has been done on the automatic simplification of drug package leaflets. In previous work, we proposed using domain terminological resources to gather a set of synonyms for a given target term. A potential drawback of this approach is that it depends heavily on the existence of dictionaries; these are not available for every domain and language, and where they do exist, their coverage is often very limited. To overcome this limitation, we propose the use of word embeddings to identify the simplest synonym for a given term. Word embedding models represent each word in a corpus as a vector in a semantic space. Our approach is based on the assumption that synonyms should have close vectors because they occur in similar contexts. In our evaluation, we used the EasyDPL (Easy Drug Package Leaflets) corpus, a collection of 306 leaflets written in Spanish and manually annotated with 1400 adverse drug effects and their simplest synonyms. We focus on leaflets written in Spanish because it is the second most widely spoken language in the world, yet far fewer terminological resources exist for Spanish than for English.
Our experiments show an accuracy of 38.5% using word embeddings. This work provides a promising approach to simplifying DPLs without using terminological resources or parallel corpora. Moreover, it could be easily adapted to different domains and languages. However, more research is needed to improve our word-embedding approach, because it does not yet outperform our previous dictionary-based work.
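
The idea in the abstract, that synonyms should have nearby vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors and the frequency counts are invented toy values, the function names are hypothetical, and using corpus frequency as a proxy for "simplest" is an assumption made only for this illustration (the abstract does not say how simplicity is determined).

```python
import math

# Toy 3-dimensional vectors standing in for trained word embeddings.
# All values are invented for illustration.
VECTORS = {
    "cefalea": [0.90, 0.10, 0.00],
    "jaqueca": [0.80, 0.20, 0.10],
    "migraña": [0.85, 0.05, 0.20],
    "prurito": [0.10, 0.90, 0.00],
    "picor":   [0.20, 0.80, 0.10],
    "fiebre":  [0.10, 0.10, 0.90],
}

# Toy corpus frequencies (invented). Assumption: a more frequent word
# is treated as the simpler one.
FREQ = {"cefalea": 40, "jaqueca": 300, "migraña": 120,
        "prurito": 25, "picor": 400, "fiebre": 900}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(term, vectors, k=2):
    """Return the k words whose vectors are closest to `term`, best first."""
    target = vectors[term]
    scored = [(word, cosine(target, vec))
              for word, vec in vectors.items() if word != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def simplest_candidate(term, vectors, freq, k=2, min_sim=0.8):
    """Pick the most frequent neighbor above a similarity threshold.

    Falls back to the original term when no neighbor is close enough.
    """
    candidates = [word for word, sim in nearest_neighbors(term, vectors, k)
                  if sim >= min_sim]
    if not candidates:
        return term
    return max(candidates, key=lambda word: freq.get(word, 0))


if __name__ == "__main__":
    print(simplest_candidate("cefalea", VECTORS, FREQ))  # jaqueca
    print(simplest_candidate("prurito", VECTORS, FREQ))  # picor
```

With real embeddings the vectors would come from a model trained on a large Spanish corpus; the similarity threshold and the neighborhood size k are tuning choices, not values taken from the paper.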


X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Spain	3	60%
Switzerland	1	20%
Unknown	1	20%

Demographic breakdown

Type	Count	As %
Members of the public	3	60%
Scientists	2	40%

Mendeley readers

The data shown below were compiled from readership statistics for 32 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	32	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	5	16%
Student > Master	5	16%
Student > Doctoral Student	3	9%
Professor	2	6%
Researcher	2	6%
Other	6	19%
Unknown	9	28%

Readers by discipline	Count	As %
Computer Science	10	31%
Linguistics	4	13%
Pharmacology, Toxicology and Pharmaceutical Science	2	6%
Medicine and Dentistry	2	6%
Social Sciences	1	3%
Other	3	9%
Unknown	10	31%

Attention Score in Context

This research output has an Altmetric Attention Score of 4. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 10 October 2017.

All research outputs

#6,925,833

of 23,003,906 outputs

Outputs from Journal of Biomedical Semantics

#129

of 364 outputs

Outputs of similar age

#111,112

of 321,103 outputs

Outputs of similar age from Journal of Biomedical Semantics

of 20 outputs

Altmetric has tracked 23,003,906 research outputs across all sources so far. This one has received more attention than most of these and is in the 69th percentile.

So far Altmetric has tracked 364 research outputs from this source. They receive a mean Attention Score of 4.6. This one has gotten more attention than average, scoring higher than 64% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 321,103 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 65% of its contemporaries.

We're also able to compare this research output to 20 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 75% of its contemporaries.
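
The percentages quoted above appear consistent with a simple calculation from the rank and the number of outputs in each comparison group. The sketch below is an assumption about how those figures relate, not Altmetric's documented method; the function name is hypothetical, and rounding down is inferred only from the three rankings listed in this report.

```python
def percent_scored_below(rank, total):
    """Whole-number percentage of outputs ranked below this one.

    Assumed formula: floor((total - rank) / total * 100).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (total - rank) * 100 // total


# Rank and group size as listed in this report.
print(percent_scored_below(6_925_833, 23_003_906))  # all research outputs
print(percent_scored_below(129, 364))               # same journal
print(percent_scored_below(111_112, 321_103))       # similar age
```

These three calls give 69, 64 and 65, matching the figures in the text. The rank among the 20 outputs of similar age from the same journal is not shown in the report, so that comparison cannot be checked the same way.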
