Filtering
I created a controlled vocabulary from the subset of MeSH (Medical Subject Headings) terms related to obstetrics and gynecology (OB/GYN).
I processed the paper titles and abstracts with MetaMap, called from a Python script, to identify the concepts and terms in this controlled vocabulary. Extra conditions were added for terms that were ambiguous.
Using the extracted concepts and the controlled vocabulary, I filtered the 4.8 million papers down to 955,626.
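The filtering step can be illustrated with a minimal sketch. It assumes the MetaMap output has already been reduced to a set of concept labels per paper; the labels, the `AMBIGUOUS_REQUIRES` rule table, and the function names are hypothetical placeholders, not the vocabulary or conditions actually used.

```python
from typing import Dict, List, Set

# Hypothetical controlled vocabulary. These labels are placeholders,
# not real MeSH or UMLS identifiers.
CONTROLLED_VOCABULARY: Set[str] = {"pregnancy", "uterus", "labor"}

# Hypothetical extra conditions: an ambiguous concept only counts when at
# least one of the listed supporting concepts appears in the same paper.
AMBIGUOUS_REQUIRES: Dict[str, Set[str]] = {
    "labor": {"pregnancy", "uterus"},
}


def is_relevant(concepts: Set[str]) -> bool:
    """Return True if any vocabulary concept matches and passes its condition."""
    for concept in concepts & CONTROLLED_VOCABULARY:
        required = AMBIGUOUS_REQUIRES.get(concept)
        if required is None:
            return True  # unambiguous term: a match is enough
        if concepts & required:
            return True  # ambiguous term confirmed by a supporting concept
    return False


def filter_papers(papers: List[dict]) -> List[dict]:
    """Keep only papers whose extracted concepts match the vocabulary."""
    return [paper for paper in papers if is_relevant(paper["concepts"])]


papers = [
    {"id": 1, "concepts": {"pregnancy", "hypertension"}},
    {"id": 2, "concepts": {"labor", "economics"}},  # ambiguous, unsupported
    {"id": 3, "concepts": {"labor", "uterus"}},     # ambiguous, supported
]
kept_ids = [paper["id"] for paper in filter_papers(papers)]
```

In this sketch a paper is kept as soon as one concept qualifies, so the extra conditions only affect papers whose sole matches are ambiguous terms.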
Classifying the Papers
We were interested in trends in gendered language, so three classes were created: gendered, gender neutral, and transgender. A paper could be assigned more than one class. Using the MetaMap concepts, along with extra conditions to reduce false-positive matches, we identified language belonging to these classes.
A set of conditionals was used to assign each paper to one or multiple classes and to further reduce false-positive matches.
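As a rough illustration of multi-class assignment with conditionals, the sketch below maps a paper's concept labels to zero or more classes. The `Rule` structure, the trigger and blocker labels, and the `classify` function are assumptions made up for this example; the actual concepts and conditions are not given here.

```python
from typing import Dict, NamedTuple, Set


class Rule(NamedTuple):
    triggers: Set[str]  # concepts that indicate the class
    blockers: Set[str]  # concepts that mark a likely false positive


# Hypothetical rules. All labels are invented for illustration.
RULES: Dict[str, Rule] = {
    "gendered": Rule({"woman", "mother", "maternal"}, set()),
    "gender neutral": Rule({"pregnant_person", "birthing_person"}, set()),
    "transgender": Rule({"transgender_person"}, {"transgenic_organism"}),
}


def classify(concepts: Set[str]) -> Set[str]:
    """Return every class whose rule matches; a paper may get several."""
    classes = set()
    for name, rule in RULES.items():
        has_trigger = bool(concepts & rule.triggers)
        is_blocked = bool(concepts & rule.blockers)
        if has_trigger and not is_blocked:
            classes.add(name)
    return classes


both = classify({"woman", "pregnant_person"})
blocked = classify({"transgender_person", "transgenic_organism"})
```

Evaluating each rule independently, rather than stopping at the first match, is what allows a single paper to fall into several classes.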