Project Overview
A bibliometrics project using natural language processing and information extraction techniques to analyse language use trends in OBGYN research papers. The research was funded by IWK Health and led by Professor Jocelyn Stairs.
We performed a cross-sectional study to examine trends in gendered language in obstetrics and gynaecology abstracts, as well as the use of language specific to the transgender population.
I presented my work on this project at the Dalhousie Department of Obstetrics and Gynaecology Research Day Conference in 2025.
I am first author on a manuscript we published in the Journal of Obstetrics and Gynaecology Canada in February 2026.
The Data
The goal of this project was to analyse trends across all OBGYN papers in PubMed. PubMed contains around 27 million papers in total, so the first challenge was narrowing this set down to the papers likely to be OBGYN-related.
We developed a PubMed search query covering a wide range of OBGYN terms, which reduced the set to 4.8 million papers.
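A minimal Python sketch of assembling such a query from a term list. The terms used below are hypothetical placeholders, not the study's actual query; `[tiab]` (title/abstract) and `[mh]` (MeSH heading) are standard PubMed field tags.

```python
def build_pubmed_query(terms, fields=("tiab", "mh")):
    """Tag every term with every field and join the clauses with OR."""
    clauses = [f'"{term}"[{field}]' for term in terms for field in fields]
    return " OR ".join(clauses)


# Hypothetical two-term example; the real query covered far more terms.
query = build_pubmed_query(["pregnancy", "uterus"])
print(query)
# → "pregnancy"[tiab] OR "pregnancy"[mh] OR "uterus"[tiab] OR "uterus"[mh]
```

The resulting string could then be submitted through NCBI E-utilities, for example with Biopython's `Entrez.esearch(db="pubmed", term=query)`; that step needs network access and is not shown here.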
Information Extraction
Filtering
I created a controlled vocabulary from a subset of MeSH terms related to OBGYN.
I processed the paper titles and abstracts with MetaMap from a Python script to identify the concepts and terms in this controlled vocabulary. Extra conditions were added for some terms that were ambiguous.
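MetaMap does the concept matching itself; extra conditions for ambiguous terms can then be applied on top of its output. A minimal sketch, assuming each match has been reduced to a concept name plus the paper's text. The ambiguous concepts and exclusion phrases below are hypothetical examples, not the study's actual rules.

```python
# Hypothetical rules: an ambiguous concept is kept only if the text
# contains none of the phrases that signal a non-OBGYN sense.
AMBIGUOUS_EXCLUSIONS = {
    "Delivery": ["drug delivery", "gene delivery", "care delivery"],
    "Labor": ["labor market", "labor force", "child labor"],
}


def accept_concept(concept: str, text: str) -> bool:
    """Return False only when the concept is ambiguous and an exclusion phrase appears."""
    exclusions = AMBIGUOUS_EXCLUSIONS.get(concept, [])
    lowered = text.lower()
    return not any(phrase in lowered for phrase in exclusions)
```

Concepts without an entry in the table are always accepted, so the rules only need to cover the terms known to be ambiguous.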
Using the extracted concepts and the controlled vocabulary, I filtered the 4.8 million papers down to 955,626.
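The paper-level filter can be sketched as follows, assuming each paper's MetaMap output has already been reduced to a list of concept names. The `Paper` class, the vocabulary entries, and the `accept` hook (where per-term conditions would plug in) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    pmid: str
    text: str  # title + abstract
    concepts: list[str] = field(default_factory=list)  # assumed MetaMap concept names


# Hypothetical controlled vocabulary; the real one was a subset of MeSH.
VOCABULARY = {"Pregnancy", "Uterus", "Ovary", "Delivery, Obstetric"}


def is_obgyn(paper, vocabulary, accept=lambda concept, text: True):
    """Keep a paper if any vocabulary concept survives the extra conditions."""
    return any(c in vocabulary and accept(c, paper.text) for c in paper.concepts)


def filter_papers(papers, vocabulary, accept=lambda concept, text: True):
    return [p for p in papers if is_obgyn(p, vocabulary, accept)]
```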
Classifying the Papers
We were interested in trends in gendered language, so we defined three classes: gendered, gender-neutral, and transgender. A paper could be assigned more than one class. We identified language belonging to each class using the MetaMap concepts, together with extra conditions to reduce false-positive matches.
A set of conditionals then assigned each paper to one or more classes and further reduced false-positive matches.
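A simplified sketch of multi-label assignment. It swaps in plain regular-expression phrase matching for the MetaMap concepts the study actually used, and the term lists and the false-positive rule are hypothetical illustrations only.

```python
import re

# Hypothetical term lists; the study's actual concept lists are not reproduced here.
CLASS_TERMS = {
    "gendered": ["woman", "women", "mother", "mothers"],
    "gender_neutral": ["pregnant people", "pregnant person", "individuals", "birthing parent"],
    "transgender": ["transgender", "trans men", "nonbinary", "gender diverse"],
}

# Hypothetical false-positive rule: an institution name such as
# "women's hospital" should not count as gendered language.
FALSE_POSITIVES = [r"women's hospital"]


def classify(text: str) -> set[str]:
    """Assign zero or more classes to one title + abstract."""
    lowered = text.lower()
    for pattern in FALSE_POSITIVES:
        lowered = re.sub(pattern, " ", lowered)
    labels = set()
    for label, terms in CLASS_TERMS.items():
        # Word boundaries stop "woman" from matching inside longer words.
        if any(re.search(rf"\b{re.escape(t)}\b", lowered) for t in terms):
            labels.add(label)
    return labels
```

Because each class is tested independently, a paper can end up with several labels or with none, matching the multi-class assignment described above.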
Analysis and Results
The script produced a CSV file for all 955,626 papers containing each paper's metadata, the MeSH terms identified, and the class(es) it was assigned to.
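A sketch of writing such a file with Python's standard `csv` module. The column names are assumptions; multi-valued fields (MeSH terms, classes) are joined with `;` so that each paper stays on a single row.

```python
import csv
import io


def write_results(rows, handle):
    """Write one CSV row per paper; set-valued fields are sorted and joined with ';'."""
    fieldnames = ["pmid", "journal", "year", "mesh_terms", "classes"]  # assumed columns
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "pmid": row["pmid"],
            "journal": row["journal"],
            "year": row["year"],
            "mesh_terms": ";".join(sorted(row["mesh_terms"])),
            "classes": ";".join(sorted(row["classes"])),
        })


# Hypothetical single-paper example written to an in-memory buffer.
buffer = io.StringIO()
write_results([{"pmid": "12345", "journal": "Example J", "year": 2020,
                "mesh_terms": {"Pregnancy"}, "classes": {"gendered"}}], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` documentation recommends.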
Overall, 59% of the papers contained gendered language, 40.6% gender-neutral language, and 0.03% transgender language. The proportion of obstetrics and gynecology abstracts containing gender-neutral and transgender language has increased rapidly. We believe this shift may represent a response to journal policies on language and the growing recognition that “women’s” health applies to a broader population than cisgender women.
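Because a paper can carry several classes, or none, per-class percentages are computed independently and need not sum to 100%. A minimal sketch, assuming each paper's classes are available as a set of labels (the label names are hypothetical).

```python
from collections import Counter


def class_percentages(label_sets, classes=("gendered", "gender_neutral", "transgender")):
    """Percentage of papers carrying each class, counted independently per class."""
    total = len(label_sets)
    if total == 0:
        return {c: 0.0 for c in classes}
    counts = Counter(label for labels in label_sets for label in labels)
    return {c: 100 * counts[c] / total for c in classes}


# Four hypothetical papers: one overlap and two with no class at all.
print(class_percentages([{"gendered"}, {"gendered", "transgender"}, set(), set()]))
# → {'gendered': 50.0, 'gender_neutral': 0.0, 'transgender': 25.0}
```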
Trends in the percentage of abstracts that used gender-neutral language concepts over time among the top 15 obstetrics and gynecology journals indexed in the InCites dataset with a gender-neutral language policy.

Trends in the percentage of abstracts that used transgender language concepts over time for all abstracts containing obstetrics and gynecology concepts indexed in PubMed.

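Both trend figures come down to a per-year percentage. A minimal sketch, assuming each record is a `(year, journal, labels)` tuple; the optional `journals` argument is a hypothetical stand-in for restricting the analysis to a chosen journal set, such as the top 15 journals with a gender-neutral language policy.

```python
from collections import defaultdict


def yearly_percentage(records, label, journals=None):
    """Percentage of abstracts per year whose label set contains `label`.

    records: iterable of (year, journal, labels) tuples.
    journals: optional set of journal names to restrict the analysis to.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, journal, labels in records:
        if journals is not None and journal not in journals:
            continue
        totals[year] += 1
        if label in labels:
            hits[year] += 1
    return {year: 100 * hits[year] / totals[year] for year in sorted(totals)}
```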
Presentation
I had the opportunity to present my work on this project at the Dalhousie Department of Obstetrics and Gynaecology Research Day Conference in 2025.

Paper
I am first author on a manuscript we published in the Journal of Obstetrics and Gynaecology Canada in February 2026. I wrote the methods section, created the figures, and prepared the supplementary document for this paper. I collaborated on it with Professor Jocelyn Stairs, Professor Finlay Maguire, and Dr. Aisling Clancy.