OpenPapers
OpenPapers is a robust Python toolkit for systematically collecting research papers from top-tier AI and machine learning conferences.
GitHub Repository: https://github.com/zhangleiniu/OpenPapers
Motivation
Since the rise of deep learning (~2013 onward), the number of publications in AI has grown exponentially. This creates challenges for:
- Literature review
- Citation analysis
- Research trend detection
- Paper recommendation
- Research gap identification
OpenPapers builds a curated and structured dataset that enables:
- Citation and reference recommendation
- Structured PDF parsing (e.g., via GROBID/Nougat)
- Research limitation analysis
- Intelligent paper-reading recommendation systems
Overview
The rapid growth of AI and ML research has led to an overwhelming volume of publications each year. While platforms such as Google Scholar, Semantic Scholar, and OpenReview attempt to aggregate research output, they often suffer from:
- Incomplete coverage
- Noisy or inconsistent metadata
- Missing PDFs
- Poor differentiation between archival and non-archival content
OpenPapers addresses this gap by providing dedicated, conference-specific scrapers designed to extract high-quality metadata and full PDFs from major AI/ML venues.
The resulting dataset is structured, comprehensive, and suitable for downstream research applications.
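To make the idea of conference-specific scrapers feeding one structured dataset more concrete, here is a minimal sketch. It is not taken from the OpenPapers codebase: the names `PaperRecord`, `ConferenceScraper`, `DemoScraper`, `to_jsonl`, the venue `DemoConf`, and all field names are assumptions chosen for illustration, and the actual classes and schema in the repository may differ.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class PaperRecord:
    """Hypothetical per-paper record; the field names are illustrative only."""
    title: str
    authors: List[str]
    venue: str
    year: int
    pdf_url: Optional[str] = None   # None when no PDF link was found
    abstract: Optional[str] = None


class ConferenceScraper(ABC):
    """Hypothetical base class: one subclass per venue."""
    venue: str = ""

    @abstractmethod
    def fetch_papers(self, year: int) -> Iterator[PaperRecord]:
        """Yield one record per paper for the given year."""


class DemoScraper(ConferenceScraper):
    """Stand-in scraper that yields placeholder data instead of using the network."""
    venue = "DemoConf"

    def fetch_papers(self, year: int) -> Iterator[PaperRecord]:
        yield PaperRecord(
            title="Placeholder Title",
            authors=["A. Author"],
            venue=self.venue,
            year=year,
        )


def to_jsonl(records: Iterable[PaperRecord]) -> str:
    """Serialize records as JSON Lines, one paper per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


if __name__ == "__main__":
    print(to_jsonl(DemoScraper().fetch_papers(2024)))
```

Under these assumptions, each venue-specific subclass would handle its own site layout while emitting the same record type, so downstream tools only need to read one format.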
Status
Active development.
New conferences and years are added regularly.