OpenPapers

OpenPapers is a Python toolkit for systematically collecting research papers, including metadata and full PDFs, from top-tier AI and machine learning conferences.

🔗 GitHub Repository: https://github.com/zhangleiniu/OpenPapers

Motivation

Since the rise of deep learning (roughly 2013 onward), the number of AI publications has grown exponentially. This growth makes the following tasks increasingly difficult:

Literature review
Citation analysis
Research trend detection
Paper recommendation
Research gap identification

OpenPapers builds a curated and structured dataset that enables:

Citation and reference recommendation
Structured PDF parsing (e.g., via GROBID/Nougat)
Research limitation analysis
Intelligent paper-reading recommendation systems

Overview

The rapid growth of AI and ML research has led to an overwhelming volume of publications each year. While platforms such as Google Scholar, Semantic Scholar, and OpenReview attempt to aggregate research output, they often suffer from:

Incomplete coverage
Noisy or inconsistent metadata
Missing PDFs
Poor differentiation between archival and non-archival content

OpenPapers addresses these shortcomings by providing dedicated, conference-specific scrapers that extract high-quality metadata and full PDFs from major AI/ML venues.

The resulting dataset is structured, comprehensive, and suitable for downstream research applications.

Status

Active development.
New conferences and years are added regularly.
