generate_paper_citations
src.generators.generate_paper_citations
Generate paper citation counts from Google Scholar.
Uses a single source (Google Scholar, via the scholarly library) so that all citation counts are comparable. Results are cached to disk so the enricher can be run in batches: if Scholar blocks requests, the run stops, and the next run automatically skips papers that are already cached.
Reads
assets/data/artifacts.json: paper titles, conferences, badges
Outputs
assets/data/paper_citations.json: per-paper citation data
assets/data/paper_citations_summary.json: aggregate summary
Usage
Full run (will stop gracefully if blocked):
python3 -m src.generators.generate_paper_citations \
    --data_dir ../reprodb.github.io
Report what's cached without making any API calls:
python3 -m src.generators.generate_paper_citations \
    --data_dir ../reprodb.github.io --cache_only
Custom cache TTL (default: 90 days):
python3 -m src.generators.generate_paper_citations \
    --data_dir ../reprodb.github.io --cache_ttl_days 90
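The batch behavior described above (skip cached papers, stop when blocked, refresh entries older than the TTL) can be sketched as follows. This is a minimal illustration and not the module's actual code: the `enrich` and `is_fresh` helpers, the `ScholarBlocked` exception, the `lookup` callable, and the cache layout (`result` and `fetched_at` keys) are all assumptions made for the example.

```python
import json
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


class ScholarBlocked(Exception):
    """Hypothetical signal that the data source refused the request."""


def is_fresh(entry: dict, ttl_days: int, now: float) -> bool:
    """Return True if the entry was fetched less than ttl_days ago."""
    return now - entry["fetched_at"] < ttl_days * SECONDS_PER_DAY


def enrich(titles, cache_path: Path, lookup, ttl_days: int = 90, now=None) -> dict:
    """Look up titles that are missing or stale, saving after each success.

    `lookup` is a hypothetical callable: it returns a dict or None, and
    raises ScholarBlocked when no further queries should be attempted.
    """
    now = time.time() if now is None else now
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    for title in titles:
        entry = cache.get(title)
        if entry is not None and is_fresh(entry, ttl_days, now):
            continue  # cached recently enough: skip without querying
        try:
            result = lookup(title)
        except ScholarBlocked:
            break  # stop gracefully; a later run resumes from the cache
        cache[title] = {"result": result, "fetched_at": now}
        # Write after every paper so an interrupted run loses nothing.
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache
```

Writing the cache after every successful lookup is what makes a blocked run cheap to resume in this sketch: everything fetched before the block is already on disk.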
scholar_lookup(title: str) -> dict | None
Query Google Scholar for the citation count of the paper with the given title. Returns a result dict, or None.
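As a rough illustration of what such a lookup might involve, the sketch below takes the first search hit from the scholarly library and extracts its citation count. It is not the function's actual source: the name `lookup_sketch`, the injectable `search` parameter, the keys of the returned dict, and the choice to return None on any error are assumptions made for the example.

```python
from typing import Callable, Iterable, Optional


def _default_search(title: str) -> Iterable[dict]:
    # Imported lazily so the sketch can be exercised without the library.
    from scholarly import scholarly

    return scholarly.search_pubs(title)


def lookup_sketch(
    title: str,
    search: Callable[[str], Iterable[dict]] = _default_search,
) -> Optional[dict]:
    """Return citation data for the first search hit, or None."""
    try:
        first = next(iter(search(title)), None)
    except Exception:
        # Assumption: any failure (network error, blocking) yields None.
        return None
    if first is None:
        return None  # the search produced no results
    return {
        "query": title,
        "matched_title": first.get("bib", {}).get("title"),
        "citations": first.get("num_citations", 0),
    }
```

Passing the search function as a parameter keeps the sketch testable offline, for example `lookup_sketch("Some title", search=lambda t: iter([]))` returns None without touching the network.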
Source code in src/generators/generate_paper_citations.py