generate_author_index
src.generators.generate_author_index
Generate and maintain the canonical author index.
The author index is the single source of truth for author identity and affiliation. Every author gets a stable integer ID that never changes.
Reads
- assets/data/authors.json (produced by generate_author_stats; supplies names and display_names)
- assets/data/author_index.json (the previous index, if any; used to preserve existing IDs)
Writes
- assets/data/author_index.json
Usage
python -m src.generators.generate_author_index --data_dir ../reprodb.github.io
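A minimal sketch of the command-line wiring implied by the usage line, assuming `--data_dir` points at the site checkout and both files live under `<data_dir>/assets/data/`. The helper names `parse_args` and `resolve_paths`, and making the flag required, are hypothetical and not taken from the module.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the --data_dir flag from the usage line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generate and maintain the canonical author index."
    )
    # Assumption: the flag is mandatory; the real script may supply a default.
    parser.add_argument(
        "--data_dir",
        required=True,
        help="Root of the site checkout, e.g. ../reprodb.github.io",
    )
    return parser.parse_args(argv)


def resolve_paths(data_dir: str) -> tuple[Path, Path]:
    """Return (authors.json, author_index.json) under <data_dir>/assets/data."""
    data = Path(data_dir) / "assets" / "data"
    # author_index.json is both read (previous IDs) and rewritten.
    return data / "authors.json", data / "author_index.json"


if __name__ == "__main__":
    args = parse_args()
    authors_path, index_path = resolve_paths(args.data_dir)
    print(f"reading {authors_path} and {index_path}; writing {index_path}")
```

Because the previous index and the output share one path, the ID-preservation step has to read the old file before anything is written back.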
load_existing_index(path)
Load the previous author index and return a tuple (entries list, name -> entry dict, max_id).
Source code in src/generators/generate_author_index.py (lines 28-36)
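One way such a loader could look, as a sketch only. It assumes the index file is a JSON list of objects with `"name"` and integer `"id"` keys, and that a missing file (first run) yields an empty index with `max_id = 0`; both are assumptions about the file layout, not details confirmed by this page.

```python
import json
from pathlib import Path


def load_existing_index(path):
    """Return (entries, name -> entry dict, max_id) for a previous index.

    Assumed layout: a JSON list of objects with "name" and integer "id".
    A missing file is treated as an empty index (max_id = 0).
    """
    path = Path(path)
    if not path.exists():
        return [], {}, 0
    with path.open(encoding="utf-8") as fh:
        entries = json.load(fh)
    # Index by name so build_index can look up an author's existing ID.
    by_name = {entry["name"]: entry for entry in entries}
    # New IDs are minted above this value, so old IDs are never reused.
    max_id = max((entry["id"] for entry in entries), default=0)
    return entries, by_name, max_id
```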
load_authors_json(path)
Load authors.json produced by generate_author_stats.
Source code in src/generators/generate_author_index.py (lines 39-44)
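A short sketch of the loader, assuming authors.json is a top-level JSON list of author records that each carry at least `"name"` (and usually `"display_name"`). The list layout and the `ValueError` check are illustrative assumptions; generate_author_stats may wrap the list differently.

```python
import json
from pathlib import Path


def load_authors_json(path):
    """Load the author records written by generate_author_stats.

    Assumes a top-level JSON list of objects, each with at least a "name" key.
    """
    with Path(path).open(encoding="utf-8") as fh:
        authors = json.load(fh)
    # Fail early if the layout differs from the assumed list of records.
    if not isinstance(authors, list):
        raise ValueError(f"{path}: expected a JSON list of author records")
    return authors
```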
build_index(authors: list[dict], existing_by_name: dict[str, dict], max_id: int) -> list[dict]
Build a new index, preserving existing IDs and syncing affiliations.
When an enricher updates authors.json with a new affiliation, we detect the change here, update the index entry, and record the old value in affiliation_history.
Returns (index_list, stats_dict).
Source code in src/generators/generate_author_index.py (lines 47-136)
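The ID-preservation and affiliation-sync logic described above can be sketched as follows. This is not the module's implementation: the entry fields (`"id"`, `"name"`, `"display_name"`, `"affiliation"`, `"affiliation_history"`) and the stats keys are assumptions chosen for illustration, and the sketch returns a tuple as the docstring describes.

```python
import copy


def build_index(authors, existing_by_name, max_id):
    """Sketch: assign stable IDs and sync affiliations.

    Field names and stats keys are illustrative assumptions.
    Returns (index_list, stats_dict).
    """
    index = []
    stats = {"new": 0, "kept": 0, "affiliation_changes": 0}
    next_id = max_id + 1
    for author in authors:
        name = author["name"]
        previous = existing_by_name.get(name)
        if previous is None:
            # First time this name appears: mint the next unused ID.
            entry = {"id": next_id, "name": name, "affiliation_history": []}
            next_id += 1
            stats["new"] += 1
        else:
            # Known author: copy the old entry so its ID is carried over
            # without mutating the caller's dict.
            entry = copy.deepcopy(previous)
            entry.setdefault("affiliation_history", [])
            stats["kept"] += 1
        entry["display_name"] = author.get("display_name", name)
        new_aff = author.get("affiliation")
        old_aff = entry.get("affiliation")
        if new_aff and new_aff != old_aff:
            if old_aff:
                # Record the superseded value before overwriting it.
                entry["affiliation_history"].append(old_aff)
                stats["affiliation_changes"] += 1
            entry["affiliation"] = new_aff
        index.append(entry)
    index.sort(key=lambda e: e["id"])
    return index, stats
```

Because new IDs always start above `max_id`, an ID is never reassigned even if its author disappears from authors.json. This sketch simply omits such authors from the output; whether the real generator retains them is not stated on this page.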