Every script lives in tools/. Each is small, standalone, and meant to be read and adapted: scaffolding, not a framework. They share one helper module (common.py: HTTP with backoff, JSON I/O, DOI/arXiv parsing, APA building). Run any of them with --help.
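As an illustration of the kind of DOI/arXiv parsing a shared helper handles, here is a minimal sketch. The function names `parse_doi` and `parse_arxiv_id` and both regular expressions are hypothetical and are not taken from common.py. The sketch only handles new-style arXiv ids (YYMM.NNNNN), not old-style ids such as hep-th/9901001. It lowercases DOIs because DOIs are case-insensitive, which makes them safe to use as lookup keys.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; common.py's actual rules may differ.
# A DOI is "10." + a 4-9 digit registrant code + "/" + a suffix without whitespace.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")
# New-style arXiv ids: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version like "v2".
ARXIV_RE = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def parse_doi(text: str) -> Optional[str]:
    """Pull a bare, lowercased DOI out of a doi.org URL, a 'doi:' tag, or free text."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Trim punctuation that sticks to DOIs in prose (may clip the rare DOI ending in ')').
    return match.group(0).rstrip(".,;)").lower()


def parse_arxiv_id(text: str) -> Optional[str]:
    """Pull a new-style arXiv id, without its version suffix, out of a URL or string."""
    match = ARXIV_RE.search(text)
    return match.group(1) if match else None
```

For example, `parse_doi("https://doi.org/10.1038/NATURE14539")` gives `"10.1038/nature14539"`, and `parse_arxiv_id("https://arxiv.org/abs/2301.01234v2")` gives `"2301.01234"`.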
Core pipeline

| Script | Phase | Purpose |
|---|---|---|
| verify.py | 3 | Verify every citation via PMC/PubMed/CrossRef + the arXiv API (preprints get a real verdict, not a misleading NOT-FOUND). Catches search-agent fabrications (~25%). |
| references.py | 3f | Rebuild every reference from its verified DOI/arXiv id into canonical APA-7 (full authors, particles, casing, real venue incl. bioRxiv/PsyArXiv). --audit is a hard gate. Both modes. |
| spreadsheet.py | 5 | Build/rebuild the .xlsx from the accumulated JSON rows; auto-adds Cite / Family columns when present. |
| citations.py | 5b | Per-paper citation counts by DOI from OpenAlex (primary) + Semantic Scholar, with undercount reconciliation. |
| xref.py | 6 | Cross-citation frequency table built from the corpus's own CrossRef reference lists. |
| families.py | 6b | Validate / stamp / render a theoretical-family grouping (the agent proposes, you approve the definitions). |
| families_figure.py | 6b | Interactive HTML lineage figure (+ svg/png/pdf); landmark dots auto-selected by citation count, within-corpus in-degree, and lab authorship. |
| family_prompt_template.md | 6b | Two-step propose → assign prompt for the families pass. |
| review_paper.py | 7 | Render an AI-authored narrative review .docx from content.json (prose) + rows.json (canonical references). Mechanics only: prose is authored separately, after the priority audit. |

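To make the xref.py row more concrete, here is a minimal sketch of a within-corpus cross-citation count. It is not xref.py's actual code. It assumes each paper is a Crossref work record (the REST API's `message` object) with a `DOI` and an optional `reference` list, where matched references carry a `DOI` key. The function name `cross_citation_counts` is hypothetical. Each citing paper counts a cited DOI at most once, and self-citations are ignored.

```python
from collections import Counter
from typing import Iterable


def cross_citation_counts(works: Iterable[dict]) -> Counter:
    """Count how many other corpus papers cite each corpus paper.

    `works` are Crossref work records, each with a "DOI" and an optional
    "reference" list whose items may carry a "DOI" (unmatched ones do not).
    """
    works = list(works)
    corpus = {w["DOI"].lower() for w in works}
    counts: Counter = Counter()
    for w in works:
        citing = w["DOI"].lower()
        # Use a set so a duplicated reference entry counts once per citing paper.
        cited = {
            ref["DOI"].lower()
            for ref in w.get("reference", [])
            if "DOI" in ref
        }
        # Keep only references that point back into the corpus.
        for doi in cited & corpus:
            if doi != citing:  # ignore self-references
                counts[doi] += 1
    return counts
```

The resulting counter is the within-corpus in-degree, which is one of the signals the table above lists for selecting landmark papers. `counts.most_common(10)` gives a quick frequency table.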
Lab mode

| Script | Phase | Purpose |
|---|---|---|
| lab_corpus.py | L1 | Ingest a lab's full publication corpus from OpenAlex by author id (--search to resolve the id). Enrich abstracts before classifying: OpenAlex metadata alone is insufficient. |

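OpenAlex does not return abstracts as plain text. It returns an `abstract_inverted_index` that maps each word to its positions, and the field is often null. The sketch below rebuilds the text and flags works whose abstract is missing or too short to classify. These are the works that need enrichment from another source. Both function names and the 50-word threshold are hypothetical choices for illustration and are not lab_corpus.py's logic.

```python
from typing import Optional


def abstract_from_inverted_index(index: Optional[dict]) -> str:
    """Rebuild plain text from OpenAlex's {word: [positions]} mapping."""
    if not index:
        return ""
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit words in position order to recover the original sequence.
    return " ".join(by_position[i] for i in sorted(by_position))


def needs_enrichment(work: dict, min_words: int = 50) -> bool:
    """Hypothetical check: flag works whose OpenAlex abstract is missing or short."""
    text = abstract_from_inverted_index(work.get("abstract_inverted_index"))
    return len(text.split()) < min_words
```

For example, `{"Neural": [0], "nets": [1, 4], "are": [2], "deep": [3]}` rebuilds to `"Neural nets are deep nets"`.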
Search prompt & PDFs (opt-in)

| Script | Phase | Purpose |
|---|---|---|
| search_prompt_template.md | 2 / 2b | Prompt template for the literature-search subagent (forward search + antecedents). |
| download.py | 4 (opt-in) | Multi-source PDF downloader (arXiv → Unpaywall → EuropePMC). |
| reconcile_downloads.py | 4 (opt-in) | File manually downloaded PDFs from ~/Downloads into the per-topic dir with the right slug. |

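The arXiv → Unpaywall → EuropePMC order in download.py describes a fallback chain. The sketch below shows that pattern with pluggable fetchers, so it makes no real HTTP calls. `download_first` and the fetcher functions are hypothetical stand-ins rather than download.py's API. The `%PDF` magic-byte check is an assumed safeguard against hosts that return an HTML landing page instead of the file.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a paper record and returns raw bytes, or None on a miss.
Fetcher = Callable[[dict], Optional[bytes]]


def download_first(
    paper: dict, sources: Sequence[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, bytes]]:
    """Try each source in order; return (source_name, pdf_bytes) for the first real PDF."""
    for name, fetch in sources:
        try:
            data = fetch(paper)
        except Exception:
            # Treat network or parsing errors like a miss and fall through.
            continue
        # Some hosts answer with an HTML page; only accept bytes that start like a PDF.
        if data and data.startswith(b"%PDF"):
            return name, data
    return None
```

Keeping the order in one list makes it easy to reorder sources or add one without changing the loop.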
Shared helper

| Script | Purpose |
|---|---|
| common.py | HTTP with exponential backoff, JSON read/write, DOI/arXiv id parsing, author-name splitting, and the canonical APA builder shared by the other tools. |

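Here is a minimal sketch of exponential backoff, written as a generic retry wrapper so it can be shown without network access. It is not common.py's implementation. The name `with_backoff`, its parameters, and the 10% jitter are illustrative assumptions. The default of retrying on any `Exception` is a simplification: real HTTP code should usually retry only on transient failures such as HTTP 429 or 5xx responses and timeouts.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call`, doubling the wait each time; re-raise after the last attempt."""
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise
            # Waits grow as base_delay * 2**attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus up to 10% random jitter so parallel clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the wrapper testable. A real call would wrap something like `lambda: urllib.request.urlopen(url, timeout=30).read()`.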
Read the PLAYBOOK alongside the tools
PLAYBOOK.md is the operating manual the agent follows. It documents the order, the guardrails, and the hard-won lessons (mojibake handling, compound-surname fixes, OpenAlex undercount tells, and more) that the scripts encode.
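One common form of mojibake in bibliographic metadata is UTF-8 text that was decoded as Windows-1252. For example, "Müller" becomes "MÃ¼ller". The sketch below shows the standard round-trip repair for that one case. It is an assumption about the kind of fix involved and does not reproduce what the PLAYBOOK prescribes. The name `fix_mojibake` and the marker heuristic are hypothetical. The ftfy library covers many more cases.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Windows-1252 (e.g. 'MÃ¼ller' -> 'Müller')."""
    # Only attempt a repair when typical mojibake lead characters are present.
    if not any(marker in text for marker in ("Ã", "Â", "â€")):
        return text
    try:
        # Re-encode to the bytes that were mis-decoded, then decode them correctly.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip (e.g. a genuine "Ã" as in "SÃO"); leave it alone.
        return text
```

The try/except guard matters because legitimate text such as Portuguese "SÃO PAULO" contains the same marker characters but does not round-trip, so it is returned unchanged.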