Tools reference

Every script lives in tools/. Each is small, standalone, and meant to be read and adapted: scaffolding, not a framework. They share one helper module, common.py (HTTP with backoff, JSON I/O, DOI/arXiv parsing, APA building). Run any script with --help.

Core pipeline

| Script | Phase | Purpose |
| --- | --- | --- |
| verify.py | 3 | Verify every citation via PMC/PubMed/CrossRef + the arXiv API (preprints get a real verdict, not a misleading NOT-FOUND). Catches ~25% search-agent fabrications. |
| references.py | 3f | Rebuild every reference from its verified DOI/arXiv id into canonical APA-7 (full authors, particles, casing, real venue incl. bioRxiv/PsyArXiv). --audit is a hard gate. Both modes. |
| spreadsheet.py | 5 | Build/rebuild the .xlsx from the accumulated JSON rows; auto-adds Cite / Family columns when present. |
| citations.py | 5b | Per-paper citation counts from OpenAlex (primary) + Semantic Scholar by DOI, with undercount reconciliation. |
| xref.py | 6 | Cross-citation frequency table from the corpus's own CrossRef reference lists. |
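
To make the cross-citation table concrete, here is a minimal sketch of the counting step, assuming the reference lists have already been fetched and reduced to DOIs. The function name `cross_citation_counts`, the input shape, and the DOIs are hypothetical and not taken from xref.py; the real script may normalize and rank differently.

```python
from collections import Counter


def cross_citation_counts(reference_lists):
    """Count how often each corpus paper is cited by other corpus papers.

    reference_lists maps a citing DOI to the list of DOIs it references
    (assumed input shape, for illustration only).
    """
    corpus = {doi.lower() for doi in reference_lists}
    counts = Counter()
    for citing, cited_dois in reference_lists.items():
        # De-duplicate within one reference list so each citer counts once.
        for cited in {d.lower() for d in cited_dois}:
            # Ignore references that point outside the corpus, and self-citations.
            if cited in corpus and cited != citing.lower():
                counts[cited] += 1
    # Most frequently cited first.
    return counts.most_common()


# Made-up DOIs, purely to show the shape of the data.
corpus_refs = {
    "10.1000/a": ["10.1000/b", "10.1000/c", "10.9999/outside"],
    "10.1000/b": ["10.1000/c"],
    "10.1000/c": [],
}
table = cross_citation_counts(corpus_refs)
```

Only references that resolve to another paper in the corpus are counted, which is what makes the result a within-corpus frequency table rather than a global citation count.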

Families, figure & review

| Script | Phase | Purpose |
| --- | --- | --- |
| families.py | 6b | Validate / stamp / render a theoretical-family grouping (agent proposes, you approve the definitions). |
| families_figure.py | 6b | Interactive HTML lineage figure (+ svg/png/pdf); landmark dots auto-selected by citation count, within-corpus in-degree, and lab authorship. |
| family_prompt_template.md | 6b | Two-step propose → assign prompt for the families pass. |
| review_paper.py | 7 | Render an AI-authored narrative review .docx from content.json (prose) + rows.json (canonical references). Mechanics only; prose is authored separately, after the priority audit. |

Lab mode

| Script | Phase | Purpose |
| --- | --- | --- |
| lab_corpus.py | L1 | Ingest a lab's full publication corpus from OpenAlex by author id (--search to resolve the id). Enrich abstracts before classifying; OpenAlex metadata alone is insufficient. |

Search prompt & PDFs (opt-in)

| Script | Phase | Purpose |
| --- | --- | --- |
| search_prompt_template.md | 2 / 2b | Prompt template for the literature-search subagent (forward search + antecedents). |
| download.py | 4 (opt-in) | Multi-source PDF downloader (arXiv → Unpaywall → EuropePMC). |
| reconcile_downloads.py | 4 (opt-in) | File manually-downloaded PDFs from ~/Downloads into the per-topic dir with the right slug. |
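
The arXiv → Unpaywall → EuropePMC ordering is a fallback chain: try each source in turn and keep the first real PDF. A minimal sketch of that pattern follows, assuming each source is wrapped in a function that returns bytes or `None`. The names `first_available` and the `fake_*` fetchers are hypothetical stand-ins (no network calls are made), and download.py itself may be structured differently.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes an identifier and returns PDF bytes, or None if it has nothing.
Fetcher = Callable[[str], Optional[bytes]]


def first_available(identifier: str, sources: Iterable[Tuple[str, Fetcher]]):
    """Try each (name, fetcher) in order; return (name, data) for the first hit."""
    for name, fetch in sources:
        try:
            data = fetch(identifier)
        except Exception:
            # A failing source should not stop the chain; move on to the next.
            continue
        # PDF files begin with the magic bytes "%PDF"; this rejects HTML error pages.
        if data and data.startswith(b"%PDF"):
            return name, data
    return None, None


# Stand-in fetchers that simulate a miss, an outage, and a hit.
def fake_arxiv(identifier):
    return None


def fake_unpaywall(identifier):
    raise TimeoutError("simulated outage")


def fake_europepmc(identifier):
    return b"%PDF-1.7 stub"


source, pdf = first_available(
    "10.1000/example",  # made-up DOI
    [("arxiv", fake_arxiv), ("unpaywall", fake_unpaywall), ("europepmc", fake_europepmc)],
)
```

Passing the sources in as a list keeps the priority order explicit and easy to change, and lets the chain be exercised without touching the network.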

Shared helper

| Script | Purpose |
| --- | --- |
| common.py | HTTP with exponential backoff, JSON read/write, DOI/arXiv id parsing, author-name splitting, and the canonical APA builder shared by the other tools. |
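
For readers adapting the helper, two of these responsibilities can be sketched in a few lines: telling a DOI from an arXiv id, and retrying a call with exponentially growing delays. This is an illustration under stated assumptions, not the contents of common.py. The names `parse_identifier` and `with_backoff`, the regular expressions, and the default retry settings are all hypothetical.

```python
import re
import time

# Illustrative patterns only. Old-style arXiv ids (e.g. "hep-th/9901001")
# are not handled here.
ARXIV_RE = re.compile(
    r"arxiv(?:\.org/(?:abs|pdf)/|:)\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE
)
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def parse_identifier(text):
    """Return ("arxiv", id) or ("doi", id), or (None, None) if nothing matches."""
    match = ARXIV_RE.search(text)
    if match:
        # Drop the version suffix so v1 and v2 map to the same paper.
        return "arxiv", match.group(1)
    match = DOI_RE.search(text)
    if match:
        # DOIs are case-insensitive; strip trailing sentence punctuation.
        return "doi", match.group(0).rstrip(".,;)").lower()
    return None, None


def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting. A production version would likely retry only on specific errors (timeouts, HTTP 429/5xx) and add jitter.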

Read the PLAYBOOK alongside the tools

The PLAYBOOK.md is the operating manual the agent follows — it documents the order, the guardrails, and the hard-won lessons (mojibake handling, compound-surname fixes, OpenAlex undercount tells, and more) that the scripts encode.