Corpus Architecture — What's Complete, What's Missing
The "same week" investigation surfaced a real gap. It is documented here so that future research starts from correct expectations.
What's accessible from this prep environment
| Corpus | Status | Notes |
|---|---|---|
| bp-corpus | ✓ Indexed | BibleProject text records as JSON |
| bp-transcripts | ✓ Indexed | Whisper overlay for empty bp-corpus records (source priority: corrected → raw/voilib → raw/yt-dlp) |
| bp-embeddings | ✓ Indexed | Voyage 3.5 over 87,596 fragments. Covers BOTH bp-corpus AND bp-transcripts (greedy-packed whisper utterances) |
| bp-wiki dictionary | ✓ Indexed | 1,699 candidates, 157 method entries, 745 patterns |
| voilib live API | ✓ Accessible | https://voilib.holyspirit.dev/service/media/query against 9 channels |
| CREATR-corpus | ✓ Indexed | Hillsong/worship content; mostly not relevant to BP-style biblical work |
| ICOC Alpha Omega channel | ⚠ Partial | YouTube playlist transcribed (33/36 videos); 3 missing-caption videos require whisper transcription |
Voilib channel coverage
The user's voilib.holyspirit.dev instance indexes 9 podcast channels:
| Channel | ID | Type |
|---|---|---|
| Bridgetown Audio Podcast | fafcd003-1c9f-4cb9-80d8-3627d4054168 | Sunday teachings |
| The Handlebar Podcast | 87acf436-c77e-4c76-b2fd-1213c629068f | — |
| BibleProject | 21e903b3-9734-4f86-8582-a4557ea41887 | Podcast only (NOT classroom) |
| Practicing the Way | ac323dc0-3dc4-4a1b-8a60-091762d4530f | Comer's other podcast |
| Rule of Life | f6bd54a9-17e8-44e0-b72a-8259f966ca74 | Comer's series |
| The Familiar Stranger Podcast | e426cb28-5738-475f-9754-a95108e97394 | — |
| Praying Like Monks Living Like Fools | fe880c33-46b4-4336-92c3-4cddd6246f8d | — |
| Being Known Podcast | eabd9848-fc86-43b2-ab03-08b414726c30 | — |
| Exploring My Strange Bible | 8a3d016f-e4b8-43b9-bc14-96453cbb810a | Tim Mackie's older personal podcast (Door of Hope era) |
Critical: the BibleProject voilib channel is podcasts only — it does NOT include BP classroom content.
Local bp-corpus classroom coverage
BP runs a "Classroom" series for paid subscribers. The local bp-corpus has partial coverage:
| Classroom | Local sessions | Series total (web) | Coverage |
|---|---|---|---|
| 1-corinthians (Lucy Peppiatt) | varies | varies | partial |
| Abraham | varies | — | partial |
| Adam to Noah | 6 sessions (1, 8, 15, 18, 25, 30) | 30+ | very partial |
| Art of Biblical Words | varies | — | partial |
| Ephesians | 11 sessions (1, 5, 8, 10, 13, 16, 19, 22, 24, 28, 32) | 35 | 31% coverage — 24 missing |
| Exodus Overview (Carmen Imes) | varies | — | partial |
| Ezekiel | varies | — | partial |
| Heaven and Earth | varies | — | partial |
| Introduction to Hebrew Bible | varies | — | partial |
| Jacob | varies | — | partial |
| Jonah | varies | — | partial |
| Joseph | varies | — | partial |
| Messianic Torah | varies | — | partial |
| Noah to Abraham | varies | — | partial |
| Rise of the Messiah | varies | — | partial |
The Ephesians gap (24 missing sessions) was the architectural issue that emerged from the same-week investigation. Sessions 2, 3, 4, 6, 7, 9, 11, 12, 14, 15, 17, 18, 20, 21, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35 — none of them are in bp-corpus, bp-transcripts, or voilib. The verbatim "same week" Tim quote may live in any one of those gaps.
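The gap list can be re-derived from the local session numbers and the series total recorded in the coverage table. The following is a minimal sketch: the function name `missing_sessions` is illustrative and not part of any existing script, and the total of 35 is the web figure from the table, not something independently verified.

```python
def missing_sessions(local: set[int], series_total: int) -> list[int]:
    """Return session numbers in 1..series_total that are absent locally."""
    return [n for n in range(1, series_total + 1) if n not in local]


# Ephesians sessions present in the local bp-corpus (from the coverage table).
ephesians_local = {1, 5, 8, 10, 13, 16, 19, 22, 24, 28, 32}
ephesians_total = 35  # series total as listed on the web (assumed accurate)

gaps = missing_sessions(ephesians_local, ephesians_total)
coverage = len(ephesians_local) / ephesians_total

print(len(gaps))          # 24
print(f"{coverage:.0%}")  # 31%
```

The same call works for any other classroom once its local session numbers and series total are known, which would let the "varies" rows be replaced with concrete figures.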
Honest research posture going forward
What "not in the corpus" actually means
Wrong inference: "Tim doesn't say X." Right inference: "X is not findable in any indexed source I have access to."
These are different claims. The Ephesians classroom gap shows that only the second is the honest read. For any future claim of the form "BP doesn't say X," qualify it: "BP doesn't say X in the indexed corpus; it might be in a classroom session that is not indexed, such as one of the 24 missing Ephesians sessions."
Verbatim verification protocol for load-bearing quotes
For any quote that will be load-bearing in the sermon:
- Pulled from a record I read in full? ✓ Trust verbatim. (Apply to: [podcast:firstborn-creation], [podcast:theme-god-e18-who-did-paul-think-jesus-was], [video:art-biblical-poetry] study notes, [podcast:pen-parchment-and-people], anything in expansion/_raw/records/.)
- Voyage semantic-search snippet only? ⚠ Verify against full transcript. (The Galatians/Colossians slip is the proof of why this matters.)
- Tim said it in a classroom session but it is not findable? ⚠ Cite the substance, not the verbatim wording, or look it up in the BP classroom directly with a login.
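The "verify against full transcript" step can be partly mechanized. The following is a minimal sketch, assuming the full transcript is already loaded as a string; `normalize` and `is_verbatim` are hypothetical helper names, and the sample text below is placeholder wording, not a quotation from any BP record.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, fold curly quotes to straight ones, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(
        str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})
    )
    return re.sub(r"\s+", " ", text).strip().lower()


def is_verbatim(quote: str, transcript: str) -> bool:
    """True if the normalized quote is a contiguous span of the transcript."""
    return normalize(quote) in normalize(transcript)


# Placeholder transcript text with a line break inside the quoted span.
sample = "He is the image of the\ninvisible God, the firstborn over all creation."

print(is_verbatim("the image of the invisible God", sample))  # True
print(is_verbatim("the image of the unseen God", sample))     # False
```

A `False` result only says the wording does not match the transcript that was checked. It does not distinguish a paraphrased snippet from a quote that lives in a different record, so a miss should downgrade the quote to "cite the substance" and not to "never said."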
What a future "research-against-corpus" pass should fix
If you (or whoever maintains the corpus) want to close the Ephesians gap:
- The bp-corpus discovery script targets BP's published API. The classroom sessions might require a different discovery path (paid subscription cookies?).
- Run the discovery + fetch against class__ephesians__* to surface the 24 missing sessions.
- If they're not API-accessible, capture them via browser automation against bibleproject.com/classroom/ephesians/sessions/<N> with logged-in cookies.
For now, the gap is a known limitation. Plan accordingly when claiming what BP "doesn't say."
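If the browser-automation route is taken, the capture targets follow directly from the missing session numbers and the URL pattern given above. The following is a minimal sketch that only builds the work list and fetches nothing; the `https://` scheme and the names `URL_TEMPLATE` and `capture_targets` are assumptions for illustration, and whether these pages are reachable with subscription cookies is untested.

```python
# The 24 Ephesians sessions absent from bp-corpus, bp-transcripts, and voilib.
MISSING_EPHESIANS = [2, 3, 4, 6, 7, 9, 11, 12, 14, 15, 17, 18,
                     20, 21, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35]

# URL pattern as written in this document; the https:// scheme is assumed.
URL_TEMPLATE = "https://bibleproject.com/classroom/ephesians/sessions/{n}"


def capture_targets(sessions: list[int]) -> list[tuple[int, str]]:
    """Pair each session number with the page a logged-in browser would visit."""
    return [(n, URL_TEMPLATE.format(n=n)) for n in sessions]


targets = capture_targets(MISSING_EPHESIANS)
for session, url in targets[:3]:
    print(session, url)
```

Keeping the session number next to each URL makes it straightforward to record which sessions were actually captured and to re-run the gap calculation afterwards.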
What's safe to claim from the indexed corpus
For Col 1:15-20 specifically, the indexed corpus is dense and reliable for:
- The hymn's content (firstborn-creation podcast, theme-god-e18 podcast — both read in full)
- Genre classification (art-biblical-poetry video study notes — confirmed)
- Image-of-God theology (multiple BP records, dictionary entries)
- Firstborn theology (firstborn series, multiple records)
- Reconciliation cluster (extensive)
- Eph + Col coordination substance (Pen, Parchment, and People — verbatim paragraph overlap claim)
For Col 1:15-20 specifically, the indexed corpus is thinner and requires care for:
- Specific Tim quotes about Eph/Col timing (the "same week" gap)
- BP classroom-only material (24 Ephesians sessions missing)
- Pre-Pauline Greek literature (TLG-class queries — not in any indexed source)
- Wisdom of Solomon 1:7 → Col 1:17 link (my own conjecture, not BP-attested)
- Door of Hope sermons (Tim's earlier pastor-era teaching beyond the 5 Exploring My Strange Bible episodes)
Bottom line
The indexed corpus is rich and useful but incomplete. The synthesis work in expansion/synthesis/01_sermon_prep_full.md is well-grounded in what IS indexed, with honest flags where claims rest on my own conjecture or standard scholarship rather than BP material.
For pulpit claims, the verification log at expansion/_verification/agent_v_report.md documents what was tested and at what confidence level. Use that as your final pre-pulpit check.
When the local corpus gap closes (24 Ephesians sessions, plus other classroom gaps), the verification footprint of any claim deepens. Until then, "not findable in the indexed corpus" ≠ "verifiably false."