teaching/sermons/col-1-15-20/expansion/synthesis/05_corpus_architecture_notes.md

Corpus Architecture — What's Complete, What's Missing

The "same week" investigation surfaced a real gap. Documenting honestly so future research has correct expectations.


What's accessible from this prep environment

| Corpus | Status | Notes |
| --- | --- | --- |
| bp-corpus | ✓ Indexed | BibleProject text records as JSON |
| bp-transcripts | ✓ Indexed | Whisper-overlay for empty bp-corpus records (corrected → raw/voilib → raw/yt-dlp priority) |
| bp-embeddings | ✓ Indexed | Voyage 3.5 over 87,596 fragments. Covers BOTH bp-corpus AND bp-transcripts (greedy-packed whisper utterances) |
| bp-wiki dictionary | ✓ Indexed | 1,699 candidates, 157 method entries, 745 patterns |
| voilib live API | ✓ Accessible | https://voilib.holyspirit.dev/service/media/query against 9 channels |
| CREATR-corpus | ✓ Indexed | Hillsong/worship content; mostly not relevant to BP-style biblical work |
| ICOC Alpha Omega channel | ⚠ Partial | YouTube playlist transcribed (33/36 videos); 3 missing-caption videos require whisper transcription |
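
Two mechanisms named in the notes above (the overlay priority for bp-transcripts and the greedy packing of whisper utterances for bp-embeddings) can be sketched as follows. This is a minimal illustration, not the actual pipeline: the function names, the dict-of-sources shape, and the character budget are all assumptions of mine. The real packer most likely budgets by tokens rather than characters.

```python
from typing import Optional

# Priority order taken from the bp-transcripts note; treating these labels
# as dictionary keys is an assumption for illustration.
SOURCE_PRIORITY = ["corrected", "raw/voilib", "raw/yt-dlp"]


def pick_transcript(available: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (source, text) for the highest-priority non-empty transcript."""
    for source in SOURCE_PRIORITY:
        text = available.get(source, "").strip()
        if text:
            return source, text
    return None


def greedy_pack(utterances: list[str], max_chars: int = 1200) -> list[str]:
    """Pack consecutive utterances into fragments of at most max_chars.

    max_chars is a hypothetical budget. An utterance longer than the
    budget becomes its own oversized fragment rather than being split.
    """
    fragments: list[str] = []
    current = ""
    for utt in utterances:
        utt = utt.strip()
        if not utt:
            continue
        candidate = f"{current} {utt}" if current else utt
        if current and len(candidate) > max_chars:
            # Adding this utterance would overflow: close the fragment.
            fragments.append(current)
            current = utt
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments
```

The point of the priority list is that a hand-corrected transcript always wins over a raw one, and a record with no usable source returns `None` instead of an empty string, so a gap stays visible as a gap.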

Voilib channel coverage

The user's voilib.holyspirit.dev instance indexes 9 podcast channels:

| Channel | ID | Type |
| --- | --- | --- |
| Bridgetown Audio Podcast | fafcd003-1c9f-4cb9-80d8-3627d4054168 | Sunday teachings |
| The Handlebar Podcast | 87acf436-c77e-4c76-b2fd-1213c629068f | |
| BibleProject | 21e903b3-9734-4f86-8582-a4557ea41887 | Podcast only; NOT classroom |
| Practicing the Way | ac323dc0-3dc4-4a1b-8a60-091762d4530f | Comer's other podcast |
| Rule of Life | f6bd54a9-17e8-44e0-b72a-8259f966ca74 | Comer's series |
| The Familiar Stranger Podcast | e426cb28-5738-475f-9754-a95108e97394 | |
| Praying Like Monks Living Like Fools | fe880c33-46b4-4336-92c3-4cddd6246f8d | |
| Being Known Podcast | eabd9848-fc86-43b2-ab03-08b414726c30 | |
| Exploring My Strange Bible | 8a3d016f-e4b8-43b9-bc14-96453cbb810a | Tim Mackie's older personal podcast (Door of Hope era) |

Critical: the BibleProject voilib channel is podcasts only — it does NOT include BP classroom content.


Local bp-corpus classroom coverage

BP runs a "Classroom" series for paid subscribers. The local bp-corpus has partial coverage:

| Classroom | Local sessions | Series total (web) | Coverage |
| --- | --- | --- | --- |
| 1-corinthians (Lucy Peppiatt) | varies | varies | partial |
| Abraham | varies | | partial |
| Adam to Noah | 6 sessions (1, 8, 15, 18, 25, 30) | 30+ | very partial |
| Art of Biblical Words | varies | | partial |
| Ephesians | 11 sessions (1, 5, 8, 10, 13, 16, 19, 22, 24, 28, 32) | 35 | 31% coverage; 24 missing |
| Exodus Overview (Carmen Imes) | varies | | partial |
| Ezekiel | varies | | partial |
| Heaven and Earth | varies | | partial |
| Introduction to Hebrew Bible | varies | | partial |
| Jacob | varies | | partial |
| Jonah | varies | | partial |
| Joseph | varies | | partial |
| Messianic Torah | varies | | partial |
| Noah to Abraham | varies | | partial |
| Rise of the Messiah | varies | | partial |

The Ephesians gap (24 missing sessions) was the architectural issue that emerged from the same-week investigation. Sessions 2, 3, 4, 6, 7, 9, 11, 12, 14, 15, 17, 18, 20, 21, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35 — none of them are in bp-corpus, bp-transcripts, or voilib. The verbatim "same week" Tim quote may live in any one of those gaps.
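
The Ephesians arithmetic is easy to re-derive, which is worth doing before repeating the numbers anywhere. A minimal sketch, assuming sessions are numbered 1 through the series total with no skipped numbers (the function names are mine):

```python
def missing_sessions(local: set[int], total: int) -> list[int]:
    """Session numbers in 1..total that are absent from the local corpus."""
    return sorted(set(range(1, total + 1)) - local)


def coverage_pct(local: set[int], total: int) -> float:
    """Percentage of the series present locally."""
    present = local & set(range(1, total + 1))
    return 100 * len(present) / total


# Figures taken from the Ephesians row of the coverage table.
EPHESIANS_LOCAL = {1, 5, 8, 10, 13, 16, 19, 22, 24, 28, 32}
EPHESIANS_TOTAL = 35

if __name__ == "__main__":
    gaps = missing_sessions(EPHESIANS_LOCAL, EPHESIANS_TOTAL)
    print(len(gaps), "missing:", gaps)
    print(f"{coverage_pct(EPHESIANS_LOCAL, EPHESIANS_TOTAL):.0f}% coverage")
```

Run against the 11 local sessions and a series total of 35, this reproduces the 24 missing session numbers listed above and the 31% coverage figure.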


Honest research posture going forward

What "not in the corpus" actually means

Wrong inference: "Tim doesn't say X." Right inference: "X is not findable in any indexed source I have access to."

These are different claims. The Ephesians classroom gap shows that only the second is defensible. For any future claim of the form "BP doesn't say X," qualify it: "BP doesn't say X in the indexed corpus; it might be in one of the 24 missing Ephesians classroom sessions or in another classroom gap."

Verbatim verification protocol for load-bearing quotes

For any quote that will be load-bearing in the sermon:

  1. Pulled from a record I read in full? ✓ Trust verbatim. (Apply to: [podcast:firstborn-creation], [podcast:theme-god-e18-who-did-paul-think-jesus-was], [video:art-biblical-poetry] study notes, [podcast:pen-parchment-and-people], anything in expansion/_raw/records/.)
  2. Voyage semantic-search snippet only? ⚠ Verify against full transcript. (The Galatians/Colossians slip is the proof of why this matters.)
  3. Tim said it in classroom but not findable? ⚠ Cite the substance, not the verbatim. Or look up the BP classroom directly with login.
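
The three cases above amount to a small lookup from a quote's provenance to what I'm allowed to do with it. A sketch of that lookup, useful mainly as a checklist for tagging quotes in the prep notes; the enum, the policy table, and the `check_quote` helper are hypothetical names, not part of any existing tooling:

```python
from enum import Enum


class Provenance(Enum):
    FULL_RECORD = "read the full record"
    SEARCH_SNIPPET = "semantic-search snippet only"
    RECALLED_UNFOUND = "recalled from classroom, not findable"


# (may quote verbatim?, required action), following the three cases above.
POLICY = {
    Provenance.FULL_RECORD: (True, "trust verbatim"),
    Provenance.SEARCH_SNIPPET: (False, "verify against full transcript"),
    Provenance.RECALLED_UNFOUND: (
        False,
        "cite the substance, or check the BP classroom directly",
    ),
}


def check_quote(label: str, provenance: Provenance) -> str:
    """Return a one-line status for a load-bearing quote."""
    verbatim_ok, action = POLICY[provenance]
    flag = "OK" if verbatim_ok else "HOLD"
    return f"[{flag}] {label}: {action}"


if __name__ == "__main__":
    print(check_quote("podcast:firstborn-creation", Provenance.FULL_RECORD))
    print(check_quote("same-week quote", Provenance.RECALLED_UNFOUND))
```

Anything that comes back `HOLD` should not go into the manuscript in quotation marks until its action has been carried out.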

What a future "research-against-corpus" pass should fix

If you (or whoever maintains the corpus) want to close the Ephesians gap:

  1. The bp-corpus discovery script targets BP's published API. The classroom sessions might require a different discovery path (paid subscription cookies?).
  2. Run the discovery + fetch against class__ephesians__* to surface the 24 missing sessions.
  3. If they're not API-accessible, capture them via browser automation against bibleproject.com/classroom/ephesians/sessions/<N> with logged-in cookies.
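
For step 3, the capture targets can be listed up front so that whichever tool does the fetching has a fixed work queue. The sketch below only builds that manifest and makes no network requests. The URL pattern is copied from step 3 and is unverified; the `https://` scheme, the manifest fields, and the function name are my assumptions.

```python
# Pattern from step 3 above; the scheme is assumed, and the path itself
# has not been checked against the live site.
URL_TEMPLATE = "https://bibleproject.com/classroom/ephesians/sessions/{n}"

# Sessions already present in bp-corpus, per the coverage table.
EPHESIANS_LOCAL = {1, 5, 8, 10, 13, 16, 19, 22, 24, 28, 32}
EPHESIANS_TOTAL = 35


def capture_manifest(
    local: set[int], total: int, template: str = URL_TEMPLATE
) -> list[dict]:
    """One pending entry per session that is not in the local corpus."""
    return [
        {"session": n, "url": template.format(n=n), "status": "pending"}
        for n in range(1, total + 1)
        if n not in local
    ]


if __name__ == "__main__":
    for entry in capture_manifest(EPHESIANS_LOCAL, EPHESIANS_TOTAL):
        print(entry["session"], entry["url"], entry["status"])
```

Keeping a `status` field per session means a partial capture run can be resumed, and the manifest doubles as a record of which gaps are still open.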

For now, the gap is a known limitation. Plan accordingly when claiming what BP "doesn't say."


What's safe to claim from the indexed corpus

For Col 1:15-20 specifically, the indexed corpus is dense and reliable for:

For Col 1:15-20 specifically, the indexed corpus is thinner / requires care for:


Bottom line

The indexed corpus is rich and useful but incomplete. The synthesis work in expansion/synthesis/01_sermon_prep_full.md is well-grounded in what IS indexed, with honest flags where claims rest on my own conjecture or standard scholarship rather than BP material.

For pulpit claims, the verification log at expansion/_verification/agent_v_report.md documents what was tested and at what confidence level. Use that as your final pre-pulpit check.

When the local corpus gap closes (24 Ephesians sessions, plus other classroom gaps), the verification footprint of any claim deepens. Until then, "not findable in the indexed corpus" does not mean "verifiably false."