The problem
A solid-tumor NGS report is 300 to 500 pages. It documents variant calls, VAF distributions, coverage metrics, copy-number data, fusion analysis, and page after page of negative findings. That structure is optimized for regulatory completeness and lab workflow traceability. It is not optimized for the oncologist who has ten minutes with the patient and a treatment decision to make.
The April 2026 Northwestern study that compared structured AI-generated summaries against manual physician summarization of NGS reports found that the AI output scored higher on both completeness and clinical relevance. That finding is consistent with what clinicians describe privately: co-occurring mutations that modify treatment response, resistance markers buried in sub-reports, and recruiting trials the oncologist didn't know existed — details the report technically contains but the workflow doesn't surface. The science is ahead of the information design.
Why vector RAG is the wrong tool here
The obvious move with a modern AI pipeline is to embed the NGS report into vector space and retrieve by semantic similarity. That approach works well for general question answering. It fails for oncology on a specific and predictable class of inputs: near-miss variants.
EGFR L858R and EGFR L861Q look identical to a semantic model. Their vector embeddings sit on top of each other. They respond to different drug combinations. A pipeline that retrieves by similarity will confidently return the wrong answer for one of them. No amount of reranking or prompt engineering fixes the root problem: the retrieval layer has no structural knowledge of the difference. We cover this failure mode in depth in Why Vector RAG Fails for Oncology.
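The failure is easy to demonstrate without any ML machinery. Below, a crude character-bigram cosine similarity stands in for an embedding model, and a plain dictionary stands in for a graph edge lookup. The drug mappings are illustrative placeholders, not clinical guidance. The two variant strings score as highly similar, yet exact-key lookup keeps them cleanly apart:

```python
from collections import Counter
from math import sqrt

def bigram_cosine(a: str, b: str) -> float:
    """Crude character-bigram cosine similarity (a stand-in for an
    embedding model; real embeddings conflate these strings even harder)."""
    ca = Counter(a[i:i+2] for i in range(len(a) - 1))
    cb = Counter(b[i:i+2] for i in range(len(b) - 1))
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm

# Exact-key lookup: the graph-style path. Mappings are illustrative only.
EDGES = {
    "EGFR L858R": ["osimertinib"],
    "EGFR L861Q": ["afatinib"],
}

sim = bigram_cosine("EGFR L858R", "EGFR L861Q")
print(f"similarity: {sim:.2f}")  # high: the strings look nearly identical
print(EDGES["EGFR L858R"])       # exact match: distinct, correct-by-construction
print(EDGES["EGFR L861Q"])
```

Similarity-based retrieval ranks these two variants as near-duplicates; an exact-match traversal cannot confuse them, because "close" is not a concept it has.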
What UNMIRI does instead
UNMIRI's pipeline has four layers. Each is deliberately boring and deliberately inspectable.
Layer 1 — PDF extraction
AWS Textract handles the raw OCR. UNMIRI maintains per-lab parsers for the major NGS report formats: FoundationOne CDx, Tempus xT, Caris MI Profile, Illumina TruSight Oncology, and lab-developed panels. Output is structured variant JSON: gene symbol, HGVS nomenclature, variant allele frequency, coverage, and classification.
Extraction is grindy but critical. Per-lab parsers exist because report formats differ in subtle ways that matter downstream. We build them, test them against real samples, and version them.
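The structured variant JSON can be pictured as a small typed record per variant call. This is a sketch; the field names are illustrative, not UNMIRI's actual schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class VariantCall:
    """One row of the structured variant JSON a per-lab parser emits.
    Field names are illustrative, not UNMIRI's production schema."""
    gene: str            # HGNC gene symbol
    hgvs_p: str          # HGVS protein-level nomenclature
    vaf: float           # variant allele frequency, 0.0-1.0
    coverage: int        # read depth at the locus
    classification: str  # lab-reported classification

v = VariantCall(gene="EGFR", hgvs_p="p.L858R", vaf=0.31,
                coverage=842, classification="pathogenic")
print(json.dumps(asdict(v)))
```

A frozen dataclass makes the parser output immutable and comparable, which is what makes per-lab parsers testable against versioned real-sample fixtures.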
Layer 2 — The knowledge graph
The graph is where clinical knowledge lives. It's a Neo4j database encoding typed relationships across four authoritative sources:
- OncoKB — Memorial Sloan Kettering's FDA-recognized precision oncology knowledge base. Every variant-drug edge carries an OncoKB evidence level (1, 2, 3A, 3B, 4, R1, R2).
- ClinVar — NIH's aggregated variant interpretation database. Pathogenicity classifications and submitter-level detail.
- ClinicalTrials.gov — federal trials registry. Variant-level eligibility criteria are parsed and stored as graph edges.
- openFDA drug labels — current indications, contraindications, and boxed warnings for every FDA-approved oncology drug.
The graph is designed to incorporate additional authoritative oncology sources — commercial variant databases, guideline publishers, pharmacogenomics catalogs — as pilot partners require them and licensing is in place.
A concrete traversal for a single variant:
(Variant: EGFR L858R)
├── SENSITIZES_TO ──▶ (Drug: Osimertinib)
│ ↳ evidence_level: Level 1 (OncoKB)
│ ↳ source: FLAURA trial (NEJM 2018)
├── SENSITIZES_TO ──▶ (Drug: Erlotinib + Ramucirumab)
│ ↳ evidence_level: Level 2A (OncoKB)
│ ↳ source: RELAY trial (Lancet Oncol 2019)
└── HAS_OPEN_TRIAL ─▶ (Trial: NCT05667792 · MARIPOSA-2)
↳ eligibility: EGFR L858R + prior osimertinib
↳ status: Actively Enrolling

Queries are Cypher. Same input, same output, every time. No similarity scoring, no ambiguous retrieval, no LLM improvising the clinical answer. If a variant has no edge to a drug in the graph, the pipeline reports no recommendation — which is the correct answer in that case.
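The traversal above can be sketched with a toy in-memory graph. In production this would be a parameterized Cypher query against Neo4j; the node names, edge attributes, and relationship types here mirror the diagram but are illustrative, not UNMIRI's actual schema:

```python
# Toy in-memory stand-in for the Neo4j graph. The production equivalent
# would be a parameterized Cypher query, roughly (schema assumed):
#   MATCH (v:Variant {name: $variant})-[e:SENSITIZES_TO]->(d:Drug)
#   RETURN d.name, e.evidence_level, e.source
GRAPH = {
    "EGFR L858R": [
        {"type": "SENSITIZES_TO", "target": "Osimertinib",
         "evidence_level": "Level 1", "source": "FLAURA trial (NEJM 2018)"},
        {"type": "HAS_OPEN_TRIAL", "target": "NCT05667792",
         "eligibility": "EGFR L858R + prior osimertinib"},
    ],
}

def drug_edges(variant: str) -> list:
    """Deterministic traversal: same input, same output. A variant with no
    edges yields an empty list, which downstream renders as 'no
    recommendation' rather than a guess."""
    return [e for e in GRAPH.get(variant, []) if e["type"] == "SENSITIZES_TO"]

print(drug_edges("EGFR L858R"))  # one Level 1 edge
print(drug_edges("EGFR L861Q"))  # [] -> no recommendation, not a near-miss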
Layer 3 — Deterministic template rendering
This is the most deliberate architectural choice UNMIRI makes: the 2-page cheat sheet is generated by typed templates from structured graph output, not written by a language model. Every sentence is rendered from a data field with a verified citation. Templates can't hallucinate. They can't invent a drug name, misattribute an evidence tier, or fabricate a trial NCT ID. In clinical contexts, determinism is a feature.
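One way to picture "typed templates from structured graph output" is Python's `string.Template`, whose `substitute` raises a hard error on any missing field instead of improvising text. This is an illustrative sketch, not UNMIRI's actual renderer:

```python
from string import Template

# Every placeholder must be supplied from graph output. A missing field is
# a KeyError, not invented prose -- the failure mode is an error, never a
# hallucination. (Illustrative template, not UNMIRI's production renderer.)
RECOMMENDATION = Template("$drug -- $evidence_level (OncoKB), source: $source")

row = {"drug": "Osimertinib", "evidence_level": "Level 1",
       "source": "FLAURA trial (NEJM 2018)"}
print(RECOMMENDATION.substitute(row))

try:
    RECOMMENDATION.substitute({"drug": "Osimertinib"})  # fields missing
except KeyError:
    print("missing field -> hard error, not invented text")
```

The design choice is the asymmetry of failure: a template that crashes on incomplete data is auditable; a model that smooths over incomplete data is not.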
Layer 4 — Narrow LLM use
LLMs help in exactly two places in the pipeline:
- Extraction edge cases — where the per-lab parser hits an unusual format and the structured output needs disambiguation.
- Long-tail variants — where the graph has no exact match and the LLM is asked to summarize literature context (with the output flagged at a lower confidence band).
LLM calls run on Anthropic's HIPAA-ready API tier with a signed BAA. Prompts carry only de-identified variant data, never patient identifiers. If an LLM contributed to an output, the output is clearly marked and delivered with a confidence annotation. The default clinical path is LLM-free.
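The routing described above amounts to a small decision function. This sketch assumes a simple two-band confidence scheme and a hypothetical long-tail variant name; neither is UNMIRI's production logic:

```python
def route(variant: str, graph: dict) -> dict:
    """Choose the rendering path. The default clinical path is LLM-free;
    the LLM fallback is explicitly flagged at a lower confidence band.
    (Illustrative routing logic, not UNMIRI's production code.)"""
    if variant in graph:
        return {"path": "template", "llm_used": False, "confidence": "standard"}
    return {"path": "llm_literature_summary", "llm_used": True,
            "confidence": "lower-band"}

graph = {"EGFR L858R": ["Osimertinib"]}
print(route("EGFR L858R", graph))
print(route("GENE_X p.A100B", graph))  # hypothetical long-tail variant
```

The invariant worth testing is that `llm_used` can never be true on the exact-match path, so an unflagged LLM contribution is impossible by construction.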
Sample output
The best way to understand the output is to see it. The sample report shows a fully rendered 2-page Actionable Insight for an NSCLC adenocarcinoma case (EGFR L858R, TP53 co-mutation, PD-L1 <1%) — top 3 recommendations with evidence levels, checkpoint-inhibitor contraindication flagged, one matched open trial, and every claim cited. Synthetic data, real rendering.
What UNMIRI is not
Healthy skepticism about AI in clinical settings is warranted. Here is what UNMIRI does not do:
- UNMIRI is not a diagnostic device and does not make diagnoses.
- UNMIRI is not a substitute for oncologist judgment. Every output is a decision-support aid to be reviewed and countersigned by a licensed physician before clinical use.
- UNMIRI does not currently integrate with EHRs. Integration is at the LIMS layer via REST API.
- UNMIRI does not process PHI on the marketing site. Pilot and production use requires a signed BAA and deployment into a HIPAA-ready environment.
Where we are
UNMIRI is actively onboarding 2–3 regional diagnostic labs in the mid-Atlantic as Q2 2026 design partners for retrospective pilots: you send 5–10 de-identified historical NGS reports, we deliver the cheat sheets, you evaluate the output against your existing workflow. If that sounds like something your lab would consider, we'd like to talk.