Clinical intelligence

OncoKB + ClinVar Matching Engine

Unified variant classification across OncoKB, ClinVar, COSMIC, ClinicalTrials.gov, and openFDA — via knowledge graph traversal, not sequential API calls. Aggregate evidence tiers, surface disagreements between sources, and cite every claim back to the specific knowledge-base entry that produced it.

OncoKB Level 1–4 · ClinVar clinical significance · COSMIC mutation context · Graph traversal

Why manual multi-KB matching fails

A bioinformatician classifying a complex case typically opens 4–6 browser tabs: OncoKB for therapy tiers, ClinVar for clinical significance, COSMIC for mutation prevalence, openFDA for label-level drug status, CIViC for additional evidence, and PubMed for the source literature. They cross-reference by hand, reconcile disagreements, and write up a summary. Matching alone takes 30–60 minutes per case.

The tabs-and-spreadsheets approach doesn't scale with panel volume and doesn't produce a consistent audit trail. That's the gap this engine closes.

How the graph traversal works

Each oncology knowledge base is represented as a set of typed nodes and relationships in a single Neo4j graph. When a variant arrives, one query traverses all relevant sources simultaneously:

| Source | What it contributes | Graph relationship type |
|---|---|---|
| OncoKB | Therapy-oriented evidence tiers (Level 1–4) | SENSITIZES_TO, RESISTS |
| ClinVar | Aggregated clinical significance (pathogenic, VUS, benign) | HAS_CLINICAL_SIGNIFICANCE |
| COSMIC | Mutation prevalence by tumor type | OCCURS_IN_CANCER_TYPE |
| openFDA | Current approval + label-level contraindications | APPROVED_FOR, CONTRAINDICATED_WITH |
| CIViC | Additional predictive/prognostic evidence | HAS_EVIDENCE |
| gnomAD | Population allele frequency for germline filtering | HAS_POPULATION_FREQUENCY |
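
To make the idea of typed relationships concrete, here is a minimal in-memory sketch of a traversal that starts at one variant node and collects every edge it can reach, together with the knowledge base that asserted it. It is not the engine's implementation: the node identifiers, the edge list, and the `traverse` function are hypothetical, and the entries are placeholders rather than real knowledge-base content. Only the relationship type names come from the table above.

```python
from collections import defaultdict

# (from_node, relationship_type, to_node, asserting_knowledge_base)
# Placeholder data for illustration only; not real knowledge-base content.
EDGES = [
    ("variant:GENE1:p.X100Y", "SENSITIZES_TO", "drug:drug_a", "OncoKB"),
    ("variant:GENE1:p.X100Y", "HAS_CLINICAL_SIGNIFICANCE", "significance:pathogenic", "ClinVar"),
    ("variant:GENE1:p.X100Y", "OCCURS_IN_CANCER_TYPE", "cancer:tumor_type_a", "COSMIC"),
    ("drug:drug_a", "APPROVED_FOR", "cancer:tumor_type_a", "openFDA"),
]


def traverse(start, max_hops=2):
    """Breadth-first walk from `start`, recording each edge with its source."""
    adjacency = defaultdict(list)
    for src, rel, dst, kb in EDGES:
        adjacency[src].append((rel, dst, kb))

    found = []
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for rel, dst, kb in adjacency[node]:
                found.append({"from": node, "rel": rel, "to": dst, "source": kb})
                if dst not in seen:  # expand each node only once
                    seen.add(dst)
                    next_frontier.append(dst)
        frontier = next_frontier
    return found


evidence = traverse("variant:GENE1:p.X100Y")
for edge in evidence:
    print(edge["source"], edge["rel"], edge["to"])
```

Because every edge carries the name of the knowledge base that asserted it, a single walk yields evidence from several sources at once, each with its provenance attached.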

What the response looks like

A single POST /v1/variants/classify call returns a unified classification: evidence tier, clinical significance, disagreement flags (when sources conflict), mutation prevalence context, and a complete citation set. Output structure example:

  • variant — HGVS notation, transcript, gene
  • amp_tier — I / II / III / IV (aggregated)
  • oncokb_level — 1, 2, 3A, 3B, 4, or R1/R2
  • clinvar_significance — pathogenic, likely pathogenic, VUS, etc.
  • source_agreement — “concordant” | “partial” | “discordant” + details
  • citations — per-source, with version and retrieval date
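
The fields listed above could be modeled on the client side roughly as follows. This is a sketch under assumptions, not the published schema: the class names, the nested key names (`hgvs`, `status`, `details`, `retrieved`), and the choice to make `oncokb_level` and `clinvar_significance` optional are all hypothetical, and the angle-bracket values are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

AMP_TIERS = {"I", "II", "III", "IV"}
ONCOKB_LEVELS = {"1", "2", "3A", "3B", "4", "R1", "R2"}
AGREEMENT_STATES = {"concordant", "partial", "discordant"}


@dataclass
class Citation:
    source: str     # knowledge base name, e.g. "OncoKB"
    version: str    # release of the knowledge base that was ingested
    retrieved: str  # retrieval date


@dataclass
class Classification:
    variant: dict                         # HGVS notation, transcript, gene
    amp_tier: str                         # aggregated tier, I to IV
    oncokb_level: Optional[str]           # assumed optional: not every variant has one
    clinvar_significance: Optional[str]   # assumed optional for the same reason
    source_agreement: dict                # hypothetical shape: {"status": ..., "details": ...}
    citations: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the enumerations listed in the field descriptions.
        if self.amp_tier not in AMP_TIERS:
            raise ValueError(f"unknown AMP tier: {self.amp_tier!r}")
        if self.oncokb_level is not None and self.oncokb_level not in ONCOKB_LEVELS:
            raise ValueError(f"unknown OncoKB level: {self.oncokb_level!r}")
        if self.source_agreement.get("status") not in AGREEMENT_STATES:
            raise ValueError("source_agreement.status must be concordant, partial, or discordant")


example = Classification(
    variant={"hgvs": "<HGVS string>", "transcript": "<transcript ID>", "gene": "<gene symbol>"},
    amp_tier="I",
    oncokb_level="1",
    clinvar_significance="pathogenic",
    source_agreement={"status": "concordant", "details": []},
    citations=[Citation("OncoKB", "<version>", "<YYYY-MM-DD>")],
)
payload = asdict(example)  # plain dict, ready for json.dumps
```

Validating the enumerated fields at construction time means a malformed response fails loudly at the boundary instead of propagating into a report.
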

Why no live API calls at report time

Knowledge bases are ingested on a scheduled cadence and stored locally in the graph. This means sub-second classification latency, no dependency on external uptime, and a clean PHI boundary: no variant data leaves your infrastructure during classification. Architecture details: Building a HIPAA-Ready Architecture for Clinical Decision Support.

Handling source disagreement

When OncoKB and ClinVar disagree on a variant, the matching engine doesn't silently pick a winner. The output explicitly flags the disagreement with both positions. Your bioinformatician — or your institutional variant interpretation committee — makes the final call. That call can be pinned as a lab-specific override that applies to all future reports.
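
One way to picture this behavior is the sketch below. The rule it uses to decide between "concordant", "partial", and "discordant" is an assumption made for illustration: the text does not specify how the engine derives the flag. The stance labels, function names, and override store are likewise hypothetical.

```python
# Hypothetical coarse stances each source can take on a variant's clinical relevance.
SUPPORTS, UNCERTAIN, CONTRADICTS = "supports", "uncertain", "contradicts"


def source_agreement(positions):
    """Summarize per-source stances without discarding any of them.

    Assumed rule (not taken from the product): all stances equal is
    "concordant"; a direct supports/contradicts clash is "discordant";
    any other mix is "partial".
    """
    distinct = set(positions.values())
    if len(distinct) <= 1:
        status = "concordant"
    elif SUPPORTS in distinct and CONTRADICTS in distinct:
        status = "discordant"
    else:
        status = "partial"
    return {"status": status, "details": dict(positions)}


def classify_with_override(variant, positions, lab_overrides):
    """Attach a pinned lab decision, if any, next to the raw source positions."""
    result = source_agreement(positions)
    result["lab_override"] = lab_overrides.get(variant)  # None when nothing is pinned
    return result


overrides = {"<variant id>": "<committee decision>"}
report = classify_with_override(
    "<variant id>",
    {"OncoKB": SUPPORTS, "ClinVar": CONTRADICTS},
    overrides,
)
print(report["status"], report["details"], report["lab_override"])
```

The override is stored alongside the per-source positions rather than replacing them, so the disagreement stays visible in later reports.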

Full architectural context: Why Vector RAG Fails for Oncology. Compliance details on the security page.

How UNMIRI actually does this

The OncoKB and ClinVar data are normalized into a single Neo4j graph, along with ClinicalTrials.gov and openFDA drug labels. A classification request runs as a Cypher query that returns all matched entries with provenance. No similarity scoring, no LLM reasoning — matching is deterministic and fully auditable. More on the architecture.
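
As a rough illustration of what such a query might look like, the sketch below builds a parameterized Cypher statement in Python. The `Variant` label, the `hgvs` property, and the `source` and `source_version` relationship properties are assumptions, not the actual schema. The relationship type names are the ones listed in the table earlier on this page.

```python
# Relationship types taken from the source table on this page.
REL_TYPES = [
    "SENSITIZES_TO", "RESISTS", "HAS_CLINICAL_SIGNIFICANCE",
    "OCCURS_IN_CANCER_TYPE", "APPROVED_FOR", "CONTRAINDICATED_WITH",
    "HAS_EVIDENCE", "HAS_POPULATION_FREQUENCY",
]

# Exact-match lookup: no similarity scoring, so the same input always
# returns the same rows. Labels and property names are hypothetical.
CLASSIFY_QUERY = """
MATCH (v:Variant {hgvs: $hgvs})-[r]->(entry)
WHERE type(r) IN $rel_types
RETURN type(r) AS relationship,
       r.source AS source,
       r.source_version AS source_version,
       properties(entry) AS entry
ORDER BY source, relationship
"""


def build_request(hgvs):
    """Return the query text and parameters for one classification lookup."""
    if not hgvs:
        raise ValueError("an HGVS string is required")
    return CLASSIFY_QUERY, {"hgvs": hgvs, "rel_types": REL_TYPES}


query, params = build_request("<HGVS string>")
# With the official neo4j Python driver, these would typically be passed
# to session.run(query, params); that call is omitted so the sketch runs
# without a database.
```

Passing the variant as a query parameter rather than interpolating it into the string keeps the query text constant across requests, which makes the audit trail easier to reason about.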

Frequently asked questions

OncoKB and ClinVar serve different roles. OncoKB provides therapy-oriented evidence tiers (Level 1–4) for somatic variants and is optimized for treatment selection. ClinVar aggregates clinical significance assertions across submitters, with an emphasis on germline pathogenicity. Matching against both gives you therapy recommendations and pathogenicity context in a single pass.

Stop juggling 6 browser tabs per variant.

One API call. All major oncology knowledge bases. Every claim traceable. Available standalone or as part of the full reporting pipeline.