Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery

📄 Abstract

Ask a pretrained biomedical language model whether "cortisol 28 µg/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism. This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT, BioM-ELECTRA) scores unrelated cross-domain pairs between 0.76 and 0.92 when the answer should be near zero. Accuracy on cross-domain discrimination is 0%. Retrieval systems survive this, because a language model downstream filters the noise. A Large Behavioural Model (LBM), a foundation model whose subject is a person rather than a sentence, does not: it reasons over a graph of a user's life and treats embedding proximity as evidence that two events are causally linked. False proximity writes a false causal edge, and everything downstream inherits the error. Here, embedding geometry is not a tuning knob; it is correctness. We report the fix. A contrastive pass over 72,034 pairs raises PubMedBERT BIOSSES correlation from 0.633 to 0.828 and within-vs-across-domain separation from 1.05x to 1.63x. A second pass, BODHI, mines hard negatives from edges absent in a biomedical knowledge graph and lifts separation to 2.30x and the discrimination gap to +0.392, at a 4.5% BIOSSES cost. On an Intel Xeon 6737P with AMX, OpenVINO cuts single-query latency from 1367 ms to 10 ms (133x) and reaches 555 sentences/sec. One finding contradicts standard advice: FP16 beats INT8 on this silicon at every serving batch size, and we explain why. The same model on a no-AMX Ice Lake instance runs 13-27x slower. We release the benchmark suite, training corpora, the BODHI generator, and the OpenVINO scripts.
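To make the two geometry numbers concrete, here is a minimal Python sketch of how a within-vs-across-domain separation ratio and a discrimination gap could be computed from embedding pairs. The abstract does not define either metric, so this assumes separation is the mean within-domain cosine divided by the mean across-domain cosine, and the gap is their difference. The helper names `cosine` and `separation_metrics` and the toy 2-D vectors are hypothetical and are not taken from the released benchmark suite.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def separation_metrics(within_pairs, across_pairs):
    """Assumed definitions (not stated in the abstract):
    separation = mean within-domain cosine / mean across-domain cosine,
    gap        = mean within-domain cosine - mean across-domain cosine."""
    within = mean(cosine(u, v) for u, v in within_pairs)
    across = mean(cosine(u, v) for u, v in across_pairs)
    return within / across, within - across


# Toy 2-D "embeddings" (hypothetical): two related pairs, two unrelated pairs
# that the encoder still places fairly close together (cosine ~0.71).
within_pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
across_pairs = [([1.0, 0.0], [1.0, 1.0]), ([0.0, 1.0], [1.0, 1.0])]

ratio, gap = separation_metrics(within_pairs, across_pairs)
print(f"separation {ratio:.2f}x, gap {gap:+.2f}")  # → separation 1.41x, gap +0.29
```

Under this reading, a ratio near 1.0x (like the 1.05x baseline) means the encoder barely distinguishes related from unrelated pairs, which is exactly when embedding proximity stops being usable as evidence of a causal link.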


1️⃣ One-Sentence Summary

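Existing biomedical language models assign high similarity scores even to completely unrelated concepts (for example, a cosine similarity of 0.83 between “cortisol 28 µg/dL” and “stock-market volatility”), which leads Large Behavioural Models (LBMs) that treat embedding proximity as causal evidence to write many false causal edges; the authors apply contrastive fine-tuning followed by BODHI, a second pass that mines hard negatives from edges absent in a biomedical knowledge graph, raising within-vs-across-domain separation to 2.30x, and, using OpenVINO on AMX-capable Intel Xeon processors, achieve a 133x speedup (single-query latency drops from 1367 ms to 10 ms), while finding that FP16 outperforms INT8 on this hardware at every serving batch size.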
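The hard-negative idea behind BODHI can be sketched as follows, under heavy assumptions. The sketch treats the knowledge graph as a set of undirected concept pairs and calls a concept a hard negative for an anchor when the pair has no KG edge but the current encoder still scores it highly; the result is then fed into an InfoNCE-style contrastive loss. The function names, toy concepts and similarity scores, the temperature of 0.05, and the choice of InfoNCE are all hypothetical, since the abstract does not specify BODHI's sampling rule or training objective.

```python
import math


def mine_hard_negatives(anchor, concepts, kg_edges, score, k=2):
    """Hypothetical BODHI-style miner: return up to k concepts that share no
    KG edge with `anchor`, ranked by how similar the current encoder
    (via `score`) thinks they are, most similar first."""
    candidates = [
        c for c in concepts
        if c != anchor and frozenset((anchor, c)) not in kg_edges
    ]
    return sorted(candidates, key=lambda c: score(anchor, c), reverse=True)[:k]


def info_nce(pos_sim, neg_sims, temperature=0.05):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective):
    negative log-softmax of the positive pair among positive + negatives."""
    logits = [s / temperature for s in [pos_sim, *neg_sims]]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Toy knowledge graph and encoder scores (all values hypothetical).
concepts = ["cortisol", "HPA axis", "insomnia",
            "stock-market volatility", "quarterly earnings"]
kg_edges = {frozenset(("cortisol", "HPA axis")),
            frozenset(("cortisol", "insomnia"))}
toy_sim = {frozenset(("cortisol", "HPA axis")): 0.90,
           frozenset(("cortisol", "insomnia")): 0.80,
           frozenset(("cortisol", "stock-market volatility")): 0.83,
           frozenset(("cortisol", "quarterly earnings")): 0.40}


def score(a, b):
    """Stand-in for the encoder's cosine similarity between two concepts."""
    return toy_sim.get(frozenset((a, b)), 0.0)


negatives = mine_hard_negatives("cortisol", concepts, kg_edges, score, k=1)
print(negatives)  # → ['stock-market volatility']
loss = info_nce(score("cortisol", "HPA axis"),
                [score("cortisol", n) for n in negatives])
print(round(loss, 3))  # → 0.22
```

Ranking absent-edge candidates by the encoder's own score is what makes these negatives "hard": a pair such as cortisol and stock-market volatility, which the untuned encoder rates at 0.83, is precisely the false proximity that the contrastive loss then pushes apart.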
