
arXiv submission date: 2026-03-18
📄 Abstract - scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns

Methodology bugs in scientific Python code produce plausible but incorrect results that traditional linters and static analysis tools cannot detect. Several research groups have built ML-specific linters, demonstrating that detection is feasible. Yet these tools share a sustainability problem: dependency on specific pylint or Python versions, limited packaging, and reliance on manual engineering for every new pattern. As AI-generated code increases the volume of scientific software, the need for automated methodology checking (such as detecting data leakage, incorrect cross-validation, and missing random seeds) grows. We present scicode-lint, whose two-tier architecture separates pattern design (frontier models at build time) from execution (small local model at runtime). Patterns are generated, not hand-coded; adapting to new library versions costs tokens, not engineering hours. On Kaggle notebooks with human-labeled ground truth, preprocessing leakage detection reaches 65% precision at 100% recall; on 38 published scientific papers applying AI/ML, precision is 62% (LLM-judged) with substantial variation across pattern categories; on a held-out paper set, precision is 54%. On controlled tests, scicode-lint achieves 97.7% accuracy across 66 patterns.
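To make the target concrete, here is a minimal sketch (not taken from the paper's pattern set) of the classic preprocessing-leakage bug the abstract mentions: fitting a scaler on the full dataset before the train/test split lets test-set statistics leak into training, producing plausible but optimistic results that a conventional linter would never flag. The variable names and the scikit-learn workflow are illustrative assumptions, not scicode-lint's own code.

```python
# Illustrative sketch of a preprocessing-leakage bug (an assumption for
# exposition, not code from the scicode-lint paper).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.random.RandomState(0).normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

# BUGGY: the scaler's mean/std are computed over ALL rows, including
# the rows that will later become the test set -- information leaks.
X_leaky = StandardScaler().fit_transform(X)
X_tr_bad, X_te_bad, _, _ = train_test_split(X_leaky, y, random_state=0)

# CORRECT: split first, then fit the scaler on the training rows only
# and apply the learned transform to the held-out rows.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)
```

Both versions run without error and produce similarly shaped arrays, which is exactly why such methodology bugs evade syntax-level static analysis and motivate pattern-based semantic checking.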

Top-level tags: llm, model evaluation, systems
Detailed tags: code linting, methodology bugs, scientific software, static analysis, ai-generated code

scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns


1️⃣ One-sentence summary

This paper introduces a new tool named scicode-lint, which uses large language models to automatically generate detection rules. It efficiently finds "methodology bugs" in scientific Python code that look plausible but are actually wrong, such as data leakage or cross-validation errors, addressing both the difficulty traditional tools have in detecting these bugs and the high cost of maintaining detection rules by hand.

Source: arXiv: 2603.17893