The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code

📄 Abstract

As Large Language Models (LLMs) transform software development, the functional quality of generated code has become a central focus, leaving readability, one of the critical non-functional attributes, understudied. Given that LLM-generated code still needs human review before adoption, it is important to understand its readability, especially compared with human-written code, and the role of prompt design in shaping it. We therefore conduct a systematic investigation into the readability of LLM-generated code. To quantify code readability, we establish a comprehensive readability model that synthesizes textual, structural, program, and visual features of code. Based on this model, we evaluate the readability of code generated by mainstream LLMs under 5,869 scenarios extracted from large code bases, including World of Code (WoC) and LeetCode. We find that current LLMs produce code whose overall readability is comparable to that of human-written code, but which displays distinct patterns of readability issues. We further examine how different prompt dimensions affect the readability of LLM-generated code, and find that function signatures, constraints, and style descriptions emerge as the most influential factors, while the overall impact of prompt design remains limited. Our findings indicate that, on the one hand, LLM-generated code is at least comparable to human-written code in readability, validating its potential for systematic integration into software workflows from a non-functional perspective; on the other hand, the distinct readability issue patterns and the limited effectiveness of prompt engineering reveal a latent technical debt, highlighting the need for future research to improve the readability of LLM-generated code and thus ensure long-term maintainability.


1️⃣ One-Sentence Summary

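This paper systematically studies the readability of code generated by large language models, finding that its overall readability is comparable to that of human-written code but that it exhibits its own characteristic patterns of readability issues, and that adjusting prompts improves readability only to a limited extent, which points to a technical debt in code maintainability that future work needs to address.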
