As LLMs demonstrate excellent capability at writing and explaining code, developers are increasingly satisfied with products like GitHub Copilot, which free them from coding line by line.
In this article, I want to discuss whether developer tools will truly need code understanding in the future. Let’s begin with a paper from CMU researchers: Using an LLM to Help With Code Understanding.
Using an LLM to Help With Code Understanding
In this paper, the researchers developed a Visual Studio Code plugin, called GILT, that provides code understanding functionality. Let me illustrate what this tool can do.
Here is the user story:
- A user is developing a 3D visualizer. He finds the output incorrect: the bunny should stand on the chair instead of beneath it (see picture below).
- He locates the bug in lines 26 to 30, highlighted in the screenshot. However, he is not familiar with the Open3D library or computer graphics, so he needs to ask others or search the internet for more information.
- Before he searches the internet, the CMU researchers step in: “Hold on, we just built a great code understanding tool. You can use it to understand your code and fix the bug in it.”
- He replies, “OK, I hope that works. Let me try.” He first selects lines 26 to 30, then clicks the “AI Explanation” button, and a panel appears on the left. At the top of the panel is an AI overview describing what this code snippet does. Hmm, the overview doesn’t look useful. So he types his question: “How can I move the bunny to sit upright?” Fortunately, the LLM responds with an explanation of the bug and shows the corrected code. (A hypothetical sketch of the kind of code involved follows below.)
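To make the user story concrete, here is a minimal, hypothetical sketch of the kind of code it describes. This is not the snippet from the paper’s screenshot; the bunny.ply path, the box standing in for the chair, and the specific translation offsets are all my assumptions.

```python
# Hypothetical illustration only -- not the code from the paper's screenshot.
# Assumes a local bunny.ply asset; a simple box stands in for the chair seat.
import open3d as o3d

bunny = o3d.io.read_triangle_mesh("bunny.ply")   # assumed asset path
bunny.compute_vertex_normals()

chair_seat = o3d.geometry.TriangleMesh.create_box(width=1.0, height=0.1, depth=1.0)
chair_seat.compute_vertex_normals()

# The kind of bug the user story describes: the translation places the bunny
# below the seat instead of on top of it.
bunny.translate((0.0, -0.5, 0.0))    # buggy: negative offset puts the bunny beneath the seat
# bunny.translate((0.0, 0.1, 0.0))   # intended: lift the bunny onto the seat surface

o3d.visualization.draw_geometries([chair_seat, bunny])
```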
With this tool, the researchers ran a user study with 32 participants to investigate whether it helps people during development. I won’t go into the statistical details of the experiments. Let’s look directly at the conclusions they obtained, and I will share my interpretation of each.
Conclusion 1: There are statistically significant gains in task completion rate when using GILT, compared to a web search, but the degree of the benefit varies between students and professionals.
My interpretation 1: The task completion rate improves, while the task completion time does not. This suggests that experienced people can finish the task regardless of the tools they use, whereas beginners don’t know how to search the internet for a good solution, and the tool helps them compensate for that weakness.
Conclusion 2: Overall, participants used Overview and prompt context most frequently. However, the way participants interact with GILT varied based on their learning styles and familiarity with other AI tools.
My interpretation 2: Prompting is the more powerful way to fix the bug, but the prompt-free buttons are also useful, especially for beginners who are not good at writing prompts.
Conclusion 3: Participants appreciated GILT’s ability to easily incorporate their code as context, but some participants reported that writing a good prompt is still a challenge.
Limitations of this research
Though this research reaches some interesting conclusions, it is not as useful as it could be. I will discuss its limitations from two perspectives.
From the perspective of tool design
The strength of this tool is that, as a VS Code plugin, it can read context directly from the editor. But the workflow is still not smooth enough: the user must first locate the bug in a code snippet and then ask how to fix it. Locating the bug is itself a critical hurdle for beginners.
As Conclusion 3 mentioned, beginners are not good at writing prompts, but they are also not good at fixing bugs by reading API documentation or examples. Overall, the tool works well for experienced developers, but not for beginners.
From the perspective of the research method
The researchers do two things in this paper: they design the tool, and they investigate how an LLM can help with code understanding. The problem is that the paper can only show how their particular tool helps with code understanding. Others may build better LLM-based tools and reach different conclusions. So I think this research says little about the bigger question.
Since GitHub has already released CopilotX with similar functionality, it would be better if such research could be conducted with backend statistics from the production side.
Do we truly need code understanding?
I will explain my ideas about LLM-powered IDEs by starting with this question.
My answer is: it depends. In today’s development workflows, code understanding is at the core of tools like CopilotX. But when we think about the future of software development, could we shift our programming paradigm to a new stage?
Again, take the debugging task discussed above as an example. If I’m a complete beginner in programming, what would the ideal tool look like for fixing the code? Here is the preferred procedure (a minimal sketch of this loop follows the list):
- The user compares the actual output with the expected output, finds the differences, and tells the tool to fix the code.
- By examining all the code, the tool finds the snippet most likely to have caused the bug.
- Drawing on the API documentation and background knowledge, the tool rewrites the snippet selected in the previous step.
- The tool asks the user to rerun the code, or reruns it directly in the background. The user decides whether to accept the change or retry.
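Here is a minimal sketch of that loop in Python. Every helper in it is a stub standing in for an LLM-backed component; none of the names refer to an existing API.

```python
# A hypothetical sketch of the beginner-friendly debugging loop described above.
# All helpers are stubs standing in for LLM-backed components; none of them
# refer to an existing API.
from dataclasses import dataclass

@dataclass
class Patch:
    file: str
    old_code: str
    new_code: str

def run_project(project_dir: str) -> str:
    """Stub: run the user's project and capture its output."""
    raise NotImplementedError

def describe_difference(actual: str, expected: str) -> str:
    """Stub (step 1): summarize how the actual output differs from the expected one."""
    raise NotImplementedError

def locate_suspect_snippet(project_dir: str, diff_report: str) -> str:
    """Stub (step 2): find the code snippet most likely to have caused the bug."""
    raise NotImplementedError

def rewrite_snippet(snippet: str, diff_report: str) -> Patch:
    """Stub (step 3): rewrite the snippet using API docs and background knowledge."""
    raise NotImplementedError

def apply_patch(project_dir: str, patch: Patch, revert: bool = False) -> None:
    """Stub: apply (or undo) a patch on disk."""
    raise NotImplementedError

def fix_until_accepted(project_dir: str, expected_output: str, max_rounds: int = 3) -> Patch | None:
    """Steps 1-4 in a loop: the user only judges whether the final goal is met."""
    for _ in range(max_rounds):
        actual = run_project(project_dir)
        if actual == expected_output:
            return None                                   # nothing left to fix
        diff_report = describe_difference(actual, expected_output)
        snippet = locate_suspect_snippet(project_dir, diff_report)
        patch = rewrite_snippet(snippet, diff_report)
        apply_patch(project_dir, patch)
        new_output = run_project(project_dir)             # step 4: rerun in the background
        if input(f"Output is now {new_output!r}. Accept the change? [y/N] ").lower() == "y":
            return patch
        apply_patch(project_dir, patch, revert=True)      # user rejected: undo and retry
    return None
```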
In this new workflow, the user doesn’t need to understand the code in detail. He only needs to check whether it achieves the final goal.
From a research perspective, another advantage of this idea is that we can frame it as a bug-fixing task and build a standard evaluation set for it. This would eliminate the noise introduced by subjective ratings from individual users.
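As a rough illustration, one entry in such an evaluation set might look like the following; the field names and structure are my own assumptions, not an existing benchmark format.

```python
# Hypothetical structure for one entry in a standardized bug-fixing evaluation
# set; the field names are assumptions, not an existing benchmark format.
benchmark_entry = {
    "repo": "example/3d-visualizer",           # project snapshot containing the bug
    "expected_output": "bunny_on_chair.png",   # ground truth the user wants to see
    "buggy_output": "bunny_under_chair.png",   # what the snapshot currently produces
    "reference_patch": "fix_transform.diff",   # gold fix used for scoring
    "test_command": "python render.py",        # deterministic way to check a candidate fix
}
```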
This idea aligns with my ultimate vision of software development: the programmer and the product manager merge into a single role. They tell the IDE what they want and leave the rest to the LLM.
To achieve this goal, several critical technical issues need to be addressed (a sketch of possible interfaces follows the list):
- Support multi-modal input so users can demonstrate what they want.
- An algorithm to find the code snippet most likely related to the bug.
- Code generation conditioned on a given context.
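To make these three issues concrete, here is a hypothetical sketch of how the components could fit together; every class and method name here is an assumption, not part of any existing framework.

```python
# Hypothetical component interfaces for the three issues above; every name here
# is an assumption, not part of any existing framework.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Demand:
    """Multi-modal description of what the user wants (issue 1)."""
    text: str
    screenshots: list[bytes]        # e.g. expected vs. actual renderings
    example_outputs: list[str]

class BugLocalizer(Protocol):
    """Issue 2: rank code snippets by how likely they caused the mismatch."""
    def locate(self, repo_path: str, demand: Demand) -> list[str]: ...

class ContextualCodeGenerator(Protocol):
    """Issue 3: rewrite a snippet given the surrounding project context."""
    def rewrite(self, snippet: str, repo_path: str, demand: Demand) -> str: ...
```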
This is a fascinating research topic, and I believe it will shape the future software development ecosystem.
(Note: this article was written while I was applying for a related PhD program. I didn’t make it to the final round, but I still believe my idea points in the right direction, and I’m happy to share it with you. Feel free to contact me if you are interested.)