Using ChatGPT And Prompt Engineering To Learn Undocumented Codebases And Libraries

Navigating and understanding undocumented codebases is a vital skill for software developers. In this article, I’d like to share some insights and techniques that make this process easier using ChatGPT (or any similar tool).

Using ChatGPT The Right Way

AI tools are a wonderful addition to a developer’s work environment and can substantially increase productivity if used properly. However, developers should be mindful of how they use them.

All Answers Should Be Verifiable

When using ChatGPT to analyze a codebase, it’s important for all answers to be verifiable. That usually means that instead of simply asking how something works, you should also ask where the information comes from.

If documentation is lacking, the best approach is often to ask ChatGPT for the source code snippets it used to reach its conclusion.

Obviously this requires a minimum understanding of whichever programming language the code is written in, which may not be a problem for seasoned developers but may pose some challenge to less experienced ones.

For instance, consider this prompt:

How does X class fit within the system?

Notice that ChatGPT will answer the question but won’t provide verifiable sources that back up its hypothesis. Hence, a better prompt would be as follows:

Could you explain to me how X class fits within the system? Provide me with the code snippets in the source code that made you reach your conclusion.

By asking for proof of its conclusions, you can safely verify whether or not they’re correct.
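Part of this verification can be done mechanically: checking that a snippet quoted in an answer really exists in the codebase. The following is a minimal Python sketch, assuming the source files are available locally and the quoted snippet has been copied out of the answer by hand. The names `normalize` and `find_snippet` are hypothetical helpers written for this illustration, not part of any existing tool.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Collapse every run of whitespace so formatting differences are ignored."""
    return " ".join(text.split())


def find_snippet(snippet: str, root: str, pattern: str = "*.py") -> list[str]:
    """Return the paths of files under `root` whose text contains `snippet`.

    An empty list means the quoted code was not found, which suggests the
    answer may have been paraphrased or invented.
    """
    needle = normalize(snippet)
    if not needle:
        return []
    matches = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        haystack = normalize(path.read_text(encoding="utf-8", errors="ignore"))
        if needle in haystack:
            matches.append(str(path))
    return matches
```

A match only shows that the quoted code exists; whether it actually supports the explanation still has to be judged by reading it.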

However, sometimes this may not be necessary or even ideal, as described in the following sections of this article.

Generalize First, Specify Later

The more granular a prompt, the better and more accurate the response. Yet when learning a new codebase, we can’t be very specific, because we don’t know much about it yet. Hence, the more familiar you are with a codebase, the more specific your prompts should be, and vice versa.

An effective prompt for when you’re brand new to a codebase would be:

Implement X using library/codebase Y. Show me where you got this information from, preferably referencing code snippets in said library/codebase.

You may even omit the request for proof altogether, simplifying it to:

Implement X using library/codebase Y.

The better your understanding of the codebase, the more specific and rigorous the proof you should ask for.
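If you reuse these prompts often, the two shapes above can be captured in a small template. This is only a sketch in Python: `build_prompt` and its parameters are illustrative names, and the wording simply mirrors the example prompts in this section.

```python
def build_prompt(task: str, codebase: str, ask_for_proof: bool = True) -> str:
    """Build a general 'implement X using Y' prompt.

    Set `ask_for_proof` to False when you are brand new to the codebase and
    would not yet be able to judge the quoted snippets anyway.
    """
    prompt = f"Implement {task} using library/codebase {codebase}."
    if ask_for_proof:
        prompt += (
            " Show me where you got this information from, preferably "
            "referencing code snippets in said library/codebase."
        )
    return prompt
```

As familiarity grows, the `task` argument is where the extra specificity would go, for example by naming the exact class or function you want used.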

Note that these outlines are not meant to be strict rules but instead a guide on how to approach general cases. There will always be situations where different approaches may be reasonable.

Open Source Libraries

A strong use case for this approach is integrating an undocumented or poorly documented open-source library into an application, since ChatGPT may have seen, or may be able to look up, both the library’s source code and projects that use it.

Proprietary Code

When working with proprietary code, it’s important to comply with your company’s guidelines regarding codebase privacy. Some organizations have policies that forbid developers from inputting company code into ChatGPT or other AI tools due to privacy concerns.

In these circumstances, a practical way to learn an undocumented codebase with ChatGPT is by using mock code snippets to represent specific parts of the system. Rather than sharing real code, describe situations, structures, or behaviors in the code that you want to understand in a generalized form.
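As an illustration, a mock snippet can keep the shape of the real code while replacing every name and value. The Python sketch below is entirely hypothetical: `Repository`, `CachedRepository`, and `fetch` stand in for whatever proprietary classes you are trying to understand, and none of them come from a real codebase.

```python
class Repository:
    """Stand-in for a proprietary data-access class."""

    def fetch(self, key: str) -> str:
        # Placeholder for the real lookup (database, service call, etc.).
        return f"value-for-{key}"


class CachedRepository(Repository):
    """Stand-in for a subclass whose behavior you want explained."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}
        self.misses = 0  # counts how often the parent lookup was needed

    def fetch(self, key: str) -> str:
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = super().fetch(key)
        return self._cache[key]
```

You could then paste the mock together with a question such as "Why would a subclass override fetch like this, and what problems can it cause?" without exposing any real identifiers or business logic.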

If your company doesn’t specify policies about the usage of AI tools such as ChatGPT, be sure to seek permission, even when using mock code as described.

Decision-makers have to account for this trade-off given the widespread adoption of AI over the last few years: policies that restrict AI tools in order to protect code privacy may also hurt developers’ productivity.

Businesses that value privacy should consider investments that let developers use LLMs privately, such as self-hosting a model.