A quick guide to using ChatGPT to relabel inconsistent data
1. You will need ChatGPT Plus (currently $20 per month). However, we recommend against using ChatGPT Plus for confidential information; for that, you’ll want the Enterprise version.
2. If you are on the older ChatGPT interface (shown below), you will need to turn on the Advanced Data Analysis feature. (Following OpenAI’s Dev Day on November 6, 2023, OpenAI is rolling out a new ChatGPT interface in which Advanced Data Analysis is integrated automatically, so you do not need to turn it on.)
- Click the three dots next to your username in the lower left
- Go to Beta Features. Turn Advanced Data Analysis on
- Click GPT-4 at the top (this will only be there if you are a paid subscriber). Select Advanced Data Analysis
3. Next, upload the Excel file with the messy data into ChatGPT (use the + symbol in the chat bar). Or if you prefer, you can paste the data directly into that same chat bar.
4. Prompt ChatGPT to look at the data for any inconsistencies.
5. ChatGPT will evaluate the dataset and make renaming suggestions.
6. You can then request a set of relabeled data as well as a key mapping the old unique names to the new names. You can ask ChatGPT to output the information to Excel or you can copy-paste the data out of ChatGPT directly.
Once you relabel your data, you can also ask ChatGPT to analyze and summarize your data. Pro tip: ask it to check its work for more accurate outputs.
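If you would rather double-check the result yourself as well, the sketch below shows one way to do it in Python. It is only an illustration: the function name `check_relabeling` and the sample customer names are made up, and it assumes you have the original names, the relabeled names, and the key from step 6 available as plain lists and a dictionary.

```python
def check_relabeling(original_names, relabeled_names, key):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # The relabeled data should have exactly one row per original row.
    if len(original_names) != len(relabeled_names):
        problems.append(
            f"row count changed: {len(original_names)} -> {len(relabeled_names)}"
        )
        return problems  # a row-by-row comparison is not meaningful past this point

    # Every original name should appear in the key.
    for name in sorted(set(original_names) - set(key)):
        problems.append(f"no key entry for {name!r}")

    # Every relabeled value should agree with what the key says.
    for row, (old, new) in enumerate(zip(original_names, relabeled_names), start=1):
        if old in key and key[old] != new:
            problems.append(
                f"row {row}: {old!r} relabeled as {new!r}, key says {key[old]!r}"
            )
    return problems


# Hypothetical example data, invented for illustration.
original = ["Acme Corp", "ACME Corporation", "Globex Inc"]
relabeled = ["Acme", "Acme Co", "Globex"]
key = {"Acme Corp": "Acme", "ACME Corporation": "Acme"}

problems = check_relabeling(original, relabeled, key)
for problem in problems:
    print(problem)
```

In this made-up example the check flags two things: "Globex Inc" has no entry in the key, and row 2 was relabeled differently from what the key specifies.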
Traditionally, messy data clean-up is something we teach people to do in Excel by:
- Generating a unique customer list: =sort(unique(original_customer_list))
- Manually deciding on standardized names for each of those customers
- Integrating the clean names back into the dataset: =xlookup(orig_customer_name, original_customer_list, clean_customer_list)
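For readers who think more easily in code than in formulas, here is a rough Python analogue of those three steps. It is a sketch, not a replacement for the Excel workflow: the customer names are invented, `lookup_clean_name` is a hypothetical helper, and the `"#N/A"` marker simply imitates what a lookup shows when a name is missing from the key.

```python
# Hypothetical sample data; the customer names are made up for illustration.
original_customer_list = [
    "Acme Corp", "ACME Corporation", "acme corp.",
    "Globex", "Globex Inc", "Acme Corp",
]

# Step 1: roughly what =sort(unique(...)) does. Note that Python compares text
# case-sensitively (uppercase sorts before lowercase), so the order can differ
# from what Excel shows.
unique_customers = sorted(set(original_customer_list))

# Step 2: the manually decided key, one standardized name per unique name.
clean_name_key = {
    "ACME Corporation": "Acme",
    "Acme Corp": "Acme",
    "acme corp.": "Acme",
    "Globex": "Globex",
    "Globex Inc": "Globex",
}


def lookup_clean_name(orig_customer_name, key, if_not_found="#N/A"):
    """Rough analogue of the xlookup step: clean name, or a marker if missing."""
    return key.get(orig_customer_name, if_not_found)


# Step 3: bring the clean names back into the dataset, row by row.
clean_customer_column = [
    lookup_clean_name(name, clean_name_key) for name in original_customer_list
]

print(unique_customers)
print(clean_customer_column)
```

Step 2 is still the manual part: someone (or ChatGPT) has to decide what the standardized names should be.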
While not exceptionally complex, following the process still relies on a decent understanding of Excel and locking cells. Even if you’re great at Excel, the process of manually building out the standardized names in the key is tedious for long lists.
The only downside of the ChatGPT approach is that the sheet you get back doesn’t have a live xlookup formula, so if you want to change the key later, you have to go back to ChatGPT for help or replace the clean column with an xlookup. Regardless, for the initial data assessment and clean-up process, ChatGPT is a game-changer, even for strong Excel users.
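To illustrate why a reusable key matters, the sketch below keeps the key as a small two-column table and rebuilds the clean column from it whenever the key changes. Everything here is an assumption made for the example: the column headers `old_name` and `new_name`, the helper names, and the sample data. In practice the key would come from whatever ChatGPT gave you.

```python
import csv
import io


def load_key(csv_text):
    """Read a two-column key (old_name, new_name) from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["old_name"]: row["new_name"] for row in reader}


def rebuild_clean_column(original_names, key):
    # Names missing from the key pass through unchanged so they stay visible.
    return [key.get(name, name) for name in original_names]


# Hypothetical key, as it might look when saved out as CSV.
key_csv = """old_name,new_name
Acme Corp,Acme
ACME Corporation,Acme
Globex Inc,Globex
"""

original_names = ["Acme Corp", "Globex Inc", "ACME Corporation"]
first_pass = rebuild_clean_column(original_names, load_key(key_csv))

# Later, someone edits the key: "Acme" should now read "Acme Holdings".
edited_key_csv = key_csv.replace(",Acme\n", ",Acme Holdings\n")
second_pass = rebuild_clean_column(original_names, load_key(edited_key_csv))

print(first_pass)
print(second_pass)
```

Because the clean column is always derived from the key, editing the key and re-running is enough; this is the same flexibility a live xlookup gives you inside the spreadsheet.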
**As always, the usual caveats about privacy apply. Don’t upload confidential data to regular ChatGPT. You’ll want Enterprise for appropriate data protection. You should also consult with your company about any firm-specific policies on AI usage. Recalc Academy does not assume any liability related to your use of AI.
Our expectation is that the future of data work will involve a mix of AI (with tools like ChatGPT and Microsoft Copilot) and more traditional Excel and Google Sheets work. To improve your Excel and Google Sheets skills, join our Spreadsheet Fundamentals or Business Analysis courses. We’ll teach you how to take AI’s relabeling suggestions and turn them into an easy-to-update key to keep your analysis flexible and up-to-date.
Stay up-to-date on upcoming events as part of our AI and Data Analysis Learning Series by subscribing to our calendar.