The role of ChatGPT in data science: 16 ChatGPT prompts you don’t want to miss
More than ever, the ability to access and interpret cutting-edge data is a fundamental differentiator for businesses determined to go the distance. As AI transforms possibilities in automation and insight-gathering, simply responding to evolving technologies is not enough. Leaders who last will not only embrace, but be defined by digital disruption.
Harnessing the potential of AI - especially ChatGPT - is critical for data science and business leaders who want to make a real and lasting impact.
Here, we discuss the top ChatGPT prompts for data science and how this transformative technology can give early adopters in the industry an advantage. You can also check out our free downloadable guide.
Firstly, what is ChatGPT?
ChatGPT - the ‘GPT’ stands for Generative Pre-trained Transformer - is a chatbot built on a Large Language Model (LLM): a machine-learning model trained to identify, summarise, translate and predict text, and to generate new text in response.
When asked, ChatGPT says it has been “trained on a vast amount of text data to understand and generate human-like text based on the input I receive. My purpose is to assist and provide information to users like you in a conversational manner.”
A key phrase there is ‘based on the input’ it receives. ChatGPT has enormous potential to amplify the impact of data science leaders, but knowing how to prompt it is key. The quality of its output depends on the quality of your input.
Even before its official release in November 2022, the AI-driven chatbot was making headlines, largely for its unparalleled ability to generate cohesive and mostly (though not completely) accurate text within seconds.
How can ChatGPT play a role in data science?
The groundbreaking technology is revolutionising the business landscape at record speed, simultaneously transforming economic potential and triggering fear of job loss around the world. As with all quantum leaps in technology, those who leap with it stand to gain the strongest competitive advantage - one that’s particularly pertinent in the world of data science. Let’s look at how ChatGPT can impact the industry, and why leaders don’t want to be left standing on the ledge.
While the rapid advancement of ChatGPT has some worrying it could replace their jobs, research suggests its evolution only makes the need for effective analysis and implementation of data all the more vital for a stronger business impact.
Benefits of using ChatGPT in data science
There are a number of ways data science stands to gain from ChatGPT, including (but not limited to):
- Advanced analysis: ChatGPT can quickly write the code needed to analyse large datasets and help identify patterns that traditional analysis methods would take much longer to find.
- Better predictions: Larger datasets and advanced analysis mean ChatGPT can also help to predict outcomes and identify insights that may have been missed.
- More efficient interpretation: As an LLM, ChatGPT can translate unstructured data into easy-to-understand language and help data scientists gain insights more effectively.
- Saving significant time: ChatGPT can create code on-demand and automate tasks like collection and analysis of data, as well as marketing and customer service. This saves data scientists both time and headspace for more complex tasks.
- Improved communication: The LLM can help data scientists communicate findings in conversational language with ease, transforming the reporting process.
Limitations of ChatGPT in data science
While ChatGPT’s abilities are transforming data science possibilities, the technology - still in its early iterations - is not without limitations. These include:
- Accuracy of information: Of primary note is the technology’s potential to make mistakes and refer to outdated information. The GPT-3.5 model warned at the time of writing: “My knowledge is limited to what was available up until September 2021.” While this will only continue to improve as the technology evolves, data scientists and leaders must fact-check to protect the integrity of their work.
- Text-only: Another key limitation for data science is that - while ChatGPT can write code - it is currently only able to generate and interpret text, rather than charts, graphs or spreadsheets.
- Input and output length: As it stands, the GPT-3.5 model has a limit of 4,096 tokens (word fragments, not characters), shared between the input and the output text. Larger amounts of text can be broken up into smaller sections if needed, but this may add time to certain tasks.
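Breaking a long text into smaller sections can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the function name `split_into_chunks` and the `max_chars` budget are assumptions made for the example, and the budget is measured in characters purely for simplicity (a limit expressed in tokens would need a tokeniser to measure exactly).

```python
def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    max_chars is an illustrative budget, not an official ChatGPT limit.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the budget is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Otherwise, keep adding paragraphs to the current chunk while they fit.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    report = "\n\n".join(f"Paragraph {i}: " + "lorem ipsum " * 40 for i in range(10))
    for number, chunk in enumerate(split_into_chunks(report, max_chars=1500), start=1):
        print(f"Section {number}: {len(chunk)} characters")
```

Each section can then be pasted into its own prompt, ideally with a one-line reminder of the overall task so the context is not lost between sections.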
16 must-try ChatGPT prompts for data science
Specific and clear prompts are essential to gain the most accurate and useful insights from ChatGPT. Here are 16 must-try ChatGPT prompts for data science.
Explain data science concepts
Explaining why data science is critical to a business leader
Prompt: I want you to act as a data scientist. Explain the importance of data science to a business leader.
Explaining what Python is
Prompt: I want you to act as a data science instructor. Explain what Python is.
Explaining data concepts to a business stakeholder
Prompt: Act as a data science instructor. Explain [concept or result] to a business stakeholder in detail. Please assume they have a very basic understanding of data science and use conversational language to demonstrate facts and insights.
Write and interpret Python code
To find the best classification model
Prompt: I want you to act as an automatic machine learning (AutoML) bot using TPOT for me. I am working on a model that predicts [topic]. Please write Python code to find the best classification model with the highest AUC score on the test set.
Source: TryArvin.com
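The prompt above asks for the model with the highest AUC score, so it helps to know what that number measures before judging any code ChatGPT returns. The sketch below computes AUC directly from its definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It is not how TPOT or any other library computes the metric; the function name `roc_auc` is an assumption for this example, and the pairwise loop favours clarity over speed.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    true_labels = [0, 0, 1, 1]
    predicted_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(true_labels, predicted_scores))  # 0.75
```

A score of 0.5 means the model ranks no better than chance, while 1.0 means every positive is ranked above every negative.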
To train a classification model
Prompt: I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please build a machine learning model that predicts [target variable].
Source: Artificial Intelligence in Plain English
To explain Python code
Prompt: I want you to act as a code explainer in Python. I don’t understand this function. Can you explain what it does, and provide an example? [insert function]
Source: Datacamp.com
To simplify Python code
Prompt: I want you to act as a programmer in Python. Please simplify this code while ensuring it is [easy to read/efficient/Pythonic]. [Insert code]
Source: Datacamp.com
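To give a sense of the kind of before-and-after this prompt is aiming for, here is a small hand-written illustration. It is not actual ChatGPT output, and both function names are hypothetical. The first version loops over indices; the second does the same work with a list comprehension, which most Python programmers would consider easier to read.

```python
def squares_of_evens_verbose(numbers):
    """Original style: index-based loop with manual list building."""
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] * numbers[i])
    return result


def squares_of_evens(numbers):
    """Simplified style: one list comprehension, same behaviour."""
    return [n * n for n in numbers if n % 2 == 0]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6]
    # Always confirm the simplified code still returns the same result.
    assert squares_of_evens_verbose(sample) == squares_of_evens(sample)
    print(squares_of_evens(sample))  # [4, 16, 36]
```

Whatever ChatGPT suggests, running the old and new versions on the same inputs, as the final check does here, is a quick way to confirm nothing has changed except readability.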
Analyse and interpret data
To generate a data visualisation
Prompt: I want you to act as a data scientist and generate a visualisation that shows the distribution of [feature] in [dataset].
Source: Learnprompt.org
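ChatGPT answers a prompt like this with code for you to run, rather than with the chart itself. In practice that code would most likely use a plotting library; the sketch below sticks to the Python standard library and prints a simple text histogram instead, so it runs anywhere. The function name `text_histogram`, the bin width and the sample values are all assumptions made for the example.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks."""
    # Assign each value to a bin by integer-dividing by the bin width.
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lower_edge = b * bin_width
        lines.append(f"{lower_edge:>6.1f} | {'#' * counts[b]}")
    return lines


if __name__ == "__main__":
    # Hypothetical feature: customer ages.
    ages = [23, 25, 31, 34, 35, 36, 38, 41, 42, 44, 47, 52, 58, 63]
    for line in text_histogram(ages, bin_width=10):
        print(line)
```

Empty bins are printed with no bar, so gaps in the distribution stay visible.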
To explore a dataset
Prompt: I’d like you to play the role of a data scientist and code for me. I have a [describe dataset] dataset. Please write code to visualise and explore the data.
Source: Analytics Insight
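A first exploration step usually means summary statistics for each numeric column. The sketch below shows roughly what such code might look like using only the Python standard library; the function name `describe`, the row format (a list of dictionaries) and the sample data are assumptions for illustration, and a real response would more likely rely on a data-analysis library.

```python
from statistics import mean, median, stdev


def describe(rows, column):
    """Summarise one numeric column, ignoring rows where it is missing."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation; needs 2+ values
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    customers = [
        {"name": "A", "age": 20},
        {"name": "B", "age": 30},
        {"name": "C", "age": None},
        {"name": "D", "age": 40},
    ]
    for key, value in describe(customers, "age").items():
        print(f"{key}: {value}")
```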
To clean a dataset
Prompt: I want you to act as a data explorer and clean [dataset] by removing missing values, duplicates and outliers.
Source: Learnprompt.org
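The three cleaning steps named in this prompt can be sketched as follows. This is one possible approach under stated assumptions, not a definitive method: the function name `clean_rows` and the sample data are hypothetical, "missing" is taken to mean `None` or an empty string, and outliers are defined by the common 1.5 x IQR rule, which is only one of several reasonable definitions.

```python
from statistics import quantiles


def clean_rows(rows, column):
    """Drop rows with missing values, exact duplicates, then outliers in `column`."""
    # 1. Remove rows containing any missing value (None or empty string).
    complete = [r for r in rows if all(v is not None and v != "" for v in r.values())]

    # 2. Remove exact duplicate rows, keeping the first occurrence.
    seen = set()
    unique = []
    for r in complete:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Remove outliers using the 1.5 x IQR rule (an assumed definition).
    q1, _, q3 = quantiles([r[column] for r in unique], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in unique if low <= r[column] <= high]


if __name__ == "__main__":
    orders = [
        {"id": 1, "spend": 10.0},
        {"id": 2, "spend": 12.0},
        {"id": 2, "spend": 12.0},   # duplicate
        {"id": 3, "spend": None},   # missing value
        {"id": 4, "spend": 11.0},
        {"id": 5, "spend": 13.0},
        {"id": 6, "spend": 9.0},
        {"id": 7, "spend": 10.5},
        {"id": 8, "spend": 11.5},
        {"id": 9, "spend": 500.0},  # outlier
    ]
    print([row["id"] for row in clean_rows(orders, "spend")])
```

Removing rows is not always the right call: it is worth asking ChatGPT to explain each step and checking that the rows it drops really are errors rather than genuine extreme values.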
Troubleshoot problems and debug code
To understand the issue with existing code
Prompt: Help me to solve the issue and write updated code. Also explain what the issue was and what you did to fix it. Issue: [explain issue]. Code: [paste code].
Source: Developerupdates.com
To debug SQL code
Prompt: I want you to act as an SQL programmer. Here is a piece of SQL code containing [problem]: [insert code snippet]. I am getting the following error: [insert error]. What is the reason for the bug?
Source: Datacamp.com
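This prompt works best when the code snippet and the error message are copied exactly. The sketch below reproduces a typical bug (a misspelt column name) in an in-memory SQLite database, captures the exact error text, and assembles the prompt from it. The table, the queries and the variable names are all invented for this example.

```python
import sqlite3

# Build a tiny in-memory database to reproduce the problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ana", 5.0), ("ben", 12.5)],
)

# 'amont' is a deliberate typo for 'amount'.
buggy_sql = "SELECT customer, SUM(amont) FROM orders GROUP BY customer"
fixed_sql = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"

try:
    conn.execute(buggy_sql)
    error_text = ""
except sqlite3.OperationalError as exc:
    error_text = str(exc)  # the exact message to paste into the prompt

prompt = (
    "I want you to act as an SQL programmer. "
    f"Here is a piece of SQL code: {buggy_sql}. "
    f"I am getting the following error: {error_text}. "
    "What is the reason for the bug?"
)

# Once the fix is suggested, rerun it against the same data to confirm it works.
totals = conn.execute(fixed_sql).fetchall()

if __name__ == "__main__":
    print(prompt)
    print(totals)
```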
To correct ChatGPT code
Prompt: Your above code is wrong. [Point out what is wrong]. Can you try again?
Source: GitHub, TravisTanghVh
Suggest data collection and analysis ideas
To suggest datasets
Prompt: I want you to act as a data science career coach. I want to build a predictive model for [insert topic]. At the same time, I would like to showcase my knowledge in [insert topic]. Can you please suggest the five most relevant datasets for my use case?
Source: GitHub, TravisTanghVh
To generate a Google Sheets formula
Prompt: I want you to act as a bot that generates Google Sheets formulas. Please generate a formula that [describe requirements].
Source: Learnprompt.org
To suggest A/B testing steps
Prompt: I want you to act as a statistician. [Describe context]. Please design an A/B test for this purpose. Please include the concrete steps and which statistical tests I should run.
Source: GitHub, TravisTanghVh
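For an A/B test that compares conversion rates, one statistical test ChatGPT may well suggest is the two-proportion z-test. The sketch below is a minimal standard-library version so you can sanity-check the steps it proposes. The function name and the sample figures are assumptions for illustration, and the test itself assumes reasonably large samples; it is one common choice, not the only valid one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / conv_b are conversion counts; n_a / n_b are visitors per variant.
    Returns the z statistic and the two-sided p-value.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical results: 100/1000 conversions for A, 130/1000 for B.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone; the significance threshold should be fixed before the test runs, not after seeing the results.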
Best practices for ChatGPT prompts
The number of possible ChatGPT prompts for data science is practically unlimited. The more experience you have harnessing the platform to help you analyse, interpret, explore and summarise data, the stronger your prompts will become.
To get the most out of your ChatGPT prompts, ensure they are:
- Specific: Ensure you frame your questions in a focused way by clearly identifying the problem you’re trying to solve, and the determining factors involved. Tell it what kind of language and tone of voice you want it to use.
- Goal-focused: For ChatGPT to provide you with a useful answer, it will need to have a clear understanding of the goal you’re trying to achieve. Are you trying to debug code? Analyse a dataset? Interpret Python? Let it know.
- Written in conversational tone: Like all LLMs, ChatGPT is trained to understand and communicate in natural language. Write your questions as if you are talking to a human for best results.
- Iterative: Keep going! If you aren’t satisfied with ChatGPT’s response, don’t be afraid to probe it for more information or ask it to reframe. Also, remember to test and refine your prompts by rephrasing questions and goals to ensure you’re getting the best possible information from this transformative technology.
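If you reuse the same prompt structure often, the practices above can be captured in a small helper. The sketch below is one possible way to do this: the function name `build_prompt` and its parameters are assumptions made for the example, and the wording simply mirrors the prompts listed earlier in this article.

```python
def build_prompt(role, task, context="", tone="conversational"):
    """Assemble a prompt with a role, a clear goal, optional context and a tone."""
    parts = [f"I want you to act as {role}.", task.strip()]
    if context:
        parts.append(f"Context: {context.strip()}")
    parts.append(f"Please use a {tone} tone.")
    return " ".join(parts)


if __name__ == "__main__":
    first_attempt = build_prompt(
        role="a data science instructor",
        task="Explain what a p-value is to a business stakeholder.",
        context="They have a very basic understanding of data science.",
    )
    print(first_attempt)

    # Refining the prompt is just a matter of changing one argument and retrying.
    second_attempt = build_prompt(
        role="a data science instructor",
        task="Explain what a p-value is using a retail example.",
        context="They have a very basic understanding of data science.",
        tone="plain, jargon-free",
    )
    print(second_attempt)
```

Keeping the role, goal, context and tone as separate arguments makes it easy to change one element at a time and compare the responses.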
Want to increase your data literacy or learn more about how transformative technologies can bridge science and strategy in your workplace? Our Master of Data Science Strategy and Leadership can take your knowledge and influence to the next level.
Get in touch with one of our Student Enrolment Advisors today on 1300 701 171 and let them guide you through the process of moving towards your career goals.