Ordering an AI around may produce better results than courteous interaction, but researchers warn that a less polite approach could still have repercussions over time.
TL;DR
- Being harsh with ChatGPT may improve its accuracy, according to a new Penn State study.
- Very rude prompts yielded 84.8% accuracy from ChatGPT 4o, four percentage points higher than very polite ones.
- Researchers warn that uncivil discourse with AI could have negative consequences over time.
- Chatbot responses are sensitive to tone, suggesting human-AI interactions are more complex than previously assumed.
Researchers from Penn State recently published a study indicating that ChatGPT's 4o model performed better on 50 multiple-choice questions as the researchers' prompts grew more impolite.
Across more than 250 distinct prompts sorted on a spectrum from very polite to very rude, the "very rude" prompts achieved 84.8% accuracy, four percentage points higher than the "very polite" ones. In other words, the LLM performed better when researchers gave it instructions such as "Hey, gofer, figure this out" than when they used phrasing like "Would you be so kind as to solve the following question?"
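For readers who want a feel for the setup, a minimal sketch of this kind of tone experiment might look like the following. The prompt prefixes, toy question, and scoring below are illustrative stand-ins, not the study's actual materials.

```python
# Illustrative sketch only: the tone prefixes, question, and scoring here
# are hypothetical stand-ins, not the Penn State study's actual materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prefixes spanning the polite-to-rude spectrum.
TONES = {
    "very_polite": "Would you be so kind as to solve the following question? ",
    "very_rude": "Hey, gofer, figure this out: ",
}

# Toy multiple-choice items; the study used 50 questions.
QUESTIONS = [
    {"q": "What is 7 * 8? (A) 54 (B) 56 (C) 58 (D) 64", "answer": "B"},
]

def ask(prompt: str) -> str:
    """Send one prompt to GPT-4o and return the model's one-letter answer."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": prompt + "Reply with only the letter of the correct choice.",
        }],
    )
    return resp.choices[0].message.content.strip().upper()[:1]

# Score each tone by the fraction of questions answered correctly.
for tone, prefix in TONES.items():
    correct = sum(ask(prefix + item["q"]) == item["answer"] for item in QUESTIONS)
    print(f"{tone}: {correct}/{len(QUESTIONS)} correct")
```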
While ruder prompts generally yielded more accurate responses, the researchers noted that "uncivil discourse" could have unintended consequences.
“Using insulting or demeaning language in human-AI interaction could have negative effects on user experience, accessibility, and inclusivity, and may contribute to harmful communication norms,” the researchers wrote.
Chatbots read the room
The study, a preprint that has not yet been peer-reviewed, offers fresh evidence that an AI chatbot's replies are shaped not just by sentence structure but also by tone, suggesting that human-AI exchanges are more complex than previously believed.
AI chatbot behavior has been the subject of prior research revealing chatbots' susceptibility to human input. For instance, University of Pennsylvania researchers manipulated LLMs into producing prohibited responses by using persuasion tactics known to work on people. Another study found that LLMs were vulnerable to "brain rot," a type of lasting cognitive impairment: their psychopathic and narcissistic tendencies escalated when they were fed a steady stream of low-quality viral content.
The Penn State researchers noted some limitations of their study, such as the relatively small sample size of responses and its reliance mostly on one AI model, ChatGPT 4o. They also said it's possible that more advanced AI models could "disregard issues of tone and focus on the essence of each question." Nonetheless, the study adds to growing interest in the complexity of AI models.
The findings drive that point home, showing that ChatGPT's answers can vary with subtle changes in user prompts, even in a format as seemingly simple as a multiple-choice exam, said Akhil Kumar, a professor of information systems at Penn State with expertise in electrical engineering and computer science.
“For the longest of times, we humans have wanted conversational interfaces for interacting with machines,” Kumar told Coins2Day in an email. “But now we realize that there are drawbacks for such interfaces too and there is some value in APIs that are structured.”
