ChatGPT Shows a Human-like Phonological Bias: Consonants over Vowels

A recent research article reports an interesting similarity between human language processing and artificial intelligence language models: both display a phonological bias, specifically a preference for consonants over vowels when identifying words. The study, led by Juan Manuel Toro of the Institució Catalana de Recerca i Estudis Avançats (ICREA), analyzed the language processing patterns of OpenAI’s ChatGPT and found that it mirrors the human consonant bias across different languages.

Exploring ChatGPT’s Phonological Bias

Typically, humans rely more on consonants than vowels for lexical access. This consonant bias appears across different ages (from infants to adults), modalities (oral or written), native languages (English, French, Spanish, Dutch), and tasks (word learning, word reconstruction, masked priming). The question that emerged was whether current artificial large language models, such as OpenAI’s ChatGPT, would also exhibit this consonant bias when processing language.

The study tested ChatGPT by asking it which of two non-words, one created with a vowel change and one with a consonant change, was more similar to a target word. Across 100 different words, ChatGPT showed a strong consonant bias in both English and Spanish. This bias was not built into the model, suggesting that it may be an emergent property of the training process that prepares these models to process language.
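
The trial structure described above can be sketched in Python. This is a minimal illustration, not the study's actual procedure: the substitution rule (swap the first vowel or first consonant for the next letter in a fixed list), the prompt wording, and the names make_trial, build_prompt, and consonant_bias_score are all assumptions made for the example. The letter-based split between vowels and consonants, including treating "y" as a consonant, only approximates phonology.

```python
# Hypothetical stimulus builder for a two-alternative similarity task.
# Letter classes are a rough orthographic stand-in for phonemes (assumption).
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def swap_first(word, pool):
    """Replace the first letter of `word` found in `pool` with the next letter in `pool`."""
    for i, ch in enumerate(word):
        if ch in pool:
            replacement = pool[(pool.index(ch) + 1) % len(pool)]
            return word[:i] + replacement + word[i + 1:]
    raise ValueError(f"{word!r} contains no letter from {pool!r}")


def make_trial(target):
    """Build one trial: a target plus a vowel-changed and a consonant-changed item."""
    target = target.lower()
    return {
        "target": target,
        "vowel_change": swap_first(target, VOWELS),
        "consonant_change": swap_first(target, CONSONANTS),
    }


def build_prompt(trial):
    """Format the question put to the model (wording is an assumption)."""
    return (
        f'Which of these two made-up words is more similar to "{trial["target"]}": '
        f'"{trial["vowel_change"]}" or "{trial["consonant_change"]}"?'
    )


def consonant_bias_score(choices):
    """Fraction of trials in which the vowel-changed item was chosen.

    Choosing the vowel-changed item means the consonants were kept intact,
    so values above 0.5 indicate a consonant bias.
    """
    if not choices:
        raise ValueError("no choices recorded")
    return sum(1 for c in choices if c == "vowel_change") / len(choices)


trial = make_trial("table")
print(build_prompt(trial))
```

With this rule, the target "table" yields "teble" as the vowel-changed item and "vable" as the consonant-changed item. A real stimulus set would also need to check that neither item is an existing word and that the two items are presented in a balanced order.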

ChatGPT’s Emergent Behavior and AI Language Processing

The existence of a phonological bias in advanced chatbots like ChatGPT is an example of emergent behavior in artificial intelligence. This emergent property draws interesting parallels between natural and artificial intelligences in their use of language. While natural languages tend to have more consonants than vowels, AI language models like ChatGPT are not explicitly trained to prefer one over the other.

One proposed explanation is that, during training, large language models like ChatGPT track the relative differences in the distribution of consonants and vowels in natural languages. Because they learn to predict which word most likely follows another in a given context, these models may rely on consonants to disambiguate between possible lexical tokens. This learning process appears similar to how infants generalize grammatical rules based on their experience of and exposure to language.
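
The idea that consonants can do more of the work in telling words apart can be made concrete with a small sketch. It is purely illustrative: the word list is made up for this example and far too small to say anything about real languages, the comparison is letter-based rather than phonological, and it does not reflect how ChatGPT actually tokenizes text. The names skeleton, uniquely_identified, and TOY_WORDS are hypothetical.

```python
from collections import Counter

VOWELS = set("aeiou")


def skeleton(word, keep_vowels):
    """Mask either the consonants (keep_vowels=True) or the vowels of `word` with '_'."""
    return "".join(
        ch if (ch in VOWELS) == keep_vowels else "_"
        for ch in word.lower()
    )


def uniquely_identified(words, keep_vowels):
    """Count the words whose skeleton is shared by no other word in the list."""
    counts = Counter(skeleton(w, keep_vowels) for w in words)
    return sum(1 for w in words if counts[skeleton(w, keep_vowels)] == 1)


# Hand-picked toy lexicon (assumption, not data from the study).
TOY_WORDS = ["table", "planet", "garden", "window",
             "market", "silver", "winter", "basket"]

print("identified by consonants alone:", uniquely_identified(TOY_WORDS, keep_vowels=False))
print("identified by vowels alone:", uniquely_identified(TOY_WORDS, keep_vowels=True))
```

In this toy list, every word can be recovered from its consonants alone, while several words share the same vowel pattern (for example "garden", "market", and "basket"). A different word list could give a different picture, so the sketch shows only what "disambiguating by consonants" means, not that it holds in general.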

Implications and Takeaways for AI Capabilities

The discovery of ChatGPT’s phonological bias points to a similarity between human language processing and AI language models. This parallel may suggest that AI language models are coming to process language in ways that resemble human language abilities.

As AI language models continue to advance, understanding emergent properties like phonological bias provides insight not only into the capabilities of these models but also into the conditions under which some complexities of human language may arise across diverse cognitive systems. This research may lead to better language model training processes, contributing to more efficient, accurate, and contextually sensitive AI language processing.

In conclusion, the consonant bias in ChatGPT is an example of emergent behavior that natural and artificial intelligences share in their use of language. These findings could pave the way for more sophisticated AI language capabilities that better emulate human language processing and could improve how AI systems understand and communicate with humans.

Original Paper

Emergence of a phonological bias in ChatGPT