Much of the time, machine learning and artificial intelligence researchers don't understand exactly how their programs work. In a paper released on the pre-print site arXiv, researchers reported that DALL-E 2 has a well-known problem with text: many text prompts, such as one asking for a "picture of the word aircraft," produce images containing nonsense text.
In this generated text, they find what appears to be a secret language the system has formed on its own. Fed this nonsense text as a prompt, the model is more likely to produce images of aircraft.
DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.

The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.
A thread (1/n)🧵 pic.twitter.com/VzWfsCFnZo
— Giannis Daras (@giannis_daras) May 31, 2022
When asked to generate an image of two farmers having a conversation, with subtitles, the model shows them chatting, but the speech bubbles are filled with what appears to be gibberish, as seen in an image posted on Twitter by computer science doctoral student Giannis Daras.
A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.
However, the text is not as random as it initially appears… (2/n) pic.twitter.com/B3e5qVsTKu
— Giannis Daras (@giannis_daras) May 31, 2022
To test how the AI interprets these meaningless phrases, Daras began feeding them back into the model as prompts. He discovered that the gibberish seemed to carry consistent meaning: the supposed words of the farmers reliably produced images of vegetables and birds.
His research raises an immediate security concern: nonsensical prompts could serve as backdoor exploits or as techniques to evade content filters. Text prompts that violate policy guidelines can currently be caught by natural language processing tools, but nonsensical prompts could slip past these checks. More importantly, nonsense prompts that reliably generate coherent images call into question how much we can trust these large generative models.
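The filter-evasion concern can be illustrated with a minimal sketch. The blocklist, function name, and filtering logic below are hypothetical simplifications, not any system DALL-E 2 actually uses; they only show how a naive keyword filter would block a plain-language prompt while passing the gibberish token that the model appears to associate with the same concept.

```python
# Hypothetical sketch of a naive keyword-based prompt filter.
# BLOCKED_TERMS and passes_filter are illustrative, not a real API.

BLOCKED_TERMS = {"bird", "birds"}  # hypothetical policy blocklist

def passes_filter(prompt: str) -> bool:
    """Return True if no blocked term appears in the prompt."""
    words = prompt.lower().split()
    return not any(word.strip(".,") in BLOCKED_TERMS for word in words)

# A plain-language prompt is caught by the filter...
print(passes_filter("a photo of birds"))       # False
# ...but the gibberish prompt that yields the same kind of
# image slips straight through the keyword check.
print(passes_filter("Apoploe vesrreaitais"))   # True
```

The point of the sketch is that the filter operates on the surface text of the prompt, while the model's behavior depends on associations the filter knows nothing about.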
It is also possible that, in some cases, the gibberish is closer to noise than to language. Peer review of the paper will tell us more, but something may be happening here that we do not yet understand.