Gregory P. Bufithis, Esq.
Founder/CEO
Sarcasm is almost impossible for computers to spot. A mathematical approach to linguistics could change that.
20 October 2016 – Back in 1970, the social activist Irina Dunn scribbled a slogan on the back of a toilet cubicle door at the University of Sydney. It said: “A woman needs a man like a fish needs a bicycle.” The phrase went viral and eventually became a famous refrain for the growing feminist movement of the time.
The phrase is also an example of sarcasm. The humor comes from the fact that a fish doesn’t need a bicycle. Most humans have little trouble spotting this. But while various advanced machine learning techniques have helped computers spot other forms of humor, sarcasm still largely eludes them. These other forms of humor can be spotted by looking for, say, positive verbs associated with negative or undesirable situations. And some researchers have used this approach to look for sarcasm.
But sarcasm is often devoid of sentiment. The phrase above is a good example — it contains no sentiment-bearing words. So a new strategy is clearly needed if computers are ever to spot this kind of joke.
Aditya Joshi at the Indian Institute of Technology Bombay in India, and a few pals, say they’ve hit on just such a strategy. They say their new approach dramatically improves the ability of computers to spot sarcasm. They have published a paper on my favorite artificial intelligence/computer science/mathematics/physics site arxiv.org entitled “Are Word Embedding-based Features Useful for Sarcasm Detection?” I have a direct link to the paper at the end of this post but here is a summary:
Their method is relatively straightforward. Instead of analyzing the sentiment in a sentence, Joshi and co analyze the similarity of the words. They do this by studying the way words relate to each other across a vast corpus of Google News stories, distilled into a vocabulary of some three million words and phrases. This is the well-known pre-trained Word2Vec database.
NOTE: Word2vec will be recognized by my e-discovery clients. It is part of a neural network system developed by Google which I wrote about in my client summary of DLD Tel Aviv (public version here), sort of the “next new thing” in machine learning and the reconstruction of the linguistic contexts of words. It is part of an algorithm system using a souped-up latent semantic analysis technique that is tipped to give text analytics something better than predictive coding. “Game-changing”? Oh, please. Early stages. We’ll see. But this is more attuned to “storytelling”, the new mantra in e-discovery. It was discussed last week in Stockholm, Sweden at the annual Intelligent Data Analysis Symposium. I am awaiting more info from my staffer who attended.
This database has been analyzed extensively to determine how often words appear next to each other. This allows each word to be represented as a vector in a high-dimensional space. It turns out that similar words can be represented by similar vectors and that vector space mathematics can capture simple relationships between them. For example, “king – man + woman = queen.” Although there are clear differences between the words “man” and “woman,” they occupy similar parts of the vector space. However, the words “bicycle” and “fish” occupy entirely different parts of the space and so are thought of as very different.
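For readers who want to poke at this themselves, here is a minimal sketch using the gensim library and the standard pre-trained Google News vectors (the file name below is the publicly released binary; adjust the path to wherever your copy lives, and treat the exact scores as illustrative):

```python
from gensim.models import KeyedVectors

# Load the pre-trained Google News embeddings (assumes you have
# downloaded the standard ~3.6 GB binary release).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=1))
# expect something like [('queen', 0.71)]

# Similar words occupy similar parts of the space...
print(vectors.similarity("man", "woman"))      # high cosine similarity
# ...while dissimilar words do not.
print(vectors.similarity("fish", "bicycle"))   # much lower similarity
```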
According to Joshi and co, sentences that contrast similar concepts with dissimilar ones are more likely to be sarcastic.
To test this idea, they studied the similarity between words in a database of quotes from the Goodreads website. The team chose only quotes that had been tagged “sarcastic” by readers and, as a control, also included quotes tagged “philosophy.” This resulted in a database of 3,629 quotes, of which 759 are sarcastic. The team then compared the word vectors in each quote, looking for similarities and differences.
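The paper’s actual feature set is richer than this (the authors also weight similarity scores by the distance between words and combine them with features from earlier sarcasm work), but a simplified sketch of the core feature, reusing the `vectors` object from the earlier snippet, might look like this:

```python
from itertools import combinations

def similarity_features(sentence, vectors):
    """Return (max, min) pairwise word-vector similarity for a sentence.

    On this theory, a sarcastic sentence tends to contain both very
    similar and very dissimilar word pairs, so a large spread between
    the two numbers is suggestive.
    """
    # Keep only distinct, in-vocabulary words.
    words = sorted({w for w in sentence.lower().split() if w in vectors})
    sims = [vectors.similarity(a, b) for a, b in combinations(words, 2)]
    if not sims:
        return None
    return max(sims), min(sims)

# Reusing the `vectors` object loaded in the earlier snippet:
print(similarity_features(
    "a woman needs a man like a fish needs a bicycle", vectors))
```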
The results make for interesting reading. Joshi and co say this word embedding approach is significantly better than other techniques at spotting sarcasm. “We observe an improvement in sarcasm detection,” they say. The new approach isn’t perfect, of course. And the errors it makes are instructive. For example, it did not spot the sarcasm in the following quote: “Great. Relationship advice from one of America’s most wanted.” That’s probably because many of these words have multiple meanings that the Word2Vec embedding does not capture.
Another sarcastic sentence it fails to spot is: “Oh, and I suppose the apple ate the cheese.” In this case, apple and cheese have a high similarity score and none of the word pairs shows a meaningful difference. So this example does not follow the rule that the algorithm is designed to search for.
The algorithm also incorrectly identifies some sentences as sarcastic. Joshi and co point to this one, for example: “Oh my love, I like to vanish in you like a ripple vanishes in an ocean—slowly, silently and endlessly.” Humans had not tagged this as sarcastic. However, it is not hard to imagine this sentence being used sarcastically.
Overall, this is interesting work that suggests some directions for future research. In particular, it would be fascinating to use this kind of algorithm to generate sarcastic sentences and perhaps use human judges to decide whether or not they work as sarcasm. Beyond that lies the task of computational humor itself. That’s an ambitious goal, but perhaps one that is not entirely out of reach. Much humor is formulaic, so an algorithm ought to be able to apply such a formula with ease. Yeah. Right.
The pace of artificial intelligence and language analysis is intriguing, but we have far to go. No artefact exists today that has anything approaching human-level intelligence and understanding. While there are computer programs that can out-perform humans in highly demanding yet specialized intellectual domains … such as playing the game of Go … no computer or robot today can match the generality of human intelligence. Not there yet. The English language gives a fair amount of leeway in expressing ideas compared to other, more restrictive languages. That makes it lively, and it tends to adopt new words rather quickly, but it also sometimes makes it equivocal, so that the true meaning of a text can be difficult to comprehend.
As a concluding thought, let me quote a paragraph from Marcel Proust’s On Reading … yes, he of the breath-defying sentences. One of my AI professors uses passages from Proust to explain her points on AI and language. Proust being one of my favorite French authors, the connections work brilliantly for me. The passage (read it as fast as you can):
There are perhaps no days of our childhood we lived so fully as those … we spent with a favorite book. Everything that filled them for others, so it seemed, and that we dismissed as a vulgar obstacle to a divine pleasure: the game for which a friend would come to fetch us at the most interesting passage; the troublesome bee or sun ray that forced us to lift our eyes from the page or to change position; the provisions for the afternoon snack that we had been made to take along and that we left beside us on the bench without touching, while above our head the sun was diminishing in force in the blue sky; the dinner we had to return home for and during which we thought only of going up immediately afterward to finish – we never become the same; sometimes we’re saddened, but we are always enriched. Through this exposure we learn both the commonality and thrust of our own thoughts – that we are individuals, but not alone. The moment this happens, we are no limited by the confines of our own selves. Wherever we are set, our banal boundaries are challenged, and gradually placed somewhere new. We gain a sense of who we are, and, most importantly for children, what we can be.
Let’s go back to what you did. If you are like me, Proust conjured up your own long-stored memories of books and perhaps of secret places to read without intrusion. You engaged an array of mental and cognitive processes, memory and linguistic alike. Promptly, your brain’s attentional systems began planning how to read Proust speedily and still understand it. Next, your visual system swooped quickly across the page, forwarding its letter and word forms, and common phrases, to linguistic systems awaiting the information.
And you no doubt also spotted the deliberate typo in the third sentence from the bottom: “… we are not limited by the confines of our own …”.
These systems connected the differentiated visual symbols with essential information about the sounds contained in the words. Without a moment of conscious awareness, your highly automatic rules about the sounds of letters in the English system clicked … and used a great many linguistic processes to do so.
This is the essence of what is called the alphabetic principle and it depends on your brain’s uncanny ability to connect and integrate at high speeds what it sees and what it hears to what it knows. And this, in essence, is what we are trying to train AI to do.
And it’s a problem because AI technologies must overcome those three challenges I outlined above: language, context, and reasoning. Today’s AI technologies … yes, even including IBM Watson and Google AlphaGo … struggle to process language the way that humans do. That’s because the large majority of the current implementations approach text as data, not as language. They apply the same techniques that worked on spatial data to text. Current methods for natural language processing are largely driven by computational statistics. These methods don’t attempt to understand the text, but instead convert the text into data, then attempt to learn from patterns in that data. In the conversion process, we lose all context and meaning in the text. The assumption behind such approaches is clearly that, given sufficiently large collections of text, all possible permutations and combinations of meaning must be present. Thus, discovering word-based patterns should reveal the intelligence in the text, which can then be acted upon. Unfortunately, that outcome doesn’t occur in most real-world situations.
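To make the point concrete, here is a small, illustrative sketch using scikit-learn’s CountVectorizer: a bag-of-words representation discards word order entirely, so two sentences with opposite meanings collapse into the identical vector.

```python
from sklearn.feature_extraction.text import CountVectorizer

# "Text as data, not as language": the bag-of-words conversion keeps
# only word counts, so word order -- and with it, meaning -- is lost.
docs = ["the dog bit the man", "the man bit the dog"]
bow = CountVectorizer().fit_transform(docs).toarray()

print(bow[0])                      # counts over {bit, dog, man, the}
print((bow[0] == bow[1]).all())    # True: the two sentences are now identical
```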
But the approach is changing, dramatically, on many fronts, especially with hybrid systems that pair the existing data-storage capacity of conventional computers with smart AI and neural networks. I will have a much more detailed piece on all of this coming up in my artificial intelligence series.
Oh, yes. Sarcasm. To read the full sarcasm study click below:
“Are Word Embedding-based Features Useful for Sarcasm Detection?”