From dating profiles to Brexit - how to spot an online lie

There are three things you can be sure of in life: death, taxes - and lying. The last of these certainly appears to have been borne out by the United Kingdom's recent Brexit referendum, with a number of the Leave campaign's pledges looking more like porkie pies than solid truths.

But from Internet advertising, visa applications and academic articles to political blogs, insurance claims and dating profiles, there are countless places we can tell digital lies. So how can one go about spotting these online fibs?

Well, Mr Stephan Ludwig from the University of Westminster, Mr Ko de Ruyter from City University London's Cass Business School, Mr Mike Friedman of the Catholic University of Louvain and yours truly have developed a digital lie detector - and it can uncover a whole host of Internet untruths.

In our new research, we used linguistic cues to compare tens of thousands of e-mails pre-identified as lies with those known to be truthful. From this comparison, we developed a text-analytic algorithm that can detect deception. It works on three levels.


Keyword searches can be a reasonable approach when dealing with large amounts of digital data. So, we first uncovered differences in word usage between the two document sets. These differences identify text that is likely to contain a lie. We found that individuals who lie generally use fewer personal pronouns such as I, you and he/she, and more adjectives such as brilliant, fearless and sublime. They also use fewer first-person singular pronouns such as I, me and mine in combination with discrepancy words such as could, should and would, and more second-person pronouns (you, your) in combination with achievement words (earn, hero, win).

Fewer personal pronouns indicate an author's attempt to dissociate himself from his words, while using more adjectives is an attempt to distract from the lie through a flurry of superfluous descriptions.

Fewer first-person singular pronouns combined with discrepancy words indicate a lack of subtlety and a positive self-image, while more second-person pronouns combined with achievement words indicate an attempt to flatter recipients. We therefore included these combinations of search terms in our algorithm.
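
A minimal sketch of this kind of keyword counting, in Python, might look like the following. It is an illustration only and not the algorithm from our research: the word lists contain nothing but the examples mentioned above, and the names LEXICONS, tokenize and cue_rates are hypothetical.

```python
import re

# Illustrative lexicons built only from the example words in this article.
# Real category dictionaries would be far larger (assumption).
LEXICONS = {
    "personal_pronouns": {"i", "you", "he", "she"},
    "first_person_singular": {"i", "me", "mine"},
    "second_person": {"you", "your"},
    "adjectives": {"brilliant", "fearless", "sublime"},
    "discrepancy": {"could", "should", "would"},
    "achievement": {"earn", "hero", "win"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cue_rates(text):
    """Return, for each category, the share of words that belong to it."""
    words = tokenize(text)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {
        name: sum(word in lexicon for word in words) / total
        for name, lexicon in LEXICONS.items()
    }


rates = cue_rates("You could win, you brilliant hero")
print(rates)
```

Rates like these could then be compared between a set of known lies and a set of known truths; how the cues are weighted and combined is left open here.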


Another part of the solution lay in analysing the variance of cognitive process words, such as cause, because, know and ought - and we identified a relationship between structure words and lies.

Liars cannot generate deceptive e-mail from actual memory, so they avoid spontaneity to evade detection. That does not mean that liars use more cognitive process words overall than people who are telling the truth, but they do include these words more consistently. For example, they tend to connect every sentence to the next - "we know this happened because of this, because this ought to be the case". Our algorithm detects such usage of process words in communications.
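
One way to picture "consistency" is to measure how much the share of cognitive process words varies from sentence to sentence. The Python sketch below does this under simplifying assumptions: the word list holds only the four examples given above, sentences are split on full stops, question marks and exclamation marks, and the function names are hypothetical. It is not the measure used in our research.

```python
import re
from statistics import pvariance

# Only the example process words named in this article (assumption:
# a real lexicon would be much longer).
PROCESS_WORDS = {"cause", "because", "know", "ought"}


def process_word_rates(text):
    """Share of process words in each sentence of the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    rates = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if words:
            hits = sum(word in PROCESS_WORDS for word in words)
            rates.append(hits / len(words))
    return rates


def process_word_variance(text):
    """Population variance of the per-sentence rates (0.0 for empty text)."""
    rates = process_word_rates(text)
    return pvariance(rates) if rates else 0.0


consistent = process_word_variance("I know it. You know it.")
varied = process_word_variance("I know it. The cat sat.")
print(consistent, varied)
```

A lower variance means the process words are spread more evenly across sentences, which is the pattern described above for deceptive messages.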


We also studied the ways in which a sender of an e-mail message alters his linguistic style while exchanging a number of e-mail messages with someone else. This part of the study revealed that the longer the exchange went on, the more the sender tended to use the function words that the receiver was using.

Function words are words that contribute to the syntax, or structure, rather than the meaning of a sentence - for example an, am and to. And senders revised the linguistic style of their messages to match that of the receiver. As a consequence, our algorithm identifies and collects such matching.
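
Style matching of this kind can be sketched in Python by comparing the share of function words in each sender message with the share in the reply it is paired with. The formula below is one simple similarity score chosen for illustration, not necessarily the one in our algorithm; the function word list goes beyond the three examples above and is an assumption, as are all the function names.

```python
import re

# "an", "am" and "to" come from the article; the rest are added here
# purely for illustration (assumption).
FUNCTION_WORDS = {"a", "an", "the", "am", "is", "are", "to", "of", "in", "and"}


def function_word_rate(text):
    """Share of words in the text that are function words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in FUNCTION_WORDS for word in words) / len(words)


def style_match(sender_text, receiver_text):
    """Score near 1 when both texts use function words at similar rates."""
    s = function_word_rate(sender_text)
    r = function_word_rate(receiver_text)
    # The small constant keeps the division defined when both rates are 0.
    return 1 - abs(s - r) / (s + r + 0.0001)


def matching_over_exchange(turns):
    """Score each (sender message, receiver message) pair in order."""
    return [style_match(sender, receiver) for sender, receiver in turns]


exchange = [
    ("Buy cheap watches now", "I am going to the park"),
    ("I am going to the shop", "I am going to the park"),
]
print(matching_over_exchange(exchange))
```

Scores that climb over the course of an exchange would correspond to the sender drifting towards the receiver's style.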


Consumer watchdogs can use this technology to assign a "possibly lying" score to advertisements of a dubious nature.

Security companies and national border forces can use the algorithm to assess documents, such as visa applications and landing cards, to better monitor compliance with access and entry rules and regulations.

Secretaries of higher education exam committees and editors of academic journals can improve their proofing tools for automatically checking student theses and academic articles for plagiarism.

In fact, the potential applications go on and on. Political blogs can successfully monitor their social media interactions for textual anomalies, while dating and review sites can classify messages submitted by users on the basis of their "possibly lying" score. Insurance companies can make better use of their time and resources available for claim auditing. Accountants, tax advisers and forensic specialists can investigate financial statements and tax claims and find deceptive smoking guns through our algorithm.

Humans are startlingly bad at consciously detecting deception. Indeed, human accuracy when it comes to spotting a lie is just 54 per cent, hardly better than chance. Our digital lie detector, meanwhile, is 70 per cent accurate. It can be put to work to fight fraud wherever it occurs in computerised content and, as the technology evolves, its Pinocchio warnings can be wholly automated and its accuracy will increase even further. Just as Pinocchio's nose reflexively signalled falsehood, so does our digital lie detector. Fibbers beware.

• The writer is a senior lecturer in marketing at City University London. This article first appeared on a website of analysis from academics and researchers.

A version of this article appeared in the print edition of The Sunday Times on July 03, 2016, with the headline 'From dating profiles to Brexit - how to spot an online lie'.