How data can make immigrants look like criminals

US President Donald Trump plans to collect a lot more data about crimes committed by immigrants.

This will inevitably give him a weapon to use against them, thanks to a peculiarity of crime statistics: If you look for something, you'll almost always find more of it.

Mr Trump recently started two initiatives focused on crime.

He has promised to create a new office in the Department of Homeland Security - called Victims of Immigrant Crime Engagement - to collect data on the transgressions of immigrants.

And in his revised executive order halting visas and refugees from certain countries, he called for a public database on "honour killings", defined as gender-based violence against women by foreign nationals.

It's hard to get to the truth about crime. One could even argue that we don't really have crime data at all. Rather, we have information on arrests and reports, neither of which is a great proxy for actual crimes.

A lot of criminal activity - drug use, small-time theft, trespassing, turnstile jumping - never gets recorded unless a police officer happens to be present.

Most rapes go unreported, and as many as a third of all murders are never solved.

The incompleteness of the data means that what we decide to collect can have a big impact on what we see.

If we spend a lot of time and energy finding and documenting crime committed by a certain sub-population, we'll naturally increase its prominence.

This wouldn't mean that such people are more criminal. They're simply getting a different level of scrutiny.
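To make the point concrete, here is a minimal sketch of two groups that offend at exactly the same rate but are watched with different intensity. Every number in it (the population size, the 5 per cent offence rate, the 10 and 30 per cent detection chances) is a hypothetical chosen for illustration, not a real statistic.

```python
import random

def simulate_recorded_offences(population, true_rate, detection_prob, seed=0):
    """Count offences that end up on record for one group.

    true_rate: chance that a given person commits an offence (hypothetical).
    detection_prob: chance that an offence is noticed and recorded (hypothetical).
    """
    rng = random.Random(seed)
    recorded = 0
    for _ in range(population):
        committed = rng.random() < true_rate
        detected = rng.random() < detection_prob
        if committed and detected:
            recorded += 1
    return recorded

POPULATION = 100_000   # hypothetical group size, same for both groups
TRUE_RATE = 0.05       # hypothetical offence rate, identical for both groups

lightly_watched = simulate_recorded_offences(POPULATION, TRUE_RATE, detection_prob=0.10, seed=1)
heavily_watched = simulate_recorded_offences(POPULATION, TRUE_RATE, detection_prob=0.30, seed=2)

print("recorded offences, lightly watched group:", lightly_watched)
print("recorded offences, heavily watched group:", heavily_watched)
```

Under these assumptions the heavily watched group shows roughly three times as many recorded offences, although the underlying behaviour of the two groups is identical by construction.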

Consider how police departments have focused on nuisance crimes in poor and minority neighbourhoods - part of a broader strategy known as "broken windows policing".

Blacks ended up getting arrested for smoking marijuana a lot more often than whites - even though people of both races actually use the stuff at about the same rate.

Similarly, the Chicago Police Accountability Task Force found that black drivers were much more likely than white drivers to be stopped on suspicion of carrying contraband, even though they were less likely to be actually carrying contraband.

Despite the obvious flaws in arrest data, we still use it in designing policies.

Police departments send more officers to areas where they make the most arrests. Judges consider previous arrests in deciding how harshly to sentence an offender.

Computer algorithms use the data to predict where crimes will occur ("predictive policing"), determine how much bail to demand and whether to free prisoners on parole ("recidivism risk").

All those decisions are as biased as the data on which they are based - an ongoing problem for poor people and minorities, who find themselves increasingly surveilled and incarcerated.
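The loop between arrests and deployment can be sketched as a toy model. It assumes two districts with identical underlying crime, a fixed pool of officers split in proportion to last round's arrests, and arrests that scale with the number of officers present. The function name, parameters and figures are all hypothetical, and this is not a description of how any real predictive-policing system works.

```python
def patrol_feedback(initial_arrests, true_crime, total_officers=100,
                    rounds=5, arrests_per_officer=0.5):
    """Return the officer allocation per district for each round.

    Officers are split in proportion to the previous round's arrests;
    new arrests scale with officers present and underlying crime.
    All parameters are illustrative assumptions.
    """
    arrests = list(initial_arrests)
    history = []
    for _ in range(rounds):
        total = sum(arrests)
        officers = [total_officers * a / total for a in arrests]
        arrests = [o * c * arrests_per_officer
                   for o, c in zip(officers, true_crime)]
        history.append(officers)
    return history

# Two districts with the same underlying crime level, but district A
# starts with more arrests on record (60 versus 40).
history = patrol_feedback(initial_arrests=[60, 40], true_crime=[1.0, 1.0])

for round_number, officers in enumerate(history, start=1):
    print(round_number, [round(o, 1) for o in officers])
```

In this sketch the 60/40 split never moves back towards 50/50: the arrest records keep justifying the deployment that produced them, even though the two districts are equally crime-prone by assumption.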

Perhaps you've already heard the statistic that immigrants are involved in less crime than native-born Americans.

If we start overscrutinising immigrants from Muslim-majority countries, the numbers might well change to their detriment, giving the Trump administration the fodder it needs to engage in yet more profiling purported to ensure the nation's security.

To be fair, and to be scientific about it, we should choose another sub-population for equal focus, so we can measure the effects of our added attention. I suggest starting with politicians.

BLOOMBERG

• The writer is a mathematician who has worked as a professor, hedge fund analyst and data scientist. She founded ORCAA, an algorithmic auditing company, and is the author of Weapons Of Math Destruction.

A version of this article appeared in the print edition of The Straits Times on March 11, 2017, with the headline 'How data can make immigrants look like criminals'.