Predicting prison terms and parole
Imagine your future was determined by statistics fed into and spit out by a computer. That is exactly what is happening to convicted criminal defendants and those scheduled to be released from prison, through algorithmic programs, known as predictive analytics, being used by criminal justice professionals throughout the country. Criminal justice algorithms – sometimes called risk assessment tools (RATs) – are used at every stage of the criminal justice process, ranging from pre-trial release decisions to determining where a defendant should be sent. At their heart, these sorts of tools were created to help with taxing workloads, to cut government costs, and to eliminate human bias.

Richard Berk, a professor of criminology and statistics at the University of Pennsylvania, designed various algorithms currently being used by the Pennsylvania criminal justice system. His programs are used by the probation and parole department to determine what level of supervision to provide someone once they are released.

An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Berk's algorithms were created using machine learning, in which an algorithm searches for associations between inputs and outputs. Factors like age, prior record and an individual's most recent crime – not including the one for which they are currently being arrested – are factored into the equation. The algorithm then searches historical records for similar people and crimes, and whether or not those individuals got into trouble once they were released from prison.

"Once those associations are established, then subsequently when you don't know whether someone is going to get in trouble or not, you put their characteristics back into the algorithm, and it spits out whether they're a risk or not," Berk said.

Berk acknowledged that algorithms that project risk will sometimes make mistakes and will appear to be biased, and that to some they will be a black box. But he pointed out that the human mind is a black box too, and decision makers often carry biases they are not aware of – unconscious bias. In addition, an algorithm can weigh hundreds of risk factors at once, while the human brain is much more limited, and it can be recalibrated as circumstances change. An example Berk gave is how new laws in Pennsylvania now distinguish between domestic violence and general assault, and that statutory difference affects sentencing.

"So an algorithm that's, say, trying to predict domestic violence before that distinction was made, it's going to get different results than an algorithm developed after that distinction was made," Berk said. "You have to start keeping an eye on things that are changing that might affect the credibility of your forecast, and when things change you have to recalibrate your algorithm."

Making sure the algorithm is used correctly is vital to its success and accuracy. When asked what benefits using an algorithm could bring, Berk said it's more accurate, more fair, and more transparent. "They're just better," he said.

Not every expert agrees.

"I don't think any individual person who comes before the criminal justice system should be identified as a risk or as someone who is likely to reoffend or someone who's a danger to be put on probation or parole based upon some statistical model," said Neil Rockind, criminal defense lawyer, Rockind Law in Bloomfield Hills.
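Berk's description is, at bottom, standard supervised machine learning: fit a model to historical records where the outcome is already known, then apply it to someone whose outcome is not. A minimal sketch of that train-then-predict pattern – using made-up records, hypothetical factor names and a generic off-the-shelf classifier from scikit-learn, not Berk's actual model or data – might look like this:

```python
# Illustrative sketch only -- not Berk's model or data. It shows the pattern he
# describes: learn associations between offender characteristics (inputs) and
# post-release outcomes (outputs), then apply them to a new, unknown case.
from sklearn.ensemble import RandomForestClassifier

# Hypothetical historical records: [age at release, number of prior offenses,
# severity of most recent prior offense (0 = minor ... 3 = serious violent)]
X_history = [
    [19, 4, 3],
    [42, 1, 0],
    [23, 6, 2],
    [35, 0, 0],
    [28, 3, 1],
    [51, 2, 0],
]
# Whether each of those people "got in trouble" after release (1 = yes, 0 = no)
y_history = [1, 0, 1, 0, 1, 0]

# Fit a generic classifier that searches for associations between the inputs
# and the outcome.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_history, y_history)

# A new individual whose outcome is unknown: put their characteristics back
# into the algorithm and it "spits out" an estimated risk.
new_person = [[26, 5, 2]]
risk = model.predict_proba(new_person)[0][1]
print(f"Estimated risk of post-release trouble: {risk:.0%}")
```

The sketch only illustrates the workflow Berk describes; the real tools are trained on far larger histories and many more risk factors, and – as critics point out later in this story – the output is only as good as the historical records behind it.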
In 2014, Attorney General Eric Holder submitted a statement to the United States Sentencing Commission stating, "As analytical tools transform risk assessment instruments, there is great potential for their use, but also great dangers."

"It's a nice theory in some ways because it's a response to the idea in the past that we had parole boards that would consider who should get out early, and there was a concern it was just political things where people would get out if they were friends with so-and-so, and this takes the politics out of it, the discrimination out of it, and bases it on math and empirical data," said Jim Felman, a criminal defense attorney in Tampa, Florida. "But the reality of it is there's just a lot of questions about whether that's really possible or whether it's just masking the same sorts of discrimination and hiding it."

According to the Electronic Privacy Information Center, a non-profit research center focused on emerging privacy and related human rights issues, the most commonly used are pre-trial risk assessment tools, which are utilized in nearly every state. According to a database recently released by the Media Mobilizing Project in Philadelphia and MediaJustice in Oakland, California, over 60 percent of the U.S. population lives in a jurisdiction that uses a risk assessment tool. Only four states do not – Arkansas, Massachusetts, Mississippi and Wyoming.

RATs can be both "sophisticated mathematical formulas, run by computers, as well as straightforward scoring guides for questions on checklists that are asked by a court officer before bail hearings," as defined by the Community Justice Exchange's recently released guide to confronting pre-trial risk assessment tools. The latter is meant to be a predictive tool, no different from its more mathematical counterpart.

Pre-trial tools are used to predict the likelihood of certain outcomes if a person accused of a crime is released from jail. Most pre-trial risk assessments measure the risks of failing to appear in court and of engaging in new criminal activity. The outcomes produced by a pre-trial risk assessment depend on which tool is being used. Five of the most commonly used in the U.S., according to the database by the Media Mobilizing Project and MediaJustice, are the Public Safety Assessment (PSA), the Ohio Risk Assessment System-Pretrial Assessment Tool (ORAS-PAT), the Virginia Pretrial Risk Assessment Instrument (VPRAI), the Virginia Pretrial Risk Assessment Instrument-Revised (VPRAI-R), and Correctional Offender Management Profiling for Alternative Sanctions (COMPAS).

In Oakland County, Michigan PRAXIS – a risk-based tool – has been used since 2009 to guide pre-trial services investigators making bail recommendations. Its recommendations include release, release with conditions, release on recognizance not recommended, or bond denied. Michigan PRAXIS was based on VPRAI, which was developed by Dr. Marie VanNostrand, who runs Luminosity, Inc., with the Virginia Department of Criminal Justice Services. Oakland County contracted with Dr. VanNostrand to adapt the tool for local implementation. The factors selected – which range from the type of charge to having a criminal history or a history of drug abuse – are based on those used in VPRAI. Each of the eight factors is worth one point added to the risk score, except for history of failure to appear, which is worth two points.
A defendant can receive as many as nine points, and the total score determines whether the risk level is low, average or high. The risk level is designed to be used only internally.

The data set used to conduct the original research for VPRAI came from a sample of defendants arrested in one of seven localities in Virginia. According to a 2009 VPRAI report, those localities varied in community characteristics, including type of community, sex, race, percentage of people below the poverty level, and education level. The final sample included almost 2,000 adults and was finalized in 2001. While pre-trial outcome was the dependent variable, there were 50 independent variables, or risk factors. Univariate, bivariate and multivariate analyses were used to determine the statistically significant predictors of pre-trial outcome.

According to the report, "The univariate analysis included descriptive statistics of the dependent variable, pretrial outcome – success or failure pending trial, and each independent variable, risk factor. The bivariate analysis included an examination of the relationship between each risk factor and pretrial outcome. The risk factors found to be statistically significantly related to pre-trial outcome were identified and used to conduct the multivariate analysis. The multivariate technique logistic regression was used to identify nine statistically significant predictors of pre-trial outcome." Those nine predictors were then assigned weights and used to determine risk levels.

Michigan PRAXIS eliminated the outstanding warrant(s) factor, which was also removed from the revised VPRAI. Michigan PRAXIS uses only three risk levels – low, average and high – compared to VPRAI's five suggested risk levels.

Eric Schmidt, chief of field operations, Oakland County Community Corrections, said they believe this tool is more beneficial than others, like the PSA used by Wayne County, because it is interview-based. "It allows the individual being scored to reflect additional critical information that might indicate aggravating or mitigating circumstances," he said. Aggravating or mitigating circumstances are additional considerations put into the reports, which go in the court file. "Someone might score high but they could have those mitigating factors, so we inform the courts of those things," he said.

The report lists each of the eight factors and indicates in written language – such as by stating that the defendant is charged with a felony offense – which ones apply to that defendant. The report also includes whether the recommendation is release, release with conditions, release on recognizance not recommended, or bond denied. With PRAXIS, no individual is ever labeled as low, average or high risk, except internally. "We think that that brings in an emotional response to the judicial officer...if you say someone is high risk, what does that make somebody think? This person's a problem," Schmidt said.

Before Oakland County began using Michigan PRAXIS in 2009, Schmidt said, investigators made completely subjective recommendations, following court rules and other case law, through interviews and verified information, much as they do now. Prior to the current program, they also recommended a dollar amount for bail, believing that would help the court understand the lowest amount at which a person could gain release while still having an incentive to appear.
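For a sense of how simple the arithmetic behind a checklist-style tool can be, here is a minimal sketch of a PRAXIS/VPRAI-like tally in Python. The factor names and the cut points separating low, average and high are hypothetical stand-ins; the article establishes only that eight yes/no factors are each worth one point, except history of failure to appear, which is worth two, for a maximum of nine.

```python
# Illustrative sketch of a checklist-style pre-trial scoring tool in the spirit
# of VPRAI/Michigan PRAXIS. Factor names and the low/average/high thresholds
# are hypothetical; only the point structure (eight factors, one point each,
# two for history of failure to appear) comes from the article.
FACTOR_POINTS = {
    "charged_with_felony": 1,
    "pending_charge": 1,
    "criminal_history": 1,
    "prior_incarceration": 1,
    "unstable_housing_or_employment": 1,
    "history_of_drug_abuse": 1,
    "prior_violent_conviction": 1,
    "history_of_failure_to_appear": 2,  # the one factor worth two points
}

def praxis_style_score(answers: dict) -> tuple[int, str]:
    """Add up points for the factors that apply, then bin the total."""
    total = sum(pts for factor, pts in FACTOR_POINTS.items() if answers.get(factor))
    if total <= 2:          # hypothetical cut points -- not the real thresholds
        level = "low"
    elif total <= 5:
        level = "average"
    else:
        level = "high"
    return total, level

# Example: a defendant charged with a felony, with a criminal history and one
# prior failure to appear, as gathered through the interview PRAXIS relies on.
answers = {
    "charged_with_felony": True,
    "criminal_history": True,
    "history_of_failure_to_appear": True,
}
print(praxis_style_score(answers))  # -> (4, 'average')
```

In the actual instrument the factors and their weights come out of the logistic regression analysis described above, and the report delivered to the court spells out the applicable factors in writing rather than handing the judge the numeric level.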
Oakland County learned those earlier practices didn't hold much weight in research.

"It's important to know that although there may be risk assessments that are conducted, you have to understand what they're for, what they're specifically designed for," said Paul Walton, Chief Oakland County Deputy Prosecutor. "But in all cases, although they may be required to compute or calculate, they are not mandatory on the decision-maker to use."

While Oakland County has taken a more evidence-based and humanistic approach to pre-trial risk assessment, other counties in the state, like Wayne, and numerous jurisdictions across the country use the PSA, which was developed in 2013 by the Laura and John Arnold Foundation. The PSA relies primarily upon court and police data to assess an individual's risk, and is scored without speaking to the defendant. The tool measures three different areas – risk of failure to appear, risk of new criminal activity, and risk of new violent criminal activity. The first two outcomes are scored on a scale of 1 to 6, while the last is simply given a "yes" or "no" flag.

The PSA was developed using over 750,000 cases from seven state court systems – Colorado, Connecticut, Florida, Kentucky, Maine, Ohio and Virginia – as well as the federal pre-trial system and Washington D.C. The initial algorithm was built on computers using those hundreds of thousands of cases from various jurisdictions to determine risk factors. When used in courts, though, it's much simpler than that. "It's really just checking off boxes, adding up points, and arriving at a risk score," said Colin Doyle, a staff attorney at the Criminal Justice Policy Program at Harvard Law School, who works on bail and pre-trial reform across the country at the local and state level.

Unlike Michigan PRAXIS, tools like the PSA don't always allow for explanation or comments about the risk factors being used to produce the predictive outcome. That is where organizations like Silicon Valley De-Bug come in. Located in San Jose, California – where an algorithm is used during arraignment hearings – the organization brings to hearings personal information obtained from interviews with defendants' families. That information is then shared with defenders. "The algorithms and the use of the tools sort of strip away people's humanity and also the context," said Pilar Weiss, director, Community Justice Exchange.

The Community Justice Exchange, a national organization that provides support to community-based organizations, is one of many groups leading the fight against algorithms being employed in the criminal justice system. There's also the Media Mobilizing Project and MediaJustice, which recently released a database on risk assessment tools being used across the U.S. Media Mobilizing Project's policy director, Hannah Jane Sassaman, said their research started in 2016, and the information was harder to obtain than they originally thought it would be. What started as a plan to look at 40 to 50 cities grew into a database with information from over 300 jurisdictions covering 1,000 counties across the country. The study also included full-length interviews with 38 jurisdictions. In those interviews, researchers found that most of the people they talked to did not have data on whether the analytic tools were making their jail populations smaller or changing racial disparities.
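Doyle's description of how the PSA works in a courtroom – checking off boxes, adding up points, arriving at a risk score – can be made concrete with a short sketch. The factors, point values and 1-to-6 scaling below are hypothetical stand-ins, not the Arnold Foundation's actual formula; the article specifies only that the tool is scored from court and police records without an interview and that it reports two 1-to-6 scales plus a yes/no violence flag.

```python
# Illustrative sketch of a PSA-style assessment scored entirely from
# administrative records, with no defendant interview. All factor names, point
# values and scaling choices here are hypothetical.
from dataclasses import dataclass

@dataclass
class CourtRecord:
    age: int
    pending_charge_at_arrest: bool
    prior_convictions: int
    prior_failures_to_appear: int
    prior_violent_convictions: int
    current_offense_violent: bool

def scale_1_to_6(points: int, max_points: int) -> int:
    """Compress a raw point tally onto the 1-6 scale the tool reports."""
    return max(1, min(6, 1 + round(5 * points / max_points)))

def psa_style_assessment(r: CourtRecord) -> dict:
    """Check off boxes, add up points, arrive at a risk score."""
    fta_points = (2 * min(r.prior_failures_to_appear, 2)
                  + (1 if r.pending_charge_at_arrest else 0)
                  + min(r.prior_convictions, 3))
    nca_points = ((2 if r.age < 23 else 0)
                  + (1 if r.pending_charge_at_arrest else 0)
                  + min(r.prior_convictions, 3)
                  + (2 if r.prior_violent_convictions else 0))
    violent_flag = r.current_offense_violent or r.prior_violent_convictions > 0
    return {
        "failure_to_appear_scale": scale_1_to_6(fta_points, max_points=8),
        "new_criminal_activity_scale": scale_1_to_6(nca_points, max_points=8),
        "new_violent_activity_flag": "yes" if violent_flag else "no",
    }

record = CourtRecord(age=21, pending_charge_at_arrest=True, prior_convictions=2,
                     prior_failures_to_appear=1, prior_violent_convictions=0,
                     current_offense_violent=False)
print(psa_style_assessment(record))
```

Unlike the interview-based PRAXIS sketch earlier, nothing here depends on anything the defendant says – which is precisely the gap groups like Silicon Valley De-Bug try to fill with the family interviews described above.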
"We really came to the conclusion that you can’t make overarching statements about how useful these tools are in decarceration because a, they’re not, and b, they aren’t tracking it," said Jenessa Irvine, a policy and research organizer at Media Mobilizing Project. There also wasn't data found in regards to how often judges follow the recommendations and what the impact is. While part of the intention to use risk assessment tools is to reduce racial bias, many would argue it does just the opposite. "What happens with the AI (artificial intelligence) type of models depends heavily on what is the data set they are trained in...If you put garbage in, you get garbage out," said Meir Shillor, a professor in Oakland University's Department of Math and Science. "If it’s biased, then their predictions are going to be biased." Shillor said that happens all the time. In 2016, ProPublica released a bombshell report looking at the underlying bias with predictive algorithms. "A lot of these tools don’t include race as a factor...but part of what the argument is that it doesn’t matter whether race is in there or not because it’s sort of baked into the system, even though you aren’t specifically defining somebody’s race," said Wendi Johnson, assistant professor of Criminal Justice, Department of Sociology, Anthropology, Social Work and Criminal Justice at Oakland University. Factors like criminal record are used as a proxy for race. Johnson said part of the problem is relying on past criminal history, an important factor in many assessments, like the PSA tool, which concluded from its original algorithm data that criminal history was among one of the three strongest predictors of failure to appear, new criminal activity, and new violent criminal activity. Irvine pointed out that while something as explicit as a zip code doesn't ask about things like residential stability, housing, employment stability, education, mental health, and substance abuse, all actually are baked into the system. "If you go scratch under the surface, all of those variables have a lot to do with race and class," Irvine said. The generalized outcomes themselves also pose an issue. "Once we expand the category to being as general as how likely are you to be arrested again, you’re increasing the number of people who are viewed as risky, and in a way that’s going to increase the racial bias dramatically in the results," said Chelsea Barabas, PhD candidate at Massachusetts Institute of Technology. "They just are using data points that reflect the facts that black and brown communities are criminalized by the legal system," said Weiss from the Community Justice Exchange. Even Berk from University of Pennsylvania admitted that bias can happen in algorithms. "The algorithm is innocent...If that information is biased in some way, everybody is going to be affected by it, including the algorithm," Berk said. "It’s just using the same information everybody else uses." How accurate are these risk assessment tools, especially pre-trial, anyway? It's the same as the flip of a coin, many experts contend. "For around 65 percent of cases, risk assessment tools correctly identify whether someone on pre-trial release will be arrested," Criminal Justice Policy Program's Doyle said. "For around 35 percent of cases, they make an incorrect prediction." Oakland University's Johnson said these sort of tools do a much more accurate job in predicting someone's recidivism rate if it's a low-risk case, compared to a high-risk offender. 
"Nobody wants to be responsible for turning loose somebody who may go on to commit a violent offense," she said. "So people are really reluctant to make a mistake when you’re talking a high-risk offender." The First Step Act, a federal bipartisan criminal justice reform bill, passed in 2018. One of the aspects of the federal law was to mandate that the Federal Bureau of Prisons (FBOP) create a risk and needs assessment tool to be used to release people from early imprisonment based on their risk scores, which fall into four categories – high, medium, low and minimal. The risk assessment tool mandated by the First Step Act is used only with regard to federal prisoners by the FBOP in every state with a federal facility. Even though each state is free to enact its own criminal laws, Felman, an attorney in Tampa, said the federal system often serves as a model for states. "If it is perceived that this is a good idea at the federal level, it is likely some states will follow suit," he said. The tool was required to be disclosed and vetted last summer, and was then used to evaluated everyone by the end of January 2020. Felman is chair of the American Bar Association Criminal Justice Section Task Force on the implementation of the First Step Act, and is also the chairman of the National Association of Criminal Defense Lawyers (NACDL) Task Force for the implementation of the First Step Act. Each task force is trying to further understand the use of the tool. Right now, they just have a lot of questions. "Still no one has seen the tool itself, no one has seen any of the data that underlies the creation of the tool so you could actually verify how they did it," he said. The tool's data was taken from roughly 200,000 federal prisoners over the last five years, Felman said. While Felman likes to believe the two PhDs who helped write the tool, Dr. Grant Duwe and Dr. Zachary Hamilton, took the data and did their best to make a tool that's mathematically-sound, he and others would still love to be able to see the choices and decisions that were made, especially the weight on some of the risk factors, like prior record. Felman said that the tool will put people into four different categories, but it doesn't say how that is to be done. "Apparently, they considered more than a dozen different ways to arrive at these cut points," Felman said. "They have not disclosed what any of the other cut point methodology that they considered and rejected were." Even though the federal government is providing more risk assessment tools to be used in practice, some states are fighting against the use of algorithms. This past February, the Ohio Supreme Court removed language from proposed criminal justice reforms that would have required all judges in the state be provided with pre-trial risk assessment tools. In 2019, Idaho passed a law that required the methods and data used in bail algorithms to be made available to the public. Not all algorithms are transparent, though. "It’s one thing to just disclose these are the factors that are in the risk assessment, this is what weights they have, but also they haven’t shared underlying validation data with researchers so we can see for ourselves how they came up with the instrument," said law professor Brandon Garrett, Duke University. Garrett mentioned that since most of these instruments are created by non-profits, developers, and the government, and not dominated by private companies, there's a real opportunity for adoption of a more transparent and fair processes. 
There will still be the occasional risk assessment tool created by a private company, though. A 2017 article from Harvard Law School, "Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing," stated that, "Because COMPAS is proprietary software, it is not subject to federal oversight and there is almost no transparency about its inner workings, including how it weighs certain variables."

COMPAS, which stands for Correctional Offender Management Profiling for Alternative Sanctions, was originally intended for assessing a defendant's risk of recidivism, but it can also be used to assist judges in sentencing decisions, as it does in Michigan. According to Holly Kramer, communications representative, Michigan Department of Corrections, COMPAS – which is validated annually based on Michigan data – provides information on validated risk and needs factors to help inform decisions around supervision, case management, and treatment. That information is used to determine programming and assist in other interventions that decrease the likelihood someone will reoffend and help ensure their long-term success in the community. Scales measure criminogenic and historical factors, such as a defendant's criminal justice system history, education, social environment and employment.

In Oakland County, COMPAS – created by the company Northpointe – is used alongside Michigan PRAXIS. Judge Michael Warren of Michigan's Sixth Judicial Circuit Court in Oakland County said he mostly sees information from COMPAS. By the time he receives a bond report, someone else has already set the conditions of bond – usually a district court judge or magistrate. The Michigan Department of Corrections creates a pre-sentence investigative report for every defendant being sentenced for a felony; the COMPAS report is included there and then given to a judge, like Warren.

What a judge or lawyer receives is a table with multiple columns. One includes a Core COMPAS Needs Scale, with factors ranging from the defendant's crime opportunities to criminal peers. Next to that is a score that rates the perceived likelihood. There's also a column for supervision recommendations, such as extending treatment to a defendant with a substance abuse problem. Then there's an overall recommendation.

When asked how much of a role the scores on these types of predictive tools play in a sentencing decision, Warren said it varies from case to case. "For me, this is going to sound like a cop out, but I give it the weight I think it deserves," Warren said. "If I have a criminal defendant who has repeatedly engaged in domestic violence, has violated probation in the past, and the guideline range is very high, and they come back with a recommendation of probation and no jail, and no domestic violence program, I'm going to think that's a terrible recommendation and I'm going to do what I think is right," he continued. The same would go for the opposite end of the spectrum: if in a similar case he were given a recommendation of a firm jail sentence with domestic violence programming and a GPS tether, he would say that's spot-on.

"They're given a recommendation basically," said Walton from the prosecutor's office. "I think it's going to be dependent on the individual member of the judiciary as to whether they're going to dive deep into exactly what it is."
While Warren can see the advantages of these sorts of tools – he said they provide a hopefully objective, scientifically based measure that helps the judge craft a sentence most likely to rehabilitate the defendant – he can also see the disadvantages, such as a score not capturing everything about a defendant. "You may have circumstances where a recommendation based on COMPAS is completely different than everything else in my heart and mind is telling me about a particular defendant," Warren said. There are also cases when a COMPAS score indicates that a person isn't salvageable and should be sent to jail rather than given probation, but once in front of him they give a compelling allocution about why they want to go to rehab, or raise factors that aren't in the chart, persuading him to go against the score. Warren said a defendant and their lawyer are always welcome to challenge a pre-sentence investigative report.

Sometimes that sort of challenge reaches a state supreme court, as in the 2016 Wisconsin Supreme Court case, State v Loomis. The defendant, Eric Loomis, brought one of the first cases in the country to take on whether a judge's consideration of a software-generated risk assessment score during sentencing violated due process. Loomis had been deemed high-risk by the COMPAS risk score. While the justices in the case expressed hesitation about the bias potential of these sorts of algorithms, they ultimately sided against Loomis.

"It's very easy for a judge to say, 'oh, that's just one piece of the puzzle, one factor I'm considering,' but sometimes as human beings it's also very easy to be heavily influenced by a conclusion that someone's high-risk," said Christopher Slobogin, director, Criminal Justice Program, Vanderbilt University. "I would be influenced by that."

"I, and a lot of other scholars, think it was an obvious and serious due process problem," said Duke's Garrett about State v Loomis. "I think one of the dangers of an algorithm, like in the Loomis case, is if it's not well-designed and you don't even understand how it works, how it's designed, then people aren't going to be able to use that information in a sound way." Garrett thinks more cases like State v Loomis will come up in the future. "I hope so and I hope lawmakers consider passing statutes that blanket require that quantitative information in criminal cases be fully disclosed to defense and researchers," he said. A lot of Garrett's work has focused on risk assessment used in sentencing, with a look at how judges and lawyers use it in practice.

Even though Garrett sees similar cases coming up in the near future, he doesn't feel algorithms raise an ethical issue one way or the other, because the data being relied on – like age, race and criminal history – is the same that would be presented to a judge without an algorithm. Garrett does think we may reach a natural limit on how much can be predicted, though. So does Doyle. "I think what we're going to learn as we continue to study these tools is that there are just limits to how much we can know about violent crime in the future at an individual level," said Doyle, the staff attorney at Harvard Law School.

No tool will ever be perfect, either. "You could run tons and tons of data and you're always going to have a standard error. It's one tool," said Stephanie Hartwell, dean of the College of Liberal Arts and Sciences at Wayne State University.
Hartwell is also a nationally known sociologist and expert on corrections and recidivism. Vanderbilt's Slobogin agrees that these predictive tools have all sorts of problems, including the potential to reinforce bias, but asks what the alternative is if they aren't used. Just tell a judge to go figure it out? "A lot of people say they should just be one part of the judge's decision, but the judge gets to make the ultimate decision – but based on what? Where's he getting his calculations? It opens the door wide to an incredible amount of discretion, some of which will be used against defendants, some of which will be used in their favor," he said. Slobogin said that if we ever have an algorithm that's well-done enough, we should probably require judges to abide by it, unless the reasons for departing from it – like the person having found religion and now being completely different – are factors the developer did not investigate in constructing the instrument.

There may be a limit to predictability, and issues with bias, transparency and fairness, but many agree these sorts of predictive tools aren't going anywhere. Berk, at the University of Pennsylvania, thinks they'll take over and continue to be used more, but hopes users never reach a point where a judge or probation officer relies solely on a number produced by an algorithm. "To develop a decision solely based on the algorithm is to throw out information that might be useful," he said.

At the end of the day, though, how do people like Judge Warren feel about the future of their profession and the use of risk assessment tools? "To use an appropriate pun – I think the jury is still out," Warren said.