NYCDS Testifies at City Hall about Algorithms in the Criminal Legal System

Testimony of

Christopher Boyle[1]

Director of Data Research and Policy

New York County Defender Services

Before the

Committee on Technology

Oversight Hearing – Follow up on Local Law 49 of 2018 in Relation to Automated Decision Systems Used by Agencies

Intros. 1447-2019 & 1806-2019

January 22, 2020

My name is Christopher Boyle and I am the Director of Data Research and Policy at New York County Defender Services (NYCDS). We are a public defense office that represents New Yorkers in thousands of cases in Manhattan’s Criminal and Supreme Courts every year. I have been a New York City public defender for more than twenty years. Thank you to Chair Holden for holding this hearing on the use of automated decision systems, or algorithms, by city agencies. There is an urgent need for greater transparency regarding these systems.

New York City has spent the past two and a half years reviewing and discussing how city agencies use automated decision systems. Yet despite legislation, public hearings, and task force reports, we have barely inched closer to true transparency. We are pleased to see the introduction of two new bills that we hope will require city agencies to disclose meaningful information about their use of automated decision systems, and we urge certain amendments that would allow members of the public to actually hold these systems and agencies accountable.

Algorithms play an increasingly large role in the criminal legal system

“An ‘automated decision system’ is any software, system, or process that aims to aid or replace human decision making. Automated decision systems can include analyzing complex datasets to generate scores, predictions, classifications, or some recommended action(s), which are used by agencies to make decisions that impact human welfare.” - Janai Nelson, Associate Director-Counsel of the NAACP Legal Defense Fund[2]

Automated decision systems (ADS) are routinely used to inform actions at every step of the legal system. From the locations to which police are deployed to who gets released pretrial, and from access to treatment and programs to the length of a person’s sentence or eligibility for parole, algorithms significantly influence important criminal justice decisions. While a primary objective of such programs is to eliminate the effects of race or class biases, numerous studies have shown that without proper oversight, “risk assessments unintentionally amplify [these]... under the guise of science.”[3] We have put together a chart of all of the ADS that we are aware of that affect our clients throughout the life of their criminal case. See attachment.

At present, we do not have access to information regarding how many ADS are used in New York City, nor do we know for what purposes they are being implemented. This must change.

The limitations – and harms – of predictive algorithms in the criminal legal system

Algorithmic bias is often omitted from discussions of how predictive algorithms are designed. Technological jargon and scientific language can obscure the bias built into classification and risk prediction algorithms. For example, an algorithm that predicts reoffending on the basis of race, class, and other markers of marginalization will ignore the history of oppression that causes certain groups to be overrepresented in crime statistics. As a result, the algorithm may assign artificially high risks of reoffending to already marginalized groups and magnify historical oppression. Mentions of neural nets and machine learning can make it easy to forget this.

ADS designed to predict human behavior are trained using historical data. Thus, the predictions generated by these tools reflect decades of over-policing of communities of color (e.g., stop-and-frisk, broken-windows policing) as well as disproportionate enforcement of specific charges (e.g., petty theft, minor drug offenses).[4] The NYPD uses predictive policing algorithms informed by such data, as well as a number of other ADS (including but not limited to facial and vocal recognition, video analytics, and various forms of social media monitoring) that have repeatedly produced unreliable outputs, especially when identifying women, children, and people with darker complexions.[4] Absent transparency, we cannot know how many such systems are currently in use, or whether they are subjected to any validity testing.

Even when developers attempt to produce race-neutral algorithms, many systems unintentionally include proxies for race and/or socioeconomic status (e.g., education level, employment status, ZIP code, recent address changes, arrest history, prior failures to appear).[3],[5] Thus, outputs are still likely to deem people from these communities high risk, which may increase the rate at which they are held in jail. “There is strong evidence that people who are held in jail as they await court hearings plead guilty at considerably higher rates than do people who are released. The resulting conviction would then serve as an additional data point held against them the next time they are arrested, leading to a vicious circle.”[6],[3]
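To make the idea of a proxy concrete, the following is a minimal sketch, not drawn from any real system or dataset. It assumes a hypothetical list of (ZIP code, group) records and a hypothetical helper, proxy_accuracy, and asks how often a person's group can be guessed from ZIP code alone. If that figure is well above the baseline of always guessing the largest group, then removing the group field from an algorithm's inputs does not remove the information.

```python
from collections import Counter, defaultdict


def proxy_accuracy(records):
    """Share of records whose group is guessed correctly from the proxy alone.

    For each proxy value (here, a ZIP code) we guess the most common group
    seen with that value, then count how many records that guess gets right.
    """
    by_value = defaultdict(Counter)
    for proxy_value, group in records:
        by_value[proxy_value][group] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / len(records)


# Entirely hypothetical records: (zip_code, group). Not real data.
records = (
    [("zip_1", "A")] * 9 + [("zip_1", "B")] * 1
    + [("zip_2", "A")] * 2 + [("zip_2", "B")] * 8
)

# Baseline: always guess the single most common group, ignoring ZIP code.
baseline = max(Counter(group for _, group in records).values()) / len(records)

# Accuracy when the guess is allowed to depend on ZIP code.
accuracy = proxy_accuracy(records)

print(f"baseline={baseline:.2f} with_zip={accuracy:.2f}")
```

In this toy data, ZIP code alone recovers group membership far more often than the baseline, which is the sense in which a seemingly neutral field can stand in for a protected one.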

In 2016, ProPublica analyzed a popular risk assessment tool used across the country to inform pre- and post-conviction judicial decisions: Northpointe, Inc.’s Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). The analysis found that the algorithm was only slightly more accurate than a coin flip at predicting overall recidivism, and that only 20% of the people it predicted would commit violent crimes actually went on to do so. In addition, black defendants were almost twice as likely as white defendants to be “false positives” (labeled “high risk” when they did not go on to commit another crime). White defendants, on the other hand, were far more likely to be misclassified as “low risk.”[5] It was this ProPublica report that spurred the City Council to act and pass Local Law 49 of 2018, which created the Automated Decision Systems Task Force.
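The group-level error rates described above can be computed directly once predictions and outcomes are available. The sketch below is illustrative only: the function name error_rates and the two groups of (predicted high risk, reoffended) pairs are hypothetical and are not ProPublica's data or method. It shows how a false positive rate (labeled high risk, did not reoffend) and a false negative rate (labeled low risk, did reoffend) are calculated separately for each group so that they can be compared.

```python
from collections import Counter


def error_rates(records):
    """Return (false_positive_rate, false_negative_rate).

    Each record is a pair (predicted_high_risk, reoffended) of booleans.
    """
    counts = Counter(records)
    fp = counts[(True, False)]   # labeled high risk, did not reoffend
    tn = counts[(False, False)]  # labeled low risk, did not reoffend
    fn = counts[(False, True)]   # labeled low risk, did reoffend
    tp = counts[(True, True)]    # labeled high risk, did reoffend
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Entirely hypothetical outcomes for two groups. Not real data.
group_a = ([(True, False)] * 4 + [(False, False)] * 6
           + [(True, True)] * 7 + [(False, True)] * 3)
group_b = ([(True, False)] * 2 + [(False, False)] * 8
           + [(True, True)] * 5 + [(False, True)] * 5)

rates = {"group_a": error_rates(group_a), "group_b": error_rates(group_b)}

for name, (fpr, fnr) in rates.items():
    print(f"{name}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
```

In this made-up example, one group has twice the false positive rate of the other while the other has the higher false negative rate. A single overall accuracy figure would not reveal that pattern, which is why metrics broken out by group matter for any outside review.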

The ADS Task Force

NYCDS previously supported the creation of the ADS Task Force in 2017, along with other New York City defenders and civil rights advocates, and we attended a task force public hearing in Manhattan in 2019.

While the Task Force was an important first step in assessing the breadth and scope of the use of ADS in city agencies, the final report fell short of advocates’ goals for increased transparency. Critically, Local Law 49 failed to require city agencies to disclose information about ADS to the Task Force, and the Task Force could not come to a consensus about what types of ADS should fall under its purview.

Despite the efforts and resources put towards the task force, the public remains in the dark about what algorithms exist in our city’s agencies, how they operate, and whether they can be considered scientifically valid.

What must subsequent legislation do?

We, the public, must have the necessary information to hold ADS accountable. This includes access to the data used in algorithm development, the methodology behind data collection and algorithm design, the algorithm itself, and performance and precision metrics. Additionally, any algorithm that can impact the precarious lives of the most vulnerable New Yorkers must be vetted through open scientific peer review, with the algorithm published in an open-access journal. Algorithms used in medicine are subject to no less and have a similar impact on people’s lives.

Overwhelmingly, studies have shown that the best ways to ensure these ADS do not perpetuate historical injustices are to:

Avoid parameters which can serve as proxies for race or socioeconomic status,
Be transparent: we should know which systems are being used and how; this will allow for development of oversight and best/more consistent practices, and
Allow for consistent, rigorous validity testing, preferably by institutions outside of the agency using the system

There’s a saying in computer science and statistics: “garbage in, garbage out.” If your data is fraught with selection bias, it will produce bad conclusions. This can be worsened by incorrectly or dishonestly applying statistical techniques, as the reproducibility crisis in the sciences makes evident. Therefore, it is imperative that those designing algorithms be trained in research methodology that enables them to appropriately address sources of bias and confounding, and that their work be scrutinized by senior scientists and citizen scientists.

Int. 1447-2019 - A Local Law to amend the New York city charter, in relation to an annual inventory of agency data

NYCDS supports passage of Int. 1447-2019. At a bare minimum, as this bill prescribes, the public should know what kind of data is being collected and stored by city agencies. However, this bill only requires that this information be reported to the Mayor and Speaker of the Council. We urge that this information be made publicly available on the Mayor’s Office of Data Analytics website, or at the very least that the Office of Data Analytics create a process for members of the public to access this information by request. We also urge that the Mayor’s Office of Data Analytics be required to offer annual recommendations to the Council about the future of data analytics in New York City and steps the Council can take to improve public accountability.

Int. 1806-2019 - A Local Law to amend the administrative code of the city of New York, in relation to reporting on automated decision systems used by city agencies

NYCDS similarly supports passage of Int. 1806-2019, which goes further than Int. 1447 in requiring reporting by city agencies about ADS. Most importantly, this bill defines ADS and thus lays out the parameters of what types of ADS agencies would be required to report on.

However, we believe that the information that this bill requires reporting on is insufficient to ensure public accountability. For example, the new Criminal Justice Agency release assessment was developed over the past several years to better provide courts with additional information about an accused person’s likelihood to return to court. CJA has released significant underlying information about the algorithm on their website.[7] This is the kind of information that we believe should be released for every ADS used in the criminal legal system, as well as other city agencies, but even the CJA website, while a step in the right direction, does not go far enough.[8]

Even more is needed. As we noted above, the validity of a risk assessment instrument depends on its ability to be validated and replicated by others. Thus, we recommend that agencies be required to provide the underlying data and algorithms to the Office of Data Analytics so that interested third parties, particularly universities and think tanks, can successfully replicate the validation studies and publish the results to the public. The National Institutes of Health has a good model for this, whereby they maintain private health data sets but allow scientists access to the data sets for future research.[9] The Office of Data Analytics should develop a similar process informed by existing models in medical and scientific research to allow for third-party validation and study of city data and algorithms. The data formatting for ADS should also be dictated by the Office of Data Analytics to ensure that researchers can easily use the data.

Finally, the Council should ban city agencies from contracting with companies to purchase or adopt proprietary algorithms that cannot be reviewed by the public. Any such existing agreements must be promptly phased out or revoked. Our citizens, and particularly those whose liberty hangs in the balance based on ADS in the criminal legal system, must have access to the data underlying these tools to ensure that they are not biased or invalid.

If you have any questions about my testimony, please contact me at cboyle@nycds.org.

[1] Written together with Celia Joyce, Corrections Data Specialist and Willem Van Der Mei, Data Scientist.

[2] Frost, Mary. "Bias and Secrecy Among Pitfalls of NYC's Algorithm Use, Experts Say." Brooklyn Eagle. May 3, 2019. https://brooklyneagle.com/articles/2019/05/03/bias-and-secrecy-among-pitfalls-of-nycs-algorithm-use-experts-say/.

[3] Picard, Sarah, Matt Watkins, Michael Rempel, and Ashmini G. Kerodal. "Beyond the Algorithm: Pretrial Reform, Risk Assessment, and Racial Fairness." Center for Court Innovation. July 2019. https://www.courtinnovation.org/sites/default/files/media/documents/2019-06/beyond_the_algorithm.pdf.

[4] Díaz, Ángel. "New York City Police Department Surveillance Technology." Brennan Center for Justice. October 4, 2019. https://www.brennancenter.org/our-work/research-reports/new-york-city-police-department-surveillance-technology.

[5] Angwin, Julia, Jeff Larson, Lauren Kirchner, and Surya Mattu. "Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks." ProPublica. May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

[6] Wykstra, Stephanie. "Philosopher’s Corner: What is 'Fair'?: Algorithms in Criminal Justice." Issues in Science and Technology 34, no. 3 (Spring 2018). https://issues.org/perspective-philosophers-corner-what-is-fair-algorithms-in-criminal-justice/.

[7] New York City Criminal Justice Agency, Release Assessment, available at https://www.nycja.org/release-assessment.

[8] For example, the CJA website makes no mention of external validation and which metrics are going to be used to evaluate the validity of the algorithm. The website also does not reveal anything about the technical aspects of the algorithm. This is information that we believe should be available to researchers upon request.

[9] See, e.g., NIH National Cancer Institute Genomic Data Commons, Obtaining Access to Controlled Data, available at https://gdc.cancer.gov/access-data/obtaining-access-controlled-data.

Testimony posted January 22, 2020