Julia Stoyanovich is an Associate Professor in the Department of Computer Science and Engineering at the Tandon School of Engineering and at the Center for Data Science. She is a recipient of an NSF CAREER award and of an NSF/CRA CI Fellowship. Julia's research focuses on responsible data management and analysis practices: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data acquisition and processing lifecycle. She established the Data, Responsibly consortium, and serves on the New York City Automated Decision Systems Task Force (appointed by Mayor de Blasio). In addition to data ethics, Julia works on management and analysis of preference data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst.
NYU Affiliations:
- NYU Center for Data Science
- Computer Science & Engineering at NYU Tandon School of Engineering
Awards and Recognition
NSF CAREER: Querying Evolving Graphs (2018)
Member of the NYC Automated Decision Systems Task Force, appointed by Mayor de Blasio (2018)
Co-PI on an NSF-BSF grant: Databases Meet Computational Social Choice (collaborative with UC Santa Cruz and Technion) (2018)
Lead PI on an NSF BIGDATA grant: Foundations of Responsible Data Management (collaborative with UW, UMich and UMass Amherst) (2017)
Research News
Better transparency: Introducing contextual transparency for automated decision systems
LinkedIn Recruiter, a search tool used by professional job recruiters to find candidates for open positions, would function better if recruiters knew exactly how LinkedIn generates its search query responses. A framework called “contextual transparency” could make that possible.
That is the argument a team of researchers led by NYU Tandon’s Mona Sloane, a Senior Research Scientist at the NYU Center for Responsible AI and a Research Assistant Professor in the Technology, Culture and Society Department, advances in a provocative new study published in Nature Machine Intelligence.
The study is a collaboration with Julia Stoyanovich, Institute Associate Professor of Computer Science and Engineering, Associate Professor of Data Science, and Director of the Center for Responsible AI at New York University, as well as Ian René Solano-Kamaiko, Ph.D. student at Cornell Tech; Aritra Dasgupta, Assistant Professor of Data Science at New Jersey Institute of Technology; and Jun Yuan, Ph.D. Candidate at New Jersey Institute of Technology.
It introduces the concept of contextual transparency, essentially a “nutritional label” that would accompany results delivered by any Automated Decision System (ADS), a computer system or machine that uses algorithms, data, and rules to make decisions without human intervention. The label would lay bare the explicit and hidden criteria — the ingredients and the recipe — within the algorithms or other technological processes the ADS uses in specific situations.
LinkedIn Recruiter is a real-world ADS example — it “decides” which candidates best fit the criteria the recruiter wants — but different professions use ADS tools in different ways. The researchers propose a flexible model of building contextual transparency — the nutritional label — so it is highly specific to the context. To do this, they recommend three “contextual transparency principles” (CTP) as the basis for building contextual transparency, each of which relies on an approach related to an academic discipline.
- CTP 1: Social Science for Stakeholder Specificity: This aims to identify the professionals who rely on a particular ADS, how exactly they use it, and what information they need to know about the system to do their jobs better. This can be accomplished through surveys or interviews.
- CTP 2: Engineering for ADS Specificity: This aims to understand the technical context of the ADS used by the relevant stakeholders. Different types of ADS operate with different assumptions, mechanisms and technical constraints. This principle requires an understanding of both the input, the data being used in decision-making, and the output, how the decision is being delivered back.
- CTP 3: Design for Transparency- and Outcome-Specificity: This aims to understand the link between process transparency and the specific outcomes the ADS would ideally deliver. In recruiting, for example, the outcome could be a more diverse pool of candidates facilitated by an explainable ranking model.
Researchers looked at how contextual transparency would work with LinkedIn Recruiter, in which recruiters use Boolean searches — AND, OR, NOT written queries — to receive ranked results. Researchers found that recruiters do not blindly trust ADS-derived rankings and typically double-check ranking outputs for accuracy, oftentimes going back and tweaking keywords. Recruiters told researchers that the lack of ADS transparency challenges efforts to recruit for diversity.
To address the transparency needs of recruiters, researchers suggest that the nutritional label of contextual transparency include passive and active factors. Passive factors comprise information that is relevant to the general functioning of the ADS and the professional practice of recruiting in general, while active factors comprise information that is specific to the Boolean search string and therefore changes.
The nutritional label would be inserted into the typical workflow of LinkedIn Recruiter users, providing them information that would allow them to both assess the degree to which the ranked results satisfy the intent of their original search, and to refine the Boolean search string accordingly to generate better results.
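As a rough illustration of the split between passive and active factors described above, the following Python sketch builds a label for a Boolean search string. It is not taken from the study: the class `NutritionalLabel`, the functions `active_factors` and `build_label`, the field names, and the placeholder passive text are all hypothetical, and the parsing only recognizes the AND, OR and NOT operators, quoted phrases, and parentheses.

```python
import re
from dataclasses import dataclass, field

BOOLEAN_OPERATORS = ("AND", "OR", "NOT")


@dataclass
class NutritionalLabel:
    """Hypothetical container for a contextual transparency label."""
    passive: dict                                 # general information, same for every search
    active: dict = field(default_factory=dict)    # information specific to one search string


def active_factors(query: str) -> dict:
    """Derive query-specific (active) factors from a Boolean search string.

    Assumption: a term that directly follows NOT is treated as excluded;
    every other term is treated as included.
    """
    # Tokens are quoted phrases, single parentheses, or runs of other characters.
    tokens = re.findall(r'"[^"]+"|\(|\)|[^\s()]+', query)
    operators, included, excluded = [], [], []
    negate_next = False
    for tok in tokens:
        if tok in ("(", ")"):
            continue
        if tok in BOOLEAN_OPERATORS:
            operators.append(tok)
            negate_next = tok == "NOT"
            continue
        term = tok.strip('"')
        (excluded if negate_next else included).append(term)
        negate_next = False
    return {
        "query": query,
        "operators": operators,
        "included_terms": included,
        "excluded_terms": excluded,
    }


def build_label(query: str, passive: dict) -> NutritionalLabel:
    """Combine fixed passive factors with the active factors of one query."""
    return NutritionalLabel(passive=passive, active=active_factors(query))


# Placeholder passive text; a real label would carry vetted information.
example_passive = {"about_ranking": "placeholder: general notes on how results are ranked"}
example = build_label('"data scientist" AND (Python OR R) NOT intern', example_passive)
print(example.active)
```

The passive part stays the same across searches, while the active part is recomputed each time the recruiter edits the search string, which mirrors the distinction the researchers draw.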
To evaluate whether this ADS transparency intervention achieves the change that can reasonably be expected of it, the researchers suggest combining stakeholder interviews about potential changes in the use and perception of the ADS with participant diaries documenting professional practice and, where possible, A/B testing.
Contextual transparency is an approach that can help meet the AI transparency requirements mandated by new and forthcoming AI regulation in the US and Europe, such as NYC Local Law 144 of 2021 and the EU AI Act.
Teaching Responsible Data Science: Charting New Pedagogical Territory
Julia Stoyanovich, director of the Center for Responsible AI (R/AI) at NYU Tandon, and assistant professor of computer science and engineering and of data science, co-authored this paper with Armanda Lewis, a graduate student pursuing her master’s at the NYU Center for Data Science.
The authors detail their development of and pedagogy for a technical course focused on responsible data science, which tackles the issues of ethics in AI, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.
The ability to interpret machine-assisted decision-making is an important component of responsible data science, and it offers a good lens through which to see other responsible data science topics, including privacy and fairness. The researchers’ study includes best practices for teaching technical data science and AI courses that focus on interpretability, and it ties responsible data science to current learning science and learning analytics research.
The work also explores the use of “nutritional labels” — a family of interpretability tools that are gaining popularity in responsible data science research and practice — for interpreting machine learning models.
- In the paper, the investigators offer a description of a unique course on responsible data science that is geared toward technical students, and incorporates topics from social science, ethics and law.
- The work connects theories and advances within the learning sciences to the teaching of responsible data science, specifically, interpretability — allowing humans to understand, trust and, if necessary, contest the computational process and its outcomes. The study asserts that interpretability is central to the critical study of the underlying computational elements of machine learning platforms.
- The collaborators assert that they are among the first to consider the pedagogical implications of responsible data science, creating parallels between cutting-edge data science research and cutting-edge educational research within the fields of learning sciences, artificial intelligence in education, and learning analytics and knowledge.
Additionally, the authors propose a set of pedagogical techniques for teaching the interpretability of data and models, positioning interpretability as a central integrative component of responsible data science.