The National Institute of Justice (NIJ) recently partnered with the National Science Foundation (NSF) on the “Building the Prototype Open Knowledge Network (Proto-OKN)” solicitation.
This solicitation aimed to stimulate research using knowledge graph (KG) databases. According to the solicitation, KGs “provide a powerful approach for organizing, representing, integrating, reusing, and accessing data from multiple structured and unstructured sources using ontologies and ontology alignment.” One of the core components of the solicitation is to create open knowledge networks, accessible to both government and non-government users, that meet existing federal government use cases. Each KG will address a specific urgent societal challenge as identified by a federal agency. NIJ contributed two use cases:
- Data on Nonfatal Firearms Injuries. The first case challenges applicants to develop better data describing nonfatal firearms injuries by integrating existing data sets from various sources.
- Mining Criminal Justice Insights from Existing Data. The second case, put forward in collaboration with the Bureau of Justice Statistics (BJS), draws attention to the more than 3,300 data sets in the National Archive of Criminal Justice Data (NACJD) repository. Applicants are challenged to realize the potential insights to be gained from connecting data sets developed for individual research projects with one another or with one of the national data sets collected by BJS, such as the National Incident-Based Reporting System and the National Crime Victimization Survey.
In collaboration with five other U.S. government agencies, including NIJ, the U.S. National Science Foundation invested $26.7 million in 18 projects through the Proto-OKN program. NIJ will be working with two of those NSF awardees:
- A Knowledge Graph Warehouse for Neighborhood Information
This award proposes to use novel cube-like data structures to integrate information on criminal victimizations and the contexts in which they occur. Additionally, the awardees plan to implement querying, mining, and visualization tools to allow for investigation of patterns and trends at a variety of levels.
- DREAM-KG: Develop Dynamic, Responsive, Adaptive, and Multifaceted Knowledge Graphs to Address Homelessness with Explainable AI
This award creates a KG system to increase understanding of the factors that contribute to homelessness and to improve the provision of services to people who are experiencing homelessness. The effort uses topological data analysis tools and artificial intelligence models to gather and integrate data related to homelessness.
Also of note, the Bureau of Justice Statistics will be working with an NSF awardee on An Integrated Platform to Connect Criminal Justice Data Across Data Silo. This award builds the data infrastructure to connect data systems across the justice system. This effort focuses on creating a knowledge network to allow data produced by the criminal justice system to be used more easily and effectively.
View a list of all awards made by NSF under this opportunity.
Original Use Cases
Following are the use cases that NIJ contributed to this opportunity.
Data on Nonfatal Firearms Injuries
NIJ is interested in creating a single source of information for nonfatal, interpersonal, and intentional firearm injuries. That source will support the development of evidence-based firearm violence prevention policy. This data source would incorporate data from existing relevant national, state, and local databases that contain critical elements related to injury and firearm use. This database could be linked with local-level data describing socioeconomic characteristics, features of the built environment, and other community-relevant information to better understand why, where, and when nonfatal shootings occur to inform strategic violent crime prevention efforts in the community.
Ideally, the knowledge graph database would provide:
- A publicly accessible platform that can automatically and continuously update as new shootings occur.
- An easy-to-use interface that provides a standard set of metrics and is available to practitioners, policymakers, researchers, and the general public.
- The ability to expose data fields based on user group (e.g., policymakers, researchers, or government agencies).
- The ability to generate statistics at various geographic and temporal scales.
- The ability to easily generate reports that include descriptive statistics and spatiotemporal information.
- The ability to export data fields and relationships to make raw data (an important component of rigorous research) available to research professionals.
- The ability to answer a wide variety of questions, such as:
  - Do nonfatal firearms injuries follow spatial and/or temporal patterns?
  - Are concentrations of nonfatal firearms injury events associated with specific characteristics of their immediate context (e.g., land use mix or types) or with the community in which they occur (e.g., sociodemographic structure)?
  - What are the differences in patterns and predictors between fatal and nonfatal firearms injuries?
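As an illustration of what generating statistics at various geographic and temporal scales could look like, the following is a minimal sketch, not a prescribed design. The incident rows, the tuple layout, and the function name `count_nonfatal` are all hypothetical; the only fixed fact it relies on is that the first two digits of a five-digit county FIPS code identify the state.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (county FIPS code, incident date, fatal?).
# These rows are invented for illustration and are not real data.
incidents = [
    ("26161", date(2023, 1, 14), False),
    ("26161", date(2023, 1, 30), False),
    ("26161", date(2023, 2, 3), True),
    ("26163", date(2023, 1, 9), False),
    ("26163", date(2023, 2, 21), False),
]

def count_nonfatal(records, geo_level="county", time_level="month"):
    """Count nonfatal incidents per (geography, period) cell."""
    counts = Counter()
    for fips, when, fatal in records:
        if fatal:
            continue  # this sketch tabulates nonfatal incidents only
        # The first two digits of a county FIPS code identify the state.
        geo = fips[:2] if geo_level == "state" else fips
        period = when.strftime("%Y-%m") if time_level == "month" else str(when.year)
        counts[(geo, period)] += 1
    return dict(counts)

by_county_month = count_nonfatal(incidents)
by_state_year = count_nonfatal(incidents, geo_level="state", time_level="year")
```

The same records roll up to coarser cells simply by changing the grouping key, which is the property a platform would need in order to report at multiple scales from one underlying source.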
NIJ encourages applicants to address these factors in their proposals:
- The criteria for identifying and selecting data partners. (NIJ is available to assist with identifying partner jurisdictions.)
- Evidence that the core development team is familiar with criminal justice data.
- Evidence of familiarity with current efforts to create nonfatal shooting databases and a plan to leverage them where possible. Examples of these efforts include, but are not limited to, the Violence Project Mass Shootings database, the National Violent Death Reporting System, and the Crime Data Explorer.
- Provision for continuous enhancements and expansion to support a variety of different future uses.
- Attention to linking the knowledge graph entities and/or relationships to characteristics of places.
Guidance for Mining Criminal Justice Insights from National Archive of Criminal Justice Data (NACJD)
Overview
The Office of Justice Programs (OJP) established the National Archive of Criminal Justice Data (NACJD) in 1979 to provide a central repository of high-quality criminal justice data. The NACJD contains data from over 3,300 studies on topics of crime, justice, and the criminal justice system. Data are provided to researchers in a variety of formats, along with codebooks and other supporting documents.
All study pages on the NACJD website list the variables associated with each study, and the website supports variable-level search across all the studies. Using this information and study codebooks, knowledge graphs can be designed that connect different studies for the purposes of this challenge. NACJD staff are available to provide assistance with any questions about accessing and using NACJD data and documentation.
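One way to picture how variable listings could connect studies is the sketch below. It is an assumption-laden illustration, not a description of any NACJD tool: the study IDs, variable names, and the `hasVariable` predicate are invented, and matching variable names alone does not guarantee that two studies measure the same thing, so codebooks would still need to be consulted.

```python
from itertools import combinations

# Hypothetical study-to-variable listings, standing in for what a study
# page or codebook might list. Study IDs and variable names are invented.
study_variables = {
    "study_A": {"state_fips", "county_fips", "arrest_year", "offense_code"},
    "study_B": {"county_fips", "agency_id", "sworn_officers"},
    "study_C": {"agency_id", "offense_code", "arrest_year"},
}

def build_triples(listings):
    """Emit (subject, predicate, object) triples: study hasVariable variable."""
    return sorted(
        (study, "hasVariable", var)
        for study, variables in listings.items()
        for var in variables
    )

def shared_variable_links(listings):
    """Map each pair of studies to the variable names they have in common."""
    links = {}
    for a, b in combinations(sorted(listings), 2):
        common = listings[a] & listings[b]
        if common:
            links[(a, b)] = sorted(common)
    return links

triples = build_triples(study_variables)
links = shared_variable_links(study_variables)
```

Here `links` identifies candidate join keys between pairs of studies; whether a candidate is usable depends on the variable definitions in each codebook.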
The NSF’s Open Knowledge Network challenge is seeking applicants interested in linking the data sets in the NACJD to support efforts to explore cross-cutting questions, provision new insights, and promote knowledge development across the criminal justice system.
This challenge provides a unique opportunity to advance and expand the use of authoritative scientific data and develop a partnership that will assist OJP in its strategic objectives as a provider of critical data to the criminal justice community.
Accessing NACJD Data
The NACJD provides access to two types of data: public-use files (PUFs) and restricted-use files (RUFs). PUFs are available to the public for download on demand but have been stripped of direct identifiers and other potentially identifiable information. RUFs have additional variables and offer more opportunities to establish linkages to other data sources. Researchers must complete an application process and be approved before being granted access to a RUF. The amount of time between application and approval varies based in part on the level of detail in the application, the number of revisions required, and other factors such as access to an institutional review board (IRB). On average, it takes about 45 days (but usually no longer than 12 weeks) after an application is submitted for access to restricted data to be granted.
Public-Use Files
Data sets that do not contain information identifiable to a private person are available for public download via NACJD as public-use files (PUFs). PUFs have been cleansed of direct identifiers (e.g., name and ID numbers) and indirect identifiers that pose a disclosure risk (e.g., specific date of arrest). NACJD protects respondent confidentiality by removing, masking, blanking, or collapsing direct or indirect variables and records within public-use versions of the data set. PUFs are downloadable from the NACJD website once a user agrees to certain terms of use, which include that the data will be used only for statistical purposes and that the researcher may not attempt to use the data to identify specific individuals.
Restricted-Use Files
In cases where a data set is not suitable for public download because some risk of disclosure remains (e.g., variables used in conjunction with one another or linking to other data files), BJS makes the data available in a restricted-use setting with strong confidentiality protections that requires potential users to apply for access. To obtain a restricted-use file, researchers must first complete an application and obtain approval to access the data.
For BJS data, an application must be submitted via the ResearchDataGov website using the Standard Application Process (SAP). The SAP is a requirement of the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) that statistical agencies follow to operate a uniform process to make confidential data assets discoverable to researchers and allow researchers to apply for access to these data for research or statistical purposes. Data from NIJ and the Office of Juvenile Justice and Delinquency Prevention do not require an application through the SAP.
Applicants for BJS restricted data must provide documentation including, but not limited to, a description of the research project that demonstrates a clear statistical or research purpose, a signed assurance of confidentiality, a signed restricted-use data agreement, a BJS Privacy Certificate, and a data security plan. Institutional Review Board (IRB) approval may also be required, depending on the type of data, access, and research project.
Once the researcher’s application is approved, data can be made available in a variety of methods depending on need, access capabilities, and level of sensitivity of the data.
- Downloading Restricted-Use Files
Once the researcher’s application is approved, a secure download URL is provided.
- Virtual Data Enclave (VDE)
The NACJD’s Virtual Data Enclave (VDE) allows researchers to analyze data via a virtual desktop interface. These users do not get possession of the actual data, and all output must be vetted by the NACJD. A proposal could include a request for multiple studies to be accessed for linking purposes.
- Physical Enclave
Files in the physical enclave contain one or more of the following: direct or indirect identifiers or highly sensitive data. To obtain use of an enclaved file, the researcher must complete an application like that for restricted-use files. Upon approval of the application by NACJD and BJS, the researcher must travel to the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan and work in the enclave.
The enclave is a small suite of rooms with a few personal computers connected to a stand-alone server. The computers are not connected to the internet, and ICPSR staff are always present when a researcher is using the enclave. All output, notes, and other materials must be submitted for disclosure review before the investigator leaves the enclave, and any printed analytic results will undergo disclosure review before being sent to the researcher.
Linking NACJD Data
Linkages Using Geographical Identifiers
Most agency surveys, which are typically provided as PUFs, carry geographical identifiers (state, county, and in some cases city) that would allow people to better understand the totality of criminal justice agencies serving particular geographies (though some are sample surveys, which may limit the ability to do some of this analysis). Those data can also be tied to other publicly available data, such as those provided by the U.S. Census Bureau in its Annual Survey of State and Local Government Finances and its Annual Survey of Public Employment and Payroll, both of which provide data in a variety of ways, including via an application programming interface (API).
The BJS “Law Enforcement Agency Identifiers Crosswalk” at NACJD (available at https://www.icpsr.umich.edu/web/NACJD/studies/35158 ) may be a helpful resource for merging aggregate data at the state and county levels.
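To illustrate the kind of merge a crosswalk makes possible, here is a minimal sketch under stated assumptions. The agency IDs, field names, officer counts, and population figures are all invented, and the actual layout of the crosswalk file should be taken from its codebook rather than from this example.

```python
# Hypothetical crosswalk rows: agency identifier -> county FIPS code.
crosswalk = {
    "AG001": "26161",
    "AG002": "26161",
    "AG003": "26163",
}

# Hypothetical agency-level survey records (e.g., from a public-use file).
agency_survey = [
    {"agency_id": "AG001", "sworn_officers": 120},
    {"agency_id": "AG002", "sworn_officers": 35},
    {"agency_id": "AG003", "sworn_officers": 900},
    {"agency_id": "AG999", "sworn_officers": 12},  # no crosswalk entry
]

# Hypothetical county-level context data; populations are invented.
county_context = {
    "26161": {"population": 370000},
    "26163": {"population": 1750000},
}

def officers_per_100k(survey, xwalk, context):
    """Aggregate agency counts to counties, then compute a per-capita rate.

    Returns (rates by county FIPS, list of agency IDs that could not be linked).
    """
    totals, unmatched = {}, []
    for row in survey:
        fips = xwalk.get(row["agency_id"])
        if fips is None or fips not in context:
            unmatched.append(row["agency_id"])  # keep track instead of dropping silently
            continue
        totals[fips] = totals.get(fips, 0) + row["sworn_officers"]
    rates = {
        fips: round(100000 * n / context[fips]["population"], 1)
        for fips, n in totals.items()
    }
    return rates, unmatched

rates, unmatched = officers_per_100k(agency_survey, crosswalk, county_context)
```

Reporting the unmatched agencies alongside the rates is a deliberate choice in this sketch: with sample surveys or incomplete crosswalks, the share of records that fail to link affects how far the county-level figures can be trusted.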
Linkages Using Other Identifiers
Studies with direct or indirect identifiers allow additional linking opportunities but are restricted-use data and require additional clearances and approvals as described above. For example, linkage of data on incarcerated persons to longitudinal data across generations might show the potential social and relational causal factors within families that increase the occurrence of offenses over a lifetime. A few studies with very sensitive data can be analyzed only onsite at the NACJD facility in Ann Arbor, Michigan.
Confidentiality Protection
BJS is required to protect the confidentiality of information identifiable to a private person from misuse and unauthorized access. Information identifiable to a private person, as defined in DOJ regulations at 28 CFR Part 22.2, includes both individuals and establishments. These confidentiality protections extend beyond directly identifiable information such as names, social security numbers, and other identifying numbers to a broader application that includes information that could be reasonably interpreted as referring to a specific private person due to small sample or cell sizes, combination of indirect identifiers, linkage to other data sources, or other factors. Protecting confidentiality is a fundamental responsibility of BJS as a federal statistical agency and is essential to maintain the trust of data providers and survey respondents. When linking data as part of this challenge, participants would need to use the appropriate methods to mitigate disclosure risk and proceed cautiously to ensure that such linkages would not lead to identity disclosure.
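Small cell sizes are one of the disclosure risks named above. The sketch below shows one common mitigation, suppressing counts under a minimum cell size before release. It is an illustration only: the threshold of 10, the function name, and the counts are assumptions, not BJS or NACJD rules, and the applicable disclosure review requirements govern what may actually be released.

```python
def suppress_small_cells(table, threshold=10):
    """Return a copy of `table` with counts below `threshold` replaced by None.

    `threshold` is an assumed value for illustration; the governing data use
    agreement or disclosure review process determines the real minimum.
    Note that zero counts are also suppressed under this rule.
    """
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}

# Hypothetical linked counts keyed by (county FIPS, month); values are invented.
counts = {
    ("26161", "2023-01"): 42,
    ("26161", "2023-02"): 3,
    ("26163", "2023-01"): 17,
}

released = suppress_small_cells(counts)
```

This covers only primary suppression. If row or column totals are published alongside the cells, a suppressed value may be recoverable by subtraction, so additional (complementary) suppression or another protection method may be needed.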