The National Institute of Justice (NIJ) recently partnered with the National Science Foundation (NSF) on the “Building the Prototype Open Knowledge Network (Proto-OKN)” solicitation.
This solicitation aimed to stimulate research using knowledge graph (KG) databases. According to the solicitation, KGs “provide a powerful approach for organizing, representing, integrating, reusing, and accessing data from multiple structured and unstructured sources using ontologies and ontology alignment.” One of the core components of the solicitation is to create open knowledge networks, accessible to both government and non-government users, that meet existing federal government use cases. Each KG will address a specific urgent societal challenge as identified by a federal agency. NIJ contributed two use cases:
- Data on Nonfatal Firearms Injuries. The first case challenges applicants to develop better data describing nonfatal firearms injuries by integrating existing data sets from various sources.
- Mining Criminal Justice Insights from Existing Data. The second case, put forward in collaboration with the Bureau of Justice Statistics (BJS), draws attention to the more than 3,300 data sets in the National Archive of Criminal Justice Data (NACJD) repository. Applicants are challenged to realize the potential insights to be gained from connecting data sets developed for individual research projects with one another or with one of the national data sets collected by BJS, such as the National Incident-Based Reporting System and the National Crime Victimization Survey.
In collaboration with five other U.S. government agencies, including the National Institute of Justice, the U.S. National Science Foundation invested $26.7 million in 18 projects through the Proto-OKN program. NIJ will be working with two of those NSF awardees.
- A Knowledge Graph Warehouse for Neighborhood Information
This award proposes to use novel cube-like data structures to integrate information on criminal victimizations and the contexts in which they occur. Additionally, the awardees plan to implement querying, mining, and visualization tools to allow for investigation of patterns and trends at a variety of levels.
- DREAM-KG: Develop Dynamic, Responsive, Adaptive, and Multifaceted Knowledge Graphs to Address Homelessness with Explainable AI
This award creates a KG system to increase understanding of the factors that contribute to homelessness and to improve the provision of services to people who are experiencing homelessness. The effort uses Topological Data Analysis tools and Artificial Intelligence models to gather and integrate data related to homelessness.
Also of note, the Bureau of Justice Statistics will be working with an NSF awardee on “An Integrated Platform to Connect Criminal Justice Data Across Data Silos.” This award builds the data infrastructure to connect data systems across the justice system. The effort focuses on creating a knowledge network that allows data produced by the criminal justice system to be used more easily and effectively.
Original Use Cases
Following are the use cases that NIJ contributed to this opportunity.
Data on Nonfatal Firearms Injuries
NIJ is interested in creating a single source of information for nonfatal, interpersonal, and intentional firearm injuries. That source will support the development of evidence-based firearm violence prevention policy. This data source would incorporate data from existing relevant national, state, and local databases that contain critical elements related to injury and firearm use. This database could be linked with local-level data describing socioeconomic characteristics, features of the built environment, and other community-relevant information to better understand why, where, and when nonfatal shootings occur to inform strategic violent crime prevention efforts in the community.
Ideally, the knowledge graph database would provide:
- A publicly accessible platform that can automatically and continuously update as new shootings occur.
- An easy-to-use interface that provides a standard set of metrics and is available to practitioners, policymakers, researchers, and the general public.
- The ability to expose data fields based on user group (e.g., policymakers, researchers, or government agencies).
- The ability to generate statistics at various geographic and temporal scales.
- The ability to easily generate reports that include descriptive statistics and spatiotemporal information.
- The ability to export data fields and relationships to make raw data (an important component of rigorous research) available to research professionals.
- The ability to answer a wide variety of questions, such as:
- Do nonfatal firearms injuries follow spatial and/or temporal patterns?
- Are concentrations of nonfatal firearms injury events associated with specific characteristics of their immediate context (e.g., land use mix or types) or with the community in which they occur (e.g., sociodemographic structure)?
- What are the differences in patterns and predictors between fatal and nonfatal firearms injuries?
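As a rough illustration of what "statistics at various geographic and temporal scales" could mean in practice, the sketch below counts incidents by place and time period at a chosen level of detail. The records, field layout, and the function name `count_incidents` are hypothetical and exist only for this example; they are not drawn from any NIJ or awardee system.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (incident date, state, county). Not real data.
incidents = [
    (date(2022, 1, 14), "PA", "Philadelphia"),
    (date(2022, 1, 30), "PA", "Philadelphia"),
    (date(2022, 2, 3), "PA", "Allegheny"),
    (date(2023, 2, 9), "OH", "Franklin"),
]


def count_incidents(records, geo="county", period="month"):
    """Count incidents per (place, time period) at the requested scales."""
    counts = Counter()
    for day, state, county in records:
        # Geographic scale: state only, or state plus county.
        place = (state,) if geo == "state" else (state, county)
        # Temporal scale: year only, or year plus month.
        when = (day.year,) if period == "year" else (day.year, day.month)
        counts[place + when] += 1
    return counts


by_county_month = count_incidents(incidents)
by_state_year = count_incidents(incidents, geo="state", period="year")
```

The same records feed both tables, so a user could move between a county-by-month view and a state-by-year view without a separate data source for each scale.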
NIJ encourages applicants to address these factors in their proposals:
- The criteria for identifying and selecting data partners. (NIJ is available to assist with identifying partner jurisdictions.)
- Evidence that the core development team is familiar with criminal justice data.
- Evidence of familiarity with current efforts to create nonfatal shooting databases and a plan to leverage them where possible. Examples of these efforts include, but are not limited to, the Violence Project Mass Shootings database, the National Violent Death Reporting System, and the Crime Data Explorer.
- Provision for continuous enhancements and expansion to support a variety of different future uses.
- Attention to linking the knowledge graph entities and/or relationships to characteristics of places.
Guidance for Mining Criminal Justice Insights from National Archive of Criminal Justice Data (NACJD)
The Office of Justice Programs (OJP) established the National Archive of Criminal Justice Data (NACJD) in 1979 to provide a central repository of high-quality criminal justice data. The NACJD contains data from over 3,300 studies on topics of crime, justice, and the criminal justice system. Data are provided to researchers in a variety of formats, along with codebooks and other supporting documents.
All study pages on the NACJD website list the variables associated with each study, and the website supports variable-level search across all the studies. Using this information and study codebooks, knowledge graphs can be designed that connect different studies for the purposes of this challenge. NACJD staff are available to provide assistance with any questions about accessing and using NACJD data and documentation.
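One possible starting point for connecting studies through their variable lists is sketched below: treat each study as a node and draw an edge between two studies when they share at least one variable name. The study identifiers and variable names are invented for illustration, and a shared name is only a candidate link. The codebooks would still be needed to confirm that two variables measure the same thing.

```python
from itertools import combinations

# Hypothetical study IDs and variable names, for illustration only.
study_variables = {
    "study_A": {"county_fips", "offense_type", "year"},
    "study_B": {"county_fips", "agency_id", "year"},
    "study_C": {"respondent_age", "offense_type"},
}


def shared_variable_edges(catalog):
    """Return {(study1, study2): shared variables} for each overlapping pair."""
    edges = {}
    for a, b in combinations(sorted(catalog), 2):
        common = catalog[a] & catalog[b]  # set intersection of variable names
        if common:
            edges[(a, b)] = common
    return edges


edges = shared_variable_edges(study_variables)
```

Each resulting edge carries the set of shared variable names, which could serve as the list of candidate join keys to review against the codebooks.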
The NSF’s Open Knowledge Network challenge is seeking applicants interested in linking the data sets in the NACJD to support efforts to explore cross-cutting questions, provision new insights, and promote knowledge development across the criminal justice system.
This challenge provides a unique opportunity to advance and expand the use of authoritative scientific data and develop a partnership that will assist OJP in its strategic objectives as a provider of critical data to the criminal justice community.
Accessing NACJD Data
The NACJD provides access to two types of data: public-use files (PUFs) and restricted-use files (RUFs). PUFs are available to the public for download on demand but have been stripped of direct identifiers and other potentially identifiable information. RUFs have additional variables and offer more opportunities to establish linkages to other data sources. Researchers must complete an application process and be approved before being granted access to a RUF. The amount of time between application and approval varies based in part on the level of detail in the application, the number of revisions required, and other factors such as access to an institutional review board (IRB). On average, it takes about 45 days (and usually no longer than 12 weeks) after an application is submitted for access to restricted data to be granted.
In cases where a data set is not suitable for public download because some risk of disclosure remains (e.g., variables used in conjunction with one another or linking to other data files), BJS makes the data available in a restricted-use setting with strong confidentiality protections that requires potential users to apply for access. To obtain a restricted-use file, researchers must first complete an application and obtain approval to access the data.
For BJS data, an application must be submitted via the ResearchDataGov website using the Standard Application Process (SAP). The SAP is a Confidential Information Protection and Statistical Efficiency (CIPSEA) requirement that statistical agencies follow to operate a uniform process to make confidential data assets discoverable to researchers and allow researchers to apply for access to these data for research or statistical purposes. Data from NIJ and the Office of Juvenile Justice and Delinquency Prevention do not require an application through the SAP.
Applicants for BJS restricted data must provide documentation including, but not limited to, a description of the research project that demonstrates a clear statistical or research purpose, a signed assurance of confidentiality, a signed restricted-use data agreement, a BJS Privacy Certificate, and a data security plan. Institutional Review Board (IRB) approval may also be required, depending on the type of data, access, and research project.
Once the researcher’s application is approved, data can be made available in a variety of methods depending on need, access capabilities, and level of sensitivity of the data.
- Downloading Restricted-Use Files
Once the researcher’s application is approved, a secure download URL is provided.
- Virtual Data Enclave (VDE)
The NACJD’s Virtual Data Enclave (VDE) allows researchers to analyze data via a virtual desktop interface. These users do not take possession of the actual data, and all output must be vetted by the NACJD. A proposal could include a request for multiple studies to be accessed for linking purposes.
- Physical Enclave
Files in the physical enclave contain one or more of the following: direct or indirect identifiers or highly sensitive data. To obtain use of an enclaved file, the researcher must complete an application like that for restricted-use files. Upon approval of the application by NACJD and BJS, the researcher must travel to the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan and work in the enclave.
The enclave is a small suite of rooms with a few personal computers connected to a stand-alone server. The computers are not connected to the internet, and ICPSR staff are always present when a researcher is using the enclave. All output, notes, and other materials must be submitted for disclosure review before the investigator leaves the enclave, and any printed analytic results will undergo disclosure review before being sent to the researcher.
Linking NACJD Data
Linkages Using Geographical Identifiers
Most agency surveys, which are typically provided as PUFs, carry geographical identifiers (state, county, and in some cases city) that would allow people to better understand the totality of criminal justice agencies serving particular geographies. Some are sample surveys, however, which may limit this kind of analysis. Those data can also be tied to other publicly available data, such as those provided by the Census Bureau in its Annual Survey of State and Local Government Finances and its Annual Survey of Public Employment and Payroll, both of which provide data in a variety of ways, including via an application programming interface (API).
The BJS “Law Enforcement Agency Identifiers Crosswalk” at NACJD (available at https://www.icpsr.umich.edu/web/NACJD/studies/35158 ) may be a helpful resource for merging aggregate data at the state and county levels.
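A minimal sketch of this kind of geographic linkage follows, assuming a crosswalk that maps an agency identifier to a county FIPS code. The identifiers, counts, populations, and the helper `officers_per_100k` are all hypothetical; the actual crosswalk file has its own variable names and should be read from its codebook.

```python
# Hypothetical crosswalk rows: agency identifier -> county FIPS code.
crosswalk = {"AG001": "42101", "AG002": "42101", "AG003": "42003"}

# Hypothetical agency-level survey values (sworn officers per agency).
agency_officers = {"AG001": 120, "AG002": 30, "AG003": 75, "AG999": 10}

# Hypothetical county-level context keyed by the same FIPS code.
county_population = {"42101": 1_500_000, "42003": 1_200_000}


def officers_per_100k(crosswalk, agency_values, population):
    """Roll agency values up to counties, then compute a rate per 100,000."""
    totals, unmatched = {}, []
    for agency, value in agency_values.items():
        fips = crosswalk.get(agency)
        if fips is None:
            unmatched.append(agency)  # keep track of agencies with no match
            continue
        totals[fips] = totals.get(fips, 0) + value
    rates = {
        fips: 100_000 * total / population[fips]
        for fips, total in totals.items()
        if fips in population
    }
    return rates, unmatched


rates, unmatched = officers_per_100k(crosswalk, agency_officers, county_population)
```

Returning the unmatched agencies alongside the rates makes gaps in the linkage visible, which matters when some sources are sample surveys rather than full enumerations.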
Linkages Using Other Identifiers
Studies with direct or indirect identifiers allow additional linking opportunities but are restricted-use data and require additional clearances and approvals as described above. For example, linkage of data on incarcerated persons to longitudinal data across generations might show the potential social and relational causal factors within families that increase the occurrence of offenses over a lifetime. A few very sensitive data studies can only be analyzed onsite at the NACJD facility in Ann Arbor, Michigan.
BJS is required to protect the confidentiality of information identifiable to a private person from misuse and unauthorized access. Information identifiable to a private person, as defined in DOJ regulations at 28 CFR Part 22.2, includes both individuals and establishments. These confidentiality protections extend beyond directly identifiable information such as names, social security numbers, and other identifying numbers to a broader application that includes information that could be reasonably interpreted as referring to a specific private person due to small sample or cell sizes, combination of indirect identifiers, linkage to other data sources, or other factors. Protecting confidentiality is a fundamental responsibility of BJS as a federal statistical agency and is essential to maintain the trust of data providers and survey respondents. When linking data as part of this challenge, participants would need to use the appropriate methods to mitigate disclosure risk and proceed cautiously to ensure that such linkages would not lead to identity disclosure.
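To make the small-cell concern concrete, the sketch below tabulates records by a combination of indirect identifiers and masks any cell whose count falls under a threshold. The threshold of 10, the field names, and the function `suppress_small_cells` are assumptions chosen for illustration, not BJS rules, and masking small cells alone may not be sufficient when suppressed values can be recovered from row or column totals.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # Assumed threshold for illustration; not a BJS rule.


def suppress_small_cells(records, keys, min_size=MIN_CELL_SIZE):
    """Tabulate records by the given keys and mask counts below min_size."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    # None marks a suppressed cell rather than reporting the small count.
    return {cell: (n if n >= min_size else None) for cell, n in counts.items()}


# Hypothetical linked records with two indirect identifiers.
records = (
    [{"county": "A", "age_group": "18-24"}] * 12
    + [{"county": "A", "age_group": "65+"}] * 2
)
table = suppress_small_cells(records, keys=("county", "age_group"))
```

A check like this would typically run on every table produced from linked data, since linkage can create small cells that did not exist in either source on its own.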
Frequently Asked Questions
NSF expects applicants to include detailed descriptions of the collaboration between the applicant and the government agency which provided the use case. The list below provides a general guide to the types of activities that NIJ may be willing to undertake. Applicants should use these as a starting point for the description but add specifics that reflect the contents of their proposal.
- Staff time of substantive experts to contribute the following types of support:
- Identify potential data sources and partner jurisdictions.
- Leverage the extant literature to evaluate data quality before using data in the knowledge graph. For example, identify known shortcomings of potential data sets (e.g., issues with official crime data).
- Provide design suggestions regarding the relative importance of data elements and relationships to answering pressing criminal justice-related questions.
- Identify important questions that have not been answered by extant research.
- Identify insights likely to be answered by current NIJ funding.
- Provide insight into the usability of interfaces developed to provide access for researchers, policymakers, and practitioners. Each of these user groups has different needs to make the best use of the database.
- Provide feedback related to the timeliness and relevance of information generated from knowledge graphs.
- Make suggestions related to contextual factors that may be integrated into the knowledge graphs.
- Identify related data sets and/or networks that may be important and potential insights such additional data may contribute.
Recipients of awards under the Centers for Disease Control and Prevention (CDC) Firearm Injury Surveillance Through Emergency Rooms (FASTER) program are collecting nonfatal firearm injury data and are potential health partners. The CDC has been funding firearm injury surveillance in these 10 state health departments for the last three years. The program ends in August 2023.
Recipient agencies include:
- District of Columbia Department of Health
- Florida Department of Health
- Georgia Department of Public Health
- New Mexico Department of Health
- North Carolina Department of Health and Human Services
- Oregon Health Authority Public Health Division
- Utah Department of Health
- Virginia Department of Health
- Washington State Department of Health
- West Virginia Department of Health and Human Resources
Three of the states have FASTER data dashboards:
Following are two publications that summarize data collected under the program:
- County-Level Social Vulnerability and Emergency Department Visits for Firearm Injuries — 10 U.S. Jurisdictions, January 1, 2018–December 31, 2021
- Using the Centers for Disease Control and Prevention’s National Syndromic Surveillance Program Data to Monitor Trends in US Emergency Department Visits for Firearm Injuries, 2018 to 2019
Going forward, CDC’s surveillance of firearms injuries through emergency departments (EDs) will occur within Advancing Violence Epidemiology in Real-Time (AVERT).
Other local and state health departments that are not funded through the FASTER or AVERT programs, but that continue to monitor firearm injury emergency department visits, also represent potential health partners.
The FBI is currently working on modifications to the NIBRS data elements to capture:
- If the victim suffered a gunshot wound as an injury.
- If the firearm involved was discharged during the crime incident.
Many larger agencies have records management systems capable of recording those data elements. Applicants may consider reaching out to law enforcement agencies that are currently NIBRS reporting agencies. Agencies serving populations of 250,000 persons or more that report data to NIBRS appear on the Bureau of Justice Statistics’ Law Enforcement Agency Reported Crime Analysis Tool website.
Your first step should be to contact Elizabeth Groff ([email protected]) to discuss the proposed collaboration.
In your proposal, clearly describe the details of your collaboration with NIJ. When the proposal is complete, send it to Dr. Groff for her review of the collaboration description before submitting it to NSF.