Child Welfare Quality Improvement Center for Workforce Development (QIC-WD)


When examining workforce data, it can be valuable to draw on data from a variety of systems, such as human resources (HR) databases, learning management systems, and child welfare information systems. Each can be useful on its own, but more can be learned when different types of data are connected. For example, applicant information may be stored in a database that is separate from other HR data on those hired, and there is value in looking at the connections between applicant data and later aspects of employment.

Data linkage involves pairing observations from two or more data files and identifying the pairs that belong to the same entity. For example, if you wanted to know how caseload affected the length of time someone stayed with the agency, this would typically require linking HR data for an individual with data for cases assigned to the same individual pulled from a child welfare information system. When there is a common ID across systems, linking can be fairly straightforward. However, in most agencies, different ID codes are used for different systems, which can make data linking challenging. In those circumstances, additional strategies are needed to successfully link the data.
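As a rough illustration of the straightforward case, the sketch below links two small files on a shared ID. Everything in it is hypothetical: the field names (`employee_id`, `tenure_months`, `avg_caseload`), the records, and the helper `link_on_id` are made up for this example and do not come from any particular agency system.

```python
# Hypothetical HR and child welfare extracts that share an ID field.
hr_records = [
    {"employee_id": "E001", "tenure_months": 30},
    {"employee_id": "E002", "tenure_months": 8},
    {"employee_id": "E003", "tenure_months": 14},
]
caseload_records = [
    {"employee_id": "E001", "avg_caseload": 12},
    {"employee_id": "E002", "avg_caseload": 27},
    {"employee_id": "E004", "avg_caseload": 19},
]


def link_on_id(left, right, key):
    """Pair records from two files that share the same value of `key`.

    Returns the linked records and the left-hand records with no partner.
    """
    # Index the right-hand file by ID so each lookup is a dictionary access.
    right_index = {}
    for rec in right:
        right_index.setdefault(rec[key], []).append(rec)

    linked, unmatched = [], []
    for rec in left:
        matches = right_index.get(rec[key], [])
        if not matches:
            unmatched.append(rec)
        for match in matches:
            # Combine the fields from both files into one linked record.
            linked.append({**rec, **match})
    return linked, unmatched


linked, unmatched = link_on_id(hr_records, caseload_records, "employee_id")
```

Keeping the unmatched records, rather than silently dropping them, makes it easier to see how many individuals could not be linked and to investigate why.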

Before beginning any data linking, a first step is data cleaning and standardization. Work done up front to correct issues such as missing data and entry errors, and to reconcile differences in how the same data appear in different files (e.g., names in lower case versus upper case, or full names versus initials), will maximize successful matches.

After the data are cleaned and standardized, one of two matching strategies can be used: deterministic or probabilistic. In deterministic matching, individuals across databases are linked through an exact match on a common identifier or on a set of variables that are sufficiently descriptive to constitute an exact match (e.g., name, date of birth, gender, office). Even with a common ID code, matching may still be imperfect because missing data, duplicate records, or entry errors can prevent a match.

In probabilistic matching, records are also compared on one or more common fields, but a set of formal decision rules is used to assign a probability that the records are a match. The success of probabilistic approaches depends on the uniqueness of the combinations of variables used to make the match. Probabilistic matching is accomplished with specialized software, and both open source and commercial options are available.
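The sketch below illustrates standardization and both matching strategies on toy records. It is a minimal example under stated assumptions, not a description of any specific software: the field names are hypothetical, the agreement probabilities in `M_U` are invented for illustration, and the scoring follows one common formulation (agreement weights in the style of the Fellegi-Sunter model), which is only one way the decision rules could be defined.

```python
import math

# Hypothetical fields used to compare records across the two files.
FIELDS = ("last_name", "first_name", "dob", "office")


def standardize(rec):
    """Trim whitespace and upper-case text so formatting differences do not block a match."""
    return {k: (v.strip().upper() if isinstance(v, str) else v) for k, v in rec.items()}


def deterministic_match(a, b):
    """Exact agreement on every field; a missing value never counts as agreement."""
    return all(a.get(f) and a.get(f) == b.get(f) for f in FIELDS)


# Assumed, made-up values for illustration only:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "dob": (0.97, 0.003),
    "office": (0.85, 0.10),
}


def match_weight(a, b):
    """Sum log2 agreement/disagreement weights across the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if not a.get(field) or not b.get(field):
            continue  # missing value: treated as no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight, prior=0.01):
    """Convert a weight to a match probability, given an assumed prior match rate."""
    odds = (prior / (1 - prior)) * 2 ** weight
    return odds / (1 + odds)


hr = standardize({"last_name": " Smith", "first_name": "Jordan",
                  "dob": "1985-03-02", "office": "north"})
cw = standardize({"last_name": "SMITH", "first_name": "J",
                  "dob": "1985-03-02", "office": "North"})
other = standardize({"last_name": "Lee", "first_name": "Ana",
                     "dob": "1990-11-20", "office": "south"})

exact = deterministic_match(hr, cw)            # fails: "JORDAN" versus "J"
likely = match_probability(match_weight(hr, cw))
unlikely = match_probability(match_weight(hr, other))
```

Here the full name versus initial difference defeats the deterministic rule, while the probabilistic score still rates the pair as a likely match because the remaining fields agree. In practice, pairs would be accepted, rejected, or sent for manual review based on thresholds chosen for the data at hand.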

In this video, Cindy Parry, QIC-WD Evaluation Specialist, provides further details on how probabilistic matching works.

The Linking Human Resources and Child Welfare Data resource describes the types of data that a child welfare agency may want to link to examine workforce questions. It also provides additional information on data cleaning and data matching strategies.

The content contained in this blog post was developed as part of the QIC-WD’s Child Welfare Workforce Analytics Institute. The Institute was designed to facilitate growth and collaboration between leaders in child welfare and human resources in their awareness, knowledge, and use of data analytics.