Become familiar with methods to evaluate the accuracy of record linkage software. For some projects, CCR has also used the software Integrity (previously named AutoMatch) and Link Plus, developed by the. Probabilistic record linkage in SAS, Glenn Wright, M. Understand the importance of accurate record linkage in a prescription drug monitoring program. Use of commercial record linkage software and vital. Registry Plus software programs for cancer registries. The three algorithms were used to unduplicate an administrative database containing personal identifiers for over 500,000 clients. A brief overview of key linkage techniques is included as well. Record linkage based on a probabilistic matching approach was used to identify pregnancies exposed to ACTs in the first trimester of pregnancy. Probabilistic linkage technology makes it feasible to link large data files and achieve results governed by mathematical principles that adhere to statistically valid standards. A standalone probabilistic record linkage program that can detect duplicates in a cancer registry database. Link Plus, a component of Registry Plus, is free, publicly available probabilistic linkage and deduplication software designed by CDC for use by central cancer registries, but usable with any fixed-width or delimited data type. This technology finds true linked pairs by comparing data values on candidate pairs of records and calculating the probability that each pair is a true match given.
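The pair-comparison idea described above can be sketched with the classic log-likelihood-ratio weights used in probabilistic linkage. This is a minimal illustration, not the method of Link Plus or any other named package: the field names and the m- and u-probability values are made up for the example, and missing values are compared naively.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only).
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.5),
}


def field_weight(m, u, agrees):
    """Log2 likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)          # positive: agreement favors a match
    return math.log2((1 - m) / (1 - u))  # negative: disagreement counts against


def total_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum the field weights for one candidate pair of records."""
    return sum(
        field_weight(m, u, rec_a.get(field) == rec_b.get(field))
        for field, (m, u) in probs.items()
    )
```

A pair that agrees on every field gets a large positive total weight, and a pair that disagrees on every field gets a negative one. In practice the m- and u-probabilities would be estimated from the data rather than fixed by hand.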
The Registry Plus suite can be used separately or together for routine or special data collection. This method, which is called probabilistic record linkage, is the approach used by most record linkage software, including free programs such as Link Plus, provided by the CDC Division of Cancer Prevention and Control, and The Link King, a SAS-based tool developed by the substance abuse and mental. A box in the Link Plus software informs you that the link process is done and displays some. To link a cancer registry file with external files. The Python Record Linkage Toolkit is a library to link records in or. The study objective was to compare the accuracy of a deterministic record linkage algorithm and two public-domain software applications for record linkage, The Link King and Link Plus. Krupski: record linkage software in the public domain. Based on software-calculated m-probability (sensitivity) and u-probability (specificity). Quickly and accurately link records within or across data sources using record linkage software that automates phonetic, numeric, domain-specific, and fuzzy matching.
Comparison of public-domain software and services for probabilistic record linkage and address standardization. MatchPro has the advantage of handling huge datasets. Record linkage is intrinsic to efficient, modern survey operations. Istat is the main producer of official statistics in Italy. LinkSolv record linkage software is used for standardizing reported data for record linkage purposes and computing Bayesian probabilities that candidate record pairs are true links. Discover new connections and unearth insights with record linkage software even when the records in question are in different formats and have no. Finally, some software is free and available for you to try on the web: Link Plus, The Link King, ChoiceMaker 2, Febrl, and the Merge Toolbox all have quite good user interfaces. Link Plus is a record linkage tool for cancer registries. The total probability weight assigned to each record pair.
Campbell, Dennis Deck, and Antoinette Krupski: the study objective was to compare the accuracy of a deterministic record linkage algorithm and two public-domain software applications for record linkage, The Link King and Link Plus. An overview of record linkage methods: linking data for health services research. These software programs, compliant with national standards, are made available by CDC to implement the national program of. A comparison of Link Plus, The Link King, and a basic deterministic algorithm. Abstract. Objective. While Link Plus is easy to use, it may not be efficient or capable of processing large datasets, such as those with 1 million records. Record linkage references, projects, population informatics. By the way, you have to be careful how you set up your record linkage software when performing one-to-many matches. Repository of information on duplicate detection, record linkage, and identity uncertainty. Substance Abuse and Mental Health Services Integrated Database Project technical monograph: details about the probabilistic.
Apr 20, 2020: Relais (Record Linkage at Istat) is a toolkit providing a set of techniques for dealing with record linkage projects. Bibliography on record linkage software, last updated. Software demonstrations, Record Linkage Techniques 1997. The Link King is free, public-domain probabilistic linkage and deduplication software; a user manual is available. Comparing record linkage software programs and algorithms using. A list of free data matching and record linkage software.
Schema reconciliation, on-the-fly data/schema reconciliation: yes, no, no, limited. The Generalized Record Linkage System is a probabilistic record linkage system designed for use by a wide range of statistical applications. Evaluating record linkage software using representative. Provide advice to individuals who plan to update and maintain the programs for record linkage and related data preparation. The quality of the final record linkage results may depend on the user's preset cutoff value and on the user-chosen blocking variables and matching methods. The software can run in two modes corresponding to two usages. One study used actual identifiers to evaluate probabilistic approaches from two software packages, Link Plus and The Link King, without studying. Link Plus software: a standalone probabilistic record linkage program that combines ease of use and statistical sophistication; detects duplicates within a single database or links two database files; supports North American Association of Central Cancer Registries files, fixed-width files, delimited files, and CRS Plus databases. Link Plus is a standalone probabilistic record linkage program that can detect duplicates in a cancer registry database and link a cancer registry file with external files. Link Plus is a free probabilistic record linkage and deduplication program developed at CDC's Division of Cancer Prevention and Control.
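How blocking variables and a cutoff value shape the result can be seen in a short sketch. The function names, the record layout (plain dictionaries), and the scoring callable are assumptions made for illustration; none of this reflects the internals of Link Plus, The Link King, or any other package mentioned here.

```python
from collections import defaultdict


def candidate_pairs(file1, file2, block_key):
    """Yield only the record pairs that share the same blocking-key value."""
    blocks = defaultdict(list)
    for rec2 in file2:
        blocks[block_key(rec2)].append(rec2)
    for rec1 in file1:
        # Records in different blocks are never compared at all.
        for rec2 in blocks.get(block_key(rec1), []):
            yield rec1, rec2


def link(file1, file2, block_key, score, cutoff):
    """Score each candidate pair and keep those at or above the cutoff."""
    links = []
    for rec1, rec2 in candidate_pairs(file1, file2, block_key):
        s = score(rec1, rec2)
        if s >= cutoff:
            links.append((rec1, rec2, s))
    return links
```

A blocking key that is too strict silently drops true matches before they are ever scored, while a cutoff that is too low admits false links, which is why both choices affect the quality of the final result.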
Mar 12, 2019: Registry Plus is a suite of publicly available free software programs for collecting and processing cancer registry data. It is used for unduplicating and updating name and address lists. For all of these reasons, NASS decided to explore the use of commercially available record linkage software. Link Plus is a probabilistic record linkage program developed at CDC's Division of Cancer Prevention and Control in support of CDC's National Program of Cancer Registries.
Tribal epidemiology toolkit: data linkage, Council of State. Standalone linkage systems: some free record linkage software. Link Plus (US CDC): free software designed for working with cancer registries, but can be used more widely. Febrl. Campbell: record linkage software in the public domain. Record linkage with Washington State Cancer Registry by. In the fourth section, we examine the current Generalized Record Linkage System used at Statistics Canada and then describe its features. Methods: Birth and newborn screening records maintained by the Michigan Department of Community Health from January 2007 through March 2008 were used in this study. To determine the accuracy of record matching using Link King software that uses an ordinal score for the certainty that linked records are valid matches. Procedures for conducting data linkages with the CCR, California. It is part of a toolbox of generalized systems developed at Statistics Canada. Because commercial record linkage software and computerized death certificates are now available at relatively low cost (a few thousand dollars total for both), it is becoming. A checklist for evaluating record linkage software: great article on how to evaluate probabilistic record linkage software. Riddle.
Registry Plus Link Plus: Link Plus is free software developed to perform probabilistic record linkage in support of the National Program of Cancer Registries (NPCR) of the United States Centers for Disease Control and Prevention (CDC). Several examples will be given on why it is useful to link data. Remadder is unsupervised free fuzzy data matching software with a GUI. Tribal linkage and race data quality for American Indians. Record linkage is defined as the process of identifying records on two or more datasets that refer to the same entity across various data sources such as databases, CRMs, and social media platforms. In the realm of public-domain and open-source software for record linkage and unduplication, The Link King reigns supreme. The record linkage process will deduplicate the many record set. Know which patient metrics are most affected by the use of specific record linkage software. At the completion of a linkage run, Link Plus will generate a linkage report, named linkagereport. Gave advice and software on record linkage methods to Census Bureau program divisions. The evaluation of Link Plus was based on the examination of the user guide version 2.
Comparison of record linkage software for deduplicating patient identities in California's prescription drug monitoring program, California Department of Justice. Study of record linkage software for the 2010 Brazilian. Both MatchPro and Link Plus produce very good linkage quality. For example, with Link Plus, the one side is what you would assign to File 1, and the many side would be assigned to File 2. Probabilistic record linkage (PRL) refers to the process of. The Link King has fashioned a powerful alliance between sophisticated probabilistic and deterministic record linkage protocols.
A general rule of thumb is to set the file you want to improve as File 1. The problem addressed by this methodology is that of matching two data files. This report is an evaluation of several commercially available packages. Quickly and accurately link records within or across data sources using automated record linkage software that outperforms IBM and SAS every time. We linked records in North Carolina Medicaid files to public health surveillance. Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources. To detect duplicates in a cancer registry database. Link Plus, a freely available probabilistic record linkage software. Generalized Record Linkage System: Statistics Canada's record linkage software, Martha Fair, Statistics Canada, Ottawa. Abstract.
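The one-to-many setup mentioned above, with File 1 as the "one" side and File 2 as the "many" side, can be illustrated with a small post-processing sketch. It assumes scored pairs are already available as (File 1 id, File 2 id, score) tuples, and it resolves conflicts by keeping each File 2 record's best-scoring File 1 record. That rule and the function name are assumptions for illustration, not a description of how Link Plus behaves.

```python
from collections import defaultdict


def resolve_one_to_many(scored_pairs):
    """Group File 2 ids under the single File 1 id each one matches best.

    scored_pairs: iterable of (file1_id, file2_id, score) tuples.
    A File 1 record may keep many File 2 records; each File 2 record
    is assigned to at most one File 1 record.
    """
    best = {}
    for id1, id2, score in scored_pairs:
        # Keep only the highest-scoring File 1 candidate per File 2 record.
        if id2 not in best or score > best[id2][1]:
            best[id2] = (id1, score)

    links = defaultdict(list)
    for id2, (id1, _score) in best.items():
        links[id1].append(id2)
    return dict(links)
```

Swapping the two files changes which side is allowed to repeat, which is one reason the file assignment needs care before a run.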