Ado Files

Jaro-Winkler Distance (GitHub)

Stata ado file to calculate the Jaro-Winkler string distance between two strings.

Use ssc install jarowinkler to install the ado file from Stata or see here

jarowinkler calculates the distance between two string variables using the Jaro-Winkler distance metric. The metric is often used in record linkage to compare first or last names across different sources. Jaro-Winkler modifies the standard Jaro metric by giving extra weight to the start of the strings: two strings that share a common prefix score higher, so differences near the beginning count for more. The metric is scaled between 0 (not similar at all) and 1 (exact match).
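To make the metric concrete, here is a minimal Python sketch of the textbook Jaro-Winkler calculation. It is not the ado file's code. The function names, the prefix scaling factor p = 0.1, and the four-character prefix cap are assumptions taken from the commonly cited defaults, so the ado file's own options and defaults may differ.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity: 0 (nothing in common) to 1 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and no further
    # apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    a = [c for c, m in zip(s1, matched1) if m]
    b = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(a, b)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro score plus a bonus for a shared prefix (assumed defaults)."""
    score = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1, s2):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix bonus only ever raises the Jaro score, and it raises it more the lower that score is, which is why a shared beginning matters so much when comparing names.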

For more detail on the Jaro-Winkler method, see Wikipedia and http://www.gabormelli.com/RKB/Jaro-Winkler_Distance_Function.

Jaro-Winkler implementation based on code from http://cs.anu.edu.au/~Peter.Christen/Febrl/febrl-0.4.01/stringcmp.py and https://github.com/miguelvps/c/blob/master/jarowinkler.c.

keeporder (GitHub)

99% of the time, when you run keep in Stata, you follow it with order, right? Well, keeporder saves you a line of code, running keep and then order on the variable list.

Use ssc install keeporder to install the ado file from Stata or see here

R Programs

extract_bib (GitHub)

extract_bib is a simple R script to create a minimal BibTeX library file for an article. Some journals require tex and bib files with submissions. However, I have only one master bib file, which currently contains 3543 references. Rather than send all of that to the journal to help typeset my manuscript, I only want to send a bib file with the references cited in my article. This script does that.
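The general idea can be sketched in a few lines. The following is a rough Python illustration, not the R script itself: the function names are hypothetical, it assumes citations use \cite-style commands (\cite, \citep, \citet, and similar), and it assumes each bib entry starts with @type{key, and has balanced braces. It ignores @string and @comment blocks.

```python
import re

# Matches \cite, \citep, \citet, \textcite, ... with optional [..] arguments.
CITE_RE = re.compile(
    r"\\[A-Za-z]*cite[A-Za-z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}"
)
# Matches the start of a bib entry, e.g. "@article{smith2020,".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex):
    """Return the set of citation keys used in a LaTeX source string."""
    keys = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def split_entries(bib):
    """Yield (key, entry_text) pairs, finding each entry's end by brace depth."""
    for m in ENTRY_RE.finditer(bib):
        depth = 0
        for pos in range(m.start(), len(bib)):
            if bib[pos] == "{":
                depth += 1
            elif bib[pos] == "}":
                depth -= 1
                if depth == 0:
                    yield m.group(2), bib[m.start():pos + 1]
                    break


def minimal_bib(tex, bib):
    """Keep only the bib entries whose keys are cited in the tex source."""
    keys = cited_keys(tex)
    return "\n\n".join(
        text for key, text in split_entries(bib) if key in keys
    )


tex = r"See \citep{smith2020, doe2019} and \cite[p.~3]{lee2018}."
bib = (
    "@article{smith2020,\n  title={A {B} C},\n}\n\n"
    "@book{other2000,\n  title={X},\n}\n\n"
    "@misc{lee2018,\n  note={n},\n}\n"
)
print(minimal_bib(tex, bib))
```

Keys that are cited but missing from the master file (doe2019 in this example) are silently skipped here; a real workflow would probably want to report them.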