Archive for May, 2007

Agenda for the week

After receiving great guidance from James on how to implement weight scaling, I examined part of the existing code today. In the analytic phase, for each field, we need to calculate:
(1) number of unique values
(2) frequency of each unique value
(3) total records
These calculated values should be stored in the database for future analysis. My idea [...]
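The three per-field statistics listed above can be sketched in a few lines of Python. This is only an illustration: the function name `field_stats`, the dictionary-based record layout, and the sample records are assumptions made here, not part of the existing code, and storing the results in the database is left out.

```python
from collections import Counter

def field_stats(records, field):
    """Return unique-value count, per-value frequency, and total records for one field.

    `records` is assumed to be a list of dicts (a hypothetical layout).
    """
    # Tally every value seen for the field; a missing field is tallied under None.
    freq = Counter(r.get(field) for r in records)
    return {
        "unique_values": len(freq),     # (1) number of unique values
        "frequencies": dict(freq),      # (2) frequency of each unique value
        "total_records": len(records),  # (3) total records
    }

# Made-up sample data, for illustration only.
records = [
    {"last_name": "Smith"},
    {"last_name": "Jones"},
    {"last_name": "Smith"},
]
stats = field_stats(records, "last_name")
```

Dividing each frequency by `total_records` gives the relative frequency of a value, which is the kind of quantity a frequency-based weight adjustment would need.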


Connecting the dots

After reading the Fellegi-Sunter paper, I’d like to map the ideas described in words to formulas used in the original paper:
(1) FS generates a likelihood score based on the agreement pattern among corresponding fields from two records. The higher the likelihood score, the more likely it is that the two records represent a match, rather than simple random agreement among [...]
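As a rough sketch of how such a score is usually computed, the snippet below uses the common log-odds form of the Fellegi-Sunter weight. Here `m[field]` stands for the probability that a field agrees given a true match and `u[field]` for the probability that it agrees given a non-match. The function name `fs_score`, the field names, the probability values, and the choice of log base 2 are all assumptions for illustration.

```python
import math

def fs_score(agreement, m, u):
    """Sum per-field log-odds weights for one agreement pattern.

    agreement: dict mapping field name -> True (fields agree) or False (disagree)
    m, u:      dicts mapping field name -> agreement probability among
               matches (m) and non-matches (u); hypothetical inputs.
    """
    score = 0.0
    for field, agrees in agreement.items():
        if agrees:
            # Agreement weight: positive when agreement is likelier among matches.
            score += math.log2(m[field] / u[field])
        else:
            # Disagreement weight: negative when disagreement is likelier among non-matches.
            score += math.log2((1 - m[field]) / (1 - u[field]))
    return score

# Made-up probabilities, for illustration only.
m = {"last_name": 0.9, "birth_year": 0.9}
u = {"last_name": 0.1, "birth_year": 0.1}

both_agree = fs_score({"last_name": True, "birth_year": True}, m, u)
one_agrees = fs_score({"last_name": True, "birth_year": False}, m, u)
none_agree = fs_score({"last_name": False, "birth_year": False}, m, u)
```

With these values the score falls as fields disagree, so `both_agree > one_agrees > none_agree`, which matches the idea that a higher score points toward a match.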


Background check

In Overview of Record Linkage and Current Research Directions, Winkler states that the basic ideas in the Fellegi-Sunter model are based on statistical concepts such as odds ratios, hypothesis testing, and relative frequency. Luckily, I had a lot of exposure to hypothesis testing and relative frequency in the statistical modeling course this semester. Apparently the rest is basic probability: Odds [...]
