Posted on May 30th, 2007
After receiving great guidelines from James on how to implement weight scaling, I examined part of the existing code today. In the analytic phase, for each field, we need to calculate: (1) the number of unique values, (2) the frequency of each unique value, and (3) the total number of records. These calculated values should be stored in the database for future analysis. My idea [...]
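The three per-field quantities listed above can be sketched in a few lines of Python. This is only an illustration, not the project's existing code: the function name `field_stats`, the dictionary-per-record layout, and the sample data are all hypothetical, and the choice to leave missing values out of the frequency table while still counting them toward the record total is an assumption.

```python
from collections import Counter


def field_stats(records, field):
    """Summarize one field across a list of record dictionaries.

    Assumption: a missing or None value is excluded from the frequency
    table but still counts toward the total number of records.
    """
    values = [rec.get(field) for rec in records]
    freq = Counter(v for v in values if v is not None)
    return {
        "unique_values": len(freq),    # (1) number of unique values
        "frequencies": dict(freq),     # (2) frequency of each unique value
        "total_records": len(values),  # (3) total records
    }


# Hypothetical sample data, for illustration only.
sample_records = [
    {"last_name": "Smith"},
    {"last_name": "Smith"},
    {"last_name": "Jones"},
    {"last_name": None},
]
stats = field_stats(sample_records, "last_name")
```

The resulting dictionary could then be written to whatever table holds the analysis results, one row per field and value.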
Posted on May 21st, 2007
After reading the Fellegi-Sunter paper, I’d like to map the ideas described in words to formulas used in the original paper:
(1) FS generates a likelihood score based on the agreement pattern among corresponding fields from two records. The higher the likelihood score, the more likely it is that the two records represent a match, rather than simple random agreement among [...]
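As a rough sketch of how such a score is commonly computed in the Fellegi-Sunter framework: each field contributes a log likelihood ratio, log2(m/u) when the field agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. Summing the contributions assumes the fields are conditionally independent. The function names, field names, and the m and u values below are hypothetical and chosen only for illustration.

```python
import math


def agreement_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by a single field.

    m: P(field agrees | records are a true match)
    u: P(field agrees | records are not a match)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(pattern, m_probs, u_probs):
    """Sum the per-field weights for one agreement pattern.

    Assumption: fields are conditionally independent, so the weights add.
    """
    return sum(
        agreement_weight(pattern[f], m_probs[f], u_probs[f]) for f in pattern
    )


# Hypothetical m and u values, not estimated from any real data.
m_probs = {"last_name": 0.9, "birth_year": 0.9}
u_probs = {"last_name": 0.1, "birth_year": 0.1}

score_all_agree = match_score(
    {"last_name": True, "birth_year": True}, m_probs, u_probs
)
score_mixed = match_score(
    {"last_name": True, "birth_year": False}, m_probs, u_probs
)
score_all_disagree = match_score(
    {"last_name": False, "birth_year": False}, m_probs, u_probs
)
```

With these made-up values, full agreement yields a positive score, full disagreement a negative one, and one agreement plus one disagreement cancel out, which matches the idea that a higher score points toward a match.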
Posted on May 21st, 2007
In "Overview of Record Linkage and Current Research Directions", Winkler states that the basic ideas in the Fellegi-Sunter model are based on statistical concepts such as odds ratios, hypothesis testing, and relative frequency. Luckily, I had a lot of exposure to hypothesis testing and relative frequency in the statistical modeling course this semester. Apparently the rest is basic probability: Odds [...]