
OpenMRS: Record Linkage Project

Google Summer of Code™ 2007






Phase2 Complete

Jul 2nd, 2007 by sarp

I refactored the Phase 1 code according to the feedback I received, and completed Phase 2 (here is the changeset). I will make a more detailed post tomorrow, before the phone call. The next step, changing the formula for weight scaling, should be fairly easy once I figure out where scores are calculated in the existing code. Testing and commenting the code should not take too long either: I know the code is functional, so I just have to check boundary cases and the like.

Runtime Component, start-up: Implement functionality instantiating a data structure that provides fast lookup of individual token frequencies. This data structure will likely be a hash table, where the key is the token value (e.g., last name of “SMITH”) and the value is the token frequency (e.g., 2,102). This data will be loaded from the persistent data structure created in task 1(e).

Because the primary performance constraint for weight-based frequency scaling will be the lookup, we will need to be able to configure the number of elements loaded into the hash table. For example, it is likely that some fields will have hundreds of thousands of unique tokens (e.g., name fields), while others will have on the order of 10 or 20 (middle initial, month of birth).

Also, weight scaling can be used to either increase or decrease individual field weights. If an individual token frequency is less than the average frequency, the field weight will be increased; if it is above the average frequency, the field weight will be decreased. Consequently, there needs to be some ability to configure the total number of tokens loaded into the lookup structure for each field.

a. Implement functionality to load top ‘N’ most/least frequent tokens from the persistent data structure, where top, bottom, and ‘N’ are specified in the configuration file. If

b. Other (future) options may include top or bottom N%, frequencies above or below N. He
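To make item (a) more concrete, here is a minimal Java sketch of such a lookup. It is an illustration only, not the code in the changeset: the class name `TokenFrequencyLookup`, the `Selection` enum, and the plain in-memory map standing in for the persistent data structure are all assumptions, and it uses current Java library calls for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-field token frequency lookup (names are assumptions). */
class TokenFrequencyLookup {

    /** Which end of the frequency ranking to load; assumed to come from the configuration file. */
    enum Selection { MOST_FREQUENT, LEAST_FREQUENT }

    // Hash table: key is the token value, value is the token frequency.
    private final Map<String, Integer> frequencies = new HashMap<>();

    /**
     * Keeps only the n most or least frequent tokens of one field.
     * The map 'all' stands in for whatever the persistent data structure returns.
     */
    TokenFrequencyLookup(Map<String, Integer> all, Selection selection, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(all.entrySet());

        // Ascending by frequency; reversed when the most frequent tokens are wanted.
        Comparator<Map.Entry<String, Integer>> byFrequency = Map.Entry.comparingByValue();
        if (selection == Selection.MOST_FREQUENT) {
            byFrequency = byFrequency.reversed();
        }
        entries.sort(byFrequency);

        // Clamp n so that a negative or oversized value cannot break subList.
        int limit = Math.min(Math.max(n, 0), entries.size());
        for (Map.Entry<String, Integer> entry : entries.subList(0, limit)) {
            frequencies.put(entry.getKey(), entry.getValue());
        }
    }

    /** Returns the frequency of a token, or null if the token was not loaded. */
    Integer frequencyOf(String token) {
        return frequencies.get(token);
    }

    /** Number of tokens actually held in memory for this field. */
    int size() {
        return frequencies.size();
    }
}
```

A caller would build one such lookup per field at start-up, passing the selection and N read from the configuration. A token that is missing from the lookup returns null, so the scoring code can fall back to the unscaled weight.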


  • About Me

    I am 22 years old and recently graduated from the Computer Science and Engineering program of Sabanci University in Istanbul, Turkey.
