OpenMRS: Record Linkage Project

Google Summer of Code™ 2007






Database schema for analysis results

Jun 17th, 2007 by sarp

Here is a draft of how to store the results of the analytical phase in relational database tables. I created this diagram using DBDesigner 4, the same tool used to create the OpenMRS Data Model. Right-click here and choose “Save target as” if you’d like to load this schema in DBDesigner and modify it.

[Diagram: model.png]

The datasource_analysis table mimics the LinkDataSource class in the org.regenstrief.linkage.io package. I am imagining a GUI in OpenMRS where the user either chooses among the existing data sources or adds a new one, in which case the datasource_id is assigned automatically by the database.

The field table contains the changed and date_changed attributes, which indicate how fresh the statistics are.
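To make the draft a little more concrete, here is a minimal sketch of the two tables described above, using Python's built-in sqlite3 module rather than the actual OpenMRS database. Only datasource_id, access, changed and date_changed come from the draft; the name, field_id and field_name columns, the column types, and the helper functions create_schema and add_datasource are assumptions made for illustration.

```python
import sqlite3

# Sketch only: column types and the columns marked "assumed" are not
# part of the draft schema.
SCHEMA = """
CREATE TABLE datasource_analysis (
    datasource_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
    name          TEXT NOT NULL,                      -- assumed column
    access        TEXT                                -- e.g. delimiter or connection info
);
CREATE TABLE field (
    field_id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed column
    datasource_id INTEGER NOT NULL
                  REFERENCES datasource_analysis(datasource_id),
    field_name    TEXT NOT NULL,                      -- assumed column
    changed       INTEGER NOT NULL DEFAULT 0,         -- 0 or 1
    date_changed  TEXT                                -- date of the last data change
);
"""


def create_schema(conn):
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)


def add_datasource(conn, name, access):
    """Insert a data source and return the id the database assigned."""
    cur = conn.execute(
        "INSERT INTO datasource_analysis (name, access) VALUES (?, ?)",
        (name, access),
    )
    conn.commit()
    return cur.lastrowid
```

The point of the sketch is that the caller never supplies datasource_id: a GUI would pass only the descriptive values and read the generated id back from the insert.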


2 Responses to “Database schema for analysis results”

  1. Shaun Grannis, on 19 Jun 2007 at 8:21 pm

    Sarp, a great start to the data model! A couple of questions:

    1. In the patient_matching_datasource_analysis table, there is a field labeled “access”, can you describe its use?

    2. In the patient_matching_field table, can you describe how the “date_changed” and “changed” fields will be used? I am especially unsure about the purpose of the “changed” field.

    3. A given field (such as last name) will have an overall entropy. Additionally, each individual token possesses an individual entropy. I would suggest adding an “entropy” field to the patientmatching_token table. Calculating the entropy of individual tokens is not top priority, but I anticipate potentially using those values in the future.
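One possible reading of the per-token entropy suggested in point 3 is each token's contribution, -p log2(p), to the Shannon entropy of the field, where p is the token's relative frequency. The sketch below follows that reading; the function names token_entropies and field_entropy are hypothetical, and another definition (for example the self-information -log2(p)) may be what is ultimately wanted.

```python
import math
from collections import Counter


def token_entropies(values):
    """Map each distinct token to its contribution -p*log2(p).

    `values` holds one field's values, e.g. every last name in a data source.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {
        token: -(n / total) * math.log2(n / total)
        for token, n in counts.items()
    }


def field_entropy(values):
    """Overall Shannon entropy of the field: the sum of the token contributions."""
    return sum(token_entropies(values).values())
```

Under this reading, the overall entropy stored for a field is just the sum of the values that would be stored for its tokens, so both can be computed in one pass over the token frequencies.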

  2. sarp, on 20 Jun 2007 at 1:05 pm

    1. For flat files, access refers to the delimiter character; for databases, it could hold something like username and password information. James would know better, because it is a variable he used while designing the LinkDataSource class.

    2. The motivation for these fields is that the analytic phase may take a long time for large data sources, so we may want to skip it and reuse old statistics in the linkage process.

    The changed field will be either 0 or 1, depending on whether any update, insert, or delete operations were performed on the database after our statistical analysis. This field would need to be maintained by the functions outside the patient linkage module that make changes to the database.

    The date_changed field stores the date on which the database was last changed. Knowing how fresh the calculated statistics are helps decide whether to run the analytic phase again.

    3. Thanks, I’ll add that.
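The bookkeeping described in point 2 could be sketched roughly as follows. This is an in-memory illustration only: the FieldStats class and the functions record_data_change, record_analysis and can_reuse_statistics are hypothetical names, not part of the linkage module.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FieldStats:
    """Stand-in for one row of the field table (only the two freshness columns)."""
    changed: int = 0                      # 0 = data untouched since the last analysis
    date_changed: Optional[date] = None   # date of the most recent data change


def record_data_change(stats, when):
    """To be called by code outside the linkage module after an update/insert/delete."""
    stats.changed = 1
    stats.date_changed = when


def record_analysis(stats):
    """To be called once the analytic phase has recomputed the statistics."""
    stats.changed = 0


def can_reuse_statistics(stats):
    """The stored statistics are still valid if nothing changed since the analysis."""
    return stats.changed == 0
```

In this sketch the linkage process would check can_reuse_statistics before deciding to skip the analytic phase, while date_changed is kept purely as information for the user about when the data last changed.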

  • About Me

    I am 22 years old and recently graduated from the Computer Science and Engineering program at Sabanci University in Istanbul, Turkey.
