Database schema for analysis results
Jun 17th, 2007 by sarp
Here is a draft of how to store analytic phase results in relational database tables. I created this diagram using DBDesigner 4, the same tool used to create the OpenMRS Data Model. Right click here and choose “Save target as” if you’d like to load this schema in DBDesigner and modify it.
The datasource_analysis table mimics the LinkDataSource class in the org.regenstrief.linkage.io package. I am imagining a GUI in OpenMRS where the user manually chooses among existing data sources, or adds a new data source, in which case the datasource_id is automatically assigned by the database.
The field table contains changed and date_changed attributes to determine how fresh the statistics are.
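To make the auto-assigned datasource_id concrete, here is a minimal sketch in Python using an in-memory SQLite database. SQLite is only a stand-in here, and the `name` and `field_id` columns are assumptions added for illustration. Only datasource_id, access, changed and date_changed come from the draft schema and the discussion below.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_matching_datasource_analysis (
    datasource_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
    name          TEXT NOT NULL,                      -- assumed column
    access        TEXT                                -- e.g. delimiter for flat files
);
CREATE TABLE patient_matching_field (
    field_id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed column
    datasource_id INTEGER NOT NULL
        REFERENCES patient_matching_datasource_analysis(datasource_id),
    name          TEXT NOT NULL,                      -- assumed column
    changed       INTEGER NOT NULL DEFAULT 0,         -- 0 or 1
    date_changed  TEXT                                -- date of the last data change
);
""")

# Adding a new data source: no id is supplied, the database picks one.
cur = conn.execute(
    "INSERT INTO patient_matching_datasource_analysis (name, access) VALUES (?, ?)",
    ("patients.txt", "|"),
)
new_datasource_id = cur.lastrowid

# A field row that points back at the new data source.
conn.execute(
    "INSERT INTO patient_matching_field (datasource_id, name) VALUES (?, ?)",
    (new_datasource_id, "last_name"),
)
conn.commit()
```

A GUI could read `new_datasource_id` back after the insert and use it when storing per-field statistics for that data source.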
2 Responses to “Database schema for analysis results”
Sarp, a great start to the data model! A couple of questions:
1. In the patient_matching_datasource_analysis table, there is a field labeled “access”, can you describe its use?
2. In the patient_matching_field table, can you describe how the “date_changed” and “changed” fields will be used? I’m not certain how the “changed” field in particular will be used.
3. A given field (such as last name) will have an overall entropy. Additionally, each individual token possesses an individual entropy. I would suggest adding an “entropy” field to the patientmatching_token table. Calculating the entropy of individual tokens is not top priority, but I anticipate potentially using those values in the future.
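One way to read point 3, as a rough sketch in Python: the overall entropy of a field is the Shannon entropy of its empirical token distribution, and each token's value is its own term in that sum. This is an assumption about what "individual entropy" means. Another reasonable reading is the token's self-information, -log2(p). The function names and sample data below are made up for illustration.

```python
import math
from collections import Counter


def field_entropy(values):
    """Shannon entropy, in bits, of the observed token distribution of a field."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def token_entropy_terms(values):
    """Per-token term -p * log2(p); the terms add up to the field entropy."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        token: -(c / total) * math.log2(c / total)
        for token, c in counts.items()
    }


# Hypothetical last-name column with four records.
last_names = ["SMITH", "SMITH", "JONES", "NGUYEN"]

overall = field_entropy(last_names)         # one value per field
per_token = token_entropy_terms(last_names)  # one value per distinct token
```

With this reading, the field-level value would be stored once per field and the per-token values once per row of the token table.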
1. access refers to the delimiter character in flat files; for databases it could refer to username and password information, etc. James would know better because it is a variable he used while designing the LinkDataSource class.
2. The motivation for these fields was that, since the analytic phase may take a long time for large data sources, we may want to reuse old statistics in the linkage process and skip the analytic phase.
The changed field will be either 0 or 1, depending on whether any update/insert/delete operations were done to the database after our statistical analysis. This field would need to be maintained by functions outside of the patient linkage module that make changes to the database.
date_changed stores the last date on which changes to the database were made. It would be good to know how fresh the calculated statistics are, in order to decide whether to run the analytic phase again or not.
3. Thanks, I’ll add that.
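The freshness check described in point 2 could look roughly like the sketch below, written in Python for brevity. It assumes we also know when the statistics were last computed. That timestamp, called `analyzed_at` here, is hypothetical and not part of the draft schema, and the function name is made up as well.

```python
from datetime import datetime


def statistics_are_stale(changed, date_changed, analyzed_at):
    """Decide whether the analytic phase should be run again.

    changed      -- 0 or 1, set by code outside the linkage module
    date_changed -- datetime of the last data change, or None if unknown
    analyzed_at  -- hypothetical datetime of the last analytic run
    """
    if changed == 0:
        # No update/insert/delete since the analysis: reuse old statistics.
        return False
    if date_changed is None:
        # Flag is set but we do not know when; be conservative and re-run.
        return True
    # Flag is set: stale only if the change is not older than the analysis.
    return date_changed >= analyzed_at


last_run = datetime(2007, 6, 17)
reuse_ok = not statistics_are_stale(0, None, last_run)
must_rerun = statistics_are_stale(1, datetime(2007, 6, 20), last_run)
```

Comparing against the date as well as the flag guards against a changed flag that was never reset after the previous analytic run.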