
OpenMRS: Record Linkage Project

Google Summer of Code™ 2007






Phase1 Complete

Jun 25th, 2007 by sarp

I completed Phase 1, the Analytic Component. It includes:

Implement an analytic object that performs the following tasks:

a. Read the linkage configuration file and determine which fields/columns need to be scaled.
b. Connect to the data source containing individual tokens. Assume that the data source is either a flat file, such as a CSV file, or a relational database.
c. Add new information to the configuration file that indicates the location of the token frequencies.
d. Count the frequencies of individual tokens.
e. Store the token frequency results in a persistent structure (e.g., a relational database table).

In order to access the token frequency data at runtime, the frequency tables need to be identified in the configuration file. Thus, we will need to develop a programmatic scheme to identify each token frequency table associated with a given data source.
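As a separate illustration of tasks b and d in the list above, here is a minimal sketch of counting token frequencies per column from delimited text. It is not the project's code: the class name TokenFrequencySketch, the method countTokens, and the in-memory list of lines (standing in for a flat file) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a class from the project.
public class TokenFrequencySketch {

    /**
     * Counts how often each token occurs in the selected columns.
     * Returns a map of column index -> (token -> count).
     */
    static Map<Integer, Map<String, Integer>> countTokens(
            List<String> lines, char delimiter, Set<Integer> columns) {
        Map<Integer, Map<String, Integer>> frequencies = new TreeMap<>();
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String separator = Pattern.quote(String.valueOf(delimiter));
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields.
            String[] fields = line.split(separator, -1);
            for (int column : columns) {
                if (column >= fields.length) {
                    continue; // skip rows that are too short for this column
                }
                frequencies
                    .computeIfAbsent(column, c -> new TreeMap<>())
                    .merge(fields[column], 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Made-up sample records: last name | first name | birth year.
        List<String> lines = Arrays.asList(
            "smith|john|1970",
            "smith|jane|1982",
            "jones|john|1970");
        Set<Integer> columns = new HashSet<>(Arrays.asList(0, 1));
        // Prints {0={jones=1, smith=2}, 1={jane=1, john=2}}
        System.out.println(countTokens(lines, '|', columns));
    }
}
```

In a persistent version, each inner map would presumably become the rows of one frequency table, with the column index replaced by whatever identifier the configuration file uses for that table.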

What has changed in the source code?

Testing code has been moved into a new package called org.openmrs.testing.

org.regenstrief.linkage.analysis
  • A new abstract class for analyzers called DataSourceAnalyzer.
  • Two classes that extend it: CharDelimFileAnalyzer and DataBaseAnalyzer.
  • These classes contain a CharDelimFileReader/DataBaseReader to go over/query records.
  • In this schema, existing classes such as DataSourceAnalysis, Analyzer and ScaleWeightAnalyzer are not used. The schema provided by the existing classes was more general, and if we can find a way to fit my classes into it, I'm willing to refactor the code. Otherwise, I'll delete them.

org.regenstrief.linkage.db
  • Analyzers contain a LinkDBManager to insert token frequencies into the database. I'm not sure if this was the most suitable class for adding this code.

org.regenstrief.linkage.io
  • DataSourceReader has a new parameter to determine whether it will be used for analysis or for reading. This change was necessary because, in reading, some columns are excluded and blocking is done to make the source ready for linkage. However, we don't want either of these when analyzing the data.

org.regenstrief.linkage.util
  • LinkDataSource: Added a variable to store a unique identifier for each LinkDataSource (used in storing token frequencies).
  • RecMatchConfig: Added a LinkDBManager to create a connection.
  • XMLTranslator: Modified to include the id of the LinkDataSource and a new parameter for storing token frequencies.
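The class layout described for org.regenstrief.linkage.analysis (one abstract analyzer, with a file-backed and a database-backed subclass) can be sketched roughly as follows. This is only an illustration under stated assumptions: AnalyzerSketch, DelimitedAnalyzerSketch and RowAnalyzerSketch are hypothetical names, not the real DataSourceAnalyzer, CharDelimFileAnalyzer or DataBaseAnalyzer, and in-memory lists stand in for the file and the database query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical classes for illustration only; not the project's API.
public class AnalyzerSketchDemo {

    /** Shared counting logic; subclasses only decide how records are fetched. */
    static abstract class AnalyzerSketch {
        /** Source-specific: yields one record (an array of field values) at a time. */
        protected abstract Iterator<String[]> records();

        /** Counts token frequencies for a single column. */
        public Map<String, Integer> analyze(int column) {
            Map<String, Integer> counts = new TreeMap<>();
            Iterator<String[]> it = records();
            while (it.hasNext()) {
                String[] record = it.next();
                if (column < record.length) {
                    counts.merge(record[column], 1, Integer::sum);
                }
            }
            return counts;
        }
    }

    /** Stand-in for a delimited-file analyzer: splits each line on a delimiter. */
    static class DelimitedAnalyzerSketch extends AnalyzerSketch {
        private final List<String> lines;
        private final String separator;

        DelimitedAnalyzerSketch(List<String> lines, char delimiter) {
            this.lines = lines;
            this.separator = Pattern.quote(String.valueOf(delimiter));
        }

        @Override
        protected Iterator<String[]> records() {
            List<String[]> parsed = new ArrayList<>();
            for (String line : lines) {
                parsed.add(line.split(separator, -1));
            }
            return parsed.iterator();
        }
    }

    /** Stand-in for a database analyzer: rows arrive already split into fields. */
    static class RowAnalyzerSketch extends AnalyzerSketch {
        private final List<String[]> rows;

        RowAnalyzerSketch(List<String[]> rows) {
            this.rows = rows;
        }

        @Override
        protected Iterator<String[]> records() {
            return rows.iterator();
        }
    }

    public static void main(String[] args) {
        AnalyzerSketch fromFile = new DelimitedAnalyzerSketch(
            Arrays.asList("smith,john", "smith,jane"), ',');
        AnalyzerSketch fromRows = new RowAnalyzerSketch(Arrays.asList(
            new String[] {"jones", "john"},
            new String[] {"smith", "john"}));
        System.out.println(fromFile.analyze(0)); // prints {smith=2}
        System.out.println(fromRows.analyze(1)); // prints {john=2}
    }
}
```

Keeping the counting loop in the abstract class means a new kind of data source only has to supply its own records() method.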


  • About Me

    I am 22 years old and recently graduated from the Computer Science and Engineering program of Sabanci University in Istanbul, Turkey.
