Phase 1 Complete
Jun 25th, 2007 by sarp
I completed Phase 1, the Analytic Component. It includes:
Implement an analytic object that performs the following tasks:
a. Read the linkage configuration file and determine which fields/columns need to be scaled.
b. Connect to the data source containing individual tokens. Assume that the data source is either a flat file, such as a CSV file, or a relational database.
c. Add new information to the configuration file that indicates the location of the token frequencies.
d. Count frequencies of individual tokens.
e. Store token frequency results in a persistent structure (e.g., a relational database table). In order to access the token frequency data at runtime, the frequency tables need to be identified in the configuration file. Thus, we will need to develop a programmatic scheme to identify each token frequency table associated with a given data source, e.g.:
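To make step (d) concrete (this is not the table-naming scheme mentioned in step (e)), here is a minimal sketch of counting token frequencies for the columns marked for scaling. The class name TokenFrequencyCounter, its method signature, and the sample column names are assumptions made for illustration; they are not the classes in the linkage code base, and records are passed in memory rather than read from a file or database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; not a class from the linkage project.
class TokenFrequencyCounter {

    /**
     * Counts how often each token occurs in every column marked for scaling.
     * Returns a map of column name -> (token -> count).
     */
    static Map<String, Map<String, Integer>> countFrequencies(
            String[] header, List<String[]> records, Set<String> scaledColumns) {
        Map<String, Map<String, Integer>> frequencies = new HashMap<>();
        for (String[] record : records) {
            for (int i = 0; i < header.length && i < record.length; i++) {
                if (!scaledColumns.contains(header[i])) {
                    continue; // column is not scaled, so it needs no counts
                }
                frequencies
                        .computeIfAbsent(header[i], k -> new TreeMap<>())
                        .merge(record[i], 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Assumed sample data: "ln" is scaled, "fn" is not.
        String[] header = {"ln", "fn"};
        List<String[]> records = Arrays.asList(
                new String[] {"SMITH", "JOHN"},
                new String[] {"SMITH", "MARY"},
                new String[] {"JONES", "JOHN"});
        Set<String> scaled = new HashSet<>(Arrays.asList("ln"));
        System.out.println(countFrequencies(header, records, scaled));
    }
}
```

In a real run the resulting counts would be written to a persistent table rather than printed, as described in step (e).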
What has changed in the source code?

Testing code has been moved into a new package called org.openmrs.testing.

org.regenstrief.linkage.analysis
- A new abstract class for analyzers called DataSourceAnalyzer.
- Two classes that extend it: CharDelimFileAnalyzer and DataBaseAnalyzer.
- These classes contain a CharDelimFileReader/DataBaseReader to iterate over or query records.
- In this schema, existing classes such as DataSourceAnalysis, Analyzer and ScaleWeightAnalyzer are not used. The schema provided by the existing classes was more general, and if we can find a way to fit my classes into it, I’m willing to refactor the code. Otherwise, I’ll delete them.

org.regenstrief.linkage.db
- Analyzers contain a LinkDBManager to insert token frequencies into the database. I’m not sure if this was the most suitable class for adding this code.

org.regenstrief.linkage.io
- DataSourceReader has a new parameter that determines whether it will be used for analysis or for reading. This change was necessary because, when reading, some columns are excluded and blocking is done to make the source ready for linkage. However, we don’t want either of these when analyzing the data.

org.regenstrief.linkage.util
- LinkDataSource: Added a variable to store a unique identifier for each LinkDataSource (used when storing token frequencies).
- RecMatchConfig: Added a LinkDBManager to create a connection.
- XMLTranslator: Modified to include the id of the LinkDataSource and a new parameter for storing token frequencies.
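The analyzer structure described for org.regenstrief.linkage.analysis (an abstract analyzer with one subclass per kind of data source) can be sketched roughly as follows. Every name ending in "Sketch", the tokens() and analyze() methods, and the single-column simplification are assumptions made for illustration; the actual DataSourceAnalyzer, CharDelimFileAnalyzer and DataBaseAnalyzer classes may look quite different. A database-backed subclass would supply tokens from a query instead of from lines and is omitted here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical stand-in for an abstract analyzer: the counting logic is shared,
// while subclasses decide how tokens are obtained from their data source.
abstract class AnalyzerSketch {

    // Supplies the tokens of one column, one per record.
    protected abstract Iterator<String> tokens();

    // Counts the tokens supplied by the subclass (token -> count).
    final Map<String, Integer> analyze() {
        Map<String, Integer> counts = new TreeMap<>();
        Iterator<String> it = tokens();
        while (it.hasNext()) {
            counts.merge(it.next(), 1, Integer::sum);
        }
        return counts;
    }
}

// Hypothetical stand-in for a delimited-file analyzer; lines are held in
// memory here instead of being read through a file reader.
class DelimitedLinesAnalyzerSketch extends AnalyzerSketch {
    private final List<String> lines;
    private final String delimiterRegex;
    private final int column;

    DelimitedLinesAnalyzerSketch(List<String> lines, char delimiter, int column) {
        this.lines = lines;
        // Quote the delimiter so characters such as '|' are matched literally.
        this.delimiterRegex = Pattern.quote(String.valueOf(delimiter));
        this.column = column;
    }

    @Override
    protected Iterator<String> tokens() {
        // Limit -1 keeps trailing empty fields so column indexes stay stable.
        return lines.stream()
                .map(line -> line.split(delimiterRegex, -1)[column])
                .iterator();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("SMITH|JOHN", "SMITH|MARY", "JONES|JOHN");
        System.out.println(new DelimitedLinesAnalyzerSketch(lines, '|', 0).analyze());
    }
}
```

Keeping the counting in the base class means a new kind of data source only has to say how its records are traversed, which is the main appeal of this layout.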