Archive for July, 2007

Performance Test: Analyzing token frequencies

For analyzing token frequencies, I have implemented two different designs.

Design A: For different data sources, create specific data readers that convert your data into Records. For frequency analysis, operate on the abstraction of Record.

Advantage:

Easy to maintain, implement once and for all
Reduces complexity of the project

Disadvantage:

Poor performance
High memory requirement, since you are doing analysis record by [...]
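
To make Design A concrete, here is a minimal Python sketch of the idea: one reader per data source turns rows into Records, and the frequency analysis only ever sees Records. The names `Record`, `read_csv_records` and `token_frequencies` are hypothetical and chosen for illustration; this is not the code from the project, and whitespace splitting is an assumed tokenization.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Record:
    """Source-independent view of one row: field name -> value (hypothetical)."""
    fields: Dict[str, str]


def read_csv_records(text: str) -> Iterator[Record]:
    """Hypothetical reader for one kind of source (CSV text with a header row)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield Record(dict(row))


def token_frequencies(records: Iterable[Record], field: str) -> Counter:
    """Count whitespace-separated tokens of one field across all records."""
    counts: Counter = Counter()
    for record in records:
        # A missing or empty field contributes no tokens.
        counts.update((record.fields.get(field) or "").split())
    return counts
```

Supporting another data source would then only need another reader that yields Records; `token_frequencies` stays unchanged, which is the maintainability argument made above.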


Midterm update

As many of you know, we are working on a patient matching module for OpenMRS that will allow users to identify records that belong to the same patient among different data sources. In the first part of SoC, I’ve completed adding weight scaling functionality to the existing record linkage framework. Matching records are determined by assigning a [...]


Code Spotlight: Analyzing token frequencies

I’d like to read a text file and store the frequency of each word in it in a database. Memory is fast; the database is slow. At the two extremes, we have:

1) I don’t use any memory: for each word I read, I query the database to learn its frequency, and update it in the database. Problem: Very inefficient.

2) Everything [...]
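
As an illustration of the trade-off, here is a minimal Python sketch of one possible middle ground: keep a bounded buffer of counts in memory and write it to the database in batches. It is an assumption for illustration, not necessarily the approach the full post settles on. The table name `freq`, the functions `flush` and `count_words`, and the `max_buffered` limit are all hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
from collections import Counter


def flush(conn, buffer):
    """Add the buffered counts to the table in one transaction, then empty the buffer."""
    with conn:  # commits on success, rolls back on error
        # Make sure every buffered word has a row, starting at zero.
        conn.executemany(
            "INSERT OR IGNORE INTO freq (word, n) VALUES (?, 0)",
            [(word,) for word in buffer],
        )
        # Add the buffered count to the stored count.
        conn.executemany(
            "UPDATE freq SET n = n + ? WHERE word = ?",
            [(count, word) for word, count in buffer.items()],
        )
    buffer.clear()


def count_words(words, conn, max_buffered=10000):
    """Count words, holding at most max_buffered distinct words in memory."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS freq (word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    buffer = Counter()
    for word in words:
        buffer[word] += 1
        if len(buffer) >= max_buffered:
            flush(conn, buffer)
    if buffer:
        flush(conn, buffer)
```

With `max_buffered=1` this degenerates to touching the database for every word, as in extreme 1; larger values trade memory for fewer database round trips.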


Phase3 Complete

Now that Phase3 is ready for review, we have added weight scaling functionality to the patient matching module. In fact, I have both versions of weight scaling implemented (Framework A and B). This week, I will be running performance tests to determine whether there is a noteworthy performance difference between these two approaches.
3. Runtime Component, operational:  [...]


Phase2 Complete

I refactored the code for Phase1 according to the feedback I received, and completed Phase 2 (Here is the changeset). I will make a more detailed post tomorrow (before the phone call). The next step (changing the formula for weight scaling) should be fairly easy, after I figure out where scores are calculated in the existing code. Testing and [...]
