OpenHIE Entity Matching Service

Purpose

The entity matching service has two purposes: to find matches within a single list of patients, health workers, facilities, or other entities, and to find potential matches between two lists of the same type of entity.

How it Works

The service receives a FHIR message with the entity to be matched and returns zero to 10 matches and their scores.

Potential Use Cases

We envision the following potential use cases:

Ensuring that an entity doesn't already exist when a new instance of it is entered.
Duplicate checking during bulk imports.
Analysis of potential duplicates in an existing data set.
Mapping the entities in one data set to their corresponding values in another data set.

Potential Implementations

Depending upon the use case, we envision that there might be a spectrum of implementation options. We expect to learn from the first implementations and refine the use patterns based upon experience. For now, we imagine the following types of architectural implementations:

Tight coupling - The matching service software library is incorporated directly into the architecture component.
Medium coupling - The service interacts directly with the architecture component's data source.
Loose coupling - The data is loaded into the service's own database and analyzed from there.

FHIR Reference

http://gforge.hl7.org/gf/project/fhir/tracker/?action=TrackerItemEdit&tracker_item_id=9685&start=0

High Level Overview of Mapping Service Components

Sample URL:

https://testmap.ohie.org/registry/fhir/Location/$match

Sample Request:

<Parameters xmlns="http://hl7.org/fhir">
  <parameter>
    <name value="location"/>
    <resource>
      <Location xmlns="http://hl7.org/fhir">
        <contained>
          <Location xmlns="http://hl7.org/fhir">
            <id value="1"/>
            <identifier>
              <value value="a.bc.1.sample"/>
            </identifier>
            <name value="simple health"/>
          </Location>
        </contained>
        <identifier>
          <value value="117"/>
        </identifier>
        <name value="simple clinic"/>
        <position>
          <longitude value="10"/>
          <latitude value="100"/>
        </position>
        <partOf>
          <reference value="#1"/>
        </partOf>
      </Location>
    </resource>
  </parameter>
  <parameter>
    <name value="count"/>
    <valueInteger value="5"/>
  </parameter>
</Parameters>
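A request like the sample above could be assembled programmatically. The following is a minimal sketch, assuming only Python's standard library; the function name build_match_request is hypothetical, and it covers only the identifier, name, and count fields, not the contained parent Location or the position.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"


def q(tag):
    """Qualify a tag name with the FHIR namespace."""
    return f"{{{FHIR_NS}}}{tag}"


def build_match_request(name, identifier, count):
    """Build a FHIR Parameters document asking for up to `count` Location matches.

    Hypothetical helper: only the fields shown in the sample request are set.
    """
    # Serialize the FHIR namespace as the default namespace (no prefix).
    ET.register_namespace("", FHIR_NS)

    params = ET.Element(q("Parameters"))

    # First parameter: the Location to be matched.
    loc_param = ET.SubElement(params, q("parameter"))
    ET.SubElement(loc_param, q("name"), value="location")
    resource = ET.SubElement(loc_param, q("resource"))
    location = ET.SubElement(resource, q("Location"))
    ident = ET.SubElement(location, q("identifier"))
    ET.SubElement(ident, q("value"), value=identifier)
    ET.SubElement(location, q("name"), value=name)

    # Second parameter: the maximum number of matches to return.
    count_param = ET.SubElement(params, q("parameter"))
    ET.SubElement(count_param, q("name"), value="count")
    ET.SubElement(count_param, q("valueInteger"), value=str(count))

    return ET.tostring(params, encoding="unicode")
```

The returned string would serve as the body of an HTTP POST to the sample URL; sending it is left out here because the transport details (headers, authentication) are not described on this page.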

Sample Response:

<Bundle xmlns="http://hl7.org/fhir">
  <entry>
    <resource>
      <Location xmlns="http://hl7.org/fhir">
        <id value="1000010"/>
        <contained>
          <Location xmlns="http://hl7.org/fhir">
            <id value="con31"/>
            <identifier>
              <value value="A.BC.1.SAMPLE"/>
            </identifier>
            <name value="SAMPLE HEALTH"/>
          </Location>
        </contained>
        <extension url="http://ohie.org/fhir/StructureDefinition/datim-mechid">
          <valueString value="1111"/>
        </extension>
        <identifier>
          <value value="117"/>
        </identifier>
        <name value="SIMPLE CLINIC"/>
        <position>
          <longitude value="10.0"/>
          <latitude value="100.0"/>
        </position>
        <partOf>
          <reference value="#con31"/>
        </partOf>
      </Location>
    </resource>
    <search>
      <score value="0.99762179871785583440413347489084117114543914794921875"/>
    </search>
  </entry>
</Bundle>
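A client receiving a response like the one above needs to pull out each candidate and its score. The following is a minimal sketch, assuming the response is a Bundle shaped like the sample; the function name parse_matches is hypothetical. Scores are read as Decimal because the sample score carries more digits than a float preserves.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"f": "http://hl7.org/fhir"}


def parse_matches(bundle_xml):
    """Return (id, name, score) tuples from a $match response Bundle, best first."""
    root = ET.fromstring(bundle_xml)
    matches = []
    for entry in root.findall("f:entry", NS):
        location = entry.find("f:resource/f:Location", NS)
        score = entry.find("f:search/f:score", NS)
        if location is None or score is None:
            continue  # skip entries that are not scored Locations
        # find() with a bare tag looks at direct children only, so the id and
        # name of the contained parent Location are not picked up here.
        loc_id = location.find("f:id", NS)
        name = location.find("f:name", NS)
        matches.append((
            loc_id.get("value") if loc_id is not None else None,
            name.get("value") if name is not None else None,
            Decimal(score.get("value")),
        ))
    # Highest score first.
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

A caller could then apply its own threshold to the scores, for example treating only candidates above some cut-off as likely duplicates; no such threshold is specified on this page.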

Matching Engine Source Code:

https://tools.regenstrief.org/stash/users/amartin/repos/registry/browse

Potential Workflow - Find Possible Matches as an HIE Service

Example Actors:

Entity Authority
- OpenInfoMan/InterLinked Registry with the FHIR adapter
Entity Searcher
- iHRIS - when a new health worker record is added
- DHIS2 - when a new facility is added
- OpenMRS - when a new client is added

This is one example of a possible workflow:

participant Entity Searcher as ES
participant Interoperability\nLayer as IL
participant Entity Matching Service as EMS
participant Entity Authority as EA

loop configuration time of refresh managed in IL
IL->EMS: trigger refresh of entity cache 
EMS->EA: request updates to FHIR entity\nsince last refresh\nusing the search transaction
EA->EMS: return FHIR bundle of\nupdated entities
EMS->EMS: update local cache
EMS->EMS: retune matching parameters 
end
ES->EMS: execute FHIR $match service
EMS->ES: return possible matches

(CL: didn't see the ability to add a web sequence diagram directly on this page for some reason)
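The refresh loop and match call in the sequence above can be sketched in a few lines. This is an illustration only, assuming in-memory stand-ins for the actors: the class names, the integer timestamps, and the exact-name matcher are all hypothetical, and the "retune matching parameters" step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class EntityAuthority:
    """Hypothetical stand-in for the registry that owns the entities."""
    records: dict = field(default_factory=dict)  # id -> (name, last_updated)

    def updated_since(self, since):
        """Return only the records changed after the given timestamp."""
        return {k: v for k, v in self.records.items() if v[1] > since}


@dataclass
class EntityMatchingService:
    """Hypothetical stand-in for the matching service and its local cache."""
    authority: EntityAuthority
    cache: dict = field(default_factory=dict)  # id -> name
    last_refresh: int = 0

    def refresh(self, now):
        """Pull entities changed since the last refresh into the local cache."""
        updates = self.authority.updated_since(self.last_refresh)
        for entity_id, (name, _updated) in updates.items():
            self.cache[entity_id] = name
        self.last_refresh = now
        return len(updates)

    def match(self, name):
        """Placeholder matcher: exact, case-insensitive name comparison only."""
        return [i for i, n in self.cache.items() if n.lower() == name.lower()]
```

In the workflow above, the interoperability layer would call refresh on its configured schedule, and an entity searcher such as iHRIS or DHIS2 would call match when a new record is entered.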

Interfaces

Different interfaces will be created to instantiate different use cases.

Matching Algorithms

While the entity matching service currently implements a sophisticated probabilistic algorithm, a key overarching goal of the service is to accommodate a variety of matching methods. The current algorithm can be configured for matching different types of entities.

Configuration File

The matching service is highly configurable. Details of the configuration file are still to be provided (Shaun Grannis, Andrew Martin - please advise here).

Questions and Answers

Q: Is blocking used?

A: Yes. The matching service works in two basic steps: coarse blocking, followed by fine-grained matching handled in Java. The blocking step exists for performance, so that the service doesn't need to apply the fine-grained matching algorithm to every row in the database. It’s less flexible than the fine-grained matching step and is designed to allow fast queries based on typical database indexes. For example, an index on the name column will make this query fast:

select * from organisationunit where name=?

But a normal database won’t be able to quickly run a query to search for rows based on a Levenshtein score. The <blockingScheme> element defines how the matching service will handle this coarse blocking.
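The two steps can be illustrated with a small sketch. It assumes an in-memory SQLite database, a hypothetical district column used as the blocking key, and a normalized Levenshtein similarity as a stand-in for the fine-grained step; the actual service uses a probabilistic algorithm written in Java, and its blocking is driven by the <blockingScheme> element rather than by code like this.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def find_matches(conn, name, district, limit=10):
    """Block on district (indexed equality), then score names within the block."""
    # Step 1, coarse blocking: an equality lookup that a normal index can serve.
    rows = conn.execute(
        "select id, name from organisationunit where district = ?", (district,)
    ).fetchall()
    # Step 2, fine-grained matching: score only the rows in the block.
    scored = []
    for row_id, row_name in rows:
        longest = max(len(name), len(row_name)) or 1
        score = 1 - levenshtein(name.lower(), row_name.lower()) / longest
        scored.append((row_id, row_name, score))
    scored.sort(key=lambda r: r[2], reverse=True)
    return scored[:limit]


# Small demonstration data set (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table organisationunit (id integer primary key, name text, district text)"
)
conn.execute("create index idx_district on organisationunit (district)")
conn.executemany(
    "insert into organisationunit (name, district) values (?, ?)",
    [("Simple Clinic", "North"), ("Sample Clinic", "North"), ("Simple Clinic", "South")],
)
```

Calling find_matches(conn, "simple clinic", "North") scores only the two rows in the North block; the South row is never compared, which is the performance benefit blocking is meant to provide, at the cost of missing matches that fall outside the block.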

Q: How is case matching supported?

A: The <caseMode> element accepts these values:

CASE_SENSITIVE - allows values to be stored as mixed case, and uses query parameters exactly as they’re received.
QUERY_UPPER - works with mixed-case values in the database, regardless of the case of incoming query parameters. When doing lookups, it converts both the database values and the query parameters to upper case within the query itself, so queries are case insensitive: select … from organisationunit where upper(name)=upper(?) However, if the column has a normal index rather than a function-based index, the query won’t be able to use the index.
QUERY_LOWER - the lower-case counterpart of QUERY_UPPER.
STORE_UPPER - expects values to be stored as upper case in the database, and converts query parameters to upper case before doing lookups.
STORE_LOWER - the lower-case counterpart of STORE_UPPER.
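The difference between the case modes comes down to where the case conversion happens: inside the query, or on the parameter before the query runs. The following minimal sketch shows that distinction. The function build_lookup is hypothetical and is not the service's code, and the behaviour shown for QUERY_LOWER and STORE_LOWER is an assumption made by mirroring the upper-case modes.

```python
def build_lookup(case_mode, value, table="organisationunit", column="name"):
    """Return (sql, parameter) for an equality lookup under the given case mode.

    The table and column names are interpolated directly, so they must come
    from trusted configuration, never from user input.
    """
    if case_mode == "CASE_SENSITIVE":
        # Compare exactly as stored and exactly as received.
        return f"select * from {table} where {column}=?", value
    if case_mode == "QUERY_UPPER":
        # Convert both sides inside the query; needs a function-based index.
        return f"select * from {table} where upper({column})=upper(?)", value
    if case_mode == "QUERY_LOWER":
        # Assumed mirror of QUERY_UPPER.
        return f"select * from {table} where lower({column})=lower(?)", value
    if case_mode == "STORE_UPPER":
        # Data is stored upper case; convert only the parameter, so a normal
        # index on the column can be used.
        return f"select * from {table} where {column}=?", value.upper()
    if case_mode == "STORE_LOWER":
        # Assumed mirror of STORE_UPPER.
        return f"select * from {table} where {column}=?", value.lower()
    raise ValueError(f"unknown case mode: {case_mode}")
```

The STORE_ modes keep the query a plain equality comparison, which is why they work with a normal index, while the QUERY_ modes wrap the column in a function and therefore depend on a function-based index for speed.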


