Health information integration (HII), the integration of heterogeneous data sources into a standard format, is one of the crucial challenges facing healthcare systems. Integration allows researchers to extract information and knowledge from the integrated systems effectively and efficiently using information retrieval (IR) methods, such as semantic similarity mapping. Many practices find that performing integration tasks in legacy health information systems is cumbersome because of the restrictions imposed by semantic interoperability requirements. The lack of standards is a barrier to electronic health record (EHR) implementation and to integrated delivery systems that support health information exchange (HIE) across disparate health information management systems (HIMS). To achieve integration across these systems, a unified methodology is needed: one that applies different techniques, both informational and computational, and uses standardized terminologies such as the International Classification of Diseases, Ninth and Tenth Revisions (ICD-9 and ICD-10), Current Procedural Terminology, 4th Edition (CPT-4), and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT). Such terminologies are important requirements for all information models that represent health information in a standard way across all domains of healthcare.
With the use of well-defined and widely accepted standards, information can be integrated and shared within and across healthcare domains, enabling HIE to be implemented. In the envisioned integrated system, there should be only one integrated schema (global view) that combines all distributed data sources across several disparate systems and allows the integrated system to extract relevant information seamlessly and efficiently and to evaluate the effectiveness of HIE. Various information integration techniques have been proposed in the past, for example schema integration [3, 4], data warehousing, federated databases, and distributed and adaptive distributed query processing systems [7, 8].
Srikrishnan et al. proposed a system that integrates distributed databases using the hypergraph data model, which represents database schemas in a low-level data model. Several database integration projects were developed using the global-as-view (GAV) technique, in which the global schema is constructed as a view over the local schemas. The drawback of GAV concerns the evolution of local schemas: when existing local schemas change, all the predefined information retrieval processes have to be reconstructed to adapt to the new design. The local-as-view (LAV) technique addressed the need for greater flexibility when local schemas change. However, query processing is more complex in LAV than in GAV. Mazzarisi et al. developed an integrated data system for the electrophysiology laboratory (EPH-Lab) using the HL7 Clinical Document Architecture (CDA). Deigo et al. proposed point-to-point interface architecture and message server integration techniques as mediator integration models for integrating HL7-based legacy systems.
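To make the GAV idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The two local tables, their columns and the `global_patient` view are hypothetical; the point is only that in GAV the global relation is defined as a query over the local sources, so a query against the global schema is answered by unfolding that view.

```python
import sqlite3

# Hypothetical local schemas for two source systems (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clinic_a_patients (pid INTEGER, fname TEXT, lname TEXT);
    CREATE TABLE clinic_b_people   (person_no INTEGER, full_name TEXT);

    INSERT INTO clinic_a_patients VALUES (1, 'Jane', 'Doe');
    INSERT INTO clinic_b_people   VALUES (7, 'John Smith');

    -- GAV: the global relation is defined as a query over the local relations.
    CREATE VIEW global_patient AS
        SELECT 'A' AS source, pid AS local_id, fname || ' ' || lname AS name
        FROM clinic_a_patients
        UNION ALL
        SELECT 'B', person_no, full_name FROM clinic_b_people;
""")

def query_global(conn: sqlite3.Connection) -> list:
    """Query the global view; SQLite unfolds it into queries on the local tables."""
    return conn.execute(
        "SELECT source, local_id, name FROM global_patient ORDER BY source"
    ).fetchall()

rows = query_global(conn)
print(rows)  # [('A', 1, 'Jane Doe'), ('B', 7, 'John Smith')]
```

Because `global_patient` hard-codes the local column names, a change to either local schema means rewriting the view definition, which is the GAV maintenance burden described above. LAV reverses the direction: each local source is described as a view over the global schema, and the query processor must rewrite queries using those descriptions.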
An existing legacy clinical data management system (CDMS) refers to a system that was in use before the introduction of an HL7-based system (v2 or v3), while all references to "HL7 legacy systems" refer to HL7 v2 implementations, since v2 is the older of the two technologies. A comparison of HL7 v2 and v3 messages reveals that they are incompatible with each other, as their message formats are completely different: v2 messages use delimiters (such as the "|" symbol) to separate values, whereas v3 messages use Extensible Markup Language (XML) data objects to carry values. HL7 v2.3.1 and later versions also support XML encoding of messages, but the HL7 v3 XML specification is built around vocabularies and data types derived from the Reference Information Model (RIM) [14, 15]. v3 was designed with stronger standardization in mind, so v3 messages are much more consistent with each other than v2 messages are. The v2 specification is commonly represented in ASCII (American Standard Code for Information Interchange) format. By contrast, the XML tags used in v3 are based on natural language, so v3 messages can be both read by humans and processed by machines. This attribute enables a higher level of semantic and syntactic interoperability between systems. A further limitation of v2 is that the standard messages have a large number of optional segments and fields, which makes them hard to implement consistently and precludes rigorous conformance testing. Additionally, the v2 standard does not lend itself to implementation over alternative communication protocols. The benefits of HL7 v2 messages include, but are not limited to, ease of implementation, backward compatibility with other HL7 v2 versions, the capability of being implemented in modules, and the provision of an application program interface (API) for interfacing with legacy systems that supplies 80 percent of the interface framework [13–16].
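The difference in encoding can be illustrated with a short Python sketch that splits a v2 `PID` segment on its `|` field and `^` component delimiters and re-expresses a few fields as XML using the standard `xml.etree.ElementTree` module. The sample segment values are invented, and the output only borrows v3-style element names (`patient`, `patientPerson`, `birthTime`); it is not a schema-valid HL7 v3 message, which would also need the HL7 namespace, wrappers and identifier OIDs. The sketch also ignores v2 repetition (`~`) and escape handling.

```python
import xml.etree.ElementTree as ET

# Sample HL7 v2 PID segment (invented values): "|" separates fields,
# "^" separates components within a field.
V2_PID = "PID|1||12345^^^HOSP^MR||Doe^Jane||19800101|F"

def parse_v2_segment(segment: str) -> list[list[str]]:
    """Split a v2 segment into fields, then each field into components."""
    return [field.split("^") for field in segment.split("|")]

def pid_to_v3_like_xml(segment: str) -> ET.Element:
    """Re-express selected PID fields as a simplified, v3-style XML fragment.

    Not a conformant HL7 v3 message: no namespace, wrappers or OIDs.
    """
    f = parse_v2_segment(segment)
    # Field index i (after the segment ID at index 0) corresponds to PID-i:
    # PID-3 identifier, PID-5 name (family^given), PID-7 birth date, PID-8 sex.
    patient_id, family, given = f[3][0], f[5][0], f[5][1]
    birth, gender = f[7][0], f[8][0]

    patient = ET.Element("patient")
    ET.SubElement(patient, "id", extension=patient_id)
    person = ET.SubElement(patient, "patientPerson")
    name = ET.SubElement(person, "name")
    ET.SubElement(name, "given").text = given
    ET.SubElement(name, "family").text = family
    ET.SubElement(person, "administrativeGenderCode", code=gender)
    ET.SubElement(person, "birthTime", value=birth)
    return patient

xml_text = ET.tostring(pid_to_v3_like_xml(V2_PID), encoding="unicode")
print(xml_text)
```

The v2 side needs positional knowledge (which field index holds which item), while the XML side makes each value self-describing through its element name, which is the readability point made above.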
Researchers have used various techniques to integrate legacy CDMS and HL7 v2 systems with HL7 v3 systems. In an effort to build a unified format for a National Information System in Turkey, the Ministry of Health implemented the Clinical Document Architecture Release Two (CDA R2) standard and a client application compatible with HL7 v3 for information interchange. Yang et al. presented the design of HL7 RIM-based sharing components for clinical information systems in Taipei City Hospital. Sui-hui et al. developed an HL7 v3 gateway using Web Services, aimed at solving the bottleneck of the HL7 v2.x standard for data transfer between two medical information systems. Paterson et al. proposed a boundary objects approach, designing an HL7 template that codes data entry against HL7 vocabulary, HL7 external vocabularies and controlled vocabularies in order to improve the quality of the data in the discharge summary.
In this article, we present a prototype that integrates distributed clinical data sources using R-MIM classes from the HL7 v3 RIM as a global view, along with a collaborative, centralized, web-based mapping tool to handle the evolution of both global and local schemas. The prototype was implemented as a plug-in module and integrated with a clinical data management system (CDMS). A CDMS (e.g., Slim-Prim [21, 22]) is used to administer and manage patient medical records and to make the records available to health-care providers for use in research and translational health care practice. We tested the prototype with a set of use-case scenarios involving distributed clinical data sources across legacy CDMS and database management systems (DBMS) at the University of Tennessee Health Science Center (UTHSC). These disparate systems were built on different underlying database technologies: Oracle, MySQL and MS Access. All three are relational database management systems, which allow one-to-one, one-to-many and many-to-many relationships between a patient's administrative and clinical data items. The results show the prototype to be effective in improving information delivery, completing tasks that would otherwise have been difficult to finish, and reducing the time required for collaborative information retrieval and sharing with other systems. One of the challenges in implementing HL7 v3 is achieving automatic semantic interoperability with existing legacy CDMS and with the HL7 v2 format.
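The section does not describe the mapping tool's internal data model. As a hypothetical sketch only, one way to picture it is a table of mapping entries, each linking a column in one local source to an attribute path in the R-MIM global view. The source names, column names and dotted attribute paths below are invented for illustration. The sketch shows why keeping mappings as data helps with schema evolution: renaming a local column changes one entry, not the translation code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMapping:
    source_system: str   # hypothetical identifier of a local source
    local_column: str    # column name in that source's local schema
    rmim_path: str       # hypothetical attribute path in the R-MIM global view

# Hypothetical mapping entries, standing in for what a shared mapping tool would store.
MAPPINGS = [
    ColumnMapping("oracle_cdms", "PAT_LAST_NAME", "patient.patientPerson.name.family"),
    ColumnMapping("oracle_cdms", "PAT_FIRST_NAME", "patient.patientPerson.name.given"),
    ColumnMapping("mysql_lab", "surname", "patient.patientPerson.name.family"),
]

def to_global(source: str, row: dict, mappings: list = MAPPINGS) -> dict:
    """Translate one local row into global-view attribute paths; unmapped columns are dropped."""
    lookup = {m.local_column: m.rmim_path for m in mappings if m.source_system == source}
    return {lookup[col]: value for col, value in row.items() if col in lookup}

def rename_local_column(source: str, old: str, new: str, mappings: list = MAPPINGS) -> list:
    """Local schema evolution: return a new mapping list with only the affected entry changed."""
    return [
        ColumnMapping(m.source_system, new, m.rmim_path)
        if (m.source_system, m.local_column) == (source, old) else m
        for m in mappings
    ]

row = {"PAT_LAST_NAME": "Doe", "PAT_FIRST_NAME": "Jane", "MRN": "12345"}
print(to_global("oracle_cdms", row))
# {'patient.patientPerson.name.family': 'Doe', 'patient.patientPerson.name.given': 'Jane'}

updated = rename_local_column("mysql_lab", "surname", "last_name")
print(to_global("mysql_lab", {"last_name": "Smith"}, updated))
# {'patient.patientPerson.name.family': 'Smith'}
```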
The HL7 technical committees developed the HL7 v3 message structure based on the Reference Information Model (RIM). The objective of this model is to remove the bottleneck in information interchange among health information systems. The RIM is the source of the data content of all HL7 v3 messages, providing the coherent, shared information model they require. The RIM is an abstract model, expressed in the Unified Modeling Language (UML) as part of HL7's model-driven development methodology, and it is the root of all information models that represent HL7 data in a standard way across all domains of the healthcare system. It is also a complete HL7 v3 reference model that includes all class attributes and properties, as well as state-transition diagrams that specify the life cycles of the class objects. For specific domains, the Domain Message Information Model (D-MIM) is a refined subset of the RIM used to drive domain-specific information models such as "Administrative Management Domain - Accounting and Billing" and "Health and Clinical Management Domain - Clinical Document Architecture Medical Record." A D-MIM is composed of the set of class clones, attributes, state machines and relationships from which the R-MIMs needed to construct HL7 v3 messages for a particular area of interest in healthcare are derived.
The RIM core classes fall into the following subject areas: Acts, Entities and Roles. The Acts subject area contains the following classes: Account, Act, ActRelationship, ControlAct, DeviceTask, DiagnosticImage, Diet, FinancialContract, FinancialTransaction, InvoiceElement, ManagedParticipation, Observation, Participation, PatientEncounter, Procedure, PublicHealthCase, SubstanceAdministration, Supply and WorkingList. The classes in the Acts subject area relate to the events and actions in health care services. The Entities subject area consists of the following classes: Container, Device, Entity, LanguageCommunication, LivingSubject, ManufacturedMaterial, Material, NonPersonLivingSubject, Organization, Person and Place. The classes in the Entities subject area represent the stakeholders in health care services. The Roles subject area contains Access, Employee, LicensedEntity, Patient, Role and RoleLink, which relate to the roles that participants play in health care services.
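These subject areas meet in what is often called the RIM backbone: an Entity plays a Role, and the Role takes part in an Act through a Participation. The Python dataclasses below are a minimal sketch of that pattern under heavy simplification: only a handful of attributes are modelled, and the codes `PSN` (person), `PAT` (patient), `OBS` (observation), `EVN` (event mood) and `SBJ` (subject) are RIM vocabulary codes used here purely for illustration. The helper `participate` is hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Entity:
    class_code: str                  # e.g. "PSN" (person)
    name: str

@dataclass
class Role:
    class_code: str                  # e.g. "PAT" (patient)
    player: Entity                   # the entity that plays this role

@dataclass
class Act:
    class_code: str                  # e.g. "OBS" (observation)
    mood_code: str                   # e.g. "EVN" (an event that has happened)
    participations: list[Participation] = field(default_factory=list)

@dataclass
class Participation:
    type_code: str                   # e.g. "SBJ" (subject of the act)
    role: Role
    act: Act = field(repr=False, compare=False)  # back-link, excluded to avoid recursive repr/eq

def participate(role: Role, act: Act, type_code: str) -> Participation:
    """Connect a role to an act and record the link on the act."""
    link = Participation(type_code, role, act)
    act.participations.append(link)
    return link

jane = Entity("PSN", "Jane Doe")
patient = Role("PAT", player=jane)
bp_reading = Act("OBS", "EVN")
participate(patient, bp_reading, "SBJ")

# Navigate Act -> Participation -> Role -> Entity to find the subject.
subjects = [p.role.player.name for p in bp_reading.participations if p.type_code == "SBJ"]
print(subjects)  # ['Jane Doe']
```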
Other subclasses can be derived, or cloned, from the core classes (e.g., the Observation and Procedure subclasses derived from the Act class). A clone class can be viewed as a direct or conceptual specialization of its core class. There are thousands of clone classes derived from the core and domain-specific classes of the RIM.
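Python subclassing offers a rough analogy for this specialization, although HL7 cloning works by constraining a copy of a class rather than by inheritance. In the sketch below, `Observation` specializes `Act`, and the hypothetical clone `SystolicBPObservation` further constrains `Observation` to one code (LOINC 8480-6, systolic blood pressure) and one unit. Attribute names and defaults are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Act:
    """Core RIM class, reduced to three attributes for illustration."""
    class_code: str = "ACT"
    mood_code: str = "EVN"
    code: Optional[str] = None

@dataclass
class Observation(Act):
    """Specialization of Act: fixes classCode and adds a result value and unit."""
    class_code: str = "OBS"
    value: Optional[float] = None
    unit: Optional[str] = None

@dataclass
class SystolicBPObservation(Observation):
    """Hypothetical domain clone: an Observation constrained to one code and unit."""
    code: Optional[str] = "8480-6"   # LOINC: systolic blood pressure
    unit: Optional[str] = "mm[Hg]"

    def __post_init__(self):
        # Reject instances that violate the clone's constraints.
        if self.class_code != "OBS" or self.code != "8480-6":
            raise ValueError("SystolicBPObservation requires classCode OBS and code 8480-6")

bp = SystolicBPObservation(value=120.0)
print(isinstance(bp, Act), bp.class_code, bp.code, bp.value, bp.unit)
# True OBS 8480-6 120.0 mm[Hg]
```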
Once the domain is specified, a Refined Message Information Model (R-MIM) is used to express the content of a set of messages, with the accompanying message-specific annotations and elaborations. To exchange information between systems, Hierarchical Message Descriptions (HMDs) define the message structures, or message types, that express the abstract R-MIM message structures in an organized way, so that they can be communicated between systems with disparate underlying technologies. One of the most important functions of the HMD is to specify the serialization of the two-dimensional R-MIM into a one-dimensional data stream. The HMD also works with the Implementation Technology Specifications (ITS) for Extensible Markup Language (XML) and Unified Modeling Language (UML) [25, 26]. Furthermore, because the HL7 v3 specification is based on RIM classes and v3 messages use XML, which carries both data and metadata in a unified format, the data (its XML representation) can be correctly processed at the destination irrespective of the platform or of technologies that may evolve in the future. This enables a higher level of semantic consistency and interoperability for the interchange of clinical, biomedical and other data between systems.
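Serializing a two-dimensional model into a one-dimensional stream can be sketched as a traversal: starting from the model's entry point, follow each association in a fixed order and nest the target's content inside the source's element. The `RMIM` dictionary below is a hypothetical, much-simplified fragment (three class clones, one association each), not a real HL7 R-MIM, and the element naming rule is an assumption; a real HMD fixes the order, cardinality and naming of every element.

```python
import xml.etree.ElementTree as ET

# Hypothetical R-MIM fragment: each class clone lists its attributes and its
# outgoing associations (association name -> target clone), in a fixed order.
RMIM = {
    "Observation": {"attrs": ["classCode", "moodCode"], "assoc": [("subject", "Patient")]},
    "Patient": {"attrs": ["classCode"], "assoc": [("patientPerson", "Person")]},
    "Person": {"attrs": ["classCode"], "assoc": []},
}

def serialize(clone: str, instance: dict, tag: str = "") -> ET.Element:
    """Walk the model from an entry point, turning the class graph into a nested tree."""
    spec = RMIM[clone]
    # The entry point is named after its clone; nested elements after their association.
    elem = ET.Element(tag or clone[0].lower() + clone[1:])
    for attr in spec["attrs"]:            # attributes in the fixed order
        if attr in instance:
            elem.set(attr, instance[attr])
    for assoc_name, target in spec["assoc"]:  # associations in the fixed order
        if assoc_name in instance:
            elem.append(serialize(target, instance[assoc_name], assoc_name))
    return elem

instance = {
    "classCode": "OBS", "moodCode": "EVN",
    "subject": {"classCode": "PAT", "patientPerson": {"classCode": "PSN"}},
}
stream = ET.tostring(serialize("Observation", instance), encoding="unicode")
print(stream)  # one linear string encoding the whole object graph
```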
The XML ITS defines how HL7 data types and message structures are represented: the data types follow Extensible Markup Language conventions, while the structures represent the constructs defined by HMDs. Thus every HL7 message type requires an HMD and an XML Schema Definition (XSD), which expresses the set of rules an XML document must follow to be considered valid according to the schema. The XSDs therefore contain all the information required to construct a complete HL7 v3 message.
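Real XSD validation needs a schema-aware library (for example, `lxml.etree.XMLSchema`), which Python's standard library does not provide. As a drastically simplified stand-in, the sketch below checks only one kind of rule an XSD expresses, namely required child elements, using a hypothetical rule table. It does not check ordering, cardinality limits, data types or namespaces, so it illustrates the idea of schema rules rather than replacing an HL7 XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules in the spirit of an XSD: for each element, the child
# elements that must be present (minOccurs >= 1).
REQUIRED_CHILDREN = {
    "patient": ["id", "patientPerson"],
    "patientPerson": ["name"],
    "name": ["family"],
}

def validate(elem: ET.Element, rules: dict = REQUIRED_CHILDREN, path: str = "") -> list:
    """Return a list of violations; an empty list means every rule is satisfied."""
    here = f"{path}/{elem.tag}"
    errors = [
        f"{here}: missing <{child}>"
        for child in rules.get(elem.tag, [])
        if elem.find(child) is None   # compare to None: empty elements are falsy
    ]
    for child in elem:                # recurse into every child element
        errors.extend(validate(child, rules, here))
    return errors

good = ET.fromstring(
    "<patient><id/><patientPerson><name><family>Doe</family></name></patientPerson></patient>"
)
bad = ET.fromstring("<patient><patientPerson><name/></patientPerson></patient>")
print(validate(good))  # []
print(validate(bad))
```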
An understanding of the artifacts of domain-specific models is essential for understanding the HL7 v3 specification. For every domain, the artifacts are organized in the same structure in which they are submitted by the HL7 technical committee during the specification development process. For instance, an artifact (here, an interaction) submitted by Patient Administration under the Administrative Management Domain will have a unique artifact identifier such as PRPA_IN101001UV01, where PR = Practice (subsection), PA = Patient Administration (domain), IN = Interaction (artifact type), 101001 = a six-digit non-meaningful number assigned by the technical committee to ensure uniqueness, UV = realm (the only current value is UV, for universal), and 01 = current version number. The root element of a message carries its interaction identifier, which identifies the message type, the trigger event and the receiver responsibilities. In this example, the interaction between two systems is identified by the interaction code (IN). The interaction in an actual message is composed of the Trigger Event, the Message Type, the Transmission Wrapper, the Control Act Wrapper, the Sender and the Receiver. The Trigger Event and the Control Act Wrapper form another wrapper around the actual message, which records the date and time the trigger event occurred as well as the parties responsible for the trigger. The development of the RIM opens the door to a significant shift in HIE standards, from messaging toward an integrated healthcare systems architecture. The RIM represents all the attributes and data elements needed for HL7 message communication and data exchange [23, 25].
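The identifier layout described above can be decoded mechanically. The regular expression below is inferred solely from the `PRPA_IN101001UV01` example (two-letter subsection, domain, artifact-type and realm codes, a six-digit number and a two-digit version) and is an assumption: other HL7 v3 artifact types may follow different layouts, which this sketch would reject. The names `ArtifactId` and `parse_artifact_id` are hypothetical.

```python
import re
from typing import NamedTuple

# Layout inferred from the PRPA_IN101001UV01 example only.
ARTIFACT_ID = re.compile(
    r"^(?P<subsection>[A-Z]{2})(?P<domain>[A-Z]{2})_"
    r"(?P<artifact>[A-Z]{2})(?P<number>\d{6})"
    r"(?P<realm>[A-Z]{2})(?P<version>\d{2})$"
)

class ArtifactId(NamedTuple):
    subsection: str   # e.g. PR = Practice
    domain: str       # e.g. PA = Patient Administration
    artifact: str     # e.g. IN = Interaction
    number: str       # six-digit non-meaningful number
    realm: str        # e.g. UV = universal
    version: str      # e.g. 01

def parse_artifact_id(text: str) -> ArtifactId:
    """Split an identifier of the PRPA_IN101001UV01 form into its components."""
    m = ARTIFACT_ID.match(text)
    if m is None:
        raise ValueError(f"unrecognized artifact identifier: {text!r}")
    return ArtifactId(**m.groupdict())

aid = parse_artifact_id("PRPA_IN101001UV01")
print(aid.domain, aid.artifact, aid.number, aid.realm, aid.version)  # PA IN 101001 UV 01
```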
Compared with other implementations of health information exchange [9, 10, 12, 13], our prototype implementation of HL7 v3 provides the mapping needed for information integration between distributed clinical data sources, promoting collaborative healthcare and translational research. The mapping is triggered in real time to ensure that the right information is received at the right time. Our approach has effectively and efficiently ensured the correctness of information and knowledge extraction for the systems that have been integrated. To achieve this, the prototype uses R-MIM classes from the HL7 v3 RIM as a global view, together with a collaborative, centralized, web-based mapping tool that handles the evolution of both global and local schemas.