Guide to Health Informatics 2nd Edition
Enrico Coiera
The terms disease and remedy were
formerly understood and therefore defined quite differently to what they are
now; so, likewise, are the meanings and definitions of inflammation, pneumonia,
typhus, gout, lithiasis, &c., different from those which were attached to
them thirty years ago…It is evident ... that great mischief will in most cases
ensue if, in such attempts at definition and explanation, greater importance is
attached to a clear and determinate, than to a complete and comprehensive
understanding of the objects and questions before us. In a field like ours,
clearness can in general be purchased only at the expense of completeness and
therefore truth.
Oesterlen,
Medical Logic, (1855)
Coding
and classification systems have a long history in medicine. Current systems can
trace their origins back to epidemiological lists of the causes of death from
the early part of the eighteenth century. François Bossier de Lacroix
(1706-1777) is commonly credited with the first attempt to classify diseases
systematically (ICD-10, 1993). Better known as Sauvages, he published the work
under the title Nosologia Methodica.
Linnaeus
(1707-1778), a contemporary of Sauvages, also published his Genera Morborum in that period. By the
beginning of the nineteenth century, the Synopsis
Nosologiae Methodicae, published in 1785 by William Cullen of Edinburgh
(1710-1790), was the classification in most common use.
It
was John Graunt who, working about a hundred years earlier, is credited with
the first practical attempts to classify disease for statistical purposes.
Working on his London Bills of Mortality,
he was able to estimate the proportion of deaths in different age groups. For
example, he estimated a 36% mortality for liveborn children before the age of
6. He did this by taking all the deaths classified as convulsions, rickets,
teeth and worms, thrush, abortives, chrysomes, infants, and livergrown. To
these he added half of the deaths classed as smallpox, swinepox, measles, and
worms without convulsions. By all accounts his estimate was a good one (ICD-10,
1993).
Only in the last few decades have these terminological systems
started to attract widespread attention and resources. The ever-growing need
to amass and analyse clinical data, no longer just for epidemiological
purposes, has provided considerable incentive and resources for their
development. Further, with the development of computer technology, there has
been a belief that such widespread collection and analysis of data are now
possible. In parallel, the requirement for clinicians to participate in that
data collection has meant that they have had more opportunity to work with
terminologies, and begin to understand their benefits and limitations.
In
the previous chapter, the basic concepts of term, code, and classification were
introduced. In this chapter, several of the major coding and classification
systems in routine use in healthcare will be introduced, and their features
compared. Some specific limitations of each system will be highlighted. In
reality there are a large number of such systems in development and use, and
they cannot all be identified here. The systems discussed are however
representative of most systems in common use, and can serve as an introduction
to them. Throughout, a historical perspective will be retained, since in this
case the lessons of the past have deep implications for the present. The more
general limitations of all terminological systems will be addressed in the
following chapter.
Purpose. The International Classification of Diseases
(ICD) is published by the World Health Organisation (WHO). Currently in its
tenth revision (ICD-10), its goal is to allow morbidity and mortality data from
different countries around the world to be systematically collected and
statistically analysed. It is not intended, nor is it suitable, for indexing
distinct clinical entities (Gersenovic, 1995). The International Nomenclature
of Diseases (IND) provides the set of recommended terms and synonyms that
correspond to the entries classified in the ICD codes.
History. The ICD can trace its ancestry to the early
days of healthcare terminologies. William Farr (1807-1883) became the first
medical statistician for the General Register Office of England and Wales. Upon
taking office, he found the Cullen classification in use, but that it had not
been updated in accordance with medical advances, nor did it seem suitable for
statistical purposes. In his first Annual Report of the Registrar General, he
noted:
‘The advantages of a uniform statistical nomenclature, however
imperfect, are so obvious, that it is surprising that no attention has been
paid to its enforcement in Bills of Mortality. Each disease has, in many
instances, been denoted by three or four terms, and each term has been applied
to as many different diseases: vague, inconvenient names have been employed, or
complications have been registered instead of primary diseases. The
nomenclature is of as much importance in this department of enquiry as weights
and measures in the physical sciences, and should be settled without delay.’
(ICD-10, 1993)
Farr
toiled hard at improving the classification, and in 1855 the International
Statistical Congress adopted a classification based on the work of Farr and
Marc d’Espine of Geneva. Subsequently steered by Jacques Bertillon, this
developed into the International List of Causes of Death. This was adopted in
1893, continued to develop through the turn of the century and beyond, and
ultimately evolved into the current ICD system.
In
particular, the system was expanded to include not just causes of death, but
diseases resulting in measurable morbidity. This expansion started with the
urging of Farr. It was supported by Florence Nightingale, who in 1860 urged the
adoption of Farr’s disease classification for the tabulation of hospital
morbidity in her paper Proposals for a
uniform plan of hospital statistics. In 1900 at the First International
Conference to revise the Bertillon Classification, a parallel classification of
diseases for use in statistics of sickness was finally adopted.
Level of acceptance and use. The ICD today is used internationally by WHO
for comparison of statistical returns. It is also adopted by many individual
countries in the preparation of their statistical returns. Most other major
classification systems endeavour to make their systems compatible with ICD, so
that data coded in these systems can be mapped directly to ICD codes. ICD thus
acts as a de facto reference point for many healthcare terminologies.
Classification structure. The ICD-10 is a multiple-axis classification
system. At its core, the basic ICD is a single list of three-character
alphanumeric codes. These are organised by category, from A00 to Z99 (excluding U
codes which are reserved for research, and for the provisional assignment of
new diseases of uncertain aetiology). This level of detail is the mandatory
level for reporting to the WHO mortality database and for general international
comparisons.
The
classification is structured into 21 chapters, and the first character of the
ICD code is a letter associated with a particular chapter (Table 17.1).
Table 17.1: The ICD-10 chapter headings (adapted from ICD-10, 1993).
| Chapter I | Infectious and parasitic diseases |
| Chapter II | Neoplasms |
| Chapter III | Diseases of the blood and blood forming organs and certain disorders affecting the immune mechanism |
| Chapter IV | Endocrine, nutritional and metabolic diseases |
| Chapter V | Mental and behavioural disorders |
| Chapter VI | Diseases of the nervous system |
| Chapter VII | Diseases of the eye and adnexa |
| Chapter VIII | Diseases of the ear and mastoid process |
| Chapter IX | Diseases of the circulatory system |
| Chapter X | Diseases of the respiratory system |
| Chapter XI | Diseases of the digestive system |
| Chapter XII | Diseases of skin and subcutaneous tissue |
| Chapter XIII | Diseases of musculoskeletal system and connective tissue |
| Chapter XIV | Diseases of the genitourinary system |
| Chapter XV | Pregnancy, childbirth and the puerperium |
| Chapter XVI | Certain conditions originating in the perinatal period |
| Chapter XVII | Congenital malformations, deformations and chromosomal abnormalities |
| Chapter XVIII | Symptoms, signs and abnormal clinical and laboratory findings |
| Chapter XIX | Injuries, poisoning and certain other consequences of external causes |
| Chapter XX | External causes of morbidity and mortality |
| Chapter XXI | Factors affecting health status and contact with health services of a person not currently sick |
Within
chapters, the three-character codes are divided into homogeneous blocks reflecting different
axes of classification. In Chapter I for example, the blocks signify the axes
of mode of transmission and of the broad group of the infecting organism.
Within Chapter II on neoplasms, the first axis is the behaviour of the
neoplasm, and the next is its site. Within all blocks some codes are reserved
for conditions not specified elsewhere in the classification.
When
more detail is required, each category in ICD can be further subdivided, using
a fourth numeric character after a decimal point, creating up to 10
subcategories. This is used, for example, to classify histological varieties of
neoplasms. A few ICD chapters adopt five or more characters to allow further
subclassification along different axes.
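The code layout just described can be sketched in a few lines of Python. This is only an illustration, assuming that a core code is one letter followed by two digits, optionally extended by a decimal point and a single numeric character; the names `ICD10_PATTERN`, `ICD10Code` and `parse_icd10` are hypothetical, and the longer chapter-specific extensions are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: a letter and two digits (A00 to Z99), optionally followed
# by a decimal point and one numeric subcategory character.
ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.(\d))?$")


class ICD10Code(NamedTuple):
    first_letter: str           # letter that points to the chapter
    category: str               # three-character category, e.g. "A15"
    subcategory: Optional[str]  # fourth character, if present
    reserved: bool              # True for the reserved U codes


def parse_icd10(code: str) -> ICD10Code:
    """Split a three- or four-character ICD-10 code into its parts."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a three- or four-character ICD-10 code: {code!r}")
    letter, digits, sub = match.groups()
    return ICD10Code(letter, letter + digits, sub, letter == "U")
```

Calling `parse_icd10("A15.0")` yields the category `A15` with subcategory `0`, while a string such as `"A1"` is rejected. Reporting at the mandatory three-character level then amounts to keeping only the `category` field.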
Since ICD continues to
be used for ever-wider applications beyond its intent, the WHO decided in the
10th revision to develop the concept of a family of related classifications
surrounding this core set. This ‘family’ contains lists that have been
condensed from the full ICD, and lists expanded for speciality-based
adaptations (Figure 17.1). It also contains lists that cover topics beyond
morbidity and mortality. For example, there are classifications of medical and
surgical procedures, disablement and so forth (Gersenovic, 1995).
Figure 17.1: The ICD family of disease and health-related classifications (adapted from ICD-10, 1993).

The
International Classification of Functioning, Disability and Health (ICF) is a
more recent member of the ICD ‘family’. While ICD-10 focuses on
classifying a patient’s diagnosis, ICF is aimed at capturing a description of
their capacity to function. ICF
describes how people live with their health condition and describes body
functions and structures, activities and participation. The domains are
classified from body, individual and societal perspectives. Since an
individual's functioning and disability occur in a context, ICF also includes
a list of environmental factors. The ICF is intended to assist with measuring
health outcomes.
Limitations. The ICD has developed as a practical, rather
than theoretically based, classification. There have been compromises between
classification based on axes of aetiology, anatomical site and so on. There
have also been adjustments made to it to meet the needs of different
statistical applications beyond morbidity and mortality, for example social
security. As such, the ICD exists as a practical attempt at compromise between
various health care needs. Consequently, for many applications, finer levels of
detail may still be needed, or other axes of classification required.
Purpose. Diagnosis Related Groups (DRGs) relate a
patient’s diagnosis and treatment to the cost of their care (Murphy-Muth, 1987;
Feinstein, 1988). Developed in the United States by the Health Care Finance
Administration, DRGs were designed to support the calculation of federal
reimbursement for healthcare delivered through the U.S. Medicare system.
A
patient’s principal diagnoses and the procedures they are treated with during
hospital admission are used to select the group in the DRG classification that
most appropriately describes the overall type of care that has been delivered.
Next, the group selected is associated with a typical cost. Specifically, DRG funding
requires the use of a cost weighting that is applied by the funding agency to
determine the actual amount that should be paid to an institution for treating
a patient with a particular DRG. The weightings are determined by a formula
that is typically developed on a state or national basis.
DRGs
are also used to determine an institution’s overall case-mix. The case-mix index helps to take account of the types of
patient an individual institution sees, and estimates their severity of
illness. Thus a hospital seeing the same proportion of patients as another, but
dealing with more severe illness, will have a higher case-mix index. An
institution’s case-mix index can then be used in the formula that determines
reimbursement per individual DRG. Unsurprisingly, different versions of the
reimbursement formula favour different types of institution, and case-mix
represents an area for ongoing debate and research.
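The arithmetic behind cost weights and case-mix can be illustrated with a small Python sketch. The text does not give an actual reimbursement formula, so the sketch assumes the simplest common form: payment is a base rate multiplied by the DRG cost weight, and the case-mix index is the mean cost weight across an institution's episodes. The DRG labels, weights and base rate are invented for illustration, and real formulas add further adjustments.

```python
from typing import Dict, List


def drg_payment(base_rate: float, cost_weight: float) -> float:
    """Assumed simplification: payment = base rate x DRG cost weight."""
    return base_rate * cost_weight


def case_mix_index(episode_drgs: List[str], weights: Dict[str, float]) -> float:
    """Assumed definition: mean cost weight over all treated episodes."""
    if not episode_drgs:
        raise ValueError("no episodes to average")
    return sum(weights[drg] for drg in episode_drgs) / len(episode_drgs)


# Hypothetical DRG labels and weights, not taken from any real schedule.
WEIGHTS = {"DRG_A": 0.5, "DRG_B": 1.0, "DRG_C": 2.0}

# Two hospitals with the same number of patients but different severity.
hospital_1 = ["DRG_A", "DRG_A", "DRG_B", "DRG_C"]
hospital_2 = ["DRG_B", "DRG_C", "DRG_C", "DRG_C"]

cmi_1 = case_mix_index(hospital_1, WEIGHTS)  # 1.0
cmi_2 = case_mix_index(hospital_2, WEIGHTS)  # 1.75
```

Under these assumptions the second hospital, which treats more of the heavily weighted group, has the higher index, matching the intuition given above.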
History. In the mid-1970s the Centre for Health
Studies at Yale University began work on a system for monitoring hospital
utilisation review (Rothwell, 1987). Following a 1976 trial of a DRG system, it
was decided to base the final system on the ICD-9-CM which would provide the
basic diagnostic categories. The ICD-9-CM
(clinical modification) classification was developed from the ICD-9 by the
American Commission on Professional and Hospital Activities. It contains
finer-grained clinical detail than the old ICD-9, and along with its successors
developed in various countries for ICD-10, is intended for healthcare review
and reimbursement use.
Level of acceptance and use. DRGs are used routinely in the United States
for management review and payment for Medicare and Medicaid patients. Given the
importance of reimbursement world-wide, DRGs have undergone ongoing
development, and have been adopted in one form or another in many countries
outside the USA, including Australia (AR-DRG), Canada (CMG) and countries of
Europe and Asia.
Classification structure. Patients are initially assigned a code from ICD-9-CM
or a clinical modification of ICD-10. ICD clinical modifications are
multiaxial systems closely based on the ICD structure. Diagnoses are then
partitioned into one of about 23 Major Diagnostic Categories (MDCs) according
to body organ system or disease. The aim of this step is to group codes into
similar categories that reflect consumption of resources and treatment (Figure
10.1). The categories are next partitioned based upon the performance of
procedures, and on other variables such as the presence of complications and
co-morbidities, patient age, and length of stay, before a DRG is finally
assigned (Rothwell, 1987). There is thus a process of category reduction at
each stage, starting from the many thousands of ICD codes to the few hundred
DRGs:
ICD ⇒ MDC ⇒ DRG
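The staged reduction from diagnosis code to DRG can be sketched as two table lookups. The Python below is a toy illustration only: the diagnosis codes, MDC names and DRG labels are all hypothetical, and only two of the partitioning variables mentioned above (procedures and complications) are modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Episode:
    principal_diagnosis: str  # a code from an ICD clinical modification
    had_procedure: bool
    has_complication: bool


# Stage 1 (hypothetical table): many diagnosis codes map to a few MDCs.
ICD_TO_MDC: Dict[str, str] = {
    "DX001": "MDC_RESPIRATORY",
    "DX002": "MDC_RESPIRATORY",
    "DX003": "MDC_CIRCULATORY",
}

# Stage 2 (hypothetical table): each MDC is partitioned by procedure and
# complication status to give the final DRG label.
DRG_TABLE: Dict[Tuple[str, bool, bool], str] = {
    (mdc, proc, cc): f"{mdc}-{'SURG' if proc else 'MED'}-{'CC' if cc else 'NOCC'}"
    for mdc in ("MDC_RESPIRATORY", "MDC_CIRCULATORY")
    for proc in (False, True)
    for cc in (False, True)
}


def assign_drg(episode: Episode) -> str:
    """Apply the two lookups in order: ICD code, then MDC, then DRG."""
    mdc = ICD_TO_MDC[episode.principal_diagnosis]
    return DRG_TABLE[(mdc, episode.had_procedure, episode.has_complication)]
```

Two different diagnosis codes in the same MDC, with the same procedure and complication status, end up in the same DRG, which is the category reduction described above.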
Limitations. Given the local variations in clinical
practice, disease incidence, patient selection, procedures performed, and
resources, DRGs and case-mix indices will always only give approximate
estimates of the true resource utilisation. For example, should a hospital that
is developing new and expensive procedures be paid the same amount as an
institution that treats the same type of patient with a more common and cheaper
procedure? Should quality of care be reflected in a DRG? For example, if a
hospital delivers good quality of care that results in better patient outcomes,
should it be paid the same as a hospital that performs more poorly for the same
type of patient?
As
importantly, those institutions that are best able to create DRGs accurately
are more likely to receive reimbursement in line with their true expenditure on
care. There is thus an implication in the DRG model that an institution
actually has the ability to accurately assemble information to derive DRGs and
a case-mix index. Given local and national variations in information systems
and coding practice, it is likely that institutions with poor information
systems will be disadvantaged, unless the information infrastructure across a
region is a ‘level playing field’.
Developments. DRGs are designed for use with inpatients.
Accordingly, other systems have been developed for other areas of healthcare.
Systems such as Ambulatory Visit Groups (AVGs) and Ambulatory Payment
Classifications (APCs) have been developed for outpatient or ambulatory care in
the primary sector. These are based upon a patient’s diagnosis, intervention,
visit status and physician time. Given the increasing age of the population in
western nations, there is a tremendous ongoing cost that comes from the chronic
care needed by the elderly. Consequently, systems such as Resource Utilisation
Groups (RUGs) and the Australian National Sub-Acute and Non-Acute Patient
Classification (AN-SNAP) have been developed to help determine the usage of
sub-acute and long-term care resources. RUGs are based upon the time spent by
nursing home staff when caring for a patient. AN-SNAP includes measures of
functional ability.
Purpose. The Read codes (now simply called the
Clinical Terms in the UK) are produced for clinicians, initially in primary
care, who wish to audit the process of care. The Clinical Terms Version 3
(CTV3) is intended, like SNOMED International, to code events in the electronic
patient record (O’Neil et al., 1995).
History. The Read codes were introduced in the UK in
1986 to generate computer summaries of patient care in primary care. In the
subsequent revision Version 2, their structure was changed and based upon ICD-9
and OPCS-4, the Classification of Surgical Operations and Procedures. As
Version 2 became increasingly inadequate, the UK’s Conference of Medical Royal
Colleges, and the government’s National Health Service (NHS) established a
joint Clinical Terms Project, comprising some 40 working groups representing
the different specialities. This was subsequently joined by groups representing
nurses and allied health professionals. Version 3 of the Read codes was created
in response to the output of the Terms project.
Level of acceptance and use. Use of the Read codes is not mandatory in the
UK. However, in 1994 it was recommended by the medical and nursing professional
bodies as the preferred dictionary for clinical information systems. The Read
codes have been purchased by the UK government and made Crown Copyright.
Classification structure. The Read codes have undergone substantive
changes through their various revisions, altering not just the classification
and terminological content, but also their structure. In Versions 1 and 2, Read
was a strictly hierarchical classification system.
Read
Version 3 was released in two stages and is a ‘superset’ of all previous
releases, containing all previous terms, to allow backward compatibility with
past versions. Version 3.0 is a kind of compositional classification system.
Like SNOMED, a term can appear in several different ‘hierarchical structures’,
classified against different axes. Unlike ICD or SNOMED, the codes themselves
do not reflect a given hierarchy. They simply act as a unique identifier for a
clinical concept. The ‘hierarchy’ exists as a set of links between concepts.
Terms can inherit properties across these links. For example, ‘pulmonary
tuberculosis’ may naturally inherit from a parent respiratory disorder or a
parent infection term.
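A structure of this kind, with opaque identifiers and a separate set of parent links, can be sketched as follows. This is a minimal illustration rather than the actual Read file format: the concept identifiers, the `PARENTS` and `PROPERTIES` tables and the property strings are all invented for the example.

```python
from typing import Dict, List, Set

# Identifiers are opaque: nothing in the code string reveals the hierarchy.
# The hierarchy lives entirely in these (hypothetical) parent links.
PARENTS: Dict[str, List[str]] = {
    "c_pulm_tb": ["c_resp_disorder", "c_infection"],  # two parents
    "c_resp_disorder": ["c_disorder"],
    "c_infection": ["c_disorder"],
    "c_disorder": [],
}

# Hypothetical properties attached directly to some concepts.
PROPERTIES: Dict[str, Set[str]] = {
    "c_resp_disorder": {"affects respiratory system"},
    "c_infection": {"caused by organism"},
}


def ancestors(concept: str) -> Set[str]:
    """Collect every concept reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def inherited_properties(concept: str) -> Set[str]:
    """Union of a concept's own properties and those of all its ancestors."""
    props = set(PROPERTIES.get(concept, set()))
    for ancestor in ancestors(concept):
        props |= PROPERTIES.get(ancestor, set())
    return props
```

Here the pulmonary tuberculosis concept picks up properties from both its respiratory and its infection parent, something a single strict hierarchy encoded in the code string could not express.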
In
Version 3.1, a set of qualifier terms such as anatomical site was added that
can be combined with existing terms. When terms are composed, these composites
exist outside of any strict hierarchy. To help in the combination of qualifiers
with terms, they are grouped into templates. These capture some rules that help
describe the range of possible qualifiers that a term in Read can take (Table
17.2).
Table 17.2: Example Read Version 3.1 template showing allowable combinations of terms with qualifier attributes, and attribute values (adapted from O’Neil et al., 1995).
| Object | Applicable attribute | Applicable values |
| Bone operation | Site | Bone, Part of Bone |
| Fixation of fracture | Reduction method | Percutaneous, open, closed |
| Fixation of fracture using intramedullary nail | Reaming method | Hand, powered rigid, powered flexible, etc. |
| Fixation of fracture using intramedullary nail | Nail Type | Flexible, Locking, Rigid, etc. |
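The way a template constrains composition can be sketched by treating Table 17.2 as a lookup structure. The Python below is an illustration under stated assumptions: the `TEMPLATES` dictionary and the `is_allowed` function are hypothetical names, the value lists marked ‘etc.’ in the table are incomplete here as well, and no inheritance of templates between related objects is modelled.

```python
from typing import Dict, Set

# Object -> attribute -> allowed values, transcribed from Table 17.2.
# Values are stored in lower case; lists ending in "etc." are not exhaustive.
TEMPLATES: Dict[str, Dict[str, Set[str]]] = {
    "Bone operation": {
        "Site": {"bone", "part of bone"},
    },
    "Fixation of fracture": {
        "Reduction method": {"percutaneous", "open", "closed"},
    },
    "Fixation of fracture using intramedullary nail": {
        "Reaming method": {"hand", "powered rigid", "powered flexible"},
        "Nail type": {"flexible", "locking", "rigid"},
    },
}


def is_allowed(obj: str, attribute: str, value: str) -> bool:
    """Check whether a qualifier value may be combined with a term."""
    allowed = TEMPLATES.get(obj, {}).get(attribute, set())
    return value.lower() in allowed
```

With this table, `is_allowed("Fixation of fracture", "Reduction method", "Open")` is true, whereas attaching a nail type to a plain bone operation is rejected because no such attribute is listed for that object.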
The
Read Codes Drug and Appliance Dictionary is part of the Clinical Terms and covers
medicinal products, appliances, special foods, reagents and dressings. The
dictionary is designed for use in software that requires capture of medication
and treatment data such as electronic patient records and prescribing systems.
Like
other major systems, Read offers mapping to ICD-9 codes to permit international
reporting, and in some cases also provides ICD-10 mapping. A set of Quality
Assurance Rules has been developed for the Clinical Terms. The rules are designed
to check the clinical, drug and cross-mapping domains between the current and
previous versions of the terms and other major terminologies like ICD-10, and
for areas of overlap between the domains themselves (Schulz et al., 1998). Each
QA rule is written to interrogate the various files that make up the Read Code
releases and is designed to identify those concepts or terms that violate the
basic structure of the Read Codes.
Although
Read Version 3 does not overtly emphasise axes of classification like SNOMED,
both systems allow terms to be linked to each other and to inherit properties
across those links. Therefore the underlying potential for expressiveness is
the same at the structural level. Differences in the number and type of terms,
and the richness of interconnections between them are probably greater
determinants of difference between these coding systems, than any underlying
structural difference. The presence of a fixed hierarchy, as we find with ICD
or SNOMED, carries certain benefits of regularity when exploring the system. It
also imposes greater constraints when it is necessary to alter the system
because of changes to the terminology. In Read, this burden of regularity
begins to be shifted to the rules guiding the composition of terms.
Limitations. The Read templates for term composition are
limited in their ability to control combination. A much richer language and
knowledge base would be needed to regulate term combination (Rector et al.,
1995).
Purpose. The Systematized Nomenclature of Medicine (SNOMED) is intended
to be a general-purpose, comprehensive and computer-processable terminology
that, according to its creators, can represent and index “virtually all of the
events found in the medical record” (Côté et al., 1993).
History. SNOMED was derived from the 1968 edition of
the Manual of tumour nomenclature and coding (MOTNAC) and the Systematized
nomenclature of pathology (SNOP). SNOMED International (or SNOMED III) is a
development of the second edition of SNOMED, published in 1979 by the College
of American Pathologists (CAP).
Level of acceptance and use. SNOMED is reportedly used in over 40
countries, presumably largely in laboratories for the coding of reports to
generate statistics and facilitate data retrieval. Although CAP is a
not-for-profit organisation, in the past SNOMED licence fees have often been
significant and may have impeded its more widespread adoption.
Classification structure. SNOMED is a hierarchical, multi-axial
classification system. Terms are assigned to one of eleven independent systematised
modules, corresponding to different axes of classification (Table 17.3). Each
term is placed into a hierarchy within one of these modules, and assigned a
five- or six-character alphanumeric code (Figure 17.2).
Table 17.3: The SNOMED International modules (or axes).
| Module designator |
| Topography (T) |
| Morphology (M) |
| Function (F) |
| Diseases/Diagnoses (D) |
| Procedures (P) |
| Occupations (J) |
| Living Organisms (L) |
| Chemicals, Drugs & Biological Products (C) |
| Physical Agents, Forces & Activities (A) |
| Social Context (S) |
| General Linkage-Modifiers (G) |
Terms
can also be cross-referenced across these modules. Each code carries with it a
packet of information about the terms it designates, giving some notion of the
clinical context of that code (Table 17.4).
Figure 17.2: SNOMED Codes are hierarchically structured. Implicit in the code, tuberculosis is an infectious bacterial disease.

SNOMED also allows the composition of
complex terms from simpler terms, and is thus partially compositional. SNOMED
International incorporates virtually all of the ICD-9-CM terms and codes,
allowing reports to be generated in this format if necessary.
Table 17.4: An example of SNOMED’s nomenclature and classification. Some terms (e.g. Tuberculosis) can be cross-referenced to others, to give the term a richer clinical context (adapted from Rothwell, 1995).
| | Nomenclature | | | | Classification |
| Axis | T | + M | + L | + F | = D |
| Term | Lung | + Granuloma | + M. tuberculosis | + Fever | = Tuberculosis |
| Code | | | | | |
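The cross-referencing shown in Table 17.4 can be sketched as a mapping from a disease term to terms on the other axes. The Python below is a minimal illustration using only the axis designators and terms from Tables 17.3 and 17.4; the names `AXES`, `CROSS_REFS` and `describe` are hypothetical, and no SNOMED codes are used.

```python
from typing import Dict, List, Tuple

# Axis designators taken from Table 17.3 (only those needed here).
AXES: Dict[str, str] = {
    "T": "Topography",
    "M": "Morphology",
    "L": "Living Organisms",
    "F": "Function",
    "D": "Diseases/Diagnoses",
}

# Cross-references from a disease term to (axis, term) pairs, as in Table 17.4.
CROSS_REFS: Dict[str, List[Tuple[str, str]]] = {
    "Tuberculosis": [
        ("T", "Lung"),
        ("M", "Granuloma"),
        ("L", "M. tuberculosis"),
        ("F", "Fever"),
    ],
}


def describe(disease: str) -> str:
    """Render a disease term as the sum of its cross-referenced terms."""
    parts = CROSS_REFS[disease]
    for axis, _ in parts:
        if axis not in AXES:
            raise ValueError(f"unknown axis: {axis}")
    left = " + ".join(f"{axis}:{term}" for axis, term in parts)
    return f"{left} = D:{disease}"
```

For the single entry above, `describe("Tuberculosis")` reproduces the row of the table as one line of text, pairing each component term with its axis.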