Guide to Health Informatics 2nd Edition
Enrico Coiera
From the very earliest moments in the modern
history of the computer, scientists have dreamed of creating an “electronic
brain”. Of all the modern technological quests, this search to create
artificially intelligent (AI) computer systems has been one of the most
ambitious and, not surprisingly, controversial.
It also seems that very early on, scientists
and doctors alike were captivated by the potential such a technology might have
in medicine (e.g. Ledley and Lusted, 1959). With intelligent computers able to
store and process vast amounts of knowledge, the hope was that they would become
perfect ‘doctors in a box’, assisting or surpassing clinicians with tasks like
diagnosis.
With such motivations, a small but talented
community of computer scientists and healthcare professionals set about shaping
a research program for a new discipline called Artificial Intelligence in
Medicine (AIM). These researchers had a bold vision of the way AIM would
revolutionise medicine, and push forward the frontiers of technology.
AI in medicine at that time was a largely
US-based research community. Work originated out of a number of campuses,
including MIT-Tufts, Pittsburgh, Stanford and Rutgers (e.g. Szolovits, 1982;
Clancey and Shortliffe, 1984; Miller, 1988). The field attracted many of the
best computer scientists and by any measure their output in the first decade of
the field remains a remarkable achievement.
In reviewing this new field in 1984, Clancey
and Shortliffe provided the following definition:
‘Medical artificial intelligence is
primarily concerned with the construction of AI programs that perform diagnosis
and make therapy recommendations. Unlike medical applications based on other programming
methods, such as purely statistical and probabilistic methods, medical AI
programs are based on symbolic models of disease entities and their
relationship to patient factors and clinical manifestations.’
Much has changed since then, and today the
importance of diagnosis as a task requiring computer support in routine
clinical situations receives much less emphasis (Durinck et al., 1994). The
strict focus on the medical setting has now broadened across the healthcare
spectrum, and instead of AIM systems, it is more typical to describe them as clinical
decision support systems (CDSS). Intelligent systems today are thus found
supporting medication prescribing, in clinical laboratories and educational
settings, for clinical surveillance, or in data-rich areas like the intensive
care setting.
While there certainly have been ongoing
challenges in developing such systems, they have repeatedly proven their
reliability and accuracy (Shortliffe, 1987). Much of the
difficulty experienced in introducing them has been associated with the poor
way in which they have fitted into clinical practice, either solving problems
that were not perceived to be an issue, or imposing changes in the way
clinicians worked. What is now being realised is that when they fill an
appropriate role, intelligent programmes do indeed offer significant
benefits. One of the most important tasks now facing developers of AI-based
systems is to characterise accurately those aspects of clinical practice that
are best suited to the introduction of artificial intelligence systems.
In the remainder of this chapter, the initial
focus will thus remain on the different roles CDSS can play in clinical
practice, looking particularly to see where clear successes can be identified,
as well as looking to the future. Much of the material presumes familiarity
with Chapters two and eight. The next chapter will take a more technological
focus, and look at the way CDSS are built. A variety of technologies including
expert systems and neural networks will be discussed. The final chapters in
this section look at several specialised topics where intelligent decision
support is an essential component. We will look at the way CDSS can support the
interpretation of patient signals that come off clinical monitoring devices,
how they can assist in the surveillance for infectious diseases and public health
challenges like bioterrorism, and how genome science is supported through
bioinformatics.
Proponents of so-called ‘strong’ AI are
interested in creating computer systems whose behaviour is at some level
indistinguishable from humans (see Box 25.1). Success in strong AI would result
in computer minds that might reside in autonomous physical beings like robots,
or perhaps live in ‘virtual’ worlds like the information space created by
something like the Internet.
An alternative approach to strong AI is to look
at human cognition and decide how it can be supported in complex or difficult
situations. For example, a fighter pilot may need the help of intelligent
systems to assist in flying an aircraft that is too complex for a human to
operate on their own. These ‘weak’ AI systems are not intended to have an
independent existence, but are a form of ‘cognitive prosthesis’ that supports a
human in a variety of tasks.
CDSS are by and large intended to support
healthcare workers in the normal course of their duties, assisting with tasks
that rely on the manipulation of data and knowledge. An AI system could be
running within an electronic patient record system, for example, and alert a
clinician when it detects a contraindication to a planned treatment. It could
also alert the clinician when it detects patterns in clinical data that suggest
significant changes in a patient’s condition.
Along with tasks that require reasoning with
clinical knowledge, AI systems also have a very different role to play in the
process of scientific research. In particular, AI systems have the capacity to
learn, leading to the discovery of new phenomena and the creation of clinical
knowledge. For example, a computer system can be used to analyse large amounts
of data, looking for complex patterns within it that suggest previously
unexpected associations. Equally, with enough of a model of existing knowledge,
an AI system can be used to show how a new set of experimental observations
conflicts with the existing theories. We shall now examine such capabilities in
more detail.
Box 25.1 - The Turing test
How will we know when
a computer program has achieved an equivalent intelligence to a human? Is there
some set of objective measures that can be assembled against which a computer
program can be tested? Alan Turing was one of the founders of modern computer
science and AI, whose intellectual achievements to this day remain astonishing
in their breadth and importance. When he came to ponder this question, he
brilliantly side-stepped the problem almost entirely.
In his opinion, there
were no ultimately useful measures of intelligence. It was sufficient that an
objective observer could not tell the difference in conversation between a
human and a computer for us to conclude that the computer was intelligent. To
cancel out any potential observer biases, Turing’s test put the observer in a
room, equipped with a computer keyboard and screen, and made the observer talk
to the test subjects only using these. The observer would engage in a
discussion with the test subjects using the printed word, much as one would today
by exchanging e-mail with a remote colleague. If a set of observers could not
distinguish the computer from another human in over 50% of cases, then Turing
felt that one had to accept that the computer was intelligent.
Another consequence
of the Turing test is that it says nothing about how one builds an intelligent
artefact, thus neatly avoiding discussions about whether the artefact needed to
in any way mimic the structure of the human brain or our cognitive processes. It
really didn’t matter how the system was built in Turing’s mind. Its
intelligence should only be assessed based upon its overt behaviour.
There have been
attempts to build systems that can pass Turing’s test in recent years. Some
have managed to convince at least some humans in a panel of judges that they
too are human, but none have yet passed the mark set by Turing.
Knowledge-based systems are the commonest type
of CDSS technology in routine clinical use. Also known as expert systems,
they contain clinical knowledge, usually about a very specifically defined
task, and are able to reason with data from individual patients to come up with
reasoned conclusions. Although there are many variations, the knowledge within
an expert system is typically represented in the form of a set of rules.
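To make the idea of reasoning with rules more concrete, here is a minimal sketch of forward chaining in Python, assuming each rule is a simple condition/conclusion pair applied to a dictionary of patient facts. The rule names, cut-offs and the `forward_chain` helper are hypothetical and purely illustrative; real expert systems use richer rule languages and conflict-resolution strategies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # test applied to the current facts
    conclusion: str                     # fact asserted (set to True) when the rule fires


def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Fire rules repeatedly until no rule adds a new conclusion."""
    derived = dict(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if derived.get(rule.conclusion) is not True and rule.condition(derived):
                derived[rule.conclusion] = True
                changed = True
    return derived


# Hypothetical rules and cut-offs, for illustration only (not clinical guidance).
RULES = [
    Rule("fever", lambda f: f.get("temperature_c", 0) >= 38.0, "fever"),
    Rule("tachycardia", lambda f: f.get("heart_rate", 0) > 100, "tachycardia"),
    Rule("flag_for_review",
         lambda f: f.get("fever") is True and f.get("tachycardia") is True,
         "review_for_sepsis"),
]

if __name__ == "__main__":
    patient = {"temperature_c": 38.6, "heart_rate": 112}
    print(forward_chain(patient, RULES)["review_for_sepsis"])  # True
```

Because the loop re-runs until nothing new is derived, the result does not depend on the order in which rules are listed: a conclusion such as `review_for_sepsis` is reached once the intermediate facts it depends on have been asserted.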
There are many different types of clinical task
to which expert systems can be applied.
Alerts and reminders. In real-time situations, an expert system
attached to a patient monitoring device like an ECG or pulse oximeter can warn
of changes in a patient’s condition. In less acute circumstances, it might scan
laboratory test results, drug or test orders, or the EMR, and then send reminders
or warnings, either via immediate on-screen feedback or through a messaging system
like e-mail. Reminder systems are used to notify clinicians
of important tasks that need to be done before an event occurs. For example, an
outpatient clinic reminder system may generate a list of immunizations that
each patient on the daily schedule requires (Randolph et al., 1999).
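A reminder system of the kind just described can be sketched as a scan over the day's clinic list. The sketch assumes a hypothetical schedule mapping each vaccine to the minimum age (in months) at which it becomes due; the vaccine names and ages are placeholders, not a real immunisation schedule.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Patient:
    name: str
    birth_date: date
    immunisations: List[str] = field(default_factory=list)  # vaccines already recorded


# Hypothetical schedule: vaccine -> minimum age in months at which it is due.
SCHEDULE: Dict[str, int] = {"vaccine_A": 2, "vaccine_B": 12, "vaccine_C": 48}


def age_in_months(birth: date, on: date) -> int:
    months = (on.year - birth.year) * 12 + (on.month - birth.month)
    if on.day < birth.day:
        months -= 1  # the current month of age is not yet complete
    return months


def reminders_for_clinic(patients: List[Patient], clinic_day: date) -> Dict[str, List[str]]:
    """For each scheduled patient, list vaccines that are due but not yet recorded."""
    due: Dict[str, List[str]] = {}
    for p in patients:
        age = age_in_months(p.birth_date, clinic_day)
        missing = [vaccine for vaccine, min_age in SCHEDULE.items()
                   if age >= min_age and vaccine not in p.immunisations]
        if missing:
            due[p.name] = missing
    return due


if __name__ == "__main__":
    todays_list = [
        Patient("patient_1", date(2023, 1, 10), ["vaccine_A"]),
        Patient("patient_2", date(2023, 12, 1)),
    ]
    print(reminders_for_clinic(todays_list, date(2024, 3, 15)))
```

For the two example patients, the function returns `{'patient_1': ['vaccine_B'], 'patient_2': ['vaccine_A']}`, which is the kind of per-patient list a clinic reminder would print.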
Diagnostic assistance. When a patient’s case is complex, rare or the
person making the diagnosis is simply inexperienced, an expert system can help
in the formulation of likely diagnoses based on patient data presented to it,
and the system’s understanding of illness, stored in its knowledge base.
Diagnostic assistance is often needed with complex data, such as the ECG, where
most clinicians can make straightforward diagnoses, but may miss rare
presentations of common illnesses like myocardial infarction, or may struggle
to formulate diagnoses that typically require specialised expertise.
Therapy critiquing and planning. Critiquing systems can look for
inconsistencies, errors and omissions in an existing treatment plan, but do not
assist in the generation of the plan. Critiquing systems
can be applied to physician order entry. For example, on entering an order for a
blood transfusion a clinician may receive a message stating that the patient's
haemoglobin level is above the transfusion threshold, and the clinician must
justify the order by stating an indication, such as active bleeding (Randolph
et al., 1999). Planning systems, on the other hand, have more
knowledge about the structure of treatment protocols and can be used to
formulate a treatment based upon data about the patient’s specific condition from
the EMR and accepted treatment guidelines.
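The transfusion example can be sketched as a critiquing check that runs when an order is entered: it reviews the order rather than generating one. The threshold value, the list of accepted indications and the names `TransfusionOrder` and `critique` are hypothetical placeholders; real thresholds depend on local guidelines and the patient's clinical context.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only.
TRANSFUSION_THRESHOLD_G_PER_L = 70
ACCEPTED_INDICATIONS = {"active bleeding"}


@dataclass
class TransfusionOrder:
    patient_id: str
    haemoglobin_g_per_l: float
    indication: Optional[str] = None


def critique(order: TransfusionOrder) -> Optional[str]:
    """Return a critique message, or None if the order raises no concern."""
    if order.haemoglobin_g_per_l <= TRANSFUSION_THRESHOLD_G_PER_L:
        return None  # haemoglobin at or below threshold: nothing to query
    if order.indication and order.indication.strip().lower() in ACCEPTED_INDICATIONS:
        return None  # clinician has justified the order
    return (f"Haemoglobin {order.haemoglobin_g_per_l:g} g/L is above the transfusion "
            f"threshold of {TRANSFUSION_THRESHOLD_G_PER_L} g/L; please state an indication.")


if __name__ == "__main__":
    print(critique(TransfusionOrder("patient_1", 95)))
    print(critique(TransfusionOrder("patient_1", 95, "Active bleeding")))  # None
```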
Prescribing decision support systems (PDSS). One of the commonest clinical tasks is the
prescription of medications, and PDSS can assist by checking for drug-drug
interactions, dosage errors, and if connected to an EMR, for other prescribing
contraindications such as allergy. PDSS are usually well received because they
support a pre-existing routine task, and as well as improving the quality of
the clinical decision, usually offer other benefits like automated script
generation and sometimes electronic transmission of the script to a pharmacy.
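A minimal sketch of the checks a PDSS performs, assuming a hypothetical interaction table, hypothetical per-dose limits and an allergy list drawn from the EMR. Drug names are matched as plain strings here; a real system would use coded drug terminologies and drug-class allergy rules.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical knowledge tables for illustration only.
INTERACTIONS: Dict[FrozenSet[str], str] = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
}
MAX_SINGLE_DOSE_MG: Dict[str, float] = {"paracetamol": 1000, "aspirin": 1000}


def check_prescription(new_drug: str, dose_mg: float,
                       current_drugs: List[str], allergies: Set[str]) -> List[str]:
    """Return warnings for a new prescription; an empty list means no problems found."""
    warnings: List[str] = []
    if new_drug in allergies:
        warnings.append(f"allergy recorded to {new_drug}")
    for other in current_drugs:
        # frozenset keys make the lookup independent of drug order
        note = INTERACTIONS.get(frozenset({new_drug, other}))
        if note:
            warnings.append(f"{new_drug} + {other}: {note}")
    limit = MAX_SINGLE_DOSE_MG.get(new_drug)
    if limit is not None and dose_mg > limit:
        warnings.append(f"{new_drug} dose {dose_mg:g} mg exceeds {limit:g} mg")
    return warnings


if __name__ == "__main__":
    print(check_prescription("aspirin", 300, ["warfarin"], allergies=set()))
```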
Information retrieval. Finding evidence in support of clinical cases
is still difficult on the Web. Intelligent information retrieval systems
can assist in formulating appropriately specific and accurate clinical
questions; they can act as information filters, by reducing the number of
documents found in response to a query to a Web search engine; and they can
assist in identifying the sources of evidence most appropriate to a
clinical question. More complex software ‘agents’ can be sent to search for and
retrieve information to answer clinical questions, for example on the Internet.
The agent may contain knowledge about its user’s preferences and needs, and may
also have some clinical knowledge to assist it in assessing the importance and
utility of what it finds.
Image recognition and interpretation. Many clinical images can now be automatically
interpreted, from plain X-rays through to more complex images like angiograms,
CT and MRI scans. This is of value in mass screening, for example, when the
system can flag potentially abnormal images for detailed human attention.
There are numerous reasons why more CDSS are
not in routine use (Coiera, 1994). Some require the existence of an electronic
patient record system to supply their data, and most institutions and practices
do not yet have all their working data available electronically. Others suffer
from poor human interface design and so do not get used even if they are of
benefit.
Much of the initial reluctance to use CDSS
simply arose because they did not fit naturally into the process of care, and
as a result using them required additional effort from already busy
individuals. It is also true, but perhaps dangerous, to ascribe some of the
reluctance to use early systems to the technophobia or computer illiteracy of
healthcare workers. If a system is perceived by those using it to be
beneficial, then it will be used. If not, independent of its true value, it
will probably be rejected.
Happily, there are today very many systems that
have made it into clinical use (Table 25.1). Many of these are small, but nevertheless
make positive contributions to care. Others, like prescribing decision support
systems, are in widespread use and for many clinicians form a routine part of
their everyday practice.
In the first decade of AIM, most research
systems were developed to assist clinicians in the process of diagnosis,
typically with the intention that they would be used during a clinical encounter
with a patient. Most of these early systems did not develop further than the
research laboratory, partly because they did not gain sufficient support from
clinicians to permit their routine introduction.
DXplain is an example of one of these clinical
decision support systems, developed at the Massachusetts General Hospital
(Barnett et al., 1987). It is used to assist in the process of diagnosis,
taking a set of clinical findings, including signs, symptoms and laboratory data,
and producing a ranked list of diagnoses. It provides justification for
each diagnosis in the differential, and suggests further investigations. The system
contains a database of crude probabilities for over 4,500 clinical
manifestations that are associated with over 2,000 different diseases.
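DXplain's actual scoring method is not described here, so the following is only a sketch of the general idea of ranking diagnoses from a table of crude finding probabilities. It swaps in a simple naive-Bayes-style score: the sum of log probabilities of the observed findings given each disease, with an assumed small floor probability for findings the knowledge base does not link to a disease. Disease priors are ignored, and the three-disease knowledge base is a hypothetical toy.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical knowledge base: disease -> {finding: crude P(finding | disease)}.
KB: Dict[str, Dict[str, float]] = {
    "disease_X": {"fever": 0.9, "rash": 0.7, "joint_pain": 0.2},
    "disease_Y": {"fever": 0.5, "joint_pain": 0.8},
    "disease_Z": {"rash": 0.6},
}
FLOOR = 0.01  # assumed probability for a finding not linked to a disease


def rank_diagnoses(findings: List[str]) -> List[Tuple[str, float, List[str]]]:
    """Rank diseases by summed log-probability of the findings.

    Each entry also lists the findings that support the disease,
    as a crude justification for its place in the differential.
    """
    ranked = []
    for disease, profile in KB.items():
        score = sum(math.log(profile.get(f, FLOOR)) for f in findings)
        support = [f for f in findings if f in profile]
        ranked.append((disease, score, support))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    for disease, score, support in rank_diagnoses(["fever", "rash"]):
        print(f"{disease}: {score:.2f} (supported by {support})")
```

With the findings `fever` and `rash`, `disease_X` ranks first, followed by `disease_Z` and then `disease_Y`; working in log space simply avoids multiplying many small probabilities together.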
Table 25.1: A wide variety of expert systems have been placed into routine clinical use. These systems are typical examples.

| SYSTEM | DESCRIPTION |
|---|---|
| ACUTE CARE SYSTEMS | |
| (Dugas et al., 2002) | Decision support in hepatic surgery |
| POEMS (Sawar et al., 1992) | Post-operative care decision support |
| VIE-PNN (Miksch et al., 1993) | Parenteral nutrition planning for neonatal ICU |
| NéoGanesh (Dojat et al., 1996) | ICU ventilator management |
| SETH (Darmoni, 1993) | Clinical toxicology advisor |
| LABORATORY SYSTEMS | |
| GERMWATCHER (Kahn et al., 1993) | Analysis of nosocomial infections |
| HEPAXPERT I, II (Adlassnig et al., 1991) | Interprets tests for hepatitis A and B |
| Acid-base expert system (Pince et al., 1990) | Interpretation of acid-base disorders |
| MICROBIOLOGY/PHARMACY (Morrell et al., 1993) | Monitors renal active antibiotic dosing |
| PEIRS (Edwards et al., 1993) | Chemical pathology expert system |
| PUFF (Snow et al., 1988) | Interprets pulmonary function tests |
| Pro.M.D.-CSF Diagnostics (Trendelenburg, 1994) | Interpretation of CSF findings |
| EDUCATIONAL SYSTEMS | |
| DXPLAIN (Barnett et al., 1987) | Internal medicine expert system |
| ILIAD (Warner et al., 1988) | Internal medicine expert system |
| HELP (Kuperman et al., 1991) | Knowledge-based hospital information system |
| QUALITY ASSURANCE AND ADMINISTRATION | |
| COLORADO MEDICAID UTILIZATION REVIEW SYSTEM | Quality review of drug prescribing practices |
| MANAGED SECOND SURGICAL OPINION SYSTEM | Aetna Life and Casualty assessor system |
| MEDICAL IMAGING | |
| PERFEX (Ezquerra et al., 1992) | Interprets cardiac SPECT data |
| (Lindahl et al., 1999) | Classification of scintigrams |

DXplain is in routine use at a number of
hospitals and medical schools, mostly for clinical education purposes, but is
also available for clinical consultation. It also has a role as an electronic
medical textbook. It is able to provide a description of over 2,000 different
diseases, emphasising the signs and symptoms that occur in each disease, and
it provides recent references appropriate for each specific disease.
Decision support systems need not be ‘stand
alone’ but can be deeply integrated into an electronic patient record system.
Indeed, such integration reduces the barriers to using a decision support system, by
building it more closely into clinical working processes, rather than
expecting workers to create new processes to use it.
The HELP system is an example of this type of
knowledge-based hospital information system, which began operation in 1980
(Kuperman et al., 1990; Kuperman et al., 1991). It not only supports the
routine applications of a hospital information system (HIS) including
management of admissions and discharges and order entry, but also provides a
decision support function. The decision support system has been actively
incorporated into the functions of the routine HIS applications. Decision
support provides clinicians with alerts and reminders, data interpretation and
patient diagnosis facilities, patient management suggestions and clinical
protocols. Activation of the decision support is provided within the
applications but can also be triggered automatically as clinical data is
entered into the patient's computerised record.
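Decision support that is triggered automatically as data is entered can be sketched with a simple event-driven pattern: rules are registered against data items, and storing a value runs the rules attached to that item. This is a hypothetical illustration of the pattern, not the HELP system's actual design; the `PatientRecord` class, the `low_potassium` rule and its cut-off are assumptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A rule receives the data item name and its new value and returns any alerts.
Rule = Callable[[str, float], List[str]]


class PatientRecord:
    """Toy patient record that runs registered rules whenever a value is entered."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self.alerts: List[str] = []
        self._rules: DefaultDict[str, List[Rule]] = defaultdict(list)

    def register(self, item: str, rule: Rule) -> None:
        self._rules[item].append(rule)

    def enter(self, item: str, value: float) -> None:
        self.data[item] = value
        for rule in self._rules[item]:  # only rules attached to this item fire
            self.alerts.extend(rule(item, value))


def low_potassium(item: str, value: float) -> List[str]:
    # Hypothetical cut-off in mmol/L, for illustration only.
    return [f"{item} {value} mmol/L is low"] if value < 3.5 else []


if __name__ == "__main__":
    record = PatientRecord()
    record.register("potassium", low_potassium)
    record.enter("potassium", 2.7)
    print(record.alerts)  # ['potassium 2.7 mmol/L is low']
```

The clinician never invokes the rule directly: the act of recording the result is what triggers it, which is the property that lets such decision support sit inside routine workflow.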
One of the most successful areas in which
expert systems are applied is in the clinical laboratory. Practitioners may be
unaware that while a pathologist has checked the printed report they receive from a
laboratory, the whole report may have been generated by a computer system
that has automatically interpreted the test results. Examples of such systems
include the following.
· The PUFF system for automatic interpretation of
pulmonary function tests has been sold in its commercial form to hundreds of
sites world-wide (Snow et al., 1988). PUFF went into production at Pacific
Presbyterian Medical Centre in San Francisco in 1977, making it one of the very
earliest medical expert systems in use. Many thousands of cases later, it is
still in routine use.
· A more general example of this type of system
is PEIRS (Pathology Expert Interpretative Reporting System) (Edwards et al.,
1993). During its period of operation, PEIRS interpreted about 80-100 reports a
day with a diagnostic accuracy of about 95%. It accounted for about 20%
of all the reports generated by the hospital’s Chemical Pathology Department.
PEIRS reported on thyroid function tests, arterial blood gases, urine and
plasma catecholamines, hCG (human chorionic gonadotrophin) and AFP (alpha
fetoprotein), glucose tolerance tests, cortisol, gastrin, cholinesterase
phenotypes and parathyroid hormone related peptide (PTH-RP).
Laboratory expert systems usually do not
intrude into clinical practice. Rather, they are embedded within the process of
care, and with the exception of laboratory staff, clinicians working with
patients do not need to interact with them. For the ordering clinician, the
system prints a report with a diagnostic hypothesis for consideration, but does
not remove responsibility for information gathering, examination, assessment
and treatment. For the pathologist, the system cuts down the workload of
generating reports, without removing the need to check and correct reports.
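An interpretive reporting rule of the kind used by laboratory systems can be sketched for a single test panel. The example drafts a comment on thyroid function tests for a pathologist to check; the reference intervals are placeholders (each laboratory sets its own), and the small pattern table is a textbook simplification rather than PEIRS's actual knowledge base.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Placeholder reference intervals; each laboratory sets its own.
TSH_RANGE = (0.4, 4.0)     # mIU/L
FT4_RANGE = (10.0, 25.0)   # pmol/L

# Textbook-style patterns keyed by (TSH category, free T4 category).
PATTERNS: Dict[Tuple[str, str], str] = {
    ("high", "low"): "Pattern consistent with primary hypothyroidism.",
    ("low", "high"): "Pattern consistent with hyperthyroidism.",
    ("high", "normal"): "Raised TSH with normal free T4; consider subclinical hypothyroidism.",
    ("normal", "normal"): "Thyroid function tests within reference intervals.",
}


@dataclass
class ThyroidResult:
    tsh: float
    free_t4: float


def categorise(value: float, low: float, high: float) -> str:
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def draft_comment(result: ThyroidResult) -> str:
    """Draft an interpretive comment; a pathologist still checks and corrects it."""
    key = (categorise(result.tsh, *TSH_RANGE), categorise(result.free_t4, *FT4_RANGE))
    return PATTERNS.get(key, "Unusual pattern; flagged for pathologist review.")


if __name__ == "__main__":
    print(draft_comment(ThyroidResult(tsh=12.0, free_t4=6.0)))
```

Patterns the table does not cover fall through to a review flag rather than a guess, mirroring the division of labour described above, where the system drafts and the pathologist checks.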
Learning is seen to be the quintessential
characteristic of an intelligent being. Consequently, one of the driving
ambitions of AI has been to develop computers that can learn from experience.
The resulting developments in the AI sub-field of machine learning have
resulted in a set of techniques that have the potential to alter the way in
which knowledge is created.
All scientists are familiar with the
statistical approach to data analysis. Given a particular hypothesis,
statistical tests are applied to data to see if any relationships can be found
between different parameters. Machine learning systems can go much further.
They look at raw data and then attempt to hypothesise relationships within the
data, and newer learning systems are able to produce quite complex characterisations
of those relationships. In other words they attempt to discover humanly
understandable concepts.
Learning techniques include neural networks,
but encompass a large variety of other methods as well, each with their own
particular characteristic benefits and difficulties. For example, some systems
are able to learn decision trees from examples taken from data (Quinlan, 1986).
These trees look much like the decision trees discussed in Chapter eight, and
can be used to help in diagnosis.
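A minimal sketch of decision-tree induction in the style of Quinlan's ID3: at each node, choose the attribute whose split gives the largest information gain, that is, the entropy of the class labels minus the size-weighted entropy of the subsets after the split, then recurse on each branch. The four cases are a hypothetical toy dataset, and the sketch omits refinements such as numeric attributes, missing values and pruning.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Example = Dict[str, str]
Tree = Union[str, dict]  # a class label (leaf) or {attribute: {value: subtree}}


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples: List[Example], attribute: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder


def id3(examples: List[Example], attributes: List[str], target: str) -> Tree:
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {value: id3([e for e in examples if e[best] == value], rest, target)
                   for value in {e[best] for e in examples}}}


# Hypothetical toy cases.
CASES: List[Example] = [
    {"chest_pain": "yes", "st_elevation": "yes", "dx": "MI"},
    {"chest_pain": "yes", "st_elevation": "no", "dx": "other"},
    {"chest_pain": "no", "st_elevation": "no", "dx": "other"},
    {"chest_pain": "no", "st_elevation": "yes", "dx": "MI"},
]

if __name__ == "__main__":
    print(id3(CASES, ["chest_pain", "st_elevation"], "dx"))
```

On these toy cases `chest_pain` yields no information gain while `st_elevation` separates the classes perfectly, so the learned tree is a single split on `st_elevation`.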
Healthcare has formed a rich test-bed for
machine learning experiments in the past, allowing scientists to develop
complex and powerful learning systems. While there has been much practical use
of expert systems in routine clinical settings, at present machine learning
systems still seem to be used in a more experimental way. There are, however,
many situations in which they can make a significant contribution.
· Machine learning systems can be used to develop
the knowledge bases used by expert systems. Given a set of clinical cases that
act as examples, a machine learning system can produce a systematic description
of those clinical features that uniquely characterise the clinical conditions.
This knowledge can be expressed in the form of simple rules, or often as a
decision tree. A classic example of this type of system is KARDIO, which was
developed to interpret ECGs (Bratko et al., 1989).
· This approach can be extended to explore poorly
understood areas of healthcare, and people now talk of the process of ‘data mining’
and of ‘knowledge discovery’ systems. For example, it is possible, using
patient data, to automatically construct pathophysiological models that
describe the functional relationships between the various measurements.
Hau and Coiera (1997), for instance, describe a learning system that takes real-time
patient data obtained during cardiac bypass surgery, and then creates models of
normal and abnormal cardiac physiology. These models might be used to look for
changes in a patient’s condition if they are applied as soon as they are created.
Alternatively, if used in a research setting, these models can serve as initial
hypotheses that can drive further experimentation.
· One particularly exciting development has been
the use of learning systems to discover new drugs. The learning system is given
examples of one or more drugs that weakly exhibit a particular activity, and
based upon a description of the chemical structure of those compounds, the
learning system suggests which of the chemical attributes are necessary for that
pharmacological activity. Based upon the new characterisation of chemical
structure produced by the learning system, drug designers can try to design a
new compound that has those characteristics. Currently, drug designers
synthesise a number of analogues of the drug they wish to improve upon, and
experiment with these to determine which exhibits the desired activity. By
boot-strapping the process using the machine learning approach, the development
of new drugs can be speeded up, and the costs significantly reduced. At present
statistical analyses of activity are used to assist with analogue development,
and machine learning techniques have been shown to at least equal if not
outperform them, as well as having the benefit of generating knowledge in a form
that is more easily understood by chemists (King et al., 1992). Since such
learning experiments are still in their infancy, significant developments can
be expected here in the next few years.
· Machine learning
has a potential role to play in the development of clinical guidelines. It is
often the case that there are several alternative treatments for a given
condition, with slightly different outcomes. It may not be clear, however, what
features of one particular treatment method are responsible for the better
results. If databases are kept of the outcomes of competing treatments, then
machine learning systems can be used to identify features that are responsible
for different outcomes.
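As a very simple illustration of looking for treatment features associated with different outcomes, the sketch below compares the rate of good outcomes in cases with and without each feature and ranks features by the size of that difference. This is a naive association screen, not a specific published learning method, and on its own it cannot establish that a feature causes a better outcome; the feature names and records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, bool]  # treatment features plus a boolean "good_outcome"


def outcome_rate(records: List[Record]) -> float:
    return sum(r["good_outcome"] for r in records) / len(records)


def rank_features(records: List[Record], features: List[str]) -> List[Tuple[str, float]]:
    """Rank features by difference in good-outcome rate (with minus without)."""
    ranked = []
    for f in features:
        with_f = [r for r in records if r[f]]
        without_f = [r for r in records if not r[f]]
        if with_f and without_f:  # a feature present in all or no cases gives no contrast
            ranked.append((f, outcome_rate(with_f) - outcome_rate(without_f)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical outcome database.
RECORDS: List[Record] = [
    {"step_A": True, "step_B": True, "good_outcome": True},
    {"step_A": True, "step_B": False, "good_outcome": True},
    {"step_A": False, "step_B": True, "good_outcome": False},
    {"step_A": False, "step_B": False, "good_outcome": True},
    {"step_A": False, "step_B": False, "good_outcome": False},
]

if __name__ == "__main__":
    for feature, diff in rank_features(RECORDS, ["step_A", "step_B"]):
        print(f"{feature}: {diff:+.2f}")
```

For these toy records, `step_A` is associated with a good-outcome rate about 0.67 higher than its absence, while `step_B` shows a small negative difference. A learning system proper would also handle interactions between features and confounding, which this screen ignores.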
Many potential benefits from CDSS have been
widely reported in the literature (Johnson & Feldman, 1995; Evans, 1996).
The claims made fall into three broad categories (Sintchenko et al., 2002):
1. Improved patient safety e.g. through reduced medication errors and
adverse events and improved medication and test ordering;
2. Improved quality of care e.g. by increasing clinicians’ available time
for direct patient care, increased application of clinical pathways and
guidelines, facilitating the use of up-to-date clinical evidence, improved
clinical documentation and patient satisfaction; and
3. Improved efficiency in health care delivery e.g. by reducing costs through faster order
processing, reductions in test duplication, decreased adverse events, and
changed patterns of drug prescribing favouring cheaper but equally effective
generic brands.
Evaluations of CDSS are often poorly
conceptualised and implemented (Cushman, 1997; Heathfield et al., 1998). In a
systematic review of 55 CDSS evaluations, Sintchenko et al. (2003) found that
fewer than a quarter involved a randomised controlled trial (Table 25.2).
Table 25.2: Evaluation methodologies used in CDSS evaluation studies (N=55) (Sintchenko et al., 2003).

| Evaluation methodology | % of studies |
|---|---|
| Before/after sample | 27.27% |
| RCT | 23.64% |
| Case-control | 21.82% |
| Case study | 16.36% |
| Qualitative | 5.45% |
| Not done | 3.64% |
| Longitudinal study | 1.82% |
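Because every percentage in Table 25.2 is a share of the same 55 studies, the underlying study counts can be recovered by simple arithmetic. The short check below does this and shows that the RCT share corresponds to 13 studies, just under a quarter of 55 (13.75).

```python
# Percentages from Table 25.2; every study is one of N = 55.
N = 55
PERCENTAGES = {
    "Before/after sample": 27.27,
    "RCT": 23.64,
    "Case-control": 21.82,
    "Case study": 16.36,
    "Qualitative": 5.45,
    "Not done": 3.64,
    "Longitudinal study": 1.82,
}

# Convert each percentage back to a whole number of studies.
counts = {method: round(pct / 100 * N) for method, pct in PERCENTAGES.items()}

if __name__ == "__main__":
    print(counts)
    print(sum(counts.values()), counts["RCT"] < N / 4)  # 55 True
```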
Table 25.3: Limitations of evaluation components of CDSS studies
(Sintchenko et al., 2003).
· A focus on post-implementation evaluation of users’ perceptions of systems.
· A reliance upon retrospective designs, which are limited in their ability to
determine the extent to which improvements in outcome and process indicators
may be causally linked to the CDSS.
· Rare adoption of a comprehensive approach to evaluation, where a multi-method
design is used to capture the impact of CDSS on multiple dimensions.
· Concentration on assessment of technical and functionality issues, which are
estimated to explain less than 20% of IT failures. Such evaluations have also
failed to determine why useful and usable systems are often unsuccessful.
· Expectations that improvements will be immediate. In the short term there is
likely to be a decrease in productivity. Implementing information systems takes
time and measuring their impact is complex, so a long-term evaluation strategy is
required but rarely implemented.
· Almost none use a naturalistic design in routine clinical settings with real
patients, and most studies involved doctors and excluded other clinical or
managerial staff.
Evaluation of CDSS is complex, and there are
many challenges in appropriately structuring such studies (Randolph et al.,
1999). Consequently many studies fall into traps such as overemphasising user
satisfaction as a measure of system success. Some of the most frequent
limitations of CDSS studies are listed in Table 25.3. While CDSS are often
justified on the basis of clinical benefit, evaluation often focuses on
technical issues or on clinical processes.
Measurement of clinical outcomes is still sadly rare amongst evaluation
studies, and most studies that do attempt to measure clinical impact do so
through process variables (Table 25.4).
Nevertheless, the growing pool of evidence on
the impact of CDSS in delivering improvements in the quality, safety and
efficiency of health care is promising, mainly in relation to alerts and reminders,
and PDSSs. The following sections demonstrate not only the value of decision
support systems in clinical practice, but also the complexity of the evaluation
task, the ongoing gaps in our knowledge about their effectiveness, and the
richness and variety of form of decision support.
Table 25.4: Impact measures chosen in CDSS evaluation studies (N=55) (Sintchenko et al., 2003).

| Impact | Impact measured: improvement demonstrated (no. of studies) | Impact measured: no significant impact (no. of studies) | Impact not measured, no. (% of studies) |
|---|---|---|---|
| Process variables | | | |
| Confidence in decision | 12 | 3 | 40 (73%) |
| Patterns of care | 15 | 4 | 36 (66%) |
| Adherence to protocol | 10 | 4 | 41 (75%) |
| Efficiency/Cost | 10 | 2 | 43 (78%) |
| Adverse effects | 12 | 3 | 40 (73%) |