Approved For Release 2000/08/10 CIA-RDP96-00791R000200320001-1

ENHANCING HUMAN PERFORMANCE
Issues, Theories, and Techniques

Daniel Druckman and John A. Swets, Editors

Committee on Techniques for the Enhancement of Human Performance
Commission on Behavioral and Social Sciences and Education
National Research Council

NATIONAL ACADEMY PRESS
Washington, D.C. 1988

NATIONAL ACADEMY PRESS • 2101 Constitution Avenue, NW • Washington, DC 20418

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This report has been reviewed by a group other than the authors according to procedures approved by a Report Review Committee consisting of members of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine.

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Frank Press is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers.
It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Robert M. White is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Samuel O. Thier is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Frank Press and Dr. Robert M. White are chairman and vice chairman, respectively, of the National Research Council.

Library of Congress Cataloging-in-Publication Data
Enhancing human performance : issues, theories, and techniques / Daniel Druckman and John A. Swets, editors.
p. cm.
"Committee on Techniques for the Enhancement of Human Performance, Commission on Behavioral and Social Sciences and Education, National Research Council."
Bibliography: p.
Includes index.
ISBN 0-309-03792-1. ISBN 0-309-03787-5 (soft)
1. Self-realization-Congresses. 2. Performance-Psychological aspects-Congresses. I. Druckman, Daniel, 1939- . II. Swets, John Arthur, 1928- . III. National Research Council (U.S.). Committee on Techniques for the Enhancement of Human Performance.
BF637.S4E56 1987 15"c19 87-31233 CIP

Copyright © 1988 by the National Academy of Sciences
Printed in the United States of America

COMMITTEE ON TECHNIQUES FOR THE ENHANCEMENT OF HUMAN PERFORMANCE

JOHN A. SWETS, Chair, Bolt Beranek and Newman Inc., Cambridge, Mass.
ROBERT A. BJORK, Department of Psychology, University of California, Los Angeles
THOMAS D. COOK, Department of Psychology, Northwestern University
GERALD C. DAVISON, Department of Psychology, University of Southern California
LLOYD G. HUMPHREYS, Department of Psychology, University of Illinois
RAY HYMAN, Department of Psychology, University of Oregon
DANIEL M. LANDERS, Department of Physical Education, Arizona State University
SANDRA A. MOBLEY, Director of Training and Development, The Wyatt Company, Washington, D.C.
LYMAN W. PORTER, Graduate School of Management, University of California, Irvine
MICHAEL I. POSNER, Department of Neurology, Washington University
WALTER SCHNEIDER, Department of Psychology, University of Pittsburgh
JEROME E. SINGER, Department of Medical Psychology, Uniformed Services University of Health Sciences, Bethesda, Md.
SALLY P. SPRINGER, Department of Psychology, State University of New York, Stony Brook
RICHARD F. THOMPSON, Department of Psychology, Stanford University

DANIEL DRUCKMAN, Study Director
JULIE A. KRAMAN, Administrative Secretary
Contents

PREFACE
I OVERVIEW
1 Introduction
2 Findings and Conclusions
3 Evaluation Issues
II PSYCHOLOGICAL TECHNIQUES
4 Learning
5 Improving Motor Skills
6 Altering Mental States
7 Stress Management
8 Social Processes
III PARAPSYCHOLOGICAL TECHNIQUES
9 Paranormal Phenomena
REFERENCES
APPENDIXES
A Summary of Techniques: Theory, Research, and Applications
B Background Papers
C Committee Activities
D Key Terms
E Military Applications of Scientific Information
F Biographical Sketches
INDEX

Preface

The Army Research Institute in 1984 asked the National Academy of Sciences to form a committee to examine the potential value of certain techniques that had been proposed to enhance human performance. As a class, these techniques were viewed as extraordinary, in that they were developed outside the mainstream of the human sciences and were presented with strong claims for high effectiveness. The committee was also to recommend general policy and criteria for future evaluation of enhancement techniques by the Army.
The Committee on Techniques for the Enhancement of Human Performance first met in June 1985. The 14 members of the committee were appointed for their expertise in areas related to the techniques examined. The disciplines they represent include experimental, physiological, clinical, social, and industrial psychology and cognitive neuroscience; one member is a training program director from the private sector. During the next two years, the committee gathered six times; met in toto or in part on several occasions with various representatives of the Army; sent subcommittees to conduct interviews and site visits on several other occasions; and commissioned 10 analytical and survey papers. The committee also examined a variety of materials, including state-of-the-art reviews of relevant literature, reports commissioned by the Army Research Institute, and unpublished documents provided by institutes, practitioners, and researchers.

The report that follows describes the committee's activities, findings, and conclusions. Though cast largely in terms of the sponsor's setting, this report is relevant to other settings, for example, industry. The next few paragraphs present some background.

That the United States Army should be concerned to enhance the performance of its personnel is self-evident. We know that young volunteers must become not only soldiers who do well in battle but also technicians who skillfully operate and maintain complex equipment in peace and war. We are aware, moreover, that personal skills are not enough: individuals are heavily dependent on each other within small groups, and groups of various sizes must work very effectively together to permit survival and ensure success. And, of course, all must be ready to give peak performances in situations of great hardship, uncertainty, and stress.
In the face of these staggering requirements, one must realize that turnover of personnel is high and that the training time available (to impart the necessary cognitive, physical, and social skills) is brief. It comes as no surprise that the Army is on the lookout for techniques that can help enhance human performance. The Army Research Institute is charged with seeking out and developing such techniques: it does so by employing researchers in the human sciences and by supporting appropriate research in universities and other public and private organizations. It focuses largely on promising new techniques as they appear in the mainstream of behavioral, physiological, and social research. However, given the pressures and given a view of mainstream research as slow, narrow, and insufficiently targeted, it also comes as no surprise that some influential officers and certain segments of the Army want to cast a broader net to snare promising enhancement techniques. To do this, they look beyond traditional research organizations and practices to what are viewed as extraordinary techniques. These techniques are thought possibly to provide such unusual benefits as accelerated learning, learning during sleep, superior performance through altered mental states, better management of behavior under stress, more effective ways of influencing other people, and so on. There is also an initiative within the Army to consider techniques based on paranormal phenomena, for example, extrasensory perception to view remote sites and psychokinesis to influence the operation of distant machines.

Along with these urgings to examine, to try, or to implement extraordinary techniques come difficult new problems for those in the Army responsible for evaluation, as well as for those in the Army responsible for personnel and training practices.
One issue is that proponents of such techniques are usually not content with traditional evaluation procedures or scientific standards of evidence, often giving more weight to personal experience and testimony. Furthermore, a typical technique of this kind does not arise from the usual research traditions of experiments published in refereed journals and peer review of cumulated evidence, but rather appears full-blown as a package promoted by a commercial vendor. What does the Army Training and Doctrine Command or the base commander do when the need is great, the package is ready, the claims are for miracles, some senior officers are vocally supportive, and the evaluation criteria are fluid? What do Army intelligence agencies do when the same conditions apply and other nations are said to be active in investigating paranormal effects?

The committee decided to assess a representative set of the techniques in question and resolved to address the surrounding issues in an open-minded and thorough way. We therefore divided ourselves into a number of subcommittees organized according to the behavioral processes addressed by the several techniques: accelerated learning, sleep learning, guided imagery, split-brain effects, stress management, biofeedback, influence strategies, group cohesion, and parapsychology. In addition, a subcommittee on evaluation issues was formed to examine practices and standards relevant to all the techniques. Each chapter of the report was prepared by the appropriate subcommittee, but interactions were frequent and so the report represents a collaborative effort of all the members.

Chapter 1 provides a context for the committee's task and the Army's interest in enhancing performance, characterizes some particular techniques, and introduces some general issues in evaluating them. Chapter 2 presents the committee's findings about the techniques examined and conclusions about appropriate evaluation procedures.
Chapter 3 treats the relevant evaluation issues more systematically and presents the committee's philosophy of evaluation as it pertains to the matter at hand. Chapters 4 through 8 deal with particular techniques but are organized in terms of more general psychological processes. Chapter 9 considers parapsychological techniques.

The report concludes with six appendixes. Appendix A briefly summarizes the key elements of each enhancement technique. Appendix B lists the ten papers commissioned by the committee and their authors. Appendix C lists the members and activities of the subcommittees and also the activities of the committee as a whole. Appendix D lists key terms used in the research on particular techniques. Appendix E discusses the application of scientific research by the military. Appendix F contains biographical sketches of the committee members.

As committee chair, I am now in the pleasant position of recounting the several contributors to the total committee process, a process that went remarkably well. Definition and guidance for the committee's task came primarily from Edgar M. Johnson, director of the Army Research Institute. Administrative and technical liaison was ably provided by project monitor George Lawrence, who worked closely with the committee in its various activities. They were supported well by several senior Army officers, including Colonel William Darryl Henderson, Commander of the Army Research Institute; Major General John Crosby, Assistant Deputy Chief of Staff for Personnel; and General Maxwell R. Thurman, Vice Chief of Staff. The committee met with members of a resource advisory group that included Lieutenant General Robert M. Eken, chair, Deputy Chief of Staff for Personnel; Lieutenant General Sidney T. Weinstein, Assistant Chief of Staff for Intelligence; Dr. Louis M&ameron, Director of Army Research and Technology; Major General Maurice O.
Edmunds, Commander of the Soldier Support Center; and Major General Philip K. Russell, Commander of the Medical Research and Development Command. Among the Army staff who were very helpful to the committee are Colonel John Alexander and Mr. Robert Kims. The names of many others appear in Appendix C.

The committee's two consultants contributed special expertise: Paul Horwitz (of Bolt Beranek and Newman Inc.) joined the site visits of the subcommittee on parapsychology and advised on physical aspects of experiments in that area; James Schroeder (of Southwest Research Institute) attended the committee's meeting at Fort Benning, Georgia, and advised on the application of scientific research by the military (see Appendix E). The committee also received special expertise by commissioning papers. These papers and their authors are listed in Appendix B.

At the National Research Council, David Goslin, executive director of the Commission on Behavioral and Social Sciences and Education, once again provided wise counsel and support. Ira Hirsh, commission chair, and William Estes, also representing the commission, gave valuable advice and encouragement. Thomas Landauer, a member of the NRC's Committee on Human Factors, provided liaison in the areas of our committees' mutual interests. The reviewers of this report gave us a good measure of reinforcement along with helpful critiques. Eugenia Grohman, associate director for reports, lent experience and wisdom to this report. Special gratitude is extended to Christine McShane, the commission's editor: her skillful editing of the entire manuscript contributed substantially to its readability, and the coherence of the volume owes much to her suggestions for organizing the material. Julie Kraman, as administrative secretary to the committee, earned its considerable appreciation for setting up efficient meetings and for handling all manner of tasks graciously and smoothly.
Daniel Druckman, study director of the project, receives the committee's great appreciation for his intellectual contributions across the broad range of topics considered as well as for his logistic support. Working closely with the authors of chapters and commissioned papers, he provided an integration of the several contributions as well as much of the introductory and interstitial material. He also served on two subcommittees in areas of his expertise.

The ultimate debt of anyone who finds this report useful, and my large personal debt, is to the members of the committee. As individuals, their capabilities are broad and deep. As a group, they gave generously and productively of their time, were always engaged, responded to every challenge, and, especially, showed an exceptional talent for reaching consensus in a collegial, advised, and efficient way.

JOHN A. SWETS, Chair
Committee on Techniques for the Enhancement of Human Performance

PART I
Overview

Part I consists of three chapters. Chapter 1 sets the stage for the report. It describes the committee's task, provides background on the Army's interest in enhancement techniques, characterizes specific techniques examined by the committee, and identifies the main issues in evaluating the relation between techniques and human performance.

Chapter 2 presents the committee's findings and conclusions. We draw general conclusions about the process of consideration given to any technique and state specific findings and conclusions for each of the areas of human performance examined.

Chapter 3 presents the committee's philosophy of evaluation as it pertains to enhancement techniques. Some of the issues involved concern the conduct of basic research; others concern the conduct of field tests.
With respect to basic research, issues include the plausibility of inferences about novel concepts, causation, alternative explanations of causal relations, and the generalizability of causal relations. With respect to field tests, a number of questions are of interest: Does the enhancement program meet genuine Army needs? Is the resulting program implementable, given program design and resources? Do unintended side effects limit utility? Is the program more cost-effective than its alternatives? These questions underscore the reality that evaluation research is largely a pragmatic activity influenced by the organizational context in which it occurs.

1 Introduction

THE COMMITTEE'S TASK

At the request of the U.S. Army Research Institute, the National Research Council formed a committee to assess the field of techniques that are claimed to enhance human performance. The Institute asked the Council to evaluate the claims made by proponents of selected existing techniques and to address two additional general questions: (1) What are the appropriate criteria for evaluating claims for such techniques in the future? (2) What research is needed to advance our understanding of performance enhancement in areas related to the proposed techniques? The objectives of the committee's study are to provide an authoritative assessment of these questions for policymakers in research and development who are consumers of the techniques, as well as to consider their possible applications to Army training.

Many of the techniques under consideration grew out of the human potential movement of the 1960s, including guided imagery, meditation, biofeedback, neurolinguistic programming, sleep learning, accelerated learning, split-brain learning, and various techniques to reduce stress and increase concentration.
Many of these techniques have gained popularity over the past two decades, promoted by persons eager to provide answers to problems of human performance or to prosper from them. While often using the language of science to justify their approach, these promoters are for the most part not trained professionals in the social and behavioral sciences. Nonetheless, they do appeal to basic needs for human performance, and the Army, like many other institutions, is attracted to the prospect of cost-effective procedures that can improve performance.

These institutions must evaluate the effects of such procedures, however. Issues include the appropriateness of a quick-fix approach, the distinction between the impact of an experience and actual change, and the plausibility of evidence indicating that something is happening even if the effects are not reproducible or the benefits uncertain.

The more conservative atmosphere in the 1980s is reflected in the way techniques are advanced. Motivation in the 1980s may be primarily entrepreneurial, not ideological, as it was in the 1960s. Advocates focus on relating the techniques to specific tasks, such as marksmanship, foreign language acquisition, fine motor skills, sleep inducement, and even combat effectiveness. Some techniques are in fact rooted in a scientific literature. For these reasons the various techniques have attracted the interest of institutions that have rejected, and would probably continue to reject, countercultural trends in society. Indeed, much attention has been given to these techniques by industrial, government, and military policymakers, as well as by the general public. For this reason especially, it is important to address the issues surrounding the claims made for effectiveness.

Elaborate training programs have grown, nourished by their developers' enthusiasm and salesmanship in a social context receptive to quick cures.
For many of these programs, success in the marketplace is used to justify their approaches. For others, more esoteric concepts, including the role of neurotransmitters, the physics of neuromuscular programming, brain wave patterns, hemispheric laterality, high-access memory storage, preferred sensory modalities, and low-gain innervation of muscles, are used in an attempt to provide scientific justification for the claims. The chapters that follow evaluate the evidence and theories used to support the claims of several popular techniques. Before turning to these evaluations, however, we provide some background on the Army's interest in these techniques, as well as a discussion of issues surrounding enhanced performance and issues in evaluating the relation between techniques and performance.

THE ARMY'S NEEDS

The Army motto, "Be all that you can be," symbolizes the current ethos of the institution, an army of excellence. Emphasis is placed on attaining certain ideals, such as fearlessness, cunning, courage, one-shot effectiveness, fatigue reversal, and nighttime fighting capabilities. These ideals are assumed to be realizable through training, even if the most effective techniques have not as yet been identified. The culture of improvement is further reinforced by the dilemma created by an all-volunteer Army and the demands of complex new computer technologies. Many civilians enter military service with only the required minimum of formal education; most of these volunteers enlist in the Army. For this reason, the Army's emphasis on skill training is well founded.

The importance of the human element in combat is recognized in the Army Science Board's 1983 report "Emerging Concepts in Human Technology," which phrases the issue in terms of high yield at relatively low investment.
Human capital is considered to be the best potential source for growth in Army effectiveness, both in terms of return on investment and as a moral imperative "if we are to commit our soldiers to fight outnumbered and win." The technologies singled out in the report are those that can improve creativity and innovation, learning and training, motivation and cohesion, leadership and management, individual, crew, and unit fitness, soldier-machine interface, and the general productivity of the Army's human resources. The Board's report largely bypasses issues of systematic evaluation of enhancement techniques within the Army context, while addressing mechanisms for integrating them with Army activities. Little concern is shown for adducing relevant criteria to determine whether implementation is feasible. The Army's ambitious goals, combined with a reluctance to deal with the complexities surrounding issues of human performance, make this institution potentially susceptible to a variety of claims made by technique developers. It would therefore seem prudent to devise criteria for evaluating those claims.

A SELLER'S MARKET

Techniques for enhancement of human performance have received much attention in the popular press. They have been actively promoted by entrepreneurs who sense a profitable market in self-improvement. The American Society for Training and Development "estimates that companies are spending an astounding $30 billion a year on formal courses and training programs for workers. And that's only the tip of the iceberg" (Wall Street Journal, August 5, 1986). They are also taken seriously by the U.S. military, who are at times accused of losing the "mind race" to the Soviets (see, for example, Anderson and Van Atta, Washington Post, July 17, 1985).
The Army has shown particular interest in techniques that help people acquire, maintain, or improve such skills as classroom learning, communication and influence, creativity, and accuracy in the execution of tasks requiring motor skills. Those that are cost-effective and produce relatively rapid results are likely to receive the most attention, along with research breakthroughs that could be a basis for new training programs. What are these techniques? What claims are being made for them? Is there evidence that substantiates these claims?

Examples of techniques include biofeedback (information about internal processes), Suggestive Accelerative Learning and Teaching Techniques (a package of methods geared primarily toward classroom learning), hemispheric synchronization (a machine-aided process based on assumptions about right brain-left brain activities), neurolinguistic programming (procedures for influencing another person), and Concentrix (a procedure used to improve concentration on specific targets). Also of interest to the Army are such processes as group cohesion and stress reduction, as well as the claims for sleep learning, peak performance, and parapsychology. Together, these techniques and processes cover the major types of skills: motor, cognitive, and social. Several of them are described here briefly, along with illustrative claims found in brochures and course materials.

Suggestive Accelerative Learning and Teaching Techniques (SALTT) is an approach to training that employs a combination of physical relaxation, mental concentration, guided imagery, suggestive principles, and baroque music with the intent of improving classroom performance. Some applications have included language training, typing instruction, and high school science courses.
Attempts have been made to evaluate these applications, and many of these evaluations are published in the Journal of the Society for Accelerative Learning and Teaching (Psychology Department, Iowa State University). The following is a sampling of claims made in brochures and convention announcements: "A proven method which has broad potential application in U.S. Army training"; "Will significantly reduce training time, improve memory of material learned and introduce behavioral changes that positively affect soldier performance: self-esteem, self-confidence, and mental discipline"; and "Most students will prove to themselves that they have learned a far greater amount of material per unit of time with a greater amount of pleasure than they have ever previously done."

Neurolinguistic programming (NLP) refers to a set of procedures developed to influence and change the behaviors and beliefs of a target person. Its goals are mostly therapeutic, but its proponents also advocate the use of the techniques in advertising, management, education, and interpersonal activities. A small research literature, published primarily in the Journal of Counseling Psychology, has developed. Practitioners can be trained and certified at various institutes, and the National Association for Neurolinguistic Programming distributes a newsletter to its membership, currently about 500 persons. Illustrative claims and testimonials found in advertising materials include: "[NLP] has evolved a unique technology which encompasses a set of specific techniques enabling you to produce well-defined results" and "NLP ... is clear, easy to learn, and brilliant." A typical slogan is that found in a brochure from the Potomac Institutes, Silver Spring, Maryland: "The difference that makes the difference, for education, management, psychotherapy, psychiatry, business, law, health care, and the arts."
Hemi-Sync®, which is short for hemispheric synchronization, is a technique that consists of presenting two tones slightly differing in frequency to separate ears with stereo headphones to produce binaural beats. The long-known result is a tone that waxes and wanes at a frequency equal to the difference between the original tones. Pioneered as an enhancement technique by Robert Monroe of the Monroe Institute of Applied Science in Faber, Virginia, the technique is based on the assumption of a frequency following response (FFR) in the human brain. The FFR refers to a correspondence between sound signals heard by the ear and electrical signals recorded by an electroencephalograph (EEG). It is claimed that, by altering sound patterns, it is possible to alter states of awareness. Stated applications are in the areas of language learning, stress management, reading skills, and creativity and problem solving. Claims of effectiveness stated in the Monroe Institute's brochure are wide-ranging, covering education (e.g., "77.8 percent of a class reported improvement in mental-motor skills"), health (early recuperation, lower blood pressure), psychotherapy (stress reduction, working with terminally ill patients, teaching autistic children), and sleep restorative training (e.g., "forty of forty-five insomniacs reported that one-month use of Hemi-Sync® tapes was at least as effective as medication, without the drug side effects").

SyberVision® is a scripted videotape that presents an expert (e.g., a world-class athlete) repeatedly performing fundamental skills of his or her activity (e.g., golf) without verbal instructions. It is based loosely on principles of vicarious learning, guided imagery, and mental rehearsal. Developed and marketed by SyberVision Systems Inc., San Leandro, California, the package includes a cassette and instruction manual with an appendix on the "simple physics of neuro-muscular programming."
The appendix presents a scientific rationale for the technique, for example, "the more you see and hear pure movement, the deeper it becomes imprinted in your nervous system ... and the more likely you are to perform it as a conditioned reflex," and "The decomposition of what is seen and sensorily experienced into an electromagnetic wave form is accomplished by a complex mathematical operation (Fourier Transform) by the brain" (Instruction Manual on Golf with Patty Sheehan). Support for enhanced performance is, however, based on testimonials rather than experiments, for example, Killy on skiing, a Stanford tennis coach on tennis, Professional Golf Association members on golf, Peters (In Search of Excellence) on achievement, Salk on leadership, and a variety of corporate executives and educators on self-improvement. Claims range from sweeping statements (e.g., "We owe these two men a large debt of gratitude") to rather precise statements (e.g., "In 47 days I have lost 25 pounds [191 to 166], yet I look like I lost 40") (in the United Airlines magazine, Discoveries). This technique involves a significant marketing effort that builds on users' willingness to be quoted and on the use of acknowledged academic experts (e.g., Stanford neuropsychologist Karl Pribram), whose role in the program is advertised as being central.

Stress management techniques are procedures designed to alleviate anxiety or tension. Catering to an age of anxiety, self-help books, groups, and clinics on managing stress proliferate. A good example of the approach is a recent book by Charlesworth and Nathan (1982), which emphasizes fitness, nutrition, managing time, general life-styles and life-cycles, as well as strategies such as progressive relaxation, autogenic training, and imagery rehearsal.
Appendixes provide the reader with home practice charts, a guide to self-help groups, and suggested books and recordings. Such groups offer their members information, emotional support, and a sense of belonging. Often stress management procedures are combined with a number of other techniques into a single package. The promoters emphasize the total package rather than particular techniques; the packages usually combine several processes that, when acting together, are thought to produce significant effects.

The Army's need for techniques that can improve performance makes it subject to the sorts of claims illustrated above. While the Army and other consumers can avoid the more obvious pitfalls, the proliferation of choices and products and the lack of scientific evidence allow marketplace criteria to become the bases for decisions. But there are exceptions. Some techniques have received the attention of the scientific community, and evidence is available to be used as criteria in such areas as biofeedback, guided imagery, sleep learning, cohesion, and even for some aspects of psychic phenomena and neurolinguistic programming.

The literature has alerted us, for example, to the distinction between the effects of biofeedback on fine motor skills and on stress, to the different effects of mental and physical rehearsal, to placebo and Hawthorne effects in stress research, to the priming and repetition effects of material presented during sleep, to some dysfunctions of group cohesion, to the difficulties of replicating experiments on extrasensory perception, and to the implausibility of specialized sensory modalities as postulated by NLP (see Appendix D for key terms). These findings make evident a complex relation between technique and performance.
IMPROVED PERFORMANCE: COMPLEX ISSUES, SIMPLE SOLUTIONS

The research literature in such traditional areas of experimental psychology as learning, perception, sensation, and motivation suggests complex relations between interventions and improved performance. Many technique promoters appear to pay little attention to this literature, preferring an alternative route to invention: rather than derive a procedure from appropriate scientific literature, they create techniques from personal experiences, sudden insights, or informal observation of "what works." Science may enter the process after the technique is developed and used, for example, to legitimize its use or to endorse methods for evaluation. Research follows rather than precedes the invention. This sequence increases the likelihood that important considerations will be missed. We highlight some of these considerations in this section.

The lack of easy avenues to improved performance may well be due to the complexity of the behavior in question. One definition of skills emphasizes the importance of the coordination of behavior: "A skilled response ... means one in which receptor-effector-feedback processes are highly organized, both spatially and temporally. The central problem for the study of skill learning is how such organizations or patterning comes about" (Fitts, 1964:244). This definition implies that skill learning involves an orchestration of diverse processes, making the topic an interesting one to various subfields of psychology. It also makes evident a number of unresolved issues, including whether different skills are learned and retained in different ways. The research findings obtained in this literature contribute to our understanding of the necessary, if not sufficient, conditions for improved performance. Research on skill acquisition addresses such basic questions as What are the stages of learning? and What is learned?
Distinctions made between short-term and long-term memory storage and between schemas and details have contributed to our understanding of basic processes (see Welford, 1976). Other questions have more direct consequences for application: for example, what contributes to the acquisition and maintenance of skills? How can the adverse effects of stress, fatigue, and monotony be avoided? These questions are the basis for programs of research that can be divided into several parts, each defined in terms of empirical issues (Irion, 1969; see also the other chapters in Bilodeau and Bilodeau, 1969). Some examples of empirical issues are practice effects (differences due to distributed versus massed practice, long versus short rest periods, short versus long sessions), the whole-part problem (differences due to learning a task as a whole versus learning it by its constituent elements), feedback (differences due to delays in receiving knowledge of results and to type of information during the delay period), retention (differences due to whether the task is motor or verbal), and transfer of training. These and related considerations suggest that skill learning is an incremental process likely to differ from one type of skill to another. Whether intending to enhance motor, verbal, problem-solving, or social performances, technique designers can ill afford to ignore these lessons from the experimental literature on skill acquisition and maintenance.

It is also the case, however, that the agenda of unexplored issues is much larger than the accomplishments to date, and this is recognized particularly in the rapidly growing field of cognitive psychology, in which the "information-processing revolution" is just beginning. Practical applications are, however, not automatic.
Many excellent applications do not spring from basic science; some are the result of craft and experience. More important perhaps are the indirect contributions made in both directions, from basic to applied and vice versa. A systematic approach taken in both domains serves to vitalize each, as when applied investigations reveal new phenomena that need explanation or when a new package incorporates basic principles discovered originally in the laboratory. Such an approach is likely to facilitate the design of appropriate techniques for skill acquisition. At issue is whether a particular technique can produce and sustain desired changes.

One conclusion from the research accumulated to date is that effective interventions are those that are continuous and self-regulating and take account of both context and person (see, for example, Lerner, 1984). Particularly relevant is the difference between short-term and long-term changes. The effects of many techniques for performance enhancement may be short-lived. This distinction is made by Back (1973, 1987) in his evaluation of the sensitivity training movement. The changes observed by sensitivity trainers and documented by evaluators may well reflect the impact of the experience per se. Such situational effects are unlikely to be sustained in different environments, an observation supported by the literatures in both developmental and social psychology (Druckman, 1971; Frederiksen, 1972). These literatures caution against hasty generalizations from observed, situation-specific effects; they also explain why long-term effects may be difficult to produce with brief exposures to "treatments."
Like the sensitivity trainers of the 1960s and 1970s, many of the promoters (and consumers) of the 1980s pay little attention to issues of causality and intrinsic motivation, preferring instead to dwell on single dimensions of treatments or to offer a mixed package constructed in arbitrary ways and producing diffuse effects that reflect the experience.

The issue of expected benefits from techniques provides a bridge between research and application. Research can be designed to evaluate techniques, as well as to discover possible unintended side effects. Indeed, a research literature has developed in some of the areas examined in this book, namely biofeedback, stress, and guided imagery. For many other techniques, however, a relevant body of research does not exist; this lack applies to some of the techniques examined by the committee, as well as to those yet to appear on the market. It is these techniques that present a problem for us as evaluators. Evaluation without data is difficult, but not impossible. Our approach is to place the techniques into broader categories corresponding to the key processes being influenced, for example, learning, motor skills, and influence. In this way, the claims can be evaluated within the frameworks of existing theories and methodologies. They can also be judged against results obtained in related areas. This approach serves as the organizing theme for the chapters that follow.

EVALUATING THE TECHNIQUES

Evaluations properly hinge on answers to a standard set of questions proposed in a paper entitled "Evaluating Human Technologies: What Questions Should We Ask?" by Hegge, Tyner, and Genser (1983) at the Walter Reed Army Institute of Research:

• What changes will the technique produce?
• What evidence supports the claims for the technique?
• What theories stand behind the technique?
• Who will be able to use the technique?
• What are the implications of the technique for Army operations?
• How does the technique fit with Army philosophy?
• What are the cost-benefit factors?

These questions served as guidelines for the committee's evaluations. Appendix A is a summary description of each technique, organized along the lines of the Hegge, Tyner, and Genser questions, covering theory, research, and application. For many of the categories, however, the desired information is either too limited to be useful or simply not available; in such cases we have considered other strategies for evaluation.

The committee faced a number of difficulties in evaluation that stem from recurrent problems posed by the technologies. One is the tendency for some promoters (and consumers) to rely primarily on testimonials or anecdotal evidence as a basis for application. Another is a general lack of strong research designs to provide evidence of effects. These problems are also considered in the context of specific techniques discussed in the chapters of Parts II and III.

Practitioners of techniques often emphasize the value of personal or clinical experience and marketplace popularity as bases for judging the techniques. They are generally less inclined to seek research evidence or to support research evaluation programs. These attitudes may be related to the fact that few practitioners are trained as researchers. For some it is sufficient to let others do the research. For others, research is viewed, in varying degrees, as a threat to their product. At one extreme, research is regarded as a debunking enterprise, engaged in by scientists who have little interest in providing human services. At another extreme, the problem is one of educating the researchers in nuance, context, and a clinical approach that emphasizes adapting techniques to changed situations and client tastes.
The result is a gap in communication epitomized by two cultures: scientists searching for evidence and practitioners seeking effects and cures. A step toward bridging the gap would consist of mutual education through joint ventures. These ventures would expose scientists to the goals (and motives) of practitioners and would also make practitioners aware of the general analytical approaches used by scientists.

Experimentation is an appropriate vehicle for evaluating performance-enhancing techniques; the problem is usually defined in terms of effects of techniques (procedures) on performance (behaviors). It is also appropriate at an earlier stage in the process, when products are being developed. Products evolve in a kind of trial-and-error fashion similar in some respects to scientific discoveries. One model for integrating research with product development is engineering research and development (R&D). A strenuous applied research effort accompanies the development process in many firms, as does a quality-control program designed to evaluate products both during development and after they have been placed on the market. With a few exceptions, this model has not been adopted by firms or institutions in the field of performance enhancement.

Experimental evidence has accumulated in some areas related to techniques. Although not linked specifically to product development in the manner of an R&D operation, this work does address the question, What evidence supports the claims for the technique? In fact, so strong is the experimental tradition in some areas that a body of work has developed programmatically within a generally accepted paradigm (e.g., guided imagery). The benefits of a long research tradition can be seen in these areas. Meta-analyses have been performed and can be used as a basis for evaluation.
For other areas, we are presented with the prospect of relying on scattered experiments or using other criteria as a basis for evaluation, or both (see Appendix A for summaries of the state of the science in each of the areas).

However, the benefits of experimental evidence derive primarily from the general approach rather than from the particular experiments. This point is captured by Kelman, who noted that "an experimental finding . . . cannot very meaningfully stand by itself. Its contribution to knowledge hinges on the conceptual thinking that has produced it and into which it is subsequently fed back" (1968:161). We emphasize here the contribution of an analytical approach to thinking about behavior, as distinct from the establishment of laws about psychological processes. It is the cumulation of a series of experiments that winnows out the useful parts of treatments or techniques. It is the self-correcting progression of new experiments that refines treatments, saving those that work and discarding those that do not (or that work only under very restricted conditions). This process contributes equally well to the goals of theory development and product development.

Other evaluation criteria elucidated by Hegge, Tyner, and Genser (1983) include theories, uses, and implications for Army operations and philosophy. A problem with these criteria is that they tend to be vague and somewhat idiosyncratic, making it difficult to propose general categories on which most people would agree. Without precisely defined categories for judging techniques, it is difficult to address issues of transfer of performance from one situation to another or to evaluate newly emerging techniques. A similar problem exists with respect to developing taxonomies in broadly defined fields: there is little agreement on a set of categories for the fields of human learning, performance, motivation, perception, and social and organizational processes.
More mature subdisciplines provide an empirical basis for taxonomies, allowing for more tightly constructed systems of tasks and situations: for example, rote learning, short-term memory, concept learning, problem solving, work motivation, and team functions (see Fleishman and Quaintance, 1984). An advantage of such systems is that they capture rather precise relationships between task and performance.

This discussion serves only to introduce the issues and identifies several themes that receive more detailed attention in the chapters to follow. First, any evaluation must take into account the status of the available evidence. Confidence placed in judgments about a technique should be based on the quality of the evidence produced by researchers. Second, the evaluator cannot afford to rely exclusively on a single criterion for judging effectiveness. Theoretical and applied issues are also important, as are considerations of values served or violated by use of the technique. Third, technique development issues are not isolated from research or analytical issues. Each step in the process of product design can be regarded as an empirical issue; decisions made about procedures and packaging can be the result of experimental outcomes. Fourth, the subject of enhancing human performance is not new. It has been a topic of interest for centuries and an area of scientific work for several decades. The literatures on learning and skill acquisition should be consulted by developers, and insights derived from these literatures should be used in product design.

These themes are woven throughout the discussions of specific techniques. Each chapter discusses relevant literature, describes the specific techniques, points to directions for further research when appropriate, and notes possible applications in military and industrial settings.
Despite this common coverage, however, each chapter is also unique in that each is tailored to the particular problems associated with its focus.

Findings and Conclusions

The committee's first major task was to evaluate the existing scientific evidence for a wide range of techniques that have been proposed to enhance human performance. This evaluation was intended by our Army sponsors to suggest guidelines for decision making on Army research and training programs. In our evaluation we draw conclusions with respect to whether more basic or applied research is warranted, whether training programs could benefit from new findings or procedures, and what, in particular, might be worth monitoring for potential breakthroughs of use to the Army. In many of the areas examined it appears feasible to pursue carefully designed programs that build on basic research; however, such programs should be monitored closely.

The committee's second major task was to develop general guidelines for evaluating newly proposed techniques and their potential application. We are aware that the use of basic and applied research in decision making is a complex issue. Although payoffs from basic research can often be realized in the long run, the value of research findings to the Army depends on developing a way of putting them into practice. With regard to applied or evaluation research, further complexities are evident: multiple, sometimes conflicting, criteria must be satisfied at each of several stages in the evaluation process, from assessing a pilot program to implementing the program in an appropriate setting. Another problem is that of choosing among alternative techniques when none of them has been subjected to a systematic evaluation. In the absence of evaluation studies, the Army needs guidelines for selecting packages and vendors.
The committee's evaluation has produced several answers to questions of how best to improve performance in specific areas. On the positive side, we learned about the possibilities of priming future learning by presenting material during certain stages of sleep, of improving learning by integrating certain instructional elements, of improving skilled performance through certain combinations of mental and physical practice, of reducing stress by providing information that increases the sense of control, of exerting influence by employing certain communication strategies, and of maximizing group performance by taking advantage of organizational cultures to transmit values. On the negative side, we discovered a lack of supporting evidence for such techniques as visual training exercises as enhancers of performance, hemispheric synchronization, and neurolinguistic programming; a lack of scientific justification for the parapsychological phenomena considered; some potentially negative effects of group cohesion; and ambiguous evidence for the effectiveness of the suggestive accelerative learning package.

The remainder of this chapter presents the committee's findings and conclusions in two parts: general conclusions regarding the process of evaluating any technique being considered by the Army, and specific findings and conclusions for each of the areas of human performance examined. Whenever appropriate, we make recommendations for research, evaluation, and practice.

GENERAL CONCLUSIONS

The committee suggests that the Army move vigorously, yet carefully and systematically, to implement techniques that can be shown to enhance performance in military settings. Such an effort would be timely because of recent developments in the relevant research areas. Moreover, the payoff is likely to be very high if techniques are selected judiciously.
Although the desire for dramatic improvements in performance makes some extraordinary techniques attractive, techniques drawn from mainstream research in relevant areas of performance may be more effective. The Army's concern for enhancing human performance and its substantial resources for evaluating techniques place it in a favorable position to take advantage of developments. The Army might also consider the possibilities of transferring its findings to the civilian sector.

Collectively, the committee's conclusions call for the adoption of scientifically sound evaluation procedures; however, these procedures must be adapted to institutional needs and must take into account problems of implementation. We summarize these considerations below.

SCIENTIFIC EVIDENCE

Techniques and commercial packages proposed for consideration by the Army should be shown to be effective by adequate scientific evidence or compelling theoretical argument, or both. A technique's utility should be judged in relation to alternatives designed for similar purposes, and the estimated utility should be of significant magnitude. Specific stages of analysis can be incorporated in pilot or field testing, and such testing should be carried out by investigators who are independent of the technique's originators or promoters.

TESTIMONIALS AS EVIDENCE

Personal experiences and testimonials cited on behalf of a technique are not regarded as an acceptable alternative to rigorous scientific evidence. Even when they have high face validity, such personal beliefs are not trustworthy as evidence. They often fail to consider the full range of factors that may be responsible for an observed effect. Personal versions of reality, which are essentially private, are especially antithetical to science, which is a fundamentally public enterprise. Of course, a caution about testimonials should not be confused with a lack of openness to new and unusual ideas. Such openness is consistent with the requirement that the evidential criteria of science be satisfied.

The subject of testimonials as evidence has received considerable attention in recent research on how people arrive at their beliefs. These studies indicate that many sources of bias operate and that they can lead to personal knowledge that is invalid despite its often being associated with high levels of conviction. The committee recommends that this research be disseminated, as appropriate, in the Army. It may then be applied whenever testimony is used as the primary evidence to promote an enhancement technique.

CONDITIONS FOR IMPLEMENTATION

Two kinds of evidence should be sought to support decisions to implement a technique: successful field tests and an analysis of implementability. It would also be useful to analyze the impact of the technique or package on the larger system in which it is to be embedded. These analyses would aid in explaining why the procedures are necessary and why certain consequences are expected. In general, any description of what a technique accomplishes should be accompanied by an explanation of why it accomplishes what it does. Such an explanation would provide a more fundamental understanding of processes affected by exposure to the technique and permit optimal implementation.

RATIONAL DECISION MAKING

The considerations that must be entertained in selecting a technique for practical use in a military setting are different from the considerations needed to verify the existence of an enhancement effect in a scientific setting. For example, the benefits of correct decisions and the costs of incorrect decisions, that is, the risk calculus, may differ in the two settings. Furthermore, what is viewed as a timely decision will also differ.
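One way to see why the risk calculus can differ between settings is a simple expected-value comparison. The Python sketch below is our own illustration, not a procedure proposed by the committee: it assumes a single probability that a technique is effective, a benefit if it is, and a cost if it is not, and every number and setting name in it is hypothetical. Adoption pays off only when that probability exceeds cost / (benefit + cost), so the same strength of evidence can justify a decision where errors are cheap and fail to justify it where errors are expensive.

```python
from dataclasses import dataclass

@dataclass
class Setting:
    """Hypothetical payoffs (arbitrary units) for acting on a technique."""
    name: str
    benefit_if_effective: float  # gain when the technique really works
    cost_if_ineffective: float   # loss (time, money, risk) when it does not

def expected_value(p_effective: float, s: Setting) -> float:
    """Expected payoff of adopting, given the probability the technique works."""
    return p_effective * s.benefit_if_effective - (1 - p_effective) * s.cost_if_ineffective

def adoption_threshold(s: Setting) -> float:
    """Probability of effectiveness at which adoption exactly breaks even."""
    return s.cost_if_ineffective / (s.benefit_if_effective + s.cost_if_ineffective)

def should_adopt(p_effective: float, s: Setting) -> bool:
    """Adopt only if the expected payoff is positive."""
    return expected_value(p_effective, s) > 0

if __name__ == "__main__":
    p = 0.6  # same strength of evidence in both settings (hypothetical)
    trial = Setting("research trial", benefit_if_effective=10, cost_if_ineffective=5)
    field = Setting("operational use", benefit_if_effective=10, cost_if_ineffective=30)
    for s in (trial, field):
        print(f"{s.name}: threshold={adoption_threshold(s):.2f}, adopt={should_adopt(p, s)}")
```

With these made-up payoffs the break-even probability is about 0.33 in the first setting and 0.75 in the second, so evidence that a technique works with probability 0.6 clears one bar but not the other.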
The specific differences as they apply to particular decisions should be made explicit.

MECHANISMS FOR ADVICE

It would be useful to provide valid information about useful techniques to Army commanders and other interested staff on a regular basis. Special consideration should be given to ways in which technique-related information can be transferred from scientists to practitioners. The characteristics of a transfer agent could be defined, and such a position might be established within an appropriate office.

The committee recommends that the Army Research Institute formalize the ways in which it receives and provides advice about specific techniques. A committee to review experimental designs and statistical analyses could be convened to improve the evaluation of techniques. Special and standing committees could also be used to make program recommendations and to review proposals for intramural and extramural research.

BIDDING PROCEDURES

Purchase by the Army of a commercial enhancement package should take place within the context of a set of well-defined procedures. The committee recommends that an open-bid procedure be followed, based on a full presentation of the Army's stated objectives. This would encourage competitive evaluation of techniques. The following information, presented in a standard format, should be required: the objectives of the technique, a description of its procedures, evidence that it produces the claimed effects, and the vendor's record of past achievements in relevant areas.

Lack of professional training and research experience in human performance by a designer or advocate should not preclude consideration of the proposed package; it should, however, signal the need for a more stringent analysis by the Army.

SPECIFIC FINDINGS AND CONCLUSIONS

We present below findings and conclusions for each of the areas investigated.
Some statements take the form of suggested actions based on what we know; others consist of suggestions for more work or for research that has not yet been done.

LEARNING DURING SLEEP

1. The committee finds no evidence to suggest that learning occurs during verified sleep (confirmed as such by electrical recordings of brain activity). However, waking perception and interpretation of verbal material could well be altered by presenting that material during the lighter stages of sleep. We conclude that the existence and degree of learning and recall of materials presented during sleep should be examined again as a basic research problem.

2. Pending further research results, the committee concludes that possible Army applications of learning during sleep deserve a second look. Findings that suggest the possibility of state-dependent learning and retention (i.e., better recall of material when it is recalled in the same physiological and mental state in which it was learned) may be applicable to fatigued soldiers. Furthermore, even presentations of material that disrupt normal sleep may be cost-effective, as may presentations that coincide with stages of light sleep.

ACCELERATED LEARNING

1. Many studies have found that effective instruction is the result of such factors as the quality of instruction, practice or study time, motivation of the learner, and the matching of the training regimen to the job demands. Programs that integrate all these factors would be desirable. We recommend that the Army examine the costs, effectiveness, and longevity of training benefits to be derived from such programs and compare them with established Army procedures.

2. The committee finds little scientific evidence that so-called super-learning programs, such as Suggestive Accelerative Learning and Teaching Techniques, derive their instructional benefits from elements outside the mainstream of research and practice.
We observe, however, that these programs do integrate well-known instructional, motivational, and practice elements in a manner that is generally not present in most scientific studies.

3. We find that scientifically supported procedures for enhancing skills are not being sufficiently used in training programs and make two recommendations to remedy this problem. First, the basic research literature should be monitored to identify procedures verified by laboratory tests to increase instructional effectiveness. Second, additional basic research should be supported to expand the understanding of skill acquisition for both noncombat and combat activities.

4. We conclude that the Army training system provides a unique opportunity for cohort testing of training regimens. The Army is in a position to create laboratory classroom environments in which competing training procedures can be scientifically evaluated.

5. The committee recommends that the Army investigate expert teacher programs by identifying and evaluating particularly effective programs within the Army. In addition, transferable elements of effective instruction can be reported to the larger instructional community.

IMPROVING MOTOR SKILLS

1. The committee concludes that mental practice is effective in enhancing the performance of motor skills. This conclusion suggests further work in two directions: (1) evaluation studies of motor skills used in the Army and (2) research designed to determine the combination of mental and physical practice that, on average, would best enhance skill acquisition and maintenance, taking into account both time and cost.

2. The committee concludes that programs purporting to enhance cognitive and behavioral skills by improving visual concentration have not been shown to be effective to date. In our judgment, these programs are not worth further evaluation at this time.

3. The committee concludes that existing data do not establish the generality of observed effects from programs that train visual capabilities to increase performance.

4. Similarly, the committee concludes that the effects of biofeedback on skilled performance remain to be determined.

5. The committee recommends additional research to establish the potential of these techniques in the domain of specific skilled performances.

ALTERING MENTAL STATES

1. Time did not allow the committee to explore the evidence for a wide variety of specific methods for relating mental states to changes in performance. Such methods include forms of self-induced hypnotic states and peak performance resulting from high levels of focused concentration and meditation. We recommend that reviews of the literature in these areas be undertaken to ascertain whether any practical results might be obtained by the use of such methods.

2. The committee finds that, while the study of mental computations in language and imagery has progressed in recent years, the effort to understand how such computations are modulated by energetic factors such as arousal, stress, emotion, and high levels of sustained concentration has not been fully developed. For example, the claims that certain mental states produce general improvements in performance derive from the idea, supported by research, that arousal affects mental computations and that there ought to be an optimal level of arousal for the performance of such computations. We recommend this as an important area for investment of basic research funds.

3. The committee's review of the appropriate literature refutes claims that link differential use of the brain hemispheres to performance. Further evaluation of these claims depends on developing valid and reliable measures of hemispheric involvement.

4. The committee finds no scientifically acceptable evidence to support the claimed effects of techniques intended to integrate hemispheric activity, for example, Hemi-Sync®. Attempts to increase information-processing capacity by presenting material separately to the two hemispheres do not appear to be useful. We conclude that such techniques should be considered further by the Army only if scientific evidence is provided to and evaluated by the Army Research Institute.

STRESS MANAGEMENT

1. Existing data indicate that stress is reduced by giving an individual as much knowledge and understanding as possible regarding future events. In addition, giving the individual a sense of control is effective. On the basis of these findings, the committee recommends a systematic program of research and development that would address three questions: (1) How relevant is this finding for stress reduction in the Army? (2) To what extent does stress reduction realized in training transfer to combat situations? (3) What are the limitations on providing knowledge and understanding of future events and a sense of control in the Army setting? Pending the outcome of this research, we suggest that consideration be given to including the material in training programs for company grade, field grade, command, and staff officers.

2. We find that, while biofeedback can achieve a reduction of muscle tension, it does not reduce stress effectively. It is therefore not a promising research topic in that respect. We recommend that funding be directed toward investigation of more promising stress management procedures.

3. We recommend that information be gathered on the costs of stress in terms of organ breakdown, loss of efficiency, and loss of time. This information would have implications for training programs.

INFLUENCE STRATEGIES

1. The committee finds no scientific evidence to support the claim that neurolinguistic programming is an effective strategy for exerting influence. We advise that further Army study of this aspect of NLP be made only in comparison with other techniques.

2. There are no existing evaluations of NLP as a model of expert performance. We conclude that further investigation of such models may be worthwhile and suggest that NLP be examined in comparison with several other techniques.

3. Concerning the process of technology transfer, we recommend that studies be conducted to develop training regimens for those who train others to wield social influence. The large literature on this topic in social psychology would provide a basis for such packages.

GROUP COHESION

1. We find few scientific studies that address the possible relationship between group cohesion and performance; however, such a relationship may well be found with more extensive research. There is a need for research to consider the possibility of negative effects from inducing cohesion and methods of avoiding such effects. The committee recommends continued study of cohesion and related group processes.

2. We are favorably impressed with the evaluation studies of the

Evaluation Issues

Implementation of an enhancement technique, in the committee's view, should depend on two general kinds, or levels, of evaluation. The first examines primarily the scientific justification for the effectiveness of the technique and the potential of the technique for improving performance in practice. The second kind examines field tests of a pilot program incorporating the technique to determine how feasible it is and to what extent it brings about effects that Army officials consider useful.
Convincing scientific justification can come only from basic research, that is, from carefully controlled studies that usually take place in laboratory settings and that preferably are related to a body of theory. Such research can provide evidence for the existence of the causal effect on which a technique is based and can help explain, or indicate a mechanism for, the effect. Analysis in connection with basic research should go beyond scientific justification to operational potential and likely cost-effectiveness. Only field tests can assess a program's actual operations and effects, however, and for such tests a broader array of evaluative criteria is needed, related primarily to the technique's utility.

Because strong claims of support from basic research have been made for some of the techniques the committee examined, we review here what it takes to justify a scientific claim; specifically, we review some standards for evaluating basic research. We then examine in more detail some standards for evaluating field tests of pilot programs. In the third section of this chapter, we set forth briefly some of our impressions of how the Army now manages the solicitation and evaluation of new performance-enhancing techniques. This chapter concludes with a note on informal, qualitative approaches to evaluation, which are sometimes suggested as alternatives to basic research and field tests.

This chapter does not aspire to a comprehensive treatment of evaluation issues, and it barely touches on research methods. Articles, journals, books, and handbooks testify to the scope and complexity of this burgeoning field (e.g., Barber, 1976; Cook and Campbell, 1979). Our objective here is to highlight the topics that have impressed us as most germane.
The various sources just mentioned would need to be consulted for even a minimal elaboration of these topics, and other committees would be required if recipes for evaluation of the Army's enhancement programs were sought as extensions of our work. Still, we believe this chapter will help the Army set general evaluation standards.

STANDARDS FOR EVALUATING BASIC RESEARCH

The purpose of basic research is to permit inferences to be drawn in accordance with scientific standards, including inferences about novel concepts, about causation, about alternative explanations of causal relations, and about the generalizability of causal relations.

For novel concepts, evidence must be gathered that both the purported enhancement technique and the relevant performance have been (1) defined in a way that highlights their critical elements, (2) differentiated from related variables that might bring about similar effects, and (3) put into operation (manipulated or measured) in ways that include the critical parts. The burden is on the evaluator to analyze how the components of each new technique differ from concepts already in the literature. The need for this standard is illustrated well by packages for accelerated learning, as discussed in Chapter 4.

Evidence also needs to be adduced that supposed cause and effect variables vary together in a systematic manner. Relevant procedures include comparison of performance before and after introduction of the technique, contrasts of experimental and control groups in an experimental design, and calculation of statistical significance. Illusory covariation can occur more easily in nonstatistical studies, which are often used to support the existence of paranormal effects, as discussed in Chapter 9.

Especially demanding is the need for evidence that the performance effect observed is due to the postulated cause and not to some other variable. Ruling out alternative explanations or mechanisms requires intimate knowledge of a research area.
Historical findings and critical commentary are needed to identify alternatives, determine their plausibility, and judge how well they have been ruled out in particular sets of experiments. Common threats to the validity of any presumed cause-effect relation include effects stemming from subject selection, unexpected changes in organizational forces, the spontaneous maturation of subjects, and the sensitizing effects of a pretest measurement on a posttest assessment. Experiments with random assignment of subjects to treatments are preferred, but some of the better quasi-experimental designs are also useful. Another class of threats to validity is associated with subject reactions to such conceptual irrelevancies as experimenter expectations about how subjects should perform or subjects' performing better merely because they are receiving attention. Procedures that have evolved to reduce this sort of threat include double-blind experiments, placebo control groups, mechanical delivery of treatments, and the elimination of all communication between experimenters and subjects or among subjects. These safeguards, however, are not certain, and implementing them is not a simple matter.

Finally, for a technique to be of value, one must ascertain that a causal relation observed in one setting is likely to be observed in other settings in which the technique is to be employed. Replication of an experiment by an independent investigator is a first step. Another step is to produce the cause and effect with different samples of people, settings, and times. Systematic reviews of the literature, perhaps aided by what is referred to as meta-analysis of studies (as illustrated in Chapter 5), are also helpful.
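The committee does not prescribe a particular meta-analytic procedure. As a minimal sketch of one common choice, a fixed-effect model with inverse-variance weights, the Python below pools effect sizes from several replications so that more precise studies carry more weight. The function name, the 1.96 multiplier for an approximate 95 percent interval, and the three effect sizes are illustrative assumptions, not data from any study the committee reviewed.

```python
import math


def fixed_effect_pool(effects, variances, z=1.96):
    """Pool study effect sizes with inverse-variance weights (fixed-effect model).

    Returns the pooled effect and an approximate confidence interval;
    z=1.96 gives roughly 95 percent coverage under a normal approximation.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need at least one study and one variance per effect")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")
    weights = [1.0 / v for v in variances]   # more precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)       # standard error of the pooled effect
    return pooled, (pooled - z * se, pooled + z * se)


# Hypothetical standardized effect sizes from three independent replications
# of one technique, with their sampling variances (illustrative values only).
effects = [0.40, 0.10, 0.25]
variances = [0.04, 0.02, 0.05]
pooled, (low, high) = fixed_effect_pool(effects, variances)
print(f"pooled effect = {pooled:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```

A fixed-effect model assumes every study estimates one common effect. When replications differ in people, settings, and times, a random-effects model, which adds a between-study variance component, is often the more cautious choice.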
Beyond these steps, a thorough theoretical understanding of causal processes, which is a fundamental goal of science, permits increased practical control.

Our point (perhaps obvious to many but nonetheless needing emphasis here) is that a planned or existing program for implementing an enhancement technique is much more likely to bear fruit if evidence for the technique's effectiveness is properly derived from basic research. A complex set of ground rules exists for conducting and drawing inferences from basic research, and waiving those rules greatly increases the chances of incorrect conclusions.

STANDARDS FOR EVALUATING FIELD TESTS OF PROGRAMS

An adequate appraisal of an actual enhancement program requires attention to three general factors. First, the organizational (i.e., political, administrative) context in which the program is embedded should be described. That context strongly influences the choice of evaluation criteria, the types of evaluations considered feasible, and the extent to which evaluation results will be used. Second, the program's consequences should be described and explained, including planned and unplanned, short-term and long-term consequences. The way the program is construed influences the claims resulting from an evaluation and the degree of confidence that can be placed in what was learned. Third, value or merit should be explicitly assigned to a program. Valuing relates an enhancement technique to an Army need and to feasible alternatives. In the following sections we comment on these three factors in turn.

THE ORGANIZATIONAL CONTEXT

A description of the broader context of an enhancement program would include an assessment both of the various constituencies with a stake in its implementation and of the priorities of the larger institution.
We do not discuss stakeholder interests in general at this point because we refer to some specifically later in this chapter, in the section on the committee's impressions of current Army evaluation practices. We do comment here on the Army's institutional priorities as they may relate to scientific standards.

We understand that the Army, like other organizations in society, may have (and quite possibly should have) different standards for evaluating knowledge claims, or technique effectiveness, than science has. The scientific establishment is conservative in the tests it administers to discipline its conjectures; in particular, its goal is to reduce uncertainty as far as possible, no matter how long that takes. In the Army, by contrast, the need for timely information and decisions may lead to an acceptance of greater uncertainty and a higher risk of being wrong.

There is no Army doctrine of which we are aware concerning the degree of risk that is acceptable in evaluations of pilot programs. Yet surely one objective of evaluations of pilot programs should be to describe the costs to the Army of drawing incorrect conclusions so that inferential standards can be made commensurate with those costs. If the costs are relatively low, the riskier approach of most commercial research (as, for example, in management consulting or marketing) may be preferred to the more conservative approach of basic science.

DESCRIBING A PROGRAM'S CONSEQUENCES

In evaluating a program, it is desirable to present an analysis and a defense of the questions probed and not probed, together with justification for the priorities accorded to various issues. Primary issues include the program's immediate effects and its organizational side effects.

Immediate Effects

A primary problem in evaluation is to decide on the criteria by which a program is to be assessed. The major sources for identifying potential
The major sources for identifying potential 28 ENHANCING HUMAN PERFORMANCE criteria include program goals, interviews with interested persons, con- sideration of plausible consequences found in the literature, and insights gained from preliminary field work. ,r- Such criteria specify only potential effects, however. They do not ,Teak to the matter of whether the relation between a supposed cause and effect is truly causal. In this respect, a fundamental issue of gethodology is the use of randomized experiments. Although logistic fja~sons abound in any practical context for not going to the trouble to ase such research designs, one might nonetheless argue that the Army is a better position to conduct randomized experiments than are organi- &tions in such fields as education, job training, and public health. The ctason for going to such trouble is that randomized experiments give a %wer risk of incorrect causal conclusions than the alternatives. (D Alternatives at the next level of confidence are quasi-experimental gesigns that include pretest measures and comparison (control) groups. %elatively little confidence can be placed either in before-after measure gents of a single group exposed to a technique without an external (Lomparison, or in comparisons of nonequivalent intact groups for which etest measures are not available. 2r < 0 Side Effects Q Unintended side effects include impacts on the broader organization, d these should be monitored. For example, trainers from other (non- ,(,ZPxperimental) units may copy what they think is going on, or they may mply be upset by the implementation of new instructional packages in Re experimental units. Units not treated in the same way as the Cgxperimental units may be unwilling to cooperate when cooperation &Vould seem to be in their best interest. They may also suffer by gomparison, as is thought to be the case, for example, when COHORT 7ignits are introduced into a division (see Chapter 8). 
Evaluators should strive to see any program as fitting into a wider system of Army activities on which it may have unintended positive or negative effects.

ASSIGNING VALUE TO PILOT PROGRAMS

The described consequences of a program tell us what a program has achieved but not how valuable it is. Three other factors are important in inferring value: Does the new technique meet a demonstrable Army need to the extent that without it the organization would be less effective? How likely is it that the program can be transferred to other Army settings, either as a total package or in part? How well does the new program fare when compared with current practice and with alternatives for bringing about the same results?

Meeting Needs

Representatives of the commercial world who seek outlets for their products often confound wants with needs, enthusiasm with proof, and hope with reality. While it is axiomatic that all field tests should aim to meet genuine Army needs, it is not clear how needs are now assessed when the developers of new products approach Army personnel for permission to do general research or field tests. It is clear that a needs analysis should be part of the documentation about every field test.

What should a needs analysis look like? At the minimum, it should document the current level of performance at some task, why the level is inadequate, what reason there is to believe that performance can change, and what the Armywide impacts would probably be if the performance in question were improved. In addition, an analysis should question why a particular program is needed for solving the problem.
Such an analysis would describe the program, critically examine its justification in basic research, identify the financial and human resources required to make the program work, relate the resources required to the funds available, examine other ways of bringing about the same intended results, and justify the program at hand in terms of its anticipated cost-effectiveness. To facilitate critical feedback, such reports should be independent of the persons who sponsor a program, though based on a thorough, firsthand acquaintance with the program and its developers and sponsors.

As just described, needs analysis is a planning exercise to justify mounting a pilot program. It is not a review of program achievements relative to needs, for which a description of a program's consequences is required. At that later stage in evaluation a judgment is required about whether the magnitude of a program's effects is sufficient to reduce needs to a degree that makes a practical difference. More is at stake than whether the program makes a statistically reliable difference in performance. Size of effect relative to need is the crucial concern. When the magnitude of change required for practical significance has been specified in advance, it is easy to use such a specification to probe how well a need has been met. But the level of change required to alleviate need is not usually predetermined, and there are political reasons why developers are not always eager to have their programs evaluated in terms of effect sizes they themselves have clearly promised or that others have set for them.

Needs can be specified only by Army officials, and it is vital that such officials inspect the results a program has achieved, relating them to their perception of need.
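To make the distinction between statistical reliability and size of effect relative to need concrete, the Python sketch below compares a hypothetical treated group with a control group. It reports the raw gain, a standardized effect size (Cohen's d with a pooled standard deviation), and Welch's t statistic, then checks the raw gain against a required improvement assumed to have been fixed in advance by Army officials. The scores, the function names, and the 5-point threshold are all illustrative assumptions.

```python
import math
from statistics import mean, stdev


def effect_summary(treated, control):
    """Return the raw mean gain, Cohen's d (pooled SD), and Welch's t statistic."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)   # sample standard deviations
    gain = mean(treated) - mean(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    welch_t = gain / math.sqrt(s1**2 / n1 + s2**2 / n2)
    return gain, gain / pooled_sd, welch_t


def meets_need(gain, required_gain):
    """Practical test: compare the raw gain with a threshold fixed in advance."""
    return gain >= required_gain


# Hypothetical task scores (same units as the need), not data from the report.
treated = [72, 73, 74, 73, 72, 74, 73, 73]
control = [70, 71, 72, 71, 70, 72, 71, 71]
gain, d, t = effect_summary(treated, control)
print(f"gain = {gain:.1f} points, d = {d:.2f}, Welch t = {t:.2f}")
print("meets assumed 5-point need:", meets_need(gain, required_gain=5.0))
```

With these made-up scores the t statistic is about 5.3, so the difference would pass a conventional significance test, and even the standardized effect size is large because the scores vary little. Yet the 2-point gain falls well short of the assumed 5-point requirement; only the comparison with a prespecified need reveals that shortfall.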
Since the Army is heterogeneous, it would be naive to believe that there are no significant differences within it about how important various needs are and how far a particular effect goes in meeting a particular need. Some theorists relate needs primarily to the number of persons performing below a desired level, while others emphasize the seriousness of consequences for unit performance, for which deficiencies in only one or two persons may be crucial. Some practitioners are likely to think a deficit in skill X is worse than a deficit in skill Y, while others may believe the opposite. Evaluators who take the concept of need seriously have to take cognizance of such heterogeneity, perhaps using group approaches like the Delphi technique to bring about consensus on both the level of need and the extent to which a particular pattern of evaluative results helps meet that need.

Likelihood of Transfer

Although some local commanders may sponsor field trials for the benefit of their command alone, the more widely a successful new practice can be implemented within the Army, the more important it is likely to be. Consequently, evaluations of pilot programs should seek to draw conclusions about the likelihood that findings will transfer to populations and settings different from those studied.

In this regard, it is particularly important to probe the extent to which any findings from a pilot study might depend on the special knowledge and enthusiasm of those persons who deliver or sponsor the program. Such persons are often strongly committed to a program, treating it with a concern and intensity that most regular Army personnel could not be expected to match. While it is sometimes possible to transfer such committed persons from one Army site to another in order to implement a program, in many instances this cannot be done.
Transfer is partly a question of the psychology of ownership; authorities who did not sponsor a product will sometimes reject out of hand what others have developed, including their immediate predecessors. Since Army leaders in any position turn over with some regularity due to transfers, promotions, and retirement, successors will probably not identify with a program as strongly as the original sponsors and developers did.

The likelihood of transfer also affects the degree to which program implementation is monitored. Pilot programs are likely to be more obtrusively monitored than other programs. Not only is this obtrusiveness

effectiveness analysis lends itself better than what is called cost-benefit analysis to the comparison of different programs. The purpose of cost-effectiveness research is to express the total cost for each program in dollar terms and to relate this to the amount of effect as expressed in its original metrics, unlike cost-benefit research, in which even the effects have to be expressed in dollar terms. Sophisticated consumers of evaluation should want something akin to cost-effectiveness knowledge, for it reflects decisions they should be making. Is it not useful to know, for example, that the best available computer-assisted instruction packages are much less cost-effective than peer tutoring?

CURRENT STATUS OF ARMY EVALUATIONS

We set forth here some of our impressions of the way in which the Army currently manages the solicitation and evaluation of novel techniques to enhance performance. We must stress that these are only impressions, gained through the limited investigative capabilities of a committee such as ours, not hard conclusions based on systematic research directed at the particular question. Furthermore, although the
opinions that follow are largely critical of Army procedures, they are not accompanied by much detail. As noted earlier, the focus here is on the identification of the various Army constituencies that have a stake in enhancement programs and on the role they play in evaluation.

How the Army decides which among competing proposals should be sponsored for development or for field tests is not clear. What is clear is that decision making is diffuse, both geographically and institutionally. Sponsorship may come from senior managers in the Pentagon or from local personnel of varying rank. While differences in the quality of program design, implementation, or evaluation may be correlated with the source of sponsorship, such a correlation is not clear at present in the Army context.

A particular concern is that Army sponsors of pilot programs may base their judgment about the value of a program either on their own ideas about what is desirable or effective or on the persuasiveness of the arguments presented to them by program developers, who stand to gain financially if the Army adopts their program. Judgments of value should depend on broader analysis of Army needs and resources, as well as on realistic assessment of the quality of proposed ideas based on a thorough and independent knowledge of the relevant research literatures. Sponsors should examine what is being advocated at every stage: proposal, testing, and implementation.

Also of concern when pilot programs are planned is how decisions are reached about funding and about the quality of implementation expected from them. Although systematic evidence is lacking, it seemed to committee members that pilot programs are not generally implemented well and, except for fiscal accountability, are not closely monitored by their Army sponsors.
Evaluations of pilot programs should try to characterize resources required by the program and the resources actually available.

We found little evidence that sponsors, advocates, or local implementers aspired to evaluations that use state-of-the-art methods. We found no guidelines about the standards expected for evaluative work, whether in the form of published minimal standards or published statements of preferred practices. When it comes to field trials of novel ideas for enhancing human performance, the monitoring of evaluation quality does not seem to be part of the organizational context. Given the absence of formal expectations in these regards, it is not surprising that the pilot programs we saw and the evaluation materials we read were usually disappointing in the technical quality of the research conducted.

In settings in which program sponsors or advocates control an evaluation, weaker evaluations (e.g., based on testimony) will sometimes be preferred to stronger methods (e.g., experiments) because the latter are usually more disruptive when implemented and are more likely to result in effects that are disappointing, however much more accurate they may be. The weaker methods are easier to implement when few units are available, are less disruptive of ongoing activities, are easier to manipulate for self-interested ends, and need not be as expensive for data collection.

We saw little evidence that the Army requires evaluations by persons independent of the pilot program under review. Moreover, the nonindependent evaluations we saw did not seem to have been subjected to any of the peer review procedures to which research results (and plans) are subjected not only in the academic sciences, but also in much of the corporate world, as with, say, pharmaceutical testing.
While in-house evaluation is highly valuable for gaining feedback for program improvement, many experienced evaluators contend that it is inadequate for assigning overall value because in-house evaluators cannot divorce themselves from their own stake in the program under examination. Although it is not easy to specify organizational standards adequate for a high-quality field test of some novel technique, it is also not difficult to detect the inadequacies associated with local program sponsors' having few clear expectations about the desirable qualities of program operations or evaluative practices. In the absence of such expectations, program developers and evaluators may believe that few officials care about the small-scale field tests of techniques on which the developers' (and, all too often, the evaluators') own welfare depends.

Since the organizational climate we have just described is not optimal for gaining trustworthy information about program value, future evaluators of Army field trials might do well to characterize (1) what program managers expect in terms of the quality of the program and its evaluation, (2) who is paying attention to the trials, and (3) for what purposes they want to use any information provided by the evaluation. This kind of information, as mentioned above, contributes to a description of the organizational context of a program, which is a major part of an adequate evaluation.

QUALITATIVE APPROACHES

Alternatives to experimentation are the largely qualitative traditions, which rely mostly on direct observation, sometimes supplemented by archival data. Investigative journalists operate in this mode; so do many cultural anthropologists, political scientists, and historians.
These professions use clues to suggest hypotheses about possible causes and investigate the empirical evidence in ever-greater detail in an attempt to rule out hypotheses until they are left with just one. A critical aspect of their work is the use of substantive theories and ad hoc findings from the past to help in ruling out alternative explanations. Also working in this tradition are committees of psychologists who seek to make statements about the causes of enhanced human performance. Rarely conducting studies themselves, they instead sift through historical evidence provided by reviews of the literature and make on-site observations in the manner of detectives, pathologists, investigative journalists, and cultural anthropologists.

These traditions rely strongly on personal testimony. Respondents' reports are taken seriously and, indeed, should be. Any method can, in principle, generate strong causal evidence, provided that plausible alternatives to a preferred hypothesis have been ruled out. The general issues are: Can personal testimony usually rule out all the plausible alternative interpretations? Does use of it engender the very threats to validity that militate against strong inferences? Dale Griffin, in a paper prepared for the committee (see Appendix B), suggests "no" to the first question and "yes" to the second. His analysis of biases that operate when people attempt to explain how and why they changed after an experience reveals many of the shortcomings associated with relying on testimony as a major means of testing causal hypotheses.

While testimony can be regarded as a form of confirmatory evidence, it does not provide any of the disconfirming evidence needed to reduce

PART III

Parapsychological Techniques

Of all the subjects treated in this volume, none is more controversial than parapsychology.
While the flavor of the debates is captured to some extent in this chapter, the subject is treated in the same manner as the other techniques reviewed: we address the question of whether the evidence warrants further consideration of parapsychological techniques for research or application or both. Emphasized here is information gathering by remote viewing and mind-over-matter effects in controlling machine behavior, particularly machines that generate series of random numbers, which are often used in parapsychology experiments. Although scattered results are said to be statistically significant, an evaluation of a large body of the best available evidence does not support the contention that these phenomena exist. If, however, future experiments, conducted according to the best possible methodological standards, are more generally viewed as producing significant results, it would be appropriate to consider a systematic program of research. Such a program should include a concern for the need to proceed from small effects to practical applications.

Paranormal Phenomena

BACKGROUND

The primary purpose of this chapter is to evaluate the scientific evidence on parapsychological techniques in selected areas. A more complete understanding of the topic, however, requires that we provide background on the military's interest in these phenomena and treat the conceptual issue of how people come to believe as they do. This background section includes a discussion of the phenomena and the military's interest in them as well as an overview of the committee's focus. A brief examination of the different kinds of justifications for the claims is followed by a more detailed treatment of the evidence in areas that have produced large literatures: remote viewing, random number generators, and what are called Ganzfeld (whole visual field) experiments.
In addition, we describe experimental work that the committee actually witnessed by visiting a parapsychological laboratory. Despite the growing scientific tradition in some of these areas, many people continue to rely on qualitative or experiential evidence to support their beliefs; we discuss the problems associated with qualitative evidence in conjunction with the research on cognitive and emotional biases, which is reviewed in the paper by Dale Griffin (Appendix B). Finally, the chapter summarizes the committee's major conclusions.

THE NATURE OF THE PHENOMENA

Parapsychologists divide psi (the term applied to all psychic phenomena) into two broad categories: extrasensory perception (ESP) and psychokinesis (PK). Included in ESP are telepathy, precognition, and clairvoyance, all of which refer to methods of gathering information about objects or thoughts without the intervention of known sensory mechanisms. Popularly called mind over matter, PK refers to the influence of thoughts upon objects without the intervention of known physical processes.

A presentation to the committee by several military officers described in some detail the results of experiments in remote viewing carried out at both SRI International and the Engineering Anomalies Research Laboratory at Princeton University. In these experiments subjects are said to have more or less accurately described a geographical location being visited by a target team. Although the human subjects have no way of normally knowing the target location, the examples recounted appear to indicate, at first glance, some striking correspondences between their descriptions and the actual sites. These studies have been related by some persons to reported out-of-body experiences.
The presentation included discussion of psychic mind-altering techniques, the levitation claims of transcendental meditation groups, psychotronic weapons, psychic metal bending, dowsing, thought photography, and bioenergy transfer. It was indicated that the Soviet Union is far ahead of the United States in developing potential applications of such paranormal phenomena, in particular psychically controlling and influencing minds at a distance. At the presentation, personal accounts were given of spoon-bending parties, in which participants believe they have caused cutlery to bend with the power of their minds, as well as instances of self-hypnosis to control pain and cure illness, walking barefoot on fire and handling hot coals without being burned, leaving one's body at will, and bursting clouds by psychic means.

The media and popular publications, especially in recent years, have discussed various aspects of psychic warfare. Three recent books, by Ebon (1983), McRae (1984), and Targ and Harary (1984), have attempted to document Soviet and American efforts to develop military and intelligence applications of alleged paranormal phenomena. These accounts have been augmented by newspaper stories, magazine articles, and television programs. Many of these sources acknowledge the speculative nature of the proposed applications, but others report that some of the techniques already exist and work.

The claimed phenomena and applications range from the incredible to the outrageously incredible. The "antimissile time warp," for example, is supposed to somehow deflect attack by nuclear warheads so that they will transcend time and explode among the ancient dinosaurs, thereby leaving us unharmed but destroying many dinosaurs (and, presumably, some of our evolutionary ancestors). Other psychotronic weapons, such as the "hyperspatial nuclear howitzer," are claimed to have equally bizarre capabilities.
Many of the sources cite the claim that Soviet psychotronic weapons were responsible for the 1976 outbreak of Legionnaires' disease, as well as the 1963 sinking of the nuclear submarine Thresher.

POTENTIAL MILITARY APPLICATIONS

Some people, including some military decision makers, can imagine potential military applications of the two broad categories of psychic phenomena. In their view, ESP, if real and controllable, could be used for intelligence gathering and, because it includes precognition, ESP could also be used to anticipate the actions of an enemy. It is believed that PK, if realizable, might be used to jam enemy computers, prematurely trigger nuclear weapons, and incapacitate weapons and vehicles. More specific applications envisioned involve behavior modification; inducing sickness, disorientation, or even death in a distant enemy; communicating with submarines; planting thoughts in individuals without their knowledge; hypnotizing individuals at a distance; psychotronic weapons of various kinds; psychic shields to protect sensitive information or military installations; and the like. One suggested application is a conception of the "First Earth Battalion," made up of "warrior monks," who will have mastered almost all the techniques under consideration by the committee, including the use of ESP, leaving their bodies at will, levitating, psychic healing, and walking through walls.

THE COMMITTEE'S FOCUS

Although such colorful examples provide the context for our agenda, the cumulative body of data in the discipline of parapsychology enables us to judge the degree to which paranormal claims should be taken seriously. Since 1882, reports of both naturally occurring incidents and phenomena in laboratory settings have been accumulated in journals, monographs, and books. Just to survey the reports in the refereed journals of parapsychology would be an enormous undertaking.
As scientists, our inclination is, of course, to restrict ourselves to the evidence that purports to be scientific. But the alleged phenomena that have apparently gained most attention and that have apparently convinced many proponents do not come from the parapsychological laboratory. Nothing approaching a scientific literature supports the claims for psychotronic weaponry, psychic metal bending, out-of-body experiences, and other potential applications supported by many proponents. The phenomena are real and important in the minds of proponents, so we attempt to evaluate them fairly. Although we cannot rely solely on a scientific data base to evaluate the claims, their credibility ultimately must stand or fall on the basis of data from scientific research that is subject to adequate control and is potentially replicable.

We divided the task into two parts. First, we looked at the best scientific arguments for the reality of psychic phenomena. Our sponsors, as well as our own appraisal of the current status of parapsychology, indicated that the two most influential scientific programs were the experiments on remote viewing and the experiments on psychokinesis using random number generators. In addition, we looked at the research on the Ganzfeld (whole visual field) because this, in the opinion of many parapsychologists, is the most likely candidate for a replicable experiment. We also report on a parapsychological experiment that the committee itself witnessed.

Second, we considered the arguments of proponents who rely on what we may call qualitative as opposed to quantitative evidence for the paranormal. Such evidence depends on personal experience or the testimony of others who have had such experience. Most, if not all, of this evidence cannot be evaluated by scientific standards, yet it has created compelling beliefs among many who have encountered it.
Witnessing or having an anomalous experience can be more powerful than large accumulations of quantitative, scientific data as a method of creating and reinforcing beliefs. Because personal experience rather than scientific data has been the source of most beliefs in the paranormal, we have devoted some of our resources to considering this sort of cognitive method as a tool for achieving knowledge.

STANDARDS OF EVIDENCE

Diverse justifications have been offered for pursuing paranormal claims. One argument asserts that paranormal phenomena may no longer be anomalous, given the implications of contemporary quantum mechanics. Indeed, a few physicists have supported some parapsychologists in maintaining that certain forms of precognition and psychokinesis are consistent with some interpretations of quantum theory. The other major argument is that we have no choice but to get involved because the Soviet Union already has a program to develop military applications of psychic phenomena.

Several proponents, including some scientists, firmly believe that paranormal phenomena have been scientifically demonstrated several times over. At the same time, most scientists do not believe that psi exists. Many persons on both sides believe this paradox to be the result of irrational and dogmatic belief systems. The proponents accuse the critics of being closed-minded and bigoted. The critics imply that the proponents have allowed wishful thinking to bias their judgment and that they are incompetent scientists and are self-deceived. Both sides can point to examples to back their positions.

One essential question confronts the committee: What does an impartial examination of the scientific evidence reveal about the existence of psi? Such an examination assumes that clear standards exist for judging the adequacy of the evidence, which, in turn, raises the issue of what constitutes sufficient evidence.
That issue involves many difficult philosophical, theoretical, and methodological matters. For example, Palmer, in his "An Evaluative Report on the Current Status of Parapsychology" (1985), denies that current parapsychological experiments can provide any evidence for the existence of psi. This is because psi implies paranormality and, according to Palmer, we cannot argue that a given effect has a paranormal cause until we have an adequate theory of paranormality. He further argues, however, that parapsychological experiments can and do provide evidence for the existence of anomalies. By an anomaly, Palmer means a statistically significant deviation from chance expectation that cannot readily be explained by existing scientific theories. The burden of Palmer's paper is that just such anomalies have been demonstrated.

Because parapsychologists other than Palmer do not make this distinction between demonstrating an anomaly and testing a theory of paranormality, we do not carry this distinction into our own assessment of the evidence. We tend to agree with Palmer on this matter, however. When we talk about evidence for psi in the remainder of this chapter, we are using psi in the neutral sense of an apparent anomaly rather than in the stronger sense of a paranormal phenomenon.

MINIMAL CRITERIA

Fortunately, critics and parapsychologists appear to agree on the general requirements necessary to demonstrate psi in a parapsychological experiment. Both Palmer (1985) and James E. Alcock (Appendix B) discuss such criteria in their respective papers. As Palmer points out, psi is defined negatively as a statistical departure from a chance baseline that cannot be accounted for by chance, sensory cues, or known artifacts. Such a negative definition implies the minimal criteria required to justify a conclusion that psi has been demonstrated.
Given the statistical aspect, it is imperative that the data be collected in such a way that the underlying probability model and assumptions of the statistical test are fulfilled. This means that targets must be adequately randomized and that each trial in the experiment must be independent of the preceding ones; and, of course, the statistical procedures must be applied and interpreted correctly. Given that all ordinary explanations must be ruled out, the experimenter must take special precautions to ensure that sensory cues, recording errors, subject fraud, and other alternatives have been prevented. Although it is impossible to rule out completely every possible contaminant or to anticipate every alternative, there are reasonable standards that most parapsychologists would agree should be followed.

Because different research paradigms have their own special requirements, no single set of standards can be specified in advance for all parapsychological experiments. Experiments with electronic number generators, for example, rarely have problems with data recording, but they do require special methods, such as tests of randomness and attention to the immediate physical environment, that are unnecessary with more traditional parapsychological experiments. One requirement for assessing the adequacy of a given experiment is that its procedures and methods of analysis be adequately documented. Unless we know how the targets were selected, how the results were analyzed, how the possibility of sensory leakage was prevented, and how other such aspects of the study were carried out, we have no basis for evaluating the quality of the information provided by the experiment.

GLOBAL CRITERIA

The criteria mentioned in the preceding paragraphs apply to the individual experiment. More global criteria come into play when one wants to evaluate an entire research program or set of experiments.
Here we look for such things as replicability, robustness, lawfulness, manipulability, and coherent theory. These criteria deal with the coherence and intelligibility of the alleged phenomena. It is in terms of such global criteria that parapsychological research has been especially vulnerable.

Much of the objectivity involved in assessing the adequacy of research lies in judging individual experiments. But science is cumulative and depends not so much on the outcome of a single experiment as on consistent and lawful patterns of results across many experiments carried out in a variety of independent settings. Lawful consistency in this sense, according to both parapsychologists and their critics, has never been found in parapsychological investigations in the history of psychic research. Recently a few parapsychologists have expressed the hope that experiments on remote viewing, random number generators, and the Ganzfeld (the very ones we have chosen to examine in detail in this report) may actually yield the long-sought replicability. The type of replicability that has been claimed so far is the possibility of obtaining significant departures from the chance baseline in only a proportion of the experiments, which is a kind of replicability quite different from the consistent and lawful patterns of covariation found in other areas of inquiry.

Despite the fact that scientific progress in a given area depends on the accumulation of lawful and consistent patterns across many experiments, the methods for deciding that such consistency exists are still quite primitive in comparison with the standards for judging the adequacy of a single experiment. Indeed, it is only within the past few years that serious attention has been devoted to developing objective and standardized procedures for evaluating the consistencies across a body of independent studies.
For the most part, judgment about what a body of investigations demonstrates is still a surprisingly intuitive and haphazard process. This probably has not been a serious drawback in those areas of inquiry in which the basic phenomena are robust and experiments can be conducted with high confidence that the predicted relations will be obtained; but such impressionistic means for aggregating the outcomes of several experiments in the domain of parapsychology open the door to all the motivational and cognitive biases discussed in the paper prepared for the committee by Griffin. Not only are the data and alleged correlations erratic and elusive in this field, but their very existence is open to question.

EVALUATION OF THE SCIENTIFIC EVIDENCE

To evaluate the best scientific evidence on the existence of psi, and with the advice of proponents and our sponsors, we conducted site visits to some of the most notable parapsychological laboratories. The parapsychology subcommittee (see Appendix C) visited Robert Jahn's Engineering Anomalies Research Laboratory at Princeton University, where it witnessed presentations and demonstrations regarding psychokinetic experiments on random number generators. Jahn and his associates also briefed the subcommittee on the current status of their work in remote viewing. The subcommittee also visited Helmut Schmidt's laboratory at the Mind Science Foundation, San Antonio, Texas. Schmidt pioneered the use of random number generators in parapsychology experiments in 1969. His is considered one of the two major research programs on psychokinesis (the second is Jahn's).

As an additional possible input, the committee agreed to participate in a psychokinetic experiment of new design with Helmut Schmidt. Specifically, Schmidt accepted the suggestion that the committee's consultant, Paul Horwitz, be included in the conduct of the experiment.
The work has not yet begun, however, and it now appears that we will not have any results to report before our terms expire.

The chair of the parapsychology subcommittee also visited SRI International, another major laboratory studying psychic effects on random number generators. (This latter research group argues that the observed effects are not due to psychokinesis but rather represent a special form of precognition.) The subcommittee chair also attended the meetings of the Parapsychological Association held at Sonoma State College in California. The entire committee made a site visit to Cleve Backster's laboratory in San Diego (arranged to coincide with the committee's meeting in La Jolla, California).

These site visits enabled the committee to observe firsthand the experimental arrangements and equipment used by some of the major contributors to parapsychological research. They also provided us an opportunity to discuss results, interpretations, and problems with a few important investigators. We were impressed with the sincerity and dedication of these investigators and believe that they are trying to conduct their research in the best scientific tradition. We also got the impression that this type of research involves many unresolved problems and still has a long way to go before it develops standardized, easily replicable procedures. The information obtained from these site visits does not provide an adequate basis for making scientific judgments. For this we rely, as we would in other fields of science, on a careful survey of the literature.

RESEARCH ON REMOTE VIEWING

The SRI Remote Viewing Program

Since the early 1970s, probably the best known research program in parapsychology has been the experiments in remote viewing initiated by physicists Harold Puthoff and Russell Targ when they were at SRI International.
In a typical remote viewing experiment a subject, or percipient, remains in a room or laboratory with an experimenter, while a target team visits a randomly selected geographical site (e.g., a shopping mall, an outdoor arena, the Palo Alto airport, the Hoover Tower). Neither the experimenter nor the subject has been given any information about the target. Once the experimenter and the subject are closeted in the laboratory, they wait for 30 minutes before the subject begins to describe his or her impressions of the

to replicate the Schlitz and Gruber experiment without the flaw mentioned. One, still unpublished, produced negative results. The second, by Schlitz and Haight (1984), produced marginally significant results. Indeed, if the more acceptable two-tailed test of significance had been used, the results would not have been considered significant by customary standards. Although the report of this study lacks sufficient documentation with respect to certain aspects of procedure, both Palmer (1985) and Alcock agree that this is the best controlled and most methodologically sound of the remote viewing experiments so far.

In summary, after approximately 15 years of claims and sometimes bitter controversy, the literature on remote viewing has managed to produce only one possibly successful experiment that is not seriously flawed in its methodology, and that one experiment provides only marginal evidence for the existence of ESP. By both scientific and parapsychological standards, then, the case for remote viewing is not just very weak, but virtually nonexistent. It seems that the preeminent position that remote viewing occupies in the minds of many proponents results from the highly exaggerated claims made for the early experiments, as well as the subjectively compelling, but illusory, correspondences that experimenters and participants find between components of the descriptions and the target sites.
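Whether a "marginally significant" result such as Schlitz and Haight's survives depends on whether the test is one-tailed (the direction of the effect was predicted in advance) or two-tailed. The following is a minimal sketch, assuming a standard normal test statistic; the helper name `p_values` and the value z = 1.80 are hypothetical illustrations, not figures reported for that study. For the same statistic the two-tailed p-value is twice the one-tailed one, so a result just past the .05 line one-tailed can fall short of it two-tailed.

```python
import math


def p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed, two-tailed) p-values for a standard normal statistic z.

    The one-tailed value assumes the effect was predicted in the observed direction.
    """
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| >= |z|)
    return upper_tail, two_tailed


# Hypothetical statistic for illustration only; NOT the Schlitz and Haight (1984) value.
z_hypothetical = 1.80
one, two = p_values(z_hypothetical)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
print("significant at .05?  one-tailed:", one < 0.05, " two-tailed:", two < 0.05)
```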
RESEARCH ON RANDOM NUMBER GENERATORS

The Basic Paradigm

The use of random number (or random event) generators for parapsychological research began in the 1960s and became relatively standard during the 1970s as the technology became widely available. A random number generator (RNG) is simply an electronic device that uses either radioactive decay or electronic noise to generate a sequence of random symbols. Originally such devices were used to test ESP, usually clairvoyance or precognition, but the most widespread and widely known work focuses on what is called micropsychokinesis, or micro-PK. In such research a subject, or operator, attempts to mentally bias the output of the random number generator, so that it produces a nonrandom sequence.

Most of the work with RNGs has used binary generators, or what Schmidt calls "electronic coin flippers." The output on each trial is either 0 or 1, that is, heads or tails. If the RNG is unbiased and truly random, then it should produce, on control runs, sequences of 0s and 1s that are independent of each other and that, in the long run, will yield 1s 50 percent of the time.

In a typical experiment, a subject (either a person who claims to be a psychic or a person chosen for availability who does not make such claims) is placed in the vicinity of the RNG and attempts to bias the output either toward more or fewer 1s. When an animal is used as the subject, the RNG output is usually coupled to an outcome whose frequency the animal presumably would like to either increase or decrease. In an experiment carried out with cockroaches, for example, one outcome was electric shock. If, during the time the output of the RNG was coupled with the shock apparatus, the proportion of shocks decreased below 50 percent, this would be taken as evidence of a psychokinetic effect of the cockroach on the output of the RNG.
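A control run should show both properties described above: roughly 50 percent 1s, and independence between successive outputs. The following is a minimal sketch of two textbook checks, assuming a seeded software pseudo-random generator as a stand-in for a hardware device; `frequency_z` and `runs_z` are hypothetical helper names, and these are not necessarily the tests Schmidt or the Princeton group applied. The alternating sequence shows why a frequency check alone is not enough: it contains exactly 50 percent 1s yet is perfectly predictable.

```python
import math
import random


def frequency_z(bits: list[int]) -> float:
    """Frequency (monobit) check: z-score of the count of 1s against a 50 percent baseline."""
    n = len(bits)
    ones = sum(bits)
    return (ones - n / 2) / math.sqrt(n / 4)


def runs_z(bits: list[int]) -> float:
    """Wald-Wolfowitz runs check: z-score of the number of runs (maximal blocks of
    identical symbols) against its expectation if successive outputs are independent.
    Assumes both 0s and 1s occur in the sequence."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(variance)


# Assumption for illustration: a seeded software generator stands in for a hardware
# "electronic coin flipper" (real devices use radioactive decay or electronic noise).
rng = random.Random(1969)
control_run = [rng.getrandbits(1) for _ in range(100_000)]
print(f"control run: frequency z = {frequency_z(control_run):.2f}, runs z = {runs_z(control_run):.2f}")

# A deliberately non-random, strictly alternating sequence passes the frequency
# check (exactly half 1s) but fails the runs check by a huge margin.
alternating = [i % 2 for i in range(100_000)]
print(f"alternating: frequency z = {frequency_z(alternating):.2f}, runs z = {runs_z(alternating):.2f}")
```

In practice a battery of such checks would be run on control data collected under the same conditions and at the same time as the experimental trials.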
The RNG experiments have been of interest to some military and governmental personnel because of the possibility, if such micro-PK is demonstrable, of psychically affecting equipment and computers that depend on the output of electronic symbols.

Results of the Experiments

In a recent survey of 56 reports published between 1969 and 1984 and dealing with research on possible psychokinetic perturbations of binary RNGs (Radin, May, and Thomson, 1985), the reviewers counted 332 separate experiments. Of the 332 experiments, 188 were reported in refereed journals or conference proceedings, and of these 188 experiments with some claim to scientific status, 58 reported statistically significant results (compared with the 9 or 10 experiments that would be expected by chance). The other 144 experiments were produced by the Engineering Anomalies Research Laboratory at Princeton University; none of them had been published in a refereed journal at the time of the survey. Of these 144 experiments, 13 were classified as yielding statistically significant results. So, in the total sample of 332 experiments, 71 yielded ostensibly significant results at the traditional .05 level. This amounts to a success rate of approximately 21 percent, compared with the rate of 5 percent that would be expected by chance.

Palmer (1985) and Alcock agree that such results cannot be accounted for by chance. In other words, both the parapsychologist and the skeptic, in their respective reviews of the RNG research, agree that something other than accidental fluctuation is producing these results. Palmer calls this something an anomaly, which, while it may or may not be paranormal, cannot be explained by current scientific theories.
Alcock points to various defects in the experimental protocols and concludes that no conclusions about the origins of these departures from randomness are justified until successful outcomes can be more or less consistently produced with adequately designed and executed experiments.

Both Palmer and Alcock focus their reviews on the two most influential research programs on RNGs. One is the program of Helmut Schmidt, a quantum physicist who began working on psi and RNGs in 1969. The other is the program begun by Robert Jahn in the late 1970s, when he was dean of the School of Engineering and Applied Science at Princeton University (see Jahn, 1982). These two programs have accounted for almost 60 percent of all known experiments on RNGs. They have also been the most consistently successful in achieving statistically significant outcomes.

Although the results suggest that on each experimental group of trials the number of 1s is greater or less than the 50 percent baseline (depending on the intended direction), the actual degree of deviation from chance is quite small. As Palmer (1985) indicates, Schmidt's subjects have averaged approximately 50.5 percent hits over the years, compared with the expected baseline of 50 percent. This amounts to producing one extra 1 every 100 trials. The reason such a small departure from chance is statistically significant is that an enormous number of trials is conducted with each subject.

Jahn and his colleagues at Princeton have, in a much shorter time, produced on the order of 200 times the number of trials that Schmidt did in 17 years. The Princeton researchers have also produced a significantly lower success rate than Schmidt. In their formal series of 78 million trials, the percentage of hits in the intended direction was only 50.02 percent, or an average of 2 extra hits every 2,500 trials.
Again, such an extremely weak effect is statistically significant only when one is dealing with very large numbers of trials.

Scientific Assessment of the RNG Experiments

Palmer (1985) carefully reviews the major criticisms of the work of Schmidt and Jahn. He addresses questions about security, because subjects often are left alone with the apparatus during the data collection. In the Princeton experiments, the data are always collected when the subject is alone with the apparatus. Although the Princeton experiments now contain a number of features that would make it extremely difficult for a naive subject to bias the results, it is not clear that this has always been so. It would make good scientific sense to conduct some trials during which the subject is carefully monitored to see if successful outcomes are still obtained.

The major reservations about the RNG experiments concern the adequacy of the randomization of the outputs. Schmidt applied only limited tests for the randomness of his machines, and most of the control trials were gathered by allowing the machine to run for long periods, usually overnight. Although these controls usually produced results in line with the chance baseline, critics have pointed out that the controls are unsatisfactory because they were not conducted for shorter runs and at the same time as the data from the experimental sessions. Palmer grants that the critics are correct in pointing out some of the shortcomings in Schmidt's methods for testing and controlling for the randomization of his machines. Palmer also correctly points out that such criticism is somewhat blunted by the fact that the critics have not specified any plausible mechanisms that would account for the obtained differences between the experimental and control trials.
He is correct in pointing out that the Princeton experiments provide more adequate controls; however, he has probably assumed that the baseline controls in the Princeton experiments were run at the same time as the two experimental conditions of hitting and missing. It is easy to interpret the somewhat ambiguous description of the procedure in this manner. The relevant part of the authors' methodological description is as follows (Nelson, Dunne, and Jahn, 1984:9):

"The primary variable in these experiments is the operator's pre-recorded intention to shift the trial counts to higher or lower numbers. This directional intention may be the operator's choice (the so-called "volitional" mode) or it may be assigned by a specified random process (the "instructed" mode). In either mode, data are collected in a "tri-polar" protocol, wherein trials taken under an intention to achieve high numbers (PK+), trials taken under an intention to achieve low numbers (PK-), and trials taken as baseline, i.e. under null intention (BL), are interspersed in some reasonable fashion, with all other operating conditions held identical. For all three streams of data, effect size is measured relative to the theoretical chance mean. This tri-polar protocol is the ultimate safeguard in precluding any artifacts such as residual electronic biases or transient environmental influences from systematically distorting the data."

At first glance it might appear as if the tripolar protocol requires that the two types of experimental groups of trials and the baseline group of trials always be taken at the same session. This would be consistent with the claim that "any artifacts such as residual electronic biases or transient environmental influences" were thereby precluded "from systematically distorting the data."
Such a claim would be justified if, in fact, at each session one group of trials of each of the three types was obtained, provided that each group of trials was of the same length and that the order of the three types of trials was independently randomized for each session. The description provided by Nelson and his colleagues says nothing at all about the order in which the three conditions were conducted, and a careful reading indicates that the baseline data may not always have been obtained at the same sessions and under the same conditions as the experimental groups of trials. It is not clear what the authors mean by stating that the three types of trials "are interspersed in some reasonable fashion." In fact, an examination of the data reported for each subject makes it clear that the strict tripolar protocol could not possibly have been followed with much of the data collection, because in many cases the baseline data are entirely absent or occur with many fewer trials than the experimental data. Indeed, it is not even clear that PK+ and PK- trials were always obtained at the same sessions, because for some subjects the total numbers of these trials are not equal.

We suspect that, over the six years or so during which the Princeton group was accumulating its data base, it made many changes in both the hardware and the experimental protocol. The sophisticated procedures currently in use and the requirement that the three types of trials be of equal length and that one of each be conducted at each session are the most recent variations in the paradigm. Unfortunately, the data are not presented in such a way that it is possible to determine whether the successful results are due to the earlier or the later experiments.
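The conditions under which the tripolar safeguard would hold (one group of each trial type per session, groups of equal length, and an order randomized independently for each session) are easy to state as a schedule generator. The following is a minimal sketch under those assumptions only; the function names, the number of sessions, and the group size are hypothetical and do not describe the Princeton group's actual software.

```python
import random
from collections import Counter

CONDITIONS = ("PK+", "PK-", "BL")  # high-intention, low-intention, and baseline trials

def make_session_schedule(n_sessions, trials_per_group, seed=None):
    """One group of each condition per session, equal group sizes, order shuffled per session."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_sessions):
        order = list(CONDITIONS)
        rng.shuffle(order)  # a fresh, independent random order for this session
        schedule.append([(condition, trials_per_group) for condition in order])
    return schedule

def totals_by_condition(schedule):
    """Total trials per condition; under the strict protocol all three totals are equal."""
    totals = Counter()
    for session in schedule:
        for condition, n_trials in session:
            totals[condition] += n_trials
    return totals

schedule = make_session_schedule(n_sessions=5, trials_per_group=1000, seed=1)
for number, session in enumerate(schedule, start=1):
    print(f"session {number}: {[condition for condition, _ in session]}")
print(dict(totals_by_condition(schedule)))
```

The totals check mirrors the argument in the text: under this protocol the PK+, PK-, and BL totals must be equal for every subject, so missing baseline data or unequal PK+ and PK- totals show that the strict protocol was not followed.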
Such issues become especially important when we consider the extremely small size of the effect being claimed and when we further realize, as Palmer has pointed out, that the bulk of the significance in the formal series was due to just one subject, who contributed 23 percent of the total data. This one subject achieved a hit rate of 50.5 percent. When her data are eliminated, the remaining data yield a hit rate of 50.01 percent, which is no longer significantly different from chance. In other words, it looks as if almost all the success of Jahn's huge data base can be attributed to the results from one individual, who, over the years, produced almost 25 percent of the data. This one individual was not only the most experienced subject, but also, presumably, familiar with the equipment. When combined with the fact, as Palmer points out, that the Princeton experiments provide inadequate documentation on precautions to prevent tampering by subjects, it becomes even more important to see if the same degree of success can be achieved when the sessions are adequately monitored.

Alcock, in his review of the same RNG studies surveyed by Palmer, points to a number of weaknesses in both the Schmidt and the Princeton experiments. For example, he faults Schmidt's experiments for such things as inadequate controls, failure to examine the target sequences, overcomplicated experimental setups, inadequate tests of randomness, and lack of methodological rigor. Alcock faults the Princeton experiments for such things as failing to randomize the sequence of groups of trials at each session, inadequate documentation on precautions against data tampering, and possibilities of data selection. Palmer and Alcock do not really differ in their assessments of the shortcomings of the Schmidt and Princeton RNG experiments. They do differ, however, on what conclusions can be drawn from such imperfect experiments.
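The concentration of the Princeton effect in a single subject can be made concrete with the rounded figures quoted above (23 percent of the trials at a 50.5 percent hit rate, the remainder at 50.01 percent). The sketch below is illustrative arithmetic only; it treats those rounded percentages as exact and needs no trial counts, since only proportions are involved.

```python
def pooled_rate(shares, rates):
    """Hit rate of the combined data, weighting each subset by its share of all trials."""
    return sum(s * r for s, r in zip(shares, rates))

def share_of_excess(shares, rates, chance=0.5):
    """Fraction of the above-chance hits contributed by each subset."""
    excess = [s * (r - chance) for s, r in zip(shares, rates)]
    total = sum(excess)
    return [e / total for e in excess]

shares = [0.23, 0.77]    # one subject vs. all other subjects (fraction of trials)
rates = [0.505, 0.5001]  # rounded hit rates quoted in the text

print(f"pooled hit rate: {pooled_rate(shares, rates):.3%}")
print(f"excess hits from the one subject: {share_of_excess(shares, rates)[0]:.0%}")
```

With these inputs the pooled hit rate is about 50.12 percent, and roughly 94 percent of the above-chance hits come from the one subject.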
Palmer emphasizes the fact that the critics have not provided plausible explanations as to how the admitted flaws could have caused the observed results. His position seems to be that, unless the critics can provide such plausible alternatives, the results should be accepted as demonstrating an anomaly. Alcock focuses on the fact that the successful results have been obtained under conditions that fall short of the experimental ideals that parapsychologists themselves profess. He emphasizes that the parapsychologists have no right to claim to have demonstrated psi from experiments that have been conducted with "dirty test tubes." Such a revolutionary conclusion as the existence of psi demands justification from experiments that have clearly used "clean test tubes."

What would it take to conduct an adequate RNG experiment? May, Humphrey, and Hubbard (1980) set out to do just that. After reviewing all available RNG experiments from 1970 through 1979 and taking into account the various deficiencies in these experiments, they gathered together and meticulously tested the components necessary to provide adequately randomized trials. They also devised a careful experimental protocol and set out in advance the precise criteria that would have to be fulfilled before they could call their results successful. Going further, after they completed the experiment with results that met their criteria for success, they subjected their equipment to all sorts of physical extremes to see if they could obtain such a degree of success by a possible artifact. They report that this singularly well-controlled RNG experiment in fact met their criteria for success. It is unfortunate, therefore, that this carefully thought-out experiment was conducted only once. After the one successful series, using seven subjects, the equipment was dismantled, and the authors have no intention of trying to replicate it (personal communication, August 1986). It is unfortunate because this appears to
be the only near-flawless RNG experiment known to us, and the results were just barely significant. Only two of the seven subjects produced significant results, and the test of overall significance for the total formal series yielded a probability of 0.029.

The experiment, while nearly flawless, still had some problems as evidence for psi. For one thing, it was reported only in a technical report in 1980 and has never been published in a refereed scientific journal. Despite the admirable attention to details, all the control trials were taken when no human being was present. One might argue that this was not an ideal control for the experimental session, in which a subject was physically present in the room. The authors have assured us that their various attempts to bias the machine by physical means almost certainly rule out the possibility that the mere presence of a human being could have affected the output. However, a physicist who claims to have several years of experience in constructing and testing random number devices tells us that it is quite possible, under some circumstances, for the human body to act as an antenna and, as a result, possibly bias the output.

May and his colleagues at SRI, in the same technical report in which they claim successful results for their single experiment, surveyed all the RNG experiments known to them through the year 1979 and found that their combined significance was astronomically high. They add (May, Humphrey, and Hubbard, 1980:8):

This impressive statistic must, however, be evaluated with respect to experimental equipment and protocols. All the studies surveyed could be considered incomplete in at least one of the following four areas:
(1) No control tests were reported in more than 44 percent of the references. Of those that did, most did not check the temporal stability of the random sources during the course of the experiment.
(2) There were insufficient details about the physics and constructed parameters of the experimental apparatus to assess the possibility of environmental influences.
(3) The raw data was not saved for later and independent analysis in virtually all of the experiments.
(4) None of the experiments reported controlled and limited access to the experimental apparatus.

As far as we can tell, the same four points can be made with respect to the RNG experiments that have been conducted since 1980. The situation for the RNG experiments thus seems to be the same as that for remote viewing: over a period of approximately 15 years of research, only one successful experiment can be found that appears to meet most of the minimal criteria of scientific acceptability, and that one successful experiment yielded results that are just marginally significant.

RESEARCH ON THE GANZFELD

The Ganzfeld Experiments

The Ganzfeld psi experiments are named after the term used by Gestalt psychologists to designate the entire visual field. For theoretical purposes, the Gestalt psychologists wanted to create a situation in which the subject or observer could view a homogeneous visual field, one with no imperfections or boundaries. Psychologists later discovered that when individuals are put into a Ganzfeld situation they quickly tend to experience what they describe as an altered state of mind. In the early 1970s, some parapsychologists decided that the use of the Ganzfeld would provide a relatively safe and easy way to create an altered state in their experimental subjects. They believed that such a state was more conducive to picking up the elusive psi signals.

In a typical psi Ganzfeld experiment, the subject, or percipient, has halved ping-pong balls taped over the eyes. The subject then reclines in a comfortable chair while white noise plays through earphones attached to his or her head. A bright light shines in front of the subject's face.
When seen through the translucent ping-pong balls, the light is experienced as a homogeneous, foglike field. When so prepared, almost all subjects report experiencing a pleasant, altered state within 15 minutes. While one experimenter is preparing the subject for the Ganzfeld state, a second experimenter randomly selects a target pool from a large set. The target pool typically consists of four possible targets, usually reproductions of paintings or pictures of travel scenes. One of the four is chosen at random to be the target for that trial. The target is given to an agent, or sender, who tries to communicate its substance psychically to the subject in the Ganzfeld state. After a designated period, the subject is removed from the Ganzfeld state and presented with the four candidates from the target pool. The subject then ranks the four candidates in terms of how well each matched the experience of the Ganzfeld period. If the actual target is ranked first, the trial is designated a hit. An actual experiment consists of several trials. With four candidates, chance alone would be expected to produce a hit on one of every four trials. If the number of hits significantly exceeds the expected 25 percent, then the result is considered to be evidence for the existence of psi.

Critique of the Ganzfeld Experiments

In a careful and systematic review of the Ganzfeld experiments undertaken in 1981 and published in the March 1985 issue of the Journal of Parapsychology, Hyman concluded that the data base exhibited flaws involving multiple testing, inadequate controls for sensory leakage, inadequate randomization, statistical errors, and inadequate documentation. These flaws, in his opinion, were sufficient to disqualify the Ganzfeld data base as evidence for psi. Of the 42 experiments, 39 (93 percent) used multiple analyses, which artificially inflated the chances of obtaining significant outcomes.
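Two statistical points in this description can be illustrated briefly: how a hit count is judged against the 25 percent chance rate, and why running several analyses on the same experiment inflates the chance of a spurious "significant" result. The sketch below assumes independent trials and a one-sided exact binomial test; the second function further assumes independent analyses, whereas analyses of the same data are usually correlated, so its figures are only illustrative. The trial and hit counts are hypothetical.

```python
from math import comb

def binomial_upper_tail(hits, trials, p=0.25):
    """Exact P(X >= hits) for X ~ Binomial(trials, p): the chance of scoring this well by luck."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(hits, trials + 1))

def chance_of_any_significant(n_analyses, alpha=0.05):
    """P(at least one of n independent analyses reaches alpha when there is no real effect)."""
    return 1 - (1 - alpha) ** n_analyses

# Hypothetical experiment: 30 trials with 13 first-place hits (7.5 expected by chance).
print(f"P(13 or more hits by chance) = {binomial_upper_tail(13, 30):.4f}")
for k in (1, 3, 5):
    print(f"{k} analyses: P(at least one p < .05) = {chance_of_any_significant(k):.3f}")
```

With five independent analyses at the .05 level, the chance that at least one comes out "significant" when nothing is there is about 23 percent rather than 5 percent.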
Only 11 (26 percent) clearly indicated that they had adequately randomized the target selections. As many as 15 (36 percent) used inferior randomization, such as hand shuffling, or no randomization at all. The remaining 16 experiments did not supply sufficient information on how they had chosen the targets. As many as 23 of the experiments (55 percent) used only one target pool, which means that the subject was handed for judging not a copy of the target but the very same target that the agent had handled, permitting the possibility of sensory cueing. Although the argument for psi is mainly a statistical one, the reports of 12 experiments (29 percent) revealed statistical errors. A number of other departures from optimal practice were also found.

The same issue of the Journal of Parapsychology contained a lengthy rebuttal by parapsychologist Charles Honorton, one of the pioneers of the Ganzfeld psi technique. Honorton disputed many of Hyman's opinions as to what constituted flaws; provided a reanalysis of the data base to overcome many of the statistical weaknesses of the original experiments; and argued that the flaws he agreed existed were not sufficient to have accounted for the findings. In this respect his analysis is consistent with Palmer's approach. He does not deny that the experiments depart from optimal design, but he argues that such departures are insufficient to account for the results.

Honorton and Hyman had the opportunity to discuss their differences about psi in general at the Parapsychological Association meetings in 1986; as a result, they agreed to draft a joint communiqué to emphasize those points on which they agree. That communiqué appeared in the December issue of the Journal of Parapsychology (Hyman and Honorton, 1986). They agree that the current data base is insufficient to support either the conclusion that psi exists or the conclusion that the results are due to artifacts.
They further agree that the issue can be settled only by future experiments conducted according to the stated standards of parapsychology, which are also the accepted standards of psychological research.

Another important input to the committee's judgment on the Ganzfeld research was the systematic evaluation of the contemporary parapsychological literature by Charles Akers (1984), a former parapsychologist. Akers's critique used a methodological strategy different from that used by Hyman. Hyman undertook to evaluate the entire data base of a single research paradigm (Ganzfeld), including both successful and unsuccessful outcomes. Akers surveyed contemporary ESP experiments broadly, but confined his evaluation to those that had produced significant results with unselected subjects. Hyman assigned flaws to experiments without regard to whether each flaw, by itself, could have caused the observed outcome. Akers charged a flaw to a study only if he thought the flaw could have been sufficient to produce the observed result. He chose a sample of 54 parapsychological experiments from areas of research that had been previously reviewed by Honorton or Palmer; his intent was to choose experiments that could be viewed as the best current evidence for the existence of psi. As a result of this exercise, he concluded (Akers, 1984:160-161):

Results from the 54-experiment survey have demonstrated that there are many alternative explanations for ESP phenomena; the choice is not simply between psi and experimenter fraud.... The numbers of experiments ... flawed on various grounds were as follows: randomization failures (13), sensory leakage (22), subject cheating (12), recording errors (10), classification or scoring errors (9), statistical errors (12), reporting failures (10).... All told, 85% of the experiments were considered flawed (46/54). This leaves eight experiments where no flaws were assigned....
Although none of these experiments has a glaring weakness, this does not mean that they are especially strong in either their methods or their results.... In conclusion, eight experiments were conducted with reasonable care, but none of these could be considered as methodologically ideal. When all 54 experiments are considered, it can be stated that the research methods are too weak to establish the existence of a paranormal phenomenon.

RESEARCH ON ELECTRICAL ACTIVITY AND EMOTIONAL STATES

The Backster Laboratory

In addition to examining parapsychological research in areas that have produced large literatures, the committee witnessed an example of experimental work at a far less developed stage. On February 10, 1986, committee members visited the Backster Research Foundation in San Diego and saw a demonstration of experimental procedures for detecting a correlation between the electrical activity of oral leukocytes and the emotional states of the donor.

Cleve Backster is a polygraph specialist who had at one time helped develop interrogation techniques for the Central Intelligence Agency and now runs his own polygraph school in San Diego. The school is housed in the same rooms that constitute the Backster Research Foundation, which is devoted to the study of what Backster refers to as primary perception. Backster's research on paranormal matters began in February 1966, when he recorded, from a philodendron plant that he had hooked up to a polygraph, a response he recognized as similar to that of human beings in emotional states. Backster believed he had demonstrated that the plant showed such emotional responses when brine shrimp or other living organisms were either threatened or actually killed in an adjoining room. The notion of primary perception in plants became both a popular subject for research and a highly controversial concept during the late 1960s and early 1970s.
We were told that Backster has quietly continued his research on this and related matters. He has now devised a technique for recording electrical activity in leukocytes taken from a donor's mouth. The advantage of this technique, we were told, is that the leukocytes respond mostly to emotional states of the donor.

One committee member volunteered to be the demonstration subject. Another member accompanied him to observe the techniques for obtaining the leukocytes and preparing them for recording. The sample was obtained by having the subject "chew" on a 1.2 percent saline solution and then spit it back into a centrifuge tube. Ten such samples were obtained in this way. The samples were then spun in a centrifuge for six minutes, and the particulate matter at the bottom of each tube was pipetted into the preparation tube. The preparation tube contained about one centimeter of particulate matter and was filled almost to the top with 1.2 percent saline solution. Two uninsulated wire electrodes were inserted into the bottom of the tube, which was then placed within a shielded cage and connected by leads to an EEG-type recording apparatus.

During the demonstration, the subject sat approximately two meters from the preparation. We were told that subjects usually sit about five meters from the preparation. A split-screen projection video display was provided: the lower portion of the screen recorded the movements of the polygraph paper and pen as they produced a record of the electrical activity presumably taking place in the leukocyte preparation. The upper portion of the screen recorded the behavior of the seated subject.

In his previous research using this arrangement, Backster reported that, when the subject revealed an emotional reaction, the electrical action of the leukocytes showed a corresponding reaction.
During the demonstration, the polygraph record produced several strong deflections in both the control and the experimental series, but they did not obviously correlate with any corresponding thoughts or emotional states of the subject as various stimuli were presented. Backster suggested that this was probably because so many people were crowded into the laboratory that the leukocytes were responding to thoughts and feelings of other individuals in the room. Thus, a demonstration of results, as opposed to techniques, was not, after all, going to be possible during our visit.

Backster then showed us videotapes of the split-screen results he had obtained in his "formal" experiments. The results consisted of 12 examples of apparent correlations between an emotional response and a deflection of the polygraph record. The 12 examples came from 7 sessions with 7 different subjects. Although the information is not given in his written report, it appears that each session lasted for approximately half an hour. During this time, the donor is engaged in conversation or watches videotapes of television programs. The sessions are not standardized or planned. Backster's intent, apparently, is to elicit spontaneous emotional responses from a subject during the session. He believes that a stimulus that evokes an emotional response in one subject will not necessarily do so in another subject.

In one example, the subject was a young man who was looking at an issue of Playboy magazine. The polygraph tracing began to display large deflections soon after he encountered a nude photograph of an attractive young woman. The large deflections continued for approximately two minutes; the tracing slowly settled down to normal activity after the magazine was closed. Soon after, the young man reached for the closed magazine, and the record reveals a single deflection at that point. In another example, the subject was a retired police lieutenant.
When discussing his approaching retirement, he was asked a question about his wife's attitude toward having him "underfoot." A large deflection of the polygraph tracing occurred soon after this question was asked. When asked, the donor confirmed that he was emotionally aroused at that moment in the session (see Backster and White, 1985).

Cleve Backster and his supporters apparently believe that he has successfully demonstrated that detached oral leukocytes respond to the emotions of their donor even when separated by as much as several miles. They also believe that these results are reliable and replicable.

Critique of the Backster Experiment

What we have read and observed about Backster's procedures does not justify the claim he is making. His answers to our questions made it clear that he has not considered using the appropriate controls needed to ensure that the obtained "correlations" are real and due to the causes he has assumed. To make adequate physiological recordings from a preparation of in vitro leukocytes and to demonstrate the correlation between emotional response and leukocyte activity require experimental arrangements and procedures at a level of sophistication well beyond those we observed.

Committee members who are knowledgeable about the procedures and instrumentation of psychophysiological experiments expressed doubts about the adequacy of the setup to perform the tasks Backster has undertaken. Serious doubts were expressed as to whether the leukocytes were still alive at the time of recording. Further doubts were expressed about the setup's ability to avoid contamination of the recording procedures by stray influences of various sorts. We do not discuss these drawbacks in detail here. We confine our discussion to Backster's method for establishing a correlation between the alleged activity of the detached leukocytes and the emotional state of the donor.
When we consider how the existence of such correlations was established, we again see how inappropriate methodology can lead to very misleading conclusions. Many problems exist with regard to Backster's procedures for detecting correlations. Backster is trying to demonstrate a pattern of covariation between two records of behavior over time. One record is the tracing of amplified electrical activity coming from the electrodes and through the leads. Although this tracing can be quantified, Backster has apparently made no attempt to do so. Instead, he has relied on visual inspection of the polygraph record to pick out points at which the deflections of the pen from the baseline are noticeable. Although such subjective judgment is scientifically unacceptable, the deflections that he uses in his examples seem sufficiently marked that they probably can be considered to be real deviations from the baseline. At any rate, let us assume that responses on the polygraph record can be visually pinpointed with reasonable objectivity.

The deflections on the polygraph record are then compared with happenings on the concurrent videotaping of the conversation with the subject. Here we encounter very serious problems as to what constitutes an emotional response on this behavioral record. Backster believes he can identify categories of potentially emotionally arousing stimuli in the nonstandardized, qualitative, ongoing record of conversation. He then determines whether the subject was experiencing an emotional reaction to such a stimulus by simply replaying the record, pointing to the segment that corresponds to a place where the polygraph showed a deflection, and asking the subject if he or she recalls what was taking place at that moment as an emotionally arousing experience. If the subject agrees, this is said to confirm a "correlation" between the emotional state and the corresponding activity of the tracing.
Such a purely subjective determination of an emotional response opens the process to a variety of known biases, many of them discussed in the paper prepared for the committee by Griffin (Appendix B). The literature on "illusory correlation" (Alloy and Tabachnik, 1984; Griffin paper) makes it clear how subjective expectations and cognitive biases can lead to false impressions of correlation. Backster's method of searching for correlations compounds these inevitable biases: he does not independently determine moments of emotional response in the subject's behavioral record and moments of polygraph deflections and then look for a match between the two. Instead, he apparently looks for polygraph deflections and then tries to determine if an emotional response can be found that occurred in the vicinity of the polygraph activity. In other words, the determination of the emotional response is done with full knowledge of the fact that a polygraph deflection has occurred. Under such circumstances, we would expect processes of subjective validation to operate. In addition, the method of verifying the emotional response, by asking the subject to acknowledge that he or she was in fact experiencing such a state at the moment the polygraph record indicated a leukocyte response, is itself suspect. This is the sort of circumstance in which demand characteristics (i.e., responses determined by the presumed intent of the experimenters) are known to operate.

Good science dictates that the moments of emotional response should be determined independently of the moments of polygraph response. Both the experimenter and the subject must be blind to the polygraph record when determining the moments of emotional response. Only when the determination of events on the two records has been made independently of each other can the records be compared to determine if the emotional responses and the polygraph activity are correlated.
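The independent procedure recommended above can be sketched in code. In this minimal illustration, each record is first coded on its own (emotional-response times by a coder blind to the polygraph, deflection times by a coder blind to the videotape), and the session is then cut into fixed intervals so that every interval falls into exactly one of four cells. The interval width, the event times, and the function name are assumptions chosen for illustration; binning is only one simple way to do the matching and is not a procedure taken from Backster's work.

```python
from collections import Counter

def contingency_counts(emotion_times, deflection_times, session_length, bin_width):
    """Classify each fixed-width interval of a session into one of four cells.

    emotion_times and deflection_times (seconds) are assumed to come from coders
    who were each blind to the other record.
    """
    n_bins = int(session_length // bin_width)
    limit = n_bins * bin_width
    emotional = {int(t // bin_width) for t in emotion_times if t < limit}
    deflected = {int(t // bin_width) for t in deflection_times if t < limit}
    cells = Counter((b in emotional, b in deflected) for b in range(n_bins))
    return {
        "emotion & deflection": cells[(True, True)],
        "emotion, no deflection": cells[(True, False)],
        "no emotion, deflection": cells[(False, True)],
        "neither": cells[(False, False)],
    }

# Hypothetical half-hour session (1800 s) cut into 30-s intervals.
cells = contingency_counts(
    emotion_times=[95, 400, 1210],
    deflection_times=[100, 640, 900, 1215, 1500],
    session_length=1800,
    bin_width=30,
)
print(cells)
```

Only after both lists are fixed are they combined, so neither coder's judgments can be shaped by the other record.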
Illusory correlations occur because our subjective judgments of covariation tend to use only a portion of the relevant information and because we tend to bias observed events in terms of our expectations. In particular, intuitive judgments of covariation tend to focus only on the co-occurrence of the treatment of interest and successful outcomes, ignoring times when the treatment co-occurred with unsuccessful outcomes. Backster uses only those examples from his records in which an emotional response co-occurs with a polygraph deflection; the 12 such examples from the 7 experimental series represent a very small fraction of the total data collected. Not only is a sample of just 12 co-occurrences probably too small for estimating whether a true correlation exists, but it is also impossible from this information alone to estimate whether any correlation exists. All the data are needed for this purpose.

Almost certainly, more than 12 polygraph deflections must have appeared in the total record. In the brief demonstration for the committee, both the control and the experimental series yielded several deflections, so it is reasonable to assume that many more than 12 deflections were obtained in the complete record. It is likely that these unreported deflections were not preceded by any emotional responses. Almost certainly, too, more than 12 emotional responses must have appeared in the total record. The point of conducting the sessions was to expose the subjects to a variety of emotional stimuli; therefore, it is essential to know the number of times that emotional responses occurred without the corresponding occurrence of polygraph responses. Finally, to determine correlation, it is essential to know the frequency of co-occurrence of the absence of emotional responses and the absence of polygraph responses. All this information is needed to determine whether the claimed correlation exists.
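The four kinds of counts described above form a 2 x 2 table, and the comparison that matters is between two conditional proportions: deflections given an emotional response, and deflections given no emotional response. The sketch below computes both and applies a one-sided Fisher exact test, one standard way (not a method named in the text) to ask whether the first proportion exceeds the second by more than chance. The two tables are hypothetical; they share the same 12 co-occurrences, yet in the first the deflection rate is much higher after emotional responses and in the second it is identical, which is why the co-occurrence count alone says nothing about correlation.

```python
from math import comb

def deflection_rates(a, b, c, d):
    """Cells: a = emotion & deflection, b = emotion only, c = deflection only, d = neither.

    Returns P(deflection | emotional response) and P(deflection | no emotional response).
    """
    return a / (a + b), c / (c + d)

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test: P(at least `a` co-occurrences) with all margins fixed."""
    n_emotion = a + b       # intervals with an emotional response
    n_deflection = a + c    # intervals with a polygraph deflection
    n_total = a + b + c + d
    upper = min(n_emotion, n_deflection)
    tail = sum(comb(n_emotion, k) * comb(n_total - n_emotion, n_deflection - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, n_deflection)

# Two hypothetical tables, both containing exactly 12 co-occurrences.
for label, table in [("table 1", (12, 8, 60, 120)), ("table 2", (12, 8, 108, 72))]:
    p_emotion, p_none = deflection_rates(*table)
    print(f"{label}: P(defl|emotion)={p_emotion:.2f}  P(defl|none)={p_none:.2f}  "
          f"one-sided p={fisher_one_sided(*table):.3f}")
```

Backster's reports supply only the first cell of such a table, so neither the proportions nor the test can be computed from them.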
All the data must be used. From these data, one can compare the proportion of times that an emotional response is followed by a polygraph response with the proportion of times that the absence of an emotional response is followed by a polygraph response. Only if these two proportions are significantly different from one another can we assume that the data provide evidence for a correlation between emotional response and leukocyte activity. The fact that Backster was able to find 12 examples of the co-occurrence between emotional response and polygraph deflection, even if these correspondences had come from double-blind matching, provides us with absolutely no information about whether a correlation exists.

The stronger claim would be, of course, not that a correlation exists, but that a causal connection exists between the subject's emotional states and the responses of the detached leukocytes. As Chapter 3 on evaluation indicates, such a causal explanation requires much more than the demonstration of correlation between two series. Because Backster did not use double-blind procedures to determine emotional responses, and because the procedures he did use are known to be just those that facilitate the occurrence of a variety of subjective biases, he may well have obtained a correlation between his two series. However, his procedures for finding such correlations are sufficiently flawed that we do not know if in fact the suspected (and presumably biased) correlation actually does exist in his data. The Backster experiment indicates that the best intentions combined with scientific instrumentation and polygraphic records cannot, in themselves, guarantee data of scientific quality.

DISCUSSION OF THE SCIENTIFIC EVIDENCE

Both the parapsychologists cited in this report and the critics of parapsychology believe that the best contemporary experiments in parapsychology fall short of acceptable methodological standards.
The critics conclude that such data, based on methodologically flawed procedures, cannot justify any conclusions about psi. The parapsychologists argue that, while each experiment is individually flawed, the experiments taken together justify the conclusion that psi exists.

Palmer's conclusion in this regard is unique. Although he agrees that the data do not justify the conclusion that a paranormal phenomenon has been demonstrated, he argues that the data, with all their drawbacks, do justify the conclusion that an anomaly of some sort has been demonstrated. It is this purported demonstration of an anomaly that, according to Palmer, further justifies the claim that parapsychologists do have a subject matter. The awkward aspect of Palmer's position is that, without an adequate theory, there is no way to know that the anomaly "demonstrated" in one experiment is the same anomaly "demonstrated" in another; indeed, there is no limit to the possible causes of the anomaly in a given experiment. Without an adequate theory, there is no reason to assume that the various anomalies constitute a coherent or intelligibly related class of phenomena.

The committee distinguishes among three types of criticism that can be leveled at a given parapsychological finding. The first is what we might refer to as the smoking gun. This type of criticism asserts or strongly implies that the observed findings were due not to psi but to factor X. Such a claim puts the burden of proof on the critic. To back up such a claim, the critic must provide evidence that the results were in fact caused by X. Many of the bitterly contested feuds between critics and proponents have resulted from the proponent's assuming, correctly or incorrectly, that this type of criticism was being made. The second type of criticism can be referred to as the plausible alternative.
In this case, the critic does not assert that the result was due to factor X, but instead asserts that the result could have been due to factor X. Such a stance also places a burden on the critic, but one not so stringent as the smoking gun assertion: the critic now has to make a plausible case that factor X was sufficient to have caused the result. For example, optional stopping of an experiment on the part of a subject can bias the results, but the bias is a small one; it would be a mistake to assert that an outcome was due to optional stopping if the probability of the outcome is extremely low. Akers's critique, which was previously discussed, is an example based on the plausible alternative.

The third type of criticism is what we have called the dirty test tube. In this case, the critic does not claim that the results have been produced by some artifact, but instead points out that the results have been obtained under conditions that fail to meet generally accepted standards. The gist of this type of criticism is that test tubes should be clean when doing careful and important scientific research. To the extent that the test tubes were dirty, it is suggested that the experiment was not carried out according to acceptable standards. Consequently, the results remain suspect even though the critic cannot demonstrate that the dirt in the test tubes was sufficient to have produced the outcome. Hyman's critique of the Ganzfeld psi research and Alcock's paper on remote viewing and random number generator research are examples of this type of criticism. In the committee's view, it is in this latter sense, the dirty test tube sense, that the best parapsychological experiments fall short.
We do not have a smoking gun, nor have we demonstrated a plausible alternative; but we imagine that even the parapsychological community must be concerned that their best experiments still fall far short of the methodological adequacy that they themselves profess.

Honorton and Hyman differ on whether to assign a flaw in randomization to a particular series of experiments. With Honorton's assignment, the studies with adequate randomization do not differ in significance of outcome from those with inadequate randomization. With Hyman's assignment, the experiments with inadequate randomization have significantly more successful outcomes than do those with adequate randomization. A simple disagreement on one experiment can thus make a huge difference as to whether we conclude that this flaw did or did not contribute to the observed outcomes. Several similar examples could be cited to illustrate the extreme sensitivity of this data base to slight changes in flaw assignments.

Even if Palmer is correct in asserting that in a particular case an anomaly has been demonstrated, serious problems remain. In astronomy and other sciences, an anomaly is a very precise and specifiable departure from a well-defined theoretical expectation. Neptune was discovered, for example, when Leverrier was able to specify not only that the orbit of Uranus departed from that expected by Newtonian theory, but also precisely in what way it departed from expectation. Nothing approaching such a specifiable anomaly has been claimed for parapsychology. A vague and unspecifiable departure from chance is a far cry from a well-described and systematic departure from a precise theoretical equation. Leverrier's anomaly was consistent with only a very narrow range of possibilities. The sort of anomaly claimed for parapsychology is currently consistent with an almost infinite variety of possibilities, including artifacts of various kinds.
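The optional-stopping example given for the plausible alternative can be made concrete with a small simulation. This is a minimal sketch under assumptions of our own, not a procedure used by the committee or by any study it reviewed: each trial is an independent guess with a hit probability of exactly 0.5 (pure chance), significance is judged by a one-sided normal-approximation z test at level alpha, and the function name `false_positive_rates` and its default values (2,000 sessions, 10 looks of 20 trials each) are hypothetical. A subject who quits as soon as the running score looks significant produces more false positives than one who is tested once at a fixed length, but with k looks the rate can be at most roughly k times the nominal level (the union bound). That bound is why optional stopping is a plausible alternative for a marginal result but not for an extremely improbable one: ten looks can raise a one-in-a-million chance probability to at most about one in a hundred thousand.

```python
import random
from statistics import NormalDist


def false_positive_rates(sessions=2000, looks=10, trials_per_look=20,
                         alpha=0.05, seed=1):
    """Compare fixed-length testing with optional stopping under pure chance.

    Hypothetical illustration. Each session is a run of guesses with hit
    probability 0.5. The fixed policy tests once, after
    looks * trials_per_look trials. The optional policy tests after every
    block and counts the session as significant if any look is, which is
    what a subject who stops while ahead would report.
    Returns (fixed_rate, optional_rate): the share of sessions that reach
    significance even though no effect exists.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    fixed_count = optional_count = 0
    for _ in range(sessions):
        hits = trials = 0
        significant_at_some_look = False
        z = 0.0
        for _ in range(looks):
            hits += sum(rng.random() < 0.5 for _ in range(trials_per_look))
            trials += trials_per_look
            # Normal approximation to the binomial under the chance hypothesis.
            z = (hits - 0.5 * trials) / (0.25 * trials) ** 0.5
            if z > z_crit:
                significant_at_some_look = True
        fixed_count += z > z_crit  # only the final look counts here
        optional_count += significant_at_some_look
    return fixed_count / sessions, optional_count / sessions


fixed, optional = false_positive_rates()
looks, alpha = 10, 0.05
print(f"fixed length:      {fixed:.3f}")
print(f"optional stopping: {optional:.3f}")
print(f"union bound:       {looks * alpha:.3f}")
```

Because both policies are applied to the same simulated sessions and the optional policy also checks the final look, its rate can never fall below the fixed-length rate; raising `looks` widens the gap, so the inflation matters for results near the 0.05 line rather than for outcomes far out in the tail.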
THE PROBLEM OF QUALITATIVE EVIDENCE

The committee continually encountered the distinction between qualitative and quantitative evidence for the existence of paranormal phenomena. Many proponents of the paranormal acknowledge such a difference in one way or another. Some realize that only quantitative evidence will convince the scientific community. Although they themselves have relied on qualitative evidence for their own beliefs, they refer us to the RNG experiments of Robert Jahn or the remote viewing experiments at SRI as examples of supporting quantitative data. Most proponents seem impatient with the request for scientific evidence. They have been convinced through their own experiences or the vivid testimonies of individuals whom they trust. Many argue that qualitative evidence can be as good as quantitative; indeed, they claim that in some circumstances it can be better.

The arguments for the superiority of qualitative evidence are based in many cases on such factors as ecological validity, conducive atmosphere, and holism. The ecological validity argument asserts that the artificial conditions required for laboratory experiments are so different from the natural settings in which paranormal phenomena typically occur that findings from such controlled studies are irrelevant. By removing the psychic from his or her natural domain or by arranging conditions to suit the needs of scientific observation, it is claimed, the scientist destroys the very phenomenon in question. The ecological validity argument is closely related to the other arguments. Proponents who emphasize the conducive atmosphere assert that the austere conditions of strict laboratory procedure create an atmosphere that is numbing or inimical to psychic functioning. Those who emphasize holism point out that experimental procedures necessarily dissect and focus on restricted portions of a system.
Such compartmentalization, it is claimed, makes it impossible to study the sorts of paranormal phenomena that operate only as a total system in a naturalistic context.

QUALITATIVE EVIDENCE AND SUBJECTIVE BIASES

What is meant by qualitative evidence? Roughly, it means any sort of nonscientific evidence that proponents find personally convincing. Typically, it involves personally experiencing or witnessing the phenomenon. Less compelling, but still effective, is the testimony of friends or trusted acquaintances who have personally experienced it. Even individuals who are intellectually aware of the pitfalls of personal observation and testimony find it difficult, even impossible, to disregard the compelling quality of such evidence in the formation of their own beliefs.

A major parapsychologist admitted to one committee member that the scientific evidence did not justify concluding that psi exists. "As a trained scientist," he said, "I know quite well that by scientific criteria there is no evidence for the existence of psi. In fact, I have always argued with my parapsychological colleagues that they are making a serious mistake in trying to get the scientific community to take their current evidence seriously. Before they do this, they first have to be able to collect the sort of repeatable and lawful data that constitute scientific evidence." This same parapsychologist then explained why, despite the current lack of evidence, he remained a parapsychologist. "When I was 16 I had some personal experiences of a psychic nature that were so compelling that I have no doubt that they were real. Yet, as a trained scientist, I know that my personal experiences and subjective convictions cannot and should not be the basis for asking others to believe me." This parapsychologist is unusual in that he makes the distinction within himself between beliefs that are subjectively compelling and beliefs that are scientifically justifiable.
More typical is the proponent who, as a result of compelling personal experience, not only has no doubt about the reality of an underlying paranormal cause, but also has no patience with the refusal of others to support that belief.

We see two problems regarding qualitative evidence. First, personal observation and testimony are subject to a variety of strong biases of which most of us are unaware. When such observations and testimony emerge from circumstances that are emotional and personal, the biases and distortions are greatly enhanced. Psychologists and others have found that the circumstances under which such evidence is obtained are just those that foster a variety of human biases and erroneous beliefs. Second, beliefs formed under such circumstances tend to carry a high degree of subjective certainty and often resist alteration by later, more reliable disconfirming data. Such beliefs become self-sealing, in that when new information comes along that would ordinarily contradict them, the believers find ways to turn the apparent contradictions into additional confirmation.

The committee asked Dale Griffin to describe many of the ways in which cognitive and social psychologists have documented that human subjective judgment can lead us astray. Griffin's paper emphasizes the cognitive biases termed availability and representativeness, but he also discusses motivational biases. Although most of these biases have been demonstrated under laboratory conditions, they are nonetheless quite powerful, and evidence has been mounting that, if anything, they are much more powerful in natural settings. Griffin points out that one vivid, concrete experience is usually sufficient to outweigh conclusions drawn from hundreds or thousands of cases summarized in abstract statistics. These and other biases discussed by Griffin should make us wary of conclusions based on qualitative evidence.
EXAMPLES OF PROBLEMATIC BELIEFS

In this section we discuss some examples of beliefs about paranormal phenomena that have been formed under conditions known to generate cognitive illusions and strong delusional beliefs. We attempt to make clear why we are skeptical of any evidence offered in support of the paranormal that does not strictly fulfill scientific criteria. We believe it is important to realize the power of such conditions to create strong but false beliefs.

In 1974 a group of distinguished physicists at the University of London observed renowned psychic Uri Geller apparently bend metallic objects and cause part of a crystal, encapsulated in a container, to disappear. Impressed with what they saw, in 1975 these scientists contributed an article to Nature outlining their ideas about how to conduct successful parapsychological research (reprinted in Hasted et al., 1976). In their discussion they note that successful results depend on the relation among the participants and that phenomena are more likely to occur when all participants are in a relaxed state, all sincerely want the psychic to succeed, and "the experimental arrangement is aesthetically or imaginatively appealing to the person with apparent psychokinetic powers."

Hasted and his colleagues describe further desiderata. The psychic should be treated as one of the experimental team, contributing to an attitude of mutual trust and confidence that facilitates successful appearance of the allegedly paranormal effects. The slightest hint of suspicion on the part of the observers can stifle the occurrence of any phenomena. Observers should avoid looking for any particular outcome, because doing so interferes with the required relaxed state of mind and impedes paranormal powers. To help avoid the inhibiting effects of concentrated attention, participants should talk and think about matters irrelevant to the experiment at hand.
Acknowledging that these desiderata make it difficult to preclude trickery, Hasted and his colleagues express confidence that they can both create psi-conducive conditions and eliminate the possibility of being tricked (Hasted et al., 1976:194):

It should be possible to design experimental arrangements which are beyond any reasonable possibility of trickery, and which magicians will generally acknowledge to be so. In the first stages of our work we did in fact present Mr. Geller with several such arrangements, but these proved aesthetically unappealing to him.

Although we may sympathize with the British physicists' desire to create conditions conducive to the appearance of genuine psychic powers, if such powers exist, we cannot fail to note the quandary that their efforts produce. In their quest for psi-conducive conditions, they have created guidelines that play into the hands of anyone intent on deceiving them. The very conditions that are specified as being conducive to the appearance of paranormal phenomena are almost always precisely those that are conducive to the successful performance of conjuring tricks. One of the first rules the aspiring conjuror learns is never to announce in advance the specific outcome that he or she is going to produce. In this way onlookers will not know where and on what they should focus their attention and consequently will be less apt to detect the method by which the trick was accomplished. The authors' advice to avoid focusing on a predetermined outcome greatly facilitates the conjuror's task. The insistence that the arrangements meet with the psychic's approval is by far the most devastating of these conditions. Geller will perform only if the conditions are "aesthetically pleasing." This amounts to giving the alleged psychic complete veto power over any situation in which he or she feels that success is not ensured.
This in turn means that the psychic being tested, not the experimenters, is controlling the experiment. The British physicists ought to realize the irony of their admission that all their experimental arrangements designed to preclude trickery turned out to be aesthetically unacceptable to Uri Geller.

Another example of beliefs generated in circumstances that are known to create cognitive illusions is macro-PK, which is practiced at spoon-bending, or PK, parties. The 15 or more participants in a PK party, who usually pay a fee to attend and bring their own silverware, are guided through various rituals and encouraged to believe that, by cooperating with the leader, they can achieve a mental state in which their spoons and forks will apparently soften and bend through the agency of their minds.

Since 1981, although thousands of participants have apparently bent metal objects successfully, not one scientifically documented case of paranormal metal bending has been presented to the scientific community. Most participants in the PK parties are nevertheless convinced that they have both witnessed and personally produced paranormal metal bending. Over and over again we have been told by participants that they know that metal became paranormally deformed in their presence. This situation gives the distinct impression that proponents of macro-PK, having consistently failed to produce scientific evidence, have forsaken the scientific method and undertaken a campaign to convince themselves and others on the basis of clearly nonscientific data: personal experience and testimony obtained under emotionally charged conditions.

Consider the conditions that leaders and participants agree facilitate spoon bending. Efforts are made to exclude critics because, it is asserted, skepticism and attempts to make objective observations can hinder or prevent the phenomena from appearing.
As Houck, the originator of the PK party, describes it, the objective is to create in the participants a peak emotional experience (Houck, 1984). To this end, various exercises involving relaxation, guided imagery, concentration, and chanting are performed. The participants are encouraged to shout at the silverware and to "disconnect" by deliberately avoiding looking at what their hands are doing. They are encouraged to shout "Bend!" throughout the party. "To help with the release of that initial concentration, people are encouraged to jump up or scream that theirs is bending, so that others can observe." Houck makes it clear that the objective is to create a state of emotional chaos: "Shouting at the silverware has also been added as a means of helping to enhance the emotional level in a group. This procedure adds to the intensity of the command to bend and helps create pandemonium throughout the party."

A PK party obviously is not the ideal situation for obtaining reliable observations. The conditions are just those that psychologists and others have described as creating states of heightened suggestibility and implanting compelling beliefs that may be unrelated to reality. It is beliefs acquired in this fashion that seem to motivate persons who urge us to take macro-PK seriously. The complete absence of any scientific evidence does not discourage the proponents; they have acquired their beliefs under circumstances that instill zeal and subjective certainty. Unfortunately, it is just these circumstances that foster false beliefs.

DISCUSSION OF QUALITATIVE EVIDENCE

Our analysis of the evidence put before us indicates that even the most solidly based arguments for the existence of paranormal phenomena fall short of the currently accepted parapsychological standards.
Even if the best evidence had been collected according to acceptable scientific standards, most proponents would in fact have remained convinced by personal experiences and data that clearly fall far short of scientific acceptability. We have looked at two examples to make clear why and in what ways such failures to meet acceptable standards render the corresponding arguments useless as evidence for the paranormal, even though they have created compelling and strongly held beliefs in those who have been exposed to them. The examples illustrate how different ways of attempting to acquire evidence for paranormal phenomena can depart from adequate standards. These inadequacies become especially critical when we note that the conditions under which the alleged paranormal phenomena are supposed to occur are just those known to foster biases and false beliefs.

The PK parties, while creating powerful beliefs in paranormal metal bending, clearly violate almost every principle for obtaining trustworthy data. These parties offer no standardization, no objective records, and no controls against self-deception or the deliberate deception of others. All participants, including the leader, are encouraged to achieve a peak emotional state, and general chaos is fostered.

The suggestions of a group of British physicists for testing alleged psychics are aimed at somehow combining the desire to keep the psychic from feeling inhibited with the desire to obtain evidence of acceptable scientific quality. The observers' zeal for making the psychic feel trusted produces conditions that make scientific observation impossible: observers are instructed to refrain from focusing attention on any expected result, and the experimental arrangement must be aesthetically acceptable to the psychic, a condition that in effect puts the psychic in control of the experiment.

The search for psi-conducive conditions is understandable.
Parapsychological research, even at its best, has been continually frustrated by the lack of robust, lawful, and repeatable outcomes, yet parapsychologists have experienced phenomena or have encountered data that have convinced them of the reality of the paranormal. When they try to put such evidence before their critics, however, the phenomena have a habit of disappearing. If one fervently believes that the phenomena are real, then it becomes easy to imagine a variety of reasons why they are elusive and hard to produce on demand. When proponents encounter a new phenomenon or psychic, they are strongly motivated to create conditions that will not drive the phenomenon away. The special atmosphere of PK parties and the suggestions of the British physicists are just two examples of attempts to generate psi-conducive conditions that also seem to be deception-conducive and bias-conducive.

CONCLUSIONS

In drawing conclusions from our review of evidence and other considerations related to psychic phenomena, we note that the large body of research completed to date does not present a clear picture. Overall, the experimental designs are of insufficient quality to arbitrate between the claims made for and against the existence of the phenomena. While the best research is of higher quality than many critics assume, the bulk of the work does not meet the standards necessary to contribute to the knowledge base of science. Definitive conclusions must depend on evidence derived from stronger research designs. The points below summarize key arguments in this chapter.

1. Although proponents of ESP have made sweeping claims, not only for its existence but also for its potential applications, an evaluation of the best available evidence does not justify such optimism. The strongest claims have been made for remote viewing and the Ganzfeld experiments.
The scientific case for remote viewing is based on a relatively small number of experiments, almost all of which have serious methodological defects. Although the first experiments of this type were begun in 1972, the existence of remote viewing still has not been established. Furthermore, although success rates varying from 30 to 60 percent have been claimed for the Ganzfeld experiments, the evidence remains problematic because all the experiments deviate in one or more respects from accepted scientific procedures. In the committee's view, the best scientific evidence does not justify the conclusion that ESP (that is, gathering information about objects or thoughts without the intervention of known sensory mechanisms) exists.

2. Nor does scientific evidence offer support for the existence of psychokinesis, that is, the influence of thoughts upon objects without the intervention of known physical processes. In the experiments using random number generators, the reported size of effects is very small, a hit rate of no more than 50.5 percent compared with the chance expectancy of 50 percent. Although analysis indicates that overall significance for the experiments, with their unusually large number of trials, is probably not due to a statistical fluke, virtually all the studies depart from good scientific practice in a variety of ways; furthermore, it is not clear that the pattern of results is consistent across laboratories. In the committee's view, any conclusions favoring the existence of an effect so small must at least await the results of experiments conducted according to more adequate protocols.

3. Should the Army be interested in evaluating further experiments, the following procedures are recommended: first, the Army and outside scientists should arrive at a common protocol; second, the research should be conducted according to that protocol by both proponents and skeptics; and third, attention should be given to the manipulability and practical application of any effects found. Even if psi phenomena are determined to exist in some sense, this does not guarantee that they will have any practical utility, let alone military applications. For this to be possible, the phenomena would have to obey causal laws and be manipulable.

4. The committee is aware of the discrepancy between the lack of scientific evidence and the strength of many individuals' beliefs in paranormal phenomena. This is a cause for concern. Historically, many of the world's most prominent scientists have concluded that such phenomena exist and that they have been scientifically verified. Yet in just about all these cases, subsequent information has revealed that their convictions were misguided. We also are aware that many proponents believe that the scientific method may not be the only, or the most appropriate, method for establishing the reality of paranormal phenomena. Unfortunately, the alternative methods that have been used to demonstrate the existence of the paranormal create just those conditions that psychologists have found enhance human tendencies toward self-deception and suggestibility. Concerns about making the experimental situation comfortable for the alleged psychic or conducive to paranormal phenomena frequently result in practices that also increase opportunities for deception and error.

SOURCES OF INFORMATION

Two of the military officers who briefed us during our first meeting urged the committee to give serious consideration to paranormal phenomena and related parapsychological techniques.
They described a variety of such phenomena that they felt had military potential, either as threats to security or as aids to defense. Site visits to leading laboratories and a paper prepared for the committee also contributed to the bases for the committee's work. Briefings were given to committee members by Robert Jahn, Cleve Backster, Helmut Schmidt, members of the staff of the Stanford Research Institute, and the U.S. Army Laboratory Command in Adelphi, Maryland. The paper prepared by James Alcock provided detailed reviews of the available evidence on random event generators and remote viewing. In addition, the committee benefited from a thorough review conducted for the Army Research Institute by John Palmer and from its own review of recent articles in the Journal of Parapsychology and other relevant periodicals and handbooks.
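Conclusion 2 rests on a small piece of arithmetic: a hit rate only half a percentage point above chance becomes statistically significant once the number of trials is large enough. The sketch below illustrates this under explicit assumptions: independent binary trials, a chance rate of 50 percent, and the normal approximation to the binomial. The trial counts are hypothetical illustrations, not the counts reported in the random number generator studies, and the function names (`z_score`, `one_sided_p`, `trials_needed`) are our own.

```python
import math


def z_score(hit_rate, n_trials, chance=0.5):
    """Normal-approximation z for an observed hit rate over n independent trials."""
    standard_error = math.sqrt(chance * (1 - chance) / n_trials)
    return (hit_rate - chance) / standard_error


def one_sided_p(z):
    """Upper-tail probability P(Z >= z); erfc keeps very small values accurate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def trials_needed(hit_rate, z_target, chance=0.5):
    """Approximate number of trials at which a fixed hit rate reaches z_target."""
    effect = hit_rate - chance
    return math.ceil((z_target * math.sqrt(chance * (1 - chance)) / effect) ** 2)


# Hypothetical trial counts; 50.5 percent is the upper bound cited in conclusion 2.
for n in (100, 10_000, 1_000_000):
    z = z_score(0.505, n)
    print(f"n = {n:>9,}  z = {z:5.2f}  one-sided p = {one_sided_p(z):.2g}")
print("trials for z = 3:", trials_needed(0.505, 3.0))
```

Under these assumptions z grows as 0.01 times the square root of n, so the same 50.5 percent hit rate gives z = 0.1 at 100 trials, z = 1 at 10,000, and z = 10 at 1,000,000. This is consistent with the committee's position: a large number of trials can make chance an unlikely explanation, but a small systematic bias in procedure would be magnified in exactly the same way, which is why the quality of the protocols, not the size of the p-value, decides the question.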
They described a vaiftty of such phenomena that they felt had military potential, either as th W(Dts to security or as aids to defense. Site visits to leading laboratories an(& paper prepared for the committee also contributed to the bases for the~&ommittce's work. Briefings were given to committee members by Ro rt Jahn, Cleve Backster, Helmut Schmidt, members of the staff of e th anford Research Institute, and the U.S. Army Laboratory Command g in elphi, Maryland. The paper prepared by James Alcock provided W detQ ed reviews of the available evidence on random event generators an(Gemote viewing. In addition, the committee benefited from a thorough review conducted for the Army Research Institute by John Palmer and froAP its own review of recent articles in the Journal of Parapsychology and,',Sther relevant periodicals and handbooks. CD CD CD CD 04 (D U) (D 77D 0 U_ 13 (D > 0 I- CL CD 00 CD CD CD CD 04 (D U) 77D 0 11- 13 (D > 0 1- CL CL NATIONAL ACADEMY PRESS - 2101 Constitution Avenue, NW - Washington, DC 20418 NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance. This-report has been reviewed by a group other than the authors according to procedures appr*d by a Report Review Committee consisting of members of the National Academy of SqWces, the National Academy of Engineering, and the Institute of Medicine. ThCNational Academy of Sciences is a private, nonprofit, self-perpetuating society of distiRished scholars engaged in scientific and engineering research, dedicated to the furtho:j~ce of science and technology and to their use for the general welfare. 
Library of Congress Cataloging-in-Publication Data
Enhancing human performance : issues, theories, and techniques / Daniel Druckman and John A. Swets, editors.
p. cm.
"Committee on Techniques for the Enhancement of Human Performance, Commission on Behavioral and Social Sciences and Education, National Research Council."
Bibliography: p.
Includes index.
ISBN 0-309-03792-1. ISBN 0-309-03787-5 (soft)
1. Self-realization-Congresses. 2. Performance-Psychological aspects-Congresses. I. Druckman, Daniel, 1939- . II. Swets, John Arthur, 1928- . III. National Research Council (U.S.). Committee on Techniques for the Enhancement of Human Performance.
BX.S4E56 1987 158-dc19 87-31233 CIP

Copyright © 1988 by the National Academy of Sciences
Printed in the United States of America

COMMITTEE ON TECHNIQUES FOR THE ENHANCEMENT OF HUMAN PERFORMANCE
JOHN A. SWETS, Chair, Bolt Beranek and Newman Inc., Cambridge, Mass.
ROBERT A. BJORK, Department of Psychology, University of California, Los Angeles
THOMAS D. COOK, Department of Psychology, Northwestern University
GERALD C. DAVISON, Department of Psychology, University of Southern California
LLOYD G. HUMPHREYS, Department of Psychology, University of Illinois
RAY HYMAN, Department of Psychology, University of Oregon
DANIEL M. LANDERS, Department of Physical Education, Arizona State University
SANDRA A. MOBLEY, Director of Training and Development, The Wyatt Company, Washington, D.C.
LYMAN W. PORTER, Graduate School of Management, University of California, Irvine
MICHAEL I. POSNER, Department of Neurology, Washington University
WALTER SCHNEIDER, Department of Psychology, University of Pittsburgh
JEROME E. SINGER, Department of Medical Psychology, Uniformed Services University of Health Sciences, Bethesda, Md.
SALLY P. SPRINGER, Department of Psychology, State University of New York, Stony Brook
RICHARD F. THOMPSON, Department of Psychology, Stanford University
DANIEL DRUCKMAN, Study Director
JULIE A. KRAMAN, Administrative Secretary

Contents

PREFACE
I OVERVIEW
1 Introduction
2 Findings and Conclusions
3 Evaluation Issues
II PSYCHOLOGICAL TECHNIQUES
4 Learning
5 Improving Motor Skills
6 Altering Mental States
7 Stress Management
8 Social Processes
III PARAPSYCHOLOGICAL TECHNIQUES
9 Paranormal Phenomena
REFERENCES
APPENDIXES
A Summary of Techniques: Theory, Research, and Applications
B Background Papers
C Committee Activities
D Key Terms
E Military Applications of Scientific Information
F Biographical Sketches
INDEX

Preface

The Army Research Institute in 1984 asked the National Academy of Sciences to form a committee to examine the potential value of certain techniques that had been proposed to enhance human performance.
As a class, these techniques were viewed as extraordinary, in that they were developed outside the mainstream of the human sciences and were presented with strong claims for high effectiveness. The committee was also to recommend general policy and criteria for future evaluation of enhancement techniques by the Army.

The Committee on Techniques for the Enhancement of Human Performance first met in June 1985. The 14 members of the committee were appointed for their expertise in areas related to the techniques examined. The disciplines they represent include experimental, physiological, clinical, social, and industrial psychology and cognitive neuroscience; one member is a training program director from the private sector. During the next two years, the committee gathered six times, met in toto or in part on several occasions with various representatives of the Army, conducted interviews and site visits and sent subcommittees on several others, and commissioned 10 analytical and survey papers. The committee also examined a variety of materials, including state-of-the-art reviews of relevant literature, reports commissioned by the Army Research Institute, and unpublished documents provided by institutes, practitioners, and researchers.

The report that follows describes the committee's activities, findings, and conclusions. Though cast largely in terms of the sponsor's setting, this report is relevant to other settings, for example, industry. The next few paragraphs present some background.

That the United States Army should be concerned to enhance the performance of its personnel is self-evident. We know that young volunteers must become not only soldiers who do well in battle but also technicians who skillfully operate and maintain complex equipment in peace and war.
We are aware, moreover, that personal skills are not enough: individuals are heavily dependent on each other within small groups, and groups of various sizes must work very effectively together to permit survival and ensure success. And, of course, all must be ready to give peak performances in situations of great hardship, uncertainty, and stress. In the face of these staggering requirements, one must realize that turnover of personnel is high and that the training time available to impart the necessary cognitive, physical, and social skills is brief. It comes as no surprise that the Army is on the lookout for techniques that can help enhance human performance.

The Army Research Institute is charged with seeking out and developing such techniques: it does so by employing researchers in the human sciences and by supporting appropriate research in universities and other public and private organizations. It focuses largely on promising new techniques as they appear in the mainstream of behavioral, physiological, and social research. However, given the pressures and given a view of mainstream research as slow, narrow, and insufficiently targeted, it also comes as no surprise that some influential officers and certain segments of the Army want to cast a broader net to snare promising enhancement techniques. To do this they look beyond traditional research organizations and practices to what are viewed as extraordinary techniques. These techniques are thought possibly to provide such unusual benefits as accelerated learning, learning during sleep, superior performance through altered mental states, better management of behavior under stress, more effective ways of influencing other people, and so on. There is also an initiative within the Army to consider techniques based on paranormal phenomena, for example, extrasensory perception to view remote sites and psychokinesis to influence the operation of distant machines.
Along with these urgings to examine, to try, or to implement extraordinary techniques come difficult new problems for those in the Army responsible for evaluation, as well as for those in the Army responsible for personnel and training practices. One issue is that proponents of such techniques are usually not content with traditional evaluation procedures or scientific standards of evidence, often giving more weight to personal experience and testimony. Furthermore, a typical technique of this kind does not arise from the usual research traditions of experiments published in refereed journals and peer review of cumulated evidence, but rather appears full-blown as a package promoted by a commercial vendor. What does the Army Training and Doctrine Command or the base commander do when the need is great, the package is ready, the claims are for miracles, some senior officers are vocally supportive, and the evaluation criteria are fluid? What do Army intelligence agencies do when the same conditions apply and other nations are said to be active in investigating paranormal effects?

The committee decided to assess a representative set of the techniques in question and resolved to address the surrounding issues in an open-minded and thorough way. We therefore divided ourselves into a number of subcommittees organized according to the behavioral processes addressed by the several techniques: accelerated learning, sleep learning, guided imagery, split-brain effects, stress management, biofeedback, influence strategies, group cohesion, and parapsychology. In addition, a subcommittee on evaluation issues was formed to examine practices and standards relevant to all the techniques. Each chapter of the report was prepared by the appropriate subcommittee, but interactions were frequent and so the report represents a collaborative effort of all the members.
Chapter 1 provides a context for the committee's task and the Army's interest in enhancing performance, characterizes some particular techniques, and introduces some general issues in evaluating them. Chapter 2 presents the committee's findings about the techniques examined and conclusions about appropriate evaluation procedures. Chapter 3 treats the relevant evaluation issues more systematically and presents the committee's philosophy of evaluation as it pertains to the matter at hand. Chapters 4 through 8 deal with particular techniques but are organized in terms of more general psychological processes. Chapter 9 considers parapsychological techniques.

The report concludes with six appendixes. Appendix A briefly summarizes the key elements of each enhancement technique. Appendix B lists the ten papers commissioned by the committee and their authors. Appendix C lists the members and activities of the subcommittees and also the activities of the committee as a whole. Appendix D lists key terms used in the research on particular techniques. Appendix E discusses the application of scientific research by the military. Appendix F contains biographical sketches of the committee members.

As committee chair, I am now in the pleasant position of recounting the several contributors to the total committee process, a process that went remarkably well. Definition and guidance for the committee's task came primarily from Edgar M. Johnson, director of the Army Research Institute. Administrative and technical liaison was ably provided by project monitor George Lawrence, who worked closely with the committee in its various activities.
They were supported well by several senior Army officers, including Colonel William Darryl Henderson, Commander of the Army Research Institute; Major General John Crosby, Assistant Deputy Chief of Staff for Personnel; and General Maxwell R. Thurman, Vice Chief of Staff. The committee met with members of a resource advisory group that included Lieutenant General Robert M. Elton, chair, Deputy Chief of Staff for Personnel; Lieutenant General Sidney T. Weinstein, Assistant Chief of Staff for Intelligence; Dr. Louis Cameron, Director of Army Research and Technology; Major General Maurice O. Edmunds, Commander of the Soldier Support Center; and Major General Philip K. Russell, Commander of the Medical Research and Development Command. Among the Army staff who were very helpful to the committee are Colonel John Alexander and Mr. Robert . gn fw4us; the names of many others appear in Appendix C.

The committee's two consultants contributed special expertise: Paul Horwitz (of Bolt Beranek and Newman Inc.) joined the site visits of the subcommittee on parapsychology and advised on physical aspects of experiments in that area; James Schroeder (of Southwest Research Institute) attended the committee's meeting at Fort Benning, Georgia, and advised on the application of scientific research by the military (see Appendix E). The committee also received special expertise by commissioning papers. These papers and their authors are listed in Appendix B.

At the National Research Council, David Goslin, executive director of the Commission on Behavioral and Social Sciences and Education, once again provided wise counsel and support; Ira Hirsh, commission chair, and William Estes, also representing the commission, gave valuable advice and encouragement.
Thomas Landauer, a member of the NRC's Committee on Human Factors, provided liaison in the areas of our committees' mutual interests. The reviewers of this report gave us a good measure of reinforcement along with helpful critiques. Eugenia Grohman, associate director for reports, lent experience and wisdom to this report. Special gratitude is extended to Christine McShane, the commission's editor: her skillful editing of the entire manuscript contributed substantially to its readability, and the coherence of the volume owes much to her suggestions for organizing the material. Julie Kraman, as administrative secretary to the committee, earned its considerable appreciation for setting up efficient meetings and for handling all manner of tasks graciously and smoothly.

Daniel Druckman, study director of the project, receives the committee's great appreciation for his intellectual contributions across the broad range of topics considered as well as for his logistic support. Working closely with the authors of chapters and commissioned papers, he provided an integration of the several contributions as well as much of the introductory and interstitial material. He also served on two subcommittees in areas of his expertise.

The ultimate debt of anyone who finds this report useful, and my large personal debt, is to the members of the committee. As individuals, their capabilities are broad and deep. As a group, they gave generously and productively of their time, were always engaged, responded to every challenge, and, especially, showed an exceptional talent for reaching consensus in a collegial, advised, and efficient way.

JOHN A. SWETS, Chair
Committee on Techniques for the Enhancement of Human Performance

PART I
Overview

PART I CONSISTS OF THREE CHAPTERS. Chapter 1 sets the stage for the report.
It describes the committee's task, provides background on the Army's interest in enhancement techniques, characterizes specific techniques examined by the committee, and identifies the main issues in evaluating the relation between techniques and human performance.

Chapter 2 presents the committee's findings and conclusions. We draw general conclusions about the process of consideration given to any technique and state specific findings and conclusions for each of the areas of human performance examined.

Chapter 3 presents the committee's philosophy of evaluation as it pertains to enhancement techniques. Some of the issues involved concern the conduct of basic research; others concern the conduct of field tests. With respect to basic research, issues include the plausibility of inferences about novel concepts, causation, alternative explanations of causal relations, and the generalizability of causal relations. With respect to field tests, a number of questions are of interest: Does the enhancement program meet genuine Army needs? Is the resulting program implementable, given program design and resources? Do unintended side effects limit utility? Is the program more cost-effective than its alternatives? These questions underscore the reality that evaluation research is largely a pragmatic activity influenced by the organizational context in which it occurs.

Introduction

THE COMMITTEE'S TASK

At the request of the U.S. Army Research Institute, the National Research Council formed a committee to assess the field of techniques that are claimed to enhance human performance. The Institute asked the Council to evaluate the claims made by proponents of selected existing techniques and to address two additional general questions: (1) What are the appropriate criteria for evaluating claims for such techniques in the future?
(2) What research is needed to advance our understanding of performance enhancement in areas related to the proposed techniques? The objectives of the committee's study are to provide an authoritative assessment of these questions for policymakers in research and development who are consumers of the techniques, as well as to consider their possible applications to Army training.

Many of the techniques under consideration grew out of the human potential movement of the 1960s, including guided imagery, meditation, biofeedback, neurolinguistic programming, sleep learning, accelerated learning, split-brain learning, and various techniques to reduce stress and increase concentration. Many of these techniques have gained popularity over the past two decades, promoted by persons eager to provide answers to problems of human performance or to prosper from them. While often using the language of science to justify their approach, these promoters are for the most part not trained professionals in the social and behavioral sciences. Nonetheless, they do appeal to basic needs for human performance, and the Army, like many other institutions, is attracted to the prospect of cost-effective procedures that can improve performance.

These institutions must evaluate the effects of such procedures, however. Issues include the appropriateness of a quick-fix approach, the distinction between the impact of an experience and actual change, and the plausibility of evidence indicating that something is happening even if the effects are not reproducible or the benefits uncertain. The more conservative atmosphere in the 1980s is reflected in the way techniques are advanced. Motivation in the 1980s may be primarily entrepreneurial, not ideological, as it was in the 1960s.
Advocates focus on adapting the techniques to specific tasks, such as marksmanship, foreign language acquisition, fine motor skills, sleep inducement, and even combat effectiveness. Some techniques are in fact rooted in a scientific literature. For these reasons the various techniques have attracted the interest of institutions that have rejected, and would probably continue to reject, countercultural trends in society. Indeed, much attention has been given to these techniques by industrial, government, and military policymakers, as well as by the general public. For this reason especially, it is important to address the issues surrounding the claims made for effectiveness.

Elaborate training programs have grown, nourished by their developers' enthusiasm and salesmanship in a social context receptive to quick cures. For many of these programs, success in the marketplace is used to justify the approaches. For others, more esoteric concepts, including the role of neurotransmitters, the physics of neuromuscular programming, brain wave patterns, hemispheric laterality, high-access memory storage, preferred sensory modalities, and low-gain innervation of muscles, are used to attempt to provide scientific justification for the claims. The chapters that follow evaluate the evidence and theories used to support the claims of several popular techniques. Before turning to these evaluations, however, we provide some background on the Army's interest in these techniques, as well as a discussion of issues surrounding enhanced performance and issues in evaluating the relation between techniques and performance.

THE ARMY'S NEEDS

The Army motto, "Be all that you can be," symbolizes the current ethos of the institution, an army of excellence. Emphasis is placed on attaining certain ideals, such as fearlessness, cunning, courage, one-shot effectiveness, fatigue reversal, and nighttime fighting capabilities.
These ideals are assumed to be realizable through training, even if the most effective techniques have not as yet been identified. The culture of improvement is further reinforced by the dilemma created by an all-volunteer Army and the demands of complex new computer technologies. Many civilians enter military service with only the required minimum of formal education; most of these volunteers enlist in the Army. For this reason, the Army's emphasis on skill training is well founded.

The importance of the human element in combat is recognized in the Army Science Board's 1983 report "Emerging Concepts in Human Technology," which phrases the issue in terms of high yield at relatively low investment. Human capital is considered to be the best potential source for growth in Army effectiveness, both in terms of return on investment and as a moral imperative "if we are to commit our soldiers to fight outnumbered and win." The technologies singled out in the report are those that can improve creativity and innovation, learning and training, motivation and cohesion, leadership and management, individual, crew, and unit fitness, soldier-machine interface, and the general productivity of the Army's human resources.

The Board's report largely bypasses issues of systematic evaluation of enhancement techniques within the Army context, while addressing mechanisms for integrating them with Army activities. Little concern is shown for adducing relevant criteria to determine whether implementation is feasible. The Army's ambitious goals, combined with a reluctance to deal with the complexities surrounding issues of human performance, make this institution potentially susceptible to a variety of claims made by technique developers. It would therefore seem prudent to devise criteria for evaluating those claims.

A SELLER'S MARKET

Techniques for enhancement of human performance have received much attention in the popular press.
They have been actively promoted by entrepreneurs who sense a profitable market in self-improvement. The American Society for Training and Development "estimates that companies are spending an astounding $30 billion a year on formal courses and training programs for workers. And that's only the tip of the iceberg" (Wall Street Journal, August 5, 1986). They are also taken seriously by the U.S. military, who are at times accused of losing the "mind race" to the Soviets (see, for example, Anderson and Van Atta, Washington Post, July 17, 1985). The Army has shown particular interest in techniques that help people acquire, maintain, or improve such skills as classroom learning, communication and influence, creativity, and accuracy in the execution of tasks requiring motor skills. Those that are cost-effective and produce relatively rapid results are likely to receive the most attention, along with research breakthroughs that could be a basis for new training programs.

What are these techniques? What claims are being made for them? Is there evidence that substantiates these claims? Examples of techniques include biofeedback (information about internal processes), Suggestive Accelerative Learning and Teaching Techniques (a package of methods geared primarily toward classroom learning), hemispheric synchronization (a machine-aided process based on assumptions about right brain-left brain activities), neurolinguistic programming (procedures for influencing another person), and Concentrix (a procedure used to improve concentration on specific targets). Also of interest to the Army are such processes as group cohesion and stress reduction, as well as the claims for sleep learning, peak performance, and parapsychology. Together, these techniques and processes cover the major types of skills: motor, cognitive, and social.
Several of them are described briefly, along with illustrative claims found in brochures and course material.

Suggestive Accelerative Learning and Teaching Techniques (SALTT) is an approach to training that employs a combination of physical relaxation, mental concentration, guided imagery, suggestive principles, and baroque music with the intent of improving classroom performance. Some applications have included language training, typing instruction, and high school science courses. Attempts have been made to evaluate these applications, and many of these evaluations are published in the Journal of the Society for Accelerative Learning and Teaching (Psychology Department, Iowa State University). The following is a sampling of claims made in brochures and convention announcements: "A proven method which has broad potential application in U.S. Army training"; "It will significantly reduce training time, improve memory of material learned and introduce behavioral changes that positively affect soldier performance: self-esteem, self-confidence, and mental discipline"; and "Most students will prove to themselves that they have learned a far greater amount of material per unit of time with a greater amount of pleasure than they have ever previously done."

Neurolinguistic programming (NLP) refers to a set of procedures developed to influence and change the behaviors and beliefs of a target person. Its goals are mostly therapeutic, but its proponents also advocate the use of the techniques in advertising, management, education, and interpersonal activities. A small research literature, published primarily in the Journal of Counseling Psychology, has developed. Practitioners can be trained and certified at various institutes, and the National Association for Neurolinguistic Programming distributes a newsletter to its membership, currently about 500 persons.
Illustrative claims and some testimonials found in advertising materials include: "[NLP] has evolved a unique technology which encompasses a set of specific techniques enabling you to produce well-defined results" and "NLP ... is clear, easy to learn, and brilliant." A typical slogan is that found in a brochure from the Potomac Institutes, Silver Spring, Maryland: "The difference that makes the difference, for education, management, psychotherapy, psychiatry, business, law, health care, and the arts."

Hemi-Sync, which is short for hemispheric synchronization, is a technique that consists of presenting two tones slightly differing in frequency to separate ears with stereo headphones to produce binaural beats. The long-known result is a tone that waxes and wanes at a frequency equal to the difference between the original tones. Pioneered as an enhancement technique by Robert Monroe of the Monroe Institute of Applied Science in Faber, Virginia, the technique is based on the assumption of a frequency following response (FFR) in the human brain. The FFR refers to a correspondence between sound signals heard by the ear and electrical signals recorded by an electroencephalograph (EEG). It is claimed that, by altering sound patterns, it is possible to alter states of awareness. Stated applications are in the areas of language learning, stress management, reading skills, and creativity and problem solving.
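The statement that the beat occurs at the difference frequency can be checked with the sum-to-product identity. The sketch below assumes two pure tones of equal amplitude A at frequencies f_1 and f_2 that are physically mixed; with headphones the tones reach separate ears and are never mixed acoustically, so the binaural beat arises in the auditory system, but it is heard at the same difference rate. The 200 Hz and 210 Hz values are illustrative only and are not taken from any Monroe Institute material.

```latex
A\sin(2\pi f_1 t) + A\sin(2\pi f_2 t)
  = 2A\,\cos\!\big(\pi (f_1 - f_2)\,t\big)\,\sin\!\big(\pi (f_1 + f_2)\,t\big)

f_{\text{beat}} = \lvert f_1 - f_2 \rvert ,
\qquad f_1 = 200\ \text{Hz},\; f_2 = 210\ \text{Hz}
\;\Rightarrow\; f_{\text{beat}} = 10\ \text{Hz}
```

The fast factor is a tone at the mean frequency (205 Hz in this example); the slow cosine envelope, whose magnitude repeats every 1/|f_1 - f_2| seconds, is what is heard as waxing and waning. The identity describes only the beat itself and says nothing about the FFR assumption or the claimed effects on awareness.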
Claims of effectiveness stated in the Monroe Institute's brochure are wide-ranging, covering education (e.g., "77.8 percent of a class reported improvement in mental-motor skills"), health (early recuperation, lower blood pressure), psychotherapy (stress reduction, working with terminally ill patients, teaching autistic children), and sleep restorative training (e.g., "forty of forty-five insomniacs reported that one-month use of Hemi-Sync tapes was at least as effective as medication, without the drug side effects").

SyberVision is a scripted videotape that presents an expert (e.g., a world-class athlete) repeatedly performing fundamental skills of his or her activity (e.g., golf) without verbal instructions. It is based loosely on principles of vicarious learning, guided imagery, and mental rehearsal. Developed and marketed by SyberVision Systems Inc., San Leandro, California, the package includes a cassette and instruction manual with an appendix on the "simple physics of neuro-muscular programming." The appendix presents a scientific rationale for the technique, for example, "the more you see and hear pure movement, the deeper it becomes imprinted in your nervous system ... and the more likely you are to perform it as a conditioned reflex," and "The decomposition of what is seen and sensorily experienced into an electromagnetic wave form is accomplished by a complex mathematical operation (Fourier Transform) by the brain" (Instruction Manual on Golf with Patty Sheehan). Support for enhanced performance is, however, based on testimonials rather than experiments, for example, Killy on skiing, a Stanford tennis coach on tennis, Professional Golf Association members on golf, Peters (In Search of Excellence) on achievement, Salk on leadership, and a variety of corporate executives and educators on self-improvement.
Claims range from sweeping statements (e.g., "We owe these two men a large debt of gratitude") to rather precise statements (e.g., "In 47 days I have lost 25 pounds [191 to 166], yet I look like I lost 40") (in the United Airlines magazine, Discoveries). This technique involves a significant marketing effort that builds on users' willingness to be quoted and the use of acknowledged academic experts (e.g., Stanford neuropsychologist Karl Pribram), whose role in the program is advertised as being central.

Stress management techniques are procedures designed to alleviate anxiety or tension. Catering to an age of anxiety, self-help books, groups, and clinics on managing stress proliferate. A good example of the approach is the recent book by Charlesworth and Nathan (1982), which emphasizes stress, nutrition, managing time, general life-styles and life-cycles, as well as strategies such as progressive relaxation, autogenic training, and image rehearsal. Appendixes provide the reader with home practice charts, a guide to self-help groups, and suggested books and recordings. The groups offer their members information, emotional support, and a sense of belonging. Often stress management procedures are combined with a number of other techniques into a single package. The promoters often emphasize the total package rather than particular techniques; the packages usually combine several processes that, when acting together, are thought to produce significant effects.

The Army's needs for techniques that can improve performance make it subject to the sorts of claims illustrated above. While they and other consumers can avoid the more obvious pitfalls, the proliferation of choices and products and the lack of scientific evidence allow marketplace criteria to become the bases for decisions. But there are exceptions.
Some techniques have received the attention of the scientific community, and evidence is available to be used as criteria in such areas as biofeedback, guided imagery, sleep learning, cohesion, and even for some aspects of psychic phenomena and neurolinguistic programming. The literature has alerted us, for example, to the distinction between the effects of biofeedback on fine motor skills and on stress, to the different effects of mental and physical rehearsal, to placebo and Hawthorne effects in stress research, to the priming and repetition effects of material presented during sleep, to some dysfunctions of group cohesion, to the difficulties of replicating experiments on extrasensory perception, and to the implausibility of specialized sensory modalities as postulated by NLP (see Appendix D for key terms). These findings make evident a complex relation between technique and performance.

IMPROVED PERFORMANCE: COMPLEX ISSUES, SIMPLE SOLUTIONS

The research literature in such traditional areas of experimental psychology as learning, perception, sensation, and motivation suggests complex relations between interventions and improved performance. Many technique promoters appear to pay little attention to this literature, preferring an alternative route to invention: rather than derive a procedure from appropriate scientific literature, they create techniques from personal experiences, sudden insights, or informal observation of "what works." Science may enter the process after the technique is developed and used, for example, to legitimize its use or to endorse methods for evaluation. Research follows rather than precedes the invention. This sequence increases the likelihood that important considerations will be missed. We highlight some of these considerations in this section.

The lack of easy avenues to improved performance may well be due to the complexity of the behavior in question.
One definition of skill emphasizes the importance of the coordination of behavior: "A skilled response ... means one in which receptor-effector-feedback processes are highly organized, both spatially and temporally. The central problem for the study of skill learning is how such organization or patterning comes about" (Fitts, 1964:244). This definition implies that skill learning involves an orchestration of diverse processes, making the topic an interesting one to various subfields of psychology. It also makes evident a number of unresolved issues, including whether different skills are learned and retained in different ways. The research findings obtained in this literature contribute to our understanding of the necessary, if not sufficient, conditions for improved performance.

Research on skill acquisition addresses such basic questions as "What are the stages of learning?" and "What is learned?" Distinctions made between short-term and long-term memory storage and between schemas and details have contributed to our understanding of basic processes (see Welford, 1976). Other questions have more direct consequences for application: for example, what contributes to the acquisition and maintenance of skills? How can the adverse effects of stress, fatigue, and monotony be avoided? These questions are the basis for programs of research that can be divided into several parts, each defined in terms of empirical issues (Irion, 1969; see also the other chapters in Bilodeau and Bilodeau, 1969). Some examples of empirical issues are practice effects (differences due to distributed versus massed practice, long versus short rest periods, short versus long sessions), the whole-part problem (differences due to learning a task as a whole versus learning it by its constituent elements), feedback (differences due to delays in
receiving knowledge of results and to type of information during the delay period), retention (differences due to whether the task is motor or verbal), and transfer of training. These and related considerations suggest that skill learning is an incremental process likely to differ from one type of skill to another. Whether intending to enhance motor, verbal, problem-solving, or social performances, technique designers can ill afford to ignore these lessons from the experimental literature on skill acquisition and maintenance. It is also the case, however, that the agenda of unexplored issues is much larger than the accomplishments to date, and this is recognized particularly in the rapidly growing field of cognitive psychology, in which the "information-processing revolution" is just beginning.

Practical applications are, however, not automatic. Many excellent applications do not spring from basic science; some are the result of craft and experience. More important perhaps are the indirect contributions made in both directions: from basic to applied and vice versa. A systematic approach taken in both domains serves to vitalize each, as when applied investigations reveal new phenomena that need explanation and when a new package incorporates basic principles discovered originally in the laboratory. Such an approach is likely to facilitate the design of appropriate techniques for skill acquisition. At issue is whether a particular technique can produce and sustain desired changes.

One conclusion from the research accumulated to date is that effective interventions are those that are continuous and self-regulating and take account of both context and person (see, for example, Lerner, 1984). Particularly relevant is the difference between short-term and long-term changes. Many techniques for performance enhancement may produce only short-term effects.
This distinction is made by Back (1973, 1987) in his evaluation of the sensitivity training movement. The changes observed by sensitivity trainers and documented by evaluators may well reflect the impact of the experience per se. Such situation effects are unlikely to be sustained in different environments, an observation supported by the literatures in both developmental and social psychology (Druckman, 1971; Frederiksen, 1972). These literatures caution against hasty generalizations from observed, situation-specific effects; they also explain why long-term effects may be difficult to produce with brief exposures to "treatments." Like the sensitivity trainers of the 1960s and 1970s, many of the promoters (and consumers) of the 1980s pay little attention to issues of causality and intrinsic motivation, preferring instead to dwell on single dimensions of treatments or to offer a mixed package constructed in arbitrary ways and producing diffuse effects that reflect experience.

The issue of expected benefits from techniques provides a bridge between research and application. Research can be designed to evaluate techniques, as well as to discover possible unintended side effects. Indeed, a research literature has developed in some of the areas examined in this book, namely biofeedback, stress, and guided imagery. For many other techniques, however, a relevant body of research does not exist; this lack applies to some of the techniques examined by the committee, as well as to those yet to appear on the market. It is these techniques that present a problem for us as evaluators.

Evaluation without data is difficult, but not impossible. Our approach is to place the techniques into broader categories corresponding to the key processes being influenced, for example, learning, motor skills, and influence. By so doing, the claims can be evaluated within the frameworks of existing theories and methodologies.
They can also be judged against results obtained in related areas. This approach serves as the organizing theme for the chapters that follow.

EVALUATING THE TECHNIQUES

Evaluations properly hinge on answers to a standard set of questions proposed in a paper entitled "Evaluating Human Technologies: What Questions Should We Ask?" by Hegge, Tyner, and Genser (1983) at the Walter Reed Army Institute of Research:

• What changes will the technique produce?
• What evidence supports the claims for the technique?
• What theories stand behind the technique?
• Who will be able to use the technique?
• What are the implications of the technique for Army operations?
• How does the technique fit with Army philosophy?
• What are the cost-benefit factors?

These questions served as guidelines for the committee's evaluations. Appendix A is a summary description of each technique, organized along the lines of the Hegge, Tyner, and Genser questions, covering theory, research, and application. For many of the categories, however, the desired information is either too limited to be useful or simply not available; in such cases we have considered other strategies for evaluation.

The committee faced a number of difficulties in evaluation that stem from recurrent problems posed by the technologies. One is the tendency for some promoters (and consumers) to rely primarily on testimonials or anecdotal evidence as a basis for application. Another is a general lack of strong research designs to provide evidence of effects. These problems are also considered in the context of specific techniques discussed in the chapters of Parts II and III.

Practitioners of techniques often emphasize the value of personal or clinical experience and marketplace popularity as bases for judging the techniques. They are generally less inclined to seek research evidence or to support research evaluation programs.
These attitudes may be related to the fact that few practitioners are trained as researchers. For some it is sufficient to let others do the research. For others, research is viewed, in varying degrees, as a threat to their product. At one extreme, research is regarded as a debunking enterprise, engaged in by scientists who have little interest in providing human services. At another extreme, the problem is one of educating the researchers in nuance, context, and a clinical approach that emphasizes adapting techniques to changed situations and client tastes. The result is a gap in communication epitomized by two cultures: scientists searching for evidence and practitioners seeking effects and cures. A step toward bridging the gap would consist of mutual education through joint ventures. These ventures would expose scientists to the goals (and motives) of practitioners and would make practitioners aware of the general analytical approaches used by scientists.

Experimentation is an appropriate vehicle for evaluating performance-enhancing techniques; the problem is usually defined in terms of effects of techniques (procedures) on performance (behaviors). It is also appropriate at an earlier stage in the process, when products are being developed. Products evolve in a kind of trial-and-error fashion similar in many respects to scientific discoveries. One model for integrating research with product development is engineering research and development (R&D). A strenuous applied research effort accompanies the development process in many firms, as does a quality-control program designed to evaluate products both during development and after they have been placed on the market. With a few exceptions, this model has not been adopted by firms or institutions in the field of performance enhancement.

Experimental evidence has accumulated in some areas related to techniques.
Although not linked specifically to product development in the manner of an R&D operation, this work does address the question, "What evidence supports the claims for the technique?" In fact, so strong is the experimental tradition in some areas that a body of work has developed programmatically within a generally accepted paradigm (e.g., guided imagery). The benefits of a long research tradition can be seen in these areas. Meta-analyses have been performed and can be used as a basis for evaluation. For other areas, we are presented with the prospect of relying on scattered experiments or using other criteria as a basis for evaluation, or both (see Appendix A for summaries of the state of the science in each of the areas).

However, the benefits of experimental evidence derive primarily from the general approach rather than from the particular experiments. This idea is captured by Kelman, who noted that "an experimental finding cannot very meaningfully stand by itself. Its contribution to knowledge hinges on the conceptual thinking that has produced it and into which it is subsequently fed back" (1968:161). We emphasize here the contribution of an analytical approach to thinking about behavior, as distinct from the establishment of laws about psychological processes. It is the cumulation of a series of experiments that winnows out the useful parts of treatments or techniques. It is the self-correcting progression of new experiments that refines treatments, saving those that work and discarding those that do not (or that work only under very restricted conditions). This process contributes equally well to the goals of theory development and product development.

Other evaluation criteria elucidated by Hegge, Tyner, and Genser (1983) include theories, uses, and implications for Army operations and philosophy.
A problem with these criteria is that they tend to be vague and somewhat idiosyncratic, making it difficult to propose general categories on which most people would agree. Without precisely defined categories for judging techniques, it is difficult to address issues of transfer of performance from one situation to another or to evaluate newly emerging techniques. A similar problem exists with respect to developing taxonomies in broadly defined fields: there is little agreement on a set of categories for the fields of human learning, performance, motivation, perception, and social and organizational processes. More mature subdisciplines provide an empirical basis for taxonomies, allowing for more tightly constructed systems of tasks and situations: for example, rote learning, short-term memory, concept learning, problem solving, work motivation, and team functions (see Fleishman and Quaintance, 1984). An advantage of such systems is that they capture rather precise relationships between task and performance.

This discussion serves only to introduce the issues and to identify several themes that receive more detailed attention in the chapters to follow. First, any evaluation must take into account the status of the available evidence. Confidence placed in judgments about a technique should be based on the quality of the evidence produced by researchers. Second, the evaluator cannot afford to rely exclusively on a single criterion for judging effectiveness. Theoretical and applied issues are also important, as are considerations of values served or violated by use of the technique. Third, technique development issues are not isolated from research or analytical issues. Each step in the process of product design can be regarded as an empirical issue; decisions made about procedures and packaging can be the result of experimental outcomes.
Fourth, the subject of enhancing human performance is not new. It has been a topic of interest for centuries and an area of scientific work for several decades. The literatures on learning and skill acquisition should be consulted by developers, and insights derived from these literatures should be used in product design.

These themes are woven throughout the discussions of specific techniques. Each chapter discusses relevant literature, describes the specific techniques, points to directions for further research when appropriate, and notes possible applications in military and industrial settings. Despite the common coverage, however, each chapter is also unique in that each is tailored to the particular problems associated with its focus.

Findings and Conclusions

The committee's first major task was to evaluate the existing scientific evidence for a wide range of techniques that have been proposed to enhance human performance. This evaluation was intended by our Army sponsors to suggest guidelines for decision making on Army research and training programs. In our evaluation we draw conclusions with respect to whether more basic or applied research is warranted, whether training programs could benefit from new findings or procedures, and what, in particular, might be worth monitoring for potential breakthroughs of use to the Army. In many of the areas examined it appears feasible to pursue carefully designed programs that build on basic research; however, such programs should be monitored closely.

The committee's second major task was to develop general guidelines for evaluating newly proposed techniques and their potential application. We are aware that the use of basic and applied research in decision making is a complex issue.
Although payoffs from basic research can often be realized in the long run, the value of research findings to the Army depends on developing a way of putting them into practice. With regard to applied or evaluation research, further complexities are evident: multiple, sometimes conflicting, criteria must be satisfied at each of several stages in the evaluation process, from assessing a pilot program to implementing the program in an appropriate setting. Another problem is that of choosing among alternative techniques when none of them has been subjected to a systematic evaluation. In the absence of evaluation studies, the Army needs guidelines for selecting packages and vendors.

The committee's evaluation has produced several answers to questions of how best to improve performance in specific areas. On the positive side, we learned about the possibilities of priming future learning by presenting material during certain stages of sleep, of improving learning by integrating certain instructional elements, of improving skilled performance through certain combinations of mental and physical practice, of reducing stress by providing information that increases the sense of control, of exerting influence by employing certain communication strategies, and of maximizing group performance by taking advantage of organizational cultures to transmit values. On the negative side, we discovered a lack of supporting evidence for such techniques as visual training exercises as enhancers of performance, hemispheric synchronization, and neurolinguistic programming; a lack of scientific justification for the parapsychological phenomena considered; some potentially negative effects of group cohesion; and ambiguous evidence for the effectiveness of the suggestive accelerative learning package.

The remainder of this chapter presents the committee's findings and conclusions in two parts: general conclusions regarding the process of evaluating any technique being considered by the Army, and specific findings and conclusions for each of the areas of human performance examined. Whenever appropriate, we make recommendations for research, evaluation, and practice.

GENERAL CONCLUSIONS

The committee suggests that the Army move vigorously, yet carefully and systematically, to implement techniques that can be shown to enhance performance in military settings. Such an effort would be timely because of recent developments in the relevant research areas. Moreover, the payoff is likely to be very high if techniques are selected judiciously. Although the desire for dramatic improvements in performance makes some extraordinary techniques attractive, techniques drawn from mainstream research in relevant areas of performance may be more effective. The Army's concern for enhancing human performance and its substantial resources for evaluating techniques place it in a favorable position to take advantage of developments. The Army might also consider the possibilities of transferring its findings to the civilian sector.

Collectively, the committee's conclusions call for the adoption of scientifically sound evaluation procedures; however, these procedures must be adapted to institutional needs and must take into account problems of implementation. We summarize these considerations below.

SCIENTIFIC EVIDENCE

Techniques and commercial packages proposed for consideration by the Army should be shown to be effective by adequate scientific evidence or compelling theoretical argument, or both. A technique's utility should be judged in relation to alternatives designed for similar purposes, and the estimated utility should be of significant magnitude. Specific stages of analysis can be incorporated in pilot or field testing, and such testing should be carried out by investigators who are independent of the technique's originators or promoters.

TESTIMONIALS AS EVIDENCE

Personal experiences and testimonials cited on behalf of a technique are not regarded as an acceptable alternative to rigorous scientific evidence. Even when they have high face validity, such personal beliefs are not trustworthy as evidence. They often fail to consider the full range of factors that may be responsible for an observed effect. Personal versions of reality, which are essentially private, are especially antithetical to science, which is a fundamentally public enterprise. Of course, a caution about testimonials should not be confused with a lack of openness to new and unusual ideas. Such openness is consistent with the requirement that the evidential criteria of science be satisfied.

The subject of testimonials as evidence has received considerable attention in recent research on how people arrive at their beliefs. These studies indicate that many sources of bias operate and that they can lead to personal knowledge that is invalid despite its often being associated with high levels of conviction. The committee recommends that this research be disseminated, as appropriate, in the Army. It may then be applied whenever testimony is used as the primary evidence to promote an enhancement technique.

CONDITIONS FOR IMPLEMENTATION

Two kinds of evidence should be sought to support decisions to implement a technique: successful field tests and an analysis of implementability. It would also be useful to analyze the impact of the technique or package on the larger system in which it is to be embedded. These analyses would aid in explaining why the procedures are necessary and why certain consequences are expected. In general, any description of what a technique accomplishes should be accompanied by an explanation of why it accomplishes what it does. Such an explanation would provide a more fundamental understanding of processes affected by exposure to the technique and permit optimal implementation.

RATIONAL DECISION MAKING

The considerations that must be entertained in selecting a technique for practical use in a military setting are different from the considerations needed to verify the existence of an enhancement effect in a scientific setting. For example, the benefits of correct decisions and the costs of incorrect decisions, that is, the risk calculus, may differ in the two settings. Furthermore, what is viewed as a timely decision will also differ. The specific differences as they apply to particular decisions should be made explicit.

MECHANISMS FOR ADVICE

It would be useful to provide valid information about useful techniques to Army commanders and other interested staff on a regular basis. Special consideration should be given to ways in which technique-related information can be transferred from scientists to practitioners. The characteristics of a transfer agent could be defined, and such a position might be established within an appropriate office.

The committee recommends that the Army Research Institute formalize the ways in which it receives and provides advice about specific techniques. A committee to review experimental designs and statistical analyses could be convened to improve the evaluation of techniques. Special and standing committees could also be used to make program recommendations and to review proposals for intramural and extramural research.

BIDDING PROCEDURES

Purchase by the Army of a commercial enhancement package should take place within the context of a set of well-defined procedures.
The committee recommends that an open-bid procedure be followed, based on a full presentation of the Army's stated objectives. This would encourage competitive evaluation of techniques. The following information, presented in a standard format, should be required: the objectives of the technique, a description of its procedures, evidence that it produces the claimed effects, and the vendor's record of past achievements in relevant areas.

Lack of professional training and research experience in human performance by a designer or advocate should not preclude consideration of the proposed package; it should, however, signal the need for a more stringent analysis by the Army.

SPECIFIC FINDINGS AND CONCLUSIONS

We present below findings and conclusions for each of the areas investigated. Some statements take the form of suggested actions based on what we know; others consist of suggestions for more work or for research that has not yet been done.

LEARNING DURING SLEEP

1. The committee finds no evidence to suggest that learning occurs during verified sleep (confirmed as such by electrical recordings of brain activity). However, waking perception and interpretation of verbal material could well be altered by presenting that material during the lighter stages of sleep. We conclude that the existence and degree of learning and recall of materials presented during sleep should be examined again as a basic research problem.

2. Pending further research results, the committee concludes that possible Army applications of learning during sleep deserve a second look. Findings that suggest the possibility of state-dependent learning and retention (i.e., better recall of material when learned in the same physiological and mental state) may be applicable to fatigued soldiers.
Furthermore, even presentations of material that disrupt normal sleep may be cost-effective, as may presentations that coincide with stages of light sleep.

ACCELERATED LEARNING

1. Many studies have found that effective instruction is the result of such factors as the quality of instruction, practice or study time, motivation of the learner, and the matching of the training regimen to the job demands. Programs that integrate all these factors would be desirable. We recommend that the Army examine the costs, effectiveness, and longevity of training benefits to be derived from such programs and compare them with established Army procedures.

2. The committee finds little scientific evidence that so-called superlearning programs, such as Suggestive Accelerative Learning and Teaching Techniques, derive their instructional benefits from elements outside the mainstream of research and practice. We observe, however, that these programs do integrate well-known instructional, motivational, and practice elements in a manner that is generally not present in most scientific studies.

3. We find that scientifically supported procedures for enhancing skills are not being sufficiently used in training programs and make two recommendations to remedy this problem. First, the basic research literature should be monitored to identify procedures verified by laboratory tests to increase instructional effectiveness. Second, additional basic research should be supported to expand the understanding of skill acquisition for both noncombat and combat activities.

4. We conclude that the Army training system provides a unique opportunity for cohort testing of training regimens. The Army is in a position to create laboratory classroom environments in which competing training procedures can be scientifically evaluated.

5. The committee recommends that the Army investigate expert teacher programs by identifying and evaluating particularly effective programs within the Army. In addition, transferable elements of effective instruction can be reported to the larger instructional community.

IMPROVING MOTOR SKILLS

1. The committee concludes that mental practice is effective in enhancing the performance of motor skills. This conclusion suggests further work in two directions: (1) evaluation studies of motor skills used in the Army and (2) research designed to determine the combination of mental and physical practice that, on average, would best enhance skill acquisition and maintenance, taking into account both time and cost.

2. The committee concludes that programs purporting to enhance cognitive and behavioral skills by improving visual concentration have not been shown to be effective to date. In our judgment, these programs are not worth further evaluation at this time.

3. The committee concludes that existing data do not establish the generality of observed effects from programs that train visual capabilities to increase performance.

4. Similarly, the committee concludes that the effects of biofeedback on skilled performance remain to be determined.

5. The committee recommends additional research to establish the potential of these techniques in the domain of specific skilled performances.

ALTERING MENTAL STATES

1. Time did not allow the committee to explore the evidence for a wide variety of specific methods for relating mental states to changes in performance. Such methods include forms of self-induced hypnotic states and peak performance resulting from high levels of focused concentration and meditation. We recommend that reviews of the literature in these areas be undertaken to ascertain whether any practical results might be obtained by the use of such methods.

2. The committee finds that, while the study of mental computations in language and imagery has progressed in recent years, the effort to understand how such computations are modulated by energetic factors such as arousal, stress, emotion, and high levels of sustained concentration has not been fully developed. For example, the claims that certain mental states produce general improvements in performance derive from the idea, supported by research, that arousal affects mental computations and that there ought to be an optimal level of arousal for the performance of such computations. We recommend this as an important area for investment of basic research funds.

3. The committee's review of the appropriate literature refutes claims that link differential use of the brain hemispheres to performance. Further evaluation of these claims depends on developing valid and reliable measures of hemispheric involvement.

4. The committee finds no scientifically acceptable evidence to support the claimed effects of techniques intended to integrate hemispheric activity, for example, Hemi-Sync. Attempts to increase information-processing capacity by presenting material separately to the two hemispheres do not appear to be useful. We conclude that such techniques should be considered further by the Army only if scientific evidence is provided to and evaluated by the Army Research Institute.

STRESS MANAGEMENT

1. Existing data indicate that stress is reduced by giving an individual as much knowledge and understanding as possible regarding future events. In addition, giving the individual a sense of control is effective. On the basis of these findings, the committee recommends a systematic program of research and development that would address three questions: (1) How relevant is this finding for stress reduction in the Army?
(2) To what extent does stress reduction realized in training transfer to combat situations? (3) What are the limitations on providing knowledge and understanding of future events and a sense of control in the Army setting? Pending the outcome of this research, we suggest that consideration be given to including the material in training programs for company grade, field grade, command, and staff officers.

2. We find that, while biofeedback can achieve a reduction of muscle tension, it does not reduce stress effectively. It is therefore not a promising research topic in that respect. We recommend that funding be directed toward investigation of more promising stress management procedures.

3. We recommend that information be gathered on the costs of stress in terms of organ breakdown, loss of efficiency, and loss of time. This information would have implications for training programs.

INFLUENCE STRATEGIES

1. The committee finds no scientific evidence to support the claim that neurolinguistic programming is an effective strategy for exerting influence. We advise that further Army study of this aspect of NLP be made only in comparison with other techniques.

2. There are no existing evaluations of NLP as a model of expert performance. We conclude that further investigation of such models may be worthwhile and suggest that NLP be examined in comparison with several other techniques.

3. Concerning the process of technology transfer, we recommend that studies be conducted to develop training regimens for those who train others to wield social influence. The large literature on this topic in social psychology would provide a basis for such packages.

GROUP COHESION

1. We find few scientific studies that address the possible relationship between group cohesion and performance; however, such a relationship may well be found with more extensive research.
There is a need for research to consider the possibility of negative effects from inducing cohesion and methods of avoiding such effects. The committee recommends continued study of cohesion and related group processes.

2. We are favorably impressed with the evaluation studies of the Army's COHORT system. We endorse the investigators' plan to proceed beyond measures of attitudes to measures of group performance.

3. We recommend that the Army, as well as independent investigators, study the possible impacts of cohesion beyond the COHORT system, for example, on intergroup performance.

PARAPSYCHOLOGY

1. The committee finds no scientific justification from research conducted over a period of 130 years for the existence of parapsychological phenomena. It therefore concludes that there is no reason for direct involvement by the Army at this time. We do recommend, however, that research in certain areas be monitored, including work by the Soviets and the best work in the United States. The latter includes that being done at Princeton University by Robert Jahn; at Maimonides Medical Center in Brooklyn by Charles Honorton, now in Princeton; at San Antonio by Helmut Schmidt; and at the Stanford Research Institute by Edward May. Monitoring could be enhanced by site visits and by expert advice from both proponents and skeptics. The research areas included would be psychokinesis with random event generators and Ganzfeld effects.

2. One possible result of the monitoring mentioned above is the proposal of specific studies. In that situation the committee recommends the following procedures: first, the Army and outside scientists should arrive at a common protocol; second, the research should be conducted according to that protocol by both proponents and skeptics; and third, attention should be given in such research to the manipulability and practical application of any effects found to exist.
3 Evaluation Issues

Implementation of an enhancement technique, in the committee's view, should depend on two general kinds, or levels, of evaluation. The first examines primarily the scientific justification for the effectiveness of the technique and the potential of the technique for improving performance in practice. The second kind examines field tests of a pilot program incorporating the technique to determine how feasible it is and to what extent it brings about effects that Army officials consider useful.

Convincing scientific justification can come only from basic research, that is, from carefully controlled studies that usually take place in laboratory settings and that preferably are related to a body of theory. Such research can provide evidence for the existence of the causal effect on which a technique is based and can help explain, or indicate a mechanism for, the effect. Analysis in connection with basic research should go beyond scientific justification to operational potential and likely cost-effectiveness. Only field tests can assess a program's actual operations and effects, however, and for such tests a broader array of evaluative criteria is needed, related primarily to the technique's utility.

Because strong claims of support from basic research have been made for some of the techniques the committee examined, we review here what it takes to justify a scientific claim; specifically, we review some standards for evaluating basic research. We then examine in more detail some standards for evaluating field tests of pilot programs. In the third section of this chapter, we set forth briefly some of our impressions of how the Army now manages the solicitation and evaluation of new performance-enhancing techniques.
This chapter concludes with a note on informal, qualitative approaches to evaluation, which are sometimes suggested as alternatives to basic research and field tests.

This chapter does not aspire to a comprehensive treatment of evaluation issues, and it barely touches on research methods. Articles, journals, books, and handbooks testify to the scope and complexity of this burgeoning field (e.g., Barber, 1976; Cook and Campbell, 1979). Our objective here is to highlight the topics that have impressed us as most germane. The various sources just mentioned would need to be consulted for even a minimal elaboration of these topics, and other committees would be required if recipes for evaluation of the Army's enhancement programs were sought as extensions of our work. Still, we believe this chapter will help the Army set general evaluation standards.

STANDARDS FOR EVALUATING BASIC RESEARCH

The purpose of basic research is to permit inferences to be drawn in accordance with scientific standards, including inferences about novel concepts, about causation, about alternative explanations of causal relations, and about the generalizability of causal relations.

For novel concepts, evidence must be gathered that both the purported enhancement technique and the relevant performance have been (1) defined in a way to highlight their critical elements, (2) differentiated from related variables that might bring about similar effects, and (3) put into operation (manipulated or measured) in ways that include the critical parts. The burden is on the evaluator to analyze how the components of each new technique differ from concepts already in the literature. The need for this standard is illustrated well by packages for accelerated learning, as discussed in Chapter 4.

Evidence needs also to be adduced that supposed cause and effect variables vary together in a systematic manner.
Relevant procedures include comparison of performance before and after introduction of the technique, contrasts of experimental and control groups in an experimental design, and calculation of statistical significance. Illusory covariation can occur more easily in nonstatistical studies, which are used often to support the existence of paranormal effects, as discussed in Chapter 9.

Especially demanding is the need for evidence that the performance effect observed is due to the postulated cause and not to some other variable. Ruling out alternative explanations or mechanisms requires intimate knowledge of a research area. Historical findings and critical commentary are needed to identify alternatives, determine their plausibility, and judge how well they have been ruled out in particular sets of experiments. Common threats to the validity of any presumed cause-effect relation include effects stemming from subject selection, unexpected changes in organizational forces, the spontaneous maturation of subjects, and the sensitizing effects of a pretest measurement on a posttest assessment. Experiments with random assignment of subjects to treatments are preferred, but some of the better quasi-experimental designs are also useful. Another class of threats to validity is associated with subject reactions to such conceptual irrelevancies as experimenter expectations about how subjects should perform or subjects' performing better merely because they are receiving attention. Procedures that have evolved to reduce this sort of threat include double-blind experiments, placebo control groups, mechanical delivery of treatments, and the elimination of all communication between experimenters and subjects or among subjects. These safeguards, however, are not certain, and implementing them is not a simple matter.
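Random assignment, which the standards above treat as the preferred guard against subject-selection effects, is easy to carry out mechanically. The following Python sketch is an illustration only, not a procedure taken from this report: it shuffles a hypothetical roster of units with a seeded generator (recording the seed lets an independent reviewer reproduce the split) and divides the roster into treatment and control groups. The unit names, the seed value, and the function name `randomly_assign` are assumptions made for the example.

```python
import random


def randomly_assign(units, seed):
    """Randomly split units into treatment and control groups.

    A seeded generator is used so the same seed always reproduces the same
    split, which lets an outside reviewer audit the assignment. With an odd
    number of units, the extra unit goes to the control group.
    """
    rng = random.Random(seed)   # private generator; global random state is untouched
    shuffled = list(units)      # copy so the caller's roster is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster of ten units (names are illustrative only).
roster = [f"unit_{i:02d}" for i in range(1, 11)]
treatment, control = randomly_assign(roster, seed=1988)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because group membership is decided by the shuffle alone, neither the evaluator's preferences nor the units' own characteristics determine who receives the technique. Blinding and placebo controls address a different class of threats and are not shown in this sketch.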
Finally, for a technique to be of value, one must ascertain that a causal relation observed in one setting is likely to be observed in other settings in which the technique is to be employed. Replication of an experiment by an independent investigator is a first step. Another step is to produce the cause and effect with different samples of people, settings, and times. Systematic reviews of the literature, perhaps aided by what is referred to as meta-analysis of studies (as illustrated in Chapter 5), are also helpful. Beyond these steps, a thorough theoretical understanding of causal processes, which is a fundamental goal of science, permits increased practical control.

Our point, perhaps seeming obvious to many but nonetheless needing emphasis here, is that a planned or existing program for implementing an enhancement technique is much more likely to bear fruit if evidence of the technique's effectiveness is properly derived from basic research. A complex set of ground rules exists for conducting and drawing inferences from basic research, and waiving those rules greatly increases the chances of incorrect conclusions.

STANDARDS FOR EVALUATING FIELD TESTS OF PROGRAMS

An adequate appraisal of an actual enhancement program requires attention to three general factors. First, the organizational (i.e., political, administrative) context in which the program is embedded should be described. That context strongly influences the choice of evaluation criteria, the types of evaluations considered feasible, and the extent to which evaluation results will be used. Second, the program's consequences should be described and explained, including planned and unplanned, short-term and long-term consequences. The way the program is construed influences the claims resulting from an evaluation and the degree of confidence that can be placed in what was learned. Third, value or merit should be explicitly assigned to a program.
Valuing relates an enhancement technique to an Army need and to feasible alternatives. In the following sections we comment on these three factors in turn.

THE ORGANIZATIONAL CONTEXT

A description of the broader context of an enhancement program would include an assessment both of the various constituencies with a stake in its implementation and of the priorities of the larger institution. We do not discuss stakeholder interests in general at this point because we refer to some specifically later in this chapter, in the section on the committee's impressions of current Army evaluation practices. We do comment here on the Army's institutional priorities as they may relate to scientific standards.

We understand that the Army, like other organizations in society, may have, and quite possibly should have, different standards for evaluating knowledge claims, or technique effectiveness, than science has. The scientific establishment is conservative in the tests it administers to discipline its conjectures; in particular, its goal is to reduce uncertainty as far as possible, no matter how long that takes. In the Army, by contrast, the need for timely information and decisions may lead to an acceptance of greater uncertainty and a higher risk of being wrong.

There is no Army doctrine of which we are aware concerning the degree of risk that is acceptable in evaluations of pilot programs. Yet surely one objective of evaluations of pilot programs should be to describe the costs to the Army of drawing incorrect conclusions so that inferential standards can be made commensurate with those costs. If the costs are relatively low, the riskier approach of most commercial research (as, for example, in management consulting or marketing) may be preferred to the more conservative approach of basic science.
DESCRIBING A PROGRAM'S CONSEQUENCES

In evaluating a program, it is desirable to present an analysis and defense of the questions probed and not probed, together with justification for the priorities accorded to various issues. Primary issues usually include the program's immediate effects and its organizational side effects.

Immediate Effects

A primary problem in evaluation is to decide on the criteria by which a program is to be assessed. The major sources for identifying potential criteria include program goals, interviews with interested persons, consideration of plausible consequences found in the literature, and insights gained from preliminary field work.

Such criteria specify only potential effects, however. They do not speak to the matter of whether the relation between a supposed cause and effect is truly causal. In this respect, a fundamental issue of methodology is the use of randomized experiments. Although logistic reasons abound in any practical context for not going to the trouble to use such research designs, one might nonetheless argue that the Army is in a better position to conduct randomized experiments than are organizations in such fields as education, job training, and public health. The reason for going to such trouble is that randomized experiments give a lower risk of incorrect causal conclusions than the alternatives. Alternatives at the next level of confidence are quasi-experimental designs that include pretest measures and comparison (control) groups. Relatively little confidence can be placed either in before-after measurements of a single group exposed to a technique without an external comparison, or in comparisons of nonequivalent intact groups for which pretest measures are not available.

Side Effects

Unintended side effects include impacts on the broader organization, and these should be monitored.
For example, trainers from other (nonexperimental) units may copy what they think is going on, or they may simply be upset by the implementation of new instructional packages in the experimental units. Units not treated in the same way as the experimental units may be unwilling to cooperate when cooperation would seem to be in their best interest. They may also suffer by comparison, as is thought to be the case, for example, when COHORT units are introduced into a division (see Chapter 8). Evaluators should strive to see any program as fitting into a wider system of Army activities on which it may have unintended positive or negative effects.

ASSIGNING VALUE TO PILOT PROGRAMS

The described consequences of a program tell us what a program has achieved but not how valuable it is. Three other factors are important in inferring value: Does the new technique meet a demonstrable Army need to the extent that without it the organization would be less effective? How likely is it that the program can be transferred to other Army settings, either as a total package or in part? How well does the new program fare when compared with current practice for bringing about the same results?

Meeting Needs

Representatives of the commercial world who [...] products often confound wants with needs, and hope with reality. While it is axiomatic [...] meet genuine Army needs, it is not clear how [...] when the developers of new products approach [...] permission to do general research or field tests. [...] analysis should be part of the documentation [...]

What should a needs analysis look like?
At [...] document the current level of performance [...] is inadequate, what reason there is to believe [...] change, and what the Armywide impact would [...] performance in question were improved. In addition, [...] question why a particular program is needed [...] Such an analysis would describe the program, [...] justification in basic research, identify the financial [...] required to make the program work, relate the [...] funds available, examine other ways of bringing about [...] results, and justify the program at hand in terms [...] effectiveness. To facilitate critical feedback, [...] independent of the persons who sponsor programs [...] thorough, firsthand acquaintance with the program and sponsors.

As just described, needs analysis is [...] mounting a pilot program. It is not a review of [...] relative to needs, for which a description of a [...] is required. At that later stage in evaluation, a judgment [...] whether the magnitude of a program's effect is [...] to a degree that makes a practical difference, [...] whether the program makes a statistically reliable [...] Size of effect relative to need is the crucial [...] magnitude of change required for practical significance [...] in advance, it is easy to use such a specification [...] need has been met. But the level of change required is not usually predetermined, and there are political [...] are not always eager to have their programs evaluated [...] sizes they themselves have clearly promised or [...] them.

Needs can be specified only by Army officials, [...] officials inspect the results a program has achieved, relating them to their perception of need.

Since the Army is heterogeneous, it would be naive to believe that there are no significant differences within it about how important various needs are and how far a particular effect goes in meeting a particular need. Some theorists relate needs primarily to the number of persons performing below a desired level, while others emphasize the seriousness of consequences for unit performance, for which deficiencies in only one or two persons may be crucial.
Some practitioners are likely to think a deficit in skill X is worse than a deficit in skill Y, while others may believe the opposite. Evaluators who take the concept of need seriously have to take cognizance of such heterogeneity, perhaps using group approaches like the Delphi technique to bring about consensus on both the level of need and the extent to which a particular pattern of evaluative results helps meet that need.

Likelihood of Transfer

Although some local commanders may sponsor field trials for the benefit of their command alone, the more widely a successful new practice can be implemented within the Army, the more important it is likely to be. Consequently, evaluations of pilot programs should seek to draw conclusions about the likelihood that findings will transfer to populations and settings different from those studied.

In this regard, it is particularly important to probe the extent to which any findings from a pilot study might depend on the special knowledge and enthusiasm of those persons who deliver or sponsor the program. Such persons are often strongly committed to a program, treating it with a concern and intensity that most regular Army personnel could not be expected to match. While it is sometimes possible to transfer such committed persons from one Army site to another in order to implement a program, in many instances this cannot be done. Transfer is partly a question of the psychology of ownership; authorities who did not sponsor a product will sometimes reject out of hand what others have developed, including their immediate predecessors. Since Army leaders in any position turn over with some regularity due to transfers, promotions, and retirement, successors will probably not identify with a program as strongly as the original sponsors and developers did.
The likelihood of transfer also affects the degree to which program implementation is monitored. Pilot programs are likely to be more obtrusively monitored than other programs. Not only is this obtrusiveness due to developers' and evaluators' fussing over their charge, it is also due to teams of experts brought in to inspect what is novel and to responsible officers wanting to show others the unique programs they are leading (and on which the success of their careers may depend). For at least these reasons pilot programs tend to stand out more than the regular programs they may engender. Research suggests that the quality with which programs are delivered may in fact increase when outside personnel are obviously monitoring individual and group performance.

It is naive to believe that one can go confidently from a single pilot program to full-blown Armywide implementation. Even if this were feasible politically, it would not be technically advisable unless there were compelling evidence from a great deal of prior research indicating that the program was indeed built on valid substantive foundations. Given a single pilot program, decisions about transfer are best made if the program is tested again, at a larger but still restricted set of sites and under conditions that more closely approximate those that would pertain if the new enhancement technique were implemented as routine policy. Only then might serious plans for Armywide implementation be feasible.

Contrast with Alternatives

Most of the evaluation we have discussed contrasts a novel program with standard practices that are believed worth improving; yet rational models of decision making are usually predicated on managers' having to choose among several different options for performing a particular task.
One would hope that every sponsor of a novel performance enhancement technique is conversant with the practical alternatives to it and has cogent arguments for rejecting them.

Many novel techniques have some components that are already in standard practice or can be clearly derived from established theories. Upon close inspection, pilot programs often turn out to be less novel than their developers and sponsors claim. Of course, the Army may often find it convenient to order complete packages in the form offered and may not have much latitude to interact with developers in order to modify package contents to emphasize what is truly a novel alternative and to downplay that which is merely standard practice.

Ultimately, alternatives have to do with costs. Although many forms of cost are at issue (including those associated with how much a new practice disrupts normal Army activities and how much stress it puts on personnel), the major cost usually considered is financial. Cost analysis is always difficult, nowhere more so than in the Army, which uses many ways to calculate personnel costs. Nonetheless, in planning an evaluation, some evidence about the total cost of a pilot program to the Army will usually be available and can be critically scrutinized. It is also useful, as far as possible, to ascribe accurate Army costs to each of the major components of such an intervention. In our view, what is called cost-effectiveness analysis lends itself better than what is called cost-benefit analysis to the comparison of different programs. The purpose of cost-effectiveness research is to express the total cost for each program in dollar terms and to relate this to the amount of effect as expressed in its original metric, unlike cost-benefit research, in which even the effects have to be expressed in dollar terms.
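The cost-effectiveness comparison just described can be illustrated with a short Python sketch. Each program's major cost components are summed into one dollar total, and that total is divided by the program's effect expressed in its own outcome metric, giving a cost per unit of effect; unlike a cost-benefit analysis, the effect itself is never converted to dollars. The program names, cost components, dollar figures, and effect sizes are hypothetical and do not come from any Army evaluation, and the comparison assumes that both programs' effects are measured on the same scale.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    name: str
    # Dollar cost of each major component (instructors, equipment, materials, ...).
    cost_components: dict = field(default_factory=dict)
    # Effect in the outcome's own metric, e.g. mean gain in test points over control.
    effect: float = 0.0

    def total_cost(self) -> float:
        return sum(self.cost_components.values())

    def cost_per_unit_effect(self) -> float:
        """Dollars spent per unit of effect; only defined for a positive effect."""
        if self.effect <= 0:
            raise ValueError(f"{self.name}: no positive effect, ratio is undefined")
        return self.total_cost() / self.effect


# Hypothetical programs whose effects are measured on the same test-score scale.
programs = [
    Program("Program A", {"instructors": 80_000, "equipment": 40_000}, effect=4.0),
    Program("Program B", {"instructors": 30_000, "materials": 15_000}, effect=3.0),
]

# A lower cost per point of improvement means a more cost-effective program.
for p in sorted(programs, key=Program.cost_per_unit_effect):
    print(f"{p.name}: ${p.total_cost():,.0f} total, "
          f"${p.cost_per_unit_effect():,.0f} per point of improvement")
```

In this made-up case Program A yields the larger effect, but Program B delivers each point of improvement at half the cost. The ratio therefore informs the choice among alternatives without settling it: whether three points are enough is still a question of need.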
Sophisticated consumers of evaluation should want something akin to cost-effectiveness knowledge, for it reflects decisions they should be making. Is it not useful to know, for example, that the best available computer-assisted instruction packages are much less cost-effective than peer tutoring?

CURRENT STATUS OF ARMY EVALUATIONS

We set forth here some of our impressions of the way in which the Army currently manages the solicitation and evaluation of novel techniques to enhance performance. We must stress that these are only impressions, gained through the limited investigative capabilities of a committee such as ours, not hard conclusions based on systematic research directed at the particular question. Furthermore, although the opinions that follow are largely critical of Army procedures, they are not accompanied by much detail. As noted earlier, the focus here is on the identification of the various Army constituencies that have a stake in enhancement programs and on the role they play in evaluation.

How the Army decides which among competing proposals should be sponsored for development or for field tests is not clear. What is clear is that decision making is diffuse both geographically and institutionally. Sponsorship may come from senior managers in the Pentagon or from local personnel of varying rank. While differences in the quality of program design, implementation, or evaluation may be correlated with the source of sponsorship, such a correlation is not clear at present in the Army context.

A particular concern is that Army sponsors of pilot programs may base their judgment about the value of a program either on their own ideas about what is desirable or effective or on the persuasiveness of the arguments presented to them by program developers, who stand to gain financially if the Army adopts their program.
Judgments of value should depend on broader analysis of Army needs and resources, as well as on realistic assessment of the quality of proposed ideas based on a thorough and independent knowledge of the relevant research literatures. Sponsors should examine what is being advocated at every stage: proposal, testing, and implementation.

Also of concern when pilot programs are planned is how decisions are reached about funding and about the quality of implementation expected from them. Although systematic evidence is lacking, it seemed to committee members that pilot programs are not generally implemented well and, except for fiscal accountability, are not closely monitored by their Army sponsors. Evaluations of pilot programs should try to characterize the resources required by the program and the resources actually available.

We found little evidence that sponsors, advocates, or local implementers aspired to evaluations that use state-of-the-art methods. We found no guidelines about the standards expected for evaluative work, whether in the form of published minimal standards or published statements of preferred practices. When it comes to field trials of novel ideas for enhancing human performance, the monitoring of evaluation quality does not seem to be part of the organizational context. Given the absence of formal expectations in these regards, it is not surprising that the pilot programs we saw and the evaluation materials we read were usually disappointing in the technical quality of the research conducted. In settings in which program sponsors or advocates control an evaluation, weaker evaluations (e.g., based on testimony) will sometimes be preferred to stronger methods (e.g., experiments) because the latter are usually more disruptive when implemented and are more likely to result in effects that are disappointing, however much more accurate they may be.
The weaker methods are easier to implement when few units are available, are less disruptive of ongoing activities, are easier to manipulate for self-interested ends, and need not be as expensive for data collection.

We saw little evidence that the Army requires evaluations by persons independent of the pilot program under review. Moreover, the nonindependent evaluations we saw did not seem to have been subjected to any of the peer review procedures to which research results (and plans) are subjected not only in academic sciences, but also in much of the corporate world, as with, say, pharmaceutical testing. While in-house evaluation is highly valuable for gaining feedback for program improvement, many experienced evaluators contend that it is inadequate for assigning overall value because in-house evaluators cannot divorce themselves from their own stake in the program under examination. Although it is not easy to specify organizational standards adequate for a high-quality field test of some novel technique, it is also not difficult to detect the inadequacies associated with local program sponsors' having few clear expectations about the desirable qualities of program operations or evaluative practices. In the absence of such expectations, program developers and evaluators may believe that few officials care about the small-scale field tests of techniques on which the developers' (and, all too often, the evaluators') own welfare depends.

Since the organizational climate we have just described is not optimal for gaining trustworthy information about program value, future evaluators of Army field trials might do well to characterize: (1) what program managers expect in terms of the quality of the program and its evaluation; (2) who is paying attention to the trials; and (3) for what purposes they want to use any information provided by the evaluation.
This kind of information, as mentioned above, contributes to a description of the organizational context of a program, which is a major part of an adequate evaluation.

QUALITATIVE APPROACHES

Alternatives to experimentation are the largely qualitative traditions, which rely mostly on direct observation, sometimes supplemented by archival data. Investigative journalists operate in this mode; so do many cultural anthropologists, political scientists, and historians. These professions use clues to suggest hypotheses about possible causes and investigate the empirical evidence in ever-greater detail in an attempt to rule out hypotheses until they are left with just one. A critical aspect of their work is the use of substantive theories and ad hoc findings from the past to help in ruling out alternative explanations. Also working in this tradition are committees of psychologists who seek to make statements about the [...]

On balance, the benefits derived from careful experimentation outweigh the costs just mentioned. All other things being equal, experimentation is much the preferred strategy for judging the efficacy of techniques that purport to enhance performance, and it should be used whenever possible.

PART III
Parapsychological Techniques

Of all the subjects treated in this volume, none is more controversial than parapsychology. While the flavor of the debates is captured to some extent in this chapter, the subject is treated in the same manner as the other techniques reviewed: we address the question of whether the evidence warrants further consideration of parapsychological techniques for research or application or both.

Emphasized here is information gathering by remote viewing and mind-over-matter effects in controlling machine behavior, particularly machines that generate series of random numbers, which are often used in parapsychology experiments.
Although scattered results are said to be statistically significant, an evaluation of a large body of the best available evidence does not support the contention that these phenomena exist. If, however, future experiments, conducted according to the best possible methodological standards, are more generally viewed as producing significant results, it would be appropriate to consider a systematic program of research. Such a program should include a concern for the need to proceed from small effects to practical applications.

9 Paranormal Phenomena

BACKGROUND

The primary purpose of this chapter is to evaluate the scientific evidence on parapsychological techniques in selected areas. A more complete understanding of the topic, however, requires that we provide background on the military's interest in these phenomena and treat the conceptual issue of how people come to believe as they do. This background section includes a discussion of the phenomena and the military's interest in them as well as an overview of the committee's focus. A brief examination of the different kinds of justifications for the claims is followed by a more detailed treatment of the evidence in areas that have produced large literatures: remote viewing, random number generators, and what are called Ganzfeld (whole visual field) experiments. In addition, we describe experimental work that the committee actually witnessed by visiting a parapsychological laboratory. Despite the growing scientific tradition in some of these areas, many people continue to rely on qualitative or experiential evidence to support their beliefs; we discuss the problems associated with qualitative evidence in conjunction with the research on cognitive and emotional biases, which is reviewed in the paper by Dale Griffin (Appendix B).
Finally, the chapter summarizes the committee's major conclusions.

THE NATURE OF THE PHENOMENA

Parapsychologists divide psi (the term applied to all psychic phenomena) into two broad categories: extrasensory perception (ESP) and psychokinesis (PK). Included in ESP are telepathy, precognition, and clairvoyance, all of which refer to methods of gathering information about objects or thoughts without the intervention of known sensory mechanisms. Popularly called mind over matter, PK refers to the influence of thoughts upon objects without the intervention of known physical processes.

A presentation to the committee by several military officers described in some detail the results of experiments in remote viewing carried out at both SRI International and the Engineering Anomalies Research Laboratory at Princeton University. In these experiments subjects are said to have more or less accurately described a geographical location being visited by a target team. Although the human subjects have no way of normally knowing the target location, the examples recounted appear to indicate, at first glance, some striking correspondences between their descriptions and the actual sites. These studies have been related by some persons to reported out-of-body experiences.

The presentation included discussion of psychic mind-altering techniques, the levitation claims of transcendental meditation groups, psychotronic weapons, psychic metal bending, dowsing, thought photography, and bioenergy transfer. It was indicated that the Soviet Union is far ahead of the United States in developing potential applications of such paranormal phenomena, in particular psychically controlling and influencing minds at a distance.
At the presentation, personal accounts were given of spoon-bending parties, in which participants believe they have caused cutlery to bend with the power of their minds, as well as instances of self-hypnosis to control pain and cure illness, walking barefoot on fire and handling hot coals without being burned, leaving one's body at will, and bursting clouds by psychic means.

The media and popular publications, especially in recent years, have discussed various aspects of psychic warfare. Three recent books, by Ebon (1983), McRae (1984), and Targ and Harary (1984), have attempted to document Soviet and American efforts to develop military and intelligence applications of alleged paranormal phenomena. These accounts have been augmented by newspaper stories, magazine articles, and television programs. Many of these sources acknowledge the speculative nature of the proposed applications, but others report that some of the techniques already exist and work.

The claimed phenomena and applications range from the incredible to the outrageously incredible. The "antimissile time warp," for example, [...]

[...] applied and interpreted correctly. Given that all ordinary explanations must be ruled out, the experimenter must take special precautions to ensure that sensory cues, recording errors, subject fraud, and other alternatives have been prevented. Although it is impossible to rule out completely every possible contaminant or to anticipate every alternative, there are reasonable standards that most parapsychologists would agree should be followed.

Because different research paradigms have their own special requirements, no single set of standards can be specified in advance for all parapsychological experiments.
Experiments with electronic number generators, for example, rarely have problems with data recording, but they do require special methods such as tests of randomness and attention to the immediate physical environment that are unnecessary with more traditional parapsychological experiments. One requirement for assessing the adequacy of a given experiment is that its procedures and methods of analysis be adequately documented. Unless we know how the targets were selected, how the results were analyzed, how the possibility of sensory leakage was prevented, and how other such aspects of the study were carried out, we have no basis for evaluating the quality of the information provided by the experiment.

GLOBAL CRITERIA

The criteria mentioned in the preceding paragraphs apply to the individual experiment. More global criteria come into play when one tries to evaluate an entire research program or set of experiments. Here we look for such things as replicability, robustness, lawfulness, manipulability, and coherent theory. These criteria deal with the coherence and intelligibility of the alleged phenomena. It is in terms of such global criteria that parapsychological research has been especially vulnerable.

Much of the objectivity involved in assessing the adequacy of research applies to judging individual experiments. But science is cumulative and depends not so much on the outcome of a single experiment as on consistent and lawful patterns of results across many experiments carried out in a variety of independent settings. Lawful consistency in this sense, according to both parapsychologists and their critics, has never been found in parapsychological investigations in the history of psychic research. Recently a few parapsychologists have expressed the hope that experiments on remote viewing, random number generators, and the Ganzfeld (the very ones we have chosen to examine in detail in this report) may actually yield the long-sought replicability. The type of replicability that has been claimed so far is the possibility of obtaining significant departures from the chance baseline in only a proportion of the experiments, which is a kind of replicability quite different from the consistent and lawful patterns of covariation found in other areas of inquiry.

Despite the fact that scientific progress in a given area depends on the accumulation of lawful and consistent patterns across many experiments, the methods for deciding that such consistency exists are still quite primitive in comparison with the standards for judging the adequacy of a single experiment. Indeed, it is only within the past few years that serious attention has been devoted to developing objective and standardized procedures for evaluating the consistencies across a body of independent studies. For the most part, judgment about what a body of investigations demonstrates is still a surprisingly intuitive and haphazard process. This probably has not been a serious drawback in those areas of inquiry in which the basic phenomena are robust and experiments can be conducted with high confidence that the predicted relations will be obtained; but such impressionistic means for aggregating the outcomes of several experiments in the domain of parapsychology open the door to all the motivational and cognitive biases discussed in the paper prepared for the committee by Griffin. Not only are the data and alleged correlations erratic and elusive in this field, but their very existence is open to question.

EVALUATION OF THE SCIENTIFIC EVIDENCE

To evaluate the best scientific evidence on the existence of psi, and with the advice of proponents and our sponsors, we conducted site visits to some of the most notable parapsychological laboratories. The parapsychology subcommittee (see Appendix C) visited Robert Jahn's Engineering Anomalies Research Laboratory at Princeton University, where it witnessed presentations and demonstrations regarding psychokinetic experiments on random number generators. Jahn and his associates also briefed the subcommittee on the current status of their work in remote viewing. The subcommittee also visited Helmut Schmidt's laboratory at the Mind Science Foundation, San Antonio, Texas. Schmidt pioneered the use of random number generators in parapsychology experiments in 1969. His is considered one of the two major research programs on psychokinesis (the second is Jahn's). As an additional possible input, the committee agreed to participate in a psychokinetic experiment of new design with Helmut Schmidt. Specifically, Schmidt accepted the suggestion that the committee's consultant, Paul Horwitz, be included in the conduct of the experiment. The work has not yet begun, however, and it now appears that we will not have any results to report before our terms expire.

The chair of the parapsychology subcommittee also visited SRI International, another major laboratory studying psychic effects on random number generators. (This latter research group argues that the observed effects are not due to psychokinesis but rather represent a special form of precognition.) The subcommittee chair also attended the meetings of the Parapsychological Association held at Sonoma State College in California. The entire committee made a site visit to Cleve Backster's laboratory in San Diego (arranged to coincide with the committee's meeting in La Jolla, California).

These site visits enabled the committee to observe firsthand the experimental arrangements and equipment used by some of the major contributors to parapsychological research. They also provided us an opportunity to discuss results, interpretations, and problems with a few important investigators. We were impressed with the sincerity and dedication of these investigators and believe that they are trying to conduct their research in the best scientific tradition. We also got the impression that this type of research involves many unresolved problems and still has a long way to go before it develops standardized, easily applicable procedures. The information obtained from these site visits does not provide an adequate basis for making scientific judgments. For these we rely, as we would in other fields of science, on a careful survey of the literature.

RESEARCH ON REMOTE VIEWING

The SRI Remote Viewing Program

Since the early 1970s, probably the best known research program in parapsychology has been the experiments in remote viewing initiated by physicists Harold Puthoff and Russell Targ when they were at SRI International. In a typical remote viewing experiment a subject, or percipient, remains in a room or laboratory with an experimenter, while a target team visits a randomly selected geographical site (e.g., a shopping mall, an outdoor arena, the Palo Alto airport, the Hoover tower). Neither the experimenter nor the subject has been given any information about the target. Once the experimenter and the subject are closeted in the laboratory, they wait for 30 minutes before the subject begins to describe his or her impressions of the target site.

Meanwhile the target team, consisting of two to four members of the SRI staff, obtains instructions for going to a randomly chosen target site from another SRI staff member. They then drive to the designated target site and remain there for an agreed-on 15-minute period (after allowing approximately 30 minutes to reach the site). During the time that the target team remains at the target site, the subject describes his or her impressions into a tape recorder and also makes any drawings that would help to clarify those impressions. When the target team returns to the laboratory, all the participants listen to the tape recording of the subject's impressions. Then all the participants go to the target site, where the subject is allowed to see how closely his or her impressions agreed with the actual target.

The first subject to participate in such a formal series of trials was the late Pat Price. In the first series, consisting of nine sessions, the duration of each session was 30 minutes. The transcript for each session is rich in detail; the one published transcript in Targ and Puthoff's first book runs to almost six printed pages (Targ and Puthoff, 1977).

Given such data, how does one decide if the experiment was a success? Did Price's descriptions, for example, convey correct knowledge of the different target sites? In fact, two methods have been used to demonstrate the effectiveness of remote viewing. One method is simply to compare the description with the target and make a judgment as to whether the correspondence is sufficient to claim a "hit." The second method uses an independent judge to rank the degree to which each description matches each site and then applies statistical tests to decide if the association is greater than chance.

Unprecedented success was claimed for the early remote viewing experiments in terms of both methods (Targ and Puthoff, 1974, 1977; Puthoff and Targ, 1976). Many examples were supplied of dramatic correspondences between impressions of the percipient and the physical details of the actual target. Such correspondences, no matter how dramatic and compelling, do not carry scientific weight, because it is impossible to assess their probabilities. In addition, much psychological research indicates how such subjective validation can create strong, but false, illusions of matching (see below).

The more formal evidence from the rankings of the independent judges was also impressive. The first formal series of nine trials resulted in seven of the transcripts being ranked 1 against their intended target sites by the independent judge. Only one such ranking would be expected by chance. Puthoff and Targ reported the probability of such an outcome being due to chance as only 0.0000029. The second formal series, using Hella Hammid, was equally impressive, producing five first places and four second places in the rankings of transcripts against target sites.

Although subsequent series by Targ and Puthoff, as well as by other investigators, have not always yielded such overwhelmingly impressive results, most of them have continued to display highly significant outcomes (Targ and Harary, 1984). On the surface, at least, this is a reliable, simple, and highly effective recipe for producing paranormal communication. Especially appealing is the claim that remote viewing works with just about everyone. Targ and Harary, for example, provide exercises for anyone who wants to develop and improve his or her ability to pick up information at remote sites. Neither space nor time, its proponents assert, is a barrier. The percipient can pick up information from the surface of Jupiter as well as from target sites that can be visited at some future time.
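The chapter does not say which statistical test produced the reported probability of 0.0000029, and the sketch below does not attempt to reproduce that figure. It illustrates one common way of scoring a judge's rankings, chosen here as an assumption: add up the ranks the transcripts received against their own targets and compute how often chance alone would give a sum that small. The chance model treats each of those ranks as an independent draw, uniform on 1 to n. That is exactly the independence of trials the committee questions later in this chapter. Under the same model the expected number of first places in n trials is n times 1/n, or 1, which is the "one such ranking" the text mentions. The function name is hypothetical.

```python
from fractions import Fraction

def p_rank_sum_at_most(observed_sum: int, n_trials: int, n_ranks: int) -> Fraction:
    """Exact chance probability that the correct transcripts' ranks sum to
    observed_sum or less.

    Assumption: each rank is an independent uniform draw from 1..n_ranks.
    """
    ways = {0: 1}  # ways[s] = number of rank sequences so far whose ranks sum to s
    for _ in range(n_trials):
        nxt = {}
        for s, count in ways.items():
            for r in range(1, n_ranks + 1):
                nxt[s + r] = nxt.get(s + r, 0) + count
        ways = nxt
    favorable = sum(count for s, count in ways.items() if s <= observed_sum)
    return Fraction(favorable, n_ranks ** n_trials)

# Second formal series as described in the text: nine trials, five ranks of 1 and four of 2.
hammid_ranks = [1] * 5 + [2] * 4
p = p_rank_sum_at_most(sum(hammid_ranks), n_trials=9, n_ranks=9)
expected_first_places = 9 * Fraction(1, 9)  # one first place expected by chance
print(f"rank sum = {sum(hammid_ranks)}, chance probability = {float(p):.2e}")
```

A count of first places and a sum of ranks use the judge's data differently, so they need not give the same probability. Neither statistic corrects for dependence between trials.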
Scientific Assessment of Remote Viewing

After the first remote viewing experiments were conducted in the early 1970s, many investigators throughout the world tried to follow suit. Most of them believed that their findings supported the claims of the SRI International researchers. The majority of these experiments, however, consisted of informal demonstrations rather than formal scientific experiments and relied solely on subjective matching. In the past 15 years, the number of formal experimental replications of the SRI remote viewing experiments has been surprisingly few.

Targ and Harary (1984) include as an appendix in their book a report by Hansen, Schlitz, and Tart that evaluates all the known remote viewing experiments conducted from 1973 through 1982. "In an examination of the twenty-eight formal published reports of attempted replications of remote viewing," write Targ and Harary, "Hansen, Schlitz, and Tart at the Institute for Parapsychology found that more than half of the papers reported successful outcomes." They concluded: "We have found that more than half (fifteen out of twenty-eight) of the published formal experiments have been successful, where only one in twenty would be expected by chance."

Two comments may be in order with respect to the foregoing conclusion. First, given the enormous publicity and the unusually strong claims, 28 formal experiments in 10 years seems surprisingly few. In comparison, the Ganzfeld psi experiments produced approximately twice as many formal experiments during the same interval. Second, 13 of the 28 formal experiments, or 46 percent, failed to claim successful outcomes. This rate of failure is much higher than what might have been expected on the basis of the earlier claims by Targ and Puthoff (1977), namely, that they had succeeded with every subject they had tried.

Even 15 successful outcomes out of 28 tries is impressive, especially by parapsychological standards.
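Taking the Hansen, Schlitz, and Tart figures at face value, the chance probability of 15 or more successes in 28 experiments is a simple binomial tail when each experiment has a 1-in-20 chance of success. The sketch assumes the experiments are independent and that each was tested at exactly the .05 level. Those assumptions, and not the arithmetic, are what the committee goes on to examine.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    This is the chance of k or more 'successful' experiments out of n when each
    succeeds by chance alone with probability p. Assumption: experiments are independent.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_experiments, n_successful, chance_rate = 28, 15, 0.05
expected_by_chance = n_experiments * chance_rate  # about 1.4 experiments
tail = binomial_upper_tail(n_successful, n_experiments, chance_rate)
print(f"expected by chance: {expected_by_chance:.1f}; P(>= {n_successful}) = {tail:.1e}")
```

The same function applies to the counts in the survey of random number generator experiments (71 of 332 experiments significant at the .05 level), with the same caveats. A tiny tail probability says only that chance is an unlikely explanation. It cannot distinguish psi from flawed procedures or selective reporting.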
An inspection of the listed studies, however, suggests that the 28 formal experiments vary considerably in their importance. Some of these "published formal experiments" appeared as brief reports or abstracts of papers delivered at meetings of the Parapsychological Association or similar organizations. Others appeared in print only as brief or informal reports in book chapters or letters to the editor. Altogether, 15 of the 28 were published under conditions that fall short of scientific acceptability. Only 13, or 46 percent, of the experiments were published under refereed auspices. As in other sciences, only published reports that have undergone peer review and are adequately documented can be considered seriously as part of the scientific data base.

Of the 13 scientifically reported experiments, 9 are classified as successful in their outcomes by Hansen et al. (Targ and Harary, 1984). Seven of these nine experiments were conducted by Targ and Puthoff at SRI International, the remaining two at other laboratories. This relatively small harvest of nine "successful" experiments suffers from the fact that each is seriously flawed.

A variety of problems afflicts the published reports on remote viewing. The documentation, even according to many parapsychologists, is seriously inadequate. Attempts by both neutral and skeptical investigators to gain access to the raw data have typically been thwarted or strongly resisted. Because the essence of scientific justification is public accessibility to the data, this relative inaccessibility suggests that much of the remote viewing data base is not part of science.

Most of the reasons for questioning the acceptability of the evidence for remote viewing lie in a methodological flaw that characterizes all but one of the experiments deemed successful: the successive trials are not independent of one another.
This lack of independence has unfortunate consequences for any attempt to draw conclusions about ESP based on the outcomes of such experiments. The concept of independence is technical and somewhat difficult to explain simply. It is, however, critical to understanding why the remote viewing experiments fail to make their case, so we supply an intuitive explanation.

Assume that we are considering a remote viewing experiment in which the subject participates in only two trials. In other words, we deal with two randomly chosen target sites. For the first trial, the target team goes to the first target site and remains there while the subject produces his or her first description. Immediately after this trial, the target team returns to the laboratory and takes the subject to the actual target site so that he or she and the others can gain a subjective impression of how closely the description corresponds with the target. For the second trial, the target team visits a second randomly chosen site. While they are visiting this site, the subject produces a second description.

When the experiment is over, the list of target sites (in random order) and the transcripts of the subject's descriptions are given to a judge, who also visits each site. While at a given site, the judge reads the two transcripts and ranks them in terms of how well each one corresponds with the particular site. In our example, one of the transcripts will be ranked 1 and the other will be ranked 2 (with 1 indicating the better correspondence between that target and the transcript). After visiting one site and doing this ranking, the judge then visits the second site and repeats the ranking procedure. The raw data can be set out in a matrix with the target sites as the columns and the transcripts as the rows.
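The bookkeeping just described can be sketched in a few lines. Each site's ranking is recorded as the judge's best-to-worst ordering of the transcripts, and the matrix puts transcripts on the rows and sites on the columns. For the analysis only, the sites are listed in the order in which they were visited. With that ordering, the diagonal entry is the rank each transcript received against its own target. The function names and the transcript and site labels are illustrative.

```python
def rank_matrix(orderings, transcripts, sites):
    """Return ranks[i][j], the rank (1 = best) the judge gave transcripts[i] at sites[j].

    orderings maps each site to the judge's best-to-worst list of transcript ids.
    """
    ranks = [[0] * len(sites) for _ in transcripts]
    for j, site in enumerate(sites):
        for rank, transcript in enumerate(orderings[site], start=1):
            ranks[transcripts.index(transcript)][j] = rank
    return ranks

def first_place_hits(ranks):
    """Count transcripts ranked 1 against their own site (the diagonal entries).

    Assumes sites are listed in visiting order, so transcript i belongs with site i.
    """
    return sum(1 for i in range(len(ranks)) if ranks[i][i] == 1)

# Two-trial example: T1 was produced while the team was at site A, T2 while at site B.
transcripts, sites = ["T1", "T2"], ["A", "B"]
judge = {"A": ["T1", "T2"], "B": ["T2", "T1"]}  # best-to-worst at each site
ranks = rank_matrix(judge, transcripts, sites)   # [[1, 2], [2, 1]]
print(ranks, first_place_hits(ranks) == len(sites))  # True means a perfect outcome
```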
A perfect outcome would be indicated if the transcript produced at the time the team was visiting site A was ranked 1 against that site, and the transcript produced when the team was visiting site B was ranked 1 for that site. (Of course, two trials would be too few to make an adequate statistical assessment of the success of the [...]

[...] example, that if he or she ranks the first transcript 1 for target A, then he or she will probably rank the second transcript 1 for target B. In effect, this lack of independence between trials means that, [...]

As we add sessions, this effect of immediate feedback should continue to make the correlation between the viewer's descriptions and the target sites better and better. No amount of editing for overt clues can overcome this defect of remote viewing experiments that follow the SRI pattern of dependent trials and immediate feedback.

The mechanism described by Hyman should result in some dramatic correspondences. These dramatic correspondences, in conjunction with subjective validation, are a highly potent recipe for creating the illusion (for both experimenters and subjects) that ESP has occurred.

Palmer (1985), a major parapsychologist who otherwise carefully considers the criticisms of parapsychology, misses the seriousness of this flaw. In mentioning Hyman's criticism, he writes (p. 50):

It has been suggested by Hyman (1979) that since the subjects in most cases received feedback of the correct target after each trial, the subject could have gained some advantage by avoiding to mention characteristics of targets in earlier trials in their responses in later trials. As noted by Targ, Puthoff, and May (1979), the target pool for the geographical-site experiments was sufficiently large and contained sufficient redundancy that this is unlikely to be a significant biasing factor.
Perhaps such complacency has enabled experimenters to continue conducting remote viewing experiments with this fatal flaw. In fact, the size of the target pool, no matter how large, does not affect the validity of Hyman and Kennedy's criticism. Nor does the claim that the pool contained sufficient redundancy make much difference. Each geographical site is unique and contains a combination of specific characteristics that distinguishes it from the other sites in a given series. Indeed, as the parapsychologists themselves have asserted, unless this were so, there would be no possibility of the transcripts' being uniquely associated with a given target site.

In every one of the remote viewing experiments that allows the possibility of subtle cueing, the possibility of the judges' being able to make completely successful matchings because of this artifact is highly plausible. As long as a highly plausible, normal alternative to ESP can account for the apparent success of the outcomes, the parapsychologists, by their own standards, cannot claim evidence for paranormal transmission of information. As it turns out, all but one of the nine scientifically reported studies of remote viewing (at the time of the Targ and Harary survey) suffer from the flaw of sensory cueing. The one experiment that cannot be faulted for this reason is the long-distance remote viewing experiment of Schlitz and Gruber (1980). However, as Hyman (1984-1985) has pointed out, this experiment suffers from another very serious flaw. Gruber, who was a member of the target team and thus was familiar with the targets, translated the subject's target descriptions into Italian for the judging process. Why the experimenters allowed such potential sources of biased experimental procedures is not known, but the violation obviously negates the results as evidence for psi.
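A toy simulation makes the point about target-pool size concrete. The model below is an illustrative assumption and does not reconstruct any actual experiment. Each site is a random set of abstract features. The viewer has no access at all to the current target, so every description is a random draw of features. The two conditions differ in only one respect. In one, the viewer has been shown each earlier site as feedback and avoids mentioning its features, which is the strategy Hyman suggested, as quoted by Palmer. Targets are generated fresh for every series, so there is in effect no finite target pool. Even so, a judge who matches on shared features does better than chance. All parameter values and function names are arbitrary.

```python
import random

def simulate_series(rng, n_trials=9, n_features=200, p_target=0.05,
                    k_desc=10, avoid_revealed=True):
    """Return the first-place hits in one simulated series.

    Toy model: the viewer receives no information about the current target.
    """
    # Assumption: each site is a random set of abstract features, drawn fresh per series.
    targets = [{f for f in range(n_features) if rng.random() < p_target}
               for _ in range(n_trials)]
    revealed = set()  # features of sites already shown to the viewer as feedback
    transcripts = []
    for t in range(n_trials):
        pool = [f for f in range(n_features)
                if not (avoid_revealed and f in revealed)]
        # The description is a random set of features and says nothing about targets[t].
        transcripts.append(set(rng.sample(pool, min(k_desc, len(pool)))))
        revealed |= targets[t]  # immediate feedback: the viewer is taken to the site
    hits = 0
    for j, site in enumerate(targets):
        # The judge at site j picks the transcript sharing the most features,
        # breaking ties at random.
        _, _, best = max((len(tr & site), rng.random(), i)
                         for i, tr in enumerate(transcripts))
        if best == j:
            hits += 1
    return hits

def hit_rate(n_series, avoid_revealed, n_trials=9, seed=0):
    """Fraction of sites whose own transcript was ranked first, over many series."""
    rng = random.Random(seed)
    total = sum(simulate_series(rng, n_trials=n_trials, avoid_revealed=avoid_revealed)
                for _ in range(n_series))
    return total / (n_series * n_trials)

plain = hit_rate(1000, avoid_revealed=False)
cued = hit_rate(1000, avoid_revealed=True)
print(f"chance {1 / 9:.3f}; no avoidance {plain:.3f}; avoids earlier sites {cued:.3f}")
```

In this model the gain comes from elimination. A later transcript rarely matches an earlier site, so the early sites face fewer real competitors. That is a property of dependent trials with feedback, not of how many sites the pool contains.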
Since the Targ and Harary survey, we have learned of two attempts to replicate the Schlitz and Gruber experiment without the flaw mentioned. One, still unpublished, produced negative results. The second, by Schlitz and Haight (1984), produced marginally significant results. Indeed, if the more acceptable two-tailed test of significance had been used, the results would not have been considered significant by customary standards. Although the report of this study lacks sufficient documentation with respect to certain aspects of procedure, both Palmer (1985) and Alcock agree that this is the best controlled and most methodologically sound of all the remote viewing experiments so far.

In summary, after approximately 15 years of claims and sometimes bitter controversy, the literature on remote viewing has managed to produce only one possibly successful experiment that is not seriously flawed in its methodology, and that one experiment provides only marginal evidence for the existence of ESP. By both scientific and parapsychological standards, then, the case for remote viewing is not just very weak, but virtually nonexistent. It seems that the preeminent position that remote viewing occupies in the minds of many proponents results from the highly exaggerated claims made for the early experiments. It also rests on the subjectively compelling, but illusory, correspondences that experimenters and participants find between components of the descriptions and the target sites.

RESEARCH ON RANDOM NUMBER GENERATORS

The Basic Paradigm

The use of random number (or random event) generators for parapsychological research began in the 1960s and became relatively standard during the 1970s as the technology became widely available. A random number generator (RNG) is simply an electronic device that uses either radioactive decay or electronic noise to generate a sequence of random symbols.
Originally such devices were used to test ESP, usually clairvoyance or precognition, but the most widespread and widely known work focuses on what is called micropsychokinesis, or micro-PK. In such research a subject, or operator, attempts to mentally bias the output of the random number generator so that it produces a nonrandom sequence.

Most of the work with RNGs has used binary generators, or what Schmidt calls "electronic coin flippers." The output on each trial is either 0 or 1, that is, heads or tails. If the RNG is unbiased and truly random, then on control runs it should produce sequences of 0s and 1s that are independent of each other. In the long run, such sequences will yield 1s 50 percent of the time.

In a typical experiment, a subject is placed in the vicinity of the RNG and attempts to bias the output either toward more or fewer 1s. The subject is either a person who claims to be psychic or a person chosen for availability who does not make such claims. When an animal is used as the subject, the RNG output is usually coupled to an outcome whose frequency the animal presumably would like to either increase or decrease. In an experiment carried out with cockroaches, for example, one outcome was electric shock. Suppose that, during the time the output of the RNG was coupled with the shock apparatus, the proportion of shocks decreased below 50 percent. This would be taken as evidence of a psychokinetic effect of the cockroach on the output of the RNG.

The RNG experiments have been of interest to some military and governmental personnel because of the possibility, if such micro-PK is demonstrable, of psychically affecting equipment and computers that depend on the output of electronic symbols.

Results of the Experiments

In a recent survey of 56 reports published between 1969 and 1984 and dealing with research on possible psychokinetic perturbations of binary RNGs (Radin, May, and Thomson, 1985), the reviewers counted 332 separate experiments.
Of the 332 experiments, 188 were reported in refereed journals or conference proceedings. Of these 188 experiments with some claim to scientific status, 58 reported statistically significant results (compared with the 9 or 10 experiments that would be expected by chance). The other 144 experiments were produced by the Engineering Anomalies Research Laboratory at Princeton University; none of them had been published in a refereed journal at the time of the survey. Of these 144 experiments, 13 were classified as yielding statistically significant results. So, in the total sample of 332 experiments, 71 yielded ostensibly significant results at the traditional .05 level. This amounts to a success rate of approximately 21 percent, compared with the rate of 5 percent that would be expected by chance.

Palmer (1985) and Alcock agree that such results cannot be accounted for by chance. In other words, both the parapsychologist and the skeptic, in their respective reviews of the RNG research, agree that something other than accidental fluctuation is producing these results. Palmer calls this something an anomaly, which, while it may or may not be paranormal, cannot be explained by current scientific theories. Alcock points to various defects in the experimental protocols. He concludes that no conclusions about the origins of these departures from randomness are justified until successful outcomes can be more or less consistently produced with adequately designed and executed experiments.

Both Palmer and Alcock focus their reviews on the two most influential research programs on RNGs. One is the program of Helmut Schmidt, a quantum physicist who began working on psi and RNGs in 1969. The other is the program begun by Robert Jahn in the late 1970s, when he was dean of the School of Engineering and Applied Science at Princeton University (see Jahn, 1982).
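For the two programs just named, the text reports average hit rates of about 50.5 percent for Schmidt and 50.02 percent over 78 million trials for Princeton. The sketch below uses the usual normal approximation to the binomial with a one-tailed test, which is our choice for illustration. It shows why such tiny deviations become statistically significant only with very large numbers of trials. It says nothing about whether the generators were adequately randomized, which is the committee's main reservation. The function names are hypothetical.

```python
from math import ceil, erfc, sqrt

def binary_rng_z(hits: int, n: int) -> float:
    """z statistic for `hits` in n binary trials against a 50 percent chance baseline.

    Normal approximation to the binomial: mean n/2, standard deviation sqrt(n)/2.
    """
    return (hits - n / 2) / (sqrt(n) / 2)

def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * erfc(z / sqrt(2))

def trials_needed(hit_rate: float, z_crit: float = 1.645) -> int:
    """Smallest n at which a steady hit_rate reaches z_crit (one-tailed .05 by default).

    This follows from z = 2 * (hit_rate - 0.5) * sqrt(n).
    """
    return ceil((z_crit / (2 * (hit_rate - 0.5))) ** 2)

n = 78_000_000
hits = round(0.5002 * n)  # 39,015,600 hits in the intended direction
z = binary_rng_z(hits, n)
print(f"78 million trials at 50.02%: z = {z:.2f}, one-tailed p = {one_tailed_p(z):.1e}")
print(f"10,000 trials at 50.02%: z = {binary_rng_z(5002, 10_000):.2f}")
print(f"trials needed at 50.5%: {trials_needed(0.505):,}; at 50.02%: {trials_needed(0.5002):,}")
```

The same per-trial rate that is overwhelming at 78 million trials is invisible at 10,000. A very small systematic bias in the apparatus or the protocol would therefore also reach significance at these sample sizes.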
These two programs have accounted for almost 60 percent of all known experiments on RNGs. They have also been the most consistently successful in achieving statistically significant outcomes.

The results suggest that on each experimental group of trials the number of 1s is greater or less than the 50 percent baseline (depending on the intended direction). Even so, the actual degree of deviation from chance is quite small. As Palmer (1985) indicates, Schmidt's subjects have averaged approximately 50.5 percent hits over the years, compared with the expected baseline of 50 percent. This amounts to producing one extra 1 every 100 trials. The reason such a small departure from chance is statistically significant is that an enormous number of trials is conducted with each subject.

Jahn and his colleagues at Princeton have, in a much shorter time, produced on the order of 200 times the number of trials that Schmidt did in 17 years. The Princeton researchers have also produced a significantly lower success rate than Schmidt. In their formal series of 78 million trials, the percentage of hits in the intended direction was only 50.02 percent, or an average of 2 extra hits every 2,500 trials. Again, such an extremely weak effect is statistically significant only when one is dealing with very large numbers of trials.

Scientific Assessment of the RNG Experiments

Palmer (1985) carefully reviews the major criticisms of the work of Schmidt and Jahn. He addresses questions about security, because subjects often are left alone with the apparatus during the data collection. In the Princeton experiments, the data are always collected when the subject is alone with the apparatus. The Princeton experiments now contain a number of features that would make it extremely difficult for a naive subject to bias the results, but it is not clear that this has always been so.
It would make good scientific sense to conduct some trials during which the subject is carefully monitored to see if successful outcomes are still obtained.

The major reservations about the RNG experiments concern the adequacy of the randomization of the outputs. Schmidt applied only limited tests for the randomness of his machines, and most of the control trials were gathered by allowing the machine to run for long periods, usually overnight. Although these controls usually produced results in line with the chance baseline, critics have pointed out that the controls are unsatisfactory because they were not conducted for shorter runs and at the same time as the data from the experimental sessions. Palmer grants that the critics are correct in pointing out some of the shortcomings in Schmidt's methods for testing and controlling for the randomization of his machines. Palmer also correctly points out that such criticism is somewhat blunted by the fact that the critics have not specified any plausible mechanisms that would account for the obtained differences between the experimental and control trials. He is correct in pointing out that the Princeton experiments provide more adequate controls; however, he has probably assumed that the baseline controls in the Princeton experiments were run at the same time as the two experimental conditions of hitting and missing. It is easy to interpret the somewhat ambiguous description of the procedure in this manner. The relevant part of the authors' methodological description is as follows (Nelson, Dunne, and Jahn, 1984:9):

The primary variable in these experiments is the operator's pre-recorded intention to shift the trial counts to higher or lower numbers. This directional intention may be the operator's choice, the so-called "volitional" mode, or it may be assigned by a specified random process, the "instructed" mode.
In either mode, data are collected in a "tri-polar" protocol, wherein trials taken under an intention to achieve high numbers (PK+), trials taken under an intention to achieve low numbers (PK-), and trials taken as baseline, i.e. under null intention (BL), are interspersed in some reasonable fashion, with all other operating conditions held identical. For all three streams of data, effect size is measured relative to the theoretical chance mean. This tri-polar protocol is the ultimate safeguard in precluding any artifacts such as residual electronic biases or transient environmental influences from systematically distorting the data.

At first glance it might appear as if the tripolar protocol requires that the two types of experimental groups of trials and the baseline group of trials always be taken at the same session. This would be consistent with the claim that "any artifacts such as residual electronic biases or transient environmental influences" were thereby precluded "from systematically distorting the data." Such a claim would be justified if, in fact, at each session one group of trials of each of the three types was obtained, provided that each group of trials was of the same length and that the order of the three types of trials was independently randomized for each session. The description provided by Nelson and his colleagues says nothing at all about the order in which the three conditions were conducted, and a careful reading indicates that the baseline data may not always have been obtained at the same sessions and under the same conditions as the experimental groups of trials. It is not clear what the authors mean by stating that the three types of trials "are interspersed in some reasonable fashion."
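The condition under which the tri-polar claim would hold can be written as a short scheduling routine plus an audit check. That condition is one equal-length block each of PK+, PK-, and BL trials per session, in an order randomized independently for each session. This is a hypothetical sketch. The function names, the block length, and the use of Python's `random` module as the randomizer are illustrative assumptions and do not describe the Princeton laboratory's procedure.

```python
import random

CONDITIONS = ("PK+", "PK-", "BL")  # high intention, low intention, baseline

def tripolar_schedule(n_sessions, trials_per_block, seed=None):
    """Build a schedule with one block per condition in every session.

    Each session's block order is shuffled independently, and every block
    has the same number of trials, so that slow drifts in the apparatus
    cannot systematically favor one condition.
    """
    rng = random.Random(seed)
    schedule = []
    for session in range(1, n_sessions + 1):
        order = list(CONDITIONS)
        rng.shuffle(order)  # fresh random order for this session only
        schedule.append([(session, condition, trials_per_block) for condition in order])
    return schedule

def is_balanced(schedule):
    """Audit: every session has each condition exactly once, all of equal length."""
    for blocks in schedule:
        conditions = sorted(condition for _, condition, _ in blocks)
        lengths = {n_trials for _, _, n_trials in blocks}
        if conditions != sorted(CONDITIONS) or len(lengths) != 1:
            return False
    return True

plan = tripolar_schedule(n_sessions=3, trials_per_block=1000, seed=7)
for blocks in plan:
    print([condition for _, condition, _ in blocks])
print(is_balanced(plan))
```

Run over a laboratory's actual session log, a check like `is_balanced` would reject any session that is missing a baseline block or whose PK+ and PK- blocks differ in length.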
In fact, an examination of the data reported for each subject makes it clear that the strict tripolar protocol could not possibly have been followed with much of the data collection, because in many cases the baseline data are entirely absent or occur with many fewer trials than the experimental data. Indeed, it is not even clear that PK+ and PK- trials were always obtained at the same sessions, because for some subjects the total numbers of these trials are not equal.

We suspect that, over the six years or so during which the Princeton group was accumulating its data base, it made many changes in both the hardware and the experimental protocol. The sophisticated procedures currently in use, and the requirement that the three types of trials be of equal length and that one of each be conducted at each session, are the most recent variations in the paradigm. Unfortunately, the data are not presented in such a way that it is possible to determine whether the successful results are due to the earlier or later experiments.

Such issues become especially important when we consider the extremely small size of the effect being claimed and when we further realize, as Palmer has pointed out, that the bulk of the significance in the formal series was due to just one subject, who contributed 23 percent of the total data. This one subject achieved a hit rate of 50.05 percent. When her data are eliminated, the remaining data yield a hit rate of 50.01 percent, which is no longer significantly different from chance. In other words, it looks as if almost all the success of Jahn's huge data base can be attributed to the results from one individual, who, over the years, produced almost 25 percent of the data. This one individual was not only the most experienced subject, but also, presumably, familiar with the equipment.
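The figures Palmer reports can be combined in a few lines. They show both why a hit rate near 50.02 percent is significant over 78 million trials and why the significance largely disappears without the one prolific subject. The sketch uses the normal approximation to the binomial and the rounded percentages quoted in the text (23 percent of the data at 50.05 percent, the rest at 50.01 percent), so its z values are approximate. It also assumes, as a simplification, that all trials are independent binary outcomes.

```python
from math import sqrt, erfc

def z_score(hit_rate, n_trials, p0=0.5):
    """Normal-approximation z for an observed hit proportion against chance p0."""
    return (hit_rate - p0) / sqrt(p0 * (1 - p0) / n_trials)

def one_sided_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2))

N_TOTAL = 78_000_000   # trials in the Princeton formal series, as reported
SHARE = 0.23           # fraction of the data from the single prolific subject
TOP_RATE = 0.5005      # her hit rate (rounded)
REST_RATE = 0.5001     # hit rate of all other subjects (rounded)

pooled = SHARE * TOP_RATE + (1 - SHARE) * REST_RATE   # weighted average, about 0.5002
z_all = z_score(pooled, N_TOTAL)
z_top = z_score(TOP_RATE, SHARE * N_TOTAL)
z_rest = z_score(REST_RATE, (1 - SHARE) * N_TOTAL)
z_small = z_score(pooled, 10_000)   # same tiny excess, far fewer trials

for label, z in [("all data", z_all), ("one subject", z_top),
                 ("remaining subjects", z_rest), ("10,000 trials", z_small)]:
    print(f"{label:>18}: z = {z:5.2f}, one-sided p = {one_sided_p(z):.4f}")
```

The last line illustrates the general point of this section. An excess of about 0.02 percent is highly significant over 78 million trials but indistinguishable from chance over 10,000.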
This concentration of the effect in a single experienced subject, when combined with the fact, as Palmer points out, that the Princeton experiments provide inadequate documentation on precautions to prevent tampering by subjects, makes it even more important to see if the same degree of success can be achieved when the sessions are adequately monitored.

Alcock, in his review of the same RNG studies surveyed by Palmer, points to a number of weaknesses in both the Schmidt and the Princeton experiments. For example, he faults Schmidt's experiments for such things as inadequate controls, failure to examine the target sequences, overcomplicated experimental setups, inadequate tests of randomness, and lack of methodological rigor. Alcock faults the Princeton experiments for such things as failing to randomize the sequence of groups of trials at each session, inadequate documentation on precautions against data tampering, and possibilities of data selection.

Palmer and Alcock do not really differ in their assessments of the shortcomings of the Schmidt and Princeton RNG experiments. They do differ, however, on what conclusions can be drawn from such imperfect experiments. Palmer emphasizes the fact that the critics have not provided plausible explanations as to how the admitted flaws could have caused the observed results. His position seems to be that, unless the critics can provide such plausible alternatives, the results should be accepted as demonstrating an anomaly. Alcock focuses on the fact that the successful results have been obtained under conditions that fall short of the experimental ideals that parapsychologists themselves profess. He emphasizes that the parapsychologists have no right to claim to have demonstrated psi from experiments that have been conducted with "dirty test tubes." Such a revolutionary conclusion as the existence of psi demands justification from experiments that have clearly used "clean test tubes."

What would it take to conduct an adequate RNG experiment?
May, Humphrey, and Hubbard (1980) set out to do just that. After reviewing all available RNG experiments from 1970 through 1979 and taking into account the various deficiencies in these experiments, they gathered together and meticulously tested the components necessary to provide adequately randomized trials. They also devised a careful experimental protocol and set out in advance the precise criteria that would have to be fulfilled before they could call their results successful. Going further, after they completed the experiment with results that met their criteria for success, they subjected their equipment to all sorts of physical extremes to see whether an artifact could produce such a degree of success. They report that this singularly well controlled RNG experiment in fact met their criteria for success.

It is unfortunate, therefore, that this carefully thought-out experiment was conducted only once. After the one successful series, using seven subjects, the equipment was dismantled, and the authors have no intention of trying to replicate it (personal communication, August 1986). It is unfortunate because this appears to be the only near-flawless RNG experiment known to us, and the results were just barely significant. Only two of the seven subjects produced significant results, and the test of overall significance for the total formal series yielded a probability of 0.029.

The experiment, while nearly flawless, still had some problems as evidence for psi. For one thing, it was reported only in a technical report in 1980 and has never been published in a refereed scientific journal. Despite the admirable attention to details, all the control trials were taken when no human being was present. One might argue that this was not an ideal control for the experimental session, in which a subject was physically present in the room.
The authors have assured us that their various attempts to bias the machine by physical means almost certainly rule out the possibility that the mere presence of a human being could have affected the output. However, a physicist who claims to have several years of experience in constructing and testing random number devices tells us that it is quite possible, under some circumstances, for the human body to act as an antenna and, as a result, possibly bias the output.

May and his colleagues at SRI, in the same technical report in which they claim successful results for their single experiment, surveyed all the RNG experiments known to them through the year 1979 and found that their combined significance was astronomically high. They add (May, Humphrey, and Hubbard, 1980:8):

This impressive statistic must, however, be evaluated with respect to experimental equipment and protocols. All the studies surveyed could be considered incomplete in at least one of the following four areas: (1) No control tests were reported in more than 44 percent of the references. Of those that did, most did not check for temporal stability of the random sources during the course of the experiment. (2) There were insufficient details about the physics and constructed parameters of the experimental apparatus to assess the possibility of environmental influences. (3) The raw data was not saved for later and independent analysis in virtually all of the experiments. (4) None of the experiments reported controlled and limited access to the experimental apparatus.

As far as we can tell, the same four points can be made with respect to the RNG experiments that have been conducted since 1980.
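Item (1) in the quoted list, like the earlier criticism of Schmidt's overnight controls, concerns whether a random source stays unbiased throughout an experiment rather than only on average. One simple check is to apply a frequency (monobit) test of the kind described in NIST SP 800-22 to short consecutive blocks of output. The sketch below is illustrative only. Python's pseudorandom `random.Random` stands in for a hardware source, and the "drifting" source and block size are hypothetical. A serious evaluation would use a full battery of randomness tests.

```python
import random
from math import sqrt, erfc

def monobit_p_value(bits):
    """Two-sided frequency (monobit) test: are 1s and 0s equally likely?"""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1 for every 1, -1 for every 0
    return erfc(abs(s) / sqrt(2 * n))

def blockwise_control(bits, block_size):
    """Test consecutive, non-overlapping blocks separately to expose drift over time."""
    return [monobit_p_value(bits[start:start + block_size])
            for start in range(0, len(bits) - block_size + 1, block_size)]

rng = random.Random(2024)                  # pseudorandom stand-in for a hardware source
control_bits = [rng.getrandbits(1) for _ in range(10_000)]

# Hypothetical faulty source: unbiased for 5,000 bits, then 60 percent 1s.
drifting_bits = control_bits[:5_000] + [int(rng.random() < 0.6) for _ in range(5_000)]

control_p = blockwise_control(control_bits, block_size=1_000)
drift_p = blockwise_control(drifting_bits, block_size=1_000)
print("stable source  :", [round(p, 3) for p in control_p])
print("drifting source:", [round(p, 3) for p in drift_p])
```

For the stable source, the block p-values should scatter between 0 and 1. For the drifting source, the later blocks fail, which shows when the bias began. A single long control run reports only an average.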
The situation for the RNG experiments thus seems to be the same as that for remote viewing: over a period of approximately 15 years of research, only one successful experiment can be found that appears to meet most of the minimal criteria of scientific acceptability, and that one successful experiment yielded results that are just marginally significant.

RESEARCH ON THE GANZFELD

The Ganzfeld Experiments

The Ganzfeld psi experiments are named after the term used by Gestalt psychologists to designate the entire visual field. For theoretical purposes, the Gestalt psychologists wanted to create a situation in which the subject or observer could view a homogeneous visual field, one with no imperfections or boundaries. Psychologists later discovered that when individuals are put into a Ganzfeld situation they tend quickly to experience what they describe as an altered state of mind. In the early 1970s, some parapsychologists decided that the use of the Ganzfeld would provide a relatively safe and easy way to create an altered state in their experimental subjects. They believed that such a state was more conducive to picking up the elusive psi signals.

In a typical psi Ganzfeld experiment, the subject, or percipient, has halved ping-pong balls taped over the eyes. The subject then reclines in a comfortable chair while white noise plays through earphones attached to his or her head. A bright light shines in front of the subject's face. When seen through the translucent ping-pong balls, the light is experienced as a homogeneous, foglike field. When so prepared, almost all subjects report experiencing a pleasant, altered state within 15 minutes. While one experimenter is preparing the subject for the Ganzfeld state, a second experimenter randomly selects a target pool from a large set. The target pool typically consists of four possible targets, usually reproductions of paintings or pictures of travel scenes.
One of the four is chosen at random to be the target for that trial. The target is given to an agent, or sender, who tries to communicate its substance psychically to the subject in the Ganzfeld state. After a designated period, the subject is removed from the Ganzfeld state and presented with the four candidates from the target pool. The subject then ranks the four candidates in terms of how well each matched the experience of the Ganzfeld period. If the actual target is ranked first, the trial is designated a hit. An actual experiment consists of several trials. In this example, chance alone would be expected to produce a hit on one of every four trials. If the number of hits significantly exceeds the expected 25 percent, then the result is considered to be evidence for the existence of psi.

Critique of the Ganzfeld Experiments

In a careful and systematic review of the Ganzfeld experiments undertaken in 1981 and published in the March 1985 issue of the Journal of Parapsychology, Hyman concluded that the data base exhibited flaws involving multiple testing, inadequate controls for sensory leakage, inadequate randomization, statistical errors, and inadequate documentation. These flaws, in his opinion, were sufficient to disqualify the Ganzfeld data base as evidence for psi. Of the 42 experiments, 39 (93 percent) used multiple analyses, which artificially inflated the chances of obtaining significant outcomes. Only 11 (26 percent) clearly indicated that they had adequately randomized the target selections. As many as 15 (36 percent) used inferior randomization, such as hand shuffling, or no randomization at all. The remaining 16 experiments did not supply sufficient information on how they had chosen the targets.
As many as 23 of the experiments (55 percent) used only one target pool. This means that the subject was handed for judging not a copy of the target but the very same target that the sender had handled, permitting the possibility of sensory cueing. Although the argument for psi is mainly a statistical one, the reports of 12 experiments (29 percent) revealed statistical errors. A number of other departures from optimal practice were also found.

The same issue of the Journal of Parapsychology contained a lengthy rebuttal by parapsychologist Charles Honorton, one of the pioneers of the Ganzfeld psi technique. Honorton disputed many of Hyman's opinions as to what constituted flaws; provided a reanalysis of the data base to overcome many of the statistical weaknesses of the original experiments; and argued that the flaws he agreed existed were not sufficient to have accounted for the findings. In this respect his analysis is consistent with Palmer's approach. He does not deny that the experiments depart from optimal design, but he argues that such departures are insufficient to account for the results.

Honorton and Hyman had the opportunity to discuss their differences about psi in general at the Parapsychological Association meetings in 1986; as a result, they agreed to draft a joint communiqué to emphasize those points on which they agree. That communiqué appeared in the December issue of the Journal of Parapsychology (Hyman and Honorton, 1986). They agree that the current data base is insufficient to support either the conclusion that psi exists or the conclusion that the results are due to artifacts. They further agree that the issue can be settled only by future experiments conducted according to the stated standards of parapsychology, which are also the accepted standards of psychological research.
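The Ganzfeld scoring rule scores a hit when the target is ranked first, so chance alone gives a 25 percent hit rate. That rule leads to a simple exact binomial test, and Hyman's objection to multiple analyses can be quantified in the same terms. In the sketch below, the trial counts in the example are hypothetical. The familywise formula assumes the analyses are independent; correlated analyses inflate the error rate less, but they still inflate it.

```python
from math import comb

def ganzfeld_p_value(hits, trials, p_chance=0.25):
    """Exact one-sided binomial P(X >= hits) when each trial has a 1-in-4 chance of a hit."""
    return sum(comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
               for k in range(hits, trials + 1))

def familywise_false_positive_rate(n_analyses, alpha=0.05):
    """Chance that at least one of n independent analyses reaches alpha when no effect exists."""
    return 1 - (1 - alpha) ** n_analyses

# Hypothetical experiment: 13 hits in 30 trials (7.5 expected by chance).
print(f"P(>= 13 hits in 30) = {ganzfeld_p_value(13, 30):.3f}")

# If several scoring methods are tried and any success is reported:
for n in (1, 3, 5):
    print(n, round(familywise_false_positive_rate(n), 3))
```

With five independent analyses, the probability of at least one nominally significant result under pure chance rises from 5 percent to about 23 percent. This is the sense in which multiple analyses "artificially inflated the chances of obtaining significant outcomes."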
Another important input to the committee's judgment on the Ganzfeld research was the systematic evaluation of the contemporary parapsychological literature by Charles Akers (1984), a former parapsychologist. Akers's critique used a methodological strategy different from that used by Hyman. Hyman undertook to evaluate the entire data base of a single research paradigm (Ganzfeld), including both successful and unsuccessful outcomes. Akers surveyed contemporary ESP experiments broadly, but confined his evaluation to those that had produced significant results with unselected subjects. Hyman assigned flaws to experiments without regard to whether each flaw, by itself, could have caused the observed outcome. Akers charged a flaw to a study only if he thought the flaw could have been sufficient to produce the observed result. He chose a sample of 54 parapsychological experiments from areas of research that had been previously reviewed by Honorton or Palmer; his intent was to choose experiments that could be viewed as the best current evidence for the existence of psi. As a result of this exercise, he concluded (Akers, 1984:160-161):

Results from the 54-experiment survey have demonstrated that there are many alternative explanations for ESP phenomena; the choice is not simply between psi and experimenter fraud.... The numbers of experiments ... flawed on various grounds were as follows: randomization failures (13), sensory leakage (22), subject cheating (12), recording errors (10), classification or scoring errors (9), statistical errors (12), reporting failures (10).... All told, 85% of the experiments were considered flawed (46/54). This leaves eight experiments where no flaws were assigned.... Although none of these experiments has a glaring weakness, this does not mean that they are especially strong in either their methods or their results....
In conclusion, eight experiments were conducted with reasonable care, but none of these could be considered as methodologically ideal. When all 54 experiments are considered, it can be stated that the research methods are too weak to establish the existence of a paranormal phenomenon.

RESEARCH ON ELECTRICAL ACTIVITY AND EMOTIONAL STATES

The Backster Laboratory

In addition to examining parapsychological research in areas that have produced large literatures, the committee witnessed an example of experimental work at a far less developed stage. On February 10, 1986, committee members visited the Backster Research Foundation in San Diego and saw a demonstration of experimental procedures for detecting a correlation between the electrical activity of oral leukocytes and the emotional states of the donor.

Cleve Backster is a polygraph specialist who had at one time helped develop interrogation techniques for the Central Intelligence Agency and now runs his own polygraph school in San Diego. The school is housed in the same rooms that constitute the Backster Research Foundation, which is devoted to the study of what Backster refers to as primary perception. Backster's research on paranormal matters began in February 1966, when he recorded, from a philodendron plant that he had hooked up to a polygraph, a response he recognized as similar to that of human beings in emotional states. Backster believed he had demonstrated that the plant showed such emotional response when brine shrimp or other living organisms were either threatened or actually killed in an adjoining room. The notion of primary perception in plants became both a popular subject for research and a highly controversial concept during the late 1960s and early 1970s. We were told that Backster has quietly continued his research into this and related matters.
He has now devised a technique for recording electrical activity in leukocytes taken from a donor's mouth. The advantage of this technique, we were told, is that the leukocytes respond mostly to emotional states of the donor.

One committee member volunteered to be the demonstration subject. Another member accompanied him to observe the techniques for obtaining the leukocytes and preparing them for recording. The sample was obtained by having the subject "chew" on a 1.2 percent saline solution and then spit it back into a centrifuge tube. Ten such samples were obtained in this way. The samples were then spun in a centrifuge for six minutes, and the particulate matter at the bottom of each tube was pipetted into the preparation tube. The preparation tube contained about one centimeter of particulate matter and was filled almost to the top with 1.2 percent saline solution. Two uninsulated wire electrodes were inserted into the bottom of the tube, which was then placed within a shielded cage and connected by leads to an EEG-type recording apparatus.

During the demonstration, the subject sat approximately two meters from the preparation. We were told that subjects usually sit about five meters from the preparation. A split-screen projection video display was provided: the lower portion of the screen recorded the movements of the polygraph paper and pen as they produced a record of the electrical activity presumably taking place in the leukocyte preparation. The upper portion of the screen recorded the behavior of the seated subject. In his previous research using this arrangement, Backster reported that when the subject revealed an emotional reaction, the electrical activity of the leukocytes showed a corresponding reaction.
During our demonstration, the polygraph record produced several strong deflections in both the control and the experimental series, but they did not obviously correlate with any corresponding thoughts or emotional states of the subject as various stimuli were presented. Backster suggested that this was probably because so many people were crowded into the laboratory that the leukocytes were responding to thoughts and feelings of other individuals in the room. Thus, a demonstration of results, as opposed to techniques, was not, after all, going to be possible during our visit.

Backster then showed us videotapes of the split-screen results he had obtained in his "formal" experiments. The results consisted of 12 examples of apparent correlations between an emotional response and a deflection of the polygraph record. The 12 examples came from 7 sessions with 7 different subjects. Although the information is not given in his written report, it appears that each session lasted for approximately half an hour. During this time, the donor is engaged in conversation or watches videotapes of television programs. The sessions are not standardized or planned. Backster's intent, apparently, is to elicit spontaneous emotional responses from a subject during the session. He believes that a stimulus that evokes an emotional response in one subject will not necessarily do so in another subject.

In one example, the subject was a young man who was looking at an issue of Playboy magazine. The polygraph tracing began to display large deflections soon after he encountered a nude photograph of an attractive young woman. The large deflections continued for approximately two minutes; the tracing slowly settled down to normal activity after the magazine was closed. Soon after, the young man reached for the closed magazine, and the record reveals a single deflection at that point. In another example, the subject was a retired police lieutenant.
When discussing his approaching retirement, he was asked a question about his wife's attitude toward having him "underfoot." A large deflection of the polygraph tracing occurred soon after this question was asked. When asked, the donor confirmed that he was emotionally aroused at that moment in the session (see Backster and White, 1985).

Cleve Backster and his supporters apparently believe that he has successfully demonstrated that detached oral leukocytes respond to the emotions of their donor even when separated by as much as several miles. They also believe that these results are reliable and replicable.

Critique of the Backster Experiment

What we have read and observed about Backster's procedures does not justify the claim he is making. His answers to our questions made it clear that he has not considered using the appropriate controls needed to ensure that the obtained "correlations" are real and due to the causes he has assumed. To make adequate physiological recordings from a preparation of in vitro leukocytes and to demonstrate the correlation between emotional response and leukocyte activity requires experimental arrangements and procedures at a level of sophistication well beyond those we observed.

Committee members who are knowledgeable about the procedures and instrumentation of psychophysiological experiments expressed doubts about the adequacy of the setup to perform the tasks Backster has undertaken. Serious doubts were expressed as to whether the leukocytes were still alive at the time of recording. Further doubts were expressed about the setup's ability to avoid contamination of the recording procedures by stray influences of various sorts. We do not discuss these drawbacks in detail here. We confine our discussion to Backster's methods for establishing a correlation between the alleged activity of the detached leukocytes and the emotional state of the donor.
When we consider how the existence of such correlations was established, we again see how inappropriate methodology can lead to very misleading conclusions. Many problems exist with regard to Backster's procedures for detecting correlations. Backster is trying to demonstrate a pattern of covariation between two records of behavior over time. One record is the tracing of amplified electrical activity coming from the electrodes and through the leads. Although this tracing can be quantified, Backster has apparently made no attempt to do so. Instead, he has relied on visual inspection of the polygraph record to pick out points at which the deflections of the pen from the baseline are noticeable. Although such subjective judgment is scientifically unacceptable, the deflections that he uses in his examples seem sufficiently marked that they probably can be considered to be real deviations from the baseline. At any rate, let us assume that responses on the polygraph record can be visually pinpointed with reasonable objectivity.

The deflections on the polygraph record are then compared with happenings on the concurrent videotaping of the conversation with the subject. Here we encounter very serious problems as to what constitutes an emotional response on this behavioral record. Backster believes he can identify categories of potentially emotionally arousing stimuli in the nonstandardized, qualitative, ongoing record of conversation. He then determines whether the subject was experiencing an emotional reaction to such a stimulus by simply replaying the record, pointing to the segment that corresponds to a place where the polygraph showed a deflection, and asking the subject if he or she recalls what was taking place at that moment as an emotionally arousing experience. If the subject agrees, this is said to confirm a "correlation" between the emotional state and the corresponding activity of the tracing.
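The tracing could be quantified rather than read by eye. As a purely hypothetical illustration of an objective, pre-specified rule, the sketch below flags samples of a digitized trace that depart from a trailing-median baseline by more than k robust standard deviations (the median absolute deviation scaled by 1.4826). The window length, the threshold k, and the digitized-trace format are assumptions. Nothing here describes Backster's equipment or method.

```python
from statistics import median

def detect_deflections(trace, window=50, k=4.0):
    """Return indices where the trace departs sharply from its recent baseline.

    trace  : sequence of digitized amplitude samples (hypothetical format)
    window : number of preceding samples used to estimate the baseline
    k      : threshold in robust standard deviations, fixed before looking at data
    """
    flagged = []
    for i in range(window, len(trace)):
        baseline = trace[i - window:i]
        center = median(baseline)
        mad = median(abs(x - center) for x in baseline)
        robust_sd = 1.4826 * mad or 1e-12   # guard against a perfectly flat baseline
        if abs(trace[i] - center) / robust_sd > k:
            flagged.append(i)
    return flagged

# Synthetic trace: small alternating noise with one large pen excursion at sample 150.
trace = [float(i % 2) for i in range(200)]
trace[150] = 20.0
print(detect_deflections(trace))  # [150]
```

The rule becomes objective only if the window and threshold are fixed before any session is examined and applied identically to control and experimental recordings.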
Backster's purely subjective determination of an emotional response opens the process to a variety of known biases, many of them discussed in the paper prepared for the committee by Griffin (Appendix B). The literature on "illusory correlation" (Alloy and Tabachnik, 1984; Griffin paper) makes it clear how subjective expectations and cognitive biases can lead to false impressions of correlation. Backster's method of searching for correlations compounds these inevitable biases. He does not independently determine moments of emotional response in the subject's behavioral record and moments of polygraph deflections and then look for a match between the two. Instead, he apparently looks for polygraph deflections and then tries to determine if an emotional response can be found that occurred in the vicinity of the polygraph activity. In other words, the determination of the emotional response is done with full knowledge of the fact that a polygraph deflection has occurred. Under such circumstances, we would expect processes of subjective validation to operate. In addition, the method of verifying the emotional response, by asking the subject to acknowledge that he or she was in fact experiencing such a state at the moment the polygraph record indicated a leukocyte response, is itself suspect. This is the sort of circumstance in which demand characteristics (i.e., responses determined by the presumed intent of the experimenters) are known to operate.

Good science dictates that the moments of emotional response should be determined independently of the moments of polygraph response. Both the experimenter and the subject must be blind to the polygraph record when determining the moments of emotional response. Only when the determination of events on the two records has been made independently of each other can the records be compared to determine if the emotional responses and the polygraph activity are correlated.
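The blind procedure described above can be made concrete by dividing each session into fixed time bins and classifying every bin on two independently produced records. The first record shows whether raters who never saw the polygraph marked an emotional response. The second shows whether an objective rule, applied without the video, marked a deflection. The function below is a hypothetical sketch. The bin length and the input format (event times in seconds) are assumptions, and in practice the two lists would come from separate, blind coding passes.

```python
def contingency_counts(emotion_times, deflection_times, session_seconds, bin_seconds):
    """Cross-classify time bins by emotional response and polygraph deflection.

    emotion_times    : event times (s) coded from the video by raters blind to the polygraph
    deflection_times : event times (s) found on the tracing without viewing the video
    Returns (a, b, c, d):
        a = bins with both, b = emotion only, c = deflection only, d = neither.
    """
    n_bins = int(session_seconds // bin_seconds)
    limit = n_bins * bin_seconds
    emotion_bins = {int(t // bin_seconds) for t in emotion_times if 0 <= t < limit}
    deflection_bins = {int(t // bin_seconds) for t in deflection_times if 0 <= t < limit}
    a = len(emotion_bins & deflection_bins)
    b = len(emotion_bins - deflection_bins)
    c = len(deflection_bins - emotion_bins)
    d = n_bins - a - b - c
    return a, b, c, d

# Hypothetical 5-minute session split into 30-second bins.
print(contingency_counts([5, 65, 130], [7, 200], session_seconds=300, bin_seconds=30))
# (1, 2, 1, 6)
```

If a polygraph response is expected to follow an emotional response with some delay, one could shift the deflection times by a lag fixed in advance before binning.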
Illusory correlations occur because our subjective judgments of covariation tend to use only a portion of the relevant information and because we tend to bias observed events in terms of our expectations. In particular, intuitive judgments of covariation tend to focus only on the co-occurrence of the treatment of interest and successful outcomes, ignoring times when the treatment co-occurred with unsuccessful outcomes. Backster uses only those examples from his records in which an emotional response co-occurs with a polygraph deflection; the 12 such examples from the 7 experimental series represent a very small fraction of the total data collected. Not only is a sample of just 12 co-occurrences probably too small for estimating whether a true correlation exists, but it is also impossible from this information alone to estimate whether any correlation exists. All the data are needed for this purpose.

Almost certainly, more than 12 polygraph deflections must have appeared in the total record. In the brief demonstration for the committee, both the control and the experimental series yielded several deflections, so it is reasonable to assume that many more than 12 deflections were obtained in the complete record. It is likely that these unreported deflections were not preceded by any emotional responses. Almost certainly, more than 12 emotional responses must have appeared in the total record. The point of conducting the sessions was to expose the subjects to a variety of emotional stimuli; therefore, it is essential to know the number of times that emotional responses occurred without the corresponding occurrence of polygraph responses. Finally, to determine correlation, it is essential to know the frequency of co-occurrence of the absence of emotional responses and the absence of polygraph responses. All this information is needed to determine whether the claimed correlation exists. All the data must be used.
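Given all four frequencies listed above, the comparison can be carried out with a standard test for a 2 x 2 table. The sketch below uses Fisher's exact test, which suits small counts; the text itself does not name a test. In the first example, the co-occurrence cell is set to the 12 cases Backster reported. The other three cells are hypothetical numbers, chosen so that deflections are equally common with and without an emotional response.

```python
from math import comb

def deflection_rates(a, b, c, d):
    """P(deflection | emotional response) and P(deflection | no emotional response)."""
    return a / (a + b), c / (c + d)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: emotional response yes / no.  Columns: polygraph deflection yes / no.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)

# 12 co-occurrences, but deflections are just as common without an emotional response.
print(deflection_rates(12, 28, 48, 112), fisher_exact_two_sided(12, 28, 48, 112))
# A strongly associated (hypothetical) table for comparison.
print(deflection_rates(12, 0, 0, 12), fisher_exact_two_sided(12, 0, 0, 12))
```

In the first table, the 12 co-occurrences are exactly what no association at all would produce. The count in one cell means nothing without the other three.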
From these data, one can compare the proportion of times that an emotional response is followed by a polygraph response with the proportion of times that the absence of an emotional response is followed by a polygraph response. Only if these two proportions are significantly different from one another can we assume that the data provide evidence for a correlation between emotional response and leukocyte activity. The fact that Backster was able to find 12 examples of co-occurrence between emotional response and polygraph deflection, even if these correspondences had come from double-blind matching, provides us with absolutely no information about whether a correlation exists.
The stronger claim would be, of course, not that a correlation exists, but that a causal connection exists between the subject's emotional states and the responses of the detached leukocytes. As Chapter 3 on evaluation indicates, such a causal explanation requires much more than the demonstration of a correlation between two series. Because Backster did not use double-blind procedures to determine emotional responses, and because the procedures he did use are known to be just those that facilitate a variety of subjective biases, he may well have obtained a correlation between his two series. However, his procedures for finding such correlations are sufficiently flawed that we do not know whether the suspected (and presumably biased) correlation actually exists in his data. The Backster experiment indicates that the best intentions, combined with scientific instrumentation and polygraphic records, cannot in themselves guarantee data of scientific quality.
DISCUSSION OF THE SCIENTIFIC EVIDENCE
Both the parapsychologists cited in this report and the critics of parapsychology believe that the best contemporary experiments in parapsychology fall short of acceptable methodological standards.
The critics conclude that such data, based on methodologically flawed procedures, cannot justify any conclusions about psi. The parapsychologists argue that, while each experiment is individually flawed, the experiments taken together justify the conclusion that psi exists. Palmer's conclusion in this regard is unique. Although he agrees that the data do not justify the conclusion that a paranormal phenomenon has been demonstrated, he argues that the data, with all their drawbacks, do justify the conclusion that an anomaly of some sort has been demonstrated. It is this purported demonstration of an anomaly that, according to Palmer, further justifies the claim that parapsychologists do have a subject matter. The awkward aspect of Palmer's position is that, without an adequate theory, there is no way to know that the anomaly "demonstrated" in one experiment is the same anomaly "demonstrated" in another; indeed, there is no limit to the possible causes of the anomaly in a given experiment. Without an adequate theory, there is no reason to assume that the various anomalies constitute a coherent or intelligibly related class of phenomena.
The committee distinguishes among three types of criticism that can be leveled at a given parapsychological finding. The first is what we might refer to as the smoking gun. This type of criticism asserts or strongly implies that the observed findings were due not to psi but to factor X. Such a claim puts the burden of proof on the critic. To back up such a claim, the critic must provide evidence that the results were in fact caused by X. Many of the bitterly contested feuds between critics and proponents have resulted from the proponent's assuming, correctly or incorrectly, that this type of criticism was being made.
The second type of criticism can be referred to as the plausible alternative.
In this case, the critic does not assert that the result was due to factor X, but instead asserts that the result could have been due to factor X. Such a stance also places a burden on the critic, but one not so stringent as the smoking gun assertion. The critic now has to make a plausible case that factor X was sufficient to have caused the result. For example, optional stopping of an experiment on the part of a subject can bias the results, but the bias is a small one; it would be a mistake to assert that an outcome was due to optional stopping if the probability of the outcome is extremely low. Akers's critique, discussed previously, is an example based on the plausible alternative.
The third type of criticism is what we have called the dirty test tube. In this case, the critic does not claim that the results have been produced by some artifact, but instead points out that the results have been obtained under conditions that fail to meet generally accepted standards. The gist of this type of criticism is that test tubes should be clean when doing careful and important scientific research. To the extent that the test tubes were dirty, it is suggested that the experiment was not carried out according to acceptable standards. Consequently, the results remain suspect even though the critic cannot demonstrate that the dirt in the test tubes was sufficient to have produced the outcome. Hyman's critique of the Ganzfeld psi research and Alcock's paper on remote viewing and random number generator research are examples of this type of criticism. In the committee's view, it is in this latter sense, the dirty test tube sense, that the best parapsychological experiments fall short.
We do not have a smoking gun, nor have we demonstrated a plausible alternative; but we imagine that even the parapsychological community must be concerned that its best experiments still fall far short of the methodological adequacy that parapsychologists themselves profess.
Honorton and Hyman differ on whether to assign a flaw in randomization to a particular series of experiments. With Honorton's assignment, the studies with adequate randomization do not differ in significance of outcome from those with inadequate randomization. With Hyman's assignment, the experiments with inadequate randomization have significantly more successful outcomes than do those with adequate randomization. A simple disagreement about one experiment can thus make a huge difference as to whether we conclude that this flaw did or did not contribute to the observed outcomes. Several similar examples could be cited to illustrate the extreme sensitivity of this data base to slight changes in flaw assignments.
Even if Palmer is correct in asserting that in a particular case an anomaly has been demonstrated, serious problems remain. In astronomy and other sciences, an anomaly is a very precise and specifiable departure from a well-defined theoretical expectation. Neptune was discovered, for example, when Leverrier was able to specify not only that the orbit of Uranus departed from that expected by Newtonian theory, but also precisely in what way it departed from expectation. Nothing approaching such a specifiable anomaly has been claimed for parapsychology. A vague and unspecifiable departure from chance is a far cry from a well-described and systematic departure from a precise theoretical equation. Leverrier's anomaly was consistent with only a very narrow range of possibilities. The sort of anomaly claimed for parapsychology is currently consistent with an almost infinite variety of possibilities, including artifacts of various kinds.
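The fragility exposed by the Honorton and Hyman disagreement over a single flaw assignment can be illustrated with a minimal Python sketch. The counts are hypothetical and do not reproduce the ganzfeld data base, and the one-sided Fisher exact test is our choice for the example, not the analysis either author reported; the function name `fisher_one_sided` is ours. Two raters classify the same 20 hypothetical studies, 10 of them successful, and differ on one successful study.

```python
from math import comb


def fisher_one_sided(flawed_hits: int, flawed_n: int,
                     clean_hits: int, clean_n: int) -> float:
    """One-sided Fisher exact p-value that flawed studies succeed more often.

    With the group sizes and the total number of successes held fixed, the
    number of successes in the flawed group is hypergeometric; the p-value
    sums the probabilities of all tables at least as extreme as the observed one.
    """
    total = flawed_n + clean_n
    hits = flawed_hits + clean_hits
    upper = min(hits, flawed_n)
    tail = sum(comb(hits, k) * comb(total - hits, flawed_n - k)
               for k in range(flawed_hits, upper + 1))
    return tail / comb(total, flawed_n)


# Hypothetical rater A: 10 studies judged inadequately randomized, 7 successful.
p_rater_a = fisher_one_sided(flawed_hits=7, flawed_n=10, clean_hits=3, clean_n=10)
# Hypothetical rater B also flags one successful study that A judged adequate.
p_rater_b = fisher_one_sided(flawed_hits=8, flawed_n=11, clean_hits=2, clean_n=9)

print(f"rater A: p = {p_rater_a:.3f}")  # prints "rater A: p = 0.089"
print(f"rater B: p = {p_rater_b:.3f}")  # prints "rater B: p = 0.035"
```

With these invented counts, reassigning one study moves the one-sided p-value from about 0.089 to about 0.035, across the conventional 0.05 threshold. When a data base is small and a comparison sits near the margin, a single judgment call can decide whether a flaw appears to matter.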
THE PROBLEM OF QUALITATIVE EVIDENCE
The committee continually encountered the distinction between qualitative and quantitative evidence for the existence of paranormal phenomena. Many proponents of the paranormal acknowledge such a difference in one way or another. Some realize that only quantitative evidence will convince the scientific community. Although they themselves have relied on qualitative evidence for their own beliefs, they refer us to the RNG experiments of Robert Jahn or the remote viewing experiments at SRI as examples of supporting quantitative data. Most proponents seem impatient with the request for scientific evidence. They have been convinced through their own experiences or the vivid testimonies of individuals whom they trust. Many argue that qualitative evidence can be as good as quantitative evidence; indeed, they claim that in some circumstances it can be better.
The arguments for the superiority of qualitative evidence are based in many cases on such factors as ecological validity, conducive atmosphere, and holism. The ecological validity argument asserts that the artificial conditions required for laboratory experiments are so different from the natural settings in which paranormal phenomena typically occur that findings from such controlled studies are irrelevant. By removing the psychic from his or her natural domain or by arranging conditions to suit the needs of scientific observation, it is claimed, the scientist destroys the very phenomenon under question. The ecological validity argument is closely related to the other arguments. Proponents who emphasize the conducive atmosphere assert that the austere conditions of strict laboratory procedure create an atmosphere that is numbing or inimical to psychic functioning. Those who emphasize holism point out that experimental procedures necessarily dissect and focus on restricted portions of a system.
Such compartmentalization, it is claimed, makes it impossible to study the sorts of paranormal phenomena that operate only as a total system in a naturalistic context.
QUALITATIVE EVIDENCE AND SUBJECTIVE BIASES
What is meant by qualitative evidence? Roughly, it means any sort of nonscientific evidence that proponents find personally convincing. Typically, it involves personally experiencing or witnessing the phenomenon. Less compelling, but still effective, is the testimony of friends or trusted acquaintances who have personally experienced it. Even individuals who are intellectually aware of the pitfalls of personal observation and testimony find it difficult, even impossible, to disregard the compelling quality of such evidence in the formation of their own beliefs.
A major parapsychologist admitted to one committee member that the scientific evidence did not justify concluding that psi exists. "As a trained scientist," he said, "I know quite well that by scientific criteria there is no evidence for the existence of psi. In fact, I have always argued with my parapsychological colleagues that they are making a serious mistake in trying to get the scientific community to take their current evidence seriously. Before they do this, they first have to be able to collect the sort of repeatable and lawful data that constitute scientific evidence."
This same parapsychologist then explained why, despite the current lack of evidence, he remained a parapsychologist. "When I was 16 I had some personal experiences of a psychic nature that were so compelling that I have no doubt that they were real. Yet, as a trained scientist, I know that my personal experiences and subjective convictions cannot and should not be the basis for asking others to believe me."
This parapsychologist is unusual in that he makes the distinction within himself between beliefs that are subjectively compelling and beliefs that are scientifically justifiable. More typical is the proponent who, as a result of compelling personal experience, not only has no doubt about the reality of the underlying paranormal cause, but also has no patience with the refusal of others to support that belief.
We see two problems regarding qualitative evidence. First, personal observation and testimony are subject to a variety of strong biases of which most of us are unaware. When such observations and testimony emerge from circumstances that are emotional and personal, the biases and distortions are greatly enhanced. Psychologists and others have found that the circumstances under which such evidence is obtained are just those that foster a variety of human biases and erroneous beliefs. Second, beliefs formed under such circumstances tend to carry a high degree of subjective certainty and often resist alteration by later, more reliable disconfirming data. Such beliefs become self-sealing: when new information comes along that would ordinarily contradict them, the believers find ways to turn the apparent contradictions into additional confirmation.
The committee asked Dale Griffin to describe many of the ways in which cognitive and social psychologists have documented that human subjective judgment can lead us astray. Griffin's paper emphasizes the cognitive biases termed availability and representativeness, but he also discusses motivational biases. Although most of these biases have been studied under laboratory conditions, they are nonetheless quite powerful, and evidence has been mounting that, if anything, they are much more powerful in natural settings. Griffin points out that one vivid, concrete experience is usually sufficient to outweigh conclusions drawn from abstract summary statistics on hundreds or thousands of cases.
These and other biases discussed by Griffin should make us wary of conclusions based on qualitative evidence.
EXAMPLES OF PROBLEMATIC BELIEFS
In this section we discuss some examples of beliefs about paranormal phenomena that have been formed under conditions known to generate cognitive illusions and strong delusional beliefs. We attempt to make clear why we are skeptical of any evidence offered in support of the paranormal that does not strictly fulfill scientific criteria. We believe it is important to realize the power of such conditions to create strong but false beliefs.
In 1974 a group of distinguished physicists at the University of London observed renowned psychic Uri Geller apparently bend metallic objects and cause part of a crystal, encapsulated in a container, to disappear. Impressed with what they saw, in 1975 these scientists contributed an article to Nature outlining their ideas about how to conduct successful parapsychological research (reprinted in Hasted et al., 1976). In their discussion they note that successful results depend on the relationships among the participants and that phenomena are more likely to occur when all participants are in a relaxed state, all sincerely want the psychic to succeed, and "the experimental arrangement is aesthetically or imaginatively appealing to the person with apparent psychokinetic powers."
Hasted and his colleagues describe further desiderata. The psychic should be treated as one of the experimental team, contributing to an attitude of mutual trust and confidence that facilitates successful appearance of the allegedly paranormal effects. The slightest hint of suspicion on the part of the observers can stifle the occurrence of any phenomena. Observers should avoid looking for any particular outcome, because doing so interferes with the required relaxed state of mind and impedes paranormal powers.
To help avoid the inhibiting effects of concentrated attention, participants should talk and think about matters irrelevant to the experiment at hand. Acknowledging that these desiderata make it difficult to preclude trickery, Hasted and his colleagues nevertheless express confidence that they can both create psi-conducive conditions and eliminate the possibility of being tricked (Hasted et al., 1976:194):
It should be possible to design experimental arrangements which are beyond any reasonable possibility of trickery, and which magicians will generally acknowledge to be so. In the first stages of our work we did in fact present Mr. Geller with several such arrangements, but these proved aesthetically unappealing to him.
Although we may sympathize with the British physicists' desire to create conditions conducive to the appearance of genuine psychic powers, if such powers exist, we cannot fail to note the quandary that their efforts produce. In their quest for psi-conducive conditions, they have created guidelines that play into the hands of anyone intent on deceiving them.
The very conditions that are specified as being conducive to the appearance of paranormal phenomena are almost always precisely those that are conducive to the successful performance of conjuring tricks. One of the first rules the aspiring conjuror learns is never to announce in advance the specific outcome that he or she is going to produce. In this way the onlookers will not know where and on what they should focus their attention and consequently will be less apt to detect the method by which the trick was accomplished. The authors' advice to avoid focusing on a predetermined outcome greatly facilitates the conjuror's task.
The insistence that the arrangements meet with the psychic's approval is by far the most devastating of these conditions. Geller will perform only if the conditions are "aesthetically pleasing."
This amounts to giving the alleged psychic complete veto power over any situation in which he or she feels that success is not ensured. This in turn means that the psychic being tested, not the experimenters, is controlling the experiment. Surely the British physicists ought to realize the irony of their admission that all their experimental arrangements designed to preclude trickery turned out to be aesthetically unacceptable to Uri Geller.
Another example of beliefs generated in circumstances that are known to create cognitive illusions is macro-PK, which is practiced at spoon-bending, or PK, parties. The 15 or more participants in a PK party, who usually pay a fee to attend and bring their own silverware, are guided through various rituals and encouraged to believe that, by cooperating with the leader, they can achieve a mental state in which their spoons and forks will apparently soften and bend through the agency of their minds. Since 1981, although thousands of participants have apparently bent metal objects successfully, not one scientifically documented case of paranormal metal bending has been presented to the scientific community. Yet participants in the PK parties are convinced that they have both witnessed and personally produced paranormal metal bending. Over and over again we have been told by participants that they know that metal became paranormally deformed in their presence. This situation gives the distinct impression that proponents of macro-PK, having consistently failed to produce scientific evidence, have forsaken the scientific method and undertaken a campaign to convince themselves and others on the basis of clearly nonscientific data drawn from personal experience and testimony obtained under emotionally charged conditions.
Consider the conditions that leaders and participants agree facilitate spoon bending.
Efforts are made to exclude critics because, it is asserted, skepticism and attempts to make objective observations can hinder or prevent the phenomena from appearing. As Houck, the originator of the PK party, describes it, the objective is to create in the participants a peak emotional experience (Houck, 1984). To this end, various exercises involving relaxation, guided imagery, concentration, and chanting are performed. The participants are encouraged to shout at the silverware and to "disconnect" by deliberately avoiding looking at what their hands are doing. They are encouraged to shout "Bend!" throughout the party. "To help with the release of that initial concentration, people are encouraged to jump up or scream that theirs is bending, so that others can observe." Houck makes it clear that the objective is to create a state of emotional chaos: "Shouting at the silverware has also been added as a means of helping to enhance the emotional level in a group. This procedure adds to the intensity of the command to bend and helps create pandemonium throughout the party."
A PK party obviously is not the ideal situation for obtaining reliable observations. The conditions are just those which psychologists and others have described as creating states of heightened suggestibility and implanting compelling beliefs that may be unrelated to reality. It is beliefs acquired in this fashion that seem to motivate persons who urge us to take macro-PK seriously. The complete absence of any scientific evidence does not discourage the proponents; they have acquired their beliefs under circumstances that instill zeal and subjective certainty. Unfortunately, it is just these circumstances that foster false beliefs.
DISCUSSION OF QUALITATIVE EVIDENCE
Our analysis of the evidence put before us indicates that even the most solidly based arguments for the existence of paranormal phenomena fall short of the currently accepted parapsychological standards.
Even if the best evidence had been collected according to acceptable scientific standards, most proponents would in fact have remained convinced by personal experiences and data that clearly fall far short of scientific acceptability. We have looked at two examples to make clear why and in what ways such failures to meet acceptable standards render the corresponding arguments useless as evidence for the paranormal, even though they have created compelling and strongly held beliefs in those who have been exposed to them. The examples illustrate how different ways of attempting to acquire evidence for paranormal phenomena can depart from adequate standards. These inadequacies become especially critical when we note that the conditions under which the alleged paranormal phenomena are supposed to occur are just those known to foster biases and false beliefs. The PK parties, while creating powerful beliefs in paranormal metal bending, clearly violate almost every principle for obtaining trustworthy data. These parties offer no standardization, no objective records, and no