Final Report Covering the Period November 1983 to October 1985

ENHANCED HUMAN PERFORMANCE INVESTIGATIONS (U)

SRI International
December 1988

I. (U) Objective

(U) The objective of this program was to provide an overview of current psychoenergetics research and, based upon this assessment, to recommend avenues of approach for future investigations.

II. (U) Background

(U) Psychoenergetic research can be divided into two major areas of interest: (1) Informational Processes and (2) Causal Processes. Each of these areas can be subdivided further into training, screening, and fundamentals such as various types of functional correlates (e.g., psychological, physiological, and physical).

(U) During FY 1985, SRI International completed a retrospective analysis of a substantial body of open and classified literature in order to assess existence issues, research questions, and potential applications of the previously reported activity in these areas. Part of this analysis produced two reports that outlined an improved remote viewing analysis technique and provided a meta-analysis of the random number generator literature. (These two reports are included as Appendices A and B, respectively.) What follows are the recommendations for a three-phase, multi-year research effort.

III. (U) Recommendation

A. (U) Phase I--Knowledge Building

(U) Phase I is considered to be a knowledge-building effort. During this phase, SRI recommends that some form of technical oversight be included in order to provide guidelines on research protocols, to assess the credibility of the research, and to provide insight into new directions for future research. This phase should be as wide in scope as resources allow. More focused research should be delayed until a knowledge base is established. Table I shows the specific areas that are recommended for consideration as research items for Phase I.

Table I

(U) PHASE I RECOMMENDED RESEARCH AREAS

  Topic                       Description

  Informational Processes
    Analysis                  A quantitative remote viewing (RV) analysis technique.
    Training                  Novice and advanced RV training methodologies.
    Screening                 Techniques to identify good remote viewers.
    Physical Correlates       A search for RV correlates to the physical environment.
    Personality Correlates    A search for personality traits in good remote viewers.
    Physiological Correlates  A search for physiological correlates to RV.
    Medical Correlates        Monitor medical conditions of all viewers.
    Feedback                  Determine the role of feedback in RV experiments.
    Spatial Search            Determine if items can be located in space.
    Temporal Search           Determine if events can be located in time.

  Causal Processes
    Micro-remote Action       Remote action (RA) on random number generators.
    Intuitive Data Sorting    Test the Intuitive Data Sorting Model.
    Macro-remote Action       Test a variety of physical systems as RA targets.
    Correlates                As above, determine correlates to RA.

  General
    Information Services      Develop a user-accessible library system.
(U) While some of the items shown in Table I can be considered beyond existence issues and thus should be considered during Phase II, the predominant effort is toward knowledge building.

B. (U) Phase II--Development

(U) During Phase II, research areas from the Phase I effort that yielded incontrovertible evidence for their existence will be expanded. With the assistance of a technical oversight committee, hypotheses will be formulated and tested. Those areas under Phase I that showed the most promise will be expanded toward a potential application area. For example, if a physiological measure could be found that correlated strongly with excellent remote viewing, then that measure could be used to improve applications.

C. (U) Phase III--Applications

(U) While continuing Phases I and II on specific items of interest, Phase III will be devoted toward applications. This activity should include at least two parts:

(1) Applications research--Formulate and test hypotheses that are specific with regard to potential applications.

(2) Application testing--Under actual conditions, conduct psychoenergetic activity to assess field utility.

IV. Financial Report

During FY 1985 a total of $1,240K was allocated to the contract for the psychoenergetic investigation and review. All moneys were expended in accomplishing the stated objective.

APPENDIX A

A FIGURE OF MERIT ANALYSIS FOR FREE-RESPONSE MATERIAL

(This Appendix Is Unclassified)

A FIGURE OF MERIT ANALYSIS FOR FREE-RESPONSE MATERIAL

by

E. C. May
B. S. Humphrey
C. Mathews

SRI International, Menlo Park, CA

ABSTRACT: A simplified automated procedure is suggested for the analysis of free-response material. As in earlier similar procedures, the target and response materials are coded as yes/no answers to a set of questions (descriptors). By definition, this coding defines the complete target and response information. The accuracy of the response is defined as the percent of the target material that is correctly described (i.e., the number of correct response bits divided by the number of target bits set to 1). The reliability of the response is defined as the percent of the response that is correct (i.e., the number of correct response bits divided by the number of response bits set to 1). The figure of merit is the product of the accuracy and reliability. The advantages and weaknesses of the figure of merit are discussed with examples.

INTRODUCTION

With the increased use of computers in parapsychology laboratories, it has become possible to consider more complex methods of analysis to provide deeper insight into the mechanisms of the phenomena. The Engineering Anomalies Research Laboratory, Princeton University, provided a major advancement in the analysis of free-response material (Jahn, Dunne and Jahn, 1980).

THE PRINCETON EVALUATION PROCEDURE (PEP) - A BRIEF REVIEW

In general, the Princeton Evaluation Procedure (PEP) is based on comparing a priori, quantitatively-defined target information with similarly quantitatively-defined response information.
So defined, the PEP applies various methods of mathematical comparison to arrive at a meaningful assessment score for remote viewing responses.

Target Information

The definition of a particular target site (usually outdoor sites in and around Princeton, New Jersey) is contained in the yes/no answers to a set of questions called descriptors. These descriptors are designed in such a way as to characterize the typical Princeton target. Each descriptor bit is weighted by its a priori probability of occurrence in a large target pool. By definition, the only target information that is to be considered for analysis is that which is contained completely in the yes/no answers to the descriptor questions (with their associated set of descriptor weights) for the site in question. For example, one descriptor from the Princeton list, "Are any animals, birds, fish, major insects, or figures of these significant in the scene?" defines the animal content of the site. The question would be answered "yes" for a zoo and a pet store target, but "no" in all probability for a typical campus building target. Similarly, a set of yes/no responses (30 for the PEP) constitutes the target information.

Response Definition

The descriptor list for the target sites is used as a definition of the response as well. For a given remote viewing session, the remote viewer (or an analyst who is blind to the target site) attempts to answer the 30 questions on the basis of that single response only. In the example above, it would be necessary for a viewer (or analyst) to decide whether or not a particular verbal passage or a quick sketch could be interpreted as depicting animals. For some responses this might be an easy task, e.g., "I get a picture of a cow." Most responses, however, are somewhat ambiguous and require a judgment, e.g., "I see a farm." Nonetheless, the yes/no answers to the 30 questions constitute the only response information that is used in the analysis.

Analysis

For a given response/target combination, the information is contained exclusively in the yes/no answers to the descriptors. Two binary numbers (each 30 bits long for PEP) are constructed, one for the target and one for the response descriptor questions, respectively. A "yes" answer is considered a binary "1," while a "no" answer is considered a binary "0." The resulting two 30-bit binary numbers can then be compared by a variety of mathematical techniques involving use of the weighting factors, to form a score for that specific remote viewing session. For a series of sessions, a quantitative assessment is made by comparing a given response (matched to its corresponding target site) against the scores that are computed by matching the response to all other targets used in the series. This procedure has the added advantage of a built-in, within-group control. In other words, this assessment determines the uniqueness of the target/response match as compared with all other possible matches for the series.
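The mechanics of the cross-matching can be sketched in a few lines of code. The sketch below is illustrative only -- it is not the Princeton software, and the particular scoring rule shown (summing the a priori weights of descriptor bits asserted in both target and response) is just one of the several weighting variants alluded to above; all function names and numbers are invented for the example.

    import numpy as np

    def pep_style_score(response, target, weights):
        # Sum the a priori weights of descriptor bits asserted in
        # both the response and the target (one simple scoring variant).
        both = np.logical_and(response, target)
        return float(np.sum(weights[both]))

    def rank_within_series(responses, targets, weights):
        # Cross-match every response against every target in the series;
        # a response's rank is the standing of its own target's score
        # among the scores against all targets (1 = best match).
        ranks = []
        for i, resp in enumerate(responses):
            scores = [pep_style_score(resp, t, weights) for t in targets]
            ranks.append(1 + sum(s > scores[i] for s in scores))
        return ranks

    # Five hypothetical sessions with 30 binary descriptors each.
    rng = np.random.default_rng(0)
    targets = rng.integers(0, 2, size=(5, 30)).astype(bool)
    responses = targets.copy()
    responses[:, ::7] = ~responses[:, ::7]          # imperfect viewing
    weights = -np.log(rng.uniform(0.1, 0.9, 30))    # rarer bits weigh more
    print(rank_within_series(responses, targets, weights))

A rank of 1 for a session means that the response matched its own target better than any other target in the pool, which is exactly the within-group control described above.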
Advantages of the PEP

There are a number of obvious and proven advantages (Dunne, Jahn, and Nelson, 1983) of the Princeton Evaluation Procedure:

- Automation - Rapid and accurate analysis of a large number of free-response sessions can be accomplished with ease.
- Archives - With the aid of computer database management, large numbers of free-response sessions can be organized and maintained in a usable manner.
- Control - The cross-target scoring procedure provides a powerful built-in within-group control.
- Use - PEP is widely distributed and provides a commonality of analysis procedure across laboratories.

Disadvantages of the PEP

There are actually very few disadvantages to PEP. A common problem that has been observed before (Dunne and Jahn, 1982) arises in the "granularity" of the descriptor list. With any finite list of binary-type descriptors, it is always possible that a response will appear to be correct with "analogue" analysis procedures but will be evaluated as incorrect with the "digital" approach. Another disadvantage of PEP (also noted above, op. cit.) is that any given descriptor list is likely to be applicable only to a given target pool type (i.e., Princeton area natural sites, National Geographic magazine photographs, etc.).

Lastly, one of PEP's strong points--namely, the cross-match, built-in, within-group control--is also potentially one of its weaknesses. Since nearly all of the various PEP scoring algorithms involve bit-by-bit weighting, which is based upon relative probability of occurrence, a given response/target score depends not only upon the correctness of the response, but also upon the nature of the remaining targets in the pool. Thus, a score for a given session depends upon the quality of the response and the target pool. The following hypothetical example illustrates this dependency: a given target has 10 of 30 bits present; furthermore, a few bits (e.g., 3) are particularly rare when compared to the remaining bits (i.e., they possess comparatively large weighting factors). Let us assume that two different viewers provide responses to this target and that each asserts 8 descriptors in the response, 6 of which are correct. If the first viewer's response contains only one of the rare bits, while the other viewer's response contains all three, the second viewer's score will be considerably larger as a consequence of the weighting factors.

Such a scoring discrepancy forces us to define what the purpose of the remote viewing session is. If the goal is to demonstrate the existence of psi phenomena, then the PEP is a perfectly adequate system of analysis, and it exhibits all of the advantages described above. If the goal, however, is to demonstrate correlation effects (e.g., correlation of free-response material with personality, physiology, environment, etc.), then the scoring difficulties described above confound the correlation measurement. To summarize, a target-pool-dependent scoring procedure provides an important measure of a viewer's ability to discriminate from among a number of possible targets. (The second viewer in the example above, for instance, would receive a higher score because his/her response is more unique to the target pool.) The target-pool-dependent scoring algorithm is less applicable, however, as an independent absolute measure of target contact--a necessary condition for correlation studies. If we remove the within-group control to eliminate a source of variance for a correlation measurement that is potentially unrelated to psi ability, we are obligated to provide some other form of control to demonstrate a deviation from mean chance expectation.
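The dependency can be made explicit with a small numerical illustration. The weights below are hypothetical (the text gives no numerical weighting factors); the point is only that two responses with the same number of correct bits can receive very different weighted scores.

    # Hypothetical weights: the 3 rare bits carry weight 3.0 each,
    # the common bits 0.5 each. Both viewers assert 8 bits, 6 correct.
    rare_weight, common_weight = 3.0, 0.5

    viewer_1 = 1 * rare_weight + 5 * common_weight  # one rare bit   -> 5.5
    viewer_2 = 3 * rare_weight + 3 * common_weight  # all three rare -> 10.5
    print(viewer_1, viewer_2)  # equal correct-bit counts, unequal scores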
FIGURE OF MERIT ANALYSIS

The Figure of Merit analysis (FMA) was developed to address the problems associated with correlation studies and to provide a novel form of control.

Target Information

As in the PEP, the Figure of Merit analysis quantifies the target material into binary numbers corresponding to yes/no answers to a set of descriptors. Our descriptor list was developed on the basis of the target material (National Geographic magazine photographs), and on the basis of responses that might be expected a priori for our novice remote viewers. Table 1 shows the 20 descriptors that were used for the photon production experiment (Hubbard, May, and Puthoff, 1985). The questions are strongly oriented toward outdoor gestalts typical of National Geographic magazine material. The horizontal lines separating the descriptors into groups of three are provided as an aid for translating binary numbers (derived from the yes/no answers to the questions) into an octal shorthand notation.

A self-consistency check is performed on each coded target, and a set of logically consistent rules must be developed for a given descriptor list. One such example for the list (shown in Table 1) involves bits 13 and 14. While it is possible to have a land/water interface that is not a river, canal, or channel, the reverse (i.e., to have a river, canal, or channel without having a land/water interface) is not possible by definition. Thus, if a target analyst asserted bit 14 without asserting bit 13, we could consider this an error in coding and assert bit 13. It is beyond the scope of this paper to provide all the logical consistency rules, but most of them are obvious from Table 1. Naturally, these rules must be defined in advance of any experimentation.

Table 1

DESCRIPTOR-BIT DEFINITION

Bit No.  Descriptor

 1   Is any significant part of the scene hectic, chaotic, congested, or cluttered?
 2   Does a single major object or structure dominate the scene?
 3   Is the central focus or predominant ambience of the scene primarily natural rather than artificial or manmade?

 4   Do the effects of the weather appear to be a significant part of the scene (e.g., as in the presence of snow or ice, evidence of erosion, etc.)?
 5   Is the scene predominantly colorful, characterized by a profusion of color, by a strikingly contrasting combination of colors, or by outstanding brightly-colored objects (e.g., flowers, stained-glass windows, etc.--not normally blue sky, green grass, or usual building color)?
 6   Is a mountain, hill, or cliff, or a range of mountains, hills, or cliffs a significant feature of the scene?

 7   Is a volcano a significant part of the scene?
 8   Are buildings or other manmade structures a significant part of the scene?
 9   Is a city a significant part of the scene?

10   Is a town, village, or isolated settlement or outpost a significant feature of the scene?
11   Are ruins a significant part of the scene?
12   Is a large expanse of water--specifically an ocean, sea, gulf, lake, or bay--a significant aspect of the scene?

13   Is a land/water interface a significant part of the scene?
14   Is a river, canal, or channel a significant part of the scene?
15   Is a waterfall a significant part of the scene?

16   Is a port or harbor a significant part of the scene?
17   Is an island a significant part of the scene?
18   Is a swamp, jungle, marsh, or verdant or heavy foliage a significant part of the scene?

19   Is a flat aspect to the landscape a significant part of the scene?
20   Is a desert a significant part of the scene, or is the scene predominately dry to the point of being arid?
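In code, such a self-consistency check is a short table of implications. Only the bit 13/14 rule is spelled out in the text; the other implications below are assumed examples of the same form, included only to show the shape of the rule set.

    def apply_consistency_rules(bits):
        # Enforce logical implications among descriptor bits
        # (1-indexed per Table 1): if the implying bit is asserted,
        # the implied bit must be asserted as well.
        implications = [
            (14, 13),  # river/canal/channel implies a land/water interface
            (15, 13),  # (assumed) waterfall implies a land/water interface
            (16, 13),  # (assumed) port/harbor implies a land/water interface
            (9, 8),    # (assumed) a city implies manmade structures
        ]
        fixed = list(bits)
        for src, dst in implications:
            if fixed[src - 1] and not fixed[dst - 1]:
                fixed[dst - 1] = 1  # treat as a coding error and assert it
        return fixed

    coded = [0] * 20
    coded[14 - 1] = 1                              # analyst asserted bit 14 only
    print(apply_consistency_rules(coded)[13 - 1])  # bit 13 is now asserted -> 1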
Response Definition

The descriptor list shown in Table 1 is applied in exactly the same way in order to define each remote viewing response. In the SRI program, remote viewers do not fill in the descriptor list; rather, this task is performed by an analyst who is blind to the target. However, a set of a priori defined guidelines must be established in order to aid the analyst in consistently interpreting the responses.

Analysis

The target-pool independent scoring algorithm makes an assessment of the accuracy and reliability of a single response when matched only against the target material used in the session. As described above, the target and response materials are defined as the yes/no answers to a descriptor list (Table 1). Once the session material is coded into binary, we define session accuracy and reliability as follows:

    Accuracy = (number of correct response bits) / (number of target bits set to 1)

    Reliability = (number of correct response bits) / (number of response bits set to 1)

In other words, the accuracy is the fraction of the target material that is correctly perceived, and the reliability is the fraction of the response that is correct. Neither of these measures, by itself, is sufficient for a meaningful assessment. For example, in the hypothetical situation in which the viewer simply reads the Encyclopedia Britannica as his/her response, it is certain that the accuracy would be 1.0 simply because all possible target descriptors would have been mentioned. This would not be compelling evidence of psi. Similarly, in a response consisting of one correct word, the reliability would be 1.0, with little evidence of psi as well. We define the figure of merit (FM) as:

    Figure of Merit = Accuracy x Reliability

The figure of merit, which ranges between zero and one, provides an accurate assessment of the response. In the example above where the Encyclopedia Britannica is the response, the FM will be low. Although the accuracy is one, the fraction of the response that is correct (i.e., the reliability) will be very small. Likewise, in the example of a single correct word as a response, the reliability is one, but the accuracy is low.

A figure of merit can be calculated for each session. For a series of sessions, the FM may be used to assess a viewer's progress on either a session-by-session or descriptor-by-descriptor basis, or both.
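Since the figure of merit is computed by simple counting, the entire session score reduces to a few lines. This is a minimal sketch of the definitions above; the 20-bit target vector in the example is invented.

    def figure_of_merit(response_bits, target_bits):
        # "Correct response bits" are bits set to 1 in both the
        # response and the target.
        correct = sum(r & t for r, t in zip(response_bits, target_bits))
        target_ones = sum(target_bits)
        response_ones = sum(response_bits)
        accuracy = correct / target_ones if target_ones else 0.0
        reliability = correct / response_ones if response_ones else 0.0
        return accuracy, reliability, accuracy * reliability

    # The "Encyclopedia Britannica" response: every descriptor asserted.
    target = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    everything = [1] * 20
    print(figure_of_merit(everything, target))  # accuracy 1.0, reliability 0.35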
For example, if bit number 1 were asserted 40 times in 500 sessions, we can assume on the average for this series (accounting for all known and unknown conditions) that the probability that bit 1 will be asserted in a given response is 40/500, or 0.08. By repeating this calculation for each of the descriptor bits, we can determine the probability of occurrence for all bits under exactly the same conditions that were used in the series.

Since this procedure displays all response/analysis biases that may have developed during the series, we are able to use this information to construct computer-generated "random" responses, with a total absence of psi functioning, that are subject to exactly the same biases that were observed in the series. Therefore, we are able to simulate the ideal control condition, which addresses an important question that is frequently asked by our critics: namely, how would an average viewer respond to a no-target session (i.e., the "monkey on a typewriter" scenario)? A simple bit-by-bit random generation of a response is completely inadequate because it does not account for the response biases observed during the series. The method for producing "random" sessions that do account for the biases is described below.

A random number generator is used to create pseudo-responses that are assumed to be devoid of psi functioning. Each bit in a given pseudo-response is generated from the empirical "bias" described above. Once the complete response is generated, the same logical consistency rules (described above) are applied to finalize the pseudo-response. By this technique, a large set of pseudo-responses containing no psi information can be generated. To use these pseudo-responses, we must select, on a random basis, targets from the same set that was used during the series from which the biases were observed. A complete pseudo-session consists of a single pseudo-response and a single randomly selected target.

The standard figure of merit analysis is applied to all of the pseudo-sessions in order to calculate figures of merit that have, by definition, no psi content. The resulting FMs are fit with a gaussian distribution to provide an estimate of the mean and standard deviation FM for random data. Figure 1 shows the results of one such fit for a total of 300 pseudo-sessions, using the remote viewings from a photon-production experiment (Hubbard, May, and Puthoff, 1985) as the bias data. From the chi-square, we note that a gaussian is a correct function to use for the fit.

Since the gaussian is truncated at zero figure of merit, we must modify the usual z-score techniques to provide p-values for the individual session figures of merit. By definition, the probability of observing a figure of merit f0 or greater is the area under the FM-gaussian for f >= f0 divided by the total area under the FM-gaussian. An exact p-value is calculated as follows. Define the minimum value of a Z-like statistic as

    z_min = -mu / sigma

where mu and sigma are the mean and standard deviation of the best-fit gaussian, respectively (mu = 0.132 and sigma = 0.163 in the example). Define a second Z-like statistic as

    z_0 = (f0 - mu) / sigma

where f0 is the observed figure of merit. Let P_min and P_0 be the p-values calculated in the usual way, assuming z_min and z_0 were valid z-scores. Then the correct p-value is given by

    p-value = P_0 / P_min

Utts and May (1985) have provided an exact method for combining p-values to enable an overall series evaluation. For mean p-values calculated for a series greater than .1, and the number of sessions greater than 6, a close approximation for the combined Z-score is given by (Edgington, 1972)

    Z_combined = (0.50 - p) * sqrt(12N)

where p is the average p-value for N sessions.
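Both the truncated p-value and the Edgington combination are short computations. The sketch below follows the formulas just given, with the fit parameters from the example; it is illustrative rather than the original analysis code, and the session p-values fed to the combiner are invented.

    from math import erf, sqrt

    def upper_p(z):
        # Upper-tail p-value for a standard normal z score.
        return 0.5 * (1.0 - erf(z / sqrt(2.0)))

    def truncated_fm_pvalue(f0, mu=0.132, sigma=0.163):
        # Renormalize for the truncation of the FM-gaussian at zero:
        # p = P_0 / P_min, where P_min is the total area above FM = 0.
        p_min = upper_p(-mu / sigma)
        p_0 = upper_p((f0 - mu) / sigma)
        return p_0 / p_min

    def edgington_combined_z(p_values):
        # Edgington (1972) approximation, valid for mean p > .1 and N > 6.
        n = len(p_values)
        p_bar = sum(p_values) / n
        return (0.50 - p_bar) * sqrt(12 * n)

    print(truncated_fm_pvalue(0.45))  # a strong session FM -> small p
    print(edgington_combined_z([0.30, 0.20, 0.50, 0.40, 0.25, 0.35, 0.15]))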
[Figure 1. BEST-FIT GAUSSIAN TO CONTROL FMs -- histogram of the control figures of merit (horizontal axis: Figure of Merit, 0 to 1.0) with the best-fit gaussian overlaid.]

CONCLUSIONS AND SUGGESTIONS FOR EXTENSIONS

We are proposing a target-pool independent method (figure of merit analysis) for scoring free-response material. The FMA provides a number of advantages over previous methods:

- Figures of merit can be used in correlation studies.
- FMA provides a novel technique for free-response controls.
- Target-pool independent exact p-values can be computed for each free-response session.
- Since the FM is computed by simple counting, the computer coding burden is sharply reduced.

Because of the lack of descriptor bit independence (and thus a need for logically consistent rules), the effective number of descriptor bits is reduced. We are presently investigating a way to utilize a hierarchical descriptor list: that is, each level of the hierarchy consists of a variable number of independent descriptors. Finally, the ideal descriptor list would include arbitrary weighting factors for the level of hierarchy as well as for the individual descriptors within the level.

REFERENCES

Dunne, B.J., Jahn, R.G., and Nelson, R.D., "Precognitive Remote Perception," Engineering Anomalies Research Laboratory, School of Engineering/Applied Science, Princeton University, Princeton, NJ, Technical Note PEAR 83003 (August 1983).

Hubbard, G.S., May, E.C., and Puthoff, H.E., "Possible Production of Photons During a Remote Viewing Task: Preliminary Results," Proceedings of the 28th Convention of the Parapsychological Association, (Radin, D.I., ed.) Tufts University, Medford, MA (August 1985).

Jahn, R.G., Dunne, B.J., and Jahn, E.G., "Analytical Judging Procedure for Remote Perception Experiments," The Journal of Parapsychology, Vol. 44, No. 3, pp. 207-231 (September 1980).

Utts, J.M., and May, E.C., "An Exact Method for Combining P-values," Proceedings of the 28th Convention of the Parapsychological Association, (Radin, D.I., ed.) Tufts University, Medford, MA (August 1985).

APPENDIX B

PSI EXPERIMENTS WITH RANDOM NUMBER GENERATORS: META-ANALYSIS PART 1

(This Appendix Is Unclassified)

Psi Experiments with Random Number Generators: Meta-Analysis Part 1

Dean I. Radin
Edwin C. May
Martha J. Thomson

SRI International, Menlo Park, California

ABSTRACT: A meta-analysis of 332 psi experiments involving binary random number generators is described. The combined binomial probability for data reported in 56 references published from 1969-1984 is p ~ 10^-43.
A "filedrawer" analysis reveals that over 4500 additional, nonsignificant, unpublished or unretrieved studies would be required to bring the overall result down to a nonsignificant level. Using a novel approach, we estimate the actual size of the "filedrawer" to be 95 studies. Adding the equivalent of 9S nonsignificant studies to the existing data results in p AV 10-18, while a meta-analysis of 98 reported control studies re'sults In p Fw .78. An analysis of variance indicates that experimenters' mean z scores are significantly different from each other. We discuss an approach and propose criteria for performing a quality-weighted analysis on the existing data. We conclude that the prima facie evidence supports the notion that observers' intentions can affect the statistical properties of truly random number generators. INTRODUCHON This is Part I of a two part meta-analysis of psi experiments involving truly random number generators (RNG) published from 1969-1984. This part describes the results of a "first-pass" ana lysis, in which the published data was taken at face value. Part 2 will report on a quality-weighted analysis in which the results of each experiment (in terms of z score) will be evaluated on each of a dozen criteria to produce an adjusted z score reflecting that experiment's overall quality. Background: On the scent of a trail When Albert Einstein was asked about his way of thinking, he reportedly replied, "Ali I have is the stubbornness of a mule; no, that's not quite all, I also have a nose" (Bower, 1985, p.330). What he meant was that he was not only extraordinarily obstinate in tracking down solutions to problems, he was also able to sniff out when he was on the right track. The centennial anniversary B-1 UNCLASSIFIED Approved For Release 2000/08/08 : CIA-RDP96-00789ROO2200400001-3 Meta-Analysis Part 1 1r_Z? Approved For Release 2mbigXk-rvffLrilEP-00789ROO2200400001-3 of the American Society for Psychical Research, celebrated this year (1985), clearly demonstrates that parapsychologists have displayed Einstein's stubbornness over the years. One question we might ask after 100 years, however, is whether the parapsychological nose has been sniffing along a clearly defined trail, and if so, is the trail likely to grow more fragrant or more noxious as we progress7 There is evidence that the nose has not been shirking its duty. This can be seen in the single most predictable feature found in the parapsychological literature, that is, the perennial call for a Teplicable experiment. The ideal experiment is supposed to produce a significant result regardless of the phase of the moon, the price of pork bellies, and the experimenter's shoe size. This Quest for replicable experiments is by no means unique to parapsychology, however. Social and behavioral scientists in general have been acutely aware of the slow progress in the "spfter" sciences as compared to the natural sciences such as physics, chemistry, and biology. In experimental psychology, for example, Epstein (1980) has stated, Psychological research is rapidly approaching a crisis as the result of extremely inefficient procedures for establishing replicable generalizations. The traditional solution of attempting to obtain a high degree of control in the laboratory is often ineffective because much human behavior is so sensitive to incidental sources of stimulation that adequate control cannot be achieved.... 
    Not only are experimental findings often difficult to replicate when there are the slightest alterations in conditions, but even attempts at exact replication frequently fail. (p. 790)

Many observers of parapsychology (both within and outside the field) claim that the repeatable parapsychological experiment does not exist. For example, Beloff (1977) has written, "There is still no repeatable [psi] experiment on the basis of which any competent investigator can verify a given phenomenon for himself" (p. 759). Critics of the field have pointed to the lack of replicability as perhaps the single most serious problem in parapsychology (e.g., Kurtz, 1981, p. 12). In response, proponents often point to significant psi studies involving ESP card-guessing (Honorton, 1975), ganzfeld stimulation (Honorton, 1978), remote perception (Dunne, Jahn, and Nelson, 1983), and RNGs (May, Hubbard and Humphrey, 1980) to indicate that there are some significant replications. The problem is that from different perspectives the proponents and critics are both right. There are indeed many psi experiments that have been repeated, but whether they are considered robust, successful replications is the crux of the debate.

One of the primary reasons for this debate, in our opinion, is that the traditional approach to assessing the results of a set of related studies is by descriptive literature review. Within parapsychology there are many excellent examples of such reviews (e.g., Carpenter, 1977; Palmer, 1982; Rush, 1982; Schmeidler, 1984; Stanford, 1977; Stanford, 1984). Unfortunately, what one has typically learned after studying such a review is a hodge-podge of variables, conditions, and p-values. Rarely is one left with a quantitative statement of the degree of significance obtained in the studies as a whole. Addressing this issue empirically, Cooper and Rosenthal (1980) demonstrated that when knowledgeable individuals are instructed to make judgments about the overall significance of a set of studies based on their reading of a comprehensive, descriptive literature review, it is possible for them to draw conclusions that are completely the opposite of the results obtained when the same studies are summarized by more explicit, quantitative methods.

Given the difficulties in assessing evidence from existing psi studies, is the replication trail likely to be heading -- to reinvoke our metaphor -- towards a flowering meadow or something decidedly less pleasant? In general, we believe that the prospects are aromatic. In the last few years, quantitative techniques of combining and comparing research results in systematic ways have been developed -- called meta-analysis (Rosenthal, 1984) -- that show great promise in demonstrating that some areas of social science have been progressing much better than previously thought. In parapsychology, initial meta-analyses applied to ganzfeld research (Honorton, 1985), hypnotic induction (Schechter, 1984), RNG studies (May, Hubbard and Humphrey, 1980; Nelson, Dunne and Jahn, 1984; Tart, 1983), and remote viewing (Dunne, Jahn, and Nelson, 1983) have shown that the overall evidence for these psi phenomena is actually quite strong.

Because meta-analysis involves the aggregation of results of numerous studies, several criticisms of this technique have been raised (Rosenthal, 1984, pp. 124-132).
Perhaps the three categories of criticism most pertinent to a review of parapsychological data are the following: First, authors may tend to report only the studies with significant results and leave the nonsignificant studies unpublished (called the filedrawer problem). Second, the meta-analysis combines poorer quality studies with better studies. And third, meta-analysis may be comparing "apples and oranges" by combining different experiments studying different variables. The first two problems may inflate the estimate of an overall effect; the third criticism may make the overall summary difficult or impossible to interpret. In the present meta-analysis, however, we actually are interested in whether these psi experiments have borne fruit, not whether they have borne specific flavors of apples or oranges. In other words, we are not concerned with whether hypnotic induction, say, has an effect on RNG outputs, but whether there is evidence for any psi effect on RNG outputs. Thus, in this investigation we have concentrated on the filedrawer issue (in this report) and the quality of studies (to be described in Part 2 of this study).

OVERVIEW OF A TYPICAL RNG EXPERIMENT

The typical psi experiment with RNGs involves three main components: an observer (e.g., a human, goldfish, cat, or dog), a truly random number generator based on radioactive decay or electronic noise, and an experimental task linking the observer with the device, such as a video game, a set of instructions, a need to keep a heat lamp on or avoid a shock, and so on. The aim of these experiments is to show that the instructions (when humans are involved) or the induced need (when animals or plants are involved) are associated in some way -- but not necessarily causally -- with the statistical output of the RNG.

For example, say an RNG was designed to produce 100 random bits at the press of a button. An individual in this experiment might see a digital display of the number of 1's (called hits) produced immediately after he or she pressed a button. The instructions in the experiment would typically be to get as many hits as possible for each button press. The results of many presses, or trials, would then be evaluated statistically, where under the null hypothesis an average of 50 hits would be expected by chance. If the average number of hits over thousands of repetitions were, say, 52, this deviation from chance would be interpreted as evidence of a psi effect (provided that the probability of observing this deviation was less than 1 in 20).
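Under the null hypothesis, the hit count is binomial with p = 1/2, so the evaluation just described reduces to a z score. A sketch using the numbers from the example above (not any particular laboratory's analysis code):

    from math import sqrt

    def rng_experiment_z(hits, trials):
        # Normal-approximation z: mean trials/2, variance trials/4
        # under the null hypothesis of a fair binary RNG.
        return (hits - trials / 2.0) / sqrt(trials / 4.0)

    # One press with 52 hits in 100 bits is unremarkable...
    print(rng_experiment_z(52, 100))          # z = 0.4
    # ...but averaging 52 hits over 1000 presses (100,000 bits) is decisive.
    print(rng_experiment_z(52_000, 100_000))  # z ~ 12.6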
PROCEDURE

Because we were ultimately interested in testing among several different models of mechanisms possibly operating in these RNG experiments, in Part 1 of this meta-analysis (this paper) we surveyed the parapsychological literature with two goals in mind: First, we wanted to see whether the aggregated result of the RNG experiments showed evidence for an anomalous effect. And second, we needed the details of these experiments for use in evaluating a model of the underlying mechanism. [Our modeling effort is discussed in May, Radin, Hubbard, Humphrey and Utts (1985).]

Source of references

We searched through the five major English language parapsychological journals(1) over the years 1969 to 1984. We also included the (refereed) Proceedings of Presented Papers for the Annual Parapsychological Association Conventions (1971 and 1984), and a report published by the Princeton Engineering Anomalies Research Laboratory (Nelson, Dunne and Jahn, 1984). The literature search was started in the year 1969 because that was the year Helmut Schmidt (1969) published the seminal RNG study that has since spawned many replications.

1. These were the Journal of Parapsychology, European Journal of Parapsychology, Research in Parapsychology, Journal of the Society for Psychical Research, and the Journal of the American Society for Psychical Research.

Defining "an experiment"

One of the difficulties faced in reviewing the articles for this meta-analysis was to decide what constituted an experiment. In most papers, authors analyze their data repeatedly in various ways, sometimes as a priori analyses, sometimes as post hoc afterthoughts. Even in cases of planned analyses, there are many ways of interpreting which of several conditions is the "real" experiment. How we decide what is an experiment is important to the meta-analysis for two main reasons: First, the meta-analytic statistical power depends on the number of experiments we find; and second, the z scores are different depending on how we break down the reported results.

To illustrate the difficulty of deciding what an experiment is, consider this example. Say an author uses three different groups of 10 percipients each (e.g., meditators, truck drivers, and athletes) and subjects each group to two different conditions (e.g., mental imagery vs. muscular tension) in a study on psi-conducive states. The results can be broken into one big, combined experiment, six experiments (3 groups x 2 conditions), two experiments (2 conditions), three experiments (3 groups), 30 experiments (subject-by-subject analysis), and so on. How do we decide what to use?

We resolved this issue for this first-pass analysis in the following way: For cases where there were multiple hypotheses under test and multiple analyses of the data, we chose as the experimental unit the largest possible accumulation of data compatible with a single "direction of effort" assigned to the subjects. A clearly defined direction of effort meant that the experimental protocol required either more 1's or more 0's from the RNG to successfully complete the assigned task, regardless of whether or not the subjects actually knew their task in detail.

Say, for example, a hypothesis predicted that group A would score higher than group B, and it was stated that "higher" meant more 1 bits. Then we would take this study as two experiments: group A's and group B's scores. In this particular case, since group A was predicted to score higher than group B, if in fact the difference between z(A) and z(B) were significant, then both z scores would be taken as positive, regardless of the reported z's. Thus, if z(A) = 1.5 and z(B) = -1.0, then the z-score difference between them would be significant one-tailed, with Zdiff = 1.77. If the number of trials run in each case were 10000, then the number of hits assigned per experiment would be hits(A) = 5075 and hits(B) = 5050, which are both positive deviations; similarly, the z scores would be recorded as z(A) = 1.5 and z(B) = 1.0. If z(A) = 1.2 and z(B) = -1.0, the z scores would be recorded as originally reported, since Zdiff is not significant. The same would be true if z(A) = -2.0 and z(B) = 2.0.
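The sign convention just described can be stated compactly in code. This is a sketch of the stated rule, not the original survey software; the 1.645 criterion is assumed from the "significant one-tailed" language above.

    from math import sqrt

    def assign_signs(z_a, z_b, z_crit=1.645):
        # When group A is predicted to outscore group B and the
        # one-tailed difference is significant, record both z scores
        # as positive; otherwise keep them as reported.
        z_diff = (z_a - z_b) / sqrt(2.0)
        if z_diff > z_crit:
            return abs(z_a), abs(z_b)
        return z_a, z_b

    print(assign_signs(1.5, -1.0))  # Zdiff = 1.77, significant -> (1.5, 1.0)
    print(assign_signs(1.2, -1.0))  # Zdiff = 1.56, not significant -> unchanged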
(Fortunately, such problems of interpretation were not often encountered in the survey.) As another example, if groups A, B, and C all tried to influence an RNG in a particular way, and no predictions were made as to interactions, then their overall result would be combined as one experiment. In this way, we attempted to emphasize in the meta-analysis the underlying question of whether or not observers could influence or otherwise affect the statistical output of an RNG according to the stated intention of the experimenter.

Results of literature review

We found 73 pertinent references in the journals and reports.(2) These references included 381 experiments contributed by 38 different principal investigators, representing about 10 different laboratories around the world. We say "about 10 laboratories" because over the years labs have come and gone, researchers have moved among different labs, and in many cases, one or two individuals at an academic or private research institution are considered a "laboratory."

2. These references are listed under the heading "Meta-Analysis" in the references at the end of this paper.

Breakdown of experiments

Of the 381 experiments found, 332 (in 56 remaining references) were described as using binary generators based on either radioactive decay or electronic noise. For this meta-analysis, we considered only studies using binary RNGs (or any study in which the hit rate was defined or could be interpreted as 50%) for three reasons: First, since 87% of the experiments (332/381) employed binary generators, we felt that this sample was representative of the entire RNG database; second, for the sake of simplicity; and third, because the test of a model
For these five studies, since the z score was known or could be calculated from a p value, the trials or hits (whichever was missing) were calculated. Table I shows a breakdown of the number of experiments reported in each of the seven reference sources we used. It is clear that the reports provided in the Research in Parapsychology series are not as detailed as one might have wished, but it is not surprising since the contents of this reference are only abstracts of the full papers presented at the annual Parapsychological Association conventions. Table 1. Experiment breakdown by source of reference. Reference ExperimentsExperiments with fuH with partial detail detail Journal of the American 5 Society for Psychical Research European Journal of Parapsychology6 Journal of the Society for 9 Psychical Research ]Proceedings of the Parapsychological32 AssociationO Journal of Parapsychology 49 1 Research in Parapsychology 52 34 Princeton Engineering Anomalies144 Research 1Ab for the years 1971 and 1984 3. We did not generate z scores of zeyo because this data was ultimately used in an evaluation of our model (May, Radin et al, 1985), in which log(z) is taken. B-6 UNCLASSIFIED Approved For Release 2000/08/08 : CIA-RDP96-00789ROO2200400001-3 ~~IIFIED Meta-Analysis Pan I Approved For Release YNO OR~11 A-RDP96-00789ROO2200400001-3 In summary, of the 332 experiments we considered (188 from the survey and 144 from Princeton), 71 were reported significant at p < .05, 2-tailed, for an overall binomial probability of p < 5.4 x 10 -43. ADDRESSING CRITICISMS OF THE DATA Taken as prima facie evidence, one might think that this body of published data provides indisputable evidence that an anomaly exists. But there are numerous reasons why the data may be suspect. 'fhe main criticisms (Akers, 1984; Hansel, 1980; Hyman, 1985; Kurtz, 1981) include 1. Results are due to chance 2. Basic statistical assun~ptions are violated 3. Only significant studies are published 4. Experiments are not replicable 5. RNGs are nonrandom 6. Poorer studies are included with better studies Let us consider each of these six steps as successive filters for the reliability of the data. If each criticism can be satisfactorially refuted or countered, then a persuasive case for an anomalous effect can be made. 1. Results are due to chance I In any one experiment we cannot establish the reality of a phenomenon, regardless of the significance level, unless strong theoretical predictions have preceeded the experiments. For example, the recent experiments suggesting that Bell's inequality is violated (e.g. Aspect, Dalibard, and Roger, 1~82; Aspect, GrangieT and Roger, 1982) have been widely accepted within the physics community on the basis of only a few empirical studies despite its profound implications on our view of the nature of reality (cf. d'Espagnat, 1979; Mermin, 1985; Rohrlich, 1983). Parapsychology, however, has had the disadvantage of not having a firm theoretical base on which to stand. Thus the nature of the claim (any claimed psi effect) understandably requires extremely persuasive evidence. One wonders how statistically strong an effect must be to bring about a consensual agreement within the scientific community that a psi effect on RNGs is real. Would p < 10-43 be sufficient? If this figure were revised to take into account all of the criticisms noted above, and the end result were say, 10-5. would that be sufficient? Clearly an overall p = .1 would not satisfy anyone, so there is a decision curve related to this question. 
This curve is probably different according to individual prejudices and predihctions, but the resolution of this question is beyond the scope of the present paper. Note that if an anomaly did exist, it would not necessarily imply that psi was the mediating factor. Such an anomaly may, for example, reveal some heretofore unknown statistical peculiarities about random numbers. B-7 UNCLASSIFIED Approved For Release 2000/08/08 : CIA-RDP96-00789ROO2200400001-3 M Analysis rt I ro Poor Release tjMC"-,W-FffwD-00789ROO2200400001-3 2. Basic statistical assumptions are violated This criticism incorporates such problems as the improper application of statistics to a particular experimental design, violation of assumptions of independence, performing multiple analyses on the same data, and so on. In this meta-analysis, one of the reasons we only considered binary generators was to simplify the statistical assumptions to the point where we could avoid many such problems. Another reason was to avoid the "apples vs. oranges" comparison problem we mentioned earlier. Because we were interested only in RNG experiments that reported (or where we could calculate) the number of hits and trials, we were in fact comparing apples only with apples (actually bits with bits). YvUle it is true that there were many different psychological and physiological conditions involved in these experiments, as well as human and non-hu&n subjects, the underlying question we asked was the same for each experiment: What was the behavior of the RNG as compared to the pre-specified direction of effort defined in the experimental task? The statistics in these RNG experiments are described by the wen understood binomial distribution, and the central limit theorem allows us to use the normal approximation to further simplify the statistical treatment for the range of trials observed in the data~ (200 to 2 million trials in a single experiment) . I Violation of the assumption of independence can be the downfall of an otherwise tightly controlled experiment. In the present case, however, the random events are based on sources that are quantum-mechanical (QM) in nature -- radioactive decay of alpha, beta, or gamma particles, or electronic noise from various semiconductor devices such as tunnel diodes. QM theory states that random numbers based on QM events are in principle indeterminant and therefore independent of each other, provided that the RNG device is properly designed and constructed.4 In this m4a-analysis, under the null hypothesis of no psi effect we can assume independence of random bits. Note that the assumption of independence among bits does not override proper concern about whether the RNCTs used in the experiments produced bits with equal probabilities. It is entirely possible, for example, to produce bits that are completely independent, but with p(1) .6 and p(0) = .4. This is addressed in point 5 below. 3. Only significant studies are published - the Filedrawer problem The filedrawer problem, in which only significant studies are reported and the nonsignificant studies languish in filedrawers, will inflate the results of a meta-analysis because there will be too many small p values (or equivalently, to many large z scores). To address this problem, we followed a procedure proposed by Rosenthal (1984, p. 108), in which the average z score for all combined studies is applied to the formula: 4. Note that some of the diodes used in noise-based RNGs are not QM in nature. RNGs that use avalanche . 
diodes, for example, derive their noise from fluctuations in charge carrier multiplication, which can be described by classical electromagnetic theory.

3. Only significant studies are published - the Filedrawer problem

The filedrawer problem, in which only significant studies are reported and the nonsignificant studies languish in filedrawers, will inflate the results of a meta-analysis because there will be too many small p values (or, equivalently, too many large z scores). To address this problem, we followed a procedure proposed by Rosenthal (1984, p. 108), in which the average z score for all combined studies is applied to the formula:

    X = K (K Z^2 - 2.72) / 2.72        (1)

where K is the number of studies combined, Z is the mean z score obtained for the K studies, and X is the number of new, filed, or unretrieved studies averaging null results required to bring the new overall p level to a designated level. The value 2.72 in equation (1) is the square of 1.65, the z value for p = .05 (the p level that Rosenthal uses). To make our filedrawer estimate more conservative, we chose a 2-tailed p = .05, z = 1.96. Thus the formula we used was

    X = K (K Z^2 - 3.92) / 3.92        (2)

We shall consider the Princeton studies separately from the rest of the survey because we have good reason to believe that all of the Princeton data was, in fact, published; thus their data has no filedrawer problem. [Publishing all data is a part of the Princeton Laboratory's philosophy (Jahn, 1982).]

In the 188 survey experiments, the mean z score = 0.738. A mean z of this value over 188 experiments produces an overall z = 10.114, for a 2-tailed p < 4.9 x 10^-24 (see Table 1). Note that this method of estimating the overall probability is more accurate than determining the binomial probability of 71 successes out of 188 samples at p < .05, as described earlier in this paper. Applying Z = .738 and K = 188 to formula (2) results in X = 4723. This means that 4723 additional studies averaging null results would have to be filed away in researchers' filedrawers to bring the overall z score down to a 2-tailed nonsignificant level.

According to Rosenthal (1984), the number X has different meanings depending on the research context. In some areas of research (say, genetic engineering), perhaps 10 or 12 unpublished or unretrieved studies might be considered reasonable. In other areas (say, child development), perhaps 200 to 500 filedrawer studies might be a reasonable estimate. Rosenthal (1984, p. 110) proposes the following general guideline: "Perhaps we could regard as robust to the file drawer problem any combined results for which the tolerance level (X) reaches 5K + 10." Thus -- not counting the Princeton data -- since X is more than 25 times larger than the observed number of studies, we could state, based on Rosenthal's guideline, that the observed effect is robust. Indeed, for this many unpublished or unretrieved studies to exist, it would have required each of 10 parapsychology laboratories to have continuously produced nonsignificant studies at the rate of 2.6 per month over the 15 years surveyed. This is an unlikely scenario given the limited number of researchers performing these experiments over the years and the time and effort typically required to perform a single study.

If we apply the same procedure to the Princeton data of 144 studies, we find the mean z = .339, overall z = 4.063, and p < 4.85 x 10^-5. Plugging these values into formula (2), we find we would need X = 476 additional unpublished or unretrieved studies averaging null results. But as previously mentioned (vide supra), the Princeton lab has claimed that they have no unpublished or filed studies; thus this estimate of the filedrawer size is purely academic.
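Formula (2) is a one-line computation. The sketch below reproduces the survey figure quoted above (the function name is invented):

    def filedrawer_x(k, mean_z, z_crit_sq=3.92):
        # Fail-safe N from formula (2): null-result studies needed to
        # pull the combined result down to 2-tailed p = .05.
        return k * (k * mean_z**2 - z_crit_sq) / z_crit_sq

    print(round(filedrawer_x(188, 0.738)))  # survey values -> ~4723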
Another way of looking at the Princeton data is shown in Figure 1. This shows a histogram of the absolute value of the observed z scores in light-colored bars, and a best Gaussian fit in dark bars. As is apparent from the figure, the observed z scores are a good Gaussian fit, but the standard deviation of the fit is not 1.0, as one would expect under the null hypothesis of z scores chosen at random from a normal distribution; rather, the best-fit Gaussian standard deviation is 1.17. A variance test between these two variances results in z = 2.90, p < .004 (2-tailed). Thus the distribution of z scores is significantly altered from that expected by chance. This interesting effect is discussed in more detail by Jahn, Nelson and Dunne (1985) and May, Radin, Hubbard, Humphrey, and Utts (1985).

[Figure 1. |z| score distribution for Princeton data -- histogram of the 144 observed |z| scores (bins from .075 to 4.275) with the best-fit Gaussian overlaid.]
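The variance test can be reproduced with the classical chi-square approximation. The paper does not name its exact test, so the z = sqrt(2*chi2) - sqrt(2*df - 1) transformation below is an assumption -- one that does reproduce the quoted value.

    from math import erf, sqrt

    def variance_test_z(sd, n):
        # Chi-square test of an observed z-score sd against the unit
        # sd expected under the null, via the classical approximation.
        chi2 = (n - 1) * sd**2
        return sqrt(2 * chi2) - sqrt(2 * (n - 1) - 1)

    z = variance_test_z(1.17, 144)
    p = 1.0 - erf(abs(z) / sqrt(2.0))  # 2-tailed
    print(z, p)  # ~2.90, p ~ .004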
[Figure 4. Method of estimating filedrawer size (see text): panel (4a) shows the two fitted Gaussian curves separately, panel (4b) their sum, with the cutoff marked at z = 1.65.]

Now that we had a full description of curves 1 and 2, we assumed that the area labeled "b" in Figure 4a was the number of observed studies with |z| < 1.65 (188 - 76 = 112), that area "c + d" was composed of the 76 observed studies with |z| > 1.65, and that the total area "a + b + c + d" was calculated at 283 studies.5 Doing the subtraction 283 - 112 - 76 = 95, we estimate 95 unreported or unretrieved nonsignificant studies in the actual filedrawer. We believe that this number is a more realistic estimate than the 4700 studies determined by equation (2). In fact, 95 studies would require each of 10 parapsychology laboratories to have filed only about 0.6 studies per year over the 15-year survey period (as opposed to 2.6 per month, as 4700 studies would require).

Now if we combine the 188 observed survey studies with 95 new, nonsignificant z scores (generated by a Monte Carlo technique, with z chosen at random from a normal distribution and bounded in magnitude between 10^-25 and 1.64), we find that for the 283 resulting studies the mean z = .462, overall z = 7.768, and p < 8.03 x 10^-15. Again applying formula (2) to the new values (for the sake of curiosity), we find that X = 4078 additional nonsignificant studies would be needed to bring this overall p value down to p = .05, 2-tailed.

Finally, combining all survey, newly estimated, and Princeton studies (188 + 95 + 144), we find that for the 427 total studies the mean z = .420, overall z = 8.684, and p < 3.9 x 10^-18. Applying formula (2), we find we would need 7778 additional nonsignificant studies in the filedrawer. Thus, from several different perspectives, it seems that the filedrawer issue is not as serious a problem as many have thought.

5. This calculation was based on the curve-fitted standard deviations for the two Gaussian curves and the observed number of studies in areas b and c + d.

Incidentally, testing the standard deviation of the z scores observed in these 427 studies (sd = 1.414) against the expected variance of 1.0 for a normal, unperturbed z distribution results in a chi-square value of 853.7 (426 df), for a p < 5.9 x 10^-34. Tables 1 (below) and 7 (at the end of the paper) summarize these findings.

Table 1. Summary of z score analyses

                                                           variance test of z's against sd = 1
Source                   studies (N)  mean z   z score   p (2-tail)       sd      X^2     p (2-tail)
Survey                       188       0.738   10.114    4.9 x 10^-24    1.739   568.5   4.9 x 10^-47
Princeton                    144       0.339    4.063    4.9 x 10^-5     1.184   201.9   0.001
Estimated (simulated)         95      -0.084   -0.820    0.412           0.661    41.5   0.51
  filedrawer
Combined                     427       0.420    8.684    3.9 x 10^-18    1.414   853.7   5.9 x 10^-34

4. Experiments are not replicable

Occasional significant effects may be impressive, but the existence of the claimed anomaly cannot be established on the basis of results reported by only a few individuals.6 The same effect must be replicated by many others. Is it true, as Kurtz (1980) claims, that

    The basic problem ... is the lack of replicability by other experimenters. Apparently, some experimenters -- a relative few -- are able to get similar results, but most are unable to do so.
(Italics in the original, p. 12.)

In fact, of the 332 experiments we considered, 78.6% failed to reach significance. It is hardly surprising, then, that on the basis of examining individual experiments it is easy to reach the conclusion that the effect is elusive and non-replicable. At this failure rate, nearly 4 out of 5 experiments will fail to reject the null hypothesis. (Of course, if just chance were operating, 19 out of 20 experiments would fail to reject the null hypothesis.)

6. Actually, compared to experimental psychology, experimental parapsychology is in much better shape as far as replication rates go. Honorton (1975), for instance, describes a study by Bozarth and Roberts (1972), who, in a survey of 1334 articles from psychology journals, found only eight articles involving replications of previously published work. In this present meta-analysis alone, parapsychology is a factor of 40 ahead of psychology.

Another reason why it may be difficult to produce significant experiments at will is the well-known "experimenter effect" (Rosenthal, 1976). This effect is ubiquitous in all the sciences, but parapsychology seems to be especially vulnerable (see, e.g., White, 1977). The experimenter effect may help explain why some critics of parapsychology claim that they have never obtained significant results in their attempts to replicate psi experiments (e.g., Kurtz, 1981, p. 16; Neher, 1980, p. 147). Of course, the odds of never obtaining a significant study can be astronomical, depending on the number of studies conducted. Unfortunately, critics rarely report the number and details of their claimed replications, so a good estimate of the probability of their never seeing a significant result cannot be made.

It should be noted that the experimenter effect is only one of many confounding problems involved in the quest for the significant replication. For example, selection of subjects, experimenters, task conditions, experimental protocols, statistical procedures, environmental conditions, feedback techniques, and generation of random numbers are all reflected in the ultimate outcome of an experiment. Regardless of how well controlled an experiment may be, a change in any one of these factors will affect the entire experiment in a complex, poorly understood way. In any case, experimenter bias is unavoidable, and we must rely on well-controlled experiments with features like automated data recording to help eliminate this bias.

In spite of tight controls, however, it is known that even parapsychologists who would like to replicate RNG studies cannot guarantee significant results. Thus, critics would perhaps claim that any reported significant studies are due more to unconscious or intentional experimenter bias (i.e., fraud or carelessness) than to a real effect. To address the issue of what effect different experimenters may have had in the reported RNG experiments, we ran two analyses on the survey data. The first involved calculating the overall z score obtained by each principal investigator; the second was a test of the homogeneity of mean z scores reported by different investigators.

Combined z score results

Table 2 shows a combined z and mean z calculated for each of 28 different principal investigators.
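Table 2's per-investigator combination (described in detail just below) amounts to an equal-weight Stouffer combination: sum the experiment z scores and divide by the square root of the experiment count. A minimal sketch follows; the single-study entries match Table 2, while "ExampleA" and its z list are hypothetical.

```python
# Sketch of the per-investigator combination used in Table 2
# (equal-weight Stouffer's method). "ExampleA" is hypothetical.
import math

experiments = {                 # investigator -> z score per experiment
    "Hill":      [2.950],
    "Jungerman": [2.332],
    "ExampleA":  [0.5, -0.2, 1.1, 0.8],
}

for who, zs in experiments.items():
    z_overall = sum(zs) / math.sqrt(len(zs))
    print(f"{who:10s} n={len(zs):3d} z(overall)={z_overall:6.3f}")

# The grand total in the text combines the 28 z(overall) values the same
# way: grand_z = sum(per_investigator_z) / sqrt(28)  ->  8.548
```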
This list comprises only those studies for which sufficient detail was published for us to calculate z scores from the number of trials and hits in an experiment (332 total - 35 partially detailed experiments = 297 experiments). The z(overall) scores per investigator were calculated by summing the z scores for all experiments contributed by that investigator and dividing by the square root of the number of experiments. In effect, this weights each experiment equally, regardless of the number of trials (bits) actually used in the experiment. (The number of trials run in these experiments ranged between 144 and 2 million.)

Table 2: Overall z score per investigator

Principal investigator*   References   Experiments   z(overall)
Andre                          1            4           2.413
Bierman                        1            2           3.899
Braud                          2            4           3.760
Broughton                      1            4          -0.470
Debes                          1            8           0.356
Dunne                          1          144           4.063
Edge                           1           10           0.369
Giesler                        1           12           2.694
Heseltine                      4           19          -0.386
Hill                           1            1           2.950
Honorton                       4           14           1.523
Houtkooper                     1            4           0.981
Jungerman                      1            1           2.332
Kelly                          1            2           3.366
Matas                          1            2           0.513
May**                          1            1          -2.384
Millar                         2            2          -0.875
Morris                         1            5           1.835
Morrison                       1            3           1.342
Palmer                         1            1           1.750
Pantas                         1            4           1.525
Radin                          1            4           4.343
Randall                        1            6          -0.029
Schechter                      1            2          -1.060
Schmeidler                     1            1          -1.273
Schmidt                        9           30          13.224
Shafer                         1            2          -1.440
Winnett                        1            5          -0.089
TOTAL                         44          297           8.548

* This is the name of the first author as listed in the references.
** The study by May, Humphrey, and Hubbard (1980) is not included in this survey because their sequential analysis data collection technique is not amenable to z score analysis.

As seen in Table 2, the overall z scores for these investigators ranged from -2.384 to 13.224. The grand total z score, obtained by summing the 28 z scores and dividing by the square root of 28, is z = 8.548, for an overall p < 1.27 x 10^-17 (2-tailed). If we remove Schmidt's 30 studies, since he obtained the largest overall z score and is responsible for the largest number of references in our survey, we find the grand total z = 6.160, p < 7.31 x 10^-10 (2-tailed). If we also remove the Princeton data, which comprise nearly half of the reported experiments, we get a grand total z = 5.480, p < 4.25 x 10^-8 (2-tailed). Thus, after removing the two largest contributors to the database, we are left with a fairly impressive overall result: odds against chance of about 1 in 23,000,000. In addition, we find that 39% (11/28) of the experimenters obtained overall 2-tailed significance and 68% (19/28) obtained positive z scores.

Test for homogeneity of effect size

Do different experimenters tend to observe about the same effects in their experiments? Or are there some individuals who consistently obtain significant results while others do not? In the present context, to test for homogeneity of effect size among different experimenters, we believe it makes more sense to test the individual z scores obtained in each experiment than to use effect sizes such as d, d', r, and so on, as discussed by Rosenthal (1984) and others. The reason is the following: effect size may be defined through

    significance test = [effect size] x [size of study]

where "significance test" can be a z, t, r, chi-square, or any other statistical test.
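To preview the problem developed next, here is a small numeric sketch of this identity, assuming the binary-RNG effect size r = 2Δp = z/√N used below; the sample sizes are chosen to match the worked example that follows.

```python
# If the effect size r were constant, z = r * sqrt(N) would grow
# without bound as N grows -- the point developed in the next paragraph.
import math

r = 0.2                       # effect size from a study with N = 100, z = 2.0
for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,d}  ->  z = {r * math.sqrt(n):6.1f}")
# N = 100 gives z = 2.0; N = 10,000 gives z = 20.0 -- magnitudes of that
# size are simply not reported, suggesting the effect size is n-dependent.
```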
In the studies we found in the literature, it is clear that if the effect size were constant regardless of the size of the study (say, N trials), we should be observing enormous z scores when N is even moderately large. For example, if an investigator ran a study with N = 100 and obtained a z score of 2.0, this would imply an effect size (defined as r = 2Δp = z/√N for a binary RNG) of r = 2.0/√100 = 2.0/10.0 = .2. If this effect size were constant, then if we ran the same experiment again but with N = 10000, the z score for this experiment would be z = (2Δp)√10000 = .2(100) = 20.0. Z scores of this magnitude are simply not reported in individual experiments; thus our effect size is almost certainly n-dependent. Indeed, this phenomenon has been observed repeatedly in a variety of experiments and has been called a goal-directed effect (e.g., Kennedy, 1978; May, Radin et al., 1985; Schmidt, 1974).

To take the effect-size n-dependence into account, we must multiply the effect size by a function of the size of the study, which brings us back to a significance test, as noted above. For the sake of convenience, we can use the z score calculated for each experiment. To see whether different experimenters reported z scores of about the same magnitude, we performed an analysis of variance; the results are shown in Table 3. It is clear from the results of the ANOVA that different experimenters do indeed obtain different mean z scores, although with 25% (7/28) of the principal investigators reporting mean z scores greater than 2 or less than -2, it is not the case that only one or two experimenters have obtained large mean z scores.

Table 3: Results of one-way analysis of variance

Grand mean:
    N     MEAN Z      SD       SE
   297    0.5979    1.5823   0.0918

Person          N    MEAN Z      SD       SE
Andre           4    1.2065    1.9091   0.9546
Bierman         2    2.7570    1.3863   0.9803
Braud           4    1.8797    0.9373   0.4687
Broughton       4   -0.2347    0.3048   0.1524
Debes           8    0.1260    1.8205   0.6437
Dunne         144    0.3386    1.1842   0.0987
Edge           10    0.1166    2.0067   0.6346
Giesler        12    0.7778    0.8011   0.2313
Heseltine      19   -0.0885    1.7124   0.3928
Hill            1    2.9498
Honorton       14    0.4071    1.1328   0.3028
Houtkooper      4    0.4906    1.4944   0.7472
Jungerman       1    2.3322
Kelly           2    2.3799    0.3015   0.2132
Matas           2    0.3625    2.9522   2.0875
May             1   -2.3841
Millar          2   -0.6187    1.6406   1.1601
Morris          5    0.8206    1.0562   0.4723
Morrison        3    0.7746    0.4926   0.2844
Palmer          1    1.7500
Pantas          4    0.7625    2.4453   1.2226
Radin           4    2.1712    0.8822   0.4411
Randall         6   -0.0120    1.1753   0.4798
Schechter       2   -0.7496    4.2411   2.9989
Schmeidler      1   -1.2728
Schmidt        30    2.4144    2.0341   0.3714
Shafer          2   -1.0178    1.1158   0.7890
Winnett         5   -0.0396    0.5795   0.2592

SOURCE       SS         df       MS        F         p
person     197.9629     27     7.3320    3.631    2.78 x 10^-8
error      543.1467    269     2.0191

To see whether the mean z score might be related to the number of experiments each investigator ran, we performed a correlation between N and MEAN Z (in Table 3). Results were as follows:

Correlation   r-squared    t(26)      p
  -0.0185      0.0003     -0.0941   0.9257

In summary, taking the data at face value (i.e., not weighted by quality analysis), we can make two statements: First, considering all available data, there do appear to be significant differences among the mean z scores obtained by different experimenters.
Second, there is a nonsignificant correlation between the number of experiments run by principal investigators and their mean z scores.

So, to return to the question at the beginning of this section: Do different experimenters obtain about the same results? The answer is no -- experimenters in this survey showed mean z scores ranging from -2.38 to 2.95. As to the question of whether only one or two individuals may be responsible for the overall significance, the answer is also no; 25% of the experimenters in our survey reported mean z scores beyond 2 and -2.

5. RNGs were nonrandom

This criticism may be addressed by examining the results of control studies reported in the literature. The results shown in Table 4 were compiled from 14 of the 44 detailed references referred to in Table 3, and were contributed by the following twelve authors: Dunne (Princeton), 57 control studies; Schmidt, 23; Broughton, 8; Braud, 2; and one each for Bierman, Hill, May, Millar, Morris, Schechter, Honorton, and Palmer. The other references did not report control results in detail and could not be used.

Table 4: Combined control studies

Data        Number of         sum of    mean        overall   p (2-tail)    sd
            control studies   z's       control z   z
Survey            41          -0.012    -0.0003     -0.002      0.999      1.036
Princeton         57           2.829     0.0496      0.375      0.708       .806
Combined          98           2.817     0.0287      0.285      0.776       .905

A variance test of the observed standard deviation (sd = .905) against the expected variance of 1.0 for 98 samples results in a chi-square of 80.2645 (97 df), z = -1.22, and p < .222 (2-tailed). Thus, for the references in which control runs were described in sufficient detail to determine the number of binary hits and trials, there is no evidence of systematic (mean or variance) bias in the RNG equipment.

QUALITY ANALYSIS: A PROPOSAL

In this section, we address how we plan to judge the quality of the published experiments. Quality analysis in effect adds a weighting factor to each experiment's reported z, t, or p value, depending on the assessed quality of that experiment. To avoid making a subjective quality assessment for each experiment, criteria and associated weights can be defined such that, if a criterion is met, the weight associated with that criterion is added to that experiment's overall weighting factor. Rosenthal (1984, pp. 46-48) describes a variety of factors one might want to consider when performing quality analyses, but it is clear that the choice of weighting criteria depends on the research context. For the present analysis, Table 5 shows our initial proposal for criteria and associated weights; these are explained following the table.

Table 5: Weighting criteria for RNG quality analysis

Criteria*                                Weighting factor
                                       With data   Without data
Controls
  local control runs                       30           15
  global control runs                      20           10
  other control/random tests               10            5
  target bit oscillation                   10            5
Data integrity
  automatic hit/trial counters              5
  tamper-resistant equipment                5
  automatic data recording                 10
Statistical integrity
  pre-specified analysis                   10
  fixed run lengths                        10
  direction of effort stated                5
Subject type
  ordinary subjects                        10
  special subjects                          4
  experimenter as subject                   2
Reporting clarity
  fully reported hits or trials and z      10
  report of z, p, or t only                 4
  report of other statistics               (lowest; see text)
  "significant" only                        2
  "nonsignificant" only                     4

* See text for explanation of criteria.
Explanation of RNG weighting criteria

Controls. In Table 5, a "local" control means the equipment was checked for randomness as part of the experimental protocol. A typical design is to have an experimental run followed by a control run equivalent in all respects to the experimental run, but in which the subject applies no "effort" to the task or is absent. A "global" control means the equipment (RNG, computer, etc.) was tested under the same conditions as used in the experiment, but separately from the experimental sessions. "Other" control or randomness tests means that some reference was made to control runs or randomness tests, but either (a) the detailed results were not in the report or (b) the explanation of the controls was referenced or related to a description in another article. The columns labeled "With data" and "Without data" show the different weights assigned to control runs depending on whether actual data were reported. "Target bit oscillation" means the assigned "hit" bit alternated with each newly generated bit, to counterbalance any possible RNG bias.

Data integrity. The "automatic hit/trial counters" criterion is satisfied if the RNG equipment has an automated method of keeping track of hits and trials. "Automatic data recording" requires use of punched paper tape, magnetic tape, computer disk, or the like to automatically record the data collected in the experiment. There are instances in the literature (especially in reports from the early 1970s) where the automatic counter criterion is met, but not automatic data recording. "Tamper-resistant equipment" requires that (a) the RNG was in a locked laboratory and inaccessible to subjects at any time, (b) the experiment was under the immediate supervision of an experimenter, (c) the equipment had a "fail-safe" or interlock system that prevented disruption of or tampering with the data collection process, or (d) the device was a computer with software data protection such as a password, protected files, or the like.

Statistical integrity. "Pre-specified analysis" means it is clear from the report that the statistical analysis method was defined before data were collected. "Fixed run lengths" means the total number of trials was specified in advance of data collection. "Direction of effort stated" requires that it was clear whether the planned test was one-tailed or two-tailed, and what direction of "effort" subjects were to aim for during the experiment.

Subject integrity. This category checked whether the subjects used in the experiment were ordinary, selected or special in some other way, or the experimenter(s). Stronger weight was applied to unselected subjects because it was felt they would have less invested in the experimental outcome and would be less likely to intentionally or unintentionally interfere with the equipment or procedures.

Reporting integrity. If the report included the actual number of trials and hits, or the number of trials and a z, p, or t score, it was assigned the greatest weight. If it included only z, p, or t scores, it was assigned less weight. Reports of any other statistics that we had to transform into the equivalent of z scores were assigned the lowest weight.
In addition, reports consisting only of the statement "significant," without supporting data, were assigned a weight of 2; similarly, the bare statement "nonsignificant" was assigned a weight of 4.

Method of calculating the quality-weighted analysis

The weighting factor per experiment would be calculated as follows: if a criterion is clearly present in the published report, the associated weighting factor is added to that experiment's weight. If the criterion is not met, the weight assigned for that factor is zero (0). The sum of the individual weights is the overall weight per experiment, and the final overall weighted z score is then calculated as

    Weighted Z = Σ wi Zi / √(Σ wi²)                                  (5)

Thus the minimum weight per experiment would be 0 if there were no mention of control tests, no description indicating that data collection was protected in some way, no evidence that statistical tests were pre-planned, insufficient report of who the subjects were, and no report of results. The maximum weight would be 125 (the sum of three control weights, three data integrity weights, three statistical integrity weights, use of ordinary subjects, and full report of data).

Weighting the filedrawer estimate

We propose to weight our estimated 95 nonsignificant filedrawer studies with the average weight found in the rest of the studies. This proposal invites a potential criticism, however. Our means of estimating the filedrawer size depends on the observed z score distribution. Since the individual z scores depend on the weighting factors (which were in effect all 1's in the analysis reported in this paper), the unweighted filedrawer estimate may be smaller than a similar estimate made with weighted z scores, thus inflating the final results. In response to this criticism, we would point out that the quality analysis is actually orthogonal to the filedrawer estimate: the actual magnitude of a z score does not change with our quality analysis; rather, the importance of the z score is affected, and the importance of a z score is not considered in our filedrawer estimation method, only in the final estimate of overall significance. In addition, by adding a group of nonsignificant studies (the filedrawer estimate is, by definition, composed of nonsignificant studies) into a pool of z scores that have already been weighted according to quality, we are in effect creating an ultra-conservative test. A case could be made, for instance, that a filedrawer estimate should not be added into a quality-weighted analysis at all; but to take the conservative approach, given the nature of the claim, we will pool the 95 estimated studies along with the quality-weighted z scores.

Defining experiments in the quality analysis

Although adequate for a first-pass analysis, the method of selecting experiments described above would be less than perfect for a quality-weighted analysis. The main objection that could be raised is that the decision on what constitutes the subjects' "direction of effort" depends on the reviewer's interpretation of the experimental procedure. In many articles, we took educated guesses to decide what the actual conditions were, what the subjects' intentions were, whether the authors in fact predicted the outcome in advance, and so on.
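Before turning to that interpretation problem, the proposed bookkeeping of Table 5 and equation (5) can be made concrete with a minimal sketch; the criterion subset, the z values, and the helper names here are hypothetical, chosen only for illustration.

```python
# Sketch of the proposed quality weighting and equation (5).
# WEIGHTS is a subset of Table 5; which criteria an experiment
# satisfies, and the z scores, are hypothetical.
import math

WEIGHTS = {
    "local_control_with_data": 30,
    "automatic_data_recording": 10,
    "pre_specified_analysis": 10,
    "ordinary_subjects": 10,
    "fully_reported_hits_trials_z": 10,
}

def experiment_weight(criteria_met):
    # weights add only for criteria clearly present; absent criteria add 0
    return sum(WEIGHTS[c] for c in criteria_met)

def weighted_z(zs, ws):
    # equation (5): weighted Z = sum(w_i * z_i) / sqrt(sum(w_i^2))
    return sum(w * z for w, z in zip(ws, zs)) / math.sqrt(sum(w * w for w in ws))

zs = [2.1, -0.4, 1.3]                                 # hypothetical z scores
ws = [experiment_weight(["local_control_with_data", "ordinary_subjects"]),
      experiment_weight(["pre_specified_analysis"]),
      experiment_weight(["automatic_data_recording",
                         "fully_reported_hits_trials_z"])]
print(f"weights = {ws}, weighted Z = {weighted_z(zs, ws):.3f}")
```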
To address this interpretation problem, in Part 2 of this meta-analysis we will actually perform two separate meta-analyses. The first will take into account the minimum number of experiments that we decide is a reasonable partitioning, and the second will use the maximum number of experiments. The two end results will be compared, and the more conservative of the two will be used as the overall result. Deciding on a range of possible experiments allows us to form an "uncertainty" factor for each reference. If a reference's maximum-minimum experiment range is large compared with the average observed range, we must consider that the quality of that reference, at least for our purposes, is poor. We plan on presenting a breakdown of each reference's uncertainty in the Part 2 meta-analysis, to judge how clear each reference was for this study.

Example of reference source quality analysis

In Table 5 we present an example of a preliminary quality analysis applied to the source of each reference. We assigned arbitrary weights according to our perception of the quality of the average papers published in each parapsychological reference source (not counting the Princeton data). Then, after making guesses for these weights, we calculated a combined z score contributed by each journal and compared it with a weighted z score computed according to equation (5). As seen in Table 5, the original combined z score dropped by 4 orders of magnitude in significance, but the weighted z score is still quite significant. We expect that the wider range of quality weights we have proposed above will make a larger difference in a weighted analysis, but it would appear that most of the reports would have to be extremely poor in quality to nullify the overall p value.

Table 5. Exploratory quality analysis of reference sources

Reference source                      Studies   Overall z   p (2-tail)       Assigned weight
Journal of Parapsychology                49       7.055     3.30 x 10^-13         10
Proceedings of the PA                    32       5.036     4.76 x 10^-7           2
Research in Parapsychology               52       4.052     5.08 x 10^-5           1
Journal of the ASPR                       5       4.105     4.04 x 10^-5           8
European Journal of Parapsychology       6       3.052     0.002                   8
Journal of the SPR                        9       1.692     0.091                  5

Combined unweighted result:  z = 10.27,  p < 9.45 x 10^-25
Combined weighted result:    z =  9.53,  p < 1.60 x 10^-21

Example of chronological analysis

In Table 6 we show an analysis of variance of the 297 detailed experiments grouped according to year of publication.

Table 6. Chronological analysis of variance

Grand mean:
    N     MEAN Z      SD       SE
   297    0.5979    1.5823   0.0918

Year     N    MEAN Z      SD       SE
1970     5    0.8247    3.0969   1.3850
1971     6    0.6292    2.3180   0.9463
1972     9    1.3565    1.4253   0.4751
1973     6    4.1239    1.4665   0.5987
1974    10    1.1539    1.9879   0.6286
1975     9    1.5804    2.1841   0.7280
1976    17    0.7366    1.5784   0.3828
1977    23    0.5695    1.8333   0.3823
1978     9   -0.2520    1.1482   0.3827
1979     7    0.6012    1.2325   0.4658
1980     5   -1.1411    1.3480   0.6029
1981     7    2.3437    0.7836   0.2962
1982   164    0.3098    1.2595   0.0983
1983     1    1.7500
1984    19    0.8492    1.0779   0.2473

SOURCE      SS         df       MS        F         p
year      151.2565     14    10.8040    5.165    1.14 x 10^-8
error     589.8531    282     2.0917

This ANOVA shows that mean z scores differ significantly from year to year.
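A sketch of these chronological checks -- the one-way ANOVA above and the year-trend correlation reported next -- might look as follows; the per-year groups are synthetic stand-ins, not the 297 detailed studies.

```python
# Sketch of a by-year one-way ANOVA and a year vs. mean-z trend test.
# The groups below are synthetic stand-ins for the survey data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
by_year = {year: rng.normal(0.5, 1.5, 20) for year in range(1970, 1985)}

f, p = stats.f_oneway(*by_year.values())
print(f"ANOVA: F = {f:.3f}, p = {p:.3g}")

years = np.array(sorted(by_year))
mean_z = np.array([by_year[y].mean() for y in years])
r, p_r = stats.pearsonr(years, mean_z)
print(f"year vs. mean z: r = {r:.3f}, p = {p_r:.3f}")
```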
We then looked for trends in z scores by performing a correlation between year and mean z. The results showed r = -0.205, t(13) = -0.756, p = 0.463; i.e., there was no significant correlation between year of publication and the mean z score observed for that year.

A planned quality vs. z score correlational study

Once we perform the quality analysis and have a list of raw z scores and associated quality weights, we plan on performing a correlation between these pairs of numbers. If the correlation is significantly negative, it would suggest that the better the quality of a study, the lower the z score. This would be in accordance with what some critics have claimed, namely that "there is a strong tendency for the rate of success to increase with the number of obvious defects" (Hyman, 1983, p. 23). If a significant positive correlation is seen, however, this criticism can be refuted.

CONCLUSION

In an initial meta-analysis of psi experiments involving binary RNGs, we have identified 332 experiments published over the years 1969-1984 in 56 references. Based on an analysis of 188 of these experiments reported in parapsychological journals, we estimated the actual number of nonsignificant, unreported or unretrieved experiments to be 95. We found a total of 98 reported control studies in 14 of these references. A summary of the meta-analytic results is shown in Table 7.

In agreement with a hypothesis of a "psi effect" on RNGs, the combined data indicate that, in the aggregate, the experimental conditions resulted in anomalous statistical behavior of the RNG in the direction of effort specified by the task, and the control conditions resulted in expected binomial statistics for both mean z scores and standard deviations.

The combined data show an interesting effect on the distribution of z scores. We find that in the experimental condition the mean z score is increased significantly over chance expectation, which in the present context accords with the underlying hypothesis that the z score will shift according to the direction of the subject's effort. In the control condition we find the mean z shifted slightly, but not significantly so. We also find that the standard deviation of the combined distribution of experimental z scores has become significantly fatter than chance expectation, while the combined control standard deviation is as expected. Both of these effects -- a shifting of the mean and a fattening of the standard deviation -- are accounted for in a model discussed by May, Radin et al. (1985).

Part 2 of this study will report a quality-weighted analysis of this same data. By weighting each study according to a semi-objective quality assessment scale, we will address the major criticisms of such experiments in a quantified way, and the overall experimental vs. control result will provide a basis for discussion of whether or not this anomaly is, in fact, real. We urge readers to comment on and criticize the method described here, and especially the proposed weighting criteria presented in the Quality Analysis section above. We plan to gain a consensus opinion among informed scientists on what constitutes an agreeable, conservative weighting scheme before we perform the quality analysis. In this way, the combined results observed in the weighted data will be less subject to post hoc
debate over the adequacy of the analysis method; we will avoid an enormous amount of pointless work; and we can proceed with constructive discussion. We are especially interested in comparing the quality weights proposed by parapsychologists, critics of psi research, and "neutral" scientists, as this may give us a clue as to what is considered important in establishing consensus agreement among these different groups.

Table 7. Summary of RNG meta-analysis

                                                            variance test of z's against sd = 1
Source        studies (N)   mean z    z        p (2-tail)      sd of z scores    X^2     p (2-tail)
SURVEY
 Experiment       188        0.738   10.114    4.9 x 10^-24        1.739         568.5   4.9 x 10^-47
 Control           41       -0.0003  -0.002    0.999               1.036          44.0   0.62
PRINCETON
 Experiment       144        0.339    4.063    4.9 x 10^-5         1.184         201.9   0.001
 Control*          57        0.050    0.375    0.708                .806          37.0   0.05
FILEDRAWER ESTIMATE
 Experiment        95       -0.084   -0.820    0.412               0.661          41.5   0.51
COMBINED
 Experiment       427        0.420    8.684    3.9 x 10^-18        1.414         853.7   5.9 x 10^-34
 Control           98        0.029    0.285    0.776                .905          80.3   0.22

* This "too small" variance in the control data is compatible with a model proposed by May, Radin et al. (1985) and is also discussed by Jahn, Nelson and Dunne (1985).

REFERENCES

General References

Akers, C. Methodological criticisms of parapsychology. In S. Krippner (Ed.), Advances in parapsychological research, Volume 4. Jefferson, NC: McFarland & Company, Inc., 1984.

Aspect, A., Dalibard, J., and Roger, G. Experimental test of Bell's inequalities using time-varying analyzers. Physical Review Letters, 49, 1982, 1804-1807.

Aspect, A., Grangier, P., and Roger, G. Experimental realization of the Einstein-Podolsky-Rosen-Bohm Gedankenexperiment: A new violation of Bell's inequalities. Physical Review Letters, 49, 1982, 91-94.

Beloff, J. Parapsychology and philosophy. In B. Wolman (Ed.), Handbook of parapsychology. New York: Van Nostrand, 1977.

Bower, B. Getting into Einstein's brain. Science News, Vol. 127, No. 21, May 25, 1985, p. 330.

Bozarth, J. D. and Roberts, R. R. Signifying significant significance. American Psychologist, Vol. 27, 1972, 774-775.

Carpenter, J. C. Intrasubject and subject-agent effects in ESP experiments. In B. Wolman (Ed.), Handbook of parapsychology. New York: Van Nostrand, 1977.

Cooper, H. M. and Rosenthal, R. Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, Vol. 87, 1980, 442-449.

d'Espagnat, B. The quantum theory and reality. Scientific American, November 1979, 158-181.

Dunne, B. J., Jahn, R. G., and Nelson, R. D. An REG experiment with large data-base capability, II: Effects of sample size and various operators. Technical Note PEAR 82001, Princeton Engineering Anomalies Research Laboratory, Princeton University, School of Engineering/Applied Science, 1982.

Dunne, B. J., Jahn, R. G., and Nelson, R. D. Precognitive remote perception. Technical Note PEAR 83003, Princeton Engineering Anomalies Research Laboratory, Princeton University, School of Engineering/Applied Science, 1983.

Epstein, S. The stability of behavior, II: Implications for psychological research. American Psychologist, 35, 9, September 1980, 790-806.

Fisher, R. A. Statistical methods for research workers (2nd ed.). London: Oliver & Boyd, 1928.

Hansel, C. E. M.
ESP and parapsychology: A critical reevaluation. Buffalo, NY: Prometheus Books, 1980.

Honorton, C. Error some place! Journal of Communication, Vol. 25:1, Winter 1975.

Honorton, C. Replicability, experimenter influence, and parapsychology: An empirical context for the study of mind. Paper presented at the meeting of the American Association for the Advancement of Science, Washington, D.C., 1978.

Honorton, C. Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, Vol. 49, No. 1, 1985, 51-92.

Hyman, R. Does the ganzfeld experiment answer the critics' objections? In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 21-23.

Hyman, R. The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology, 49, 1985, 3-50.

Jahn, R. G. The persistent paradox of psychic phenomena: An engineering perspective. Proceedings of the IEEE, Vol. 70, No. 2, February 1982.

Jahn, R. G., Nelson, R. D., and Dunne, B. J. Variance effects in REG series score distributions. Proceedings of the 28th Annual Parapsychological Association Convention, Tufts University, Medford, Massachusetts, August 12-16, 1985.

Kennedy, J. E. The role of task complexity in PK: A review. Journal of Parapsychology, 42, 1978, 89-122.

Kurtz, P. Is parapsychology a science? In K. Frazier (Ed.), Paranormal borderlands of science. Buffalo, NY: Prometheus Books, 1981.

May, E. C., Humphrey, B. S., and Hubbard, G. S. Electronic system perturbation techniques. SRI International Final Report, September 30, 1980.

May, E. C., Radin, D. I., Hubbard, G. S., Humphrey, B. S., and Utts, J. M. Psi experiments with random number generators: An informational model. Proceedings of the Presented Papers of the 28th Annual Parapsychological Association Convention, Tufts University, Medford, Massachusetts, August 12-16, 1985.

Mermin, N. D. Is the moon there when nobody looks? Reality and the quantum theory. Physics Today, April 1985, 38-47.

Neher, A. The psychology of transcendence. Englewood Cliffs, NJ: Prentice-Hall, 1980.

Nelson, R. D., Dunne, B. J., and Jahn, R. G. An REG experiment with large data-base capability, III: Operator related anomalies. Technical Note PEAR 84003, Princeton Engineering Anomalies Research Laboratory, Princeton University, School of Engineering/Applied Science, September 1984.

Palmer, J. ESP research findings: 1976-1978. In S. Krippner (Ed.), Advances in parapsychological research, Volume 3. New York: Plenum Press, 1982.

Rohrlich, F. Facing quantum mechanical reality. Science, Vol. 221, No. 4617, September 23, 1983, 1251-1255.

Rosenthal, R. Experimenter effects in behavioral research (rev. ed.). New York: Irvington, 1976.

Rosenthal, R. Meta-analytic procedures for social research. Beverly Hills, CA: Sage Publications, 1984.

Rush, J. H. Problems and methods in psychokinesis research. In S. Krippner (Ed.), Advances in parapsychological research, Volume 3. New York: Plenum Press, 1982.

Schechter, E. I. Hypnotic induction vs. control conditions: Illustrating an approach to the evaluation of replicability in parapsychological data. Journal of the American Society for Psychical Research, 78, 1-28, 1984.

Schmeidler, G. R.
Psychokinesis: The basic problem, research methods, and findings. In S. Krippner (Ed.), Advances in parapsychological research, Volume 4. Jefferson, NC: McFarland & Company, Inc., 1984.

Schmidt, H. Comparison of PK action on two different random number generators. Journal of Parapsychology, 1974, 38, 47-55.

Stanford, R. G. Experimental psychokinesis: A review from diverse perspectives. In B. B. Wolman (Ed.), Handbook of parapsychology. NY: Van Nostrand Reinhold Company, 1977.

Stanford, R. G. Recent ganzfeld-ESP research: A survey and critical analysis. In S. Krippner (Ed.), Advances in parapsychological research, Volume 4. Jefferson, NC: McFarland & Company, Inc., 1984.

Tart, C. T. Laboratory PK: Frequency of manifestation and resemblance to precognition. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 101-102.

White, R. A. The influence of the experimenter's motivation, attitudes, and methods of handling subjects on psi test results. In B. B. Wolman (Ed.), Handbook of parapsychology. NY: Van Nostrand Reinhold Company, 1977.

Meta-Analysis References

Note: The following references describe psi experiments using RNGs in various ways. In this list, the following codes are used: an * means the reference was used in our binary RNG meta-analysis; an rr means the reference was also used, but the experiments mentioned in the report were simulated due to lack of sufficient detail; *rr means the report contained both detailed and non-detailed studies.

Andre, E. Confirmation of PK action on electronic equipment. Journal of Parapsychology, 1972, 36, 283-293.

Bierman, R. J., and Wout, N. V. T. The performance of healers in PK tests with different RNG feedback algorithms. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 131-133.

Bierman, R. J., and Houtkooper, J. M. Exploratory PK tests with a programmable high speed random number generator. European Journal of Parapsychology, 1975, 1, 3-14.

rr Braud, W. Allobiofeedback: Immediate feedback for a psychokinetic influence upon another person's physiology. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 123-134.

Braud, L., and Braud, W. Psychokinetic effects upon a random event generator under conditions of limited feedback to volunteers and experimenter. Journal of the Society for Psychical Research, 1979, 50, 21-30.

rr Braud, W., and Schroeter, W. Psi tests with Algernon, a computer oracle. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 163-165.

*rr Braud, W. G., Smith, G., Andrew, K., and Willis, S. Psychokinetic influences on random number generators during evocation of "analytic" vs. "nonanalytic" modes of information processing. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1975. Metuchen, NJ: Scarecrow Press, 1976, 85-88.

rr Broughton, R. S., and Millar, B. A PK experiment with a covert release-of-effort test. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 28-30.

rr Broughton, R., Millar, B., Beloff, J., and Wilson, K. A PK investigation of the experimenter effect and its psi-based component. In W. G. Roll (Ed.), Research in Parapsychology 1977.
Metuchen, NJ: Scarecrow Press, 1978, 41-48.

Broughton, R. S., Millar, B., and Johnson, M. An investigation into the use of aversion therapy techniques for the operant control of PK production in humans. Proceedings of the Parapsychological Association Convention, 1979, 1-18.

Broughton, R. S., and Perlstrom, J. R. Results of a special subject in a computerized PK game. Proceedings of the Parapsychological Association Convention, 1984, 411-419.

Camstra, B. PK conditioning. In W. G. Roll, R. L. Morris, & J. D. Morris (Eds.), Research in Parapsychology 1972. Metuchen, NJ: Scarecrow Press, 1973, 25-27.

rr Davis, J. W., and Morrison, M. D. A test of the Schmidt model's prediction concerning multiple feedback in a PK test. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 163-168.

* Debes, J., and Morris, R. L. Comparison of striving and nonstriving instructional sets in a PK study. Journal of Parapsychology, 1982, 46, 297-312.

* Dunne, B. J., Jahn, R. G., and Nelson, R. D. An REG experiment with large data-base capability. In W. G. Roll, R. L. Morris, & R. A. White (Eds.), Research in Parapsychology 1981. Metuchen, NJ: Scarecrow Press, 1982, 50-51. Main reference is Nelson, R. D., Dunne, B. J., and Jahn, R. G. An REG experiment with large data-base capability, III: Operator related anomalies. Technical Note PEAR 84003, Princeton Engineering Anomalies Research Laboratory, Princeton University, School of Engineering/Applied Science, September 1984.

Dunne, B. J., Jahn, R. G., and Nelson, R. D. An REG experiment with large data-base capability, II: Effects of sample size and various operators. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 154-157.

* Edge, H. L. Plant PK on an RNG and the experimenter effect. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 169-174.

* Giesler, P. V. Differential micro-PK effects among Afro-Brazilian Caboclo and Candomble cultists using trance-significant symbols as targets. Proceedings of the Parapsychological Association Convention, 1984, 87-105.

* Heseltine, G. L. Electronic random number generator operation associated with EEG activity. Journal of Parapsychology, 1977, 41, 103-118.

* Heseltine, G. L., and Mayer-Oakes, S. A. Electronic random generator operation and EEG activity: Further studies. Journal of Parapsychology, 1978, 42, 123-136.

* Heseltine, G. L., and Kirk, J. H. Examination of a majority-vote technique. Journal of Parapsychology, 1980, 44, 167-176.

* Heseltine, G. L. PK success during structured and nonstructured RNG operation. Proceedings of the Parapsychological Association Convention, 1984, 379-388.

* Hill, S. PK effects by a single subject on a binary random number generator based on electronic noise. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 26-28.

* Honorton, C. Effects of meditation and feedback on psychokinetic performance: A pilot study with an instructor of transcendental meditation. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 95-97.

* Honorton, C., Barker, P., and Sondow, N. Feedback and participant-selection parameters in a computer RNG study. In W. G. Roll, J. Beloff, & R. A.
White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 157-159.

* Honorton, C., and Barksdale, W. PK performance with waking suggestions for muscle tension versus relaxation. Journal of the American Society for Psychical Research, 1972, 66, 208-214.

rr Honorton, C., and Tremmel, L. Psi correlates of volition: A preliminary test of Eccles' "neurophysiological hypothesis" of mind-brain interaction. In W. G. Roll (Ed.), Research in Parapsychology 1978. Metuchen, NJ: Scarecrow Press, 1979, 36-38.

* Honorton, C., and May, E. C. Volitional control in a psychokinetic task with auditory and visual feedback. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1975. Metuchen, NJ: Scarecrow Press, 1976, 90-91.

* Houtkooper, J. M. A study of repeated retroactive psychokinesis in relation to direct and random PK effects. European Journal of Parapsychology, 1977, 4, 1-20.

Jungerman, R. L., and Jungerman, J. A. Computer controlled random number generator PK tests. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 157-162.

* Kelly, E. F., and Kanthamani, B. K. A subject's efforts toward voluntary control. Journal of Parapsychology, 1972, 36, 185-197.

Levi, A. The influence of imagery and feedback on PK effects. Journal of Parapsychology, 1979, 43, 275-289.

Matas, F., and Pantas, L. A PK experiment comparing meditating versus nonmeditating subjects. Proceedings of the Parapsychological Association Convention, 1971, 12-13.

May, E. C., and Honorton, C. A dynamic PK experiment with Ingo Swann. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1975. Metuchen, NJ: Scarecrow Press, 1976, 88-89.

Millar, B. A covert PK test of a successful psi experimenter. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 111-113.

Millar, B., and Broughton, R. A preliminary PK experiment with a novel computer-linked high speed random number generator. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1975. Metuchen, NJ: Scarecrow Press, 1976, 83-84.

*rr Millar, B., and Mackenzie, P. A test of intentional versus unintentional PK. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 32-35.

Morrison, M. D., and Davis, J. W. PK with immediate, delayed, and multiple feedback: A test of the Schmidt model's predictions. In W. G. Roll (Ed.), Research in Parapsychology 1978. Metuchen, NJ: Scarecrow Press, 1979, 117-120.

Morris, R. L., Nanko, M., and Phillips, D. A comparison of two popularly advocated visual imagery strategies in a psychokinesis task. Journal of Parapsychology, 1982, 46, 1-16.

*rr Palmer, J., and Kramer, W. Internal state and temporal factors in RNG PK. In R. A. White & R. S. Broughton (Eds.), Research in Parapsychology 1983. Metuchen, NJ: Scarecrow Press, 1984, 28-30.

Pantas, L. PK scoring under preferred and nonpreferred conditions. Proceedings of the Parapsychological Association Convention, 1971, 47-49.

*rr Radin, D. I. Mental influence on machine-generated random events: Six experiments. In W. G. Roll, R. L. Morris, & R. A. White (Eds.), Research in Parapsychology 1981. Metuchen, NJ: Scarecrow Press, 1982, 141-142.

Randall, J. L.
An extended series of ESP and PK tests with three English schoolboys. Journal of the Society for Psychical Research, 1974, 47, 485-494.

rr Schechter, E. I., Honorton, C., Barker, P., and Varvoglis, M. P. Relationships between participant traits and scores on two computer-controlled RNG-PK games. In R. A. White & R. S. Broughton (Eds.), Research in Parapsychology 1983. Metuchen, NJ: Scarecrow Press, 1984, 32-33.

rr Schechter, E., Barker, P., and Varvoglis, M. P. A second study with the "psi ball" RNG-PK game. In R. A. White & R. S. Broughton (Eds.), Research in Parapsychology 1983. Metuchen, NJ: Scarecrow Press, 1984, 93-94.

* Schechter, E. I., Barker, P., and Varvoglis, M. A preliminary study with a PK game involving distraction from the psi task. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 152-154.

* Schmeidler, G. R., and Borchardt, R. Psi scores with random and pseudo-random targets. In W. G. Roll (Ed.), Research in Parapsychology 1980. Metuchen, NJ: Scarecrow Press, 1981, 45-47.

Schmidt, H. Precognition of a quantum process. Journal of Parapsychology, 1969, 33, 99-108.

* Schmidt, H. A PK test with electronic equipment. Journal of Parapsychology, 1970a, 34, 175-181.

*rr Schmidt, H. PK experiments with animals as subjects. Journal of Parapsychology, 1970b, 34, 255-261.

Schmidt, H. An attempt to increase the efficiency of PK testing by an increase in the generation speed. In W. G. Roll, R. L. Morris, & J. D. Morris (Eds.), Research in Parapsychology 1972. Metuchen, NJ: Scarecrow Press, 1973a, 65-67.

* Schmidt, H. PK tests with a high-speed random number generator. Journal of Parapsychology, 1973b, 37, 105-118.

* Schmidt, H. PK effect on random time intervals. In W. G. Roll, R. L. Morris, & J. D. Morris (Eds.), Research in Parapsychology 1973. Metuchen, NJ: Scarecrow Press, 1974a, 46-48.

* Schmidt, H. Comparison of PK action on two different random number generators. Journal of Parapsychology, 1974b, 38, 47-55.

Schmidt, H. Observation of subconscious PK effects with and without time displacement. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1974. Metuchen, NJ: Scarecrow Press, 1975, 116-121.

* Schmidt, H. PK experiment with repeated, time displaced feedback. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1975. Metuchen, NJ: Scarecrow Press, 1976a, 107-109.

* Schmidt, H. PK effect on pre-recorded targets. Journal of the American Society for Psychical Research, 1976b, 70, 267-291.

* Schmidt, H. A take-home test in PK with pre-recorded targets. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 31-36.

Schmidt, H. Use of stroboscopic light as rewarding feedback in a PK test with prerecorded and momentarily-generated random events. In W. G. Roll (Ed.), Research in Parapsychology 1978. Metuchen, NJ: Scarecrow Press, 1979a, 115-117.

rr Schmidt, H. Search for psi fluctuations in a PK test with cockroaches. In W. G. Roll (Ed.), Research in Parapsychology 1978. Metuchen, NJ: Scarecrow Press, 1979b, 77-78.

* Schmidt, H. PK tests with pre-recorded and re-inspected seed numbers. Journal of Parapsychology, 1981, 45, 87-98.

* Schmidt, H. Addition effect for PK on pre-recorded targets. Proceedings of the Parapsychological Association Convention, 1984, 136-139.

Schmidt, H., and Pantas, L.
Psi tests with psychologically equivalent conditions and internally different machines. Proceedings of the Parapsychological Association Convention, 1971, 49-51.

Schmidt, H., and Pantas, L. Psi tests with internally different machines. Journal of Parapsychology, 1972, 222-232.

* Schmidt, H., and Terry, J. C. Search for a relationship between brainwaves and PK performance. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 30-32.

Shafer, M. G. A PK experiment with random and pseudorandom targets. In W. G. Roll, J. Beloff, & R. A. White (Eds.), Research in Parapsychology 1982. Metuchen, NJ: Scarecrow Press, 1983, 64-66.

Stanford, R. G., Zenhausern, R., Taylor, A., and Dwyer, M. A. Psychokinesis as psi-mediated instrumental response. Journal of the American Society for Psychical Research, 1975, 69, 127-133.

Stanford, R. G. "Associative activation of the unconscious" and "visualization" as methods for influencing the PK target: A second study. Journal of the American Society for Psychical Research, 1981, 75, 229-240.

Tart, C. T. Are prepared random sequences and real-time random generators interchangeable? In W. G. Roll, J. Beloff, & J. McAllister (Eds.), Research in Parapsychology 1980. Metuchen, NJ: Scarecrow Press, 1981, 43-47.

Terry, J., and Schmidt, H. Conscious and subconscious PK tests with pre-recorded targets. In W. G. Roll (Ed.), Research in Parapsychology 1977. Metuchen, NJ: Scarecrow Press, 1978, 36-41.

rr Talbert, R., and Debes, J. Time-displacement psychokinetic effects on a random-number generator using varying amounts of feedback. In W. G. Roll, R. L. Morris, & R. A. White (Eds.), Research in Parapsychology 1981. Metuchen, NJ: Scarecrow Press, 1982, 58-61.

rr Varvoglis, M. P., and McCarthy, D. Psychokinesis, intentionality, and the attentional object. In W. G. Roll, R. L. Morris, & R. A. White (Eds.), Research in Parapsychology 1981. Metuchen, NJ: Scarecrow Press, 1982, 51-55.

Winnett, R., and Honorton, C. Effects of meditation and feedback on psychokinetic performance: Results with practitioners of Ajapa yoga. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in Parapsychology 1976. Metuchen, NJ: Scarecrow Press, 1977, 97-98.