Approved For Release 2000/08/11 : CIA-RDP96-00792ROO0700660002-7
112 Papers
star "makes or breaks" the results. By using large "anonymous"
source groups the incentive for any one individual to create false
anomalies might be greatly reduced.
STATISTICAL ISSUES AND METHODS*
WHEN WILL WE BEGIN TO REDUCE ALPHA AND BETA ERRORS
IN STATISTICAL PSI EXPERIMENTS?
Ulrich Timm (Institut für Grenzgebiete der Psychologie und
Psychohygiene, Eichhalde 12, 7800 Freiburg i.Br., West
Germany)
In many psi experiments some statistical selection errors are
made, after whose correction the initial statistical significance dis-
appears. These are Type I errors, more simply called alpha errors.
That does not necessarily mean, however, that in these experiments
real psi effects do not exist, since the usual methods, if utilized
correctly, are often so ineffective--with regard to the rareness,
instability, and inconsistency of psi effects--that they can only
seldom lead to statistical significance. This inefficiency of statisti-
cal methods creates Type II errors, or beta errors. Therefore,
our objective should not only be the reduction of alpha errors and
the related decrease of spurious significances but also the reduction
of beta errors and the related increase of real significance.
First I give an overview of those alpha errors that I call
statistical selection errors. These show, simply stated, the follow-
ing three qualities (Timm, ZP, 1983, 195-229):
(1) From a set of statistical results a single result is selected
and evaluated by some significance test.
(2) The selection is not performed randomly but according
to a criterion that is related to the level of the single
result in that it directly or indirectly favors positive
results.
(3) Despite this success-dependent selection, the significance
test is carried out and interpreted in the usual manner
without any correction.
*Chaired by Martin U. Johnson.
Following this simple recipe it is almost always possible, even
in such investigations whose results are purely random, to find
some kind of "significant effects." If one finds, for example, among
20 independent statistical results one single result in excess of the
5% significance limit, then one should correctly ascertain that this
corresponds exactly to chance expectation. If one, however, sin-
gles out that particular result and declares it as significant, then
one will have made an exemplary selection error! In contrast, the
correct evaluation would consist in a statistical analysis of the total
result. Through such a global significance test every statistical
selection error will automatically be avoided. But one can also
apply a correction formula to individual selections.
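
The arithmetic behind the example of 20 results can be checked with a
short sketch. It assumes that the 20 tests are independent and that
every null hypothesis is true; the function names are illustrative and
do not come from the paper.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that exceed the alpha limit by chance alone."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Probability that at least one of n independent null tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print(expected_false_positives(20, 0.05))
    print(round(prob_at_least_one(20, 0.05), 3))
```

Under these assumptions one result beyond the 5% limit is exactly the
chance expectation, and the probability of finding at least one such
result is roughly 64%, which is why a singled-out result carries no
evidential weight of its own.
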
A look at experimental parapsychology immediately shows that
it supplies virtually fantastic possibilities to make such selection er-
rors. Already in the evaluation of simple standard experiments
containing only one hit variable the following (intra- or interexperi-
mental) selection errors appear with various frequency:
(1) The selection of single temporal sections of an experiment,
for example, single "runs," "sessions," "situations," etc.
(2) The selection of single subjects from the total group.
(3) The selection of single significance tests from several
tests responding differently to the intraindividual or
interindividual score distributions.
(4) The selection of single experiments from the total num-
ber of all replications of an experiment.
(5) The selection of single kinds of experiments from the
total number of all psi experiments.
However, there seems to be a plausible argument that one
would be allowed in parapsychology to test separately the signifi-
cance of single experimental sections, single subjects, single ex-
periments and so on. One says, namely, that the separate results
are not homogeneous because of the great intra- and interexperi-
mental variability of psi performance. Heterogeneous results, one
says further, need not be combined since each time one is testing
a different hypothesis. Unfortunately, I cannot accept this argu-
mentation. The significance test of a statistical experiment always
refers to the null hypothesis; and, in the case of complex experi-
ments, which can be broken down into a number of parts, there
usually exists a whole hierarchy of null hypotheses. Then any
subordinate null hypothesis is to be interpreted as a special case
of a superordinate null hypothesis and can only be rejected if the
superordinate null hypothesis has already been rejected.
Correspondingly, the subordinate results, in reference to all
superordinate null hypotheses, are to be classified as homogeneous
and can only then be separately tested when all of the superordinate
results have become significant.
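
The hierarchy can be read as a gatekeeping rule: partial results are
examined only after the pooled result has passed its own test. The
following is a minimal sketch, assuming forced-choice data with a known
hit probability, a normal-approximation critical ratio (CR), and a
two-sided 5% limit of 1.96. Pooling by summed hits is chosen only for
simplicity; the function names and the data are hypothetical.

```python
import math


def critical_ratio(hits: int, trials: int, p_hit: float) -> float:
    """Normal-approximation CR (z score) of a hit count under the null hypothesis."""
    expected = trials * p_hit
    sd = math.sqrt(trials * p_hit * (1.0 - p_hit))
    return (hits - expected) / sd


def hierarchical_test(parts, p_hit, z_crit=1.96):
    """Test the pooled result first; test the parts only if it is significant.

    `parts` is a list of (hits, trials) pairs, one per subject or session.
    Returns (global_cr, list of partial CRs or None).
    """
    total_hits = sum(h for h, _ in parts)
    total_trials = sum(n for _, n in parts)
    global_cr = critical_ratio(total_hits, total_trials, p_hit)
    if abs(global_cr) < z_crit:
        return global_cr, None  # superordinate null stands: no partial tests
    return global_cr, [critical_ratio(h, n, p_hit) for h, n in parts]


if __name__ == "__main__":
    # Hypothetical data: four subjects, 100 trials each, hit probability 1/5.
    subjects = [(30, 100), (18, 100), (21, 100), (19, 100)]
    print(hierarchical_test(subjects, 0.2))
```

In the hypothetical data the first subject alone would reach a CR of
2.5, but the pooled CR is only 1.0, so under the rule described above
that subject's result may not be tested separately.
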
In parapsychology, one can even formulate such a general null
hypothesis that it is superordinated to each and every psi
experiment. It simply states that psi phenomena do not exist at all.
Thus, to evade selection errors, one had to combine all of the psi
experiments up to that point and let them undergo a global
significance test before one is allowed to interpret them separately.
Even if one assumes that, meanwhile, the existence of psi has been
established, one must in any case test the total result of every
single experiment, since the psi effect is said to vary among
experiments and consequently may not necessarily appear in each of
them. Only if the total result is significant is one allowed, then,
to test the significance of partial results.
The same possibilities of error exist also in the case of
differential or correlational psi experiments, which examine
differences between various experimental conditions or correlations
between psi variables and other variables (e.g., the sheep-goat
effect). Here, the same principle of hierarchy is valid: wherever
a meaningful superordinate null hypothesis exists, it must be
rejected before separate experimental effects, correlations, etc. are
allowed to undergo a normal significance test. Therefore, one must
also demand the calculation of global significance tests for almost
all correlational experiments. In the case of multivariate designs
containing many experimental conditions, personality, or psi
variables, this can be done through a multiple or canonical
correlation in which the psi variables serve as criteria and the
other variables as predictors. If one abstains from this, one will
find in every larger set of predictor variables some significant
correlations with any psi variables; but if one singles them out and
interprets them in the usual manner, one makes a selection error and
could possibly fall victim to a statistical artifact. If the
apparently discovered effect is not replicated in the next
experiment, this corresponds to statistical expectation and
naturally has nothing to do with the "nonrepeatability" of psi.
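
How readily a larger set of predictors yields "significant"
correlations can be illustrated with a small simulation. This is only
a sketch under stated assumptions: psi scores and predictors are
independent standard normal noise, each hypothetical experiment has 30
subjects and 20 predictors, and 0.361 is taken as the two-sided 5%
limit of the correlation coefficient for 28 degrees of freedom. All
names and parameter values are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_experiments=200, n_subjects=30, n_predictors=20,
             r_crit=0.361, seed=1):
    """Return (share of 'significant' correlations, share of experiments
    with at least one) when psi scores and predictors are pure noise."""
    rng = random.Random(seed)
    n_sig = 0
    n_exp_with_hit = 0
    for _ in range(n_experiments):
        psi = [rng.gauss(0, 1) for _ in range(n_subjects)]
        hits = 0
        for _ in range(n_predictors):
            predictor = [rng.gauss(0, 1) for _ in range(n_subjects)]
            if abs(pearson_r(psi, predictor)) > r_crit:
                hits += 1
        n_sig += hits
        n_exp_with_hit += hits > 0  # a bool counts as 0 or 1
    return (n_sig / (n_experiments * n_predictors),
            n_exp_with_hit / n_experiments)


if __name__ == "__main__":
    print(simulate())
```

With these settings about 5% of all correlations should pass the
limit, while a clear majority of the simulated experiments should
contain at least one such correlation, although no effect exists in
any of them.
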
One may object to this discussion that sophisticated experiments
are carried out in a much more refined manner. Here, in advance, one
formulates certain hypotheses which correspond to expected
correlations or differences within the results. In the evaluation
one limits oneself to these hypotheses. In this case selection errors
are said to be excluded and only then possible if one tests post-hoc
hypotheses. Unfortunately, this argument is also not completely
correct. It is true that one limits the evaluation possibilities
through these preformulated hypotheses, which is very recommendable.
However, if one has formulated sufficiently many hypotheses, one
still has among these hypotheses enough possibilities for selection.
One must, for that reason, here also
carry out a global significance test for such single hypotheses to
which a superordinate null hypothesis can be assigned.
It should be clear that by performing global significance
tests many psi experiments must lose their significance. I recall,
though, that I also mentioned interexperimental selection above; to
avoid it, all similar psi experiments, at the least, should be
combined and submitted to a global significance test. Through such a
"meta-analysis," on the other hand, the significance may increase so
that the single experiment loses part of its meaning.
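
That a combination of experiments can gain significance which none of
them has alone may be sketched as follows. The text does not specify
a combination rule; the unweighted Stouffer method is assumed here as
one common choice, the experiments are taken to be independent, and
the z scores are hypothetical.

```python
import math


def stouffer_z(z_scores):
    """Combine independent z scores into one global z (unweighted Stouffer)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def one_sided_p(z):
    """Upper-tail p value of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical z scores of four similar experiments, none significant alone.
    z_scores = [1.2, 1.0, 1.4, 1.2]
    z_global = stouffer_z(z_scores)
    print(round(z_global, 2), round(one_sided_p(z_global), 4))
```

None of the four hypothetical experiments reaches the 5% limit, yet
the combined z is about 2.4 with a one-sided p of roughly 0.008.
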
My second theme is the reduction of beta errors in the sta-
tistical evaluation of psi experiments. The problem is to increase
the statistical efficiency (or power) of the significance tests in such
a way that--despite the avoidance of selection errors--minimal psi
effects can be statistically detected. I confine myself to two
different questions, both of which are of considerable importance in
practice. The first question is: which are the statistically optimal
methods for correcting a given selection or for combining single
results which shall undergo a global significance test?
Here, it can first be answered that for any selection of a
single result there is a simple statistical correction possible that
replaces the global significance test. An approximate formula for
this purpose requires that one multiplies the p value of the selected
result by the number of given results. Naturally, in this manner,
the p value will be strongly increased so that the statistical signifi-
cance will in most cases disappear, as in the case of a global sig-
nificance test. Nevertheless, this is a universal and very simple
method of correcting intra- or interexperimental selection.
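
The multiplication rule described above is generally known as the
Bonferroni correction. A minimal sketch follows; the cap at 1 and the
exact counterpart for independent results are additions made here for
comparison and are not stated in the text, and the function names are
illustrative.

```python
def corrected_p(p_selected: float, n_results: int) -> float:
    """Approximate correction: selected p value times number of results, capped at 1."""
    return min(1.0, p_selected * n_results)


def exact_p_independent(p_selected: float, n_results: int) -> float:
    """Exact counterpart, valid only if the results are independent (an assumption)."""
    return 1.0 - (1.0 - p_selected) ** n_results


if __name__ == "__main__":
    # Hypothetical case: the best of 20 results has p = 0.01.
    print(corrected_p(0.01, 20))
    print(round(exact_p_independent(0.01, 20), 3))
```

In the hypothetical case the corrected p value is 0.2 (about 0.18
with the exact formula), so the selected result is no longer
significant. The approximation is always at least as large as the
exact value, which makes it the more cautious of the two.
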
Most of the other methods consist in weighted combinations of
the single results so as to attain a most efficient global significance
test. In the case of standard psi experiments that seems trivial be-
cause one needs only to add the different hits, whose sum can be
evaluated with a CR just as well as the separate results. However,
an analysis of intra- and interindividual distributions of psi scores
shows that the simple addition of hits is one of the statistically
least efficient methods, even for the aggregation of small experi-
mental units such as individual runs. The reason for this lies in
the strong variability of psi scores, which can vary even in a
bipolar fashion between psi-hitting and psi-missing so that the hit
deviations cancel each other out. Therefore, I have suggested
special (nonlinear) transformations weighting the single scores
according to their size. Finally, following the method of the
likelihood ratio, I came to a measure which is statistically most
efficient for strongly varying psi scores and is a linear function of
the well-known "run-score variance."
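
The cancellation of bipolar deviations can be made concrete with a
small comparison. This is not the likelihood-based measure mentioned
above, only a sketch of a statistic built on the run-score variance.
It assumes runs of 25 trials with a hit probability of 1/5, binomial
scoring, and a crude normal approximation to the chi-square
distribution; the function names and the scores are hypothetical.

```python
import math


def summed_hits_cr(scores, trials_per_run, p_hit):
    """CR of the summed hits over all runs (normal approximation)."""
    k = len(scores)
    expected = k * trials_per_run * p_hit
    sd = math.sqrt(k * trials_per_run * p_hit * (1.0 - p_hit))
    return (sum(scores) - expected) / sd


def run_score_variance_z(scores, trials_per_run, p_hit):
    """Normal approximation to a chi-square test of the run-score variance.

    Squared deviations from the chance mean are divided by the binomial
    variance; under the null the sum is roughly chi-square with k degrees
    of freedom (mean k, variance 2k).
    """
    k = len(scores)
    mean = trials_per_run * p_hit
    var = trials_per_run * p_hit * (1.0 - p_hit)
    chi2 = sum((s - mean) ** 2 for s in scores) / var
    return (chi2 - k) / math.sqrt(2.0 * k)


if __name__ == "__main__":
    # Hypothetical bipolar scores: psi-hitting and psi-missing runs alternate.
    scores = [9, 1, 9, 1, 9, 1, 9, 1, 9, 1]
    print(summed_hits_cr(scores, 25, 0.2))
    print(round(run_score_variance_z(scores, 25, 0.2), 2))
```

For these scores the summed hits equal chance expectation exactly, so
the CR is zero, whereas the variance-based statistic lies far beyond
any conventional limit.
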
The second question refers to the identification of permissible
forms of selection which one could use to increase the statistical
efficiency. For example, the above definition of selection error al-
lows one to exclude any partial results from the global significance
test of an experiment if the exclusion ensues according to a criterion
that, under the null hypothesis, is independent of the respective
results. If one, in this way, discovers certain clues that particu-
lar experimental situations, certain subjects, certain variables, etc.,
could be unsuccessful, one is allowed to eliminate them as is. This
can be a great advantage because every nonsignificant partial result
reduces the significance of the total result.
In the global statistical evaluation of a multivariate experiment,
one should, further, reduce correlated criterion or predictor
variables to a smaller number of factors by performing a factor
analysis, because the statistical efficiency in the case of
correlated variables decreases with the number of variables.
Finally, the so-called extreme-group method should be mentioned,
according to which one is allowed to eliminate the middle cases of
the distribution of a variable when calculating correlations. For
example, one could eliminate all the chance-scoring subjects of a
correlational study, if enough psi-hitters and psi-missers remain.
The correlations between psi variables and other variables could, in
that way, become much more significant.
I am afraid my explanations will not lead to a decisive change
in the statistical methods of parapsychologists. When I pointed to
the problem of statistical selection errors at the 1980 PA Convention
in Reykjavik, it also did not have any considerable effect. One
must, apparently, turn to the psi skeptics to attain such effects.
Probably, selection errors serve the general psychological tendency
to synchronize the given empirical data with one's own expectations
regarding reality. Therefore, the final demand can only be to
answer one's own ways of acting with increased self-criticism, even
in such an objective area as mathematical statistics. Otherwise,
those cynics will be confirmed who always have contended that,
with statistics, one can prove everything.
EVALUATING FREE-RESPONSE RATING DATA
Sybo A. Schouten† and Gert Camfferman (Parapsychology
Laboratory, University of Utrecht, Sorbonnelaan 16,
3584CA Utrecht, The Netherlands)
During the recent decades the use of forced-choice methods
in experimental research in parapsychology has gradually declined
in favor of free-response techniques. A disadvantage of free-
response techniques is that they are rather time consuming. The