Stephen P. King (firstname.lastname@example.org)
Wed, 28 Apr 1999 11:49:30 -0400
Re: Is thinking more like a thunderstorm or like a
Date: 21 Mar 1999 00:00:00 GMT
Organization: Deja News - The Leader in Internet Discussion
In article <email@example.com>,
Modlin made a few statements about the theorems that Papert and I proved.
Modlin: Minsky and Papert showed that a SINGLE STAGE (or "layer") of
perceptrons is limited to computing linearly separable functions.
I'm not sure he read the book, although otherwise he knows quite a lot
about neural networks. The networks that we considered were not
single-layer, but two-layer networks. That is, we discussed what could be
computed by a linearly separable function of arbitrary Boolean functions.
Then we discussed how large must be the coefficients in the second layer,
and how these grow with the number of variables presented at the "0-th"
layer, that is, the inputs to the first layer of Boolean functions. In
particular, we discussed several classes of functions at the first layer:
those where there is a bound N on the number of inputs to each first-layer
cell, and those where, if the input is a two-dimensional "retina", there
is a bound on the diameter of those inputs for each neuron.
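The linear-separability limitation that Modlin refers to can be made concrete. A minimal sketch of my own (not from the post or the book), using XOR, the standard example of a function that no single threshold unit can compute, while a two-layer net handles it easily:

```python
# Sketch (illustrative, not from the post): a single threshold unit
# cannot compute XOR, because XOR is not linearly separable.
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def unit(w1, w2, b, x1, x2):
    """One perceptron: fires iff the weighted sum reaches the threshold."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

# Search a grid of weights; for a 4-point problem, any separating
# hyperplane could be scaled to small integer weights, so the grid
# search is representative.
single_layer_ok = any(
    all(unit(w1, w2, b, *x) == y for x, y in XOR.items())
    for w1, w2, b in product(range(-5, 6), repeat=3)
)
print(single_layer_ok)  # False: no line separates XOR's 1s from its 0s

# Two layers suffice: XOR(x1, x2) = (x1 OR x2) AND NOT (x1 AND x2).
def two_layer(x1, x2):
    h1 = unit(1, 1, -1, x1, x2)     # x1 OR x2
    h2 = unit(1, 1, -2, x1, x2)     # x1 AND x2
    return unit(1, -2, -1, h1, h2)  # h1 AND NOT h2

print(all(two_layer(*x) == y for x, y in XOR.items()))  # True
```

This is exactly the distinction Minsky draws next: the interesting question is not whether one layer suffices (it doesn't), but what the two-layer networks the book actually analyzed can do, and at what cost.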
Modlin: It is a trivial result, not nearly worth the attention paid it.
Unwarranted emphasis on the point retarded progress in neural modelling
for 20 years, and even now people who quote it without understanding
continue to propagate the damage it has done.
These results aren't at all trivial. In the 30 years since, no one has
much improved or generalized our results, except for one important book,
Neural Computation, by Kai-Yeung Siu, Vwani Roychowdhury, and Tom Kailath.
Modlin: The same limitation applies to ANY set of primitive functions,
whether Perceptrons, arithmetic operations, or things like AND/OR/NOT
logic gates. In one stage you can only compute those primitive functions.
To compute more functions you have to combine the outputs of the first
stage with more of the primitives, producing a multi-stage computation.
Modlin: Adding more devices lets you compute more complex functions.
Perceptrons with non-linear or thresholding outputs are boolean devices:
you can compute any computable function with a network of them, just as
you can with NAND, or with AND/OR/NOT, or with many other sets of
primitives. [...] Technically, one extra layer of intermediate functions
between inputs and outputs is enough to compute anything. This canonical
form is inefficient, and practical circuits use many stages.
That's just what we said. Even a shallow net can compute any function, if
there are enough cells: a first layer of ANDs and a second layer of ORs
can compute any function whatever. HOWEVER, in general, this requires a
number of cells that is exponential in the number of INPUT variables. That
is what the book was about, and we showed, for example, that to make such
a net count the number of connected objects on the retina requires that
sort of exponential growth in coefficients or numbers of cells. This is
also true for other classes of patterns and input restrictions, but very
little is yet known about this, perhaps because of the mistake that Modlin
and many others have made about how our book "retarded progress in neural
modelling for 20 years." That rumor, together with not understanding what
we did, has had a sad result. We now know some cases where neural nets
perform well, and we know many cases where they don't work at all well
(e.g., in grammar-like problems), BUT WE KNOW ALMOST NO THEORY ABOUT WHEN
THEY WORK AND WHEN THEY DON'T WORK WELL. (We also showed several
surprising special cases in which the exponential growth is unexpectedly
small.)
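The generic shallow construction Minsky concedes here (first layer of ANDs, second layer of ORs, i.e. disjunctive normal form) can be sketched directly. This is my own illustration, not code from the post; it shows the exponential blow-up on parity, where half of all 2**n input patterns must each get their own first-layer cell:

```python
# Sketch: the generic two-layer net, one AND cell per accepted input
# pattern, all feeding a single OR.  It computes ANY Boolean function,
# but for n-bit parity it needs 2**(n-1) first-layer cells: the
# exponential growth in cell count that Minsky describes.
from itertools import product

def two_layer_net(n, truth_table):
    """Build (minterms, evaluate) for an arbitrary n-input function."""
    minterms = [x for x in product((0, 1), repeat=n) if truth_table(x)]
    def evaluate(x):
        # First layer: one AND cell per accepted pattern (exact match).
        # Second layer: OR over all the AND-cell outputs.
        return any(x == m for m in minterms)
    return minterms, evaluate

parity = lambda x: sum(x) % 2 == 1
for n in (2, 3, 4, 5):
    minterms, f = two_layer_net(n, parity)
    assert all(f(x) == parity(x) for x in product((0, 1), repeat=n))
    print(n, len(minterms))  # cell counts 2, 4, 8, 16: grows as 2**(n-1)
```

The point of the book was not that such nets fail to exist, but that for problems like parity or connectedness the required size (or coefficient magnitude) grows like this.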
More important, our theorems in most cases also apply to multilayer nets,
in contradiction to what Modlin claims. In both diameter-limited and
number-of-input-limited networks, the coefficients still grow
exponentially, but with exponents that are smaller by the order of the
number of layers, for important classes of problems. However, no one has
proven much about this.
Modlin did not mention it, but he has done significant research on
"re-entrant" or "recursive" networks, which can be very efficient for
certain kinds of problems. By re-using the inner neurons, one can indeed
do much more, but as Modlin has pointed out in other Usenet messages, such
networks are equivalent to multi-layer nets (whose depth is the number of
recursive operations). We still need better theories of what coefficients
each class of problems requires.
I should mention two other bad misunderstandings.
1) There is also a rumor that 'backpropagation' in multilayer nets
overcomes the obstacles we found. However, although backpropagation is
sometimes an efficient way to compute coefficients in multilayer
feedforward nets, it won't help if no adequate set of such coefficients
exists, which was the subject of about half of our book.
2) Almost all of the successful known applications of neural networks are
"approximate" solutions to pattern recognition problems, that is, for
systems that are permitted to make some (usually unspecified) portion of
errors. Our book was about the theory of which patterns could be correctly
recognized. So all those people who say that ANNs "work" where we said
that they wouldn't have changed the definition of "work" without telling
this to all you trusting folk who didn't actually look at our book. I'm
happy to grant that something that "frequently works" is often better than
nothing at all, but isn't it professionally disgraceful to not say clearly
that you're talking about a different subject, when you're criticizing a
mathematical treatise?
In fact we did consider "statistical separability" in Chapter 12 (as
opposed to strictly linear separability), so the distinction should be
very clear to readers. However, I regret that we didn't write more about
that. The trouble, of course, is that this is an even harder subject!
Finally, although most "histories of ANNs" blame our 1969 book for the
decrease in interest in neural networks after 1970, the big decline had
already taken place before that. Rosenblatt's book was published in 1962.
There was a great surge in activity, which declined around 1965 when
several projects found that larger Perceptron machines did not produce
much better results than smaller ones, and we started our own research to
try to see why this had happened. Not much has been learned since then
about this, perhaps because of those false rumors about our book. We still
have very little theoretical understanding of which patterns can be
efficiently recognized.
It should be noted, though, that the computers of that era did not have
much speed or memory, and typical memory access took on the order of 10
microseconds. The revival occurred in the 1980s, when speeds had increased
100,000-fold. Now better sets of coefficients could be found, that is,
when they existed.
In other words, most critics did not understand how we defined
"recognizing a pattern." We meant accepting all cases, and rejecting all
non-cases. It is true that if some errors are permitted, then more
patterns can be "recognized". However, then you must live with
uncertainty.
Finally, I should add that most of the "successful" applications of
feedforward nets aren't actually done with feedforward nets alone, because
they usually begin with some external device for "normalizing" or
"centering" the machine's focus on the object to be recognized. Naturally,
this makes the problem much easier. On the positive side, for practical
applications in which substantial amounts of error are acceptable,
feedforward nets are indeed competitive, and, in many instances, superior
to other kinds of standard statistical recognition schemes. Only recently,
by using such methods as "hidden Markov models", which in effect are more
versatile at evolving
This archive was generated by hypermail 2.0b3 on Sun Oct 17 1999 - 22:31:52 JST