[time 294] Re: [time 293] Re: [time 291] Re: Expressiveness of interactive computing

Stephen P. King (stephenk1@home.com)
Sun, 09 May 1999 13:55:17 -0400

Dear Peter,

Peter Wegner wrote:
> Stephen
> My comments on your comments in the text below are indicated by **.
> It is amazing that you have immediately understood exactly what I was trying to say.
> peter

        I have you to thank for your patience with me. ;) I can visualize your
work very well and relate it rapidly to similar mental pictures. It is
the trade-off of dyslexia... ;)
        I'll snip a bit.
> >From stephenk1@home.com Sun May 9 11:03 EDT 1999
> Date: Sun, 09 May 1999 11:04:39 -0400
> From: "Stephen P. King" <stephenk1@home.com>
> Organization: OutLaw Scientific
> X-Mailer: Mozilla 4.5 [en]C-AtHome0405 (Win95; U)
> X-Accept-Language: en
> MIME-Version: 1.0
> To: Peter Wegner <pw@cs.brown.edu>
> CC: time@kitada.com
> Subject: Re: [time 291] Re: Expressiveness of interactive computing
> Content-Transfer-Encoding: 7bit
> Peter Wegner wrote:
> > Given two finite agents M1, M2, their behavior BX(M1), BX(M2) is distinguishable relative to an environment (observer, tester) X if the exclusive-or BX(M1) XOR BX(M2) is nonempty. Members of this set are called distinguishability certificates. Finite agents are equivalent for X if their behavior cannot be distinguished by a distinguishability certificate. Agents equivalent relative to a restricted environment X1 may be distinguishable relative to a more complex environment X2, just as for people.
> What is the measure of the information contained in the
> distinguishability certificates (DC), e.g. how do we quantitatively
> represent the information content of DCs?

> ** Information content depends on particular processes of observation, though
> the length of DCs has information measure log(k).
> For sequential observers (SIMs) observations are I/O streams (i1,o1), (i2,o2), ..., where ik may depend on previous outputs and on unpredictable temporal events. Because information can include events that occur during the process of computation, more information is available than for algorithms, and more distinctions can be made. Additional information is particularly useful if it allows us to make local decisions that remain globally valid (see later discussion). For example, driving is an example where this is ususally, though not universally true, and evolution is an example of a natural process where this occurs.
> Let me know if you have more thoughts about information measures.

        I am looking into information measures, re: Frieden's work. I have been
thinking about the information theory aspect of LS theory for a long
time but I still do not have a good text version of my thoughts, but I
am working on it. What I have so far is a crude cubist-like cut&pastings
from many sources. The key idea that I think is important is the means
by which observers (as Local Systems) communicate with each other. I
believe that bisimulation is the means of communication and that, in
general, all observations are communications.
> Is there a limit on the cardinality of the CS and CS classes?
> ** The classes CS and CE can be nonenumerable, having the cardinality of the reals. In particular if CE is the real world environment W, the number of possible worlds is nonenumerable. The fact that the real world has the cardinality of the real numbers is clearly something that was understood by mathematicians when the term "real numbers" was introduced.
        That is a point of contention with philosophers since no pointer is
observed with a real valued resolution. Our observations have a finite
bound of accuracy, thus the idea that |W| = |R| is an inference, not an
empirical fact. But, the fact that we can think of such ideal objects
that are not directly observable is extraordinary. R. Penrose's thinking
(that we have discussed in the past) that we can somehow have access to
the Platonic Realm is analogous... ;)

> > Machines induce an equivalence relation on an environment class CE such that machine M1 is more expressive than M2 if the equivalence relation E(M1, CE) induced by M1 on CE is strictly finer than E(M2, CE). Greater expressiveness of machines implies greater distinguishability among environments, which in turn means that more expressive machines can solve larger classes of computational problems.
> Could we extend this into discussions of P=NP, in terms of quantifying
> the ratio of expressiveness of a given pair of IMs?
> ** We can quantify complexity but I am not sure what it means to quantify expressiveness. The following discussion from my paper, which I did not include in my previous message, introduces the notion of competitive ratio, and discusses its limitations as a measure of on-line relative to off-line expressiveness. the book by Borodin and El Yaniv [BE, Cambridge 1999] is worth getting.

        Is it: Online Computation and Competitive Analysis (May 1998) Cambridge
Univ Pr. ISBN: 0521563925?

> [quote from my paper] It is instructive to examine the relation between our notion of interactive computation and on-line algorithms that arise in competitive analysis, which measure algorithm performance relative to best possible performance of an algorithm that has complete knowledge [BE]. The extra cost incurred by on-line algorithms in discovering knowledge about their environment already known to off-line algorithms is measured by the competitive ratio, which compares optimal off-line cost when knowledge is complete with on-line cost. However, such comparisons assume that an off-line algorithm with complete knowledge exists - an assumption that does not hold for inherently interactive problems for which complete knowledge requires knowledge of the future. Interactive problems generally do not have a perfect information model that can serve as a baseline for competitive analysis, and cannot be expressed as combinatorial optimization problems whose solutions are expressible as functions from integers to

        The assumption that "an off-line algorithm with complete knowledge
exists" is the point of my critique of the Computational Complexity
paper: [time 40] The Computational Complexity of LSs... The "existence"
of a source of complete knowledge (God!) follows from the consistency of
the source, *but* it would require infinite time (Eternity) or infinite
space to gain complete bisimulational equivalence with such a source
(with zero error). The restriction to finite durations of the
interfacing of LSs, introduces error to the interfacing with such an
omniscient Source. Umm, this wording is faulty! :( I'll try to clarify
myself as we discuss this further. :)
        This last comment: "...expressible as functions from integers to
integers" is a key property of the communication between LSs. The
restriction to the integers is symptomatic of the difference between
systems with irreversible dynamics and those with invertible dynamics:

"The difference between the definition of dynamical [invertible] and
semidynamical [irreversible] systems is solely in the restriction of t
and t' to values drawn from the positive real numbers, or the positive
integers, for semidynamical systems..." (pg 3. M.C. Mackey. Time's
Arrow: The Origins of Thermodynamic Behaviour. Springer-Verlag, 1992)

> The result that interaction reduces complexity at first appears to conflict with the results of competitive analysis, where on-line computations increase complexity compared with complete information off-line computation. However, a careful analysis shows that models of computation are fundamentally different, and that delay of binding time of interactive decisions decreases complexity when interactive information provided by nature provides a clue to the next action. Though such interactive computations converge to local rather than global optima, they are effective in natural processes like evolution and protein folding, as well as in processes like driving, in providing an adequate basis for interactive decision making.
> [end quote]

"delay of binding time" is a very interesting concept! I believe that
there is a relationship between it and the size of the menory of an SIMk
(and MIMk). The idea that B. Roy Frieden discusses, re: "Fisher
information always either decreases or remains constant", IMHO, is an
expression of this! There is also a connection to causality and
space-time metrics hidden here! :)

> > Behavior of agents in a TM environment is expessed by input-output pairs, while behavior in a SIM environment is expressed by sequences of observations (I/O streams) that allow more finely-grained distinctions among behaviors than single observations. Nonequivalence of two SIMs can be detected by a finite distinguishability certificate (observation sequence) but equivalence cannot be finitely detected.
> I see this last as an expression of how a given theory can only be
> falsified, never proven. ;)
> ** Absolutely right. This relates to Karl Poppers book on Scientific Discovery (I was a colleague of Karl Popper's at the London School of Economics for 3 years from 1961 to 1964).

        What an honor! :)

> > When the environment with which a finite agent interacts is a TM, then behavior richer than a TM cannot be detected, just as the intelligence of a human agent cannot be detected if the agent is never given tasks that require intelligence.
> This seems to speak to the limitations on observables given an
> experimental apparatus. We see in QM that "what can be seen depends on
> how one looks".
> ** Extending finite agents with observation tools like microsocopes, telescopes, and infra-red sensors is the dual of extending the richeness of environments so that agents can realize their full potential. You cannot tell who will be good at a task unless you test them in the actual job, and you cannot tell who your true friends are until you test them under conditions of adversity or extreme conditions.

        I completely agree, but we make assumptions based on short sequences of
tests to make real time desicions. The again speaks to the trade-off
between finite reaction times and error in predictions/inferences... We
can not tell the tiger to wait in mid-pounce while we calculate the
absolute best choice of reaction, re: fight or flight! :)
> > Since computational models aim to express properties of real-world environments, W is the most demanding of the environments considered here. Interaction machines that interact with the real world W, have the property that E(SIM, W) is a finer equivalence relation on the real world that E(TM, W). SIMs permit a finer granularity equivalence relation for real-world environments than TMs and MIMs permit a finer granularity relation than SIMs. Agents that model behavior of real-world environments W impose an expressiveness hierarchy such that TMs, SIMs, and MIMs are progressively more expressive finite agents, able to make progressively finer distinctions about the world and solve larger classes of problems.
> >
> > BW(TMs) < BW(SIMs) < BW(MIMs)
> >
> > However, for environments that are TMs, the equivalence relation induced by SIMs is no finer than that induced by TMs, while for SIM environments, MIMs are no more expressive than SIMs. To observe a given level of expressiveness of a finite agent it is necessary to have a testing environment with at least that expressiveness. Failure to observe that computers have greater expressive power than TMs was due in large measure to the fact that observation environments (testing environments) were restricted to be TMs.
> We see this occurring in physics in the assumptions of unique
> initiality and/or finality of time for all E(CS, CE) since they assume
> that the *complete* class of CS has as a *unique* support a *unique*
> class of CE and that such is a priori!
> ** I am not familiar with the models to which you refer and cannot therefore understand this point. Maybe you can refer me to a reference.

        I don't know any concise references to this line of thinking. It is
something that I have been considering for a long time. I am trying to
generate the knowledge in others via interactions. I find myself
behaving like your chess playing SIM. :) I need to read and think about
your statements here more carefully. What I have fo far is a bit off the
cuff... ;)
> > Let SIMk be the class of SIMs restricted to no more than k interactions. We can show that:
> >
> > For all k>0, BW(SIMk) < BW(SIMk+1); and BW(TM) = BW(SIM1)
> >
> > The distinctions that can be made for real world environments, or environments with at least the expressive > power of SIMs, is greater for SIMk+1 than for SIMk, and TMs are at the bottom of this hierarchy. The behaviors BW(SIMk) of SIMs that can make k observations determine an infinite expression hierarchy with TMs at the bottom (corresponding to behavior BW(SIM1)).
> >
> > This may be proved by example. Consider an environment of SIMs with buffers of length k that buffer the output of their first k-1 interactions and disgorge their first output only on the kth interaction. For this environment, machines in the class SIMk can distinguish machines with buffers of length k by distinguishability certificates of length k, but cannot distinguish maong machines with bufers of length k+1. As a second example, consider an environment of machines Mk with binary inputs that print a 1 whenever the last k interactive inputs are 1 and a 0 otherwise. The shortest distinguishability certificate between Mk+! and Mk has length k+1. Note that SIMk must have a memory of at least size log(k) to track distinguishability certificates of length k, so that the state for the classes SIMk is not bounded.
> This last gets to my question of the information content of DCs. :) How
> would we generically model the "memory" of a SIMk and a MIMk (if there
> is such a thing ;) )? Can we speak to thermodynamic and Shannon entropy
> in terms of memory "size"?
> ** Maybe my earlier discussion of the nature of "stream" information can help
> you answer this question.

        Yes. Thanks! :)

> > SIMs model interactive question answering. The expressiveness hierarchy for SIMs shows that on-line questioning that makes use of follow-up questioning has greater expressiveness than off-line questioning. Ken Starr can obtain more information about questioned subjects (Clinton) by interactive than noninteractive questioning because later questions can use information not available to noninteractive questioners. Strategies that permit k+1 interactive questions can elicit more information (about Clinton) than k questions, assuming that Clinton has at least the expressive power of a SIM. The strategy used by the Senate of asking a list of 81 predetermined questions was predictably ineffective because it is an off-line strategy that corresponds to asking a single complex question with many parts.
> Is it clear that interactive question answering is such that it can't
> be assumed to be an a priori observable? This, IMHO, speaks to how a
> branching time formalism allows us to model a flowing time.
> ** To formally prove that interactive Q/A is more expressive we need to express the problem in terms of SIMs and use the "proof by example" given earlier to show that if both Starr and Clinton can be modeled by SIMs, Starr will be able to make distinctions about Clinton not possible by noninteractive questioning.

        I think that we are working toward this "proof by example" by our act
of communicating here now! :)
> > Interaction during processes like driving, where looking out of the window can elicit information not available prior to the start of the journey, not only increases driving efficiency but allows a larger class of problems to be solved. When local computation of the next step of an interactive process is globally the correct thing to do, as is usually the case in driving, then NP-complete algorithmic tasks can be reduced to interactive tasks of polynomial (usually linear) complexity [We2].
> > However, interaction not only dramatically reduces the complexity for this class of problems but increases the class of problems that can be solved to include temporal domains where complete information is not available prior to the computation. For domains determined by a static map, the problem of driving from A to B can be algorithmically specified but is more simply specified interactively, while dynamic domains with traffic conditions not specified in advance cannot be algorithmically specified, even in principle, and require interactive specification.
> >
> > Problems like protein folding are NP-complete as combinatorial optimization problems, but nature finds a way of naturally folding DNA strands so they always fold in the same way [Pa2]. We hypothesize that nature does not solve such problem algorithmically, but that it folds DNA strands by a robust interactive process having the property that local small interactive steps yield the same folded configuration for a large (robust) collection of initial configurations. Dynamic programming is an example of a process that yields global optima by sequences of local optimization steps. Its use as a practical model for learning reflects the fact that nature "prefers" processes, like evolution, whose local steps yield stable local optima.
> This is remarkably similar to that I am trying to communicate about
> with regards to LSs; that the interactions/communications among LSs
> constructs space-times. These are the mutually perceivable environments
> within which observers (LSs) frame their perceptions. Al Martel and I
> spoke yesterday afternoon about how the increase in knowledge content of
> an LS may be related to the perception of an expanding space-time... And
> perhaps cross-talk noise is the cause for the background black body
> radiation...
> Thank you for this result! :)
> ** Thanks for appreciating these comments. This encourages me to think they may be appreciated by the larger scholarly community.

        You are most welcome. I am especially honored since I am a mere
self-educated amateur... ;)
Kindest regards,

This archive was generated by hypermail 2.0b3 on Sun Oct 17 1999 - 22:10:31 JST