**Stephen Paul King** (*stephenk1@home.com*)

*Fri, 14 May 1999 01:16:43 GMT*

**Next message:** Lester Zick: "[time 315] Particle Structure"
**Previous message:** Matti Pitkanen: "[time 313] Re: Mapping p-adic spacetime to its real counterpart"
**In reply to:** Matti Pitkanen: "[time 297] Mapping p-adic spacetime to its real counterpart"

On 13 May 1999 11:17:24 -0700, Chris Hillman <hillman@math.washington.edu> wrote:

> On 11 May 1999, john baez wrote:
>
>> In article <01be9aec$b623a020$2897cfa0@sj816bt720500>,
>> Philip Wort <phil.wort@cdott.com> wrote:
>
>> > I find this book very difficult to follow.
>
>> I wish I could help you, but I can't. When people started talking about
>> Frieden's work I went down to the library to look it up, but I couldn't
>> make much sense out of his papers. I'd hoped his book would be easier
>> to read, but from what you say it sounds like maybe not....
>
>> Does *anyone* understand this stuff? If so, could they explain it?
>
> I had the same reaction to Frieden's papers: I couldn't figure out what he
> was trying to say in the first few paragraphs of the papers I downloaded,
> so I put them aside... (If anyone else wants to have a go, check the
> PROLA archive http://prola.aps.org/search.html)
>
>> I don't even understand what "Fisher information" really is or why
>> people (not just Frieden) are interested in it.
>
> As it happens, the same question came up in bionet.info-theory recently,
> so I quote my reply below. My post was based on what I found in Cover &
> Thomas, so if I've gotten anything wrong, someone should correct what I've
> said!
>
> As for why people are interested... well, statisticians find everything
> Fisher did of enduring interest for one reason or another, it seems :-)
>
> One additional thing, which I'm not sure I remember quite right, but which
> physicists will probably find intriguing, is the notion of a "statistical
> manifold", where one can actually make a manifold out of a parametrized
> family of distributions in such a way that the Riemann curvature turns out
> to be an "entropy" related to Shannon's entropy and the corresponding
> connection is related to Fisher's information! Something like that,
> anyway --- it's been a decade since I looked at this. Now you're probably
> thinking what I thought ten years ago, but when I looked at the books on
> this stuff, the expected connections were not apparent. No forms in
> sight; it didn't even look like differential geometry. In fact, it looked
> very ugly :-(
>
> Chris Hillman
>
> =========== BEGIN REPOST [WITH NEW EXAMPLE] =======================
>
> Date: Fri, 7 May 1999 15:23:58 -0700
> From: Chris Hillman <hillman@math.washington.edu>
> Newsgroups: bionet.info-theory
> Subject: Re: Definition of Fisher Information
>
> On Mon, 26 Apr 1999, Stephen Paul King wrote:
>
>> Could someone give a definition of Fisher Information that a
>> mindless philosopher would understand? :)
>
> Let me start with a couple of intuitive ideas which should help to orient
> you. Fisher information is related to the notion of information (a kind of
> "entropy") developed by Shannon in 1948, but is not the same. Roughly speaking:
>
> 1. Shannon entropy is the volume of a "typical set"; Fisher information
>    is the area of a "typical set".
>
> 2. Shannon entropy is allied to "nonparametric statistics"; Fisher
>    information is allied to "parametric statistics".
>
> Now, for the definition.
>
> Let f(x,t) be a family of probability densities parametrized by t.
>
> [Example (I just made this up for this repost):
>
>               2 sin(pi t)
>     f(x,t) = ------------- x^t (1-x)^(1-t)
>                pi t(1-t)
>
> where 0 < x < 1 and 0 < t < 1. (The numerical factor is chosen to ensure
> that the integral over 0 < x < 1 is unity.)
>
> If you take appropriate limits, this family can be extended to -1 < t < 2;
> e.g. (it's fun to plot these as functions of x):
>
>     f(x,-1/2) = 8/(3 pi) x^(-1/2) (1-x)^(3/2)
>
>     f(x, 0)   = 2 (1-x)
>
>     f(x, 1/4) = 16 sqrt(2)/(3 pi) x^(1/4) (1-x)^(3/4)
>
>     f(x, 1/3) = 9 sqrt(3)/(2 pi) x^(1/3) (1-x)^(2/3)
>
>     f(x, 1/2) = 8 sqrt(x - x^2)/pi
>
>     f(x, 2/3) = 9 sqrt(3)/(2 pi) x^(2/3) (1-x)^(1/3)
>
>     f(x, 1)   = 2 x
>
>     f(x, 3/2) = 8/(3 pi) x^(3/2) (1-x)^(-1/2)
>
> End of example]
>
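The normalization claim in the example is easy to verify numerically. The following sketch (my own, not part of the quoted post; function names are invented) integrates f(.,t) over (0,1) by the midpoint rule:

```python
import math

def f(x, t):
    """The example density: f(x,t) = 2 sin(pi t)/(pi t (1-t)) * x^t * (1-x)^(1-t)."""
    c = 2.0 * math.sin(math.pi * t) / (math.pi * t * (1.0 - t))
    return c * x**t * (1.0 - x)**(1.0 - t)

def integrate(g, n=100_000):
    """Midpoint rule on (0,1); crude, but enough for a sanity check."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

# Each member of the family should integrate to 1 over (0,1).
totals = {t: integrate(lambda x, t=t: f(x, t)) for t in (0.25, 1/3, 0.5, 2/3)}
for t, total in totals.items():
    print(f"t = {t:.4f}: integral = {total:.6f}")
```

Each printed integral comes out as 1 to within the quadrature error, confirming the choice of numerical factor.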
> In parametric statistics, we want to estimate which t gives the best fit
> to a finite data set, say of size n. An estimator is a function from
> n-tuple data samples to the set of possible parameter values, e.g.
> (0,1) in the example above. Given an estimator, its bias, as a
> function of t, is the difference between the expected value (as we range
> over x) of the estimator, according to the density f(.,t), and the actual
> value of t. The variance of the estimator, as a function of t, is the
> expectation (as we range over x), according to f(.,t), of the squared
> difference between the estimator and its expected value; when the bias is
> zero this coincides with the expected squared difference between the
> estimator and t. If the bias vanishes (in which case the estimator is
> called unbiased), the variance will usually still be a positive function
> of t. It is natural to try to minimize the variance over the set of
> unbiased estimators defined for a given family of densities f(.,t).
>
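For this particular family there is a concrete unbiased estimator: f(.,t) is exactly the Beta(1+t, 2-t) density, so E[x] = (1+t)/3 and T = 3*(sample mean) - 1 has zero bias. A small simulation (my own illustration, not from the post) checks this:

```python
import random

random.seed(42)

def draw(t, n):
    # f(., t) is the Beta(1 + t, 2 - t) density, so we can sample from it directly.
    return [random.betavariate(1.0 + t, 2.0 - t) for _ in range(n)]

def estimate_t(xs):
    # E[x] under f(., t) is (1 + t)/3, so T = 3*mean - 1 is unbiased for t.
    return 3.0 * sum(xs) / len(xs) - 1.0

t_true, n = 0.5, 100
trials = [estimate_t(draw(t_true, n)) for _ in range(2000)]
mean_T = sum(trials) / len(trials)
var_T = sum((T - mean_T) ** 2 for T in trials) / len(trials)
print(f"mean of T ~ {mean_T:.3f} (true t = {t_true}); variance of T ~ {var_T:.5f}")
```

The empirical mean of T sits on the true t, while its variance stays positive, which is the situation the paragraph above describes.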
> Given a family of densities, the score is the logarithmic derivative
>
>     V(x,t) = d/dt log f(x,t) = [d/dt f(x,t)]/f(x,t)
>
> [In the example (the one I just made up), if I didn't goof, we have
>
>     V(x,t) = log(x/(1-x)) + pi cot(pi t) + (2t-1)/(t(1-t))
>
> e.g. V(x,1/2) = log(x/(1-x)).]
>
> (We are tacitly now assuming some differentiability properties of our
> parametrized family of densities.)
>
> The mean of the score (as we average over x) is always zero. The Fisher
> information is the variance of the score:
>
>     J(t) = expected value of the square of V(x,t) as we vary x
>
> Notice this is a function of t defined in terms of a specific parametrized
> family of densities. (Of course, the definition is readily generalized to
> more than one parameter.)
>
> [In the example:
>
>     J(-1/2) ~ 5.42516
>
>     J(0)    = pi^2/3 - 1 ~ 2.28987
>
>     J(1/3)  ~ 1.90947
>
>     J(1/2)  ~ 1.8696
>
>     J(2/3)  ~ 1.90947
>
>     J(1)    = pi^2/3 - 1 ~ 2.28987
>
>     J(3/2)  ~ 5.42516
>
> if I didn't goof. Note the expected symmetry of these values.]
>
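These J(t) values can be reproduced by numerical quadrature of the score's variance. A self-contained sketch (again my own; the helper names are invented):

```python
import math

def f(x, t):
    c = 2.0 * math.sin(math.pi * t) / (math.pi * t * (1.0 - t))
    return c * x**t * (1.0 - x)**(1.0 - t)

def score(x, t):
    # V(x,t) = d/dt log f(x,t), using the closed form quoted above.
    return (math.log(x / (1.0 - x))
            + math.pi / math.tan(math.pi * t)
            + (2.0 * t - 1.0) / (t * (1.0 - t)))

def integrate(g, n=100_000):
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def fisher(t):
    mean_score = integrate(lambda x: f(x, t) * score(x, t))   # should be ~0
    J = integrate(lambda x: f(x, t) * score(x, t) ** 2)       # variance of the score
    return J, mean_score

J_half, m_half = fisher(0.5)
J_third, _ = fisher(1/3)
J_two_thirds, _ = fisher(2/3)
print(f"J(1/2) ~ {J_half:.4f} (mean score ~ {m_half:.1e})")
print(f"J(1/3) ~ {J_third:.5f}, J(2/3) ~ {J_two_thirds:.5f}")
```

The quadrature reproduces both the symmetry J(1/3) = J(2/3) and the value J(1/2) ~ 1.8696, and the mean of the score comes out (numerically) zero, as claimed.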
> The fundamentally important Cramer-Rao inequality says that
>
>     variance of any unbiased estimator >= 1/J(t)
>
> Thus, in parametric statistics one wants to find estimators which achieve
> the optimal variance, the reciprocal of the Fisher information. From this
> point of view, the larger the Fisher information, the more precisely
> one can (using a suitable estimator) fit a distribution from the given
> parametrized family to the data.
>
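A standard textbook case where the bound is actually achieved (not from the post): for the normal location family N(t, sigma^2) with known sigma, J(t) = 1/sigma^2, and for n iid observations the bound becomes 1/(n J) = sigma^2/n, which the sample mean attains:

```python
import random

random.seed(0)

sigma = 2.0            # known scale; we estimate the location t
J = 1.0 / sigma**2     # Fisher information of N(t, sigma^2), per observation
n = 50                 # with n iid draws, the bound for unbiased estimators is 1/(n*J)

def sample_mean(t):
    return sum(random.gauss(t, sigma) for _ in range(n)) / n

trials = [sample_mean(1.0) for _ in range(4000)]
mu = sum(trials) / len(trials)
var = sum((m - mu) ** 2 for m in trials) / len(trials)
bound = 1.0 / (n * J)  # = sigma^2 / n = 0.08
print(f"variance of sample mean ~ {var:.4f}; Cramer-Rao bound = {bound:.4f}")
```

The empirical variance of the sample mean matches the bound, so the sample mean is an efficient estimator of the location here.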
> (Incidentally: someone has mentioned the work of Roy Frieden, who has
> attempted to relate the Cramer-Rao inequality to the Heisenberg
> inequality. See the simple "folklore" theorem (with complete proof) I
> posted on a generalized Heisenberg inequality in sci.physics.research a
> few months ago --- you should be able to find it using Deja News.)
>
> This setup is more flexible than might at first appear. For instance,
> given a density f(x), where x is real, define the family of densities
> f(x-t); then the Fisher information is
>
>     J(t) = expectation of [d/dt log f(x-t)]^2
>
>          = int f(x-t) [d/dt log f(x-t)]^2 dx
>
> By a change of variables, we find that for a fixed density f, this is
> constant in t. In this way, we can change our point of view and define a
> (nonlinear) functional on densities f:
>
>     J(f) = int f(x) [f'(x)/f(x)]^2 dx
>
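A concrete check of translation invariance (my own illustration): for a Gaussian of width sigma, f'(x)/f(x) = -(x - mu)/sigma^2, so J(f) = 1/sigma^2 no matter how the density is translated:

```python
import math

def fisher_translation(mu, sigma, n=100_000):
    # J(f) = int f(x) [f'(x)/f(x)]^2 dx for the N(mu, sigma^2) density,
    # where f'(x)/f(x) = -(x - mu)/sigma^2; the exact answer is 1/sigma^2.
    a, b = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        fx = math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
        total += fx * ((x - mu) / sigma**2) ** 2 * h
    return total

J0 = fisher_translation(0.0, 2.0)
J5 = fisher_translation(5.0, 2.0)
print(f"J at mu=0: {J0:.6f}; J at mu=5: {J5:.6f}")  # both ~ 1/sigma^2 = 0.25
```

Both evaluations give 1/sigma^2 = 0.25: the functional depends on the shape of f, not on its location, which is exactly the change-of-variables point above.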
> [The idea now is something like this: J(f) measures the precision of
> fitting f to numerical data, up to translation of the distribution. The
> larger J(f) is, the more precisely you can identify the particular
> translation which gives the best fit. I think this is the idea, anyway.]
>
> On the other hand, Shannon's "continuous" entropy is the (nonlinear)
> functional:
>
>     H(f) = -int f(x) log f(x) dx
>
> Suppose that X is a random variable with finite variance and Z is an
> independent normally distributed random variable with zero mean and unit
> variance ("standard noise"), so that X + sqrt(t) Z is another random
> variable, with associated density f_t, representing X perturbed by noise.
> Then de Bruijn's identity says that
>
>     J(f_t) = 2 d/dt H(f_t)
>
> and if the limit t -> 0 exists, we have a formula for the Fisher
> information of the density f_0 associated with X.
>
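De Bruijn's identity can be checked in closed form when X is itself Gaussian (an illustration of mine, not in the post): if X ~ N(0, s), then f_t is the N(0, s+t) density, H(f_t) = (1/2) log(2 pi e (s+t)), and both sides of the identity equal 1/(s+t):

```python
import math

def H_gauss(v):
    # Differential entropy of a normal density with variance v.
    return 0.5 * math.log(2.0 * math.pi * math.e * v)

s, t, dt = 1.0, 0.5, 1e-6
J_exact = 1.0 / (s + t)  # Fisher information of the N(0, s + t) density f_t
dH = 2.0 * (H_gauss(s + t + dt) - H_gauss(s + t - dt)) / (2.0 * dt)
print(f"J(f_t) = {J_exact:.6f}; 2 dH/dt ~ {dH:.6f}")
```

The finite-difference value of 2 dH/dt agrees with J(f_t), and letting t -> 0 recovers J(f_0) = 1/s, the Fisher information of X itself.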
> See Elements of Information Theory, by Cover & Thomas, Wiley, 1991, for
> details on the above and for general orientation to the enormous body of
> ideas which constitutes modern information theory, including typical sets
> and comment (1) above. Then see some of the many other books which cover
> Fisher information in more detail. In one of the books by J. N. Kapur on
> maximum entropy you will find a particularly simple and nice connection
> between the multivariable Fisher information and Shannon's discrete
> "information" (arising from the discrete "entropy" -sum p_j log p_j).
>
> (Come to think of it, if you search under my name using Deja News you
> should find a previous posting of mine in which I gave considerable detail
> on some inequalities which are closely related to the area-volume
> interpretations of Fisher information and entropy. If you've ever heard
> of Hadamard's inequality on matrices, you should definitely look at the
> discussion in Cover & Thomas.)
>
> Hope this helps!
>
> Chris Hillman
*This archive was generated by hypermail 2.0b3 on Sun Oct 17 1999 - 22:10:31 JST*