**Hitoshi Kitada** (*hitoshi@kitada.com*)

*Sun, 9 May 1999 11:48:23 +0900*

**Next message:** Hitoshi Kitada: "[time 290] Fw: BOUNCE time: Non-member submission from [Phil Diamond <pmd@maths.uq.edu.au>]" **Previous message:** Stephen P. King: "[time 288] Fisher Information"

Dear Stephen,

A miscellaneous remark...

----- Original Message -----

From: Stephen P. King <stephenk1@home.com>

To: <time@kitada.com>

Sent: Saturday, May 08, 1999 10:47 PM

Subject: [time 288] Fisher Information

*> On Mon, 26 Apr 1999, Stephen Paul King wrote:
*

*>
*

*> > Could someone give a definition of Fisher Information that a
*

*> > mindless philosopher would understand? :)
*

*>
*

*> Let me start with a couple of intuitive ideas which should help to
*

*> orient
*

*> you. Fisher information is related to the notion of information (a kind
*

*> of
*

*> "entropy") developed by Shannon 1948, but not the same. Roughly
*

*> speaking
*

*>
*

*> 1. Shannon entropy is the volume of a "typical set"; Fisher information
*

*> is the area of a "typical set",
*

*>
*

*> 2. Shannon entropy is allied to "nonparametric statistics"; Fisher
*

*> information is allied to "parametric statistics".
*

*>
*

*> Now, for the definition.
*

*>
*

*> Let f(x,t) be a family of probability densities parametrized by t.
*

*> For example,
*

*>
*

*> f(x,t) = (1/t) exp(-x/t), on x >= 0
*

*>
*

*> In parametric statistics, we want to estimate which t gives the best fit
*

*> to a finite data set, say of size n. An estimator is a function from
*

*> n-tuple data samples to the set of possible parameter values, e.g.
*

*> (0,infty) in the example above. Given an estimator, its bias, as a
*

*> function of t, is the difference between the expected value (as we range
*

*> over x) of the estimator, according to the density f(.,t), and the
*

*> actual
*

*> value of t. The variance of the estimator, as a function of t, is the
*

*> expectation (as we range over x), according to f(.,t), of the squared
*

*> difference between the estimator and its expected value. If the bias
*

*> vanishes
*

*> (in this case the estimator is called unbiased), the variance will
*

*> usually
*

*> still be a positive function of t. It is natural to try to minimize the
*

*> variance over the set of unbiased estimators defined for a given family
*

*> of
*

*> densities f(.,t).
*

*>
*

*> Given a family of densities, the score is the logarithmic derivative
*

*>
*

*> V(x,t) = d/dt log f(x,t) = [d/dt f(x,t)] / f(x,t)
*

*>
*

*> (We are tacitly now assuming some differentiability properties of our
*

*> parameterized family of densities.) The mean of the score (as we average
*

*> over x) is always zero. The Fisher information is the variance of the
*

*> score:
*

*>
*

*> J(t) = expected value of square of V(x,t) as we vary x
*

*>
*

*> Notice this is a function of t defined in terms of a specific parametrized
*

*> family of densities. (Of course, the definition is readily generalized
*

*> to
*

*> more than one parameter).
*

*>
*

*> The fundamentally important Cramer-Rao inequality says that
*

*>
*

*> variance of any unbiased estimator >= 1/J(t)
*

*>
*

*> Thus, in parametric statistics one wants to find estimators which
*

*> achieve
*

*> the optimal variance, the reciprocal of the Fisher information. From
*

*> this
*

*> point of view, the larger the Fisher information, the more precisely
*

*> one can (using a suitable estimator) fit a distribution from the given
*

*> parametrized family to the data.
*
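
A minimal Python sketch of the exponential example in the quoted text, under stated assumptions. For f(x,t) = (1/t) exp(-x/t) the score works out to -1/t + x/t^2 and J(t) = 1/t^2; for n independent samples the Cramer-Rao bound becomes 1/(n J(t)) = t^2/n, which the sample mean attains. The function names, the choice t = 2, n = 10, and the number of trials are illustrative assumptions, not part of the post.

```python
import random


def score(x, t):
    """Score d/dt log f(x,t) for f(x,t) = (1/t) exp(-x/t), x >= 0."""
    # log f = -log t - x/t, so d/dt log f = -1/t + x/t**2
    return -1.0 / t + x / t**2


def fisher_information_exact(t):
    """Closed form J(t) = 1/t**2 for this exponential family."""
    return 1.0 / t**2


def monte_carlo_check(t=2.0, n=10, trials=20000, seed=0):
    """Estimate score mean, J(t), and the sample-mean estimator's bias and spread."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores, estimates = [], []
    for _ in range(trials):
        # expovariate takes the rate 1/t, so each sample has mean t
        sample = [rng.expovariate(1.0 / t) for _ in range(n)]
        scores.extend(score(x, t) for x in sample)
        estimates.append(sum(sample) / n)  # sample mean as estimator of t
    mean_score = sum(scores) / len(scores)
    j_hat = sum(v * v for v in scores) / len(scores)
    mean_est = sum(estimates) / trials
    var_est = sum((e - t) ** 2 for e in estimates) / trials
    return mean_score, j_hat, mean_est, var_est


if __name__ == "__main__":
    t, n = 2.0, 10
    mean_score, j_hat, mean_est, var_est = monte_carlo_check(t=t, n=n)
    print("mean of score      :", mean_score)
    print("J(t) estimate/exact:", j_hat, fisher_information_exact(t))
    print("estimator mean     :", mean_est)
    print("estimator variance :", var_est, "bound:", 1.0 / (n * fisher_information_exact(t)))
```

The score mean should come out near zero and the estimator variance near the bound t^2/n, up to Monte Carlo noise.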

*>
*

*> (Incidentally: someone has mentioned the work of Roy Frieden, who has
*

*> attempted to relate the Cramer-Rao inequality to the Heisenberg
*

*> inequality. See the simple "folklore" theorem (with complete proof)
*

His "folklore" theorem (and proof) in

http://members.home.net/stephenk1/Outlaw/Frieden.txt is given in von Neumann's

book "Mathematical Foundations of Quantum Mechanics," 1932, Chapter III,

section 4; it is not folklore at all but a long-established result. Chris

Hillman seems to be a young, inexperienced mathematician.

*> I posted on a generalized Heisenberg inequality in sci.physics.research a
*

*> few months ago--- you should be able to find it using Deja News.)
*

*>
*

*> This setup is more flexible than might at first appear. For instance,
*

*> given a density f(x), where x is real, define the family of densities
*

*> f(x-t); then the Fisher information is
*

*>
*

*> J(t) = expectation of [d/dt log f(x-t)]^2
*

*>
*

*> = int f(x-t) [d/dt log f(x-t)]^2 dx
*

*>
*

*> By a change of variables, we find that for a fixed density f, this is a
*

*> constant. In this way, we can change our point of view and define a
*

*> (nonlinear) functional on densities f:
*

*>
*

*> J(f) = int f(x) [f'(x)/f(x)]^2 dx
*

*>
*

*> On the other hand, Shannon's "continuous" entropy is the (nonlinear)
*

*> functional:
*

*>
*

*> H(f) = -int f(x) log f(x) dx
*
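
A small numerical sketch of the two functionals J(f) and H(f) quoted above, evaluated for a zero-mean Gaussian density. The Gaussian test case, the trapezoid rule, the integration window of 10 standard deviations, and all function names are assumptions made for illustration. For this density the closed forms are J(f) = 1/sigma^2 and H(f) = (1/2) log(2 pi e sigma^2), with natural logarithms.

```python
import math


def gaussian(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_prime(x, sigma):
    """Derivative f'(x) of the N(0, sigma^2) density."""
    return -x / sigma**2 * gaussian(x, sigma)


def integrate(g, a, b, steps=20000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h


def fisher_functional(sigma, width=10.0):
    """J(f) = int f(x) [f'(x)/f(x)]^2 dx = int f'(x)^2 / f(x) dx."""
    a, b = -width * sigma, width * sigma
    return integrate(lambda x: gaussian_prime(x, sigma) ** 2 / gaussian(x, sigma), a, b)


def entropy_functional(sigma, width=10.0):
    """H(f) = -int f(x) log f(x) dx (natural logarithm)."""
    a, b = -width * sigma, width * sigma
    return integrate(lambda x: -gaussian(x, sigma) * math.log(gaussian(x, sigma)), a, b)


if __name__ == "__main__":
    sigma = 2.0
    print("J(f):", fisher_functional(sigma), "closed form:", 1.0 / sigma**2)
    print("H(f):", entropy_functional(sigma),
          "closed form:", 0.5 * math.log(2.0 * math.pi * math.e * sigma**2))
```

Note the opposite trends: widening the density (larger sigma) raises H(f) and lowers J(f).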

*>
*

*> Suppose that X is a random variable with finite variance and Z is an
*

*> independent normally distributed random variable with zero mean and unit
*

*> variance ("standard noise"), so that X + sqrt(t) Z is another random
*

*> variable associated with density f_t, representing X perturbed by noise.
*

*> Then de Bruijn's identity says that
*

*>
*

*> J(f_t) = 2 d/dt H(f_t)
*

*>
*

*> and if the limit t-> 0 exists, we have a formula for the Fisher
*

*> information of the density f_0 associated with X.
*
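
As a worked check of de Bruijn's identity, here is the Gaussian case, writing H for the entropy functional defined earlier. The choice X ~ N(0, sigma^2) is an assumption made only for illustration; logarithms are natural.

```latex
\begin{align*}
X \sim N(0,\sigma^2),\ Z \sim N(0,1)\ \text{independent}
  &\implies X + \sqrt{t}\,Z \sim N(0,\sigma^2 + t), \\
H(f_t) &= \tfrac{1}{2}\log\bigl(2\pi e\,(\sigma^2 + t)\bigr), \\
\frac{d}{dt} H(f_t) &= \frac{1}{2(\sigma^2 + t)}, \\
J(f_t) &= \int f_t(x)\left[\frac{f_t'(x)}{f_t(x)}\right]^2 dx
        = \frac{1}{\sigma^2 + t}
        = 2\,\frac{d}{dt} H(f_t).
\end{align*}
```

Letting t -> 0 gives J(f_0) = 1/sigma^2, the Fisher information of the unperturbed Gaussian.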

*>
*

*> See Elements of Information Theory, by Cover & Thomas, Wiley, 1991, for
*

*> details on the above and for general orientation to the enormous body of
*

*> ideas which constitutes modern information theory, including typical
*

*> sets
*

*> and comment (1) above. Then see some of the many other books which
*

*> cover
*

*> Fisher information in more detail. In one of the books by J. N. Kapur
*

*> on
*

*> maximal entropy you will find a particularly simple and nice connection
*

*> between the multivariable Fisher information and Shannon's discrete
*

*> "information" (arising from the discrete "entropy" -sum p_j log p_j).
*

*>
*

*> (Come to think of it, if you search under my name using Deja News you
*

*> should find a previous posting of mine in which I gave considerable
*

*> detail
*

*> on some inequalities which are closely related to the area-volume
*

*> interpretations of Fisher information and entropy. If you've ever heard
*

*> of Hadamard's inequality on matrices, you should definitely look at the
*

*> discussion in Cover & Thomas.)
*

*>
*

*> Hope this helps!
*

*>
*

*> Chris Hillman
*

*>
*

Best wishes,

Hitoshi


*
This archive was generated by hypermail 2.0b3
on Sun Oct 17 1999 - 22:10:31 JST
*