Thursday, April 26, 2012

Bob Marks grossly misunderstands “no free lunch”

And so does Bill Dembski. But it is Marks who, in a “Darwin or Design?” interview, reveals plainly the fallacy at the core of his and Dembski's notion of “active information.” (He gets going at 7:50. To select a time, it's best to put the player in full-screen mode. I've corrected slips of the tongue in my transcript.)

[The “no free lunch” theorem of Wolpert and Macready] said that with a lack of any knowledge about anything, that one search was as good as any other search. [14:15]

And what Wolpert and Macready said was, my goodness, none of these [“search”] algorithms work as well as [better than] any other one, on the average, if you have no idea what you're doing. And so the question is… and what we've done here is, if indeed that is true, and an algorithm works, then that means information has been added to the search. And what we've been able to do is take this baseline, that all searches are the same, and we've been able to, in cases where searches work, measure the information that is placed into the algorithm in bits. And we have looked at some of the evolutionary algorithms, and we found out that, strikingly, they are not responsible for any creation of information. [14:40]

And according to “no free lunch” theorems, astonishingly, any search, without information about the problem that you're looking for, will operate at the same level as blind search. And that's... It's a mind-boggling result. [28:10]

Bob has read into the “no free lunch” (NFL) theorems what he believed in the first place, namely that if something works, it must have been designed to do so. Although he gets off to a good start by referring to the subjective state of the practitioner (“with a lack of knowledge,” “if you have no idea what you're doing”), he errs catastrophically by making a claim about the objective state of affairs (“one search is as good as any other search,” “all searches are the same”).

Does your lack of knowledge about a problem imply that all available solution methods (algorithms) work equally well in fact? If you think so, then you're on par with the Ravenous Bugblatter Beast of Traal, “such a mind-bogglingly stupid animal, it assumes that if you can't see it, it can't see you.” Your lack of knowledge implies only that you cannot formally justify a choice of algorithm. There not only may be, but in practice usually will be, huge differences in algorithm performance.
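The distinction is easy to check in miniature. Here is a toy sketch of my own (nothing from Wolpert and Macready's paper): two samplers of a four-point space, one that ignores its observations and one that lets its first observation steer its second, achieve identical performance averaged over all sixteen functions into {0, 1}, yet differ sharply on particular functions.

```python
from itertools import product

X = range(4)  # a tiny search space
# All 2^4 = 16 functions f: X -> {0, 1}
functions = [dict(zip(X, values)) for values in product((0, 1), repeat=4)]

def fixed_order(f):
    """Sample points 0 and 1, ignoring what is observed."""
    return max(f[0], f[1])

def adaptive(f):
    """Let the first observation decide where to look next."""
    second = f[1] if f[0] == 1 else f[2]
    return max(f[0], second)

avg_fixed = sum(fixed_order(f) for f in functions) / len(functions)
avg_adaptive = sum(adaptive(f) for f in functions) / len(functions)
print(avg_fixed, avg_adaptive)  # 0.75 0.75 -- identical on average

# But on a particular function the two samplers differ sharply:
g = {0: 0, 1: 0, 2: 1, 3: 0}
print(fixed_order(g), adaptive(g))  # 0 1
```

The averages agree, as NFL requires, but nothing objective equalizes the samplers on the problem actually at hand.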

What boggles my mind is that Marks and Dembski did not learn this from Wolpert and Macready (1997), “No Free Lunch Theorems for Optimization.” In Section III-A, the authors observe that “it is certainly true that any class of problems faced by a practitioner will not have a flat prior.” This means that some problems are more likely than others, and the NFL theorems do not hold in fact. So what is the significance of the theorems?

First, if the practitioner has knowledge of problem characteristics but does not incorporate them into the optimization algorithm, then... the NFL theorems establish that there are no formal assurances that the algorithm chosen will be at all effective. Second, while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification. [emphasis mine]

So don't take my word for it that Bob has twisted himself into intellectual contortions with his apologetics. This comes from an article with almost 2600 citations. If memory serves, Marks and Dembski have cited it in all 7 of their publications.

Marks and Dembski believe, astonishingly, that the NFL theorems say that an algorithm outperforms “blind search” only if some entity has exploited problem-specific information in selecting it. The correct interpretation is that the practitioner is justified in believing that an algorithm outperforms “blind search” only if he or she exploits problem-specific knowledge [justified true belief, not just information] in selecting it. This leads them to the fallacious conclusion that when a search $s$ outperforms blind search, they can measure the problem-specific information that an ostensible "search-forming process” added to $s$ to produce the gain in performance. They silently equate performance with information, and contrive to transform the gain in performance into an expression that looks like gain of Shannon information.

Their name-game depends crucially on making the outcome of a search dichotomous — absolute success (performance of 1) or absolute failure (performance of 0). Then the expected performance of a search is also its probability of success. There is a probability $p$ that blind search solves the problem, and a probability $p_s > p$ that search $s$ solves the problem, and the ratio $p_s / p$ is naturally interpreted as performance gain. But to exhibit the “added information” (information gain), Marks and Dembski do a gratuitous logarithmic transformation of the performance gain, $$I_+ = \log \frac{p_s}{p} = \log p_s - \log p = -\!\log p + \log p_s,$$ and call the result active information. (The last step is silly, of course. Evidently it makes things look more “Shannon information-ish.”) To emphasize, they convert performance into “information” by sticking to a special case in which expected performance is a probability.
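To see how little machinery is involved, here is a minimal sketch of the computation in that dichotomous special case (my own illustration; the function name is mine, not theirs):

```python
from math import log2

def active_information(p_s, p):
    """Log of the ratio of success probabilities -- nothing more."""
    return log2(p_s / p)

# Blind search finds a 10-bit target with probability p = 2**-10.
p = 2 ** -10
# Suppose search s succeeds half the time.
p_s = 0.5
print(active_information(p_s, p))  # 9.0 -- the performance gain, relabeled "bits"
```

All of the work is done by the ratio $p_s / p$; the logarithm adds nothing but the Shannon-ish costume.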

Here's a simple (in)sanity check. Suppose that I have a “pet” algorithm that I run on all problems that come my way. Obviously, there's no sense in which I add problem-specific information. But Marks and Dembski cherry-pick the cases in which my algorithm outperforms blind search, and, because active information is by definition the degree to which an algorithm outperforms blind search, declare that something really did add information to the algorithm.
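The cherry-picking is easy to simulate. In this toy of my own devising, an arbitrary fixed sampling order stands in for the "pet" algorithm, and problems arrive with no information passed to it:

```python
import random
from math import log2

random.seed(0)
n, k = 100, 10                    # problem size; samples allowed
p_blind = k / n                   # blind search success probability
pet = range(k)                    # pet algorithm: always sample points 0..9

successes = 0
trials = 10_000
for _ in range(trials):
    target = random.randrange(n)  # a fresh problem; the pet learns nothing
    if target in pet:             # the pet happens to succeed
        successes += 1

# The pet succeeds on about a tenth of the problems, exactly as blind
# search would. But cherry-pick the successes: conditioned on success,
# the success probability is 1, so the "active information" is
# log2(1 / p_blind) > 0 on every one of those cases -- conjured by
# selective reporting, not by any addition of problem knowledge.
print(successes / trials, log2(1 / p_blind))
```

Overall, the pet is indistinguishable from blind search; restrict attention to the wins, and "information" appears from nowhere.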

Now, a point I'll treat only briefly is that Marks and Dembski claim that the cases in which my pet algorithm greatly outperforms blind search are exceedingly rare. The fact is that they do not know the distribution of problems arising in the real world, and have no way of saying how rare or common extreme performance is for simple algorithms. In the case of computational search, we know for sure that the distribution of problems diverges fabulously from the uniform. Yet Marks and Dembski carry on about “Bernoulli's Principle of Insufficient Reason and Conservation of Information in Computer Search,” doing their damnedest to fob off subjective assignment of uniform probability as objective chance.

A bit of irony for dessert [35:50]:

Question: Are you getting any kind of response from the other side? Are they saying this is kind of interesting, or are they kind of putting stoppers in their ears? What's going on?

Answer: It's more of the stoppers in the ears thus far. We have a few responses on blogs, which are unpleasant, and typically personal attacks, so those are to be ignored. We're waiting for, actually, something substantive in response.

A note to reviewers of papers by Dembski and Marks

William A. Dembski and Robert J. Marks II lace their engineering papers with subtle insinuations that will strike reviewers as somewhat strange, but that probably will not raise red flags. The only publication in which they give a crystal-clear explanation of their measure of active information, and state outright what they're trying to do with it, is the somewhat philosophical Life's Conservation Law: Why Darwinian Evolution Cannot Create Biological Information. Note that they previously referred to "English's Law of Conservation of Information" (a term they made up). English is telling you now that he did not understand their engineering papers until he read the one addressing biological evolution.

ABSTRACT: Laws of nature are universal in scope, hold with unfailing regularity, and receive support from a wide array of facts and observations. The Law of Conservation of Information (LCI) is such a law. LCI characterizes the information costs that searches incur in outperforming blind search. Searches that operate by Darwinian selection, for instance, often significantly outperform blind search. But when they do, it is because they exploit information supplied by a fitness function — information that is unavailable to blind search. Searches that have a greater probability of success than blind search do not just magically materialize. They form by some process. According to LCI, any such search-forming process must build into the search at least as much information as the search displays in raising the probability of success. More formally, LCI states that raising the probability of success of a search by a factor of q/p (> 1) incurs an information cost of at least log(q/p). LCI shows that information is a commodity that, like money, obeys strict accounting principles. This paper proves three conservation of information theorems: a function-theoretic, a measure-theoretic, and a fitness-theoretic version. These are representative of conservation of information theorems in general. Such theorems provide the theoretical underpinnings for the Law of Conservation of Information. Though not denying Darwinian evolution or even limiting its role in the history of life, the Law of Conservation of Information shows that Darwinian evolution is inherently teleological. Moreover, it shows that this teleology can be measured in precise information-theoretic terms. [emphasis added]

You do not have to read far into the paper to find that intelligence creates information to guide biological evolution. The passage I've highlighted contradicts the Conservation Lemma (wish I hadn't called it that) I proved in my first paper (1996) regarding "no free lunch" in so-called search. The fundamental reason that there is no free lunch is that the "search" (which is nothing more than sampling, with performance measured on the sample) cannot gain exploitable information by evaluation of the fitness function. This is really just a formalization of the famous problem of induction, i.e., observations say nothing about what has yet to be observed. Use of observations to decide what to observe is a source of sampling bias, not information. Therefore, when the performance measured on a sample obtained by biased sampling is better or worse than the expected performance for uniform sampling ("blind search"), the difference can be explained only in terms of bias. I'll say much more in a forthcoming post.
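A toy illustration of the bias point (mine, not the formal lemma): apply the same observation-guided rule to a smoothly structured function and to a value-scrambled copy of it. Blind search cannot tell the two apart, since they have the same multiset of values; the guided rule's fortunes swing entirely on whether its sampling bias happens to match the problem's structure. The function shapes, the step-halving rule, and all parameters below are arbitrary choices for illustration.

```python
import random

random.seed(1)
N, t, budget, trials = 100_000, 31_415, 40, 300

def smooth(x):
    """Unimodal: every observation points toward the peak at t."""
    return -abs(x - t)

perm = list(range(N))
random.shuffle(perm)

def scrambled(x):
    """Same multiset of values as smooth; local structure destroyed."""
    return smooth(perm[x])

def blind(f):
    """Uniform sampling: best of `budget` independent draws."""
    return max(f(random.randrange(N)) for _ in range(budget))

def guided(f):
    """Use each observation to decide where to observe next."""
    best_x = random.randrange(N)
    best, evals, step = f(best_x), 1, N // 4
    while evals < budget and step >= 1:
        moved = False
        for cand in (best_x - step, best_x + step):
            if 0 <= cand < N and evals < budget:
                y = f(cand)
                evals += 1
                if y > best:
                    best, best_x, moved = y, cand, True
        if not moved:
            step //= 2
    return best

def mean(xs):
    return sum(xs) / len(xs)

m_blind = mean([blind(smooth) for _ in range(trials)])
m_guided = mean([guided(smooth) for _ in range(trials)])
m_mismatch = mean([guided(scrambled) for _ in range(trials)])
# The guided rule's advantage appears only where its bias fits the structure.
print(m_blind, m_guided, m_mismatch)
```

The guided rule gains nothing from its observations per se; scramble the structure and the identical rule is mediocre. The performance difference is attributable to the match between bias and problem, not to information extracted by sampling.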

You will not read all of the paper, and thus I want to call your attention to the 1-1/3 page "Conclusion: 'A Plan for Experimental Validation.'" Some highlights:

The Law of Conservation of Information, however, is not merely an accounting tool. Under its aegis, intelligent design merges theories of evolution and information, thereby wedding the natural, engineering, and mathematical sciences. On this view (and there are other views of intelligent design), its main focus becomes how evolving systems incorporate, transform, and export information. Moreover, a principal theme of its research becomes teasing apart the respective roles of internally produced and externally applied information in the performance of evolving systems.

[...]

In such information-tracking experiments, the opponent of intelligent design hopes to discover a free lunch. The proponent of intelligent design, by contrast, attempts to track down hidden information costs and thereby confirm that the Law of Conservation of Information was preserved. There is no great mystery in any of this. Nor do such experiments to confirm intelligent design merely apply to the origin of life. Insofar as evolution (whether chemical or biological) is an exact experimental science, it will exhibit certain informational properties. Are those properties more akin to alchemy, where more information comes out than was put in? Or are they more akin to accounting, where no more information comes out than was put in? A systematic attempt to resolve such questions constitutes a plan for experimentally verifying intelligent design.

All of the published "information-tracking experiments" have been analyses of evolutionary computations. (My next post shows that the "information" is nothing but logarithmically transformed performance, and that the misinterpretation is rooted in Marks' misunderstanding of the "no free lunch" theorems.) The highlighted passage indicates how Dembski and Marks will argue, perhaps as expert witnesses in the next judicial test of public-school instruction in "intelligent design" creationism (Dembski was to serve as a witness in the last, but withdrew), that their engineering/computing publications support the claim that biological evolution requires intelligent guidance.

This is in no way a suggestion that you respond to anything but the technical (de)merits of their work. Dembski himself referred a New York Times science reporter to me as a fair-minded critic of ID creationism. I have also protested what I considered to be an infringement of Marks' academic freedom at Baylor University. My intent here is to impress on you how important it is to do a thorough review, and to insist that the authors make clear to you everything that they are doing. In particular, require that they provide a rigorous definition of "search," rather than give examples or suggest that everyone knows what the term means. If the definition does not make "search" out to be sampling, with performance measured on the sample (as in Wolpert and Macready [1997], "No Free Lunch Theorems for Optimization"), then you should ask why it does not.

Monday, April 2, 2012

Raising an eyebrow at a Springer series editor

Springer announced last month that it would publish Biological Information: New Perspectives, the proceedings of a more-or-less secret conference of creationists. The publisher retracted the announcement almost immediately, saying that it was automatically generated, and that the volume was undergoing additional review.

Biological Information was listed, oddly enough, in an engineering series, the "Intelligent Systems Reference Library." The creationist argument that life was engineered is not engineering, of course. The creationists themselves regard it as science.* Only one of the editors of the proceedings, Bob Marks, has worked in the field of intelligent systems. It was probably he who proposed the volume to Springer.

I happened upon a volume in the series, and had a look at its two editors and 43 titles, 35 of which are dated 2011 or 2012. Seeing that one of the series editors is Janusz Kacprzyk, I thought immediately of the Polish journal that announced a forthcoming article by Marks and Dembski (another of the proceedings editors), but suspended operations prior to publishing it. And Prof. Kacprzyk was indeed on the editorial board of the International Journal of Information Technology and Intelligent Computing.

Membership on an editorial board is more an honor than anything else, and it's doubtful that Prof. Kacprzyk was involved in the process of review and acceptance of the article. However, it's not unreasonable to ask what he knew about it. And I did, with no mention whatsoever of Springer:

I'm curious as to how much you knew about the article. Were you aware that many scientists and engineers objected to it as "intelligent design" creationism? Did you read the article?

Prof. Kacprzyk did not dignify my email with a response. So I'll dignify his non-response with a raised eyebrow. If he knew nothing about the article, then why not say so?

How, precisely, do the editors of a series on engineered intelligent systems receive a proposal for a volume on biological information, and conclude other than that it's outside the scope of the series? The parsimonious guess is that they're compensated on a per-volume basis, and care more about cranking out volumes than anything else. But inquiring minds want to know.

* I say that ID creationism falls into the category of speculative philosophy, "which makes claims that cannot be verified by everyday experience of the physical world or by a scientific method." And rather than advocate censorship of Biological Information, I call for Springer to classify it correctly.

The central theme of the volume is, I suspect, that biological information is the consequence of non-material intelligence operating on matter. This is a teleological view of physical reality. What's new about it is the tacit claim that something unobservable (intelligence) creates measurable stuff (information) out of nothing with evident purpose. It is hardly unfair to characterize this as speculative philosophy seeking to become science.

In the Library of Congress classification system, BD493-701 is associated with "teleology, space and time, structure of matter, plurality of worlds." Books in this range have a great deal to say about science, but are not themselves works of science. I believe that Biological Information belongs with them. Irrespective of how libraries classify it, I hope that Springer will go on record with a statement that the volume is meta-scientific.