© 2008, 2009 KnowledgeToTheMax
Knowledge through logic
_______________________________________________________________
Offerings of KnowledgeToTheMax
Builder of ultra-optimized scientific models
______________________________________________________________________________________________________________________
This message targets, and should be
of interest to: science policy makers; funders of scientific studies;
scientific investigators; scientists; philosophers; educators; laymen.
________________________________________________________________________________________________
Summary
KnowledgeToTheMax is one of the few firms in the world
with the resources that are necessary for construction of an ultra-optimized
scientific model (aka scientific theory). Under ultra-optimization, each of the
many inferences made by the model is optimized.
Models built by this methodology consistently excel, sometimes to an astounding
degree. Aristotle’s syllogisms, thermodynamics and the theory of communication were, in
effect, built by ultra-optimization. The model that
revolutionized meteorology was built by ultra-optimization. Applications
have been made in medicine, engineering and throughout the sciences. One result
of ultra-optimization is that the maximum possible knowledge is created
from fixed scientific resources.
Ultra-optimization
addresses the fundamental problem of logic. When a model must make an
inference, that inference has to be selected from among alternatives. Logic
postulates the existence of principles
of reasoning that discriminate the one correct alternative from the many
incorrect ones. Solid evidence supports the contention that the principles of
reasoning are ultra-optimization.
The keys to
an understanding of ultra-optimization are the ideas of measure, inferences, optimization and missing information. Logic and measure theory jointly imply the existence of a measure that is the unique measure of inferences. The existence
and uniqueness imply that the one correct alternative may be discriminated from
the many incorrect alternatives by measurement. In the optimization of an
inference, that alternative is judged correct whose measure is greatest or
least; all other alternatives are judged incorrect.
After the
discovery of ultra-optimization 4 decades ago, it was overlooked, misunderstood
or rejected as contrary to self-interest by virtually all scientists,
philosophers and educators. As a result, the vast
majority of models in use today have been built illogically. The illogical
models are susceptible to making incorrect inferences. Bad consequences,
including unnecessary wars, financial collapses and deaths to loved ones,
follow from the incorrect inferences. Science policy continues to foster this
situation by overlooking the illogic.
KnowledgeToTheMax is dedicated to benefiting mankind
by eliminating the incorrect inferences and maximizing the knowledge through
logic. The firm would welcome the opportunity to discuss how it might use its
skills in doing so for the benefit of you, your organization or the people who
depend on your organization. Please be in touch.
Background
Given the
premise that “a swan was observed and it was white,” one is unjustified in
concluding that “all swans are white,” for unobserved swans may not be white.
How can one infer the descriptions of unobserved, real objects from the
descriptions of observed, real objects? This is the most basic of questions for
science. This question is asked by the
problem of induction. Today, few scientists, philosophers, educators or
laymen can answer this question.
Induction is
the process by which one builds a scientific model. The modifier “scientific”
on “model” signifies that such a model may provide “scientia,” the Latin word
for “demonstrable knowledge.” The word “scientia” comes to us as the English
word “science.” Informally, “science” and “knowledge” are synonymous.
In science,
a description of a set of real objects is called a “state” of these objects. Cloudy is an example of a state; it
describes a region of the Earth. A complete set of alternate descriptions is
called a “state-space” for these objects. The set {cloudy, not cloudy} is an
example of a state-space; it provides alternate descriptions for a region of
the Earth.
An
“inference” is an extrapolation from a state in a so-called “observed state-space” of a set of objects to a state
in a so-called “unobserved state-space” of the
same set of objects. For example, it is an extrapolation from the state in the
observed state-space {cloudy, not cloudy} to the state in the
unobserved state-space {rain in the next
24 hours, no rain in the next 24
hours} in reference to a region of the Earth.
At a given
time, an object is described by a single state of a particular state-space. For
example, a region of the Earth is described by the state cloudy or the state not
cloudy in the state-space {cloudy,
not cloudy}.
Because it
is observed, the state in an observed state-space is certain. Because it is
unobserved, the state in an unobserved state-space is uncertain.
A scientific
“model” is a procedure for making inferences. It is by making inferences that
such a model provides knowledge.
In the
construction of a model, the builder faces the problem of selecting the
inferences that will be made from a much larger set of alternatives for being
made. Some of these inferences are “correct”; they should be made by the model.
The remaining inferences are “incorrect”; they should not be made by the model.
How can the builder of a model discriminate the “correct” from the “incorrect”
inferences?
Logic
postulates the existence of principles that discriminate correct from incorrect
inferences. The principles that discriminate are called “the principles of reasoning.” Logic is the science of these principles. The problem of
induction is to discover the principles of reasoning.
For the
deductive logic, the principles of reasoning are as established 23 centuries
ago by Aristotle; they state that an inference is correct if it conforms to a syllogism; otherwise, it is incorrect. However,
in building a model, one employs the inductive logic. How to generalize from
the principles of reasoning for the deductive logic to the principles of
reasoning for logic as a whole is the problem of induction. Does this problem
have a solution?
The
historical record reveals that work toward a solution begins at about the time
of Aristotle; however, a solution proves elusive. Writing 21 centuries after
Aristotle, the philosopher David Hume observes that people build models by
“habits of mind” rather than by logical principles. Nearly two centuries after
Hume, the philosopher C. D. Broad describes induction as the “scandal of philosophy.”
While the problem of induction remains
unsolved, there is an ethical issue for scientists. People use inferences made
by models in making decisions on issues of importance to them, including life
or death issues. Thus, the builder of a model is duty bound to ensure that this
model makes no incorrect inferences. However, absent a solution to the
problem of induction, the builder lacks logical means for discriminating
between correct and incorrect inferences!
Absent
these means, infinitely many models are consistent with the empirical data.
How can the builder of a model select from among them that model which will be
published for use in making decisions?
Science has
a crying need for a solution to the problem of induction. However, through most
of the scientific age, this problem lacks a solution. As a result, models are
extremely susceptible to making incorrect inferences. Bad consequences,
including unnecessary wars, financial collapses and deaths, follow from the
incorrect inferences. Of necessity, science policy makers overlook the illogic.
Then, six
decades ago, the engineer-mathematician Claude Shannon discovers an essential
element of a solution to the problem of induction. It is the existence of a measure that is the unique measure
of the inferences which are made by a model. The measure of an inference is
the missing information in this inference for
a deductive conclusion.
The
existence and uniqueness of Shannon’s measure imply
that correct can be discriminated from incorrect inferences by measurement!
Under some circumstances, the correct inference is the one of least measure.
Under other circumstances, the correct inference is the one of greatest
measure. Shannon’s discovery becomes the basis for optimization of the designs
of modern electronic communications systems. Fruits from optimization include
HDTV and noiseless voice messaging from halfway around the world.
Two decades
after Shannon’s discovery, the engineer-lawyer-physicist Ronald Christensen
completes this solution to the problem of induction. The solution is the pair
of principles of reasoning called “ultra-optimization.” No competing solution
arises. Every indication is that this 23 centuries old problem is now solved.
The
cybernetics and systems science communities embrace this solution. However, the
vast majority of science policy makers, scientists, philosophers, educators and
laymen overlook or misunderstand the solution or reject it as contrary to
self-interest. Builders of models continue in the tradition of building them
illogically. Illogically built models continue to make incorrect inferences.
Bad consequences continue to follow from the incorrect inferences.
Unnecessarily, science policy makers continue to overlook the illogic. In the
consideration of grant applications and of articles submitted for publication
in peer reviewed journals, it remains acceptable to overlook illogic in the construction
of a model.
However, a
tiny fraction of scientific investigators learn of ultra-optimization and
employ it in their research. One is the founder of KnowledgeToTheMax, Terry Oldberg. Thus, 4 decades after the
discovery of ultra-optimization, it comes to pass that Oldberg’s firm KnowledgeToTheMax is one of the few
firms in the world with the necessary skills for teaching about
ultra-optimization or building an ultra-optimized model.
The
following introduction to the theoretical basis for ultra-optimization is
lengthy and complicated. A person in a hurry or with limited interest could
save time by skipping to the empirical basis. A cost
from skipping would be to limit one’s understanding of how: a) ultra-optimization
solves the problem of induction
and b) the alternatives to ultra-optimization fail to solve this problem. If
you elect to continue, you may get lost. If this happens, one way to recover would
be to skip to the empirical basis and to pick up the
theoretical thread later, with the help of a tutor. To climb the learning curve
by reading the literature would not be cost-effective.
Ultra-optimization
arises from generalization of the deductive logic. An axiom of the deductive
logic states that every proposition has a variable called its “truth-value”;
the value of the truth-value is true
or it is false. In reality, though,
one observes that a proposition may be true in a proportion of instances lying
in the interval between 0% and 100%; this proportion is called the
“probability” of the proposition. The deductive logic may be generalized for
conformity to this reality; in the generalization, the axiom that every proposition
has a truth-value is replaced by the axiom that every proposition has a
probability. When generalized in this way, the logic is called the
“probabilistic logic.” The probabilistic logic reduces to the deductive logic when
the values of the probabilities of propositions are restricted to 0% and 100%;
here, 0% corresponds to false for the
truth-value and 100% to true.
A state is
an example of a proposition. That it is an example of a proposition ties the
notion of a state to logic, for the relationships among propositions are a
topic of logic.
Under the
probabilistic logic, as every state is a proposition, every state has a
probability. For example, the state rain
in the next 24 hours has a probability.
It can be
proved that the probabilistic logic implies the existence of a measure that is the unique measure of inferences. This
measure is called “Shannon’s measure of information,”
after the person who first describes it, Claude Shannon. Shannon’s measure of
an inference is the missing information in
this inference for a deductive conclusion. Going forward, Shannon’s measure of
an inference is called the “missing information.” In the literature, Shannon’s
measure of an inference is often called the “entropy” or “conditional entropy”
of this inference. The “entropy” of thermodynamics
is Shannon’s measure of a particular kind of inference.
Within the
domain of validity of the deductive logic, the missing information is nil.
Within the domain of validity of the inductive logic, the missing information
is not-nil. Thus, whether there is missing information distinguishes the
domains of validity of the two branches of logic; this, evidently, is the
answer to the age-old question of the essential difference between the
inductive and the deductive logic.
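The distinction drawn above can be checked numerically. The following is a minimal sketch, assuming the missing information is computed as the standard Shannon entropy in bits; the function name and the example probabilities are illustrative and do not come from the text.

```python
import math

def missing_information(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with probability 0 contribute nothing, by the usual convention.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Deductive case: probabilities restricted to 0% and 100%.
certain = missing_information([1.0, 0.0])

# Inductive cases: probabilities strictly between 0% and 100% (hypothetical values).
even_odds = missing_information([0.5, 0.5])
lopsided = missing_information([0.9, 0.1])

print(certain, even_odds, lopsided)
```

When the probabilities are restricted to 0% and 100% the result is nil; otherwise it is positive, and it is largest when the alternatives are equally probable.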
The
existence and uniqueness of Shannon’s measure imply that the probabilistic
logic supports ultra-optimization. In the ultra-optimization of a model, each
of the many inferences made by this model is optimized. In the optimization of
an inference, the complete set of alternate inferences is formed, by variation
of the compositions of the alternatives. That alternative is deemed correct
whose measure is (depending upon the type of inference) least or greatest. All
other alternatives are deemed incorrect.
This line of
thinking yields a pair of principles of reasoning. To gain an understanding of
how these principles apply in general requires absorption of many details.
We’ll avoid a portion of these details by looking at a simplified example. In
the example, we imagine that you, the reader, have been engaged to build a
model that predicts whether there will be rain in your locality in the next 24
hours. One of several unobserved state-spaces for your model is {rain in the next 24 hours, no rain in the next 24 hours}. We’ll
designate this state-space by O. Each
element of O is the outcome of a statistical event.
A model has
one or more independent variables. Your model has one independent variable;
we’ll designate it by the symbol I. I
is the state-space {D, S, R},
where D signifies the state dropping barometric pressure, S the state stable barometric pressure and R
the state rising barometric pressure.
As you may
recall, the state-space O is
unobserved. Inferences are made to O
from an observed state-space; let this state-space be designated by C. If C contains two or more states, it is apt to call these states “conditions,” for they are conditions on the space
formed by the model’s independent variables.
The elements
of the independent variable I of your
model are the ways in which a state in C can occur. Each state in C is either a state in I or is abstracted
from several such states by placing these states in a disjunction (logical OR
statement). An example of an abstracted state is S OR R. The state S OR R
is said to be “abstracted” from the state S
and from the state R because the
description supplied by S OR R is removed from the differences in the
descriptions provided by S and by R.
Under the
rules established in the previous paragraph, there are five alternatives for
the observed state-space C. They are:
{D, S, R}, {D, S OR R}, {S, D OR R}, {R, D OR S} and {D OR S OR R}.
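The count of five can be confirmed by enumeration. The sketch below assumes that the alternatives for C are exactly the ways of grouping the states of I into non-empty, non-overlapping disjunctions; the function names are hypothetical.

```python
def partitions(states):
    """Yield every way of grouping `states` into non-empty blocks."""
    if not states:
        yield []
        return
    first, rest = states[0], states[1:]
    for smaller in partitions(rest):
        # Put `first` in a block of its own...
        yield [[first]] + smaller
        # ...or join it to each existing block in turn.
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]

def describe(partition):
    """Render a partition in the notation of the text, e.g. {D, S OR R}."""
    return "{" + ", ".join(" OR ".join(block) for block in partition) + "}"

alternatives = [describe(p) for p in partitions(["D", "S", "R"])]
for alternative in alternatives:
    print(alternative)
```

The order of the conditions within each printed alternative may differ from the order used in the text; the groupings themselves are the same five.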
Each of the
five alternatives for C is associated
with a different inference from the observed state-space C to the unobserved state-space O.
Thus, there are five alternatives for the inference that will be made by the
model. Under the postulate of logic, one of these possibilities is correct. In
building your model, how will you discriminate the one correct alternative from
the four incorrect alternatives?
Under
ultra-optimization, the correct alternative is the one for which Shannon’s
measure is minimal; this alternative minimizes the missing
information about the state in O,
given the state in C. This is the
first of the two principles of reasoning under ultra-optimization. In making
this a principle of reasoning, one takes the position that “logic” and the
“probabilistic logic” are synonymous. We’ll take a look at the empirical basis
for this contention later.
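A sketch of the first principle follows, assuming the missing information about O given C is the conditional entropy H(O | C) in bits. The joint probabilities are hypothetical numbers chosen for illustration; in the procedure described in the text they would come from the second principle of reasoning, not from a table like this one.

```python
import math

# Hypothetical joint probabilities P(pressure state, outcome); they sum to 1.
JOINT = {
    ("D", "rain"): 0.24, ("D", "no rain"): 0.06,
    ("S", "rain"): 0.10, ("S", "no rain"): 0.30,
    ("R", "rain"): 0.03, ("R", "no rain"): 0.27,
}
OUTCOMES = ("rain", "no rain")

# The five alternatives for C, each a list of conditions (blocks of states).
ALTERNATIVES = {
    "{D, S, R}": [["D"], ["S"], ["R"]],
    "{D, S OR R}": [["D"], ["S", "R"]],
    "{S, D OR R}": [["S"], ["D", "R"]],
    "{R, D OR S}": [["R"], ["D", "S"]],
    "{D OR S OR R}": [["D", "S", "R"]],
}

def conditional_entropy(conditions):
    """Missing information about O given C, in bits: H(O | C)."""
    total = 0.0
    for block in conditions:
        # Joint probability of (this condition, each outcome).
        joint = [sum(JOINT[(state, o)] for state in block) for o in OUTCOMES]
        p_condition = sum(joint)
        for p in joint:
            if p > 0:
                total -= p * math.log2(p / p_condition)
    return total

scores = {name: conditional_entropy(c) for name, c in ALTERNATIVES.items()}
best = min(scores, key=scores.get)

for name, bits in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {bits:.4f} bits")
print("least missing information:", best)
```

With the joint probabilities held fixed, a finer set of conditions can never have more missing information than a coarser one, so this table alone always favors {D, S, R}. That may be one way to read the text's insistence that the assignment of probabilities must itself account for missing information.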
To identify
the one correct alternative, one needs to compute Shannon’s measure of each of
the alternatives. To compute it, one needs first to assign a numerical value to
the probability of each state in C
and to the probability of each state in O,
given each state in C; this value is
a real number lying in the interval between 0 and 1. In making this assignment,
one needs a solution to the so-called “inverse problem.” This problem results
from the fact that experimental science gives rise to frequency ratios. The
builder of a model needs to invert this relationship such that values are
assigned to probabilities, given frequency ratios.
For the sake
of illustration, we’ll focus on the problem of how to assign a value to the
probability of the state S OR R in the observed state-space C. Let us suppose that an experiment is
run in support of your model. In this experiment, 500 statistical events are
observed. In 200 of these events the state S
OR R is observed. By definition, the
frequency ratio of S OR R is 200 : 500.
The related quantity called the “relative frequency” is the quotient of the two
frequencies, namely 0.4.
The most
popular solution to the inverse problem is the so-called “straight rule.” Under
this rule, one assigns the relative frequency of a state to the probability of
this state. Thus, in our example, one assigns 0.4 to the probability of S OR
R.
The straight
rule is quite unsuitable for service as the second principle of reasoning, for
when it is used in conjunction with the first principle of reasoning, the
automatic result is for the model to fail! The cause of failure is the neglect,
under the straight rule, of missing information.
Neglect of missing information is the mistake made by the person who, having
observed a white swan, concludes that “all swans are white.”
The next
most popular solution to the inverse problem is called the “Bayes-Laplace
inverse probability theorem.” As it is derived from probability theory, one
subscribes to this theorem in subscribing to the probabilistic logic. However,
while the theorem itself is logical, the manner in which the theorem is
conventionally implemented is objectionable for being illogical. It is
illogical for identifying more than one inference, among a number of
alternatives, as the correct inference. As you may recall, logic postulates
that a single inference among the alternatives is the correct inference.
The second
principle of reasoning employs the Bayes-Laplace theorem, but provides a
logical implementation for it. This implementation employs maximization of the
missing information as the principle that discriminates the one correct
inference from the many incorrect inferences; the maximization is under
constraints, implemented mathematically, that express the available
information. Usually, the net effect of the constraints is to push the missing
information downward toward its minimum value of 0.
That the
missing information is maximized implies that the missing information possesses
a maximum. It possesses a maximum if and only if the states in the unobserved
state-space are “irreducible” in the sense that these states are at the
least level of abstraction. The irreducible
states are called “the ways in which a state can occur.”
Now, we’ll
conduct a thought experiment in which a number of independent statistical
events are observed. In our experiment, separate counts are kept of events in
which the state is S and in which the
state is R. The results from this
experiment remain hidden from you, the model builder. While you’ll lack
detailed results from the experiment, you’ll nonetheless discover facts in reference
to these results that will prove quite useful.
In one
observed event, it is a fact that the relative frequency of the state S, given that this state is S OR R,
will be 0 OR 1. In two observed events, the relative frequency will be 0 OR ½
OR 1. In three observed events, the relative frequency will be 0 OR 1/3 OR 2/3
OR 1. In N observed events, the
relative frequency will be 0 OR 1/N
OR 2/N OR …OR 1.
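The possibilities listed above can be generated for any number of events. This is a small sketch using exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def relative_frequency_possibilities(n_events):
    """Return the possible relative frequencies 0, 1/N, 2/N, ..., 1 for N events."""
    return [Fraction(k, n_events) for k in range(n_events + 1)]

for n in (1, 2, 3):
    print(n, [str(f) for f in relative_frequency_possibilities(n)])
```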
Now, let N increase without limit. The relative
frequency becomes known as the “limiting relative frequency.” The limiting
relative frequency will be 0 OR 1/N
OR 2/N OR …OR 1.
The limiting
relative frequency may be any one of the numbers in the sequence 0, 1/N, 2/N,…,1.
Each number in the sequence is a “limiting relative frequency possibility.” The
set {0, 1/N, 2/N,…,1} of limiting relative frequency possibilities is an example
of an unobserved state-space. Your model makes an inference to this unobserved
state-space from the observed state-space { 0 OR 1/N OR 2/N OR …OR 1 } .
Note the close relationship between the unobserved and the observed
state-spaces. The single element of the observed state-space is abstracted from the elements of the unobserved
state-space.
Let the unobserved state-space {0, 1/N, 2/N,…,1} be designated
by F. Each element of F is irreducible and the corresponding
observed state-space contains a single element. It follows that a maximum
exists in the missing information about F.
Thus, values may be assigned to the set P(F) of probabilities of the elements of F by maximizing the missing information,
under constraints. It is apt to reference P(F) as a spectral density function, where P(.) represents the spectral density and F represents
the associated frequency. P(.) is the proportion in a statistical
ensemble while F is the proportion in
a statistical population.
Now let us
imagine that an experiment is conducted in which the results become known to
you, the model builder. In this experiment, the count n(S) is kept of events in
which the state is S and the count n(R)
is kept of events in which the state is R.
The frequency ratio of the state S,
given that the state is S OR R, is n(S) : [n(S)
+ n(R)]. The relative frequency of S
given S OR R is n(S) / [n(S) + n(R)].
With the
availability to you of the frequency ratio data, there are two constraints on
maximization of the missing information about the limiting relative frequency.
They are: 1) the frequency ratio data and 2) noise. Acting through the
Bayes-Laplace theorem, the frequency ratio data reduce the missing information
about the limiting relative frequency, with the result that the function P(F)
peaks about a particular limiting relative frequency. The noise increases the
missing information about the limiting relative frequency, making the function P(F)
more ambiguous about the limiting relative frequency.
The
objection to the manner in which the Bayes-Laplace theorem is conventionally
implemented relates to an input to the theorem. It is the spectral density function
which we’ll designate by Pʹ(F). Traditionally, but misleadingly, Pʹ(F) is called the “prior distribution” while P(F) is called the
“posterior distribution.” Like P(F), Pʹ(F) maps the set of limiting relative
frequency possibilities F to a set of
probabilities. In the absence of frequency ratio data, P(F) is identical to Pʹ(F). Otherwise, the two functions differ.
The
objection arises when Pʹ(F) is taken to be independent of the
frequency ratio data. If Pʹ(F) is not grounded in frequency ratio
data, critics argue, the selection of this function must be arbitrary. If it is
arbitrary, though, more than one such function exists. A result from the
arbitrariness is for several different assignments of numerical values to be
made to P(F). Each assignment makes a different inference. Each of the
several inferences is implied to be correct. That more than one inference is
implied to be correct violates the postulate of logic.
Under
ultra-optimization, Pʹ(F) is determined by frequency ratio
data. Maximization of the missing information, under the constraints, assigns
unique sets of values to Pʹ(F) and P(F); that these
assignments are unique conforms the methods of assignment to the postulate of
logic, thus eliminating the usual objection to use of the Bayes-Laplace
theorem. Using the terminology of communications engineering, it is apt to call
the assignment to Pʹ(F), the spectral density function of the
“noise” and the assignment to P(F), the spectral density function of the
“signal plus noise.”
As has been
shown, maximization of the missing information about the limiting relative
frequency of S OR R, under the constraints, yields the
function P(F), where F is the set of
limiting relative frequency possibilities for the state S given that the state is S
OR R and P is the associated set of probabilities. What we need, though, is
the assignment of a value to the probability that the state is S given that the state is S OR R.
By a theorem from probability theory, this value is the expected (average)
value of F.
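The last step, taking the expected value of F under P(F), can be sketched numerically. The example below applies the Bayes-Laplace theorem on a discrete grid of limiting relative frequency possibilities. It assumes a flat Pʹ(F) purely for illustration; this is not the entropy-maximizing assignment to Pʹ(F) that the text attributes to ultra-optimization. The counts, grid size and function names are likewise hypothetical.

```python
def posterior_over_frequencies(n_s, n_r, grid_size=1000, prior=None):
    """Return (grid, P(F)) for the limiting relative frequency of S given S OR R."""
    grid = [k / grid_size for k in range(grid_size + 1)]
    if prior is None:
        # ASSUMPTION: flat P'(F), used only as a placeholder.
        prior = [1.0] * len(grid)
    # Likelihood of the counts for each candidate frequency f
    # (the binomial coefficient cancels in the normalization).
    likelihood = [f ** n_s * (1.0 - f) ** n_r for f in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

def expected_frequency(grid, posterior):
    """Expected (average) value of F under P(F)."""
    return sum(f * p for f, p in zip(grid, posterior))

# Hypothetical counts: n(S) = 120 and n(R) = 80.
grid, p_of_f = posterior_over_frequencies(n_s=120, n_r=80)
probability_s_given_s_or_r = expected_frequency(grid, p_of_f)
print(round(probability_s_given_s_or_r, 4))
```

With a flat Pʹ(F) the result lies close to Laplace's value [n(S) + 1] / [n(S) + n(R) + 2] and not at the relative frequency n(S) / [n(S) + n(R)] that the straight rule would assign; the two agree only as the counts grow large.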
We have
completed our derivation of a procedure for assignment of a value to the
probability that the state is S,
given that the state is S OR R. A similar derivation yields a
procedure for assignment of a value to the probability that the state is S, given that the state is S OR D.
In the next
step under the second principle of reasoning, a value is assigned to the
probability that the state is S. It
can be shown that this value is:
1 / [ 1/P(S given S OR R)
+ 1/P(S given S OR D) – 1 ]
where P(S
given S OR R) designates the probability that the state is S, given that it is S OR R and P(S
given S OR D) designates the probability that the state is S, given that it is S OR D. A similar
derivation yields a procedure for assignment of a value to the probability that
the state is R and to the
probability that the state is D.
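The formula for the probability of S can be checked with a short calculation. Since 1/P(S given S OR R) = 1 + P(R)/P(S) and 1/P(S given S OR D) = 1 + P(D)/P(S), their sum less 1 is [P(S) + P(R) + P(D)] / P(S) = 1/P(S). The sketch below confirms this for hypothetical probabilities; the function and variable names are illustrative.

```python
def probability_of_s(p_s_given_s_or_r, p_s_given_s_or_d):
    """Recover P(S) from the two conditional probabilities, per the formula in the text."""
    return 1.0 / (1.0 / p_s_given_s_or_r + 1.0 / p_s_given_s_or_d - 1.0)

# Hypothetical probabilities of the three mutually exclusive states; they sum to 1.
p_d, p_s, p_r = 0.3, 0.4, 0.3

recovered = probability_of_s(
    p_s_given_s_or_r=p_s / (p_s + p_r),
    p_s_given_s_or_d=p_s / (p_s + p_d),
)
print(recovered)
```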
In the final
step, under the second principle of reasoning, one assigns a value to the
probability that the state is S OR R. As S and R are mutually exclusive, this value is the sum of the values
assigned to the probability of S and
to the probability of R.
Remembering
An aid to
remembering what you’ve learned is to translate key mathematical concepts into
English prose. It is apt to call the information about the state in O, given the state in C, the “knowledge”; in minimizing the
missing information about the state in O,
given the state in C, the first
principle of reasoning maximizes the knowledge. In maximizing the missing
information about the limiting relative frequency, under the constraints, the
second principle of reasoning satisfies the principle that has been called
“honesty in inferences,” for the effect is to avoid presumption of information
not possessed about the limiting relative frequency. Thus, one can translate the
first principle of reasoning to:
Maximize the knowledge!
and the
second principle to:
Keep the inferences honest!
When a model
is built under ultra-optimization, the elements of the state-space C are abstractions
whose definitions satisfy the principles of reasoning. Under the theory of
knowledge of Ronald Christensen, such an abstraction is what one means by the
word “pattern.” Thus, an effect of ultra-optimization is “pattern discovery”
and the maximum possible knowledge is created by pattern discovery. To put this
more poetically:
Pattern discovery is motivated by the
quest for knowledge.
The latter
statement is an English translation of Christensen’s theory of knowledge. As
we’ve seen, Christensen’s theory has solid theoretical support. As we’ll see
later, this theory has solid empirical support.
A conclusion
reached earlier conflicts with a belief said typically to be held by
scientists. According to people who have studied the matter, scientists
typically believe that the probability of a state is the limiting relative
frequency of this state; this is the belief called “frequentism.” However, the
conclusion was reached that the probability of a state is the expected value of the limiting relative
frequency of this state. The expected value of the limiting relative frequency
is the equivalent of the limiting relative frequency if and only if the missing
information about the limiting relative frequency is nil.
Frequentism
arose in response to the illogic of the conventional implementation of the
Bayes-Laplace theorem. Believers in frequentism avoided this illogic by various
stratagems for eliminating the influence of the so-called “prior distribution” Pʹ(F) upon the assignment of a value to the probability of a state.
This was accomplished via solutions to the inverse problem that made Pʹ(F) superfluous; one of these solutions was the straight rule.
From the
perspective offered by ultra-optimization, it can be seen that frequentism
assumed the noise out of existence, for the so-called “prior distribution” Pʹ(F) is more aptly described as “the spectral density function of the
noise.” Lacking the noise, information was presumed which was missing. A result
from presumption of the missing information was for models built under
frequentism to be prolific generators of incorrect inferences.
After its
founding in the 19th century, frequentism went on to dominate the
field of mathematical statistics in the 20th century. That it
remains highly influential may well account for the observation that scientists
typically believe in frequentism.
With the
perspective offered by ultra-optimization, though, it becomes clear that
scientific research is a battle against noise. Ultra-optimized models
consistently excel because they wage this battle in the most efficient possible
manner. By assuming noise out of existence, frequentism sets up a situation in
which it is impossible to wage this battle in any coherent fashion.
If one
believes a myth that inhabits the literature of scientific methodology, a
battle rages between believers in frequentism, the so-called “frequentists,”
and believers in the alternate methodology called “Bayesianism,” the so-called
“Bayesians.” According to the myth, this battle never ends, for frequentists
endlessly point out the undoubted flaws in Bayesianism while Bayesians
endlessly point out the undoubted flaws in frequentism. Underlying the myth is
the assumption that there is not an alternative to Bayesianism or frequentism.
Is this
assumption correct? In demystifying this issue, it is helpful to identify
terminology that may be misleading. As the theorem that was left to us by
Thomas Bayes is rooted in probability theory, there is nothing wrong with this
theorem so long as the premises of probability theory hold true. The parties to
the mythological battle do not dispute the truthfulness of these premises. The
flaws of Bayesianism lie not in Bayes’ theorem but in
the necessity for a so-called “prior distribution.” In the mythological battle,
frequentists accuse Bayesians of selecting this distribution arbitrarily. Where
the accusation is accurate, the selection is indeed arbitrary.
However, it is misleading to call all opponents of frequentism
“Bayesians,” for it is possible to employ Bayes’ theorem without selecting the
prior distribution arbitrarily. In fact, selection without arbitrariness is a
feature of ultra-optimization.
The
mythological battle between the frequentists and the Bayesians pits one flawed
methodology against another. The assumption supporting this mythology, that
there is no alternative to frequentism or Bayesianism, is false. The
methodology called “ultra-optimization” exists as an alternative and it
contains none of the shortcomings of the mythological combatants.
Empirical basis
A scientific
theory called “Christensen’s theory of knowledge” is associated with
ultra-optimization. This theory predicts that models built under
ultra-optimization consistently excel.
Models built
under ultra-optimization include these:
o the syllogisms of deductive logic,
o the theory of fair gambling devices,
o the theory of heat called thermodynamics and,
o the theory of communication.
Tests of
these models against alternatives have frequently been conducted, over periods
ranging from decades to millennia. Data from tests of additional models are
presented in the journal article entitled “Entropy Minimax Multivariate
Statistical Modeling – II Applications” (Int.
J. General Systems, 1986, Vol 12, 227-305). All of the empirical evidence
is consistent with Christensen’s theory.
Implications for science
The
strengths of the theoretical and empirical bases for ultra-optimization imply
that:
o the problem of induction is solved,
o “logic” is synonymous with “probabilistic logic,”
o Shannon’s measure is the measure of inferences,
o the principles of reasoning are ultra-optimization,
o the theory of knowledge is Christensen’s,
o scientific research is a battle against noise and,
o by assuming noise out of existence, the widely held belief called “frequentism” precludes the possibility of conducting research in a coherent fashion.
Applications
Questions
addressed by construction of models under ultra-optimization include:
o whether a drug will be found to retard lymphoid leukemia, lymphocytic leukemia or melanocarcinoma in mice, based on the drug’s physical, chemical and biological features,
o whether a patient will be found to have heart disease subsequent to his/her electrocardiogram (ECG),
o whether an ECG waveform indicates a normal or an abnormal heartbeat,
o which features of an ECG or other waveform contain the most information about outcomes,
o whether a biopsy will reveal prostate cancer, conditioned on a patient’s level of prostate-specific antigen (PSA) plus the values of other independent variables,
o whether a biopsy will reveal cervical cancer, based on spectral analysis of data from tissue fluorescence,
o whether a biopsy will reveal breast cancer, based on electrical potentials produced by the patient’s heartbeats,
o whether patients with lymphoma, chronic granulocytic leukemia or prostate cancer have high or low survival risks,
o whether patients surgically treated for coronary artery disease have high or low survival risks, based on catheterization and clinical data,
o whether a paroled prison inmate will return to prison,
o whether depression and related psychological states are related to early childhood memories,
o whether nuclear reactor fuel will be sufficiently deformed under accident conditions to obstruct coolant flow,
o whether nuclear reactor fuel will be found to be leaking radioactive substances if removed from a reactor and tested,
o whether a gasoline storage tank will be found to be leaking a carcinogen into an aquifer or an explosive into adjacent basements, if dug up and tested and,
o how a photographic or video image should be classified as to type.
Decisions
that have been supported by models built under ultra-optimization include:
o the course of treatment for non-Hodgkin’s lymphoma,
o the course of treatment for disorders of the cervical spine,
o which factors (now referenced in medicine as International Prognostic Indices, IPIs) indicate high risk for patients with lymphoma, chronic granulocytic leukemia or prostate cancer, for consideration in treatment selection,
o whether to submit a request for approval of a diagnostic technique for breast cancer to the U.S. Food and Drug Administration (FDA),
o whether to submit a request for approval of a diagnostic technique for cervical cancer to the FDA,
o whether the U.S. Nuclear Regulatory Commission should require further research before certifying that nuclear reactors are adequately safe from loss-of-coolant accidents,
o whether to restart a nuclear reactor containing parts that might fail in service,
o whether to suspend licensing of nuclear reactors,
o when to replace a leakage-prone gasoline storage tank,
o the level of water that should be kept behind a dam, in light of the long-range forecast for precipitation,
o how an electric utility should plan for demand for air conditioning and,
o whether an electric utility’s rate should be adjusted, in light of the long-range forecast for precipitation.
Factors
discovered by ultra-optimization are embedded in the medical standard for the
classification of patients with non-Hodgkin’s lymphoma.
Entropy minimax
In the
literature, ultra-optimization by minimization or maximization of the missing
information in each inference made by a model is usually referenced by the
phrase “entropy minimax”; this is the descriptor given to ultra-optimization by
its developer, Ronald Christensen. This document employs “ultra-optimization”
in preference to “entropy minimax” and “missing information” in preference to
“entropy.” It does this on the basis of evidence that this usage communicates
more effectively to a wide audience. There is evidence of confusion among
non-scientists and many scientists about what is meant by “entropy.”
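For readers unfamiliar with the term, the “missing information” in a discrete probability distribution can be computed with Shannon’s standard formula, the negative sum of p times log2(p) over the outcomes. The sketch below is a minimal illustration of that formula only. The function name and the example distributions are assumptions, and the sketch does not reproduce Christensen’s entropy minimax procedure.

```python
import math


def missing_information(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution.

    Outcomes with zero probability contribute nothing to the total.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical examples: a fair coin versus a certain outcome.
print(missing_information([0.5, 0.5]))
print(missing_information([1.0, 0.0]))
```

The fair coin leaves the most information missing about the outcome, while the certain outcome leaves none.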
Offerings
In coming up
to speed on ultra-optimization and in building models, it is highly
cost-effective to get outside help. KnowledgeToTheMax
supplies this market with services that include:
o conduct of seminars,
o consultancy on science policy,
o consultancy on curriculum reform in higher education,
o management of theoretical aspects of scientific studies and,
o construction of ultra-optimized models.
In cases in
which there is a degree of mechanistic understanding of the phenomenon being
modeled, KnowledgeToTheMax can
augment the supply of information with inputs from a mechanistic model.
A portion of
the technology used in building an ultra-optimized model is proprietary. In
constructing a model, KnowledgeToTheMax
operates under a technology-sharing agreement with the developer of
ultra-optimization, Dr. Ronald Christensen. In doing so, KnowledgeToTheMax
brings Christensen’s four decades of experience to
bear on the problem of how to ultra-optimize the construction of a model. Under
its agreement with Christensen, the ability of KnowledgeToTheMax to build
models for firms in the for-profit
sector is subject to Christensen’s approval. The firm has an unrestricted license
to build models for firms in the not-for-profit sector.
News
On Oct. 23,
2008, Terry Oldberg of KnowledgeToTheMax presented
a lecture entitled “Information Theory: Maximizing Knowledge” to a meeting of
the American Nuclear Society in San Francisco, California.
On Nov. 20,
2008, Oldberg presented a lecture entitled “Maximizing Knowledge” to a meeting
of the American Chemical Society in Santa Clara, California. The announcement
for the meeting is posted here.
On Feb. 11,
2009, Oldberg presented a lecture entitled “Maximizing Knowledge” to a meeting
of the American Society for Quality in Santa Clara, California.
On May 7,
2009, Oldberg will present a lecture entitled “Maximizing Knowledge” to a
meeting of the American Institute of Chemical Engineers in Berkeley,
California.
Free lecture
KnowledgeToTheMax offers a free one-hour lecture on
the topic of “Maximizing Knowledge” to groups of 10 or more people in the San
Francisco Bay Area of California. It offers the same lecture to groups outside
the Bay Area for a fee.
Bibliography
A
bibliography is available by clicking here. The literature
is large and not user-friendly. Proofs of key theorems are absent. Hence, it
would be far more cost-effective to engage a tutor than to attempt to climb the
learning curve unaided.
For further
information, please contact the owner-operator of KnowledgeToTheMax, Terry Oldberg. He may be reached at terry@KnowledgeToTheMax.com or
1-650-947-0811 (Los Altos Hills, California).