Much of what is presented here applies both to the lambda calculus and to combinatory logic.
There is often a choice of reduction order, for example
(λx.(+ x x) (+ 1 2))
Do we want to go λx.(+ x x) (+ 1 2) -> (+ (+ 1 2) (+ 1 2)) -> (+ 3 3) -> 6, or should we prefer λx.(+ x x) (+ 1 2) -> λx.(+ x x) 3 -> (+ 3 3) -> 6? The first substitutes the unevaluated argument into the body; the second reduces the argument first.
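The two orders can be mimicked in Python (a sketch with invented names: `plus_1_2` stands for the argument (+ 1 2)), using zero-argument functions as unevaluated arguments:

```python
# Sketch: the two reduction orders for (λx.(+ x x)) (+ 1 2),
# modelled with Python thunks (zero-argument functions).

trace = []                      # records each evaluation of (+ 1 2)

def plus_1_2():
    trace.append("(+ 1 2)")
    return 1 + 2

def aor():
    """Applicative order: reduce the argument first, then apply."""
    x = plus_1_2()              # argument evaluated once, up front
    return x + x                # the body (+ x x) uses the value

def nor():
    """Normal order, without sharing: substitute the unevaluated
    argument, so the body becomes (+ (+ 1 2) (+ 1 2))."""
    return plus_1_2() + plus_1_2()

assert aor() == 6 and len(trace) == 1   # AOR: one evaluation of the argument
trace.clear()
assert nor() == 6 and len(trace) == 2   # NOR without sharing: two evaluations
```

Both routes reach 6; they differ only in when, and how often, the argument is evaluated.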
redex = a reducible function application.
An expression is in normal form when it contains no further redexes.
leftmost redex: that redex whose abstraction or predefined function is textually to the left of all other redexes. There is at most one of these.
rightmost redex: that redex whose abstraction or predefined function is textually to the right of all other redexes. There is at most one of these.
outermost redex: a redex which is not contained in any other redex.
innermost redex: a redex which contains no other redexes. There can be several of these, hence we talk of leftmost innermost etc.
Applicative-order reduction (AOR) : reduce the leftmost innermost redex first
Normal-order reduction (NOR) : reduce the leftmost outermost redex first
And there are two theorems to help us.
We will write P -> Q to mean that we can reduce P to Q by one use of a reduction rule (alpha-delta), and P ->* Q to mean that we can reduce P to Q by a sequence of zero or more uses of reduction rules (alpha-delta).
There is a property that any rewrite relation => might have. => is confluent iff for all E1, E2, E3, if E1 =>* E2 and E1 =>* E3 then there is an E4 such that E2 =>* E4 and E3 =>* E4. This is often called the diamond property, and is depicted as a diamond: E1 at the top reducing to E2 and E3 on either side, both of which reduce to E4 at the bottom.
This property is important inasmuch as it means a) that normal forms are unique and b) that even if a rewrite 'goes off in the wrong direction' further rewriting can bring it back to the normal form (so no backtracking is needed).
Our reduction relation for the lambda calculus (or combinatory logic) has the property of being confluent. (Proved by Church and Rosser as their Theorem 1).
The two theorems are:-
The Standardization Theorem, or Church-Rosser II: if an expression E has a normal form, then reducing the leftmost outermost redex (NOR) at each stage in the reduction of E is guaranteed to reach that normal form (up to alphabetic equivalence).
And Church-Rosser I, which has as a consequence: if an expression E can be reduced in two different ways to two normal forms, then these normal forms are the same (up to alphabetic equivalence).
This seems to mean that we should prefer Normal Order Reduction. However there is a problem with NOR. The lambda calculus itself has no notion of sharing, all beta-reduction is reduction by textual replacement, and this might mean gross inefficiencies, so for example
λx.(+ x x) (+ 1 2) -> (+ (+ 1 2) (+ 1 2)) -> (+ 3 3) -> 6
the argument (+ 1 2) will get evaluated twice.
What we are trying to balance in the whole approach is actually terminating (favoured by NOR) with being efficient (favoured by AOR). The modern style is to attempt, by various tricks and subterfuges, to produce an efficient NOR. The main way of doing this is by keeping track of shared expressions and evaluating them only once. NOR plus sharing is part of lazy evaluation.
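A minimal sketch of that sharing trick in Python: a memoizing thunk (the `Thunk` class here is illustrative, not a standard API) makes the substituted argument a single shared, cached computation, so NOR's duplicated (+ 1 2) is evaluated only once.

```python
# Sketch: lazy evaluation = normal order + sharing. A thunk that caches
# its result ensures the shared argument is computed at most once.

class Thunk:
    def __init__(self, compute):
        self.compute = compute
        self.done = False
        self.value = None

    def force(self):
        if not self.done:               # first demand: compute and cache
            self.value = self.compute()
            self.done = True
        return self.value               # later demands: reuse the cache

count = 0

def plus_1_2():
    global count
    count += 1
    return 1 + 2

shared = Thunk(plus_1_2)
result = shared.force() + shared.force()   # the body (+ x x), with sharing
assert result == 6
assert count == 1    # (+ 1 2) was evaluated only once, despite two uses
```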
Also, if reasoning is to be done about programs -- in particular, concerning the substitution of equal expressions -- we usually want to know that normal forms are reached, and that comes from using NOR or lazy evaluation.
Another problem that we have with reduction in the lambda calculus is that of capturing. Let us remind ourselves of capturing
(λy.λx.(+ y x) x) might mistakenly be reduced to λx.(+ x x)
and this is wrong (although it can be avoided by renaming the bound variables). In this case there is a free x as the argument of the original application. This can never arise in a functional-programming use of reduction, where all top-level arguments are going to be constants or functions or the like (no free variables at the top level). However, exactly the same effect can be produced without initial free variables by reducing within the body of a lambda abstraction.
λx.(λy.λx.(+ y x) x )
is not in normal form. But if we attempt to reduce within its body we might get λx.(λx.(+ x x)), which is wrong. The modern response to this comes from considering how the free variables and capturing arise. We are starting here with an expression with no free variables in it (which presumably will always be the case with a complete program), yet should we try to reduce inside the body of an abstraction we run the risk of capturing. And you might wonder why we would want to do such a reduction at all. An abstraction is a function of one formal argument; there is no point in reducing it until we know what its actual argument is. When the actual argument appears we can reduce using it; then we will not be reducing within the body of an abstraction, and hence there will be no problem with capturing. To review: do nothing with, for example,
λx.(λy.λx.(+ y x) x )
until an argument for it appears, say,
(λx.(λy.λx.(+ y x) x ) 7)
then reduce for that argument -> (λy.λx.(+ y x) 7) -> λx.(+ 7 x) and there is no problem with capturing.
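To see the danger concretely, here is a Python sketch of deliberately naive (capture-unsafe) substitution on a small term representation (all names here are this sketch's own, not from any real library); applied to (λy.λx.(+ y x)) x it produces exactly the wrong λx.(+ x x):

```python
# Terms: ('var', name), ('lam', name, body), ('app', fun, arg).

def naive_subst(term, name, value):
    """Substitute value for free occurrences of name -- WITHOUT the
    capture check, to exhibit the bug discussed above."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:          # name is re-bound here: stop
            return term
        # Missing step: if term[1] occurs free in value we must first
        # alpha-rename the bound variable; skipping that allows capture.
        return ('lam', term[1], naive_subst(term[2], name, value))
    return ('app', naive_subst(term[1], name, value),
                   naive_subst(term[2], name, value))

# Beta-reducing (λy.λx.(+ y x)) x substitutes the FREE variable x
# for y in the body λx.(+ y x):
body = ('lam', 'x', ('app', ('app', ('var', '+'), ('var', 'y')), ('var', 'x')))
wrong = naive_subst(body, 'y', ('var', 'x'))

# The free x has been captured by the inner λx: we got λx.(+ x x).
assert wrong == ('lam', 'x',
                 ('app', ('app', ('var', '+'), ('var', 'x')), ('var', 'x')))
```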
Hence An expression E is in weak head normal form (WHNF) iff
either i) E is a constant,
or ii) E is a variable,
or iii) E is λx.E' for any E'
or iv) E is of the form (F E1 E2 .. En) for a constant ('predefined') function F of arity k>n (i.e. there are not enough arguments to reduce).
Roughly, an expression is in WHNF if it is either in normal form or it is an irreducible unapplied function or abstraction, perhaps with reducible arguments. When we are doing reductions, we reduce to weak head normal form, not to normal form. Should an argument appear for an expression in WHNF, then we reduce the expression using normal order reduction. The advantage of reducing to WHNF (as opposed to full normal form) is that it avoids performing beta reductions in the presence of free variables, and thus avoids the name-clash problem and the need to look for alpha rewrites.
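Clauses i)-iv) can be transcribed directly into a little Python checker (a sketch: the term representation and the arity table are assumptions made here for illustration):

```python
# Terms: ('con', value), ('var', name), ('lam', name, body),
#        ('app', fun, arg). Predefined functions appear as ('var', f)
# with an entry in an assumed arity table.

ARITY = {'+': 2, 'cons': 2}     # illustrative predefined functions

def spine(term):
    """Unwind applications: (F E1 .. En) -> (F, [E1, .., En])."""
    args = []
    while term[0] == 'app':
        args.insert(0, term[2])
        term = term[1]
    return term, args

def is_whnf(term):
    if term[0] in ('con', 'var', 'lam'):    # clauses i) - iii)
        return True
    head, args = spine(term)
    # clause iv): a predefined function with too few arguments
    return head[0] == 'var' and head[1] in ARITY and len(args) < ARITY[head[1]]

# λx.(+ 1 2) is WHNF (clause iii) even though its body is reducible:
assert is_whnf(('lam', 'x',
                ('app', ('app', ('var', '+'), ('con', 1)), ('con', 2))))
# (+ 1) is WHNF: only one argument for a function of arity 2:
assert is_whnf(('app', ('var', '+'), ('con', 1)))
# (+ 1 2) is a redex, hence not WHNF:
assert not is_whnf(('app', ('app', ('var', '+'), ('con', 1)), ('con', 2)))
```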
There is another place where reduction arises and that is with constructors. Take, for example, the CONS constructor function used in list construction. Most programming systems in use up until the 1980s used applicative-order, strict constructors, so that the constructor evaluated its arguments straight away. But various interesting data structures can be built if the constructors do not evaluate this way. For example, say one wanted a list of all the positive integers. If this came in its infinite entirety it would not be much good (because it would swamp the computer); however, if the list consisted of a single number, and the promise of the next infinity of numbers -- where, when an attempt was made to cash the promise, all you got was the next number and another promise -- all would be fine. An infinite list and its accessors would look something like
n-andlarger n =df. cons n (promise of (n-andlarger (n+1)))
head list ... standard
tail list =df. cash the promise of the tail of list
What you need to do this sort of thing is some way of promising and cashing promises. It can be done in LISP (by delaying and forcing). But really all that is needed is for the constructor not to evaluate its arguments until they are needed, for the constructor to be lazy.
n-andlarger n =df. lazyCons n (n-andlarger (n+1))
head list ... standard
tail list ... standard
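The same definitions can be sketched in Python, with a promise represented as a zero-argument function that is cashed only when the tail is asked for (names like `lazy_cons` are this sketch's own, not a standard API):

```python
# Sketch: a lazy CONS. The tail is stored as a promise (a thunk)
# and only cashed when tail is applied.

def lazy_cons(hd, tail_thunk):
    return (hd, tail_thunk)         # tail_thunk: zero-argument function

def head(lst):
    return lst[0]

def tail(lst):
    return lst[1]()                 # cash the promise

def n_and_larger(n):
    # The recursive call is wrapped in a lambda, so only one cell
    # of the 'infinite' list ever exists at a time.
    return lazy_cons(n, lambda: n_and_larger(n + 1))

nats = n_and_larger(1)
assert head(nats) == 1
assert head(tail(nats)) == 2
assert head(tail(tail(nats))) == 3   # only three cells were ever built
```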
Another example where this might be useful is with successive approximation (e.g. with Newton-Raphson approximation) -- the algorithm will supply 'infinitely many' approximations, but thanks to lazy constructors these will appear just one at a time.
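A Python sketch of that idea, with the stream again represented as (value, promise) pairs: `approximations` builds the 'infinite' stream of Newton-Raphson guesses for a square root, and a consumer pulls guesses one at a time until two successive ones agree (the names `approximations` and `within` are this sketch's own):

```python
# Sketch: Newton-Raphson square-root guesses as a lazy stream.
# Each cell is (guess, promise-of-the-rest).

def approximations(x, guess=1.0):
    better = (guess + x / guess) / 2.0        # Newton-Raphson step
    return (guess, lambda: approximations(x, better))

def within(eps, stream):
    """Pull approximations until two successive ones are within eps."""
    a, rest_thunk = stream
    nxt = rest_thunk()                        # cash one promise
    b = nxt[0]
    if abs(a - b) <= eps:
        return b
    return within(eps, nxt)

root = within(1e-10, approximations(2.0))
assert abs(root - 2.0 ** 0.5) < 1e-9          # converged to sqrt(2)
```

The producer never decides how many approximations to make; the consumer's demand drives the computation.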
(*You should check here Abelson and the Sussmans on stream processing. Many consider Hal Abelson's, Jerry Sussman's and Julie Sussman's Structure and Interpretation of Computer Programs (MIT Press, 1984; ISBN 0-262-01077-1) one of the finest books, if not the finest book, on computer science ever written (and I, as a modest and insignificant outsider, agree entirely). They have a section on 'stream processing' that uses this wrapping-and-not-evaluating technique. The book has its own website and is available online.*)
Constructors can also be included in the definition of weak head normal form. A constructor is normally taken to be a predefined function, but one without a reduction rule. So, whenever you meet, say, C <arg1> <arg2> .. <argn>, where C is a constructor, the whole expression is not a redex even though some of the arguments might be -- C <arg1> <arg2> .. <argn> is in weak head normal form.
Lazy evaluation is also needed to treat interactive programs in a functional way. The problem here is that the User types, say, some input and the program produces some output. Were we to say that the output is simply a function of the particular localized input, we would be in trouble: the User can type 'F', say, and produce an output, then 5 minutes later type another 'F' and produce an entirely different output, so output['F'] does not equal output['F'] (which it should, to yield the referential transparency characteristic of functional programs). The solution is to say that the output sequence (or list, or stream) is a function of the input sequence (or list, or stream); then, for example, output['F'::'G'::'H'::'G':: ...] will equal output['F'::'G'::'H'::'G':: ...]. Now a slightly different problem arises: the input stream started some time in the past, has continued to now, and presumably will continue into the future; the output function, which needs this stream as its argument, cannot have all of it now. However, the output function can have any or all of the input stream that has put in an appearance up to the present. So, if the input stream is conceived of as being built by lazy constructors and accessed by lazy accessors, all is well.
eager evaluation : do anything you can, arguments first, straight away (always strict).
lazy evaluation : don't do anything unless you have to. NOR to WHNF + sharing + lazy constructors; not strict.