What is a Compiler?
Recall from your study of assembly language or computer organization the kinds of instructions that the computer’s CPU is capable of executing. In general, they are very
simple, primitive operations. For example, there are often instructions which do the
following kinds of operations: (1) add two numbers stored in memory, (2) move numbers
from one location in memory to another, (3) move information between the CPU and
memory. But there is certainly no single instruction capable of computing an arbitrary
expression such as ((x-x0)^2 + (x-x1)^2)^(1/2), and there is no way to do the following
with a single instruction:
if (array6[loc]<MAX) sum = 0; else array6[loc] = 0;
These capabilities are implemented with a software translator, known as a
compiler. The function of the compiler is to accept statements such as those above and
translate them into sequences of machine language operations which, if loaded into
memory and executed, would carry out the intended computation. It is important to bear
in mind that when processing a statement such as x = x * 9; the compiler does not perform
the multiplication. Instead, the compiler generates, as output, a sequence of instructions that includes a "multiply" instruction.
Languages which permit complex operations, such as the ones above, are called
high-level languages, or programming languages. A compiler accepts as input a
program written in a particular high-level language and produces as output an equivalent
program in machine language for a particular machine, called the target machine. We say
that two programs are equivalent if they always produce the same output when given the
same input. The input program is known as the source program, and its language is the
source language. The output program is known as the object program, and its language
is the object language. A compiler translates source language programs into equivalent
object language programs.
Some examples of compilers are:
A Java compiler for the Apple Macintosh
A COBOL compiler for the SUN
A C++ compiler for the Apple Macintosh
If a portion of the input to a C++ compiler looked like this:
A = B + C * D;
the output corresponding to this input might look something like this:
LOD R1,C       // Load the value of C into reg 1
MUL R1,D       // Multiply reg 1 by the value of D
STO R1,TEMP1   // Store the result in TEMP1
LOD R1,B       // Load the value of B into reg 1
ADD R1,TEMP1   // Add the value of TEMP1 to reg 1
STO R1,TEMP2   // Store the result in TEMP2
MOV A,TEMP2    // Move TEMP2 to A, the final result
The compiler must be smart enough to know that the multiplication should be
done before the addition even though the addition is read first when scanning the input.
The compiler must also be smart enough to know whether the input is a correctly formed
program (this is called checking for proper syntax), and to issue helpful error messages if
there are syntax errors.
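The two requirements above (honoring operator precedence and rejecting malformed input) can be illustrated with a minimal sketch in Python. It is not how any particular compiler is built; it assumes a tiny source language with only identifiers, +, *, parentheses, and a single assignment, and it emits the LOD/MUL/ADD/STO/MOV mnemonics used in the listing above. The names tokenize, Parser, and compile_assignment are hypothetical, chosen for this illustration.

```python
import re

IDENT = r"[A-Za-z_]\w*"

def tokenize(text):
    """Split the input into identifiers and the symbols = + * ; ( )."""
    tokens = re.findall(IDENT + r"|[=+*;()]", text)
    # Reject any character that the pattern above skipped over.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("unexpected character in input")
    return tokens

class Parser:
    """Recursive-descent parser; '*' binds tighter than '+'."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def identifier(self):
        tok = self.take()
        if not re.fullmatch(IDENT, tok):
            raise SyntaxError(f"expected an identifier, found {tok!r}")
        return tok

    def statement(self):          # statement -> identifier '=' expr ';'
        target = self.identifier()
        self.take("=")
        value = self.expr()
        self.take(";")
        return target, value

    def expr(self):               # expr -> term { '+' term }
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = ("+", node, self.term())
        return node

    def term(self):               # term -> factor { '*' factor }
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())
        return node

    def factor(self):             # factor -> identifier | '(' expr ')'
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        return self.identifier()

def compile_assignment(text):
    """Return a list of instructions for one assignment statement."""
    target, value = Parser(tokenize(text)).statement()
    code = []
    temps = 0

    def gen(node):
        nonlocal temps
        if isinstance(node, str):     # a variable: no code needed
            return node
        op, left, right = node
        left_loc = gen(left)          # operands are generated first,
        right_loc = gen(right)        # so inner operations come out first
        code.append(f"LOD R1,{left_loc}")
        code.append(f"{'ADD' if op == '+' else 'MUL'} R1,{right_loc}")
        temps += 1
        code.append(f"STO R1,TEMP{temps}")
        return f"TEMP{temps}"

    result = gen(value)
    code.append(f"MOV {target},{result}")
    return code

print("\n".join(compile_assignment("A = B + C * D;")))
```

For the input A = B + C * D; this sketch prints the same seven instructions as the listing above, with the multiplication emitted before the addition. An input such as A = B + ; raises a SyntaxError instead of producing code.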
Sample Problem 1.1(a)
Show the output of a C/C++ compiler, in any typical assembly language, for the following C/C++ input string:
while (x<a+b) x = 2*x;
Solution:
L1: LOD R1,A       // Load A into reg. 1
    ADD R1,B       // Add B to reg. 1
    STO R1,Temp1   // Temp1 = A + B
    CMP X,Temp1    // Test for while condition
    BL L2          // Continue with loop if X<Temp1
    B L3           // Terminate loop
L2: LOD R1,='2'
    MUL R1,X
    STO R1,X       // X = 2*X
    B L1           // Repeat loop
L3:
Note the somewhat convoluted logic after the test (CMP) instruction in Sample
Problem 1.1(a). Why didn’t it simply branch to L3 if the condition code
indicated that the first operand (X) was greater than or equal to the second operand
(Temp1), thus eliminating an unnecessary branch instruction and label? Some compilers
might actually do this, but the point is that even if the architecture of the target machine
permits it, many compilers will not generate optimal code. In designing a compiler, the
primary concern is that the object program be semantically equivalent to the source
program (i.e., that they mean the same thing, or produce the same output for a given
input). Object program efficiency is important, but not as important as correct code
generation.
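The shorter alternative described above can be sketched as a small rewriting pass in Python. This is an illustration under stated assumptions, not something the text prescribes: it assumes the target machine has a BGE (branch if greater than or equal) instruction that is the exact inverse of BL, and it represents each instruction as a (label, opcode, operand) tuple. BGE, the tuple layout, and the function name remove_branch_over_branch are all hypothetical.

```python
# Assumed instruction set: BGE is a hypothetical inverse of BL.
BRANCHES = {"B", "BL", "BGE"}
INVERSE = {"BL": "BGE"}

def remove_branch_over_branch(program):
    """Rewrite 'BL Lx / B Ly / Lx:' as 'BGE Ly' where it is safe to do so.

    program is a list of (label, opcode, operand) tuples; label is None
    for unlabeled instructions.
    """
    # Count how many branches refer to each label, so a label is only
    # dropped when the rewritten branch was its sole user.
    uses = {}
    for _, op, arg in program:
        if op in BRANCHES:
            uses[arg] = uses.get(arg, 0) + 1

    result = []
    dead_labels = set()
    i = 0
    while i < len(program):
        label, op, arg = program[i]
        if label in dead_labels:
            label = None
        nxt = program[i + 1] if i + 1 < len(program) else None
        after = program[i + 2] if i + 2 < len(program) else None
        if (op in INVERSE and nxt and after
                and nxt[0] is None and nxt[1] == "B"
                and after[0] == arg):
            # Conditional branch that only skips an unconditional branch:
            # invert the condition and branch straight to the far target.
            result.append((label, INVERSE[op], nxt[2]))
            if uses[arg] == 1:
                dead_labels.add(arg)
            i += 2
        else:
            result.append((label, op, arg))
            i += 1
    return result

# The solution to the sample problem, one tuple per line.
SAMPLE = [
    ("L1", "LOD", "R1,A"),
    (None, "ADD", "R1,B"),
    (None, "STO", "R1,Temp1"),
    (None, "CMP", "X,Temp1"),
    (None, "BL", "L2"),
    (None, "B", "L3"),
    ("L2", "LOD", "R1,='2'"),
    (None, "MUL", "R1,X"),
    (None, "STO", "R1,X"),
    (None, "B", "L1"),
    ("L3", None, None),          # a label with no instruction
]

for label, op, arg in remove_branch_over_branch(SAMPLE):
    prefix = f"{label}:" if label else ""
    print(f"{prefix:<4}{op or ''} {arg or ''}".rstrip())
```

On the sample program the pass replaces BL L2 and B L3 with a single BGE L3 and drops the label L2, leaving ten lines instead of eleven. The rewrite preserves the meaning of the program only if BGE really is the exact complement of BL on the target machine, which is why correctness has to be established before any such shortcut is taken.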
What are the advantages of a high-level language over machine or assembly
language? (1) Machine language (and even assembly language) is difficult to work with
and difficult to maintain. (2) With a high-level language you have a much greater degree
of machine independence and portability from one kind of computer to another (as long as
the other machine has a compiler for that language). (3) You don’t have to retrain
application programmers every time a new machine (with a new instruction set) is
introduced. (4) High-level languages may support data abstraction (through data structures) and program abstraction (procedures and functions).
What are the disadvantages of high-level languages? (1) The programmer
doesn’t have complete control of the machine’s resources (registers, interrupts, I/O
buffers). (2) The compiler may generate inefficient machine language programs. (3)
Additional software – the compiler – is needed in order to use a high-level language. As
compiler development and hardware have improved over the years, these disadvantages
have become less problematic. Consequently, most programming today is done with
high-level languages.
An interpreter is software which serves a purpose very similar to that of a
compiler. The input to an interpreter is a program written in a high-level language, but
rather than generating a machine language program, the interpreter actually carries out the
computations specified in the source program. In other words, the output of a compiler is
a program, whereas the output of an interpreter is the source program’s output. Figure 1.1
shows that although the input may be identical, compilers and interpreters produce very
different output. Nevertheless, many of the techniques used in designing compilers are
also applicable to interpreters.
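The contrast can be made concrete with a minimal Python sketch that hands the same statement, x = x * 9, to a toy interpreter and a toy compiler. It assumes a statement is already parsed into a (target, operator, left, right) tuple and reuses the LOD/MUL/STO mnemonics and the ='2' literal notation from the sample problem. The functions interpret and compile_stmt are hypothetical names for this illustration.

```python
def interpret(stmt, memory):
    """Carry out the statement: the result is the program's output."""
    target, op, left, right = stmt

    def value(operand):
        # A string names a variable; anything else is a literal number.
        return memory[operand] if isinstance(operand, str) else operand

    if op == "*":
        memory[target] = value(left) * value(right)
    elif op == "+":
        memory[target] = value(left) + value(right)
    else:
        raise ValueError(f"unsupported operator {op!r}")
    return memory

def compile_stmt(stmt):
    """Translate the statement: the result is another program."""
    target, op, left, right = stmt
    opcodes = {"*": "MUL", "+": "ADD"}
    if op not in opcodes:
        raise ValueError(f"unsupported operator {op!r}")

    def operand(item):
        # Variables are written in upper case and literals as ='n',
        # following the notation of the sample problem.
        return item.upper() if isinstance(item, str) else f"='{item}'"

    return [
        f"LOD R1,{operand(left)}",
        f"{opcodes[op]} R1,{operand(right)}",
        f"STO R1,{operand(target)}",
    ]

statement = ("x", "*", "x", 9)              # x = x * 9;

print(interpret(statement, {"x": 5}))       # the multiplication happens here
print(compile_stmt(statement))              # no multiplication happens here
```

With x initially 5, the interpreter leaves 45 in x, while the compiler returns the three instructions LOD R1,X, MUL R1,='9', and STO R1,X without ever multiplying anything.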