Wednesday, 4 January 2012

Designing a Two Pass Assembler



In this post I will explain the basic logic involved in designing a two-pass assembler and write C code to simulate Pass 1 and Pass 2 of the assembler.

A simple assembler uses two data structures, namely:

1. OPTAB (Operation Table)
2. SYMTAB (Symbol Table)

Note - LOCCTR (the location counter) is a variable used for assigning addresses. Initially it is set to the address specified after 'START'. After each instruction is read, it is incremented by the length of that instruction, so that it always holds the address of the next statement.
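
The LOCCTR rule can be sketched on its own. This is only an illustration, assuming SIC-style statements in which every machine instruction and every WORD occupies 3 bytes, and BYTE operands are written as C'...' or X'...'. The helper names statement_length and advance_locctr are hypothetical and not part of the program developed later in this post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: bytes generated by a BYTE operand.
   Assumes the forms C'chars' and X'hexdigits'. */
static unsigned byte_length(const char *opd)
{
    size_t n = strlen(opd);
    if (n >= 3 && opd[0] == 'C')
        return (unsigned)(n - 3);        /* drop C and the two quotes */
    if (n >= 3 && opd[0] == 'X')
        return (unsigned)((n - 3) / 2);  /* two hex digits per byte */
    return (unsigned)n;                  /* fallback: raw operand length */
}

/* Hypothetical helper: number of bytes one statement occupies.
   Assumes every machine instruction is 3 bytes long. */
unsigned statement_length(const char *mne, const char *opd)
{
    if (strcmp(mne, "RESW") == 0)
        return 3u * (unsigned)atoi(opd); /* N words of 3 bytes each */
    if (strcmp(mne, "RESB") == 0)
        return (unsigned)atoi(opd);      /* N bytes */
    if (strcmp(mne, "BYTE") == 0)
        return byte_length(opd);
    return 3;                            /* WORD and ordinary instructions */
}

/* Advance LOCCTR past one statement and return the new value. */
unsigned advance_locctr(unsigned locctr, const char *mne, const char *opd)
{
    return locctr + statement_length(mne, opd);
}
```

For example, starting from address 1000 (hex), an ordinary instruction moves LOCCTR to 1003, and a following "RESW 2" moves it to 1009.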

OPTAB 

1. In a simple assembler, it is used to look up mnemonic operation codes and translate them into their machine language equivalents.

2. In a more complex assembler, it also contains information about instruction format and length.

3. In the first pass, OPTAB is used to find the instruction length for incrementing LOCCTR; in the second pass, it tells which instruction format to use in assembling the instruction.

4. It is usually organised as a hash table with the mnemonic opcode as the key.
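
A hash-table OPTAB could look roughly like the sketch below. The opcode values are assumed from the SIC instruction set, and the table size, the hash function and the names optab_init and optab_lookup are hypothetical choices made for illustration only.

```c
#include <string.h>

#define OPTAB_SIZE 31   /* hypothetical number of buckets (a small prime) */

struct op_entry {
    const char *mnemonic;   /* key */
    int opcode;             /* machine language equivalent */
    int length;             /* instruction length in bytes */
    struct op_entry *next;  /* chain of entries that hash to the same bucket */
};

/* A few sample entries; opcode values assume the SIC instruction set. */
static struct op_entry entries[] = {
    {"LDA",  0x00, 3, NULL}, {"STA",  0x0C, 3, NULL},
    {"ADD",  0x18, 3, NULL}, {"COMP", 0x28, 3, NULL},
    {"JSUB", 0x48, 3, NULL}, {"RSUB", 0x4C, 3, NULL},
};

static struct op_entry *optab[OPTAB_SIZE];

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned optab_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % OPTAB_SIZE;
}

/* Build the table once, before Pass 1 starts. */
void optab_init(void)
{
    size_t i;
    for (i = 0; i < OPTAB_SIZE; i++)
        optab[i] = NULL;
    for (i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        unsigned h = optab_hash(entries[i].mnemonic);
        entries[i].next = optab[h];
        optab[h] = &entries[i];
    }
}

/* Return the entry for a mnemonic, or NULL if it is not an instruction. */
const struct op_entry *optab_lookup(const char *mnemonic)
{
    const struct op_entry *e;
    for (e = optab[optab_hash(mnemonic)]; e != NULL; e = e->next)
        if (strcmp(e->mnemonic, mnemonic) == 0)
            return e;
    return NULL;
}
```

Since OPTAB never changes while the assembler runs, a static array searched with a fixed hash is enough; no deletion or resizing is needed.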

SYMTAB

1. Includes the name and address of each label in the program.

2. During Pass 1, each label is entered into SYMTAB together with the current value of LOCCTR as it is encountered in the program.

3. During Pass 2, the symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled instructions.

4. Implemented as a hash table for efficient insertion and lookup. However, the hash function must be chosen carefully, because programs often contain similar labels (such as LOOP1 and LOOP2) that should not all land in the same bucket.
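
The two SYMTAB operations, insertion in Pass 1 and lookup in Pass 2, can be sketched as follows. This is a minimal illustration and not the file-based table used later in this post; the sizes, the hash function and the names symtab_insert and symtab_lookup are hypothetical.

```c
#include <string.h>

#define SYMTAB_SIZE 53     /* hypothetical number of buckets */
#define MAX_SYMBOLS 100    /* hypothetical capacity */

struct symbol {
    char name[20];
    unsigned address;
    struct symbol *next;   /* chain of symbols in the same bucket */
};

static struct symbol pool[MAX_SYMBOLS];
static int pool_used = 0;
static struct symbol *symtab[SYMTAB_SIZE];   /* zero-initialised buckets */

static unsigned symtab_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % SYMTAB_SIZE;
}

/* Pass 1: enter a label with the current LOCCTR value.
   Returns 0 on success, -1 for a duplicate label, an over-long
   name, or a full table. */
int symtab_insert(const char *name, unsigned address)
{
    unsigned h = symtab_hash(name);
    struct symbol *s;

    if (strlen(name) >= sizeof pool[0].name)
        return -1;
    for (s = symtab[h]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return -1;                 /* label defined twice */
    if (pool_used == MAX_SYMBOLS)
        return -1;
    s = &pool[pool_used++];
    strcpy(s->name, name);
    s->address = address;
    s->next = symtab[h];
    symtab[h] = s;
    return 0;
}

/* Pass 2: look up an operand symbol.
   Returns 1 and stores the address if found, 0 otherwise. */
int symtab_lookup(const char *name, unsigned *address)
{
    const struct symbol *s;
    for (s = symtab[symtab_hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) {
            *address = s->address;
            return 1;
        }
    return 0;
}
```

Rejecting a duplicate in symtab_insert is one reasonable way to report a label that is defined twice; a lookup that returns 0 in Pass 2 corresponds to an undefined symbol.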

Programming the Pass1 and Pass2


Now let us focus on programming Pass 1 in C.

Pass1

  • Here we use three files: INPUT.c, INTERMEDIATE.c and SYMTAB.c (you can name them as you wish, but these names are used here for clarity).
  • Pass 1 usually writes an intermediate file that contains each source statement together with its assigned address.
  • Here we use a function named "wordcount" to compute the number of words in each line of the program.

Let us begin with the wordcount() function. It reads the INPUT.c file, which contains the input (source) program.

Variables used - "word" counts the words in the current line, "i" keeps track of the line number, and "c" holds each character fetched from the input file.

Logic 

1. If "c" is a blank space, increment word. If it is a newline ("\n"), increment word and display the number of words in the line, which is the value in "word". For later use, store the number of words of each line in a global array count.

2. Increment i, reset word to 0 and repeat step 1.

3. Repeat the above steps until EOF (End of File) is encountered.

Note that this logic assumes the words in a line are separated by exactly one blank space and that every line, including the last one, ends with a newline.

Code

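#include <stdio.h>

int count[20]; // count[i] = number of words in line i (lines are numbered from 1)

void wordcount()
{
    FILE *f3;
    int word=0,i=1;
    int c; // int, not char, so that EOF can be told apart from a real character

    f3=fopen("INPUT.c","r");
    if(f3==NULL)
    {
        printf("\n Cannot open INPUT.c");
        return;
    }
    c=fgetc(f3);
    while(c!=EOF && i<20) // stop before running past the end of count[]
    {
        if(c==' ') // blank space
            word+=1;
        if(c=='\n')
        {
            word+=1;
            count[i] = word;
            printf("\n No.of Words in line number %d is %d",i,word);
            i++;
            word=0; // Important: start afresh for the next line
        }
        c=fgetc(f3);
    }
    fclose(f3);
}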
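
The wordcount() function above counts one word per blank space, so a line with two spaces in a row, a tab, or leading spaces gives a wrong count. If the input may be formatted that way, a helper along the following lines could be used instead. This is only a sketch; the name count_words is hypothetical and it works on one line that has already been read into a string.

```c
#include <ctype.h>

/* Hypothetical helper: count whitespace-separated words in one line.
   Runs of spaces, tabs and the trailing newline are all treated as
   separators, so they never produce empty "words". */
int count_words(const char *line)
{
    int words = 0;
    int in_word = 0;   /* 1 while we are inside a word */

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char)*line)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Each line could be read with fgets() and passed to count_words(), with the result stored in count[i] exactly as before.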

After wordcount, we turn our focus to constructing INTERMEDIATE.c and SYMTAB.c.

Logic

1. Use the LOCCTR to keep track of addresses.
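2. Read each line from INPUT.c:
  • If the line has 1 word, it contains only a mnemonic, so add the mnemonic to INTERMEDIATE.c along with the LOCCTR value.
  • If the line has 2 words, it contains a mnemonic and an operand. Add both of these and the LOCCTR value to INTERMEDIATE.c.
  • If the line has 3 words, it contains a label before the mnemonic, followed by the mnemonic and the operand. Add the label, mnemonic, operand and LOCCTR value to INTERMEDIATE.c, and also add the label and LOCCTR value to SYMTAB.c.
  • After writing the line, advance LOCCTR by the length of the statement: 3 for WORD and for ordinary instructions, 3 times the operand for RESW, the operand itself for RESB, and the number of bytes in the operand for BYTE.
3. Repeat step 2 until the end of INPUT.c is reached.
4. At the end, INTERMEDIATE.c holds every statement with its address and SYMTAB.c holds every label with its address.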

Code

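#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// wordcount() and the global array count[] are defined above, in the same file

int main(void)
{
    FILE *f1,*f2,*f3;
    int linenumber;
    unsigned int locctr, len; // %x reads and prints unsigned values
    char lbl[20], mne[10], opd[10];

    printf("\n Word count for Input program");
    wordcount();

    f1=fopen("INPUT.c","r");
    f2=fopen("INTERMEDIATE.c","w");
    f3=fopen("SYMTAB.c","w");
    if(f1==NULL || f2==NULL || f3==NULL)
    {
        printf("\n Cannot open the input or output files");
        return 1;
    }

    // Line 1 is expected to be: label START address (in hex)
    if(fscanf(f1,"%19s %9s %x", lbl, mne, &locctr)!=3)
        return 1;
    linenumber=2;
    while(linenumber<20) // count[] has 20 entries
    {
        opd[0]='\0';
        if(count[linenumber]==1 && fscanf(f1,"%9s",mne)==1)
            fprintf(f2,"%x \t %s\n",locctr,mne);
        else if(count[linenumber]==2 && fscanf(f1,"%9s %9s",mne,opd)==2)
            fprintf(f2,"%x \t %s \t %s\n",locctr,mne,opd);
        else if(count[linenumber]==3 && fscanf(f1,"%19s %9s %9s",lbl,mne,opd)==3)
        {
            fprintf(f2,"%x \t %s \t %s \t %s\n",locctr,lbl,mne,opd);
            fprintf(f3,"%s \t %x\n",lbl,locctr); // label and its address go to SYMTAB
        }
        else
            break; // end of input, or a line that does not have 1 to 3 words

        linenumber+=1;

        // Advance LOCCTR by the length of the statement just written
        if(strcmp(mne,"WORD")==0)
            locctr+=3;
        else if(strcmp(mne,"BYTE")==0)
        {
            // Assumes SIC-style operands: C'EOF' is 3 bytes, X'F1' is 1 byte
            len=(unsigned int)strlen(opd);
            if(len>=3 && opd[0]=='C')
                locctr+=len-3;
            else if(len>=3 && opd[0]=='X')
                locctr+=(len-3)/2;
            else
                locctr+=len;
        }
        else if(strcmp(mne,"RESW")==0)
            locctr+=3*(unsigned int)atoi(opd); // atoi converts string to integer
        else if(strcmp(mne,"RESB")==0)
            locctr+=(unsigned int)atoi(opd);
        else
            locctr+=3; // every other instruction is 3 bytes long
    }
    fclose(f1);
    fclose(f2);
    fclose(f3);
    return 0;
}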
