Guest on 26th July 2022 12:57:49 AM

  1. COMP 3100/3800/3204: Software Engineering
  2. Specification of the Ace Machine
  3. and Assembler Formats
  4. Program Names
  5. There are three programs which are part of the Ace project: a compiler named ace, an assembler named asm and a virtual machine named avm. If you are only taking COMP 3100/3800 you will only be building the virtual machine, but reading this document will help you understand how the virtual machine's input file is translated from higher-level languages. The ace program (the Ace compiler) compiles programs written in a high-level language (with suffix .ace) into the assembler file format. The asm program (the Ace assembler) assembles files with suffix .asm into files with suffix .avm, which are then interpreted and executed by the avm program (the Ace virtual machine).
  6.  
  7. Assembly Language
  8. The assembly language is a fairly straightforward representation of the instructions and values of the machine. It is usually stored in a file suffixed .asm (dot-lowercase-a-s-m) and assembled into a code file suffixed .avm.
  9.  
  10. Comments begin with a number sign # and go to the end of the line of text.
  11.  
  12. Each type of memory value is grouped into its own section of the assembly file, and instructions are grouped in another section. Each section is prefixed by an identifying word. INT introduces the integer data section, DOUBLE the floating-point data section, STRING the string data section and CODE the instruction section. The sections must appear in the above order, and each section may appear only once in any assembly file. A missing section is taken to be empty; the code section may not be empty. The successive elements of each section are assembled in order. The first value in the first data section starts at memory cell address 0, and the addresses increase as successive values are assembled. Instructions are part of code space, so the numbering begins again at code address 0 with the first instruction in the CODE section.
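
The ordering and addressing rules above can be sketched in Python. This is only an illustration, not the course's assembler: the names check_section_order and data_base_addresses are hypothetical, and the check only confirms that a CODE header is present, not that the section actually holds an instruction.

```python
# Hypothetical helper names; a sketch of the section rules, not the real asm.
SECTION_ORDER = ["INT", "DOUBLE", "STRING", "CODE"]

def check_section_order(headers):
    """Raise ValueError unless the headers appear in the required order,
    at most once each, and include CODE."""
    positions = []
    for header in headers:
        if header not in SECTION_ORDER:
            raise ValueError(f"unknown section {header!r}")
        positions.append(SECTION_ORDER.index(header))
    # Strictly increasing positions rule out both reordering and repeats.
    if any(a >= b for a, b in zip(positions, positions[1:])):
        raise ValueError("sections out of order or repeated")
    if "CODE" not in headers:
        raise ValueError("the CODE section is required")

def data_base_addresses(n_int, n_double, n_string):
    """Return the first memory address of each data section.
    Data addresses run on from one section to the next; code addresses
    are separate and restart at 0."""
    return {"INT": 0, "DOUBLE": n_int, "STRING": n_int + n_double}
```

With one integer, six doubles and two strings, this places the first double at address 1 and the first string at address 7.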
  13.  
  14. Any address in any section may be labeled by an alphanumeric string, followed by a colon. The label may be used as an operand; its value is the address of the memory cell or instruction that it labels (not the value of that memory cell or instruction). For example,
  15.  
  16. max:    7
  17. defines an integer cell labeled max whose initial value is 7. The value may be loaded into register zero as follows:
  18.         icopy   max, r0
  19.         icopy   [r0], r0
  20. Since max is the address of a cell containing 7, the first line in the above code loads r0 with that address, and the next line loads the integer from that cell of memory into r0.
  21.  
  22. To clarify what constitutes a label: a label must consist of alphabetic or numeric characters or the underscore character '_', and it must begin with an alphabetic character. Labels which are keywords in the assembler are disallowed, such as r0 to r63 or pc or sp or fp or any of the instruction names.
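
A minimal Python sketch of the label rule follows. It makes two assumptions: labels are treated as case-sensitive, and the instruction-name set holds only the mnemonics that appear in this document. The full list is in the virtual machine specification, and is_valid_label is a hypothetical name.

```python
import re

# Alphabetic first character, then letters, digits or underscores.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Register keywords named in the text: r0 to r63, pc, sp and fp.
REGISTERS = {f"r{i}" for i in range(64)} | {"pc", "sp", "fp"}

# Assumption: only the mnemonics used in this document; the real set is larger.
KNOWN_INSTRUCTIONS = {"icopy", "icmp", "bra", "bge", "dadd", "iadd",
                      "sprint", "dprint", "halt"}

def is_valid_label(name):
    """Return True if name is well formed and is not a reserved keyword."""
    if LABEL_RE.fullmatch(name) is None:
        return False
    return name not in REGISTERS and name not in KNOWN_INSTRUCTIONS
```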
  23.  
  24. In the code section, labels represent instruction addresses; they need no adornment and are usually translated into the pc-relative addressing mode. For example,
  25.  
  26. loop:
  27.         bra     loop
  28. is an infinite loop representing a branch of +0.
  29. Each label must be unique across all sections of the program, but there may be multiple labels with the same value, for instance:
  30.  
  31. loop_test:
  32. exit_test:
  33.         icmp    r0, r1
  34.         bra     loop_end
  35. defines two labels which both address the same instruction.
  36. Instruction syntax is defined in the document specifying the Ace virtual machine. The only differences between that definition and the actual assembly language are that labels may be used as literal values and as branch and call targets, and that the assembler is free to encode small literals as large literals if desired.
  37.  
  38. Integer and floating-point values are represented in their usual textual form. Integers may be written in decimal, in hexadecimal with a 0x prefix, or in octal with a leading 0.
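
The three integer notations can be distinguished as in the Python sketch below. It assumes that an optional leading sign is accepted and that the 0x prefix is case-insensitive, neither of which is stated here; parse_int_literal is a hypothetical name.

```python
def parse_int_literal(text):
    """Convert a decimal, 0x-prefixed hexadecimal, or 0-prefixed octal
    literal to an int. Raises ValueError on malformed input."""
    s = text.strip()
    negative = s.startswith("-")
    if s.startswith(("+", "-")):   # assumption: an optional sign is allowed
        s = s[1:]
    if s.lower().startswith("0x"):
        value = int(s[2:], 16)     # hexadecimal
    elif len(s) > 1 and s.startswith("0"):
        value = int(s[1:], 8)      # octal: leading 0 followed by more digits
    else:
        value = int(s, 10)         # plain decimal, including a lone "0"
    return -value if negative else value
```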
  39.  
  40. Strings are represented as double-quoted strings, equivalent to those in C. Note that this is not the same as the format used in virtual machine code files: that format has no surrounding double quotes and a less general escape mechanism for special characters.
  41.  
  42. In the integer, floating-point, and string sections, a label may be followed by an integer value in square brackets, for example,
  43.  
  44. myvalues: [10]
  45. This defines (in this case) ten units of storage beginning at the current address. If some of those values are to be initialized, they must appear as a comma-separated list on the same line of text as the original label, and they initialize the first elements of the block of storage. For example,
  46. pow2: [8] 1, 2, 4, 8
  47. defines a block of 8 integers but initializes only the first four of those elements. The remaining elements of the block are initialized to the default value, 0 for integers, 0.0 for doubles, and the empty string for strings.
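
The padding rule for an integer block can be sketched in Python as follows. This assumes the block size is written in decimal and that supplying more initializers than the block holds is an error; the text specifies neither. The name parse_int_block is hypothetical.

```python
import re

# label, colon, [size], then an optional comma-separated initializer list
BLOCK_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9_]*):\s*\[(\d+)\]\s*(.*)")

def parse_int_block(line):
    """Parse one line such as 'pow2: [8] 1, 2, 4, 8' into (label, values),
    padding the uninitialized tail with the integer default, 0."""
    m = BLOCK_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a block definition: {line!r}")
    label, size, rest = m.group(1), int(m.group(2)), m.group(3).strip()
    initial = [int(tok) for tok in rest.split(",")] if rest else []
    if len(initial) > size:        # assumption: treated as an error
        raise ValueError("more initializers than the block holds")
    return label, initial + [0] * (size - len(initial))
```

For a DOUBLE or STRING block the same shape would apply with 0.0 or the empty string as the padding value.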
  48. In general, the input is free format, except that there can be only one instruction on a line, and an entire instruction definition must be on a single line. Also, as noted, the definition of block data is sensitive to the placement of newlines.
  49.  
  50. Machine Format
  51. The input file to the Ace virtual machine, usually stored in a file suffixed .avm (dot-lowercase-a-v-m), is a textual encoding, one item per line, of the various initial values to be loaded into the code and memory sections of the program. The first line of the file contains a comment which begins with a hash character # and continues until the end of the line. This line is ignored by the virtual machine. The second line of the file contains four decimal integers, separated by a single space character, stating the number of lines in each of the four sections of the program, in the order: integer, double, string, code. Call these four values Nint, Ndouble, Nstring, and Ncode. All must be non-negative, and Ncode must be greater than zero.
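
Reading the count line might look like the Python sketch below. It follows the constraints just stated (four fields, single spaces, non-negative counts, Ncode greater than zero); the function name parse_header is hypothetical.

```python
def parse_header(line):
    """Parse the second line of an .avm file into
    (n_int, n_double, n_string, n_code)."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 4:
        raise ValueError("expected four counts separated by single spaces")
    counts = tuple(int(field) for field in fields)   # int() rejects "" and junk
    if any(n < 0 for n in counts):
        raise ValueError("section counts must be non-negative")
    if counts[3] == 0:
        raise ValueError("Ncode must be greater than zero")
    return counts
```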
  52.  
  53. The next Nint lines give the initial values of the successive integer values in the virtual machine's memory cells. These are stored as plain signed decimal integers, as might be converted using atoi in C.
  54.  
  55. Following the integers are Ndouble lines giving the initial values of the successive double-precision floating-point values in the machine's memory cells. These are stored as plain, signed, floating-point values, as might be converted by atof in C.
  56.  
  57. Following the floating-point values are Nstring lines giving the initial values of the successive string values in the machine's memory cells. These are encoded as character data, but with escapes to encode newlines and other difficult-to-represent values. The only escape sequences recognized in this encoding are: \n for newline, \t for tab, \b for backspace, \r for carriage return, \f for form feed, and \\ for a single backslash. Any other backslash-character sequence represents just the character following the backslash, so \v represents just the letter v. Note that each string value occupies a single line of text. The newline that terminates that line is not included in the value, and newlines within the string must be represented explicitly using the \n notation. There are no quotes around the string, so quotes within the string can simply be represented with a quotation mark, or they may be escaped using backslash.
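
The escape rules for string lines can be sketched in Python as below. The function name decode_avm_string is hypothetical, and keeping a lone trailing backslash unchanged is an assumption, since the text does not cover that case.

```python
# The six escapes the .avm format recognizes, keyed by the character
# that follows the backslash.
ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f", "\\": "\\"}

def decode_avm_string(line):
    """Decode one string line (without its terminating newline)."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            # Unknown escapes yield just the character after the backslash.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash (assumption).
            out.append(ch)
            i += 1
    return "".join(out)
```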
  58.  
  59. The last Ncode lines of the file contain the instructions of the machine, one instruction per line. The instructions are encoded as described in the definition of the Ace virtual machine, and are stored in the file as 8-digit, unsigned, zero-padded hexadecimal numbers. Thus, each instruction is stored as 8 hex digits followed by a newline; there is no leading 0x. The instructions are stored in increasing address order, without gaps, starting at address 0.
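
Converting between a 32-bit instruction word and its line in the file could look like this Python sketch. It writes lowercase digits, as in the example file later in this document, and accepts either case when reading, which is an assumption. Both function names are hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def format_instruction(word):
    """Render a 32-bit instruction word as 8 zero-padded hex digits."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction does not fit in 32 bits")
    return f"{word:08x}"

def parse_instruction(line):
    """Parse one code line; exactly 8 hex digits and no 0x prefix."""
    text = line.rstrip("\n")
    if len(text) != 8 or any(c not in HEX_DIGITS for c in text):
        raise ValueError(f"bad instruction line: {line!r}")
    return int(text, 16)
```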
  60.  
  61. At the start of executing the program in the virtual machine, the memory cells of the machine will be initialised with the values as given, in the order given above, so integers will appear in memory before floating-point numbers, and then the strings.
  62.  
  63. A halt instruction is appended to the code. Implementation-defined space is allocated, at the top of the initialised memory cells, to hold the stack. Note that values must be supplied for every cell of memory to be used in the program (except the stack); there is no mechanism for defining an uninitialized block of values.
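
Putting the file layout together, a loader might be sketched in Python as follows. Several points are assumptions rather than part of this specification: the function name load_avm, the requirement that the file contain exactly the declared number of lines, and the halt encoding 00000000, which is inferred from the last line of the example .avm file in this document. String escapes are left undecoded and the stack is not allocated, to keep the sketch short.

```python
HALT = 0x00000000  # assumption: inferred from the example .avm file

def load_avm(text):
    """Return (memory, code) for the contents of an .avm file."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                      # drop the piece after the final newline
    if len(lines) < 2 or not lines[0].startswith("#"):
        raise ValueError("missing comment line or count line")
    n_int, n_double, n_string, n_code = (int(f) for f in lines[1].split(" "))
    if n_code <= 0:
        raise ValueError("Ncode must be greater than zero")
    body = lines[2:]
    if len(body) != n_int + n_double + n_string + n_code:
        raise ValueError("line count does not match the header")
    d0 = n_int                           # index of the first double
    s0 = d0 + n_double                   # index of the first string
    c0 = s0 + n_string                   # index of the first instruction
    # Memory holds integers first, then doubles, then strings.
    memory = ([int(s) for s in body[:d0]]
              + [float(s) for s in body[d0:s0]]
              + body[s0:c0])             # strings kept raw in this sketch
    code = [int(s, 16) for s in body[c0:]]
    code.append(HALT)                    # the loader appends a halt
    return memory, code
```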
  64.  
  65. Example Assembler File .asm
  66. The following is an example assembly language program, which adds up the floating-point numbers in an array and prints the total.
  67.  
# Example assembler file

INT

max:            5

DOUBLE

total:          0.0
numbers:        [5] 0.35, 0.57, 0.76, 0.61, 0.83

STRING

report:         "The total is "
newline:        "\n"

CODE

        icopy   0, r0                   # counter
        icopy   max, r1                 # address of max
        icopy   [r1], r1                # r1 is now 5
        icopy   total, r2               # address of total
        icopy   numbers, r3             # address of numbers
loop:
        icmp    r0, r1
        bge     end
        dadd    [r3], [r2], [r2]        # add to total
        iadd    1, r0                   # step counter
        iadd    1, r3                   # step pointer
        bra     loop
end:
        icopy   report, r5              # address of report
        sprint  [r5]                    # print report
        dprint  [r2]                    # print total
        icopy   newline, r5             # address of newline
        sprint  [r5]                    # print newline
        halt
  105. Example Virtual Machine Code File .avm
  106. The program in the previous section, when assembled, yields the following .avm file.
  107.  
# avm file
1 6 2 17
5
0.0
0.35
0.57
0.76
0.61
0.83
The total is 
\n
31000080
31000081
31c10081
31010082
31020083
24808100
04400005
47c3c2c2
27400180
27400183
027ffffb
31070085
66c50000
46c20000
31080085
66c50000
00000000
  136.  
  137. Copyright ©  R. Pike and L. Patric
