











                      An Assembly Language Primer

                       (C) 1983 by David Whitman






















































                    TABLE OF CONTENTS


      Introduction.......................................2

      The Computer As A Bit Pattern Manipulator..........3

      Digression: A Notation System for Bit Patterns.....5

      Addressing Memory..................................7

      The Contents of Memory: Data and Programs..........8

      The Dawn of Assembly Language......................9

      The 8088..........................................11

      Assembly Language Syntax..........................14

      The Stack.........................................17

      Software Interrupts...............................19

      Pseudo-Operations.................................21

      Tutorial..........................................23






































                                                                      2

      >>INTRODUCTION<<

      Many people requesting CHASM have indicated that they are
      interested in *learning* assembly language.  They are beginners,
      and have little idea just where to start.  This primer is
      directed to those users.  Experienced users will probably find
      little here that they do not already know.

      Being a primer, this text will not teach you everything there is
      to know about assembly language programming.  Its purpose is to
      give you some of the vocabulary and general ideas which will help
      you on your way.

      I must make a small caveat: I consider myself a relative beginner
      in assembly language programming.  A big part of the reason for
      writing CHASM was to try and learn this branch of programming
      from the inside out.  I think I've learned quite a bit, but it's
      quite possible that some of the ideas I relate here may have some
      small, or even large, flaws in them.  Nonetheless, I have
      produced a number of working assembly language programs by
      following the ideas presented here.











































                                                                      3

      >>THE COMPUTER AS A BIT PATTERN MANIPULATOR<<

      We all have some conception about what a computer does.  On one
      level, it may be thought of as a machine which can execute BASIC
      programs.  Another idea is that the computer is a number
      crunching device.  As I write this primer, I'm using my computer
      as a word processor.

      I'd like to introduce a more general concept of just what sort of
      machine a computer is: a bit pattern manipulator.

      I'm certain that everyone has been introduced to the idea of a
      *bit*.  (Note: Throughout this primer, a word enclosed in
      *asterisks* is to be read as if it were in italics.)  A bit has
      two states: on and off, typically represented with the symbols
      "1"  and "0".  In this context, DON'T think of 1 and 0 as
      numbers.  They are merely convenient shorthand labels for the
      state of a bit.

      The memory of your computer consists of a huge collection of
      bits, each of which could be in either the 1 or 0 (on or off)
      state.

      At the heart of your computer is an 8088 microprocessor chip,
      made by Intel.  What this chip can do is manipulate the bits
      which make up the memory of the computer.

      The 8088 likes to handle bits in chunks, and so we'll introduce
      special names for the two sizes of bit chunks the 8088 is most
      happy with.  A *byte* will refer to a collection of eight bits.
      A *word* consists of two bytes, or equivalently, sixteen bits.

      A collection of bits holds a pattern, determined by the state of
      its individual bits.  Here are some typical byte long patterns:

      10101010         11111111         00001111

      If you've had a course in probability, it's quite easy to work
      out that there are 256 possible patterns that a byte could hold.
      Similarly, a word can hold 65,536 different patterns.
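
      If you'd rather count than calculate, the totals above come from
      the fact that every extra bit doubles the number of possible
      patterns.  Here is a minimal sketch in Python, used only as a
      convenient scratch pad (the function name is made up for this
      illustration):

```python
def pattern_count(bits):
    """Number of distinct on/off patterns a chunk of `bits` bits can hold."""
    # Each bit has two states, so n bits give 2 multiplied by itself n times.
    return 2 ** bits


print(pattern_count(8))   # a byte
print(pattern_count(16))  # a word
```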
























                                                                      4

      All right, now for the single most important idea in assembly
      language programming.  Are you sitting down?  These bit patterns
      can be used to represent other sets of things, by mapping each
      pattern onto a member of the other set.  Doesn't sound like much,
      but IBM has made *BILLIONS* off this idea.

      For example, by mapping the patterns a word can hold onto the set
      of integers, you can represent either the numbers from 0 to 65535
      or -32768 to 32767, depending on the exact mapping you use.  You
      might recognize these number ranges as the range of possible line
      numbers, and the possible values of an integer variable, in BASIC
      programs.  This explains these somewhat arbitrary seeming limits:
      BASIC uses words of memory to hold line numbers and integer
      variables.
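
      To make the two mappings concrete, here is a small Python sketch
      which reads the same word pattern both ways.  The function names
      are invented for this example, and the signed mapping shown is
      the two's complement scheme, which the text above doesn't name
      but which is the one the 8088 uses:

```python
def as_unsigned(word):
    """Read a 16 bit pattern as a number from 0 to 65535."""
    return word & 0xFFFF


def as_signed(word):
    """Read the same pattern as a number from -32768 to 32767."""
    word &= 0xFFFF
    # Patterns with the top bit on are mapped onto the negative numbers.
    return word - 0x10000 if word & 0x8000 else word


pattern = 0xFFFF  # all sixteen bits on
print(as_unsigned(pattern), as_signed(pattern))
```

      The pattern itself never changes.  Only the mapping we choose to
      apply to it does.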

      As another example, you could map the patterns a byte can hold
      onto a series of arbitrarily chosen little pictures which might
      be displayed on a video screen.  If you look in appendix G of
      your BASIC manual, you'll notice that there are *exactly* 256
      different characters that can be displayed on your screen.  Your
      computer uses a byte of memory to tell it what character to
      display at each location of the video screen.

      Without getting too far ahead of myself, I'll just casually
      mention that there are about 256 fundamental ways the 8088 can
      manipulate the bit patterns stored in memory.  This suggests
      another mapping which we'll discuss in more detail later.

      The point of this discussion is that we can use bit patterns to
      represent anything we want, and by manipulating the patterns in
      different ways, we can produce results which have significance in
      terms of what we're choosing to represent.
































                                                                      5

      >>DIGRESSION: A NOTATION SYSTEM FOR BIT PATTERNS<<

      Because of their importance, it would be nice to have a
      convenient way to represent the various bit patterns we'll be
      talking about.  We already have one way, by listing the states of
      the individual bits as a series of 1's and 0's.  This system is
      somewhat clumsy, and error prone.  Are the following word
      patterns identical or different?

      1111111011111111                         1111111101111111

      You probably had trouble telling them apart.  It's easier to tell
      that they're different by breaking them down into more manageable
      pieces, and comparing the pieces.  Here are the same two patterns
      broken down into four bit chunks:

      1111 1110 1111 1111                  1111 1111 0111 1111

      Some clown has given the name *nybble* to a chunk of 4 bits,
      presumably because 4 bits are half a byte.  A nybble is fairly
      easy to handle.  There are only 16 possible nybble long patterns,
      and most people can distinguish between the patterns quite
      easily.

      Each nybble pattern has been given a unique symbol agreed upon by
      computer scientists.  The first 10 patterns were given symbols
      "0" through "9", and when they ran out of digit style symbols,
      they used the letters "A" through "F" for the last six patterns.
      Below is the "nybble pattern code":

      0000 = 0    0001 = 1    0010 = 2    0011 = 3

      0100 = 4    0101 = 5    0110 = 6    0111 = 7

      1000 = 8    1001 = 9    1010 = A    1011 = B

      1100 = C    1101 = D    1110 = E    1111 = F

      Using the nybble code, we can represent the two similar word
      patterns given above, with the following more manageable
      shorthand versions:

                     FEFF       FF7F
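
      The translation is mechanical enough that a few lines of Python
      can do it.  This is only a sketch: the table is copied from the
      nybble pattern code above, and the function name is made up for
      illustration.

```python
NYBBLE_CODE = {
    "0000": "0", "0001": "1", "0010": "2", "0011": "3",
    "0100": "4", "0101": "5", "0110": "6", "0111": "7",
    "1000": "8", "1001": "9", "1010": "A", "1011": "B",
    "1100": "C", "1101": "D", "1110": "E", "1111": "F",
}


def to_nybble_code(bits):
    """Turn a string of 1's and 0's into its nybble code shorthand."""
    bits = bits.replace(" ", "")  # allow patterns written in 4 bit chunks
    if len(bits) % 4 != 0:
        raise ValueError("pattern must be a whole number of nybbles")
    # Look up each 4 bit chunk in the table, left to right.
    return "".join(NYBBLE_CODE[bits[i:i + 4]] for i in range(0, len(bits), 4))


print(to_nybble_code("1111111011111111"))     # FEFF
print(to_nybble_code("1111 1111 0111 1111"))  # FF7F
```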





















                                                                      6

      Of course, the assignment of the symbols for the various nybble
      patterns was not so arbitrary as I've tried to make it appear.  A
      perceptive reader who has been exposed to binary numbers will
      have noticed an underlying system to the assignments.  If the 1's
      and 0's of the patterns are interpreted as actual *numbers*,
      rather than mere symbols for bit states, the first 10 patterns
      correspond to binary numbers whose decimal representation is the
      symbol assigned to the pattern.

      The last six patterns receive the symbols "A" through "F", and
      taken together, the symbols 0 through F constitute the digits of
      the *hexadecimal* number system.  Thus, the symbols assigned to
      the different nybble patterns were born out of historical
      prejudice in thinking of the computer as strictly a number
      handling machine.  Although this is an important interpretation
      of these symbols, for the time being it's enough to merely think
      of them as a shorthand way to write down bit patterns.

      Because some nybble patterns can look just like a number, it's
      often necessary to somehow indicate that we're talking about a
      pattern.  In BASIC, you do this by adding the characters &H to
      the beginning of the pattern: &H1234.  A more common convention
      is to just add the letter H to the end of the pattern: 1234H.  In
      both conventions, the H is referring to hexadecimal.

      Eventually you'll want to learn about using the hexadecimal
      number system, since it is an important way to use bit patterns.
      I'm not going to discuss it in this primer, because a number of
      books have much better treatments of this topic than I could
      produce.  Consider this an advanced topic you'll want to fill in
      later.

































                                                                      7

      >>ADDRESSING MEMORY<<

      As stated before, the 8088 chip inside your computer can
      manipulate the bit patterns which make up the computer's memory.
      Some of the possible manipulations are copying patterns from one
      place to another, turning on or turning off certain bits, or
      interpreting the patterns as numbers and performing arithmetic
      operations on them.  To perform any of these actions, the 8088
      has to know what part of memory is to be worked on.  A specific
      location in memory is identified by its *address*.

      An address is a pointer into memory.  Each address points to the
      beginning of a byte long chunk of memory.  The 8088 has the
      capability to distinguish 1,048,576 different bytes of memory.

      By this point, it probably comes as no surprise to hear that
      addresses are represented as patterns of bits.  It takes 20 bits
      to get a total of 1,048,576 different patterns, and thus an
      address may be written down as a series of 5 nybble codes.  For
      example, the BIOS stores a pattern which encodes information about
      what equipment is installed on your IBM PC in the word which
      begins at location 00410.  Interpreting the address as a hex
      number, the second byte of this word has an address 1 greater
      than 00410, or 00411.

      The 8088 isn't very happy handling 20 bits at a time.  The
      biggest chunk that's convenient for it to use is a 16 bit word.
      The 8088 actually calculates 20 bit addresses as the combination
      of two words, a segment word and an offset word.  The combination
      process involves interpreting the two patterns as hexadecimal
      numbers and adding them.  The way that two 16 bit patterns can be
      combined to give one 20 bit pattern is that the two patterns are
      added out of alignment by one nybble:

          0040      4 nybble segment
           0010     4 nybble offset
         --------
          00410     5 nybble address

      Because of this mechanism for calculating addresses, they will
      often be written down in what may be called segment:offset form.
      Thus, the address in the above calculation could be written:

      0040:0010
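
      Adding "out of alignment by one nybble" is the same as shifting
      the segment left by four bits before adding the offset.  A
      minimal Python sketch of the calculation follows.  The function
      name is made up, and the trimming of the result to 20 bits
      reflects the fact that the 8088 has only 20 address bits:

```python
def physical_address(segment, offset):
    """Combine a 16 bit segment and a 16 bit offset into a 20 bit address."""
    # Shifting left by 4 bits slides the segment over by one nybble.
    # The sum is trimmed to 20 bits, all the 8088 can put out.
    return ((segment << 4) + offset) & 0xFFFFF


print("%05X" % physical_address(0x0040, 0x0010))  # 00410
print("%05X" % physical_address(0x0000, 0x0410))  # 00410 again
```

      As the second line shows, more than one segment:offset pair can
      point at the same byte of memory.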




















                                                                      8

      >>THE CONTENTS OF MEMORY: DATA AND PROGRAMS<<

      The contents of memory may be broken down into two broad classes.
      The first is *data*, just raw patterns of bits for the 8088 to
      work on.  The significance of the patterns is determined by what
      the computer is being used for at any given time.

      The second class of memory contents is *instructions*.  The 8088
      can look at memory and interpret a pattern it sees there as
      specifying one of the 200 some fundamental operations it knows
      how to do.  This mapping of patterns onto operations is called
      the *machine language* of the 8088.  A machine language *program*
      consists of a series of patterns located in consecutive memory
      locations, whose corresponding operations perform some useful
      process.

      Note that there is no way for the 8088 to know whether a given
      pattern is meant to be an instruction, or a piece of data to
      operate on.  It is quite possible for the chip to accidentally
      begin reading what was intended to be data, and interpret it as a
      program.  Some pretty bizarre things can occur when this happens.
      In assembly language programming circles, this is known as
      "crashing the system".









































                                                                      9

      >>THE DAWN OF ASSEMBLY LANGUAGE<<

      Unless you happen to be an 8088 chip, the patterns which make up
      a machine language program can be pretty incomprehensible.  For
      example, the pattern which tells the 8088 to flip all the bits in
      the byte at address 5555 is:

      F6 16 55 55

      which is not very informative, although you can see the 5555
      address in there.  In ancient history, the old wood-burning and
      vacuum tube computers were programmed by laboriously figuring out
      bit patterns which represented the series of instructions
      desired.  Needless to say, this technique was incredibly tedious,
      and very error prone.  It finally occurred to these
      ancestral programmers that they could give the task of figuring
      out the proper patterns to the computer itself, and assembly
      language programming was born.

      Assembly language represents each of the many operations that the
      computer can do with a *mnemonic*, a short, easy to remember
      series of letters.  For example, in boolean algebra, the logical
      operation which inverts the state of a bit is called "not", and
      hence the assembly language equivalent of the preceding machine
      language pattern is:

          NOTB [5555]

      The brackets around the 5555 roughly mean "the memory location
      addressed by".  The "B" at the end of "NOTB" indicates that we
      want to operate on a byte of memory, not a word.

      Unfortunately, the 8088 can't make head nor tail of the string of
      characters "NOTB".  What's needed is a special program to run on
      the 8088 which converts the string "NOTB" into the pattern F6 16.
      This program is called an assembler.  A good analogy is that an
      assembler program is like a meat grinder which takes in assembly
      language and gives out machine language.
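
      To give a feel for what the meat grinder does, here is a toy
      "assembler" in Python which understands exactly one kind of
      line.  It is a sketch for illustration only and has nothing to
      do with how CHASM itself is written.  The function name is made
      up, and the number in brackets is treated as nybble code, as in
      the example above.  One fact not spelled out in the text is that
      the 8088 stores the two bytes of an address low byte first.

```python
def assemble_notb(line):
    """Translate a line like 'NOTB [5555]' into machine language bytes."""
    mnemonic, operand = line.split()
    if mnemonic.upper() != "NOTB":
        raise ValueError("this toy only knows NOTB")
    if not (operand.startswith("[") and operand.endswith("]")):
        raise ValueError("expected a memory address in brackets")
    address = int(operand[1:-1], 16)  # nybble code between the brackets
    low, high = address & 0xFF, (address >> 8) & 0xFF
    # F6 16 is the pattern quoted in the text for "flip the bits of a
    # byte in memory"; the address follows, low byte first.
    return bytes([0xF6, 0x16, low, high])


code = assemble_notb("NOTB [5555]")
print(" ".join("%02X" % b for b in code))  # F6 16 55 55
```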

      Typically, an assembler reads a file of assembly language and
      translates it one line at a time, outputting a file of machine
      language.  Often times the input file is called the *source file*
      and the output file is called the *object file*.  The machine
      language patterns produced are called the *object code*.




















                                                                     10

      Also produced during the assembly process is a *listing*, which
      summarizes the results of the assembly process.  The listing
      shows each line from the source file, along with the shorthand
      "nybble code" representation of the object code produced.  In the
      event that the assembler was unable to understand any of the
      source lines, it inserts error messages in the listing, pointing
      out the problem.

      The primeval assembly language programmers had to write their
      assembler programs in machine language, because they had no other
      choice.  Not being a masochist, I wrote CHASM in BASIC.  When you
      think about it, there's a sort of circular logic in action here.
      Some programmers at Microsoft wrote the BASIC interpreter in
      assembly language, and I used BASIC to write an assembler.
      Someday, I hope to use the present version of CHASM to produce a
      machine language version, which will run about a hundred times
      faster, and at the same time bring this crazy process full
      circle.














































                                                                     11

      >>THE 8088<<

      The preceding discussions have (I hope) given you some very
      general background, a world view if you will, about assembly and
      machine language programming.  At this point, I'd like to get
      into a little more detail, beginning by examining the internal
      structure of the 8088 microprocessor, from the programmer's point
      of view.  This discussion is a condensation of information which
      I obtained from "The 8086 Book" which was written by Russell
      Rector and George Alexy, and published by Osborne/McGraw-Hill.
      Once you've digested this, I'd recommend going to The 8086 Book
      for a deeper treatment.  To use the CHASM assembler, you're going
      to need The 8086 Book anyway, to tell you the different 8088
      instructions and their mnemonics.

      Inside the 8088 are a number of *registers* each of which can
      hold a 16 bit pattern.  In assembly language, each of the
      registers has a two letter mnemonic name.  There are 14
      registers, and their mnemonics are:

      AX BX CX DX     SP BP    SI DI     CS DS SS ES    PC ST

      Each of the registers is a little different and has a different
      intended use, but they can be grouped into some broad classes.

      The *general purpose* registers (AX BX CX DX) are just that.
      These are registers which hold patterns pulled in from memory
      which are to be worked on within the 8088.  You can use these
      registers for just about anything you want.

      Each of the general purpose registers can be broken down into two
      8 bit registers, which have names of their own.  Thus, the CX
      register is broken down into the CH and CL registers.  The "H"
      and "L" stand for high and low respectively.  Each general
      purpose register breaks down into a high/low pair.
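
      The high/low split is just a matter of which eight bits of the
      word you look at.  A short Python sketch, with made-up function
      names and an arbitrary example pattern:

```python
def split_word(word):
    """Return the (high, low) byte halves of a 16 bit pattern."""
    return (word >> 8) & 0xFF, word & 0xFF


def join_bytes(high, low):
    """Rebuild the 16 bit pattern from its two halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


cx = 0x12AB                 # pretend this pattern is in CX
ch, cl = split_word(cx)     # what you would see in CH and CL
print("%02X %02X" % (ch, cl))  # 12 AB
```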

      The AX register, and its 8 bit low half, the AL register, are
      somewhat special.  Mainly for historical reasons, these registers
      are referred to as the 16 bit and 8 bit *accumulators*.  Some
      operations of the 8088 can only be carried out on the contents of
      the accumulators, and many others are faster when used in
      conjunction with these registers.






















                                                                     12

      Another group of registers is the *segment* registers (CS DS SS
      ES).  These registers hold segment values for use in calculating
      memory addresses.  The CS, or code segment register, is used
      every time the 8088 accesses memory to read an instruction
      pattern.  The DS, or data segment register, is used for bringing
      data patterns in.  The SS register is used to access the stack
      (more about the stack later).  The ES is the extra segment
      register.  A very few special instructions use the ES register to
      access memory, plus you can override use of the DS register and
      substitute the ES register, if you need to maintain two separate
      data areas.

      The *pointer* (SP BP) and *index* (DI SI) registers are used to
      provide indirect addressing, which is a very powerful technique
      for accessing memory.  Indirect addressing is beyond the scope of
      this little primer, but is discussed in The 8086 Book.  The SP
      register is used to implement a stack in memory (again, more
      about the stack later).  Besides their special function, the BP,
      DI and SI registers can be used as additional general purpose
      registers.  Although it's physically possible to directly
      manipulate the value in the SP register, it's best to leave it
      alone, since you could wipe out the stack.

      Finally, there are two registers which are relatively
      inaccessible to direct manipulation.  The first is the *program
      counter*, PC.  This register always contains the offset part of
      the address of the next instruction to be executed.  Although
      you're not allowed to just move values into this register, you
      *can* indirectly affect its contents, and hence the next
      instruction to be executed, using operations which are equivalent
      to BASIC's GOTO and GOSUB instructions.  Occasionally, you will
      see the PC referred to as the *IP*, which stands for instruction
      pointer.

      The last register is also relatively inaccessible.  This is the
      *status* register, ST.  This one has *two* alternate names, so
      watch for FL (flag register) and PSW (program status word).  The
      latter is somewhat steeped in history, since this was the name
      given to a special location in memory which served a similar
      function on the antique IBM 360 mainframe.
























                                                                     13

      The status register consists of a series of one bit *flags* which
      can affect how the 8088 works.  There are special instructions
      which allow you to set or clear several of these flags.  In
      addition, many instructions affect the state of the flags,
      depending on their outcome.  For example, one of the bits of the
      status register is called the Zero flag.  Most arithmetic and
      logical operations which end up generating a bit pattern of all
      0's automatically set the Zero flag on.

      Setting the flags doesn't seem to do much, until you know that
      there is a whole set of conditional branching instructions which
      cause the equivalent to a BASIC GOTO if the particular flag
      pattern they look for is set.  In assembly language, the only way
      to make a decision and branch accordingly is via this flag
      testing mechanism.

      Although some instructions implicitly affect the flags, there are
      a series of instructions whose *only* effect is to set the flags,
      based on some test or comparison.  It's very common to see one
      of these comparison operations used to set the flags just before
      a conditional branch.  Taken together, the two instructions are
      exactly equivalent to BASIC's:

      IF (comparison) THEN GOTO (linenumber)
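
      The two step dance of "set the flags, then branch on them" can
      be imitated in Python.  This is a rough sketch which models only
      the Zero flag; the function names are invented for the example
      and are not 8088 mnemonics.

```python
def compare(a, b):
    """Imitate a comparison: subtract, throw the answer away, keep the flag."""
    result = (a - b) & 0xFFFF       # work with 16 bit patterns
    return {"zero": result == 0}    # Zero flag is on if the result is all 0's


def equal_or_different(ax, bx):
    flags = compare(ax, bx)         # step 1: a comparison sets the flags
    if flags["zero"]:               # step 2: a conditional branch tests them
        return "equal"
    return "different"


print(equal_or_different(5, 5))
print(equal_or_different(5, 6))
```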








































                                                                     14

      >>ASSEMBLY LANGUAGE SYNTAX<<

      In general, each line of an assembly language program translates
      to a set of patterns which specify one fundamental operation for
      the 8088 to carry out.

      Each line may consist of one or more of the following parts:

      First, a label, which is just a marker for the assembler to use.
      If you want to branch to an instruction from some other part of
      the program, you put a label on the instruction.  When you want
      to branch, you refer to the label.  In general, the label can be
      any string of characters you want.  A good practice is to use a
      name which reminds you what that particular part of the program
      does. CHASM will assume that any string of characters which
      starts in the first column of a line is intended to be a label.

      After the label, or if the text of the line starts to the right
      of the first column, at the beginning of the text, comes an
      instruction mnemonic.  This specifies the operation that the line
      is asking for.  For a list of the 200-odd mnemonics, along with
      the instructions they stand for, see The 8086 Book.

      Most of the 8088 instructions require that you specify one or
      more *operands*.  The operands are what the operation is to work
      on, and are listed after the instruction mnemonic.

      There are a number of possible operands.  Probably the most
      common are registers, specified by their two letter mnemonics.

      Another operand type is *immediate data*, a pattern of bits to be
      put somewhere or compared or combined with some other pattern.
      Generally immediate data is specified by its nybble code
      representation, marked as such by following it with the letter
      "H".  Some assemblers allow alternate ways to specify immediate
      data which emphasize the pattern's intended use.  CHASM
      recognizes five different ways to represent immediate data.



























                                                                     15

      A memory location can be used as an operand.  We've seen one way
      to do this, by enclosing its address in brackets.  (You can now
      see why the brackets are needed.  Without them, you couldn't
      distinguish between an address and immediate data.)  If you've
      asked the assembler to set aside a section of memory for data
      (more on this later), and put a label on the request, you can
      specify that point in memory by using the label.  Finally, there
      are a number of indirect ways to address memory locations, which
      you can read about in The 8086 Book.

      The last major type of operand is the label.  Branching
      instructions require an operand to tell them where to branch
      *to*.  In assembly language, you specify locations which may be
      branched to by putting a label on them.  You can then use the
      label as an operand on branches.

      Often times, the order in which the operands are listed can be
      important.  For example, when moving a pattern from one place to
      another, you need to specify where the pattern is to come from,
      and where it's going.  The convention in general use is that the
      first operand is the *destination* and the second is the
      *source*.  Thus, to move the pattern in the DX register into the
      AX register, you would write:

              MOV AX,DX

      This may take some getting used to, since when reading from left
      to right it seems reasonable to assume that the transfer goes in
      this direction as well.  However, since this convention is pretty
      well entrenched in the assembly language community, CHASM goes
      along with it.

      The last part of an assembly language line is a *comment*.  The
      comment is totally ignored by the assembler, but is *vital* for
      humans who are attempting to understand the program.  Assembly
      language programs tend to be very hard to follow, and so it's
      particularly important to put in lots of comments so that you'll
      remember just what it was you were trying to do with a given
      piece of code.  Professional assembly language programmers put a
      comment on *every* line of code, explaining what it does, and
      devote many entire lines to additional explanations.  For an
      example, you should examine the BIOS source listing given in the
      IBM Technical Reference manual.  Over *half* the text consists of
      comments!




















                                                                     16

      Since the assembler ignores the comments, they cost you nothing
      in terms of size or speed of execution in the resulting machine
      language program.  This is in sharp contrast to BASIC, where each
      remark slows your program down and eats up precious memory.

      Generally, a character is set aside to indicate to the assembler
      the beginning of a comment, so that it knows what to skip.  CHASM
      follows a common convention of reserving the semi-colon (;) for
      marking comments.























































                                                                     17

      >>THE STACK<<

      I've been dropping the name *stack* from time to time.  The stack
      is just a portion of memory which has been temporarily set aside
      to be used in a special way.

      To get a picture of how the stack works, think of the spring
      loaded contraptions you sometimes see holding trays in a
      cafeteria.  As each tray is washed, the busboy puts it on top of
      the stack in the contraption.  Because the thing is spring
      loaded, the whole stack sinks down from the weight of the new
      tray, and the top of the stack ends up always being the same
      height off the floor.  When a customer takes a tray off the
      stack, the next one rises up to take its place.

      In the computer, the stack is used to hold data patterns, which
      are generally being passed from one program or subroutine to
      another.  When things are put on the stack, the receiving routine
      doesn't need to know a particular address to look for the
      information it needs; it just pulls that information off the top
      of the stack.

      There is some jargon associated with use of the stack.  Patterns
      are *pushed* onto the stack, and *popped* off.  Accordingly,
      there is a set of PUSH and POP instructions in the 8088's
      repertoire.

      Because you don't need to keep track of where the patterns are
      actually being kept, the stack is often used as a scratch pad
      area, patterns being pushed when the register they're in is
      needed for some other purpose, then popped out when the register
      is free.  It's very common for the first few instructions of a
      subroutine to be a series of pushes to save the patterns which
      are occupying the registers it's about to use.   This is referred
      to as *saving the state* of the registers.  The last thing the
      subroutine will do is pop the patterns back into the registers
      they came from, thus *restoring the state* of the registers.

      Following the analogy of the cafeteria contraption, when you pop
      the stack, the pattern you get is the last one which was pushed.
      When you pop a pattern off, the next-to-last thing pushed
      automatically moves to the top, just as the trays rise up when a
      customer removes one.  Everything comes off the stack in the
      reverse of the order in which it went on.  Sometimes you'll see
      the phrase "last in, first out" or *LIFO stack*.
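
      Here is a small Python sketch of a subroutine saving and
      restoring the state of two registers.  A Python list stands in
      for the stack and a dictionary for the registers; the register
      values and function names are made up for illustration.

```python
stack = []
registers = {"AX": 0x1111, "BX": 0x2222}


def push(name):
    """Put the pattern from a register on top of the stack."""
    stack.append(registers[name])


def pop(name):
    """Take the top pattern off the stack and put it in a register."""
    registers[name] = stack.pop()


def subroutine():
    push("AX")               # save the state of the registers...
    push("BX")
    registers["AX"] = 0      # ...use them for something else...
    registers["BX"] = 0
    pop("BX")                # ...then restore, last pushed first
    pop("AX")


subroutine()
print(registers)
```

      Notice that the pops are written in the opposite order from the
      pushes.  Popping AX first would leave the old BX pattern in AX.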



















                                                                     18


      Of course, there are no special spring loaded memory locations
      inside the computer.  The stack is implemented using a register
      which keeps track of where the top of the stack is currently
      located.  When you push something, the pointer is moved to the
      next available memory location, and the pattern is put in that
      spot.  When something is popped, it is copied from the location
      pointed at, then the pointer is moved back.  You don't have to
      worry about moving the pointer because it's all done
      automatically with the push and pop instructions.
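
      The pointer mechanism can be sketched in Python as follows.
      This is a simplified model, not a description of the chip's
      internals: it ignores the segment part of the address, does no
      checking for running off either end of the stack, and uses a
      made-up class name.  Two facts about the 8088 are built in: the
      stack grows toward lower addresses, and each push or pop moves
      the pointer by one word (two bytes).

```python
class ToyStack:
    def __init__(self, size=256):
        self.memory = bytearray(size)  # the area set aside for the stack
        self.sp = size                 # empty: pointer sits just past the area

    def push(self, word):
        self.sp -= 2                   # move to the next available location
        self.memory[self.sp] = word & 0xFF             # low byte
        self.memory[self.sp + 1] = (word >> 8) & 0xFF  # high byte

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                   # move the pointer back
        return word


s = ToyStack()
s.push(0x1234)
s.push(0xABCD)
print("%04X" % s.pop())  # ABCD, the last pattern pushed
print("%04X" % s.pop())  # 1234
```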

      The register set aside to hold the pointer is SP, and that's why
      you don't want to monkey with SP.  You'll recall that to form an
      address, two words are needed, an offset and a segment.  The
      segment word for the stack is kept in the SS register, so you
      should leave SS alone as well.  When you run the type of machine
      language program that CHASM produces, DOS will automatically set
      the SP and SS registers to reserve a stack capable of holding 128
      words.
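
      As a worked example of the pointer's movement, suppose SP
      happens to hold 0100H (a value picked only for illustration)
      when the following instructions are carried out.  On the 8088
      each push moves the pointer down by one word (two bytes), and
      each pop moves it back up:

              PUSH AX        ;SP becomes 00FEH, AX copied to SS:00FEH
              PUSH BX        ;SP becomes 00FCH, BX copied to SS:00FCH
              POP  BX        ;BX reloaded from SS:00FCH, SP is 00FEH
              POP  AX        ;AX reloaded from SS:00FEH, SP is 0100H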


      >>SOFTWARE INTERRUPTS<<

      I have been religiously avoiding talking about the various
      individual instructions the 8088 can carry out, because if I
      didn't, this little primer would soon grow into a rather long
      book.  However, there's one very important instruction which,
      when you read about it in The 8086 Book, won't seem particularly
      useful.  This section will discuss the *software interrupt*
      instruction, INT, and why it's so important.

      The 8088 reserves the first 1024 bytes of memory for a series of
      256 *interrupt vectors*.  Each of these two-word-long interrupt
      vectors is used to store the segment:offset address of a location
      in memory.  When you execute a software interrupt instruction,
      the 8088 pushes the flags and the location of the next
      instruction of your program onto the stack, then branches to the
      memory location pointed at by the vector specified in the
      interrupt.
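
      As a worked example of where the vectors live: each vector
      takes up four bytes, so the vector for interrupt number n
      starts at absolute address n x 4.  For three of the interrupts
      that come up in this primer, that gives:

          Interrupt     Offset word at     Segment word at
            10H             00040H             00042H
            20H             00080H             00082H
            21H             00084H             00086H

      Within each vector the offset word comes first in memory,
      followed by the segment word.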

      This probably seems like a rather awkward way to branch around in
      memory, and chances are you'd never use this method to get from
      one part of your program to another.  The way these instructions
      become important is that IBM has pre-loaded a whole series of
      useful little (and not so little) machine language routines into
      your computer, and set the interrupt vectors to point to them.
      All of these routines are set up so that after doing their thing,
      they use the location pushed on the stack by the interrupt
      instruction to branch back to your program.

      Some of these routines are a part of DOS, and documentation for
      them can be found in Appendix D of the DOS manual.  The rest of
      them are stored in ROM (read only memory) and comprise the
      *BIOS*, or basic input/output system of the computer.  Details of
      the BIOS routines can be found in Appendix A of IBM's Technical
      Reference Manual.  IBM charges around $40 for Technical
      Reference, but the information in Appendix A alone is easily
      worth the money.

      The routines do all kinds of useful things, such as run the disk
      drive for you, print characters on the screen, or read data from
      the keyboard.  In effect, the software interrupts add a whole
      series of very powerful operations to the 8088 instruction set.
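
      A minimal sketch of calling two BIOS routines, assuming the
      register conventions I understand the BIOS listing to give for
      keyboard interrupt 16H and video interrupt 10H (check them
      against Technical Reference before relying on them).  It waits
      for a key, then prints that key's character on the screen:

              MOV AH, 0      ;keyboard function 0: wait for a key
              INT 16H        ;BIOS returns the ASCII code in AL
              MOV AH, 14     ;video function 14: write character in AL
              MOV BH, 0      ;use display page 0
              INT 10H        ;BIOS prints it and advances the cursor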

      A final point is that if you don't like the way that DOS or the
      BIOS does something, the vectored interrupt system makes it very
      easy to substitute your own program to handle that function.  You
      just load your program and reset the appropriate interrupt vector
      to point at your program rather than the resident routine.  This
      is how all those RAM disk and print spooler programs work.  The
      programs change the vector for disk drive or printer support to
      point to themselves, and carry out the operations in their own
      special way.

      To make things easy for you, one of the DOS interrupt routines
      has the function of resetting interrupt vectors to point at new
      code.  Still another DOS interrupt routine is used to graft new
      code onto DOS, so that it doesn't accidentally get wiped out by
      other programs.  The whole thing is really quite elegant and easy
      to use, and IBM is to be complimented for setting things up this
      way.
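
      A hedged sketch of how the two routines fit together, assuming
      DOS function 25H (set interrupt vector) and interrupt 27H
      (terminate but stay resident) behave as described in Appendix D
      of the DOS manual.  The names HANDLER and ENDRES and the choice
      of vector 60H are made up for the illustration, and the new
      "routine" here does nothing but return:

              MOV AH, 25H              ;DOS function 25H: set a vector
              MOV AL, 60H              ;number of the vector to change
              MOV DX, OFFSET(HANDLER)  ;DS:DX = address of new routine
              INT 21H                  ;call DOS
              MOV DX, OFFSET(ENDRES)   ;DX = end of the code to keep
              INT 27H                  ;return to DOS, staying resident

      HANDLER IRET                     ;new routine: just go back
      ENDRES  DB  0                    ;marks the end of the kept code

      After this runs, any program executing INT 60H would branch to
      HANDLER and come straight back.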


      >>PSEUDO-OPERATIONS<<

      Up to this point, I've implied that each line of an assembly
      language program gets translated into a machine language
      instruction.  In fact, this is not always the case.  Most
      assemblers recognize a series of *pseudo-operations* which are
      handled as embedded commands to the assembler itself, not as
      instructions in the machine language program being built.
      Almost invariably you'll see the phrase "pseudo-operation"
      abbreviated down to *pseudo-op*.  Sometimes you'll see
      *assembler directive*, which means the same thing, but just
      doesn't seem to roll off the tongue as well as pseudo-op.

      One very common pseudo-op is the *equate*, usually given the
      mnemonic *EQU*.  It allows you to assign a name to a
      frequently used constant.  Thereafter, anywhere you use that
      name, the assembler automatically substitutes the equated
      constant.  This process makes your program easier to read, since
      in place of the somewhat meaningless looking pattern, you see a
      name which tells you what the pattern is for.  It also makes your
      program easier to modify, since if you decide to change the
      constant, you only need to do it once, rather than all over the
      program.
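
      For instance, the numbers used for DOS calls could be given
      names.  This is only a sketch: the names PRINT and DOS are
      made up for the example, and the label-then-EQU layout shown
      is an assumption; check the User's Manual for the exact form
      CHASM expects.

      PRINT   EQU 9          ;DOS function: print a string
      DOS     EQU 21H        ;interrupt number for DOS calls

              MOV AH, PRINT  ;assembles exactly like  MOV AH, 9
              INT DOS        ;assembles exactly like  INT 21H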

      The only other pseudo-ops I'll talk about here are those for
      setting aside memory locations for data.  These pseudo-ops tend
      to be quite idiosyncratic to each assembler.  CHASM implements
      two such pseudo-ops: DB (declare bytes) and DS (declare
      storage).  DB is used to set aside small data areas, which can be
      initialized to any pattern, one byte at a time.  DS sets up
      relatively large areas, but all the locations are filled with the
      same initial pattern.

      If you put a label on a pseudo-op which sets aside data areas,
      most assemblers allow you to use the label as an operand, in
      place of the actual address of the location.  The assembler
      automatically substitutes the address for the name during the
      translation process.
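
      A small sketch of these pseudo-ops with labels attached.  The
      names TABLE and BUFFER are made up, and the operand forms
      (a list of values for DB, a count for DS) are assumptions on
      my part; the User's Manual has the exact rules:

      TABLE   DB  1, 2, 3, 4           ;four bytes, each set separately
      BUFFER  DS  80                   ;80 bytes, all set the same

              MOV DX, OFFSET(BUFFER)   ;CHASM fills in BUFFER's address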

      Some assemblers have a great number of pseudo-ops.  CHASM
      implements a few more, which aren't discussed here.


      >>TUTORIAL<<

      To conclude this primer, this section will walk through the
      process of writing, assembling, and running a very simple
      program.

      Our program will just print a message on the video screen, and
      then return to DOS.  Although very simple, this program will
      demonstrate a number of points, including a DOS function call,
      setting aside memory for storage, and good programming form.

      Appendix D of the DOS manual discusses the various DOS
      functions and interrupts available to the assembly language
      programmer.   To print a text string to the video screen, we'll
      use function 9.  You should read the documentation for this
      function in your DOS manual at this time.


      Did it make any sense?  To use this function, we have to load the
      DX register with the address of a string in memory, specify
      function 9 by loading a "9" into the AH register, then ask DOS to
      do the printing by executing interrupt 21H.  Basically, we just
      set things up, and DOS does all the real work.

      Here's the code to do this:

           MOV AH, 9                 ;specify DOS function 9
           MOV DX, OFFSET(MESSAGE)   ;get address of string
           INT 21H                   ;call DOS

      Note that none of the lines starts at the left margin (column
      one). If they did, CHASM would think that the instruction
      mnemonic was meant to be a label, and would get very confused.
      Also note that each line has a comment explaining what's going
      on.

      The second line needs a little explaining.  CHASM's OFFSET
      function returns the address of whatever is included in the
      parentheses, in this case, MESSAGE.  The assumption made here is
      that later in the program we will set aside a memory location
      containing our string to be printed, and give it the name
      "MESSAGE".

      Once we've printed our string, we want to return to DOS.  If we
      don't explicitly transfer control back to DOS, the 8088 will
      happily continue to execute whatever random patterns are in
      memory after our stuff.  Remember "crashing the system"?  One of
      DOS's reserved interrupts handles program termination, returning
      you to DOS.  The proper instruction is:

             INT 20H        ;return to DOS

      All that's left at this point is to set aside a chunk of memory
      containing a string to be printed.  We'll use CHASM's DB (Declare
      Bytes) pseudo-op to do this:

      MESSAGE DB  'Hello, World!$'     ;message to be printed

      The memory location is given the name "MESSAGE" because the line
      started with "MESSAGE" as a label.  Now CHASM will know that the
      preceding OFFSET was talking about this memory location.  You
      don't need to worry about what the actual address of MESSAGE is;
      CHASM takes care of that.

      Fourteen bytes of memory get set aside here, containing the ASCII
      codes for the characters in "Hello, World!$".  Note that the
      string ends in the character "$".  DOS function 9 prints
      characters until it encounters a "$", at which point it stops.
      If you forget to put the "$" at the end of your string, you'll
      have the less-than-amusing experience of watching DOS attempt to
      print out the entire contents of memory.

      Bringing everything together, and adding a few comments at the
      beginning, here's our complete program:

      ;=====================================;
      ; HELLO    Version 1.00               ;
      ;          1984 by David Whitman      ;
      ;                                     ;
      ; Sample source file for CHASM.       ;
      ; Prints a greeting on the console.   ;
      ;=====================================;

              MOV AH, 9                 ;specify DOS function 9
              MOV DX, OFFSET(MESSAGE)   ;get address of string
              INT 21H                   ;call DOS

              INT 20H                   ;return to DOS

      MESSAGE DB  'Hello, World!$'      ;message to be printed

      After writing all this, we need to create a text file which
      contains the lines of our program.  You do this with a text
      editor or word processor.  (Of course, in real life you write the
      program using the editor in the first place.)

      CHASM likes its source files in "standard DOS" format, what some
      word processors call "document" or "ASCII mode".  Most word
      processors, and all straight text editors work in this format
      automatically.  WordStar and EasyWriter (and probably a few other
      packages) have their own special formats, but their manuals
      should tell you how to make standard DOS files.

      At this point, make a standard DOS file named HELLO.ASM which
      contains the above program lines.  If you're feeling lazy, or if
      you run into problems, the file EXAMPLE.ASM on your CHASM disk
      has these lines already entered for you.  Just copy EXAMPLE.ASM
      into a new file called HELLO.ASM and you're in business.

      It's now time to assemble the program.  To start out, you have to
      set up a CHASM disk.  Follow the directions in the User's Manual
      under "Setting up a CHASM Work Disk", or for now, just copy the
      file BASIC.COM from your DOS disk onto your CHASM distribution
      disk.  Copy HELLO.ASM onto this disk as well. 

      Put the CHASM disk into your default drive, and start up CHASM by
      typing its name:

         A> CHASM

      CHASM will respond by printing a hello screen, and ask you to
      press a key when you're done reading it.  When you do so, CHASM
      will ask you some questions:

          Source code file name? [.asm]

      Type in HELLO.ASM, or just HELLO, then hit return.  (If you
      don't enter the file extension, CHASM assumes that it's .ASM.)

          Direct listing to Printer (P), Screen (S), or Disk (D)?

      CHASM wants to know where to send the listing produced during the
      assembly process.  If you have a printer, turn it on, and then
      press P.  If you don't have a printer, press S.

      The last question is:

          Name for object file? [hello.com]

      CHASM is asking for the name you'd like to give to the machine
      language program which is about to be produced.  Just press enter
      here.  CHASM will name the program HELLO.COM.

      At this point CHASM will start accessing the disk drive, reading
      in your program a line at a time.  A status line will appear at
      the bottom of your screen, telling you how far along the
      translation has gotten.  For this program, the whole process
      takes about a minute.

      If the listing went to your printer, CHASM automatically returns
      you to DOS when it's finished.  If it went to the screen, CHASM
      waits for you to press a key to indicate that you're done
      reading.  Near the bottom of the listing will be the message:

      XXX Diagnostics Offered
      YYY Errors Detected

      If both numbers are 0, everything went fine.  If not, look back
      through the listing for error messages, which will point out the
      offending lines.  At this point, don't worry too much about what
      the error messages say; just fix the line in your input file to
      look like the text developed above.  Once you manage to get an
      assembly with no errors, you're ready to go on.

      Your disk will now contain a machine language program named
      HELLO.COM.  Confirm this by typing DIR to get a directory
      listing.  You should see the new program file listed.

      To run the machine language program, you just type its name,
      without the .COM extension.  (Note: even though you don't *enter*
      the "COM", the file has to have this extension for DOS to
      recognize it as a machine language program.)  Try it out now.
      From the DOS prompt, type: HELLO.  Your disk drive will whir for
      a second, then the message "Hello, World!" will appear.

      For a further exercise, you might try printing a carriage return
      and then a line feed before the message, to space it down the
      screen a little.  Carriage return has ASCII code 13, and line
      feed is 10.  Read the CHASM User's Manual about the DB pseudo-op,
      and add these two characters at the beginning of MESSAGE using
      their decimal representations.

      Try writing a new program called BEEP which writes the "bell"
      character to the screen.  You can use BEEP to signal the end of
      long batch files, or to annoy your co-workers.  Resist the urge
      to program a loop into BEEP.

      An even more advanced exercise would be to clear the screen
      before printing a message.  The easiest way to do this is to
      use a BIOS function, VIDEO_IO, documented on pages A-43 through
      A-44 of Technical Reference.  The comments in the BIOS listing
      tell you what values to load into which registers to get VIDEO_IO
      to monkey with the screen for you.  Load the registers and
      execute INT 10H, once to blank the screen, and again to
      move the cursor to the upper left hand corner.
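
      If you get stuck, here is one possible sketch.  It assumes an
      80 column by 25 line text screen and the VIDEO_IO register
      conventions as I understand them (function 6 to scroll a
      window, function 2 to set the cursor); confirm the details
      against the BIOS listing before trusting it:

              MOV AH, 6      ;video function 6: scroll window up
              MOV AL, 0      ;0 lines = blank the whole window
              MOV BH, 7      ;attribute for the blanked area (normal)
              MOV CX, 0      ;upper left corner: row 0, column 0
              MOV DH, 24     ;lower right corner: row 24,
              MOV DL, 79     ;  column 79
              INT 10H        ;call BIOS

              MOV AH, 2      ;video function 2: set cursor position
              MOV BH, 0      ;display page 0
              MOV DX, 0      ;row 0, column 0
              INT 10H        ;call BIOS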

      If you've read all of this primer and run the above program,
      maybe modifying it a little, you're no longer a rank beginner.
      At this point you should have enough of a start to be able to
      digest the CHASM User's Manual and The 8086 Book, then begin to
      write your own programs.  Good Luck!



















