====== MASM basics ======
An assembler, understood as software that translates assembly source code into machine code, can be implemented in various ways. While the processor's instructions remain the same, other language elements may be implementation-specific. In this chapter, we present the most important language elements specific to the Microsoft Macro Assembler (MASM). For a detailed description of the MASM language, please refer to the Microsoft© website.
===== Alphabet =====
An alphabet is the set of characters which can be used in writing programs. In MASM, it includes lowercase and uppercase letters, digits, hidden (whitespace) characters and special characters.
Special characters may have a defined meaning, and not all of them can be used freely in the program:
  * , – comma – separates the operands,
  * '…' – apostrophe – text delimiter,
  * "…" – quotation marks – text delimiter,
  * (…) – round brackets – determine the order of evaluation in expressions,
  * ; – semicolon – begins a comment,
  * : – colon – delimits labels and segment prefixes,
  * . – dot – used to access structure (record) fields, begins some of the directives,
  * & – ampersand – used in macros to replace a formal argument with an actual value,
  * % – percent – expansion operator used in macros,
  * <…> – angle brackets – text delimiter used in macros,
  * […] – square brackets – used in address expressions,
  * $ – dollar sign – current value of the location counter,
  * = – equal sign – directive to define constants,
  * ? – question mark – undefined (uninitialised) value,
  * @ – at sign – begins predefined names,
  * _ – underscore – often used in symbolic names instead of a space,
  * + - * / – mathematical operators.
Hidden ASCII characters:
  * 0Dh, 0Ah – CR LF (Enter) – end of line,
  * 20h – space – separates items in a line,
  * 09h – tab – used instead of spaces to improve the readability of the code.
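The following fragment is a minimal sketch showing where several of the special characters listed above typically appear in MASM source code. The names count, msg, buf and start are hypothetical, and the fragment is not a complete program.

<code asm>
.DATA                           ; '.' begins some directives
count   = 5                     ; '=' defines a numeric constant
msg     DB 'Hi', 0              ; ',' separates operands, '...' delimits text
buf     DB count DUP (?)        ; '?' is an undefined value, (...) groups the DUP initial value
.CODE
start:                          ; ':' ends a label
        mov  AL, [RSI + 2*4]    ; [...] forms an address expression, '+' and '*' are operators
</code>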
===== Keywords and symbolic names =====
Reserved words, also known as keywords, are words that have a special meaning in MASM. They represent elements of the language defined by the MASM creators, and their use is reserved for these purposes only. They can't be used as names of labels, variables, constants or similar user-defined items. They include:
  * instructions,
  * directives,
  * attributes,
  * operators,
  * predefined symbols.
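A short sketch of what this rule means in practice. The names are hypothetical; the two commented-out lines would be rejected by the assembler, because LOOP is an instruction mnemonic and SIZE is an operator.

<code asm>
.DATA
counter  DB 0        ; valid: counter is not a reserved word
; loop   DB 0        ; invalid: LOOP is an instruction mnemonic
; size   DW 0        ; invalid: SIZE is a type operator
</code>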
Symbolic names are words defined by the programmer to identify elements of the program. They are used to name constants, variables, addresses, segments and other items of the source code. Certain rules must be followed when creating a symbolic name. A symbolic name can't begin with a digit and can consist of letters, digits and four special characters: $, @, _ and ?. By default, uppercase and lowercase letters are treated as the same, and only the first 31 characters are recognised.
Examples of valid symbolic names:
  * ABC123_4
  * Number_602602602_@1
  * ?quest
  * $125
  * __Right_Here
Examples of invalid symbolic names:
  * 12_cats
  * ?
  * 'name'
  * Hello.world
  * Right Here
===== Operators =====
Operators are used in expressions evaluated during assembly. They allow arithmetic and logical calculations on numeric and address expressions. Some operators are associated with macros or other specific elements of the assembler language. We'll present some of them here; for details about all operators, please refer to the MASM documentation.
The operators which can be used in numeric expressions are:
  * *, / - multiplication and division
  * MOD - remainder of an integer division
  * SHL, SHR - shift left and right
  * OR, XOR, AND, NOT - logical functions
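A minimal sketch of these operators in constant definitions; all constant names are hypothetical. The expressions are evaluated by the assembler, so the processor only sees the final numbers, and the OR, AND and SHL used here are operators rather than the instructions of the same name.

<code asm>
FLAG_A   = 1 SHL 3              ; 8
FLAG_B   = 1 SHL 0              ; 1
BITS_AB  = FLAG_A OR FLAG_B     ; 9
REM_X    = 17 MOD 5             ; 2
HALF_X   = 17 / 2               ; 8, integer division
LOW_ONES = (NOT 0) AND 0FFh     ; 255
.CODE
        mov  AL, BITS_AB        ; assembled as mov AL, 9
</code>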
The operators which can be used in both numeric and address expressions are:
  * +, - - addition and subtraction
  * HIGH, LOW - high/low 8 bits of a 16-bit value or address
  * HIGHWORD, LOWWORD - high/low 16 bits of a 32-bit value or address
  * HIGH32, LOW32 - high/low 32 bits of a 64-bit value or address
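A sketch of how these operators split constants, assuming the 64-bit assembler (ml64), in which MASM expressions are 64-bit values. The constant names are hypothetical; the comments give the immediate values the assembler places in each instruction.

<code asm>
VAL16   = 1234h
VAL32   = 12345678h
VAL64   = 1122334455667788h
.CODE
        mov  AH, HIGH VAL16        ; AH = 12h
        mov  AL, LOW VAL16         ; AL = 34h, so AX = 1234h
        mov  BX, HIGHWORD VAL32    ; BX = 1234h
        mov  CX, LOWWORD VAL32     ; CX = 5678h
        mov  R8D, HIGH32 VAL64     ; R8D = 11223344h
        mov  R9D, LOW32 VAL64      ; R9D = 55667788h
</code>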
The type operators determine the number of bytes in a single variable, the number of elements in a data array, or the size of a whole data array ##REF:masmtypeoperators##. They are very useful because, for example, the number of iterations needed to process a string of characters or a data array is recalculated automatically when its length changes.
<table masmtypeoperators>
<caption>MASM type operators</caption>
^ operator ^ value ^
| TYPE | type (number of bytes in one variable) |
| LENGTH | number of elements in one-dimensional array |
| SIZE | number of bytes in one-dimensional array |
| LENGTHOF | number of elements in a multi-dimensional array |
| SIZEOF | number of bytes in a multi-dimensional array |
| PTR | type cast operator |
</table>
<note>
The following relationships hold:
  * SIZE = TYPE * LENGTH
  * SIZEOF = TYPE * LENGTHOF
</note>
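A minimal sketch, using the hypothetical array table_q, of how LENGTHOF and TYPE let a loop adapt to the array definition. It sums five quadwords, taking the iteration count from LENGTHOF and the pointer step from TYPE.

<code asm>
.DATA
table_q  DQ 1, 2, 3, 4, 5          ; TYPE = 8, LENGTHOF = 5, SIZEOF = 40
.CODE
        lea  RSI, table_q          ; RSI points to the first element
        mov  RCX, LENGTHOF table_q ; loop counter = number of elements (5)
        xor  RAX, RAX              ; RAX accumulates the sum
sum_loop:
        add  RAX, [RSI]            ; add the current quadword
        add  RSI, TYPE table_q     ; advance by the element size (8 bytes)
        loop sum_loop              ; decrement RCX, repeat while RCX != 0
</code>

After the loop, RAX holds 15. If elements are added to or removed from table_q, neither the counter nor the step has to be edited by hand.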
The PTR operator is similar to type casting in other programming languages. In some cases, the size of an operand has to be specified explicitly. For example, in an indirect increment instruction, the assembler can't determine the size of the memory operand pointed to by the RBX register, which is why the operand size has to be specified with the PTR operator.
<code asm>
inc [RBX] ; Error! - argument size is not specified
inc BYTE PTR [RBX] ; Increment one byte addressed with RBX
inc WORD PTR [RBX] ; Increment word addressed with RBX
mov [RSI], AX ; Store word from AX to memory - AX use determines the size
mov [RSI], 5 ; Error! - constant operand does not determine the size
mov BYTE PTR [RSI], 5 ; Store 8-bit value
mov WORD PTR [RSI], 5 ; Store 16-bit value
mov DWORD PTR [RSI], 5 ; Store 32-bit value
</code>
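PTR can also override the type of a named variable, for example to read a single byte of a word variable. A minimal sketch with the hypothetical variable var_w; x86 stores the low byte at the lower address.

<code asm>
.DATA
var_w   DW 1234h
.CODE
        mov  AL, BYTE PTR var_w        ; AL = 34h, the low byte (little-endian order)
        mov  AH, BYTE PTR [var_w + 1]  ; AH = 12h, the high byte
        ; mov AL, var_w                ; would be rejected: operand sizes differ
</code>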
An important operator used in data definitions is DUP. It specifies the number of repetitions of the initial value. We'll present details of it later in this chapter.
===== Code and data sections =====
Programs for modern 64-bit operating systems are divided into code and data sections. The operating system maintains the stack, so no stack section is defined in user programs.
To start the code section, the .CODE directive is used. The code section contains all the instructions of a program. To identify the beginning of the data section, the .DATA directive is used. The data section contains all the variables used in a program.
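A minimal sketch of a 64-bit source file with both sections, assuming the ml64 assembler. The names counter and main are hypothetical; PROC and ENDP delimit a procedure, and the name of the starting procedure is given to the linker (for example with its /ENTRY option) rather than in the END directive.

<code asm>
.DATA                       ; start of the data section
counter DQ 0                ; a variable placed in the data section
.CODE                       ; start of the code section
main PROC                   ; procedure containing the instructions
        inc  counter        ; operand size is known from the DQ definition
        ret
main ENDP
END                         ; end of the source file
</code>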
<note>
In 16-bit and 32-bit x86 programs, the functional fragments of programs were referred to as segments, because they were assigned to segment registers in the processor. In 64-bit mode, the segmentation mechanism is largely disabled, so the code and data fragments of programs are called sections. However, the name segment can still frequently be found in literature and on websites.
</note>
===== Location counter =====
The location counter is an internal variable, maintained by the assembler, used to assign addresses to program items. During assembly, it plays a role similar to that of the instruction pointer during program execution. The location counter contains the address of the currently processed variable in a data section or of the current instruction in a code section.
Any directive which starts a section defines a new location counter and sets it to 0. If the same section is continued in another place in the program, its location counter continues from where it stopped, so it increases continuously throughout the whole section. Each assembled byte increases the location counter by 1.
When the SEGMENT and ENDS directives are used, the SEGMENT directive used for a specific section (segment) for the first time creates the location counter for this section. The ENDS directive suspends byte counting in the given location counter until the next fragment of the section with the same name is started with another SEGMENT directive.
The current value of the location counter can be retrieved with the $ sign.
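A minimal sketch of a common use of $: computing the length of a string during assembly, so that it never has to be counted by hand. The names msg and msg_len are hypothetical.

<code asm>
.DATA
msg      DB "Hello, MASM", 0Dh, 0Ah   ; 11 characters followed by CR LF
msg_len  = $ - msg                     ; $ is the location counter just after msg
.CODE
        mov  RCX, msg_len              ; RCX = 13
</code>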
===== Important directives =====
The ORG directive sets the location counter to the value given as its argument. It is used to place the following part of the program at a specific address (offset) within the section.
The EVEN directive aligns the next variable or instruction on an even address. Modern processors access data faster when it is aligned to its natural size, and some instructions, such as aligned SSE moves, require memory operands at addresses divisible by 16, so the ALIGN directive is often used instead of EVEN.
The ALIGN directive aligns the next variable or instruction on an address that is a multiple of its argument. The argument of ALIGN must be a power of two. Empty spaces are filled with zeros in the data section or with appropriately sized NOP instructions in the code section. Note that ALIGN 2 is equivalent to EVEN.
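A sketch of ALIGN in a data section, assuming these definitions are the first ones in the section and that the section itself starts at an address aligned to at least 16 bytes. The variable names are hypothetical; the comments give offsets from the start of the section.

<code asm>
.DATA
flag    DB 1        ; offset 0
ALIGN 8             ; 7 padding bytes, the next item starts at a multiple of 8
total   DQ 0        ; offset 8
ALIGN 16            ; already aligned, no padding needed here
vec     OWORD 0     ; offset 16, suitable for aligned 16-byte SSE access
</code>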
The LABEL directive creates a new label by assigning the current location counter value and the given type to the defined name. Usually, the : (colon) sign is used to define a label in a program, but the LABEL directive additionally allows specifying the type of the element which the label points to.
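A minimal sketch of LABEL giving a second, differently typed name to the same data. The names as_word and bytes are hypothetical; LABEL reserves no storage, so both names refer to the same address.

<code asm>
.DATA
as_word  LABEL WORD               ; no storage, same address as the next item
bytes    DB 11h, 22h, 33h, 44h
.CODE
        mov  AX, as_word          ; AX = 2211h (little-endian), no PTR needed
        mov  AL, bytes            ; AL = 11h
</code>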
There is a set of directives for defining variables. They assign a name to a variable and specify its type. They are summarised in table ##REF:masmdatadefine##.
<table masmdatadefine>
<caption>MASM variable definition directives</caption>
^ Name ^ data type ^ data size ^ comment ^
| DB | byte | 1 byte | |
| BYTE | byte | 1 byte | |
| SBYTE | signed byte | 1 byte | |
| DW | word | 2 bytes | |
| WORD | word | 2 bytes | |
| SWORD | signed word | 2 bytes | |
| DD | doubleword | 4 bytes | |
| DWORD | doubleword | 4 bytes | |
| SDWORD | signed doubleword | 4 bytes | |
| DF | farword | 6 bytes | used as a far pointer in 32-bit mode |
| FWORD | farword | 6 bytes | used as a far pointer in 32-bit mode |
| DQ | quadword | 8 bytes | |
| QWORD | quadword | 8 bytes | |
| SQWORD | signed quadword | 8 bytes | |
| DT | 10 bytes | 10 bytes | used as 80-bit BCD integer for FPU |
| TBYTE | 10 bytes | 10 bytes | used as 80-bit BCD integer for FPU |
| OWORD | octaword | 16 bytes | |
| REAL4 | single precision | 4 bytes | floating point for FPU or SSE |
| REAL8 | double precision | 8 bytes | floating point for FPU or SSE |
| REAL10 | extended precision | 10 bytes | floating point for FPU |
</table>
Variable definition directives can be used to define single variables, data tables or strings; the list of operands determines which. A ? operand indicates that the initial value remains undefined.
<code asm>
var_x DB 10 ; single byte variable with initial value 10
var_y DW 20 ; single word variable with initial value 20
var_z DD ? ; single uninitialised doubleword
table_a DQ 1, 2, 3, 4, 5 ; table of five quadwords
string_b BYTE "I like assembler" ; string with ASCII codes of all characters
</code>
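A few more definitions, with hypothetical names, use types from the table that the example above does not cover. They also show that a hexadecimal constant starting with a letter needs a leading 0, because MASM constants must begin with a digit.

<code asm>
.DATA
temp_s   SBYTE -40                  ; signed byte, stored as 0D8h
half_f   REAL4 0.5                  ; single precision floating point
pi_d     REAL8 3.141592653589793    ; double precision floating point
mask_q   QWORD 0FFFFFFFFFFFFFFFFh   ; leading 0 because a constant must start with a digit
words_w  DW 1, 2, ?                 ; two initialised words and one undefined word
</code>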
The previously mentioned DUP operator and the type operators can be explained with some example data definitions.
<code asm>
; TYPE LENGTH SIZE
A DB 10 DUP (?) ; 1 10 10
AB DW 10 DUP (?) ; 2 10 20
ABC DD 10 DUP (?) ; 4 10 40
AD DB 5 DUP (5 DUP (5 DUP(?))) ; 1 125 125
</code>
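DUP can also repeat a list of initial values, and DUP operators can be nested. In the two hypothetical definitions below, the LENGTHOF and SIZEOF values in the comments count all the elements the definition creates, following SIZEOF = TYPE * LENGTHOF.

<code asm>
pattern  DB 3 DUP (1, 2)          ; bytes 1, 2, 1, 2, 1, 2   LENGTHOF = 6,  SIZEOF = 6
matrix   DW 4 DUP (3 DUP (0))     ; 12 words set to 0        LENGTHOF = 12, SIZEOF = 24
</code>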
The following code shows the DUP and SIZEOF operators together with data definitions. It defines an uninitialised 256-byte data buffer and fills it with zeros. Please note that in the mov [RBX], 0 instruction, BYTE PTR must be used, because neither [RBX] nor 0 determines the operand size.
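<code asm>
.DATA
buffer  DB 256 DUP (?)          ; 256 uninitialised bytes
.CODE
        lea  RBX, buffer        ; RBX = address of the first byte of the buffer
        mov  RCX, SIZEOF buffer ; RCX = loop counter = 256
clear:
        mov  BYTE PTR [RBX], 0  ; store a zero byte, size given by BYTE PTR
        inc  RBX                ; move to the next byte
        loop clear              ; decrement RCX, repeat while RCX != 0
</code>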
===== Statements =====
Statements in assembler programs written in MASM are the lines of code that make up the source files. Each MASM statement specifies either an instruction for the processor or a directive for the assembler. A statement has up to four fields:
  * name - specifies the name of the program line. It can serve as a label for an instruction, allowing other instructions to refer to it by name. Some directives also require a name, which identifies a variable, type, constant, segment, macro, procedure or other element of the source file.
  * operation - the main element, which defines the action of the statement. It can be an instruction for the processor or an assembler directive.
  * operands - this field depends on the operation. Some operations accept no operands, others require a list of one or more operands. Operands are also referred to as arguments.
  * comment - this field is for documentation purposes and is ignored by the assembler. Good comments make the program easier to understand and maintain.
All fields in a statement are optional. A statement can consist of a label only (ended with a colon), an operation only (if it doesn't require operands), or a comment only. A few examples of valid statements are presented in the following code.
<code asm>
; name    ; operation ; operands ; comment
cns_y     EQU         134        ; definition of a constant named cns_y with the value 134
          .DATA                  ; operation only - directive to start data section
var_x     DB          123        ; definition of a variable named var_x with init value 123
          .CODE                  ; operation only - directive to start code section
begin:                           ; name only - label that represents an address
          mov         rax, rbx   ; operation and corresponding operands
                                 ; comment only statement
          END                    ; operation only - end of the source file
</code>