====== MASM basics ======
An assembler, understood as software that translates assembler source code into machine code, can be implemented in various ways. While the processor's instructions remain constant, other language elements may be implementation-specific. In this chapter, we present the most important language elements specific to the MASM assembly implementation. For a detailed description of MASM assembler language implementation, please refer to the Microsoft© website((https://learn.microsoft.com/en-us/cpp/assembler/masm/microsoft-macro-assembler-reference?view=msvc-170)).
===== Alphabet =====
An alphabet is the set of characters that can be used to write programs. In MASM, it includes lower-case and upper-case letters, digits, hidden (whitespace) characters and special characters.
* Letters: A…Z, a…z
* Digits: 0…9
* Hidden ASCII characters: 09h, 20h, 0Dh, 0Ah
* Special characters: + - * / = ( ) [ ] < > . , ; ' " _ : ? @ $ & %
Special characters may have a defined meaning, and not all of them can be used freely in the program.
* , – comma – separates the operands,
* '…' – apostrophes – text delimiter,
* "…" – quotation marks – text delimiter,
* (…) – round brackets – determine the order of expression evaluation,
* ; – semicolon – begins the comment,
* : – colon – delimits the labels and segment prefixes,
* . – dot – separates a structure variable from its field name, begins some of the directives,
* & – ampersand – used in macros to replace a formal argument with an actual value,
* % – percent – expand operator used in macros,
* <…> – angle brackets – text delimiter used in macros,
* […] – square brackets – used in address expressions,
* $ – dollar sign – current value of the location counter (the address being assembled),
* = – equal sign – directive to define constants,
* ? – question mark – uninitialised value in data definitions,
* @ – at sign – begins predefined names,
* _ – underscore – often used in symbolic names instead of the space,
* + - * / – mathematical operators.
Hidden ASCII characters.
  * 0Dh, 0Ah – CR, LF (enter) – end of line,
  * 20h – space – separates items in a line,
  * 09h – tabulation – used instead of spaces to improve the readability of the code.
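As a small illustration of how several of these characters work together, the sketch below defines a string and computes its length at assembly time. It assumes the 64-bit assembler (ml64); the names ex_msg, ex_msg_len and ex_spare are hypothetical and chosen only for this example.

```asm
.DATA
ex_msg      DB "Hello", 0Dh, 0Ah     ; quotation marks delimit text, commas separate operands
ex_msg_len  = $ - ex_msg             ; $ is the current location, so this evaluates to 7
ex_spare    DB 4 DUP (?)             ; ? leaves the four bytes uninitialised
END
```

Because ex_msg_len is calculated from the location counter, it follows the string automatically if the text is edited.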
===== Keywords and symbolic names =====
Reserved words, known also as keywords, are words that have special meaning in MASM. They represent elements of the language defined by MASM creators, and their use is reserved for special purposes only. They can’t be used as label names, variable names, constant names and similar user-defined items. They are:
* Instructions,
* Directives,
* Attributes,
* Operators,
* Predefined symbols.
Symbolic names are words defined by the programmer to identify elements of the program. They name constants, variables, addresses, segments and other items of the source code. Certain rules must be followed when creating a symbolic name. A symbolic name can't begin with a digit and can consist of letters, digits and four special characters: $, @, _, and ?. A single special character that has its own meaning, such as ? or $, can't serve as a name. By default, upper-case and lower-case letters are treated as the same. Early MASM versions recognised only the first 31 characters; current versions accept names of up to 247 characters.
Examples of proper symbolic names.
ABC123_4
Number_602602602_@1
?quest
$125
__Right_Here
Examples of improper symbolic names.
12_cats
?
'name'
Hello.world
Right Here
===== Operators =====
Operators are used in expressions calculated during assembly. They enable performing arithmetic and logic calculations with numeric expressions and address expressions. Some operators are associated with macros or other specific elements of assembler language. We'll present some of them. For details about all operators, please refer to the MASM documentation ((https://learn.microsoft.com/en-us/cpp/assembler/masm/operators-reference?view=msvc-170)).
The operators which can be used in numeric expressions are
* ** * **, ** / ** - multiplication and division
* **MOD** - remainder of an integer division
* **SHL**, **SHR** - shift left and right
  * **OR**, **XOR**, **AND**, **NOT** - bitwise logical operations
The operators which can be used in numeric and address expressions are
* ** + **, ** - ** - addition, subtraction
* **HIGH**, **LOW** - high/low 8 bits of a 16-bit variable/address
* **HIGHWORD**, **LOWWORD** - high/low 16 bits of 32-bit variable/address
* **HIGH32**, **LOW32** - high/low 32 bits of 64-bit variable/address
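The operators listed above are evaluated by the assembler, not by the processor. A minimal sketch, assuming ml64 and using hypothetical constant names prefixed with ex_, shows a few of them applied to numeric values:

```asm
; All values are computed during assembly; no code or data is emitted.
ex_bits   = (1 SHL 4) OR 3        ; 13h
ex_rest   = 17 MOD 5              ; 2
ex_clear  = 0FFh AND (NOT 0Fh)    ; 0F0h
ex_low    = LOW 1234h             ; 34h
ex_high   = HIGH 1234h            ; 12h
ex_loww   = LOWWORD 12345678h     ; 5678h
ex_highw  = HIGHWORD 12345678h    ; 1234h
END
```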
The type operators are used in address expressions. **TYPE** returns the number of bytes in a single variable, **LENGTHOF** the number of elements in a data array, and **SIZEOF** the size of a whole data array in bytes {{ref>masmtypeoperators}}. They are very useful because the assembler recalculates them automatically, for example, the number of iterations needed to process a string of characters or a data array when its length changes. The related **PTR** operator specifies the operand size explicitly when neither operand determines it, as the following instructions show.
inc [RBX] ; Error! - argument size is not specified
inc BYTE PTR [RBX] ; Increment one byte addressed with RBX
inc WORD PTR [RBX] ; Increment word addressed with RBX
mov [RSI], AX ; Store word from AX to memory - AX use determines the size
mov [RSI], 5 ; Error! - constant operand does not determine the size
mov BYTE PTR [RSI], 5 ; Store 8-bit value
mov WORD PTR [RSI], 5 ; Store 16-bit value
mov DWORD PTR [RSI], 5 ; Store 32-bit value
An important operator used in data definitions is **DUP**. It specifies the number of repetitions of the initial value. We'll present details of it later in this chapter.
===== Code and data sections =====
Programs in modern 64-bit operating systems are divided into code and data sections. The operating system maintains the stack, and currently, no stack section is defined in user programs.
To start the code section, the **.CODE** directive is used. The code section contains all instructions in a program. To identify the beginning of the data section, the **.DATA** directive is used. The data section contains all the variables used in a program.
var_x DB 10 ; single byte variable with initial value 10
var_y DW 20 ; single word variable with initial value 20
var_z DD ? ; single uninitialised doubleword
table_a DQ 1, 2, 3, 4, 5 ; table of five quadwords
string_b BYTE "I like assembler" ; string with ASCII codes of all characters
The previously mentioned **DUP** operator and the type operators can be illustrated with some example data definitions.
; TYPE LENGTHOF SIZEOF
A DB 10 DUP (?) ; 1 10 10
AB DW 10 DUP (?) ; 2 10 20
ABC DD 10 DUP (?) ; 4 10 40
AD DB 5 DUP (5 DUP (5 DUP(?))) ; 1 125 125
The following code shows the **DUP** and **SIZEOF** operators together with a data definition. It defines an uninitialised 256-byte data buffer and fills it with zeros. Note that the **mov [RBX], 0** instruction needs **BYTE PTR**, because neither [RBX] nor 0 determines the operand size.
.DATA
buffer DB 256 DUP (?)
.CODE
lea RBX, buffer
mov RCX, SIZEOF buffer
clear:
mov BYTE PTR [RBX], 0
inc RBX
loop clear
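In the same spirit, **LENGTHOF** and **TYPE** can drive a loop over elements wider than one byte. The following is only a sketch, assuming ml64; the array ex_values, the procedure ex_sum and the label next_item are hypothetical names, and the procedure is meant to be called from another module.

```asm
.DATA
ex_values   DW 10, 20, 30, 40          ; hypothetical array of four words

.CODE
ex_sum PROC
        lea   RDX, ex_values           ; RDX points to the first element
        mov   RCX, LENGTHOF ex_values  ; loop counter = number of elements (4)
        xor   EAX, EAX                 ; clear the accumulator
next_item:
        add   AX, WORD PTR [RDX]       ; add the current element
        add   RDX, TYPE ex_values      ; advance by the element size (2 bytes)
        loop  next_item
        ret                            ; AX holds the sum, 100
ex_sum ENDP
END
```

If more values are appended to ex_values, the loop count changes at the next assembly without any edit to the code.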
===== Constants =====
Constants in an assembler program give a name to a value that can't be changed during normal program execution. The value is bound to its name at assembly time. Although the name suggests that the value can never be altered, this is true only at program run-time; some forms of constants can be redefined during assembly. Usually, constants are used to self-document the code, parameterise the assembly process, and perform assembly-time calculations.
The constants can be integer, floating-point numeric, or text strings.\\
Integer numeric constants can be defined with the data assignment directives, **EQU** or the equal sign **=**. The difference is that a numeric constant defined with the EQU directive can’t be modified later in the program, while a constant created with the equal sign can be redefined many times in the program. Numeric constants can be expressed as binary, octal, decimal or hexadecimal values. They can also be a result of an expression calculated during assembly time. It is possible to use a previously defined constant in such an expression.
int_const1 EQU 5 ; decimal by default when no suffix is given
int_const_dec = 7 ; decimal: optional suffix "d", "D", "t", or "T"
int_const_binary = 100100101b ; binary: suffix "b", "B", "y", or "Y"
int_const_octal = 372o ; octal: suffix "o", "O", "q", or "Q"
int_const_hex = 0FFA4h ; hexadecimal: suffix "h" or "H"; must start with a digit, hence the leading 0
int_const_expr = int_const_dec * 5
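The difference between **EQU** and the equal sign can be seen in a short sketch. It assumes ml64, and the names ex_step, ex_offset and ex_table are hypothetical:

```asm
ex_step   EQU 4                    ; fixed for the whole source file
ex_offset = 0                      ; starts at 0
ex_offset = ex_offset + ex_step    ; redefinition is allowed: now 4
ex_offset = ex_offset + ex_step    ; now 8
; ex_step EQU 8                    ; if uncommented, the assembler reports a redefinition error

.DATA
ex_table  DB ex_offset DUP (0)     ; eight bytes, each initialised to 0
END
```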
Floating-point numeric constants can be defined with the **EQU** directive only. The number can be expressed in decimal or scientific notation.
real_const1 EQU 3.1415 ; decimal
real_const2 EQU 6.28e2 ; scientific
Text string constants can be defined with the **EQU** or **TEXTEQU** directives. Text constants assigned with either directive can be redefined later in the program. **TEXTEQU** defines a text macro and is described in the section about macros.
text_const1 EQU 'Hello World!'
text_const2 EQU "Hello World!"
===== Conditional assembly directives =====
The conditional assembly directives work like the conditional compilation directives of high-level language compilers. They control the assembly process by checking the defined conditions and including or excluding fragments of the source code.
  * **IF** expression, **IFE** expression - tests the value of the expression: **IF** assembles the following statements when it is true (non-zero), **IFE** when it is false (zero),
  * **IFDEF** symbol - tests whether a symbol is defined,
  * **IFNDEF** symbol - tests whether a symbol is undefined,
  * **IFB** textitem - tests whether a text item (typically a macro argument) is blank.
IF expression1
statements
[[ELSEIF expression2
statements ]]
[[ELSE
statements ]]
ENDIF
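A minimal sketch of conditional assembly in use, assuming ml64. The switch EX_DEBUG and all other names prefixed with ex_ or EX_ are hypothetical:

```asm
EX_DEBUG = 1                        ; hypothetical build switch: 1 = debug, 0 = release

IF EX_DEBUG
ex_buf_size = 64                    ; small buffer while debugging
ELSE
ex_buf_size = 4096
ENDIF

IFNDEF EX_MAX_ITEMS                 ; provide a default only if nothing defined it earlier
EX_MAX_ITEMS EQU 16
ENDIF

.DATA
ex_buffer DB ex_buf_size DUP (?)    ; 64 bytes with the settings above
ex_items  DW EX_MAX_ITEMS DUP (?)   ; 16 words
END
```

Only the branch whose condition holds is assembled; the other branch leaves no trace in the object file.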
===== Statements =====
Statements in assembler programs written in MASM are the lines of code composing the source files. Each MASM statement specifies an instruction for the processor or a directive for the assembler. Statements have up to four fields.
* name - Specifies the name of the program line. This can serve as a label for the instruction, allowing other instructions to refer to it by name. Some directives also require naming, specifying a variable, type, constant, segment, macro, procedure and other elements of the source file.
* operation - This is the main element which defines the action of the statement. This can be an instruction for the processor or an assembler directive.
* operands - This field depends on the operation. Some operations do not accept operands, some require a list of one or more operands. Operands are also referred to as arguments.
* comment - This field is for documentation purposes and is ignored by the assembler. Good comments make it easier to understand and maintain the program.
All fields in a statement are optional. A statement can be composed of a label only (ended with a colon), an operation only (if it doesn't require operands), or a comment only. A few examples of proper statements are presented in the following code.
; name ; operation ; operands ; comment
cns_y EQU 134 ; definition of a constant named cns_y with the value 134
.DATA ; operation only - directive to start data section
var_x DB 123 ; definition of a variable named var_x with init value 123
.CODE ; operation only - directive to start code section
begin: ; name only - label that represents an address
mov rax, rbx ; operation and corresponding operands
; comment only statement
END ; operation only - end of the source file