This article covers the details of porting Linux i386 assembly programs to macOS i386 (not PowerPC or x86-64). I wrote it while porting Jonesforth to MacIntel in 2010, so it may no longer be entirely accurate.
The macOS assembler is the Mach-O assembler, Mach-O for short. The Linux assembler is the GNU Assembler, or Gas for short. The good news is that Mach-O is largely compatible with Gas, and both use AT&T syntax.
Data types are identical in both assemblers, with the exception of the .int type in Linux, which is .long in macOS.
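As a small illustration of the rename, here is a sketch of a 32-bit variable definition. The label var_BASE is only an example name, and the // comments assume the file is passed through the C preprocessor first, as the Jonesforth source is. Gas also accepts .long for 32-bit values on i386, so this spelling should assemble on both systems.

```asm
        .data
var_BASE:
        .long 10                // written as ".int 10" in the Linux source
```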
Labels are declared the same way in both assemblers; you can even use the forward/backward (f/b) local label syntax. Here is an example from Jonesforth:
    jnz 2f
    pop %eax
    push %ebx           // push <> 0 on stack, indicating negative
    dec %ecx
    jnz 1f
    pop %ebx            // error: string is only '-'.
    movl $1,%ecx
    ret

    // Loop reading digits.
1:  imull %edx,%eax     // %eax *= BASE
    movb (%edi),%bl     // %bl = next character in string
    inc %edi

    // Convert 0-9, A-Z to a number 0-35.
2:  subb $48,%bl        // < '0'?
Macros are defined between .macro and .endm statements. The macro definition below works unchanged on both Linux and macOS.
    .macro NEXT
    lodsl
    jmp *(%eax)
    .endm
Gas on Linux has named macro arguments and uses a backslash to refer to them, like \reg or \foo.
    .macro PUSHRSP reg
    lea -4(%ebp),%ebp   // push reg on to return stack
    movl \reg,(%ebp)
    .endm
Macro arguments in macOS are numbers prefixed with $. The first argument is $0, the second is $1, and so on. The special name $n gives the total number of arguments. Note that you do not declare the arguments; you just use them in the body.
    .macro PUSHRSP
    lea -4(%ebp),%ebp   // push reg on to return stack
    movl $0,(%ebp)
    .endm
The Mach-O $0, $1 syntax is a bit annoying because the same tokens are plain immediate constants outside macro definitions. Gas accepts default values for macro arguments, but the Mach-O assembler does not.
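One way to keep a single source file for both systems is to choose the macro definition at preprocessing time. The sketch below is not taken from Jonesforth. It assumes the file is run through the C preprocessor before assembly and that the Apple toolchain predefines __APPLE__. The call site is then the same on both platforms.

```asm
#ifdef __APPLE__
        .macro PUSHRSP
        lea -4(%ebp),%ebp       // make room on the return stack
        movl $0,(%ebp)          // $0 = first macro argument (Mach-O style)
        .endm
#else
        .macro PUSHRSP reg
        lea -4(%ebp),%ebp       // make room on the return stack
        movl \reg,(%ebp)        // \reg = named macro argument (Gas style)
        .endm
#endif

        PUSHRSP %esi            // identical invocation on Linux and macOS
```

Only the macro bodies differ, so the platform-specific argument syntax stays in one place instead of spreading through the program.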
System call (syscall for short) numbers differ between Linux and macOS and, even worse, they may change over the lifetime of an operating system.
A Linux syscall is invoked with the int $0x80 instruction. All arguments go into registers, and register EAX holds the number of the system call. When the syscall returns, EAX contains the return value.
    xor %ebx,%ebx       // 0
    mov $1,%eax         // syscall: exit
    int $0x80
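To show how several arguments map onto registers, here is a minimal sketch of a write call. It assumes the usual Linux i386 convention, with arguments in EBX, ECX and EDX in that order, and syscall number 4 for write. The msg label is a hypothetical name introduced for illustration.

```asm
        .data
msg:    .ascii "hello\n"        // hypothetical 6-byte message

        .text
        mov $1,%ebx             // first argument: fd 1 (stdout)
        mov $msg,%ecx           // second argument: buffer address
        mov $6,%edx             // third argument: byte count
        mov $4,%eax             // syscall: write
        int $0x80               // %eax = bytes written, or a negative errno
```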
A macOS syscall is also invoked with the int $0x80 instruction but, because of its FreeBSD heritage, the arguments are pushed onto the stack, and register EAX holds the system call number. When the syscall returns, EAX contains the return value.
    push $0             // 0
    mov $1,%eax         // syscall: exit
    push %eax
    int $0x80
The push %eax instruction above only pads the stack: at int $0x80 the kernel expects one extra word on top of the arguments, in the slot where a return address would normally sit. In practice, it is better to place int $0x80 in a subroutine like this:
_syscall:
    int $0x80
    ret
Then call the routine as in this example (exit 0); the return address pushed by call takes the place of the padding word:
    push $0             // 0
    mov $1,%eax         // syscall: exit
    call _syscall
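For a syscall that returns, the caller also has to decide the push order and clean up the stack afterwards. The sketch below is an illustration under a few assumptions: write has number 4, arguments are pushed right to left as in a C call, and the caller removes them afterwards. It uses the _syscall routine defined above, and msg is a hypothetical label.

```asm
        .data
msg:    .ascii "hello\n"        // hypothetical 6-byte message

        .text
        push $6                 // third argument: byte count
        push $msg               // second argument: buffer address
        push $1                 // first argument: fd 1 (stdout)
        mov $4,%eax             // syscall: write
        call _syscall           // return address acts as the padding word
        add $12,%esp            // caller pops the three 4-byte arguments
```

The exit example needs no such cleanup because the process never returns from the call.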