Porting Linux Assembly Programs to macOS

This article covers the details of porting Linux i386 assembly programs to macOS i386 (not PowerPC or x86-64). I wrote it while porting Jonesforth to MacIntel in 2010, so it may no longer be fully accurate.

The macOS assembler is the Mach-O assembler, called Mach-O for short in this article (strictly speaking, Mach-O is the name of the object file format). The Linux assembler is the GNU Assembler, or Gas for short. The good news is that Mach-O is largely compatible with Gas, and both use AT&T syntax.

Differences between Mach-O and Gas

Data types

Data types are identical in both assemblers, with the exception of the .int type in Linux, which is .long in macOS.
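
As a minimal sketch, the data below should assemble on both systems because it sticks to .long, which Gas also accepts as a 32-bit type on i386. The label names are invented for illustration, and the // comments assume the file goes through the C preprocessor, as the Jonesforth source does.

	.data
counter:
	.long 0			// 32-bit integer; Gas alone would also accept .int
message:
	.ascii "hello\n"	// string data is declared the same way in both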

Labels

Labels are declared the same way in both assemblers; you can even use the forward/backward (f/b) local label syntax. Here is an example from Jonesforth:

	jnz 2f
	pop %eax
	push %ebx		// push <> 0 on stack, indicating negative
	dec %ecx
	jnz 1f
	pop %ebx		// error: string is only '-'.
	movl $1,%ecx
	ret

	// Loop reading digits.
1:	imull %edx,%eax		// %eax *= BASE
	movb (%edi),%bl		// %bl = next character in string
	inc %edi

	// Convert 0-9, A-Z to a number 0-35.
2:	subb $48,%bl		// < '0'?

Macros

Macros are defined between .macro and .endm statements. The macro definition below works unchanged on both Linux and Mac.

.macro NEXT
	lodsl
	jmp *(%eax)
.endm

Linux has named arguments for macros and uses a backslash to refer to a named argument, like \reg or \foo.

.macro PUSHRSP reg
	lea -4(%ebp),%ebp	// push reg on to return stack
	movl \reg,(%ebp)
.endm

Macro arguments in macOS are numbered and prefixed with $. The first argument is $0, the second is $1, and so on. The special name $n gives the total number of arguments. Note that you do not declare the arguments; you just use them in the body.

.macro PUSHRSP
	lea -4(%ebp),%ebp	// push reg on to return stack
	movl $0,(%ebp)
.endm

The Mach-O syntax ($0, $1) is a bit annoying because the same tokens are also immediate constants outside macro definitions. Gas accepts default values for macro arguments, but the Mach-O assembler does not.
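
Either PUSHRSP definition above is invoked the same way; only the macro body differs between the assemblers. As a small illustration (the register is chosen arbitrarily):

	PUSHRSP %esi		// Gas substitutes \reg, Mach-O substitutes $0

In both cases the call should expand to:

	lea -4(%ebp),%ebp
	movl %esi,(%ebp)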

System Call

System call (syscall for short) numbers generally differ between Linux and macOS, apart from a few of the oldest calls such as exit (number 1 on both). Even worse, they may change over the lifetime of an operating system.

A Linux syscall is invoked with the int $0x80 instruction. All arguments go into registers (EBX, ECX, EDX and so on), and register EAX holds the system call number. When the syscall returns, EAX contains the return value.

	xor %ebx,%ebx	// 0
	mov $1,%eax	// syscall: exit
	int $0x80

A macOS syscall is also invoked with the int $0x80 instruction but, because of its FreeBSD heritage, the arguments are pushed onto the stack, while register EAX holds the system call number. When the syscall returns, EAX contains the return value.

	push $0		// 0
	mov $1,%eax	// syscall: exit
	push %eax
	int $0x80

The instruction push %eax above only pads the stack: the kernel expects the arguments to start one word above the top of the stack, as if int $0x80 had been reached through a call. In practice, it is better to place int $0x80 in a subroutine:

_syscall:
	int $0x80
	ret

Then call the routine, so that the return address pushed by call takes the place of the padding word, as in this example (exit 0):

	push $0		// 0
	mov $1,%eax	// syscall: exit
	call _syscall
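
For a call that returns, the two conventions can be compared side by side. The sketch below writes six bytes to standard output. It assumes a hypothetical label msg that points at the data, and it relies on write being syscall number 4 on both systems. On macOS the caller has to remove the arguments from the stack afterwards; exit never needed this because it does not return.

	// Linux: write(1, msg, 6)
	mov $1,%ebx	// fd: stdout
	mov $msg,%ecx	// buffer (msg is a hypothetical label)
	mov $6,%edx	// length
	mov $4,%eax	// syscall: write
	int $0x80

	// macOS: write(1, msg, 6), arguments pushed last to first
	push $6		// length
	push $msg	// buffer (msg is a hypothetical label)
	push $1		// fd: stdout
	mov $4,%eax	// syscall: write
	call _syscall
	add $12,%esp	// caller pops the three arguments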