LLVM Framework Study Notes (1)

Most OS X/iOS developers have at least heard of LLVM, even if they have never worked with it directly.

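The name LLVM brings JVM (Java Virtual Machine) to mind, and the resemblance is not accidental: the name originally stood for Low Level Virtual Machine, although the project no longer treats it as an acronym. In keeping with that virtual-machine heritage, LLVM has its own IR (intermediate representation).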

You can download the source code or prebuilt binaries from the LLVM website.

Let's start with Hello World!

hello.c

#include <stdio.h>

int main(int argc, char *argv[]) {
    puts("Hello World!");
}

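Then invoke llvm-gcc to generate the LLVM intermediate representation. On this machine the work is actually done by clang 3.6.0, as the llvm.ident metadata in the output shows; with a stand-alone clang, `clang -S -emit-llvm hello.c` is the equivalent command: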

llvm-gcc -S hello.c -emit-llvm

This gives us hello.ll:

; ModuleID = 'hello.c'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
 
@.str = private unnamed_addr constant [13 x i8] c"Hello World!\00", align 1

; Function Attrs: nounwind ssp uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i8**, align 8
  store i32 %argc, i32* %1, align 4
  store i8** %argv, i8*** %2, align 8
  %3 = call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @puts(i8*) #1

attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
 
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
 
!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 3.6.0 (tags/RELEASE_360/final)"}

We can run hello.ll directly with lli (the i in lli stands for interpreter):

lli hello.ll

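Some basic LLVM IR syntax:

A comment starts with a semicolon and runs to the end of the line
@ marks a global identifier; every function name and global variable must start with @
Local identifiers start with %
Integers: iN, where N is the number of bits the integer occupies; N ranges over [1, 2²³ - 1]
String constants: c"...", where a non-printable byte is written as a backslash followed by two hex digits, for example \00 for the terminating NUL
Array types: [number of elements x element type]; for example, [5 x i32] is five 32-bit integers (vector types use angle brackets instead, as in <4 x i32>)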
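To relate the iN and array notation above to something more familiar, here is a small C sketch. The helper name `ir_array_bytes` and the typedef `five_i32` are hypothetical, introduced only for illustration, and the size calculation assumes N is a multiple of 8 so that elements need no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C counterpart of the IR type [5 x i32]: five 32-bit integers. */
typedef int32_t five_i32[5];

/* Hypothetical helper: storage size in bytes of an IR array [count x iN].
   Assumes bits is a multiple of 8, so each element occupies bits/8 bytes. */
size_t ir_array_bytes(size_t count, unsigned bits) {
    return count * (bits / 8);
}
```

Under that assumption, `ir_array_bytes(5, 32)` matches `sizeof(five_i32)`, and `ir_array_bytes(13, 8)` gives the 13 bytes used by the Hello World string.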

Now let's walk through the IR:

  @.str = private unnamed_addr constant [13 x i8] c"Hello World!\00", align 1

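The @ declares a global variable named .str. Its content is a constant string of 13 one-byte elements, including the terminating NUL.

Note!

C escapes such as '\0' or '\n' are written in LLVM IR as a backslash followed by the character's ASCII code in two hex digits.

For example, '\n' in C becomes \0A. Do not misread this as a NUL followed by the character 'A'.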
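The escaping rule can be sketched in C. The function `llvm_escape` below is a hypothetical helper, not part of any LLVM API; it assumes that printable ASCII bytes other than the double quote and the backslash are copied as-is, and that every other byte is written as a backslash plus two uppercase hex digits.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the bytes src[0..len) in the form used inside
   an LLVM IR c"..." literal. Returns the number of characters written
   (not counting the terminating NUL of out), or -1 if out is too small. */
int llvm_escape(const unsigned char *src, size_t len, char *out, size_t outsz) {
    size_t pos = 0;
    if (outsz == 0) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = src[i];
        /* Printable ASCII, except the quote and the backslash, is kept. */
        int keep = ch >= 0x20 && ch <= 0x7E && ch != '"' && ch != '\\';
        size_t need = keep ? 1 : 3;
        if (pos + need + 1 > outsz) {
            return -1;
        }
        if (keep) {
            out[pos++] = (char)ch;
        } else {
            /* Backslash followed by two hex digits, e.g. \0A for '\n'. */
            snprintf(out + pos, 4, "\\%02X", (unsigned)ch);
            pos += 3;
        }
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Passing the 13 bytes of "Hello World!" (12 characters plus the NUL) produces the text that appears between the quotes in the @.str line.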

On to the next line:

  define i32 @main(i32 %argc, i8** %argv) #0

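define introduces a function that we implement ourselves, that is, a definition with a body. i32 is a 32-bit integer, which corresponds to the 4-byte int here.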

Next, LLVM allocates stack slots with alloca and stores the arguments that main received into them:

  %1 = alloca i32, align 4
  %2 = alloca i8**, align 8
  store i32 %argc, i32* %1, align 4
  store i8** %argv, i8*** %2, align 8

Then comes the call to puts:

  %3 = call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))

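In LLVM IR, a function call follows this pattern:

  call <function return type> <function name>(<optional function arguments>)

getelementptr computes the address of an element without reading memory. Here the first index 0 steps through the pointer @.str itself and the second index 0 selects element 0 of the array, so the result is an i8* to the first character. The form shown is the LLVM 3.6 syntax; later releases additionally spell out the pointee type as the first operand.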
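The two indices are easier to see in C terms. The sketch below is only an analogy; the function names `gep_0_0` and `gep` are hypothetical. A pointer to a 13-byte array plays the role of `[13 x i8]* @.str`, and each index in the IR corresponds to one subscript.

```c
#include <stddef.h>

/* C analogue of:
     getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0)
   The first subscript indexes through the pointer, the second selects an
   element of the array. Only an address is computed; nothing is loaded. */
const char *gep_0_0(const char (*str)[13]) {
    return &str[0][0];
}

/* The same computation with arbitrary indices i and j. */
const char *gep(const char (*base)[13], size_t i, size_t j) {
    return &base[i][j];
}
```

With `static const char s[13] = "Hello World!";`, the call `gep_0_0(&s)` yields a pointer to the 'H', which is what puts receives in the IR.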

After that comes

  ret i32 0

In LLVM IR, a function return takes one of two forms:

ret <type> <value>
ret void

Finally, the external function is declared:

  declare i32 @puts(i8*) #1

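declare is used for a function that is defined elsewhere, here puts from the C library. It has no body; otherwise it looks the same as the header of a function we define ourselves.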
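The split between declare and define mirrors the split between a prototype and a definition in C. In this sketch, `say_hello` is a hypothetical function added for illustration, and puts is declared by hand instead of through stdio.h to make the parallel visible.

```c
/* Declaration only, like `declare i32 @puts(i8*)`:
   the body lives in the C library. */
int puts(const char *s);

/* Definition with a body, like `define i32 @...`.
   say_hello is a hypothetical name used only in this sketch. */
int say_hello(void) {
    /* puts returns a non-negative value on success. */
    return puts("Hello World!") >= 0 ? 0 : 1;
}
```

Compiling this file with the same -S -emit-llvm options should produce one declare line for puts and one define block for say_hello.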

A minimal Hello World written directly in LLVM IR:

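@.str = constant [13 x i8] c"Hello World!\00", align 1

; The #0/#1 attribute group references are dropped here because
; this file does not define any attribute groups.
define i32 @main(i32 %argc, i8** %argv) {
  call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @puts(i8*)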

Finally, compile and run it:

llc hello.ll
clang hello.s -o hello
./hello
