去年我也在这里发过一次单周期16位的CPU设计,只是简单地实现了几条指令,用if else写了一个能用的编译器,那这次就参照MIPS的架构,尽量还原一下试试。
上期传送门:如何制作一个简易单周期16位CPU – LovelyCat的小站 (lovelycatv.cn)
指令类型
R
- op 操作数
- rs 源寄存器1
- rt 源寄存器2
- rd 目标寄存器
- shamt 移位位数
- funct 功能码
I
- rs 用作与立即数求和的寄存器
- rt 源寄存器
- constant 十六位有符号立即数
J
寄存器文件
编号 | 寄存器名称 | 描述 |
0 | zero | 值恒为零且不可修改 |
1 | $at | 编译器保留 |
2-3 | \$v0 ~ $v1 | 函数返回结果 |
4-7 | \$a0 ~ $a3 | 函数调用的四个参数 |
8-15 | \$t0 ~ $t7 | 临时寄存器 |
16-23 | \$s0 ~ $s7 | 函数调用时保存原有的值 |
24-25 | \$k0 ~ $k1 | 供中断程序使用 |
26-27 | \$t8 ~ $t9 | t1 ~ t7的补充 |
28 | $gp | 全局指针 |
29 | $sp | 栈指针 |
30 | $fp | 帧指针 |
31 | $ra | 返回地址 |
指令集
时钟周期
时钟周期的部分和上次一样,也是采用了4个时钟周期作为一条指令的周期。
C1
更新PC的值并传输到内存的地址线
C2
将内存读出的指令保存在指令寄存器(IR)中,并且由译码器生成控制信号。
C3
再次给RAM一次上升沿,以保存或读取数据。
C4
将RAM读出的数据或ALU的计算结果写回寄存器,完成一个指令周期。
开始绘制
寄存器文件
- DInput 欲写入的数据
- ROA A输出的寄存器地址
- ROB B输出的寄存器地址
- W 欲写入的寄存器地址
- WriteEnable 是否可写
- CLK 时钟信号
内存
- Addr 选择的地址
- WriteEnable 是否可写
- DInput 欲写入的数据
- Mode 00 双字访问 01 半字访问 10 字访问
例如将0xf000e000以swr写入内存,对应的内存位置将会变成0000e000,前16位保持不变,后16位更改。
同理再将0xf000e000以sha写入内存,对应的内存位置将会变成f000e000,后24位保持不变,前8位改变。
ALU
- Enable 是否启用ALU
- Func 功能码
- ALU_I 当此开启后,无论Func是什么,都将执行加法
- shamt 仅用于移位指令
- lui 仅用于lui指令,将B输入端左移16位再进行加法
译码器及控制器
这一部分是整个结构中最复杂的,它的作用是将指令翻译成控制信号,下面的表格将会帮助你更好地理解。
在设计指令集时,可以一边填写这样的表格,以便绘制译码器。
解释一下表格中一些特殊的输出信号:
- OFunc 主要用于立即数加减乘除,由于立即数将原有的Func6位占用了,所以用OFunc来代替,当OFunc不为000000时,无论Func是什么,ALU都将只接受OFunc的值。
- RamMode 控制RAM的读写方式,与WordAddr、ByteAddr共同作用。
- WordAddr 当其值为0时,RAM选择低16位,为1时选择高16位。
- ByteAddr 当其值为00时,RAM选择最低8位,为11时选择最高8位。
- BranchSelector 主要用于判断分支跳转是否生效,选择对应的控制信号
- RegDst 若为1时则将rd作为写入地址(W)输入到寄存器,否则默认使用rt作为写入地址
有了这张表格,就可以轻松绘制出译码器了。
CPU
接下来就是将这些组件串联到一起了,在这之前也还需要注意几点
- ALU的A端输入只能接收寄存器的A端输出而ALU的B端输入既可以接收寄存器的B端输出,也可以接收立即数。
- M1:在执行分支跳转时,需要在下一个指令周期中的第一、二个周期对RamAddr强制输入PC的值,否则将有可能导致地址异常。
- M2:在执行字、半字访问指令时,需要在下一个指令周期中的第一个周期对RamMode强制输入00以便下一条指令的正常读取,否则读取的指令将可能不完整。
PC2Reg信号专用于控制将PC + 2的值保存在$ra寄存器中。
hlt信号将控制CPU是否继续执行指令。
指令测试
绘制好了之后,接下来我们写几条简单的指令测试一下效果吧。
1 : movi $0, $t0, 0x00ee | 080800EE | 00001000000010000000000011101110
2 : movi $0, $t1, 0x00ff | 080900FF | 00001000000010010000000011111111
3 : movi $0, $t2, 0x00cc | 080A00CC | 00001000000010100000000011001100
4 : add $t0, $t2, $t4 | 0D0A6001 | 00001101000010100110000000000001
5 : addi $0, $t1, 0xf6e5 | 6809F6E5 | 01101000000010011111011011100101
6 : muli $0, $t2, 0x0002 | 700A0002 | 01110000000010100000000000000010
7 : movi $0, $t5, 0x0001 | 080D0001 | 00001000000011010000000000000001
8 : movi $0, $t6, 0x000a | 080E000A | 00001000000011100000000000001010
9 : addi $t5, $t5, 0x0001 | 69AD0001 | 01101001101011010000000000000001
10 : blt $t5, $t6, 0x0008 | 1DAE0008 | 00011101101011100000000000001000
11 : nop | 60000000 | 01100000000000000000000000000000
12 : sd $0, $t5, 0x00f0 | 440D00F0 | 01000100000011010000000011110000
13 : ld $0, $t7, 0x00f0 | 240F00F0 | 00100100000011110000000011110000
14 : lui $0, $t8, 0x5bcd | 7C185BCD | 01111100000110000101101111001101
15 : hlt | 64000000 | 01100100000000000000000000000000
将十六进制分割成4份,由高到低输入H3 ~ H1中,然后打开自动时钟模拟,等待执行完毕。
此时程序计数器显示的是0x000E,正好对应第十五条指令hlt,表示程序已执行完毕,接下来到寄存器中看看效果。
可以看到CPU已经按照指令,将值正确的写入了各寄存器中,由于$t7的值来自内存地址0x00f0,因此l与s型指令也能正常执行。
编译器
设计编译器
如上所示的15条指令的二进制码我们可以很轻松的写出来,但是如果数量更多,将会产生不可避免的错误(有可能写错),因此我们需要一个编译器。
得益于现代计算机的高级语言,我们可以很轻松地写出一个简单的编译器来代替这种重复的工作。
如果你阅读过编译原理相关书籍,一定不会对词法分析器陌生,这次我们将模仿高级语言的编译原理来做一个简单的编译器。
不同于高级语言,我们所设计的汇编语言语法非常简单,只需要将它们分为R、I、J型指令即可,并且有固定的格式,例如:[助记符] [寄存器] [寄存器] [寄存器] 或 [助记符] [寄存器] [寄存器] [立即数]等。
因此我们可以把助记符、寄存器、十六进制数作为词素,这些词素不同于高级语言中的词素,只是借用一下概念。
例如下面这一条指令:
addi $t1, $t2, 0x00d2 ($t2 = $t1 + 0x00d2)
我们首先删除其中的空格,再从左到右逐个字节分析。
addi$t1,$t2,0x00d2
首先我们读出的都是字母,也就是说这一串字母很可能是助记符。
接着我们遇到了第一个终结符号’$’,出现这个符号表示这是一个寄存器,因此往后出现的字母或数字都可能是这个寄存器的名字。同时这也表示助记符到底为止,所以我们读取到的助记符是addi。
再继续读取,遇到下一个终结符号’,’,由于我们刚才已经读取到一个寄存器并且在不断记录寄存器的名字,因此遇到’,’时,则表示寄存器的名字到此为止,最终我们知道,这个寄存器叫做$t1。
继续向后,重复刚才的操作,下一个寄存器是$t2。
我们在第二个’,’之后遇到的第一个字符是数字而不是其他终结符($),说明这很可能是一个数字,且可能是十六进制,我们需要再向后读取至少一个字符,才能知道这是否是十六进制。
词法分析器原理
根据上面的原理,我们就可以开始做编译器啦,在这之前,如果你还是不太明白词法分析器的原理,可以看看如下代码,尝试自己运行一下试试。
Lexer.class
import java.io.IOException;
import java.util.Hashtable;
public class Lexer {
public int line = 1;
private char peek = ' ';
private Hashtable<String, Word> words = new Hashtable<String, Word>();
void reserve(Word t) {
words.put(t.lexeme, t);
}
public Lexer() {
reserve(new Word(Tag.TRUE, "true"));
reserve(new Word(Tag.FALSE, "false"));
}
public Token scan() throws IOException {
for (;;peek = (char) System.in.read()) {
if (peek == ' ' || peek == '\t') {
continue;
} else if (peek == '\n') {
line++;
} else {
break;
}
}
if (Character.isDigit(peek)) {
int v = 0;
do {
v = v * 10 + Character.digit(peek, 10);
peek = (char) System.in.read();
if (peek == 10) {
peek = (char) System.in.read();
}
} while (Character.isDigit(peek));
return new Num(v);
}
if (Character.isLetter(peek)) {
StringBuffer b = new StringBuffer();
do {
b.append(peek);
peek = (char) System.in.read();
if (peek == 10) {
peek = (char) System.in.read();
}
} while (Character.isLetterOrDigit(peek));
String s = b.toString();
Word w = words.get(s);
if (w != null) {
return w;
}
w = new Word(Tag.ID, s);
return w;
}
Token t = new Token(peek);
peek = ' ';
return t;
}
public static void main(String[] args) throws IOException {
Lexer lexer = new Lexer();
Token token = lexer.scan();
if (token instanceof Num) {
System.out.println(((Num) token).value);
}else if (token instanceof Word) {
System.out.println(((Word) token).lexeme);
}
System.out.println(token.getClass().getName() + " | " + token.tag);
}
}
Tag.class
public class Tag {
public static final int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
}
Token.class
public class Token {
public final int tag;
public Token(int tag) {
this.tag = tag;
}
}
Word.class
public class Word extends Token {
public final String lexeme;
public Word(int tag, String lexeme) {
super(tag);
this.lexeme = lexeme;
}
}
Num.class
public class Num extends Token {
public final int value;
public Num(int value) {
super(Tag.NUM);
this.value = value;
}
}
运行上述代码,当你逐个输入字符时,词法分析器将会自动识别你输入的是数字还是词素或者是终结符号(true/false)。
开始编写
准备
首先我们定义一些类来保存指令集和词素。
Instruction.class
public class Instruction extends Token {
public final String name;
public final Type type;
public final String opCode;
public Instruction(String name, Type type, String opCode) {
super(Tag.INSTRUCTION);
this.name = name;
this.type = type;
this.opCode = opCode;
}
public enum Type {
R,
I,
J,
// R-Type using shamt
RS
}
}
InstructionI.class
package cn.lovelycatv.compiler32.lexer;
public class InstructionI extends Instruction {
public InstructionI(String name, String opCode) {
super(name, Type.I, opCode);
}
}
InstructionR.class
public class InstructionR extends Instruction {
public final String func;
public final boolean useFunc;
public InstructionR(String name, String opCode, String func, boolean useFunc) {
super(name, Type.R, opCode);
this.func = func;
this.useFunc = useFunc;
}
}
InstructionRS.class
public class InstructionRS extends Instruction {
public final String func;
public String shamt;
public InstructionRS(String name, String opCode, String shamt, String func) {
super(name, Type.RS, opCode);
this.shamt = shamt;
this.func = func;
}
}
InstructionJ.class
public class InstructionJ extends Instruction {
public InstructionJ(String name, String opCode) {
super(name, Type.J, opCode);
}
}
Tag.class
public class Tag {
public static final int INSTRUCTION = 1;
public static final int REGISTER = 2;
public static final int NUM_HEX = 3;
}
Token.class
public class Token {
public final int tag;
public Token(int tag) {
this.tag = tag;
}
}
NumHex.class
public class NumHex extends Token {
public final String hex;
public NumHex(String hex) {
super(Tag.NUM_HEX);
this.hex = hex;
}
public String getHex() {
return hex.replace("0x","");
}
}
Register.class
public class Register extends Token {
public final String registerName;
public Register(String regName) {
super(Tag.REGISTER);
this.registerName = regName;
}
}
Registers.class
public enum Registers {
zero("$0","00000"),
at("$at","00001"),
v0("$v0","00010"),
v1("$v1","00011"),
a0("$a0","00100"),
a1("$a1","00101"),
a2("$a2","00110"),
a3("$a3","00111"),
t0("$t0","01000"),
t1("$t1","01001"),
t2("$t2","01010"),
t3("$t3","01011"),
t4("$t4","01100"),
t5("$t5","01101"),
t6("$t6","01110"),
t7("$t7","01111"),
s0("$s0","10000"),
s1("$s1","10001"),
s2("$s2","10010"),
s3("$s3","10011"),
s4("$s4","10100"),
s5("$s5","10101"),
s6("$s6","10110"),
s7("$s7","10111"),
t8("$t8","11000"),
t9("$t9","11001"),
k0("$k0","11010"),
k1("$k1","11011"),
gp("$gp","11100"),
sp("$sp","11101"),
fp("$fp","11110"),
ra("$ra","11111");
public final String name;
public final String addr;
Registers(String name, String addr) {
this.name = name;
this.addr = addr;
}
public static boolean contains(String name) {
for (Registers r : Registers.values()) {
if (r.name.equals(name)) {
return true;
}
}
return false;
}
public static Registers getByName(String name) {
for (Registers r : values()) {
if (r.name.equalsIgnoreCase(name)) {
return r;
}
}
return null;
}
}
Parser.class
public class Parser {
public static boolean isHexIllegal(String hexStr) {
for (char c : hexStr.toCharArray()) {
if(!(c >= 48 && c <= 57)) {
if (!(c >= 65 && c <= 70)) {
if (!(c >= 97 && c <= 102)) {
return false;
}
}
}
}
return true;
}
// 000ef -> ef
public static String hexClear(String hex) {
boolean allZero = true;
for (char c : hex.toCharArray()) {
if (c != '0') {
allZero = false;
break;
}
}
if (allZero) {
return "0";
}
int index = 0;
boolean cleared = false;
if (hex.startsWith("0")) {
cleared = true;
while (hex.charAt(index) == '0') {
index++;
}
}
return cleared ? hex.substring(index) : hex;
}
public static String hex2binary(String input) {
input = hexClear(input);
StringBuilder sb = new StringBuilder();
int len = input.length();
for (int i = 0; i < len; i++){
String temp = input.substring(i, i + 1);
int tempInt = Integer.parseInt(temp, 16);
String tempBin = Integer.toBinaryString(tempInt);
if (tempBin.length() < 4){
int num = 4 - tempBin.length();
for (int j = 0; j < num; j++){
sb.append("0");
}
}
sb.append(tempBin);
}
return sb.toString();
}
public static String binary2hex(String input) {
StringBuilder sb = new StringBuilder();
int len = input.length();
for (int i = 0; i < len / 4; i++){
String temp = input.substring(i * 4, (i + 1) * 4);
int tempInt = Integer.parseInt(temp, 2);
String tempHex = Integer.toHexString(tempInt).toUpperCase();
sb.append(tempHex);
}
return sb.toString();
}
public static String extender(String binary, int bit) {
int lost = bit - binary.length();
if (lost > 0) {
StringBuilder binaryBuilder = new StringBuilder(binary);
for (int i = 0; i < lost; i++) {
binaryBuilder.insert(0, "0");
}
binary = binaryBuilder.toString();
}else {
binary = binary.substring(binary.length() - bit);
}
return binary;
}
}
编写词法分析器
接下来我们就要开始编写词法分析器了,在Lexer中我们需要先把指令集和终结符号保存为静态变量以便读取。
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
public class Lexer {
private final Hashtable<String, Instruction> instructions = new Hashtable<String, Instruction>();
private final List<Character> terminals = new ArrayList<>();
public LexerTmp() {
reserve(new InstructionR("mov", "000001","", false));
reserve(new InstructionI("movi", "000010"));
reserve(new InstructionR("add", "000011","000001",true));
reserve(new InstructionR("sub", "000011","000011",true));
reserve(new InstructionR("mul", "000011","000100",true));
reserve(new InstructionR("div", "000011","000101",true));
reserve(new InstructionR("neg", "000011","000110",true));
reserve(new InstructionRS("sra", "000011","00000","000111"));
reserve(new InstructionRS("sll", "000011","00000","001000"));
reserve(new InstructionRS("slr", "000011","00000","001001"));
reserve(new InstructionR("and", "000011","001010",true));
reserve(new InstructionR("or", "000011","001011",true));
reserve(new InstructionR("not", "000011","001100",true));
reserve(new InstructionR("nand", "000011","001101",true));
reserve(new InstructionR("nor", "000011","001110",true));
reserve(new InstructionR("xnor", "000011","001111",true));
reserve(new InstructionR("xor", "000011","010000",true));
reserve(new InstructionI("beq", "000100"));
reserve(new InstructionI("bne", "000101"));
reserve(new InstructionI("bgt", "000110"));
reserve(new InstructionI("blt", "000111"));
reserve(new InstructionJ("jmp", "001000"));
reserve(new InstructionI("ld", "001001"));
reserve(new InstructionI("lwl", "001010"));
reserve(new InstructionI("lwr", "001011"));
reserve(new InstructionI("lha", "001100"));
reserve(new InstructionI("lhb", "001101"));
reserve(new InstructionI("lhc", "001110"));
reserve(new InstructionI("lhd", "001111"));
reserve(new InstructionJ("sr", "010000"));
reserve(new InstructionI("sd", "010001"));
reserve(new InstructionI("swl", "010010"));
reserve(new InstructionI("swr", "010011"));
reserve(new InstructionI("sha", "010100"));
reserve(new InstructionI("shb", "010101"));
reserve(new InstructionI("shc", "010110"));
reserve(new InstructionI("shd", "010111"));
reserve(new InstructionJ("nop", "011000"));
reserve(new InstructionJ("hlt", "011001"));
reserve(new InstructionI("addi", "011010"));
reserve(new InstructionI("subi", "011011"));
reserve(new InstructionI("muli", "011100"));
reserve(new InstructionI("divi", "011101"));
reserve(new InstructionI("jr", "011110"));
reserve(new InstructionI("lui", "011111"));
terminals.add('$');
terminals.add(',');
}
public void reserve(Instruction instruction) {
instructions.put(instruction.name, instruction);
}
}
接下来就是最重要的部分了,我们在上面对一条指令进行了逐字节分析,现在用代码来实现吧。
private static final int SCAN_MARK_NORMAL = 0;
private static final int SCAN_MARK_REGISTER = 1;
private static final int SCAN_MARK_SPLITTER = 2;
public List<Token> scan(String line, int lineCount) {
List<Token> tokenList = new ArrayList<>();
int tag = SCAN_MARK_NORMAL;
StringBuilder tmp = new StringBuilder();
for (int i = 0; i < line.length(); i++) {
char c = line.charAt(i);
tmp.append(c);
if (terminals.contains(c)) {
if (c == '$') {
tag = SCAN_MARK_REGISTER;
tmp = new StringBuilder();
} else if (c == ',') {
tag = SCAN_MARK_SPLITTER;
tmp = new StringBuilder();
}
} else if (Character.isLetter(c)) {
if (tag == SCAN_MARK_NORMAL) {
if (instructions.containsKey(tmp.toString())) {
if (i + 1 >= line.length() || !Character.isLetter(line.charAt(i + 1))) {
tokenList.add(instructions.get(tmp.toString()));
}
}
} else if (tag == SCAN_MARK_REGISTER) {
String regName = "$" + tmp;
if (Registers.contains(regName)) {
tokenList.add(new Register(regName));
}
} else {
if (tmp.toString().startsWith("0x")) {
if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
NumHex numHex = new NumHex(tmp.toString());
tokenList.add(numHex);
if (!Parser.isHexIllegal(numHex.getHex())) {
throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
}
}
}
}
} else if (Character.isDigit(c)) {
if (tag == SCAN_MARK_REGISTER) {
String regName = "$" + tmp;
if (Registers.contains(regName)) {
tokenList.add(new Register(regName));
}
} else if (tag == SCAN_MARK_NORMAL) {
// If the after char is x, that means this is a hex number
if (i + 1 < line.length() && line.charAt(i + 1) == 'x') {
tmp = new StringBuilder("0");
}
if (tmp.toString().startsWith("0x")) {
if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
NumHex numHex = new NumHex(tmp.toString());
tokenList.add(numHex);
if (!Parser.isHexIllegal(numHex.getHex())) {
throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
}
}
}
} else {
if (tmp.toString().startsWith("0x")) {
if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
NumHex numHex = new NumHex(tmp.toString());
tokenList.add(numHex);
if (!Parser.isHexIllegal(numHex.getHex())) {
throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
}
}
}
}
} else {
throw new IllegalArgumentException(String.format("Unknown character %s, at line %s : %s", c, lineCount, line));
}
}
return tokenList;
}
如上所示,用SCAN_MARK来标记接下来要读取的词素是什么类型,以便向tokenList添加元素。
调用scan函数并将addi$t1,$t2,0x00d2
作为参数,最终将会得到一个size为4的tokenList。
tokenList的第一个元素为Instruction,而第二、三个元素为Register,最后一个元素为HexNum。
编译器
有了词法分析器,我们就可以将指令逐条输入到其中并且得到一个tokenList,编译器需要根据这些token来生成二进制码。
assemblyList储存的是一条条指令,首先我们需要跳过注释行,也就是以//开头的那一行。
然后再判断第一个元素是否为Instruction,因为指令中助记符永远在最前面。
for (String assembly : assemblyList) {
if (!"".equals(assembly) && assembly.startsWith("//")) {
currentLine++;
continue;
}
List<Token> tokenList = lexer.scan(assembly.replaceAll("\\s+",""), currentLine);
if (tokenList == null || tokenList.size() == 0) {
throw new IllegalStateException("Syntax error");
}
if (!(tokenList.get(0) instanceof Instruction)) {
throw new IllegalStateException(String.format("Unknown instruction : %s", assembly));
}
Instruction instruction = (Instruction) tokenList.get(0);
StringBuilder binaryBuilder = new StringBuilder(instruction.opCode);
}
binaryBuilder则是用于拼接二进制码,如上所示,这个StringBuilder的默认值是六位操作码。
接着我们根据不同的指令类型(R、I、J)来进行不同的处理,下面仅展示R型指令的翻译过程。
public StringBuilder compileR(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String assembly) {
InstructionR instruction = (InstructionR) tokenList.get(0);
if (tokenList.size() != 4) {
throw new IllegalStateException(String.format("The params of the R-type instruction can only have 3, at line %s : %s", currentLine, assembly));
}
if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register) || !(tokenList.get(3) instanceof Register)) {
throw new IllegalStateException(String.format("The type of params of the R-type instruction can only be register, at line %s : %s", currentLine, assembly));
}
for (int i = 1; i < 4; i++) {
String regName = ((Register) tokenList.get(i)).registerName;
Registers register = Registers.getByName(regName);
if (register != null) {
binaryBuilder.append(register.addr);
}else {
throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
}
}
binaryBuilder.append("00000");
if (instruction.useFunc) {
binaryBuilder.append(instruction.func);
}else {
// No function code
binaryBuilder.append("000000");
}
return binaryBuilder;
}
下面是Compiler.class的完整代码
public class Compiler {
private static final Lexer lexer = new Lexer();
private final List<String> assemblyList = new ArrayList<>();
public void start(ICompilerCallBack iCompilerCallBack) {
System.out.printf("%s line(s) detected%n", assemblyList.size());
System.out.println("Start compile...");
Map<Integer, String> originMap = new HashMap<>();
List<String> binaryList = new ArrayList<>();
int currentLine = 1;
for (String assembly : assemblyList) {
if (!"".equals(assembly) && assembly.startsWith("//")) {
currentLine++;
continue;
}
List<Token> tokenList = lexer.scan(assembly.replaceAll("\\s+",""), currentLine);
if (tokenList == null || tokenList.size() == 0) {
throw new IllegalStateException("Syntax error");
}
if (!(tokenList.get(0) instanceof Instruction)) {
throw new IllegalStateException(String.format("Unknown instruction : %s", assembly));
}
Instruction instruction = (Instruction) tokenList.get(0);
// Check whether the tokens are legal
StringBuilder binaryBuilder = new StringBuilder(instruction.opCode);
if (instruction.type.equals(Instruction.Type.R)) {
binaryBuilder = compileR(binaryBuilder, tokenList, currentLine, assembly);
} else if (instruction.type.equals(Instruction.Type.RS)) {
binaryBuilder = compileRS(binaryBuilder, tokenList, currentLine, assembly);
} else if (instruction.type.equals(Instruction.Type.I)) {
binaryBuilder = compileI(binaryBuilder, tokenList, currentLine, assembly);
} else if (instruction.type.equals(Instruction.Type.J)) {
binaryBuilder = compileJ(binaryBuilder, tokenList, currentLine, assembly);
}
binaryList.add(binaryBuilder.toString());
originMap.put(currentLine, assembly);
currentLine++;
}
iCompilerCallBack.onFinished(binaryList, originMap);
System.out.println("Compile is finished!");
}
public StringBuilder compileR(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String assembly) {
InstructionR instruction = (InstructionR) tokenList.get(0);
if (tokenList.size() != 4) {
throw new IllegalStateException(String.format("The params of the R-type instruction can only have 3, at line %s : %s", currentLine, assembly));
}
if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register) || !(tokenList.get(3) instanceof Register)) {
throw new IllegalStateException(String.format("The type of params of the R-type instruction can only be register, at line %s : %s", currentLine, assembly));
}
for (int i = 1; i < 4; i++) {
String regName = ((Register) tokenList.get(i)).registerName;
Registers register = Registers.getByName(regName);
if (register != null) {
binaryBuilder.append(register.addr);
}else {
throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
}
}
binaryBuilder.append("00000");
if (instruction.useFunc) {
binaryBuilder.append(instruction.func);
}else {
// No function code
binaryBuilder.append("000000");
}
return binaryBuilder;
}
public StringBuilder compileRS(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String assembly) {
InstructionRS instruction = (InstructionRS) tokenList.get(0);
if (tokenList.size() != 4) {
throw new IllegalStateException(String.format("The params of the R-S-type instruction can only have 3, at line %s : %s", currentLine, assembly));
}
if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register)) {
throw new IllegalStateException(String.format("The type of first two params of the R-S-type instruction can only be register, at line %s : %s", currentLine, assembly));
}
for (int i = 1; i < 3; i++) {
String regName = ((Register) tokenList.get(i)).registerName;
Registers register = Registers.getByName(regName);
if (register != null) {
binaryBuilder.append(register.addr);
if (i == 1) {
// rt is 00000
binaryBuilder.append("00000");
}
}else {
throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
}
}
NumHex numHex = (NumHex) tokenList.get(3);
instruction.shamt = Parser.extender(Parser.hex2binary(numHex.getHex()), 5);
binaryBuilder.append(instruction.shamt);
binaryBuilder.append(instruction.func);
return binaryBuilder;
}
public StringBuilder compileI(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String assembly) {
if (tokenList.size() != 4) {
throw new IllegalStateException(String.format("The params of the I-type instruction can only have 3, at line %s : %s", currentLine, assembly));
}
if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register)) {
throw new IllegalStateException(String.format("The type of the first two params of the I-type instruction can only be register, at line %s : %s", currentLine, assembly));
}
if (!(tokenList.get(3) instanceof NumHex)) {
throw new IllegalStateException(String.format("The type of the 3th param of the I-type instruction can only be hex-number, at line %s : %s", currentLine, assembly));
}
NumHex numHex = (NumHex) tokenList.get(3);
for (int i = 1; i < 3; i++) {
String regName = ((Register) tokenList.get(i)).registerName;
Registers register = Registers.getByName(regName);
if (register != null) {
binaryBuilder.append(register.addr);
}else {
throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
}
}
binaryBuilder.append(Parser.extender(Parser.hex2binary(numHex.getHex()), 16));
return binaryBuilder;
}
public StringBuilder compileJ(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String assembly) {
if (tokenList.size() > 2) {
throw new IllegalStateException(String.format("The params of the J-type instruction can only have 0 ~ 1, at line %s : %s", currentLine, assembly));
}
if (tokenList.size() == 2) {
if (!((tokenList.get(1)) instanceof NumHex)) {
throw new IllegalStateException(String.format("The params of the J-type instruction can only be empty or hex-number, at line %s : %s", currentLine, assembly));
}
NumHex numHex = (NumHex) tokenList.get(1);
binaryBuilder.append(Parser.extender(Parser.hex2binary(numHex.getHex()), 26));
}else {
binaryBuilder.append("00000000000000000000000000");
}
return binaryBuilder;
}
public void addInstruction(String instruction) {
assemblyList.add(instruction);
}
public interface ICompilerCallBack {
void onFinished(List<String> binaryList, Map<Integer, String> originMap);
}
}
编译器测试
现在,我们就完成了一个简单的编译器,让我们一起试试效果如何。
compiler.addInstruction()
compiler.start()
调用addInstruction来添加要编译的指令,然后调用start方法开始编译,由interface将结果返回。
我们仍用指令测试部分的指令来编译,将binaryList输出后如下所示
00001000000010000000000011101110
00001000000010010000000011111111
00001000000010100000000011001100
00001101000010100110000000000001
01101000000010011111011011100101
01110000000010100000000000000010
00001000000011010000000000000001
00001000000011100000000000001010
01101001101011010000000000000001
00011101101011100000000000001000
01100000000000000000000000000000
01000100000011010000000011110000
00100100000011110000000011110000
01111100000110000101101111001101
01100100000000000000000000000000
对比一下我们上面自己写的二进制码,应该是完全一致的。
导入Logisim测试
接下来为了方便导入logisim中,还需要写一个自动生成可导入文件的方法。
v3.0 hex words addressed
0000: 08 08 08 0D 68 70 08 08 69 1D 60 44 24 7C 64 00
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
如上所示,我们需要的文件应该是这种格式,由于RAM被切分成了4片,所以需要把生成的指令切割成4份,并且转换成十六进制以上述格式写出。
让我们一起用代码实现一下吧:
public static String buildRamHexFile(List<String> byteHex) {
StringBuilder result = new StringBuilder();
String head = "v3.0 hex words addressed\n";
result.append(head);
int byteIndex = 0;
int linesCount = 4096;
int columnsCount = 16;
for (int lineIndex = 0; lineIndex < linesCount; lineIndex++) {
StringBuilder line = new StringBuilder();
String lineHead = Parser.extender(Long.toHexString(lineIndex * columnsCount), 4) + ": ";
line.append(lineHead);
for (int columnIndex = 0; columnIndex < columnsCount; columnIndex++, byteIndex++) {
if (byteIndex >= byteHex.size()) {
line.append("00").append(" ");
}else {
line.append(byteHex.get(byteIndex)).append(" ");
}
}
line = new StringBuilder(line.substring(0, line.length() - 1));
result.append(line).append("\n");
}
return result.toString();
}
然后再将它们写出即可,在logisim中右键RAM选择Load Image即可一键导入了。
完善编译器
现在这个编译器已经实现我们所需的功能了,接下来就可以自由发挥再给它加点新东西了。
可以像这样在编译时同时输出对应的二进制、十六进制以及代码行数,便于后续调试。
结束语
到这里,也差不多该结束了。这次我们实现了30多条指令,可以完成大多数任务,但还有一些不足。
也许你已经注意到了,如果执行的乘法运算超过32位怎么办,其实在MIPS32中还有两个特殊的寄存器,分别是HI(乘除结果高位寄存器)与LO(乘除结果低位寄存器),这里我只是在ALU将这两个作为Output输出,并没有赋予实际功能。
MIPS32中并没有类似于立即数减法的指令,因为x + y
可以变成x + (-y)
。
MIPS32中还有许多其他的指令,本期只是实现了最常用的一些,感兴趣可以自己再做修改。
参考文献
Alfred V.Aho, Monica S.Lam, Ravi Sethi and Jeffrey D.Ullman, “Compilers: Principles, Techniques and Tools, Second Edition“〔M〕. China Machine Press, 2022, pp, 47-53
Alan Clements, “Computer Organization and Architecture: Themes and Variations“, 〔M〕. China Machine Press, 2017, pp, 190-198
请问CPU那张的译码器左下角的圆圈是什么引脚?
不好意思哈,图画得不太好。
那个不是引脚,是从译码器的Jmp传出来的信号经过了一个非门,非门被译码器挡住了,所以只能看见一个小圆圈。