单周期MIPS 32位CPU设计及编译器设计
本文最后更新于 679 天前,其中的信息可能已经有所发展或是发生改变。
使用PC端阅读本文章以获得最佳阅读效果。

去年我也在这里发过一次单周期16位的CPU设计,只是简单地实现了几条指令,用if else写了一个能用的编译器,那这次就参照MIPS的架构,尽量还原一下试试。

上期传送门:如何制作一个简易单周期16位CPU – LovelyCat的小站 (lovelycatv.cn)

首先需要说明的是,以下图中出现的所有线路均没有经过布线设计,只是为了模拟效果。

指令类型

R

  • op 操作数
  • rs 源寄存器1
  • rt 源寄存器2
  • rd 目标寄存器
  • shamt 移位位数
  • funct 功能码

I

  • rs 用作与立即数求和的寄存器
  • rt 源寄存器
  • constant 十六位有符号立即数

J

寄存器文件

编号寄存器名称描述
0zero值恒为零且不可修改
1$at编译器保留
2-3\$v0 ~ $v1函数返回结果
4-7\$a0 ~ $a3函数调用的四个参数
8-15\$t0 ~ $t7临时寄存器
16-23\$s0 ~ $s7函数调用时保存原有的值
24-25\$k0 ~ $k1供中断程序使用
26-27\$t8 ~ $t9t1 ~ t7的补充
28$gp全局指针
29$sp栈指针
30$fp帧指针
31$ra返回地址

指令集

时钟周期

时钟周期的部分和上次一样,也是采用了4个时钟周期作为一条指令的周期。

C1

更新PC的值并传输到内存的地址线

C2

将内存读出的指令保存在指令寄存器(IR)中,并且由译码器生成控制信号。

C3

再次给RAM一次上升沿,以保存或读取数据。

C4

将RAM读出的数据或ALU的计算结果写回寄存器,完成一个指令周期。

开始绘制

寄存器文件

  • DInput 欲写入的数据
  • ROA A输出的寄存器地址
  • ROB B输出的寄存器地址
  • W 欲写入的寄存器地址
  • WriteEnable 是否可写
  • CLK 时钟信号

内存

这里为了方便,只是把RAM设计成了四片单片8字节的阵列,这样的话PC不需要+4,也可以正常使用半字或双字访问。
  • Addr 选择的地址
  • WriteEnable 是否可写
  • DInput 欲写入的数据
  • Mode 00 双字访问 01 半字访问 10 字访问
还需要注意的是,当执行swl、swr等指令时,只是将DIpunt的对应高位低位写入RAM的高位低位。
例如将0xf000e000以swr写入内存,对应的内存位置将会变成0000e000,前16位保持不变,后16位更改。
同理再将0xf000e000以sha写入内存,对应的内存位置将会变成f000e000,后24位保持不变,前8位改变。

ALU

  • Enable 是否启用ALU
  • Func 功能码
  • ALU_I 当此开启后,无论Func是什么,都将执行加法
  • shamt 仅用于移位指令
  • lui 仅用于lui指令,将B输入端左移16位再进行加法

译码器及控制器

这一部分是整个结构中最复杂的,它的作用是将指令翻译成控制信号,下面的表格将会帮助你更好地理解。

在设计指令集时,可以一边填写这样的表格,以便绘制译码器。

解释一下表格中一些特殊的输出信号:

  • OFunc 主要用于立即数加减乘除,由于立即数将原有的Func6位占用了,所以用OFunc来代替,当OFunc不为000000时,无论Func是什么,ALU都将只接受OFunc的值。
  • RamMode 控制RAM的读写方式,与WordAddr、ByteAddr共同作用。
  • WordAddr 当其值为0时,RAM选择低16位,为1时选择高16位。
  • ByteAddr 当其值为00时,RAM选择最低8位,为11时选择最高8位。
  • BranchSelector 主要用于判断分支跳转是否生效,选择对应的控制信号
  • RegDst 若为1时则将rd作为写入地址(W)输入到寄存器,否则默认使用rt作为写入地址

有了这张表格,就可以轻松绘制出译码器了。

CPU

接下来就是将这些组件串联到一起了,在这之前也还需要注意几点

  • ALU的A端输入只能接收寄存器的A端输出而ALU的B端输入既可以接收寄存器的B端输出,也可以接收立即数。
  • M1:在执行分支跳转时,需要在下一个指令周期中的第一、二个周期对RamAddr强制输入PC的值,否则将有可能导致地址异常。
  • M2:在执行字、半字访问指令时,需要在下一个指令周期中的第一个周期对RamMode强制输入00以便下一条指令的正常读取,否则读取的指令将可能不完整。
logisim中还有一个叫做隧道(Tunnel)的组件,可以免去布线的麻烦,相当于一扇任意门。

PC2Reg信号专用于控制将PC + 2的值保存在$ra寄存器中。

hlt信号将控制CPU是否继续执行指令。

指令测试

绘制好了之后,接下来我们写几条简单的指令测试一下效果吧。

1  : movi $0, $t0, 0x00ee  | 080800EE | 00001000000010000000000011101110
2  : movi $0, $t1, 0x00ff  | 080900FF | 00001000000010010000000011111111
3  : movi $0, $t2, 0x00cc  | 080A00CC | 00001000000010100000000011001100
4  : add $t0, $t2, $t4     | 0D0A6001 | 00001101000010100110000000000001
5  : addi $0, $t1, 0xf6e5  | 6809F6E5 | 01101000000010011111011011100101
6  : muli $0, $t2, 0x0002  | 700A0002 | 01110000000010100000000000000010
7  : movi $0, $t5, 0x0001  | 080D0001 | 00001000000011010000000000000001
8  : movi $0, $t6, 0x000a  | 080E000A | 00001000000011100000000000001010
9  : addi $t5, $t5, 0x0001 | 69AD0001 | 01101001101011010000000000000001
10 : blt $t5, $t6, 0x0008  | 1DAE0008 | 00011101101011100000000000001000
11 : nop                   | 60000000 | 01100000000000000000000000000000
12 : sd $0, $t5, 0x00f0    | 440D00F0 | 01000100000011010000000011110000
13 : ld $0, $t7, 0x00f0    | 240F00F0 | 00100100000011110000000011110000
14 : lui $0, $t8, 0x5bcd   | 7C185BCD | 01111100000110000101101111001101
15 : hlt                   | 64000000 | 01100100000000000000000000000000

将十六进制分割成4份,由高到低输入H3 ~ H1中,然后打开自动时钟模拟,等待执行完毕。

此时程序计数器显示的是0x000E,正好对应第十五条指令hlt,表示程序已执行完毕,接下来到寄存器中看看效果。

可以看到CPU已经按照指令,将值正确的写入了各寄存器中,由于$t7的值来自内存地址0x00f0,因此l与s型指令也能正常执行。

编译器

设计编译器

如上所示的15条指令的二进制码我们可以很轻松的写出来,但是如果数量更多,将会产生不可避免的错误(有可能写错),因此我们需要一个编译器。

得益于现代计算机的高级语言,我们可以很轻松地写出一个简单的编译器来代替这种重复的工作。

如果你阅读过编译原理相关书籍,一定不会对词法分析器陌生,这次我们将模仿高级语言的编译原理来做一个简单的编译器。

不同于高级语言,我们所设计的汇编语言语法非常简单,只需要将它们分为R、I、J型指令即可,并且有固定的格式,例如:[助记符] [寄存器] [寄存器] [寄存器] 或 [助记符] [寄存器] [寄存器] [立即数]等。

因此我们可以把助记符、寄存器、十六进制数作为词素,这些词素不同于高级语言中的词素,只是借用一下概念。

例如下面这一条指令:

addi $t1, $t2, 0x00d2 ($t2 = $t1 + 0x00d2)

我们首先删除其中的空格,再从左到右逐个字节分析。

addi$t1,$t2,0x00d2

首先我们读出的都是字母,也就是说这一串字母很可能是助记符。

接着我们遇到了第一个终结符号’$’,出现这个符号表示这是一个寄存器,因此往后出现的字母或数字都可能是这个寄存器的名字。同时这也表示助记符到底为止,所以我们读取到的助记符是addi。

再继续读取,遇到下一个终结符号’,’,由于我们刚才已经读取到一个寄存器并且在不断记录寄存器的名字,因此遇到’,’时,则表示寄存器的名字到此为止,最终我们知道,这个寄存器叫做$t1。

继续向后,重复刚才的操作,下一个寄存器是$t2。

我们在第二个’,’之后遇到的第一个字符是数字而不是其他终结符($),说明这很可能是一个数字,且可能是十六进制,我们需要再向后读取至少一个字符,才能知道这是否是十六进制。

词法分析器原理

根据上面的原理,我们就可以开始做编译器啦,在这之前,如果你还是不太明白词法分析器的原理,可以看看如下代码,尝试自己运行一下试试。

Lexer.class

import java.io.IOException;
import java.util.Hashtable;

public class Lexer {
    public int line = 1;
    private char peek = ' ';
    private Hashtable<String, Word> words = new Hashtable<String, Word>();
    void reserve(Word t) {
        words.put(t.lexeme, t);
    }

    public Lexer() {
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    public Token scan() throws IOException {
        for (;;peek = (char) System.in.read()) {
            if (peek == ' ' || peek == '\t') {
                continue;
            } else if (peek == '\n') {
                line++;
            } else {
                break;
            }
        }
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = v * 10 + Character.digit(peek, 10);
                peek = (char) System.in.read();
                if (peek == 10) {
                    peek = (char) System.in.read();
                }
            } while (Character.isDigit(peek));
            return new Num(v);
        }
        if (Character.isLetter(peek)) {
            StringBuffer b = new StringBuffer();
            do {
                b.append(peek);
                peek = (char) System.in.read();
                if (peek == 10) {
                    peek = (char) System.in.read();
                }
            } while (Character.isLetterOrDigit(peek));
            String s = b.toString();
            Word w = words.get(s);
            if (w != null) {
                return w;
            }
            w = new Word(Tag.ID, s);
            return w;
        }
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    public static void main(String[] args) throws IOException {
        Lexer lexer = new Lexer();
        Token token = lexer.scan();
        if (token instanceof Num) {
            System.out.println(((Num) token).value);
        }else if (token instanceof Word) {
            System.out.println(((Word) token).lexeme);
        }
        System.out.println(token.getClass().getName() + " | " + token.tag);
    }
}

Tag.class

public class Tag {
    public static final int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
}

Token.class

public class Token {
    public final int tag;
    public Token(int tag) {
        this.tag = tag;
    }
}

Word.class

public class Word extends Token {
    public final String lexeme;
    public Word(int tag, String lexeme) {
        super(tag);
        this.lexeme = lexeme;
    }
}

Num.class

public class Num extends Token {
    public final int value;
    public Num(int value) {
        super(Tag.NUM);
        this.value = value;
    }
}

运行上述代码,当你逐个输入字符时,词法分析器将会自动识别你输入的是数字还是词素或者是终结符号(true/false)。

开始编写

准备

首先我们定义一些类来保存指令集和词素。

Instruction.class

public class Instruction extends Token {
    public final String name;
    public final Type type;
    public final String opCode;
    public Instruction(String name, Type type, String opCode) {
        super(Tag.INSTRUCTION);
        this.name = name;
        this.type = type;
        this.opCode = opCode;
    }

    public enum Type {
        R,
        I,
        J,
        // R-Type using shamt
        RS
    }
}

InstructionI.class

package cn.lovelycatv.compiler32.lexer;

public class InstructionI extends Instruction {
    public InstructionI(String name, String opCode) {
        super(name, Type.I, opCode);
    }
}

InstructionR.class

public class InstructionR extends Instruction {
    public final String func;
    public final boolean useFunc;
    public InstructionR(String name, String opCode, String func, boolean useFunc) {
        super(name, Type.R, opCode);
        this.func = func;
        this.useFunc = useFunc;
    }

}

InstructionRS.class

public class InstructionRS extends Instruction {
    public final String func;
    public String shamt;
    public InstructionRS(String name, String opCode, String shamt, String func) {
        super(name, Type.RS, opCode);
        this.shamt = shamt;
        this.func = func;
    }
}

InstructionJ.class

public class InstructionJ extends Instruction {
    public InstructionJ(String name, String opCode) {
        super(name, Type.J, opCode);
    }
}

Tag.class

public class Tag {
    public static final int INSTRUCTION = 1;
    public static final int REGISTER = 2;
    public static final int NUM_HEX = 3;
}

Token.class

public class Token {
    public final int tag;
    public Token(int tag) {
        this.tag = tag;
    }
}

NumHex.class

public class NumHex extends Token {
    public final String hex;
    public NumHex(String hex) {
        super(Tag.NUM_HEX);
        this.hex = hex;
    }

    public String getHex() {
        return hex.replace("0x","");
    }
}

Register.class

public class Register extends Token {
    public final String registerName;

    public Register(String regName) {
        super(Tag.REGISTER);
        this.registerName = regName;
    }
}

Registers.class

public enum Registers {
    zero("$0","00000"),
    at("$at","00001"),
    v0("$v0","00010"),
    v1("$v1","00011"),
    a0("$a0","00100"),
    a1("$a1","00101"),
    a2("$a2","00110"),
    a3("$a3","00111"),
    t0("$t0","01000"),
    t1("$t1","01001"),
    t2("$t2","01010"),
    t3("$t3","01011"),
    t4("$t4","01100"),
    t5("$t5","01101"),
    t6("$t6","01110"),
    t7("$t7","01111"),
    s0("$s0","10000"),
    s1("$s1","10001"),
    s2("$s2","10010"),
    s3("$s3","10011"),
    s4("$s4","10100"),
    s5("$s5","10101"),
    s6("$s6","10110"),
    s7("$s7","10111"),
    t8("$t8","11000"),
    t9("$t9","11001"),
    k0("$k0","11010"),
    k1("$k1","11011"),
    gp("$gp","11100"),
    sp("$sp","11101"),
    fp("$fp","11110"),
    ra("$ra","11111");

    public final String name;
    public final String addr;
    Registers(String name, String addr) {
        this.name = name;
        this.addr = addr;
    }

    public static boolean contains(String name) {
        for (Registers r : Registers.values()) {
            if (r.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static Registers getByName(String name) {
        for (Registers r : values()) {
            if (r.name.equalsIgnoreCase(name)) {
                return r;
            }
        }
        return null;
    }
}

Parser.class

public class Parser {
    public static boolean isHexIllegal(String hexStr) {
        for (char c : hexStr.toCharArray()) {
            if(!(c >= 48 && c <= 57)) {
                if (!(c >= 65 && c <= 70)) {
                    if (!(c >= 97 && c <= 102)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
    
    // 000ef -> ef
    public static String hexClear(String hex) {
        boolean allZero = true;
        for (char c : hex.toCharArray()) {
            if (c != '0') {
                allZero = false;
                break;
            }
        }
        if (allZero) {
            return "0";
        }
        int index = 0;
        boolean cleared = false;
        if (hex.startsWith("0")) {
            cleared = true;
            while (hex.charAt(index) == '0') {
                index++;
            }
        }
        return cleared ? hex.substring(index) : hex;
    }

    public static String hex2binary(String input) {
        input = hexClear(input);
        StringBuilder sb = new StringBuilder();
        int len = input.length();
        for (int i = 0; i < len; i++){
            String temp = input.substring(i, i + 1);
            int tempInt = Integer.parseInt(temp, 16);
            String tempBin = Integer.toBinaryString(tempInt);
            if (tempBin.length() < 4){
                int num = 4 - tempBin.length();
                for (int j = 0; j < num; j++){
                    sb.append("0");
                }
            }
            sb.append(tempBin);
        }
        return sb.toString();
    }

    public static String binary2hex(String input) {
        StringBuilder sb = new StringBuilder();
        int len = input.length();
        for (int i = 0; i < len / 4; i++){
            String temp = input.substring(i * 4, (i + 1) * 4);
            int tempInt = Integer.parseInt(temp, 2);
            String tempHex = Integer.toHexString(tempInt).toUpperCase();
            sb.append(tempHex);
        }

        return sb.toString();
    }

    public static String extender(String binary, int bit) {
        int lost = bit - binary.length();
        if (lost > 0) {
            StringBuilder binaryBuilder = new StringBuilder(binary);
            for (int i = 0; i < lost; i++) {
                binaryBuilder.insert(0, "0");
            }
            binary = binaryBuilder.toString();
        }else {
            binary = binary.substring(binary.length() - bit);
        }
        return binary;
    }
}

编写词法分析器

接下来我们就要开始编写词法分析器了,在Lexer中我们需要先把指令集和终结符号保存为静态变量以便读取。

import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class Lexer {
    private final Hashtable<String, Instruction> instructions = new Hashtable<String, Instruction>();
    private final List<Character> terminals = new ArrayList<>();
    public LexerTmp() {
        reserve(new InstructionR("mov", "000001","", false));
        reserve(new InstructionI("movi", "000010"));

        reserve(new InstructionR("add", "000011","000001",true));
        reserve(new InstructionR("sub", "000011","000011",true));
        reserve(new InstructionR("mul", "000011","000100",true));
        reserve(new InstructionR("div", "000011","000101",true));
        reserve(new InstructionR("neg", "000011","000110",true));
        reserve(new InstructionRS("sra", "000011","00000","000111"));
        reserve(new InstructionRS("sll", "000011","00000","001000"));
        reserve(new InstructionRS("slr", "000011","00000","001001"));
        reserve(new InstructionR("and", "000011","001010",true));
        reserve(new InstructionR("or", "000011","001011",true));
        reserve(new InstructionR("not", "000011","001100",true));
        reserve(new InstructionR("nand", "000011","001101",true));
        reserve(new InstructionR("nor", "000011","001110",true));
        reserve(new InstructionR("xnor", "000011","001111",true));
        reserve(new InstructionR("xor", "000011","010000",true));

        reserve(new InstructionI("beq", "000100"));
        reserve(new InstructionI("bne", "000101"));
        reserve(new InstructionI("bgt", "000110"));
        reserve(new InstructionI("blt", "000111"));

        reserve(new InstructionJ("jmp", "001000"));

        reserve(new InstructionI("ld", "001001"));
        reserve(new InstructionI("lwl", "001010"));
        reserve(new InstructionI("lwr", "001011"));
        reserve(new InstructionI("lha", "001100"));
        reserve(new InstructionI("lhb", "001101"));
        reserve(new InstructionI("lhc", "001110"));
        reserve(new InstructionI("lhd", "001111"));

        reserve(new InstructionJ("sr", "010000"));

        reserve(new InstructionI("sd", "010001"));
        reserve(new InstructionI("swl", "010010"));
        reserve(new InstructionI("swr", "010011"));
        reserve(new InstructionI("sha", "010100"));
        reserve(new InstructionI("shb", "010101"));
        reserve(new InstructionI("shc", "010110"));
        reserve(new InstructionI("shd", "010111"));

        reserve(new InstructionJ("nop", "011000"));
        reserve(new InstructionJ("hlt", "011001"));

        reserve(new InstructionI("addi", "011010"));
        reserve(new InstructionI("subi", "011011"));
        reserve(new InstructionI("muli", "011100"));
        reserve(new InstructionI("divi", "011101"));

        reserve(new InstructionI("jr", "011110"));
        reserve(new InstructionI("lui", "011111"));

        terminals.add('$');
        terminals.add(',');
    }

    public void reserve(Instruction instruction) {
        instructions.put(instruction.name, instruction);
    }
}

接下来就是最重要的部分了,我们在上面对一条指令进行了逐字节分析,现在用代码来实现吧。

private static final int SCAN_MARK_NORMAL = 0;
private static final int SCAN_MARK_REGISTER = 1;
private static final int SCAN_MARK_SPLITTER = 2;
public List<Token> scan(String line, int lineCount) {
    List<Token> tokenList = new ArrayList<>();
    int tag = SCAN_MARK_NORMAL;
    StringBuilder tmp = new StringBuilder();
    for (int i = 0; i < line.length(); i++) {
        char c = line.charAt(i);
        tmp.append(c);
        if (terminals.contains(c)) {
            if (c == '$') {
                tag = SCAN_MARK_REGISTER;
                tmp = new StringBuilder();
            } else if (c == ',') {
                tag = SCAN_MARK_SPLITTER;
                tmp = new StringBuilder();
            }
        } else if (Character.isLetter(c)) {
            if (tag == SCAN_MARK_NORMAL) {
                if (instructions.containsKey(tmp.toString())) {
                    if (i + 1 >= line.length() || !Character.isLetter(line.charAt(i + 1))) {
                        tokenList.add(instructions.get(tmp.toString()));
                    }
                }
            } else if (tag == SCAN_MARK_REGISTER) {
                String regName = "$" + tmp;
                if (Registers.contains(regName)) {
                    tokenList.add(new Register(regName));
                }
            } else {
                if (tmp.toString().startsWith("0x")) {
                    if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
                        NumHex numHex = new NumHex(tmp.toString());
                        tokenList.add(numHex);
                        if (!Parser.isHexIllegal(numHex.getHex())) {
                            throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
                        }
                    }
                }
            }
        } else if (Character.isDigit(c)) {
            if (tag == SCAN_MARK_REGISTER) {
                String regName = "$" + tmp;
                if (Registers.contains(regName)) {
                    tokenList.add(new Register(regName));
                }
            } else if (tag == SCAN_MARK_NORMAL) {
                // If the after char is x, that means this is a hex number
                if (i + 1 < line.length() && line.charAt(i + 1) == 'x') {
                    tmp = new StringBuilder("0");
                }
                if (tmp.toString().startsWith("0x")) {
                    if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
                        NumHex numHex = new NumHex(tmp.toString());
                        tokenList.add(numHex);
                        if (!Parser.isHexIllegal(numHex.getHex())) {
                            throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
                        }
                    }
                }
            } else {
                if (tmp.toString().startsWith("0x")) {
                    if ((i + 1) >= line.length() || terminals.contains(line.charAt(i + 1))) {
                        NumHex numHex = new NumHex(tmp.toString());
                        tokenList.add(numHex);
                        if (!Parser.isHexIllegal(numHex.getHex())) {
                            throw new IllegalStateException(String.format("Illegal hex number : %s, at line %s : %s", numHex.hex, lineCount, line));
                        }
                    }
                }
            }
        } else {
            throw new IllegalArgumentException(String.format("Unknown character %s, at line %s : %s", c, lineCount, line));
        }
    }
    return tokenList;
}

如上所示,用SCAN_MARK来标记接下来要读取的词素是什么类型,以便向tokenList添加元素。

调用scan函数并将addi$t1,$t2,0x00d2作为参数,最终将会得到一个size为4的tokenList。

tokenList的第一个元素为Instruction,而第二、三个元素为Register,最后一个元素为HexNum。

编译器

有了词法分析器,我们就可以将指令逐条输入到其中并且得到一个tokenList,编译器需要根据这些token来生成二进制码。

assemblyList储存的是一条条指令,首先我们需要跳过注释行,也就是以//开头的那一行。

然后再判断第一个元素是否为Instruction,因为指令中助记符永远在最前面。

for (String assembly : assemblyList) {
    if (!"".equals(assembly) && assembly.startsWith("//")) {
        currentLine++;
        continue;
    }
    List<Token> tokenList = lexer.scan(assembly.replaceAll("\\s+",""), currentLine);
    if (tokenList == null || tokenList.size() == 0) {
        throw new IllegalStateException("Syntax error");
    }
    if (!(tokenList.get(0) instanceof Instruction)) {
        throw new IllegalStateException(String.format("Unknown instruction : %s", assembly));
    }
    Instruction instruction = (Instruction) tokenList.get(0);
    StringBuilder binaryBuilder = new StringBuilder(instruction.opCode);
}

binaryBuilder则是用于拼接二进制码,如上所示,这个StringBuilder的默认值是六位操作码。

接着我们根据不同的指令类型(R、I、J)来进行不同的处理,下面仅展示R型指令的翻译过程。

public StringBuilder compileR(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String  assembly) {
    InstructionR instruction = (InstructionR) tokenList.get(0);
    if (tokenList.size() != 4) {
        throw new IllegalStateException(String.format("The params of the R-type instruction can only have 3, at line %s : %s", currentLine, assembly));
    }
    if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register) || !(tokenList.get(3) instanceof Register)) {
        throw new IllegalStateException(String.format("The type of params of the R-type instruction can only be register, at line %s : %s", currentLine, assembly));
    }
    for (int i = 1; i < 4; i++) {
        String regName = ((Register) tokenList.get(i)).registerName;
        Registers register = Registers.getByName(regName);
        if (register != null) {
            binaryBuilder.append(register.addr);
        }else {
            throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
        }
    }
    binaryBuilder.append("00000");
    if (instruction.useFunc) {
        binaryBuilder.append(instruction.func);
    }else {
        // No function code
        binaryBuilder.append("000000");
    }
    return binaryBuilder;
}

下面是Compiler.class的完整代码

public class Compiler {
    private static final Lexer lexer = new Lexer();
    private final List<String> assemblyList = new ArrayList<>();
    public void start(ICompilerCallBack iCompilerCallBack) {
        System.out.printf("%s line(s) detected%n", assemblyList.size());
        System.out.println("Start compile...");

        Map<Integer, String> originMap = new HashMap<>();
        List<String> binaryList = new ArrayList<>();

        int currentLine = 1;

        for (String assembly : assemblyList) {
            if (!"".equals(assembly) && assembly.startsWith("//")) {
                currentLine++;
                continue;
            }
            List<Token> tokenList = lexer.scan(assembly.replaceAll("\\s+",""), currentLine);
            if (tokenList == null || tokenList.size() == 0) {
                throw new IllegalStateException("Syntax error");
            }
            if (!(tokenList.get(0) instanceof Instruction)) {
                throw new IllegalStateException(String.format("Unknown instruction : %s", assembly));
            }
            Instruction instruction = (Instruction) tokenList.get(0);
            // Check whether the tokens are legal
            StringBuilder binaryBuilder = new StringBuilder(instruction.opCode);
            if (instruction.type.equals(Instruction.Type.R)) {
                binaryBuilder = compileR(binaryBuilder, tokenList, currentLine, assembly);
            } else if (instruction.type.equals(Instruction.Type.RS)) {
                binaryBuilder = compileRS(binaryBuilder, tokenList, currentLine, assembly);
            } else if (instruction.type.equals(Instruction.Type.I)) {
                binaryBuilder = compileI(binaryBuilder, tokenList, currentLine, assembly);
            } else if (instruction.type.equals(Instruction.Type.J)) {
                binaryBuilder = compileJ(binaryBuilder, tokenList, currentLine, assembly);
            }
            binaryList.add(binaryBuilder.toString());
            originMap.put(currentLine, assembly);

            currentLine++;
        }

        iCompilerCallBack.onFinished(binaryList, originMap);

        System.out.println("Compile is finished!");
    }

    public StringBuilder compileR(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String  assembly) {
        InstructionR instruction = (InstructionR) tokenList.get(0);
        if (tokenList.size() != 4) {
            throw new IllegalStateException(String.format("The params of the R-type instruction can only have 3, at line %s : %s", currentLine, assembly));
        }
        if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register) || !(tokenList.get(3) instanceof Register)) {
            throw new IllegalStateException(String.format("The type of params of the R-type instruction can only be register, at line %s : %s", currentLine, assembly));
        }
        for (int i = 1; i < 4; i++) {
            String regName = ((Register) tokenList.get(i)).registerName;
            Registers register = Registers.getByName(regName);
            if (register != null) {
                binaryBuilder.append(register.addr);
            }else {
                throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
            }
        }
        binaryBuilder.append("00000");
        if (instruction.useFunc) {
            binaryBuilder.append(instruction.func);
        }else {
            // No function code
            binaryBuilder.append("000000");
        }
        return binaryBuilder;
    }

    public StringBuilder compileRS(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String  assembly) {
        InstructionRS instruction = (InstructionRS) tokenList.get(0);
        if (tokenList.size() != 4) {
            throw new IllegalStateException(String.format("The params of the R-S-type instruction can only have 3, at line %s : %s", currentLine, assembly));
        }
        if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register)) {
            throw new IllegalStateException(String.format("The type of first two params of the R-S-type instruction can only be register, at line %s : %s", currentLine, assembly));
        }
        for (int i = 1; i < 3; i++) {
            String regName = ((Register) tokenList.get(i)).registerName;
            Registers register = Registers.getByName(regName);
            if (register != null) {
                binaryBuilder.append(register.addr);
                if (i == 1) {
                    // rt is 00000
                    binaryBuilder.append("00000");
                }
            }else {
                throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
            }
        }
        NumHex numHex = (NumHex) tokenList.get(3);
        instruction.shamt = Parser.extender(Parser.hex2binary(numHex.getHex()), 5);
        binaryBuilder.append(instruction.shamt);
        binaryBuilder.append(instruction.func);
        return binaryBuilder;
    }

    public StringBuilder compileI(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String  assembly) {
        if (tokenList.size() != 4) {
            throw new IllegalStateException(String.format("The params of the I-type instruction can only have 3, at line %s : %s", currentLine, assembly));
        }
        if (!(tokenList.get(1) instanceof Register) || !(tokenList.get(2) instanceof Register)) {
            throw new IllegalStateException(String.format("The type of the first two params of the I-type instruction can only be register, at line %s : %s", currentLine, assembly));
        }
        if (!(tokenList.get(3) instanceof NumHex)) {
            throw new IllegalStateException(String.format("The type of the 3th param of the I-type instruction can only be hex-number, at line %s : %s", currentLine, assembly));
        }
        NumHex numHex = (NumHex) tokenList.get(3);
        for (int i = 1; i < 3; i++) {
            String regName = ((Register) tokenList.get(i)).registerName;
            Registers register = Registers.getByName(regName);
            if (register != null) {
                binaryBuilder.append(register.addr);
            }else {
                throw new IllegalStateException(String.format("Register %s cannot be found, at line %s : %s", regName, currentLine, assembly));
            }
        }
        binaryBuilder.append(Parser.extender(Parser.hex2binary(numHex.getHex()), 16));
        return binaryBuilder;
    }

    public StringBuilder compileJ(StringBuilder binaryBuilder, List<Token> tokenList, int currentLine, String  assembly) {
        if (tokenList.size() > 2) {
            throw new IllegalStateException(String.format("The params of the J-type instruction can only have 0 ~ 1, at line %s : %s", currentLine, assembly));
        }
        if (tokenList.size() == 2) {
            if (!((tokenList.get(1)) instanceof NumHex)) {
                throw new IllegalStateException(String.format("The params of the J-type instruction can only be empty or hex-number, at line %s : %s", currentLine, assembly));
            }
            NumHex numHex = (NumHex) tokenList.get(1);
            binaryBuilder.append(Parser.extender(Parser.hex2binary(numHex.getHex()), 26));
        }else {
            binaryBuilder.append("00000000000000000000000000");
        }
        return binaryBuilder;
    }

    public void addInstruction(String instruction) {
        assemblyList.add(instruction);
    }

    public interface ICompilerCallBack {
        void onFinished(List<String> binaryList, Map<Integer, String> originMap);
    }
}

编译器测试

现在,我们就完成了一个简单的编译器,让我们一起试试效果如何。

compiler.addInstruction()

compiler.start()

调用addInstruction来添加要编译的指令,然后调用start方法开始编译,由interface将结果返回。

我们仍用指令测试部分的指令来编译,将binaryList输出后如下所示

00001000000010000000000011101110
00001000000010010000000011111111
00001000000010100000000011001100
00001101000010100110000000000001
01101000000010011111011011100101
01110000000010100000000000000010
00001000000011010000000000000001
00001000000011100000000000001010
01101001101011010000000000000001
00011101101011100000000000001000
01100000000000000000000000000000
01000100000011010000000011110000
00100100000011110000000011110000
01111100000110000101101111001101
01100100000000000000000000000000

对比一下我们上面自己写的二进制码,应该是完全一致的。

导入Logisim测试

接下来为了方便导入logisim中,还需要写一个自动生成可导入文件的方法。

v3.0 hex words addressed
0000: 08 08 08 0D 68 70 08 08 69 1D 60 44 24 7C 64 00
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

如上所示,我们需要的文件应该是这种格式,由于RAM被切分成了4片,所以需要把生成的指令切割成4份,并且转换成十六进制以上述格式写出。

让我们一起用代码实现一下吧:

public static String buildRamHexFile(List<String> byteHex) {
    StringBuilder result = new StringBuilder();
    String head = "v3.0 hex words addressed\n";
    result.append(head);
    int byteIndex = 0;
    int linesCount = 4096;
    int columnsCount = 16;
    for (int lineIndex = 0; lineIndex < linesCount; lineIndex++) {
        StringBuilder line = new StringBuilder();
        String lineHead = Parser.extender(Long.toHexString(lineIndex * columnsCount), 4) + ": ";
        line.append(lineHead);
        for (int columnIndex = 0; columnIndex < columnsCount; columnIndex++, byteIndex++) {
            if (byteIndex >= byteHex.size()) {
                line.append("00").append(" ");
            }else {
                line.append(byteHex.get(byteIndex)).append(" ");
            }
        }
        line = new StringBuilder(line.substring(0, line.length() - 1));
        result.append(line).append("\n");
    }
    return result.toString();
}

然后再将它们写出即可,在logisim中右键RAM选择Load Image即可一键导入了。

完善编译器

现在这个编译器已经实现我们所需的功能了,接下来就可以自由发挥再给它加点新东西了。

可以像这样在编译时同时输出对应的二进制、十六进制以及代码行数,便于后续调试。

结束语

到这里,也差不多该结束了。这次我们实现了30多条指令,可以完成大多数任务,但还有一些不足。

也许你已经注意到了,如果执行的乘法运算超过32位怎么办,其实在MIPS32中还有两个特殊的寄存器,分别是HI(乘除结果高位寄存器)与LO(乘除结果低位寄存器),这里我只是在ALU将这两个作为Output输出,并没有赋予实际功能。

MIPS32中并没有类似于立即数减法的指令,因为x + y可以变成x + (-y)

MIPS32中还有许多其他的指令,本期只是实现了最常用的一些,感兴趣可以自己再做修改。

参考文献

Alfred V.Aho, Monica S.Lam, Ravi Sethi and Jeffrey D.Ullman, “Compilers: Principles, Techniques and Tools, Second Edition“〔M〕. China Machine Press, 2022, pp, 47-53

Alan Clements, “Computer Organization and Architecture: Themes and Variations“, 〔M〕. China Machine Press, 2017, pp, 190-198

未经允许禁止转载本站内容,经允许转载后请严格遵守CC-BY-NC-ND知识共享协议4.0,代码部分则采用GPL v3.0协议

评论

  1. kwy
    Android Chrome
    1 年前
    2023-8-10 21:28:41

    请问CPU那张的译码器左下角的圆圈是什么引脚?

    • kwy
      博主
      Windows Edge
      1 年前
      2023-8-10 21:32:00

      不好意思哈,图画得不太好。
      那个不是引脚,是从译码器的Jmp传出来的信号经过了一个非门,非门被译码器挡住了,所以只能看见一个小圆圈。

发送评论 编辑评论


				
|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠( ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ °Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
颜文字
Emoji
小恐龙
花!
上一篇
下一篇