Compiler Weekly: BrainF*** Interpreter

This week I set out to solve my parser woes from my last post. I worked hard to build a library that would match my needs, and ended up with a project called pebble_parser. It's a parser combinator library similar to pom, but with a cleaner syntax, and it works on iterators rather than arrays.

My project for this week was to build a BrainF*** interpreter. It's a simple language with very little syntax, so I figured it would be a great way to test out my parser without taking on too hard a project.

There are not many differences between this post and my last, so I am going to quickly point out the parts of interest and leave the rest as an exercise for the reader.

The lexer is very similar to last time: it uses logos and focuses on basic single-character tokens.

use logos::Logos;

#[derive(Clone, Debug, PartialEq, Logos)]
pub enum Token {
    #[token(">")] IncPtr,
    #[token("<")] DecPtr,
    #[token("+")] IncData,
    #[token("-")] DecData,
    #[token(".")] Output,
    #[token(",")] Input,
    #[token("[")] OpenLoop,
    #[token("]")] CloseLoop,
    // Anything else is a BrainF*** comment, so skip it.
    #[error]
    #[regex(r".|[\t\n\f]+", logos::skip)]
    Error
}

// BFLexer is a thin wrapper around the logos lexer.
pub struct BFLexer<'a> {
    lexer: logos::Lexer<'a, Token>
}

pub fn lex(input: &str) -> BFLexer {
    BFLexer { lexer: Token::lexer(input) }
}

The runtime is likewise simple and close to the last one. It is recursive and holds some common references. The big difference is the looping structure and block evaluation: as you can see below, Rust's existing control structures are enough to execute the program directly.

use std::io;
use std::io::Read;

#[derive(Debug)]
pub enum Program {
    IncPtr,
    DecPtr,
    IncVal,
    DecVal,
    Output,
    Input,
    Block(Vec<Program>),
    Loop(Box<Program>)
}

pub struct BFMachine {
    ptr: usize,
    data: Vec<u8>
}

impl BFMachine {
    pub fn new() -> Self {
        BFMachine {
            ptr: 0,
            data: vec![0; 30_000]
        }
    }

    pub fn eval(&mut self, program: &Program) {
        match program {
            Program::IncPtr => self.ptr += 1,
            Program::DecPtr => self.ptr -= 1,
            // Wrapping arithmetic: BrainF*** cells are expected to wrap
            // at 0 and 255 rather than panic on overflow.
            Program::IncVal => self.data[self.ptr] = self.data[self.ptr].wrapping_add(1),
            Program::DecVal => self.data[self.ptr] = self.data[self.ptr].wrapping_sub(1),
            Program::Output => print!("{}", self.data[self.ptr] as char),
            Program::Input => {
                io::stdin().read(&mut self.data[self.ptr..self.ptr + 1]).unwrap();
            },
            Program::Block(instructions) => {
                for i in instructions {
                    self.eval(i);
                }
            }
            Program::Loop(body) => {
                while self.data[self.ptr] != 0 {
                    self.eval(body)
                }
            }
        }
    }
}

Finally there is the parser. This is where I started, fully intending to end up with a cleaner parser with better error handling. Unfortunately, what I ended up with was more or less the same as last time. I like method calls more than operator overloading, but the parser is the same speed, has the same bad error handling, and is generally the same thing in a different syntax.

pub fn parse(lexer: &mut BFLexer) -> Result<Program, Error> {
    parse_block().then_ignore(end()).parse(lexer)
}

type BFParser<'a> = Parser<'a, BFLexer<'a>, Program>;

fn parse_block<'a>() -> BFParser<'a> {
    parse_instruction().or(parse_loop()).many().map(|v| Program::Block(v))
}

macro_rules! token {
    ($token: pat) => {
        Parser::new(String::new(), |reader| {
            read_next(reader).and_then(|(token, location)| {
                if let $token = token {
                    return Ok(());
                }
                Err(Error::expected_found(stringify!($token), token, location))
            })
        })
    };
}

fn parse_loop<'a>() -> BFParser<'a> {
    token!(Token::OpenLoop)
        .ignore_then(factory(|| parse_block()))
        .then_ignore(token!(Token::CloseLoop))
        .map(|program| Program::Loop(Box::new(program)))
        .with_name("Loop")
}

fn parse_instruction<'a>() -> BFParser<'a> {
    Parser::new(String::from("Instruction"), |reader| {
        read_next(reader).and_then(|(token, location)| {
            match token {
                Token::IncPtr => Ok(Program::IncPtr),
                Token::DecPtr => Ok(Program::DecPtr),
                Token::IncData => Ok(Program::IncVal),
                Token::DecData => Ok(Program::DecVal),
                Token::Output => Ok(Program::Output),
                Token::Input => Ok(Program::Input),
                _ => Err(Error::expected_found("Instruction", token, location))
            }
        })
    })
}

Conclusion

Like the last project, this one works fine: it will run any BrainF*** program you throw at it. However, the implementation is not an improvement over last week's.

Parser combinator libraries like pom and pebble_parser are not the right approach for serious compiler work. Hand-written parsing code handles complex workflows much better than these libraries do. For that reason I won't be releasing pebble_parser, as I don't think it's the correct solution. I'm now thinking that logos alone is enough to build a toolchain. My next parser will be hand built, and I will see if that works out better.
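
To sketch the direction a hand-built parser might take, here is a minimal recursive descent parser over raw characters (a hypothetical standalone version, not the pebble_parser API) that produces the same Program shape as the runtime above. BrainF***'s only nesting is the loop, so one recursive function covers the whole grammar:

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug)]
enum Program {
    IncPtr,
    DecPtr,
    IncVal,
    DecVal,
    Output,
    Input,
    Block(Vec<Program>),
    Loop(Box<Program>),
}

// Parses instructions until end of input or an unconsumed ']'.
// (For simplicity, a stray ']' at the top level is not reported here.)
fn parse_block(chars: &mut Peekable<Chars<'_>>) -> Result<Program, String> {
    let mut instructions = Vec::new();
    while let Some(&c) = chars.peek() {
        if c == ']' {
            break; // let the enclosing loop consume it
        }
        chars.next();
        match c {
            '>' => instructions.push(Program::IncPtr),
            '<' => instructions.push(Program::DecPtr),
            '+' => instructions.push(Program::IncVal),
            '-' => instructions.push(Program::DecVal),
            '.' => instructions.push(Program::Output),
            ',' => instructions.push(Program::Input),
            '[' => {
                let body = parse_block(chars)?;
                if chars.next() != Some(']') {
                    return Err("unclosed loop".to_string());
                }
                instructions.push(Program::Loop(Box::new(body)));
            }
            _ => {} // anything else is a comment
        }
    }
    Ok(Program::Block(instructions))
}

fn main() {
    // Parses into Block([IncVal, Loop(Block([DecVal, IncPtr, IncVal, DecPtr]))]).
    let ast = parse_block(&mut "+[->+<]".chars().peekable()).unwrap();
    println!("{:?}", ast);
}
```

Error handling (the "unclosed loop" case) falls out naturally here, which is exactly the part the combinator version struggled with.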

Written 17/10/2021