This week I took yet another look at improving my parser logic. In the past I tried parser combinators to build up my parsers, with the idea that they might be smaller, simpler, or better. This time I decided to simply write the parser from scratch.
The program for this week is a logic language. You can give it facts or ask it queries. Facts take the form SOMEONE is FACT, and queries take the form who is QUERY?. The program keeps track of all the facts and performs the appropriate set operations to find the result. You can use and, or, and not, as well as parentheses, to build up complex queries.
The lexer is very simple. Using logos again, we come up with a handful of tokens, nothing unexpected here:
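For illustration, the token set might look something like this. This is a sketch using logos's derive macro; the exact attributes and variant names are assumptions, not the post's actual code:

```rust
use logos::Logos;

// A sketch of the token set for the logic language. The exact
// definitions here are assumptions based on the language described.
#[derive(Logos, Debug, Clone, PartialEq)]
#[logos(skip r"[ \t\r\n]+")] // skip whitespace between tokens
enum Token {
    #[token("who")]
    Who,
    #[token("is")]
    Is,
    #[token("and")]
    And,
    #[token("or")]
    Or,
    #[token("not")]
    Not,
    #[token("(")]
    LParen,
    #[token(")")]
    RParen,
    #[token("?")]
    Question,
    // Exact keyword tokens take priority over this regex in logos,
    // so `is` lexes as Token::Is, not as an identifier.
    #[regex("[A-Za-z]+", |lex| lex.slice().to_string())]
    Identifier(String),
}
```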
The parser is quite different from my previous attempts.
First off, the AST is split into multiple enums rather than a single one. I found this to be a cleaner way of representing the different levels of the program. At the top we have Statement. A statement actually does something within the program: it changes state by either recording a fact or printing the results of a query. Below that we have Expression. Expressions are used to build up queries and can be nested to express something more complex.
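As a sketch, the two levels might look like this (the actual definitions aren't shown in the post, so the variant names are assumptions):

```rust
// Top level: a statement changes program state, either by recording
// a fact or by running a query and printing its results.
#[derive(Debug, PartialEq)]
enum Statement {
    Fact { subject: String, fact: String }, // e.g. `rust is cool`
    Query(Expression),                      // e.g. `who is cool?`
}

// Below that: expressions nest to form complex queries.
#[derive(Debug, PartialEq)]
enum Expression {
    Fact(String),                           // a bare fact, e.g. `cool`
    And(Box<Expression>, Box<Expression>),
    Or(Box<Expression>, Box<Expression>),
    Not(Box<Expression>),
}

fn main() {
    // `who is cool and not old?` as an AST:
    let q = Statement::Query(Expression::And(
        Box::new(Expression::Fact("cool".into())),
        Box::new(Expression::Not(Box::new(Expression::Fact("old".into())))),
    ));
    println!("{:?}", q);
}
```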
The parser itself takes the lexer and starts reading tokens. It is a recursive descent parser. That means it starts at the top and looks at the next token that comes in. If it's a Token::Who, it calls the query parser and goes down that path. If it's a Token::Identifier, it calls the fact parser and goes down that path. Each path is very similar to the parser combinators we built before, but rather than having a data structure that puts the parsers together, we are just using normal function calls.
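A minimal sketch of the expression path, with a plain token cursor standing in for the real lexer (the top-level dispatch on Token::Who versus Token::Identifier works the same way; all function and variant names here are assumptions):

```rust
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum Token {
    Who, Is, And, Or, Not, LParen, RParen, Question, Identifier(String),
}

#[derive(Debug, PartialEq)]
enum Expression {
    Fact(String),
    And(Box<Expression>, Box<Expression>),
    Or(Box<Expression>, Box<Expression>),
    Not(Box<Expression>),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    fn bump(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    // `or` binds loosest: parse `and`-chains and join them with Or.
    fn parse_expression(&mut self) -> Result<Expression, String> {
        let mut left = self.parse_and()?;
        while self.peek() == Some(&Token::Or) {
            self.bump();
            let right = self.parse_and()?;
            left = Expression::Or(Box::new(left), Box::new(right));
        }
        Ok(left)
    }

    fn parse_and(&mut self) -> Result<Expression, String> {
        let mut left = self.parse_primary()?;
        while self.peek() == Some(&Token::And) {
            self.bump();
            let right = self.parse_primary()?;
            left = Expression::And(Box::new(left), Box::new(right));
        }
        Ok(left)
    }

    // Primaries: `not EXPR`, a parenthesised expression, or a bare fact.
    fn parse_primary(&mut self) -> Result<Expression, String> {
        match self.bump() {
            Some(Token::Not) => Ok(Expression::Not(Box::new(self.parse_primary()?))),
            Some(Token::LParen) => {
                let inner = self.parse_expression()?;
                match self.bump() {
                    Some(Token::RParen) => Ok(inner),
                    other => Err(format!("expected `)`, got {:?}", other)),
                }
            }
            Some(Token::Identifier(name)) => Ok(Expression::Fact(name)),
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

fn main() {
    // Tokens for `cool and not old`:
    let tokens = vec![
        Token::Identifier("cool".into()),
        Token::And,
        Token::Not,
        Token::Identifier("old".into()),
    ];
    let mut parser = Parser { tokens, pos: 0 };
    println!("{:?}", parser.parse_expression());
}
```

Each grammar level is just a function, so precedence and error messages fall out of ordinary control flow rather than combinator plumbing.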
One thing I would also like to point out is that I discovered the ? operator in Rust. It makes Result and Option much easier to work with. Rather than having to use unwrap or is_ok, you can just add a ? and, if there was an error, Rust will return that error from the enclosing function.
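A small standalone illustration of the difference (not code from the parser):

```rust
use std::num::ParseIntError;

// Without `?`: match on the Result by hand.
fn double_verbose(s: &str) -> Result<i64, ParseIntError> {
    match s.parse::<i64>() {
        Ok(n) => Ok(n * 2),
        Err(e) => Err(e),
    }
}

// With `?`: on Err, the error is returned from the enclosing
// function immediately; on Ok, the inner value is unwrapped.
fn double(s: &str) -> Result<i64, ParseIntError> {
    let n = s.parse::<i64>()?;
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double("21"));   // Ok(42)
    println!("{:?}", double("nope")); // Err(...)
}
```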
The final parser is about the same size as before. However, error handling is a lot better, and it's easy to debug and step through what is happening. Overall I am happy with this approach and will probably continue using it.
The runtime for this program is kind of simple but interesting nonetheless. It keeps a HashMap of all the facts it knows. So if you tell it rust is cool, then it stores cool -> ["rust"]. Then if you query it with something like who is cool?, it's just a quick lookup into the table. More complex queries take the form of set operations: intersection, union, and complement implement and, or, and not.
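A sketch of that evaluation over a HashMap of fact sets. The names here are assumptions, and `not` is taken as the complement against every subject the program has seen so far:

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum Expression {
    Fact(String),
    And(Box<Expression>, Box<Expression>),
    Or(Box<Expression>, Box<Expression>),
    Not(Box<Expression>),
}

struct Runtime {
    // fact -> the set of subjects it applies to, e.g. "cool" -> {"rust"}
    facts: HashMap<String, HashSet<String>>,
}

impl Runtime {
    fn tell(&mut self, subject: &str, fact: &str) {
        self.facts
            .entry(fact.to_string())
            .or_default()
            .insert(subject.to_string());
    }

    // Every subject mentioned in any fact; the universe for `not`.
    fn all_subjects(&self) -> HashSet<String> {
        self.facts.values().flatten().cloned().collect()
    }

    fn eval(&self, expr: &Expression) -> HashSet<String> {
        match expr {
            Expression::Fact(name) => self.facts.get(name).cloned().unwrap_or_default(),
            Expression::And(a, b) => {
                self.eval(a).intersection(&self.eval(b)).cloned().collect()
            }
            Expression::Or(a, b) => {
                self.eval(a).union(&self.eval(b)).cloned().collect()
            }
            Expression::Not(a) => {
                self.all_subjects().difference(&self.eval(a)).cloned().collect()
            }
        }
    }
}

fn main() {
    let mut rt = Runtime { facts: HashMap::new() };
    rt.tell("rust", "cool");
    rt.tell("java", "old");
    // who is cool?
    println!("{:?}", rt.eval(&Expression::Fact("cool".into())));
    // who is not cool?
    println!("{:?}", rt.eval(&Expression::Not(Box::new(Expression::Fact("cool".into())))));
}
```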
rust is cool
java is old
who is cool?
> ["rust"]
who is not cool?
> ["java"]
After this week I am pretty happy with the technology stack I am using for compiler frontends. Logos plus a hand-built parser seems to be ideal for me. So next week I will probably look into writing some compiler backends and see if I can get native code compilation working using LLVM.