📜 Add better documentation across the compiler. (#3)
These changes pay particular attention to the public API, to try to ensure that any rustdocs generated are detailed and sensible. A good next step, eventually, might be to include doctest examples as well; for the moment, though, it's not clear that they would provide a lot of value. In addition, this does a couple of refactors that simplify the code base in ways that make things clearer or, at least, briefer.
@@ -1,12 +1,32 @@
use crate::syntax::Location;

/// The set of valid binary operators.
pub static BINARY_OPERATORS: &[&str] = &["+", "-", "*", "/"];

/// A structure representing a parsed program.
///
/// One `Program` is associated with exactly one input file, and the
/// vector is arranged in exactly the same order as the parsed file.
/// Because this is the syntax layer, the program is guaranteed to be
/// syntactically valid, but may be nonsense. There could be attempts
/// to use unbound variables, for example, until after someone runs
/// `validate` and it comes back without errors.
#[derive(Clone, Debug, PartialEq)]
pub struct Program {
    pub statements: Vec<Statement>,
}

/// A parsed statement.
///
/// Statements are guaranteed to be syntactically valid, but may be
/// complete nonsense at the semantic level. Which is to say: all the
/// print statements were correctly formatted, and all the variables
/// referenced are definitely valid symbols, but they may not have
/// been defined or anything.
///
/// Note that equivalence testing on statements is independent of
/// source location; it tests whether the two statements say the same
/// thing, not whether they are the exact same statement.
#[derive(Clone, Debug)]
pub enum Statement {
    Binding(Location, String, Expression),
@@ -28,6 +48,12 @@ impl PartialEq for Statement {
    }
}

/// An expression in the underlying syntax.
///
/// Like statements, these expressions are guaranteed to have been
/// formatted correctly, but may not actually make any sense. Also
/// like statements, the [`PartialEq`] implementation does not take
/// source positions into account.
#[derive(Clone, Debug)]
pub enum Expression {
    Value(Location, Value),
@@ -54,7 +80,9 @@ impl PartialEq for Expression {
    }
}

/// A value from the source syntax.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Value {
    /// The value of the number, and an optional base that it was written in.
    Number(Option<u8>, i64),
}

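The doc comments above promise that statement equality ignores source location. A minimal sketch of that idea, using a cut-down, hypothetical `Statement` with a single binding variant (not the compiler's real types), looks like this:

```rust
// Illustrative stand-ins for the compiler's AST types.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Location {
    file_idx: usize,
    offset: usize,
}

#[derive(Clone, Debug)]
enum Statement {
    Binding(Location, String, i64),
}

// Compare what the statements *say*, ignoring where they were written.
impl PartialEq for Statement {
    fn eq(&self, other: &Self) -> bool {
        let (Statement::Binding(_, n1, v1), Statement::Binding(_, n2, v2)) = (self, other);
        n1 == n2 && v1 == v2
    }
}

fn main() {
    let a = Statement::Binding(Location { file_idx: 1, offset: 0 }, "x".into(), 5);
    let b = Statement::Binding(Location { file_idx: 1, offset: 40 }, "x".into(), 5);
    // Same statement content, different source positions: still equal.
    assert_eq!(a, b);
    println!("ok");
}
```

The payoff of this design is that tests and transformations can compare programs without caring whether whitespace or file layout shifted the offsets around.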
@@ -4,11 +4,23 @@ use crate::eval::{EvalEnvironment, EvalError, Value};
use crate::syntax::{Expression, Program, Statement};

impl Program {
    /// Evaluate the program, returning either an error or what it prints out when run.
    ///
    /// Doing this evaluation is particularly useful for testing, to ensure that if we
    /// modify a program in some way it does the same thing on both sides of the
    /// transformation. It's also sometimes just nice to know what a program will be
    /// doing.
    ///
    /// Note that the errors here are slightly stricter than what we enforce at runtime.
    /// For example, we check for overflow and underflow errors during evaluation, but
    /// we don't check for those in the compiled code.
    pub fn eval(&self) -> Result<String, EvalError> {
        let mut env = EvalEnvironment::empty();
        let mut stdout = String::new();

        for stmt in self.statements.iter() {
            // At this point, evaluation is pretty simple: just walk through each
            // statement, in order, and record printouts as we come to them.
            match stmt {
                Statement::Binding(_, name, value) => {
                    let actual_value = value.eval(&env)?;
@@ -40,6 +52,7 @@ impl Expression {
        let mut arg_values = Vec::with_capacity(args.len());

        for arg in args.iter() {
            // Yay, recursion! It makes this pretty straightforward.
            arg_values.push(arg.eval(env)?);
        }

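The statement walk that `Program::eval` performs can be sketched with a stripped-down evaluator. Everything here (the mini `Expr`/`Stmt` types, the `String` errors) is illustrative, not the compiler's actual API:

```rust
use std::collections::HashMap;

// Cut-down stand-ins for the compiler's AST; real expressions carry
// locations and support primitives, which are elided here.
enum Expr {
    Value(i64),
    Reference(String),
}

enum Stmt {
    Binding(String, Expr),
    Print(String),
}

// Walk the statements in order, recording printouts as we reach them.
fn eval(stmts: &[Stmt]) -> Result<String, String> {
    let mut env: HashMap<String, i64> = HashMap::new();
    let mut stdout = String::new();
    for stmt in stmts {
        match stmt {
            Stmt::Binding(name, expr) => {
                let value = match expr {
                    Expr::Value(n) => *n,
                    Expr::Reference(var) => {
                        *env.get(var).ok_or_else(|| format!("unbound variable {var}"))?
                    }
                };
                env.insert(name.clone(), value);
            }
            Stmt::Print(name) => {
                let value = env.get(name).ok_or_else(|| format!("unbound variable {name}"))?;
                stdout.push_str(&format!("{value}\n"));
            }
        }
    }
    Ok(stdout)
}

fn main() {
    let program = vec![
        Stmt::Binding("x".to_string(), Expr::Value(5)),
        Stmt::Binding("y".to_string(), Expr::Reference("x".to_string())),
        Stmt::Print("y".to_string()),
    ];
    print!("{}", eval(&program).unwrap()); // prints "5"
}
```

Note how the accumulated `stdout` string is the return value; that is what makes this evaluator handy for before/after comparisons in tests.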
@@ -1,5 +1,9 @@
use codespan_reporting::diagnostic::{Diagnostic, Label};

/// A source location, for use in pointing users towards warnings and errors.
///
/// Internally, locations are very tied to the `codespan_reporting` library,
/// and the primary use of them is to serve as anchors within that library.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Location {
    file_idx: usize,
@@ -7,10 +11,22 @@ pub struct Location {
}

impl Location {
    /// Generate a new `Location` from a file index and an offset from the
    /// start of the file.
    ///
    /// The file index is based on the file database being used. See the
    /// `codespan_reporting::files::SimpleFiles::add` function, which is
    /// normally where we get this index.
    pub fn new(file_idx: usize, offset: usize) -> Self {
        Location { file_idx, offset }
    }

    /// Generate a `Location` for a completely manufactured bit of code.
    ///
    /// Ideally, this is used only in testing, as any code we generate as
    /// part of the compiler should, theoretically, be tied to some actual
    /// location in the source code. That being said, this can be used in
    /// a pinch ... just maybe try to avoid it if you can.
    pub fn manufactured() -> Self {
        Location {
            file_idx: 0,
@@ -18,27 +34,73 @@ impl Location {
        }
    }

    /// Generate a primary label for a [`Diagnostic`], based on this source
    /// location.
    ///
    /// Note, this is just the [`Label`]; you'll want to fill in the [`Diagnostic`]
    /// with a lot more information.
    ///
    /// Primary labels are the things that are the key cause of the message.
    /// If, for example, it was an error to bind a variable named "x", and
    /// then have another binding of a variable named "x", the second one
    /// would likely be the primary label (because that's where the error
    /// actually happened), but you'd probably want to make the first location
    /// the secondary label to help users find it.
    pub fn primary_label(&self) -> Label<usize> {
        Label::primary(self.file_idx, self.offset..self.offset)
    }

    /// Generate a secondary label for a [`Diagnostic`], based on this source
    /// location.
    ///
    /// Note, this is just the [`Label`]; you'll want to fill in the [`Diagnostic`]
    /// with a lot more information.
    ///
    /// Secondary labels are the things that are involved in the message, but
    /// aren't necessarily a problem in and of themselves. If, for example, it
    /// was an error to bind a variable named "x", and then have another binding
    /// of a variable named "x", the second one would likely be the primary
    /// label (because that's where the error actually happened), but you'd
    /// probably want to make the first location the secondary label to help
    /// users find it.
    pub fn secondary_label(&self) -> Label<usize> {
        Label::secondary(self.file_idx, self.offset..self.offset)
    }

    /// Given this location and another, generate a primary label that
    /// specifies the area between those two locations.
    ///
    /// See [`Self::primary_label`] for some discussion of primary versus
    /// secondary labels. If the two locations are the same, this method does
    /// the exact same thing as [`Self::primary_label`]. If this item was
    /// generated by [`Self::manufactured`], it will act as if you'd called
    /// `primary_label` on the argument. Otherwise, it will generate the obvious
    /// span.
    ///
    /// This function will return `None` only in the case that you provide
    /// labels from two different files, which it cannot sensibly handle.
    pub fn range_label(&self, end: &Location) -> Option<Label<usize>> {
        if self.file_idx == 0 {
            // If this is a manufactured item, just use the other location.
            return Some(end.primary_label());
        }

        if self.file_idx != end.file_idx {
            return None;
        }

        if self.offset > end.offset {
            Some(Label::primary(self.file_idx, end.offset..self.offset))
        } else {
            Some(Label::primary(self.file_idx, self.offset..end.offset))
        }
    }

    /// Return an error diagnostic centered at this location.
    ///
    /// Note that this [`Diagnostic`] will have no information associated with
    /// it other than that (a) there is an error, and (b) that the error is at
    /// this particular location. You'll need to extend it with actually useful
    /// information, like what kind of error it is.
    pub fn error(&self) -> Diagnostic<usize> {
        Diagnostic::error().with_labels(vec![Label::primary(
            self.file_idx,
@@ -46,6 +108,12 @@ impl Location {
        )])
    }

    /// Return an error diagnostic centered at this location, with the given message.
    ///
    /// This is much more useful than [`Self::error`], because it actually provides
    /// the user with some guidance. That being said, you still might want to add
    /// even more information to it, using [`Diagnostic::with_labels`],
    /// [`Diagnostic::with_notes`], or [`Diagnostic::with_code`].
    pub fn labelled_error(&self, msg: &str) -> Diagnostic<usize> {
        Diagnostic::error().with_labels(vec![Label::primary(
            self.file_idx,

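The `range_label` doc comment describes three cases: a manufactured start location defers to the other end, two different real files yield `None`, and reversed offsets still produce a forward span. A small sketch of that same selection logic, with a plain `(file, range)` pair standing in for the real `codespan_reporting` label, makes the cases easy to check:

```rust
// A stand-in for the compiler's Location; the real method builds a
// codespan_reporting Label, but the case analysis is identical.
#[derive(Clone, Copy)]
struct Loc {
    file_idx: usize,
    offset: usize,
}

// Mirrors `Location::range_label`: file 0 means "manufactured", two
// different real files can't form a range, and reversed offsets are
// swapped so the span always runs forward.
fn range(start: Loc, end: Loc) -> Option<(usize, std::ops::Range<usize>)> {
    if start.file_idx == 0 {
        return Some((end.file_idx, end.offset..end.offset));
    }
    if start.file_idx != end.file_idx {
        return None;
    }
    if start.offset > end.offset {
        Some((start.file_idx, end.offset..start.offset))
    } else {
        Some((start.file_idx, start.offset..end.offset))
    }
}

fn main() {
    let a = Loc { file_idx: 1, offset: 4 };
    let b = Loc { file_idx: 1, offset: 10 };
    let other = Loc { file_idx: 2, offset: 0 };
    assert_eq!(range(a, b), Some((1, 4..10)));
    assert_eq!(range(b, a), Some((1, 4..10))); // reversed offsets are swapped
    assert_eq!(range(a, other), None); // different files: no sensible span
    println!("ok");
}
```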
@@ -1,14 +1,32 @@
//! The parser for NGR!
//!
//! This file contains the grammar for the NGR language; a grammar is a nice,
//! machine-readable way to describe how your language's syntax works. For
//! example, here we describe a program as a series of statements, statements
//! as either variable binding or print statements, etc. As the grammar gets
//! more complicated, using tools like [`lalrpop`] becomes even more important.
//! (Although, at some point, things can become so complicated that you might
//! eventually want to leave lalrpop behind.)
//!
use crate::syntax::{LexerError, Location};
use crate::syntax::ast::{Program, Statement, Expression, Value};
use crate::syntax::tokens::Token;
use internment::ArcIntern;

// One cool thing about lalrpop: we can pass arguments. In this case, the
// file index of the file we're parsing. We combine this with the file offset
// that Logos gives us to make a [`crate::syntax::Location`].
grammar(file_idx: usize);

// This is a slightly odd way to describe this, but: consider this section
// as describing the stuff that is external to the lalrpop grammar that it
// needs to know to do its job.
extern {
    type Location = usize; // Logos, our lexer, implements locations as
                           // offsets from the start of the file.
    type Error = LexerError;

    // Here we redeclare all of the tokens.
    enum Token {
        "=" => Token::Equals,
        ";" => Token::Semi,
@@ -22,57 +40,123 @@ extern {
        "*" => Token::Operator('*'),
        "/" => Token::Operator('/'),

        // The previous items just match their tokens, and if you try
        // to name and use "their value", you get their source location.
        // For these, we want "their value" to be their actual contents,
        // which is why we put their types in angle brackets.
        "<num>" => Token::Number((<Option<u8>>,<i64>)),
        "<var>" => Token::Variable(<ArcIntern<String>>),
    }
}

pub Program: Program = {
    // a program is just a set of statements
    <stmts:Statements> => Program {
        statements: stmts
    }
}

Statements: Vec<Statement> = {
    // A statement list is either a list of statements followed by another
    // statement (note, here, that you can name the result of a sub-parse
    // using <name: subrule>) ...
    <mut stmts:Statements> <stmt:Statement> => {
        stmts.push(stmt);
        stmts
    },

    // ... or it's nothing. This may feel like an awkward way to define
    // lists of things -- and it is a bit awkward -- but there are actual
    // technical reasons that you want to (a) use recursion to define
    // these, and (b) use *left* recursion, specifically. That's why, in
    // this file, all of the recursive cases are to the left, like they
    // are above.
    //
    // The details of why left recursion is better are actually pretty
    // fiddly and in the weeds; if you're interested, you should look
    // up LALR parsers versus LL parsers -- both their differences and
    // how they're constructed -- as they're kind of neat.
    //
    // But if you're just writing grammars with lalrpop, then you should
    // just remember that you should always use left recursion, and be
    // done with it.
    => {
        Vec::new()
    }
}

pub Statement: Statement = {
    // A statement can be a variable binding. Note, here, that we use this
    // funny @L thing to get the source location before the variable, so that
    // we can say that this statement spans across everything.
    <l:@L> <v:"<var>"> "=" <e:Expression> ";" => Statement::Binding(Location::new(file_idx, l), v.to_string(), e),

    // Alternatively, a statement can just be a print statement.
    "print" <l:@L> <v:"<var>"> ";" => Statement::Print(Location::new(file_idx, l), v.to_string()),
}

// Expressions! Expressions are a little fiddly, because we're going to
// use a little bit of a trick to make sure that we get operator precedence
// right. The trick works by creating a top-level `Expression` grammar entry
// that just points to the thing with the *weakest* precedence. In this case,
// we have addition, subtraction, multiplication, and division, so addition
// and subtraction have the weakest precedence.
//
// Then, as we go down the precedence tree, each item will recurse (left!)
// to other items at the same precedence level. The right-hand operand, for
// binary operators (which is all of ours, at the moment), will then be one
// level stronger in precedence. In addition, we'll let people just fall
// through to the next level; so if there isn't an addition or subtraction,
// we'll just fall through to the multiplication/division case.
//
// Finally, at the bottom, we'll have the core expressions (like constants,
// variables, etc.) as well as a parenthesized version of `Expression`, which
// gets us right up top again.
//
// Understanding why this works to solve all your operator precedence problems
// is a little hard to give an easy intuition for, but for myself it helped
// to run through a few examples. Consider thinking about how you want to
// parse something like "1 + 2 * 3", for example, versus "1 + 2 + 3" or
// "1 * 2 + 3", and hopefully that'll help.
Expression: Expression = {
    AdditiveExpression,
}

// We group addition and subtraction under the heading "additive".
AdditiveExpression: Expression = {
    <e1:AdditiveExpression> <l:@L> "+" <e2:MultiplicativeExpression> => Expression::Primitive(Location::new(file_idx, l), "+".to_string(), vec![e1, e2]),
    <e1:AdditiveExpression> <l:@L> "-" <e2:MultiplicativeExpression> => Expression::Primitive(Location::new(file_idx, l), "-".to_string(), vec![e1, e2]),
    MultiplicativeExpression,
}

// Similarly, we group multiplication and division under "multiplicative".
MultiplicativeExpression: Expression = {
    <e1:MultiplicativeExpression> <l:@L> "*" <e2:AtomicExpression> => Expression::Primitive(Location::new(file_idx, l), "*".to_string(), vec![e1, e2]),
    <e1:MultiplicativeExpression> <l:@L> "/" <e2:AtomicExpression> => Expression::Primitive(Location::new(file_idx, l), "/".to_string(), vec![e1, e2]),
    AtomicExpression,
}

// Finally, we describe our lowest-level expressions as "atomic", because
// they cannot be further divided into parts.
AtomicExpression: Expression = {
    // just a variable reference
    <l:@L> <v:"<var>"> => Expression::Reference(Location::new(file_idx, l), v.to_string()),
    // just a number
    <l:@L> <n:"<num>"> => {
        let val = Value::Number(n.0, n.1);
        Expression::Value(Location::new(file_idx, l), val)
    },
    // A tricky case: also just a number, but using a negative sign. An
    // alternative way to do this -- and we may do this eventually -- is
    // to implement a unary negation expression. That has the odd effect
    // that the user never actually writes down a negative number; they just
    // write positive numbers which are immediately sent to a negation
    // primitive!
    <l:@L> "-" <n:"<num>"> => {
        let val = Value::Number(n.0, -n.1);
        Expression::Value(Location::new(file_idx, l), val)
    },
    // Finally, let people parenthesize expressions and get back to a
    // lower precedence.
    "(" <e:Expression> ")" => e,
}
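The Expression/Additive/Multiplicative/Atomic layering described in the comments above can be demonstrated outside of lalrpop. This sketch is a plain recursive-descent parser over single-character tokens (a loop plays the role that left recursion plays in the grammar), and it prints the grouping each input actually gets:

```rust
// A hand-rolled illustration of the precedence-layering trick; none of
// this is the compiler's real parser.
struct Parser<'a> {
    tokens: &'a [char],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<char> {
        self.tokens.get(self.pos).copied()
    }

    // "Additive" layer: fold in "+"/"-" left-to-right, recursing one
    // precedence level down for each operand.
    fn additive(&mut self) -> String {
        let mut lhs = self.multiplicative();
        while let Some(op @ ('+' | '-')) = self.peek() {
            self.pos += 1;
            let rhs = self.multiplicative();
            lhs = format!("({lhs} {op} {rhs})");
        }
        lhs
    }

    // "Multiplicative" layer: same shape, one level stronger.
    fn multiplicative(&mut self) -> String {
        let mut lhs = self.atomic();
        while let Some(op @ ('*' | '/')) = self.peek() {
            self.pos += 1;
            let rhs = self.atomic();
            lhs = format!("({lhs} {op} {rhs})");
        }
        lhs
    }

    // "Atomic" layer: a single digit stands in for numbers and variables.
    fn atomic(&mut self) -> String {
        let c = self.tokens[self.pos];
        self.pos += 1;
        c.to_string()
    }
}

fn parse(input: &str) -> String {
    let tokens: Vec<char> = input.chars().filter(|c| !c.is_whitespace()).collect();
    Parser { tokens: &tokens, pos: 0 }.additive()
}

fn main() {
    // "*" binds tighter than "+", and same-precedence operators group left.
    assert_eq!(parse("1 + 2 * 3"), "(1 + (2 * 3))");
    assert_eq!(parse("1 + 2 + 3"), "((1 + 2) + 3)");
    assert_eq!(parse("1 * 2 + 3"), "((1 * 2) + 3)");
    println!("ok");
}
```

Running the three examples from the comment above makes the point concrete: multiplication groups first whenever it can, and ties are broken to the left.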
@@ -1,63 +0,0 @@
use crate::syntax::ast::{Expression, Program, Statement};

impl Program {
    pub fn simplify(mut self) -> Self {
        let mut new_statements = Vec::new();
        let mut gensym_index = 1;

        for stmt in self.statements.drain(..) {
            new_statements.append(&mut stmt.simplify(&mut gensym_index));
        }

        self.statements = new_statements;
        self
    }
}

impl Statement {
    pub fn simplify(self, gensym_index: &mut usize) -> Vec<Statement> {
        let mut new_statements = vec![];

        match self {
            Statement::Print(_, _) => new_statements.push(self),
            Statement::Binding(_, _, Expression::Reference(_, _)) => new_statements.push(self),
            Statement::Binding(_, _, Expression::Value(_, _)) => new_statements.push(self),
            Statement::Binding(loc, name, value) => {
                let (mut prereqs, new_value) = value.rebind(&name, gensym_index);
                new_statements.append(&mut prereqs);
                new_statements.push(Statement::Binding(loc, name, new_value))
            }
        }

        new_statements
    }
}

impl Expression {
    fn rebind(self, base_name: &str, gensym_index: &mut usize) -> (Vec<Statement>, Expression) {
        match self {
            Expression::Value(_, _) => (vec![], self),
            Expression::Reference(_, _) => (vec![], self),
            Expression::Primitive(loc, prim, mut expressions) => {
                let mut prereqs = Vec::new();
                let mut new_exprs = Vec::new();

                for expr in expressions.drain(..) {
                    let (mut cur_prereqs, arg) = expr.rebind(base_name, gensym_index);
                    prereqs.append(&mut cur_prereqs);
                    new_exprs.push(arg);
                }

                let new_name = format!("<{}:{}>", base_name, *gensym_index);
                *gensym_index += 1;
                prereqs.push(Statement::Binding(
                    loc.clone(),
                    new_name.clone(),
                    Expression::Primitive(loc.clone(), prim, new_exprs),
                ));

                (prereqs, Expression::Reference(loc, new_name))
            }
        }
    }
}
@@ -4,8 +4,30 @@ use std::fmt;
use std::num::ParseIntError;
use thiserror::Error;

/// A single token of the input stream; used to help the parsing go down
/// more easily.
///
/// The key way to generate this structure is via the [`Logos`] trait.
/// See the [`logos`] documentation for more information; we use the
/// [`Token::lexer`] function internally.
///
/// The first step in the compilation process is turning the raw string
/// data (in UTF-8, which is its own joy) into a sequence of more sensible
/// tokens. Here, for example, we turn "x=5" into three tokens: a
/// [`Token::Variable`] for "x", a [`Token::Equals`] for the "=", and
/// then a [`Token::Number`] for the "5". Later on, we'll worry about
/// making sense of those three tokens.
///
/// For now, our list of tokens is relatively straightforward. We'll
/// need/want to extend these later.
///
/// The [`std::fmt::Display`] implementation for [`Token`] should
/// round-trip; if you lex a string generated with the [`std::fmt::Display`]
/// trait, you should get back the exact same token.
#[derive(Logos, Clone, Debug, PartialEq, Eq)]
pub enum Token {
    // Our first set of tokens are simple characters that we're
    // going to use to structure NGR programs.
    #[token("=")]
    Equals,

@@ -18,12 +40,20 @@ pub enum Token {
    #[token(")")]
    RightParen,

    // Next we take care of any reserved words; I always like to put
    // these before we start recognizing more complicated regular
    // expressions. I don't think it matters, but it works for me.
    #[token("print")]
    Print,

    // Next are the operators for NGR. We only have four now, but
    // we might extend these later, or even make them user-definable!
    #[regex(r"[+\-*/]", |v| v.slice().chars().next())]
    Operator(char),

    /// Numbers capture both the value we read from the input,
    /// converted to an `i64`, as well as the base the user used
    /// to write the number, if they did so.
    #[regex(r"0b[01]+", |v| parse_number(Some(2), v))]
    #[regex(r"0o[0-7]+", |v| parse_number(Some(8), v))]
    #[regex(r"0d[0-9]+", |v| parse_number(Some(10), v))]
@@ -31,12 +61,23 @@ pub enum Token {
    #[regex(r"[0-9]+", |v| parse_number(None, v))]
    Number((Option<u8>, i64)),

    // Variables; this is a very standard, simple set of characters
    // for variables, but feel free to experiment with more complicated
    // things. I chose to force variables to start with a lower-case
    // letter, too.
    #[regex(r"[a-z][a-zA-Z0-9_]*", |v| ArcIntern::new(v.slice().to_string()))]
    Variable(ArcIntern<String>),

    #[error]
    // We're actually just going to skip whitespace, though.
    #[regex(r"[ \t\r\n\f]+", logos::skip)]
    // This is an extremely simple version of comments, just line
    // comments. More complicated /* */ comments can be harder to
    // implement, and didn't seem worth it at the time.
    #[regex(r"//.*", logos::skip)]
    /// This token represents that some core error happened in lexing;
    /// possibly that something didn't match anything at all.
    Error,
}

@@ -63,19 +104,28 @@ impl fmt::Display for Token {
    }
}

/// A sudden and unexpected error in the lexer.
#[derive(Debug, Error, PartialEq, Eq)]
pub enum LexerError {
    /// The `usize` here is the offset at which we ran into the problem,
    /// given from the start of the file.
    #[error("Failed lexing at {0}")]
    LexFailure(usize),
}

#[cfg(test)]
impl Token {
    /// Create a variable token with the given name. Very handy for
    /// testing.
    pub(crate) fn var(s: &str) -> Token {
        Token::Variable(ArcIntern::new(s.to_string()))
    }
}

/// Parse a number in the given base, returning a pair of the base and the
/// parsed number. This is just a helper used for all of the number
/// regular-expression cases, which kicks off to the obvious Rust
/// standard library function.
fn parse_number(
    base: Option<u8>,
    value: &Lexer<Token>,

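The "obvious Rust standard library function" here is `i64::from_str_radix`. A sketch of what `parse_number` presumably does — strip the two-character base prefix ("0b", "0o", "0d") when one was matched, then parse the remaining digits in that radix — with a plain `&str` standing in for the logos `Lexer` the real helper reads from:

```rust
use std::num::ParseIntError;

// Illustrative mirror of the lexer's parse_number helper; the signature
// (taking &str) is an assumption for the sake of a runnable example.
fn parse_number(base: Option<u8>, text: &str) -> Result<(Option<u8>, i64), ParseIntError> {
    // When a base was given, the regex guaranteed a two-character prefix.
    let digits = if base.is_some() { &text[2..] } else { text };
    let n = i64::from_str_radix(digits, u32::from(base.unwrap_or(10)))?;
    Ok((base, n))
}

fn main() {
    // Each pair matches one of the lexer's number regexes.
    assert_eq!(parse_number(Some(2), "0b101"), Ok((Some(2), 5)));
    assert_eq!(parse_number(Some(8), "0o17"), Ok((Some(8), 15)));
    assert_eq!(parse_number(None, "42"), Ok((None, 42)));
    println!("ok");
}
```

Returning the base alongside the value is what lets `Value::Number` remember how the user wrote the literal, so tooling can print it back in the same notation.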
@@ -2,6 +2,13 @@ use crate::syntax::{Expression, Location, Program, Statement};
use codespan_reporting::diagnostic::Diagnostic;
use std::collections::HashMap;

/// An error we found while validating the input program.
///
/// These errors indicate that we should stop trying to compile
/// the program, because it's just fundamentally broken in a way
/// that we're not going to be able to work through. As with most
/// of these errors, we recommend converting them to [`Diagnostic`]s
/// and using [`codespan_reporting`] to present them to the user.
pub enum Error {
    UnboundVariable(Location, String),
}
@@ -16,6 +23,13 @@ impl From<Error> for Diagnostic<usize> {
    }
}

/// A problem we found validating the input that isn't critical.
///
/// These are things that the user might want to do something about,
/// but we can keep going without it being a problem. As with most of
/// these things, if you want to present this information to the user,
/// the best way to do so is via [`From`] and [`Diagnostic`], and then
/// interactions via [`codespan_reporting`].
#[derive(Debug, PartialEq, Eq)]
pub enum Warning {
    ShadowedVariable(Location, Location, String),
@@ -37,6 +51,11 @@ impl From<Warning> for Diagnostic<usize> {
    }
}

impl Program {
    /// Validate that the program makes semantic sense, not just syntactic sense.
    ///
    /// This checks for things like references to variables that don't exist, for
    /// example, and generates warnings for things that are inadvisable but not
    /// actually a problem.
    pub fn validate(&self) -> (Vec<Error>, Vec<Warning>) {
        let mut errors = vec![];
        let mut warnings = vec![];
@@ -53,6 +72,15 @@ impl Program {
    }
}

impl Statement {
    /// Validate that the statement makes semantic sense, not just syntactic sense.
    ///
    /// This checks for things like references to variables that don't exist, for
    /// example, and generates warnings for things that are inadvisable but not
    /// actually a problem. Since statements appear in a broader context, you'll
    /// need to provide the set of variables that are bound where this statement
    /// occurs. We use a `HashMap` to map these bound variables to the locations
    /// where they're bound, because those locations are handy when generating
    /// errors and warnings.
    pub fn validate(
        &self,
        bound_variables: &mut HashMap<String, Location>,
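The `bound_variables` map described above is the whole trick: by remembering *where* each name was bound, the validator can report both unbound references and shadowing, pointing back at the earlier site. A sketch with offsets standing in for `Location`s and strings standing in for the real `Error`/`Warning` types:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the compiler's AST; the usize is an offset
// playing the role of a Location.
enum Stmt {
    Binding(usize, String), // offset, name (the bound expression is elided)
    Print(usize, String),   // offset, name
}

// Walk the statements, tracking where each variable was bound so that
// messages can point back at the earlier site.
fn validate(stmts: &[Stmt]) -> (Vec<String>, Vec<String>) {
    let mut bound: HashMap<String, usize> = HashMap::new();
    let mut errors = Vec::new();
    let mut warnings = Vec::new();
    for stmt in stmts {
        match stmt {
            Stmt::Binding(offset, name) => {
                // insert returns the previous location, if any: a shadow.
                if let Some(prev) = bound.insert(name.clone(), *offset) {
                    warnings.push(format!("{name} at {offset} shadows binding at {prev}"));
                }
            }
            Stmt::Print(offset, name) => {
                if !bound.contains_key(name) {
                    errors.push(format!("unbound variable {name} at {offset}"));
                }
            }
        }
    }
    (errors, warnings)
}

fn main() {
    let program = vec![
        Stmt::Binding(0, "x".to_string()),
        Stmt::Binding(8, "x".to_string()),
        Stmt::Print(16, "y".to_string()),
    ];
    let (errors, warnings) = validate(&program);
    assert_eq!(errors, vec!["unbound variable y at 16"]);
    assert_eq!(warnings, vec!["x at 8 shadows binding at 0"]);
    println!("ok");
}
```

This mirrors the error/warning split in the real validator: an unbound variable stops compilation, while a shadowed one just earns a note.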