nik-hz/viper

viper

Angel Cui: lc3542
Nikolaus Holzer: nh2677

Code Generation

For TAs: shell script to set up virtual environment and run full code examples for HW3

For HW3 grading, please refer to this Code Generation section.

  1. Make sure you are in the viper directory.
  2. Run chmod +x vpc.sh to ensure executable access to the shell script.
  3. Run source ./vpc.sh. This activates a virtual environment called viper, installs the required dependencies, runs the five code examples, and prints their outputs. If source ./vpc.sh fails, run python vpc.py for grading instead; it runs the Viper compiler on the ex{num}.vp files in ./examples and generates the corresponding ex{num}.py files.
  4. Then run the generated .py files and compare their output with the expected output. We run the code and compare the outputs in demo video 2.

Please find the demo video at: https://drive.google.com/drive/folders/1Dgx0PggO9zZVhKScSzEIXWRdzvQyOG7i?usp=sharing

The demo video folder is open to CU email accounts. Please log in to see the videos or request access; we will grant access as soon as we can.

Viper to Python sample pairs

Our code generator converts the AST into Python code. We check compile-time type consistency in the Viper code and then translate it to Python. For each ex{num}.vp Viper program, ex{num}_correct.py contains the expected Python output. All provided examples produce the expected Python output and the expected behavior.
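To make the idea of a compile-time type check followed by translation concrete, here is a minimal sketch. It is not the Viper compiler's implementation: it assumes declarations of the form type :: name = literal; and only handles literal right-hand sides. The names TYPES, DECL, and check_and_emit, and the shape of the emitted assignment, are hypothetical.

```python
import ast
import re

# Hypothetical mapping from a few Viper type names to Python types.
TYPES = {"int": int, "float": float, "str": str, "list": list, "bool": bool}

# Matches declarations such as `int :: x = 10;` (assumed form, literals only).
DECL = re.compile(r"^\s*(\w+)\s*::\s*(\w+)\s*=\s*(.+?)\s*;\s*$")

def check_and_emit(line):
    """Type-check one literal declaration and return a Python assignment."""
    match = DECL.match(line)
    if match is None:
        raise SyntaxError(f"not a declaration: {line!r}")
    type_name, var, expr = match.groups()
    if type_name not in TYPES:
        raise TypeError(f"unknown type {type_name!r}")
    value = ast.literal_eval(expr)  # this sketch supports literals only
    if type(value) is not TYPES[type_name]:
        raise TypeError(
            f"{var} declared as {type_name}, got {type(value).__name__}"
        )
    return f"{var} = {expr}"

print(check_and_emit("int :: x = 10;"))  # x = 10
```

A mismatched declaration such as str :: x = 10; raises TypeError in this sketch before any Python code is emitted.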

Syntactic Analysis

For TAs: shell script to set up virtual environment and run full code examples for HW2

For HW2 grading, please refer to this Syntactic Analysis section.

  1. Make sure you are in the viper directory.
  2. Run chmod +x parser.sh to ensure executable access to the shell script.
  3. Run source ./parser.sh. This activates a virtual environment called viper, installs the required dependencies, runs the five code examples, and prints their outputs. If source ./parser.sh fails, run python parser for grading instead, which executes the __main__.py file in the parser directory.

Context Free Grammar for Viper AST

Viper -> StatementList

StatementList -> Statement StatementList | ε

Statement -> TypeDeclaration Statement' | ExpressionStatement

Statement' -> <VAR> <ASSIGN> Expression <SEMICOLON> // VariableDeclaration 
	| <DEF> <FUNC> <LPAREN> ParameterList <RPAREN> <PYTHON_CODE, :> FunctionBody // FunctionDefinition

TypeDeclaration -> <TYPE> <TYPE_DEC>

ParameterList -> Parameter ParameterListRest | ε

ParameterListRest -> <PYTHON_CODE, ,> Parameter ParameterListRest | ε

Parameter -> TypeDeclaration <VAR>

FunctionBody -> <LBRACE> StatementList ReturnStatement <RBRACE>

ReturnStatement -> <PYTHON_CODE, return> ExpressionStatement <SEMICOLON> | ε

ExpressionStatement -> Expression <SEMICOLON> | Expression

Expression -> SimpleExpression ExpressionPrime

ExpressionPrime -> SimpleExpression ExpressionPrime | ε

SimpleExpression -> PYTHON_CODE | VAR | FunctionCall | Loop | ParenthesizedExpression | Range | OP

Range -> <LPAREN> Python Var Python Var Range <RPAREN> | <LPAREN> Python Var Python Var Range <RPAREN> <SEMICOLON> | ε

Python -> <PYTHON_CODE> Python | ε

Var -> <VAR> Var | ε

ArithmeticExpression -> Expression <OP> Expression

FunctionCall -> <FUNC> <LPAREN> ArgumentList <RPAREN> <SEMICOLON>

ArgumentList -> Expression ArgumentListRest | ε

ArgumentListRest -> <PYTHON_CODE, ,> Expression ArgumentListRest | ε

Loop -> <PYTHON_CODE, for> <PYTHON_CODE> <PYTHON_CODE, in> Python <LBRACE> StatementList <RBRACE>
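As an illustration of how a recursive-descent parser can follow these productions, the sketch below handles only ParameterList, ParameterListRest, and Parameter. It is not the repository's parser: tokens are assumed to be (class, lexeme) tuples mirroring the printed scanner output, the function names are hypothetical, and the right-recursive ParameterListRest rule is written as a loop.

```python
def parse_parameter(tokens, pos):
    """Parameter -> <TYPE> <TYPE_DEC> <VAR>. Returns (node, new_pos) or None."""
    classes = [tok[0] for tok in tokens[pos:pos + 3]]
    if classes == ["TYPE", "TYPE_DEC", "VAR"]:
        return ("Parameter", tokens[pos][1], tokens[pos + 2][1]), pos + 3
    return None

def parse_parameter_list(tokens, pos=0):
    """ParameterList -> Parameter ParameterListRest | ε."""
    params = []
    result = parse_parameter(tokens, pos)
    if result is None:
        return ("ParameterList", params), pos  # ε production, nothing consumed
    node, pos = result
    params.append(node)
    # ParameterListRest -> <PYTHON_CODE, ,> Parameter ParameterListRest | ε
    while pos < len(tokens) and tokens[pos] == ("PYTHON_CODE", ","):
        result = parse_parameter(tokens, pos + 1)
        if result is None:
            raise SyntaxError("expected Parameter after ','")
        node, pos = result
        params.append(node)
    return ("ParameterList", params), pos

# Parameter tokens of `func(int :: a, int :: b)`.
tokens = [
    ("TYPE", "int"), ("TYPE_DEC", "::"), ("VAR", "a"),
    ("PYTHON_CODE", ","),
    ("TYPE", "int"), ("TYPE_DEC", "::"), ("VAR", "b"),
    ("RPAREN", ")"),
]
node, pos = parse_parameter_list(tokens)
print(node, pos)
# ('ParameterList', [('Parameter', 'int', 'a'), ('Parameter', 'int', 'b')]) 7
```

Parsing stops at the closing parenthesis, which is left for the caller to consume.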

Error Handling

Our parser handles syntactic errors. We implement a stack-based error handler that creates a new local error context for each recursive call and propagates only the errors that come from true syntactic errors. In this way, our parser does not report errors that arise from the regular tree search of the recursive parser. The excerpts below show the stack management and how parse_statement uses a local error context.

# Error handling: stack management (methods of the parser class)
def push_error_context(self):
    # Create a new temporary error context on the stack
    self.error_context_stack.append([])

def pop_error_context(self, success):
    # Pop the top error context and commit to main error list if unsuccessful
    if self.error_context_stack:
        temp_errors = self.error_context_stack.pop()
        if not success and temp_errors:
            self.err.extend(temp_errors)

def add_error(self, message):
    # token = self.current_token()
    token = None
    error = ParserError(message, self.position, token)
    if self.error_context_stack:
        # Record the error in the innermost open context
        self.error_context_stack[-1].append(error)
    else:
        # No open context: record the error directly in the main list
        self.err.append(error)

# Error handling: local error context
def parse_statement(self):
    pos = self.position
    self.push_error_context()
    type_decl = self.parse_type_declaration()
    if type_decl is not None:
        stmt_prime = self.parse_statement_prime()
        if stmt_prime is not None:
            self.pop_error_context(True)
            return ("Statement", type_decl, stmt_prime)
        self.add_error("Expected StatementPrime after TypeDeclaration")
        # Backtrack so the next alternative starts from the same position
        self.position = pos

    expr_stmt = self.parse_expression_statement()
    if expr_stmt is not None:
        self.pop_error_context(True)
        return ("Statement", expr_stmt)

    self.pop_error_context(False)
    self.add_error("Expected a TypeDeclaration or ExpressionStatement")
    return None
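The behavior of the error-context stack can be tried in isolation. The following standalone sketch copies the three methods above into a small class; the class name ErrorContexts is hypothetical, and errors are stored as plain strings instead of ParserError objects to keep the example self-contained.

```python
class ErrorContexts:
    """Standalone model of the error-context stack (simplified sketch)."""

    def __init__(self):
        self.err = []                  # committed errors
        self.error_context_stack = []  # speculative errors, one list per context

    def push_error_context(self):
        self.error_context_stack.append([])

    def pop_error_context(self, success):
        if self.error_context_stack:
            temp_errors = self.error_context_stack.pop()
            if not success and temp_errors:
                self.err.extend(temp_errors)

    def add_error(self, message):
        if self.error_context_stack:
            self.error_context_stack[-1].append(message)
        else:
            self.err.append(message)

ctx = ErrorContexts()

# First alternative fails, second succeeds: the speculative error is discarded.
ctx.push_error_context()
ctx.add_error("Expected StatementPrime after TypeDeclaration")
ctx.pop_error_context(True)
print(ctx.err)  # []

# No alternative matches: the speculative error is committed.
ctx.push_error_context()
ctx.add_error("Expected a TypeDeclaration or ExpressionStatement")
ctx.pop_error_context(False)
print(ctx.err)  # ['Expected a TypeDeclaration or ExpressionStatement']
```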

Parsing Examples

We show examples that illustrate what parsed Viper code looks like. Only the short examples are included below; when you run our parser as instructed above, you will see the full list of inputs and outputs (identical to the expected output).


Running test case 1:
Code:
 int :: x_a = 10; list :: y = range(0,x_a); for i in y: { print(i); };

Tokens:
<TYPE, int>, <TYPE_DEC, ::>, <VAR, x_a>, <ASSIGN, =>, <PYTHON_CODE, 10>,
<SEMICOLON, ;>, <TYPE, list>, <TYPE_DEC, ::>, <VAR, y>, <ASSIGN, =>, 
<TYPE, range>, <LPAREN, (>, <PYTHON_CODE, 0>, <PYTHON_CODE, ,>, <VAR, x_a>,
<RPAREN, )>, <SEMICOLON, ;>, <PYTHON_CODE, for>, <PYTHON_CODE, i>, 
<PYTHON_CODE, in>, <PYTHON_CODE, y:>, <LBRACE, {>, <PYTHON_CODE, print>, 
<LPAREN, (>, <PYTHON_CODE, i>, <RPAREN, )>, <SEMICOLON, ;>, <RBRACE, }>, <SEMICOLON, ;>

AST:
- VariableDeclaration
   - TypeDeclaration
      - <TYPE, int> <TYPE_DEC, ::>
   - Statement'
      - <VAR, x_a> <ASSIGN, => <PYTHON_CODE, 10> <SEMICOLON, ;>
- VariableDeclaration
   - TypeDeclaration
      - <TYPE, list> <TYPE_DEC, ::>
   - Statement'
      - <VAR, y> <ASSIGN, => <TYPE, range> <LPAREN, (> <PYTHON_CODE, 0> <PYTHON_CODE, ,> <VAR, x_a> <RPAREN, )> <SEMICOLON, ;>
- ExpressionStatement
   - Expression
      - Loop
         - <PYTHON_CODE, for> <PYTHON_CODE, i> <PYTHON_CODE, in> <PYTHON_CODE, y:> <LBRACE, {> <PYTHON_CODE, print> <LPAREN, (> <PYTHON_CODE, i> <RPAREN, )> <SEMICOLON, ;> <RBRACE, }>
   - <SEMICOLON, ;>

----------------------------------------

Running test case 2: (Error caught in parsing: python_code type_dec is not valid)
Code:
 str :: def say_hello_world(){ string :: text = 'hello world'; print(text);};

Tokens:
<TYPE, str>, <TYPE_DEC, ::>, <DEF, def>, <FUNC, say_hello_world>, <LPAREN, (>,
<RPAREN, )>, <LBRACE, {>, <PYTHON_CODE, string>, <TYPE_DEC, ::>, <VAR, text>,
<ASSIGN, =>, <PYTHON_CODE, 'hello>, <PYTHON_CODE, world'>, <SEMICOLON, ;>,
<PYTHON_CODE, print>, <LPAREN, (>, <VAR, text>, <RPAREN, )>, <SEMICOLON, ;>,
<RBRACE, }>, <SEMICOLON, ;>

AST:
- VariableDeclaration
   - TypeDeclaration
      - <TYPE, str> <TYPE_DEC, ::>
   - Statement'
      - <DEF, def> <FUNC, say_hello_world> <LPAREN, (> <RPAREN, )> <LBRACE, {> <PYTHON_CODE, string>ERROR (Here there should be an error)


----------------------------------------

Running test case 3:
Code:
 int :: def func(int :: a, int :: b):{ int :: c = a + b; return c;}

Tokens:
<TYPE, int>, <TYPE_DEC, ::>, <DEF, def>, <FUNC, func>, <LPAREN, (>,
<TYPE, int>, <TYPE_DEC, ::>, <VAR, a>, <PYTHON_CODE, ,>, <TYPE, int>,
<TYPE_DEC, ::>, <VAR, b>, <RPAREN, )>, <PYTHON_CODE, :>, <LBRACE, {>,
<TYPE, int>, <TYPE_DEC, ::>, <VAR, c>, <ASSIGN, =>, <VAR, a>,
<OP, +>, <VAR, b>, <SEMICOLON, ;>, <PYTHON_CODE, return>, <VAR, c>,
<SEMICOLON, ;>, <RBRACE, }>

AST:
- VariableDeclaration
   - TypeDeclaration
      - <TYPE, int> <TYPE_DEC, ::>
   - Statement'
      - <DEF, def> <FUNC, func> <LPAREN, (> 
      - ParameterList
         - <TYPE, int> <TYPE_DEC, ::> <VAR, a> <PYTHON_CODE, ,> <TYPE, int> <TYPE_DEC, ::> <VAR, b> 
      - <RPAREN, )> 
      - FunctionBody
         - <LBRACE, {> 
         - StatementList
            - TypeDeclaration
               - <TYPE, int> <TYPE_DEC, ::>
            - Statement'
               - <VAR, c> <ASSIGN, =>
               - ExpressionStatement
                  - ArithmeticExpression
                     - <VAR, a> <OP, +> <VAR, b>
                  - <SEMICOLON, ;>
         - ReturnStatement
            - <PYTHON_CODE, return> 
            - ExpressionStatement
               - Expression
                  - <VAR, c>
            - <SEMICOLON, ;>
         - <RBRACE, }>

----------------------------------------

Tokenizing

For TAs: shell script to set up virtual environment and run full code examples for HW1

  1. Make sure you are in the viper directory.
  2. Run chmod +x scanner.sh to ensure executable access to the shell script.
  3. Run source ./scanner.sh. This activates a virtual environment called viper, installs the required dependencies, runs the five code examples, and prints their outputs. If source ./scanner.sh fails, run python scanner for grading instead, which executes the __main__.py file in the scanner directory.

For grading, please refer to lexical_dfa.jpg for the DFA of the scanner.

The script runs 5 examples and tokenizes them.

Note: we include only simple examples below because the full output is too long for the README. The script prints more complicated examples together with their expected output. The expected output shown below is formatted differently from the printed output so that it reads better in the README.

Lexical grammar

We define thirteen new token classes that the viper tokenizer recognizes.

  1. <TYPE_DEC>: type declaration token → "::"
    1. The :: token will be used to declare the type of a variable or return value. It serves as the separator between variable names and their type annotations.
    2. Example: int :: x = 1;
  2. <TYPE>: Type tokens → “Any”, “bool”, “Callable”, “complex”, “dict”, “float”, “frozenset”, “int”, “list”, “Optional”, “range”, “set”, “str”, “tuple”, “Union”, “NoneType”
    1. Tokens representing data types such as str, float, list, tuple are recognized and reserved.
    2. Example: list :: y = [];
  3. <SEMICOLON>: Semicolon → ";"
    1. The semicolon terminates logical lines, offering an alternative to Python's newline-delineated logical lines.
    2. Example: print("Hello world!");
  4. <LPAREN>: Paren left → "("
    1. Left parenthesis
  5. <RPAREN>: Paren right → ")"
    1. Right parenthesis
  6. <LBRACE>: Curly brace left → "{"
    1. Curly braces define blocks of code (such as function bodies or control flow syntax) and replace Python's indentation-based syntax.
    2. Example: if a == True: {print("Hello world!");}
  7. <RBRACE>: Curly brace right → "}"
    1. Curly braces define blocks of code (such as function bodies or control flow syntax) and replace Python's indentation-based syntax.
    2. Example: if a == True: {print("Hello world!");}
  8. <VAR>: Variable name → "a"
    1. Any variable name that comes after <TYPE> <TYPE_DEC>.
    2. Example: int :: a;
  9. <FUNC>: Function name → "a"
    1. Any function name that comes after <DEF>.
  10. <DEF>: Function definition → "def"
    1. Function definition
  11. <ASSIGN>: Assignment → "="
    1. Defines the = operator, which assigns values to variables.
  12. <PYTHON_CODE>: All regular Python tokens
    1. Special token class for unchecked tokens; Viper relies on the Python interpreter for their correctness.
  13. <OP>: Arithmetic operators → ["**", "*", "+", "-", "//", "/", "%"]
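The token classes above can be illustrated with a small classifier over lexemes that have already been split. This is a simplified sketch, not the DFA-based scanner in the repository: the function name classify is hypothetical, class names follow the printed scanner output (for example LBRACE), and the rule that a previously declared name stays a VAR is an assumption inferred from the example outputs.

```python
TYPES = {"Any", "bool", "Callable", "complex", "dict", "float", "frozenset",
         "int", "list", "Optional", "range", "set", "str", "tuple", "Union",
         "NoneType"}
OPS = {"**", "*", "+", "-", "//", "/", "%"}
FIXED = {"::": "TYPE_DEC", ";": "SEMICOLON", "(": "LPAREN", ")": "RPAREN",
         "{": "LBRACE", "}": "RBRACE", "def": "DEF", "=": "ASSIGN"}

def classify(lexemes):
    """Assign a token class to each pre-split lexeme."""
    tokens = []
    declared = set()  # names already seen in a declaration (assumed rule)
    for lex in lexemes:
        prev = tokens[-1][0] if tokens else None
        if lex in FIXED:
            cls = FIXED[lex]
        elif lex in TYPES:
            cls = "TYPE"
        elif lex in OPS:
            cls = "OP"
        elif prev == "DEF":
            cls = "FUNC"
        elif prev == "TYPE_DEC" or lex in declared:
            cls = "VAR"
            declared.add(lex)
        else:
            cls = "PYTHON_CODE"  # unchecked, left to the Python interpreter
        tokens.append((cls, lex))
    return tokens

print(classify(["int", "::", "x", "=", "1", ";"]))
# [('TYPE', 'int'), ('TYPE_DEC', '::'), ('VAR', 'x'), ('ASSIGN', '='), ('PYTHON_CODE', '1'), ('SEMICOLON', ';')]
```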

Error handling

Our lexer handles only errors directly related to malformed type declarations.

We implement panic mode for malformed lexemes of the token types defined in our lexical grammar. By default, the lexer passes all other lexemes through as PYTHON_CODE tokens; any errors in them then appear when the generated Python program is run.

Tokenizing Examples

We show examples that illustrate what tokenized Viper code looks like. Only the short examples are included below; when you run our scanner as instructed above, you will see the full list of inputs and outputs (identical to the expected output).


Running test case 1:
Code:
 int :: x_a = 10; list :: y = range(0,x_a); for i in y: { print(i); }
<TYPE, int>, <TYPE_DEC, ::>, <VAR, x_a>, <ASSIGN, =>, <PYTHON_CODE, 10>,
<SEMICOLON, ;>, <TYPE, list>, <TYPE_DEC, ::>, <VAR, y>, <ASSIGN, =>, 
<TYPE, range>, <LPAREN, (>, <PYTHON_CODE, 0>, <PYTHON_CODE, ,>, <VAR, x_a>,
<RPAREN, )>, <SEMICOLON, ;>, <PYTHON_CODE, for>, <PYTHON_CODE, i>, 
<PYTHON_CODE, in>, <PYTHON_CODE, y:>, <LBRACE, {>, <PYTHON_CODE, print>, 
<LPAREN, (>, <PYTHON_CODE, i>, <RPAREN, )>, <SEMICOLON, ;>, <RBRACE, }>
----------------------------------------

Running test case 2:
Code:
 str :: def say_hello_world(){ string :: text = 'hello world'; print(text);}
<TYPE, str>, <TYPE_DEC, ::>, <DEF, def>, <FUNC, say_hello_world>, <LPAREN, (>,
<RPAREN, )>, <LBRACE, {>, <PYTHON_CODE, string>, <TYPE_DEC, ::>, <VAR, text>,
<ASSIGN, =>, <PYTHON_CODE, 'hello>, <PYTHON_CODE, world'>, <SEMICOLON, ;>,
<PYTHON_CODE, print>, <LPAREN, (>, <VAR, text>, <RPAREN, )>, <SEMICOLON, ;>,
<RBRACE, }>
----------------------------------------

Running test case 3:
Code:
 int :: def func(int :: a, int :: b):{ int :: c = a + b; return c;}
<TYPE, int>, <TYPE_DEC, ::>, <DEF, def>, <FUNC, func>, <LPAREN, (>,
<TYPE, int>, <TYPE_DEC, ::>, <VAR, a>, <PYTHON_CODE, ,>, <TYPE, int>,
<TYPE_DEC, ::>, <VAR, b>, <RPAREN, )>, <PYTHON_CODE, :>, <LBRACE, {>,
<TYPE, int>, <TYPE_DEC, ::>, <VAR, c>, <ASSIGN, =>, <VAR, a>,
<OP, +>, <VAR, b>, <SEMICOLON, ;>, <PYTHON_CODE, return>, <VAR, c>,
<SEMICOLON, ;>, <RBRACE, }>
----------------------------------------

Additionally, here are examples of how our program handles errors. Our scanner has two recovery strategies for different errors, plus a default pass-through behavior.

Incorrect type assignment error: We consider the easy mistake of a programmer omitting the space between a type and the declarative ::. If the scanner has consumed a valid type string and the next character is a : rather than a space, it inserts the whitespace and tokenizes correctly.

Code:
 str:: t = 'hello world!';
<TYPE, str>, <TYPE_DEC, ::>, <VAR, t>, <ASSIGN, =>, <PYTHON_CODE, 'hello>, 
<PYTHON_CODE, world!'>, <SEMICOLON, ;>
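The effect of this recovery can be approximated with a regular expression. The sketch below is an illustration only: the actual scanner makes this decision character by character, and the function name insert_missing_space is hypothetical.

```python
import re

TYPES = ["Any", "bool", "Callable", "complex", "dict", "float", "frozenset",
         "int", "list", "Optional", "range", "set", "str", "tuple", "Union",
         "NoneType"]

# A complete type name immediately followed by `::` with no space between.
MISSING_SPACE = re.compile(r"\b(" + "|".join(TYPES) + r")::")

def insert_missing_space(code):
    """Insert the space that should separate a type from `::`."""
    return MISSING_SPACE.sub(r"\1 ::", code)

print(insert_missing_space("str:: t = 'hello world!';"))
# str :: t = 'hello world!';
```

Because of the leading word boundary, an identifier that merely ends in a type name, such as myint::, is left untouched in this sketch.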

Mixing ints and strs: We consider the case where a variable name starts with digits, or conversely an integer includes letters. Our compiler resorts to panic mode, deleting str characters and _ until it reaches a valid token, which may be more digits or whitespace. In the example below, the leading digits 123 are dropped and x_a is kept as the variable name.

Code:
 int :: 123x_a;
<TYPE, int>, <TYPE_DEC, ::>, <VAR, x_a>, <SEMICOLON, ;>
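A minimal sketch of panic-mode recovery for the example above follows. It models only the case shown, where a lexeme in variable-name position starts with digits; the function name recover_identifier is hypothetical and the actual scanner's rules may differ.

```python
def recover_identifier(lexeme):
    """Drop leading characters until a valid identifier can start."""
    i = 0
    while i < len(lexeme) and not (lexeme[i].isalpha() or lexeme[i] == "_"):
        i += 1  # discard the offending character
    return lexeme[i:]

print(recover_identifier("123x_a"))  # x_a
```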

Invalid characters: Our default error handling strategy for characters that are not valid in Python is to pass them through as Python code. Since Viper does not recognize them, the scanner assumes that they are valid Python code and tokenizes them as such. The Python interpreter then notifies the programmer of the invalid input.

Code:
 int :: a = @;
<TYPE, int>, <TYPE_DEC, ::>, <VAR, a>, <ASSIGN, =>, <PYTHON_CODE, @>, <SEMICOLON, ;>
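The following sketch shows how Python itself reports such input once it reaches the interpreter. It assumes, for illustration only, that the declaration above ends up as the Python statement a = @; the helper name python_accepts is hypothetical.

```python
def python_accepts(source):
    """Return True if Python can compile the given source text."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:
        return False
    return True

print(python_accepts("a = 10"))  # True
print(python_accepts("a = @"))   # False
```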

We have more test cases, with expected output based on our language, in scanner/__main__.py. You can see their output by running the shell script as instructed previously.

Development

For developers: make sure you are in the viper directory and run source ./setup.sh. This should activate a virtual environment called viper and set you up with the required dependencies.
