lexers / lexing

Lexers: Generates program units.

How to recover trace during errors?

Source-code position of the character can be stored in lexer for error reflection.

Backtracking occurs due to choice regex:

R1 | R2

If we write bad regexes:

('a'+) 'b' | ('a'+) 'c' | ('a'+) d

Instead of:

('a'+) ('b'|'c'|'d')

We would backtrack a lot more.

We can reduce SAT to: Recognizing a string by regex

Hence regex is in same class. We can do it the other way round as well.

Hence regex is npcomplete.

Regular expressions are ambiguous: Given the string “ifx = 0”

We could parse it has:

| if | x | = | 0 |

Or:

| ifx | = | 0 |

Give some matches higher priority.

state is regex we are matching transition is successfully matching the regex.

transition forks for choice, because there could be multiple ways to match.

nfa

ab* | ba*(ab)?

Run all executions of NFA in parallel. Keep track of possible states: “finite fingers”

Needed for describing nested chain structure / syntactic structure. e.g. if else, balanced paranthesis etc…

Check a phrase is part of our language. See if language rules can produce the phrase.

parse tree is concrete syntax structure ast abstracts unneeded info, preserving structure of syntax.

2021-09-15