COMS W4115
Programming Languages and Translators
Lecture 7: Implementing a Lexical Analyzer
February 13, 2008
Lecture Outline
- Review
- Finite automata
- Converting a regular expression to an NFA
- Converting an NFA to a DFA
- Simulating an NFA
1. Review
- Issues for a the lexical analyzer
- Tokens/patterns/lexemes/attributes
- Specifying a lexical analyzer with Lex
- Creating a lexical processor with Lex
2. Finite Automata
- A nondeterministic finite automaton (NFA) consists of
- A finite set of states S.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps S × (&Sigma ∪ {ε})
to subsets of S. This transition function can be represented
by a transition graph in which the nodes are labeled by states
and there is a directed edge labeled a from node w to node v
if &delta(w, a) contains v.
- An initial state s0 in S.
- F, a subset of S, called the final (or accepting) states.
- An NFA accepts an input string x if there is a path in the transition
graph from the initial state to a final state that spells out x.
- The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
- There are no ε moves, and
- For each state s and input symbol a there is exactly one transition
out of s labeled a.
3. Converting a Regular Expression to an NFA
- The McNaughton-Yamada-Thompson algorithm: Algorithm 3.23, pp. 159-163.
4. Converting an NFA to a DFA
- The subset construction: Algorithm 3.20, ALSU, p. 153.
5. Simulating an NFA
- Two-stack simulation of an NFA: Algorithm 3.22, pp. 156-159.
6. Reading Assignment
- Read Chapter 3, all sections except 3.9.
aho@cs.columbia.edu