NFA Tools: Regular Languages and Finite Automata
An elegant, pedagogical implementation of finite automata with NFA to DFA conversion, regex parsing, and visualization capabilities.
Why Regular Languages? A Theoretical Motivation
The Computational Hierarchy
Regular languages represent the simplest class of formal languages in the Chomsky hierarchy. They are precisely the languages that can be recognized by finite automata—machines with a finite amount of memory. This limitation is not a bug, but a feature: it makes regular languages decidable, efficient, and mathematically elegant.
The Power and Limits of Regular Expressions
Regular expressions give us exactly the right amount of expressive power for many practical tasks. Consider what happens if we had more power:
- With arithmetic expressions: We could express { w | w encodes a prime number in binary }, but this language isn't regular; verifying primality requires unbounded computation.
- With set-theoretic operations beyond union: We could express { aⁿbⁿ | n ≥ 0 }, but this requires counting, which needs unbounded memory.
- With richer logics (for example, first-order logic with arbitrary arithmetic predicates): We could express non-regular properties that require remembering arbitrary amounts of information.
Regular expressions strike a perfect balance: they can express patterns like “strings ending with abb” or “valid email addresses” but cannot express “balanced parentheses” or “palindromes of arbitrary length”—these require more computational power.
The Finite State Model
The beauty of regular languages lies in their computational model:
- States: Finite set Q = {q₀, q₁, ..., qₙ}
- Alphabet: Finite set Σ = {a, b, ...}
- Transitions: δ: Q × Σ → Q (or → P(Q) for NFAs)
- Memory: None! Everything is encoded in the current state
Key insight: A finite automaton has no stack, no tape, no counters—just states and transitions. The current state is the entire memory of the machine. This is why:
- We can’t count arbitrarily high (would need infinite states)
- We can’t match nested structures (would need a stack)
- We can’t remember arbitrary strings (would need a tape)
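To make this concrete, here is a minimal, self-contained sketch in plain Python (not this package's API): a DFA for "binary strings with an even number of 1s". The only thing carried from step to step is the current state, a single variable.
# Transition table for a two-state DFA over the alphabet {0, 1}.
# The `state` variable below is the machine's entire memory.
transitions = {
    ('even', '0'): 'even', ('even', '1'): 'odd',
    ('odd',  '0'): 'odd',  ('odd',  '1'): 'even',
}

def accepts(string):
    state = 'even'                            # initial state
    for symbol in string:                     # one table lookup per symbol
        state = transitions[(state, symbol)]
    return state == 'even'                    # accepting states: {'even'}

print(accepts('1101'))  # False (three 1s)
print(accepts('0110'))  # True  (two 1s)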
Regular Languages in Practice
Despite their limitations, regular languages are everywhere:
- Lexical analysis: Tokenizing programming languages
- Pattern matching: Finding patterns in text (grep, sed, awk)
- Protocol specification: Defining valid message sequences
- Input validation: Checking formats (phone numbers, emails, URLs)
The restrictions of regular languages become features in practice:
- Guaranteed linear time: O(n) processing for a string of length n
- Constant space: O(1) memory usage regardless of input size
- Decidable equivalence: We can algorithmically determine if two regular expressions describe the same language
- Closure properties: Closed under union, concatenation, star, complement, and intersection
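As an example of the last point, closure under intersection follows from the product construction: run two DFAs in lockstep, using pairs of states as the new states. A self-contained sketch (again plain Python, not this package's API):
def intersect(delta1, start1, accept1, delta2, start2, accept2, alphabet):
    """Product construction: L(result) = L(machine 1) ∩ L(machine 2)."""
    start = (start1, start2)
    delta, accept = {}, set()
    frontier, seen = [start], {start}
    while frontier:
        s1, s2 = pair = frontier.pop()
        if s1 in accept1 and s2 in accept2:   # accept iff both components accept
            accept.add(pair)
        for a in alphabet:
            nxt = (delta1[(s1, a)], delta2[(s2, a)])
            delta[(pair, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return delta, start, accept
Changing the acceptance test to "or" gives union; complementing one machine's accepting states gives set difference.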
The Pedagogical Value
Understanding regular languages teaches fundamental concepts:
- Abstraction: How simple rules create complex behaviors
- Equivalence: NFAs, DFAs, and regular expressions all have the same expressive power
- Algorithmic thinking: Subset construction, minimization, and pumping lemma
- Theoretical limits: What can and cannot be computed with finite memory
This implementation demonstrates these concepts concretely, allowing you to see how a regex becomes an NFA, how an NFA becomes a DFA, and how a DFA can be minimized—making the abstract tangible.
Features
- NFA/DFA Data Structures: Clean, formal implementations of Non-deterministic and Deterministic Finite Automata
- Regex to NFA: Thompson’s construction for converting regular expressions to NFAs
- NFA to DFA: Subset construction algorithm for converting NFAs to equivalent DFAs
- DFA Minimization: Hopcroft’s algorithm for minimizing DFAs
- Visualization: Graphviz-based visualization with state highlighting during input processing
- Animation: Step-by-step visualization of input string processing
Installation
# Install the package in development mode
pip install -e .
# Or just install dependencies
pip install -r requirements.txt
Command-Line Tool
After installation, use the nfa-tools command:
# Match pattern against string
nfa-tools match '(a|b)*abb' 'aabb'
# Search in files (grep-like)
nfa-tools grep 'a+b+' myfile.txt
# Visualize automaton
nfa-tools visualize '(a|b)*' -o graph.png
# Show statistics
nfa-tools stats 'a*b+c?'
# Test multiple strings
nfa-tools test 'a*b+' ab aab aaab
# Step-by-step subset construction demo
python demo_subset_construction.py '(a|b)*abb'
Quick Start
Basic Usage
from automata import NFA, DFA, State, nfa_to_dfa
from regex_parser import regex_to_nfa
from visualizer import AutomatonVisualizer
# Convert regex to NFA
nfa = regex_to_nfa("(a|b)*abb")
# Convert NFA to DFA
dfa = nfa_to_dfa(nfa)
# Minimize DFA
min_dfa = dfa.minimize()
# Test strings
print(min_dfa.accepts("abb")) # True
print(min_dfa.accepts("aabb")) # True
print(min_dfa.accepts("ab")) # False
# Visualize
viz = AutomatonVisualizer(min_dfa, "DFA for (a|b)*abb")
viz.save("my_dfa", format='png')
Manual NFA Construction
from automata import NFA, State, EPSILON
# Create states
q0 = State('q0')
q1 = State('q1')
q2 = State('q2')
# Create NFA
nfa = NFA(
states={q0, q1, q2},
alphabet={'a', 'b'},
initial=q0,
accepting={q2}
)
# Add transitions
nfa.add_transition(q0, 'a', q1)
nfa.add_transition(q1, 'b', q2)
nfa.add_transition(q0, EPSILON, q2) # Epsilon transition
# Test
print(nfa.accepts("ab")) # True
print(nfa.accepts("")) # True (via epsilon)
Supported Regular Expression Operators
- Concatenation: ab (a followed by b)
- Union: a|b (a or b)
- Kleene Star: a* (zero or more a’s)
- Plus: a+ (one or more a’s)
- Optional: a? (zero or one a)
- Grouping: (ab)* (zero or more repetitions of ab)
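These operators compose as usual. As a quick check using the Quick Start API (the regex below is illustrative):
from automata import nfa_to_dfa
from regex_parser import regex_to_nfa

dfa = nfa_to_dfa(regex_to_nfa("a+b?"))
print(dfa.accepts("a"))    # True  (one or more a's, b is optional)
print(dfa.accepts("aab"))  # True
print(dfa.accepts("b"))    # False (plus requires at least one a)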
Interactive Demo
Run the interactive demo to experiment with your own regular expressions:
python main.py
The demo includes:
- Regex to DFA conversion examples
- Manual NFA construction examples
- Input processing animation
- Interactive mode for custom regular expressions
Testing
Run the test suite:
pytest test_automata.py -v
Architecture
The project follows formal automata theory definitions:
- NFA: (Q, Σ, δ, q₀, F) where δ: Q × (Σ ∪ {ε}) → P(Q)
- DFA: (Q, Σ, δ, q₀, F) where δ: Q × Σ → Q
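These tuples map naturally onto Python dictionaries and sets. A minimal sketch of the two transition-function shapes (illustrative only, not the package's actual class layout):
# NFA: δ maps a (state, symbol) pair to a *set* of successor states.
nfa_delta = {('q0', 'a'): {'q0', 'q1'}, ('q1', 'b'): {'q2'}}

# DFA: δ maps a (state, symbol) pair to exactly *one* successor state.
dfa_delta = {('q0', 'a'): 'q1', ('q1', 'b'): 'q2'}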
Key algorithms implemented:
- Thompson’s Construction (Regex → NFA)
- Subset Construction (NFA → DFA)
- Hopcroft’s Algorithm (DFA Minimization)
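To illustrate the second algorithm, here is a compact, self-contained subset construction over dict-encoded automata like the sketch above. It is a teaching sketch, not this package's implementation, and it assumes an epsilon-free NFA (the real algorithm first takes epsilon-closures):
def subset_construction(nfa_delta, start, accepting, alphabet):
    """NFA → DFA for an epsilon-free NFA; each DFA state is a set of NFA states."""
    start_set = frozenset([start])
    dfa_delta, dfa_accept = {}, set()
    frontier, seen = [start_set], {start_set}
    while frontier:
        current = frontier.pop()
        if current & accepting:          # accept if any member state accepts
            dfa_accept.add(current)
        for a in alphabet:
            # Union of all NFA moves on `a` from every state in the set.
            nxt = frozenset(q for s in current for q in nfa_delta.get((s, a), ()))
            dfa_delta[(current, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return dfa_delta, start_set, dfa_accept
Because each DFA state is a subset of the NFA's states, the result can have up to 2^|Q| states, which is why minimization matters.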
Project Structure
nfa_tools/
├── automata.py # Core NFA/DFA classes
├── regex_parser.py # Thompson's construction
├── visualizer.py # Graphviz visualization
├── subset_construction_demo.py # Step-by-step algorithm demo
└── cli.py # Command-line interface
tests/
├── test_automata.py # Core functionality tests
├── test_regex_comparison.py # Comparison with Python's re
├── test_nested_parentheses.py # Deep nesting tests
└── test_real_world_patterns.py # Practical pattern tests
Examples of Non-Regular Languages
To appreciate what regular languages can do, it’s instructive to see what they cannot:
- Balanced parentheses: { (ⁿ)ⁿ | n ≥ 0 } (requires a stack)
- Palindromes: { w | w = wᴿ } (requires comparing both ends)
- Prime numbers: { 1ⁿ | n is prime } (requires division/multiplication)
- Equal counts: { aⁿbⁿ | n ≥ 0 } (requires counting)
- Nested structures: HTML/XML with proper nesting (requires recursion)
These limitations aren’t flaws—they define the boundary between simple pattern matching (regular) and more complex parsing (context-free and beyond).
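You can probe this boundary with the package itself (assuming the Quick Start API): a regular over-approximation of { aⁿbⁿ } is a*b*, and it necessarily over-accepts, because no finite-state machine can compare the two counts:
from automata import nfa_to_dfa
from regex_parser import regex_to_nfa

dfa = nfa_to_dfa(regex_to_nfa("a*b*"))
print(dfa.accepts("aabb"))  # True  (in { aⁿbⁿ })
print(dfa.accepts("aab"))   # True  (not in { aⁿbⁿ }: the DFA can't count)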
Further Reading
Books:
- Introduction to Automata Theory by Hopcroft, Motwani, and Ullman
- Introduction to the Theory of Computation by Michael Sipser
Concepts to Explore:
- Pumping Lemma for Regular Languages
- Myhill-Nerode Theorem
- Regular Expression Derivatives
- DFA Minimization Algorithms
- Connection to Monoids and Algebra
License
MIT