NFA Tools: Regular Languages and Finite Automata
An elegant, pedagogical implementation of finite automata with NFA to DFA conversion, regex parsing, and visualization capabilities.
Why Regular Languages? A Theoretical Motivation
The Computational Hierarchy
Regular languages represent the simplest class of formal languages in the Chomsky hierarchy. They are precisely the languages that can be recognized by finite automata—machines with a finite amount of memory. This limitation is not a bug, but a feature: it makes regular languages decidable, efficient, and mathematically elegant.
The Power and Limits of Regular Expressions
Regular expressions give us exactly the right amount of expressive power for many practical tasks. Consider what happens if we had more power:
- With arithmetic expressions: We could express { w | w encodes a prime number in binary }, but this language isn't regular; verifying primality requires unbounded computation.
- With set-theoretic operations beyond union: We could express { aⁿbⁿ | n ≥ 0 }, but this requires counting, which needs unbounded memory.
- With richer logics (for example, first-order logic with arbitrary arithmetic predicates): We could express non-regular properties that require remembering arbitrary amounts of information.
Regular expressions strike a perfect balance: they can express patterns like “strings ending with abb” or “valid email addresses” but cannot express “balanced parentheses” or “palindromes of arbitrary length”—these require more computational power.
The Finite State Model
The beauty of regular languages lies in their computational model:
- States: Finite set Q = {q₀, q₁, ..., qₙ}
- Alphabet: Finite set Σ = {a, b, ...}
- Transitions: δ: Q × Σ → Q (or → P(Q) for NFAs)
- Memory: None! Everything is encoded in the current state
Key insight: A finite automaton has no stack, no tape, no counters—just states and transitions. The current state is the entire memory of the machine. This is why:
- We can’t count arbitrarily high (would need infinite states)
- We can’t match nested structures (would need a stack)
- We can’t remember arbitrary strings (would need a tape)
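To make this concrete, here is a minimal, self-contained sketch in plain Python (not this package's API): a DFA for "binary strings with an even number of 1s". The only thing carried from step to step is the current state, a single variable.
# Transition table for a two-state DFA over the alphabet {0, 1}.
# The `state` variable below is the machine's entire memory.
transitions = {
    ('even', '0'): 'even', ('even', '1'): 'odd',
    ('odd',  '0'): 'odd',  ('odd',  '1'): 'even',
}

def accepts(string):
    state = 'even'                            # initial state
    for symbol in string:                     # one table lookup per symbol
        state = transitions[(state, symbol)]
    return state == 'even'                    # accepting states: {'even'}

print(accepts('1101'))  # False (three 1s)
print(accepts('0110'))  # True  (two 1s)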
Regular Languages in Practice
Despite their limitations, regular languages are everywhere:
- Lexical analysis: Tokenizing programming languages
- Pattern matching: Finding patterns in text (grep, sed, awk)
- Protocol specification: Defining valid message sequences
- Input validation: Checking formats (phone numbers, emails, URLs)
The restrictions of regular languages become features in practice:
- Guaranteed linear time: O(n) processing for a string of length n
- Constant space: O(1) memory usage regardless of input size
- Decidable equivalence: We can algorithmically determine if two regular expressions describe the same language
- Closure properties: Closed under union, concatenation, star, complement, and intersection
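As an example of the last point, closure under intersection follows from the product construction: run two DFAs in lockstep, using pairs of states as the new states. A self-contained sketch (again plain Python, not this package's API):
def intersect(delta1, start1, accept1, delta2, start2, accept2, alphabet):
    """Product construction: L(result) = L(machine 1) ∩ L(machine 2)."""
    start = (start1, start2)
    delta, accept = {}, set()
    frontier, seen = [start], {start}
    while frontier:
        s1, s2 = pair = frontier.pop()
        if s1 in accept1 and s2 in accept2:   # accept iff both components accept
            accept.add(pair)
        for a in alphabet:
            nxt = (delta1[(s1, a)], delta2[(s2, a)])
            delta[(pair, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return delta, start, accept
Changing the acceptance test to "or" gives union; complementing one machine's accepting states gives set difference.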
The Pedagogical Value
Understanding regular languages teaches fundamental concepts:
- Abstraction: How simple rules create complex behaviors
- Equivalence: NFAs, DFAs, and regular expressions all have the same expressive power
- Algorithmic thinking: Subset construction, minimization, and pumping lemma
- Theoretical limits: What can and cannot be computed with finite memory
This implementation demonstrates these concepts concretely, allowing you to see how a regex becomes an NFA, how an NFA becomes a DFA, and how a DFA can be minimized—making the abstract tangible.
Features
- NFA/DFA Data Structures: Clean, formal implementations of Non-deterministic and Deterministic Finite Automata
- Regex to NFA: Thompson’s construction for converting regular expressions to NFAs
- NFA to DFA: Subset construction algorithm for converting NFAs to equivalent DFAs
- DFA Minimization: Hopcroft’s algorithm for minimizing DFAs
- Visualization: Graphviz-based visualization with state highlighting during input processing
- Animation: Step-by-step visualization of input string processing
Installation
# Install the package in development mode
pip install -e .
# Or just install dependencies
pip install -r requirements.txt
Command-Line Tool
After installation, use the nfa-tools command:
# Match pattern against string
nfa-tools match '(a|b)*abb' 'aabb'
# Search in files (grep-like)
nfa-tools grep 'a+b+' myfile.txt
# Visualize automaton
nfa-tools visualize '(a|b)*' -o graph.png
# Show statistics
nfa-tools stats 'a*b+c?'
# Test multiple strings
nfa-tools test 'a*b+' ab aab aaab
# Step-by-step subset construction demo
python demo_subset_construction.py '(a|b)*abb'
Quick Start
Basic Usage
from automata import NFA, DFA, State, nfa_to_dfa
from regex_parser import regex_to_nfa
from visualizer import AutomatonVisualizer
# Convert regex to NFA
nfa = regex_to_nfa("(a|b)*abb")
# Convert NFA to DFA
dfa = nfa_to_dfa(nfa)
# Minimize DFA
min_dfa = dfa.minimize()
# Test strings
print(min_dfa.accepts("abb")) # True
print(min_dfa.accepts("aabb")) # True
print(min_dfa.accepts("ab")) # False
# Visualize
viz = AutomatonVisualizer(min_dfa, "DFA for (a|b)*abb")
viz.save("my_dfa", format='png')
Manual NFA Construction
from automata import NFA, State, EPSILON
# Create states
q0 = State('q0')
q1 = State('q1')
q2 = State('q2')
# Create NFA
nfa = NFA(
states={q0, q1, q2},
alphabet={'a', 'b'},
initial=q0,
accepting={q2}
)
# Add transitions
nfa.add_transition(q0, 'a', q1)
nfa.add_transition(q1, 'b', q2)
nfa.add_transition(q0, EPSILON, q2) # Epsilon transition
# Test
print(nfa.accepts("ab")) # True
print(nfa.accepts("")) # True (via epsilon)
Supported Regular Expression Operators
- Concatenation: ab (a followed by b)
- Union: a|b (a or b)
- Kleene Star: a* (zero or more a’s)
- Plus: a+ (one or more a’s)
- Optional: a? (zero or one a)
- Grouping: (ab)* (zero or more repetitions of ab)
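These operators compose as usual. As a quick check using the Quick Start API (the regex below is illustrative):
from automata import nfa_to_dfa
from regex_parser import regex_to_nfa

dfa = nfa_to_dfa(regex_to_nfa("a+b?"))
print(dfa.accepts("a"))    # True  (one or more a's, b is optional)
print(dfa.accepts("aab"))  # True
print(dfa.accepts("b"))    # False (plus requires at least one a)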
Interactive Demo
Run the interactive demo to experiment with your own regular expressions:
python main.py
The demo includes:
- Regex to DFA conversion examples
- Manual NFA construction examples
- Input processing animation
- Interactive mode for custom regular expressions
Testing
Run the test suite:
pytest test_automata.py -v
Architecture
The project follows formal automata theory definitions:
- NFA: (Q, Σ, δ, q₀, F) where δ: Q × (Σ ∪ {ε}) → P(Q)
- DFA: (Q, Σ, δ, q₀, F) where δ: Q × Σ → Q
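These tuples map naturally onto Python dictionaries and sets. A minimal sketch of the two transition-function shapes (illustrative only, not the package's actual class layout):
# NFA: δ maps a (state, symbol) pair to a *set* of successor states.
nfa_delta = {('q0', 'a'): {'q0', 'q1'}, ('q1', 'b'): {'q2'}}

# DFA: δ maps a (state, symbol) pair to exactly *one* successor state.
dfa_delta = {('q0', 'a'): 'q1', ('q1', 'b'): 'q2'}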
Key algorithms implemented:
- Thompson’s Construction (Regex → NFA)
- Subset Construction (NFA → DFA)
- Hopcroft’s Algorithm (DFA Minimization)
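To illustrate the second algorithm, here is a compact, self-contained subset construction over dict-encoded automata like the sketch above. It is a teaching sketch, not this package's implementation, and it assumes an epsilon-free NFA (the real algorithm first takes epsilon-closures):
def subset_construction(nfa_delta, start, accepting, alphabet):
    """NFA → DFA for an epsilon-free NFA; each DFA state is a set of NFA states."""
    start_set = frozenset([start])
    dfa_delta, dfa_accept = {}, set()
    frontier, seen = [start_set], {start_set}
    while frontier:
        current = frontier.pop()
        if current & accepting:          # accept if any member state accepts
            dfa_accept.add(current)
        for a in alphabet:
            # Union of all NFA moves on `a` from every state in the set.
            nxt = frozenset(q for s in current for q in nfa_delta.get((s, a), ()))
            dfa_delta[(current, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return dfa_delta, start_set, dfa_accept
Because each DFA state is a subset of the NFA's states, the result can have up to 2^|Q| states, which is why minimization matters.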
Project Structure
nfa_tools/
├── automata.py # Core NFA/DFA classes
├── regex_parser.py # Thompson's construction
├── visualizer.py # Graphviz visualization
├── subset_construction_demo.py # Step-by-step algorithm demo
└── cli.py # Command-line interface
tests/
├── test_automata.py # Core functionality tests
├── test_regex_comparison.py # Comparison with Python's re
├── test_nested_parentheses.py # Deep nesting tests
└── test_real_world_patterns.py # Practical pattern tests
Examples of Non-Regular Languages
To appreciate what regular languages can do, it’s instructive to see what they cannot:
- Balanced parentheses: { (ⁿ)ⁿ | n ≥ 0 } (requires a stack)
- Palindromes: { w | w = wᴿ } (requires comparing both ends)
- Prime numbers: { 1ⁿ | n is prime } (requires division/multiplication)
- Equal counts: { aⁿbⁿ | n ≥ 0 } (requires counting)
- Nested structures: HTML/XML with proper nesting (requires recursion)
These limitations aren’t flaws—they define the boundary between simple pattern matching (regular) and more complex parsing (context-free and beyond).
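You can probe this boundary with the package itself (assuming the Quick Start API): a regular over-approximation of { aⁿbⁿ } is a*b*, and it necessarily over-accepts, because no finite-state machine can compare the two counts:
from automata import nfa_to_dfa
from regex_parser import regex_to_nfa

dfa = nfa_to_dfa(regex_to_nfa("a*b*"))
print(dfa.accepts("aabb"))  # True  (in { aⁿbⁿ })
print(dfa.accepts("aab"))   # True  (not in { aⁿbⁿ }: the DFA can't count)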
Further Reading
Books:
- Introduction to Automata Theory by Hopcroft, Motwani, and Ullman
- Introduction to the Theory of Computation by Michael Sipser
Concepts to Explore:
- Pumping Lemma for Regular Languages
- Myhill-Nerode Theorem
- Regular Expression Derivatives
- DFA Minimization Algorithms
- Connection to Monoids and Algebra
License
MIT