Alga is a C++20 header-only library that transforms text manipulation from imperative string processing into declarative mathematical expressions. Built on rigorous algebraic foundations with monoids, functors, and extended operators, it provides compositional parsing with built-in fuzzy matching.
The Core Idea
Instead of treating strings as mutable buffers, Alga treats text as elements of algebraic structures:
#include "parsers/lc_alpha.hpp"
#include "parsers/porter2stemmer.hpp"
#include "parsers/algebraic_operators.hpp"
using namespace alga;
auto word1 = make_lc_alpha("hello");
auto word2 = make_lc_alpha("world");
if (word1 && word2) {
// Monoid concatenation
auto combined = *word1 * *word2; // "helloworld"
// Repetition
auto emphasis = *word1 ^ 3; // "hellohellohello"
// Sequential composition (produces vector)
auto sequence = *word1 >> *word2; // vector["hello", "world"]
// Porter2 stemming with algebraic composition
auto stem = make_porter2_stem("running");
if (stem) {
auto repeated = *stem ^ 2; // "runrun"
}
}
Algebraic Operators
Alga provides mathematically-grounded operators:
| Operator | Meaning | Example |
|---|---|---|
* | Monoid concatenation | *word1 * *word2 |
| | Choice (first valid) | word1 | word2 |
^ | Repetition (n times) | *word ^ 3 |
>> | Sequential (to vector) | *word1 >> *word2 |
List Combinators
Parse separated lists and sequences:
#include "parsers/list_combinators.hpp"
// CSV parsing
auto csv = sepBy(int_parser(), char_parser(','));
auto [pos, nums] = csv.parse("1,2,3"); // vector<int>{1, 2, 3}
// One or more items (fails on empty)
auto csv1 = sepBy1(word_parser(), char_parser(','));
// Optional trailing separator
auto items = sepEndBy(word_parser(), char_parser(';'));
Fuzzy Matching
Parse noisy, imperfect input with built-in fuzzy matching:
#include "parsers/fuzzy_parsers.hpp"
#include "parsers/similarity.hpp"
using namespace alga::fuzzy;
using namespace alga::similarity;
// Accept "hello" with up to 2 typos
auto greeting = fuzzy_match("hello", 2);
greeting.parse("helo"); // ✓ Matches (1 edit)
greeting.parse("heello"); // ✓ Matches (1 edit)
greeting.parse("world"); // ✗ Fails (too different)
// Sound-alike name matching
auto name_parser = phonetic_match("Smith");
name_parser.parse("Smyth"); // ✓ Matches (same Soundex)
// Combined fuzzy: case + phonetic + edit distance
auto flexible = combined_fuzzy("Python", 2);
flexible.parse("python"); // ✓ Case-insensitive
flexible.parse("Pyton"); // ✓ Fuzzy match (1 typo)
// String similarity metrics
auto dist = levenshtein_distance("kitten", "sitting"); // → 3
auto sim = jaro_winkler_similarity("Martha", "Marhta"); // → 0.96
Phonetic Algorithms
Sound-alike word matching:
#include "parsers/phonetic.hpp"
auto code1 = soundex("Smith"); // → "S530"
auto code2 = soundex("Smyth"); // → "S530" (same!)
bool alike = sounds_like_soundex("Robert", "Rupert"); // true
Unicode Support
Full UTF-8 with multi-script alphabetic parsing:
#include "parsers/utf8_alpha.hpp"
auto french = make_utf8_alpha("Café"); // → "café"
auto greek = make_utf8_alpha("Αλφα"); // Greek
auto japanese = make_utf8_alpha("こんにちは"); // Hiragana
auto combined = *french * *greek; // Concatenation works
size_t chars = french->char_count(); // 4 characters, 5 bytes
Numeric Parsers
Type-safe numeric types with algebraic operations:
#include "parsers/numeric_parsers.hpp"
auto num = make_unsigned_int("42");
auto sci = make_scientific_notation("1.5e10");
auto sum = *num * *make_unsigned_int("10"); // Monoid addition
Error Reporting
Rich error diagnostics with position tracking:
#include "parsers/parse_error.hpp"
auto err = ParseError(position, "unexpected character")
.expect("digit")
.expect("letter")
.but_found("!")
.with_context(tracker.get_context(20, 20));
std::cout << err.format();
// error at line 5, column 12: unexpected character
// expected: digit, letter
// found: !
Streaming Support
Memory-efficient large file processing:
#include "parsers/streaming_parser.hpp"
auto parser = from_file("data.txt", word_parser());
parser.parse_by_line([](size_t line_num, const std::string& line, auto result) {
if (result) {
process(*result);
}
});
Statistical Analysis
Text analysis and metrics:
#include "parsers/statistics.hpp"
FrequencyCounter<std::string> counter;
counter.add_all(words);
double entropy = shannon_entropy(counter);
double diversity = simpson_diversity(counter);
auto top10 = counter.top_n(10);
Text Normalization
Text cleaning and standardization:
#include "parsers/normalization.hpp"
auto clean = normalize_text(" HELLO WORLD "); // → "hello world"
auto slug = to_slug("Hello World!"); // → "hello-world"
auto title = to_title_case("hello world"); // → "Hello World"
Mathematical Foundations
Alga implements verified algebraic laws:
// Monoid laws (automatically tested)
ASSERT_EQ((a * b) * c, a * (b * c)); // Associativity
ASSERT_EQ(empty * a, a); // Left identity
ASSERT_EQ(a * empty, a); // Right identity
// Functor laws (automatically tested)
ASSERT_EQ(fmap(id, parser), parser); // Identity preservation
Installation
Header-only—just include the headers you need:
#include "parsers/lc_alpha.hpp"
#include "parsers/fuzzy_parsers.hpp"
// etc.
Or with CMake:
add_subdirectory(alga)
target_link_libraries(your_target PRIVATE alga::alga)
Requirements: C++20 compatible compiler (GCC 10+, Clang 10+, MSVC 2019+).
Resources
- GitHub: github.com/queelius/alga
- Tests: 425 tests, 100% passing
Alga: When parsing becomes algebra, complexity becomes composition.
Discussion