fuzzy-logic-search: Query Documents with Fuzzy Logic

fuzzy-logic-search (fls) brings fuzzy logic to document querying. Unlike traditional Boolean search, which returns binary relevant/not-relevant results, fls produces a degree-of-membership score in [0, 1] that indicates how well each document matches your query.

Boolean search is rigid: a document either matches or it does not. If you search for “python AND machine-learning,” you get a binary split. A document about Python ML that never uses the exact term “machine-learning” gets zero, same as a document about medieval pottery.

Fuzzy logic captures the gradation that Boolean search throws away.

from fuzzy_logic_search.fuzzy_query import FuzzyQuery
from fuzzy_logic_search.fuzzy_set import FuzzySet

# Construct a query
query = FuzzyQuery("(and python machine-learning)")

# Or use Python operators
q1 = FuzzyQuery("python")
q2 = FuzzyQuery("machine-learning")
query = q1 & q2  # Equivalent to (and python machine-learning)

Query Language

Queries use a Lisp-like syntax that maps to an AST:

; Simple conjunction
(and cat dog)

; With negation
(and cat dog (not fish))

; With fuzzy modifiers
(very (and cat dog))

; Complex nested query
(or (and python ml) (very (not java)))

Or construct directly with Python:

# Using operators
query = FuzzyQuery("cat") & FuzzyQuery("dog") & ~FuzzyQuery("fish")

# Using AST directly
query = FuzzyQuery(['and', 'cat', 'dog', ['not', 'fish']])
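To show how operator overloading can build the same nested-list AST, here is a minimal sketch. The `Query` class is a hypothetical stand-in, not the library's `FuzzyQuery`, and the flattening of repeated `and`/`or` is an assumption made so that the result matches the list form above.

```python
class Query:
    """Hypothetical query wrapper that builds a nested-list AST."""

    def __init__(self, ast):
        self.ast = ast

    def _operands(self, op):
        # Flatten nested uses of the same operator: (a & b) & c -> ['and', a, b, c]
        if isinstance(self.ast, list) and self.ast[0] == op:
            return self.ast[1:]
        return [self.ast]

    def _combine(self, op, other):
        return Query([op, *self._operands(op), *other._operands(op)])

    def __and__(self, other):
        return self._combine("and", other)

    def __or__(self, other):
        return self._combine("or", other)

    def __invert__(self):
        return Query(["not", self.ast])


query = Query("cat") & Query("dog") & ~Query("fish")
print(query.ast)  # ['and', 'cat', 'dog', ['not', 'fish']]
```

Because `~` binds tighter than `&` in Python, the negation is applied to `fish` alone before the conjunction is built.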

I went with S-expressions for the query language because they map directly to the AST. No parsing ambiguity, trivial to serialize, and anyone who has written a Lisp evaluator can understand the implementation in about ten minutes.
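To make the "maps directly to the AST" claim concrete, here is a minimal sketch of a reader for this syntax. It is not the library's parser: the names `tokenize` and `parse` are illustrative, and the sketch does not handle quoted strings that contain spaces.

```python
def tokenize(text):
    """Split an S-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Parse one S-expression into nested Python lists (the AST)."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = []
            while pos < len(tokens) and tokens[pos] != ")":
                node.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing parenthesis
            return node
        if token == ")":
            raise ValueError("unexpected ')'")
        return token  # a bare atom such as 'cat'

    ast = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return ast


print(parse("(and cat dog (not fish))"))  # ['and', 'cat', 'dog', ['not', 'fish']]
```

Each opening parenthesis becomes a list and each atom becomes a string, so the parsed form is exactly the nested-list AST shown earlier.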

Fuzzy Modifiers

Linguistic hedges transform membership values:

# "Very" squares the membership (emphasizes strong matches)
very_query = FuzzyQuery("python").very()
# 0.9 -> 0.81, 0.5 -> 0.25

# "Somewhat" takes square root (broadens tolerance)
somewhat_query = FuzzyQuery("python").somewhat()
# 0.9 -> ~0.949, 0.25 -> 0.5

# "Extremely" cubes the membership
extremely_query = FuzzyQuery("python").extremely()

# "Slightly" takes 10th root
slightly_query = FuzzyQuery("python").slightly()

These come from Zadeh’s original fuzzy logic work. “Very” is concentration (squaring), “somewhat” is dilation (square root). They are mathematically clean and semantically intuitive: “very python” means “only documents that are strongly about Python.”
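The four hedges are plain power functions on [0, 1], which a short standalone sketch makes easy to check. The `HEDGES` table below is illustrative and independent of the library; the exponents are the ones described above.

```python
import math

# Each hedge maps a membership value in [0, 1] to another value in [0, 1].
HEDGES = {
    "very": lambda x: x ** 2,             # concentration
    "extremely": lambda x: x ** 3,        # stronger concentration
    "somewhat": lambda x: math.sqrt(x),   # dilation
    "slightly": lambda x: x ** 0.1,       # stronger dilation (10th root)
}

for name, hedge in HEDGES.items():
    print(name, [round(hedge(x), 3) for x in (0.25, 0.5, 0.9)])
```

All four leave 0 and 1 unchanged, so crisp matches and crisp non-matches are unaffected; only the values in between are pushed down (concentration) or up (dilation).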

Evaluating Queries

Evaluate queries against a document corpus:

# Documents as lists of terms
docs = [
    ["python", "machine-learning", "tensorflow"],
    ["java", "spring", "microservices"],
    ["python", "web", "flask"],
    ["machine-learning", "neural-networks", "pytorch"]
]

# Evaluate query
query = FuzzyQuery("python") & FuzzyQuery("machine-learning")
result = query.evaluate(docs)  # Returns FuzzySet

# result.memberships = [1.0, 0.0, 0.0, 0.0]
# Only first document has both terms
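For readers who want to see the mechanics, the following is a minimal standalone evaluator over the nested-list AST. It is a sketch of the semantics described in this post, not the library's implementation; `crisp_membership`, `score`, and `evaluate` are illustrative names.

```python
def crisp_membership(term, doc):
    """1.0 if the term occurs in the document, else 0.0."""
    return 1.0 if term in doc else 0.0


def score(ast, doc, membership_fn=crisp_membership):
    """Degree to which one document satisfies a query AST."""
    if isinstance(ast, str):
        return membership_fn(ast, doc)
    op, *args = ast
    values = [score(arg, doc, membership_fn) for arg in args]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    if op == "not":
        return 1.0 - values[0]
    if op == "very":
        return values[0] ** 2
    if op == "somewhat":
        return values[0] ** 0.5
    raise ValueError(f"unknown operator: {op}")


def evaluate(ast, docs, membership_fn=crisp_membership):
    """Score every document; the result has one membership per document."""
    return [score(ast, doc, membership_fn) for doc in docs]


docs = [
    ["python", "machine-learning", "tensorflow"],
    ["java", "spring", "microservices"],
    ["python", "web", "flask"],
    ["machine-learning", "neural-networks", "pytorch"],
]

print(evaluate(["and", "python", "machine-learning"], docs))  # [1.0, 0.0, 0.0, 0.0]
print(evaluate(["or", "python", "machine-learning"], docs))   # [1.0, 0.0, 1.0, 1.0]
```

With crisp membership the result is the same as Boolean search; the gradation only appears once the membership function returns values between 0 and 1.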

Custom Membership Functions

The default membership is crisp (term present or not), but you can provide custom functions for more nuanced matching:

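import math

def make_tf_idf_membership(corpus):
    """Build a TF-IDF membership function for a fixed corpus."""
    n_docs = len(corpus)

    def tf_idf_membership(term, doc):
        if term not in doc:
            return 0.0
        tf = doc.count(term) / len(doc)
        # Smoothed IDF (one common variant; pick whichever suits your corpus)
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        return min(tf * idf, 1.0)  # clamp to [0, 1]

    return tf_idf_membership

result = query.evaluate(docs, membership_fn=make_tf_idf_membership(docs))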

This is where it becomes useful for real applications. Swap in TF-IDF, BM25, or embedding cosine similarity as your membership function, and you get fuzzy set operations on top of whatever relevance model you prefer.
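As a small illustration of a graded membership function, the sketch below uses string similarity from the standard library as a stand-in for a real relevance model. It runs on its own and does not call the library; `similarity_membership` is a hypothetical name, and the conjunction is computed by hand with `min`.

```python
from difflib import SequenceMatcher


def similarity_membership(term, doc):
    """Best string similarity between `term` and any token in `doc`, in [0, 1]."""
    return max(
        (SequenceMatcher(None, term, token).ratio() for token in doc),
        default=0.0,
    )


docs = [
    ["python", "machine-learning"],           # exact match on both terms
    ["python", "machine-learned", "models"],  # near miss on the second term
    ["medieval", "pottery"],                  # unrelated
]

terms = ("python", "machine-learning")
# Fuzzy AND: a document is only as good as its weakest term.
scores = [min(similarity_membership(t, doc) for t in terms) for doc in docs]
print([round(s, 2) for s in scores])
```

The second document no longer collapses to zero alongside the pottery document: it scores below the exact match but well above the unrelated one.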

Fuzzy Set Operations

Results are FuzzySet objects with set-theoretic operations:

# Evaluate two queries
result1 = query1.evaluate(docs)  # FuzzySet
result2 = query2.evaluate(docs)  # FuzzySet

# Fuzzy intersection (AND) - element-wise min
combined = result1 & result2

# Fuzzy union (OR) - element-wise max
either = result1 | result2

# Fuzzy complement (NOT)
opposite = ~result1
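The element-wise behavior of these operators can be sketched in a few lines. `SimpleFuzzySet` below is a hypothetical stand-in written for illustration, not the library's `FuzzySet`; it assumes both sets hold one membership per document, in the same document order.

```python
class SimpleFuzzySet:
    """Illustrative fuzzy set: one membership value per document."""

    def __init__(self, memberships):
        self.memberships = [float(m) for m in memberships]

    def _pairwise(self, other, fn):
        if len(self.memberships) != len(other.memberships):
            raise ValueError("fuzzy sets must cover the same documents")
        pairs = zip(self.memberships, other.memberships)
        return SimpleFuzzySet(fn(a, b) for a, b in pairs)

    def __and__(self, other):   # intersection: element-wise min
        return self._pairwise(other, min)

    def __or__(self, other):    # union: element-wise max
        return self._pairwise(other, max)

    def __invert__(self):       # complement: 1 - membership
        return SimpleFuzzySet(1.0 - m for m in self.memberships)

    def __repr__(self):
        return f"SimpleFuzzySet({self.memberships})"


a = SimpleFuzzySet([1.0, 0.0, 0.75, 0.25])
b = SimpleFuzzySet([0.5, 0.0, 1.0, 1.0])

print((a & b).memberships)  # [0.5, 0.0, 0.75, 0.25]
print((a | b).memberships)  # [1.0, 0.0, 1.0, 1.0]
print((~a).memberships)     # [0.0, 1.0, 0.25, 0.75]
```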

Logical Operators

Operator   Fuzzy Operation         Effect
and        min(a, b)               Both conditions must match
or         max(a, b)               Either condition can match
not        1 - a                   Inverts membership
sym-diff   max(a, b) - min(a, b)   Symmetric difference
diff       max(a - b, 0)           Set difference
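The two less familiar rows, `sym-diff` and `diff`, are easiest to trust after checking that they reduce to the ordinary Boolean operations on crisp values. The sketch below does that with standalone functions; the function names are illustrative.

```python
from itertools import product


def fuzzy_sym_diff(a, b):
    """Symmetric difference: how far apart the two memberships are."""
    return max(a, b) - min(a, b)


def fuzzy_diff(a, b):
    """Set difference: how much `a` exceeds `b`, floored at zero."""
    return max(a - b, 0.0)


# On crisp values (0 or 1) both agree with their Boolean counterparts.
for a, b in product((0.0, 1.0), repeat=2):
    assert fuzzy_sym_diff(a, b) == float(bool(a) != bool(b))      # XOR
    assert fuzzy_diff(a, b) == float(bool(a) and not bool(b))     # a AND NOT b

# On graded values they return graded answers.
print(fuzzy_sym_diff(0.75, 0.25))  # 0.5
print(fuzzy_diff(0.75, 0.25))      # 0.5
print(fuzzy_diff(0.25, 0.75))      # 0.0
```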

Homomorphism Property

A key mathematical property: the mapping from queries to results is a homomorphism. Operations on queries translate directly to operations on their result sets:

# These produce identical results:
result1 = (q1 & q2).evaluate(docs)
result2 = q1.evaluate(docs) & q2.evaluate(docs)

# For any operation op:
# (q1 op q2).evaluate(D) = q1.evaluate(D) op q2.evaluate(D)

This is not just a nice-to-have. It means you can optimize query processing by pushing operations down to the result level, or pulling them up to the query level, without changing semantics. The algebra is consistent across the abstraction boundary.
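The property is easy to check numerically. The sketch below uses a small standalone evaluator rather than the library, with a graded term-frequency membership so the values are not all 0 or 1; every name in it is illustrative.

```python
def tf_membership(term, doc):
    """Graded membership: share of the document's tokens equal to `term`."""
    return doc.count(term) / len(doc) if doc else 0.0


def score(ast, doc):
    if isinstance(ast, str):
        return tf_membership(ast, doc)
    op, *args = ast
    values = [score(arg, doc) for arg in args]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    if op == "not":
        return 1.0 - values[0]
    raise ValueError(f"unknown operator: {op}")


def evaluate(ast, docs):
    return [score(ast, doc) for doc in docs]


docs = [
    ["python", "python", "machine-learning", "java"],
    ["java", "java", "spring", "python"],
    ["machine-learning", "pytorch"],
]

q1 = ["or", "python", ["not", "java"]]
q2 = "machine-learning"
r1, r2 = evaluate(q1, docs), evaluate(q2, docs)

# Combining queries first, or combining their results, gives the same answer.
assert evaluate(["and", q1, q2], docs) == [min(a, b) for a, b in zip(r1, r2)]
assert evaluate(["or", q1, q2], docs) == [max(a, b) for a, b in zip(r1, r2)]
assert evaluate(["not", q1], docs) == [1.0 - a for a in r1]
print("query-level and result-level operations agree")
```

The equality holds because every operator is applied document by document, so it does not matter whether the combination happens inside the AST or on the per-document scores afterwards.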

JSON Path Queries

For structured JSON documents, use path expressions:

; Query nested fields
(> :user.age 25)

; String predicates
(starts-with? :name "John")

; Combine with logic
(and
  (== :address.city "New York")
  (not (< :age 25)))
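A rough sketch of how a path atom such as `:address.city` could be resolved against nested dictionaries is shown below. It is an assumption-laden illustration, not the library's code: comparisons here are crisp (the fuzzy tolerance is left out), a missing field scores 0.0, and `resolve`, `score`, and `COMPARATORS` are hypothetical names.

```python
import operator

COMPARATORS = {
    "==": operator.eq, ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "starts-with?": lambda value, prefix: str(value).startswith(prefix),
}

_MISSING = object()


def resolve(value, doc):
    """Resolve ':a.b.c' against nested dicts; return other values unchanged."""
    if not (isinstance(value, str) and value.startswith(":")):
        return value
    node = doc
    for key in value[1:].split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def score(ast, doc):
    op, *args = ast
    if op == "and":
        return min(score(arg, doc) for arg in args)
    if op == "or":
        return max(score(arg, doc) for arg in args)
    if op == "not":
        return 1.0 - score(args[0], doc)
    left, right = (resolve(arg, doc) for arg in args)
    if left is _MISSING or right is _MISSING:
        return 0.0
    try:
        return 1.0 if COMPARATORS[op](left, right) else 0.0
    except TypeError:  # e.g. comparing a string with a number
        return 0.0


people = [
    {"name": "John Smith", "age": 31, "address": {"city": "New York"}},
    {"name": "Ana", "age": 22, "address": {"city": "New York"}},
    {"name": "Johnny", "age": 40, "address": {"city": "Boston"}},
]

query = ["and", ["==", ":address.city", "New York"], ["not", ["<", ":age", 25]]]
print([score(query, p) for p in people])  # [1.0, 0.0, 0.0]
```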

Supported Predicates

  • ==, >, <, >=, <= – Numeric comparisons with fuzzy tolerance
  • starts-with?, ends-with?, contains? – String matching
  • in? – Membership in set/range
  • regex? – Regular expression matching
  • jaccard? – Jaccard similarity
  • tf-idf? – TF-IDF scoring
  • lev? – Levenshtein distance-based matching
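Two of these predicates have textbook definitions that can be sketched without the library. The code below shows Jaccard similarity over token sets and a Levenshtein distance turned into a [0, 1] score. The normalization by the longer string's length is an assumption for illustration; the library may scale distances differently.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Edit distance using the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (cs != ct),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def lev_membership(s, t):
    """Map edit distance to [0, 1]: 1.0 for identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


print(jaccard(["python", "ml", "flask"], ["python", "ml", "pytorch"]))  # 0.5
print(levenshtein("kitten", "sitting"))                                 # 3
print(round(lev_membership("kitten", "sitting"), 3))                    # 0.571
```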

Use Cases

  • Search Engines: Graded results reflecting partial matches
  • Recommendation Systems: Combine multiple preferences fuzzily
  • Data Analysis: Query JSON datasets with flexible, human-like reasoning
  • Information Retrieval: Beyond binary keyword matching

Installation

pip install fuzzy-logic-search
