
fuzzy-logic-search: Query Documents with Fuzzy Logic

fuzzy-logic-search (fls) brings fuzzy logic to document querying. Unlike traditional Boolean search, which returns a binary relevant/not-relevant verdict, fls assigns each document a degree-of-membership score in [0, 1] indicating how well it matches your query.

The Core Insight

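Boolean search is rigid: a document either matches or it doesn’t. Fuzzy logic captures nuance through gradation: each document receives a degree of match rather than a yes/no answer. Queries are built with FuzzyQuery, either from a query string or with Python operators: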

from fuzzy_logic_search.fuzzy_query import FuzzyQuery
from fuzzy_logic_search.fuzzy_set import FuzzySet

# Construct a query
query = FuzzyQuery("(and python machine-learning)")

# Or use Python operators
q1 = FuzzyQuery("python")
q2 = FuzzyQuery("machine-learning")
query = q1 & q2  # Equivalent to (and python machine-learning)
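To make the contrast concrete, here is a self-contained sketch in plain Python (not the fls API) comparing a Boolean AND with the fuzzy AND that fls uses (minimum). The per-document term weights are made-up illustrative values. With crisp 0/1 memberships the two approaches agree; the difference appears only once memberships are graded.

```python
# Toy comparison of Boolean AND versus fuzzy AND (minimum).
# The graded term weights below are hypothetical, illustrative values.
docs = {
    "intro-to-ml.txt": {"python": 0.9, "machine-learning": 0.8},
    "web-dev.txt": {"python": 0.7, "machine-learning": 0.1},
    "java-guide.txt": {"java": 1.0},
}


def boolean_and(doc, terms):
    """Crisp match: every term must be present."""
    return all(term in doc for term in terms)


def fuzzy_and(doc, terms):
    """Graded match: the weakest term caps the score (fuzzy AND = min)."""
    return min(doc.get(term, 0.0) for term in terms)


terms = ["python", "machine-learning"]
for name, doc in docs.items():
    print(f"{name}: boolean={boolean_and(doc, terms)} fuzzy={fuzzy_and(doc, terms):.2f}")
```

Both intro-to-ml.txt and web-dev.txt pass the Boolean test, but the fuzzy score separates a strong match (0.80) from a weak one (0.10).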

Query Language

Queries use a Lisp-like syntax that maps to an AST:

; Simple conjunction
(and cat dog)

; With negation
(and cat dog (not fish))

; With fuzzy modifiers
(very (and cat dog))

; Complex nested query
(or (and python ml) (very (not java)))
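As a rough illustration of how such strings become a nested-list AST like ['and', 'cat', 'dog', ['not', 'fish']], here is a minimal recursive-descent reader. It is a hypothetical sketch, not the fls parser: it handles only parentheses and whitespace-separated atoms, with no comments or quoted strings.

```python
def tokenize(text):
    """Split an S-expression into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Parse a Lisp-like query string into a nested-list AST."""
    tokens = tokenize(text)
    ast, pos = _read(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens: {tokens[pos:]}")
    return ast


def _read(tokens, pos):
    """Read one expression starting at tokens[pos]; return (ast, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of query")
    token = tokens[pos]
    if token == "(":
        node, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            child, pos = _read(tokens, pos)
            node.append(child)
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        return node, pos + 1  # skip the closing ')'
    if token == ")":
        raise ValueError("unexpected ')'")
    return token, pos + 1  # a bare term such as 'cat'


print(parse("(and cat dog (not fish))"))  # ['and', 'cat', 'dog', ['not', 'fish']]
```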

Or construct directly with Python:

# Using operators
query = FuzzyQuery("cat") & FuzzyQuery("dog") & ~FuzzyQuery("fish")

# Using AST directly
query = FuzzyQuery(['and', 'cat', 'dog', ['not', 'fish']])
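Operator overloading is what lets &, | and ~ build queries. The MiniQuery class below is a hypothetical stand-in that only records the AST. Flattening a chain of & into a single 'and' node is an assumption chosen so the result matches the list form above; it is not documented fls behavior.

```python
class MiniQuery:
    """Hypothetical stand-in for FuzzyQuery that only builds a nested-list AST."""

    def __init__(self, ast):
        self.ast = ast

    def _combine(self, op, other):
        # Flatten chains such as a & b & c into one ['and', a, b, c] node.
        left = self.ast
        if isinstance(left, list) and left[0] == op:
            return MiniQuery(left + [other.ast])
        return MiniQuery([op, left, other.ast])

    def __and__(self, other):
        return self._combine("and", other)

    def __or__(self, other):
        return self._combine("or", other)

    def __invert__(self):
        return MiniQuery(["not", self.ast])


# ~ binds tighter than &, and & is left-associative.
query = MiniQuery("cat") & MiniQuery("dog") & ~MiniQuery("fish")
print(query.ast)  # ['and', 'cat', 'dog', ['not', 'fish']]
```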

Fuzzy Modifiers

Linguistic hedges transform membership values:

# "Very" squares the membership (emphasizes strong matches)
very_query = FuzzyQuery("python").very()
# 0.9 → 0.81, 0.5 → 0.25

# "Somewhat" takes the square root (broadens tolerance)
somewhat_query = FuzzyQuery("python").somewhat()
# 0.9 → 0.95, 0.25 → 0.5

# "Extremely" cubes the membership (stricter than "very")
extremely_query = FuzzyQuery("python").extremely()
# 0.9 → 0.729, 0.5 → 0.125

# "Slightly" takes the 10th root (strongly lifts weak matches)
slightly_query = FuzzyQuery("python").slightly()
# 0.9 → 0.99, 0.5 → 0.93
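The hedges are simple power functions of a membership value x. This plain-Python sketch uses the exponents stated above (2, 1/2, 3, 1/10); the function names are illustrative, not fls internals. Every hedge leaves 0 and 1 unchanged and is monotonically increasing, so hedging a single query reshapes its scores without reordering documents.

```python
import math


# Linguistic hedges as power functions on a membership value x in [0, 1].
def very(x):
    """Concentration: x squared."""
    return x ** 2


def somewhat(x):
    """Dilation: square root of x."""
    return math.sqrt(x)


def extremely(x):
    """Stronger concentration: x cubed."""
    return x ** 3


def slightly(x):
    """Strong dilation: 10th root of x."""
    return x ** (1 / 10)


for x in (0.25, 0.5, 0.9):
    print(f"x={x}: very={very(x):.3f} somewhat={somewhat(x):.3f} "
          f"extremely={extremely(x):.3f} slightly={slightly(x):.3f}")
```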

Evaluating Queries

Evaluate queries against a document corpus:

# Documents as lists of terms
docs = [
    ["python", "machine-learning", "tensorflow"],
    ["java", "spring", "microservices"],
    ["python", "web", "flask"],
    ["machine-learning", "neural-networks", "pytorch"]
]

# Evaluate query
query = FuzzyQuery("python") & FuzzyQuery("machine-learning")
result = query.eval(docs)  # Returns FuzzySet

# result.memberships = [1.0, 0.0, 0.0, 0.0]
# Only first document has both terms
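The crisp result above can be reproduced with a small recursive evaluator over the nested-list AST. This is a self-contained sketch, not the fls implementation: each term scores 1.0 or 0.0, and the operators follow the min, max and 1 - x rules listed in the operator table. The membership function is a parameter so a graded scorer can be swapped in.

```python
def crisp_membership(term, doc):
    """1.0 if the term occurs in the document, else 0.0."""
    return 1.0 if term in doc else 0.0


def evaluate(ast, doc, membership_fn=crisp_membership):
    """Score one document against a nested-list AST."""
    if isinstance(ast, str):
        return membership_fn(ast, doc)
    op, *args = ast
    scores = [evaluate(arg, doc, membership_fn) for arg in args]
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    if op == "not":
        return 1.0 - scores[0]
    if op == "very":
        return scores[0] ** 2
    raise ValueError(f"unknown operator: {op}")


docs = [
    ["python", "machine-learning", "tensorflow"],
    ["java", "spring", "microservices"],
    ["python", "web", "flask"],
    ["machine-learning", "neural-networks", "pytorch"],
]

memberships = [evaluate(["and", "python", "machine-learning"], d) for d in docs]
print(memberships)  # [1.0, 0.0, 0.0, 0.0]
```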

Custom Membership Functions

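Supply a custom membership function to replace crisp 0/1 term matching with graded scores:

import math

def tf_idf_membership(term, doc):
    """Use TF-IDF instead of crisp membership."""
    if term not in doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    # IDF over the `docs` corpus above (max() guards against df == 0)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / max(df, 1))
    return min(tf * idf, 1.0)

result = query.eval(docs, membership_fn=tf_idf_membership)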

Fuzzy Set Operations

Results are FuzzySet objects with set-theoretic operations:

# Evaluate two queries
query1 = FuzzyQuery("python")
query2 = FuzzyQuery("machine-learning")
result1 = query1.eval(docs)  # FuzzySet
result2 = query2.eval(docs)  # FuzzySet

# Fuzzy intersection (AND) - element-wise min
combined = result1 & result2

# Fuzzy union (OR) - element-wise max
either = result1 | result2

# Fuzzy complement (NOT)
opposite = ~result1
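Element-wise set operations map naturally onto operator overloading. MiniFuzzySet below is a hypothetical stand-in for FuzzySet (only the memberships attribute mirrors the name used earlier). The sample memberships are what crisp evaluation of "python" and "machine-learning" gives on the four-document corpus. Note that zip silently truncates if the two sets have different lengths; a real implementation would likely check this.

```python
class MiniFuzzySet:
    """Hypothetical stand-in for FuzzySet: one membership degree per document."""

    def __init__(self, memberships):
        self.memberships = list(memberships)

    def __and__(self, other):
        # Fuzzy intersection: element-wise min
        return MiniFuzzySet(min(a, b) for a, b in zip(self.memberships, other.memberships))

    def __or__(self, other):
        # Fuzzy union: element-wise max
        return MiniFuzzySet(max(a, b) for a, b in zip(self.memberships, other.memberships))

    def __invert__(self):
        # Fuzzy complement: 1 - x
        return MiniFuzzySet(1.0 - a for a in self.memberships)


result1 = MiniFuzzySet([1.0, 0.0, 1.0, 0.0])  # crisp "python"
result2 = MiniFuzzySet([1.0, 0.0, 0.0, 1.0])  # crisp "machine-learning"
print((result1 & result2).memberships)  # [1.0, 0.0, 0.0, 0.0]
print((result1 | result2).memberships)  # [1.0, 0.0, 1.0, 1.0]
print((~result1).memberships)           # [0.0, 1.0, 0.0, 1.0]
```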

Logical Operators

| Operator | Fuzzy operation | Effect |
|---|---|---|
| and | min(a, b) | Both conditions must match |
| or | max(a, b) | Either condition can match |
| not | 1 - a | Inverts membership |
| sym-diff | max(a, b) - min(a, b) | Symmetric difference |
| diff | max(a - b, 0) | Set difference |
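A worked example of the table for a = 0.7 and b = 0.4, written as plain functions (FUZZY_OPS and fuzzy_not are illustrative names). Note that sym-diff reduces to the absolute difference of a and b, while diff clips at zero, so diff(0.4, 0.7) is 0.

```python
# The table's operators as functions on membership degrees a, b in [0, 1].
FUZZY_OPS = {
    "and": lambda a, b: min(a, b),
    "or": lambda a, b: max(a, b),
    "sym-diff": lambda a, b: max(a, b) - min(a, b),  # same as abs(a - b)
    "diff": lambda a, b: max(a - b, 0.0),           # clipped at zero
}


def fuzzy_not(a):
    """Complement: 1 - a."""
    return 1.0 - a


a, b = 0.7, 0.4
for name, op in FUZZY_OPS.items():
    print(f"{name}({a}, {b}) = {op(a, b):.2f}")
print(f"not({a}) = {fuzzy_not(a):.2f}")
```

This prints 0.40 for and, 0.70 for or, and 0.30 for sym-diff, diff and not.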

Homomorphism Property

A key mathematical property: the mapping from queries to results is a homomorphism. Operations on queries translate directly to operations on their result sets:

# These produce identical results:
result1 = (q1 & q2).eval(docs)
result2 = q1.eval(docs) & q2.eval(docs)

# For any operation op:
# (q1 op q2).eval(D) = q1.eval(D) op q2.eval(D)

This ensures consistency: whether you combine queries before evaluation or combine their result sets afterward, you arrive at the same degrees of membership.
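Under the operator definitions in the table, the homomorphism holds because evaluation is compositional: a compound query's score for each document depends only on its subqueries' scores for that same document. The sketch below checks this on hypothetical graded memberships; eval_query and TERM_SCORES are illustrative names, not fls API.

```python
# Hypothetical graded memberships of two terms across a 4-document corpus.
TERM_SCORES = {
    "python": [0.9, 0.0, 0.6, 0.2],
    "machine-learning": [0.8, 0.1, 0.0, 0.7],
}


def eval_query(ast):
    """Evaluate a nested-list AST over the whole corpus, one score per document."""
    if isinstance(ast, str):
        return list(TERM_SCORES[ast])
    op, *args = ast
    results = [eval_query(arg) for arg in args]
    if op == "and":
        return [min(col) for col in zip(*results)]
    if op == "or":
        return [max(col) for col in zip(*results)]
    if op == "not":
        return [1.0 - x for x in results[0]]
    raise ValueError(f"unknown operator: {op}")


q1, q2 = "python", "machine-learning"
r1, r2 = eval_query(q1), eval_query(q2)

# Combining at the query level must match combining the result sets.
assert eval_query(["and", q1, q2]) == [min(x, y) for x, y in zip(r1, r2)]
assert eval_query(["or", q1, q2]) == [max(x, y) for x, y in zip(r1, r2)]
assert eval_query(["not", q1]) == [1.0 - x for x in r1]
print(eval_query(["and", q1, q2]))  # [0.8, 0.0, 0.0, 0.2]
```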

JSON Path Queries

For structured JSON documents, use path expressions:

; Query nested fields
(> :user.age 25)

; String predicates
(starts-with? :name "John")

; Combine with logic
(and
  (== :address.city "New York")
  (not (< :age 25)))
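Resolving a path such as :address.city amounts to stripping the leading colon, splitting on dots, and descending through nested dictionaries. resolve_path is a hypothetical helper, not an fls function, and returning None for a missing key is an assumption. The last lines score the combined query above using crisp comparisons instead of fls's fuzzy tolerance, purely to show how the pieces compose.

```python
def resolve_path(doc, path):
    """Look up a ':a.b.c' path in nested dicts; return None if any key is missing."""
    value = doc
    for key in path.lstrip(":").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


doc = {
    "name": "John Smith",
    "age": 31,
    "user": {"age": 31},
    "address": {"city": "New York"},
}
print(resolve_path(doc, ":address.city"))  # New York
print(resolve_path(doc, ":user.age"))      # 31
print(resolve_path(doc, ":address.zip"))   # None

# (and (== :address.city "New York") (not (< :age 25))) with crisp 0/1 scores
city_ok = 1.0 if resolve_path(doc, ":address.city") == "New York" else 0.0
young = 1.0 if resolve_path(doc, ":age") < 25 else 0.0
print(min(city_ok, 1.0 - young))  # 1.0
```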

Supported Predicates

  • ==, >, <, >=, <= - Numeric comparisons with fuzzy tolerance
  • starts-with?, ends-with?, contains? - String matching
  • in? - Membership in set/range
  • regex? - Regular expression matching
  • jaccard? - Jaccard similarity
  • tf-idf? - TF-IDF scoring
  • lev? - Levenshtein distance-based matching
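The page does not specify how fuzzy tolerance or the similarity predicates are normalized, so the following sketch rests on assumptions: a linear ramp for > (tolerance is a hypothetical width parameter), standard Jaccard similarity on sets, and Levenshtein distance mapped to [0, 1] by dividing by the longer string's length. None of these function names come from fls.

```python
def fuzzy_greater(value, threshold, tolerance=5.0):
    """Degree to which value > threshold (hypothetical linear ramp).

    0 at or below threshold - tolerance, 1 at or above threshold + tolerance,
    linear in between, so exactly 0.5 at the threshold.
    """
    if tolerance <= 0:
        return 1.0 if value > threshold else 0.0
    t = (value - (threshold - tolerance)) / (2 * tolerance)
    return min(max(t, 0.0), 1.0)


def jaccard(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lev_similarity(s, t):
    """Map edit distance to [0, 1]; identical strings score 1."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


print(fuzzy_greater(27, 25))                         # 0.7
print(jaccard(["python", "ml"], ["python", "web"]))  # about 0.333
print(lev_similarity("Jon", "John"))                 # 0.75
```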

Use Cases

  • Search Engines: Graded results reflecting partial matches
  • Recommendation Systems: Combine multiple preferences fuzzily
  • Data Analysis: Query JSON datasets with flexible, human-like reasoning
  • Information Retrieval: Beyond binary keyword matching

Installation

pip install fuzzy-logic-search


fuzzy-logic-search: Because real-world queries aren’t black and white.
