algebraic_search package

Submodules

algebraic_search.boolean_query module

class algebraic_search.boolean_query.BooleanQuery(query: str | List = None)

Bases: object

A Boolean algebra for constructing and evaluating queries. We define a Boolean Query language for parsing, generating, and evaluating Boolean queries. The query language is based on the following theoretical framework:

## Formal Theory

Q = (P(T*), and, or, not, {}, T*)

where: - T is the set of all characters, - T* is the set of all strings of characters, - {} is the empty set, - P(T*) is the power set of T*.

This framework allows constructing queries such as:

“(or (and cat dog) (not (or fish bird)))”

which is internally represented as:

[‘or’, [‘and’, ‘cat’, ‘dog’], [‘not’, [‘or’, ‘fish’, ‘bird’]]]

Queries can also be combined using Python operators:

Q1 & Q2 # Represents logical AND Q1 | Q2 # Represents logical OR ~Q1 # Represents logical NOT

## Evaluation

The evaluation of queries is performed by the eval method, which takes a list of documents and returns a ResultQuery instance indicating which documents match the query. The ResultQuery instance is itself a Boolean algebra, where the query results are elements of the algebra. Thus, the evaluation function eval serves as a homomorphism eval: Q -> R that preserves the algebraic structure. See ResultQuery for more details.

eval(docs: List) ResultQuery

Evaluate the query against a list of documents.

Parameters:
  • docs – A list where each document has method for determining if it

  • term (contains a)

  • __contains__.

Returns:

A list indicating whether each document matches the query, 1 for a match and 0 for a non-match.

Return type:

List[float]

tokenize(query: str) List

Tokenize the input query string into a nested list structure.

Parameters:

query (str) – The query string.

Returns:

A nested list representing the parsed query.

Return type:

list

Raises:

ValueError – If there are mismatched parentheses or unexpected tokens.

algebraic_search.fuzzy_boolean_query module

class algebraic_search.fuzzy_boolean_query.FuzzyBooleanQuery(query: str | List = None)

Bases: object

A fuzzy Boolean algebra for constructing and evaluating queries with degrees of membership.

## Formal Theory

Fuzzy Boolean Algebra for Queries

We define two algebras in this context:

  1. Fuzzy Query Algebra (Q_f):
    • Elements: Q_f = P(T*) where T* is the set of all finite strings composed of ASCII characters. P(T*) represents the power set of T*, i.e., all possible subsets of T*.

    • Operations:
      • AND (`&`): Fuzzy intersection using minimum.

      • OR (`|`): Fuzzy union using maximum.

      • NOT (`~`): Fuzzy complement (1.0 - x).

      • Modifiers: Such as very, somewhat, etc., to adjust degrees of membership.

eval(docs: ~typing.List, ranker: ~typing.Callable = <function FuzzyBooleanQuery.<lambda>>) ResultQuery

Evaluate the fuzzy query against one or more documents.

Parameters:
  • docs – A list of documents where each document is a set of terms, or a single document represented as a set of terms.

  • ranker (Callable, optional) – A function that computes the degree of membership of a term in a document. Defaults to a function that returns 1.0 if the term is in the document, and 0.0 otherwise. This is also know as the set-membership function, i.e., it generates a Boolean query (0 or 1). All the modifiers like very will thus have no effect.

Returns:

An instance containing a list of float scores

indicating the degree of membership for each document.

Return type:

ResultQuery

extremely() FuzzyBooleanQuery

Apply the extremely modifier to the fuzzy query.

Returns:

A new FuzzyBooleanQuery with the extremely modifier applied.

Return type:

FuzzyBooleanQuery

nary_ops = {'and': <built-in function min>, 'or': <built-in function max>}
ops = ['very', 'somewhat', 'slightly', 'extremely', 'not', 'and', 'or']
slightly() FuzzyBooleanQuery

Apply the slightly modifier to the fuzzy query.

Returns:

A new FuzzyBooleanQuery with the slightly modifier applied.

Return type:

FuzzyBooleanQuery

somewhat() FuzzyBooleanQuery

Apply the somewhat modifier to the fuzzy query.

Returns:

A new FuzzyBooleanQuery with the somewhat modifier applied.

Return type:

FuzzyBooleanQuery

tokenize(query: str) List

Tokenize the input query string into a nested list structure, recognizing fuzzy modifiers.

Parameters:

query (str) – The query string.

Returns:

A nested list representing the parsed query.

Return type:

list

Raises:

ValueError – If there are mismatched parentheses or unexpected tokens.

unary_ops = {'extremely': <function FuzzyBooleanQuery.<lambda>>, 'not': <function FuzzyBooleanQuery.<lambda>>, 'slightly': <function FuzzyBooleanQuery.<lambda>>, 'somewhat': <function FuzzyBooleanQuery.<lambda>>, 'very': <function FuzzyBooleanQuery.<lambda>>}
very() FuzzyBooleanQuery

Apply the very modifier to the fuzzy query.

Returns:

A new FuzzyBooleanQuery with the very modifier applied.

Return type:

FuzzyBooleanQuery

algebraic_search.result_query module

class algebraic_search.result_query.ResultQuery(scores: List[float])

Bases: object

A class representing the evaluation results of a query.

## Formal Theory: Boolean Algebra Over Query Results

The evaluation results of a query can be represented as a Boolean algebra, where the query results are elements of the algebra. Using conventional notation, the algebra is defined as follows:

(R=P([0, 1]^n), and=&, or=|, not=~, bottom=0, top=1)

  • R = [r_1, r_2, …, r_n] where each r_i ∈ [0, 1] represents the degree-of-membership of the i-th document in the query result.

  • & is the element-wise minimum.

  • | is the element-wise maximum.

  • ~ is the element-wise complement (1 - r_i).

These operations form a Boolean algebra when results are binary (0 or 1) and a fuzzy Boolean algebra when results are in the continuous range [0, 1].

## Homomorphism between queries (like BooleanQuery) and ResultQuery

The evaluation function eval in, for instance, the BooleanQuery class serves as a homomorphism eval: Q -> R that preserves the algebraic structure:

  • eval(Q1 & Q2) = eval(Q1) & eval(Q2)

  • eval(Q1 | Q2) = eval(Q1) | eval(Q2)

  • eval(~Q1) = ~eval(Q1)

This ensures that the logical composition of queries translates appropriately to the combination of their evaluation results.

## Fuzzy Operations

We also provide a range of built-in fuzzy operations that can be applied to the evaluation results of a query:

  • very: Squares the degree-of-membership of each document.

  • somewhat: Takes the square root of the degree-of-membership of each document.

  • slightly: Takes the 10th root of the degree-of-membership of each document.

  • extremely: Cubes the degree-of-membership of each document.

  • binary: Maps the degree-of-membership of each document to 0 if it is less

    than 0.5, and 1 otherwise.

  • true: Maps the degree-of-membership of each document to 1.

  • false: Maps the degree-of-membership of each document to 0.

  • map: Maps the degree-of-membership of each document to 0 if it is less than a specified threshold, and 1 otherwise.

Of course, you are free to define your own fuzzy operations as needed.

binary() ResultQuery
extremely() ResultQuery
false() ResultQuery
map(threshold: float) ResultQuery
slightly() ResultQuery
somewhat() ResultQuery
true() ResultQuery
very() ResultQuery