algebraic_search package¶
Submodules¶
algebraic_search.boolean_query module¶
- class algebraic_search.boolean_query.BooleanQuery(query: str | List = None)¶
Bases:
object
A Boolean algebra for constructing and evaluating queries. We define a Boolean Query language for parsing, generating, and evaluating Boolean queries. The query language is based on the following theoretical framework:
## Formal Theory
Q = (P(T*), and, or, not, {}, T*)
where: - T is the set of all characters, - T* is the set of all strings of characters, - {} is the empty set, - P(T*) is the power set of T*.
- This framework allows constructing queries such as:
“(or (and cat dog) (not (or fish bird)))”
- which is internally represented as:
[‘or’, [‘and’, ‘cat’, ‘dog’], [‘not’, [‘or’, ‘fish’, ‘bird’]]]
- Queries can also be combined using Python operators:
Q1 & Q2 # Represents logical AND Q1 | Q2 # Represents logical OR ~Q1 # Represents logical NOT
## Evaluation
The evaluation of queries is performed by the eval method, which takes a list of documents and returns a ResultQuery instance indicating which documents match the query. The ResultQuery instance is itself a Boolean algebra, where the query results are elements of the algebra. Thus, the evaluation function eval serves as a homomorphism eval: Q -> R that preserves the algebraic structure. See ResultQuery for more details.
- eval(docs: List) ResultQuery ¶
Evaluate the query against a list of documents.
- Parameters:
docs – A list where each document has method for determining if it
term (contains a)
__contains__.
- Returns:
A list indicating whether each document matches the query, 1 for a match and 0 for a non-match.
- Return type:
List[float]
- tokenize(query: str) List ¶
Tokenize the input query string into a nested list structure.
- Parameters:
query (str) – The query string.
- Returns:
A nested list representing the parsed query.
- Return type:
list
- Raises:
ValueError – If there are mismatched parentheses or unexpected tokens.
algebraic_search.fuzzy_boolean_query module¶
- class algebraic_search.fuzzy_boolean_query.FuzzyBooleanQuery(query: str | List = None)¶
Bases:
object
A fuzzy Boolean algebra for constructing and evaluating queries with degrees of membership.
## Formal Theory
Fuzzy Boolean Algebra for Queries
We define two algebras in this context:
- Fuzzy Query Algebra (Q_f):
Elements: Q_f = P(T*) where T* is the set of all finite strings composed of ASCII characters. P(T*) represents the power set of T*, i.e., all possible subsets of T*.
- Operations:
AND (`&`): Fuzzy intersection using minimum.
OR (`|`): Fuzzy union using maximum.
NOT (`~`): Fuzzy complement (1.0 - x).
Modifiers: Such as very, somewhat, etc., to adjust degrees of membership.
- eval(docs: ~typing.List, ranker: ~typing.Callable = <function FuzzyBooleanQuery.<lambda>>) ResultQuery ¶
Evaluate the fuzzy query against one or more documents.
- Parameters:
docs – A list of documents where each document is a set of terms, or a single document represented as a set of terms.
ranker (Callable, optional) – A function that computes the degree of membership of a term in a document. Defaults to a function that returns 1.0 if the term is in the document, and 0.0 otherwise. This is also know as the set-membership function, i.e., it generates a Boolean query (0 or 1). All the modifiers like very will thus have no effect.
- Returns:
- An instance containing a list of float scores
indicating the degree of membership for each document.
- Return type:
- extremely() FuzzyBooleanQuery ¶
Apply the extremely modifier to the fuzzy query.
- Returns:
A new FuzzyBooleanQuery with the extremely modifier applied.
- Return type:
- nary_ops = {'and': <built-in function min>, 'or': <built-in function max>}¶
- ops = ['very', 'somewhat', 'slightly', 'extremely', 'not', 'and', 'or']¶
- slightly() FuzzyBooleanQuery ¶
Apply the slightly modifier to the fuzzy query.
- Returns:
A new FuzzyBooleanQuery with the slightly modifier applied.
- Return type:
- somewhat() FuzzyBooleanQuery ¶
Apply the somewhat modifier to the fuzzy query.
- Returns:
A new FuzzyBooleanQuery with the somewhat modifier applied.
- Return type:
- tokenize(query: str) List ¶
Tokenize the input query string into a nested list structure, recognizing fuzzy modifiers.
- Parameters:
query (str) – The query string.
- Returns:
A nested list representing the parsed query.
- Return type:
list
- Raises:
ValueError – If there are mismatched parentheses or unexpected tokens.
- unary_ops = {'extremely': <function FuzzyBooleanQuery.<lambda>>, 'not': <function FuzzyBooleanQuery.<lambda>>, 'slightly': <function FuzzyBooleanQuery.<lambda>>, 'somewhat': <function FuzzyBooleanQuery.<lambda>>, 'very': <function FuzzyBooleanQuery.<lambda>>}¶
- very() FuzzyBooleanQuery ¶
Apply the very modifier to the fuzzy query.
- Returns:
A new FuzzyBooleanQuery with the very modifier applied.
- Return type:
algebraic_search.result_query module¶
- class algebraic_search.result_query.ResultQuery(scores: List[float])¶
Bases:
object
A class representing the evaluation results of a query.
## Formal Theory: Boolean Algebra Over Query Results
The evaluation results of a query can be represented as a Boolean algebra, where the query results are elements of the algebra. Using conventional notation, the algebra is defined as follows:
(R=P([0, 1]^n), and=&, or=|, not=~, bottom=0, top=1)
R = [r_1, r_2, …, r_n] where each r_i ∈ [0, 1] represents the degree-of-membership of the i-th document in the query result.
& is the element-wise minimum.
| is the element-wise maximum.
~ is the element-wise complement (1 - r_i).
These operations form a Boolean algebra when results are binary (0 or 1) and a fuzzy Boolean algebra when results are in the continuous range [0, 1].
## Homomorphism between queries (like BooleanQuery) and ResultQuery
The evaluation function eval in, for instance, the BooleanQuery class serves as a homomorphism eval: Q -> R that preserves the algebraic structure:
eval(Q1 & Q2) = eval(Q1) & eval(Q2)
eval(Q1 | Q2) = eval(Q1) | eval(Q2)
eval(~Q1) = ~eval(Q1)
This ensures that the logical composition of queries translates appropriately to the combination of their evaluation results.
## Fuzzy Operations
We also provide a range of built-in fuzzy operations that can be applied to the evaluation results of a query:
very: Squares the degree-of-membership of each document.
somewhat: Takes the square root of the degree-of-membership of each document.
slightly: Takes the 10th root of the degree-of-membership of each document.
extremely: Cubes the degree-of-membership of each document.
- binary: Maps the degree-of-membership of each document to 0 if it is less
than 0.5, and 1 otherwise.
true: Maps the degree-of-membership of each document to 1.
false: Maps the degree-of-membership of each document to 0.
map: Maps the degree-of-membership of each document to 0 if it is less than a specified threshold, and 1 otherwise.
Of course, you are free to define your own fuzzy operations as needed.
- binary() ResultQuery ¶
- extremely() ResultQuery ¶
- false() ResultQuery ¶
- map(threshold: float) ResultQuery ¶
- slightly() ResultQuery ¶
- somewhat() ResultQuery ¶
- true() ResultQuery ¶
- very() ResultQuery ¶