algebraic_search package¶
Submodules¶
algebraic_search.boolean_query module¶
- class algebraic_search.boolean_query.BooleanQuery(query: str | List = None)¶
- Bases: - object- A Boolean algebra for constructing and evaluating queries. We define a Boolean Query language for parsing, generating, and evaluating Boolean queries. The query language is based on the following theoretical framework: - ## Formal Theory - Q = (P(T*), and, or, not, {}, T*) - where: - T is the set of all characters, - T* is the set of all strings of characters, - {} is the empty set, - P(T*) is the power set of T*. - This framework allows constructing queries such as:
- “(or (and cat dog) (not (or fish bird)))” 
- which is internally represented as:
- [‘or’, [‘and’, ‘cat’, ‘dog’], [‘not’, [‘or’, ‘fish’, ‘bird’]]] 
- Queries can also be combined using Python operators:
- Q1 & Q2 # Represents logical AND Q1 | Q2 # Represents logical OR ~Q1 # Represents logical NOT 
 - ## Evaluation - The evaluation of queries is performed by the eval method, which takes a list of documents and returns a ResultQuery instance indicating which documents match the query. The ResultQuery instance is itself a Boolean algebra, where the query results are elements of the algebra. Thus, the evaluation function eval serves as a homomorphism eval: Q -> R that preserves the algebraic structure. See ResultQuery for more details. - eval(docs: List) ResultQuery¶
- Evaluate the query against a list of documents. - Parameters:
- docs – A list where each document has method for determining if it 
- term (contains a) 
- __contains__. 
 
- Returns:
- A list indicating whether each document matches the query, 1 for a match and 0 for a non-match. 
- Return type:
- List[float] 
 
 - tokenize(query: str) List¶
- Tokenize the input query string into a nested list structure. - Parameters:
- query (str) – The query string. 
- Returns:
- A nested list representing the parsed query. 
- Return type:
- list 
- Raises:
- ValueError – If there are mismatched parentheses or unexpected tokens. 
 
 
algebraic_search.fuzzy_boolean_query module¶
- class algebraic_search.fuzzy_boolean_query.FuzzyBooleanQuery(query: str | List = None)¶
- Bases: - object- A fuzzy Boolean algebra for constructing and evaluating queries with degrees of membership. - ## Formal Theory - Fuzzy Boolean Algebra for Queries - We define two algebras in this context: - Fuzzy Query Algebra (Q_f):
- Elements: Q_f = P(T*) where T* is the set of all finite strings composed of ASCII characters. P(T*) represents the power set of T*, i.e., all possible subsets of T*. 
- Operations:
- AND (`&`): Fuzzy intersection using minimum. 
- OR (`|`): Fuzzy union using maximum. 
- NOT (`~`): Fuzzy complement (1.0 - x). 
- Modifiers: Such as very, somewhat, etc., to adjust degrees of membership. 
 
 
 
 
 - eval(docs: ~typing.List, ranker: ~typing.Callable = <function FuzzyBooleanQuery.<lambda>>) ResultQuery¶
- Evaluate the fuzzy query against one or more documents. - Parameters:
- docs – A list of documents where each document is a set of terms, or a single document represented as a set of terms. 
- ranker (Callable, optional) – A function that computes the degree of membership of a term in a document. Defaults to a function that returns 1.0 if the term is in the document, and 0.0 otherwise. This is also know as the set-membership function, i.e., it generates a Boolean query (0 or 1). All the modifiers like very will thus have no effect. 
 
- Returns:
- An instance containing a list of float scores
- indicating the degree of membership for each document. 
 
- Return type:
 
 - extremely() FuzzyBooleanQuery¶
- Apply the extremely modifier to the fuzzy query. - Returns:
- A new FuzzyBooleanQuery with the extremely modifier applied. 
- Return type:
 
 - nary_ops = {'and': <built-in function min>, 'or': <built-in function max>}¶
 - ops = ['very', 'somewhat', 'slightly', 'extremely', 'not', 'and', 'or']¶
 - slightly() FuzzyBooleanQuery¶
- Apply the slightly modifier to the fuzzy query. - Returns:
- A new FuzzyBooleanQuery with the slightly modifier applied. 
- Return type:
 
 - somewhat() FuzzyBooleanQuery¶
- Apply the somewhat modifier to the fuzzy query. - Returns:
- A new FuzzyBooleanQuery with the somewhat modifier applied. 
- Return type:
 
 - tokenize(query: str) List¶
- Tokenize the input query string into a nested list structure, recognizing fuzzy modifiers. - Parameters:
- query (str) – The query string. 
- Returns:
- A nested list representing the parsed query. 
- Return type:
- list 
- Raises:
- ValueError – If there are mismatched parentheses or unexpected tokens. 
 
 - unary_ops = {'extremely': <function FuzzyBooleanQuery.<lambda>>, 'not': <function FuzzyBooleanQuery.<lambda>>, 'slightly': <function FuzzyBooleanQuery.<lambda>>, 'somewhat': <function FuzzyBooleanQuery.<lambda>>, 'very': <function FuzzyBooleanQuery.<lambda>>}¶
 - very() FuzzyBooleanQuery¶
- Apply the very modifier to the fuzzy query. - Returns:
- A new FuzzyBooleanQuery with the very modifier applied. 
- Return type:
 
 
algebraic_search.result_query module¶
- class algebraic_search.result_query.ResultQuery(scores: List[float])¶
- Bases: - object- A class representing the evaluation results of a query. - ## Formal Theory: Boolean Algebra Over Query Results - The evaluation results of a query can be represented as a Boolean algebra, where the query results are elements of the algebra. Using conventional notation, the algebra is defined as follows: - (R=P([0, 1]^n), and=&, or=|, not=~, bottom=0, top=1) - R = [r_1, r_2, …, r_n] where each r_i ∈ [0, 1] represents the degree-of-membership of the i-th document in the query result. 
- & is the element-wise minimum. 
- | is the element-wise maximum. 
- ~ is the element-wise complement (1 - r_i). 
 - These operations form a Boolean algebra when results are binary (0 or 1) and a fuzzy Boolean algebra when results are in the continuous range [0, 1]. - ## Homomorphism between queries (like BooleanQuery) and ResultQuery - The evaluation function eval in, for instance, the BooleanQuery class serves as a homomorphism eval: Q -> R that preserves the algebraic structure: - eval(Q1 & Q2) = eval(Q1) & eval(Q2) 
- eval(Q1 | Q2) = eval(Q1) | eval(Q2) 
- eval(~Q1) = ~eval(Q1) 
 - This ensures that the logical composition of queries translates appropriately to the combination of their evaluation results. - ## Fuzzy Operations - We also provide a range of built-in fuzzy operations that can be applied to the evaluation results of a query: - very: Squares the degree-of-membership of each document. 
- somewhat: Takes the square root of the degree-of-membership of each document. 
- slightly: Takes the 10th root of the degree-of-membership of each document. 
- extremely: Cubes the degree-of-membership of each document. 
- binary: Maps the degree-of-membership of each document to 0 if it is less
- than 0.5, and 1 otherwise. 
 
- true: Maps the degree-of-membership of each document to 1. 
- false: Maps the degree-of-membership of each document to 0. 
- map: Maps the degree-of-membership of each document to 0 if it is less than a specified threshold, and 1 otherwise. 
 - Of course, you are free to define your own fuzzy operations as needed. - binary() ResultQuery¶
 - extremely() ResultQuery¶
 - false() ResultQuery¶
 - map(threshold: float) ResultQuery¶
 - slightly() ResultQuery¶
 - somewhat() ResultQuery¶
 - true() ResultQuery¶
 - very() ResultQuery¶