API Reference

Complete Python API reference for Complex Network RAG.

Table of Contents

  1. NetworkRAG Class
  2. Builder Pattern
  3. Fluent Query API
  4. Result Objects
  5. Batch Operations
  6. Storage API
  7. Embedding Providers
  8. Advanced Usage

NetworkRAG Class

The main interface for building and querying knowledge graphs.

Initialization

from src.network_rag import NetworkRAG
from src.storage import SQLiteStorage
from src.embeddings import TFIDFEmbedding

# Direct initialization
storage = SQLiteStorage('knowledge.db')
embedder = TFIDFEmbedding()
rag = NetworkRAG(storage, embedder, min_similarity=0.7)

# Using builder pattern (recommended)
rag = (NetworkRAG.builder()
       .with_storage('knowledge.db')
       .with_tfidf_embeddings()
       .with_similarity_threshold(0.7)
       .build())

Constructor

NetworkRAG(
    storage: SQLiteStorage,
    embedding_provider: EmbeddingProvider,
    min_similarity: float = 0.7,
    strong_similarity: float = 0.8,
    dsl_config: Optional[DSLConfig] = None
)

Parameters:

- storage: SQLiteStorage instance for persistence
- embedding_provider: EmbeddingProvider for generating embeddings
- min_similarity: Minimum similarity for edge creation (default: 0.7)
- strong_similarity: Threshold for strong edges (default: 0.8)
- dsl_config: Optional DSLConfig for structured document processing
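
A DSLConfig can also be passed directly to the constructor. A minimal sketch, assuming the storage and embedder objects from the initialization example above and a YAML config parsed with DSLParser (described under Builder Pattern):

from src.dsl import DSLParser

config = DSLParser.parse('config/papers.yaml')
rag = NetworkRAG(storage, embedder, dsl_config=config)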

Adding Documents

add() - Fluent interface

node_id = rag.add(
    content: str,
    id: Optional[str] = None,
    **metadata
) -> str

Parameters:

- content: Text content (for simple docs) or None (for structured docs)
- id: Optional node ID (auto-generated if not provided)
- **metadata: Arbitrary metadata key-value pairs

Returns: Node ID (string)

Example:

# Simple document
node_id = rag.add("This is a document about machine learning")

# With metadata
node_id = rag.add(
    "Another document",
    category='ML',
    author='John Doe',
    year=2024
)

# Custom ID
node_id = rag.add("Document", id='custom-id-123')

add() - Structured documents

node_id = rag.add(
    id: str,
    document: Dict[str, Any]
) -> str

Parameters:

- id: Node identifier
- document: Structured document matching configuration

Example:

# With YAML config loaded
node_id = rag.add('paper1', document={
    'title': 'Attention Is All You Need',
    'abstract': 'The dominant sequence transduction...',
    'tags': ['transformers', 'attention'],
    'authors': ['Vaswani', 'Shazeer']
})

add_node() - Legacy interface

rag.add_node(
    node_id: str,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
    source_id: Optional[str] = None,
    node_type: Optional[str] = None
)

Parameters:

- node_id: Unique node identifier
- content: Text content
- metadata: Optional metadata dictionary
- source_id: Optional external reference ID
- node_type: Optional type classification

Example:

rag.add_node(
    'doc1',
    'Document content',
    metadata={'category': 'ML'},
    source_id='external-123',
    node_type='article'
)

Building the Network

build_network()

graph = rag.build_network(
    rebuild: bool = False
) -> nx.Graph

Parameters:

- rebuild: Force rebuild even if network exists (default: False)

Returns: NetworkX Graph object

What it does:

  1. Loads embeddings from storage
  2. Computes pairwise similarities
  3. Creates edges where similarity ≥ min_similarity
  4. Stores graph in rag.graph

Example:

# Build network
graph = rag.build_network()
print(f"Nodes: {len(graph.nodes())}")
print(f"Edges: {len(graph.edges())}")

# Rebuild network (after adding new documents)
graph = rag.build_network(rebuild=True)

Searching

search() - Fluent interface

query_builder = rag.search(query: str)

Parameters:

- query: Search query text

Returns: QueryBuilder for method chaining

Example:

results = (rag.search("machine learning")
           .with_strategy("hybrid")
           .filter(category="ML")
           .top(10))

See Fluent Query API for details.

find_similar() - Direct interface

node_ids = rag.find_similar(
    query: str,
    n: int = 10,
    strategy: str = "similarity"
) -> List[str]

Parameters:

- query: Search query text
- n: Number of results to return
- strategy: Retrieval strategy ("similarity", "community", "hub", "bridge", "hybrid")

Returns: List of node IDs ordered by relevance

Example:

# Basic search
results = rag.find_similar("transformer models", n=5)

# With strategy
results = rag.find_similar(
    "transformer models",
    n=10,
    strategy="hybrid"
)

Community Detection

detect_communities()

communities = rag.detect_communities() -> Dict[str, int]

Returns: Dict mapping node_id → community_id

Algorithm: Louvain community detection

Example:

communities = rag.detect_communities()

# Count community sizes
from collections import Counter
sizes = Counter(communities.values())
print(f"Found {len(sizes)} communities")

for comm_id, size in sizes.most_common():
    print(f"  Community {comm_id}: {size} nodes")

get_nodes_in_community()

nodes = rag.get_nodes_in_community(
    community_id: int
) -> List[str]

Returns: List of node IDs in the community

Example:

# Get all nodes in community 0
nodes = rag.get_nodes_in_community(0)
for node_id in nodes:
    node = rag.storage.get_node(node_id)
    print(f"{node_id}: {node['content_text'][:50]}...")

get_community_for_node()

community_id = rag.get_community_for_node(
    node_id: str
) -> Optional[int]

Returns: Community ID or None if node not in graph

Example:

comm_id = rag.get_community_for_node('paper1')
print(f"Document 'paper1' is in community {comm_id}")

auto_tag_community()

tags = rag.auto_tag_community(
    community_id: int,
    n_samples: int = 20,
    n_keywords: int = 5
) -> List[str]

Parameters:

- community_id: Community to analyze
- n_samples: Number of nodes to sample
- n_keywords: Number of keywords to extract

Returns: List of distinctive keywords

Example:

tags = rag.auto_tag_community(0, n_samples=30, n_keywords=10)
print(f"Community 0 keywords: {', '.join(tags)}")

Network Analysis

get_hub_nodes()

hubs = rag.get_hub_nodes(
    min_degree: int = 5
) -> List[str]

Parameters:

- min_degree: Minimum number of connections

Returns: List of hub node IDs

Example:

hubs = rag.get_hub_nodes(min_degree=10)
print(f"Found {len(hubs)} hub nodes")

for hub_id in hubs:
    degree = rag.graph.degree(hub_id)
    print(f"{hub_id}: {degree} connections")

get_bridge_nodes()

bridges = rag.get_bridge_nodes(
    min_betweenness: float = 0.1
) -> List[str]

Parameters:

- min_betweenness: Minimum betweenness centrality (0.0 to 1.0)

Returns: List of bridge node IDs

Example:

bridges = rag.get_bridge_nodes(min_betweenness=0.05)

for bridge_id in bridges:
    # Get communities this bridge connects
    neighbors = rag.get_neighbors(bridge_id, k_hops=1)
    communities = {rag.get_community_for_node(n) for n in neighbors}
    print(f"{bridge_id} connects: {communities}")

get_neighbors()

neighbors = rag.get_neighbors(
    node_id: str,
    k_hops: int = 1
) -> List[str]

Parameters:

- node_id: Starting node
- k_hops: Number of hops (1 = direct neighbors, 2 = neighbors of neighbors, etc.)

Returns: List of neighbor node IDs

Example:

# Direct neighbors
neighbors1 = rag.get_neighbors('paper1', k_hops=1)

# 2-hop neighborhood
neighbors2 = rag.get_neighbors('paper1', k_hops=2)

print(f"Direct: {len(neighbors1)}, 2-hop: {len(neighbors2)}")

Builder Pattern

The builder pattern provides a fluent interface for constructing NetworkRAG instances.

Basic Usage

from src.network_rag import NetworkRAG

rag = (NetworkRAG.builder()
       .with_storage('data.db')
       .with_tfidf_embeddings()
       .build())

Builder Methods

builder()

builder = NetworkRAG.builder()

Returns: NetworkBuilder instance
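
Example (configuring step by step instead of a single chain):

builder = NetworkRAG.builder()
builder.with_storage(':memory:')
builder.with_tfidf_embeddings()
rag = builder.build()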

with_storage()

builder.with_storage(db_path: str)

Example:

builder.with_storage('knowledge.db')
builder.with_storage(':memory:')  # In-memory database

with_embeddings()

builder.with_embeddings(provider: EmbeddingProvider)

Example:

from src.embeddings import TFIDFEmbedding, OllamaEmbedding

# Custom TF-IDF
tfidf = TFIDFEmbedding(max_features=1000)
builder.with_embeddings(tfidf)

# Ollama
ollama = OllamaEmbedding(host='http://localhost:11434')
builder.with_embeddings(ollama)

with_tfidf_embeddings()

builder.with_tfidf_embeddings(max_features: int = 512)

Example:

builder.with_tfidf_embeddings(max_features=1000)

with_ollama_embeddings()

builder.with_ollama_embeddings(
    model: str = "nomic-embed-text",
    host: str = "http://localhost:11434"
)

Example:

builder.with_ollama_embeddings(
    model='nomic-embed-text',
    host='http://192.168.1.100:11434'
)

with_similarity_threshold()

builder.with_similarity_threshold(
    min_similarity: float,
    strong_similarity: Optional[float] = None
)

Example:

builder.with_similarity_threshold(0.6)
builder.with_similarity_threshold(0.6, strong_similarity=0.8)

from_config()

builder.from_config(config: Union[str, Dict, DSLConfig])

Parameters:

- config: YAML file path, dict, or DSLConfig object

Example:

# From YAML file
builder.from_config('config/papers.yaml')

# From dict
config_dict = {...}
builder.from_config(config_dict)

# From DSLConfig
from src.dsl import DSLParser
config = DSLParser.parse('config/papers.yaml')
builder.from_config(config)

with_dsl_config()

builder.with_dsl_config(config: Union[str, DSLConfig])

Example:

from src.dsl import DSLParser

# From YAML file path
builder.with_dsl_config('config/papers.yaml')

# From DSLConfig instance
config = DSLParser.parse('config/papers.yaml')
builder.with_dsl_config(config)

build()

rag = builder.build() -> NetworkRAG

Returns: Configured NetworkRAG instance

Complete Builder Example

from src.network_rag import NetworkRAG

rag = (NetworkRAG.builder()
       .with_storage('papers.db')
       .with_ollama_embeddings(
           model='nomic-embed-text',
           host='http://localhost:11434'
       )
       .with_similarity_threshold(
           min_similarity=0.6,
           strong_similarity=0.8
       )
       .from_config('config/papers.yaml')
       .build())

Fluent Query API

The fluent query API provides method chaining for building complex queries.

QueryBuilder

Created by calling rag.search(query).

with_strategy()

query.with_strategy(strategy: str)

Strategies:

- "similarity": Pure cosine similarity (baseline)
- "community": Community-aware retrieval
- "bridge": Cross-domain via bridges
- "hub": Versatile knowledge via hubs
- "hybrid": Intelligent combination (recommended)

Example:

results = (rag.search("machine learning")
           .with_strategy("hybrid")
           .top(10))

in_community()

query.in_community(community_id: int)

Example:

# Restrict search to community 0
results = (rag.search("neural networks")
           .in_community(0)
           .top(10))

filter()

query.filter(**kwargs)

Example:

results = (rag.search("transformers")
           .filter(category="ML", year=2024)
           .top(10))

expand_neighbors()

query.expand_neighbors(hops: int = 1)

Example:

# Include 2-hop neighbors of results
results = (rag.search("attention mechanism")
           .expand_neighbors(hops=2)
           .top(10))

prioritize_hubs()

query.prioritize_hubs()

Example:

# Boost hub nodes in results
results = (rag.search("deep learning")
           .prioritize_hubs()
           .top(10))

prioritize_bridges()

query.prioritize_bridges()

Example:

# Boost bridge nodes
results = (rag.search("optimization")
           .prioritize_bridges()
           .top(10))

top()

results = query.top(n: int = 10) -> ResultSet

Returns: ResultSet with top n results
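
Example:

results = rag.search("graph algorithms").top(5)
for result in results:
    print(result.id, result.score)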

Complete Query Example

results = (rag.search("neural networks")
           .with_strategy("hybrid")
           .in_community(0)
           .filter(year=2024, category="ML")
           .expand_neighbors(hops=2)
           .prioritize_hubs()
           .prioritize_bridges()
           .top(20))

Result Objects

SearchResult

Individual search result with metadata.

Properties:

- id: str - Node identifier
- content: str - Text content
- score: float - Similarity score (0.0 to 1.0)
- metadata: Dict[str, Any] - Node metadata
- community_id: Optional[int] - Community membership

Example:

result = results[0]
print(f"ID: {result.id}")
print(f"Score: {result.score:.3f}")
print(f"Content: {result.content[:100]}...")
print(f"Community: {result.community_id}")
print(f"Metadata: {result.metadata}")

ResultSet

Collection of SearchResult objects with fluent interface.

Iteration

for result in results:
    print(result.id, result.score)

Indexing

first = results[0]
first_five = results[:5]

Length

count = len(results)

Conversion Methods

ids = results.ids()                    # List[str]
scores = results.scores()              # List[float]
contents = results.contents()          # List[str]
metadata_list = results.metadata_list()  # List[Dict]

Example:

results = rag.search("query").top(10)

# Get just IDs
ids = results.ids()

# Get scores
scores = results.scores()
print(f"Best score: {max(scores)}")

# Get content
contents = results.contents()

Filtering Methods

filtered = results.filter_by_score(min_score: float)
filtered = results.filter_by_metadata(**kwargs)
top_n = results.top(n: int)

Example:

# High-confidence results only
high_conf = results.filter_by_score(0.8)

# Filter by metadata
ml_results = results.filter_by_metadata(category="ML")

# Combine filters
filtered = (results
            .filter_by_score(0.7)
            .filter_by_metadata(year=2024)
            .top(5))

Batch Operations

BatchContext

Context manager for batch additions with automatic network rebuild.

with rag.batch() as batch:
    batch.add(content, **metadata)
    batch.add(content, **metadata)
    # Network automatically rebuilt on exit

Example:

# Batch add many documents
with rag.batch() as batch:
    for doc in documents:
        batch.add(
            doc['content'],
            id=doc['id'],
            category=doc['category']
        )

# Network automatically built after all adds
print(f"Added {len(documents)} documents")

Manual Batch Operations

# Add many documents
for doc in documents:
    rag.add_node(doc['id'], doc['content'], doc['metadata'])

# Rebuild network once
rag.build_network(rebuild=True)

Storage API

SQLiteStorage

Direct storage access (advanced usage).

get_node()

node = storage.get_node(node_id: str) -> Optional[Dict]

Returns: Node dictionary or None

Example:

node = rag.storage.get_node('paper1')
print(f"Content: {node['content_text']}")
print(f"Metadata: {node['metadata']}")

search_nodes()

nodes = storage.search_nodes(filters: Dict) -> List[Dict]

Example:

# All nodes
all_nodes = rag.storage.search_nodes({})

# Filtered nodes
ml_nodes = rag.storage.search_nodes({'metadata': {'category': 'ML'}})

load_embeddings()

embeddings = storage.load_embeddings(
    model_name: Optional[str] = None,
    node_ids: Optional[List[str]] = None
) -> Dict[str, np.ndarray]

Returns: Dict mapping node_id → embedding array

Example:

# All embeddings
all_emb = rag.storage.load_embeddings()

# Specific model
tfidf_emb = rag.storage.load_embeddings(model_name='tfidf')

# Specific nodes
subset = rag.storage.load_embeddings(node_ids=['paper1', 'paper2'])

Embedding Providers

TFIDFEmbedding

Fast TF-IDF embeddings (no external services required).

from src.embeddings import TFIDFEmbedding

embedder = TFIDFEmbedding(
    max_features: int = 1000,
    ngram_range: Tuple[int, int] = (1, 1)
)

Example:

embedder = TFIDFEmbedding(max_features=512, ngram_range=(1, 2))

# Must fit before use
texts = ["doc 1", "doc 2", "doc 3"]
embedder.fit(texts)

# Embed
embeddings = embedder.embed(texts)

OllamaEmbedding

Embeddings from a local or remote Ollama service.

from src.embeddings import OllamaEmbedding

embedder = OllamaEmbedding(
    host: str = "http://localhost:11434",
    model: str = "nomic-embed-text"
)

Example:

embedder = OllamaEmbedding(
    host='http://192.168.1.100:11434',
    model='nomic-embed-text'
)

# Embed (no fit required)
embeddings = embedder.embed(["text 1", "text 2"])

Custom Embedding Provider

Implement the EmbeddingProvider interface:

from typing import Any, Dict, List

from src.embeddings import EmbeddingProvider
import numpy as np

class CustomEmbedder(EmbeddingProvider):
    def __init__(self, dimension: int = 256):
        self.dimension = dimension

    def embed(self, texts: List[str]) -> np.ndarray:
        # Your embedding logic: must return an array of
        # shape (len(texts), self.dimension)
        return np.zeros((len(texts), self.dimension))

    def get_dimension(self) -> int:
        return self.dimension

    def get_model_name(self) -> str:
        return "custom"

    def get_model_config(self) -> Dict[str, Any]:
        return {}
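
A custom provider plugs into the builder via with_embeddings(). A minimal sketch, assuming the CustomEmbedder class above (the dimension parameter and database path are illustrative):

rag = (NetworkRAG.builder()
       .with_storage('custom.db')
       .with_embeddings(CustomEmbedder(dimension=256))
       .build())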

Advanced Usage

Incremental Updates

# Load existing database
rag = (NetworkRAG.builder()
       .with_storage('existing.db')
       .with_tfidf_embeddings()
       .build())

# Add new documents
new_docs = get_new_documents()
for doc in new_docs:
    rag.add(doc['content'], **doc['metadata'])

# Rebuild the network so the new documents are connected
rag.build_network(rebuild=True)

Custom Similarity Thresholds

# Per-instance thresholds (each NetworkRAG builds its own edge set)
rag_relaxed = (NetworkRAG.builder()
               .with_storage('knowledge.db')
               .with_tfidf_embeddings()
               .with_similarity_threshold(0.5)
               .build())
rag_strict = (NetworkRAG.builder()
              .with_storage('knowledge.db')
              .with_tfidf_embeddings()
              .with_similarity_threshold(0.8)
              .build())

# Use different thresholds for different queries
relaxed_results = rag_relaxed.search("broad query").top(20)
strict_results = rag_strict.search("specific query").top(10)

Network Export

import networkx as nx

# Get graph
graph = rag.graph

# Export to various formats
nx.write_gexf(graph, 'network.gexf')
nx.write_graphml(graph, 'network.graphml')
nx.write_edgelist(graph, 'network.edgelist')

Community Analysis

# Detect communities
communities = rag.detect_communities()

# Analyze each community
from collections import defaultdict
comm_groups = defaultdict(list)
for node_id, comm_id in communities.items():
    comm_groups[comm_id].append(node_id)

for comm_id, nodes in comm_groups.items():
    print(f"\nCommunity {comm_id}: {len(nodes)} nodes")

    # Auto-tag
    tags = rag.auto_tag_community(comm_id)
    print(f"  Keywords: {', '.join(tags)}")

    # Sample nodes
    sample = nodes[:5]
    for node_id in sample:
        node = rag.storage.get_node(node_id)
        print(f"  - {node_id}: {node['content_text'][:50]}...")

Similarity Computation

# Get embeddings
embeddings_dict = rag.storage.load_embeddings()

# Compute similarity matrix
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

node_ids = list(embeddings_dict.keys())
embeddings = np.array([embeddings_dict[nid] for nid in node_ids])

similarity_matrix = cosine_similarity(embeddings)

# Find most similar pairs
n = len(node_ids)
for i in range(n):
    for j in range(i+1, n):
        sim = similarity_matrix[i, j]
        if sim >= 0.8:
            print(f"{node_ids[i]} <-> {node_ids[j]}: {sim:.3f}")

See Also


Complex Network RAG API - Pythonic interface for topology-aware retrieval.