API Reference¶
Complete Python API reference for Complex Network RAG.
Table of Contents¶
- NetworkRAG Class
- Builder Pattern
- Fluent Query API
- Result Objects
- Batch Operations
- Storage API
- Embedding Providers
- Advanced Usage
NetworkRAG Class¶
The main interface for building and querying knowledge graphs.
Initialization¶
from src.network_rag import NetworkRAG
from src.storage import SQLiteStorage
from src.embeddings import TFIDFEmbedding
# Direct initialization
storage = SQLiteStorage('knowledge.db')
embedder = TFIDFEmbedding()
rag = NetworkRAG(storage, embedder, min_similarity=0.7)
# Using builder pattern (recommended)
rag = (NetworkRAG.builder()
    .with_storage('knowledge.db')
    .with_tfidf_embeddings()
    .with_similarity_threshold(0.7)
    .build())
Constructor¶
NetworkRAG(
    storage: SQLiteStorage,
    embedding_provider: EmbeddingProvider,
    min_similarity: float = 0.7,
    strong_similarity: float = 0.8,
    dsl_config: Optional[DSLConfig] = None
)
Parameters:
- storage: SQLiteStorage instance for persistence
- embedding_provider: EmbeddingProvider for generating embeddings
- min_similarity: Minimum similarity for edge creation (default: 0.7)
- strong_similarity: Threshold for strong edges (default: 0.8)
- dsl_config: Optional DSLConfig for structured document processing
Adding Documents¶
add() - Fluent interface¶
Parameters:
- content: Text content (for simple docs) or None (for structured docs)
- id: Optional node ID (auto-generated if not provided)
- **metadata: Arbitrary metadata key-value pairs
Returns: Node ID (string)
Example:
# Simple document
node_id = rag.add("This is a document about machine learning")
# With metadata
node_id = rag.add(
    "Another document",
    category='ML',
    author='John Doe',
    year=2024
)
# Custom ID
node_id = rag.add("Document", id='custom-id-123')
add() - Structured documents¶
Parameters:
- id: Node identifier
- document: Structured document matching configuration
Example:
# With YAML config loaded
node_id = rag.add('paper1', document={
    'title': 'Attention Is All You Need',
    'abstract': 'The dominant sequence transduction...',
    'tags': ['transformers', 'attention'],
    'authors': ['Vaswani', 'Shazeer']
})
add_node() - Legacy interface¶
rag.add_node(
    node_id: str,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
    source_id: Optional[str] = None,
    node_type: Optional[str] = None
)
Parameters:
- node_id: Unique node identifier
- content: Text content
- metadata: Optional metadata dictionary
- source_id: Optional external reference ID
- node_type: Optional type classification
Example:
rag.add_node(
    'doc1',
    'Document content',
    metadata={'category': 'ML'},
    source_id='external-123',
    node_type='article'
)
Building the Network¶
build_network()¶
Parameters:
- rebuild: Force rebuild even if network exists (default: False)
Returns: NetworkX Graph object
What it does:
1. Loads embeddings from storage
2. Computes pairwise similarities
3. Creates edges where similarity ≥ min_similarity
4. Stores graph in rag.graph
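Conceptually, the edge-creation step is equivalent to the sketch below. It is illustrative only (the real work happens inside build_network()), and it assumes min_similarity is exposed as an attribute on the NetworkRAG instance:
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx
import numpy as np

# Load stored embeddings and connect sufficiently similar pairs
embeddings_dict = rag.storage.load_embeddings()
node_ids = list(embeddings_dict.keys())
vectors = np.array([embeddings_dict[nid] for nid in node_ids])
sims = cosine_similarity(vectors)

graph = nx.Graph()
graph.add_nodes_from(node_ids)
for i in range(len(node_ids)):
    for j in range(i + 1, len(node_ids)):
        if sims[i, j] >= rag.min_similarity:
            graph.add_edge(node_ids[i], node_ids[j], weight=sims[i, j])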
Example:
# Build network
graph = rag.build_network()
print(f"Nodes: {len(graph.nodes())}")
print(f"Edges: {len(graph.edges())}")
# Rebuild network (after adding new documents)
graph = rag.build_network(rebuild=True)
Searching¶
search() - Fluent interface¶
Parameters:
- query: Search query text
Returns: QueryBuilder for method chaining
Example:
See Fluent Query API for details.
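A minimal sketch, consistent with the query examples later in this reference:
results = rag.search("transformer models").with_strategy("hybrid").top(5)
for result in results:
    print(f"{result.id}: {result.score:.3f}")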
find_similar() - Direct interface¶
Parameters:
- query: Search query text
- n: Number of results to return
- strategy: Retrieval strategy ("similarity", "community", "hub", "bridge", "hybrid")
Returns: List of node IDs ordered by relevance
Example:
# Basic search
results = rag.find_similar("transformer models", n=5)
# With strategy
results = rag.find_similar(
    "transformer models",
    n=10,
    strategy="hybrid"
)
Community Detection¶
detect_communities()¶
Returns: Dict mapping node_id → community_id
Algorithm: Louvain community detection
Example:
communities = rag.detect_communities()
# Count community sizes
from collections import Counter
sizes = Counter(communities.values())
print(f"Found {len(sizes)} communities")
for comm_id, size in sizes.most_common():
    print(f" Community {comm_id}: {size} nodes")
get_nodes_in_community()¶
Returns: List of node IDs in the community
Example:
# Get all nodes in community 0
nodes = rag.get_nodes_in_community(0)
for node_id in nodes:
    node = rag.storage.get_node(node_id)
    print(f"{node_id}: {node['content_text'][:50]}...")
get_community_for_node()¶
Returns: Community ID or None if node not in graph
Example:
comm_id = rag.get_community_for_node('paper1')
print(f"Document 'paper1' is in community {comm_id}")
auto_tag_community()¶
tags = rag.auto_tag_community(
    community_id: int,
    n_samples: int = 20,
    n_keywords: int = 5
) -> List[str]
Parameters:
- community_id: Community to analyze
- n_samples: Number of nodes to sample
- n_keywords: Number of keywords to extract
Returns: List of distinctive keywords
Example:
tags = rag.auto_tag_community(0, n_samples=30, n_keywords=10)
print(f"Community 0 keywords: {', '.join(tags)}")
Network Analysis¶
get_hub_nodes()¶
Parameters:
- min_degree: Minimum number of connections
Returns: List of hub node IDs
Example:
hubs = rag.get_hub_nodes(min_degree=10)
print(f"Found {len(hubs)} hub nodes")
for hub_id in hubs:
    degree = rag.graph.degree(hub_id)
    print(f"{hub_id}: {degree} connections")
get_bridge_nodes()¶
Parameters:
- min_betweenness: Minimum betweenness centrality (0.0 to 1.0)
Returns: List of bridge node IDs
Example:
bridges = rag.get_bridge_nodes(min_betweenness=0.05)
for bridge_id in bridges:
    # Get communities this bridge connects
    neighbors = rag.get_neighbors(bridge_id, k_hops=1)
    communities = {rag.get_community_for_node(n) for n in neighbors}
    print(f"{bridge_id} connects: {communities}")
get_neighbors()¶
Parameters:
- node_id: Starting node
- k_hops: Number of hops (1 = direct neighbors, 2 = neighbors of neighbors, etc.)
Returns: List of neighbor node IDs
Example:
# Direct neighbors
neighbors1 = rag.get_neighbors('paper1', k_hops=1)
# 2-hop neighborhood
neighbors2 = rag.get_neighbors('paper1', k_hops=2)
print(f"Direct: {len(neighbors1)}, 2-hop: {len(neighbors2)}")
Builder Pattern¶
The builder pattern provides a fluent interface for constructing NetworkRAG instances.
Basic Usage¶
from src.network_rag import NetworkRAG
rag = (NetworkRAG.builder()
    .with_storage('data.db')
    .with_tfidf_embeddings()
    .build())
Builder Methods¶
builder()¶
Returns: NetworkBuilder instance
with_storage()¶
Example:
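A minimal sketch, matching the builder usage shown above (the argument is the SQLite database path):
builder.with_storage('data.db')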
with_embeddings()¶
Example:
from src.embeddings import TFIDFEmbedding, OllamaEmbedding
# Custom TF-IDF
tfidf = TFIDFEmbedding(max_features=1000)
builder.with_embeddings(tfidf)
# Ollama
ollama = OllamaEmbedding(host='http://localhost:11434')
builder.with_embeddings(ollama)
with_tfidf_embeddings()¶
Example:
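A minimal sketch, matching the builder usage shown above:
builder.with_tfidf_embeddings()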
with_ollama_embeddings()¶
builder.with_ollama_embeddings(
    model: str = "nomic-embed-text",
    host: str = "http://localhost:11434"
)
Example:
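A minimal sketch using the defaults from the signature above:
builder.with_ollama_embeddings(
    model='nomic-embed-text',
    host='http://localhost:11434'
)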
with_similarity_threshold()¶
builder.with_similarity_threshold(
    min_similarity: float,
    strong_similarity: Optional[float] = None
)
Example:
builder.with_similarity_threshold(0.6)
builder.with_similarity_threshold(0.6, strong_similarity=0.8)
from_config()¶
Parameters:
- config: YAML file path, dict, or DSLConfig object
Example:
# From YAML file
builder.from_config('config/papers.yaml')
# From dict
config_dict = {...}
builder.from_config(config_dict)
# From DSLConfig
from src.dsl import DSLParser
config = DSLParser.parse('config/papers.yaml')
builder.from_config(config)
with_dsl_config()¶
Example:
from src.dsl import DSLParser
# From YAML file path
builder.with_dsl_config('config/papers.yaml')
# From DSLConfig instance
config = DSLParser.parse('config/papers.yaml')
builder.with_dsl_config(config)
build()¶
Returns: Configured NetworkRAG instance
Complete Builder Example¶
from src.network_rag import NetworkRAG
rag = (NetworkRAG.builder()
    .with_storage('papers.db')
    .with_ollama_embeddings(
        model='nomic-embed-text',
        host='http://localhost:11434'
    )
    .with_similarity_threshold(
        min_similarity=0.6,
        strong_similarity=0.8
    )
    .from_config('config/papers.yaml')
    .build())
Fluent Query API¶
The fluent query API provides method chaining for building complex queries.
QueryBuilder¶
Created by calling rag.search(query).
with_strategy()¶
Strategies:
- "similarity": Pure cosine similarity (baseline)
- "community": Community-aware retrieval
- "bridge": Cross-domain via bridges
- "hub": Versatile knowledge via hubs
- "hybrid": Intelligent combination (recommended)
Example:
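A minimal sketch, following the complete query example below:
results = (rag.search("neural networks")
    .with_strategy("hybrid")
    .top(10))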
in_community()¶
Example:
# Restrict search to community 0
results = (rag.search("neural networks")
    .in_community(0)
    .top(10))
filter()¶
Example:
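A minimal sketch, following the complete query example below (filters match on metadata key-value pairs):
results = (rag.search("neural networks")
    .filter(year=2024, category="ML")
    .top(10))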
expand_neighbors()¶
Example:
# Include 2-hop neighbors of results
results = (rag.search("attention mechanism")
    .expand_neighbors(hops=2)
    .top(10))
prioritize_hubs()¶
Example:
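A minimal sketch, following the complete query example below:
results = (rag.search("neural networks")
    .prioritize_hubs()
    .top(10))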
prioritize_bridges()¶
Example:
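A minimal sketch, following the complete query example below:
results = (rag.search("neural networks")
    .prioritize_bridges()
    .top(10))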
top()¶
Returns: ResultSet with top n results
Complete Query Example¶
results = (rag.search("neural networks")
    .with_strategy("hybrid")
    .in_community(0)
    .filter(year=2024, category="ML")
    .expand_neighbors(hops=2)
    .prioritize_hubs()
    .prioritize_bridges()
    .top(20))
Result Objects¶
SearchResult¶
Individual search result with metadata.
Properties:
- id: str - Node identifier
- content: str - Text content
- score: float - Similarity score (0.0 to 1.0)
- metadata: Dict[str, Any] - Node metadata
- community_id: Optional[int] - Community membership
Example:
result = results[0]
print(f"ID: {result.id}")
print(f"Score: {result.score:.3f}")
print(f"Content: {result.content[:100]}...")
print(f"Community: {result.community_id}")
print(f"Metadata: {result.metadata}")
ResultSet¶
Collection of SearchResult objects with fluent interface.
Iteration¶
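ResultSet supports standard iteration (a sketch, assuming the list-like interface implied by the indexing example below):
for result in results:
    print(f"{result.id}: {result.score:.3f}")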
Indexing¶
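Results can be indexed like a list, as in the SearchResult example above:
first = results[0]
print(first.content[:100])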
Length¶
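A sketch assuming ResultSet implements __len__, consistent with its list-like interface:
print(f"Got {len(results)} results")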
Conversion Methods¶
ids = results.ids() # List[str]
scores = results.scores() # List[float]
contents = results.contents() # List[str]
metadata_list = results.metadata_list() # List[Dict]
Example:
results = rag.search("query").top(10)
# Get just IDs
ids = results.ids()
# Get scores
scores = results.scores()
print(f"Best score: {max(scores)}")
# Get content
contents = results.contents()
Filtering Methods¶
filtered = results.filter_by_score(min_score: float)
filtered = results.filter_by_metadata(**kwargs)
top_n = results.top(n: int)
Example:
# High-confidence results only
high_conf = results.filter_by_score(0.8)
# Filter by metadata
ml_results = results.filter_by_metadata(category="ML")
# Combine filters
filtered = (results
    .filter_by_score(0.7)
    .filter_by_metadata(year=2024)
    .top(5))
Batch Operations¶
BatchContext¶
Context manager for batch additions with automatic network rebuild.
with rag.batch() as batch:
    batch.add(content, **metadata)
    batch.add(content, **metadata)
# Network automatically rebuilt on exit
Example:
# Batch add many documents
with rag.batch() as batch:
    for doc in documents:
        batch.add(
            doc['content'],
            id=doc['id'],
            category=doc['category']
        )
# Network automatically built after all adds
print(f"Added {len(documents)} documents")
Manual Batch Operations¶
# Add many documents
for doc in documents:
    rag.add_node(doc['id'], doc['content'], doc['metadata'])
# Rebuild network once
rag.build_network(rebuild=True)
Storage API¶
SQLiteStorage¶
Direct storage access (advanced usage).
get_node()¶
Returns: Node dictionary or None
Example:
node = rag.storage.get_node('paper1')
print(f"Content: {node['content_text']}")
print(f"Metadata: {node['metadata']}")
search_nodes()¶
Example:
# All nodes
all_nodes = rag.storage.search_nodes({})
# Filtered nodes
ml_nodes = rag.storage.search_nodes({'metadata': {'category': 'ML'}})
load_embeddings()¶
embeddings = storage.load_embeddings(
    model_name: Optional[str] = None,
    node_ids: Optional[List[str]] = None
) -> Dict[str, np.ndarray]
Returns: Dict mapping node_id → embedding array
Example:
# All embeddings
all_emb = rag.storage.load_embeddings()
# Specific model
tfidf_emb = rag.storage.load_embeddings(model_name='tfidf')
# Specific nodes
subset = rag.storage.load_embeddings(node_ids=['paper1', 'paper2'])
Embedding Providers¶
TFIDFEmbedding¶
Fast TF-IDF embeddings (no external services required).
from src.embeddings import TFIDFEmbedding
embedder = TFIDFEmbedding(
    max_features: int = 1000,
    ngram_range: Tuple[int, int] = (1, 1)
)
Example:
embedder = TFIDFEmbedding(max_features=512, ngram_range=(1, 2))
# Must fit before use
texts = ["doc 1", "doc 2", "doc 3"]
embedder.fit(texts)
# Embed
embeddings = embedder.embed(texts)
OllamaEmbedding¶
Embeddings from an Ollama service (local or remote).
from src.embeddings import OllamaEmbedding
embedder = OllamaEmbedding(
    host: str = "http://localhost:11434",
    model: str = "nomic-embed-text"
)
Example:
embedder = OllamaEmbedding(
    host='http://192.168.1.100:11434',
    model='nomic-embed-text'
)
# Embed (no fit required)
embeddings = embedder.embed(["text 1", "text 2"])
Custom Embedding Provider¶
Implement the EmbeddingProvider interface. The embed() body below is an illustrative toy (token-hash count vectors); substitute your own logic:
from typing import Any, Dict, List
import numpy as np
from src.embeddings import EmbeddingProvider

class CustomEmbedder(EmbeddingProvider):
    def __init__(self, dimension: int = 128):
        self.dimension = dimension

    def embed(self, texts: List[str]) -> np.ndarray:
        # Your embedding logic; this toy version hashes tokens into count vectors
        vectors = np.zeros((len(texts), self.dimension))
        for i, text in enumerate(texts):
            for token in text.lower().split():
                vectors[i, hash(token) % self.dimension] += 1.0
        return vectors

    def get_dimension(self) -> int:
        return self.dimension

    def get_model_name(self) -> str:
        return "custom"

    def get_model_config(self) -> Dict[str, Any]:
        return {}
Advanced Usage¶
Incremental Updates¶
# Load existing database
rag = NetworkRAG.builder().with_storage('existing.db').build()
# Add new documents
new_docs = get_new_documents()
for doc in new_docs:
    rag.add(doc['content'], **doc['metadata'])
# Rebuild the network to include the new documents
rag.build_network(rebuild=True)
Custom Similarity Thresholds¶
# Separate instances with relaxed and strict edge thresholds
rag_relaxed = (NetworkRAG.builder()
    .with_storage('knowledge.db')
    .with_similarity_threshold(0.5)
    .build())
rag_strict = (NetworkRAG.builder()
    .with_storage('knowledge.db')
    .with_similarity_threshold(0.8)
    .build())
# Use different thresholds for different queries
relaxed_results = rag_relaxed.search("broad query").top(20)
strict_results = rag_strict.search("specific query").top(10)
Network Export¶
import networkx as nx
# Get graph
graph = rag.graph
# Export to various formats
nx.write_gexf(graph, 'network.gexf')
nx.write_graphml(graph, 'network.graphml')
nx.write_edgelist(graph, 'network.edgelist')
Community Analysis¶
# Detect communities
communities = rag.detect_communities()
# Analyze each community
from collections import defaultdict
comm_groups = defaultdict(list)
for node_id, comm_id in communities.items():
    comm_groups[comm_id].append(node_id)

for comm_id, nodes in comm_groups.items():
    print(f"\nCommunity {comm_id}: {len(nodes)} nodes")
    # Auto-tag
    tags = rag.auto_tag_community(comm_id)
    print(f" Keywords: {', '.join(tags)}")
    # Sample nodes
    sample = nodes[:5]
    for node_id in sample:
        node = rag.storage.get_node(node_id)
        print(f" - {node_id}: {node['content_text'][:50]}...")
Similarity Computation¶
# Get embeddings
embeddings_dict = rag.storage.load_embeddings()
# Compute similarity matrix
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
node_ids = list(embeddings_dict.keys())
embeddings = np.array([embeddings_dict[nid] for nid in node_ids])
similarity_matrix = cosine_similarity(embeddings)
# Find most similar pairs
n = len(node_ids)
for i in range(n):
    for j in range(i+1, n):
        sim = similarity_matrix[i, j]
        if sim >= 0.8:
            print(f"{node_ids[i]} <-> {node_ids[j]}: {sim:.3f}")
See Also¶
- Getting Started - Tutorial for beginners
- Core Concepts - Deep dive into structured similarity and network topology
- YAML DSL Reference - Complete YAML configuration reference
- Fluent API Guide - Fluent API patterns