Tutorial 3: Blog Post Discovery
Build a blog discovery system with hierarchical field weighting.
Goal
Create a blog search where:
- Title is most important (users scan titles first)
- Summary captures main ideas
- Full content provides depth
- Category enables filtering
Configuration
schema:
  title:
    type: text
    required: true
  summary:
    type: text
    required: true
  content:
    type: text
    required: true
  category:
    type: text
    required: true
  tags:
    type: list
    default: []

embeddings:
  # Title embedding (most important)
  title_vec:
    field: title
    model: tfidf
  # Summary embedding
  summary_vec:
    field: summary
    model: tfidf
  # Full content embedding (chunked)
  content_vec:
    field: content
    model: tfidf
    chunking:
      method: sentences
      max_tokens: 1024
  # Combined text embedding with title emphasis
  text_vec:
    combine:
      - ref: title_vec
        weight: 0.45
      - ref: summary_vec
        weight: 0.35
      - ref: content_vec
        weight: 0.2

similarities:
  # Semantic text similarity
  text_sim:
    embedding: text_vec
  # Category exact match
  category_sim:
    field: category
    metric: exact
  # Tag overlap
  tag_sim:
    field: tags
    metric: jaccard
  # Combined similarity
  overall:
    combine:
      - ref: text_sim
        weight: 0.9
      - ref: category_sim
        weight: 0.05
      - ref: tag_sim
        weight: 0.05

network:
  edges:
    similarity: overall
    min: 0.4
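To make the `overall` weights and the edge threshold concrete, the following is a minimal sketch of the arithmetic in plain Python. It assumes that `combine` is a plain weighted sum of its parts and that an edge is created when the score is at least `min`; the source does not confirm either detail. The helper names, the second post, and the `text_sim` value of 0.5 are all made up for illustration.

```python
# Illustrative only: assumes `combine` is a plain weighted sum of its parts.
WEIGHTS = {"text_sim": 0.9, "category_sim": 0.05, "tag_sim": 0.05}
EDGE_MIN = 0.4  # network.edges.min from the configuration


def jaccard(a, b):
    """Jaccard overlap of two tag lists: |A & B| / |A | B| (0.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def exact(a, b):
    """Exact-match similarity: 1.0 when the values are equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def overall(text_sim, post_a, post_b):
    """Weighted sum of the three component similarities."""
    parts = {
        "text_sim": text_sim,
        "category_sim": exact(post_a["category"], post_b["category"]),
        "tag_sim": jaccard(post_a["tags"], post_b["tags"]),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


post_a = {"category": "AI", "tags": ["rag", "llm", "retrieval"]}
post_b = {"category": "AI", "tags": ["rag", "llm", "agents"]}  # hypothetical second post

# text_sim = 0.5 is a made-up value standing in for the embedding similarity.
score = overall(0.5, post_a, post_b)
has_edge = score >= EDGE_MIN  # assumed comparison; the library may use a strict ">"
print(round(score, 3), has_edge)  # prints 0.525 True
```

Under these assumptions, category and tags together contribute at most 0.1 to the score, so two posts need a text similarity of roughly 0.33 or higher to be linked even when their category and tags match exactly.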
Implementation
# Assumes NetworkRAG has already been imported.
# Build the RAG instance from the YAML configuration above
rag = (NetworkRAG.builder()
       .with_storage('blog.db')
       .from_config('config/blog.yaml')
       .build())

posts = [
    {
        'id': 'intro-rag',
        'title': 'Introduction to Retrieval-Augmented Generation',
        'summary': 'Learn the basics of RAG systems and how they improve LLM responses',
        'content': 'RAG systems combine retrieval with generation... [long content]',
        'category': 'AI',
        'tags': ['rag', 'llm', 'retrieval']
    },
    # ... more posts
]

# Add every post, then build the similarity network once
for post in posts:
    rag.add(post['id'], document=post)
rag.build_network()

# Search
results = rag.search('retrieval systems for LLMs').top(10)
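Because the schema marks `title`, `summary`, `content`, and `category` as required and gives `tags` a default of `[]`, it can help to check each post before adding it. The sketch below is a hypothetical pre-flight helper, not part of the library; whether the library performs the same validation itself is not stated in this tutorial. It also treats empty strings as missing, which is a choice made here rather than documented behavior.

```python
# Hypothetical helper, not part of the library: mirrors the `schema` section.
REQUIRED_FIELDS = ("title", "summary", "content", "category")


def prepare_post(post):
    """Return a copy of `post` with `tags` defaulted; raise if a required field is missing."""
    missing = [name for name in REQUIRED_FIELDS if not post.get(name)]
    if missing:
        raise ValueError(f"post {post.get('id')!r} is missing: {', '.join(missing)}")
    prepared = dict(post)  # shallow copy so the caller's dict is left untouched
    prepared.setdefault("tags", [])  # schema default for tags
    return prepared


draft = {
    "id": "draft-post",  # hypothetical post used only for this example
    "title": "Untitled draft",
    "summary": "Work in progress",
    "content": "To be written",
    "category": "AI",
}
print(prepare_post(draft)["tags"])  # prints []
```

In the loop above, this would be used by passing `prepare_post(post)` as the document instead of `post`.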