Blog Posts

Tutorial 3: Blog Post Discovery¶

Build a blog discovery system with hierarchical field weighting.

Goal¶

Create a blog search where: - Title is most important (users scan titles first) - Summary captures main ideas - Full content provides depth - Category enables filtering

Configuration¶

schema:
  title:
    type: text
    required: true
  summary:
    type: text
    required: true
  content:
    type: text
    required: true
  category:
    type: text
    required: true
  tags:
    type: list
    default: []

embeddings:
  # Title embedding (most important)
  title_vec:
    field: title
    model: tfidf

  # Summary embedding
  summary_vec:
    field: summary
    model: tfidf

  # Full content embedding (chunked)
  content_vec:
    field: content
    model: tfidf
    chunking:
      method: sentences
      max_tokens: 1024

  # Combined text embedding with title emphasis
  text_vec:
    combine:
      - ref: title_vec
        weight: 0.45
      - ref: summary_vec
        weight: 0.35
      - ref: content_vec
        weight: 0.2

similarities:
  # Semantic text similarity
  text_sim:
    embedding: text_vec

  # Category exact match
  category_sim:
    field: category
    metric: exact

  # Tag overlap
  tag_sim:
    field: tags
    metric: jaccard

  # Combined similarity
  overall:
    combine:
      - ref: text_sim
        weight: 0.9
      - ref: category_sim
        weight: 0.05
      - ref: tag_sim
        weight: 0.05

network:
  edges:
    similarity: overall
    min: 0.4

Implementation¶

rag = (NetworkRAG.builder()
       .with_storage('blog.db')
       .from_config('config/blog.yaml')
       .build())

posts = [
    {
        'id': 'intro-rag',
        'title': 'Introduction to Retrieval-Augmented Generation',
        'summary': 'Learn the basics of RAG systems and how they improve LLM responses',
        'content': 'RAG systems combine retrieval with generation... [long content]',
        'category': 'AI',
        'tags': ['rag', 'llm', 'retrieval']
    },
    # ... more posts
]

for post in posts:
    rag.add(post['id'], document=post)

rag.build_network()

# Search
results = rag.search('retrieval systems for LLMs').top(10)