active library

chatgpt-complex-net

Cognitive MRI of AI Conversations: Network analysis of ChatGPT conversation logs using semantic embeddings to reveal knowledge topology, community structure, and cross-domain bridges

Started 2025 Python

Resources & Distribution

Source Code

Package Registries

ChatGPT Complex Networks

A toolkit for generating, analyzing, and visualizing complex networks from conversational data. This project provides tools for embedding conversations, generating similarity-based edges, and performing network analysis on conversation graphs.

Features

  • Plugin-Based Data Integration:

    • Extensible integration system for various data sources
    • Built-in support for chat logs (JSON, Markdown), bookmarks, playlists
    • Easy to add custom integrations
  • Flexible Embedding Generation:

    • Multiple LLM providers: Ollama (local), OpenAI, Anthropic/Voyage, HuggingFace, Cohere
    • Memory-efficient TF-IDF for large datasets
    • Advanced chunking strategies (sliding window, sentence, paragraph)
    • Role-based weighting for conversations
    • Configurable aggregation methods (mean, weighted, PCA, max-pool)
  • Graph Generation:

    • Generate edges based on cosine similarity between node embeddings
    • GPU-accelerated edge generation for large networks
    • Filter edges using similarity cutoffs
    • Normalize and rescale edge weights
  • Network Analysis:

    • Compute comprehensive graph statistics (centrality, clustering, etc.)
    • Export to common graph formats (GEXF, GraphML, GML)
    • Analyze conversation connectivity and important nodes
  • Recommendation System:

    • Find semantically similar conversations
    • Interactive REPL for exploring the conversation graph
    • Adjustable weighting between similarity and centrality

Installation

# Clone the repository
git clone https://github.com/queelius/llm-semantic-net.git
cd llm-semantic-net

# Install as package
pip install -e .

# For development (includes testing tools)
pip install -e ".[dev]"
# Or:
pip install -r requirements-dev.txt

After installation, the following commands become available:

  • semnet - Main CLI for integration-based semantic network generation
  • semnet-rec - Recommendation system

Usage

Quick Start

# List available integrations
semnet list

# Import your data (e.g., JSON chat logs)
semnet import --integration chatlog.json \
    --source ./data/conversations --output-dir ./data/imported

# Generate embeddings (with separate storage)
semnet embed --input-dir ./data/imported \
    --output-dir ./data/embeddings \
    --integration chatlog --method llm

# Create edges and export
semnet edges --input-dir ./data/embeddings \
    --output-file edges.json --threshold 0.7
    
semnet export --nodes-dir ./data/imported \
    --edges-file edges.json --format gexf --output-file graph.gexf

Advanced Embedding Options

# Use OpenAI embeddings
semnet embed --input-dir ./data/imported \
    --output-dir ./data/embeddings \
    --integration chatlog --method llm \
    --llm-provider openai \
    --llm-config '{"model": "text-embedding-3-large"}'

# Chunked embeddings with role weighting
semnet embed --input-dir ./data/imported \
    --output-dir ./data/embeddings \
    --integration chatlog --method llm \
    --chunk-size 1024 --chunk-overlap 100 \
    --aggregation-method weighted_mean \
    --role-weights '{"user": 1.5, "assistant": 1.0}'

GPU-Accelerated Edge Generation

For large datasets, you can use GPU acceleration for edge generation:

# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then use the GPU edge generation module
python graph/gpu-edge.py --input-dir ./data/embeddings \
    --output-file edges_gpu.json

Recommendation System

# Start interactive recommendation shell
python rec-conv.py --nodes-dir ./data/embeddings_json \
    --csv nodes.csv --repl

# One-shot recommendation
python rec-conv.py --nodes-dir ./data/embeddings_json \
    --csv nodes.csv --recommend new_conv.json --topk 8

Project Structure

├── cli.py                     # Main CLI with integration support
├── rec_conv.py                # Recommendation system
├── integrations/             # Plugin-based data integrations
   ├── base.py              # Abstract base classes
   ├── chatlog/             # Chat log integrations
   ├── bookmarks/           # Bookmark integrations
   └── playlist/            # Playlist integrations
├── embedding/                # Embedding generation
   ├── llm_providers.py     # Multiple LLM provider support
   ├── chunking.py          # Advanced chunking strategies
   ├── tfidf_memory_efficient.py  # Memory-efficient TF-IDF
   └── llm_embedding_model.py     # Legacy Ollama support
├── graph/                    # Graph processing
   ├── edge_utils.py        # Edge generation
   ├── export_utils.py      # Export to various formats
   └── gpu-edge.py          # GPU acceleration
├── tests/                    # Test suite
   ├── test_core/           # Core functionality tests
   ├── test_embedding/      # Embedding tests
   ├── test_integrations/   # Integration tests
   └── conftest.py          # Shared fixtures
└── config.example.yaml       # Configuration examples

Testing

# Run all tests
pytest

# Run with coverage report
pytest --cov

# Run specific test categories
pytest -m unit           # Unit tests only
pytest -m integration    # Integration tests only

# Generate HTML coverage report
pytest --cov --cov-report=html

Configuration

Environment Variables

Set these environment variables for different LLM providers:

# Ollama (local)
export OLLAMA_HOST="http://localhost:11434"
export OLLAMA_MODEL="nomic-embed-text"

# OpenAI
export OPENAI_API_KEY="sk-..."

# Other providers
export VOYAGE_API_KEY="..."        # Anthropic/Voyage
export HUGGINGFACE_API_KEY="..."
export COHERE_API_KEY="..."

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this tool in your research, please cite:

@software{chatgpt-complex-net,
  author = {Alex Towell},
  title = {ChatGPT Complex Networks},
  year = {2025},
  url = {https://github.com/queelius/chatgpt-complex-net}
}

Discussion