Seqwise - Sequential Image Analysis with Vision Language Models

A simple, cost-free approach to analyzing sequences of images using local Vision Language Models (VLMs). No training required - just ask questions in plain English.

What This Does

Instead of training complex models, we use pre-trained VLMs to answer questions about image sequences:

  • “Is there an accident in these images?”
  • “Is the camera dirty or blocked?”
  • “What changed between the first and last image?”
  • “Is this a road, parking lot, or building entrance?”

Quick Start

from seqwise import ImageAnalyzer

# Use a local model (free, no internet required)
analyzer = ImageAnalyzer(model="blip2")

# Process a sequence of images
images = load_your_images()  # List of PIL Images or paths
result = analyzer.ask(images, "What is happening in these images?")
print(result)

Installation

# Install from PyPI
pip install seqwise

# Or install from source
git clone https://github.com/yourusername/seqwise
cd seqwise
pip install -e .

# For Ollama support (recommended for easy setup)
brew install ollama  # Mac
# or see https://ollama.ai for other platforms

# Start Ollama with a vision model
ollama run llava "describe this image"

Supported Models

1. Ollama (Easiest Setup)

# Runs completely locally, OpenAI-compatible API
analyzer = ImageAnalyzer(model="ollama", model_name="llava")

# Ollama supports several vision models:
# - llava (7B, 13B) - Good balance
# - bakllava - Optimized for speed
# - llava-phi3 - Smaller, faster

2. BLIP-2 (Direct Python)

# No server needed, runs directly in Python
analyzer = ImageAnalyzer(model="blip2")

# Variants:
# - "blip2-opt-2.7b" - Smallest, fastest (6GB RAM)
# - "blip2-flan-t5-xl" - Better reasoning (8GB RAM)

3. OpenAI-Compatible Endpoints

# Works with any OpenAI-compatible API (local or remote)
analyzer = ImageAnalyzer(
    model="openai",
    api_base="http://localhost:11434/v1",  # Ollama endpoint
    api_key="not-needed-for-local"
)

Simple Examples

Basic Question Answering

from PIL import Image
from seqwise import ImageAnalyzer

# Single image
analyzer = ImageAnalyzer(model="blip2")
image = Image.open("camera_frame.jpg")
answer = analyzer.ask(image, "Is this indoors or outdoors?")
# Returns: "outdoors"

# Multiple images (temporal analysis)
images = [Image.open(f"frame_{i}.jpg") for i in range(5)]
answer = analyzer.ask(images, "Did anything change between these images?")
# Returns: "Yes, a car moved from the left side to the right side"

Continuous Monitoring

from seqwise import StreamingAnalyzer

monitor = StreamingAnalyzer(
    model="ollama",
    buffer_size=60  # Keep last 60 frames
)

# Process frames as they arrive
for frame in camera_stream():
    monitor.add_frame(frame)
    
    # Check conditions
    if monitor.check("Is there an accident?"):
        send_alert("Possible accident detected")
    
    # Periodic comprehensive check
    if monitor.frame_count % 60 == 0:
        status = monitor.analyze([
            "What type of location is this?",
            "Is the camera working properly?",
            "Any safety concerns?"
        ])
        log_status(status)

Multi-Question Analysis

analyzer = ImageAnalyzer(model="blip2")

questions = {
    "location": "Is this a road, parking lot, or building?",
    "occupancy": "Is it empty, sparse, or crowded?",
    "time": "Is it day or night?",
    "weather": "What is the weather condition?"
}

results = analyzer.ask_multiple(image, questions)
# Returns: {
#     "location": "road",
#     "occupancy": "sparse",
#     "time": "day",
#     "weather": "clear"
# }

Performance Guide

Model               RAM Needed   Speed (CPU)   Speed (GPU)   Quality
Ollama llava-7b     8 GB         2-5 sec       0.5 sec       Good
BLIP-2 2.7B         6 GB         1-3 sec       0.2 sec       Good
Ollama llava-13b    16 GB        5-10 sec      1 sec         Better
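
Actual speed depends heavily on your hardware, so it is worth measuring latency yourself. A minimal timing sketch, assuming an analyzer and a test image as in the examples above (the model loads on the first call, so warm up once before timing):

import time
from PIL import Image
from seqwise import ImageAnalyzer

analyzer = ImageAnalyzer(model="blip2")
image = Image.open("camera_frame.jpg")

analyzer.ask(image, "Is it day or night?")  # Warm-up: loads model weights
start = time.perf_counter()
for _ in range(5):
    analyzer.ask(image, "Is it day or night?")
print(f"Average latency: {(time.perf_counter() - start) / 5:.2f} sec per question")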

Template Specification

Seqwise offers 5 ways to specify classification prompts:

# 1. Stock Templates (Easiest - 10 domains available)
classifier = TemplateClassifier.from_stock_template("traffic")

# 2. Custom Jinja2 (Most Flexible - 30+ computed variables)
classifier = TemplateClassifier(ClassifierConfig(
    prompt_template="Frame {{ frame_index }}: {{ last_classification }}..."
))

# 3. From File (Team Collaboration)
classifier = TemplateClassifier.from_template_file("template.j2")

# 4. Preset Styles (Quick Prototyping)
classifier = TemplateClassifier.from_preset("simple", schema=my_schema)

# 5. System + User (Advanced Control)
classifier = TemplateClassifier(ClassifierConfig(
    system_template="You are an expert...",
    prompt_template="Analyze this frame..."
))

📖 See Template Specification Guide for detailed examples.

Model Abstraction

All models use the same simple interface:

class ImageAnalyzer:
    def __init__(self, model="blip2", **kwargs):
        """
        model: "blip2", "ollama", or "openai"
        kwargs: model-specific settings
        """
    
    def ask(self, images, question):
        """Ask a question about image(s)"""
    
    def check(self, images, condition):
        """Check if a condition is true (returns bool)"""
    
    def describe(self, images):
        """Get a general description"""

Implementation Structure

seqwise/
├── core.py           # Main ImageAnalyzer class
├── models/
│   ├── blip2.py     # Direct BLIP-2 implementation
│   ├── ollama.py    # Ollama client (OpenAI-compatible)
│   └── openai.py    # Generic OpenAI-compatible client
├── streaming.py      # StreamingAnalyzer for continuous feeds
└── utils.py          # Image preprocessing, batching
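
Each backend under models/ wraps a different runtime behind the same ask-style call, so core.py stays model-agnostic. A simplified, hypothetical sketch of what an Ollama-style backend could look like (class name, constructor arguments, and error handling are illustrative, not the library's actual internals; the HTTP call targets Ollama's standard /api/generate endpoint):

import base64
import io
import requests
from PIL import Image

class OllamaBackend:
    """Illustrative backend: sends frames to a local Ollama server."""

    def __init__(self, model_name="llava", host="http://localhost:11434"):
        self.model_name = model_name
        self.host = host

    def ask(self, images, question):
        # Encode each PIL image as base64 PNG, as expected by /api/generate
        encoded = []
        for img in images:
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            encoded.append(base64.b64encode(buf.getvalue()).decode())
        resp = requests.post(
            f"{self.host}/api/generate",
            json={"model": self.model_name, "prompt": question,
                  "images": encoded, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]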

Tips for Best Results

1. Question Phrasing

# Good: Specific, answerable
"Is there a vehicle accident?"
"How many people are visible?"
"Is the camera lens dirty?"

# Less effective: Vague
"What do you see?"
"Is everything okay?"

2. CPU Optimization

# Use smaller models on CPU
analyzer = ImageAnalyzer(
    model="blip2",
    variant="opt-2.7b",  # Smallest variant
    dtype="int8"          # Quantized for speed
)

# Process lower resolution
analyzer.set_image_size(512)  # Default is 768

# Batch questions together
questions = ["Is it day?", "Any people?", "What type of scene?"]
answers = analyzer.ask_batch(image, questions)  # Single forward pass

3. GPU Optimization

# Use larger models on GPU
analyzer = ImageAnalyzer(
    model="ollama",
    model_name="llava-13b",
    device="cuda"
)

# Process multiple frames in parallel
batch_results = analyzer.ask(
    images[:10],  # Process 10 frames at once
    "Describe what happens in this sequence"
)

Complete Example: Security Camera Monitor

from seqwise import ImageAnalyzer, StreamingAnalyzer
from PIL import Image
import time

def monitor_camera(video_source):
    # Initialize analyzer with local model
    analyzer = StreamingAnalyzer(
        model="ollama",
        model_name="llava",
        buffer_size=30  # Keep 5 minutes at 10-second intervals
    )
    
    # Define what to monitor
    checks = {
        "obstruction": ("Is the camera blocked or obstructed?", 1),    # Check every frame
        "accident": ("Is there a vehicle accident?", 5),               # Every 5 frames
        "crowding": ("Is the area becoming crowded?", 10),            # Every 10 frames
        "camera_health": ("Is the image clear and properly lit?", 30) # Every 30 frames
    }
    
    frame_count = 0
    for frame in video_source:
        analyzer.add_frame(frame)
        frame_count += 1
        
        # Run scheduled checks
        for check_name, (question, interval) in checks.items():
            if frame_count % interval == 0:
                result = analyzer.check_recent(question, num_frames=3)
                
                if check_name == "obstruction" and result:
                    print(f"⚠️ Camera obstruction detected!")
                elif check_name == "accident" and result:
                    print(f"🚨 Possible accident detected!")
                    # Get more details
                    details = analyzer.ask_recent(
                        "Describe the accident",
                        num_frames=5
                    )
                    print(f"Details: {details}")
        
        # Periodic summary
        if frame_count % 100 == 0:
            summary = analyzer.get_summary()
            print(f"Status at frame {frame_count}: {summary}")

if __name__ == "__main__":
    # Example with webcam
    import cv2
    cap = cv2.VideoCapture(0)
    
    def frame_generator():
        while True:
            ret, frame = cap.read()
            if not ret:
                break  # Stop if the camera feed ends or fails
            # Convert OpenCV's BGR array to a PIL Image
            yield Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            time.sleep(10)  # 10-second intervals
    
    monitor_camera(frame_generator())

Ollama Setup Guide

  1. Install Ollama:

# Mac/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download from https://ollama.ai/download

  2. Download a vision model:

ollama pull llava:7b   # Recommended for most users
# or
ollama pull llava:13b  # Better quality, needs more RAM

  3. Test it:
# In terminal
ollama run llava "describe an image of a cat"

# In Python
from seqwise import ImageAnalyzer
analyzer = ImageAnalyzer(model="ollama")
result = analyzer.ask(your_image, "What is this?")

FAQ

Q: Which model should I use?

  • CPU only: Use BLIP-2 2.7B
  • GPU available: Use Ollama with llava-7b or llava-13b
  • Limited RAM: Use BLIP-2 with int8 quantization
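
Put together as code, these recommendations look roughly like this (the variant, dtype, model_name, and device options follow the CPU and GPU Optimization sections above):

from seqwise import ImageAnalyzer

# CPU only: smallest BLIP-2 variant
cpu_analyzer = ImageAnalyzer(model="blip2", variant="opt-2.7b")

# GPU available: Ollama with a llava model
gpu_analyzer = ImageAnalyzer(model="ollama", model_name="llava-13b", device="cuda")

# Limited RAM: BLIP-2 quantized to int8
low_ram_analyzer = ImageAnalyzer(model="blip2", variant="opt-2.7b", dtype="int8")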

Q: How much RAM do I need?

  • Minimum: 8GB for small models
  • Recommended: 16GB for better models
  • With GPU: 8GB VRAM for good performance

Q: Can this run on a Raspberry Pi?

  • Yes, with BLIP-2 2.7B quantized, but slowly (~10-30 seconds per image)
  • Better: Use a Jetson Nano or similar edge AI device

Q: Is this really free?

  • Yes! All models run locally on your hardware
  • No API costs, no internet required after download
  • Models are open source (MIT, Apache licenses)

License

MIT
