Seqwise - Sequential Image Analysis with Vision Language Models
A simple, cost-free approach to analyzing sequences of images using local Vision Language Models (VLMs). No training required - just ask questions in plain English.
What This Does
Instead of training complex models, we use pre-trained VLMs to answer questions about image sequences:
- “Is there an accident in these images?”
- “Is the camera dirty or blocked?”
- “What changed between the first and last image?”
- “Is this a road, parking lot, or building entrance?”
Quick Start
from seqwise import ImageAnalyzer
# Use a local model (free, no internet required)
analyzer = ImageAnalyzer(model="blip2")
# Process a sequence of images
images = load_your_images() # List of PIL Images or paths
result = analyzer.ask(images, "What is happening in these images?")
print(result)
Installation
# Install from PyPI
pip install seqwise
# Or install from source
git clone https://github.com/yourusername/seqwise
cd seqwise
pip install -e .
# For Ollama support (recommended for easy setup)
brew install ollama # Mac
# or see https://ollama.ai for other platforms
# Start Ollama with a vision model
ollama run llava "describe this image"
Supported Models
1. Ollama (Easiest Setup)
# Runs completely locally, OpenAI-compatible API
analyzer = ImageAnalyzer(model="ollama", model_name="llava")
# Ollama supports several vision models:
# - llava (7B, 13B) - Good balance
# - bakllava - Optimized for speed
# - llava-phi3 - Smaller, faster
2. BLIP-2 (Direct Python)
# No server needed, runs directly in Python
analyzer = ImageAnalyzer(model="blip2")
# Variants:
# - "blip2-opt-2.7b" - Smallest, fastest (6GB RAM)
# - "blip2-flan-t5-xl" - Better reasoning (8GB RAM)
3. OpenAI-Compatible Endpoints
# Works with any OpenAI-compatible API (local or remote)
analyzer = ImageAnalyzer(
    model="openai",
    api_base="http://localhost:11434/v1",  # Ollama endpoint
    api_key="not-needed-for-local"
)
Simple Examples
Basic Question Answering
from PIL import Image
# Single image
analyzer = ImageAnalyzer(model="blip2")
image = Image.open("camera_frame.jpg")
answer = analyzer.ask(image, "Is this indoors or outdoors?")
# Returns: "outdoors"
# Multiple images (temporal analysis)
images = [Image.open(f"frame_{i}.jpg") for i in range(5)]
answer = analyzer.ask(images, "Did anything change between these images?")
# Returns: "Yes, a car moved from the left side to the right side"
Continuous Monitoring
from seqwise import StreamingAnalyzer
monitor = StreamingAnalyzer(
    model="ollama",
    buffer_size=60  # Keep last 60 frames
)
# Process frames as they arrive
for frame in camera_stream():
    monitor.add_frame(frame)
    # Check conditions
    if monitor.check("Is there an accident?"):
        send_alert("Possible accident detected")
    # Periodic comprehensive check
    if monitor.frame_count % 60 == 0:
        status = monitor.analyze([
            "What type of location is this?",
            "Is the camera working properly?",
            "Any safety concerns?"
        ])
        log_status(status)
Multi-Question Analysis
analyzer = ImageAnalyzer(model="blip2")
questions = {
    "location": "Is this a road, parking lot, or building?",
    "occupancy": "Is it empty, sparse, or crowded?",
    "time": "Is it day or night?",
    "weather": "What is the weather condition?"
}
results = analyzer.ask_multiple(image, questions)
# Returns: {
#     "location": "road",
#     "occupancy": "sparse",
#     "time": "day",
#     "weather": "clear"
# }
Performance Guide
| Model | RAM Needed | Speed (CPU) | Speed (GPU) | Quality |
|---|---|---|---|---|
| Ollama llava-7b | 8 GB | 2-5 sec | 0.5 sec | Good |
| BLIP-2 2.7B | 6 GB | 1-3 sec | 0.2 sec | Good |
| Ollama llava-13b | 16 GB | 5-10 sec | 1 sec | Better |
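These numbers vary with hardware, image resolution, and prompt length, so it is worth timing a representative question on your own machine. A minimal sketch using the same ask() call as the Quick Start (the image path is a placeholder):
import time
from PIL import Image
from seqwise import ImageAnalyzer
analyzer = ImageAnalyzer(model="blip2")
image = Image.open("camera_frame.jpg")
analyzer.ask(image, "Is it day or night?")  # Warm-up call so model loading is not timed
start = time.perf_counter()
for _ in range(5):
    analyzer.ask(image, "Is it day or night?")
print(f"Average latency: {(time.perf_counter() - start) / 5:.2f} sec per question")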
Template Specification
Seqwise offers 5 ways to specify classification prompts:
# 1. Stock Templates (Easiest - 10 domains available)
classifier = TemplateClassifier.from_stock_template("traffic")
# 2. Custom Jinja2 (Most Flexible - 30+ computed variables)
classifier = TemplateClassifier(ClassifierConfig(
    prompt_template="Frame {{ frame_index }}: {{ last_classification }}..."
))
# 3. From File (Team Collaboration)
classifier = TemplateClassifier.from_template_file("template.j2")
# 4. Preset Styles (Quick Prototyping)
classifier = TemplateClassifier.from_preset("simple", schema=my_schema)
# 5. System + User (Advanced Control)
classifier = TemplateClassifier(ClassifierConfig(
    system_template="You are an expert...",
    prompt_template="Analyze this frame..."
))
📖 See Template Specification Guide for detailed examples.
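For option 3, the template file itself is plain Jinja2. A minimal sketch that writes and loads such a file, using only the two variables shown in option 2 (the top-level import of TemplateClassifier is assumed here; the full variable list is in the Template Specification Guide):
from pathlib import Path
from seqwise import TemplateClassifier  # import path assumed
# Hypothetical template.j2 using only {{ frame_index }} and {{ last_classification }}
Path("template.j2").write_text(
    "Frame {{ frame_index }} of the camera feed.\n"
    "The previous frame was classified as: {{ last_classification }}\n"
    "Classify this frame and answer with a single label.\n"
)
classifier = TemplateClassifier.from_template_file("template.j2")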
Model Abstraction
All models use the same simple interface:
class ImageAnalyzer:
    def __init__(self, model="blip2", **kwargs):
        """
        model: "blip2", "ollama", or "openai"
        kwargs: model-specific settings
        """
    def ask(self, images, question):
        """Ask a question about image(s)"""
    def check(self, images, condition):
        """Check if a condition is true (returns bool)"""
    def describe(self, images):
        """Get a general description"""
Implementation Structure
seqwise/
├── core.py # Main ImageAnalyzer class
├── models/
│ ├── blip2.py # Direct BLIP-2 implementation
│ ├── ollama.py # Ollama client (OpenAI-compatible)
│ └── openai.py # Generic OpenAI-compatible client
├── streaming.py # StreamingAnalyzer for continuous feeds
└── utils.py # Image preprocessing, batching
Tips for Best Results
1. Question Phrasing
# Good: Specific, answerable
"Is there a vehicle accident?"
"How many people are visible?"
"Is the camera lens dirty?"
# Less effective: Vague
"What do you see?"
"Is everything okay?"
2. CPU Optimization
# Use smaller models on CPU
analyzer = ImageAnalyzer(
    model="blip2",
    variant="opt-2.7b",  # Smallest variant
    dtype="int8"         # Quantized for speed
)
# Process lower resolution
analyzer.set_image_size(512) # Default is 768
# Batch questions together
questions = ["Is it day?", "Any people?", "What type of scene?"]
answers = analyzer.ask_batch(image, questions) # Single forward pass
3. GPU Optimization
# Use larger models on GPU
analyzer = ImageAnalyzer(
    model="ollama",
    model_name="llava-13b",
    device="cuda"
)
# Process multiple frames in parallel
batch_results = analyzer.ask(
    images[:10],  # Process 10 frames at once
    "Describe what happens in this sequence"
)
Complete Example: Security Camera Monitor
from seqwise import ImageAnalyzer, StreamingAnalyzer
from PIL import Image
import time
def monitor_camera(video_source):
    # Initialize analyzer with local model
    analyzer = StreamingAnalyzer(
        model="ollama",
        model_name="llava",
        buffer_size=30  # Keep 5 minutes at 10-second intervals
    )
    # Define what to monitor: (question, check interval in frames)
    checks = {
        "obstruction": ("Is the camera blocked or obstructed?", 1),    # Check every frame
        "accident": ("Is there a vehicle accident?", 5),               # Every 5 frames
        "crowding": ("Is the area becoming crowded?", 10),             # Every 10 frames
        "camera_health": ("Is the image clear and properly lit?", 30)  # Every 30 frames
    }
    frame_count = 0
    for frame in video_source:
        analyzer.add_frame(frame)
        frame_count += 1
        # Run scheduled checks
        for check_name, (question, interval) in checks.items():
            if frame_count % interval == 0:
                result = analyzer.check_recent(question, num_frames=3)
                if check_name == "obstruction" and result:
                    print("⚠️ Camera obstruction detected!")
                elif check_name == "accident" and result:
                    print("🚨 Possible accident detected!")
                    # Get more details
                    details = analyzer.ask_recent(
                        "Describe the accident",
                        num_frames=5
                    )
                    print(f"Details: {details}")
        # Periodic summary
        if frame_count % 100 == 0:
            summary = analyzer.get_summary()
            print(f"Status at frame {frame_count}: {summary}")
if __name__ == "__main__":
    # Example with webcam
    import cv2
    cap = cv2.VideoCapture(0)
    def frame_generator():
        while True:
            ret, frame = cap.read()
            if ret:
                # Convert to PIL Image
                yield Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            time.sleep(10)  # 10-second intervals
    monitor_camera(frame_generator())
Ollama Setup Guide
1. Install Ollama:
# Mac/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Windows
# Download from https://ollama.ai/download
2. Download a vision model:
ollama pull llava:7b # Recommended for most users
# or
ollama pull llava:13b # Better quality, needs more RAM
3. Test it:
# In terminal
ollama run llava "describe an image of a cat"
# In Python
from seqwise import ImageAnalyzer
analyzer = ImageAnalyzer(model="ollama")
result = analyzer.ask(your_image, "What is this?")
FAQ
Q: Which model should I use?
- CPU only: Use BLIP-2 2.7B
- GPU available: Use Ollama with llava-7b or llava-13b
- Limited RAM: Use BLIP-2 with int8 quantization (see the selection sketch below)
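If you want this choice made automatically, a minimal sketch that picks a backend based on whether a CUDA GPU is visible (the device check uses torch, which is not part of seqwise; the constructor options are the ones shown in the optimization sections above):
import torch
from seqwise import ImageAnalyzer
if torch.cuda.is_available():
    # GPU available: larger llava model via Ollama for better answers
    analyzer = ImageAnalyzer(model="ollama", model_name="llava-13b", device="cuda")
else:
    # CPU only: smallest BLIP-2 variant, quantized to keep latency reasonable
    analyzer = ImageAnalyzer(model="blip2", variant="opt-2.7b", dtype="int8")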
Q: How much RAM do I need?
- Minimum: 8GB for small models
- Recommended: 16GB for better models
- With GPU: 8GB VRAM for good performance
Q: Can this run on a Raspberry Pi?
- Yes, with BLIP-2 2.7B quantized, but slowly (~10-30 seconds per image)
- Better: Use a Jetson Nano or similar edge AI device
Q: Is this really free?
- Yes! All models run locally on your hardware
- No API costs, no internet required after download
- Models are open source (MIT, Apache licenses)
License
MIT