Vectorized Feature Sets

Deterministic encodings for sets of structured features

---
1. Motivation

Vector indices have become a valuable tool for AI researchers and developers, but they are rarely presented as a tool for use outside that context. It is generally assumed that a vector index stores values produced by machine learning: trained data-to-vector models that group instances by semantic relevance. These architectures tend to involve unstructured data and query results whose accuracy is approximate rather than deterministic.

This document explores the use of vector indices for structured data and deterministic querying.
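To make the contrast concrete, here is a minimal sketch of deterministic querying over structured data. It assumes the simplest possible encoding: each feature in a hypothetical fixed vocabulary (`VOCAB`) owns one dimension, and a feature set becomes a normalized binary vector. This is a deliberately naive stand-in, not the encoding this document builds toward, and `encode`, `cosine`, and `search` are illustrative names. Because no trained model is involved and the search is exhaustive, the same query always produces the same ranking.

```python
import math

# Hypothetical fixed vocabulary of structured features; each feature owns one dimension.
VOCAB = ["color:red", "color:blue", "size:small", "size:large", "shape:round"]
INDEX = {f: i for i, f in enumerate(VOCAB)}

def encode(features):
    """Encode a feature set as a unit-length binary vector (no model, so fully deterministic)."""
    vec = [0.0] * len(VOCAB)
    for f in features:
        vec[INDEX[f]] = 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    # Both vectors are already unit length, so their dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def search(index, query, k=1):
    """Exact brute-force search: score every stored vector and return the top-k keys."""
    q = encode(query)
    scored = sorted(((cosine(q, v), key) for key, v in index.items()), reverse=True)
    return [key for _, key in scored[:k]]

# Usage: two stored items, queried by a partial feature set.
items = {
    "apple": encode({"color:red", "size:small", "shape:round"}),
    "ball": encode({"color:blue", "size:large", "shape:round"}),
}
print(search(items, {"color:red", "shape:round"}))  # ['apple']
```

The one-dimension-per-feature layout keeps the example transparent, but its vector length grows with the vocabulary.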

1.1 Tone and Interactivity

This document is designed to be understood by any technically capable person. It is not a rigorous proof or a formal whitepaper; it is plain English with interactive examples.

This document was written for the web, but it is also printer-friendly; the printed version omits the interactive examples.

2. Background
2.1 Vectors
2.2 Vector Indices
2.3 Similarity Search
2.4 Features and Feature Sets
3. Encoding Basis
3.1 Waveforms
3.3 Chirps
3.4 Wavelets
4. Basis Composition
4.1 Algebraic Composition
4.2 Optimizing Feature Overlap
4.3 Partial Feature Overlap
4.4 Embedding Digital Data
5. Search Algorithms
5.1 Cosine Similarity
5.2 Dot Product
5.3 Denormalized Dot Product
5.4 Multi-Resolution Filtering/Masking
5.5 Dynamic-Range-Aware Similarity
6. Clustering
6.1 Classic Clustering Methods
6.2 Mediant Clustering
6.3 Cluster-Aware Search