
Vector embeddings · Semantic similarity

Embedding Galaxy

Click a concept; watch its semantic neighbors light up. Toggle cosine vs euclidean and see why cosine is the usual choice for comparing text embeddings.

Modern AI represents meaning as position in a high-dimensional space: text, images, and audio all become vectors of numbers, and semantically related things land near each other. This demo plots 36 CS concepts on a 2D map we hand-placed by meaning. Click any concept to highlight its three nearest neighbors by cosine similarity; toggle to euclidean distance and watch the ranking shift. The distance math you're running here is the same math a vector database (Pinecone, Weaviate, pgvector) runs at query time, just in 2 dimensions instead of 1,536. At scale, those systems usually search an approximate index (such as HNSW) rather than comparing the query against every stored vector.
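Here is a minimal sketch of what a click does, assuming a handful of hypothetical 2D positions (they are not the demo's real coordinates, and `POSITIONS` and `nearest_neighbors` are illustrative names, not the demo's code). It ranks every other concept by cosine similarity or by euclidean distance and keeps the top three.

```python
import math

# Hypothetical 2D positions for a few concepts (NOT the demo's actual map).
POSITIONS = {
    "hash table": (1.0, 1.0),
    "binary search tree": (4.0, 4.0),
    "linked list": (1.2, 0.6),
    "heap": (0.5, 1.5),
    "graph": (3.0, -1.0),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.dist(a, b)

def nearest_neighbors(query, k=3, metric="cosine"):
    """Return the k concepts closest to `query` under the chosen metric."""
    q = POSITIONS[query]
    others = [name for name in POSITIONS if name != query]
    if metric == "cosine":
        # Higher similarity means closer, so sort in descending order.
        return sorted(others, key=lambda n: cosine_similarity(q, POSITIONS[n]), reverse=True)[:k]
    # Smaller distance means closer, so sort in ascending order.
    return sorted(others, key=lambda n: euclidean_distance(q, POSITIONS[n]))[:k]

print(nearest_neighbors("hash table", metric="cosine"))
# ['binary search tree', 'linked list', 'heap']
print(nearest_neighbors("hash table", metric="euclidean"))
# ['linked list', 'heap', 'graph']
```

The point placed at (4, 4) points exactly the same way as (1, 1), so cosine ranks it first, while euclidean ranks it last because it sits far from the origin. The sort compares the query against every other point (exact, brute-force search), which is cheap at 36 points.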

What’s happening under the hood

  • Real embeddings come from neural networks (BERT, OpenAI's text-embedding-3, sentence-transformers) pre-trained on massive text corpora and typically fine-tuned with objectives, often contrastive ones, that pull semantically related text together and push unrelated text apart. The resulting vectors usually have a few hundred to a few thousand dimensions (768 for BERT-base, 1,536 for text-embedding-3-small). Models don't 'understand' meaning; they learn a geometry in which proximity correlates with meaning.
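  • Cosine similarity measures the angle between two vectors and ignores their lengths; euclidean distance measures how far apart the two points are, so length counts. In embedding spaces, direction carries most of the meaning: two vectors that point the same way but differ in length (as the raw term-count vectors of a long and a short document on the same topic do) get a perfect cosine score, while euclidean distance can place them far apart. Many embedding models, OpenAI's among them, return unit-length vectors, and for those the two metrics produce the same ranking; they only diverge when lengths vary, as on this demo's hand-placed map. That is why cosine (or dot product on normalized vectors) is a common default in vector search, although most engines also offer euclidean.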
  • “Semantic space” means a coordinate system where geometric operations carry meaning. The famous example comes from word2vec word vectors: `vector('king') - vector('man') + vector('woman') ≈ vector('queen')`, where ≈ means queen is the word nearest to the result once the three input words are excluded. The model never stored these analogies; they emerged from the training objective. Nearest-neighbor lookup over such a space is what powers retrieval-augmented generation, image search, recommendation systems, and the “related lessons” surface DURA may eventually ship.
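The cosine-versus-euclidean point can be made exact with one line of algebra. Writing the cosine definition next to the expanded squared euclidean distance for vectors a and b:

```latex
\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\qquad
\lVert\mathbf{a}-\mathbf{b}\rVert^{2} = \lVert\mathbf{a}\rVert^{2} + \lVert\mathbf{b}\rVert^{2} - 2\,\mathbf{a}\cdot\mathbf{b}

\text{If } \lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert = 1:\quad
\lVert\mathbf{a}-\mathbf{b}\rVert^{2} = 2 - 2\cos(\mathbf{a},\mathbf{b})
```

For unit vectors, a smaller euclidean distance always means a higher cosine, so the two metrics rank neighbors identically. Only when vector lengths differ can the rankings disagree.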
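A quick numeric check of the same idea, using made-up vectors (`short_doc`, `long_doc`, and the `doc_*` labels are hypothetical): scaling a vector leaves its cosine similarity unchanged but pushes it far away in euclidean terms, and once everything is normalized to unit length the two metrics agree on the ranking.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    length = math.hypot(*v)
    return tuple(x / length for x in v)

# Hypothetical vectors: same direction, five times the magnitude.
short_doc = (1.0, 2.0)
long_doc = (5.0, 10.0)
print(cosine_similarity(short_doc, long_doc))  # ~1.0: identical direction
print(math.dist(short_doc, long_doc))          # ~8.94: far apart by euclidean

# After normalization, cosine and euclidean rank candidates identically.
query = normalize((1.0, 0.2))
candidates = {
    name: normalize(v)
    for name, v in {"doc_a": (3.0, 1.0), "doc_b": (0.1, 2.0),
                    "doc_c": (-1.0, 0.5), "doc_d": (2.0, -0.5)}.items()
}
by_cosine = sorted(candidates, key=lambda n: cosine_similarity(query, candidates[n]), reverse=True)
by_euclid = sorted(candidates, key=lambda n: math.dist(query, candidates[n]))
print(by_cosine)  # ['doc_a', 'doc_d', 'doc_b', 'doc_c']
print(by_euclid)  # ['doc_a', 'doc_d', 'doc_b', 'doc_c']
```

This is why normalizing embeddings up front (or using a model that already returns unit vectors) makes the choice of metric mostly a matter of speed rather than results.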
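The analogy arithmetic can be sketched with tiny hand-made vectors. These three-dimensional values are invented for illustration only (real word2vec vectors are learned, have hundreds of dimensions, and do not always produce such a clean answer), and `VECTORS` and `analogy` are hypothetical names. Following the usual convention for analogy queries, the three input words are excluded from the candidates.

```python
import math

# Toy, hand-made "embeddings" (NOT learned). The second coordinate loosely
# plays the role of a gender direction; the others are arbitrary.
VECTORS = {
    "king":   (0.90, 0.80, 0.10),
    "queen":  (0.85, -0.75, 0.15),
    "man":    (0.10, 0.80, 0.20),
    "woman":  (0.10, -0.80, 0.20),
    "castle": (0.70, 0.00, 0.60),
    "apple":  (0.00, 0.00, 0.90),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(start, minus, plus):
    """Return the word nearest (by cosine) to vector(start) - vector(minus) + vector(plus),
    skipping the three input words themselves."""
    target = tuple(s - m + p for s, m, p in zip(VECTORS[start], VECTORS[minus], VECTORS[plus]))
    candidates = [w for w in VECTORS if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine_similarity(target, VECTORS[w]))

print(analogy("king", "man", "woman"))  # queen
```

The result is a nearest-neighbor lookup around the computed point, which is exactly the operation the demo performs on a click.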

Dig deeper

Phase 6 · AI/ML Engineering

The concept you just explored is covered in full depth in the formal DURA curriculum.