Working with 2D points

by: Eric S. Téllez

This demonstration uses a 2D example to show how SearchGraph works.

using SimilaritySearch, PlotlyLight, StatsBase, LinearAlgebra, Markdown, Random, Printf
M = randn(Float16, 2, 10^4)
db = MatrixDatabase(M)
dist = Dist.CastF32.SqL2()
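
As a rough illustration of what a distance like this computes, the sketch below is a plain-Julia stand-in, not the library's implementation. It assumes, as the name `CastF32.SqL2` suggests, that each coordinate is cast to Float32 and that the result is the squared Euclidean distance. The function name `sqL2_cast32` is hypothetical.

```julia
# Hypothetical stand-in for a squared Euclidean distance that casts
# each coordinate to Float32 before accumulating (not the library code).
function sqL2_cast32(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || throw(DimensionMismatch("vectors must have equal length"))
    s = 0.0f0
    for i in eachindex(a, b)
        d = Float32(a[i]) - Float32(b[i])
        s += d * d
    end
    s
end

a = Float16[0, 0]
b = Float16[3, 4]
d2 = sqL2_cast32(a, b)   # squared distance
d = sqrt(d2)             # Euclidean distance
```

Because the square root is monotonic, searching with the squared distance yields the same neighbors as the Euclidean distance; taking `sqrt` afterwards only changes the reported values.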

Now we can create the index

G = SearchGraph(; dist, db)  # (1)
ctx = SearchGraphContext(hyperparameters_callback=OptimizeParameters(MinRecall(0.99)))
index!(G, ctx)  # (2)
optimize_index!(G, ctx, MinRecall(0.9))  # (3)

(1) Defines the index and the search context (caches and hyperparameters). Here we request a very high-quality build with MinRecall(0.99); high-quality constructions yield faster queries because of the underlying graph structure.
(2) Runs the actual indexing procedure using the given search context.
(3) Optimizes the index to trade quality for speed.
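
To make the recall targets more concrete, here is a minimal sketch of how recall is commonly measured, assuming that MinRecall refers to the usual definition: the fraction of true nearest neighbors that the approximate search retrieves. The `recall` helper and the identifier lists are illustrative only and are not part of SimilaritySearch.

```julia
# Recall of an approximate result against the exact one:
# the fraction of true neighbors that were retrieved.
recall(exact_ids, approx_ids) = length(intersect(exact_ids, approx_ids)) / length(exact_ids)

exact  = [4, 8, 15, 16, 23, 42, 7, 9, 11, 12]   # made-up exact neighbor ids
approx = [4, 8, 15, 16, 23, 42, 7, 9, 11, 99]   # one true neighbor missed
r = recall(exact, approx)
```

Under this definition, a target of 0.9 tolerates on average about one missed neighbor in ten, in exchange for faster searches.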

The set of queries

We define a small set of queries: some lie close to the border of the dataset and others in its densest regions.

Q = [Float32[-2, -2], Float32[2, -2], Float32[-2, 0], Float32[0, 2], Float32[0, 0], Float32[-3, 3], Float32[4, 4], Float32[1, 0.5]]
knns = searchbatch(G, ctx, VectorDatabase(Q), 12)
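
For a dataset this small, the approximate results can be sanity-checked against an exhaustive scan. The following is a sketch in plain Julia under that assumption; `bruteforce_knn` is a hypothetical helper, not a SimilaritySearch function, and it uses the squared Euclidean distance on Float32 values.

```julia
# Exhaustive k-nearest-neighbor search over the columns of X (a baseline sketch).
function bruteforce_knn(X::AbstractMatrix, q::AbstractVector, k::Integer)
    q32 = Float32.(q)
    # squared Euclidean distance from q to every column of X
    d2 = [sum(abs2, Float32.(view(X, :, j)) .- q32) for j in axes(X, 2)]
    ids = collect(partialsortperm(d2, 1:k))   # indices of the k smallest distances, ascending
    ids, d2[ids]
end

X = Float32[0 1 2 5;
            0 0 0 5]                          # four 2D points stored as columns
ids, d2 = bruteforce_knn(X, Float32[0.9, 0], 2)
```

The scan costs one distance evaluation per database point and query, which is what a graph-based index is meant to avoid on larger collections.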

Note where the queries fall: some in low-density regions and others in high-density regions.


data = Config[]

# dataset
push!(data, Config(
    x = view(M, 1, :),
    y = view(M, 2, :),
    mode = "markers",
    marker = (color = "cyan", opacity = 0.3, size = 2, line = (width = 0,)),
    name = "Database",
    showlegend = false
))

# nearest neighbors
for (qID, c) in enumerate(eachcol(knns))
    indices = collect(IdView(c))  # keep the result order so each point matches its entry in DistView(c)
    X = M[:, indices]
    hovertext = ["neighbor of qID=$qID, dist=$(round(sqrt(d), digits=2))" for d in DistView(c)]

    push!(data, Config(;
        x = view(X, 1, :),
        y = view(X, 2, :),
        mode = "markers",
        marker = (color = "blue", opacity = 0.5, size = 4, line = (width = 0,)),
        hovertext,
        showlegend = false
    ))
end

# queries
hovertext = [let
    h = round.(quantile(sqrt.(DistView(c)), 0:0.25:1); digits=2)
    "query qID=$qID, quant=$h"
    end for (qID, c) in enumerate(eachcol(knns))]

push!(data, Config(;
    x = getindex.(Q, 1),
    y = getindex.(Q, 2),
    mode = "markers",
    marker = (color = "red", opacity = 0.7, size = 8, line = (width = 0,)),
    name = "Queries",
    hovertext,
    showlegend = false
))

# layout
layout = Config(
    width = 600,
    height = 600,
    xaxis = (
        showgrid = false,
    ),
    yaxis = (
        showgrid = false,
    ),
    hovermode = "closest",
    showlegend = false
)

Plot(data, layout)

Since the points are spread over regions of very different density, the radii of the result sets also vary widely. The following table shows the radius for each query.

query ID      x      y   radius
       1   -2.0   -2.0   0.3871
       2    2.0   -2.0   0.3142
       3   -2.0    0.0   0.1756
       4    0.0    2.0   0.1527
       5    0.0    0.0   0.0594
       6   -3.0    3.0   1.3949
       7    4.0    4.0   2.8097
       8    1.0    0.5   0.0862

Note that queries in the central, dense regions have much smaller radii.
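
The link between density and radius can be approximated with a back-of-the-envelope estimate. The sketch below assumes the points follow a 2D standard normal distribution (as produced by randn) and that the density is roughly constant around the query x. The expected number of points within radius r is then about n · f(x) · π r², with f(x) = exp(-|x|²/2) / (2π), and setting this equal to k gives r ≈ sqrt(2k/n · exp(|x|²/2)). The function name `knn_radius_estimate` is hypothetical, and the sketch further assumes that the radius in the table is the Euclidean distance to the 12th neighbor.

```julia
# Rough k-NN radius estimate for n points drawn from a 2D standard normal,
# assuming the density is locally constant around the query x.
knn_radius_estimate(x, k, n) = sqrt(2k / n * exp(sum(abs2, x) / 2))

n, k = 10^4, 12
for x in ([0.0, 0.0], [1.0, 0.5], [-2.0, 0.0], [-2.0, -2.0])
    println(x, " => ", round(knn_radius_estimate(x, k, n), digits=3))
end
```

For these four queries the estimates come out at roughly 0.05, 0.07, 0.13, and 0.36, which follow the same trend as the measured radii in the table. The approximation is not meaningful for the outlying queries (-3, 3) and (4, 4), where the density changes sharply and very few points are nearby.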

Environment and dependencies

Julia Version 1.10.11
Commit a2b11907d7b (2026-03-09 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin24.0.0)
  CPU: 8 × Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PROJECT = @.
  JULIA_LOAD_PATH = @:@stdlib
Status `~/Research/SimilaritySearchDemos/Project.toml`
  [aaaa29a8] Clustering v0.15.8
  [944b1d66] CodecZlib v0.7.8
  [a93c6f00] DataFrames v1.8.1
  [c5bfea45] Embeddings v0.4.6
  [f67ccb44] HDF5 v0.17.2
  [b20bd276] InvertedFiles v0.9.2
 [682c06a0] JSON v0.21.4
  [23fbe1c1] Latexify v0.16.10
  [eb30cadb] MLDatasets v0.7.21
  [06eb3307] ManifoldLearning v0.9.0
 [ca7969ec] PlotlyLight v0.11.0
  [91a5bcdd] Plots v1.41.6
  [27ebfcd6] Primes v0.5.7
  [ca7ab67e] SimSearchManifoldLearning v0.4.0
  [053f045d] SimilaritySearch v0.14.3
 [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.8
  [7f6f6c8a] TextSearch v0.20.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`