using SimilaritySearch, PlotlyLight, StatsBase, LinearAlgebra, Markdown, Random, Printf
Working with 2D points
by: Eric S. Téllez
This demonstration uses a 2D example to show how SearchGraph works.
M = randn(Float16, 2, 10^4)
db = MatrixDatabase(M)
dist = Dist.CastF32.SqL2()
Now we can create the index:
- 1: Defines the index and the search context (caches and hyperparameters). In particular, we use a very high-quality build, MinRecall(0.99); high-quality constructions yield faster queries due to the underlying graph structure.
- 2: Runs the actual indexing procedure using the given search context.
- 3: Optimizes the index to trade quality for speed.
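The three annotated steps can be sketched as follows. This is a minimal sketch, not the page's own cell: the `SearchGraph`, `SearchGraphContext`, `OptimizeParameters`, `index!`, and `optimize_index!` names and keyword arguments are assumptions based on the SimilaritySearch.jl documentation and may differ between versions, and the recall target passed to `optimize_index!` (0.9) is an illustrative value.

```julia
using SimilaritySearch

# Same setup as above: 10^4 random 2D points under squared Euclidean distance.
M = randn(Float16, 2, 10^4)
db = MatrixDatabase(M)
dist = Dist.CastF32.SqL2()

# 1: index plus search context; MinRecall(0.99) asks for a high-quality build.
#    (Constructor and keyword names are assumptions, not taken from this page.)
G = SearchGraph(; dist, db)
ctx = SearchGraphContext(hyperparameters_callback = OptimizeParameters(MinRecall(0.99)))

# 2: insert every database element into the graph.
index!(G, ctx)

# 3: re-tune the search hyperparameters; 0.9 is an illustrative recall target.
optimize_index!(G, ctx, MinRecall(0.9))
```

The names `G` and `ctx` match the variables that the query code further down expects.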
The set of queries
We define a small set of queries, some close to the border of the dataset and others in its densest regions.
Q = [Float32[-2, -2], Float32[2, -2], Float32[-2, 0], Float32[-0, 2], Float32[0, 0], Float32[-3, 3], Float32[4, 4], Float32[1, 0.5]]
knns = searchbatch(G, ctx, VectorDatabase(Q), 12)
Note where the queries are located: some fall in low-density regions and others in high-density regions.
data = Config[]
# dataset
push!(data, Config(
    x = view(M, 1, :),
    y = view(M, 2, :),
    mode = "markers",
    marker = (color = "cyan", opacity = 0.3, size = 2, line = (width = 0,)),
    name = "Database",
    showlegend = false
))
# nearest neighbors
for (qID, c) in enumerate(eachcol(knns))
    # keep ids in result order so each point lines up with its distance in DistView(c)
    indices = collect(IdView(c))
    X = M[:, indices]
    # dist is squared L2, so take the square root to report Euclidean distances
    hovertext = ["neighbor of qID=$qID, dist=$(round(sqrt(d), digits=2))" for d in DistView(c)]
    push!(data, Config(;
        x = view(X, 1, :),
        y = view(X, 2, :),
        mode = "markers",
        marker = (color = "blue", opacity = 0.5, size = 4, line = (width = 0,)),
        hovertext,
        showlegend = false
    ))
end
# queries
hovertext = [let
        h = round.(quantile(sqrt.(DistView(c)), 0:0.25:1); digits=2)
        "query qID=$qID, quant=$h"
    end for (qID, c) in enumerate(eachcol(knns))]
push!(data, Config(;
    x = getindex.(Q, 1),
    y = getindex.(Q, 2),
    mode = "markers",
    marker = (color = "red", opacity = 0.7, size = 8, line = (width = 0,)),
    name = "Queries",
    hovertext,
    showlegend = false
))
# layout
layout = Config(
    width = 600,
    height = 600,
    xaxis = (showgrid = false,),
    yaxis = (showgrid = false,),
    hovermode = "closest",
    showlegend = false
)
Plot(data, layout)
Since the queries fall in regions of very different density, the radii of their result sets (the distance from each query to its farthest retrieved neighbor) also vary widely. The next table lists the radius for each query:
| query ID | x | y | radius |
|---|---|---|---|
| 1 | -2.0 | -2.0 | 0.3871 |
| 2 | 2.0 | -2.0 | 0.3142 |
| 3 | -2.0 | 0.0 | 0.1756 |
| 4 | 0.0 | 2.0 | 0.1527 |
| 5 | 0.0 | 0.0 | 0.0594 |
| 6 | -3.0 | 3.0 | 1.3949 |
| 7 | 4.0 | 4.0 | 2.8097 |
| 8 | 1.0 | 0.5 | 0.0862 |
Note that the queries near the center, where the data is densest, have the smallest radii.
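To see why the radius tracks local density, the following sketch computes the exact distance to the 12th nearest neighbor by exhaustive scan, using only Julia's standard library. It does not use SearchGraph: `knn_radius` is a hypothetical helper, the seed is arbitrary, and the data is a fresh random sample, so the values will not match the table exactly.

```julia
using Random, LinearAlgebra

# Distance from query q to its k-th nearest column of X (exact, by exhaustive scan).
function knn_radius(X::AbstractMatrix, q::AbstractVector, k::Integer)
    d = [norm(Float32.(x) .- q) for x in eachcol(X)]
    return partialsort!(d, k)   # k-th smallest distance
end

Random.seed!(0)                 # arbitrary seed, only for reproducibility
X = randn(Float32, 2, 10^4)     # stand-in for the demo's database
center = Float32[0, 0]          # densest region of a standard normal sample
border = Float32[4, 4]          # far tail, where very few points fall

r_center = knn_radius(X, center, 12)
r_border = knn_radius(X, border, 12)
println("radius at (0, 0): ", r_center)
println("radius at (4, 4): ", r_border)
```

An exhaustive scan costs one distance evaluation per database element for each query, which is fine for 10^4 points but is the cost an index such as SearchGraph is meant to avoid on larger collections.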
Environment and dependencies
Julia Version 1.10.11
Commit a2b11907d7b (2026-03-09 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin24.0.0)
  CPU: 8 × Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PROJECT = @.
  JULIA_LOAD_PATH = @:@stdlib
Status `~/Research/SimilaritySearchDemos/Project.toml`
  [aaaa29a8] Clustering v0.15.8
  [944b1d66] CodecZlib v0.7.8
  [a93c6f00] DataFrames v1.8.1
  [c5bfea45] Embeddings v0.4.6
  [f67ccb44] HDF5 v0.17.2
  [b20bd276] InvertedFiles v0.9.2
⌅ [682c06a0] JSON v0.21.4
  [23fbe1c1] Latexify v0.16.10
  [eb30cadb] MLDatasets v0.7.21
  [06eb3307] ManifoldLearning v0.9.0
⌃ [ca7969ec] PlotlyLight v0.11.0
  [91a5bcdd] Plots v1.41.6
  [27ebfcd6] Primes v0.5.7
  [ca7ab67e] SimSearchManifoldLearning v0.4.0
  [053f045d] SimilaritySearch v0.14.3
⌅ [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.8
  [7f6f6c8a] TextSearch v0.20.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`