Solving single queries
by: Eric S. Téllez
This example shows how to solve single queries instead of a batch of them. This is useful in some applications, and seeing how a single query is solved also shows where some memory allocations can be avoided.
using SimilaritySearch
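dim = 8
db = MatrixDatabase(randn(Float32, dim, 10^4))
# Note: this chained assignment also rebinds `db`, so the index below is built over these 100 vectors
queries = db = MatrixDatabase(randn(Float32, dim, 100))
dist = SqL2Distance()
G = SearchGraph(; dist, db)
ctx = SearchGraphContext()
index!(G, ctx)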
Suppose you want to compute the \(k\) nearest neighbors of a query. For this we use the KnnResult structure, which is a priority queue of maximum size \(k\).
for _ in 1:10
    res = KnnResult(3)
    @time search(G, ctx, randn(Float32, dim), res)
    @show minimum(res), maximum(res), argmin(res), argmax(res)
    @show collect(IdView(res))
    @show collect(DistView(res))
end
0.207479 seconds (118.20 k allocations: 7.439 MiB, 99.96% compilation time)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (2.17419f0, 3.781705f0, 0x00000042, 0x00000023)
collect(IdView(res)) = UInt32[0x00000042, 0x0000001c, 0x00000023]
collect(DistView(res)) = Float32[2.17419, 2.7570584, 3.781705]
0.000015 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (2.9567637f0, 8.3428545f0, 0x00000009, 0x00000059)
collect(IdView(res)) = UInt32[0x00000009, 0x00000047, 0x00000059]
collect(DistView(res)) = Float32[2.9567637, 6.4577327, 8.3428545]
0.000006 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (1.8161318f0, 2.363171f0, 0x00000051, 0x00000007)
collect(IdView(res)) = UInt32[0x00000051, 0x00000023, 0x00000007]
collect(DistView(res)) = Float32[1.8161318, 2.3379018, 2.363171]
0.000008 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (1.8514413f0, 3.9120069f0, 0x00000003, 0x0000001f)
collect(IdView(res)) = UInt32[0x00000003, 0x0000003c, 0x0000001f]
collect(DistView(res)) = Float32[1.8514413, 3.377074, 3.9120069]
0.000004 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (3.570466f0, 3.9146912f0, 0x0000002c, 0x00000064)
collect(IdView(res)) = UInt32[0x0000002c, 0x00000058, 0x00000064]
collect(DistView(res)) = Float32[3.570466, 3.8789546, 3.9146912]
0.000004 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (2.2877853f0, 5.390996f0, 0x0000004f, 0x00000013)
collect(IdView(res)) = UInt32[0x0000004f, 0x0000003f, 0x00000013]
collect(DistView(res)) = Float32[2.2877853, 4.867247, 5.390996]
0.000004 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (4.23781f0, 5.603115f0, 0x00000023, 0x00000032)
collect(IdView(res)) = UInt32[0x00000023, 0x0000004f, 0x00000032]
collect(DistView(res)) = Float32[4.23781, 5.4622717, 5.603115]
0.000004 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (4.6007833f0, 5.5480776f0, 0x0000002a, 0x0000001b)
collect(IdView(res)) = UInt32[0x0000002a, 0x00000023, 0x0000001b]
collect(DistView(res)) = Float32[4.6007833, 5.0899153, 5.5480776]
0.000004 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (4.8126698f0, 4.8592696f0, 0x00000024, 0x00000001)
collect(IdView(res)) = UInt32[0x00000024, 0x00000004, 0x00000001]
collect(DistView(res)) = Float32[4.8126698, 4.813215, 4.8592696]
0.000003 seconds (3 allocations: 160 bytes)
(minimum(res), maximum(res), argmin(res), argmax(res)) = (2.365991f0, 4.04924f0, 0x0000002c, 0x00000020)
collect(IdView(res)) = UInt32[0x0000002c, 0x00000003, 0x00000020]
collect(DistView(res)) = Float32[2.365991, 3.51064, 4.04924]
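As a point of comparison for the single-query search above, the following is a minimal brute-force sketch that uses only Base Julia. It is not how SearchGraph solves a query; it scans every column, computes the squared Euclidean distance (assumed here to be what SqL2Distance measures), and keeps the \(k\) smallest. The helper name bruteforce_knn and the tiny dataset are hypothetical and exist only for illustration.

```julia
# Brute-force k nearest neighbors for a single query, Base Julia only.
# `bruteforce_knn` is a hypothetical helper, not part of SimilaritySearch.
# Assumes each column of X is one object and length(q) == size(X, 1).
function bruteforce_knn(X::AbstractMatrix, q::AbstractVector, k::Integer)
    n = size(X, 2)
    dists = Vector{Float32}(undef, n)
    for i in 1:n
        s = 0f0
        for j in eachindex(q)
            d = X[j, i] - q[j]
            s += d * d              # accumulate the squared Euclidean distance
        end
        dists[i] = s
    end
    # indices of the k smallest distances, in increasing order of distance
    ids = collect(partialsortperm(dists, 1:k))
    return ids, dists[ids]
end

X = Float32[0 1 2 3;
            0 0 0 0]                # four 2-dimensional points stored as columns
q = Float32[0.9, 0.0]
ids, ds = bruteforce_knn(X, q, 2)   # the two columns closest to q
```

A scan like this costs one distance evaluation per stored object, which is fine for checking results on small data but is exactly the work an index is meant to avoid.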
KnnResult
This structure is the container for the result, and it also specifies the number of elements to retrieve. As mentioned before, it is a priority queue.
res = KnnResult(4)
push_item!(res, 1, 10)
push_item!(res, 2, 9)
push_item!(res, 3, 8)
push_item!(res, 4, 7)
push_item!(res, 6, 5)  # the queue is already full, so the farthest item (id 1, distance 10) is dropped
@show res
# it also supports removals
@show :popfirst! => popfirst!(res)
push_item!(res, 7, 0.1)
@show :push_item! => res
@show :pop! => pop!(res)
# It can be iterated
@show collect(res)
res = SimilaritySearch.KnnResult(SimilaritySearch.AdjacencyLists.IdWeight[SimilaritySearch.AdjacencyLists.IdWeight(0x00000006, 5.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000004, 7.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000003, 8.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000002, 9.0f0)], 4)
:popfirst! => popfirst!(res) = :popfirst! => SimilaritySearch.AdjacencyLists.IdWeight(0x00000006, 5.0f0)
:push_item! => res = :push_item! => SimilaritySearch.KnnResult(SimilaritySearch.AdjacencyLists.IdWeight[SimilaritySearch.AdjacencyLists.IdWeight(0x00000007, 0.1f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000004, 7.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000003, 8.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000002, 9.0f0)], 4)
:pop! => pop!(res) = :pop! => SimilaritySearch.AdjacencyLists.IdWeight(0x00000002, 9.0f0)
collect(res) = SimilaritySearch.AdjacencyLists.IdWeight[SimilaritySearch.AdjacencyLists.IdWeight(0x00000007, 0.1f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000004, 7.0f0), SimilaritySearch.AdjacencyLists.IdWeight(0x00000003, 8.0f0)]
3-element Vector{IdWeight}:
IdWeight(0x00000007, 0.1f0)
IdWeight(0x00000004, 7.0f0)
IdWeight(0x00000003, 8.0f0)
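To make the bounded priority queue behavior above concrete, here is a toy sketch in plain Julia that reproduces the same sequence of operations. It is not the library's implementation of KnnResult; the names BoundedKnn and push_bounded! are hypothetical, and the container is simply a vector kept sorted by distance, which is an assumption chosen for readability rather than speed.

```julia
# A toy bounded result set: keeps at most `k` (id => dist) pairs, sorted by distance.
# `BoundedKnn` and `push_bounded!` are hypothetical names used only in this sketch.
struct BoundedKnn
    items::Vector{Pair{UInt32,Float32}}
    k::Int
end

BoundedKnn(k::Integer) = BoundedKnn(Pair{UInt32,Float32}[], k)

function push_bounded!(res::BoundedKnn, id::Integer, dist::Real)
    item = UInt32(id) => Float32(dist)
    if length(res.items) < res.k
        push!(res.items, item)          # still room: always accept
    elseif dist < last(res.items).second
        res.items[end] = item           # full: replace the current farthest item
    else
        return false                    # full and not closer than the farthest: dropped
    end
    sort!(res.items; by = last)         # fine for small k; an insertion step would be cheaper
    return true
end

toy = BoundedKnn(4)
for (id, d) in ((1, 10), (2, 9), (3, 8), (4, 7), (6, 5))
    push_bounded!(toy, id, d)           # the fifth push evicts id 1 (distance 10)
end
nearest = popfirst!(toy.items)          # removes the nearest pair
push_bounded!(toy, 7, 0.1)
farthest = pop!(toy.items)              # removes the farthest pair
```

After these steps the toy container holds ids 7, 4, and 3 in increasing order of distance, the same ids and order as the KnnResult output shown above.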
Environment and dependencies
Julia Version 1.10.9
Commit 5595d20a287 (2025-03-10 12:51 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 64 default, 0 interactive, 32 GC (on 64 virtual cores)
Environment:
JULIA_PROJECT = .
JULIA_NUM_THREADS = auto
JULIA_LOAD_PATH = @:@stdlib
Status `~/sites/SimilaritySearchDemos/Project.toml`
[aaaa29a8] Clustering v0.15.8
[944b1d66] CodecZlib v0.7.8
[a93c6f00] DataFrames v1.7.0
[c5bfea45] Embeddings v0.4.6
[f67ccb44] HDF5 v0.17.2
[b20bd276] InvertedFiles v0.8.0 `~/.julia/dev/InvertedFiles`
[682c06a0] JSON v0.21.4
[23fbe1c1] Latexify v0.16.6
[eb30cadb] MLDatasets v0.7.18
[06eb3307] ManifoldLearning v0.9.0
⌃ [ca7969ec] PlotlyLight v0.11.0
[91a5bcdd] Plots v1.40.11
[27ebfcd6] Primes v0.5.7
[ca7ab67e] SimSearchManifoldLearning v0.3.0 `~/.julia/dev/SimSearchManifoldLearning`
[053f045d] SimilaritySearch v0.12.0 `~/.julia/dev/SimilaritySearch`
⌅ [2913bbd2] StatsBase v0.33.21
[f3b207a7] StatsPlots v0.15.7
[7f6f6c8a] TextSearch v0.19.0 `~/.julia/dev/TextSearch`
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`