Incremental construction with SearchGraph

by: Eric S. Téllez

using SimilaritySearch

For incremental construction we need a database backend that supports incremental insertions. Currently, there are two backends for this: BlockMatrixDatabase and VectorDatabase:

dim = 8
db = BlockMatrixDatabase(dim, Float32) # or VectorDatabase(Vector{Float32}), though we should expect a sparser memory layout
dist = Dist.L1()
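The comment above contrasts the memory layout of the two backends. As a rough illustration of what a contiguous, appendable layout can look like, here is a minimal plain-Julia sketch. `FlatStore` and `append_columns!` are hypothetical names invented for this example; this is not how `BlockMatrixDatabase` is implemented, only one way to store fixed-dimension vectors back to back while still supporting insertions.

```julia
# Hypothetical illustration only; NOT the library's BlockMatrixDatabase.
struct FlatStore
    dim::Int
    data::Vector{Float32}   # all vectors stored back to back in one buffer
end

FlatStore(dim::Integer) = FlatStore(dim, Float32[])

# number of stored vectors
Base.length(s::FlatStore) = length(s.data) ÷ s.dim

# i-th stored vector, as a view into the shared buffer (no copy)
Base.getindex(s::FlatStore, i::Integer) =
    view(s.data, (i - 1) * s.dim + 1 : i * s.dim)

# append every column of X at the end of the buffer (elements are copied)
function append_columns!(s::FlatStore, X::AbstractMatrix)
    size(X, 1) == s.dim || throw(DimensionMismatch("expected $(s.dim) rows"))
    append!(s.data, vec(X))
    return s
end

store = FlatStore(8)
append_columns!(store, rand(Float32, 8, 100))  # first batch
append_columns!(store, rand(Float32, 8, 50))   # second batch, added incrementally
length(store)
```

A vector of separately allocated vectors, by contrast, keeps one heap object per item, which is more flexible but less compact.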

It can use any distance function described in SimilaritySearch and Distances.jl, and in fact any SemiMetric as described in the latter package. The index is constructed as follows:
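To make the semimetric requirement concrete: a semimetric is non-negative, gives zero for identical arguments, and is symmetric, but it does not have to satisfy the triangle inequality. The sketch below checks these properties on plain-Julia functions written only for illustration (`l1` and `sqeuclid` are local helper names, not library functions); it does not show how a custom distance is registered with the index.

```julia
# L1 (Manhattan) distance written out by hand, for illustration only
l1(u, v) = sum(abs(a - b) for (a, b) in zip(u, v))

u, v = rand(Float32, 8), rand(Float32, 8)

nonneg = l1(u, v) >= 0           # non-negativity
ident  = l1(u, u) == 0           # zero distance from a point to itself
symm   = l1(u, v) == l1(v, u)    # symmetry

# Squared Euclidean distance: a semimetric that is not a metric,
# because it can violate the triangle inequality.
sqeuclid(u, v) = sum((a - b)^2 for (a, b) in zip(u, v))

x, y, z = [0.0], [1.0], [2.0]
violates_triangle = sqeuclid(x, z) > sqeuclid(x, y) + sqeuclid(y, z)  # 4 > 1 + 1
```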

G = SearchGraph(; dist, db)
ctx = SearchGraphContext()

Instead of index!, we can use append_items! to index a batch of items:

append_items!(G, ctx, MatrixDatabase(rand(Float32, dim, 10^4))) # append_items! inserts many items at once

Now we have a populated index.

Note that we used a MatrixDatabase to wrap the matrix to be inserted since it will be copied into the index. Whenever we want to avoid copies, we can also use VectorDatabase, and work with references or views, or even any kind of struct.
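The difference between copying and referencing can be seen in plain Julia, independently of the library. The following sketch builds two vector-of-vectors collections from the same matrix, one with copied columns and one with views; whether a given database type keeps such views without copying them is not verified here and should be treated as an assumption.

```julia
M = rand(Float32, 8, 4)   # entries lie in [0, 1)

# each column is copied into its own Vector
copied = [M[:, i] for i in 1:size(M, 2)]

# each column is a view that refers to the memory of M
viewed = [view(M, :, i) for i in 1:size(M, 2)]

M[1, 1] = -1f0            # modify the original matrix

sees_change   = viewed[1][1] == -1f0   # the view reflects the update
keeps_old_val = copied[1][1] != -1f0   # the copy does not
```

Views save memory and time for large collections, at the price that later changes to the underlying matrix also change what the collection holds.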

@assert length(G) == 10^4

The insertion displays a lot of information in the console, since the hyperparameters of the index are adjusted as construction advances.

Once the index is created, it can solve nearest neighbor queries:


Q = MatrixDatabase(rand(dim, 30)) # creates the queries
k = 5 # the number of nearest neighbors to retrieve
knns = searchbatch(G, ctx, Q, k) # solves the queries; returns neighbor identifiers and distances
display((typeof(knns), sizeof(knns)))

(Matrix{IdDist}, 1200)
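To interpret the reported result type and size, it may help to compare with an exhaustive search written in plain Julia. The sketch below is a brute-force reference, not the algorithm used by SearchGraph, which answers queries approximately through a graph. `Neighbor`, `l1`, and `brute_knn` are hypothetical names, and the 32-bit identifier plus 32-bit distance layout is an assumption chosen because it matches the 1200 bytes shown above (5 neighbors × 30 queries × 8 bytes).

```julia
# Hypothetical stand-in for a result entry: 4-byte id + 4-byte distance
struct Neighbor
    id::UInt32
    dist::Float32
end

l1(u, v) = sum(abs(a - b) for (a, b) in zip(u, v))

# Exhaustive k-nearest-neighbor search: one column of results per query
function brute_knn(points::AbstractMatrix, queries::AbstractMatrix, k::Integer)
    n, m = size(points, 2), size(queries, 2)
    out = Matrix{Neighbor}(undef, k, m)
    for j in 1:m
        q = view(queries, :, j)
        d = [l1(view(points, :, i), q) for i in 1:n]  # distance to every point
        best = partialsortperm(d, 1:k)                # indices of the k smallest
        for (r, i) in enumerate(best)
            out[r, j] = Neighbor(i, d[i])
        end
    end
    return out
end

points  = rand(Float32, 8, 1000)
queries = rand(Float32, 8, 30)
res = brute_knn(points, queries, 5)
(size(res), sizeof(res))
```

Each column of `res` lists the neighbors of one query, sorted by increasing distance. This exhaustive scan costs one distance evaluation per stored item and query, which is the cost an index is meant to avoid.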

Environment and dependencies

Julia Version 1.10.11
Commit a2b11907d7b (2026-03-09 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin24.0.0)
  CPU: 8 × Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PROJECT = @.
  JULIA_LOAD_PATH = @:@stdlib
Status `~/Research/SimilaritySearchDemos/Project.toml`
  [aaaa29a8] Clustering v0.15.8
  [944b1d66] CodecZlib v0.7.8
  [5ae59095] Colors v0.13.1
  [a93c6f00] DataFrames v1.8.1
  [c5bfea45] Embeddings v0.4.6
  [f67ccb44] HDF5 v0.17.2
  [916415d5] Images v0.26.2
  [b20bd276] InvertedFiles v0.9.2
 [682c06a0] JSON v0.21.4
  [23fbe1c1] Latexify v0.16.10
  [eb30cadb] MLDatasets v0.7.21
  [06eb3307] ManifoldLearning v0.9.0
 [ca7969ec] PlotlyLight v0.11.0
  [27ebfcd6] Primes v0.5.7
  [ca7ab67e] SimSearchManifoldLearning v0.4.0
  [053f045d] SimilaritySearch v0.14.3
 [2913bbd2] StatsBase v0.33.21
  [7f6f6c8a] TextSearch v0.20.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`