Incremental construction
Usage demonstrations of `SimilaritySearch` with synthetic and real world data
Incremental construction with SearchGraph
By Eric S. Téllez
using SimilaritySearch
For incremental construction we need a database backend that supports incremental insertions. Currently, there are two backends for this: DynamicMatrixDatabase
and VectorDatabase
DynamicMatrixDatabase
: produces matrix like databases (all objects having the same number of components)VectorDatabase
: A generic conainer of objects, objects can be of any kind
const dim = 8
db = DynamicMatrixDatabase(Float32, dim) # or VectorDatabase(Vector{Float32})
dist = L1Distance()
SimilaritySearch.L1Distance()
it can use any distance function described in SimilaritySearch
and Distances.jl
, and in fact any SemiMetric
as described in the later package. The index construction is made as follows
G = SearchGraph(; dist, db, verbose=false)
SimilaritySearch.SearchGraph{SimilaritySearch.L1Distance, SimilaritySearch.DynamicMatrixDatabase{Float32, 8}, SimilaritySearch.AdjacencyLists.AdjacencyList{UInt32}, SimilaritySearch.BeamSearch}
dist: SimilaritySearch.L1Distance SimilaritySearch.L1Distance()
db: SimilaritySearch.DynamicMatrixDatabase{Float32, 8}
adj: SimilaritySearch.AdjacencyLists.AdjacencyList{UInt32}
hints: Array{Int32}((0,)) Int32[]
search_algo: SimilaritySearch.BeamSearch
len: Base.RefValue{Int64}
verbose: Bool false
instead of index!
we can use push_item!
and append_items!
functions
for _ in 1:10^4
push_item!(G, rand(Float32, dim)) # push_item! inserts one item at a time
end
append_items!(G, MatrixDatabase(rand(Float32, dim, 10^4))) # append_items! inserts many items at once
SimilaritySearch.SearchGraph{SimilaritySearch.L1Distance, SimilaritySearch.DynamicMatrixDatabase{Float32, 8}, SimilaritySearch.AdjacencyLists.AdjacencyList{UInt32}, SimilaritySearch.BeamSearch}
dist: SimilaritySearch.L1Distance SimilaritySearch.L1Distance()
db: SimilaritySearch.DynamicMatrixDatabase{Float32, 8}
adj: SimilaritySearch.AdjacencyLists.AdjacencyList{UInt32}
hints: Array{Int32}((100,)) Int32[190, 354, 392, 429, 442, 455, 502, 621, 709, 748 … 3317, 3354, 3357, 3361, 3368, 3381, 3404, 3414, 3415, 3462]
search_algo: SimilaritySearch.BeamSearch
len: Base.RefValue{Int64}
verbose: Bool false
Note that we used a MatrixDatabase
to wrap the matrix to be inserted since it will be copied into the index.
Now we have a populated index
@assert length(G) == 20_000
this will display a lot of information in the console, since as construction advances the hyperparameters of the index are adjusted. The default optimization takes into account both quality and speed, and tries to adjust to take the best of both. See ParetoRecall
in docs.
Once the index is created, the index can solve nearest neighbor queries
Q = MatrixDatabase(rand(dim, 30))
k = 5
I, D = searchbatch(G, Q, k)
D
5×30 Matrix{Float32}:
0.502752 0.52576 0.726595 0.401183 0.464119 0.444866 0.756742 0.456873 0.40771 0.367571 0.370388 0.507813 0.744219 0.876406 0.836829 0.528599 0.545427 0.661519 0.5463 0.399571 0.745745 0.409689 0.729336 0.539553 0.582203 0.724071 0.609153 0.335656 0.638144 0.625584
0.553149 0.684503 0.75752 0.433573 0.591741 0.479094 0.853407 0.499817 0.532571 0.609003 0.475572 0.515589 0.785361 0.906292 0.847161 0.536406 0.766627 0.664324 0.589698 0.461357 0.766175 0.601051 0.785035 0.669979 0.608382 0.724944 0.661835 0.479266 0.784063 0.731361
0.660131 0.695987 0.785675 0.49282 0.649916 0.635248 0.948492 0.767378 0.544482 0.644764 0.503525 0.542498 0.805892 0.922792 0.852931 0.546636 0.789724 0.672152 0.714921 0.542695 0.78998 0.635067 0.78868 0.698604 0.627333 0.738631 0.740231 0.633751 0.806445 0.812788
0.68536 0.70219 0.823099 0.724192 0.685036 0.652608 0.949991 0.781874 0.604717 0.699274 0.608819 0.567185 0.821345 0.930386 0.866786 0.636144 0.815309 0.683346 0.842521 0.626694 0.830637 0.694602 0.829939 0.724968 0.636138 0.878802 0.741749 0.636044 0.867437 0.924791
0.694604 0.707156 0.889462 0.738963 0.704689 0.719938 0.953765 0.782926 0.725796 0.733226 0.649799 0.601782 0.82788 0.956573 0.893695 0.706392 0.852041 0.691399 0.853528 0.65921 0.840649 0.709495 0.843366 0.729257 0.669603 0.893223 0.784865 0.685117 0.899927 0.943416
Dependencies
Status `~/Research/SimilaritySearchDemos/site-src/Project.toml`
⌃ [91a5bcdd] Plots v1.15.2
[053f045d] SimilaritySearch v0.10.8
Info Packages marked with ⌃ have new versions available and may be upgradable.