Usage demonstrations of `SimilaritySearch` with synthetic and real world data
You need a Julia installation https://julialang.org/downloads/, in particular, we used Julia for Jupyter notebooks. Pluto notebooks were created with Julia
1.7. Then you must install
Pluto packages, just run the following commands in the REPL
] add IJulia
] add Pluto
This should be enough to run jupyter and pluto, note that if IJulia has some rules about using a preinstalled Jupyter installation or install a new one. Please check its documentation.
Nearest neighbor search can be computationally expensive and, that is why,
SimilaritySearch has multithreading support. You should want to run jupyter and julia using all available threads, that is
JULIA_NUM_THREADS=auto jupyter-lab .
june 5th, 2023: adds news section, some installation requirements, Jupyter notebookes were updated to work with the
v0.10and with the current julia release
v1.9. I also moved most plots to
Given a finite dataset, where , and a metric distance function working with any pair of elements in , the similarity search problem consists on retrieving similar items to a given query , for example, the most similar items to in ( nearest neighbors).
At first glance, the problem is simple since it can be solved using an exhaustive evaluation of all possible results (that is, for all ) and select those items having the least distance to . This solution is impractical when is large or when the number of expected queries is high. In these cases, it is necessary to create a data structure that preprocess the dataset and reduce the cost of solving queries, it is often called a metric index. When the dataset is pretty large or the metric space is quite complex, sometimes we can loose the ability of retrieving the exact solution to gain speed, clearly, the approximation quality becomes a major concern and these approximate methods require a lot of knowledge to trade speed retrieval process also kept high the solution's quality. Additionally, the amount of memory used by the index and the construction time are also concerns whenever is big.
SearchGraph index in the
SimilaritySearch.jl package is a competitive alternative for solving search queries that automatically tune search speed and quality and also remains very competitive in memory and construction costs. Here we show some demostrations of how using
SimilaritySearch.jl in several synthetic and real problems.