SimilaritySearch and ManifoldLearning
SimSearchManifoldLearning.ManifoldKnnIndex — Type
ManifoldKnnIndex{DistType,MinRecall}
Implements the ManifoldLearning.AbstractNearestNeighbors interface to interoperate with the non-linear dimensionality reduction methods of the ManifoldLearning package.
It should be passed to the fit method as a type, e.g.,
    fit(ManifoldKnnIndex{L2Distance,0.9}) # will use an approximate index with an expected recall of 0.9
DistType can be any distance defined in the SimilaritySearch package or in Distances.jl, or any other type that follows the SemiMetric interface.
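As a minimal sketch of passing the index type to a ManifoldLearning method, the snippet below assumes that ManifoldLearning's fit accepts the neighbor-search type through an nntype keyword and that L2Distance is exported by SimilaritySearch. The random data and the k and maxoutdim values are only illustrative.

```julia
using ManifoldLearning, SimilaritySearch, SimSearchManifoldLearning

# Illustrative data: 500 points in 3 dimensions, one point per column.
X = rand(3, 500)

# Pass the index *type* (not an instance); `nntype` is the assumed keyword.
M = fit(Isomap, X; k=12, maxoutdim=2, nntype=ManifoldKnnIndex{L2Distance,0.9})

# Low-dimensional embedding with one column per embedded point.
Y = predict(M)
```

Changing only the second type parameter (for example to 1) would switch the same call to another index type without touching the rest of the code.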
The second argument of the composite type indicates the quality and therefore the type of index to use:
- It takes values between 0 and 1.
- 0: uses a SearchGraph index with ParetoRecall optimization for construction and searching; this tries to achieve a competitive structure in both quality and search speed.
- 1: uses an ExhaustiveSearch index, which computes the exact solution (exact knns) at the cost of speed. It can work well on small datasets and with very high dimensionality, since really high dimensions suffer from the curse of dimensionality, so that an index like SearchGraph degrades to ExhaustiveSearch.
- 0 < value < 1: uses a SearchGraph, and the value is the minimum recall score that the index should achieve. In particular, it constructs the index using ParetoRecall and then applies a final optimization with MinRecall. Small values produce faster searches with lower quality, and high values produce slower searches with higher quality. Values of 0.8 or 0.9 should work well.
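To make the recall score concrete, here is a small sketch of how the recall of one approximate k-nearest-neighbor result can be measured against the exact result. The helper knn_recall and the identifier lists are hypothetical and are not part of the package.

```julia
# Recall of an approximate kNN result: the fraction of the true neighbors
# that the approximate search also returned.
function knn_recall(exact_ids::AbstractVector{<:Integer},
                    approx_ids::AbstractVector{<:Integer})
    length(intersect(exact_ids, approx_ids)) / length(exact_ids)
end

exact  = [3, 7, 12, 25, 31, 40, 44, 58, 61, 90]  # hypothetical exact 10-NN ids
approx = [3, 7, 12, 25, 31, 40, 44, 58, 61, 77]  # hypothetical approximate ids

knn_recall(exact, approx)  # 0.9: nine of the ten true neighbors were found
```

A MinRecall value of 0.9 therefore asks for an index that, on average, misses about one true neighbor out of ten.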
Note: The minimum performance is evaluated on a small training set taken from the database. This can lead to some overfitting of the parameters and, therefore, to lower quality on an unseen query set. If you notice this effect, please see the optimize! function in the SimilaritySearch documentation.
The distance functions are defined to work through the evaluate(::SemiMetric, u, v) function (borrowed from the Distances.jl package).
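The following sketch shows what a user-defined distance under that interface might look like. It relies only on Distances.jl and assumes that the SemiMetric and evaluate used by SimilaritySearch are the ones from that package, as the text above states. The MyCanberra type is hypothetical, and it is kept field-less for simplicity.

```julia
import Distances: SemiMetric, evaluate

# Hypothetical distance type, used only for illustration.
struct MyCanberra <: SemiMetric end

# Canberra-style distance: sum of |u_i - v_i| / (|u_i| + |v_i|),
# skipping coordinates where both entries are zero.
function evaluate(::MyCanberra, u, v)
    s = 0.0
    for i in eachindex(u, v)
        den = abs(u[i]) + abs(v[i])
        den > 0 && (s += abs(u[i] - v[i]) / den)
    end
    s
end

evaluate(MyCanberra(), [1.0, 3.0], [3.0, 1.0])  # 2/4 + 2/4 = 1.0
```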
KNN predefined types
SimSearchManifoldLearning.ExactEuclidean — Type
ExactEuclidean
ManifoldKnnIndex's type specialization for exact search with the Euclidean distance.
SimSearchManifoldLearning.ExactManhattan — Type
ExactManhattan
ManifoldKnnIndex's type specialization for exact search with the Manhattan distance.
SimSearchManifoldLearning.ExactChebyshev — Type
ExactChebyshev
ManifoldKnnIndex's type specialization for exact search with the Chebyshev distance.
SimSearchManifoldLearning.ExactCosine — Type
ExactCosine
ManifoldKnnIndex's type specialization for exact search with the cosine distance.
SimSearchManifoldLearning.ExactAngle — Type
ExactAngle
ManifoldKnnIndex's type specialization for exact search with the angle distance.
SimSearchManifoldLearning.ApproxEuclidean — Type
ApproxEuclidean
ManifoldKnnIndex's type specialization for approximate search with the Euclidean distance (expected recall of 0.9).
SimSearchManifoldLearning.ApproxManhattan — Type
ApproxManhattan
ManifoldKnnIndex's type specialization for approximate search with the Manhattan distance (expected recall of 0.9).
SimSearchManifoldLearning.ApproxChebyshev — Type
ApproxChebyshev
ManifoldKnnIndex's type specialization for approximate search with the Chebyshev distance (expected recall of 0.9).
SimSearchManifoldLearning.ApproxCosine — Type
ApproxCosine
ManifoldKnnIndex's type specialization for approximate search with the cosine distance (expected recall of 0.9).
SimSearchManifoldLearning.ApproxAngle — Type
ApproxAngle
ManifoldKnnIndex's type specialization for approximate search with the angle distance (expected recall of 0.9).
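A hedged usage sketch of the predefined types, again assuming that ManifoldLearning's fit takes the neighbor-search type through an nntype keyword. The data and parameter values are illustrative only.

```julia
using ManifoldLearning
using SimSearchManifoldLearning: ExactEuclidean, ApproxEuclidean

X = rand(3, 300)  # illustrative data, one point per column

# Exact neighbors: slower, but no recall loss.
Y_exact = predict(fit(Isomap, X; k=10, maxoutdim=2, nntype=ExactEuclidean))

# Approximate neighbors with an expected recall of 0.9.
Y_approx = predict(fit(Isomap, X; k=10, maxoutdim=2, nntype=ApproxEuclidean))
```

On a dataset this small the exact variant is likely the better choice. The approximate types are aimed at larger collections, where exhaustive search becomes the bottleneck.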