SimilaritySearch and ManifoldLearning

SimSearchManifoldLearning.ManifoldKnnIndex (Type)
ManifoldKnnIndex{DistType,MinRecall}

Implements the ManifoldLearning.AbstractNearestNeighbors interface to interoperate with the non-linear dimensionality reduction methods of the ManifoldLearning package.

It should be passed to the fit method as a type, e.g.,

fit(ManifoldKnnIndex{L2Distance,0.9})  # will use an approximate index with an expected recall of 0.9

DistType can be any distance from the SimilaritySearch package or from Distances.jl, or any other type that follows the SemiMetric interface.

The second type parameter (MinRecall) indicates the desired quality and, therefore, the type of index to use:

  • It takes values between 0 and 1.
  • 0 selects a SearchGraph index that uses ParetoRecall optimization for both construction and searching; this tries to achieve a structure that is competitive in both quality and search speed.
  • 1 selects an ExhaustiveSearch index, which computes the exact solution (the exact k nearest neighbors) at the cost of speed. It can work well on small datasets and on very high-dimensional data: very high dimensions suffer from the curse of dimensionality, to the point that an index like SearchGraph degrades to the performance of ExhaustiveSearch.
  • 0 < value < 1 selects a SearchGraph index, and the value is the minimum recall that the index should achieve. The index is constructed using ParetoRecall and then receives a final optimization with MinRecall. Small values produce faster searches with lower quality, and high values produce slower searches with higher quality. Values of 0.8 or 0.9 should work well.
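
The selection rule in the list above can be sketched as a dispatch on the value carried in a type parameter. This is only an illustration of the rule as described here: RecallTag and index_kind are hypothetical names introduced for the sketch, not part of SimSearchManifoldLearning, and the returned symbols merely label the index described in each bullet.

```julia
# Hypothetical stand-in type: it only carries the recall value as a type parameter.
struct RecallTag{MinRecall} end

# Map the recall value to a label for the kind of index described in the list.
function index_kind(::Type{RecallTag{R}}) where {R}
    0 <= R <= 1 || throw(ArgumentError("MinRecall must be in [0, 1], got $R"))
    R == 0 && return :searchgraph_pareto_recall   # ParetoRecall for construction and search
    R == 1 && return :exhaustive_search           # exact k nearest neighbors
    return :searchgraph_min_recall                # ParetoRecall, then a final MinRecall optimization
end

index_kind(RecallTag{0})    # :searchgraph_pareto_recall
index_kind(RecallTag{0.9})  # :searchgraph_min_recall
index_kind(RecallTag{1})    # :exhaustive_search
```

Because the value lives in the type, the choice is made by the type that is passed around and no index instance has to be built beforehand.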

Note: The minimum recall is evaluated on a small training set taken from the database. This can overfit the parameters to that sample, so the index may perform worse on an unseen query set. If you notice this effect, see the documentation of the optimize! function in SimilaritySearch.
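
One way to check for this effect is to compare recall on the sample used for tuning with recall on queries held out from it. The sketch below uses a common definition of recall (the fraction of exact neighbors that the approximate result recovers). The helper names knn_recall and mean_recall are hypothetical, and the neighbor identifiers are hand-written toy values, not measurements from any index.

```julia
# Fraction of the exact k nearest neighbors that the approximate result recovered.
function knn_recall(exact_ids, approx_ids)
    isempty(exact_ids) && throw(ArgumentError("exact_ids must not be empty"))
    length(intersect(exact_ids, approx_ids)) / length(exact_ids)
end

# Average recall over a set of queries (one id list per query).
mean_recall(exact, approx) =
    sum(knn_recall(e, a) for (e, a) in zip(exact, approx)) / length(exact)

# Toy identifiers: the tuning sample is answered perfectly...
train_exact  = [[1, 2, 3], [4, 5, 6]]
train_approx = [[1, 2, 3], [4, 5, 6]]

# ...while held-out queries miss some of their true neighbors.
test_exact  = [[7, 8, 9], [10, 11, 12]]
test_approx = [[7, 8, 1], [10, 2, 3]]

mean_recall(train_exact, train_approx)  # 1.0
mean_recall(test_exact, test_approx)    # 0.5
```

A large gap between the two numbers suggests that the search parameters were tuned too closely to the training sample.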


The distance functions are defined through the evaluate(::SemiMetric, u, v) function (borrowed from the Distances.jl package).
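
As a minimal sketch of what following this convention can look like, the example below defines a custom distance by subtyping SemiMetric and adding an evaluate method. It assumes Distances.jl is installed. MaxAbsDiff is a hypothetical name used only for illustration (Distances.jl already ships a Chebyshev distance with the same meaning), and any further requirements that a particular index may place on a distance type are not covered here.

```julia
import Distances: SemiMetric, evaluate

# Hypothetical distance for illustration: the largest coordinate-wise absolute difference.
struct MaxAbsDiff <: SemiMetric end

# Extend Distances.evaluate for the new type, following evaluate(::SemiMetric, u, v).
function evaluate(::MaxAbsDiff, u, v)
    length(u) == length(v) || throw(DimensionMismatch("u and v must have the same length"))
    d = zero(promote_type(eltype(u), eltype(v)))
    for i in eachindex(u, v)
        d = max(d, abs(u[i] - v[i]))
    end
    d
end

evaluate(MaxAbsDiff(), [1.0, 2.0, 3.0], [1.5, 0.0, 3.0])  # 2.0
```

A type defined this way could then be tried as the DistType parameter, e.g. ManifoldKnnIndex{MaxAbsDiff,0.9}, under the assumption stated above that any distance following SemiMetric is accepted.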

KNN predefined types