NeighborhoodApproximationIndex

NeighborhoodApproximationIndex.DistanceOnTopKOrdering
NeighborhoodApproximationIndex.DistanceOrdering
NeighborhoodApproximationIndex.InternalDistanceOrdering
NeighborhoodApproximationIndex.KnrCaches
NeighborhoodApproximationIndex.KnrIndex
NeighborhoodApproximationIndex.KnrIndex
Base.append!
Base.push!
NeighborhoodApproximationIndex.get_parallel_block
NeighborhoodApproximationIndex.references
SimilaritySearch.index!
SimilaritySearch.search

NeighborhoodApproximationIndex.DistanceOnTopKOrdering — Type

struct DistanceOnTopKOrdering <: KnrOrderingStrategy
    top::Int
end

Used as ordering parameter of KnrIndex specifies that only the top elements are evaluated against the distance metric. Useful for very costly distance functions. If you are in doubt please use DistanceOrdering instead.

source

NeighborhoodApproximationIndex.DistanceOrdering — Type

struct DistanceOrdering <: KnrOrderingStrategy end

Used as ordering parameter of KnrIndex specifies that each object u found by the inverted index will be evaluated against a given query. This is the default ordering strategy.

source

NeighborhoodApproximationIndex.InternalDistanceOrdering — Type

struct InternalDistanceOrdering <: KnrOrderingStrategy end

Used as ordering parameter of KnrIndex specifies that the internal distance of the underlying inverted index will be used to ordering the k nearest neighbors. Useful when the internal distance is quite representative of the real distance metric or whenever speed is the major objective.

source

NeighborhoodApproximationIndex.KnrCaches — Type

struct KnrCaches
    enc
end

Caches used for KnrIndex (one per thread)

Properties

enc: KnnResult for encoding purposes

source

NeighborhoodApproximationIndex.KnrIndex — Type

struct KnrIndex <: AbstractSearchContext

The K nearest references inverted index

Parameters

dist: the distance function of the index
db: the database of indexed objects
centers: a search index for a set of references
invfile: an inverted file data structure
kbuild: the number of references to be computed and stored by each indexed object
ordering: specifies how the index performs final k nn selection
opt: the parameters to be optimized by optimize!

source

NeighborhoodApproximationIndex.KnrIndex — Method

KnrIndex(
    dist::SemiMetric,
    db::AbstractDatabase;
    invfiletype=BinaryInvertedFile,
    invfiledist=JaccardDistance(),
    initial=:dnet,
    maxiters=0,
    refs=references(dist, db; initial),
    centers=nothing,
    kbuild=3,
    ksearch=1,
    centersrecall::AbstractFloat=length(db) > 10^3 ? 0.95 : 1.0,
    ordering=DistanceOrdering(),
    pools=nothing,
    parallel_block=get_parallel_block(length(db)),
    verbose=false
)

A convenient function to create a KnrIndex, it uses several default arguments. After the construction, use optimize! to adjust the index to some performance.

Arguments

dist: Distance object (a SemiMetric object, see Distances.jl)
db: The database of objects to be indexed.

Keyword arguments

invfiletype: the type of the underlying inverted file (BinaryInvertedFile or WeightedInvertedFile)
invfiledist: the distance of the underlying inverted file (see InvertedFiles.jl package)
centers: The index used for centers/references, if centers === nothing then a sample of db will be used.
initial: indicates how references are selected, only used if refs will be computed; see references for more detail.
maxiters: how many iterations of the Lloyd algorithm are applied to initial references, only used if refs will be computed; see references for more detail.
refs: the set of reference, only used if centers === nothing
centersrecall: used when centers === nothing; if centersrecall == 1 then it will create an exact index on refs or an approximate if 0 < centersrecall < 1
kbuild: the number of references to compute and store on construction
ksearch: the number of references to compute on searching
ordering: the ordering strategy
pools: an object with preallocated caches specific for KnrIndex, if pools=nothing it will use default caches.
parallel_block Parallel construction works on batches, this is the size of these blocks
verbose true if you want to see messages

source

Base.append! — Method

append!(idx::KnrIndex, db; <kwargs>)

Appends all items in the database db into the index

Arguments

idx: the index structure
db: the objects to be appended

Keyword arguments

parallel_block: the number of elements to be inserted in parallel
pools: unused argument
verbose: controls the verbosity of the procedure

source

Base.push! — Method

push!(idx::KnrIndex, obj; pools=getpools(idx), encpools=getpools(idx.centers))

Inserts obj into the indexed

source

NeighborhoodApproximationIndex.get_parallel_block — Method

get_parallel_block(n)

An heuristic to compute the parallel_block with respect with the number of elements to insert

source

NeighborhoodApproximationIndex.references — Method

references(dist::SemiMetric, db; <kwargs>) -> SubDatabase

Computes a set of references, a sample of db, using the KCenters specification, it is a wrapper to kcenters function in the KCenters.jl package. The set of references are computed taking into account the specified r but also the distance function dist.

Arguments

dist: a distance function
db: the database to be sampled

Keyword arguments

k::Int: the number of centers to compute, defaults to sqrt(|db|)
sample::Real: indicates the sampling size before computing the set of k references, defaults to log(|db|) k; sample=0 means for no sampling.
maxiters::Int: number of iterationso of the Lloyd algorithm that should be applied on the initial computation of centers, that is, maxiters > 0 applies maxiters iterations of the algorithm.
tol::Float64: change tolerance to stop the Lloyd algorithm (error changes smaller than tol among iterations will stop the algorithm)
initial: initial centers or a strategy to compute initial centers, symbols :rand, :fft, and :dnet.

There are several interactions between initial values and parameter interactions (described in KCenters object), for instance, the maxiters > 0 apply the Lloyd's algorithm to the initial computation of references.

if initial=:rand:
- maxiters = 0 will retrieve a simple random sampling
- maxiters > 0' achieve kmeans-centroids,maxiters` should be set appropiately for the the dataset
if initial=:dnet:
- maxiters = 0 computes a pure density-net
- maxiters > 0 will compute a kmeans centroids but with an initialization based on the dnet
if initial=:fft:
- maxiters = 0 computes k centers with the farthest first traversal algorithm
- maxiters > 0 will use the FFT based kcenters as initial points for the Lloyd algorithm

Note 1: maxiters > 0 needs to compute centroids and these centroids should be defined for the specific data model, and also be of use in the specific metric distance and error function.

Note 2: The error function is defined as the mean of distances of all objects in db to its associated nearest centers in each iteration.

Note 3: The computation of references on large databases can be prohibitive, in these cases please consider to work on a sample of db

source

SimilaritySearch.index! — Method

index!(idx::KnrIndex; parallel_block=get_parallel_block(length(idx.db)), pools=nothing, verbose=true)

Indexes all non indexed items in the database

Arguments

idx: the index structure

Keyword arguments

parallel_block: the number of elements to be inserted in parallel
pools: unused parameter
verbose: controls verbosity of the procedure

source

SimilaritySearch.search — Method

search(idx::KnrIndex, q, res::KnnResult; ksearch=idx.opt.ksearch, ordering=idx.ordering, pools=getpools(D))

Searches nearest neighbors of q inside the index under the distance function dist.

source