Using SimilaritySearch with ManifoldLearning

by: Eric S. Téllez

This demonstration shows how to combine SimilaritySearch with ManifoldLearning methods through SimSearchManifoldLearning.

using SimilaritySearch, SimSearchManifoldLearning, ManifoldLearning, Primes, Plots, StatsPlots, StatsBase, LinearAlgebra, Markdown, Random

SCurve example

X, L = ManifoldLearning.scurve(segments=5)

scatter(X[1, :], X[2, :], X[3, :], color=L, alpha=0.5)

SimilaritySearch supports exact and approximate algorithms for solving k nearest neighbor queries, and it also supports different metrics. For instance, let's see how the choice of distance function changes the projection.
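For reference, the three distances compared below are usually defined as follows for points \(x, y \in \mathbb{R}^n\) (here \(n = 3\) for the S-curve). These are the standard textbook definitions, given under the assumption that ApproxManhattan, ApproxEuclidean, and ApproxChebyshev correspond to them:

\[
d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|, \qquad
d_2(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \qquad
d_\infty(x, y) = \max_{1 \le i \le n} |x_i - y_i|
\]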

Manhattan distance (\(L_1\))

let Y = predict(fit(Isomap, X, nntype=ApproxManhattan))
    scatter(Y[1,:], Y[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:10.480
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=2, Δ=0.8638376, maxvisits=154) 2025-09-22T09:33:13.600
LOG n.size quantiles:[2.0, 3.0, 3.0, 4.0, 5.0]
  0.151662 seconds (175.80 k allocations: 11.259 MiB, 99.52% compilation time)

Euclidean distance (\(L_2\))

let
    E = predict(fit(Isomap, X, nntype=ApproxEuclidean))
    scatter(E[1,:], E[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:17.135
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=4, Δ=0.84224164, maxvisits=130) 2025-09-22T09:33:18.230
LOG n.size quantiles:[2.0, 3.0, 3.0, 3.0, 4.0]
  0.129605 seconds (167.92 k allocations: 10.729 MiB, 99.69% compilation time)

Chebyshev distance (\(L_\infty\))

let
    Ch = predict(fit(Isomap, X, nntype=ApproxChebyshev))
    scatter(Ch[1,:], Ch[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:21.299
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=2, Δ=1.05, maxvisits=180) 2025-09-22T09:33:22.509
LOG n.size quantiles:[1.0, 3.0, 3.0, 4.0, 5.0]
  0.129050 seconds (167.93 k allocations: 10.730 MiB, 99.52% compilation time)

Visualizing prime gaps

The difference between consecutive prime numbers is called a prime gap. We use the sequence of gaps as a time series example because of its interesting behavior and because it can be computed without downloading anything beyond the required packages.
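As a small illustration of the definition, the gaps for the primes up to 30 can be computed with `primes` from the Primes package and `diff` from Base. The bound 30 is only chosen to keep the output short:

```julia
using Primes

p = primes(30)      # all primes up to 30: 2, 3, 5, ..., 29
gaps = diff(p)      # differences between consecutive primes
println(gaps)       # [1, 2, 2, 4, 2, 4, 2, 4, 6]
```

The first gap is 1 (between 2 and 3), so its logarithm is 0; every later gap is even.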

This example shows how to generate the dataset and index it. We will use ManifoldLearning to generate the 2D visualization.

Generation of the dataset

The time series is represented with sliding windows of size w; we also take the base-2 logarithm of the gaps to reduce their variance. We store the windows as the columns of a matrix to avoid redefining the knn interface for ManifoldLearning.

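function create_database_primes_diff(n, w)
    # log2 of the gaps between consecutive primes up to n
    T = log2.(diff(primes(n)))
    # one window of w consecutive values per column
    M = Matrix{Float32}(undef, w, length(T) - w)
    @info size(M)
    for i in 1:size(M, 2)
        M[:, i] .= view(T, i:(i+w-1))
    end

    M
end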
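
The windowing layout used by create_database_primes_diff can be checked on a toy series. This is a minimal sketch: the values in `T` are made up for illustration and stand in for the log2 gaps, and only Base Julia is needed.

```julia
# Toy series standing in for the log2 prime gaps (illustrative values)
T = Float32[0, 1, 1, 2, 1, 2]
w = 3

# Same layout as create_database_primes_diff: one window per column
M = Matrix{Float32}(undef, w, length(T) - w)
for i in 1:size(M, 2)
    M[:, i] .= view(T, i:(i+w-1))
end

println(size(M))   # (3, 3)
println(M[:, 1])   # Float32[0.0, 1.0, 1.0]
println(M[:, 3])   # Float32[1.0, 2.0, 1.0]
```

With `length(T) - w` columns, the window that ends at the last element of the series is not included; using `length(T) - w + 1` columns would include it.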


x, y = let
    P = create_database_primes_diff(3 * 10^4, 5)
    # Isomap is used here; LLE is another option
    primesgap = fit(Isomap, P; k=16, maxoutdim=2, nntype=ApproxEuclidean)
    
    p = predict(primesgap)
    p[1, :], p[2, :]
end

A 2D histogram

histogram2d(x, y; nbins=100)

Environment and dependencies

Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 64 default, 0 interactive, 32 GC (on 64 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PROJECT = .
  JULIA_LOAD_PATH = @:@stdlib
Status `~/Research/SimilaritySearchDemos/Project.toml`
  [aaaa29a8] Clustering v0.15.8
  [944b1d66] CodecZlib v0.7.8
  [a93c6f00] DataFrames v1.8.0
  [c5bfea45] Embeddings v0.4.6
  [f67ccb44] HDF5 v0.17.2
  [b20bd276] InvertedFiles v0.8.1
  [682c06a0] JSON v0.21.4
  [23fbe1c1] Latexify v0.16.10
  [eb30cadb] MLDatasets v0.7.18
  [06eb3307] ManifoldLearning v0.9.0
⌃ [ca7969ec] PlotlyLight v0.11.0
  [91a5bcdd] Plots v1.40.20
  [27ebfcd6] Primes v0.5.7
  [ca7ab67e] SimSearchManifoldLearning v0.3.1
  [053f045d] SimilaritySearch v0.13.0
⌅ [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.7
  [7f6f6c8a] TextSearch v0.19.6
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`