using SimilaritySearch, SimSearchManifoldLearning, ManifoldLearning, Primes, Plots, StatsPlots, StatsBase, LinearAlgebra, Markdown, Random
Using SimilaritySearch with ManifoldLearning
by: Eric S. Téllez
This demonstration shows how to use SimilaritySearch with ManifoldLearning methods through SimSearchManifoldLearning.
SCurve example
X, L = ManifoldLearning.scurve(segments=5)
scatter(X[1, :], X[2, :], X[3, :], color=L, alpha=0.5)
SimilaritySearch supports exact and approximate algorithms for solving k nearest neighbor queries. It also supports different metrics. For instance, let us see how the choice of distance function changes the projection.
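To make the three metrics compared in the following sections concrete, here is a minimal sketch that computes the \(L_1\), \(L_2\), and \(L_\infty\) distances between two small vectors using `norm` from the standard LinearAlgebra library. The vectors `a` and `b` are made-up values for illustration only, and the sketch does not go through the SimilaritySearch distance types.

```julia
using LinearAlgebra

# Two illustrative points (hypothetical values, not taken from the S-curve)
a = [0.0, 0.0, 0.0]
b = [1.0, -2.0, 3.0]

d = a - b
manhattan = norm(d, 1)    # sum of absolute coordinate differences
euclidean = norm(d, 2)    # square root of the sum of squared differences
chebyshev = norm(d, Inf)  # largest absolute coordinate difference

@show manhattan euclidean chebyshev
```

The same pair of points ends up at a different distance under each metric, which is why the neighborhood graph, and therefore the projection, can change with the metric.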
Manhattan distance (\(L_1\))
let Y = predict(fit(Isomap, X, nntype=ApproxManhattan))
scatter(Y[1,:], Y[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:10.480
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=2, Δ=0.8638376, maxvisits=154) 2025-09-22T09:33:13.600
LOG n.size quantiles:[2.0, 3.0, 3.0, 4.0, 5.0]
0.151662 seconds (175.80 k allocations: 11.259 MiB, 99.52% compilation time)
Euclidean distance (\(L_2\))
let
    E = predict(fit(Isomap, X, nntype=ApproxEuclidean))
    scatter(E[1,:], E[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:17.135
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=4, Δ=0.84224164, maxvisits=130) 2025-09-22T09:33:18.230
LOG n.size quantiles:[2.0, 3.0, 3.0, 3.0, 4.0]
0.129605 seconds (167.92 k allocations: 10.729 MiB, 99.69% compilation time)
Chebyshev distance (\(L_\infty\))
let
    Ch = predict(fit(Isomap, X, nntype=ApproxChebyshev))
    scatter(Ch[1,:], Ch[2,:], color=L, alpha=0.5)
end
LOG add_vertex! sp=1 ep=1 n=1 BeamSearch(bsize=4, Δ=1.0, maxvisits=1000000) 2025-09-22T09:33:21.299
LOG n.size quantiles:[0.0, 0.0, 0.0, 0.0, 0.0]
LOG add_vertex! sp=514 ep=770 n=513 BeamSearch(bsize=2, Δ=1.05, maxvisits=180) 2025-09-22T09:33:22.509
LOG n.size quantiles:[1.0, 3.0, 3.0, 4.0, 5.0]
0.129050 seconds (167.93 k allocations: 10.730 MiB, 99.52% compilation time)
Visualizing prime gaps
The difference between consecutive prime numbers is called a prime gap. We use this series of values as a time series example because of its interesting behavior and because it can be computed without downloading anything beyond the required packages.
This example shows how to generate the dataset and index it. We will use ManifoldLearning to generate the 2D visualization.
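As a quick illustration of the definition, the sketch below lists the primes up to a small bound and their gaps, using `primes` from the Primes package and `diff` from Base. The bound 30 is an arbitrary choice for the example.

```julia
using Primes

p = primes(30)   # all primes up to 30
gaps = diff(p)   # differences between neighboring primes

println(p)
println(gaps)
```

For this bound the gaps are 1, 2, 2, 4, 2, 4, 2, 4, 6; apart from the first one, every gap is even.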
Generation of the dataset
The time series is represented with windows of size w; we also take the logarithm of the gaps to reduce the variance in gap values. We create a matrix to avoid redefining the knn interface for ManifoldLearning.
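The windowing step can be sketched on a toy series as follows. The values in `T` are hypothetical stand-ins for the log-gaps, and the matrix is sized the same way as in this demo (`w` rows and `length(T) - w` columns), so each column is one window of the series.

```julia
# Toy series standing in for the log-gaps (hypothetical values)
T = Float32[1, 2, 3, 4, 5, 6, 7]
w = 3

# Same sizing as in this demo: w rows, length(T) - w columns
M = Matrix{Float32}(undef, w, length(T) - w)
for i in 1:size(M, 2)
    M[:, i] .= view(T, i:(i + w - 1))  # column i holds the window starting at i
end

display(M)
```

With this sizing the result is a 3×4 matrix whose last column is the window (4, 5, 6); the final window (5, 6, 7) is left out. Using `length(T) - w + 1` columns would include it, if that is desired.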
function create_database_primes_diff(n, w)
    T = log2.(diff(primes(n)))
    M = Matrix{Float32}(undef, w, length(T) - w)
    @info size(M)
    for i in 1:size(M, 2)
        M[:, i] .= view(T, i:(i+w-1))
    end
    M
end
x, y = let
    P = create_database_primes_diff(3 * 10^4, 5)
    # or LLE
    primesgap = fit(Isomap, P; k=16, maxoutdim=2, nntype=ApproxEuclidean)
    p = predict(primesgap)
    p[1, :], p[2, :]
end
A 2D histogram
histogram2d(x, y; nbins=100)
Environment and dependencies
Julia Version 1.10.10
Commit 95f30e51f41 (2025-06-27 09:51 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 64 default, 0 interactive, 32 GC (on 64 virtual cores)
Environment:
JULIA_NUM_THREADS = auto
JULIA_PROJECT = .
JULIA_LOAD_PATH = @:@stdlib
Status `~/Research/SimilaritySearchDemos/Project.toml`
[aaaa29a8] Clustering v0.15.8
[944b1d66] CodecZlib v0.7.8
[a93c6f00] DataFrames v1.8.0
[c5bfea45] Embeddings v0.4.6
[f67ccb44] HDF5 v0.17.2
[b20bd276] InvertedFiles v0.8.1
[682c06a0] JSON v0.21.4
[23fbe1c1] Latexify v0.16.10
[eb30cadb] MLDatasets v0.7.18
[06eb3307] ManifoldLearning v0.9.0
⌃ [ca7969ec] PlotlyLight v0.11.0
[91a5bcdd] Plots v1.40.20
[27ebfcd6] Primes v0.5.7
[ca7ab67e] SimSearchManifoldLearning v0.3.1
[053f045d] SimilaritySearch v0.13.0
⌅ [2913bbd2] StatsBase v0.33.21
[f3b207a7] StatsPlots v0.15.7
[7f6f6c8a] TextSearch v0.19.6
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`