Using SimilaritySearch with ManifoldLearning

by: Eric S. Téllez

This demonstration shows how to use SimilaritySearch as the nearest neighbor backend of ManifoldLearning methods through SimSearchManifoldLearning.

using SimilaritySearch, SimSearchManifoldLearning, ManifoldLearning, Primes, Plots, StatsPlots, StatsBase, LinearAlgebra, Markdown, Random

SCurve example

X, L = ManifoldLearning.scurve(segments=5)

scatter(X[1, :], X[2, :], X[3, :], color=L, alpha=0.5)

SimilaritySearch supports exact and approximate algorithms for solving k nearest neighbor queries, and it also supports different metrics. For instance, let's see how the choice of distance function changes the projection.
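For reference, the three metrics compared in the following examples are the standard Minkowski distances. The notation is an assumption for illustration: \(x, y \in \mathbb{R}^d\) are two points of the dataset, with \(d = 3\) for the S-curve.

\[
\begin{aligned}
L_1(x, y) &= \sum_{i=1}^{d} |x_i - y_i| \\
L_2(x, y) &= \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2} \\
L_\infty(x, y) &= \max_{1 \le i \le d} |x_i - y_i|
\end{aligned}
\]

As a small worked case, for \(x = (1, 2, 3)\) and \(y = (4, 0, 3)\) the coordinate differences are \((3, 2, 0)\), so \(L_1 = 5\), \(L_2 = \sqrt{13} \approx 3.61\), and \(L_\infty = 3\).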

Manhattan distance (\(L_1\))

let Y = predict(fit(Isomap, X, nntype=ApproxManhattan))
    scatter(Y[1,:], Y[2,:], color=L, alpha=0.5)
end
  0.168119 seconds (184.08 k allocations: 11.860 MiB, 99.72% compilation time)

Euclidean distance (\(L_2\))

let
    E = predict(fit(Isomap, X, nntype=ApproxEuclidean))
    scatter(E[1,:], E[2,:], color=L, alpha=0.5)
end
  0.146815 seconds (176.19 k allocations: 11.329 MiB, 99.66% compilation time)

Chebyshev distance (\(L_\infty\))

let
    Ch = predict(fit(Isomap, X, nntype=ApproxChebyshev))
    scatter(Ch[1,:], Ch[2,:], color=L, alpha=0.5)
end
  0.146462 seconds (176.20 k allocations: 11.335 MiB, 99.70% compilation time)

Visualizing prime gaps

The difference between two consecutive prime numbers is called a prime gap. We use this sequence of values as a time series example because of its interesting behavior and because it can be computed without downloading anything beyond the necessary packages.

This example shows how to generate the dataset and index it. We will use ManifoldLearning to generate the 2D visualization.

Generation of the dataset

The time series is represented with sliding windows of size w; we also take the base-2 logarithm of the gaps to reduce their variance. The windows are stored as the columns of a matrix, which avoids redefining the knn interface for ManifoldLearning.

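function create_database_primes_diff(n, w)
    # log2 of the gaps between consecutive primes up to n
    T = log2.(diff(primes(n)))
    # one column per window of w consecutive values
    M = Matrix{Float32}(undef, w, length(T) - w)
    @info size(M)
    for i in 1:size(M, 2)
        # column i holds the window T[i], ..., T[i+w-1]
        M[:, i] .= view(T, i:(i+w-1))
    end

    M
end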
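
To make the window layout concrete, here is a minimal sketch that applies the same windowing step to a tiny hand-written gap sequence. The helper name `sliding_windows` and the variable `gaps` are hypothetical and introduced only for illustration; the sketch uses Base Julia alone, so it does not need Primes.

```julia
# Hypothetical helper: same windowing loop as above, applied to any vector.
function sliding_windows(T::AbstractVector, w::Integer)
    # one column per window of w consecutive values
    M = Matrix{Float32}(undef, w, length(T) - w)
    for i in 1:size(M, 2)
        M[:, i] .= view(T, i:(i + w - 1))
    end
    M
end

# Gaps between the primes 2, 3, 5, 7, 11, 13, 17, 19, 23 (written by hand).
gaps = [1, 2, 2, 4, 2, 4, 2, 4]
T = log2.(gaps)
M = sliding_windows(T, 3)

size(M)    # (3, 5): 8 values and w = 3 give 8 - 3 = 5 columns
M[:, 1]    # Float32[0.0, 1.0, 1.0], i.e. log2 of the gaps 1, 2, 2
```

Each column is one point of the dataset that is later passed to `fit`, so neighboring columns overlap in all but one coordinate.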


x, y = let
    P = create_database_primes_diff(3 * 10^4, 5)
    # LLE can be used here instead of Isomap
    primesgap = fit(Isomap, P; k=16, maxoutdim=2, nntype=ApproxEuclidean)
    
    p = predict(primesgap)
    p[1, :], p[2, :]
end

A 2D histogram

histogram2d(x, y; nbins=100)

Environment and dependencies

Julia Version 1.10.9
Commit 5595d20a287 (2025-03-10 12:51 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 64 default, 0 interactive, 32 GC (on 64 virtual cores)
Environment:
  JULIA_PROJECT = .
  JULIA_NUM_THREADS = auto
  JULIA_LOAD_PATH = @:@stdlib
Status `~/sites/SimilaritySearchDemos/Project.toml`
  [aaaa29a8] Clustering v0.15.8
  [944b1d66] CodecZlib v0.7.8
  [a93c6f00] DataFrames v1.7.0
  [c5bfea45] Embeddings v0.4.6
  [f67ccb44] HDF5 v0.17.2
  [b20bd276] InvertedFiles v0.8.0 `~/.julia/dev/InvertedFiles`
  [682c06a0] JSON v0.21.4
  [23fbe1c1] Latexify v0.16.6
  [eb30cadb] MLDatasets v0.7.18
  [06eb3307] ManifoldLearning v0.9.0
⌃ [ca7969ec] PlotlyLight v0.11.0
  [91a5bcdd] Plots v1.40.11
  [27ebfcd6] Primes v0.5.7
  [ca7ab67e] SimSearchManifoldLearning v0.3.0 `~/.julia/dev/SimSearchManifoldLearning`
  [053f045d] SimilaritySearch v0.12.0 `~/.julia/dev/SimilaritySearch`
⌅ [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.7
  [7f6f6c8a] TextSearch v0.19.0 `~/.julia/dev/TextSearch`
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`