1 数据
假设我们有这样的一个数据tst_lst,表示的是5条轨迹的墨卡托坐标,我们希望算出逐点的曼哈顿距离之和,作为两条轨迹的距离
[array([[11549759.51313693, 148744.89246911],
[11549751.49813359, 148732.97804463],
[11549757.62070558, 148738.21148336],
[11549877.73443613, 148886.64075531],
[11549855.1365795 , 148900.67083319]]),
array([[11556428.51911408, 145454.58226351],
[11557035.91165162, 145493.83259114],
[11557310.50343952, 145408.66217089],
[11557748.16714946, 145339.9824732 ],
[11558124.96136184, 145498.27539452]]),
array([[11560299.60987809, 143642.48133694],
[11560236.88134503, 143437.08940241],
[11560254.26944949, 143331.75455279],
[11560222.79942945, 143349.26953089],
[11560224.0350758 , 143354.70329418]]),
array([[11559757.30584681, 143885.2194761 ],
[11560304.02926187, 143639.87580025],
[11560743.21804884, 143750.12120076],
[11560626.52182665, 144103.28312704],
[11560722.44583186, 144272.53199179]]),
array([[11569978.06036478, 151723.38135785],
[11569938.73118869, 151248.5811628 ],
[11569616.11617246, 150791.67584703],
[11569571.34347327, 150687.55191842],
[11569688.57402901, 150674.10077112]])]
2 处理原始数据
2.1 直接喂入的问题
如果直接将上面的数据fit入NearestNeighbors,是会报错的:
from sklearn.neighbors import NearestNeighbors
cellKDtree=NearestNeighbors().fit(tst_lst)
cellKDtree
'''
ValueError: Found array with dim 3. NearestNeighbors expected <= 2.
'''
ValueError
是由于尝试在 NearestNeighbors
对象上使用三维数组导致的。NearestNeighbors
期望的输入是一个二维数组,其中每行代表一个数据点,每列代表一个特征
2.2 修改数据形状
每一个轨迹二维矩阵转化成一个一维向量
tst_lst=np.array(tst_lst)
tst_lst_new=[]
for i in range(len(tst_lst)):
tst_lst_new.append(np.hstack(tst_lst[i]).tolist())
tst_lst_new
'''
[[11549759.513136925,
148744.89246911363,
11549751.49813359,
148732.97804463338,
11549757.620705582,
148738.2114833576,
11549877.734436132,
148886.6407553058,
11549855.136579504,
148900.67083319122],
[11556428.519114085,
145454.58226351053,
11557035.911651615,
145493.83259113596,
11557310.503439516,
145408.66217089174,
11557748.167149458,
145339.9824731981,
11558124.961361844,
145498.2753945235],
[11560299.609878086,
143642.48133694328,
11560236.881345032,
143437.0894024146,
11560254.269449493,
143331.75455278732,
11560222.79942945,
143349.26953088713,
11560224.035075797,
143354.7032941798],
[11559757.305846812,
143885.21947610297,
11560304.02926187,
143639.8758002481,
11560743.218048835,
143750.12120075937,
11560626.521826653,
144103.28312704086,
11560722.445831856,
144272.53199179273],
[11569978.060364777,
151723.38135785353,
11569938.731188687,
151248.58116280191,
11569616.116172463,
150791.67584703089,
11569571.343473272,
150687.55191841844,
11569688.57402901,
150674.1007711226]]
'''
此时送入NearestNeighbor已经可以了
from sklearn.neighbors import NearestNeighbors
cellKDtree=NearestNeighbors().fit(tst_lst_new)
cellKDtree
3 自定义函数
from scipy.spatial.distance import *
import numpy as np
def disfunc(x,y):
#每次比较fit入Nearest Neighbor 的矩阵的两行
x_points=np.array([(x[i],x[i+1]) for i in range(0,len(x),2)])
y_points=np.array([(y[i],y[i+1]) for i in range(0,len(y),2)])
#提取经纬度,将每一行一维向量改成二维矩阵
return float(np.sum(np.diag(cdist(x_points,y_points,metric='cityblock'))))
'''
cdist(x_points,y_points,metric='cityblock') 将得到一个二维矩阵,表示x每一个元素和y每一个元素的曼哈顿距离
np.diag是取二维矩阵的对角元素,表示x和y对应位置元素的距离
求和就是两条轨迹的距离
'''
4 使用NearestNeighbor
注:似乎algorithm只能选择默认的brute,KD_tree和ball_tree都不行
from sklearn.neighbors import *
cellKDtree=NearestNeighbors(metric=disfunc).fit(tst_lst_new)
cellKDtree