4、对于所有的 k 个簇新,经过 2、3 多轮迭代后,簇心值保持不变 / 达到约定边界条件,则结束迭代。
算法的原理非常简单,但写起来却不是很容易,我特别喜欢面试的时候问候选人这个问题 😂
代码实现
这里以 python 为例,进行实现。
假定,点距离和簇心方法都已经给出,比如这个样子。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
import math input_data = [[1,1],[1,1.5],[5,5],[5,5.5]] k = 2
# 点之间的距离 defdistance(point_a,point_b): x = abs(point_a[0]-point_b[0]) y = abs(point_a[1]-point_b[1]) return math.sqrt(x*x+y*y)
# 当前簇的新中心 defpoint_mean(point_list): x = sum([point[0] for point in point_list]) / len(point_list) y = sum([point[1] for point in point_list]) / len(point_list) return (x,y)
k_cluster = {} for i inrange(k): k_cluster[tuple(input_data[i])] = []
# 2、根据距离选择中心 for point in input_data: max_distance = math.inf target_kernel = None for kernel in k_cluster: if distance(kernel, point) < max_distance: max_distance = distance(kernel, point) target_kernel = kernel k_cluster[tuple(target_kernel)].append(point) print("now cluster",k_cluster)
# 3、开始迭代 k_cluster_old = k_cluster.copy()
whileTrue: # 新的一轮迭代 k_cluster = {} for old_kernel in k_cluster_old: new_kernel = point_mean(k_cluster_old[old_kernel]) k_cluster[new_kernel] = [] # 2、根据距离选择中心 for point in input_data:
max_distance = math.inf target_kernel = None
for kernel in k_cluster: if distance(kernel, point) < max_distance: max_distance = distance(kernel, point) target_kernel = kernel