查看“Levenshtein distance”的源代码
←
Levenshtein distance
跳到导航
跳到搜索
因为以下原因,您没有权限编辑本页:
您请求的操作仅限属于该用户组的用户执行:
用户
您可以查看和复制此页面的源代码。
MED(Minimum Edit Distance,编辑距离),由俄罗斯数学家弗拉基米尔·莱文斯坦(Vladimir Levenshtein)在 1965 年提出,也因此而得名 Levenshtein Distance(莱文斯坦距离)。 在信息论、语言学和计算机科学领域,Levenshtein Distance 是用来度量两个序列相似程度的指标。通俗地来讲,编辑距离指的是在两个单词之间,由其中一个单词转换为另一个单词所需要的最少单字符编辑操作次数(插入(Insertion)、删除(Deletion)、替换(Substitution))。 如:ld[hello, hell] = 1(插入"o"), ld[sit, seat] = 2 ==== 应用场景 ==== * 拼写检查 采用数据对齐的方式解决数据不完全一致问题,选取匹配度比较高(LD值比较小)的数据。如解决地址信息填写不规范问题 * 数据匹配 词典应用的拼写提示。例如输入了 throwab,就能提示出 throwable。遍历 t 开头的单词库,寻找匹配度比较高(LD值比较小)的单词进行提示 * 匹配侦测 例如可以把匹配度高于某一个阈值的代码、文章初筛出来以便于进一步确认是否抄袭 ==== Time Complexity ==== The Levenshtein distance algorithm implemented with dynamic programming has a time complexity of <i>O(mn)</i>, where: * <i>m</i> is the length of the first string. * <i>n</i> is the length of the second string. This means the running time of the algorithm grows proportionally to the product of the string lengths. Here's why: 1. The algorithm uses a 2D DP table with dimensions (m+1) x (n+1). 2. It iterates through each cell of this table, performing constant-time operations like comparisons and minimum value calculations. 3. The total number of cells to be filled is (m+1) * (n+1), which translates to O(mn) complexity. Therefore, the time it takes to calculate the Levenshtein distance between two strings increases linearly with the combined length of both strings. === Code === ld(x, y) = min -> ld(x-1, y)+1 OR ld(x, y-1)+1 OR ld(x-1, y-1)+[0|1] [0|1]: if x[-1] = y[-1], then 0 else 1 ==== Python ==== // Levenshtein Distance, recursion // Less efficient because there are multiple double counts def ld(str1, str2): m = len(str1) n = len(str2) if m == 0: return n if n == 0: return m if str1 == str2: return 0 d = 0 if str1[-1] == str2[-1] else 1 return min(ld(str1, str2[:-1]) + 1, ld(str1[:-1], str2) + 1, ld(str1[:-1], str2[:-1]) + d) ==== Java ==== // Levenshtein Distance, m * n array // The space used is larger public static int levenshtein2 (String str1, String str2) { if (isnull(str1)) return str2.length(); if (isnull(str2)) return str1.length(); if (str1.equals(str2)) return 0; int m, n, d, i, j; m = str1.length(); n = str2.length(); int[][] dp = new int[m+1][n+1]; for (i = 0; i<=m; i++) dp[i][0] = i; for (j = 1; j<=n; j++) dp[0][j] = j; for (i = 1; i<=m; i++) { for (j = 1; j<=n; j++) { d = (str1.charAt(i-1) == str2.charAt(j-1)) ? 0 : 1; dp[i][j] = min(dp[i-1][j-1]+d, min(dp[i][j-1]+1, dp[i-1][j]+1)); } } return dp[m][n]; } // Levenshtein Distance, min(m, n) * 2 array public static int levenshtein (String str1, String str2) { if (isnull(str1)) return str2.length(); if (isnull(str2)) return str1.length(); if (str1.equals(str2)) return 0; // less space used - << if (str1.length() > str2.length()) return levenshtein(str2, str1); // less space used - >> int m, n, d, i, j; m = str1.length(); n = str2.length(); int[][] dp = new int[m+1][2]; for (i = 0; i<=m; i++) dp[i][0] = i; dp[1][1] = 1; for (j = 1; j<=n; j++) { for (i = 1; i<=m; i++) { d = (str1.charAt(i-1) == str2.charAt(j-1)) ? 0 : 1; dp[i][1] = min(dp[i-1][0]+d, min(dp[i][0]+1, dp[i-1][1]+1)); } for (i = 0; i<=m; i++) { dp[i][0] = dp[i][1]; } dp[0][1] = dp[0][0] + 1; } return dp[m][1]; } [[分类:Develop]] [[分类:Algorithm]]
返回
Levenshtein distance
。
导航菜单
个人工具
登录
命名空间
页面
讨论
大陆简体
查看
阅读
查看源代码
查看历史
更多
搜索
导航
首页
最近更改
随机页面
目录
文章分类
侧边栏
帮助
工具
链入页面
相关更改
特殊页面
页面信息