-
-
Notifications
You must be signed in to change notification settings - Fork 30
feat: add Levenshtein distance #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ZoomRmc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution!
Some general overview of the algorithm an a basic use-case in a runnableExamples block would be nice (for example, just selecting a suitable string from a "dictionary" sequence as a substitution for a string with a typo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice implementation! At certain places, I would like more comments.
You can consider adding the two-vector solution to the Levenshtein distance.
In any case, Levenshtein should be put in regard to other metrics. Hamming distance is at the basis of linear code theory and operates only by substitution. We might also add in another module in the future the Longest Common Substring sequence, which on the contrary, operates by deletion and insertion.
The Edit Distance is more general and gives weights to the three operations.
|
Take your time! |
Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>
1c127df to
b306027
Compare
I have been thinking about that. My reasoning for not doing that is the following:
|
ZoomRmc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Regarding other, more optimized solutions: it's absolutely fine to have only the basic one and leave the room for improvement in later PRs, possibly from new contributors. If it happens, we can always come to a decision where exactly to put them.
Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>
ZoomRmc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think by this point it's ready to merge!
dlesnoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must decide between the Key type and the proc overloading solution but I need some additional time for this. I let @ZoomRmc decide on this if I do not react quickly which is highly probable.
I am happy with the changes but require some documentation improvements.
Dropping the more subtle two-vector implementations is fine by me.
He would have then reinvented the wheel. No, most people would not spot this. You are right. Restraining information teaches less and educates more. Should we teach or educate? I am more in favour of teaching. One must know a similar example to spot this kind of space savings. He has to be already knowledgeable. Anyway, this is more suitable for another PR. I think that the implementation is neat and fine as is. I just want:
|
dlesnoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this excellent contribution!
We had some fruitful conversations on different approaches, and I appreciate having some opposite views.
* feat: add Levenshtein distance * Apply suggestions from code review Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * refactor: use toKey in initSubsolutions * style: use .Natural * style: remove blank line * refactor: add Key type * style: use descriptive variable names * style: use a, b instead of strA, strB * docs: add note about complexity * refator: add name to TestCase * docs: remove sentence about dp * docs: update documentation of `levenshteinDistance` Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * docs: rename initLevenshteinDistanceMatrix and add its documentation * refactor: add fillLevenshteinDistanceMatrix * tests: reorganise tests, remove TestCase type * docs: add edit distance to module-level documentation * docs: mention Wagner–Fischer algorithm * docs: add comment in fillLevenshteinDistanceMatrix * style: reformat fillLevenshteinDistanceMatrix * docs: update DIRECTORY.md --------- Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>
* Add a .github action to build DIRECTORY.md * Update gh action build_directory_md * Add ignored paths and git credentials * chore: minor Gitpod improvements (#34) Add Gitpod badge and prebuilds for all branches and forks * Add binary search (#37) * Add binary search * Update binary search: add recursive version * Update binary search: better description * Update binary search: update runnableExamples * Update binary search: improve tests, add overload for recursive func * Update binary search: evaluate if-expression into result * grammar * Update binary search: improve description, add comments * Update binary search: add missing test cases * Update binary search: add info about pivot choice in description, rearrange it * Update binary search: change int to Natural * Update binary search: if block -> guard clause * Update binary search: test cases * Update binary search: simplify docs * Update binary search: nimpretty * Apply suggestions from code review Co-authored-by: David Leal <halfpacho@gmail.com> * Replace by an attempt of github pr create * Fix branching and environment variable in the PR Co-authored-by: David Leal <halfpacho@gmail.com> * Create and populate the Maths directory (#4) * Add a sample of maths basic algorithms * Update maths/abs.nim Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * Use openArray in absMaxSort Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * Fix seq->openArray and int->SomeInteger * Use Positive as input instead * Update maths/addition_without_arithmetic.nim Co-authored-by: Pietro Peterlongo <pietro.peterlongo@gmail.com> * Name allocation number * [maths/abs] Replace maxAbsSort by signed[Min/Max]Abs * [Aliquot Sum] Add header * Add empty line at end of file * Remove Allocation number since it is a non standard algorithm * Fix titles * Run nimpretty on the files * Rename file and add DIRECTORY.md --------- Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr> Co-authored-by: David Leal <halfpacho@gmail.com> * Add arc length (#10) * Fix linear search (#33) * Use a dynamic allocated array (sequence) for strings * Run nimpretty * Add requested changes * Fix documentation generation * Fix test title according to changes Co-authored-by: Satin Wuker <74630829+SatinWuker@users.noreply.github.com> * Update comments to warn about indexing issues Modified a bit the suggestions of comments. Please check them before approving. Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> --------- Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr> Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * chore: run `apt-get update` before running other commands (#41) This will run `sudo apt-get update` before running any other commands on the Gitpod prebuild image (`.gitpod.dockerfile`). * Readme: fix gitter link, add Nim logo, formatting fixes (#42) * Update README.md Use more extended markdown syntax instead of HTML, Add Matrix links Superseed PR #40 * Actually fix gitter! Move img src links to footer. Also changed the TheAlgorithms logo image link. * Restore TheAlgorithms logo link * More link fixes --------- Co-authored-by: dlesnoff <54949944+dlesnoff@users.noreply.github.com> * fix: nimpretty for `arc_length.nim` (#43) * chore: `gc`->`mm`, move back to `config.nims` (#47) * chore: revert #41 (#45) * Update codeowners * Fix: CONTRIBUTING.md links (#49) Uppercase `directory.md` to `DIRECTORY.md` * feat: add Levenshtein distance (#46) * feat: add Levenshtein distance * Apply suggestions from code review Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * refactor: use toKey in initSubsolutions * style: use .Natural * style: remove blank line * refactor: add Key type * style: use descriptive variable names * style: use a, b instead of strA, strB * docs: add note about complexity * refator: add name to TestCase * docs: remove sentence about dp * docs: update documentation of `levenshteinDistance` Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * docs: rename initLevenshteinDistanceMatrix and add its documentation * refactor: add fillLevenshteinDistanceMatrix * tests: reorganise tests, remove TestCase type * docs: add edit distance to module-level documentation * docs: mention Wagner–Fischer algorithm * docs: add comment in fillLevenshteinDistanceMatrix * style: reformat fillLevenshteinDistanceMatrix * docs: update DIRECTORY.md --------- Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com> * docs: add missing `#` (#50) * docs: add missing `#` * docs: update DIRECTORY.md * docs: remove ` of a Circle` * docs: update DIRECTORY.md --------- Co-authored-by: dlesnoff <54949944+dlesnoff@users.noreply.github.com> * Fix: Remove unuseful date command * chore: Update checkout version to v4 in build_directory_md * Add error status checking + separate compilation and execution of directory.nim --------- Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr> Co-authored-by: David Leal <halfpacho@gmail.com>
This PR adds a dynamic programming implementation of the Levenshtein distance.