Skip to content

Conversation

@vil02
Copy link
Member

@vil02 vil02 commented Jun 25, 2023

This PR adds a dynamic programming implementation of the Levenshtein distance.

@vil02 vil02 requested a review from dlesnoff as a code owner June 25, 2023 20:18
Copy link
Contributor

@ZoomRmc ZoomRmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

Some general overview of the algorithm an a basic use-case in a runnableExamples block would be nice (for example, just selecting a suitable string from a "dictionary" sequence as a substitution for a string with a typo).

Copy link
Collaborator

@dlesnoff dlesnoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice implementation! At certain places, I would like more comments.

You can consider adding the two-vector solution to the Levenshtein distance.
In any case, Levenshtein should be put in regard to other metrics. Hamming distance is at the basis of linear code theory and operates only by substitution. We might also add in another module in the future the Longest Common Substring sequence, which on the contrary, operates by deletion and insertion.
The Edit Distance is more general and gives weights to the three operations.

@vil02
Copy link
Member Author

vil02 commented Jun 26, 2023

@dlesnoff @ZoomRmc thanks for your feedback. I need some time to fully digest it.

@dlesnoff
Copy link
Collaborator

Take your time!

@vil02 vil02 marked this pull request as draft June 27, 2023 19:58
@vil02 vil02 marked this pull request as ready for review June 27, 2023 21:24
@vil02 vil02 force-pushed the add_levenshtein_distance branch from 1c127df to b306027 Compare June 27, 2023 21:25
@vil02
Copy link
Member Author

vil02 commented Jun 27, 2023

You can consider adding the two-vector solution to the Levenshtein distance.

I have been thinking about that. My reasoning for not doing that is the following:

  • this repository is for educational proposes,
  • when someone would read this code and think "wait a second, actually only two rows are needed - one can save lots of space!" - that would have a very high educational value,
  • having levenshteinDistanceRecursive, levenshteinDistanceFullMatrix and levenshteinDistanceTwoRows is an overkill in my opinion (and quite cumbersome to insert into this project because of its structure).

@vil02 vil02 requested review from ZoomRmc and dlesnoff June 27, 2023 21:45
Copy link
Contributor

@ZoomRmc ZoomRmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Regarding other, more optimized solutions: it's absolutely fine to have only the basic one and leave the room for improvement in later PRs, possibly from new contributors. If it happens, we can always come to a decision where exactly to put them.

@vil02 vil02 requested a review from ZoomRmc June 28, 2023 08:09
Copy link
Contributor

@ZoomRmc ZoomRmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by this point it's ready to merge!

Copy link
Collaborator

@dlesnoff dlesnoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must decide between the Key type and the proc overloading solution but I need some additional time for this. I let @ZoomRmc decide on this if I do not react quickly which is highly probable.
I am happy with the changes but require some documentation improvements.
Dropping the more subtle two-vector implementations is fine by me.

@dlesnoff
Copy link
Collaborator

When someone would read this code and think: "Wait a second, actually only two rows are needed - one can save lots of space!" - that would have a very high educational value,

He would have then reinvented the wheel. No, most people would not spot this. You are right. Restraining information teaches less and educates more. Should we teach or educate?

I am more in favour of teaching. One must know a similar example to spot this kind of space savings. He has to be already knowledgeable.

Anyway, this is more suitable for another PR. I think that the implementation is neat and fine as is. I just want:

  • the small documentation changes.

  • the DIRECTORY.md file obtained with this command:

nim r .scripts/directory.nim > DIRECTORY.md
  • A last check with nimpretty

@vil02 vil02 requested a review from Panquesito7 as a code owner June 28, 2023 10:52
@vil02
Copy link
Member Author

vil02 commented Jun 28, 2023

As requested:

@vil02 vil02 requested a review from dlesnoff June 28, 2023 11:04
Copy link
Collaborator

@dlesnoff dlesnoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this excellent contribution!
We had some fruitful conversations on different approaches, and I appreciate having some opposite views.

@dlesnoff dlesnoff merged commit fffbb33 into TheAlgorithms:main Jun 28, 2023
@vil02
Copy link
Member Author

vil02 commented Jun 28, 2023

@dlesnoff @ZoomRmc thanks again for the review and fruitful discussions!

@vil02 vil02 deleted the add_levenshtein_distance branch June 28, 2023 13:51
dlesnoff pushed a commit to dlesnoff/Nim-Algorithms that referenced this pull request Jul 4, 2023
* feat: add Levenshtein distance

* Apply suggestions from code review

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* refactor: use toKey in initSubsolutions

* style: use .Natural

* style: remove blank line

* refactor: add Key type

* style: use descriptive variable names

* style: use a, b instead of strA, strB

* docs: add note about complexity

* refator: add name to TestCase

* docs: remove sentence about dp

* docs: update documentation of `levenshteinDistance`

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* docs: rename initLevenshteinDistanceMatrix and add its documentation

* refactor: add fillLevenshteinDistanceMatrix

* tests: reorganise tests, remove TestCase type

* docs: add edit distance to module-level documentation

* docs: mention Wagner–Fischer algorithm

* docs: add comment in fillLevenshteinDistanceMatrix

* style: reformat fillLevenshteinDistanceMatrix

* docs: update DIRECTORY.md

---------

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>
dlesnoff added a commit that referenced this pull request Sep 8, 2023
* Add a .github action to build DIRECTORY.md

* Update gh action build_directory_md

* Add ignored paths and git credentials

* chore: minor Gitpod improvements (#34)

Add Gitpod badge and prebuilds for all branches and forks

* Add binary search (#37)

* Add binary search

* Update binary search: add recursive version

* Update binary search: better description

* Update binary search: update runnableExamples

* Update binary search: improve tests, add overload for recursive func

* Update binary search: evaluate if-expression into result

* grammar

* Update binary search: improve description, add comments

* Update binary search: add missing test cases

* Update binary search: add info about pivot choice in description, rearrange it

* Update binary search: change int to Natural

* Update binary search: if block -> guard clause

* Update binary search: test cases

* Update binary search: simplify docs

* Update binary search: nimpretty

* Apply suggestions from code review

Co-authored-by: David Leal <halfpacho@gmail.com>

* Replace by an attempt of github pr create

* Fix branching and environment variable in the PR

Co-authored-by: David Leal <halfpacho@gmail.com>

* Create and populate the Maths directory (#4)

* Add a sample of maths basic algorithms

* Update maths/abs.nim

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* Use openArray in absMaxSort

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* Fix seq->openArray and int->SomeInteger

* Use Positive as input instead

* Update maths/addition_without_arithmetic.nim

Co-authored-by: Pietro Peterlongo <pietro.peterlongo@gmail.com>

* Name allocation number

* [maths/abs] Replace maxAbsSort by signed[Min/Max]Abs

* [Aliquot Sum] Add header

* Add empty line at end of file

* Remove Allocation number since it is a non standard algorithm

* Fix titles

* Run nimpretty on the files

* Rename file and add DIRECTORY.md



---------

Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr>
Co-authored-by: David Leal <halfpacho@gmail.com>

* Add arc length (#10)

* Fix linear search (#33)

* Use a dynamic allocated array (sequence) for strings

* Run nimpretty

* Add requested changes

* Fix documentation generation

* Fix test title according to changes

Co-authored-by: Satin Wuker <74630829+SatinWuker@users.noreply.github.com>

* Update comments to warn about indexing issues

Modified a bit the suggestions of comments. Please check them before approving.

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

---------

Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr>
Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* chore: run `apt-get update` before running other commands (#41)

This will run `sudo apt-get update` before running any other commands on the Gitpod prebuild image (`.gitpod.dockerfile`).

* Readme: fix gitter link, add Nim logo, formatting fixes (#42)

* Update README.md

Use more extended markdown syntax instead of HTML,
Add Matrix links
Superseed PR #40

* Actually fix gitter! Move img src links to footer.

Also changed the TheAlgorithms logo image link.

* Restore TheAlgorithms logo link

* More link fixes

---------

Co-authored-by: dlesnoff <54949944+dlesnoff@users.noreply.github.com>

* fix: nimpretty for `arc_length.nim` (#43)

* chore: `gc`->`mm`, move back to `config.nims` (#47)

* chore: revert #41 (#45)

* Update codeowners

* Fix: CONTRIBUTING.md links (#49)

Uppercase `directory.md` to `DIRECTORY.md`

* feat: add Levenshtein distance (#46)

* feat: add Levenshtein distance

* Apply suggestions from code review

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* refactor: use toKey in initSubsolutions

* style: use .Natural

* style: remove blank line

* refactor: add Key type

* style: use descriptive variable names

* style: use a, b instead of strA, strB

* docs: add note about complexity

* refator: add name to TestCase

* docs: remove sentence about dp

* docs: update documentation of `levenshteinDistance`

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* docs: rename initLevenshteinDistanceMatrix and add its documentation

* refactor: add fillLevenshteinDistanceMatrix

* tests: reorganise tests, remove TestCase type

* docs: add edit distance to module-level documentation

* docs: mention Wagner–Fischer algorithm

* docs: add comment in fillLevenshteinDistanceMatrix

* style: reformat fillLevenshteinDistanceMatrix

* docs: update DIRECTORY.md

---------

Co-authored-by: Zoom <ZoomRmc@users.noreply.github.com>

* docs: add missing `#` (#50)

* docs: add missing `#`

* docs: update DIRECTORY.md

* docs: remove ` of a Circle`

* docs: update DIRECTORY.md

---------

Co-authored-by: dlesnoff <54949944+dlesnoff@users.noreply.github.com>

* Fix: Remove unuseful date command

* chore: Update checkout version to v4 in build_directory_md

* Add error status checking + separate compilation and execution of directory.nim

---------

Co-authored-by: Dimitri LESNOFF <dimitri.lesnoff@lip6.fr>
Co-authored-by: David Leal <halfpacho@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants