
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Find Longest Common Substring Using Binary Search and Rolling Hash
In this problem, we will find the longest common substring using the binary search and rolling hash algorithm.
The binary search is an efficient technique for searching values in the sorted array. Here, we will use it to find the maximum length of the common substring. The rolling hash algorithm is used to calculate the hash value for the string. Also, it calculates the hash value for the next substring from the previous substring's hash value when we add or remove a character from the previous substring.
Problem statement ? We have given two strings named the first and second. We need to find the longest common substring using the binary search and rolling hash algorithm.
Sample examples
Input
first = "qwertdfghj"; second = "aqwertdfj";
Output
7
Explanation ? The longest common substring is ?qwertdfj'.
Input
first = "abcd"; second = "pqrs";
Output
0
Explanation ? There is no common substring. So, it prints 0.
Input
first = "mnokjl"; second = "mnokjl";
Output
6
Explanation ? Both strings are same. So, it prints the length of the string.
Approach 1
In this approach, we will take the 0 as a minimum value and the string's length as a maximum value for the binary search technique. We will take the middle length and find the rolling hash value for each substring of the middle length of the first string.
Next, we will find the rolling hash value of each substring of the middle length of the second string. If the hash value matches any hash value of the substring of the first string, we found the common substring of the middle length. So, we need to find the maximum length of a common substring in the right part. Otherwise, we need to find the maximum length in the left part.
Algorithm
Step 1 ? Initialize the ?mini' with 0 and ?maxi' with the maximum length of the first and second string's length, representing the minimum and maximum range of the binary search technique.
Step 2 ? Make traversals using the while loop until the value of ?mini' is less than a ?maxi'.
Step 3? Take the middle length using the ?mini' and ?maxi' values. Also, initialize the ?isPresent' boolean value with false to track whether the middle length's common substring is present.
Step 4 ? Initialize the set named ?hash values' to store the hash value of the substrings of the first string.
Step 5 ? Take each substring of the middle length of the first string, calculate the hash value, and insert it into the ?hashValues' set.
Step 5.1 ? In the calculate function, ?val' with 0 to store the hash value of the previous substring, and ?powVal' to store the power value for each iteration.
Step 5.2 ? Start traversing the string, and take a character value between 1 to 26.
Step 5.3? Multiply the ?powVal' with a character value, add it to the ?val', take modulo with the modulus, and store it back to the ?val' variable.
Step 5.4 ? Multiply the ?powVal' with 26 and take its modulo with the ?modulus'.
Step 5.5 ? Return ?val' at the end of the calculate() function.
Step 6 ? Next, traverse the second string to take each substring of the middle length and calculate the hash value.
Step 7 ? If a hash value related to any substring of the second string exists in the ?hashValues' set, update the ?isPresent' to true, and break the loop.
Step 8 ? If ?isPresent' is true, update ?mini' with ?middle'. Otherwise, update the ?maxi' with the middle - 1.
Step 9 ? At last, return the value of the ?mini' variable.
Example
#include <iostream> #include <unordered_set> #include <cmath> using namespace std; long long modulus = 1000000007; // Calculate the hash value for the given string long long calculate(string str) { long long val = 0; long long powVal = 1; // Traverse each character of the string for (int p = 0; p < str.length(); p++) { // Get character value between 1 to 26 long long ch = str[p] - 'A' + 1; // Calculate hash value val = (val + ch * powVal) % ::modulus; powVal = (powVal * 26) % ::modulus; } return val; } int maxSubStr(string first, string second) { int mini = 0, maxi = min(first.length(), second.length()); // Binary search iterations until mini is less than the maxi while (mini < maxi) { // Get middle value int middle = (mini + maxi + 1) / 2; bool isPresent = false; // To store hash values of substring unordered_set<long long> hashValues; for (int p = 0; p + middle <= first.length(); p++) { // Calculating hash value for each substring of the middle length of first string long long hashVal = calculate(first.substr(p, middle)); // Insert to map hashValues.insert(hashVal); } for (int p = 0; p + middle <= second.length(); p++) { long long hashVal = calculate(second.substr(p, middle)); // If the hash value exists in the map, a common substring of length K exists. if (hashValues.count(hashVal)) { isPresent = true; break; } } // Update the pointer according to whether a string of middle length is present or not if (isPresent) { mini = middle; } else { maxi = middle - 1; } } return mini; } int main() { string first = "mnokjl"; string second = "mnokjl"; cout << "The length of the longest common substring is " << maxSubStr(first, second) << endl; return 0; }
Output
The length of the longest common substring is 6
Time complexity ? O(log(min(N,M)) * (O(N*N) + O(M*M)), where O(log(min(N,M)) is for binary search, and O(N*N) + O(M*M) is for calculating the rolling hash of each substring.
Space complexity ? O(N + M) to store the hash value for each substring.
We learned to use the rolling hash and binary search technique to find the longest common substring's length. It is an efficient approach than the recursive approach of finding the maximum length of the common substring. However, the dynamic programming approach is more efficient than the above approach.