串的数据结构及模式匹配算法（C++）_c++7-4 串的模式匹配-CSDN博客

本文链接：https://blog.csdn.net/Worthwhile_DUT/article/details/121487445

串的数据结构

与栈和队列不同，栈和队列属于操作受限的线性表，而串属于内容受限的线性表，要求只允许存储“字符”；

字符的存储方式也有顺序存储和链式存储结构，但考虑到存储效率，一般采用顺序存储结构，关于顺序存储结构，包括静态开辟和动态开辟在堆区：

#pragma once
#include<iostream>
using namespace std;

const int MAXSIZE = 100;
const int ERROR = 0;
const int OK = 1;


//顺序结构存储
  //以数组的形式静态存储数据
class MyString
{
	char base[MAXSIZE];
	int m_length;
};
  //以数组的形式在堆区开辟一段内存空间
class MyString1
{
	char *base;
	int m_length;
public:
	MyString1();
};
MyString1::MyString1()
{
	base = new char[MAXSIZE];
	this->m_length = 0;
}

链式存储（不常用）

 //每个自定义类型作为一个结点，即结点大小为1的链表
class Snode
{
	char data;
	Snode* next;
};
class Mystring3
{
	Snode* head;//串的头指针
	Snode* tail;//串的尾指针
	int m_length;//串的长度
};
   //n个自定义类型作为一个结点，即结点大小为n的链表
class ChunkNode
{
	char data[10];
	Snode* next;
};
class Mystring4
{
	Snode* head;//串的头指针
	Snode* tail;//串的尾指针
	int m_length;//串的长度
};

针对字符串的基本操作，C++ string类已经提供了多种函数，这里不再赘述。

模式匹配算法

接下来是串的模式匹配算法，包括暴力但效率低的BF算法，和避免回溯的高效的KMP算法：

BF算法：

串的第一次模式匹配示意图

第一步匹配

串的第二次模式匹配示意图

第二步匹配

串的第三次模式匹配示意图

第三步匹配

串模式匹配成功示意图

第四步匹配：成功！

//代码
#include<iostream>
#include<string>
using namespace std;

//BF算法
int Index_BF(const string& s, const string& t, int pos)
{
	unsigned int i = pos-1, j = 0;
	while (i<s.length()&&j<t.length())
	{
		if(s[i]==t[j])
		{i++; j++;}
		else
		{
			j = 0;
			i = i - j + 1;
		}
	}
	if (j == t.length())
		return i - t.length() + 1;
	else
		return 0;
}

int main()
{
	string S = "asdfghjkl";
	string T = "sd";
	int pos = 1;//开始查找的位置，pos=1意味着从第一个位置开始查
	int reslut = 0;
	reslut=Index_BF(S, T, 1);
	cout << "BF算法返回第一个字符的序号：" << reslut << endl;
	system("pause");
	return 0;
}

KMP算法：

改进在于，当对应i,j位置上字符不一样时，i不需要回溯，而是在已得到的部分匹配结果中，寻找最大公共前缀，使j=next[j]，以实现模式串T滑动多格的效果；

至于原理：可去B站搜索，KMP算法找到天勤公开课（易懂）有比较直观的介绍。

#include<iostream>
#include<string>
using namespace std;


//KMP算法
void Get_next(const string& t, int next[])
{
	next[0] = t.length(); next[1] = 0;
	int i = 1, j = 0;//i表示后缀，j表示前缀
	while (i < next[0])
	{
		if (j == 0 || t[i-1] == t[j-1])
		{
			next[++i] = ++j;
		}
		else
		{
			j = next[j];
		}
	}
}

int Index_KMP(const string& s, const string& t, int pos)
{
	int num = t.length();
	int* next = new int[num+1];
	Get_next(t, next);
	for (int k = 1; k < num + 1; k++)
	{
		cout << next[k] << "\t";
	}
	cout << endl;
	int i = pos-1, j =0;
	while ((i<s.length()&&j<t.length())||j==-1)//注意无符号数与有符号树的比较，或出现-1>10的情况
	{
		if(j==-1||s[i]==t[j])
		{i++; j++;}
		else
		{
			j = next[j+1]-1;//移动位数=模式串已匹配的字符数-最长前缀匹配字数
		}
	}
	if (j == t.length())
		return i - t.length()+1;
	else
		return 0;
}


int main()
{ 
	string S = "aaba";
	string T = "aba";
	int reslut=Index_KMP(S, T, 1);
	cout << "结果是：" << reslut << endl;
	system("pause");
	return 0;
}

KMP算法太难理解了，发明的人真是天才啊，由于书上的字符都是从1开始编号，string类是从0开始编号，这里是硬挤出来的加加减减....... （但我再码这段文字的时候，想起来了一种方法，就像excel表中特长的数字，前面加个’，算法可以修改一下，去和书上保持一致）

如下：我加的是‘#’

#include<iostream>
#include<string>
using namespace std;

//KMP算法
void Get_next(const string& t, int next[])
{
	next[0] = t.length(); next[1] = 0;
	int i = 1, j = 0;//i表示后缀，j表示前缀
	while (i < next[0])
	{
		if (j == 0 || t[i] == t[j])
			next[++i] = ++j;
		else
			j = next[j];
	}
}

int Index_KMP(const string& s, const string& t, int pos)
{
	int num = t.length();
	int* next = new int[num+1];
	Get_next(t, next);
	unsigned i = pos, j =1;
	while ((i<s.length()&&j<t.length()))
	{
		if(j==0||s[i]==t[j])
		{i++; j++;}
		else
		{
			j = next[j];//移动位数=模式串已匹配的字符数-最长前缀匹配字数
		}
	}
	if (j == t.length())
		return i - t.length()+1;
	else
		return 0;
}


int main()
{ 
	string S = "#aaba";
	string T = "#aba";
	int reslut=Index_KMP(S, T, 1);
	cout << "结果是：" << reslut << endl;
	system("pause");
	return 0;
}

运行结果：