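Boost Library: Feature Overview - Common String Operations - Encoding Conversion - Formatting to Strings - Splitting and Joining - Type Conversion - Replacing, Erasing, and Trimming Characters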

Article link: https://blog.csdn.net/m0_67316550/article/details/124855688


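Standard C++ provides basic string handling but has no direct wrappers for a number of common tasks. This article covers the string facilities in Boost from five angles: encoding conversion, formatting to strings, splitting and joining, type conversion, and replacing, erasing, and trimming characters.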

1. Encoding conversion

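A computer has to represent different scripts (for example English letters and Chinese characters) as well as the individual characters within a script (for example 中 and 国). To do so, each character is assigned a number under an agreed set of rules, so that different devices describe the same character with the same number and can decode that number back into text. The encoding names that may come up are listed below; these are the names you pass as strings in code. Which of them are actually accepted depends on the backend Boost.Locale was built with.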
European languages
ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16}, KOI8-R, KOI8-U, KOI8-RU, CP{1250,1251,1252,1253,1254,1257}, CP{850,866,1131}, Mac{Roman,CentralEurope,Iceland,Croatian,Romania}, Mac{Cyrillic,Ukraine,Greek,Turkish}, Macintosh
Semitic languages
ISO-8859-{6,8}, CP{1255,1256}, CP862, Mac{Hebrew,Arabic}
Japanese
EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1, ISO-2022-JP-MS
Chinese
EUC-CN, HZ, GBK, CP936, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS, BIG5-HKSCS:2004, BIG5-HKSCS:2001, BIG5-HKSCS:1999, ISO-2022-CN, ISO-2022-CN-EXT
Korean
EUC-KR, CP949, ISO-2022-KR, JOHAB
Armenian
ARMSCII-8
Georgian
Georgian-Academy, Georgian-PS
Tajik
KOI8-T
Kazakh
PT154, RK1048
Thai
ISO-8859-11, TIS-620, CP874, MacThai
Laotian
MuleLao-1, CP1133
Vietnamese
VISCII, TCVN, CP1258
Platform specifics
HP-ROMAN8, NEXTSTEP
Full Unicode
UTF-8
UCS-2, UCS-2BE, UCS-2LE
UCS-4, UCS-4BE, UCS-4LE
UTF-16, UTF-16BE, UTF-16LE
UTF-32, UTF-32BE, UTF-32LE
UTF-7
C99, JAVA
Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
UCS-2-INTERNAL, UCS-4-INTERNAL
Locale dependent, in terms of char or wchar_t (with machine dependent endianness and alignment, and with OS and locale dependent semantics)
char, wchar_t
The four most commonly used are UTF-8, GB2312, GBK, and UTF-16LE.
Sample conversion functions are shown below:

#include <boost/locale.hpp>
#include <string>
#include <boost/locale/encoding.hpp>

std::string UTF8toGBK(const std::string & str)
{
    return boost::locale::conv::between(str, "GBK", "UTF-8");
}

std::string GBKtoUTF8(const std::string & str)
{
    return boost::locale::conv::between(str, "UTF-8", "GBK");
}

std::wstring GBKtoUNICODE(const std::string & str)
{
    return boost::locale::conv::to_utf<wchar_t>(str, "GBK");
}

std::string UNICODEtoGBK(const std::wstring& wstr)
{
    return boost::locale::conv::from_utf(wstr, "GBK");
}

std::string UNICODEtoUTF8(const std::wstring& wstr)
{
    return boost::locale::conv::from_utf(wstr, "UTF-8");
}

std::wstring UTF8toUNICODE(const std::string & str)
{
    return boost::locale::conv::utf_to_utf<wchar_t>(str);
}

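int main()
{
	// Note: the bytes of this literal depend on the compiler's execution character set; the call below assumes they are GB2312.
	std::string source = "18928899728-广州知了软件有限公司";
	// Signature: std::string between(std::string const &text, std::string const &to_encoding, std::string const &from_encoding, method_type how = default_method);
	std::string s = boost::locale::conv::between(source, "UTF-8", "GB2312"); // target encoding name first, then source encoding name
	return 0;
}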

2. Formatting to a string

boost::format( "format-string ") % arg1 % arg2 % … % argN ;
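Note that no named object is created here: format-string is the string to be formatted, and the arguments are fed to it one by one through the overloaded % operator.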

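// In the format string, %N% is a positional placeholder: %1% is the first, %2% the second, and so on. Each % "xxx" that follows supplies the argument for the next placeholder. For example:
boost::format fmt("%1% \n %2% \n %3%");
fmt % "first" % "second" % "third";
std::string st = fmt.str(); // st == "first \n second \n third"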

Here is a simple example of numeric formatting:

#include <iostream>
#include <boost/format.hpp>
int main()
{
	// Format floating-point values, and zero-pad an integer to 6 digits
	std::cout << boost::format("%8f - %.2f%% - %06d") % 10.0 % 12.5 % 10 << std::endl;
	// Output: 10.000000 - 12.50% - 000010
	return 0;
}

3. Splitting and joining strings

Splitting is done with the split function and joining with the join function. Sample code:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/join.hpp>

// Predicate used as a filter: keep only non-empty strings
bool is_not_empty(const std::wstring& str)
{
	return !str.empty();
}

int main()
{
	// Split the string
	std::wstring strTag = L"广 州 知 了 软 件 有 限 公 司";
	std::vector<std::wstring> items;
	boost::split(items, strTag, boost::is_any_of(L" ,")); // boost::is_any_of defines the set of delimiter characters

	// Join the pieces back together
	std::wstring join_str1 = boost::join(items, L"-");
	// Result: 广-州-知-了-软-件-有-限-公-司
	std::wstring join_str2 = boost::join_if(items, L"+", is_not_empty);
	// Result: 广+州+知+了+软+件+有+限+公+司
	return 0;
}

4. Type conversion

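In Boost, type conversion is done with the lexical_cast function template. The input (arg) must convert in its entirety; otherwise a bad_lexical_cast exception is thrown. For example: int i = boost::lexical_cast<int>("123.456"); // this will throw. Sample code: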

#include <iostream>
#include <string>
#include <boost/lexical_cast.hpp>
int main()
{
	try
	{
		// String to number
		int num1 = boost::lexical_cast<int>("109");
		double num2 = boost::lexical_cast<double>("1089.6");
		// Number to string
		std::string str1 = boost::lexical_cast<std::string>(980);
		std::string str2 = boost::lexical_cast<std::string>(980.89);
	}
	catch (const boost::bad_lexical_cast& e)
	{
		// Reached only if one of the conversions above fails
		std::cout << e.what() << std::endl;
	}
	return 0;
}

5. Replacing, erasing, and trimming characters

#include <string>
#include <boost/algorithm/string.hpp>

int main()
{
	// replace_all(), replace_first(), replace_last() and their variants
	std::string rlt1 = boost::algorithm::replace_all_copy(std::string("hello world"), "l", "-");
	// Result: he--o wor-d
	// erase_all(), erase_first(), erase_last() and their variants
	std::string rlt2 = boost::algorithm::erase_all_copy(std::string("hello world"), "l");
	// Result: heo word
	// The trimming functions are trim(), trim_left(), trim_right() and their xxx_copy and xxx_if versions. They remove whitespace from the ends of a string:
	std::string rlt3 = boost::algorithm::trim_copy(std::string(" hello world"));
	// Result: hello world
	// Trimming is not limited to whitespace:
	std::string rlt4 = boost::algorithm::trim_copy_if(std::string(",,,hello world"), boost::algorithm::is_any_of(" ,.:"));
	// Result: hello world
	return 0;
}