Golang: the length of each Chinese character in a string

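In Go, if a string contains Chinese characters, you cannot call len on it to count its characters. Go string literals are stored as UTF-8, and len on a string returns the number of bytes it contains, not the number of characters. Common Chinese characters take 3 bytes each in UTF-8 (rarer ones outside the Basic Multilingual Plane take 4). So: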

str1 := "Hello,世界"
fmt.Println(len(str1)) // prints: 12
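To count characters rather than bytes, the standard library's unicode/utf8 package provides RuneCountInString. The sketch below compares it with len; note that a "character" here means a Unicode code point (rune), so combining sequences would count as several. The helper name charCount is hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charCount returns the number of Unicode code points (runes) in s.
// charCount is a hypothetical helper name used only for this illustration.
func charCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	str1 := "Hello,世界"
	fmt.Println(len(str1))         // 12: bytes (6 ASCII bytes + 3 + 3)
	fmt.Println(charCount(str1))   // 8: runes
	fmt.Println(len([]rune(str1))) // 8: same count, but allocates a []rune
}
```

RuneCountInString walks the bytes without allocating, which is why it is usually preferred over converting to []rune just to take its length.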

ASCII (English) characters still take one byte each. The Go blog has an article devoted to strings, https://blog.golang.org/strings, which contains this snippet:

const nihongo = "日本語"
for index, runeValue := range nihongo {
	fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}

Output:

U+65E5 '日' starts at byte position 0
U+672C '本' starts at byte position 3
U+8A9E '語' starts at byte position 6
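The byte positions above step by 3 because each of these characters is 3 bytes long. To get the byte length of every character directly, one option is utf8.DecodeRuneInString, which returns each rune together with its encoded size. This is a minimal sketch; the helper name byteLens is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLens returns the UTF-8 byte length of each character in s.
// byteLens is a hypothetical helper name used only for this illustration.
// Invalid bytes are decoded as utf8.RuneError with size 1, so the
// returned lengths always add up to len(s).
func byteLens(s string) []int {
	var lens []int
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		lens = append(lens, size)
		i += size
	}
	return lens
}

func main() {
	fmt.Println(byteLens("Hello,世界"))   // [1 1 1 1 1 1 3 3]
	fmt.Println(byteLens("\U00020000")) // [4]: a CJK character outside the BMP
}
```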

The %#U format verb prints both the Unicode code point and the character it represents.

Powerful.

As the article puts it: "%#U, which shows the code point's Unicode value and its printed representation."

The other string-handling techniques covered in that article are also powerful, though I rarely need them in day-to-day development.

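This article is licensed under the Creative Commons Attribution 4.0 International License. Unless marked as reposted or attributed to another source, articles on this site are original or translated; please credit the source before reposting. Last edited: 2020/11/20 14:26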