
How to Convert String Encodings in Java

admin
Backend Development
2025-07-02

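In Java, converting text between encodings takes two steps: decode the incoming bytes with the charset they were actually written in, using new String(bytes, sourceCharset), then encode the resulting string with the charset you need, using String.getBytes(targetCharset). Common charsets include UTF-8 and GBK. The overloads that take the charset name as a String throw the checked UnsupportedEncodingException; the overloads that take a Charset do not, so on Java 7 and later prefer the StandardCharsets constants.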

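Why is transcoding needed?

Java strings are held in memory as Unicode (a sequence of UTF-16 code units), but they must be converted to a byte sequence whenever they cross a boundary to an external system such as a file, a network connection, or a database. If the two sides disagree on the encoding (for example, the platform default is GBK while the data is transmitted as UTF-8), the text turns into mojibake.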

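String text = "中文";
// Platform default charset (e.g. GBK on a Chinese-locale Windows JVM before Java 18)
byte[] bytes = text.getBytes();
// Decoded with a different charset than it was encoded with: mojibake unless the default is UTF-8
String decoded = new String(bytes, StandardCharsets.UTF_8);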
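
To make the mismatch concrete, the following sketch prints the bytes that one character produces under three charsets. The class and method names (EncodingBytes, hex) are made up for this illustration, and it assumes the JDK in use ships the GBK charset, as standard full JDK builds do. The string literal uses a Unicode escape so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingBytes {
    /** Returns the bytes of text in the given charset as space-separated hex. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println(hex(zhong, StandardCharsets.UTF_8));    // E4 B8 AD
        System.out.println(hex(zhong, Charset.forName("GBK")));    // D6 D0
        System.out.println(hex(zhong, StandardCharsets.UTF_16BE)); // 4E 2D
    }
}
```

The same character maps to three different byte sequences, which is why bytes are only meaningful together with the charset that produced them.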

Core transcoding methods

`String.getBytes()` and `new String()`

// String → byte array in a specific encoding
String str = "Hello, 世界!";
byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8); // encode as UTF-8
// Byte array → String (the charset must match the one the bytes were encoded with)
String decodedStr = new String(utf8Bytes, StandardCharsets.UTF_8);

The `Charset` class (recommended)

Charset charset = Charset.forName("GBK");
byte[] gbkBytes = str.getBytes(charset); // encode as GBK bytes
String result = new String(gbkBytes, charset); // restored correctly: same charset on both sides
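
The snippets above encode and decode with the same charset. Actual transcoding of a byte array chains the two steps with different charsets: decode with the source charset, then encode with the target one. A minimal sketch follows; Transcoder and transcode are hypothetical names, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcoder {
    /** Re-encodes input from the source charset into the target charset. */
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source); // bytes -> UTF-16 string
        return text.getBytes(target);            // UTF-16 string -> bytes
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // "Hello, 世界!" written with escapes so the source file encoding does not matter
        byte[] gbkBytes = "Hello, \u4e16\u754c!".getBytes(gbk);
        byte[] utf8Bytes = transcode(gbkBytes, gbk, StandardCharsets.UTF_8);
        System.out.println(gbkBytes.length + " -> " + utf8Bytes.length); // 12 -> 14
    }
}
```

The byte count grows because each of the two Chinese characters takes 2 bytes in GBK and 3 bytes in UTF-8.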

Transcoding with streams

try (InputStreamReader reader = new InputStreamReader(
        new FileInputStream("input.txt"), StandardCharsets.ISO_8859_1); // source file encoding
     OutputStreamWriter writer = new OutputStreamWriter(
        new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) { // target encoding
    int c;
    while ((c = reader.read()) != -1) {
        writer.write(c); // decoded from ISO-8859-1 on read, re-encoded as UTF-8 on write
    }
}
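
For larger files, reading one character at a time through unbuffered streams is slow. The sketch below does the same conversion with buffered NIO readers and writers; FileTranscoder and its parameter names are assumptions made for this example. One behavioral difference worth knowing: Files.newBufferedReader throws an exception on malformed input instead of silently substituting U+FFFD.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileTranscoder {
    /** Copies source to target, decoding with sourceCharset and encoding with targetCharset. */
    static void transcode(Path source, Charset sourceCharset,
                          Path target, Charset targetCharset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, targetCharset)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n); // characters are re-encoded on write
            }
        }
    }
}
```

A call such as FileTranscoder.transcode(Paths.get("input.txt"), StandardCharsets.ISO_8859_1, Paths.get("output.txt"), StandardCharsets.UTF_8) would mirror the stream example above.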

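Key considerations

Specify the encoding explicitly
Avoid relying on the platform default charset (for example, the no-argument getBytes()), and pass constants such as StandardCharsets.UTF_8 instead of charset-name strings.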


// Wrong: depends on the platform default charset
byte[] riskyBytes = "文本".getBytes();
// Right: charset stated explicitly
byte[] safeBytes = "文本".getBytes(StandardCharsets.UTF_8);

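Common mojibake scenarios
- "锟斤拷": undecodable bytes were first replaced with U+FFFD, written out as UTF-8 (EF BF BD), and those bytes were then decoded as GBK.
- "�" (U+FFFD): a byte sequence that is not valid in the charset used for decoding, which the decoder replaces with the replacement character. Characters that cannot be encoded in the target charset usually turn into "?" instead.
- Byte truncation: a multi-byte sequence (as in UTF-8) was cut off mid-character, for example at a buffer boundary.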
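
Two of these scenarios are easy to reproduce. The sketch below is an illustration only: MojibakeDemo and its method names are invented here, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MojibakeDemo {
    private static final Charset GBK = Charset.forName("GBK");

    /** Two U+FFFD characters saved as UTF-8 (EF BF BD EF BF BD), then read as GBK. */
    static String replacementReadAsGbk() {
        byte[] utf8 = "\uFFFD\uFFFD".getBytes(StandardCharsets.UTF_8);
        return new String(utf8, GBK);
    }

    /** A 3-byte UTF-8 character with its last byte cut off, then decoded as UTF-8. */
    static String truncatedUtf8() {
        byte[] full = "\u4e2d".getBytes(StandardCharsets.UTF_8); // E4 B8 AD
        byte[] cut = Arrays.copyOf(full, full.length - 1);       // E4 B8
        return new String(cut, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(replacementReadAsGbk()); // prints 锟斤拷
        System.out.println(truncatedUtf8().contains("\uFFFD")); // prints true
    }
}
```

Once text has reached the "锟斤拷" stage the original bytes are gone, so the fix has to happen at the first faulty decode, not afterwards.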

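Encoding detection tools
If the source encoding is unknown, a third-party library such as ICU4J or juniversalchardet can help guess it. Detection is heuristic, so treat the result as a hint: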

// Detect the encoding with juniversalchardet
byte[] data = Files.readAllBytes(Paths.get("unknown.txt"));
UniversalDetector detector = new UniversalDetector(null);
detector.handleData(data, 0, data.length);
detector.dataEnd();
String encoding = detector.getDetectedCharset(); // likely charset such as "UTF-8", or null if none was detected
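
Before reaching for a detector, a cheap check with the standard library alone is to ask whether the bytes are well-formed UTF-8. The sketch below is not a detector and is not part of juniversalchardet; Utf8Check and isValidUtf8 are names assumed for this example. It uses a decoder configured to report errors instead of replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true if data decodes as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false; // at least one byte sequence is not valid UTF-8
        }
    }
}
```

A true result does not prove the data was written as UTF-8 (pure ASCII, for instance, is valid in many charsets), but a false result reliably rules UTF-8 out.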

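Best practices

Standardize on one internal encoding: use UTF-8 across the project (add -Dfile.encoding=UTF-8 to the JVM startup arguments; on Java 18 and later UTF-8 is already the default).
Declare the encoding explicitly at every external boundary:
- HTTP response header: Content-Type: text/html;charset=UTF-8
- Database connection: append ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URL (these are MySQL Connector/J parameters; other drivers differ)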
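
One way to notice a misconfigured environment early is to check the default charset when the application starts. This is only a sketch of that idea; CharsetStartupCheck is a hypothetical class, and what to do when the check fails (warn, abort) is left to the project.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetStartupCheck {
    /** Returns true if the JVM default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        if (!defaultIsUtf8()) {
            System.err.println("Warning: default charset is not UTF-8; "
                    + "consider starting the JVM with -Dfile.encoding=UTF-8");
        }
    }
}
```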

Handling unmappable characters
Use CodingErrorAction to choose how a decoder or encoder reacts to bad input:

CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
     .onMalformedInput(CodingErrorAction.REPLACE) // replace malformed input with U+FFFD
     .onUnmappableCharacter(CodingErrorAction.IGNORE); // silently drop unmappable characters
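
The same actions apply on the encoding side, where unmappable characters actually occur most often: String.getBytes quietly substitutes "?" for any character the target charset cannot represent. The sketch below, with the assumed names StrictEncoding and encodeStrict, uses CodingErrorAction.REPORT to fail loudly instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictEncoding {
    /** Encodes text, throwing if any character cannot be represented in target. */
    static byte[] encodeStrict(String text, Charset target) throws CharacterCodingException {
        ByteBuffer out = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes); // copy only the bytes that were actually written
        return bytes;
    }
}
```

For example, encodeStrict("abc\u4e2d", StandardCharsets.ISO_8859_1) throws a CharacterCodingException, whereas "abc\u4e2d".getBytes(StandardCharsets.ISO_8859_1) returns the bytes of "abc?" with no warning.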

Complete example: encoding to GBK and decoding it back

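import java.nio.charset.StandardCharsets;

public class EncodingConverter {
    public static void main(String[] args) throws Exception {
        String original = "转码测试";
        // Encode the string as GBK bytes
        byte[] gbkBytes = original.getBytes("GBK");
        // Decode with the same charset to restore the text
        String restored = new String(gbkBytes, "GBK");
        System.out.println(restored); // prints "转码测试"
        // Wrong: GBK bytes decoded as UTF-8
        String wrong = new String(gbkBytes, StandardCharsets.UTF_8);
        // Prints garbled text, mostly U+FFFD replacement characters.
        // Output like "杞爼娴嬭瘯" comes from the reverse mistake: UTF-8 bytes decoded as GBK.
        System.out.println(wrong);
    }
}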