
Two Ways to Fetch a Web Page with HttpClient

admin | Industry News | 2025-04-30

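Apache HttpClient can fetch a web page in two ways: by passing a ResponseHandler to execute, which processes the response and releases its resources automatically, or by handling the CloseableHttpResponse and its HttpEntity manually. The first suits simple cases; the second gives more control over how the body is read and parsed.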

Using a ResponseHandler to process the response automatically

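HttpClient provides the ResponseHandler interface. You can implement it yourself, or use the built-in BasicResponseHandler, which returns the body of a successful response as a string. A custom implementation can convert the response into a string or any other type.
Steps and code example: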


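Create a CloseableHttpClient instance.
Build an HttpGet request object.
Call execute and pass in the ResponseHandler.
The response is processed and its resources are released automatically.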

import org.apache.http.HttpEntity;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
public class HttpClientExample {
    public static void main(String[] args) throws Exception {
        // Create the HttpClient instance; try-with-resources closes it on exit
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Approach one: use a ResponseHandler (recommended)
            String url = "https://www.example.com";
            ResponseHandler<String> responseHandler = response -> {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    // The entity can be null (for example on 204 No Content)
                    HttpEntity entity = response.getEntity();
                    return entity != null ? EntityUtils.toString(entity) : null;
                } else {
                    throw new RuntimeException("HTTP Error: " + status);
                }
            };
            // The client releases the connection after the handler returns
            String content = httpClient.execute(new HttpGet(url), responseHandler);
            System.out.println("Page Content: " + content);
        }
    }
}
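
To see why no explicit cleanup is needed in approach one, it can help to model the callback pattern with plain JDK types. The sketch below is a conceptual illustration only and is not Apache HttpClient's implementation: FakeResponse, Handler, and this execute method are hypothetical names invented for the example. It shows the caller running the handler and then closing the response whether the handler returns or throws.

```java
// Conceptual sketch only: NOT Apache HttpClient's implementation.
// FakeResponse, Handler and execute are hypothetical names.
class HandlerPatternSketch {

    /** Stand-in for an HTTP response that holds a connection. */
    static final class FakeResponse implements AutoCloseable {
        final int status;
        final String body;
        boolean closed = false;

        FakeResponse(int status, String body) {
            this.status = status;
            this.body = body;
        }

        @Override
        public void close() {
            closed = true; // a real client would release the connection here
        }
    }

    /** Callback that turns a response into a result of type T. */
    interface Handler<T> {
        T handle(FakeResponse response);
    }

    /** Runs the handler, then always closes the response, even on failure. */
    static <T> T execute(FakeResponse response, Handler<T> handler) {
        try (FakeResponse r = response) {
            return handler.handle(r);
        }
    }

    public static void main(String[] args) {
        FakeResponse ok = new FakeResponse(200, "<html>hello</html>");
        String content = execute(ok, r -> {
            if (r.status >= 200 && r.status < 300) {
                return r.body;
            }
            throw new IllegalStateException("HTTP Error: " + r.status);
        });
        System.out.println(content + " / closed=" + ok.closed);
    }
}
```

Because cleanup lives inside execute, the calling code never touches the response after the handler finishes, which is the property that makes approach one compact.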

Handling the HttpEntity and input stream manually

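Obtain the HttpEntity from the CloseableHttpResponse, then parse the input stream or entity content yourself.
Steps and code example:

Create a CloseableHttpClient instance.
Build an HttpGet request object.
Call execute to obtain a CloseableHttpResponse.
Extract the HttpEntity manually.
Close the response explicitly to release its resources (the example uses try-with-resources).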

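import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
public class HttpClientExample {
    public static void main(String[] args) throws Exception {
        // Create the HttpClient instance
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Approach two: handle the response manually
            String url = "https://www.example.com";
            HttpGet request = new HttpGet(url);
            // try-with-resources closes the response and releases the connection
            try (CloseableHttpResponse response = httpClient.execute(request)) {
                // Check the status code
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    // Convert the entity content to a string
                    String content = EntityUtils.toString(entity, "UTF-8");
                    System.out.println("Page Content: " + content);
                } else {
                    throw new RuntimeException("HTTP Error: " + status);
                }
            }
        }
    }
}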
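
The manual approach also allows the body to be read incrementally instead of being buffered whole by EntityUtils.toString. The following is a minimal sketch using only the JDK, under stated assumptions: StreamingSketch and countLinesContaining are hypothetical names, and a ByteArrayInputStream stands in for the stream that HttpEntity.getContent() would return in real code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch under assumptions: StreamingSketch and countLinesContaining are
// hypothetical names; the ByteArrayInputStream stands in for the stream
// that HttpEntity.getContent() would return.
class StreamingSketch {

    /** Reads the stream line by line and counts lines containing the keyword. */
    static int countLinesContaining(InputStream in, String keyword) throws IOException {
        int matches = 0;
        // Closing the reader also closes the underlying stream
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(keyword)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html>\n<a href=\"/a\">A</a>\n<p>text</p>\n<a href=\"/b\">B</a>\n</html>";
        InputStream body = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println("Links found: " + countLinesContaining(body, "<a "));
    }
}
```

With a real response, the same helper could be given entity.getContent() inside the try block that holds the CloseableHttpResponse, so that only one line of the page is held in memory at a time.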

Comparison of the two approaches

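Feature	Approach one (ResponseHandler)	Approach two (manual handling)
Code brevity	High (a single call)	Lower (several steps)
Resource management	Automatic (the client releases the connection after the handler returns)	The caller must close the response (try-with-resources in the example)
Flexibility	Lower (processing logic is fixed in the handler)	High (parsing logic can be customized)
Typical use	Quickly fetching simple content	Custom or streaming processing of the body
Exception handling	Must catch `IOException`	Must also handle entity-parsing exceptions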