fix(community): OG fetch SSRF hardening + transaction/module decoupling wrap-up #20
Merged
longsizhuo merged 17 commits into main on Apr 25, 2026
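- Add PrivateAddressGuard: after InetAddress.getAllByName, check each resolved IP for loopback / RFC1918 / link-local / CGNAT / multicast / ULA; a DNS resolution failure fails closed.
- OgFetchService's HttpClient now uses Redirect.NEVER and handles 3xx in its own loop (at most 3 hops). Each hop re-resolves the host and goes through PrivateAddressGuard again, so a 302 cannot send us to a metadata endpoint such as 169.254.169.254.
- Add OgFetchServiceSsrfTest: 127.0.0.1 / 10.0.0.1 are blocked outright; a public host that 302s to 169.254.169.254 is also blocked on the second hop; a normal public host takes the 200 OK path.
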
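The per-IP check described above could look roughly like the following. This is a minimal sketch, not the PR's actual PrivateAddressGuard: the class name `AddressGuardSketch` and both method bodies are assumptions built only from the ranges the commit lists.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of a private-address check; names are illustrative only. */
final class AddressGuardSketch {

    /** True when the address falls in one of the ranges the commit lists as blocked. */
    static boolean isBlockedAddress(InetAddress addr) {
        // JDK built-ins: loopback, wildcard (0.0.0.0 / ::), link-local,
        // site-local (RFC1918 for IPv4) and multicast.
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()
                || addr.isLinkLocalAddress() || addr.isSiteLocalAddress()
                || addr.isMulticastAddress()) {
            return true;
        }
        byte[] b = addr.getAddress();
        if (addr instanceof Inet4Address) {
            // 100.64.0.0/10: CGNAT shared address space (RFC 6598).
            return (b[0] & 0xff) == 100 && (b[1] & 0xc0) == 0x40;
        }
        if (addr instanceof Inet6Address) {
            // fc00::/7: Unique Local Address.
            return (b[0] & 0xfe) == 0xfc;
        }
        return false;
    }

    /** Fail-closed host check: a resolution failure counts as blocked. */
    static boolean isBlockedHost(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (isBlockedAddress(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return true;
        }
    }
}
```

Passing an IP literal such as `isBlockedHost("10.0.0.1")` exercises the range checks without a DNS query, because `InetAddress.getAllByName` parses literals locally.
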
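printStackTrace writes only to stderr, carries no trace id, and cannot be collected by Loki, so in production the error is effectively invisible. Replace it with log.error("未处理的异常", exception) (the message means "unhandled exception") so the stack trace goes through the unified logging pipeline.
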
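- Add @Transactional to all four write paths (submit / submitInternal / report / enrich). submit's "rate-limit count read + insert" must run in one transaction, otherwise concurrent requests can get past the daily quota; report's three steps ("insert into reports + increment the link's report_count + a possible transitionStatus") also need to land atomically, otherwise count and status drift apart.
- Read methods (findById / buildAdminSummary / listApproved / listBySubmitter / listPendingForAdmin) become @Transactional(readOnly=true), which tells the driver not to allocate an xid for the transaction and also blocks accidental writes.
- Deliberately no transaction on @Async methods such as SharedLinkEnrichmentWorker.enrich: the transaction lives in SharedLinkService.enrich, which it calls, so the boundary is narrower. This avoids an async thread holding a transactional connection across two long external I/O calls, the OG fetch and the DeepSeek call.
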
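The community module should not inject usercenter's UserAccountRepository directly; cross-module access goes through the service layer only. Change the bridge account lookup from userRepo.findByUsername to UserCenterService.findByUsername (the method already exists) and drop the repository-layer import.
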
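BodyHandlers.ofString() has no size limit, so within the 10s timeout a malicious public host can exhaust the JVM heap with an endless chunked stream. Switch to ofInputStream() and count bytes while reading:
- Add MAX_BODY_BYTES=2MB (OG meta tags all sit in <head>; 2MB is far more than a normal site needs)
- readBodyCapped uses a ByteArrayOutputStream with 8KB chunks; as soon as the running total exceeds the cap it closes the stream and returns exceededLimit=true, consuming no further bytes
- Extract the charset from Content-Type (WeChat Official Accounts / Zhihu are UTF-8; the few GBK sites follow their actual label); fall back to UTF-8 when the label is missing or unrecognized
- Bodies of 3xx / non-2xx responses are discarded with drainAndClose so the connection does not get stuck in the keep-alive pool
- The stub HttpClient tests switch to a ByteArrayInputStream body accordingly
- Add OgFetchServiceSsrfTest#fetch_bodyExceedsMaxSize_returnsFailure
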
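To make the capped read concrete, here is a minimal sketch under stated assumptions: the `CappedBody` record, the class name, and the choice to wrap IOException in UncheckedIOException are illustrative, not taken from the PR. Only the 8KB chunk size, the 2MB cap, and the `exceededLimit` flag come from the commit message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a size-capped body read; not the PR's actual code. */
final class CappedBodyReader {
    static final int MAX_BODY_BYTES = 2 * 1024 * 1024;

    /** Illustrative result type: the bytes read, or an empty array plus the flag. */
    record CappedBody(byte[] bytes, boolean exceededLimit) {}

    static CappedBody readBodyCapped(InputStream in, int maxBytes) {
        // try-with-resources closes the stream on every exit path,
        // including the early return when the cap is exceeded.
        try (in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8 * 1024];
            int total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                // Count the bytes actually read, not the chunk capacity.
                total += n;
                if (total > maxBytes) {
                    return new CappedBody(new byte[0], true);
                }
                out.write(chunk, 0, n);
            }
            return new CappedBody(out.toByteArray(), false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass the `InputStream` obtained from an `ofInputStream()` response body together with `MAX_BODY_BYTES`, and treat `exceededLimit()` as a fetch failure.
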
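This is a cross-module SSRF baseline defense; future modules that issue requests to user-controlled URLs (analytics / zotero / github proxy) will all need to reuse it. Keeping it under community/util is misleading, since it is not community-specific.
- Move: community/util/PrivateAddressGuard → common/security/PrivateAddressGuard
- Update the package declaration and the import in OgFetchService
- No logic change
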
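Spring rolls back only on unchecked exceptions by default. If someone later adds a checked exception (IOException and the like) to submit / submitInternal / report / enrich, this change keeps half-written data from being silently committed. Read-only methods stay as they are: a read-only transaction has no rollback policy to worry about.
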
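Previously, when URI.resolve threw IllegalArgumentException on a malformed Location, the exception bubbled up to the outer catch(Exception) and was disguised as "解析异常: ..." (parse exception), which hid the root cause during troubleshooting. It is now caught in place inside the loop, returning failure("redirect target invalid: <msg>") and logging the raw Location with log.warn so downstream alerting can recognize it. The new test SsrfTest#fetch_redirectWithGarbageLocation_returnsStructuredFailure covers this path.
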
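The in-place catch can be sketched as follows. This is an illustration only: the `Resolved` record, the class name, and the "missing Location" message are hypothetical, while the "redirect target invalid: " prefix is the one quoted in the commit message.

```java
import java.net.URI;

/** Hypothetical sketch of resolving a redirect Location into a structured result. */
final class RedirectTargetSketch {

    /** Illustrative result: either a resolved target or an error message. */
    record Resolved(URI target, String error) {
        boolean ok() {
            return error == null;
        }
    }

    static Resolved resolveLocation(URI current, String location) {
        if (location == null || location.isBlank()) {
            // Assumption: a missing header is reported with the same prefix.
            return new Resolved(null, "redirect target invalid: missing Location");
        }
        try {
            // URI.resolve(String) parses via URI.create, which throws
            // IllegalArgumentException on a malformed string.
            return new Resolved(current.resolve(location), null);
        } catch (IllegalArgumentException e) {
            return new Resolved(null, "redirect target invalid: " + e.getMessage());
        }
    }
}
```

A successfully resolved target would still need to pass the address guard before the next hop is sent; this sketch covers only the parsing step.
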
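PrivateAddressGuard's IP resolution and the DNS resolution HttpClient performs before connecting are two independent lookups; an attacker domain with a low TTL can flip its A record from a public IP to 169.254.169.254 within that window. Closing this fully requires switching to HC5 / OkHttp with a custom DnsResolver that pins the IP, which is a follow-up engineering item. A note in the Javadoc leaves a breadcrumb for whoever picks this up.
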
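The original drained += sink.length counted a real 2KB read as 8KB at end of stream or on a short read. This is functionally harmless (the loop only breaks earlier than expected), but the code was lying and would lead a later maintainer to misjudge the behavior boundary. Changed to capture n and add that instead.
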
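The original code called indexOf on lower and then substring on the original contentType. For ASCII input, "Charset=" and "charset=" are both 8 characters long, so the offsets line up by coincidence; but this is fragile logic that depends on locale-less ASCII lowercasing preserving length. If someone swaps toLowerCase for a locale-aware version, or inserts a trim in between, the read shifts by one byte and yields a bogus charset name. Charset.forName is itself case-insensitive about charset names (gbk / utf-8 are both accepted), so taking the substring from lower and using the lowercase charset name is entirely sufficient and less error-prone.
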
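A minimal sketch of the revised extraction, assuming a helper named `charsetFrom` (hypothetical) and a UTF-8 fallback as described in the earlier commit. Both the search and the substring operate on the same lowercased string, so the offsets cannot drift apart.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch of charset extraction from a Content-Type header. */
final class CharsetSketch {
    private static final String KEY = "charset=";

    static Charset charsetFrom(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        String lower = contentType.toLowerCase(Locale.ROOT);
        int idx = lower.indexOf(KEY);
        if (idx < 0) {
            return StandardCharsets.UTF_8;
        }
        // Index and substring use the same string, so offsets always agree.
        String name = lower.substring(idx + KEY.length());
        int semi = name.indexOf(';');
        if (semi >= 0) {
            name = name.substring(0, semi);
        }
        name = name.replace("\"", "").trim();
        try {
            // Charset.forName matches names case-insensitively.
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }
}
```

For example, `charsetFrom("text/html; Charset=ISO-8859-1")` resolves through the lowercase name `iso-8859-1`, and an unknown label falls back to UTF-8.
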
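Switching to HC5 / OkHttp with a custom DnsResolver is not enough on its own: the resolved IP also has to be fed straight into the socket connect, with the Host header and SNI carrying the original domain. OS-level nscd / systemd-resolved and the JVM's networkaddress.cache.ttl can each still leave a millisecond-scale residual window. This detail goes into the Javadoc so that a later maintainer does not swap the HttpClient and assume the issue is fully fixed.
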
Pull request overview
This PR hardens the community OG fetch pipeline against SSRF (including redirect re-checks and response size caps), improves operational error logging, and finishes transaction boundary + module decoupling work in SharedLinkService.
Changes:
- Add PrivateAddressGuard and integrate it into OgFetchService with manual redirect handling, capped streaming body reads, and improved failure modes.
- Add/adjust tests for OG fetching behavior under SSRF/redirect/oversize/malformed inputs.
- Add transactional annotations to SharedLinkService methods and replace direct UserAccountRepository access with UserCenterService.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/test/java/com/involutionhell/backend/community/service/OgFetchServiceTests.java | Updates stubs to match ofInputStream() body handling and charset parsing path. |
| src/test/java/com/involutionhell/backend/community/service/OgFetchServiceSsrfTest.java | New SSRF-focused test coverage for pre-block, redirect re-check, oversize body, and malformed Location. |
| src/main/java/com/involutionhell/backend/community/service/SharedLinkService.java | Adds transactional boundaries + swaps usercenter access to UserCenterService facade. |
| src/main/java/com/involutionhell/backend/community/service/OgFetchService.java | Implements SSRF guard, manual redirects, streaming body reads with size cap, and structured redirect failures. |
| src/main/java/com/involutionhell/backend/common/security/PrivateAddressGuard.java | New utility for blocking private/unsafe IP ranges via DNS/IP inspection. |
| src/main/java/com/involutionhell/backend/common/error/GlobalExceptionHandler.java | Replaces printStackTrace with structured SLF4J logging. |
Comment on lines +84 to +93

    } else if (addr instanceof Inet6Address v6) {
        byte[] b = v6.getAddress();
        int first = b[0] & 0xff;

        // fc00::/7: Unique Local Address; the JDK has no isUniqueLocal()
        if ((first & 0xfe) == 0xfc) return true;

        // fe80::/10: link-local; the JDK check already covers it, repeated here for safety
        if (first == 0xfe && (b[1] & 0xc0) == 0x80) return true;
    }

Comment on lines +69 to +83

    void fetch_redirectToLinkLocalMetadata_blockedOnSecondHop() {
        // A 200 on the first hop from a public host is the abnormal case; we use
        // 302 → 169.254.169.254 (AWS/GCP metadata endpoint, link-local).
        // The second hop should be blocked at the PrivateAddressGuard stage.
        ScriptedHttpClient client = new ScriptedHttpClient(List.of(
                ScriptedResponse.redirect(302, "http://169.254.169.254/latest/meta-data/")
        ));
        OgFetchService service = new OgFetchService(client);

        OgFetchResult result = service.fetch("https://example.com/og-test");

        assertThat(result.isSuccess()).isFalse();
        assertThat(result.errorMessage()).isEqualTo("blocked internal host");
        // The first hop is sent (and gets the 302); the second hop's host is
        // blocked after resolution, so there is only one HTTP call
        assertThat(client.sentRequests).hasSize(1);
    }

Comment on lines +86 to +146

    void fetch_publicHost200_parsesOgMeta() {
        String html = """
                <html><head>
                <meta property="og:title" content="公共站点 OK" />
                <meta property="og:description" content="stub 描述" />
                <meta property="og:image" content="https://cdn.example.com/x.jpg" />
                <meta property="og:site_name" content="Example" />
                </head><body></body></html>
                """;
        ScriptedHttpClient client = new ScriptedHttpClient(List.of(
                ScriptedResponse.ok(html)
        ));
        OgFetchService service = new OgFetchService(client);

        OgFetchResult result = service.fetch("https://example.com/og-article");

        assertThat(result.isSuccess()).isTrue();
        assertThat(result.ogTitle()).isEqualTo("公共站点 OK");
        assertThat(result.ogDescription()).isEqualTo("stub 描述");
        assertThat(result.ogCover()).isEqualTo("https://cdn.example.com/x.jpg");
        assertThat(result.ogSiteName()).isEqualTo("Example");
        assertThat(client.sentRequests).hasSize(1);
    }

    @Test
    void fetch_redirectWithGarbageLocation_returnsStructuredFailure() {
        // A malformed Location (spaces + illegal characters) makes URI.resolve throw
        // IllegalArgumentException; we must turn it into a structured
        // failure("redirect target invalid: ...") rather than let it leak out of the
        // outer catch(Exception) as a generic "解析异常" (parse exception)
        ScriptedHttpClient client = new ScriptedHttpClient(List.of(
                ScriptedResponse.redirect(302, "ht!tp://bad host /x y")
        ));
        OgFetchService service = new OgFetchService(client);

        OgFetchResult result = service.fetch("https://example.com/og");

        assertThat(result.isSuccess()).isFalse();
        assertThat(result.errorMessage()).startsWith("redirect target invalid");
        assertThat(client.sentRequests).hasSize(1);
    }

    @Test
    void fetch_bodyExceedsMaxSize_returnsFailure() {
        // A malicious public host returns a body > 2 MB; the server must truncate
        // while reading and must not pull an endless stream into the heap whole
        int oversize = OgFetchService.MAX_BODY_BYTES + 16 * 1024;
        byte[] payload = new byte[oversize];
        // Fill with visible characters so an all-zero payload is not parsed as an empty document
        for (int i = 0; i < oversize; i++) payload[i] = 'A';

        ScriptedHttpClient client = new ScriptedHttpClient(List.of(
                ScriptedResponse.okRaw(payload)
        ));
        OgFetchService service = new OgFetchService(client);

        OgFetchResult result = service.fetch("https://example.com/huge");

        assertThat(result.isSuccess()).isFalse();
        assertThat(result.errorMessage()).isEqualTo("response body exceeded max size");
        assertThat(client.sentRequests).hasSize(1);
    }

Comment on lines +129 to +132

    if (PrivateAddressGuard.isBlockedHost(host)) {
        log.warn("og-fetch 拒绝内网/回环 host: url={} host={}", currentUrl, host);
        return OgFetchResult.failure("blocked internal host");
    }

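Comment on lines +111 to +116

     * The transaction covers three steps, "rate-limit read + dedup read + insert":
     * the rate-limit count and the insert must be in the same transaction (at the
     * same isolation level), otherwise concurrent requests get past the daily quota.
     *
     * The trailing enrichmentWorker.enrich is an @Async dispatch: it only hands a
     * Runnable to another thread pool and does not run on this transaction's thread,
     * so it has no side effect on the transaction boundary. The worker already
     * degrades gracefully when findById returns empty, which covers the race where
     * the async side reads before the tx has committed.
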
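After the switch to ofInputStream, HttpRequest.timeout only covers the time until the response head arrives; the time spent afterwards reading the InputStream chunk by chunk is not governed by it. The old comment, "connect + read total 10s", would mislead later readers. The defense is now attributed to MAX_BODY_BYTES + readBodyCapped instead.
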
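IPv6 addresses under the 96-bit prefix ::ffff:0:0/96 actually point to IPv4, but the JDK's isLoopbackAddress / isSiteLocalAddress treat them as "pure IPv6" and all return false. The original IPv6 branch only checked ULA + link-local, so an attacker writing [::ffff:127.0.0.1] / [::ffff:10.0.0.1] / [::ffff:169.254.169.254] (AWS metadata via a mapped address) could bypass the entire IPv4 blocklist.

Fix: on detecting the ::ffff:0:0/96 prefix (first 10 bytes 0, bytes 11 and 12 both 0xff), take the last 4 bytes, rebuild an Inet4Address with InetAddress.getByAddress, and recurse through isBlockedAddress so the full IPv4 rule set applies.

Also added a tri-state resolveAndCheck enum (OK / DNS_FAIL / BLOCKED) so callers can tell a DNS failure from a real blocklist hit. This prepares for the next commit; the old isBlockedHost becomes a thin wrapper so existing callers are not broken.

PrivateAddressGuardTest gains 10 new cases, including 4 IPv4-mapped scenarios.
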
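The prefix test in the fix can be sketched on raw address bytes. This is an illustration under assumptions: the class and method names are hypothetical, and it stops at extracting the embedded IPv4 bytes, leaving the recursion into the IPv4 rules to the caller.

```java
import java.util.Arrays;

/** Hypothetical sketch of IPv4-mapped IPv6 detection on raw 16-byte addresses. */
final class MappedAddressSketch {

    /** True when the 16 bytes lie in ::ffff:0:0/96. */
    static boolean isIpv4Mapped(byte[] b) {
        if (b.length != 16) {
            return false;
        }
        // First 10 bytes must be zero.
        for (int i = 0; i < 10; i++) {
            if (b[i] != 0) {
                return false;
            }
        }
        // Bytes 11 and 12 (indices 10 and 11) must both be 0xff.
        return (b[10] & 0xff) == 0xff && (b[11] & 0xff) == 0xff;
    }

    /** Returns the embedded IPv4 bytes for a mapped address, else the input unchanged. */
    static byte[] unwrap(byte[] b) {
        return isIpv4Mapped(b) ? Arrays.copyOfRange(b, 12, 16) : b;
    }
}
```

The 4-byte result of `unwrap` can then be passed to `InetAddress.getByAddress` and checked with the same rules as any other IPv4 address.
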
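Previously PrivateAddressGuard.isBlockedHost merged UnknownHostException and a blocklist hit into a single boolean, and OgFetchService always returned "blocked internal host". When a user mistyped a domain (typo / stale link), troubleshooting would suggest we were censoring their link. OgFetchService now uses resolveAndCheck to get the tri-state enum:
- DNS_FAIL → "dns lookup failed: <host>"
- BLOCKED → "blocked internal host" (semantics unchanged; this is what an SSRF attacker sees)
- OK → continue
Still fail-closed: both a DNS failure and a blocklist hit return failure immediately, so nothing is let through.
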
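A minimal sketch of the tri-state flow, assuming an injectable resolver so the logic can be exercised without real DNS. The `Resolver` interface, the `Verdict` name, and `failureFor` are hypothetical; the three states and the two messages are the ones listed above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch of a tri-state resolve-and-check; not the PR's actual API. */
final class ResolveAndCheckSketch {
    enum Verdict { OK, DNS_FAIL, BLOCKED }

    /** Illustrative seam so tests can replace the DNS lookup. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    static Verdict resolveAndCheck(String host, Resolver resolver, Predicate<InetAddress> blocked) {
        InetAddress[] addrs;
        try {
            addrs = resolver.resolve(host);
        } catch (UnknownHostException e) {
            return Verdict.DNS_FAIL;
        }
        // One blocked address among the resolved set is enough to reject.
        for (InetAddress addr : addrs) {
            if (blocked.test(addr)) {
                return Verdict.BLOCKED;
            }
        }
        return Verdict.OK;
    }

    /** Maps a verdict to a failure message; empty means the fetch may proceed. */
    static Optional<String> failureFor(Verdict verdict, String host) {
        return switch (verdict) {
            case DNS_FAIL -> Optional.of("dns lookup failed: " + host);
            case BLOCKED -> Optional.of("blocked internal host");
            case OK -> Optional.empty();
        };
    }
}
```

In production the resolver would be `InetAddress::getAllByName`; both non-OK verdicts produce a failure, which keeps the behavior fail-closed while giving users a distinguishable message.
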
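OgFetchService.fetch resolves the host through PrivateAddressGuard before sending the request, so feeding it real domains such as example.com / mp.weixin.qq.com / zhuanlan.zhihu.com triggers real DNS lookups, which makes offline / restricted CI fail outright. All fetch URLs in the SSRF tests and the per-platform OG parsing tests are replaced with 1.1.1.1 (Cloudflare DNS, a public IP literal), which the guard judges OK without a DNS lookup. The per-platform dimension itself is covered by assertions on the OG meta text; in these cases the host is only a routing placeholder and does not affect the assertions. The test that calls the internal parseOg(html, baseUrl) directly is left unchanged, since Jsoup's baseUrl parameter never reaches the guard.
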
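The old Javadoc said "the transaction covers three steps, otherwise concurrency gets past the daily quota", which was fooling ourselves. SELECT COUNT(*) + INSERT is a classic check-then-act; under PostgreSQL's default Read Committed, two concurrent transactions can both read count=N and then both insert. @Transactional provides a consistent view within a single request, not atomic rate limiting.

The Javadoc is rewritten to state plainly what the tx is for, what it is not for, and which options give truly atomic rate limiting (DB UNIQUE / SELECT FOR UPDATE / Redis). RateLimitExceeded is also annotated as "best-effort; concurrent requests may briefly exceed the limit" so downstream code does not rely on it as a hard guarantee. No code change: truly atomic rate limiting belongs in a separate PR and is recorded in the follow-ups.
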
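Summary
12 commits, in three parts:

SSRF hardening (OgFetchService + new PrivateAddressGuard)
- common/security/PrivateAddressGuard: after InetAddress.getAllByName, reject all loopback / RFC1918 / link-local / CGNAT / multicast / IPv6 ULA / fe80::/10 / 0/8 addresses; DNS failure fails closed
- Redirect.NORMAL is replaced by manually following redirects, at most 3 times, with every hop going through the guard again
- ofInputStream streaming read, truncated at 2 MB; drainAndClose cleans up the connection on every exit branch
- A malformed Location yields a structured failure instead of being disguised as "解析异常" (parse exception)
- Case drift such as Charset=GBK is handled

Error handling
- GlobalExceptionHandler.java: the catch-all exception handler replaces printStackTrace with log.error("未处理的异常", e)

Transactions & module decoupling (SharedLinkService)
- Write methods (submit / submitInternal / report / enrich) gain @Transactional(rollbackFor = Exception.class)
- Read methods (findById / buildAdminSummary / listApproved / listBySubmitter / listPendingForAdmin) become @Transactional(readOnly = true)
- The direct dependency on UserAccountRepository is removed in favor of the UserCenterService.findByUsername(...) facade

Known follow-ups (not in this PR)
- countBySubmitterSince + insert is check-then-act and races under concurrency; it needs a UNIQUE constraint or SELECT FOR UPDATE
- UserCenterService.findByUsername returns the UserAccount domain object, which still leaks the usercenter schema into community; a DTO can replace it later

Test plan
- ./mvnw -q -DskipTests compile: clean
- ./mvnw test -Dtest='OgFetchServiceSsrfTest': 6/6 pass (SSRF / redirect-to-private / oversize body / malformed Location / happy path)
- ./mvnw test -Dtest='com.involutionhell.backend.community.**': 50/50 pass
- Submitting http://127.0.0.1/... should be rejected outright; submitting http://short-lived-redirect-to-169.254.169.254/... should be rejected at every hop

🤖 Generated with Claude Code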