fix(listener): block sticker/GIF aggregator sites + bare media files so Discord stickers don't end up in the share library by longsizhuo · Pull Request #2 · InvolutionHell/ChatBot

longsizhuo · 2026-04-25T07:47:44Z

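Incident

A user posted a Discord sticker (a klipy GIF) in the share channel; `message.content` contained nothing but the bare `https://klipy.com/gifs/...` URL. The listener treated it as a normal share, ran the full OG fetch + classification, and it was marked APPROVED and published as #18.

A similar case is #5 (`mmbiz.qpic.cn/...640.jpg`, a direct WeChat image link).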

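Root cause

The original `_SKIP_HOSTS` only blocked Discord's own domains such as `discord.com` / `cdn.discordapp.com`, and did not account for:

The Discord sticker panel defaults to tenor / klipy / giphy (the bare URL ends up in message.content)
Plain direct image links (`.jpg` / `.gif` / `.png`) should not be ingested either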

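Fix

Add the full set of tenor / klipy / giphy domains (including the media subdomains) to `_SKIP_HOSTS`
As a fallback, match media file extensions on the path (hosts can never be enumerated exhaustively): `.gif/.png/.jpg/.jpeg/.webp/.bmp/.svg/.ico/.mp4/.webm/.mov/.m4v/.mp3/.wav/.ogg/.flac`
Matching only looks at the path; a `.jpg` appearing in the query does not count (to avoid false positives on normal API links carrying `?file=foo.jpg`)
Tests: +19 cases covering the klipy/tenor/giphy domains, various bare image links, case-insensitivity, and confirming that a media extension appearing only in the query is let through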
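
The two layers described above could be sketched roughly as follows. This is a minimal illustration and not the actual code in `listener.py`: the function name `should_skip` is hypothetical, the host set is an abbreviated, assumed subset of the real `_SKIP_HOSTS`, and the host lookup uses `parsed.hostname`.

```python
from urllib.parse import urlparse

# Illustrative subset only (assumption); the real _SKIP_HOSTS is longer
# and also contains the media subdomains.
_SKIP_HOSTS = frozenset({
    "discord.com",
    "cdn.discordapp.com",
    "tenor.com",
    "klipy.com",
    "giphy.com",
})

# A tuple, so it can be passed straight to str.endswith().
_MEDIA_EXTENSIONS = (
    ".gif", ".png", ".jpg", ".jpeg", ".webp", ".bmp", ".svg", ".ico",
    ".mp4", ".webm", ".mov", ".m4v", ".mp3", ".wav", ".ogg", ".flac",
)


def should_skip(url: str) -> bool:
    """Hypothetical helper: True if the URL should not be ingested."""
    parsed = urlparse(url)
    host = parsed.hostname  # lowercased, with port and userinfo removed
    if host is not None and host in _SKIP_HOSTS:
        return True
    # Only the path is inspected, so ?file=foo.jpg does not trigger a skip.
    return parsed.path.lower().endswith(_MEDIA_EXTENSIONS)
```

With this sketch, `should_skip("https://klipy.com/gifs/abc")` and `should_skip("https://example.com/path/photo.PNG")` are both skipped, while `https://api.example.com/download?file=foo.jpg` is let through because its path is just `/download`.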

DB cleanup

`#5` and `#18` have already been rejected by running `UPDATE shared_links SET status = 'REJECTED' WHERE id IN (5, 18)` directly against the prod DB; the frontend no longer shows them.

Deployment

The systemd service for this repo has already been restarted (`restart chat-bot`) to load the new code (systemd reads from disk).

Test

`uv run pytest tests/`: 79/79 pass (the 19 new cases are in `test_listener_skip.py`)
`uv run ruff check src/ tests/`: clean

🤖 Generated with Claude Code

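Incident: user yhn posted a Discord sticker (a klipy GIF) in the share channel; message.content contained nothing but the bare https://klipy.com/gifs/... URL, so the listener treated it as a normal share, ran the full OG fetch + classification, and it was marked APPROVED and published as #18. The original _SKIP_HOSTS only blocked Discord's own domains such as discord.com / cdn.discordapp.com and did not account for the sticker panel defaulting to tenor / klipy / giphy. A related problem: pure direct image links such as mmbiz.qpic.cn (#5) should not be ingested either. The fix has two layers: (1) add the full set of tenor / klipy / giphy domains to _SKIP_HOSTS; (2) as a fallback, match media extensions (.gif/.png/.jpg/.mp4/...) on the path, since hosts can never be enumerated exhaustively. Matching only looks at the path; a .jpg in the query does not count (to avoid false positives on normal API links carrying ?file=foo.jpg). Covered by +19 test cases.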

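/share is the single-page submission entry point (it accepts ?url=... for prefilling, used by the bookmarklet), whereas /feed is the wall that displays approved items. The bot pointed its 'click here to view / added to the Involution Hell share library' links at /share in three places: listener.py (the first reply and the APPROVED final-state reply) and commands.py (the success receipt of the /share slash command). As a result, users who clicked through saw an empty submission form rather than the content they had just shared.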

Copilot

Pull request overview

This PR hardens the Discord share listener’s URL filtering so sticker/GIF aggregator links and direct (bare) media-file URLs don’t get treated as “share submissions” and ingested into the backend.

Changes:

Expand _SKIP_HOSTS to include tenor/klipy/giphy domains commonly emitted by Discord sticker/GIF features.
Add a path-based media extension fallback (.gif/.png/.jpg/.../.mp3/...) to skip bare media links regardless of host, while ignoring query-only matches.
Add unit tests covering aggregator domains, bare media links, and non-skipped “normal article” URLs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/chat_bot/cogs/listener.py` | Extends skip logic to include sticker/GIF aggregators and path-based media extension filtering. |
| `tests/test_listener_skip.py` | Adds test cases validating the new skip behavior and non-regressions. |

Copilot · 2026-04-25T07:50:33Z

+    # Match the path in lowercase, decoupled from the query: ?foo=bar.jpg will not produce a false hit
+    return parsed.path.lower().endswith(_MEDIA_EXTENSIONS)


Media-extension skipping is based on `parsed.path.endswith(...)`, but the listener's URL regex can still capture trailing punctuation such as `,` / `.` / `，` / `。` after a URL in chat. In that case the path becomes, for example, `/file.jpg,` and won't match, defeating the new safeguard. Consider normalizing before the check (e.g., stripping common trailing punctuation from the URL/path) and adding a regression test for it.
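
One possible shape for the normalization suggested here is sketched below. It is only an illustration: the helper names `strip_trailing_punct` and `is_bare_media`, the punctuation set, and the shortened extension tuple are all assumptions, not code from this PR.

```python
from urllib.parse import urlparse

# Assumed set of punctuation that commonly trails a URL in chat messages
# (ASCII plus full-width Chinese punctuation).
_TRAILING_PUNCT = ",.;:!?)]}>'\"，。；：！？）】》"

# Shortened for the example; the PR's list has more extensions.
_MEDIA_EXTENSIONS = (".gif", ".png", ".jpg", ".jpeg", ".webp", ".mp4", ".mp3")


def strip_trailing_punct(url: str) -> str:
    """Drop punctuation that a greedy URL regex may have captured."""
    return url.rstrip(_TRAILING_PUNCT)


def is_bare_media(url: str) -> bool:
    """True if the normalized URL's path ends in a media extension."""
    path = urlparse(strip_trailing_punct(url)).path
    return path.lower().endswith(_MEDIA_EXTENSIONS)
```

A trade-off of this approach: stripping `)` can truncate legitimate URLs that end in a closing parenthesis, so the exact punctuation set is a judgment call.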

Copilot · 2026-04-25T07:50:34Z

+@pytest.mark.parametrize(
+    "url",
+    [
+        # Bare images (WeChat image hosting, or a direct image link on any host)
+        "https://mmbiz.qpic.cn/mmbiz_jpg/abc/640.jpg",
+        "https://example.com/path/photo.PNG",
+        "https://i.example.com/cat.gif",
+        "https://example.com/foo.webp",
+        # Direct video/audio links
+        "https://example.com/clip.mp4",
+        "https://example.com/audio.mp3",
+        # SVG (blocked even when the host is not on the blocklist, in line with the server-side SVG upload blocklist)
+        "https://example.com/icon.svg",
+    ],


Consider adding at least one regression case where a bare media URL is followed by trailing punctuation (e.g. https://example.com/a.jpg, or Chinese punctuation) to reflect how URLs appear in real Discord messages; otherwise the new path-based extension filter can be bypassed if the extracted URL includes that punctuation.

Copilot · 2026-04-25T07:50:34Z

+    host = parsed.netloc.lower().split(":")[0]
+    if host in _SKIP_HOSTS:


host = parsed.netloc.lower().split(":")[0] is a bit fragile (doesn’t handle IPv6 literals like [::1]:443 and can be confused by userinfo in the URL). Prefer parsed.hostname (already lowercased by urlparse) and then compare against _SKIP_HOSTS.

Suggested change

-    host = parsed.netloc.lower().split(":")[0]
-    if host in _SKIP_HOSTS:
+    host = parsed.hostname
+    if host is not None and host in _SKIP_HOSTS:
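
The difference between the two approaches can be seen in a small comparison. The helper names `host_via_split` and `host_via_hostname` are hypothetical and exist only for this illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def host_via_split(url: str) -> str:
    """The original approach: split netloc on the first colon."""
    return urlparse(url).netloc.lower().split(":")[0]


def host_via_hostname(url: str) -> Optional[str]:
    """The suggested approach: let urlparse extract the host."""
    return urlparse(url).hostname


# Plain host with a port: both approaches agree.
plain = "https://KLIPY.com:8443/gifs/x"
# Userinfo in the URL: the split version returns the user name instead.
userinfo = "https://user:pw@klipy.com/gifs/x"
# IPv6 literal: the split version returns only the opening bracket.
ipv6 = "https://[::1]:443/x"
```

For `userinfo`, the split version yields `user` while `hostname` yields `klipy.com`, so a blocklisted host could slip past the `_SKIP_HOSTS` check with the original code.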

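A user posted an announcement of their own PR (#2) in the share channel, and the bot ingested it as a 'community share', as #19. Similar cases include dev subpaths such as issue/commit/compare/actions/releases/discussions/blob/tree. Strategy: skip when the path has at least 3 segments (/<org>/<repo>/<sub>) and org=involutionhell; repository home pages and third-party repositories are all let through. This specifically targets dev self-referential noise and does not affect legitimate shares. +11 test cases.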
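
The strategy in this commit message might look roughly like the sketch below. The function name `is_own_dev_link` is hypothetical, and two details are assumptions not stated in the source: that the check is limited to the `github.com` host, and that the org comparison is case-insensitive.

```python
from urllib.parse import urlparse

_OWN_ORG = "involutionhell"


def is_own_dev_link(url: str) -> bool:
    """Hypothetical helper: True for dev subpaths under the project's own org."""
    parsed = urlparse(url)
    if parsed.hostname != "github.com":  # assumption: GitHub links only
        return False
    # Non-empty path segments: /<org>/<repo>/<sub>/...
    segments = [s for s in parsed.path.split("/") if s]
    # Fewer than 3 segments means an org or repository home page: let it through.
    return len(segments) >= 3 and segments[0].lower() == _OWN_ORG
```

Under these assumptions, `https://github.com/InvolutionHell/ChatBot/pull/2` is skipped, while the repository home page `https://github.com/InvolutionHell/ChatBot` and third-party links such as `https://github.com/python/cpython/pull/1` are let through.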

Copilot AI review requested due to automatic review settings April 25, 2026 07:47

Copilot started reviewing on behalf of longsizhuo April 25, 2026 07:48

Copilot AI reviewed Apr 25, 2026

longsizhuo merged commit 9d77952 into main Apr 25, 2026

longsizhuo merged 3 commits into main from fix/listener-sticker-gif-blocklist