feat(preset): add GPT-OSS 120B presets for H100 SXM #110
Conversation
Add vLLM v0.17.0 E2E, prefill, and decode presets for openai/gpt-oss-120b on NVIDIA H100 SXM with tp4 and tp8 parallelism (both EP and TP MoE variants). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code review: Found 2 issues (each affects all 12 new preset files):
Apply the same fix to the other 11 new files under deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/.
Pull request overview
Adds new Odin InferenceServiceTemplate preset manifests to support running openai/gpt-oss-120b on NVIDIA H100 SXM with vLLM v0.17.0, covering E2E, prefill, and decode roles across TP4/TP8 and MoE TP/EP variants.
Changes:
- Added 12 new vLLM v0.17.0 presets for `openai/gpt-oss-120b` on `nvidia/h100-sxm`.
- Included E2E, prefill, and decode roles for `tp4` and `tp8` parallelism.
- Provided both MoE TP (`-moe-tp*`) and MoE EP (`-moe-ep*`) variants for each role/TP size.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml | New prefill preset (TP8, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml | New prefill preset (TP8, MoE-EP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml | New prefill preset (TP4, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml | New prefill preset (TP4, MoE-EP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml | New E2E preset (TP8, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml | New E2E preset (TP8, MoE-EP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml | New E2E preset (TP4, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml | New E2E preset (TP4, MoE-EP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml | New decode preset (TP8, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml | New decode preset (TP8, MoE-EP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml | New decode preset (TP4, MoE-TP) for H100 SXM |
| deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml | New decode preset (TP4, MoE-EP) for H100 SXM |
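The 12 filenames above follow a single naming pattern. As a quick sanity sketch (a hypothetical helper, not part of this PR), the full set can be derived from the role × TP size × MoE variant matrix:

```python
from itertools import product

def preset_filenames():
    """Enumerate the 12 preset filenames added in this PR.

    Pattern: openai-gpt-oss-120b[-<role>]-nvidia-h100-sxm-tp<N>-moe-<tp|ep><N>.helm.yaml
    E2E presets omit the role segment entirely.
    """
    names = []
    for role, tp, moe in product(["", "prefill", "decode"], [4, 8], ["tp", "ep"]):
        role_part = f"-{role}" if role else ""  # empty for E2E presets
        names.append(
            f"openai-gpt-oss-120b{role_part}-nvidia-h100-sxm-tp{tp}-moe-{moe}{tp}.helm.yaml"
        )
    return names

# 3 roles x 2 TP sizes x 2 MoE variants = 12 preset files
print(len(preset_filenames()))  # → 12
```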
…GPT-OSS 120B H100 SXM presets: openai/gpt-oss-120b requires --trust-remote-code to load custom remote code, and --no-enable-log-requests must be set alongside --disable-uvicorn-access-log because ISVC_EXTRA_ARGS in a preset fully overrides the runtime base's value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
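The override pitfall described in the commit message can be shown in a few lines. This is a generic sketch of override-vs-merge semantics using a plain dict, not the actual Odin templating code; only the flag names are taken from the commit message above:

```python
# Why a preset's ISVC_EXTRA_ARGS must restate base flags: setting the key
# replaces the base value wholesale instead of appending to it.
base_env = {"ISVC_EXTRA_ARGS": "--disable-uvicorn-access-log"}

# Naive preset: only adds the new flag, silently dropping the base's flag.
bad_preset = {"ISVC_EXTRA_ARGS": "--no-enable-log-requests"}
merged_bad = {**base_env, **bad_preset}
assert "--disable-uvicorn-access-log" not in merged_bad["ISVC_EXTRA_ARGS"]  # base flag lost

# Correct preset: restates every flag it still needs from the base.
good_preset = {"ISVC_EXTRA_ARGS": "--no-enable-log-requests --disable-uvicorn-access-log"}
merged_good = {**base_env, **good_preset}
assert "--disable-uvicorn-access-log" in merged_good["ISVC_EXTRA_ARGS"]
```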
GPT-OSS 120B actually runs even without trust-remote-code, but since having it doesn't cause any problems, I followed the review. :)
Both issues addressed in 516931a; verified that all 12 files now include both flags.
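A verification like the one above can be done mechanically. Here is a hypothetical checker (not part of the PR) that operates on raw manifest text; the ISVC_EXTRA_ARGS fragment shown is an illustrative shape, not copied from the actual presets:

```python
REQUIRED_FLAGS = ("--trust-remote-code", "--no-enable-log-requests")

def has_required_flags(manifest_text: str) -> bool:
    """Return True iff every flag from the review fix appears in the manifest text."""
    return all(flag in manifest_text for flag in REQUIRED_FLAGS)

# Fragments shaped like a preset's env section (structure is an assumption).
fixed = 'ISVC_EXTRA_ARGS: "--trust-remote-code --no-enable-log-requests --disable-uvicorn-access-log"'
unfixed = 'ISVC_EXTRA_ARGS: "--disable-uvicorn-access-log"'
print(has_required_flags(fixed), has_required_flags(unfixed))  # → True False
```

In a repo checkout, the same function could be applied to each of the 12 files to confirm the fix landed everywhere.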
Summary
- Add vLLM v0.17.0 E2E, prefill, and decode presets for `openai/gpt-oss-120b` on NVIDIA H100 SXM

Test plan

- `helm lint` passes for `moai-inference-preset`
- `helm template` renders all 12 new presets correctly