
feat(preset): add GPT-OSS 120B presets for H100 SXM#110

Merged
gangwon merged 2 commits into main from feat/preset-gpt-oss-120b
Apr 19, 2026

Conversation

@gangwon
Contributor

@gangwon gangwon commented Apr 19, 2026

Summary

  • Add 12 vLLM v0.17.0 presets for openai/gpt-oss-120b on NVIDIA H100 SXM
  • Covers E2E, prefill, and decode roles with tp4/tp8 parallelism
  • Includes both EP and TP MoE variants for each configuration
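The 12 presets are the cross product of role (E2E, prefill, decode), TP size (4, 8), and MoE variant (EP, TP). A small sketch that enumerates the expected file names, with the pattern inferred from the file list in this PR (the E2E role carries no role segment in the name):

```python
# Enumerate the 12 preset file names: 3 roles x 2 TP sizes x 2 MoE variants.
# Naming pattern inferred from the files added in this PR.
roles = ["", "prefill", "decode"]  # "" = E2E (no role segment)
tps = [4, 8]
moes = ["tp", "ep"]

names = []
for role in roles:
    for tp in tps:
        for moe in moes:
            role_part = f"-{role}" if role else ""
            names.append(
                f"openai-gpt-oss-120b{role_part}-nvidia-h100-sxm"
                f"-tp{tp}-moe-{moe}{tp}.helm.yaml"
            )

assert len(names) == 12
print(names[0])  # openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml
```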

Test plan

  • helm lint passes for moai-inference-preset
  • helm template renders all 12 new presets correctly
  • Verify preset naming and labels match conventions

Add vLLM v0.17.0 E2E, prefill, and decode presets for openai/gpt-oss-120b
on NVIDIA H100 SXM with tp4 and tp8 parallelism (both EP and TP MoE variants).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhk7734
Member

hhk7734 commented Apr 19, 2026

Code review

Found 2 issues (each affects all 12 new preset files):

  1. Missing --no-enable-log-requests in ISVC_EXTRA_ARGS. deploy/helm/AGENTS.md says: "Logging arguments (--disable-uvicorn-access-log, --no-enable-log-requests) — presets must include these because ISVC_EXTRA_ARGS in a preset fully overrides the runtime base's value during Odin strategic merge patch." The existing GPT-OSS 120B NVL preset includes it at line 32, and PR #101 (MAF-19524: feat(preset): add vLLM v0.17.0 E2E presets for AI& April launch models) fixed the same omission across earlier presets.

- name: ISVC_EXTRA_ARGS
  value: >-
    --enable-auto-tool-choice
    --tool-call-parser openai
    --reasoning-parser openai_gptoss
    --max-num-seqs 128
    --max-num-batched-tokens 8192
    --max-cudagraph-capture-size 2048
    --max-model-len -1
    --disable-uvicorn-access-log
    --exclude-tools-when-tool-choice-none
resources:

  2. Missing --trust-remote-code in ISVC_EXTRA_ARGS. openai/gpt-oss-120b loads custom remote code; without this flag vLLM fails to load the model. Every other GPT-OSS preset sets it (e.g. the NVL preset at line 28), and deploy/helm/AGENTS.md lists --trust-remote-code among the model-specific args presets own.

env:
  - name: ISVC_EXTRA_ARGS
    value: >-
      --enable-auto-tool-choice
      --tool-call-parser openai
      --reasoning-parser openai_gptoss
      --max-num-seqs 128
      --max-num-batched-tokens 8192
      --max-cudagraph-capture-size 2048
      --max-model-len -1
      --disable-uvicorn-access-log
      --exclude-tools-when-tool-choice-none
resources:

Apply the same fix to the other 11 new files under deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/.
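The override behavior behind issue 1 can be sketched with a minimal simulation of Kubernetes strategic-merge-patch semantics for env lists, which merge by the `name` key with the patch value replacing the base value wholesale. This is an illustration only, not Odin's actual merge implementation:

```python
def strategic_merge_env(base, patch):
    """Merge two K8s-style env lists by the `name` key.

    A patch entry with a matching name replaces the base entry's value
    entirely; there is no partial merge of the `value` string. This is
    why a preset's ISVC_EXTRA_ARGS must repeat the base's logging flags.
    """
    merged = {e["name"]: e["value"] for e in base}
    for e in patch:
        merged[e["name"]] = e["value"]
    return [{"name": k, "value": v} for k, v in merged.items()]

# Runtime base carries the logging flags; the preset's value omits them.
base = [{"name": "ISVC_EXTRA_ARGS",
         "value": "--disable-uvicorn-access-log --no-enable-log-requests"}]
patch = [{"name": "ISVC_EXTRA_ARGS",
          "value": "--enable-auto-tool-choice --max-num-seqs 128"}]

result = strategic_merge_env(base, patch)
# The base's logging flags are gone unless the preset repeats them:
assert "--no-enable-log-requests" not in result[0]["value"]
```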

🤖 Generated with Claude Code



Copilot AI left a comment


Pull request overview

Adds new Odin InferenceServiceTemplate preset manifests to support running openai/gpt-oss-120b on NVIDIA H100 SXM with vLLM v0.17.0, covering E2E, prefill, and decode roles across TP4/TP8 and MoE TP/EP variants.

Changes:

  • Added 12 new vLLM v0.17.0 presets for openai/gpt-oss-120b on nvidia/h100-sxm.
  • Included E2E, prefill, and decode roles for tp4 and tp8 parallelism.
  • Provided both MoE TP (-moe-tp*) and MoE EP (-moe-ep*) variants for each role/TP size.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.

Summary per file:

  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New prefill preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New prefill preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New prefill preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New prefill preset (TP4, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New E2E preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New E2E preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New E2E preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New E2E preset (TP4, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New decode preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New decode preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New decode preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New decode preset (TP4, MoE-EP) for H100 SXM

…GPT-OSS 120B H100 SXM presets

openai/gpt-oss-120b requires --trust-remote-code to load custom remote code,
and --no-enable-log-requests must be set alongside --disable-uvicorn-access-log
because ISVC_EXTRA_ARGS in a preset fully overrides the runtime base's value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gangwon
Contributor Author

gangwon commented Apr 19, 2026

GPT-OSS 120B actually runs fine even without trust-remote-code, but since having it causes no problems either, I followed the review. :)

@hhk7734
Member

hhk7734 commented Apr 19, 2026

Both issues addressed in 516931a — verified all 12 files now include --trust-remote-code and --no-enable-log-requests.

- name: ISVC_EXTRA_ARGS
  value: >-
    --trust-remote-code
    --enable-auto-tool-choice
    --tool-call-parser openai
    --reasoning-parser openai_gptoss
    --max-num-seqs 128
    --max-num-batched-tokens 8192
    --max-cudagraph-capture-size 2048
    --max-model-len -1
    --disable-uvicorn-access-log
    --no-enable-log-requests
    --exclude-tools-when-tool-choice-none
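The "verified all 12 files" check above could be mechanized with a small script run over the rendered manifests. A hypothetical sketch (the helper and flag list are illustrative, not part of this repository):

```python
# Hypothetical verification helper: given an ISVC_EXTRA_ARGS value,
# report which of the required flags are missing.
REQUIRED = ("--trust-remote-code", "--no-enable-log-requests",
            "--disable-uvicorn-access-log")

def missing_flags(isvc_extra_args: str):
    """Return the required flags absent from an ISVC_EXTRA_ARGS value."""
    present = set(isvc_extra_args.split())
    return [f for f in REQUIRED if f not in present]

fixed = ("--trust-remote-code --enable-auto-tool-choice "
         "--disable-uvicorn-access-log --no-enable-log-requests")
assert missing_flags(fixed) == []
assert missing_flags("--enable-auto-tool-choice") == list(REQUIRED)
```

In practice one would feed it each rendered preset's ISVC_EXTRA_ARGS value (e.g. extracted from `helm template` output) rather than a literal string.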

🤖 Generated with Claude Code

gangwon merged commit 8bf0bf4 into main Apr 19, 2026
3 checks passed
gangwon deleted the feat/preset-gpt-oss-120b branch April 19, 2026 08:47
3 participants