
feat(preset): add GPT-OSS 120B presets for H100 SXM#110

Merged
gangwon merged 2 commits into main from feat/preset-gpt-oss-120b
Apr 19, 2026

Conversation

@gangwon
Contributor

@gangwon gangwon commented Apr 19, 2026

Summary

  • Add 12 vLLM v0.17.0 presets for openai/gpt-oss-120b on NVIDIA H100 SXM
  • Covers E2E, prefill, and decode roles with tp4/tp8 parallelism
  • Includes both EP and TP MoE variants for each configuration
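The 12 presets are the cross product of role (E2E, prefill, decode), TP size (4, 8), and MoE variant (EP, TP). A small sketch that enumerates the expected file names, with the pattern inferred from the file list in this PR (the E2E role carries no role segment in the name):

```python
# Enumerate the 12 preset file names: 3 roles x 2 TP sizes x 2 MoE variants.
# Naming pattern inferred from the files added in this PR.
roles = ["", "prefill", "decode"]  # "" = E2E (no role segment)
tps = [4, 8]
moes = ["tp", "ep"]

names = []
for role in roles:
    for tp in tps:
        for moe in moes:
            role_part = f"-{role}" if role else ""
            names.append(
                f"openai-gpt-oss-120b{role_part}-nvidia-h100-sxm"
                f"-tp{tp}-moe-{moe}{tp}.helm.yaml"
            )

assert len(names) == 12
print(names[0])  # openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml
```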

Test plan

  • helm lint passes for moai-inference-preset
  • helm template renders all 12 new presets correctly
  • Verify preset naming and labels match conventions

Add vLLM v0.17.0 E2E, prefill, and decode presets for openai/gpt-oss-120b
on NVIDIA H100 SXM with tp4 and tp8 parallelism (both EP and TP MoE variants).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhk7734
Member

hhk7734 commented Apr 19, 2026

Code review

Found 2 issues (each affects all 12 new preset files):

  1. Missing --no-enable-log-requests in ISVC_EXTRA_ARGS. deploy/helm/AGENTS.md says: "Logging arguments (--disable-uvicorn-access-log, --no-enable-log-requests) — presets must include these because ISVC_EXTRA_ARGS in a preset fully overrides the runtime base's value during Odin strategic merge patch." The existing GPT-OSS 120B NVL preset includes it at line 32, and PR #101 (MAF-19524: feat(preset): add vLLM v0.17.0 E2E presets for AI& April launch models) fixed the same omission across earlier presets.

- name: ISVC_EXTRA_ARGS
  value: >-
    --enable-auto-tool-choice
    --tool-call-parser openai
    --reasoning-parser openai_gptoss
    --max-num-seqs 128
    --max-num-batched-tokens 8192
    --max-cudagraph-capture-size 2048
    --max-model-len -1
    --disable-uvicorn-access-log
    --exclude-tools-when-tool-choice-none
resources:

  2. Missing --trust-remote-code in ISVC_EXTRA_ARGS. openai/gpt-oss-120b loads custom remote code; without this flag vLLM fails to load the model. Every other GPT-OSS preset sets it (e.g. the NVL preset at line 28), and deploy/helm/AGENTS.md lists --trust-remote-code among the model-specific args presets own.

env:
  - name: ISVC_EXTRA_ARGS
    value: >-
      --enable-auto-tool-choice
      --tool-call-parser openai
      --reasoning-parser openai_gptoss
      --max-num-seqs 128
      --max-num-batched-tokens 8192
      --max-cudagraph-capture-size 2048
      --max-model-len -1
      --disable-uvicorn-access-log
      --exclude-tools-when-tool-choice-none
resources:

Apply the same fix to the other 11 new files under deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/.
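The override behavior behind issue 1 can be sketched with a minimal simulation of Kubernetes strategic-merge-patch semantics for env lists, which merge by the `name` key with the patch value replacing the base value wholesale. This is an illustration only, not Odin's actual merge implementation:

```python
def strategic_merge_env(base, patch):
    """Merge two K8s-style env lists by the `name` key.

    A patch entry with a matching name replaces the base entry's value
    entirely; there is no partial merge of the `value` string. This is
    why a preset's ISVC_EXTRA_ARGS must repeat the base's logging flags.
    """
    merged = {e["name"]: e["value"] for e in base}
    for e in patch:
        merged[e["name"]] = e["value"]
    return [{"name": k, "value": v} for k, v in merged.items()]

# Runtime base carries the logging flags; the preset's value omits them.
base = [{"name": "ISVC_EXTRA_ARGS",
         "value": "--disable-uvicorn-access-log --no-enable-log-requests"}]
patch = [{"name": "ISVC_EXTRA_ARGS",
          "value": "--enable-auto-tool-choice --max-num-seqs 128"}]

result = strategic_merge_env(base, patch)
# The base's logging flags are gone unless the preset repeats them:
assert "--no-enable-log-requests" not in result[0]["value"]
```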

🤖 Generated with Claude Code



Copilot AI left a comment


Pull request overview

Adds new Odin InferenceServiceTemplate preset manifests to support running openai/gpt-oss-120b on NVIDIA H100 SXM with vLLM v0.17.0, covering E2E, prefill, and decode roles across TP4/TP8 and MoE TP/EP variants.

Changes:

  • Added 12 new vLLM v0.17.0 presets for openai/gpt-oss-120b on nvidia/h100-sxm.
  • Included E2E, prefill, and decode roles for tp4 and tp8 parallelism.
  • Provided both MoE TP (-moe-tp*) and MoE EP (-moe-ep*) variants for each role/TP size.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.

Summary per file:

  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New prefill preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New prefill preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New prefill preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-prefill-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New prefill preset (TP4, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New E2E preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New E2E preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New E2E preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New E2E preset (TP4, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-tp8.helm.yaml: New decode preset (TP8, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp8-moe-ep8.helm.yaml: New decode preset (TP8, MoE-EP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-tp4.helm.yaml: New decode preset (TP4, MoE-TP) for H100 SXM
  • deploy/helm/moai-inference-preset/templates/presets/vllm/v0.17.0/openai-gpt-oss-120b-decode-nvidia-h100-sxm-tp4-moe-ep4.helm.yaml: New decode preset (TP4, MoE-EP) for H100 SXM

…GPT-OSS 120B H100 SXM presets

openai/gpt-oss-120b requires --trust-remote-code to load custom remote code,
and --no-enable-log-requests must be set alongside --disable-uvicorn-access-log
because ISVC_EXTRA_ARGS in a preset fully overrides the runtime base's value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gangwon
Contributor Author

gangwon commented Apr 19, 2026

GPT-OSS 120B actually runs fine even without trust-remote-code, but since having it causes no problems either, I followed the review. :)

@hhk7734
Member

hhk7734 commented Apr 19, 2026

Both issues addressed in 516931a — verified all 12 files now include --trust-remote-code and --no-enable-log-requests.

- name: ISVC_EXTRA_ARGS
  value: >-
    --trust-remote-code
    --enable-auto-tool-choice
    --tool-call-parser openai
    --reasoning-parser openai_gptoss
    --max-num-seqs 128
    --max-num-batched-tokens 8192
    --max-cudagraph-capture-size 2048
    --max-model-len -1
    --disable-uvicorn-access-log
    --no-enable-log-requests
    --exclude-tools-when-tool-choice-none
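The "verified all 12 files" check above could be mechanized with a small script run over the rendered manifests. A hypothetical sketch (the helper and flag list are illustrative, not part of this repository):

```python
# Hypothetical verification helper: given an ISVC_EXTRA_ARGS value,
# report which of the required flags are missing.
REQUIRED = ("--trust-remote-code", "--no-enable-log-requests",
            "--disable-uvicorn-access-log")

def missing_flags(isvc_extra_args: str):
    """Return the required flags absent from an ISVC_EXTRA_ARGS value."""
    present = set(isvc_extra_args.split())
    return [f for f in REQUIRED if f not in present]

fixed = ("--trust-remote-code --enable-auto-tool-choice "
         "--disable-uvicorn-access-log --no-enable-log-requests")
assert missing_flags(fixed) == []
assert missing_flags("--enable-auto-tool-choice") == list(REQUIRED)
```

In practice one would feed it each rendered preset's ISVC_EXTRA_ARGS value (e.g. extracted from `helm template` output) rather than a literal string.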

🤖 Generated with Claude Code

gangwon merged commit 8bf0bf4 into main Apr 19, 2026
3 checks passed
gangwon deleted the feat/preset-gpt-oss-120b branch April 19, 2026 08:47
3 participants