llama.cpp — Apple Platforms Fork


A fork of ggml-org/llama.cpp focused on producing a working llama.xcframework for local on-device inference across all current Apple platforms.

Download the latest prebuilt llama.xcframework.zip (sha256) — rebuilt on every push to master and weekly.

The goal of this fork is narrow: keep a shippable xcframework building on a modern macOS toolchain (Xcode 26+, CMake 4.x). No feature additions, no diverging APIs — everything else is pulled straight from upstream.

What this fork changes

Three commits on top of upstream (plus this README):

  1. cmake_minimum_required bump (CMakeLists.txt, ggml/CMakeLists.txt) — widens the accepted version range to 3.5...4.2 so CMake 4.x stops warning about removed policies. Upstream still pins to 3.14...3.28.
  2. build-xcframework.sh → Ninja generator — see "Why Ninja" below. All 7 cmake -B invocations in the script now use -G Ninja instead of -G Xcode. The Xcode-only -- -quiet build argument was dropped. combine_static_libraries call sites now pass . as the release_dir because Ninja is single-config and emits archives directly under src/, not src/Release-<sdk>/.
  3. Fork-focused README — this file; upstream README moved to README.upstream.md.
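
The generator switch in build-xcframework.sh is mechanical; each of the seven configure/build pairs changes along these lines (the build-directory name and trailing arguments here are illustrative, not the script's full argument set):

```diff
-cmake -B build-ios-device -G Xcode "${COMMON_CMAKE_ARGS[@]}" ...
-cmake --build build-ios-device --config Release -- -quiet
+cmake -B build-ios-device -G Ninja "${COMMON_CMAKE_ARGS[@]}" ...
+cmake --build build-ios-device --config Release
```

The `-- -quiet` suffix is an Xcode-generator passthrough argument, so it goes away with the generator.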

No C/C++/Objective-C source has been touched. No APIs added, removed, or renamed. No ggml backend modifications. Library behavior is byte-for-byte identical to upstream b8802 for the same inputs.

Why Ninja

On CMake 4.x with Xcode 26, the Xcode generator fails when cross-compiling to iOS/tvOS/visionOS SDKs:

-- The C compiler identification is unknown
CMake Error at ggml/src/ggml-cpu/CMakeLists.txt:57 (target_compile_features):
  target_compile_features no known features for C compiler "" version .

The failure reproduces against upstream/master, verified 2026-04-15. Ninja bypasses it entirely because it does not rely on Xcode's toolchain detection for cross-SDK builds. The resulting xcframework is equivalent — the Xcode-specific -DCMAKE_XCODE_ATTRIBUTE_* arguments in COMMON_CMAKE_ARGS are harmless no-ops under Ninja, so they were left alone rather than stripped.

Downloading a prebuilt xcframework

GitHub Actions rebuilds the xcframework on every push to master and on a weekly schedule. The latest artifact is attached to a rolling prerelease tagged latest:

curl -LO https://github.com/apocryphx/llama.cpp/releases/download/latest/llama.xcframework.zip
unzip llama.xcframework.zip

Drag llama.xcframework into your Xcode project's Frameworks, Libraries, and Embedded Content section. Each workflow run also uploads the zip as a 30-day Actions artifact if you need a specific build.

Building the xcframework

./build-xcframework.sh

Output: build-apple/llama.xcframework/ containing 7 slices:

Slice                   Architectures
ios-arm64               arm64
ios-arm64-simulator     arm64
macos-arm64_x86_64      arm64, x86_64
xros-arm64              arm64
xros-arm64-simulator    arm64
tvos-arm64              arm64
tvos-arm64-simulator    arm64

Simulator slices are arm64-only — every modern Xcode Simulator runs arm64-on-arm64 on Apple Silicon. The x86_64 simulator slice has been dropped to halve simulator build times and binary sizes. macOS keeps the Intel slice since real Intel Macs are still in use.

Mac Catalyst is not in the xcframework — CMake's cross-compile flags conflict when combining both Catalyst architectures in a single configure step. See APPLE-PLATFORMS-BUILD.md for the manual lipo workflow.

Every slice links Metal.framework and Accelerate.framework, and embeds the full Metal shader library (110 MSL kernels) via GGML_METAL_EMBED_LIBRARY=ON. No external .metallib file is required at runtime. You can verify this on any slice:

nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama \
  | grep ggml_metallib
# 000000000032cc20 S _ggml_metallib_start
# 00000000003bf6d3 S _ggml_metallib_end
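
As a further sanity check, the distance between those two symbols is the size of the embedded metallib. Using the example addresses above (your numbers will differ from build to build):

```shell
# Addresses copied from the nm output above; shell arithmetic accepts 0x hex.
start=0x32cc20
end=0x3bf6d3
echo $(( end - start ))   # 600755 — roughly 600 KB of embedded MSL
```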

Expected build-time warnings

During configuration of simulator/multi-arch slices you will see:

CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:558 (message):
  Unknown CPU architecture.  Falling back to generic implementations.

This is not a Metal fallback. It fires only when x86_64 is in the architecture list (now just the macos-arm64_x86_64 slice) and means the x86_64 half of the CPU backend uses generic scalar kernels instead of AVX/AVX2. The arm64 CPU backend and the Metal backend are unaffected. For shipping on Apple Silicon devices this warning is cosmetic.

Requirements

  • Xcode 26.x with all platform SDKs installed
  • CMake 4.x (brew install cmake)
  • Ninja (brew install ninja)

Last verified: 2026-04-15 against upstream tag b8802 with Xcode 26.4, CMake 4.2.3, Ninja 1.13.2.

Verifying Metal works

The xcframework is a library — it doesn't ship a runnable binary. To prove Metal is functional end-to-end against the same source the xcframework was built from, do a parallel host-macOS build of llama-bench:

cmake -B build-host -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON \
    -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_SERVER=OFF -DLLAMA_CURL=OFF
cmake --build build-host --target llama-bench -j
./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99

Note: llama-cli is only built when LLAMA_BUILD_SERVER=ON in this upstream (tools/CMakeLists.txt). llama-bench is always available and is a more informative smoke test anyway — it prints tokens/sec per backend.

Look for:

  • ggml_metal_library_init: using embedded metal library — the embedded metallib loaded, not a disk .metallib.
  • GPU family: MTLGPUFamilyApple* — real Apple Silicon GPU detected.
  • Backend column MTL,BLAS — Metal is the compute backend.
  • tg (token generation) rates in the hundreds of t/s on a small model; CPU-only would be 10× slower.
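
Those checks can be scripted against a captured run; a sketch (the log filename is illustrative, and the model path is a placeholder):

```shell
./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99 2>&1 | tee bench.log
grep -E 'using embedded metal library|MTLGPUFamilyApple|MTL,BLAS' bench.log
```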

Reference numbers from the 2026-04-15 verification (SmolLM2-135M-Instruct Q4_K_M on an M-series Mac): pp64 ≈ 8098 t/s, tg32 ≈ 403 t/s, backend MTL,BLAS, family MTLGPUFamilyApple9.

The xcframework slices contain identical Metal backend code — same _ggml_metallib_start/_end symbols, same 110 kernels — so a working host Metal build is a reliable proxy for the framework slices.

Code signing

The xcframework is unsigned. build-xcframework.sh sets CODE_SIGNING_ALLOWED=NO on purpose — a redistributable framework must not carry its own signature, because the consuming app has to re-sign embedded binaries with its own Team ID at archive time. This is the expected pattern for distributing a third-party .xcframework.

When you embed llama.xcframework in an Xcode project:

  • In the Frameworks, Libraries, and Embedded Content section, set Embed to Embed & Sign. Xcode will sign the embedded llama.framework with your app's signing identity during the build.
  • No action needed in your entitlements — the framework uses only public Metal.framework / Accelerate.framework APIs.
  • For distribution outside the App Store (Developer ID / notarization), the framework will be signed and notarized as part of your app's archive, not separately.

If Xcode refuses to build with a signature error, the fix is almost always switching Embed from Do Not Embed to Embed & Sign. Do not pre-sign the framework with codesign before embedding — that produces nested signatures that archive validation rejects.
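
After an archive build you can confirm Xcode re-signed the embedded copy with your identity (the app name and path are placeholders):

```shell
codesign -dv --verbose=2 MyApp.app/Frameworks/llama.framework
# A signed framework prints Authority=... lines for your team;
# an unsigned one prints "code object is not signed at all".
```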

Syncing with upstream

git fetch upstream --tags
LATEST_TAG=$(git tag --list 'b*' --sort=-v:refname | head -1)
git rebase "$LATEST_TAG"
./build-xcframework.sh     # re-verify

The fork's commits rebase cleanly onto upstream tags with one known pinch point: ggml/CMakeLists.txt conflicts whenever upstream adds code near cmake_minimum_required (e.g. the CMP0194 policy block added around tag b8802). Resolve by keeping both the fork's version-range bump and whatever upstream added adjacent to it.
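
The LATEST_TAG line relies on git's version-aware refname sort; you can convince yourself it picks the newest b-tag on a throwaway repo:

```shell
# Build a scratch repo with out-of-order b-tags, then apply the same selection.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=t -c user.email=t@t commit -q --allow-empty -m init
for t in b8799 b8802 b8800; do git -C "$repo" tag "$t"; done
git -C "$repo" tag --list 'b*' --sort=-v:refname | head -1   # b8802
```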

After rebasing, run ./build-xcframework.sh and spot-check one slice before force-pushing:

lipo -info build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/llama
# Architectures in the fat file: ... are: x86_64 arm64
nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama | grep ggml_metallib
# Expect _ggml_metallib_start and _ggml_metallib_end symbols.

Further reading

  • APPLE-PLATFORMS-BUILD.md — full per-platform build recipes, Mac Catalyst workflow, troubleshooting notes.
  • README.upstream.md — the complete upstream project README (features, model support, API docs, contributing, etc.).
  • ggml-org/llama.cpp — upstream project. Issues that aren't Apple-build-specific should go there, not here.

License

MIT, same as upstream.
