A fork of ggml-org/llama.cpp focused on producing a working llama.xcframework for local on-device inference across all current Apple platforms.
→ Download the latest prebuilt llama.xcframework.zip (sha256) — rebuilt on every push to master and weekly.
The goal of this fork is narrow: keep a shippable xcframework building on a modern macOS toolchain (Xcode 26+, CMake 4.x). No feature additions, no diverging APIs — everything else is pulled straight from upstream.
Three commits on top of upstream (plus this README):

- `cmake_minimum_required` bump (`CMakeLists.txt`, `ggml/CMakeLists.txt`) — widens the accepted version range to `3.5...4.2` so CMake 4.x stops warning about removed policies. Upstream still pins to `3.14...3.28`.
- `build-xcframework.sh` → Ninja generator — see "Why Ninja" below. All 7 `cmake -B` invocations in the script now use `-G Ninja` instead of `-G Xcode`. The Xcode-only `-- -quiet` build argument was dropped. `combine_static_libraries` call sites now pass `.` as the `release_dir` because Ninja is single-config and emits archives directly under `src/`, not `src/Release-<sdk>/`.
- Fork-focused README — this file; the upstream README moved to `README.upstream.md`.
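The version-range bump can be sketched as follows (a fragment, not the full file; the exact surrounding lines in each CMakeLists.txt differ):

```cmake
# Upstream pins:
cmake_minimum_required(VERSION 3.14...3.28)

# This fork widens the range so CMake 4.x configures without
# removed-policy warnings:
cmake_minimum_required(VERSION 3.5...4.2)
```

The `min...max` form tells CMake to behave with the policies of the running version up to the stated maximum, which is why the warning disappears without touching any policy settings directly.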
No C/C++/Objective-C source has been touched. No APIs added, removed, or renamed. No ggml backend modifications. Library behavior is byte-for-byte identical to upstream b8802 for the same inputs.
On CMake 4.x with Xcode 26, the Xcode generator fails when cross-compiling to iOS/tvOS/visionOS SDKs:
-- The C compiler identification is unknown
CMake Error at ggml/src/ggml-cpu/CMakeLists.txt:57 (target_compile_features):
target_compile_features no known features for C compiler "" version .
The failure reproduces against upstream/master, verified 2026-04-15. Ninja bypasses it entirely because it does not rely on Xcode's toolchain detection for cross-SDK builds. The resulting xcframework is equivalent — the Xcode-specific -DCMAKE_XCODE_ATTRIBUTE_* arguments in COMMON_CMAKE_ARGS are harmless no-ops under Ninja, so they were left alone rather than stripped.
GitHub Actions rebuilds the xcframework on every push to master and on a weekly schedule. The latest artifact is attached to a rolling prerelease tagged latest:
curl -LO https://github.com/apocryphx/llama.cpp/releases/download/latest/llama.xcframework.zip
unzip llama.xcframework.zip

Drag llama.xcframework into your Xcode project's Frameworks, Libraries, and Embedded Content section. Each workflow run also uploads the zip as a 30-day Actions artifact if you need a specific build.
./build-xcframework.sh

Output: build-apple/llama.xcframework/ containing 7 slices:
| Slice | Architectures |
|---|---|
| ios-arm64 | arm64 |
| ios-arm64-simulator | arm64 |
| macos-arm64_x86_64 | arm64, x86_64 |
| xros-arm64 | arm64 |
| xros-arm64-simulator | arm64 |
| tvos-arm64 | arm64 |
| tvos-arm64-simulator | arm64 |
Simulator slices are arm64-only — every modern Xcode Simulator runs arm64-on-arm64 on Apple Silicon. The x86_64 simulator slice has been dropped to halve simulator build times and binary sizes. macOS keeps the Intel slice since real Intel Macs are still in use.
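After a build, a quick way to confirm all seven slices landed is a loop over the expected directories. A sketch (the mkdir loop only simulates build output so the check is demonstrable anywhere; drop it when running against a real build-apple/ tree):

```shell
OUT=build-apple/llama.xcframework
SLICES="ios-arm64 ios-arm64-simulator macos-arm64_x86_64 \
        xros-arm64 xros-arm64-simulator tvos-arm64 tvos-arm64-simulator"

# Simulation only: stand in for a real ./build-xcframework.sh run.
for s in $SLICES; do mkdir -p "$OUT/$s"; done

# The actual check: every expected slice directory must exist.
missing=0
for s in $SLICES; do
  [ -d "$OUT/$s" ] || { echo "missing slice: $s"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all 7 slices present"
```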
Mac Catalyst is not in the xcframework — CMake's cross-compile flags conflict when combining both Catalyst architectures in a single configure step. See APPLE-PLATFORMS-BUILD.md for the manual lipo workflow.
Every slice links Metal.framework and Accelerate.framework, and embeds the full Metal shader library (110 MSL kernels) via GGML_METAL_EMBED_LIBRARY=ON. No external .metallib file is required at runtime. You can verify this on any slice:
nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama \
| grep ggml_metallib
# 000000000032cc20 S _ggml_metallib_start
# 00000000003bf6d3 S _ggml_metallib_end

During configuration of simulator/multi-arch slices you will see:
CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:558 (message):
Unknown CPU architecture. Falling back to generic implementations.
This is not a Metal fallback. It fires only when x86_64 is part of the architecture list (now just the macos-arm64_x86_64 slice) and means the x86_64 CPU backend slice uses generic scalar kernels instead of AVX/AVX2. The arm64 CPU backend and the Metal backend are unaffected. For shipping on Apple Silicon devices this warning is cosmetic.
- Xcode 26.x with all platform SDKs installed
- CMake 4.x (`brew install cmake`)
- Ninja (`brew install ninja`)
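A quick sanity check that the required tools are on PATH (a sketch; it only tests presence, not versions):

```shell
# Report which prerequisites are installed; any MISSING line means the
# build script will fail early.
for tool in cmake ninja xcodebuild; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```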
Last verified: 2026-04-15 against upstream tag b8802 with Xcode 26.4, CMake 4.2.3, Ninja 1.13.2.
The xcframework is a library — it doesn't ship a runnable binary. To prove Metal is functional end-to-end against the same source the xcframework was built from, do a parallel host-macOS build of llama-bench:
cmake -B build-host -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON \
-DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_SERVER=OFF -DLLAMA_CURL=OFF
cmake --build build-host --target llama-bench -j
./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99

Note: `llama-cli` is only built when `LLAMA_BUILD_SERVER=ON` in this upstream (tools/CMakeLists.txt). `llama-bench` is always available and is a more informative smoke test anyway — it prints tokens/sec per backend.
Look for:

- `ggml_metal_library_init: using embedded metal library` — the embedded metallib loaded, not a `.metallib` from disk.
- `GPU family: MTLGPUFamilyApple*` — a real Apple Silicon GPU was detected.
- Backend column `MTL,BLAS` — Metal is the compute backend.
- tg (token generation) rates in the hundreds of t/s on a small model; CPU-only would be 10× slower.
Reference numbers from the 2026-04-15 verification (SmolLM2-135M-Instruct Q4_K_M on an M-series Mac): pp64 ≈ 8098 t/s, tg32 ≈ 403 t/s, backend MTL,BLAS, family MTLGPUFamilyApple9.
The xcframework slices contain identical Metal backend code — same _ggml_metallib_start/_end symbols, same 110 kernels — so a working host Metal build is a reliable proxy for the framework slices.
The xcframework is unsigned. build-xcframework.sh sets CODE_SIGNING_ALLOWED=NO on purpose — a redistributable framework must not carry its own signature, because the consuming app has to re-sign embedded binaries with its own Team ID at archive time. This is the expected pattern for distributing a third-party .xcframework.
When you embed llama.xcframework in an Xcode project:
- In the Frameworks, Libraries, and Embedded Content section, set Embed to Embed & Sign. Xcode will sign the embedded `llama.framework` with your app's signing identity during the build.
- No action needed in your entitlements — the framework uses only public `Metal.framework` / `Accelerate.framework` APIs.
- For distribution outside the App Store (Developer ID / notarization), the framework is signed and notarized as part of your app's archive, not separately.

If Xcode refuses to build with a signature error, the fix is almost always switching Embed from Do Not Embed to Embed & Sign. Do not pre-sign the framework with `codesign` before embedding — that produces nested signatures that archive validation rejects.
git fetch upstream --tags
LATEST_TAG=$(git tag --list 'b*' --sort=-v:refname | head -1)
git rebase "$LATEST_TAG"
./build-xcframework.sh   # re-verify

The fork's commits rebase cleanly onto upstream tags with one known pinch point: ggml/CMakeLists.txt conflicts whenever upstream adds code near cmake_minimum_required (e.g. the CMP0194 policy block added around tag b8802). Resolve by keeping both the fork's version-range bump and whatever upstream added adjacent to it.
After rebasing, run ./build-xcframework.sh and spot-check one slice before force-pushing:
lipo -info build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/llama
# Architectures in the fat file: ... are: x86_64 arm64
nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama | grep ggml_metallib
# Expect _ggml_metallib_start and _ggml_metallib_end symbols.

- APPLE-PLATFORMS-BUILD.md — full per-platform build recipes, Mac Catalyst workflow, troubleshooting notes.
- README.upstream.md — the complete upstream project README (features, model support, API docs, contributing, etc.).
- ggml-org/llama.cpp — upstream project. Issues that aren't Apple-build-specific should go there, not here.
MIT, same as upstream.