llama.cpp — Apple Platforms Fork


A fork of ggml-org/llama.cpp focused on producing a working llama.xcframework for local on-device inference across all current Apple platforms.

Download the latest prebuilt llama.xcframework.zip (sha256) — rebuilt on every push to master and weekly.

The goal of this fork is narrow: keep a shippable xcframework building on a modern macOS toolchain (Xcode 26+, CMake 4.x). No feature additions, no diverging APIs — everything else is pulled straight from upstream.

What this fork changes

Three commits on top of upstream (plus this README):

  1. cmake_minimum_required bump (CMakeLists.txt, ggml/CMakeLists.txt) — widens the accepted version range to 3.5...4.2 so CMake 4.x stops warning about removed policies. Upstream still pins to 3.14...3.28.
  2. build-xcframework.sh → Ninja generator — see "Why Ninja" below. All 7 cmake -B invocations in the script now use -G Ninja instead of -G Xcode. The Xcode-only -- -quiet build argument was dropped. combine_static_libraries call sites now pass . as the release_dir because Ninja is single-config and emits archives directly under src/, not src/Release-<sdk>/.
  3. Fork-focused README — this file; upstream README moved to README.upstream.md.
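
The generator switch in build-xcframework.sh is mechanical; each of the seven configure/build pairs changes along these lines (the build-directory name and trailing arguments here are illustrative, not the script's full argument set):

```diff
-cmake -B build-ios-device -G Xcode "${COMMON_CMAKE_ARGS[@]}" ...
-cmake --build build-ios-device --config Release -- -quiet
+cmake -B build-ios-device -G Ninja "${COMMON_CMAKE_ARGS[@]}" ...
+cmake --build build-ios-device --config Release
```

The `-- -quiet` suffix is an Xcode-generator passthrough argument, so it goes away with the generator.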

No C/C++/Objective-C source has been touched. No APIs added, removed, or renamed. No ggml backend modifications. Library behavior is byte-for-byte identical to upstream b8802 for the same inputs.

Why Ninja

On CMake 4.x with Xcode 26, the Xcode generator fails when cross-compiling to iOS/tvOS/visionOS SDKs:

-- The C compiler identification is unknown
CMake Error at ggml/src/ggml-cpu/CMakeLists.txt:57 (target_compile_features):
  target_compile_features no known features for C compiler "" version .

The failure reproduces against upstream/master, verified 2026-04-15. Ninja bypasses it entirely because it does not rely on Xcode's toolchain detection for cross-SDK builds. The resulting xcframework is equivalent — the Xcode-specific -DCMAKE_XCODE_ATTRIBUTE_* arguments in COMMON_CMAKE_ARGS are harmless no-ops under Ninja, so they were left alone rather than stripped.

Downloading a prebuilt xcframework

GitHub Actions rebuilds the xcframework on every push to master and on a weekly schedule. The latest artifact is attached to a rolling prerelease tagged latest:

curl -LO https://github.com/apocryphx/llama.cpp/releases/download/latest/llama.xcframework.zip
unzip llama.xcframework.zip

Drag llama.xcframework into your Xcode project's Frameworks, Libraries, and Embedded Content section. Each workflow run also uploads the zip as a 30-day Actions artifact if you need a specific build.

Building the xcframework

./build-xcframework.sh

Output: build-apple/llama.xcframework/ containing 7 slices:

Slice                   Architectures
ios-arm64               arm64
ios-arm64-simulator     arm64
macos-arm64_x86_64      arm64, x86_64
xros-arm64              arm64
xros-arm64-simulator    arm64
tvos-arm64              arm64
tvos-arm64-simulator    arm64

Simulator slices are arm64-only — every modern Xcode Simulator runs arm64-on-arm64 on Apple Silicon. The x86_64 simulator slice has been dropped to halve simulator build times and binary sizes. macOS keeps the Intel slice since real Intel Macs are still in use.

Mac Catalyst is not in the xcframework — CMake's cross-compile flags conflict when combining both Catalyst architectures in a single configure step. See APPLE-PLATFORMS-BUILD.md for the manual lipo workflow.

Every slice links Metal.framework and Accelerate.framework, and embeds the full Metal shader library (110 MSL kernels) via GGML_METAL_EMBED_LIBRARY=ON. No external .metallib file is required at runtime. You can verify this on any slice:

nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama \
  | grep ggml_metallib
# 000000000032cc20 S _ggml_metallib_start
# 00000000003bf6d3 S _ggml_metallib_end
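
As a further sanity check, the distance between those two symbols is the size of the embedded metallib. Using the example addresses above (your numbers will differ from build to build):

```shell
# Addresses copied from the nm output above; shell arithmetic accepts 0x hex.
start=0x32cc20
end=0x3bf6d3
echo $(( end - start ))   # 600755 — roughly 600 KB of embedded MSL
```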

Expected build-time warnings

During configuration of simulator/multi-arch slices you will see:

CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:558 (message):
  Unknown CPU architecture.  Falling back to generic implementations.

This is not a Metal fallback. It fires only when x86_64 is in the architecture list (now just the macos-arm64_x86_64 slice) and means the x86_64 half of the CPU backend uses generic scalar kernels instead of AVX/AVX2. The arm64 CPU backend and the Metal backend are unaffected. For shipping on Apple Silicon devices this warning is cosmetic.

Requirements

  • Xcode 26.x with all platform SDKs installed
  • CMake 4.x (brew install cmake)
  • Ninja (brew install ninja)

Last verified: 2026-04-15 against upstream tag b8802 with Xcode 26.4, CMake 4.2.3, Ninja 1.13.2.

Verifying Metal works

The xcframework is a library — it doesn't ship a runnable binary. To prove Metal is functional end-to-end against the same source the xcframework was built from, do a parallel host-macOS build of llama-bench:

cmake -B build-host -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_METAL=ON -DGGML_METAL_EMBED_LIBRARY=ON \
    -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_SERVER=OFF -DLLAMA_CURL=OFF
cmake --build build-host --target llama-bench -j
./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99

Note: llama-cli is only built when LLAMA_BUILD_SERVER=ON in this upstream (tools/CMakeLists.txt). llama-bench is always available and is a more informative smoke test anyway — it prints tokens/sec per backend.

Look for:

  • ggml_metal_library_init: using embedded metal library — the embedded metallib loaded, not a disk .metallib.
  • GPU family: MTLGPUFamilyApple* — real Apple Silicon GPU detected.
  • Backend column MTL,BLAS — Metal is the compute backend.
  • tg (token generation) rates in the hundreds of t/s on a small model; CPU-only would be 10× slower.
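
Those checks can be scripted against a captured run; a sketch (the log filename is illustrative, and the model path is a placeholder):

```shell
./build-host/bin/llama-bench -m <model>.gguf -p 64 -n 32 -ngl 99 2>&1 | tee bench.log
grep -E 'using embedded metal library|MTLGPUFamilyApple|MTL,BLAS' bench.log
```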

Reference numbers from the 2026-04-15 verification (SmolLM2-135M-Instruct Q4_K_M on an M-series Mac): pp64 ≈ 8098 t/s, tg32 ≈ 403 t/s, backend MTL,BLAS, family MTLGPUFamilyApple9.

The xcframework slices contain identical Metal backend code — same _ggml_metallib_start/_end symbols, same 110 kernels — so a working host Metal build is a reliable proxy for the framework slices.

Code signing

The xcframework is unsigned. build-xcframework.sh sets CODE_SIGNING_ALLOWED=NO on purpose — a redistributable framework must not carry its own signature, because the consuming app has to re-sign embedded binaries with its own Team ID at archive time. This is the expected pattern for distributing a third-party .xcframework.

When you embed llama.xcframework in an Xcode project:

  • In the Frameworks, Libraries, and Embedded Content section, set Embed to Embed & Sign. Xcode will sign the embedded llama.framework with your app's signing identity during the build.
  • No action needed in your entitlements — the framework uses only public Metal.framework / Accelerate.framework APIs.
  • For distribution outside the App Store (Developer ID / notarization), the framework will be signed and notarized as part of your app's archive, not separately.

If Xcode refuses to build with a signature error, the fix is almost always switching Embed from Do Not Embed to Embed & Sign. Do not pre-sign the framework with codesign before embedding — that produces nested signatures that archive validation rejects.
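
After an archive build you can confirm Xcode re-signed the embedded copy with your identity (the app name and path are placeholders):

```shell
codesign -dv --verbose=2 MyApp.app/Frameworks/llama.framework
# A signed framework prints Authority=... lines for your team;
# an unsigned one prints "code object is not signed at all".
```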

Syncing with upstream

git fetch upstream --tags
LATEST_TAG=$(git tag --list 'b*' --sort=-v:refname | head -1)
git rebase "$LATEST_TAG"
./build-xcframework.sh     # re-verify

The fork's commits rebase cleanly onto upstream tags with one known pinch point: ggml/CMakeLists.txt conflicts whenever upstream adds code near cmake_minimum_required (e.g. the CMP0194 policy block added around tag b8802). Resolve by keeping both the fork's version-range bump and whatever upstream added adjacent to it.
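
The LATEST_TAG line relies on git's version-aware refname sort; you can convince yourself it picks the newest b-tag on a throwaway repo:

```shell
# Build a scratch repo with out-of-order b-tags, then apply the same selection.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=t -c user.email=t@t commit -q --allow-empty -m init
for t in b8799 b8802 b8800; do git -C "$repo" tag "$t"; done
git -C "$repo" tag --list 'b*' --sort=-v:refname | head -1   # b8802
```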

After rebasing, run ./build-xcframework.sh and spot-check one slice before force-pushing:

lipo -info build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/llama
# Architectures in the fat file: ... are: x86_64 arm64
nm build-apple/llama.xcframework/ios-arm64/llama.framework/llama | grep ggml_metallib
# Expect _ggml_metallib_start and _ggml_metallib_end symbols.

Further reading

  • APPLE-PLATFORMS-BUILD.md — full per-platform build recipes, Mac Catalyst workflow, troubleshooting notes.
  • README.upstream.md — the complete upstream project README (features, model support, API docs, contributing, etc.).
  • ggml-org/llama.cpp — upstream project. Issues that aren't Apple-build-specific should go there, not here.

License

MIT, same as upstream.
