Bug report
`PyUnicode_InternFromString` and various other interning functions call `intern_common`, which locks an interpreter-wide lock in the free threading build. We should use `_Py_LOCK_DONT_DETACH` when locking that lock to avoid potential deadlocks in certain extensions.
PyTorch, TensorFlow, and possibly other extensions use C++11 one-time initialization via function-local statics to intern strings. For example:
```cpp
void PyNode::compiled_args(CompiledNodeArgs& args) const {
  static PyObject* method_name = PyUnicode_InternFromString("_compiled_autograd_key");
  // ...
}
```
https://github.com/pytorch/pytorch/blob/8804b122355d1aa1cc6bfffd1516f3df39f96c5f/torch/csrc/autograd/python_function.cpp#L369-L370
C++ static locals implement thread-safe one-time initialization, similar to `std::call_once` or `pthread_once`. That means that a (hidden) C++ lock is held during the call to `PyUnicode_InternFromString`. If we detach during `PyUnicode_InternFromString`, we risk a lock-ordering deadlock.
Here's a scenario that could deadlock:

- Thread 1: requests a stop-the-world pause
- Thread 2: holds the interned mutex (causing lock contention)
- Thread 3: calls `static PyObject* method_name = PyUnicode_InternFromString(...)`; it blocks on the interned mutex and detaches for the stop-the-world pause
- Thread 4: calls `static PyObject* method_name = PyUnicode_InternFromString(...)` and blocks on the C++ one-time initialization lock without detaching

So there's a deadlock cycle: Thread 4 -(waiting on)-> Thread 3 -(waiting on)-> Thread 1 -(waiting on)-> Thread 4.
This isn't a problem in GIL-enabled builds because Thread 3 won't block during `PyUnicode_InternFromString`.
`_Py_LOCK_DONT_DETACH` prevents this problem because `PyUnicode_InternFromString()` will no longer detach, and therefore won't be paused by a stop-the-world request, while there's contention on the interned mutex.
Note that if callers used something like pybind11's `gil_safe_call_once_and_store`, that would also avoid the deadlock.
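For reference, a hedged sketch of that pybind11 pattern (requires pybind11 ≥ 2.12 and a running interpreter, so it is not runnable standalone; the function name is illustrative):

```cpp
#include <pybind11/gil_safe_call_once.h>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Sketch only: gil_safe_call_once_and_store is designed to be safe as a
// function-local static and runs the initializer exactly once without
// holding the GIL while blocked on its internal once-flag, sidestepping
// the guard-lock/interned-mutex ordering problem described above.
PyObject* compiled_autograd_key_name() {
    static py::gil_safe_call_once_and_store<py::object> storage;
    return storage
        .call_once_and_store_result([]() {
            return py::reinterpret_steal<py::object>(
                PyUnicode_InternFromString("_compiled_autograd_key"));
        })
        .get_stored()
        .ptr();
}
```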
Linked PRs