Fix level when using pthreads recursive mutex #68

Συγχωνευμένα
MicheleC συγχώνευσε 1 υποβολές από bug/2462/fix-recursive-posix-mutexes σε master 1 έτος πριν
koorogi σχολίασε 1 έτος πριν
Συνεργάτης

The support for checking how many times a thread holds a lock on a recursive mutex was broken when using pthreads recursive mutexes.

This can cause a deadlock because TQApplication::internalNotify uses this to unlock the tqt_mutex before sending an event to the receiver, and then to know how many times to re-lock it afterwards. Before this PR, the level was always determined to be zero, which meant that the lock did not get unlocked properly if it was already held before entering internalNotify. If the event receiver did work which takes the lock in another thread, it would deadlock.

I observed this in tdm when running on a Linux system using musl instead of glibc.

This change likely fixes bugzilla bugs 2462, 2526, and 2744, and commit 9bbdfea568 can probably be reverted, but I have not done so because I do not have a FreeBSD system for testing.

The support for checking how many times a thread holds a lock on a recursive mutex was broken when using pthreads recursive mutexes. This can cause a deadlock because `TQApplication::internalNotify` [uses this](https://mirror.git.trinitydesktop.org/gitea/TDE/tqt3/src/commit/b167d09c43be6c8b7d30f0d4fb1bf1e86c51b8a7/src/kernel/qapplication.cpp#L2876) to unlock the `tqt_mutex` before sending an event to the receiver, and then to know how many times to re-lock it afterwards. Before this PR, the `level` was always determined to be zero, which meant that the lock did not get unlocked properly if it was already held before entering `internalNotify`. If the event receiver did work which takes the lock in another thread, it would deadlock. I observed this in tdm when running on a Linux system using musl instead of glibc. This change likely fixes bugzilla bugs 2462, 2526, and 2744, and commit 9bbdfea568b8ae52b76aa8366612130403158320 can probably be reverted, but I have not done so because I do not have a FreeBSD system for testing.
MicheleC το πρόσθεσε στο R14.1.x ορόσημο 1 έτος πριν
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

Thanks for the PR Bobby. It won't make it into R14.1.0 which is going to be hard frozen today, but we will look into this for R14.1.1

Thanks for the PR Bobby. It won't make it into R14.1.0 which is going to be hard frozen today, but we will look into this for R14.1.1
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

The patch looks good, although mutex support in TQt3 remains flawn due to its desing. Before merging this patch officially, I will give it a good test in linux and bsd, to make sure there are no evident side effects (I don't expect any, but you never know :-) )

The patch looks good, although mutex support in TQt3 remains flawn due to its desing. Before merging this patch officially, I will give it a good test in linux and bsd, to make sure there are no evident side effects (I don't expect any, but you never know :-) )
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

I managed to reproduce the issues of bug 2462 and 2526 on a modified tqt3 version. After appling this patch, those issues do not happen anymore.

@koorogi I am ready to merge the patch but it needs to be rebased on top of master. Could you do that? Or are you happy enough if I do the rebase and then merge?

I managed to reproduce the issues of bug 2462 and 2526 on a modified tqt3 version. After appling this patch, those issues do not happen anymore. @koorogi I am ready to merge the patch but it needs to be rebased on top of master. Could you do that? Or are you happy enough if I do the rebase and then merge?
MicheleC force-pushed bug/2462/fix-recursive-posix-mutexes από το 9eb28f85ed στο 999cebcb5d 1 έτος πριν
MicheleC συγχώνευσε την υποβολή 999cebcb5d σε master 1 έτος πριν
MicheleC διέγραψε το κλάδο bug/2462/fix-recursive-posix-mutexes 1 έτος πριν
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

Since the original commit was not GPG signed, I decided to rebase it myself.
@koorogi thanks for the patch!

Since the original commit was not GPG signed, I decided to rebase it myself. @koorogi thanks for the patch!
MicheleC τροποποίησε το ορόσημο από R14.1.x σε R14.1.1 release 1 έτος πριν
MicheleC ενέκρινε αυτές τις αλλαγές 1 έτος πριν
koorogi σχολίασε 1 έτος πριν
Συντάκτης
Συνεργάτης

@MicheleC Thanks!

One thing I did want to think about more was the question of what to do about commit 9bbdfea568. This disabled the use of pthreads recursive mutexes on FreeBSD, so it should be safe to revert now. But it should really be possible to use them on more platforms.

The check for recursive mutex support depends on PTHREAD_MUTEX_RECURSIVE being a preprocessor definition, and on glibc it's defined as an enum value, which is why the broken implementation wasn't used there.

I'm not familiar with cmake, but I could look into writing a configure-time check to see if PTHREAD_MUTEX_RECURSIVE is available, whether or not it's a preprocessor definition.

Alternatively, we could assume the presence of recursive mutex support when using pthreads and rip out the fallback implementation. To the best of my knowledge, recursive mutex support was a required part of pthreads when it was originally standardized -- I don't know what platform this code is trying to support that implements pthreads but not recursive mutexes.

@MicheleC Thanks! One thing I did want to think about more was the question of what to do about commit 9bbdfea568. This disabled the use of pthreads recursive mutexes on FreeBSD, so it should be safe to revert now. But it should really be possible to use them on more platforms. The check for recursive mutex support depends on `PTHREAD_MUTEX_RECURSIVE` being a preprocessor definition, and on glibc it's defined as an `enum` value, which is why the broken implementation wasn't used there. I'm not familiar with cmake, but I could look into writing a configure-time check to see if `PTHREAD_MUTEX_RECURSIVE` is available, whether or not it's a preprocessor definition. Alternatively, we could assume the presence of recursive mutex support when using pthreads and rip out the fallback implementation. To the best of my knowledge, recursive mutex support was a required part of pthreads when it was originally standardized -- I don't know what platform this code is trying to support that implements pthreads but not recursive mutexes.
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

Hi Bobby,
you are perfectly right.
Currently I have not (yet) reverted commit 9bbdfea568 simply because I want to fix recursive mutex detection, but being deep in some other tasks I have not started to work on it (I did revert the commit for testing though).

I think your suggestion that "if we have pthread we can assume we have recursive mutexes" makes a lot of sense. Moreover Qt4 does not even check for that, it just assumes mutexes are recursive, from a quick look at its code.

If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise.

Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations.

Hi Bobby, you are perfectly right. Currently I have not (yet) reverted commit `9bbdfea568` simply because I want to fix recursive mutex detection, but being deep in some other tasks I have not started to work on it (I did revert the commit for testing though). I think your suggestion that "if we have pthread we can assume we have recursive mutexes" makes a lot of sense. Moreover Qt4 does not even check for that, it just assumes mutexes are recursive, from a quick look at its code. If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise. Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations.
koorogi σχολίασε 1 έτος πριν
Συντάκτης
Συνεργάτης

If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise.

I'll do that

Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations.

If we ignore the fallback implementation because we just said it can be deleted, I assume this is about the count member that I added, which is queried by the locked and level methods?

I haven't verified that there isn't code out there doing this, but those methods are not useful to call from a thread which isn't holding the lock. If you call these methods without holding the lock, then there's a window between when the count value is read and when you use the value where another thread could get/release the lock. The result is already stale before you can use it, and using atomics for the accesses to count can't fix this.

I don't know what the rules are here around maintaining API compatibility, which is why I didn't suggest this in the first place, but if you're open to breaking API changes, here's what I would do:

  • Remove the locked method. As mentioned about, you can't do anything useful with the result if you are not the thread holding the lock, and if you know you are the thread holding the lock, then you don't need to call this method to know that it's locked.
  • Remove the level method so it can't be called from a thread that's not holding the lock. Instead, add an overload to lock which has an output parameter to return the lock count upon successfully acquiring the lock. This way, it's only possible to query the lock count by acquiring the lock.

With those changes, it should be guaranteed that all accesses of the count member are made from a thread that has acquired the lock, and then there's no need to use atomics for them.

> If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise. I'll do that > Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations. If we ignore the fallback implementation because we just said it can be deleted, I assume this is about the `count` member that I added, which is queried by the `locked` and `level` methods? I haven't verified that there isn't code out there doing this, but those methods are not useful to call from a thread which isn't holding the lock. If you call these methods without holding the lock, then there's a window between when the `count` value is read and when you use the value where another thread could get/release the lock. The result is already stale before you can use it, and using atomics for the accesses to `count` can't fix this. I don't know what the rules are here around maintaining API compatibility, which is why I didn't suggest this in the first place, but if you're open to breaking API changes, here's what I would do: * Remove the `locked` method. As mentioned about, you can't do anything useful with the result if you are not the thread holding the lock, and if you know you are the thread holding the lock, then you don't need to call this method to know that it's locked. * Remove the `level` method so it can't be called from a thread that's not holding the lock. Instead, add an overload to `lock` which has an output parameter to return the lock count upon successfully acquiring the lock. This way, it's only possible to query the lock count by acquiring the lock. With those changes, it should be guaranteed that all accesses of the `count` member are made from a thread that has acquired the lock, and then there's no need to use atomics for them.
MicheleC σχολίασε 1 έτος πριν
Ιδιοκτήτης

If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise.

I'll do that

Thanks!! Appreciated the help :-)

Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations.

If we ignore the fallback implementation because we just said it can be deleted, I assume this is about the count member that I added, which is queried by the locked and level methods?

The problem was not caused by the new count member, it was there before. The way locked is implemented is not thread safe basically, because it is possible that one thread checks for lock, get a reply that the mutex is not locked but right before returning from locked() another thread interrupts and lock the mutex.

For R14.1.x we are not going to break API. Whatever we can do to improve the current code is fine, but with that limitation in mind. Even if the code is no fully thread safe, TDE is running for many users, so the likelihood of read problems is very very small.

For R14.2.0 I am planning to backport mutex implementation from Qt4 but using c++11 atomic operation in place of the code present in Qt4. This may require other changes as well, but I have not yet investigated in further details. Fyi, locked is gone in Qt4 (or better it is only provided for Qt3 compatibility) so it was already found to be not thread safe.

> > If you have time to prepare a patch for it, it would be great since it would be a while before I can look at this myself otherwise. > > I'll do that Thanks!! Appreciated the help :-) > > Btw, for R14.2.0 I am planning to change implementation of mutexes, because the TQt3 is flawn by design given it does not make use of atomic operations. > > If we ignore the fallback implementation because we just said it can be deleted, I assume this is about the `count` member that I added, which is queried by the `locked` and `level` methods? The problem was not caused by the new `count` member, it was there before. The way `locked` is implemented is not thread safe basically, because it is possible that one thread checks for lock, get a reply that the mutex is not locked but right before returning from `locked()` another thread interrupts and lock the mutex. For R14.1.x we are not going to break API. Whatever we can do to improve the current code is fine, but with that limitation in mind. Even if the code is no fully thread safe, TDE is running for many users, so the likelihood of read problems is very very small. For R14.2.0 I am planning to backport mutex implementation from Qt4 but using c++11 atomic operation in place of the code present in Qt4. This may require other changes as well, but I have not yet investigated in further details. Fyi, `locked` is gone in Qt4 (or better it is only provided for Qt3 compatibility) so it was already found to be not thread safe.

Εξεταστές

MicheleC ενέκρινε αυτές τις αλλαγές 1 έτος πριν
Το pull request έχει συγχωνευθεί ως 999cebcb5d.
Συνδεθείτε για να συμμετάσχετε σε αυτή τη συνομιλία.
Δεν υπάρχουν εξεταστές
Χωρίς Ορόσημο
Χωρίς Αποδέκτη
2 Συμμετέχοντες
Ειδοποιήσεις
Ημερομηνία Παράδοσης

Δεν ορίστηκε ημερομηνία παράδοσης.

Εξαρτήσεις

Δεν έχουν οριστεί εξαρτήσεις.

Αναφορά: TDE/tqt3#68
Φόρτωση…
Δεν υπάρχει ακόμα περιεχόμενο.