Deduplicate code in TQObject::activate_signal()
#123
Open
Fat-Zer
wants to merge 5 commits from feat/dedup-tqobject
into master
pull from: feat/dedup-tqobject
merge into: TDE:master
TDE:fix/eventloop-clenaup
TDE:r14.1.x
TDE:master
TDE:feat/delete-screen-intersect
TDE:fix/various
TDE:fix/issue/142-r1
TDE:fix/tqtextcodec-locale-destr
TDE:fix/no-thread
TDE:feat/naive-dedup
TDE:feat/cmakeConv-r1
TDE:feat/cmakeConv
TDE:r14.0.x
TDE:feat/add-interix-support
Reviewers
Request review
No reviewers
Labels
General - need additional info from contributor PR/keep-branch
Pull request - do not delete branch after merging PR/not-ok
Pull request - need fixing PR/rfc
Pull request - request for comments PR/update-trans
Pull request - update to translation files needed PR/wip
Pull request - work in progress RS/R14.0.x
Related to R14.0.x series RS/R14.1.x
Related to R14.1.x series SL/critical
Severity level - critical SL/major
Severity level - major SL/minor
Severity level - minor SL/normal
Severity level - normal SL/regression
Severity level - regression from previous version SL/trivial
Severity level - trivial SL/wishlist
Severity level - wishlist request ST/duplicate
Status - duplicate of another issue ST/invalid
Status - invalid report ST/notourproblem
Status - not our problem ST/rejected
Status - rejected ST/wontfix
Status - won't fix ST/worksforme
Status - works for me, unable to reproduce
Apply labels
Clear labels
GE/need-info
General - need additional info from contributor PR/keep-branch
Pull request - do not delete branch after merging PR/not-ok
Pull request - need fixing PR/rfc
Pull request - request for comments PR/update-trans
Pull request - update to translation files needed PR/wip
Pull request - work in progress RS/R14.0.x
Related to R14.0.x series RS/R14.1.x
Related to R14.1.x series SL/critical
Severity level - critical SL/major
Severity level - major SL/minor
Severity level - minor SL/normal
Severity level - normal SL/regression
Severity level - regression from previous version SL/trivial
Severity level - trivial SL/wishlist
Severity level - wishlist request ST/duplicate
Status - duplicate of another issue ST/invalid
Status - invalid report ST/notourproblem
Status - not our problem ST/rejected
Status - rejected ST/wontfix
Status - won't fix ST/worksforme
Status - works for me, unable to reproduce
No Label
GE/need-info
PR/keep-branch
PR/not-ok
PR/rfc
PR/update-trans
PR/wip
RS/R14.0.x
RS/R14.1.x
SL/critical
SL/major
SL/minor
SL/normal
SL/regression
SL/trivial
SL/wishlist
ST/duplicate
ST/invalid
ST/notourproblem
ST/rejected
ST/wontfix
ST/worksforme
Milestone
Set milestone
Clear milestone
No items
No Milestone
Assignees
Assign users
Clear assignees
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.
No due date set.
Dependencies
No dependencies set.
Reference: TDE/tqt3#123
Reference in new issue
There is no content yet.
Delete Branch 'feat/dedup-tqobject'
Deleting a branch is permanent. It CANNOT be undone. Continue?
No
Yes
According to a benchmark the while() loop with direct use of iterators
were ~10% slower than the current one which handled case when there is
only one connection to a signal separately.
But use of a range-based for loop sped up even this case for ~1.5%.
Closes: #121
Seems it causes crashes... I'll mark it as WIP
Deduplicate code in TQObject::activate_signal()to WIP: Deduplicate code in TQObject::activate_signal() 2 months agoLGTM
Ah missed this comment.
Due to crash reported by Alex
I found the cause of the crashes: it's probabilisticly happens when a TQObject gets deleted by one of its signal handlers.
Before the patch it surfaced only if there were 2+ connections to a slot and was less likely if the object was deleted in the last slot. With the patch it's much more likely to happen even if there is only one connection.
I don't suppose we can reliably track all the instances where actual deletion happen, so the best we can hope for is to mitigate the issue somehow. (One of variants to mitigate it is "Keep it as-is and don't touch").
Isn't is wrong in first place though? A slot should not delete the sender of the signal, IMO.
Perhpas for the time being we can "keep it as it is" as you suggested because it is not good to crash tqt3, but this PR could be useful to test/debug wrong code somewhere else.
Generally it is, but there are at least two cases like that in tqt itself and if it will be done in a safely manner it could be ok. One produces a semi-persistent crash and another I caught only when I added some debug prints. It will be almost impossible to track all those through out the whole code base and unless we do it it will be just random crashes and disappointment for users. Plus fixing each one of them might have a performance impact (though relatively small)...
I actually have a couple ideas how to workaround this...
WIP: Deduplicate code in TQObject::activate_signal()to Deduplicate code in TQObject::activate_signal() 2 months agoOk, I'm happy enough how it looks now... and no more crashes...
This PR also include a couple of additional patches:
sol
from #124I really like the solution proposed for the crash, very ingenious. Yet it raises a point for thoughts.
Say we have a signal connected to three receivers. The first receiver when invoked, deletes the sender object. Then when invoking the other two receivers, their
sender
would be pointing to an object already deleted, which is indeed wrong and could cause SEGV or UB.I think we are still missing something and until we clear ourselves on that, we should be cautious with this change. What do you think?
Then they are screwed, and it's definitely their fault... If you call
sender()
you are responsible for making sure it wasn't deleted before... luckily, this situation would have caused a crash even with the current code.To be frankly I'm more concerned what if a slot decides to change sender's connections e.g. disconnect from the call it just received... this might actually be a problem...
Indeed. And there is even one more complication: sometimes the sender and the receiver are the same object. So if the first receiver delete the sender or itself, then the second receiver will call a slot on a non existing receiver object. Even worse!
That is another good point/problem.
The question is now what to do with this PR. Options:
I am open to all three options. @Fat-Zer @SlavekB what is your preference?
I did not look at it in detail, but the change seems to make this block of code clearer. At the same time, Alex measured some improvement in speed. This seems like a win-win result.
I assume that we can undergo a change in testing for a few days and further deeper examination of the problem than it will be approved and merged.
Just built tqt3 with this PR and PR #109, installed it (no reboot yet) and run
Result is SEGV with the following bt, which is exactly on the part of code modified by this PR. So I reckon we need more work on this :-)
backtrace
Deduplicate code in TQObject::activate_signal()to WIP: Deduplicate code in TQObject::activate_signal() 2 months agodamnit... looks like it's exactly what I was dreaded for: it changes connections in the slot...
I have another idea to workaround this too: move the TQShared part upon the TQConnectionList itself and monitor it changes.
It will break ABI (though in a relatively minor way). Only probably affected classes would be those who use
TQConnectionList
in a non-opaque manner. As I far as I can grep it's several places intdebindings
,kspy
fromtdevelop
and somewhykview
fromtdegraphics
. Though as far as they don't construct destroy the objects themself it shouldn't really introduce any failures to those (but I can't guarantee that)...It will also change
TQSignalVec
, but it seems nobody uses it anyway (I was thinking of moving it into implementation)...WIP: Deduplicate code in TQObject::activate_signal()to Deduplicate code in TQObject::activate_signal() 2 months agoOk, this time it seems to be fine... at least the system boots fine and there are no obvious crashes or bugs with library compiled from the PR...
I have not yet done any test nor reviewed the code in detail, but any effect on execution time? Deep copying connection lists on connections does give me a bit of goose bumps, if you get what I mean. And if we are doing all this effort to spare 60 lines of codes, I start to wonder whether we'd be better leaving things as they are. Don't mean to sound negative, just trying to be objective on benefits vs. risks vs performances.
I haven't measured it, but it shouldn't affect general performance. The deep copying happens only with quite specific circumstances: connection modifications from within a slot. So it shouldn't be that often; as the signal lists are generally small (1-2 connections), the copying shouldn't take that long. Any longer lists should have caused crashes with the current implementation.
It's hard to put all that into a normal performance test due to the process being stochastic. I can imaging only collecting some tallies to see how often it actually happens on a real system, if you wish (but frankly I don't really want to do it)...
Not only that, the fact that we couldn't simply replace this code is an indicator that the underlying code is broken.
Yeah, I see what you mean... To be honest, I myself already wondering if I'm just a hostage of some sunk cost fallacy here, due to how much time and mental efforts I've already spent on this one... But I still believe it's still yet justified...
Will review the code this week and let's discuss from there. It is true the original code was buggy anyway, so perhaps all the effort made is for a good reason :-)
@Fat-Zer @SlavekB I propose we leave this PR for after R14.1.2. We are too close to freezeing the release and if we get something wrong, we could introduce bugs without enough time to test properly (see PR #127, issue #142). What do you think?
Yes, that's right, now we have already come to the stage where we need to calm down and focus on stabilization before release. There would be not enough time to test other major changes. In any case, it is great how many fixes and optimization managed to do in this development window. Thank you all participated for this effort!
I'm ok with that...