Optimisation causes negative value to turn "for" loop infinite #93
Manually merged
MicheleC
merged 1 commits from bug/prevent-menu-svg-icon-loop
into master
4 years ago
Loading…
Reference in new issue
There is no content yet.
Delete Branch 'bug/prevent-menu-svg-icon-loop'
Deleting a branch is permanent. It CANNOT be undone. Continue?
The declaration of "n_segs" in calculateArc is now declared as "volatile". Something in the GCC optimisation can cause the method to set a negative value in the variable causing the "for" loop to become infinite and hang the menu panel.
Signed-off-by: aneejit1 aneejit1@gmail.com
There is a long-standing potential loop in ksvgiconpainter.cpp. I've seen this running 14.0.6 on CentOS 7, so it's not just my custom setup that has seen this problem.
Certain SVG icons cannot be displayed on the menu panel. When the pointer is run over the main panel and a popup list should appear to the right, kicker just goes into a loop. The only way out is to kill it, restart it and try to avoid touching that option on the menu again.
Since I've been starting TDE from an xterm, I got the following message repeated: "TQCArray::at: Absolute index [n] out of range". In order to get gdb to provide some useful information, I rebuilt tdelibs with debugging information on at which point the issue stopped. The same problem occurs in KDE 3.5.10, so I switched to that and managed to locate the problem.
It in method "calculateArc", the value of n_segs is being calculated to a negative number which causes the for loop there to go crazy. This appears not to be a mathematical issue but rather how GCC is optimising the code (I suspect that turning debugging information on also turned optimisation off which was why I could not do this with tdelibs). Having experimented with turning optimisation for the entire method off, testing for n_segs being negative and not running the loop, and various other options, it appears that simply defining n_segs as "volatile" resolves the issue.
I've commented the new declaration just in case someone wonders "what's that 'volatile' doing there?" and removes it!
Hi @aneejit1, I am sure you have done your studies on this problem, but I can't see how this could be the correct solution.
Variable values are guarantee to be correctly assigned and GCC optimization can not make a variable value wrong (unless of GCC bugs of course). If n_segs becomes negative there must be a reason (probably a bug somewhere) and we should try to find out why.
I don't consider using volatile the correct solution, it may just be hiding the real problem and work for some reasons, but without knowing why.
I discussed with Slavek on IRC and we both agree on the above comment. I am marking the PR as not ok. @aneejit1 please try doing more investigation to find the real reason why n_segs becomes negative 👍
OK, I'll dig out my assembly language book and see if I can work out what it's doing wrong. This may take a while, I've always avoided the floating point instruction set.
I'm starting to think setting "CMAKE_BUILD_TYPE" to "RelWithDebInfo" and walking away from this would have been a better option!
I like IBM's mainframe compilers. You set the "LIST" option on and you get a nicely formatted listing showing the program statements along with the assembly code that the compiler has generated. Not so with GCC. Set the "-S" option and what comes out is a pile of assembly language with no indication as to where it comes from or what it relates to aside from a few obvious library calls.
So, I've compiled ksvgiconpainter.cpp with '-S' set and the 'volatile' removed to produce a baseline compile and done some comparisons on the code generated in various conditions. I've also tried a few other things. Here's what I've found so far:
This looks to me like the optimisation produced by '-O2' and '-O3' in GCC is unable or unwilling to corrently handle test conditions involving n_segs. It seems to be triggered by the assignment of n_segs, the only ways of avoiding this being to either declare n_segs as 'volatile' thus forcing the variable's value to be reserved in and written to storage, or as 'double'. The negativity of the value of n_segs, contrary to what I first thought, appears irrelevent.
I would say that looking at how 'i' and 'n_segs' are used within the loop, it would probably make more sense if they were defined as 'double' rather than 'int' as they are both only used as part of calculations involving floating arithmetic and it would prevent some unnecessary implicit cast conversions being required.
Suggestions?
Great analysis @aneejit1. I will try to have a look at this as well on Monday, I got interested now. Is there a way to reproduce this systematically? Any specific svg icon I should set on the menu?
One quick thing I notices is that -2147483648 is 0x80000000, which does not seem a totally random number to me. Either n_segs is overflowing above the max positive value or somewhere this value gets set in the code.
Again, I personally would be extremely surprised if the problem is just down to compiler optimization.
OK, I think I understand why the negative value is there and it seems to be a perfectly valid situation to have, albeit one which is not handled in a simple to decypher manner without additional documentation. I believe it is one of the conditions referred to in the SVG specifications as an "Out-of-Range Parameter", specifically, the condition where the start and end points are the same:
https://www.w3.org/TR/SVG11/implnote.html#ArcImplementationNotes
I've compiled with a "printf" to show the values of (x0,y0) and (x1,y1) along with the calculated values of th_arc and n_segs. Where the two sets of co-ordinated are identical, a -NaN value is calculated for th_arc resulting in the value of -2147483648 being stored in n_segs. The generated code should, by virtue of the 'i < n_segs' test in the 'for' statement, avoid the loop entirely under these circumstances.
I would suggest that the reason for the start and end points being the same is down to the size of the icon; the arc that needs to be drawn has been scaled down to the point where there is insufficient resolution in the output co-ordinate system for the arc to be discernable.
The icon I was having problems with is 'org.gnome.Music.svg' which is loaded from 'org.gnome.Music.desktop'. It comes from the gnome-music package, version 3.36.2.
I've done a mod to print the icon file names, run up the menu editor and expanded the tree; the only other one that appears to have the same problem is 'org.gnome.Epiphany.svg' which comes from epiphany 3.36.1. The one that I had on the CentOS 7 system was on the 'Wine' menu, but I don't know which specific icon file on there is causing the problem. It seems to be a rare situation but one that's very inconvenient if you happen to hit it!
As for optimisation issues, they can happen. Having turned the "LIST" option on IBM's COBOL compiler on with the optimisation enabled has had me scratching my head at times as to why it has done a certain thing, but what it has done seems to work OK. It's kind of a complicated area which is why I'm willing to point the finger in that direction when the evidence points that way.
I did pick up on the significance of the value of n_segs. I'm currently running with both i and n_segs defined as 'double' and the printfs I have in place are showing n_segs as being NaN. It looks like casting to 'int' is what gives it that specific value as an int obviously has no equivalent of NaN.
Nice finding, that makes more sense to me and was actually expecting something on that line 😄
In any case running with i and n_segs as double is also not correct in my opinion since a for loop on a NaN makes not much sense (and probably will exit immediately). Let me have a look, I have a quick fix in my mind but I would like to reproduce the problem and test first if possible.
Turned out the problem was a division by zero when the original point and the translated point were the same. This was causing NaN and an infinite looping in the while() {} inside the "KSVGIconPainter::drawPath()" function.
@aneejit1 the problem was reproduced here using the same icon and the fix seems to work. Could you test on your system too before I merge the PR?
That's working fine.
It just occurred to me to query why converting i/n_segs to double works. Turns out that if n_segs is NaN, 'i < n_segs', 'i <= n_segs', 'i > n_segs' and 'i >= n_segs' all evaluate to 'false'. The only comparisons that would give a true result would be 'i != n_segs', or 'i == n_segs' if i was NaN.
Well, that's the something new learnt today!
Thanks for testing, I will proceed with the merge.
Regarding this, that's a strange one. I did try (as first solution) to make sure the for() look was only executed if n_segs was positive (using int, not float) and I still ran into problems with an infinite loop on the KSVGIconPainter::drawPath()” function, which prompted me to do further investigation. But perhaps with NaN some other strange things were happening behind the scenes, I guess....
Anyhow, we found the problem and fix it, that is what is important 😄
Congratulations on your good research and successful detection and resolution of the problem!
ffe8e495d7
.