Safe conversion TQString to char* #6
Scalone
SlavekB
scala 1 commity/ów z feat/safe-TQString-char-conversions
do master
5 lat temu
Ładowanie…
Reference in new issue
Nie ma jeszcze treści.
Usuń gałąź 'feat/safe-TQString-char-conversions'
Usunięcie gałęzi jest trwałe i NIE MOŻE zostać cofnięte. Kontynuować?
Here are methods that can be used for conversion:
If the
TQT_NO_ASCII_CAST
is not set, theascii()
method can be automatically used for the conversion.The problem is that many of these automatic conversions are wrong. It is better when
TQT_NO_ASCII_CAST
is set – it is default for CMake builds. The second problem is that theascii()
method is used – see below.Method
ascii()
. This method has two options as she behaves. If the globalTQTextCodec::codecForCStrings
is set, the codec will be used for the conversion. If it is not set, the call is the same as thelatin1()
method.The problem is that most of
ascii()
calls are wrong. Often these calls should beutf8()
orlocal8Bit()
. Alternatively,latin1()
may be used.Method
latin1()
.Method
utf8()
.Method
local8Bit()
.All methods appear to be easily replaceable at first glance – for example, use
str.local8Bit()
instead of the wrongstr.ascii()
. But there is one fundamental difference that represents a hidden danger.The
latin1()
andascii()
methods in both use modes create a internal buffer char* in the TQString object and return the pointer to this buffer. As a result, the return value is valid throughout the validity of the TQString object.The
utf8()
andlocal8Bit()
methods return a new TQCString object. The TQCString object provides a simple conversion to char*, which makes it easy to use to replace instances of wrongascii()
calls. However, this object is not referenced in TQString, which may result in very limited validity. We have encountered this problem several times – for example, in connection with the use ofutf8()
for password. The current case was in KShutdown (already fixed in the next commit).There can be many uses of
utf8()
andlocal8Bit()
that are not safe. We can either check all the uses of these methods in all source codes – at least all the recent commits "Added controlled conversions to char* instead of automatic ascii conversions." – yes, this is my recent contribution to potential problems. Or make theutf8()
andlocal8Bit()
methods safe as well asascii()
andlatin1()
– as it is in the proposed patch.The structure of TQStringData is the internal structure used only in TQString. Changing this structure will not cause break API / ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would like to also backport that solution into the R14.0.x branch.
What is your opinion?
IMO, there is no need to use new/delete for cString. We can make it a normal member of TQStringData without having to worry about allocation and deallocation. Then use that member instead of rstr in utf8() and smilarly in local8bit(). This will also spare the double copy in those function to set the cString object.
My intent was as follows: Many TQString will never need utf8() or local8Bit() == will never need to allocate a space for TQCString, call constructor / destructor / additional overhead of creating TQCString. Therefore, I chose to use a pointer and allocation only if it is needed.
Good point, had not thought of that. As further discussed and agreed on IRC, let's go with your original solution
Ok, after discussion on IRC
As per IRC discussion, at line 6239 of original code, in case on no codec we are returning a TCString created from a const char* returned by latin1(). This will again result in dandling pointer if TCString is used in temporary expressions.
latin1() pointer is valid, but TCString constructor will make a deep copy of the array ==> c_str() would point to incorrect area if TCString is temporary. Need to save the object in cString before returning it.
Added use of cString for all necessary #ifdef variants.
Looks ok now
4e83f4f200
.