Safe conversion TQString to char* #6

Sloučený
SlavekB sloučil 1 commity z větve feat/safe-TQString-char-conversions do větve master před před 5 roky
SlavekB okomentoval před 5 roky
Vlastník

Here are methods that can be used for conversion:

  1. If the TQT_NO_ASCII_CAST is not set, the ascii() method can be automatically used for the conversion.

    The problem is that many of these automatic conversions are wrong. It is better when TQT_NO_ASCII_CAST is set – it is default for CMake builds. The second problem is that the ascii() method is used – see below.

  2. Method ascii(). This method has two options as she behaves. If the global TQTextCodec::codecForCStrings is set, the codec will be used for the conversion. If it is not set, the call is the same as the latin1() method.

    The problem is that most of ascii() calls are wrong. Often these calls should be utf8() or local8Bit(). Alternatively, latin1() may be used.

  3. Method latin1().

  4. Method utf8().

  5. Method local8Bit().

All methods appear to be easily replaceable at first glance – for example, use str.local8Bit() instead of the wrong str.ascii(). But there is one fundamental difference that represents a hidden danger.

The latin1() and ascii() methods in both use modes create a internal buffer char* in the TQString object and return the pointer to this buffer. As a result, the return value is valid throughout the validity of the TQString object.

The utf8() and local8Bit() methods return a new TQCString object. The TQCString object provides a simple conversion to char*, which makes it easy to use to replace instances of wrong ascii() calls. However, this object is not referenced in TQString, which may result in very limited validity. We have encountered this problem several times – for example, in connection with the use of utf8() for password. The current case was in KShutdown (already fixed in the next commit).

There can be many uses of utf8() and local8Bit() that are not safe. We can either check all the uses of these methods in all source codes – at least all the recent commits "Added controlled conversions to char* instead of automatic ascii conversions." – yes, this is my recent contribution to potential problems. Or make the utf8() and local8Bit() methods safe as well as ascii() and latin1() – as it is in the proposed patch.

The structure of TQStringData is the internal structure used only in TQString. Changing this structure will not cause break API / ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would like to also backport that solution into the R14.0.x branch.

What is your opinion?

Here are methods that can be used for conversion: 1. If the `TQT_NO_ASCII_CAST` is not set, the `ascii()` method can be automatically used for the conversion. <br/>The problem is that many of these automatic conversions are wrong. It is better when `TQT_NO_ASCII_CAST` is set – it is default for CMake builds. The second problem is that the `ascii()` method is used – see below. 2. Method `ascii()`. This method has two options as she behaves. If the global `TQTextCodec::codecForCStrings` is set, the codec will be used for the conversion. If it is not set, the call is the same as the `latin1()` method. <br/>The problem is that most of `ascii()` calls are wrong. Often these calls should be `utf8()` or `local8Bit()`. Alternatively, `latin1()` may be used. 3. Method `latin1()`. 4. Method `utf8()`. 5. Method `local8Bit()`. All methods appear to be easily replaceable at first glance – for example, use `str.local8Bit()` instead of the wrong `str.ascii()`. But there is one fundamental difference that represents a hidden danger. The `latin1()` and `ascii()` methods in both use modes create a internal buffer char* in the TQString object and return the pointer to this buffer. As a result, the return value is valid throughout the validity of the TQString object. The `utf8()` and `local8Bit()` methods return a new TQCString object. The TQCString object provides a simple conversion to char*, which makes it easy to use to replace instances of wrong `ascii()` calls. However, this object is not referenced in TQString, which may result in very limited validity. We have encountered this problem several times – for example, in connection with the use of `utf8()` for password. The current case was in [KShutdown](../kshutdown/commit/ffbcad84f2d75202fea218e81bb9028b2a35e9c4) (already fixed in the next commit). There can be many uses of `utf8()` and `local8Bit()` that are not safe. We can either check all the uses of these methods in all source codes – at least all the recent commits "Added controlled conversions to char* instead of automatic ascii conversions." – yes, this is my recent contribution to potential problems. Or make the `utf8()` and `local8Bit()` methods safe as well as `ascii()` and `latin1()` – as it is in the proposed patch. The structure of TQStringData is the internal structure used only in TQString. Changing this structure will not cause break API / ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would like to also backport that solution into the R14.0.x branch. What is your opinion?
SlavekB přidal/a PR/rfc štítek před 5 roky
MicheleC požadované změny před 5 roky
MicheleC zanechal komentář
Vlastník

IMO, there is no need to use new/delete for cString. We can make it a normal member of TQStringData without having to worry about allocation and deallocation. Then use that member instead of rstr in utf8() and smilarly in local8bit(). This will also spare the double copy in those function to set the cString object.

IMO, there is no need to use new/delete for cString. We can make it a normal member of TQStringData without having to worry about allocation and deallocation. Then use that member instead of rstr in utf8() and smilarly in local8bit(). This will also spare the double copy in those function to set the cString object.
SlavekB okomentoval před 5 roky
Autor
Vlastník

My intent was as follows: Many TQString will never need utf8() or local8Bit() == will never need to allocate a space for TQCString, call constructor / destructor / additional overhead of creating TQCString. Therefore, I chose to use a pointer and allocation only if it is needed.

My intent was as follows: Many TQString will never need utf8() or local8Bit() == will never need to allocate a space for TQCString, call constructor / destructor / additional overhead of creating TQCString. Therefore, I chose to use a pointer and allocation only if it is needed.
MicheleC okomentoval před 5 roky
Vlastník

Good point, had not thought of that. As further discussed and agreed on IRC, let's go with your original solution

Good point, had not thought of that. As further discussed and agreed on IRC, let's go with your original solution
MicheleC schválil tyto změny před 5 roky
MicheleC zanechal komentář
Vlastník

Ok, after discussion on IRC

Ok, after discussion on IRC
MicheleC okomentoval před 5 roky
Vlastník

As per IRC discussion, at line 6239 of original code, in case on no codec we are returning a TCString created from a const char* returned by latin1(). This will again result in dandling pointer if TCString is used in temporary expressions.

latin1() pointer is valid, but TCString constructor will make a deep copy of the array ==> c_str() would point to incorrect area if TCString is temporary. Need to save the object in cString before returning it.

As per IRC discussion, at line 6239 of original code, in case on no codec we are returning a TCString created from a const char* returned by latin1(). This will again result in dandling pointer if TCString is used in temporary expressions. latin1() pointer is valid, but TCString constructor will make a deep copy of the array ==> c_str() would point to incorrect area if TCString is temporary. Need to save the object in cString before returning it.
SlavekB okomentoval před 5 roky
Autor
Vlastník

Added use of cString for all necessary #ifdef variants.

Added use of cString for all necessary #ifdef variants.
MicheleC schválil tyto změny před 5 roky
MicheleC zanechal komentář
Vlastník

Looks ok now

Looks ok now
SlavekB odstranil/a PR/rfc štítek před 5 roky
SlavekB uzavřel/a tento požadavek na natažení před 5 roky
SlavekB odstranil/a větev feat/safe-TQString-char-conversions před 5 roky
SlavekB přidal/a toto do milníku R14.0.6 release před 5 roky
Požadavek na natažení byl sloučen jako 4e83f4f200.
Přihlaste se pro zapojení do konverzace.
Žádní posuzovatelé
Bez milníku
Bez zpracovatelů
2 účastníků
Oznámení
Termín dokončení

Žádný termín dokončení.

Závislosti

Nejsou nastaveny žádné závislosti.

Reference: TDE/tqt3#6
Načítá se…
Není zde žádný obsah.