#6 Safe conversion TQString to char*

Merged
SlavekB merged 1 commit from feat/safe-TQString-char-conversions into master 9 months ago
SlavekB commented 9 months ago

Here are methods that can be used for conversion:

  1. If TQT_NO_ASCII_CAST is not defined, the ascii() method can be used automatically (implicitly) for the conversion.
    The problem is that many of these automatic conversions are wrong. It is better to have TQT_NO_ASCII_CAST defined, which is the default for CMake builds. The second problem is that the automatic conversion uses the ascii() method; see below.

  2. Method ascii(). This method behaves in one of two ways. If the global TQTextCodec::codecForCStrings is set, that codec is used for the conversion. If it is not set, the call is equivalent to calling latin1().
    The problem is that most ascii() calls are wrong. Often these calls should be utf8() or local8Bit(). Alternatively, latin1() may be used.

  3. Method latin1().

  4. Method utf8().

  5. Method local8Bit().
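To make items 1 and 2 concrete, here is a minimal standard-C++ sketch, not the TQt implementation: MiniString, codecForCStrings (a plain function pointer here), and the MINI_NO_ASCII_CAST macro are hypothetical stand-ins for TQString, TQTextCodec::codecForCStrings, and TQT_NO_ASCII_CAST. It shows how an implicit cast silently routes through ascii(), which uses the global codec when one is set and otherwise falls back to Latin-1 (in this sketch, unrepresentable characters become '?').

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical codec type: converts UTF-16 code units to bytes.
using CodecFn = std::string (*)(const std::u16string&);

// Hypothetical global playing the role of TQTextCodec::codecForCStrings.
CodecFn codecForCStrings = nullptr;

class MiniString {
public:
    explicit MiniString(std::u16string s) : units_(std::move(s)) {}

    // Like latin1(): the result is written to an internal buffer owned by
    // the object, and a pointer to that buffer is returned.
    const char* latin1() const {
        buffer_.clear();
        for (char16_t c : units_)
            buffer_.push_back(c <= 0xFF ? static_cast<char>(c) : '?');
        return buffer_.c_str();
    }

    // Like ascii(): use the global codec if set, otherwise behave as latin1().
    const char* ascii() const {
        if (codecForCStrings) {
            buffer_ = codecForCStrings(units_);
            return buffer_.c_str();
        }
        return latin1();
    }

#ifndef MINI_NO_ASCII_CAST
    // Implicit cast: every place that expects const char* silently
    // calls ascii(), whether or not that is the right encoding.
    operator const char*() const { return ascii(); }
#endif

private:
    std::u16string units_;
    mutable std::string buffer_;  // internal char* buffer
};

// Hypothetical codec that maps every code unit to 'x'.
std::string allX(const std::u16string& s) { return std::string(s.size(), 'x'); }

// Any function taking const char* accepts a MiniString without a cast.
std::size_t cLength(const char* p) { return std::strlen(p); }
```

Defining MINI_NO_ASCII_CAST in this sketch removes the operator, so a call like cLength(s) no longer compiles and each call site has to choose latin1(), ascii(), or an explicit encoding.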

At first glance, all the methods appear to be easily interchangeable, for example, using str.local8Bit() instead of the wrong str.ascii(). But there is one fundamental difference that represents a hidden danger.

In both modes of operation, the latin1() and ascii() methods create an internal char* buffer in the TQString object and return a pointer to this buffer. As a result, the returned pointer remains valid for as long as the TQString object is valid.

The utf8() and local8Bit() methods return a new TQCString object. TQCString provides a simple conversion to char*, which makes it an easy replacement for wrong ascii() calls. However, the TQString does not keep any reference to this object, so its lifetime may be very short: a temporary TQCString is destroyed at the end of the full expression, and a char* pointer kept beyond that point dangles. We have encountered this problem several times, for example, when utf8() was used for a password. The current case was in KShutdown (commit ffbcad84f2d75202fea218e81bb9028b2a35e9c4), already fixed in the next commit.
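The lifetime difference can be sketched in standard C++, assuming std::string::c_str() as a stand-in for TQCString's conversion to char*; MiniString and usePassword() are hypothetical names. Passing the temporary's pointer directly into a call is safe because the temporary lives until the end of the full expression; storing that pointer for later use is not.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical simplified string: utf8() returns a new object by value,
// similar to TQString::utf8() returning a new TQCString.
class MiniString {
public:
    explicit MiniString(std::string utf8Bytes) : bytes_(std::move(utf8Bytes)) {}
    std::string utf8() const { return bytes_; }  // fresh copy on every call
private:
    std::string bytes_;
};

// Hypothetical stand-in for an API that takes a C string, e.g. a password.
std::size_t usePassword(const char* pw) { return std::strlen(pw); }

std::size_t safeCalls(const MiniString& s) {
    // SAFE: the temporary lives until the end of this full expression,
    // so the pointer is valid for the whole call.
    std::size_t n1 = usePassword(s.utf8().c_str());

    // SAFE: a named object keeps the bytes alive for the whole scope.
    const std::string kept = s.utf8();
    const char* p = kept.c_str();
    std::size_t n2 = usePassword(p);

    // UNSAFE (deliberately not executed): the temporary is destroyed at
    // the ';', so 'dangling' points to freed memory afterwards.
    //   const char* dangling = s.utf8().c_str();
    //   usePassword(dangling);  // undefined behavior

    return n1 + n2;
}
```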

There may be many unsafe uses of utf8() and local8Bit(). We can either check every use of these methods in all source code (at least in all the recent commits “Added controlled conversions to char* instead of automatic ascii conversions.”; yes, this is my own recent contribution to potential problems), or make the utf8() and local8Bit() methods as safe as ascii() and latin1(), which is what the proposed patch does.

TQStringData is an internal structure used only in TQString. Changing this structure will not break API/ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would also like to backport this solution to the R14.0.x branch.
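A minimal standard-C++ sketch of the idea behind the patch, not the actual TQt code: SharedCString, MiniStringData, and MiniString are hypothetical stand-ins for TQCString, TQStringData, and TQString, and the buffer sharing between copies (assumed here to match TQCString's shallow copies) is modeled with std::shared_ptr. Because the converted bytes are kept in the string's internal data, a char* taken from the returned temporary remains valid after that temporary is destroyed.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for TQCString: copies share one byte buffer,
// so the char* stays valid while any copy (or the cache) holds it.
class SharedCString {
public:
    SharedCString() = default;
    explicit SharedCString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}
    operator const char*() const { return data_ ? data_->c_str() : nullptr; }
private:
    std::shared_ptr<const std::string> data_;
};

// Hypothetical stand-in for TQStringData with the extra cache member.
struct MiniStringData {
    std::string bytes;                       // string content (simplified)
    std::unique_ptr<SharedCString> cString;  // created on first conversion
};

class MiniString {
public:
    explicit MiniString(std::string bytes) : d_(std::make_unique<MiniStringData>()) {
        d_->bytes = std::move(bytes);
    }

    // Store the result in the internal data, then return a shallow copy.
    // A pointer obtained from the returned temporary stays valid until the
    // next utf8() call on this string or until the string is destroyed.
    SharedCString utf8() const {
        if (!d_->cString)
            d_->cString = std::make_unique<SharedCString>();
        *d_->cString = SharedCString(d_->bytes);  // conversion simplified to a copy
        return *d_->cString;                      // shares the cached buffer
    }

private:
    std::unique_ptr<MiniStringData> d_;
};
```

Usage: `const char* p = s.utf8();` is now safe in this model, because the temporary SharedCString shares its buffer with the cached copy in MiniStringData.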

What is your opinion?

SlavekB added the PR/rfc label 9 months ago
MicheleC requested changes 9 months ago
IMO, there is no need to use new/delete for cString. We can make it a normal member of TQStringData without having to worry about allocation and deallocation. Then use that member instead of rstr in utf8(), and similarly in local8Bit(). This will also avoid the double copy in those functions when setting the cString object.
SlavekB commented 9 months ago
Owner

My intent was as follows: many TQString objects will never need utf8() or local8Bit(), so they will never need to allocate space for a TQCString or pay for the constructor, destructor, and other overhead of creating one. Therefore, I chose to use a pointer and to allocate only when it is needed.
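The trade-off between the two designs can be sketched with a hypothetical CountedCache type that counts its constructions (standing in for the cost of creating a TQCString). DataWithMember and DataWithPointer are simplified stand-ins for the two possible TQStringData layouts discussed above, not the real structure.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical cache type that counts how often it is constructed.
struct CountedCache {
    static int constructed;
    std::string bytes;
    CountedCache() { ++constructed; }
};
int CountedCache::constructed = 0;

// Reviewer's suggestion: the cache is a plain member, so every string
// data object pays for constructing and destroying it.
struct DataWithMember {
    std::string text;
    CountedCache cString;
};

// Author's approach: only a pointer; the cache is created on first use.
struct DataWithPointer {
    std::string text;
    std::unique_ptr<CountedCache> cString;

    const std::string& utf8() {
        if (!cString)
            cString = std::make_unique<CountedCache>();
        cString->bytes = text;  // conversion simplified to a copy
        return cString->bytes;
    }
};
```

With the pointer layout, strings that are never converted only carry a null pointer; the cost is one heap allocation the first time a conversion is requested.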
MicheleC commented 9 months ago
Owner

Good point, I had not thought of that. As further discussed and agreed on IRC, let’s go with your original solution.
MicheleC approved these changes 9 months ago
Ok, after discussion on IRC
MicheleC commented 9 months ago
Owner

As per IRC discussion, at line 6239 of the original code, in the no-codec case we are returning a TQCString created from the const char* returned by latin1(). This will again result in a dangling pointer if the TQCString is used in temporary expressions.

The latin1() pointer is valid, but the TQCString constructor makes a deep copy of the array, so c_str() would point to an invalid area once the temporary TQCString is destroyed. The object needs to be saved in cString before it is returned.
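A minimal sketch of the reviewer's point, again with hypothetical stand-ins: SharedCString models a TQCString whose copies share one buffer but whose const char* constructor makes a deep copy, and fallbackBuggy()/fallbackFixed() are illustrative names for the no-codec branch before and after the fix, not the actual TQt functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for TQCString: copies share one buffer, but
// constructing from a const char* makes a deep copy.
class SharedCString {
public:
    SharedCString() = default;
    explicit SharedCString(const char* s)
        : data_(std::make_shared<std::string>(s)) {}
    operator const char*() const { return data_ ? data_->c_str() : nullptr; }
private:
    std::shared_ptr<const std::string> data_;
};

class MiniString {
public:
    explicit MiniString(std::string latin1) : latin1_(std::move(latin1)) {}

    // Valid for the lifetime of the object, like TQString::latin1().
    const char* latin1() const { return latin1_.c_str(); }

    // BUGGY: a fresh deep copy that nothing else references; a char*
    // taken from the returned temporary dangles once it is destroyed.
    SharedCString fallbackBuggy() const { return SharedCString(latin1()); }

    // FIXED: save the object in the cache first, then return a shallow
    // copy that shares the cached buffer.
    SharedCString fallbackFixed() const {
        cString_ = SharedCString(latin1());
        return cString_;
    }

    const char* cachedData() const { return cString_; }

private:
    std::string latin1_;
    mutable SharedCString cString_;  // stands in for the cString member
};
```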
SlavekB commented 9 months ago
Owner

Added use of cString for all necessary #ifdef variants.
MicheleC approved these changes 9 months ago
Looks ok now
SlavekB removed the PR/rfc label 9 months ago
SlavekB deleted branch feat/safe-TQString-char-conversions 9 months ago
SlavekB added this to the R14.0.6 release milestone 9 months ago

Reviewers

MicheleC approved these changes 9 months ago
The pull request has been merged as 4e83f4f200.