#6 Safe conversion TQString to char*

Fusionado
SlavekB fusionados 1 commits de feat/safe-TQString-char-conversions en master hace 1 año
SlavekB comentado hace 1 año

Here are methods that can be used for conversion:

  1. If the TQT_NO_ASCII_CAST is not set, the ascii() method can be automatically used for the conversion.
    The problem is that many of these automatic conversions are wrong. It is better when TQT_NO_ASCII_CAST is set – it is default for CMake builds. The second problem is that the ascii() method is used – see below.

  2. Method ascii(). This method has two options as she behaves. If the global TQTextCodec::codecForCStrings is set, the codec will be used for the conversion. If it is not set, the call is the same as the latin1() method.
    The problem is that most of ascii() calls are wrong. Often these calls should be utf8() or local8Bit(). Alternatively, latin1() may be used.

  3. Method latin1().

  4. Method utf8().

  5. Method local8Bit().

All methods appear to be easily replaceable at first glance – for example, use str.local8Bit() instead of the wrong str.ascii(). But there is one fundamental difference that represents a hidden danger.

The latin1() and ascii() methods in both use modes create a internal buffer char* in the TQString object and return the pointer to this buffer. As a result, the return value is valid throughout the validity of the TQString object.

The utf8() and local8Bit() methods return a new TQCString object. The TQCString object provides a simple conversion to char*, which makes it easy to use to replace instances of wrong ascii() calls. However, this object is not referenced in TQString, which may result in very limited validity. We have encountered this problem several times – for example, in connection with the use of utf8() for password. The current case was in KShutdown (already fixed in the next commit).

There can be many uses of utf8() and local8Bit() that are not safe. We can either check all the uses of these methods in all source codes – at least all the recent commits “Added controlled conversions to char* instead of automatic ascii conversions.” – yes, this is my recent contribution to potential problems. Or make the utf8() and local8Bit() methods safe as well as ascii() and latin1() – as it is in the proposed patch.

The structure of TQStringData is the internal structure used only in TQString. Changing this structure will not cause break API / ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would like to also backport that solution into the R14.0.x branch.

What is your opinion?

Here are methods that can be used for conversion: 1. If the `TQT_NO_ASCII_CAST` is not set, the `ascii()` method can be automatically used for the conversion. <br/>The problem is that many of these automatic conversions are wrong. It is better when `TQT_NO_ASCII_CAST` is set – it is default for CMake builds. The second problem is that the `ascii()` method is used – see below. 2. Method `ascii()`. This method has two options as she behaves. If the global `TQTextCodec::codecForCStrings` is set, the codec will be used for the conversion. If it is not set, the call is the same as the `latin1()` method. <br/>The problem is that most of `ascii()` calls are wrong. Often these calls should be `utf8()` or `local8Bit()`. Alternatively, `latin1()` may be used. 3. Method `latin1()`. 4. Method `utf8()`. 5. Method `local8Bit()`. All methods appear to be easily replaceable at first glance – for example, use `str.local8Bit()` instead of the wrong `str.ascii()`. But there is one fundamental difference that represents a hidden danger. The `latin1()` and `ascii()` methods in both use modes create a internal buffer char* in the TQString object and return the pointer to this buffer. As a result, the return value is valid throughout the validity of the TQString object. The `utf8()` and `local8Bit()` methods return a new TQCString object. The TQCString object provides a simple conversion to char*, which makes it easy to use to replace instances of wrong `ascii()` calls. However, this object is not referenced in TQString, which may result in very limited validity. We have encountered this problem several times – for example, in connection with the use of `utf8()` for password. The current case was in [KShutdown](../kshutdown/commit/ffbcad84f2d75202fea218e81bb9028b2a35e9c4) (already fixed in the next commit). There can be many uses of `utf8()` and `local8Bit()` that are not safe. We can either check all the uses of these methods in all source codes – at least all the recent commits "Added controlled conversions to char* instead of automatic ascii conversions." – yes, this is my recent contribution to potential problems. Or make the `utf8()` and `local8Bit()` methods safe as well as `ascii()` and `latin1()` – as it is in the proposed patch. The structure of TQStringData is the internal structure used only in TQString. Changing this structure will not cause break API / ABI compatibility. I believe that the proposed solution can eliminate some hidden issues. Therefore, I would like to also backport that solution into the R14.0.x branch. What is your opinion?
SlavekB ha añadido la etiqueta
PR/rfc
hace 1 año
MicheleC cambios solicitados hace 1 año
IMO, there is no need to use new/delete for cString. We can make it a normal member of TQStringData without having to worry about allocation and deallocation. Then use that member instead of rstr in utf8() and smilarly in local8bit(). This will also spare the double copy in those function to set the cString object.
SlavekB comentado hace 1 año
Propietario

My intent was as follows: Many TQString will never need utf8() or local8Bit() == will never need to allocate a space for TQCString, call constructor / destructor / additional overhead of creating TQCString. Therefore, I chose to use a pointer and allocation only if it is needed.

My intent was as follows: Many TQString will never need utf8() or local8Bit() == will never need to allocate a space for TQCString, call constructor / destructor / additional overhead of creating TQCString. Therefore, I chose to use a pointer and allocation only if it is needed.
MicheleC comentado hace 1 año
Propietario

Good point, had not thought of that. As further discussed and agreed on IRC, let's go with your original solution

Good point, had not thought of that. As further discussed and agreed on IRC, let's go with your original solution
MicheleC aprobado estos cambios hace 1 año
Ok, after discussion on IRC
MicheleC comentado hace 1 año
Propietario

As per IRC discussion, at line 6239 of original code, in case on no codec we are returning a TCString created from a const char* returned by latin1(). This will again result in dandling pointer if TCString is used in temporary expressions.

latin1() pointer is valid, but TCString constructor will make a deep copy of the array ==> c_str() would point to incorrect area if TCString is temporary. Need to save the object in cString before returning it.

As per IRC discussion, at line 6239 of original code, in case on no codec we are returning a TCString created from a const char* returned by latin1(). This will again result in dandling pointer if TCString is used in temporary expressions. latin1() pointer is valid, but TCString constructor will make a deep copy of the array ==> c_str() would point to incorrect area if TCString is temporary. Need to save the object in cString before returning it.
SlavekB comentado hace 1 año
Propietario

Added use of cString for all necessary #ifdef variants.

Added use of cString for all necessary #ifdef variants.
MicheleC aprobado estos cambios hace 1 año
Looks ok now
SlavekB ha eliminado la etiqueta
PR/rfc
hace 1 año
SlavekB rama eliminada feat/safe-TQString-char-conversions hace 1 año
SlavekB ha añadido esto al hito R14.0.6 release hace 1 año ' issues.change_milestone_at=`modificó el hito de %!s(MISSING) a %!s(MISSING) %!s(MISSING)

Revisores

MicheleC aprobado estos cambios hace 1 año
El Pull Request se ha fusionado como 4e83f4f200.
Inicie sesión para unirse a esta conversación.
Sin Milestone
No asignados
2 participantes
Fecha de vencimiento

Sin fecha de vencimiento.

Dependencias

Este pull request actualmente no tiene ninguna dependencia.

Cargando…
Cancelar
Guardar
Aún no existe contenido.