WIP: Extend meaning of "Hidden Files" (PR version 3) #179

Closed
VinceR wants to merge 3 commits from issue/270/tdelibs-V3 into master
VinceR commented 2 years ago
Collaborator
  1. Introduce TDEStringMatcher, a general purpose iterative string matching class. Additional information about the class can be found in tdecore/README.tdestringmatcher. Files involved:
tdecore/tdestringmatcher.cpp # new
tdecore/tdestringmatcher.h # new
tdecore/README.tdestringmatcher # new
tdecore/CMakeLists.txt
  1. Introduce concept of a hidden file matcher instance of the TDEStringMatcher class that can be used to extend the definition "hidden" files beyond traditional "dotfiles".

  2. Create a global default instance of a hidden file matcher for use in applications that don't need their own. Files involved:

tdecore/tdeglobal.cpp
tdecore/tdeglobal.h
  1. Modify KFileItem class to utilize a hidden file matcher when evaluating whether or not a filesystem object is hidden. Files involved:
tdeio/tdeio/tdefileitem.cpp
tdeio/tdeio/tdefileitem.h

This commit partially addresses issue TDE/tdebase#270. Additional changes to tdelibs, tdebase/libkonq, and tdebase/konqueror will be required to resolve that issue.

1. Introduce `TDEStringMatcher`, a general purpose iterative string matching class. Additional information about the class can be found in `tdecore/README.tdestringmatcher`. Files involved: ``` tdecore/tdestringmatcher.cpp # new tdecore/tdestringmatcher.h # new tdecore/README.tdestringmatcher # new tdecore/CMakeLists.txt ``` 2. Introduce concept of a *hidden file matcher* instance of the `TDEStringMatcher` class that can be used to extend the definition "hidden" files beyond traditional "dotfiles". 3. Create a global default instance of a *hidden file matcher* for use in applications that don't need their own. Files involved: ``` tdecore/tdeglobal.cpp tdecore/tdeglobal.h ``` 4. Modify `KFileItem` class to utilize a *hidden file matcher* when evaluating whether or not a filesystem object is hidden. Files involved: ``` tdeio/tdeio/tdefileitem.cpp tdeio/tdeio/tdefileitem.h ``` This commit partially addresses issue TDE/tdebase#270. Additional changes to `tdelibs`, `tdebase/libkonq`, and `tdebase/konqueror` will be required to resolve that issue.
VinceR added the SL/minor PR/wip labels 2 years ago
VinceR added 1 commit 2 years ago
78905ffa6a
Introduce TDEStringMatcher, a general purpose iterative string
VinceR commented 2 years ago
Poster
Collaborator

Considerations for reviewers:

  • This code, along with additional code not in this commit, has been successfully tested in a chroot environment running konqueror. To the best of my knowledge, the code successfully addresses issue # 270 and manifests no bugs or noticeable performance penalty.
  • The code submitted for review has numerous trace statements that have been essential for verification and debugging. These trace statements start with a defined macro TSMTRACE and should be ignored when reviewing - they will be removed when final testing is complete.
  • Recent changes introduced TDEStringMatcher signals and associated KFileItem slots. The changes are implemented by #define TSMSIGNALS. After further testing, the associated preprocessor conditional directives will be removed.
Considerations for reviewers: * This code, along with additional code not in this commit, has been successfully tested in a chroot environment running `konqueror`. To the best of my knowledge, the code successfully addresses issue # 270 and manifests no bugs or noticeable performance penalty. * The code submitted for review has numerous trace statements that have been essential for verification and debugging. These trace statements start with a defined macro `TSMTRACE` and should be ignored when reviewing - they will be removed when final testing is complete. * Recent changes introduced `TDEStringMatcher` signals and associated `KFileItem` slots. The changes are implemented by `#define TSMSIGNALS`. After further testing, the associated preprocessor conditional directives will be removed.
VinceR commented 2 years ago
Poster
Collaborator

@MicheleC,

This is in regard to this conversation in which the merits of using associated private classes was dicussed.

  1. As you will see from the TDEStringMatcher code, I did follow your advice and put TQString patternString into a private class instead of declaring it as private: property in the main class. But on second look, this looks contrived and perhaps pointless given how small the private class is. What do you think about this?

  2. As you will see from the KFileItem code, I have added 2 protected class variables (one boolean and one an object pointer. I would definitely agree that both of these should be in an associated private class. The problem is that the KFileItemPrivate class seems to be special purpose and instantiated only conditionally in various methods. Should create another private class, use KFileItemPrivate but allocate it early in the KFileItem constructor, or just leave things as they are?

@MicheleC, This is in regard to this [conversation](https://mirror.git.trinitydesktop.org/gitea/TDE/tdelibs/pulls/178#issuecomment-21138) in which the merits of using associated private classes was dicussed. 1. As you will see from the `TDEStringMatcher` code, I did follow your advice and put `TQString patternString` into a private class instead of declaring it as `private:` property in the main class. But on second look, this looks contrived and perhaps pointless given how small the private class is. What do you think about this? 2. As you will see from the `KFileItem` code, I have added 2 protected class variables (one boolean and one an object pointer. I would definitely agree that both of these should be in an associated private class. The problem is that the `KFileItemPrivate` class seems to be special purpose and instantiated only conditionally in various methods. Should create another private class, use `KFileItemPrivate` but allocate it early in the `KFileItem` constructor, or just leave things as they are?
Owner

@VinceR
thanks for the new PR. FYI, I expect to have a close look at this next weekend. The week ahead is busy (again) and the time available for TDE will be devoted to finalize R14.0.13, whose code base will be frozen this Friday.

@VinceR thanks for the new PR. FYI, I expect to have a close look at this next weekend. The week ahead is busy (again) and the time available for TDE will be devoted to finalize R14.0.13, whose code base will be frozen this Friday.
Owner

This PR replaces PR #178

This PR replaces PR #178
MicheleC reviewed 2 years ago
MicheleC left a comment
Owner

First part of the review. Still do do 3 files.

First part of the review. Still do do 3 files.
TSMTRACE << "TDEGlobal::hiddenFileMatcher(): Global HFM initialization STARTED" << endl;
_hiddenFileMatcher = new TDEStringMatcher();
TDEGlobal::config()->setGroup( "General" );
TQString settings = TDEGlobal::config()->readEntry( "globalHiddenFileSpec", "/oW/.*" );
Owner

Should this be /ow/.* (lower case w)? Or are w and W different?

Should this be `/ow/.*` (lower case `w`)? Or are `w` and `W` different?
VinceR commented 2 years ago
Poster
Collaborator

Should this be /ow/.* (lower case w)? Or are w and W different?

This was a typographical error, it was supposed to be /ow/.*. There may be a case to be made for a relaxed case-insensitive evaluation of option characters, where 'w' and 'W' are interchangeable.

>Should this be /ow/.* (lower case w)? Or are w and W different? This was a typographical error, it was supposed to be `/ow/.*`. There may be a case to be made for a relaxed case-insensitive evaluation of option characters, where 'w' and 'W' are interchangeable.
Owner

yes, we could use 'w' or 'W' interchangably, as long as we document that accordingly. So for the other options.

yes, we could use 'w' or 'W' interchangably, as long as we document that accordingly. So for the other options.
Owner

Actually this should be /ow/p.*/ (note the p), right?

Actually this should be `/ow/p.*/` (note the `p`), right?
VinceR marked this conversation as resolved
}
// Initialize hidden file matching apparatus
setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() );
Owner

This is fine. Nevertheless there are some constructors that do not use init() but assign(). We need to add the same call there too.

This is fine. Nevertheless there are some constructors that do not use `init()` but `assign()`. We need to add the same call there too.
VinceR marked this conversation as resolved
}
bool KFileItem::isHidden() const
void KFileItem::resetHiddenFileMatcher()
Owner

See comments in setHiddenFileMatcher(). Possibly this method is no longer needed.

See comments in `setHiddenFileMatcher()`. Possibly this method is no longer needed.
VinceR marked this conversation as resolved
if ( hiddenFileMatcher == m_pHiddenFileMatcher )
return;
if ( hiddenFileMatcher == 0 || hiddenFileMatcher == nullptr ) {
Owner

no need to double check. Just if (hiddenFileMatcher == nullptr) is enough.

no need to double check. Just `if (hiddenFileMatcher == nullptr)` is enough.
VinceR commented 2 years ago
Poster
Collaborator

no need to double check. Just if (hiddenFileMatcher == nullptr) is enough.

I thought that '0' and nullptr were different, with the latter pointing to a specific address interpreted to be "unset" versus literal address location 0.

>no need to double check. Just if (hiddenFileMatcher == nullptr) is enough. I thought that '0' and `nullptr` were different, with the latter pointing to a specific address interpreted to be "unset" versus literal address location 0.
Owner

nullptr is technically different than 0 but checking for nullptr is sufficient to cover both cases.

`nullptr` is technically different than `0` but checking for `nullptr` is sufficient to cover both cases.
VinceR marked this conversation as resolved
if ( hiddenFileMatcher == m_pHiddenFileMatcher )
return;
if ( hiddenFileMatcher == 0 || hiddenFileMatcher == nullptr ) {
kdWarning() << "KFileItem::setHiddenFileMatcher: refusing to process null pointer passed by caller" << endl;
Owner

We need to be able to set nullptr as a potential matcher. The reason for this is to correctly handle the case in which the global file matcher is destroyed. The current code would try to call resetHiddenFileMatcher() which would set the matcher back to the global file matcher being destroyed. This is a design mistake.
If we call setHiddenFileMatcher() with a nullptr, we disconnect the existing matcher and set m_pHiddenFileMatcher to nullptr. Of course later we need to check that the matcher is not null before dereferencing it.
By the way, we probably don't even need resetHiddenFileMatcher() any more if we define setHiddenFileMatcher as follow: void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher = nullptr);. In this case, destroyed() could simply be connected to setHiddenFileMatcher()`. I leave a bit of a doubt on this and it should be tested first, to make sure the signal-slot logic does not complain due to the difference in signature of the signal and slot.

We need to be able to set nullptr as a potential matcher. The reason for this is to correctly handle the case in which the global file matcher is destroyed. The current code would try to call `resetHiddenFileMatcher()` which would set the matcher back to the global file matcher being destroyed. This is a design mistake. If we call `setHiddenFileMatcher()` with a nullptr, we disconnect the existing matcher and set `m_pHiddenFileMatcher` to nullptr. Of course later we need to check that the matcher is not null before dereferencing it. By the way, we probably don't even need `resetHiddenFileMatcher()` any more if we define `setHiddenFileMatcher` as follow: `void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher = nullptr);`. In this case, `destroyed()` could simply be connected to setHiddenFileMatcher()`. I leave a bit of a doubt on this and it should be tested first, to make sure the signal-slot logic does not complain due to the difference in signature of the signal and slot.
VinceR commented 2 years ago
Poster
Collaborator

We need to be able to set nullptr as a potential matcher. The reason for this is to correctly handle the case in which the global file matcher is destroyed. The current code would try to call resetHiddenFileMatcher() which would set the matcher back to the global file matcher being destroyed. This is a design mistake.

Under what circumstances would the global matcher be destroyed? I thought by its very nature it lasts the life of the application and could therefore be counted on by all applications. If it cannot, then we should re-think how the global matcher is implemented.

If we call setHiddenFileMatcher() with a nullptr, we disconnect the existing matcher and set m_pHiddenFileMatcher to nullptr. Of course later we need to check that the matcher is not null before dereferencing it.

The only code outside of the KFileItem itself that should be calling setHiddenFileMatcher() is that in the KDirLister that allocated the KFileItem in the first place. It will never call setHiddenFileMatcher() with a nullptr, but with a pointer to its own hidden file matcher.

That said, if we do as you suggest, then I see the followinng options as reasonable responses to setHiddenFileMatcher(nullptr):

  1. We don't have a hidden file matcher anymore, so set m_bHiddenByMatcher to false
  2. Set m_bHiddenByMatcher based on hard coded dotfile matching.
  3. Set m_pHiddenFileMatcher = TDEGlobal::hiddenFileMatcher(). Even if the global matcher had somehow been previously destroyed, make it come to life again. Any reason why that wouldn't work?

By the way, we probably don't even need resetHiddenFileMatcher() any more if we define setHiddenFileMatcher as follow: void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher = nullptr);. In this case, destroyed() could simply be connected to setHiddenFileMatcher()`. I leave a bit of a doubt on this and it should be tested first, to make sure the signal-slot logic does not complain due to the difference in signature of the signal and slot.

I think this could work, but we do need to settle how nullptr should be interpreted.

> We need to be able to set nullptr as a potential matcher. The reason for this is to correctly handle the case in which the global file matcher is destroyed. The current code would try to call `resetHiddenFileMatcher()` which would set the matcher back to the global file matcher being destroyed. This is a design mistake. Under what circumstances would the global matcher be destroyed? I thought by its very nature it lasts the life of the application and could therefore be counted on by all applications. If it cannot, then we should re-think how the global matcher is implemented. > If we call `setHiddenFileMatcher()` with a nullptr, we disconnect the existing matcher and set `m_pHiddenFileMatcher` to nullptr. Of course later we need to check that the matcher is not null before dereferencing it. The **only** code outside of the `KFileItem` itself that should be calling `setHiddenFileMatcher()` is that in the `KDirLister` that allocated the `KFileItem` in the first place. It will never call `setHiddenFileMatcher()` with a `nullptr`, but with a pointer to its own hidden file matcher. That said, if we do as you suggest, then I see the followinng options as reasonable responses to `setHiddenFileMatcher(nullptr)`: 1. We don't have a hidden file matcher anymore, so set `m_bHiddenByMatcher` to `false` 1. Set `m_bHiddenByMatcher` based on hard coded dotfile matching. 1. Set `m_pHiddenFileMatcher = TDEGlobal::hiddenFileMatcher()`. Even if the global matcher had somehow been previously destroyed, make it come to life again. Any reason why that wouldn't work? > By the way, we probably don't even need `resetHiddenFileMatcher()` any more if we define `setHiddenFileMatcher` as follow: `void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher = nullptr);`. In this case, `destroyed()` could simply be connected to setHiddenFileMatcher()`. I leave a bit of a doubt on this and it should be tested first, to make sure the signal-slot logic does not complain due to the difference in signature of the signal and slot. I think this could work, but we do need to settle how `nullptr` should be interpreted.
Owner

Under what circumstances would the global matcher be destroyed? I thought by its very nature it lasts the life of the application and could therefore be counted on by all applications. If it cannot, then we should re-think how the global matcher is implemented.

The global file matcher should be destroyed only when an application using the tdecore lib is closed, so yes in practice this should not be a real problem. Nevertheless it is a design mistake and the fact that the application is closing should not be an excuse to justify the mistake. Since there is a simple solution (handling nullptr with a few extra if()), it should be fairly simple to implement the fix.

It will never call setHiddenFileMatcher() with a nullptr, but with a pointer to its own hidden file matcher.

I wasn't pointing out a bug, just suggesting how to use nullptr and setHiddenFileMatcher to disconnect a matcher correctly.

  1. We don't have a hidden file matcher anymore, so set m_bHiddenByMatcher to false

Yes, agree with you.

  1. Set m_pHiddenFileMatcher = TDEGlobal::hiddenFileMatcher(). Even if the global matcher had somehow been previously destroyed, make it come to life again. Any reason why that wouldn't work?

No because the object may not exists and if you try to use it you may access invalid memory and crash your application. If a new global matcher is created again, it won't have the same address anyway. So we should simply set m_pHiddenFileMatcher to nullptr after disconnecting the existing matcher, if any.

> Under what circumstances would the global matcher be destroyed? I thought by its very nature it lasts the life of the application and could therefore be counted on by all applications. If it cannot, then we should re-think how the global matcher is implemented. The global file matcher should be destroyed only when an application using the tdecore lib is closed, so yes in practice this should not be a real problem. Nevertheless it is a design mistake and the fact that the application is closing should not be an excuse to justify the mistake. Since there is a simple solution (handling nullptr with a few extra `if()`), it should be fairly simple to implement the fix. > It will never call setHiddenFileMatcher() with a nullptr, but with a pointer to its own hidden file matcher. I wasn't pointing out a bug, just suggesting how to use `nullptr` and `setHiddenFileMatcher` to disconnect a matcher correctly. > 1. We don't have a hidden file matcher anymore, so set m_bHiddenByMatcher to false Yes, agree with you. > 3. Set m_pHiddenFileMatcher = TDEGlobal::hiddenFileMatcher(). Even if the global matcher had somehow been previously destroyed, make it come to life again. Any reason why that wouldn't work? No because the object may not exists and if you try to use it you may access invalid memory and crash your application. If a new global matcher is created again, it won't have the same address anyway. So we should simply set `m_pHiddenFileMatcher` to `nullptr` after disconnecting the existing matcher, if any.
VinceR commented 2 years ago
Poster
Collaborator

Sorry, I had misunderstood your main point. Yes, it is always good to be paranoid with when using pointers:)

Sorry, I had misunderstood your main point. Yes, it is always good to be paranoid with when using pointers:)
Owner

No worries, always good to ask in case of doubt 👍

No worries, always good to ask in case of doubt :+1:
VinceR commented 2 years ago
Poster
Collaborator

I have implemented all suggestions and everthing seems to still work correctly!

I have implemented all suggestions and everthing seems to still work correctly!
Owner

Sounds good :-)

Sounds good :-)
MicheleC marked this conversation as resolved
}
#ifdef TSMSIGNALS
if ( m_pHiddenFileMatcher != 0 && m_pHiddenFileMatcher != nullptr ) {
Owner

No double check required.

No double check required.
VinceR marked this conversation as resolved
#ifdef TSMSIGNALS
if ( m_pHiddenFileMatcher != 0 && m_pHiddenFileMatcher != nullptr ) {
TSMTRACE << " Attempting to disconnect slots from hidden file matcher signals ... " << endl;
if ( disconnect( m_pHiddenFileMatcher, 0, 0, 0 ) )
Owner

This is wrong, it basically disconnects all the signals emitted by the m_pHiddenFileMatcher, which is likely shared by many KFileItems.
We need to disconnect only the connection between m_pHiddenFileMatcher and this file item. Therefore a correct call would be:

disconnect(m_pHiddenFileMatcher, 0, this, 0);
This is wrong, it basically disconnects all the signals emitted by the `m_pHiddenFileMatcher`, which is likely shared by many `KFileItem`s. We need to disconnect only the connection between `m_pHiddenFileMatcher` and `this` file item. Therefore a correct call would be: ``` disconnect(m_pHiddenFileMatcher, 0, this, 0); ```
VinceR commented 2 years ago
Poster
Collaborator

This is wrong, it basically disconnects all the signals emitted by the m_pHiddenFileMatcher, which is likely shared by many KFileItems.
We need to disconnect only the connection between m_pHiddenFileMatcher and this file item. Therefore a correct call would be ...

I will give this a try. I know that I got into major trouble trying to be too specific with disconnecting the patternsChanged signal, so I got pretty broad.

>This is wrong, it basically disconnects all the signals emitted by the m_pHiddenFileMatcher, which is likely shared by many KFileItems. We need to disconnect only the connection between m_pHiddenFileMatcher and this file item. Therefore a correct call would be ... I will give this a try. I know that I got into major trouble trying to be too specific with disconnecting the `patternsChanged` signal, so I got pretty broad.
Owner

Being specific is good. In that case you need to disconnect both patternsChanged and destroyed.
What I suggested above is a mid-way form which disconnects all signal-slot connections between the two objects.

Being specific is good. In that case you need to disconnect both `patternsChanged` and `destroyed`. What I suggested above is a mid-way form which disconnects all signal-slot connections between the two objects.
VinceR commented 2 years ago
Poster
Collaborator

I changed the statement to disconnect(m_pHiddenFileMatcher, 0, this, 0) and everything works fine. Historic note:

I had initially specified disconnect(m_pHiddenFileMatcher, 0, 0, 0) in a desperate effort to fix segfaults I was encountering with disconnect. The main problem turned out to be something else.

While changing and testing other parts of the code in response other comments, I started encountering disconnect segfaults again. Changing the code to what you suggested cleared these up.

I changed the statement to `disconnect(m_pHiddenFileMatcher, 0, this, 0)` and everything works fine. Historic note: I had initially specified `disconnect(m_pHiddenFileMatcher, 0, 0, 0)` in a desperate effort to fix segfaults I was encountering with `disconnect`. The main problem turned out to be something else. While changing and testing other parts of the code in response other comments, I started encountering `disconnect` segfaults again. Changing the code to what you suggested cleared these up.
Owner

The problem with disconnect(m_pHiddenFileMatcher, 0, 0, 0) was that you were disconnecting all the connections of the matcher. The same matcher is connected to all the KFileItem in the same KDirLister, so using disconnect as you initially did, was disconnecting the matcher from everything, not just the current KFileItem.

The problem with `disconnect(m_pHiddenFileMatcher, 0, 0, 0)` was that you were disconnecting all the connections of the matcher. The same matcher is connected to all the KFileItem in the same KDirLister, so using `disconnect` as you initially did, was disconnecting the matcher from everything, not just the current KFileItem.
VinceR marked this conversation as resolved
#ifdef TSMSIGNALS
TSMTRACE << " Attempting to reconnect slots to hidden file matcher signals ... " << endl;
if ( connect( m_pHiddenFileMatcher, TQT_SIGNAL( destroyed() ), this, TQT_SLOT( resetHiddenFileMatcher() ) ) )
Owner

Just a reminder to leave the connect/disconnect calls in the code once you remove the TSM traces.

Just a reminder to leave the `connect`/`disconnect` calls in the code once you remove the TSM traces.
VinceR commented 2 years ago
Poster
Collaborator

Just a reminder to leave the connect/disconnect calls in the code once you remove the TSM traces.

Did you mean removing the #ifdef TSMSIGNALS? I hope to remove both #defines.

Just a reminder to leave the connect/disconnect calls in the code once you remove the TSM traces. Did you mean removing the `#ifdef TSMSIGNALS`? I hope to remove both `#define`s.
Owner

I mean that when the TSM* stuff if removed, the body if the if() is empty. Therefore the if needs to be removed, but the connect in the if condition is required.

I mean that when the TSM* stuff if removed, the body if the `if()` is empty. Therefore the `if` needs to be removed, but the `connect` in the `if` condition is required.
VinceR marked this conversation as resolved
TSMTRACE << "KFileItem::reEvaluateHidden() called for " << m_url.fileName() <<endl ;
if ( !m_url.isEmpty() )
return m_url.fileName()[0] == '.';
m_bHiddenByMatcher = m_pHiddenFileMatcher->matchAny( m_url.fileName() );
Owner

As per comments in setHiddenFileMatcher(), we need to check that m_pHiddenFileMatcher is not null before using it. If it is null, by default the file is not hidden.

As per comments in `setHiddenFileMatcher()`, we need to check that `m_pHiddenFileMatcher` is not null before using it. If it is null, by default the file is not hidden.
VinceR marked this conversation as resolved
/**
* Sets object that encapsulates criteria for determining whether or not
* a filesystem entity is hidden based on characteristics of its name.
* Object is stored in @property m_pHiddenFileMatcher.
Owner

Object is stored in @property m_pHiddenFileMatcher. should be removed. APIs are supposed to hide away the details of the inner implementation :-)

`Object is stored in @property m_pHiddenFileMatcher.` should be removed. APIs are supposed to hide away the details of the inner implementation :-)
VinceR marked this conversation as resolved
void resetHiddenFileMatcher();
/**
* Checks whether or not the current filesystem object is "hidden" by
Owner

This comment can be moved at the beginning of the implementation of void reEvaluateHidden(); in the cpp file. Here in the .h file it would more appropriate something like Re-evaluate whether the item is hidden or not. Again, the details of the implementation should be hidden to the outside world.

This comment can be moved at the beginning of the implementation of `void reEvaluateHidden();` in the cpp file. Here in the .h file it would more appropriate something like `Re-evaluate whether the item is hidden or not`. Again, the details of the implementation should be hidden to the outside world.
VinceR marked this conversation as resolved
bool m_bMimeTypeKnown:1;
// Auto: check leading dot.
// Auto: always check if hidden.
Owner

Suggest Auto: check if the item is hidden.

Suggest `Auto: check if the item is hidden.`
VinceR marked this conversation as resolved
/**
* Object that encapsulates criteria for determining whether or not
* this filesystem entity is hidden based on characteristics of its
* name. This property is set by method setHiddenFileMatcher().
Owner

This property is set by method setHiddenFileMatcher(). can be removed.

`This property is set by method setHiddenFileMatcher().` can be removed.
VinceR marked this conversation as resolved
* this filesystem entity is hidden based on characteristics of its
* name. This property is set by method setHiddenFileMatcher().
*/
TDEStringMatcher *m_pHiddenFileMatcher = nullptr;
Owner

Using = nullptr; is not technically wrong since c++11 is allowed. But since all other members of KFileItem are initialized in the constructor, it would be cleaner to move nullptr together with them in the same place.

Using ` = nullptr;` is not technically wrong since c++11 is allowed. But since all other members of KFileItem are initialized in the constructor, it would be cleaner to move `nullptr` together with them in the same place.
VinceR commented 2 years ago
Poster
Collaborator

Using = nullptr; is not technically wrong since c++11 is allowed. But since all other members of KFileItem are initialized in the constructor, it would be cleaner to move nullptr together with them in the same place.

Acknowledged. An historic note:

Before I implemented signals/slots, it was sufficient to initialize m_pHiddenFileMatcher simply by calling setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() ). No other initialization was required.

After implementing signals/slots, I started getting segfaults during KFileItem initialization because I wasdisconnect-ing from the "previous" hidden file matcher ... which did not exist at this point, so m_pHiddenFileMatcher was pointing to some random location in memory. Pre-initializing m_pHiddenFileMatcher to nullptr cleared this up. I guess that I cannot assume that new zeroes out memory.

> Using ` = nullptr;` is not technically wrong since c++11 is allowed. But since all other members of KFileItem are initialized in the constructor, it would be cleaner to move `nullptr` together with them in the same place. Acknowledged. An historic note: Before I implemented signals/slots, it was sufficient to initialize `m_pHiddenFileMatcher` simply by calling `setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() )`. No other initialization was required. After implementing signals/slots, I started getting segfaults during `KFileItem` initialization because I was`disconnect`-ing from the "previous" hidden file matcher ... which did not exist at this point, so `m_pHiddenFileMatcher` was pointing to some random location in memory. Pre-initializing `m_pHiddenFileMatcher` to `nullptr` cleared this up. I guess that I cannot assume that `new` zeroes out memory.
Owner

m_pHiddenFileMatcher (llike any variable) needs to first be initialized. If not and trying to access the variable before the first assignment, then you would accessing invalid memory.
new is not the problem here. new simply allocate the memory and invoke the related constructor. TDEStringMatcher::TDEStringMatcher() was not initializing m_pHiddenFileMatcher, hence its content was random (I also missed to catch that in the review btw).

`m_pHiddenFileMatcher` (llike any variable) needs to first be initialized. If not and trying to access the variable before the first assignment, then you would accessing invalid memory. `new` is not the problem here. `new` simply allocate the memory and invoke the related constructor. `TDEStringMatcher::TDEStringMatcher()` was not initializing `m_pHiddenFileMatcher`, hence its content was random (I also missed to catch that in the review btw).
VinceR marked this conversation as resolved
MicheleC requested changes 2 years ago
MicheleC left a comment
Owner

Second and final part of the review.

Second and final part of the review.
'w' - Match patterns are to be interpreted as "wildcards"
'r' - Match patterns are to be interpreted as "regexes" (TQRegExp default)
'c' - Matching will be case-sensitive (TQRegExp default)
'c' - Matching will be case-INsensitive
Owner

'i' for case insensitive.

`'i'` for case insensitive.
VinceR marked this conversation as resolved
Files with a version number suffix will be matched via regex
and dotfiles will be matched via wildcard
Current and potential use of the TDEStringMatcher class include:
Owner

These could probably be removed.

Current and potential use of the TDEStringMatcher class include:

  * Expansion of definition of "hidden" files, addressing issue # 270.

  * Creation of a subclass that provides string parsing functions.
These could probably be removed. ``` Current and potential use of the TDEStringMatcher class include: * Expansion of definition of "hidden" files, addressing issue # 270. * Creation of a subclass that provides string parsing functions. ```
VinceR marked this conversation as resolved
TDEStringMatcher::~TDEStringMatcher()
{
patternList.setAutoDelete( true );
Owner

Need to move patternList.setAutoDelete( true ); to the constructor, otherwise when the patternlist is cleared in generatePatternList(), there will be memory leaks.

Need to move `patternList.setAutoDelete( true );` to the constructor, otherwise when the patternlist is cleared in `generatePatternList()`, there will be memory leaks.
VinceR marked this conversation as resolved
TQStringList specList = TQStringList::split( patternStringDivider, newPatternString.mid(1), true );
TQRegExp rxWork;
TQPtrList<TQRegExp> rxPatternList;
Owner

There is no need for rxPatternList, we can use patternList after having clear it.

There is no need for `rxPatternList`, we can use `patternList` after having clear it.
VinceR commented 2 years ago
Poster
Collaborator

There is no need for rxPatternList, we can use patternList after having clear it.

The original reason for rxPatternList was to fully validate patternString before actually changing patternList.

If a specific pattern is empty, we should ignore it and continue with the potential next pattern in the list.

If rxWork is not valid, we should ignore it and continue with the potential next pattern in the list.

Implementing these suggestions from other conversation items will make generatePatternList() completely tolerant of input. Given that, you are correct and we can operate directly on patternList. The UI should be able make sure that patternString doesn't have empty or invalid patterns.

>There is no need for rxPatternList, we can use patternList after having clear it. The original reason for `rxPatternList` was to fully validate `patternString` before actually changing `patternList`. >If a specific pattern is empty, we should ignore it and continue with the potential next pattern in the list. >If rxWork is not valid, we should ignore it and continue with the potential next pattern in the list. Implementing these suggestions from other conversation items will make `generatePatternList()` completely tolerant of input. Given that, you are correct and we can operate directly on `patternList`. The UI should be able make sure that `patternString` doesn't have empty or invalid patterns.
Owner

The original reason for rxPatternList was to fully validate patternString before actually changing patternList.

This is actually a good idea and I thought about that too. And in the end I came to the same conclusion that the UI should make sure patterns are correct.
But on second thought, I think you original idea is a better solution. Reason being:

  1. the string may come from a file manually edited by a user, which could be incorrect
  2. the function returns a bool to say whether it succeeded or not. So in this case it makes sense to check the string is fully valid before accepting it. The idea of discarding wrong patterns and continue as I had proposed in the comments, does not fit well with having a boolean return type.

So all in all, I think your original implementation was a better choice.

> The original reason for rxPatternList was to fully validate patternString before actually changing patternList. This is actually a good idea and I thought about that too. And in the end I came to the same conclusion that the UI should make sure patterns are correct. But on second thought, I think you original idea is a better solution. Reason being: 1. the string may come from a file manually edited by a user, which could be incorrect 2. the function returns a bool to say whether it succeeded or not. So in this case it makes sense to check the string is fully valid before accepting it. The idea of discarding wrong patterns and continue as I had proposed in the comments, does not fit well with having a boolean return type. So all in all, I think your original implementation was a better choice.
VinceR marked this conversation as resolved
TQChar specificationType = specification[0].lower();
switch ( specificationType ) {
case 'o' : {
TQString optionString = specification.mid(1).lower();
Owner

Instead of using an optionString which requires creating a new string and copy the specification into it, we can use specification and start from ` in the for loop below.

Instead of using an `optionString` which requires creating a new string and copy the specification into it, we can use `specification` and start from ` in the for loop below.
VinceR marked this conversation as resolved
TSMTRACE << " Processing match pattern: '" << pattern << "'" << endl;
if ( pattern.isEmpty() ) {
TSMTRACE << " Empty patterns are not allowed" << endl;
rxPatternList.clear();
Owner

If a specific pattern is empty, we should ignore it and continue with the potential next pattern in the list.

If a specific pattern is empty, we should ignore it and continue with the potential next pattern in the list.
VinceR marked this conversation as resolved
rxWork.setPattern( pattern );
if (! rxWork.isValid() ) {
TSMTRACE << " Invalid pattern" << endl;
rxPatternList.clear();
Owner

If rxWork is not valid, we should ignore it and continue with the potential next pattern in the list.

If rxWork is not valid, we should ignore it and continue with the potential next pattern in the list.
VinceR marked this conversation as resolved
}
// patternList.clear(); // no need to do this?
patternList.setAutoDelete( true );
Owner

patternList.setAutoDelete( true ); not required here.

`patternList.setAutoDelete( true );` not required here.
VinceR marked this conversation as resolved
}
}
if ( patternList.isEmpty() ) {
Owner

if there is no pattern in the list, IMO we should return true. The idea is that if there is no filtering string, the file is not filtered. We shall check with @SlavekB to see what he thinks.
Also we can move the if (patternList.isEmpty()) before the for loop (slightly faster).

  if ( patternList.isEmpty() )
  ... (without else block)

  for (..)
  ...

  return true;
}
if there is no pattern in the list, IMO we should return `true`. The idea is that if there is no filtering string, the file is not filtered. We shall check with @SlavekB to see what he thinks. Also we can move the `if (patternList.isEmpty())` before the `for` loop (slightly faster). ``` if ( patternList.isEmpty() ) ... (without else block) for (..) ... return true; } ```
VinceR commented 2 years ago
Poster
Collaborator

if there is no pattern in the list, IMO we should return true

I completely agree -- whatever was I thinking? :)

>if there is no pattern in the list, IMO we should return true I completely agree -- whatever was I thinking? :)
Owner

Ok, marking this resolved as I guess you have made an update on your side.

Ok, marking this resolved as I guess you have made an update on your side.
MicheleC marked this conversation as resolved
{
//-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against ALL stored patterns" << endl;
for ( const TQRegExp *rxPattern : patternList ) {
if ( !
Owner

The if condition is wrong. It needs an extra couple of braces.

 if (!(
       ( rxPattern->wildcard() && rxPattern->exactMatch(stringToMatch)) ||
       (!rxPattern->wildcard() && rxPattern->search(stringToMatch) >= 0)
      ))
The if condition is wrong. It needs an extra couple of braces. ``` if (!( ( rxPattern->wildcard() && rxPattern->exactMatch(stringToMatch)) || (!rxPattern->wildcard() && rxPattern->search(stringToMatch) >= 0) ))
VinceR marked this conversation as resolved
}
}
if ( patternList.isEmpty() ) {
Owner

As above, if there is no pattern in the list, IMO we should return true.
We can move if ( patternList.isEmpty() ) before the for loop, slightly more efficient if the list is empty.

  if ( patternList.isEmpty() )
  ... (without else block)

  for (..)
  ...

  return true;
}
As above, if there is no pattern in the list, IMO we should return `true`. We can move `if ( patternList.isEmpty() )` before the `for` loop, slightly more efficient if the list is empty. ``` if ( patternList.isEmpty() ) ... (without else block) for (..) ... return true; } ```
VinceR marked this conversation as resolved
TDEStringMatcher();
~TDEStringMatcher();
/**
Owner

If we refer to README.tdestringmatcher, we should include such file in the installed files, since this comment is part of the public API.
Alternatively we can modify the comment like

Generate @property patternList from the provided @param newPatternString.
If we refer to `README.tdestringmatcher`, we should include such file in the installed files, since this comment is part of the public API. Alternatively we can modify the comment like ``` Generate @property patternList from the provided @param newPatternString. ```
VinceR commented 2 years ago
Poster
Collaborator

If we refer to README.tdestringmatcher, we should include such file in the installed files, since this comment is part of the public API.

I don't think forcing this file to be installed is a good idea (although my distro Gentoo proactively installs all of those readme-type files anyway).

I do think some type of reference is a good idea, how about something like this:

Refer to file README.tdestringmatcher in tdelibs/tdecore source code directory for more information on how the input string should be formatted.

> If we refer to `README.tdestringmatcher`, we should include such file in the installed files, since this comment is part of the public API. I don't think forcing this file to be installed is a good idea (although my distro Gentoo proactively installs all of those readme-type files anyway). I do think some type of reference is a good idea, how about something like this: *Refer to file README.tdestringmatcher in tdelibs/tdecore source code directory for more information on how the input string should be formatted.*
Owner

I have checked what we are doing with other README (for example README.kiosk in the same folder) and we are not installing it (at least in Debian). So I think a reference as you suggested is good enough, since it is similar to other existing cases.

I have checked what we are doing with other README (for example README.kiosk in the same folder) and we are not installing it (at least in Debian). So I think a reference as you suggested is good enough, since it is similar to other existing cases.
VinceR marked this conversation as resolved
*/
bool generatePatternList( TQString newPatternString );
/**
Owner

The whole comment can simply be `Return the pattern string'. APIs hides away the internal implementation.

The whole comment can simply be `Return the pattern string'. APIs hides away the internal implementation.
VinceR commented 2 years ago
Poster
Collaborator

The whole comment can simply be `Return the pattern string'. APIs hides away the internal implementation.

Since I am abandoning storing patternString (now m_patternString) in a private class, I have put it back into the protected: area in the header.

In my opinion, the properties and methods in the protected: area are every much a part of the API as those in the public: area since applications may be subclassing TDEStringMatcher. As such, I would think that any explicit relationships between methods and properties should be documented. For instance:

  /**
      @return a copy of @property m_patternString.
   */
  PatternList getPatternString();

If am thinking about this incorrectly, let me know. Disclaimer: I have never seen the TDE API documentation generated by doxygen. I am not ever sure where it is :\

>The whole comment can simply be `Return the pattern string'. APIs hides away the internal implementation. Since I am abandoning storing `patternString` (now `m_patternString`) in a private class, I have put it back into the `protected:` area in the header. In my opinion, the properties and methods in the `protected:` area are every much a part of the API as those in the `public:` area since applications may be subclassing `TDEStringMatcher`. As such, I would think that any explicit relationships between methods and properties should be documented. For instance: ``` /** @return a copy of @property m_patternString. */ PatternList getPatternString(); ``` If am thinking about this incorrectly, let me know. Disclaimer: I have never seen the TDE API documentation generated by doxygen. I am not ever sure where it is :\
Owner

The whole concept of API is that a user of a library uses the public part of a class (that's the API) to do what he needs without having to worry about how the internal implementation of the library is done. The internal implementation can even change without the user noticing it, as long as the functionality stated in the API remains the same.

Developers who wants to extend the library with extra functionality will need to know the code and in that case they will see the whole picture, internal implementation included. That is the development documentation.

The whole concept of API is that a user of a library uses the public part of a class (that's the API) to do what he needs without having to worry about how the internal implementation of the library is done. The internal implementation can even change without the user noticing it, as long as the functionality stated in the API remains the same. Developers who wants to extend the library with extra functionality will need to know the code and in that case they will see the whole picture, internal implementation included. That is the development documentation.
Owner

I have never seen the TDE API documentation generated by doxygen. I am not ever sure where it is :\

It is normally installed with the respective dev pacakges. But you can also look it up here (the important part at least):
https://www.trinitydesktop.org/apidocs.php

> I have never seen the TDE API documentation generated by doxygen. I am not ever sure where it is :\ It is normally installed with the respective dev pacakges. But you can also look it up here (the important part at least): https://www.trinitydesktop.org/apidocs.php
Owner

Probably still good to mention to refer to README.stringmatcher in the comment, since we have agreed to keep it anyway.

Probably still good to mention to refer to README.stringmatcher in the comment, since we have agreed to keep it anyway.
VinceR marked this conversation as resolved
*/
TQString getPatternString();
/**
Owner

Suggest to use individual comment for matchAny and matchAll, since Doxygen will extract the comments for individual methods.

Suggest to use individual comment for `matchAny` and `matchAll`, since Doxygen will extract the comments for individual methods.
VinceR marked this conversation as resolved
protected:
TQPtrList<TQRegExp> patternList;
Owner

Suggest we rename this to m_patternList. It's convention in TDE to prefix members of a class with m_.

Suggest we rename this to `m_patternList`. It's convention in TDE to prefix members of a class with `m_`.
VinceR marked this conversation as resolved
Owner

As you will see from the TDEStringMatcher code, I did follow your advice and put TQString patternString into a private class instead of declaring it as private: property in the main class. But on second look, this looks contrived and perhaps pointless given how small the private class is. What do you think about this?

I agree with you and as we discussed in #170, we don't need to use a private class for this case.

As you will see from the KFileItem code, I have added 2 protected class variables (one boolean and one an object pointer. I would definitely agree that both of these should be in an associated private class. The problem is that the KFileItemPrivate class seems to be special purpose and instantiated only conditionally in various methods. Should create another private class, use KFileItemPrivate but allocate it early in the KFileItem constructor, or just leave things as they are?

Just leave things as they are. KFileItem has a lot of other private members, so the two new variables fits well where you placed them.

> As you will see from the TDEStringMatcher code, I did follow your advice and put TQString patternString into a private class instead of declaring it as private: property in the main class. But on second look, this looks contrived and perhaps pointless given how small the private class is. What do you think about this? I agree with you and as we discussed in #170, we don't need to use a private class for this case. > As you will see from the KFileItem code, I have added 2 protected class variables (one boolean and one an object pointer. I would definitely agree that both of these should be in an associated private class. The problem is that the KFileItemPrivate class seems to be special purpose and instantiated only conditionally in various methods. Should create another private class, use KFileItemPrivate but allocate it early in the KFileItem constructor, or just leave things as they are? Just leave things as they are. KFileItem has a lot of other private members, so the two new variables fits well where you placed them.
Owner

OVerall the PR is good. Most of the comments are related to either minor improvements or some mistakes that can be easily fixed.

There is one point we should discuss, together with @SlavekB, which is how we specify the patterns. Your solution has some pros but the pattern specification is quite convoluted (although ingenious). I had a different more traditional idea in mind, so I will list pros and cons of both ideas from what I can see and then we can discuss all three together.

Solution in this PR

Pros:

  1. allows to specify multiple patterns in a compact way. This is especially good for saving/restoring the patterns from a config file

Cons:

  1. pattern specification is quite convoluted, a user need to read documentation to ubderstand how to create a pattern
  2. require more work to translate from pattern string to GUI options when we will need to create a GUI for the user to change the pattern.
  3. there is no way to modify the existing pattern list without having to specify all the existing patterns again. Basically there is no addPattern, modifyPattern or removePattern method available. Of course they could be added.
  4. need to maintain two independent variables for the patterns (the string and the list of TQRegExt) and make sure they are always in sync.
Traditional solution

Uses only a list of TQRegEx objects, without the pattern string. Patterns are appended/removed from the list as needed, by calling appropriate methods.

Pros:

  1. simpler concept, much easier to grasp for normal users
  2. should allow easier 1-to-1 correspondance with a GUI dialog
  3. the TQRegEx list is the reference for the patterns. No second entry to update and keep in sync.

Cons:

  1. require more works to save/restore the list from config file.

@VinceR feel free to add pros/cons to both solution as you see them from your point of view. We can then ask @SlavekB what he thinks about both to have a third opinion.

Other than that, as I said at the top, this PR is mostly ready. Well done.

OVerall the PR is good. Most of the comments are related to either minor improvements or some mistakes that can be easily fixed. There is one point we should discuss, together with @SlavekB, which is how we specify the patterns. Your solution has some pros but the pattern specification is quite convoluted (although ingenious). I had a different more traditional idea in mind, so I will list pros and cons of both ideas from what I can see and then we can discuss all three together. ##### Solution in this PR Pros: 1. allows to specify multiple patterns in a compact way. This is especially good for saving/restoring the patterns from a config file Cons: 1. pattern specification is quite convoluted, a user need to read documentation to ubderstand how to create a pattern 2. require more work to translate from pattern string to GUI options when we will need to create a GUI for the user to change the pattern. 3. there is no way to modify the existing pattern list without having to specify all the existing patterns again. Basically there is no `addPattern`, `modifyPattern` or `removePattern` method available. Of course they could be added. 4. need to maintain two independent variables for the patterns (the string and the list of TQRegExt) and make sure they are always in sync. ##### Traditional solution Uses only a list of TQRegEx objects, without the pattern string. Patterns are appended/removed from the list as needed, by calling appropriate methods. Pros: 1. simpler concept, much easier to grasp for normal users 2. should allow easier 1-to-1 correspondance with a GUI dialog 3. the TQRegEx list is the reference for the patterns. No second entry to update and keep in sync. Cons: 1. require more works to save/restore the list from config file. @VinceR feel free to add pros/cons to both solution as you see them from your point of view. We can then ask @SlavekB what he thinks about both to have a third opinion. Other than that, as I said at the top, this PR is mostly ready. Well done.
VinceR commented 2 years ago
Poster
Collaborator

Before I deal with the individual conversation items, I think it is worthwhile to discuss interface design and why I decided to approach things the way I did. I see three separate but related components that are required for a complete solution.

  1. Core string matching engine, currently implemented in TDEStringMatcher, includes:
    • List of match patterns and associated options. Currently implemented as TQPtrList<TQRegExp> patternList
    • Functions to manage items in the list. Currently implemented with the single all-at-once function generatePatternList()
    • Functions to perform string matching operations against the list. Currently implemented with matchAny() and matchAll()
  2. Methods for converting a match pattern list to and from a string suitable for storage. Currently integrated into core string matching engine with respective functions getPatternString() and generatePatternList().
  3. A UI for modifying the match pattern list, either temporarily or permanently. This was formerly integrated with the core string matching engine but was moved to a separate class TDEStringMatcherUI (to be discussed / reviewed later).

I thought that it was important to integrate standardized conversion functions (component # 2) into the core string matching engine. This makes it very easy for applications that need to store and retrieve match settings. They do not need to know anything about patternString syntax to initialize a TDEStringMatcher.

The UI does need to be concerned with details about match patterns and their corresponding options. With the current implementation, it needs to generate a dialog either with direct processing of match pattern list items, or by decoding patternString. On the other hand, a user that uses the UI will never need to understand patternString string syntax.

The current patternString syntax is designed to permit the future addition of other match options. It also permits (but does not require) a more compact (less redundant) specification of pattern options. I would also argue that it is quite easy to understand once explained. It is certainly simpler than understanding regex syntax.

Now in response to your pros & cons:

Solution in this PR

Pros:

  1. allows to specify multiple patterns in a compact way. This is especially good for saving/restoring the patterns from a config file

I consider this to be critical for the original purpose: hidden file matching. It's all going to start and end with config file settings.

Of course we could move this type of code back to applications & views, letting each one do things its own way. But I personally prefer some uniformity so when we look at different configurations, we only need to know one set of rules.

Cons:

  1. pattern specification is quite convoluted, a user need to read documentation to ubderstand how to create a pattern

Users don't need to read anything if they use the UI. They will if they want to edit configuration files directly, but we don't encourage that anyway.

  1. require more work to translate from pattern string to GUI options when we will need to create a GUI for the user to change the pattern.

Not so much work. TDEStringMatcherUI can present a simplified view of "patterns" & "options" in a way that does not require knowledge of TQRegExp properties.

  1. there is no way to modify the existing pattern list without having to specify all the existing patterns again. Basically there is no addPattern, modifyPattern or removePattern method available. Of course they could be added.

For the relatively simple case of hidden file matching, I think setting all patterns at once is sufficient. But you are correct, those could be added for more complex cases. We could also add a setPatterns() for replacing the entire patternList with the content of another TQPtrList<TQRegExp>.

If these additional pattern list modification functions are added, it will become necessary to make getPatternString() actually decode the patternList into a corresponding patternString

  1. need to maintain two independent variables for the patterns (the string and the list of TQRegExt) and make sure they are always in sync.

With current implementation, that's not an issue. The string is simply of the copy of the one used in the last successful call to generatePatternList(). Or the suggested additional pattern list modification functions are added, the string becomes a cached copy of the most recent call to getPatternString().

Before I deal with the individual conversation items, I think it is worthwhile to discuss interface design and why I decided to approach things the way I did. I see three separate but related components that are required for a complete solution. 1. Core string matching engine, *currently implemented in `TDEStringMatcher`*, includes: * List of match patterns and associated options. *Currently implemented as `TQPtrList<TQRegExp> patternList`* * Functions to manage items in the list. *Currently implemented with the single all-at-once function `generatePatternList()`* * Functions to perform string matching operations against the list. *Currently implemented with `matchAny()` and `matchAll()`* 2. Methods for converting a match pattern list **to** and **from** a string suitable for storage. Currently integrated into core string matching engine with respective functions `getPatternString()` and `generatePatternList()`. 3. A UI for modifying the match pattern list, either temporarily or permanently. *This was formerly integrated with the core string matching engine but was moved to a separate class `TDEStringMatcherUI` (to be discussed / reviewed later).* I thought that it was important to integrate standardized conversion functions (component # 2) into the core string matching engine. This makes it very easy for applications that need to store and retrieve match settings. They do not need to know anything about `patternString` syntax to initialize a `TDEStringMatcher`. The UI does need to be concerned with details about match patterns and their corresponding options. With the current implementation, it needs to generate a dialog either with direct processing of match pattern list items, or by decoding `patternString`. On the other hand, a user that uses the UI will never need to understand `patternString` string syntax. The current `patternString` syntax is designed to permit the future addition of other match options. It also permits (but does not require) a more compact (less redundant) specification of pattern options. I would also argue that it is quite easy to understand once explained. It is certainly simpler than understanding regex syntax. Now in response to your pros & cons: > > ##### Solution in this PR > > Pros: > 1. allows to specify multiple patterns in a compact way. This is especially good for saving/restoring the patterns from a config file > I consider this to be critical for the original purpose: hidden file matching. It's all going to start and end with config file settings. Of course we could move this type of code back to applications & views, letting each one do things its own way. But I personally prefer some uniformity so when we look at different configurations, we only need to know one set of rules. > Cons: > 1. pattern specification is quite convoluted, a user need to read documentation to ubderstand how to create a pattern Users don't need to read anything if they use the UI. They will if they want to edit configuration files directly, but we don't encourage that anyway. > 2. require more work to translate from pattern string to GUI options when we will need to create a GUI for the user to change the pattern. Not so much work. `TDEStringMatcherUI` can present a simplified view of "patterns" & "options" in a way that does not require knowledge of `TQRegExp` properties. > 3. there is no way to modify the existing pattern list without having to specify all the existing patterns again. Basically there is no `addPattern`, `modifyPattern` or `removePattern` method available. Of course they could be added. For the relatively simple case of hidden file matching, I think setting all patterns at once is sufficient. But you are correct, those could be added for more complex cases. We could also add a `setPatterns()` for replacing the entire `patternList` with the content of another `TQPtrList<TQRegExp>`. If these additional pattern list modification functions are added, it will become necessary to make `getPatternString()` actually decode the `patternList` into a corresponding `patternString` > 4. need to maintain two independent variables for the patterns (the string and the list of TQRegExt) and make sure they are always in sync. With current implementation, that's not an issue. The string is simply of the copy of the one used in the last successful call to `generatePatternList()`. Or the suggested additional pattern list modification functions are added, the string becomes a cached copy of the most recent call to `getPatternString()`.
Owner

Before I deal with the individual conversation items,

Let me think a bit longer on this and the pattern string thing. Will write back in a day or two.

> Before I deal with the individual conversation items, Let me think a bit longer on this and the pattern string thing. Will write back in a day or two.
Owner

Before I deal with the individual conversation items,

After further consideration, I think your approach is fine, although we should make a couple of refinements.

  1. make the options non-persistent across patterns. This means every pattern will have its own options, like "/ow/p.*/ow/pa?c.txt" for example.
    The reasons to do so is simply to simplify pattern changes. A user may want to add/remove additional patterns to the existing ones, so by making each pattern independent from the others, those operations become much simpler. Otherwise each time would be necessary to parse the whole string, keep memory of the options and adapt the string as required.

  2. add two methods addPattern and removePattern to append/remove an indivual option+pattern to the existing pattern lists.
    Btw, the setPatterns() function that you proposed would be the same as generatePatternList(), so we could as well rename generatePatternList() to setPatternString() which would also match with the corresponding getPatternString() getter method.

What do you think?

> Before I deal with the individual conversation items, After further consideration, I think your approach is fine, although we should make a couple of refinements. 1. make the options non-persistent across patterns. This means every pattern will have its own options, like "/ow/p.*/ow/pa?c.txt" for example. The reasons to do so is simply to simplify pattern changes. A user may want to add/remove additional patterns to the existing ones, so by making each pattern independent from the others, those operations become much simpler. Otherwise each time would be necessary to parse the whole string, keep memory of the options and adapt the string as required. 2. add two methods `addPattern` and `removePattern` to append/remove an indivual option+pattern to the existing pattern lists. Btw, the `setPatterns()` function that you proposed would be the same as `generatePatternList()`, so we could as well rename `generatePatternList()` to `setPatternString()` which would also match with the corresponding `getPatternString()` getter method. What do you think?
VinceR commented 2 years ago
Poster
Collaborator

make the options non-persistent across patterns. This means every pattern will have its own options, like "/ow/p.*/ow/pa?c.txt" for example.

There is nothing that prohibits a pattern string from being like that, so let's consider the ramifications of of making this mandatory:

  1. Do we therefore require that a patternString always have an even number of parts (option specs followed by pattern specs)?
  2. Do we therefore require that all known options be specified for each pattern or can we assume that the absence of an option specification has a fixed default (e.g. the TQRegExp default setting).
  3. What happens when we implement a new option (with new option characters)?

I think part of your concern may come from thinking that the UI will have to screw around directly with an arcane patternString. I don't think that needs to be the case. The UI can deal directly with a copy of patternList (or an array derived from that) and communicate the results back to the hidden file matcher.

A user may want to add/remove additional patterns to the existing ones
add two methods addPattern and removePattern to append/remove an indivual option+pattern to the existing pattern lists

I would be reluctant for a UI to have direct access to protected member patternList. If it is a copy that is being processed, then certainly those methods could be implemented within the UI.

While I have been implementing suggestions in this PR, I also made some other changes to the code to support a UI:

typedef TQValueList<TQRegExp> PatternList;
  // Use typedef to help economy of expression
  // It's now a value list, not a pointer list
PatternList getPatternList();
  // Return a copy of m_patternList, intended for use by UI
TQString generatePatternString( PatternList& patternList );
  // Reverse of generatePatternList()
  // Intended for use by a UI on its own copy after user modifications as follows:
  // generatePatternList( generatePatternString( UIPatternListCopy ) )
PatternList m_patternList;
  // main pattern store, renamed as recommended

Btw, the setPatterns() function that you proposed would be the same as generatePatternList(), so we could as well rename generatePatternList() to setPatternString() which would also match with the corresponding getPatternString() getter method.

I have been using the generate* naming because it connotes a process of transforming a one type of entity (e.g, string) to another (e.g. list of TQRegExp). To me, set* and get* connote copying of variables of the same type. I guess I could use names like patternString2List and patternList2String instead of generate*.

>make the options non-persistent across patterns. This means every pattern will have its own options, like "/ow/p.*/ow/pa?c.txt" for example. There is nothing that prohibits a pattern string from being like that, so let's consider the ramifications of of making this *mandatory*: 1. Do we therefore *require* that a `patternString` always have an even number of parts (option specs followed by pattern specs)? 2. Do we therefore *require* that all known options be specified for each pattern or can we assume that the absence of an option specification has a fixed default (e.g. the TQRegExp default setting). 3. What happens when we implement a new option (with new option characters)? I think part of your concern may come from thinking that the UI will have to screw around directly with an arcane `patternString`. I don't think that needs to be the case. The UI can deal directly with a copy of `patternList` (or an array derived from that) and communicate the results back to the hidden file matcher. >A user may want to add/remove additional patterns to the existing ones >add two methods `addPattern` and `removePattern` to append/remove an indivual option+pattern to the existing pattern lists I would be reluctant for a UI to have direct access to protected member `patternList`. If it is a copy that is being processed, then certainly those methods could be implemented within the UI. While I have been implementing suggestions in this PR, I also made some other changes to the code to support a UI: ``` typedef TQValueList<TQRegExp> PatternList; // Use typedef to help economy of expression // It's now a value list, not a pointer list PatternList getPatternList(); // Return a copy of m_patternList, intended for use by UI TQString generatePatternString( PatternList& patternList ); // Reverse of generatePatternList() // Intended for use by a UI on its own copy after user modifications as follows: // generatePatternList( generatePatternString( UIPatternListCopy ) ) PatternList m_patternList; // main pattern store, renamed as recommended ``` >Btw, the setPatterns() function that you proposed would be the same as generatePatternList(), so we could as well rename generatePatternList() to setPatternString() which would also match with the corresponding getPatternString() getter method. I have been using the `generate*` naming because it connotes a process of transforming a one type of entity (e.g, string) to another (e.g. list of `TQRegExp`). To me, `set*` and `get*` connote copying of variables of the same type. I guess I could use names like `patternString2List` and `patternList2String` instead of `generate*`.
Owner
  1. Do we therefore require that a patternString always have an even number of parts (option specs followed by pattern specs)?

yes, each "piece" of the pattern would have options + pattern string.

  1. Do we therefore require that all known options be specified for each pattern or can we assume that the absence of an option specification has a fixed default (e.g. the TQRegExp default setting).

Nope, it can be handle as of now, specifying only the options to be active , while the inactive ones are not specified. The only difference is that we will need to repeat the options for each "piece" of the string if needed.

  1. What happens when we implement a new option (with new option characters)?

If we specify the option in the "piece", it will be set, otherwise there is no need to specify it. So all existing patterns would keep working.

I think part of your concern may come from thinking that the UI will have to screw around directly with an arcane patternString.

The UI shouldn't have to do anything too complicated. Just add or remove "pieces" to the pattern string, or replace the string with a new one at once in the worst case.

I would be reluctant for a UI to have direct access to protected member patternList.

UI won't have access to the internals of the implementation. You may have a string like (simplified, not literal) pattern1:pattern2:pattern3 and the UI may say removePattern(pattern2) and addPattern(pattern4), resulting in the string pattern1:pattern3:pattern4. By having each piece to be independent from previous/following pieces of the pattern, such operations are much easier to implement and less error-prone.

I have been using the generate* naming because it connotes a process of transforming a one type of entity (e.g, string) to another (e.g. list of TQRegExp).

Highly suggest to use get/set names when two functions work in reverse for the same functionality. Getters and setters are pretty much standard practice when it comes to get or set values. They don't simply copy the values, they can do validation of the fields, reject setting wrong values and add encapsulation to the inner variables.

EDIT: having said that, it is just a suggestion, so feel free to keep current names if you wish.

> 1. Do we therefore *require* that a `patternString` always have an even number of parts (option specs followed by pattern specs)? yes, each "piece" of the pattern would have options + pattern string. > 2. Do we therefore *require* that all known options be specified for each pattern or can we assume that the absence of an option specification has a fixed default (e.g. the TQRegExp default setting). Nope, it can be handle as of now, specifying only the options to be active , while the inactive ones are not specified. The only difference is that we will need to repeat the options for each "piece" of the string if needed. > 3. What happens when we implement a new option (with new option characters)? If we specify the option in the "piece", it will be set, otherwise there is no need to specify it. So all existing patterns would keep working. > I think part of your concern may come from thinking that the UI will have to screw around directly with an arcane `patternString`. The UI shouldn't have to do anything too complicated. Just add or remove "pieces" to the pattern string, or replace the string with a new one at once in the worst case. > I would be reluctant for a UI to have direct access to protected member `patternList`. UI won't have access to the internals of the implementation. You may have a string like (simplified, not literal) `pattern1:pattern2:pattern3` and the UI may say `removePattern(pattern2)` and `addPattern(pattern4)`, resulting in the string `pattern1:pattern3:pattern4`. By having each piece to be independent from previous/following pieces of the pattern, such operations are much easier to implement and less error-prone. > I have been using the `generate*` naming because it connotes a process of transforming a one type of entity (e.g, string) to another (e.g. list of `TQRegExp`). Highly suggest to use get/set names when two functions work in reverse for the same functionality. Getters and setters are pretty much standard practice when it comes to get or set values. They don't simply copy the values, they can do validation of the fields, reject setting wrong values and add encapsulation to the inner variables. EDIT: having said that, it is just a suggestion, so feel free to keep current names if you wish.
VinceR commented 2 years ago
Poster
Collaborator

Continuing the conversation on pattern string format:

I am striving to put all of the logic for encoding / decoding pattern strings exclusively into TDEStringMatcher implementation.

  • Applications should only need to get a pattern string from the object, store/retrieve it to/from disk, and send the string to the back to the object for (re)building the internal pattern list.
  • The UI can be provided a list or array of either TQRegExp objects or of structs containing just the information necessary to create its dialog.
  • Users who really want to edit configuration files can consult the readme, and if they mess things up, they can fix things in the UI.

That said, I am not against your idea of formating the pattern string as option spec, pattern spec, [option spec, pattern spec, …]. Doing so would eliminate the need to explicitly prefix each specification with o or p. We can debate the merits of whether or not each option spec should be fully populated with known option characters, or if not, whether "missing" option characters are inferred from a previous option spec versus a standard default.


In my original implementation, applications could explicitly set the character to be used for splitting up the pattern string. In the current implementation, the 1st character of the pattern string determines the splitter.

I am now considering making the splitter a fixed character that satisfies the following characteristics:

  • Character must be one that could reasonably be prohibited from use in a match pattern.
  • TDE won't try to backslash the character when storing in a configuration file.
  • Character needs to be visible and selectable in various text editors and less.
  • Presence of character in text files won't cause them to be treated as binary files.

The control character Vertical Tab (0x0B) looks like a good candidate:

  • It appears as ^K in less, nano, vi, and emacs.
  • It appears as the I-Don't-Know-How-To-Display-That rectangle in kwrite and kate.
  • It appears as the symbold VT in geany.
  • I have verified that strings containing it can be stored & retrieved without alteration in a TDE config file.

What do you think of the idea generally (standard separator character) and of my proposed choice of Vertical Tab as that character?

Continuing the conversation on pattern string format: I am striving to put all of the logic for encoding / decoding pattern strings exclusively into `TDEStringMatcher` implementation. * Applications should only need to get a pattern string from the object, store/retrieve it to/from disk, and send the string to the back to the object for (re)building the internal pattern list. * The UI can be provided a list or array of either `TQRegExp` objects or of structs containing just the information necessary to create its dialog. * Users who really want to edit configuration files can consult the readme, and if they mess things up, they can fix things in the UI. That said, I am not against your idea of formating the pattern string as `option spec, pattern spec, [option spec, pattern spec, …]`. Doing so would eliminate the need to explicitly prefix each specification with `o` or `p`. We can debate the merits of whether or not each option spec should be fully populated with known option characters, or if not, whether "missing" option characters are inferred from a previous option spec versus a standard default. ----- In my original implementation, applications could explicitly set the character to be used for splitting up the pattern string. In the current implementation, the 1st character of the pattern string determines the splitter. I am now considering making the splitter a fixed character that satisfies the following characteristics: * Character must be one that could reasonably be prohibited from use in a match pattern. * TDE won't try to backslash the character when storing in a configuration file. * Character needs to be visible and selectable in various text editors and `less`. * Presence of character in text files won't cause them to be treated as binary files. The control character Vertical Tab (`0x0B`) looks like a good candidate: * It appears as `^K` in `less`, `nano`, `vi`, and `emacs`. * It appears as the *I-Don't-Know-How-To-Display-That* rectangle in `kwrite` and `kate`. * It appears as the symbold `VT` in `geany`. * I have verified that strings containing it can be stored & retrieved without alteration in a TDE config file. What do you think of the idea generally (standard separator character) and of my proposed choice of Vertical Tab as that character?
Owner

Hi @VinceR,
apologies for the late reply, somehow I missed your message and only spotted it today.

Applications should only need to get a pattern string from the object, store/retrieve it to/from disk

Sounds good.

The UI can be provided a list or array of either TQRegExp objects or of structs containing just the information necessary to create its dialog.

I am not against this, but keep in mind that if the UI has access to the underlying internal structure (for example TQRegExp objects) then we probably need extra code to make sure the pattern string and the internal structure are always in sync (i.e. if the UI changes the internal objects, then we need to re-derive the pattern string). In light of that, I think it would be simpler for the UI to interact with the pattern string only, so that we don't have to add extra syncing code.

Doing so would eliminate the need to explicitly prefix each specification with o or p.

Nice, I had not thought about it. That makes things even simpler. For example a string could be something like |wp.*|wpa?c.txt, where the o is gone. The p is still used to mark the beginning of the pattern string. Regarding the various options, we can assume that those not specified will use a default value. For wildcard-regex we can default to wild card, so a simple use case would be for example |p.*, which would filter all dot files.

The control character Vertical Tab (0x0B) looks like a good candidate:

Uhm, many users may not even know what a vertical tab is, nor how to type it 😰 I think we should look at a character which has all the characteristics that you listed above but is common and visible. How about |? It's the pipe character in unix, so it is extremely unlikely that a user will use that as part of a file name.
Btw I liked your idea of being able to specify the splitting character. Why don't we make something like this:

  1. if the first character of the overall pattern string is s, the second character will specify the splitting char to use
  2. if the first character of the overall pattern string is NOT s, then the splitting char will default to | and the first character will be interpreted as a normal option.

This way we have a simple default but still offer the opportunity to customize the splitting character if needed. For example

  1. wp.*|wpa?c.txt --> default split char is |
  2. s/wp.*/wpa?c.txt --> split char is /
  3. p.* --> minimal pattern string, default to wildcard (filter dot files in this example)

What do you think about this?

Hi @VinceR, apologies for the late reply, somehow I missed your message and only spotted it today. > Applications should only need to get a pattern string from the object, store/retrieve it to/from disk Sounds good. > The UI can be provided a list or array of either TQRegExp objects or of structs containing just the information necessary to create its dialog. I am not against this, but keep in mind that if the UI has access to the underlying internal structure (for example TQRegExp objects) then we probably need extra code to make sure the pattern string and the internal structure are always in sync (i.e. if the UI changes the internal objects, then we need to re-derive the pattern string). In light of that, I think it would be simpler for the UI to interact with the pattern string only, so that we don't have to add extra syncing code. > Doing so would eliminate the need to explicitly prefix each specification with o or p. Nice, I had not thought about it. That makes things even simpler. For example a string could be something like `|wp.*|wpa?c.txt`, where the `o` is gone. The `p` is still used to mark the beginning of the pattern string. Regarding the various options, we can assume that those not specified will use a default value. For wildcard-regex we can default to wild card, so a simple use case would be for example `|p.*`, which would filter all dot files. > The control character Vertical Tab (0x0B) looks like a good candidate: Uhm, many users may not even know what a vertical tab is, nor how to type it 😰 I think we should look at a character which has all the characteristics that you listed above but is common and visible. How about `|`? It's the pipe character in unix, so it is extremely unlikely that a user will use that as part of a file name. Btw I liked your idea of being able to specify the splitting character. Why don't we make something like this: 1. if the first character of the overall pattern string is `s`, the second character will specify the splitting char to use 2. if the first character of the overall pattern string is NOT `s`, then the splitting char will default to `|` and the first character will be interpreted as a normal option. This way we have a simple default but still offer the opportunity to customize the splitting character if needed. For example 1. `wp.*|wpa?c.txt` --> default split char is `|` 2. `s/wp.*/wpa?c.txt` --> split char is `/` 3. `p.*` --> minimal pattern string, default to wildcard (filter dot files in this example) What do you think about this?
VinceR commented 2 years ago
Poster
Collaborator

The UI can be provided a list or array of either TQRegExp objects or of structs containing just the information necessary to create its dialog.

I am not against this, but keep in mind that if the UI has access to the underlying internal structure (for example TQRegExp objects)...

The more I think about this, the more I favor passing implementation-agnostic structures to the UI. Someday we may want to change the TQRegExp implementation to something better (e.g. a port of QRegularExpression or our own wrapper around PCRE2).

...then we probably need extra code to make sure the pattern string and the internal structure are always in sync

Already done, in revised code to be uploaded.

Uhm, many users may not even know what a vertical tab is, nor how to type it 😰

Users will never need to know either -- and that is by design. In fact, we want to make it very difficult to accidently type this internal-use-only character into a regex. If somebody wants to specify a vertical tab in a search pattern, they can use \v.

Regarding rest of your suggestion: | is not a good default splitter since that's the regex alternation character. I had originally thought of using the horizontal tab character but TDE converts that to \t when storing a multi-pattern string.


I think I have implemented almost all of the other suggestions made in conversation items although I need to double-check. I need to fine-tune some of the new stuff I added in support of a UI, then I will be ready to upload a revised version of the code.

Although I know that you have wanted to focus on core TDEStringMatcher functionality, I think think everthing I have done may make more sense if evaluated in a more complete context: KonqListView working with KDirLister and a decent UI for changing patterns. I am overdue for re-doing the UI to accommodate the additional flexibility that comes with setting per-pattern options. My old UI works for my testing but only because I am familiar with patternString rules.

>>The UI can be provided a list or array of either TQRegExp objects or of structs containing just the information necessary to create its dialog. >I am not against this, but keep in mind that if the UI has access to the underlying internal structure (for example TQRegExp objects)... The more I think about this, the more I favor passing implementation-agnostic structures to the UI. Someday we may want to change the `TQRegExp` implementation to something better (e.g. a port of [QRegularExpression]( https://dangelog.wordpress.com/2012/04/07/qregularexpression/) or our own wrapper around PCRE2). >...then we probably need extra code to make sure the pattern string and the internal structure are always in sync Already done, in revised code to be uploaded. >Uhm, many users may not even know what a vertical tab is, nor how to type it 😰 Users will never need to know either -- and that is by design. In fact, we want to make it very difficult to accidently type this internal-use-only character into a regex. If somebody wants to specify a vertical tab in a search pattern, they can use `\v`. Regarding rest of your suggestion: `|` is not a good default splitter since that's the regex alternation character. I had originally thought of using the horizontal tab character but TDE converts that to `\t` when storing a multi-pattern string. ----- I think I have implemented almost all of the other suggestions made in conversation items although I need to double-check. I need to fine-tune some of the new stuff I added in support of a UI, then I will be ready to upload a revised version of the code. Although I know that you have wanted to focus on core `TDEStringMatcher` functionality, I think think everthing I have done may make more sense if evaluated in a more complete context: `KonqListView` working with `KDirLister` and a decent UI for changing patterns. I am overdue for re-doing the UI to accommodate the additional flexibility that comes with setting per-pattern options. My old UI works for my testing but only because I am familiar with `patternString` rules.
Owner

The more I think about this, the more I favor passing implementation-agnostic structures to the UI. Someday we may want to change the TQRegExp implementation to something better

Exactly the point :-)

Regarding rest of your suggestion: | is not a good default splitter since that's the regex alternation character.

Very good point, didn't think about it. | is not a good choice at all.
The only problem I see with vertical tab is that if a user edit a config file manually in an editor, it will show up as a broken string (going to next line), so the user may collapse the string in a single row, effectively losing the vertical tab. Anyway I see you point on users not needing to edit manually in most cases, so we may as well use vertical tab, unless we find a better candidate (@SlavekB any alternative suggestion?)

Although I know that you have wanted to focus on core TDEStringMatcher functionality

Feel free to upload all the code if you wish, but the order of review will be the same :-) If possible use separate commits for eahc part (TDEStringMatcher, KDirLister, KonqListView, UI, ...) so it will be easier to review for me.

> The more I think about this, the more I favor passing implementation-agnostic structures to the UI. Someday we may want to change the TQRegExp implementation to something better Exactly the point :-) > Regarding rest of your suggestion: | is not a good default splitter since that's the regex alternation character. Very good point, didn't think about it. `|` is not a good choice at all. The only problem I see with vertical tab is that if a user edit a config file manually in an editor, it will show up as a broken string (going to next line), so the user may collapse the string in a single row, effectively losing the vertical tab. Anyway I see you point on users not needing to edit manually in most cases, so we may as well use vertical tab, unless we find a better candidate (@SlavekB any alternative suggestion?) > Although I know that you have wanted to focus on core TDEStringMatcher functionality Feel free to upload all the code if you wish, but the order of review will be the same :-) If possible use separate commits for eahc part (TDEStringMatcher, KDirLister, KonqListView, UI, ...) so it will be easier to review for me.
VinceR commented 2 years ago
Poster
Collaborator

Hi @MicheleC,

Circumstances this past month forced my attention away from this PR but I am now back at it. Sorry for the long gap.

Vince

Hi @MicheleC, Circumstances this past month forced my attention away from this PR but I am now back at it. Sorry for the long gap. Vince
Owner

Circumstances this past month forced my attention away from this PR but I am now back at it. Sorry for the long gap.

No worries @VinceR, we are all here in our spare time. No need to apologize for that :-)

> Circumstances this past month forced my attention away from this PR but I am now back at it. Sorry for the long gap. No worries @VinceR, we are all here in our spare time. No need to apologize for that :-)
VinceR added 1 commit 1 year ago
46fedcbe96
Introduced an implementation-agnostic interface to TSM centered around
VinceR commented 1 year ago
Poster
Collaborator

MicheleC,

After such a long absence, I finally have some updates. One of the problems with this amount of time passing is that it has given me ample opportunity to brainstorm and implement "improvements". Because of this, the code has changed enough that you will probably want to go through another review cycle - sorry about that.

As before, please ignore the copious TSMTRACE statements that I will remove before this gets pushed to the main branch.

All of the changes in this commit have been tested along with other changes to tdelibs/tdeio, tdebase/libkonq, and tdebase/konqueror that are needed to create a fully functional testing environment. This includes a rewritten UI for setting match patterns and associated options.

Vince

MicheleC, After such a long absence, I finally have some updates. One of the problems with this amount of time passing is that it has given me ample opportunity to brainstorm and implement "improvements". Because of this, the code has changed enough that you will probably want to go through another review cycle - sorry about that. As before, please ignore the copious TSMTRACE statements that I will remove before this gets pushed to the main branch. All of the changes in this commit have been tested along with other changes to `tdelibs/tdeio`, `tdebase/libkonq`, and `tdebase/konqueror` that are needed to create a fully functional testing environment. This includes a rewritten UI for setting match patterns and associated options. Vince
Owner

Hi @VinceR ,
thanks for the updated code. I will do a new review cycle next week (this weekend is Chinese New Year and I will be traveling) and feedback as usual.

Hi @VinceR , thanks for the updated code. I will do a new review cycle next week (this weekend is Chinese New Year and I will be traveling) and feedback as usual.
Owner

Hi @VinceR
apologies for the delay, got busy with other stuff. I will start the review tomorrow or Wednesday at most.
I wanted to let you know that since the changes in this PR are quite at the core of tdelibs, the PR won't make it in the final R14.1.0 which is due out at the end of April. We are too close to that date to make such kind of change. But there will be plenty of time to add it to R14.2.0 (we are also putting back some other stuff for the same reason).

Hi @VinceR apologies for the delay, got busy with other stuff. I will start the review tomorrow or Wednesday at most. I wanted to let you know that since the changes in this PR are quite at the core of tdelibs, the PR won't make it in the final R14.1.0 which is due out at the end of April. We are too close to that date to make such kind of change. But there will be plenty of time to add it to R14.2.0 (we are also putting back some other stuff for the same reason).
MicheleC requested changes 1 year ago
MicheleC left a comment
Owner

First part of the review done. I will continue tomorrow with the rest.

First part of the review done. I will continue tomorrow with the rest.
* Desired outcome of matching
TRUE: match succeeds if a string matches the match pattern.
FALSE: match fails if a string matches the match pattern.
Owner

Suggest rewording:
FALSE: match succeeds if a string does not match the match pattern.

Suggest rewording: `FALSE: match succeeds if a string does not match the match pattern.`
VinceR marked this conversation as resolved
'i' - Letter case variants are equivalent (e.g. case-insensitive)
'e' - All letter & number character variants are equivalent
'=' - Match succeeds if pattern matches [default]
'!' - Match fails if pattern matches (inverted match)
Owner

Suggested rewording:
'!' - Match succeeds if pattern does not match (inverted match)

Suggested rewording: '!' - Match succeeds if pattern does not match (inverted match)
VinceR marked this conversation as resolved
'=' - Match succeeds if pattern matches [default]
'!' - Match fails if pattern matches (inverted match)
Options set in option string remain in effect until subsequently overridden.
Owner

As commented in the past, it is better to not carry over options across different patterns. First it makes the code easier, but most important it makes patterns independent from each other.
If options are carried over, swapping pieces around could alter the meaning of patterns.

Also what happens if we start with "rc" as defaults and the first option is just "i"? It would result in an 'optionString' of 'rci' which is invalid.

As commented in the past, it is better to not carry over options across different patterns. First it makes the code easier, but most important it makes patterns independent from each other. If options are carried over, swapping pieces around could alter the meaning of patterns. Also what happens if we start with "rc" as defaults and the first option is just "i"? It would result in an 'optionString' of 'rci' which is invalid.
VinceR commented 1 year ago
Poster
Collaborator

With the understanding that further discussion and resolution of this particular concer will be taking place in a future discussion:

Also what happens if we start with "rc" as defaults and the first option is just "i"? It would result in an 'optionString' of 'rci' which is invalid.

Not invalid, 'i' wins!

With the understanding that further discussion and resolution of this particular concer will be taking place in a future discussion: > Also what happens if we start with "rc" as defaults and the first option is just "i"? It would result in an 'optionString' of 'rci' which is invalid. Not invalid, 'i' wins!
Owner

Indeed 'i' wins, my bad for not getting that :-(

With the understanding that further discussion and resolution of this particular concer will be taking place in a future discussion:

Let's park this point of the discussion for the time being. Later we can involved Slavek and see what he thinks about it too, but first let's settle the remianing point.

Indeed 'i' wins, my bad for not getting that :-( > With the understanding that further discussion and resolution of this particular concer will be taking place in a future discussion: Let's park this point of the discussion for the time being. Later we can involved Slavek and see what he thinks about it too, but first let's settle the remianing point.
VinceR commented 1 year ago
Poster
Collaborator

As commented in the past, it is better to not carry over options across different patterns. First it makes the code easier, but most important it makes patterns independent from each other.
If options are carried over, swapping pieces around could alter the meaning of patterns.

In an effort to settle this point, README.tdestringmatcher has been amended in the newly uploaded code. There are no more claims to support pattern option carryover although that is what happens in the code.

>As commented in the past, it is better to not carry over options across different patterns. First it makes the code easier, but most important it makes patterns independent from each other. If options are carried over, swapping pieces around could alter the meaning of patterns. In an effort to settle this point, `README.tdestringmatcher` has been amended in the newly uploaded code. There are no more claims to support pattern option carryover although that is what happens in the code.
Owner

Current README description is clearer, thanks for updating that.

Current README description is clearer, thanks for updating that.
MicheleC marked this conversation as resolved
The following is an example of a string representing a match specification list
intended to apply to file names
w .* e e* cr ~$ \\.[0-9]+
Owner

Once again, vertical tab is hard to understand and type. Yes, users may not need to know/type it, but we should try to make things easy for those who want to.
IMO, the example string is not easily undestandable, even with gitea showing the unicode values of VT rather than the character itself. Requires some thinking :-)

Once again, vertical tab is hard to understand and type. Yes, users may not need to know/type it, but we should try to make things easy for those who want to. IMO, the example string is not easily undestandable, even with gitea showing the unicode values of VT rather than the character itself. Requires some thinking :-)
VinceR commented 1 year ago
Poster
Collaborator

With the understanding that further discussion and resolution of this particular concern will be taking place in a future discussion:

Once again, vertical tab is hard to understand and type. Yes, users may not need to know/type it, but we should try to make things easy for those who want to.

I would like to direct your attention to ~/.qt/qtrc, a perfect example of a configuration file that can be edited with a text editor but is not designed for that purpose. It takes a bit of research to understand what all those comma-separated values in the font specification actually mean and I never could find the decoder ring for the palette stuff:)

IMO, the example string is not easily undestandable, even with gitea showing the unicode values of VT rather than the character itself. Requires some thinking :-)

I must admit that gitea's representation of VT is an eyesore. Text editors do a better job for of representing the character for purposes of copy/paste.

With the understanding that further discussion and resolution of this particular concern will be taking place in a future discussion: >Once again, vertical tab is hard to understand and type. Yes, users may not need to know/type it, but we should try to make things easy for those who want to. I would like to direct your attention to `~/.qt/qtrc`, a perfect example of a configuration file that can be edited with a text editor but is not designed for that purpose. It takes a bit of research to understand what all those comma-separated values in the font specification actually mean and I never could find the decoder ring for the palette stuff:) >IMO, the example string is not easily undestandable, even with gitea showing the unicode values of VT rather than the character itself. Requires some thinking :-) I must admit that gitea's representation of VT is an eyesore. Text editors do a better job for of representing the character for purposes of copy/paste.
Owner

We probably need to involve Slavek on this as well. There are definitely config files that have complex stuff in them, I have seen all sort of things over the years, but usually things are "linear". What I mean by that is that things are complex but arranged in lines or categories. When things are broken up on multiple lines with no visible indication, users who are not aware of a VT will be confused, because they would probably expect a "normal" config file.

Let me try to explain things with an example. If the file is something like this (the line break being VT in reality):

rc
  *.txt
       i
        *.bak

a general user may simply think that things are on different lines and align the config for clarity as follow:

rc
*.txt
i
*.bak

which would break the whole functionality.

We probably need to involve Slavek on this as well. There are definitely config files that have complex stuff in them, I have seen all sort of things over the years, but usually things are "linear". What I mean by that is that things are complex but arranged in lines or categories. When things are broken up on multiple lines with no visible indication, users who are not aware of a VT will be confused, because they would probably expect a "normal" config file. Let me try to explain things with an example. If the file is something like this (the line break being VT in reality): ``` rc *.txt i *.bak ``` a general user may simply think that things are on different lines and align the config for clarity as follow: ``` rc *.txt i *.bak ``` which would break the whole functionality.
Owner

HT looks better that VT and using \t in the string also avoid the rish of a user wrongly deleting or missing the tab character.

`HT` looks better that `VT` and using `\t` in the string also avoid the rish of a user wrongly deleting or missing the tab character.
MicheleC marked this conversation as resolved
TSMTRACE << "TDEGlobal::hiddenFileMatcher(): Global HFM initialization STARTED" << endl;
_hiddenFileMatcher = new TDEStringMatcher();
TDEGlobal::config()->setGroup( "General" );
char settingsDefault[5] = { 'w', SEP, '.', '*', 0 }; // wildcard match of dotfiles
Owner

Just noting how the older version "/oW/.*" was way more readable than { 'w', SEP, '.', '*', 0 }

Just noting how the older version `"/oW/.*"` was way more readable than `{ 'w', SEP, '.', '*', 0 }`
#include <tqregexp.h>
#include <kdebug.h>
typedef TQValueVector<TQRegExp> RegexList;
Owner

A TQValueList may be a better choice. TQValueVector has a capacity, so it would allocate extra memory even if there is a single element. TQValueList (being a list) does not do that.

A TQValueList may be a better choice. TQValueVector has a capacity, so it would allocate extra memory even if there is a single element. TQValueList (being a list) does not do that.
VinceR commented 1 year ago
Poster
Collaborator

A TQValueList may be a better choice.

TQValueVector was chosen to directly correspond to typedef TQValueVector<MatchSpec> MatchSpecList defined in tdecore/tdestringmatcher.cpp.

The reason TQValueVector was chosen for additional public interface to TDEStringmatcher was because it fits very nicely with the TQListBox in the UI, where each specification will be referenced with an integer.

> A TQValueList may be a better choice. `TQValueVector` was chosen to directly correspond to `typedef TQValueVector<MatchSpec> MatchSpecList` defined in `tdecore/tdestringmatcher.cpp`. The reason `TQValueVector` was chosen for additional public interface to `TDEStringmatcher` was because it fits very nicely with the `TQListBox` in the UI, where each specification will be referenced with an integer.
Owner

Ok, the reasoning being the choice is good.

Ok, the reasoning being the choice is good.
MicheleC marked this conversation as resolved
newMatchSpecs.append( optionString );
newMatchSpecs.append( matchSpec.pattern );
newRegexList.append( rxWork );
optionString = "";
Owner

I am confused (not withstanding that I think carry over options across matching spec is not a good idea - see previous comments).

optionString = ""; prevents options to be carried over to the next matchSpec, so it contradicts description in README file.

I am confused (not withstanding that I think carry over options across matching spec is not a good idea - see previous comments). `optionString = "";` prevents options to be carried over to the next matchSpec, so it contradicts description in README file.
VinceR commented 1 year ago
Poster
Collaborator

optionString = ""; prevents options to be carried over to the next matchSpec, so it >contradicts description in README file.

In regard to function setMatchSpecs( MatchSpecList newMatchSpecList )

optionString is a work variable that is reinitialized for each match specification. The function takes as input a list (array) of match patterns along with all of their associated options. For each of these, the function derives a character string that encodes those options. The function is not making any attempt to generate the "most economical" matchSpecString (e.g. by leveraging propagated option characters) - in that respect, I think it is generating what you would prefer.

Given that, on further review of the code, I would agree that the following initialization is pointless: TQString optionString = "rc" ; // start with defaults

I think our discussion of propagated option characters will be focused on the tolerance of setMatchSpecs( TQString newMatchSpecString ) for those.

>`optionString` = ""; prevents options to be carried over to the next matchSpec, so it >contradicts description in README file. In regard to function `setMatchSpecs( MatchSpecList newMatchSpecList )` `optionString` is a work variable that is reinitialized for each match specification. The function takes as input a list (array) of match patterns along with all of their associated options. For each of these, the function derives a character string that encodes those options. The function is not making any attempt to generate the "most economical" matchSpecString (e.g. by leveraging propagated option characters) - in that respect, I think it is generating what you would prefer. Given that, on further review of the code, I would agree that the following initialization is pointless: `TQString optionString = "rc" ; // start with defaults` I think our discussion of propagated option characters will be focused on the tolerance of `setMatchSpecs( TQString newMatchSpecString )` for those.
VinceR commented 1 year ago
Poster
Collaborator

Given that, on further review of the code, I would agree that the following initialization is pointless: 1TQString optionString = "rc" ; // start with defaults`

On second thought, I think I did that to seed a default state for setMatchSpecs( TQString newMatchSpecString )

Anyway, we'll figure it out in the follow-on conversation discussing this.

>Given that, on further review of the code, I would agree that the following initialization is pointless: 1TQString optionString = "rc" ; // start with defaults` On second thought, I think I did that to seed a default state for `setMatchSpecs( TQString newMatchSpecString )` Anyway, we'll figure it out in the follow-on conversation discussing this.
Owner

in that respect, I think it is generating what you would prefer.

Yes, that is my understand of the code too, ans that is why I was confused since the readme and previous discussion were pointing to the options to be propagated.

Let's discuss further on the next iteration.

> in that respect, I think it is generating what you would prefer. Yes, that is my understand of the code too, ans that is why I was confused since the readme and previous discussion were pointing to the options to be propagated. Let's discuss further on the next iteration.
""
};
if ( newMatchSpecString == p->m_matchSpecString )
Owner
if ( newMatchSpecString == p->m_matchSpecString )
    return true;

This should be at the very beginning of the method, before anything else

``` if ( newMatchSpecString == p->m_matchSpecString ) return true; ``` This should be at the very beginning of the method, before anything else
VinceR marked this conversation as resolved
for ( TQString &specification : newMatchSpecs ) {
if ( specification.find( TQChar(SEP) ) >= 0 ) {
Owner

This can't happen, because of this line above:
TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true );

This can't happen, because of this line above: `TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true );`
VinceR commented 1 year ago
Poster
Collaborator

This can't happen, because of this line above:
TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true );

You are correct, I guess I was being overly paranoid. I will flag for removal.

>This can't happen, because of this line above: `TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true );` You are correct, I guess I was being overly paranoid. I will flag for removal.
VinceR marked this conversation as resolved
break;
default:
continue; // should not arise
Owner

Although it should never happen, it would be better to flag an error and return false like in all other places.

Although it should never happen, it would be better to flag an error and return false like in all other places.
VinceR commented 1 year ago
Poster
Collaborator

Although it should never happen, it would be better to flag an error and return false like in all other places.

No it should NEVER happen because this function itself had set a legal value during the previous analysis of the associated option string. Won't handling this impossible condition be similar to what I had done when looking for SEP in a substring produced by split( SEP... )? I will go ahead and follow the recommendation but this raises another question I've been meaning to ask you.

In this and other functions, I look for various error conditions and when one is encountered, I issue a message, free some allocated memory, and return. I know that goto has been frowned upon by programming purists over the years but it seems that there might be a good case for using it for these early error exits. Consider the following example and let me know if you would flag it as bad practice:

159 bool TDEStringMatcher::setMatchSpecs( TQString newMatchSpecString )
160 {
...
232        default:
233          kdWarning() << "Error ..." << endl;
234          goto ERREXIT_setMatchSpecs;
...
311   return true;
312 ERREXIT_setMatchSpecs:
313   newMatchSpecList.clear();
314   newRegexList.clear();
315   return false;
316 }

>Although it should never happen, it would be better to flag an error and return false like in all other places. No it should NEVER happen because this function itself had set a legal value during the previous analysis of the associated option string. Won't handling this impossible condition be similar to what I had done when looking for SEP in a substring produced by `split( SEP... )`? I will go ahead and follow the recommendation but this raises another question I've been meaning to ask you. In this and other functions, I look for various error conditions and when one is encountered, I issue a message, free some allocated memory, and return. I know that `goto` has been frowned upon by programming purists over the years but it seems that there might be a good case for using it for these early error exits. Consider the following example and let me know if you would flag it as bad practice: ``` 159 bool TDEStringMatcher::setMatchSpecs( TQString newMatchSpecString ) 160 { ... 232 default: 233 kdWarning() << "Error ..." << endl; 234 goto ERREXIT_setMatchSpecs; ... 311 return true; 312 ERREXIT_setMatchSpecs: 313 newMatchSpecList.clear(); 314 newRegexList.clear(); 315 return false; 316 } ```
Owner

No it should NEVER happen because this function...

I agree, it should never happen for that exact reason. So we don't need a default case at all. But if we put it in, then it may make more sense to flag an error because it means we have a bug in the code. I am ok either way.

> No it should NEVER happen because this function... I agree, it should never happen for that exact reason. So we don't need a default case at all. But if we put it in, then it may make more sense to flag an error because it means we have a bug in the code. I am ok either way.
Owner

Forgot to comment on the use of goto: I could direct you to this famous article but simply put, there are better ways to write the same code without using goto, avoiding subtle issues that could arise when strange execution path are followed.

Forgot to comment on the use of `goto`: I could direct you to this famous [article](https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf) but simply put, there are better ways to write the same code without using `goto`, avoiding subtle issues that could arise when strange execution path are followed.
// alphanumeric character in rxWork.pattern to its "least" equivalent.
break;
default:
continue; // should not arise
Owner

Flag error, see before comment

Flag error, see before comment
newMatchSpecList.clear();
newRegexList.clear();
return false;
continue;
Owner

continue will never be reached, it can be removed

`continue` will never be reached, it can be removed
VinceR marked this conversation as resolved
processingPattern = false; // next spec should be an option string
continue;
}
Owner

rather than using continue at line 270, it would be easier to read and understand if there was an else at this point and the part below be inside its body. if-else makes immediately obvious the flow.

rather than using `continue` at line 270, it would be easier to read and understand if there was an `else` at this point and the part below be inside its body. `if-else` makes immediately obvious the flow.
VinceR commented 1 year ago
Poster
Collaborator

rather than using continue at line 270, it would be easier to read and understand if there was an else at this point and the part below be inside its body. if-else makes immediately obvious the flow.

Agreed. I will also move the shorter option processing code to the if and the pattern processing code to the else

>rather than using continue at line 270, it would be easier to read and understand if there was an else at this point and the part below be inside its body. if-else makes immediately obvious the flow. Agreed. I will also move the shorter option processing code to the `if` and the pattern processing code to the `else`
VinceR marked this conversation as resolved
{
PatternType patternType;
ANCHandling ancHandling;
bool wantMatch; // "matching" vs. "not matching"
Owner

Suggest renaming wantMatch to something like 'expectMatch'

Suggest renaming `wantMatch` to something like 'expectMatch'
VinceR marked this conversation as resolved
* Container used in a TDEStringMatcher object
* representing multiple match specifications.
*/
typedef TQValueVector<MatchSpec> MatchSpecList;
Owner

A TQValueList may be a better choice. TQValueVector has a capacity, so it would allocate extra memory even if there is a single element. TQValueList (being a list) does not do that.

A `TQValueList` may be a better choice. `TQValueVector` has a capacity, so it would allocate extra memory even if there is a single element. `TQValueList` (being a list) does not do that.
VinceR commented 1 year ago
Poster
Collaborator

A TQValueList may be a better choice. TQValueVector has a capacity, so it would allocate extra memory even if there is a single element. TQValueList (being a list) does not do that.

Repeating my response to related earlier conversation:

The reason TQValueVector was chosen for additional public interface to TDEStringmatcher was because it fits very nicely with the TQListBox in the UI, where each specification will be referenced with an integer.

>A `TQValueList` may be a better choice. `TQValueVector` has a capacity, so it would allocate extra memory even if there is a single element. `TQValueList` (being a list) does not do that. Repeating my response to related earlier conversation: The reason `TQValueVector` was chosen for additional public interface to `TDEStringmatcher` was because it fits very nicely with the `TQListBox` in the UI, where each specification will be referenced with an integer.
Owner

Understood and I am fine with the explanation.

Understood and I am fine with the explanation.
MicheleC marked this conversation as resolved
/**
@return list of currently defined match specifications.
*/
MatchSpecList getMatchSpecs();
Owner

we can make this function const or provide two versions (const/non const) if editing the return value is required.

const MatchSpecList getMatchSpecs() const;
MatchSpecList getMatchSpecs();
we can make this function `const` or provide two versions (const/non const) if editing the return value is required. ``` const MatchSpecList getMatchSpecs() const; MatchSpecList getMatchSpecs(); ```
VinceR marked this conversation as resolved
/**
@return string encoding list of currently defined match specifications.
*/
TQString getMatchSpecString();
Owner

Same comment about making this const

Same comment about making this const
VinceR marked this conversation as resolved
Utility function for converting a wildcard pattern string
to a regular expression pattern string.
*/
TQString wildcardToRegex( const TQString& wildcardPattern );
Owner

Not sure if this is used in other part of the code not yet uploaded, but in case it is not we could make this function protected

Not sure if this is used in other part of the code not yet uploaded, but in case it is not we could make this function protected
VinceR commented 1 year ago
Poster
Collaborator

Not sure if this is used in other part of the code not yet uploaded, but in case it is not we could make this function protected

I will be discussing this function further in a future conversation.

Although it potentially could be used in TDEStringmatcher, it is not tied to that class. I thought it would be nice to expose the function publicly in case others might want to use it for whatever reason.

> Not sure if this is used in other part of the code not yet uploaded, but in case it is not we could make this function protected I will be discussing this function further in a future conversation. Although it potentially could be used in `TDEStringmatcher`, it is not tied to that class. I thought it would be nice to expose the function publicly in case others might want to use it for whatever reason.
Owner

See other comment about fixing the wildcard to regex directly in TQt3 rather than duplicating the code. Thehn this comment thread will no longer apply since this function would not be in the TSM class anymore.

See other comment about fixing the wildcard to regex directly in TQt3 rather than duplicating the code. Thehn this comment thread will no longer apply since this function would not be in the TSM class anymore.
/**
Utility function for escaping all regex-specific characters.
*/
TQString escapeRegexChars( const TQString& basicString );
Owner

As above one, could it be a protected method? Or isit used in tdebase or somewhere else (don't see any usage in this PR)

As above one, could it be a protected method? Or isit used in tdebase or somewhere else (don't see any usage in this PR)
VinceR commented 1 year ago
Poster
Collaborator

As above one, could it be a protected method? Or isit used in tdebase or somewhere else (don't see any usage in this PR)

True, it is not currently being used anywhere in the code. If we decide to use the TQRegExp for matching PatternType::SUBSTRING (instead of TQString), it will become useful.

Although it potentially could be used in TDEStringmatcher, it is not tied to that class. Again, I thought it would be nice to expose it publicly in case others might want to use it for whatever reason.

> As above one, could it be a protected method? Or isit used in tdebase or somewhere else (don't see any usage in this PR) True, it is not currently being used anywhere in the code. If we decide to use the `TQRegExp` for matching `PatternType::SUBSTRING` (instead of `TQString`), it will become useful. Although it potentially could be used in `TDEStringmatcher`, it is not tied to that class. Again, I thought it would be nice to expose it publicly in case others might want to use it for whatever reason.
Owner

Ok, let's discuss again in a further iteration.

Ok, let's discuss again in a further iteration.
private:
class TDEStringMatcherPrivate;
TDEStringMatcherPrivate *p;
Owner

It is a TQt3/TDE convention to use d for the pointer to the provate class, not p. Don't know why d but it's everywhere, so good to follow that.

It is a TQt3/TDE convention to use `d` for the pointer to the provate class, not `p`. Don't know why `d` but it's everywhere, so good to follow that.
VinceR marked this conversation as resolved
};
// Use vertical tab as m_patternString separator
inline constexpr char SEP { 0x0B };
Owner

Suggest a more meaningful name, like for example "TSM_PATTERN_SEP", so when used in other files it is easier to guess the context and function of it.
Or maybe make it a static member of TDEStringMatcher, so that it can later be used as TDEStringMatcher::SEPARATOR?

Suggest a more meaningful name, like for example "TSM_PATTERN_SEP", so when used in other files it is easier to guess the context and function of it. Or maybe make it a static member of TDEStringMatcher, so that it can later be used as TDEStringMatcher::SEPARATOR?
MicheleC requested changes 1 year ago
MicheleC left a comment
Owner

Second part of review. Still need to review the last two files, which I will hopefully do tomorrow.

Second part of review. Still need to review the last two files, which I will hopefully do tomorrow.
TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl;
if ( p->m_matchSpecList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
return false; //FIXME: or should that be true per MicheleC's comment?
Owner

I think false is better. If the string to match is empty, it can't match patterns, so returning could be misleading. If I commented like that before, I was wrong or thinking something else.

I think `false` is better. If the string to match is empty, it can't match patterns, so returning could be misleading. If I commented like that before, I was wrong or thinking something else.
VinceR marked this conversation as resolved
TQString matchThis = stringToMatch;
if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE )
{
if ( equivalentString.isNull() ) {
Owner

isEmpty() is a better check than isNull()

`isEmpty()` is a better check than `isNull()`
VinceR marked this conversation as resolved
// FIXME TBD: This is where we will be converting each alphanumeric
// character in stringToMatch to its "least" equivalent and storing
// the result in equivalentString. Until then, we'll just do:
equivalentString = stringToMatch;
Owner

We can use matchThis = equivalentString; instead of equivalentString = stringToMatch; (with the equivalence conversion when FIXED) and so there is no need for the equivalentString variable at all

We can use `matchThis = equivalentString;` instead of `equivalentString = stringToMatch;` (with the equivalence conversion when FIXED) and so there is no need for the `equivalentString` variable at all
VinceR commented 1 year ago
Poster
Collaborator

We can use matchThis = equivalentString; instead of equivalentString = stringToMatch; (with the equivalence conversion when FIXED) and so there is no need for the equivalentString variable at all

I'm drawing a blank on what you are recommending. equivalentString = stringToMatch; is a merely placeholder for a future code wherein equivalentString is constructed for the first and only time for use during this and future matches requiring ANCHandling::EQUIVALENCE. Doing what you recommend now will cause matching against an empty string instead of an unconverted string.

BTW, I have implemented and successfully tested ANCHandling::EQUIVALENCE. Once we complete this review cycle, I recommend we tackle the other parts of this PR (e.g. the KDirLister changes) before reviewing my implementation of ANCHandling::EQUIVALENCE which I suspect will engender a lot of discussion :)

>We can use `matchThis = equivalentString;` instead of `equivalentString = stringToMatch;` (with the equivalence conversion when FIXED) and so there is no need for the `equivalentString` variable at all I'm drawing a blank on what you are recommending. `equivalentString = stringToMatch;` is a merely placeholder for a future code wherein `equivalentString` is constructed for the first and only time for use during this and future matches requiring `ANCHandling::EQUIVALENCE`. Doing what you recommend now will cause matching against an empty string instead of an unconverted string. BTW, I have implemented and successfully tested `ANCHandling::EQUIVALENCE`. Once we complete this review cycle, I recommend we tackle the other parts of this PR (e.g. the `KDirLister` changes) before reviewing my implementation of `ANCHandling::EQUIVALENCE` which I suspect will engender a lot of discussion :)
Owner

Ah, I now understand what you intended to do. You want to calculate the equivalent string but only if that is required. And you are doing that in the same for loop that does the matching too.
I admit it was a bit hard to grasp the logic at first.
I was about to proposed splitting the for loop into two for better clarity but after thinking about it further, I think your code is better.

Ah, I now understand what you intended to do. You want to calculate the equivalent string but only if that is required. And you are doing that in the same `for` loop that does the matching too. I admit it was a bit hard to grasp the logic at first. I was about to proposed splitting the for loop into two for better clarity but after thinking about it further, I think your code is better.
MicheleC marked this conversation as resolved
//-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl;
if ( p->m_matchSpecList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
return false; //FIXME: or should that be true per MicheleC's comment?
Owner

I think "false" is better. If the string to match is empty, it can't match patterns, so returning could be misleading. If I commented like that before, I was wrong or thinking something else.

I think "false" is better. If the string to match is empty, it can't match patterns, so returning could be misleading. If I commented like that before, I was wrong or thinking something else.
VinceR marked this conversation as resolved
TQString matchThis = stringToMatch;
if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE )
{
if ( equivalentString.isNull() ) {
Owner

isEmpty() is a better check than isNull()

`isEmpty()` is a better check than `isNull()`
VinceR marked this conversation as resolved
// FIXME TBD: This is where we will be converting each alphanumeric
// character in stringToMatch to its "least" equivalent and storing
// the result in equivalentString. Until then, we'll just do:
equivalentString = stringToMatch;
Owner

We can use matchThis = equivalentString; instead of equivalentString = stringToMatch; (with the equivalence conversion when FIXED) and so there is no need for the equivalentString variable at all

We can use `matchThis = equivalentString;` instead of `equivalentString = stringToMatch;` (with the equivalence conversion when FIXED) and so there is no need for the `equivalentString` variable at all
VinceR marked this conversation as resolved
}
if (
( p->m_regexList[index].search( matchThis ) < 0 ) // was there no match?
Owner

Suggest to rewrite in a clearer way.
Also I think there was a bug, because it was doing inverted comparsion.
If no match was found and we wanted a positive match, it was not going to return false. So the first less than comparison should be a greater or equal than comparison (correct me if I misunderstood the logic here).

Save the result of the first line in a bool variable, then use it in the if() expression. Simpler to read and understand.

bool matchFound = (p->m_regexList[index].search(matchThis) >= 0);
if (matchFound != p->m_matchSpecList[index].wantMatch)
Suggest to rewrite in a clearer way. Also I think there was a bug, because it was doing inverted comparsion. If no match was found and we wanted a positive match, it was not going to return false. So the first `less than` comparison should be a `greater or equal than` comparison (correct me if I misunderstood the logic here). Save the result of the first line in a bool variable, then use it in the if() expression. Simpler to read and understand. ``` bool matchFound = (p->m_regexList[index].search(matchThis) >= 0); if (matchFound != p->m_matchSpecList[index].wantMatch) ```
VinceR commented 1 year ago
Poster
Collaborator

Suggest to rewrite in a clearer way.

I sheepishly admit that there is a LOT more wrong with the matchAll function than this. It looks like I didn't propagate the changes I had made to the matchAny function. I will make corrections, then we can revisit this and the next conversation item concerning this function.

>Suggest to rewrite in a clearer way. I sheepishly admit that there is a LOT more wrong with the `matchAll` function than this. It looks like I didn't propagate the changes I had made to the `matchAny` function. I will make corrections, then we can revisit this and the next conversation item concerning this function.
Owner

Ok

Ok
return false;
}
if ( p->m_regexList[index].search( matchThis ) < 0 ) {
Owner

This if is not required IMO. There is the wantMatch field to choose whether a string should or should not match, and this is tested in the previous if. This if logic is the equivalent of requiring any string to match, which contractict the poinf of having wantMatch.
I guess this was added because of the comparison error explained in the if aboce.

This `if` is not required IMO. There is the `wantMatch` field to choose whether a string should or should not match, and this is tested in the previous `if`. This `if` logic is the equivalent of requiring any string to match, which contractict the poinf of having `wantMatch`. I guess this was added because of the comparison error explained in the `if` aboce.
//================================================================================================
/*
The following code is a modified copy of that found in tqt3/src/tools/qregexp.cpp.
Owner

I have not checked the code of this function, but just wondering what is the difference that requires a custom function for this.
Also why can't we just match using the "wildcard" functionality of TQRegExp?

I have not checked the code of this function, but just wondering what is the difference that requires a custom function for this. Also why can't we just match using the "wildcard" functionality of TQRegExp?
VinceR commented 1 year ago
Poster
Collaborator

I have not checked the code of this function, but just wondering what is the difference that requires a custom function for this.
Also why can't we just match using the "wildcard" functionality of TQRegExp?

Wildcards are currently handled internally by TQRegExp by converting them to regular expressions via the internal TQt function wc2rx that is not exposed in a public API (see tqt3/src/tools/qregexp.cpp ~ line 810). I do not think this tight binding of wildcard processing and a particular regular expression engine is a very good idea for the following reasons:

  • TQRegExp requires special match-time handling for wildcards, requiring use of the exactMatch() instead of search() function that would normally be used for regex's.
  • The wc2rx function has what I believe to be a bug in that it does not treat '!' as the wildcard character class negation character.
  • I think it would be beneficial to eventually upgrade our regex engine and none of the alternatives have built-in wildcard handling. Even Qt5's QRegularExpression leaves wildcard-to-regex conversions up to the caller: https://doc.qt.io/qt-5/qregularexpression.html#wildcardToRegularExpression
  • The assumption that wildcard expressions are best handled by conversion to regular expressions may not be a good one. We may eventually want to use fnmatch(3) from glibc since that would allow handling of a richer wildcard expression language.
>I have not checked the code of this function, but just wondering what is the difference that requires a custom function for this. Also why can't we just match using the "wildcard" functionality of TQRegExp? Wildcards are currently handled internally by `TQRegExp` by converting them to regular expressions via the internal TQt function `wc2rx` that is not exposed in a public API (see `tqt3/src/tools/qregexp.cpp` ~ line 810). I do not think this tight binding of wildcard processing and a particular regular expression engine is a very good idea for the following reasons: * TQRegExp requires special match-time handling for wildcards, requiring use of the `exactMatch()` instead of `search()` function that would normally be used for regex's. * The `wc2rx` function has what I believe to be a bug in that it does not treat '!' as the wildcard character class negation character. * I think it would be beneficial to eventually upgrade our regex engine and none of the alternatives have built-in wildcard handling. Even Qt5's `QRegularExpression` leaves wildcard-to-regex conversions up to the caller: https://doc.qt.io/qt-5/qregularexpression.html#wildcardToRegularExpression * The assumption that wildcard expressions are best handled by conversion to regular expressions may not be a good one. We may eventually want to use `fnmatch(3)` from `glibc` since that would allow handling of a richer wildcard expression language.
Owner

Uhm... I reserve further judgement after looking into the code, but usually wildcard in linux/bash are *, ? and []. The ! is not considered a wildcard negation. Perhaps different type of applications do, but I believe most users are not expecting ! to negate a wildcard character.

Uhm... I reserve further judgement after looking into the code, but usually wildcard in linux/bash are `*`, `?` and `[]`. The `!` is not considered a wildcard negation. Perhaps different type of applications do, but I believe most users are not expecting `!` to negate a wildcard character.
VinceR commented 1 year ago
Poster
Collaborator

The ! is not considered a wildcard negation. Perhaps different type of applications do, but I believe most users are not expecting ! to negate a wildcard character.

Personally, I've never used more than . and * wildcards but according to man 7 glob:

  • An expression [!...] matches a single character, namely any character that is not matched by the expression obtained by removing the first ! from it. (Thus, [!]a-] matches any single character except ], a, and -.)

  • Now that regular expressions have bracket expressions where the negation is indicated by a ^, POSIX has declared the effect of a wildcard pattern [^...] to be undefined.

Read further and then take a look at the extensions in man fnmatch. It looks like "wildcard" (irregular) expressions can get just as complicated as normal regular expressions. Maybe these would best be supported independently by calls to that function.

>The `!` is not considered a wildcard negation. Perhaps different type of applications do, but I believe most users are not expecting `!` to negate a wildcard character. Personally, I've never used more than `.` and `*` wildcards but according to `man 7 glob`: * *An expression `[!...]` matches a single character, namely any character that is not matched by the expression obtained by removing the first `!` from it. (Thus, `[!]a-]` matches any single character except `]`, `a`, and `-`.)* * *Now that regular expressions have bracket expressions where the negation is indicated by a `^`, POSIX has declared the effect of a wildcard pattern `[^...]` to be undefined.* Read further and then take a look at the extensions in `man fnmatch`. It looks like "wildcard" (irregular) expressions can get just as complicated as normal regular expressions. Maybe these would best be supported independently by calls to that function.
Owner

Personally, I've never used more than . and * wildcards but according to man 7 glob:

Thanks for the info, I wasn't aware of that (likewise I use pretty much just ., ?and*`.

In such case I think it would make more sense to fix the related function in TQt3 to add support for [!...] rather than duplicating and modifying the code, which would result in non consistent behavior between the two versions. What do you think?

> Personally, I've never used more than . and * wildcards but according to man 7 glob: Thanks for the info, I wasn't aware of that (likewise I use pretty much just `., `?` and `*`. In such case I think it would make more sense to fix the related function in TQt3 to add support for `[!...]` rather than duplicating and modifying the code, which would result in non consistent behavior between the two versions. What do you think?
VinceR commented 1 year ago
Poster
Collaborator

In such case I think it would make more sense to fix the related function in TQt3 to add support for [!...] rather than duplicating and modifying the code, which would result in non consistent behavior between the two versions. What do you think?

Consider the following differences between the proposed public TDEStringMatcher::wildcardToRegex() method and the internal TQRegExp wc2rx() helper function:

  1. Fix: use !, not ^ as character class negation character.
  2. Improvement: Final regex always starts with ^ and ends with $, resulting in correct wildcard matching with the TQRegExp::search() function. Currently a wc2rx-created regex requires TQRegExp::exactMatch() for correct matching.
  3. Improvement: functionality is publicly exposed to other applications that might need it independent of any use of TQRegExp or TDEStringMatcher.

Certainly all of this can be done with an update to TQRegExp but there is a potential that implementing the first 2 items might break existing applications that depend on the current incorrect idiosyncratic behavior. If you don't think this is a deal breaker, I will open an issue to request these changes.

The new code is using fnmatch(3) on GLIBC systems which leverages its own matching code that does not rely on prior conversion of a wildcard expression to regex. Even so, it might be nice to still provide a conversion utility for applications that want to go down that path.

>In such case I think it would make more sense to fix the related function in TQt3 to add support for [!...] rather than duplicating and modifying the code, which would result in non consistent behavior between the two versions. What do you think? Consider the following differences between the proposed public `TDEStringMatcher::wildcardToRegex()` method and the internal `TQRegExp` `wc2rx()` helper function: 1. Fix: use `!`, not `^` as character class negation character. 1. Improvement: Final regex always starts with `^` and ends with `$`, resulting in correct wildcard matching with the `TQRegExp::search()` function. Currently a `wc2rx`-created regex *requires* `TQRegExp::exactMatch()` for correct matching. 1. Improvement: functionality is publicly exposed to other applications that might need it independent of any use of `TQRegExp` or `TDEStringMatcher`. Certainly all of this can be done with an update to `TQRegExp` but there is a potential that implementing the first 2 items might break existing applications that *depend* on the current incorrect idiosyncratic behavior. If you don't think this is a deal breaker, I will open an issue to request these changes. The new code is using `fnmatch(3)` on GLIBC systems which leverages its own matching code that does not rely on prior conversion of a wildcard expression to regex. Even so, it might be nice to still provide a conversion utility for applications that want to go down that path.
@return whether or not @param stringToMatch matches any of
the current match specifications.
*/
bool matchAny( const TQString& stringToMatch );
Owner

This could be made a const method I guess.

This could be made a `const` method I guess.
VinceR marked this conversation as resolved
@return whether or not @param stringToMatch matches all of
the current match specifications.
*/
bool matchAll( const TQString& stringToMatch );
Owner

This could be made a const method I guess.

This could be made a `const` method I guess.
VinceR marked this conversation as resolved
VinceR commented 1 year ago
Poster
Collaborator

Hi, MichelleC,

Thanks for your review so far. I think I will go back and resolve the earlier conversations on the assumption that if I missed something from those, you will be reminding me of it during the current review cycle.

I wish I knew more about git. I am currently working on 2 independent branches, the current one you are reviewing and a more comprehensive one that I use for RnD and testing. As it is now, I need to make changes, small and large, to both branches. There must be a better way using git magic. Or maybe not :)

I am perfectly fine with not attaining the 14.1 target. I do hope the interval between future feature releases will be shorter than the one between 14.0 and 14.1.

Do you have a sense of how many people there are that actually use the current 14.1 development branch code for their desktop? I hope it's a lot because there is no better test of committed changes than actual daily use outside of the laboratory.

Vince

Hi, MichelleC, Thanks for your review so far. I think I will go back and resolve the earlier conversations on the assumption that if I missed something from those, you will be reminding me of it during the current review cycle. I wish I knew more about git. I am currently working on 2 independent branches, the current one you are reviewing and a more comprehensive one that I use for RnD and testing. As it is now, I need to make changes, small and large, to both branches. There must be a better way using git magic. Or maybe not :) I am perfectly fine with not attaining the 14.1 target. I do hope the interval between future feature releases will be shorter than the one between 14.0 and 14.1. Do you have a sense of how many people there are that actually use the current 14.1 development branch code for their desktop? I hope it's a lot because there is no better test of committed changes than actual daily use outside of the laboratory. Vince
Owner

Hi Vince,
I shall complete the review of the last two files later today (weekend was busy...) and provide an overall take of where we are so far.

There must be a better way using git magic.

For git, I usually work in the following way. It may not be the best, but it fits my workflow.

  • do R&D/testing on a local copy of the code. This is temporary and I can get rid of it anytime. Or if I don't like a change, I can simply copy back from the git original copy and restart/continue from a different point. If I am working on a big change, I create a temporary git branch and do several local commits that serves as snapshot, so once again if I mess up at some point, I can always go back to a previous snapshot. I tend to have lot of smaller commits (each with a purpose) rather than one huge commit with lots of changes in it.
  • for deployment, we have the master branch (R14.1.0-dev) and we usually deploy there.
  • then we backport to R14.0.x as needed, eventually with fine tuning if required.

I do hope the interval between future feature releases will be shorter than the one between 14.0 and 14.1.

R14.1.0 suffered from lack of planning for the first years. I would say till 2018 there was no real plan. Then we came together with a plan but we had to clean up some mess from the previous 4 years and we ended up where we are. R14.1.0 will be out at the end of April if all goes as per schedule.
For future release I would like to see a R14.x.0 release every 3 or 4 years at most and we will do planning from the very beginning. So it should be a shorter interval.

Do you have a sense of how many people there are that actually use the current 14.1 development branch code for their desktop?

To be honest I don't have any idea about that. There are for sure a number of people using it, myself included. But I think the majority are on R14.0.x.

Hi Vince, I shall complete the review of the last two files later today (weekend was busy...) and provide an overall take of where we are so far. > There must be a better way using git magic. For git, I usually work in the following way. It may not be the best, but it fits my workflow. - do R&D/testing on a local copy of the code. This is temporary and I can get rid of it anytime. Or if I don't like a change, I can simply copy back from the git original copy and restart/continue from a different point. If I am working on a big change, I create a temporary git branch and do several local commits that serves as snapshot, so once again if I mess up at some point, I can always go back to a previous snapshot. I tend to have lot of smaller commits (each with a purpose) rather than one huge commit with lots of changes in it. - for deployment, we have the master branch (R14.1.0-dev) and we usually deploy there. - then we backport to R14.0.x as needed, eventually with fine tuning if required. > I do hope the interval between future feature releases will be shorter than the one between 14.0 and 14.1. R14.1.0 suffered from lack of planning for the first years. I would say till 2018 there was no real plan. Then we came together with a plan but we had to clean up some mess from the previous 4 years and we ended up where we are. R14.1.0 will be out at the end of April if all goes as per schedule. For future release I would like to see a R14.x.0 release every 3 or 4 years at most and we will do planning from the very beginning. So it should be a shorter interval. > Do you have a sense of how many people there are that actually use the current 14.1 development branch code for their desktop? To be honest I don't have any idea about that. There are for sure a number of people using it, myself included. But I think the majority are on R14.0.x.
MicheleC requested changes 1 year ago
MicheleC left a comment
Owner

Last part of the review.

Last part of the review.
return;
TSMTRACE << " Attempting to disconnect slots from hidden file matcher (" << m_pHiddenFileMatcher << ") signals ... " << endl;
if ( (m_pHiddenFileMatcher != nullptr) && disconnect( m_pHiddenFileMatcher, 0, this, 0 ) )
Owner

Suggeest minor adjustment: move the disconnect in the if body rather than doing it in the if condition. It does not change the functionality but readability greatly benefits from it.

Suggeest minor adjustment: move the `disconnect` in the `if` body rather than doing it in the `if` condition. It does not change the functionality but readability greatly benefits from it.
VinceR marked this conversation as resolved
if ( hiddenFileMatcher == nullptr ) {
kdWarning() << "KFileItem::setHiddenFileMatcher: called with null pointer, nothing will be hidden any more" << endl;
return;
Owner

There is a logic bug here, because reEvaluateHidden(); is not called and if we set nullptr as the new file mather object, the old hidden status would remain till the pattern changes.
We need to call reEvaluateHidden(); before returning, but I would suggest a different coding style:

if (hiddenFileMatcher != nullptr)
{
   // connect the file matcher to the required slots (destroyed, patternsChanged)
   ...
}

reEvaluateHidden();

This way reEvaluateHidden(); is always called regardles of whether we are setting a new file matcher or removing an existing one.

There is a logic bug here, because `reEvaluateHidden();` is not called and if we set `nullptr` as the new file mather object, the old hidden status would remain till the pattern changes. We need to call `reEvaluateHidden();` before returning, but I would suggest a different coding style: ``` if (hiddenFileMatcher != nullptr) { // connect the file matcher to the required slots (destroyed, patternsChanged) ... } reEvaluateHidden(); ``` This way `reEvaluateHidden();` is always called regardles of whether we are setting a new file matcher or removing an existing one.
VinceR marked this conversation as resolved
Owner

Hi @VinceR,
I finished this round of review.
Good progress, many things are now in place and the design is more robust compared to the very first version from so long ago.
Most of my comments are minor adjustments and should not take long to rectify.
There are only 2 points which needs some discussion:

  1. the choice of separator character. As already expressed, the vertical tab is not much of a user friendly choice IMO, although quite unique indeed. I think the very first suggestion from you (using /) is a better choice. I know / can be part of a pattern, but I think we could simply escape that as '/' is needed. Users should be quite familiar with that since it is a common thing to do. And it avoids "broken lines"

  2. whether to make options carry forward or not. IMO, having independent pieces is easier from a programming and logic view and less prone to mistakes when moving things around.
    Having said that, see one of my comments because unlessI have misunderstood, it seems the code is not carrying options over but the readme file say so.

We are definitely going in the right direction. Although we will miss the R14.1.0 window, we can merge this into R14.2.0-dev when we are ready.

Hi @VinceR, I finished this round of review. Good progress, many things are now in place and the design is more robust compared to the very first version from so long ago. Most of my comments are minor adjustments and should not take long to rectify. There are only 2 points which needs some discussion: 1. the choice of separator character. As already expressed, the vertical tab is not much of a user friendly choice IMO, although quite unique indeed. I think the very first suggestion from you (using `/`) is a better choice. I know `/` can be part of a pattern, but I think we could simply escape that as '\/' is needed. Users should be quite familiar with that since it is a common thing to do. And it avoids "broken lines" 2. whether to make options carry forward or not. IMO, having independent pieces is easier from a programming and logic view and less prone to mistakes when moving things around. Having said that, see one of my comments because unlessI have misunderstood, it seems the code is not carrying options over but the readme file say so. We are definitely going in the right direction. Although we will miss the R14.1.0 window, we can merge this into R14.2.0-dev when we are ready.
MicheleC removed the SL/minor label 1 year ago
MicheleC added this to the R14.2.0 release milestone 1 year ago
VinceR commented 1 year ago
Poster
Collaborator

Hi MicheleC,

Addressing the separator character:

the choice of separator character. As already expressed, the vertical tab is not much of a user friendly choice IMO, although quite unique indeed.

I too am not thrilled about using <VT>. My standard go-to separator for dividing text strings is the horizontal tab (<HT>) and I had some initial concerns about using that for the divider that may not be founded. But even using <HT> won't make editing config files that easy.

Consider an example where a user wishes to define 3 regular expressions in the matcher: 1\st, 2\nd and 3\rd. Ideally, TDE would store this specification literally in the configuration file as: 1\st 2\nd 3\rd (those are horizontal tabs separating the the 3 expressions). But instead, it stores it as 1\\st\t2\\nd\t3\\rd. I don't agree with what TDE does here: why does \ need to be escaped? and why does a perfectly good and visually pleasing horizontal tab need to be converted to \t? But that's the way it works.

I think the very first suggestion from you (using /) is a better choice. I know / can be part of a pattern, but I think we could simply escape that as '/' is needed. Users should be quite familiar with that since it is a common thing to do. And it avoids "broken lines"

I don't think it is reasonable to ask users, working in a UI, to remember to escape / characters in their regular expressions. They don't have to do that anywhere else. We could post-process user input and convert unescaped instances of / to \/ but ... just to support direct editing of configuration files?

Let me propose that we use <HT> instead of <VT>. I was originally concerned that this would present a conflict when users specify \t in their regular expressions but given TDE's overzealous escaping of \, that won't be an issue.

Hi MicheleC, Addressing the separator character: > the choice of separator character. As already expressed, the vertical tab is not much of a user friendly choice IMO, although quite unique indeed. I too am not thrilled about using `<VT>`. My standard go-to separator for dividing text strings is the horizontal tab (`<HT>`) and I had some initial concerns about using that for the divider that may not be founded. But even using `<HT>` won't make editing config files that easy. Consider an example where a user wishes to define 3 regular expressions in the matcher: `1\st`, `2\nd` and `3\rd`. Ideally, TDE would store this specification literally in the configuration file as: `1\st 2\nd 3\rd` (those are horizontal tabs separating the the 3 expressions). But instead, it stores it as `1\\st\t2\\nd\t3\\rd`. I don't agree with what TDE does here: why does `\` need to be escaped? and why does a perfectly good and visually pleasing horizontal tab need to be converted to `\t`? But that's the way it works. >I think the very first suggestion from you (using `/`) is a better choice. I know `/` can be part of a pattern, but I think we could simply escape that as '\/' is needed. Users should be quite familiar with that since it is a common thing to do. And it avoids "broken lines" > I don't think it is reasonable to ask users, working in a UI, to remember to escape `/` characters in their regular expressions. They don't have to do that anywhere else. We could post-process user input and convert unescaped instances of `/` to `\/` but ... just to support direct editing of configuration files? Let me propose that we use `<HT>` instead of `<VT>`. I was originally concerned that this would present a conflict when users specify `\t` in their regular expressions but given TDE's overzealous escaping of `\`, that won't be an issue.
Owner

Let me propose that we use <HT> instead of <VT>. I was originally concerned that this would present a conflict when users specify \t in their regular expressions but given TDE's overzealous escaping of \, that won't be an issue.

is definitely preferrable compared to (see also other comment I made earlier today). Alternatively, why not using , and make it a CSV list of options and patterns?

> Let me propose that we use `<HT>` instead of `<VT>`. I was originally concerned that this would present a conflict when users specify `\t` in their regular expressions but given TDE's overzealous escaping of `\`, that won't be an issue. <HT> is definitely preferrable compared to <VT> (see also other comment I made earlier today). Alternatively, why not using `,` and make it a CSV list of options and patterns?
Owner

I have a few more comments to answer, I will do that probably on Thursday.

I have a few more comments to answer, I will do that probably on Thursday.
Owner

I have a few more comments to answer, I will do that probably on Thursday.

Done with the remaining comments :-)

> I have a few more comments to answer, I will do that probably on Thursday. Done with the remaining comments :-)
VinceR commented 1 year ago
Poster
Collaborator

Hi MicheleC,

It's been a while, so I thought I should provide a status. Despite my silence, I have been pretty busy with this PR. Once I've made some final design decisions, I will respond to the remaining conversation items and push updated code. I will also include the code that implements what I have dubbed "Alphanumeric Equivalence", which takes case-insensitivity to another level.

One of the things I have done is to develop a class that provides a simplified interface to PCRE2 that can be used as a regex engine in TDEStringMatcher instead of (or in addition to) TQRegExp. I still need to do some more development and testing, but if this turns out to be a successful endeavor, I will push updated code later for review.

Vince

Hi MicheleC, It's been a while, so I thought I should provide a status. Despite my silence, I have been pretty busy with this PR. Once I've made some final design decisions, I will respond to the remaining conversation items and push updated code. I will also include the code that implements what I have dubbed "Alphanumeric Equivalence", which takes case-insensitivity to another level. One of the things I have done is to develop a class that provides a simplified interface to PCRE2 that can be used as a regex engine in `TDEStringMatcher` instead of (or in addition to) `TQRegExp`. I still need to do some more development and testing, but if this turns out to be a successful endeavor, I will push updated code later for review. Vince
Owner

Ok, thanks for the update @VinceR. Looking forward for the new code and the next review cycle :-)

Ok, thanks for the update @VinceR. Looking forward for the new code and the next review cycle :-)
MicheleC modified the milestone from R14.2.0 release to R14.2.x 1 year ago
VinceR added 1 commit 1 year ago
0c107b54a2
Introduce TEquivChars class and an associated global instance that
VinceR commented 1 year ago
Poster
Collaborator

Let me propose that we use <HT> instead of <VT>. I was originally concerned that this would present a conflict when users specify \t in their regular expressions but given TDE's overzealous escaping of \, that won't be an issue.

is definitely preferrable compared to (see also other comment I made earlier today). Alternatively, why not using , and make it a CSV list of options and patterns?

In an effort to settle this point, the new code is using <HT> as the separator. We don't want to use , as the separator because that would require UI users who wanted to write a literal , in a pattern to write \, instead. This is not a problem for <HT> since users already know to specify \t in patterns that require it.

> > Let me propose that we use `<HT>` instead of `<VT>`. I was originally concerned that this would present a conflict when users specify `\t` in their regular expressions but given TDE's overzealous escaping of `\`, that won't be an issue. > > <HT> is definitely preferrable compared to <VT> (see also other comment I made earlier today). Alternatively, why not using `,` and make it a CSV list of options and patterns? In an effort to settle this point, the new code is using `<HT>` as the separator. We don't want to use `,` as the separator because that would require UI users who wanted to write a literal `,` in a pattern to write `\,` instead. This is not a problem for `<HT>` since users already know to specify `\t` in patterns that require it.
VinceR commented 1 year ago
Poster
Collaborator

Hi @MicheleC,

First, congratulations to you, @SlavekB, and the rest on getting 14.1 released. I am sure that was a big job and hopefully users will find the update both pleasing and bug-free.

As you can see, I have uploaded new code for this PR and also made responses to conversation items that still merited a specific response from me.

@VinceR

Hi @MicheleC, First, congratulations to you, @SlavekB, and the rest on getting 14.1 released. I am sure that was a big job and hopefully users will find the update both pleasing and bug-free. As you can see, I have uploaded new code for this PR and also made responses to conversation items that still merited a specific response from me. @VinceR
Owner

Hi @VinceR

First, congratulations to you, @SlavekB, and the rest on getting 14.1 released. I am sure that was a big job and hopefully users will find the update both pleasing and bug-free.

It's a team effort, so everybody who contributed deserves credit for it, including you :-)

As you can see, I have uploaded new code for this PR and also made responses to conversation items that still merited a specific response from me.

I will review the code sometimes this week or next and feedback as usual.

Hi @VinceR > First, congratulations to you, @SlavekB, and the rest on getting 14.1 released. I am sure that was a big job and hopefully users will find the update both pleasing and bug-free. It's a team effort, so everybody who contributed deserves credit for it, including you :-) > As you can see, I have uploaded new code for this PR and also made responses to conversation items that still merited a specific response from me. I will review the code sometimes this week or next and feedback as usual.
MicheleC requested changes 1 year ago
MicheleC left a comment
Owner

See comments

See comments
// #define TSM_PCRE2
// #define TEQUIVCHARS_HASH_LOOKUP
#ifndef TEQUIVCHARS_HASH_LOOKUP // Using binary search on sorted array
Owner

We would need to add cmake code to detect the presence of the hopscotch-map library and then define TEQUIVCHARS_HASH_LOOKUP from cmake. The package is available at least in debian, so it is indeed a good idea to use it if it is faster.

We would need to add cmake code to detect the presence of the hopscotch-map library and then define TEQUIVCHARS_HASH_LOOKUP from cmake. The package is available at least in debian, so it is indeed a good idea to use it if it is faster.
VinceR commented 1 year ago
Poster
Collaborator

We would need to add cmake code to detect the presence of the hopscotch-map library and then define TEQUIVCHARS_HASH_LOOKUP from cmake. The package is available at least in debian, so it is indeed a good idea to use it if it is faster.

First, a comment about the current code: I included everything that contains an idea that that I pursued as a possible implementation. I did that mainly to show you what ideas I tried. All of those possibilities have been tested and work. I intend to choose one of them to use for implementing implementing TSM's ANCHandling:Equivalence

Specifically:

Hash table lookup vs. binary search on pre-sorted table

The 2 methods were close in lookup speed but there was a noticeable delay while the hash table was being constructed. For a table this small, I doubt using a hash makes any sense. Using a 3rd party hash presents additional maintenance difficulty, even though it might out-perform std:unordered_map. For that reason, I want to use binary search.

The need for the replaceCharsMB() function.

This was the original code, developed before I gained a better understanding of how to properly access the 16-bit characters in a TQString. I have learned a lot since then, not only in developing this class but also in developing a class wrapper for PCRE2. Although the function works and performs reasonably well, there are a lot of problems with using the types and functions defined in wchar.h to process strings of 16-bit characters on linux systems. The final code will not have this function.

>We would need to add cmake code to detect the presence of the hopscotch-map library and then define TEQUIVCHARS_HASH_LOOKUP from cmake. The package is available at least in debian, so it is indeed a good idea to use it if it is faster. First, a comment about the current code: I included everything that contains an idea that that I pursued as a possible implementation. I did that mainly to show you what ideas I tried. All of those possibilities have been tested and work. I intend to choose one of them to use for implementing implementing TSM's `ANCHandling:Equivalence` Specifically: ##### Hash table lookup vs. binary search on pre-sorted table The 2 methods were close in lookup speed but there was a noticeable delay while the hash table was being constructed. For a table this small, I doubt using a hash makes any sense. Using a 3rd party hash presents additional maintenance difficulty, even though it might out-perform `std:unordered_map`. For that reason, I want to use binary search. ##### The need for the replaceCharsMB() function. This was the original code, developed before I gained a better understanding of how to properly access the 16-bit characters in a `TQString`. I have learned a lot since then, not only in developing this class but also in developing a class wrapper for PCRE2. Although the function works and performs reasonably well, there are a lot of problems with using the types and functions defined in `wchar.h` to process strings of 16-bit characters on linux systems. The final code will not have this function.
VinceR commented 1 year ago
Poster
Collaborator

The 2 methods were close in lookup speed

Not completely true; see code comment just before the definition of replaceChars function in tdecore/tequivchars.cpp for the complete picture.

> The 2 methods were close in lookup speed Not completely true; see code comment just before the definition of `replaceChars` function in `tdecore/tequivchars.cpp` for the complete picture.
Owner

For that reason, I want to use binary search.

I think it is a very sensible explanation and choice. A 3rd party library for such a small table would be extra maintenance work for not much benefit.

> For that reason, I want to use binary search. I think it is a very sensible explanation and choice. A 3rd party library for such a small table would be extra maintenance work for not much benefit.
class TEquivChars_Private
{
public:
#ifndef TEQUIVCHARS_HASH_LOOKUP // Using binary search on sorted array
Owner

This comment covers the whole code between lines 10 and 30 plus the tequivchars-mapping.h file.

  1. We should move the line 20-30 into the tequivchars-mapping.h file, so that the equivalence table is a well formed array. By doing so, we do not need to hard code the precise size of the map as in line 11-12, but we can derive them using sizeof statements, making the code robust to addition/deletion from the table.

  2. tequivchars-mapping.h needs to be guarded against multiple inclusions with an #ifndef-#define-#endif block.

  3. would the code still work if we use TQChar instead of w_char_t? Nothing against the latter, but given that TDE is heavily coded using TQt3 classes, it would be more natural to use TQChar if the functionality is not affected.

This comment covers the whole code between lines 10 and 30 plus the tequivchars-mapping.h file. 1. We should move the line 20-30 into the tequivchars-mapping.h file, so that the equivalence table is a well formed array. By doing so, we do not need to hard code the precise size of the map as in line 11-12, but we can derive them using sizeof statements, making the code robust to addition/deletion from the table. 2. tequivchars-mapping.h needs to be guarded against multiple inclusions with an #ifndef-#define-#endif block. 3. would the code still work if we use `TQChar` instead of `w_char_t`? Nothing against the latter, but given that TDE is heavily coded using TQt3 classes, it would be more natural to use TQChar if the functionality is not affected.
VinceR commented 1 year ago
Poster
Collaborator

We should move the line 20-30 into the tequivchars-mapping.h file, so that the equivalence table is a well formed array. By doing so, we do not need to hard code the precise size of the map as in line 11-12, but we can derive them using sizeof statements, making the code robust to addition/deletion from the table.

I understand what you are asking but hard-coding table dimensions apparently is required somewhere. Following your suggestion, I defined the table in tequivchars-mapping.h as EquivalentsTable[2993][2]. Doing that allows me to compute table rows using sizeof(). But I was completely unsuccessful when I tried EquivalentsTable[][2], getting error messages such as "flexible array member ... is not a end of class ..." and "initializer for flexible array member".

tequivchars-mapping.h needs to be guarded against multiple inclusions with an #ifndef-#define-#endif block.

I will do this but this file is not a true header file but a code snippet, that is included exactly once in the only code that needs it.

would the code still work if we use TQChar instead of w_char_t?

I think that would make the code more cumbersome and possibly less efficient . I do use TQChar when stepping through the input string and when producing the output string.

That said, I now think that using w_char_t is not a great idea. That's a 32-bit value on linux systems and we are dealing with 16 bit characters. I am going to change it to unsigned short

>We should move the line 20-30 into the tequivchars-mapping.h file, so that the equivalence table is a well formed array. By doing so, we do not need to hard code the precise size of the map as in line 11-12, but we can derive them using sizeof statements, making the code robust to addition/deletion from the table. I understand what you are asking but hard-coding table dimensions apparently is required *somewhere*. Following your suggestion, I defined the table in `tequivchars-mapping.h` as `EquivalentsTable[2993][2]`. Doing that allows me to compute table rows using `sizeof()`. But I was completely unsuccessful when I tried `EquivalentsTable[][2]`, getting error messages such as "flexible array member ... is not a end of class ..." and "initializer for flexible array member". >tequivchars-mapping.h needs to be guarded against multiple inclusions with an #ifndef-#define-#endif block. I will do this but this file is not a true header file but a code snippet, that is included exactly once in the only code that needs it. >would the code still work if we use TQChar instead of w_char_t? I think that would make the code more cumbersome and possibly less efficient . I do use TQChar when stepping through the input string and when producing the output string. That said, I now think that using `w_char_t` is not a great idea. That's a 32-bit value on linux systems and we are dealing with 16 bit characters. I am going to change it to `unsigned short`
Owner

But I was completely unsuccessful when I tried EquivalentsTable[][2]

Something like this should work and give the correct number of rows and columns

int rows = sizeof(EquivalentsTable)/sizeof(EquivalentsTable[0]);
int cols = sizeof(EquivalentsTable[0])/sizeof(EquivalentsTable[0][0]);

I will do this but this file is not a true header file but a code snippet, that is included exactly once in the only code that needs it.

Yes, correct. My original comment was meant as a follow up to make the table a well formed array.

I think that would make the code more cumbersome and possibly less efficient . I do use TQChar when stepping through the input string and when producing the output string.
That said, I now think that using w_char_t is not a great idea. That's a 32-bit value on linux systems and we are dealing with 16 bit characters. I am going to change it to unsigned short

You are using TQChar in part of the string code. Why use a different type (unsigned short, w_char_t, ...) that will result in more (possibly implicit) conversions between types each time? For the record, TQChar uses an nsigned short internally, but using TQChar explicitly will make the code more robust. TQString/TQChar internal implementation could be changed without affecting this PR.

> But I was completely unsuccessful when I tried EquivalentsTable[][2] Something like this should work and give the correct number of rows and columns ``` int rows = sizeof(EquivalentsTable)/sizeof(EquivalentsTable[0]); int cols = sizeof(EquivalentsTable[0])/sizeof(EquivalentsTable[0][0]); ``` > I will do this but this file is not a true header file but a code snippet, that is included exactly once in the only code that needs it. Yes, correct. My original comment was meant as a follow up to make the table a well formed array. > I think that would make the code more cumbersome and possibly less efficient . I do use TQChar when stepping through the input string and when producing the output string. That said, I now think that using w_char_t is not a great idea. That's a 32-bit value on linux systems and we are dealing with 16 bit characters. I am going to change it to unsigned short You are using TQChar in part of the string code. Why use a different type (unsigned short, w_char_t, ...) that will result in more (possibly implicit) conversions between types each time? For the record, TQChar uses an nsigned short internally, but using TQChar explicitly will make the code more robust. TQString/TQChar internal implementation could be changed without affecting this PR.
};
#ifndef TEQUIVCHARS_HASH_LOOKUP // Using binary search on sorted array
inline wchar_t TEquivChars_Private::lookup( wchar_t key )
Owner

The lookup() code is fine, but perhaps we could consider using std::binary_search()?

The `lookup()` code is fine, but perhaps we could consider using std::binary_search()?
VinceR commented 1 year ago
Poster
Collaborator

The lookup() code is fine, but perhaps we could consider using std::binary_search()?

Unless I am overlooking something, that returns only a boolean value. I need a row number.

>The lookup() code is fine, but perhaps we could consider using std::binary_search()? Unless I am overlooking something, that returns only a boolean value. I need a row number.
Owner

You are correct, I didn't check the std::binary_search signature before. Very surprising indeed, as often you want to find the place in the array where an element is.
bsearch() from stdlib does that, so you may want to take a look. The lookup() code is perfectly fine, just longer.

You are correct, I didn't check the std::binary_search signature before. Very surprising indeed, as often you want to find the place in the array where an element is. [`bsearch()`](https://en.cppreference.com/w/c/algorithm/bsearch) from stdlib does that, so you may want to take a look. The `lookup()` code is perfectly fine, just longer.
/*
There are 2 implementations of table lookup within this fuction: one that
uses hash table lookup and another that uses binary search on a presorted
table. We default using the binary search implementation for these reasons:
Owner

We default using the binary search implementation for these reasons

I miss the point 😕
If we already tested and figured out that using the hopscotch-map library is slower and uses more CPU in most cases and therefore we default to the binary search, why making the code more complicated by including the code for the hopscotch-map library at all? Surely it's a nice exercise to test it and compare the performances, but once there is a clear winner, it is preferrable to keep the code as simple and readable as possible 😃

> We default using the binary search implementation for these reasons I miss the point 😕 If we already tested and figured out that using the hopscotch-map library is slower and uses more CPU in most cases and therefore we default to the binary search, why making the code more complicated by including the code for the hopscotch-map library at all? Surely it's a nice exercise to test it and compare the performances, but once there is a clear winner, it is preferrable to keep the code as simple and readable as possible 😃
TQString TEquivChars::replaceChars( TQString inputQstring, bool isRegex )
{
int inStrLen = inputQstring.length();
TQString outString = TQString::fromLatin1( "" );
Owner

Given we know the length of the input string and we want to replace characters with equivalent ones, we should reserve space in the outString from the beginning to make the code more efficient. See documentation for TQString::reserve()

Given we know the length of the input string and we want to replace characters with equivalent ones, we should reserve space in the `outString` from the beginning to make the code more efficient. See documentation for `TQString::reserve()`
bool startedCharClass = false; // Previous character was starting '[' of character class
bool inCharacterClass = false; // [___]
bool inPosixBracketExpr = false; // [:___:]
#ifdef TSM_PCRE2
Owner

We would need to add cmake code to detect whether PCRE2 regex are supported and define TSM_PCRE2 accordingly

We would need to add cmake code to detect whether PCRE2 regex are supported and define TSM_PCRE2 accordingly
VinceR commented 1 year ago
Poster
Collaborator

We would need to add cmake code to detect whether PCRE2 regex are supported and define TSM_PCRE2 accordingly

I will do that once I introduce the class wrapping PCRE2. Since I went to some effort to develop and test the additional code to support PCRE2's enhanced regex, I would like to keep the #ifdef's in place.

>We would need to add cmake code to detect whether PCRE2 regex are supported and define TSM_PCRE2 accordingly I will do that once I introduce the class wrapping PCRE2. Since I went to some effort to develop and test the additional code to support PCRE2's enhanced regex, I would like to keep the #ifdef's in place.
Owner

Sounds good, as long as we do it before merging the PR 🙂

Sounds good, as long as we do it before merging the PR 🙂
bool inDirective = false; // (*___)
bool inGroupName = false; // (?<___>
#endif // TSM_PCRE2
wchar_t currChar = 0;
Owner

As per other comment, is it possible to use TQChar instead of w_char_t?

As per other comment, is it possible to use TQChar instead of w_char_t?
MicheleC marked this conversation as resolved
wchar_t prevChar = 0;
wchar_t nextChar = 0;
for ( int i = 0 ; i < inStrLen ; i++ , outString += TQChar(currChar) )
Owner

outString += TQChar(currChar)

If we pre-reserve the space in the outString, this should be efficient already. But you can also consider direct indexing-n-modifying into the string, as per TQString::reserver() example. Or you can also consider using a TQTextOStream on the string.

> outString += TQChar(currChar) If we pre-reserve the space in the outString, this should be efficient already. But you can also consider direct indexing-n-modifying into the string, as per `TQString::reserver()` example. Or you can also consider using a TQTextOStream on the string.
} // isRegex
CONVERT:
Owner

This label is not needed. At best, it should be made into a comment.

This label is not needed. At best, it should be made into a comment.
// No need to "handle" exception, *currchar was not changed
}
#else // TEQUIVCHARS_HASH_LOOKUP => Use binary search
currChar = p->lookup(currChar);
Owner

Consider using std::binary_search() as mentioned in a previous comment.

Consider using std::binary_search() as mentioned in a previous comment.
the one that replaced it.
*/
TQString TEquivChars::replaceCharsMB( TQString inputQstring, bool isRegex )
Owner

Coming from a point of view of keeping the code as simple as possible, the question is whether replaceCharsMB is needed or not. Is there a specific use case that replaceChars cannot handled and therefore replaceCharsMB is needed? If the answer is yes, we should consider using only replaceCharsMB. If the answer is no, we should consider using only replaceChars.

Coming from a point of view of keeping the code as simple as possible, the question is whether `replaceCharsMB` is needed or not. Is there a specific use case that `replaceChars` cannot handled and therefore `replaceCharsMB` is needed? If the answer is yes, we should consider using only `replaceCharsMB`. If the answer is no, we should consider using only `replaceChars`.
expression and the alphabetical characters inside Posix bracket [::]
expressions are left as-is
*/
TQString replaceChars( TQString inputString, bool isRegex = false );
Owner

Suggest we make inputString a const reference const TQString &inputString)

Suggest we make `inputString` a const reference `const TQString &inputString)`
Alternative implementation of replaceChars function that uses some
"multibyte string" / "wide character" functions defined in wchar.h.
*/
TQString replaceCharsMB( TQString inputString, bool isRegex = false );
Owner

Suggest we make inputString a const reference const TQString &inputString)

Suggest we make `inputString` a const reference `const TQString &inputString)`
Owner

@VinceR
I finally reviewed your latest commit and provided feedback about it.
Step by step, we are moving closer to the point where the PR will be ready for merging.

To summarize where we are and to make sure I did not forget anything, the next steps would be:

  1. to review the equivalence class code based on latest feedback
  2. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed
  3. finally review the whole PR and close the remaining pending points.

After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is? And do we need more code on top of this PR and the one in tdebase? It has been a long time, so I lost a bit of clarity at where we stand in regards to the overall initial idea.

@VinceR I finally reviewed your latest commit and provided feedback about it. Step by step, we are moving closer to the point where the PR will be ready for merging. To summarize where we are and to make sure I did not forget anything, the next steps would be: 1. to review the equivalence class code based on latest feedback 2. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed 3. finally review the whole PR and close the remaining pending points. After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is? And do we need more code on top of this PR and the one in tdebase? It has been a long time, so I lost a bit of clarity at where we stand in regards to the overall initial idea.
Owner

Let me propose that we use <HT> instead of <VT>. I was originally concerned that this would present a conflict when users specify \t in their regular expressions but given TDE's overzealous escaping of \, that won't be an issue.

is definitely preferrable compared to (see also other comment I made earlier today). Alternatively, why not using , and make it a CSV list of options and patterns?

In an effort to settle this point, the new code is using <HT> as the separator. We don't want to use , as the separator because that would require UI users who wanted to write a literal , in a pattern to write \, instead. This is not a problem for <HT> since users already know to specify \t in patterns that require it.

Ok, I think it is a good compromise. <HT> is definitely more readable than <VT> and your observation on \t makes sense. I didn't see any new code for it though, so maybe something you still haven't published.

> > > Let me propose that we use `<HT>` instead of `<VT>`. I was originally concerned that this would present a conflict when users specify `\t` in their regular expressions but given TDE's overzealous escaping of `\`, that won't be an issue. > > > > <HT> is definitely preferrable compared to <VT> (see also other comment I made earlier today). Alternatively, why not using `,` and make it a CSV list of options and patterns? > > In an effort to settle this point, the new code is using `<HT>` as the separator. We don't want to use `,` as the separator because that would require UI users who wanted to write a literal `,` in a pattern to write `\,` instead. This is not a problem for `<HT>` since users already know to specify `\t` in patterns that require it. > Ok, I think it is a good compromise. `<HT>` is definitely more readable than `<VT>` and your observation on `\t` makes sense. I didn't see any new code for it though, so maybe something you still haven't published.
VinceR commented 1 year ago
Poster
Collaborator

To summarize where we are and to make sure I did not forget anything, the next steps would be:

  1. to review the equivalence class code based on latest feedback

I will take a look at the feedback and respond in the next week.

  1. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed

Take a look at tdestringmatcher.{h,cpp} There were a number of changes in these, some of which implemented ANCHandling::EQUIVALENCE

  1. finally review the whole PR and close the remaining pending points.

Yes, I need to push the remaining changes to:

tdeio/tdeio/kdirlister.h
tdeio/tdeio/kdirlister.cpp
tdeio/tdeio/kdirlister_p.h
tdeio/tdeio/tdefilefilter.h
tdeio/tdeio/tdefilefilter.cpp
tdeio/tdefile/tdediroperator.cpp
tdeio/tdefile/tdefiletreebranch.cpp

These will be a lot more straightforward to review, I promise :)

After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is?

I still need to push the latest changes for that PR

And do we need more code on top of this PR and the one in tdebase? It has been a long time, so I lost a bit of clarity at where we stand in regards to the overall initial idea.

Well ... I've been unit testing a new class that wraps PCRE2. I would eventually like to use it instead of TQRegExp for TDEStringMatcher because it supports a richer regex language, it's well supported, it's faster, and does not have the TQRegExp bugs. But in the interest of avoiding more scope creep in this PR, I should probably introduce that in a new PR.

> To summarize where we are and to make sure I did not forget anything, the next steps would be: > 1. to review the equivalence class code based on latest feedback I will take a look at the feedback and respond in the next week. > 2. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed Take a look at `tdestringmatcher.{h,cpp}` There were a number of changes in these, some of which implemented `ANCHandling::EQUIVALENCE` > 3. finally review the whole PR and close the remaining pending points. > Yes, I need to push the remaining changes to: ``` tdeio/tdeio/kdirlister.h tdeio/tdeio/kdirlister.cpp tdeio/tdeio/kdirlister_p.h tdeio/tdeio/tdefilefilter.h tdeio/tdeio/tdefilefilter.cpp tdeio/tdefile/tdediroperator.cpp tdeio/tdefile/tdefiletreebranch.cpp ``` These will be a lot more straightforward to review, I promise :) > After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is? I still need to push the latest changes for that PR >And do we need more code on top of this PR and the one in tdebase? It has been a long time, so I lost a bit of clarity at where we stand in regards to the overall initial idea. Well ... I've been unit testing a new class that wraps PCRE2. I would eventually like to use it instead of `TQRegExp` for `TDEStringMatcher` because it supports a richer regex language, it's well supported, it's faster, and does not have the `TQRegExp` bugs. But in the interest of avoiding more scope creep in this PR, I should probably introduce that in a new PR.
VinceR commented 1 year ago
Poster
Collaborator

Ok, I think it is a good compromise. <HT> is definitely more readable than <VT> and your observation on \t makes sense. I didn't see any new code for it though, so maybe something you still haven't published.

tdestringmatcher.h line 57: inline constexpr char PatterStringDivider { '\t' }; and
referenced in a few other modules (namespace TSM).

But I do see an error: it should be spelled PatternStringDivider

> Ok, I think it is a good compromise. `<HT>` is definitely more readable than `<VT>` and your observation on `\t` makes sense. I didn't see any new code for it though, so maybe something you still haven't published. `tdestringmatcher.h` line 57: `inline constexpr char PatterStringDivider { '\t' };` and referenced in a few other modules (namespace TSM). But I do see an error: it should be spelled `PatternStringDivider`
Owner

Ok, I think it is a good compromise. <HT> is definitely more readable than <VT> and your observation on \t makes sense. I didn't see any new code for it though, so maybe something you still haven't published.

tdestringmatcher.h line 57: inline constexpr char PatterStringDivider { '\t' }; and
referenced in a few other modules (namespace TSM).

But I do see an error: it should be spelled PatternStringDivider

The screenshot below is taken from the current code on this PR on gitea. I don't see the PatterStringDivider.

And as well I still see this comment where the equivalence conversion should be:

// FIXME TBD: This is where we will be converting each alphanumeric
// character in stringToMatch to its "least" equivalent and storing
// the result in equivalentString. Until then, we'll just do:

Please double checked you pushed all the changes :-)

> > Ok, I think it is a good compromise. `<HT>` is definitely more readable than `<VT>` and your observation on `\t` makes sense. I didn't see any new code for it though, so maybe something you still haven't published. > > `tdestringmatcher.h` line 57: `inline constexpr char PatterStringDivider { '\t' };` and > referenced in a few other modules (namespace TSM). > > But I do see an error: it should be spelled `PatternStringDivider` The screenshot below is taken from the current code on this PR on gitea. I don't see the PatterStringDivider. And as well I still see this comment where the equivalence conversion should be: ``` // FIXME TBD: This is where we will be converting each alphanumeric // character in stringToMatch to its "least" equivalent and storing // the result in equivalentString. Until then, we'll just do: ``` Please double checked you pushed all the changes :-)
Owner
  1. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed

Take a look at tdestringmatcher.{h,cpp} There were a number of changes in these, some of which implemented ANCHandling::EQUIVALENCE

See previous comment about not seeing the changes :-)

  1. finally review the whole PR and close the remaining pending points.

Yes, I need to push the remaining changes to:
forward to review, I promise :)

After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is?

I still need to push the latest changes for that PR

Thaanks for refreshing my memory

Well ... I've been unit testing a new class that wraps PCRE2. I would eventually like to use it instead of TQRegExp for TDEStringMatcher because it supports a richer regex language, it's well supported, it's faster, and does not have the TQRegExp bugs. But in the interest of avoiding more scope creep in this PR, I should probably introduce that in a new PR.

Yes, a follow up PR is probably a good idea

> > 2. integrate the equivalence class object into the string matcher, to make sure the characters are replaced by their equivalent form when needed > > Take a look at `tdestringmatcher.{h,cpp}` There were a number of changes in these, some of which implemented `ANCHandling::EQUIVALENCE` See previous comment about not seeing the changes :-) > > > 3. finally review the whole PR and close the remaining pending points. > > > > Yes, I need to push the remaining changes to: forward to review, I promise :) > > > After that, we should move at reviewing the rest of the code that will build on this PR. In that respect, does TDE/tdebase#270 need any modification or is it good to go as is? > > I still need to push the latest changes for that PR Thaanks for refreshing my memory > Well ... I've been unit testing a new class that wraps PCRE2. I would eventually like to use it instead of `TQRegExp` for `TDEStringMatcher` because it supports a richer regex language, it's well supported, it's faster, and does not have the `TQRegExp` bugs. But in the interest of avoiding more scope creep in this PR, I should probably introduce that in a new PR. Yes, a follow up PR is probably a good idea
VinceR commented 1 year ago
Poster
Collaborator

Please double checked you pushed all the changes :-)

Well I'm not sure how this happened, but apparently I committed and pushed only the new files and not the updated existing files.

In order to not confuse things too much, I will review your feedback on the new files, make changes accordingly to them, and then push everything again as merged commit.

> Please double checked you pushed all the changes :-) Well I'm not sure how this happened, but apparently I committed and pushed only the **new** files and not the updated existing files. In order to not confuse things too much, I will review your feedback on the new files, make changes accordingly to them, and then push everything again as merged commit.
VinceR commented 1 year ago
Poster
Collaborator

In order to not confuse things too much, I will review your feedback on the new files, make changes accordingly to them, and then push everything again as merged commit.

OK, I have done this and am ready to push a replacement commit. Since I don't want to destroy context, I will await your responses to my responses to your conversation items before I do the push.

>In order to not confuse things too much, I will review your feedback on the new files, make changes accordingly to them, and then push everything again as merged commit. OK, I have done this and am ready to push a replacement commit. Since I don't want to destroy context, I will await your responses to my responses to your conversation items before I do the push.
Owner

OK, I have done this and am ready to push a replacement commit. Since I don't want to destroy context, I will await your responses to my responses to your conversation items before I do the push.

I am done with replying to the various points.
Maybe it could be wiser to push the code in a new PR and close off this one? That way we preserve the comments here and we start a cleaner discussion for the review of the new code? What do you think?

> OK, I have done this and am ready to push a replacement commit. Since I don't want to destroy context, I will await your responses to my responses to your conversation items before I do the push. I am done with replying to the various points. Maybe it could be wiser to push the code in a new PR and close off this one? That way we preserve the comments here and we start a cleaner discussion for the review of the new code? What do you think?
Owner

Discussion continues on PR #209.

@VinceR should we mark this PR as closed since we continue on the new version?

Discussion continues on PR #209. @VinceR should we mark this PR as closed since we continue on the new version?
MicheleC removed this from the R14.2.x milestone 11 months ago
MicheleC changed title from WIP: Address issue 270 - Extend meaning of "Hidden Files" (PR version 3) to WIP: Extend meaning of "Hidden Files" (PR version 3) 11 months ago
Poster
Collaborator

Discussion continues on PR #209.

@VinceR should we mark this PR as closed since we continue on the new version?

Yes, I will mark it closed now.

> Discussion continues on PR #209. > > @VinceR should we mark this PR as closed since we continue on the new version? Yes, I will mark it closed now.
VinceR closed this pull request 11 months ago
VinceR deleted branch issue/270/tdelibs-V3 11 months ago

Reviewers

MicheleC requested changes 1 year ago
This pull request cannot be reopened because the branch was deleted.
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date

No due date set.

Dependencies

No dependencies set.

Reference: TDE/tdelibs#179
Loading…
There is no content yet.