WIP: Replacement for PR #183 #196

This is a cleaned up version of PR #183, for review.

It is not meant to be merged yet.

Michele,

I tested the cleaned up code and it works. I needed to continue your cleanup in order to restore inadvertently broken dictionary sort feature.

I have a few questions / comments on the cleanup:

As you can see, I tend to prefer spaces in function definitions and calls, as well as underscores in variable names. I do that because I find that more readable. However, I will adhere to the "when in Rome, do as Romans do" principle. Is there a TDE coding style manual that I should consult?
A lot of the comments you removed were to help me keep track of where things were or observations/questions to my inexperienced self as I learned more about the code. Those of course should be removed. But there are a few that I feel should be left intact for future coders. We can discuss those later.

Ready for next steps :)

Vince

...Is there a TDE coding style manual that I should consult?

https://mirror.git.trinitydesktop.org/gitea/TDE/tde/wiki/Code-formatting-style-discussion
Still a draft I believe, but It's a start.

...But there are a few that I feel should be left intact for future coders. We can discuss those later.

You might want to discuss this on the dev channel instead, https://jabb.im/reg/
the channel's name is: tde-devs

Michel,

It looks my attempt to upload a patch to your reworked/feature/konqlistview++14.1 branch unintentionally updated another branch.

Sorry, this was a GIT-newbie mistake -- I did a "git push origin" with reworked/feature/konqlistview++14.1 checked out. To avoid speculation, how should I have done this correctly?

Vince

Sorry, this was a GIT-newbie mistake -- I did a "git push origin" with reworked/feature/konqlistview++14.1 checked out. To avoid speculation, how should I have done this correctly?

@VinceR
see comment here

Hi @VinceR,

Greg (@cethyel) has already pointed you to the intended TDE code style. It is not yet applied, so as you can see the TDE code is still quite messily (un)formatted. I am slowly working on a code formatting effort to enforce code style at some point, but it won't be finished any time soon. So for the time being, the "when in Rome, do as Romans do" principle is the best way and what I usually do.

But there are a few that I feel should be left intact for future coders.

Sure, we can talk here or on the developer chat room that Greg pointed you too.

@VinceR
Thanks for fixing the missing rename in my original version. I have now reworked the tree so I have integrated your latest commit into the original commit where it belongs.
You will need to update your local version of the branch to make sure it is aligned with the new one. The easiest way may be to rename the branch on your side ('git branch -m ') and check out the new branch again. This way you can see they are exactly the same 😄

@VinceR
I am back at this PR. A few comments to discuss together so that we can further advance this PR (sorry for the long post).

with alternate sorting, one of the two columns is fixed to "Filename". I think it would be even better if we get rid of this limitation and we let the user alternate between the last two columns that were clicked. It should be a relatively small change and extend the functionality further.
during testing alternate sorting, at the beginning when no column was clicked, Ctrl+S would do nothing and the user would be left wondering what's happening. It will be good to select two columns arbitrarily at the very beginning (for example column 0 and 1) if none was previously saved in the settings.
Dictionary sorter as it is implemented at the moment is a no-go. It needs to be reworked. There are three main reasons for that:

a. design: 'sortString' is used to store the name of an item in 'updateContents()' and then used in 'compare()' to decide the order of items. Although it works, this is a weak design and not recommended.
b. efficiency: dictionary sorting does a lot of string processing (cut and paste char by char). This is no problem with few items, but if you have a big folder with thousands of files, this is likely to have performance implications since string operations are not cheap.
c. functionality: dictionary sorting is looking only at the first non-letter non-digit number for ordering the items. While this may cover most of the cases, it does not work for those cases where the difference is after the first character. For example consider the case a~aa, "a~a a", a~ab, a~a+a, a~a~a. I believe (I have not tested directly) that this will "neutralize" the dictionary sorting logic, since the current algorighm will find the first '~' char and then sort based on the rest of the item name, which is exactly what dictionary sorting is trying to fix.

It could be a good idea to split this PR into two parts. First we work on alternate sorting only, fix point 1. and 2. and merge into Konqueror code. Then we can work on the dictionary sorting algorithm and fix the problems in 3. What do you think?

PS: I am not trying to discourage your effort, on the contrary I want to help to improve the PR 😄 For dictionary sorting implementation, I can guide you through when we come to fix that.

@VinceR I am back at this PR. A few comments to discuss together so that we can further advance this PR (sorry for the long post). 1. with alternate sorting, one of the two columns is fixed to "Filename". I think it would be even better if we get rid of this limitation and we let the user alternate between the last two columns that were clicked. It should be a relatively small change and extend the functionality further. 2. during testing alternate sorting, at the beginning when no column was clicked, Ctrl+S would do nothing and the user would be left wondering what's happening. It will be good to select two columns arbitrarily at the very beginning (for example column 0 and 1) if none was previously saved in the settings. 3. Dictionary sorter as it is implemented at the moment is a no-go. It needs to be reworked. There are three main reasons for that: * a. design: 'sortString' is used to store the name of an item in 'updateContents()' and then used in 'compare()' to decide the order of items. Although it works, this is a weak design and not recommended. * b. efficiency: dictionary sorting does a lot of string processing (cut and paste char by char). This is no problem with few items, but if you have a big folder with thousands of files, this is likely to have performance implications since string operations are not cheap. * c. functionality: dictionary sorting is looking only at the first non-letter non-digit number for ordering the items. While this may cover most of the cases, it does not work for those cases where the difference is after the first character. For example consider the case a~aa, "a~a a", a~ab, a~a+a, a~a~a. I believe (I have not tested directly) that this will "neutralize" the dictionary sorting logic, since the current algorighm will find the first '~' char and then sort based on the rest of the item name, which is exactly what dictionary sorting is trying to fix. It could be a good idea to split this PR into two parts. First we work on alternate sorting only, fix point 1. and 2. and merge into Konqueror code. Then we can work on the dictionary sorting algorithm and fix the problems in 3. What do you think? PS: I am not trying to discourage your effort, on the contrary I want to help to improve the PR 😄 For dictionary sorting implementation, I can guide you through when we come to fix that.

By the way, there is also issue #177, which refers to an existing patch that was created for Dolphin. Does it make sense to take this patch into account as well?

By the way, there is also issue #177, which refers to an existing patch that was created for Dolphin. Does it make sense to take this patch into account as well?

The two topics are related, so it is probably good to have a general look of that patch too. But in terms of implementation I expect the two code bases to be quite different, so probably we can't simply take that patch and put it here.

Michele,

In partial response to your previous email:

with alternate sorting, one of the two columns is fixed to "Filename". I think it would be even better if we get rid of this limitation and we let the user alternate between the last two columns that were clicked. It should be a relatively small change and extend the functionality further.

I completely agree and that was how I had initially intended to do things. It's been a few years, but I recall encountering difficulties when columns were rearranged due to the fact that 'Filename' was being treated differently than the others. My understanding of how things work has come a long way. Could you describe the small change and and can play around with that this weekend.

during testing alternate sorting, at the beginning when no column was clicked, Ctrl+S would do nothing and the user would be left wondering what's happening. It will be good to select two columns arbitrarily at the very beginning (for example column 0 and 1) if none was previously saved in the settings.

I also noticed that. Again, I recall having difficulty initializing default values for those settings in the event that they weren't stored on disk. But is seems that even when those preferences have been stored, they don't seem to take effect with a new listview until the user does that initial column click. Do you see any obvious areas where I should be re-reading the on-disk settings and/or calling updateContents()?

It could be a good idea to split this PR into two parts. First we work on alternate sorting only, fix point 1. and 2. and merge into Konqueror code. Then we can work on the dictionary sorting algorithm and fix the problems in 3. What do you think?

I agree, especially given your concerns :) Should that become a separate PR? I do have responses your three comments, but I will save them for later when we return to reviewing and working on the "dictionary sort" feature.

Vince

By the way, there is also issue #177, which refers to an existing patch that was created for Dolphin. Does it make sense to take this patch into account as well?

The two topics are related, so it is probably good to have a general look of that patch too. But in terms of implementation I expect the two code bases to be quite different, so probably we can't simply take that patch and put it here.

Now we are talking "fancy" sorting!

Using Gnu Coreutils sort as a reference, the sort proposed in #177 is comparable to "sort --version-sort". The dictionary sort feature I had proposed in this PR is roughly comparable to "sort --dictionary-order" except that I only deal with the 1st character and the final (tie-breaking) collation is different.

One observation I have about the patch referenced in #177: The string mangling necessary necessary to accomplish the objective is done in the compare function, which sort will call many times. I was concerned enough about this to move that key generation function to updateContents() which is called mainly when a directory is changed.

I do think that having a flexible array of extended sorting features will be useful to those (such as myself) that who use long descriptive file names for documents.

Hi @VinceR,
thanks for the answer.
I will put this PR aside for one or two weeks and I will come back to it after the release of R14.0.10, which we are closing off in the coming days.
At that point I can revisit the code and help out on the various points.
In the meantime we can discuss various topics related to ir.

Could you describe the small change and and can play around with that this weekend.

You are using two variables to store the columns to use for primary and alternate sorting, so it should be enough to update those two variables each time a user click on a different column. That is why I think it should be a relatively small fix.

I agree, especially given your concerns :) Should that become a separate PR?

Probably it is the wiser way forward. We can keep this PR for reference, but create a first PR for alternate sorting, merge it and later work on the dictionary sorting and the "fancy" patch too.

with alternate sorting, one of the two columns is fixed to "Filename". I think it would be even better if we get rid of this limitation and we let the user alternate between the last two columns that were clicked. It should be a relatively small change and extend the functionality further.

during testing alternate sorting, at the beginning when no column was clicked, Ctrl+S would do nothing and the user would be left wondering what's happening. It will be good to select two columns arbitrarily at the very beginning (for example column 0 and 1) if none was previously saved in the settings.

OK, hopefully these suggestions were successfully and safely addressed in last commit that I just pushed.

And hopefully that push doesn't cause the headaches that my last one did :)

Hi @VinceR, sorry for the long delay, too many thigns to do and not enought time.
Just want to let you know that I have made a list of things I want to do/close in the next 4 months and this PR is part of the list. Not top priority, but it is there.
Please be patient a bit more 😄

No problem, @MicheleC. In the mean time, I will try to implement #177.

No problem, @MicheleC. In the mean time, I will try to implement #177.

Sounds good (y)

@VinceR
fyi, I should be back to this PR either next week or the following one 😄

@VinceR
let's get back to this PR and get things moving forward!!

Going back to this comment and this other one, as I proposed let's focus on the alternate sorting first, then we can look at the dictionary sorting which needs more work.

Could you create a new PR with only the code for alternate sorting? You can use this PR as the starting point and just use the part of code required for alternate sorting. Please use a different branch for the PR to keep things clean (gitea may behave strangely when the same branch is used for two different PRs).

I promise I will look at the new PR in a short time once ready, don't want to drag this another 6 months 😄

I have rebased this PR on top of current master, so make sure you get the latest version of this branch before you start looking into it again. The contents are the same, just the commit hash are different.

MicheleC: since my last message ~6 months ago, I have migrated my home systems from 14.0.x to 14.1 with all of changes made for this PR. I encountered no issues while actively leveraging the new listview features. So, I think the code is solid :)

Based on your concerns, I just pulled out the code that implements "Dictionary Order Sort" and gave it a quick test in a chrooted test environment to make sure that nothing else broke. I will push that change shortly.

There is one side effect of pulling out that feature: Turning off the "Group Hidden First" property may appear to do nothing due to the fact that 'dot' files will still sort together before files that start with an alphanumeric character. The difference will only be visible when sorting in reverse or when there are file names that start with a lower non-alphanumeric character (e.g. ' ', '!', … ,'+', '-'). Dictionary sort order did make the effects more pronounced.

Hi @VinceR,
good to hear you are on R14.1 and there are no issues, it will make integration easier since we work out of master when merging code.

Regarding the dictionaly sort, my concers are not for the functionality, but in the way it is implemented, that will need to be re looked at.

What I was suggesting was to leave this PR untouched for reference and create a separate PR only for the alternate sorting so we can merge that first, then later come back here for the dictionary sorting. What do you think?

OK, so the new PR will contain the sort actions "Alternate Sort Order" (Ctrl+S) and "Reverse Sort Order" (Ctrl+R).

Can it also contain the switchable sort options "Group Directories First" and "Group Hidden First" or should those go into a separate (3rd) PR?

OK, so the new PR will contain the sort actions "Alternate Sort Order" (Ctrl+S) and "Reverse Sort Order" (Ctrl+R).

Can it also contain the switchable sort options "Group Directories First" and "Group Hidden First" or should those go into a separate (3rd) PR?

Yes, that can be included in the same PR.

PR #240 has been merged, we can now start looking into dictionary sorting.
This comment lists the issue with the original dictionary sorting code.
Point 3.c should be the first to be addressed, then we can look at how to implement it efficiently (3.a and 3.b).
@VinceR how shall we proceed on this? Happy to hear your suggestions.

c. functionality: dictionary sorting is looking only at the first non-letter non-digit number for ordering the items. While this may cover most of the cases, it does not work for those cases where the difference is after the first character.

This is a reasonable point. I probably should have described the sort option "Ignore Leading Special Char" instead of the more ambitious "Dictionary Sort Order". There were reasons for that limited focus, including (for example) collating .Xresources next to Xresources when "Group Hidden First" is unchecked.

Before I start modifying TDE code, please take a look at the following code to see if I am on the right track:

#define IS_DICT_CHAR(c) (c.isLetterOrNumber()||c.isSpace())
#define PLACE_HOLDER_CHAR 1

TQString mkDictKey( TQString inKey )
{
  TQString outKey;
  uint keyLength = inKey.length();
  uint inKeyPos, outKeyPos = 0;
  outKey.reserve( keyLength * 2 );

  // Pass 1: Extract only dictionary relevant characters
  for( inKeyPos = 0; inKeyPos < keyLength; inKeyPos++ ) {
    TQChar curChar = inKey.at(inKeyPos);
    if ( IS_DICT_CHAR(curChar) ) {
      outKey += curChar;
    }
  }

  if ( outKey.length() == 0 ) {
    return(inKey);
  }

  /*
     The output key now contains only the dictionary characters from the
     input key. We extend that string with the non-dictionary characters
     from the input key interspersed with a neutral placeholder character
     for each dictionary character. Hopefully that will produce a unique
     output string that will collate predictably.
  */

  // Pass 2: Extract non-dictionary relevant characters + placeholders
  for( inKeyPos = 0; inKeyPos < keyLength; inKeyPos++ ) {
    TQChar curChar = inKey.at(inKeyPos);
    if ( IS_DICT_CHAR(curChar) ) {
      outKey += (TQChar) PLACE_HOLDER_CHAR ; // Append placeholder char
    }
    else {
      outKey += curChar; // Append non-dictionary character
    }
  }

  return outKey ;
}

The function can be called to pre-generate a sort key (like I did originally) or called on the fly in a sort comparison function.

> c. functionality: dictionary sorting is looking only at the first non-letter non-digit number for ordering the items. While this may cover most of the cases, it does not work for those cases where the difference is after the first character. This is a reasonable point. I probably should have described the sort option "Ignore Leading Special Char" instead of the more ambitious "Dictionary Sort Order". There were reasons for that limited focus, including (for example) collating .Xresources next to Xresources when "Group Hidden First" is unchecked. Before I start modifying TDE code, please take a look at the following code to see if I am on the right track: ``` #define IS_DICT_CHAR(c) (c.isLetterOrNumber()||c.isSpace()) #define PLACE_HOLDER_CHAR 1 TQString mkDictKey( TQString inKey ) { TQString outKey; uint keyLength = inKey.length(); uint inKeyPos, outKeyPos = 0; outKey.reserve( keyLength * 2 ); // Pass 1: Extract only dictionary relevant characters for( inKeyPos = 0; inKeyPos < keyLength; inKeyPos++ ) { TQChar curChar = inKey.at(inKeyPos); if ( IS_DICT_CHAR(curChar) ) { outKey += curChar; } } if ( outKey.length() == 0 ) { return(inKey); } /* The output key now contains only the dictionary characters from the input key. We extend that string with the non-dictionary characters from the input key interspersed with a neutral placeholder character for each dictionary character. Hopefully that will produce a unique output string that will collate predictably. */ // Pass 2: Extract non-dictionary relevant characters + placeholders for( inKeyPos = 0; inKeyPos < keyLength; inKeyPos++ ) { TQChar curChar = inKey.at(inKeyPos); if ( IS_DICT_CHAR(curChar) ) { outKey += (TQChar) PLACE_HOLDER_CHAR ; // Append placeholder char } else { outKey += curChar; // Append non-dictionary character } } return outKey ; } ``` The function can be called to pre-generate a sort key (like I did originally) or called on the fly in a sort comparison function.

Hi @VinceR
I think we first need to define exactly what we want to achieve with dictionary sorting. What is the objective?

sort ignoring only leading special (non alphanumerical) characters?
sort ignoring all special characters?
others? please specify

I also think the name "dictionary sorting" is a bit unclear, so probably as you suggested "ignore special (laeading) characters" may be a better fit.

Once we define what we what to implement, we will be clearer on how to proceed. One key point is to avoid string concatenations as much as possible. Those are expensive operations, so adding one character at a time to a string is not recommended (which is what you were doing in the original solution and also in the code above).

@SlavekB
would also be good to hear you opinion on the objective

Hi @VinceR
I think we first need to define exactly what we want to achieve with dictionary sorting. What is the objective?

sort ignoring only leading special (non alphanumerical) characters?

sort ignoring all special characters?

others? please specify

Adding Objective 0. sort ignoring first leading special character.

My first cut was to implement objective (0) - an admittedly limited scope motivated partially by the desire to be able to ignore the '.' in dotfiles when collating. I interpreted your concern with this limited scope as suggesting we implement objective (2), so that's what I tried to do. I was not considering objective (1).

I also think the name "dictionary sorting" is a bit unclear, so probably as you suggested "ignore special (laeading) characters" may be a better fit.

The term "dictionary sorting" was inspired by the Gnu Coreutils sort option --dictionary-order.

Once we define what we what to implement, we will be clearer on how to proceed. One key point is to avoid string concatenations as much as possible. Those are expensive operations, so adding one character at a time to a string is not recommended (which is what you were doing in the original solution and also in the code above).

Of course I have not looked at TQt source code but I am honestly surprised that TQString & operator+= ( TQChar c ) would be so expensive in a preallocated string. How does one build a TQString efficiently one TQChar at a time?

@SlavekB
would also be good to hear you opinion on the objective

Hi @VinceR
I think we first need to define exactly what we want to achieve with dictionary sorting. What is the objective?

sort ignoring only leading special (non alphanumerical) characters?

sort ignoring all special characters?

others? please specify

Adding Objective 0. sort ignoring first leading special character.

My first cut was to implement objective (0) - an admittedly limited scope motivated partially by the desire to be able to ignore the '.' in dotfiles when collating. I interpreted your concern with this limited scope as suggesting we implement objective (2), so that's what I tried to do. I was not considering objective (1).

Thank you, objective 2 seem like a good idea.

I also think the name "dictionary sorting" is a bit unclear, so probably as you suggested "ignore special (laeading) characters" may be a better fit.

The term "dictionary sorting" was inspired by the Gnu Coreutils sort option --dictionary-order.

Thank you. From the name Dictionary I first didn't understand what it means. If we manage to find the name that will be more obvious, it would be good. If not, here it is good that it has the same name as in other tools for sorting.

Once we define what we what to implement, we will be clearer on how to proceed. One key point is to avoid string concatenations as much as possible. Those are expensive operations, so adding one character at a time to a string is not recommended (which is what you were doing in the original solution and also in the code above).

Of course I have not looked at TQt source code but I am honestly surprised that TQString & operator+= ( TQChar c ) would be so expensive in a preallocated string. How does one build a TQString efficiently one TQChar at a time?

I assume this task can be good job for regular expression.

@SlavekB
would also be good to hear you opinion on the objective

Hi @VinceR,
Objective 2 seems good to me too.
TQString copies over the existing string and add the new characters when operator += is used. You can check the code if you want to see the details.
To get a proper string order without doing too much copying operations, you could preallocate enough bytes for the full strings and build up a required key string on the spot, this way you only copy characters onces.
Even better would be to find a numerical growing function that you can build by only reading the string without copying anything.
I will think a bit more on the topic and see if I can find a good idea. Will write back in a few days, hopefully with some smart ideas :-)

Hi @VinceR,
I spent some time to think about an effective solution and I think I found one that does not require copy any character at all, just reading.
To compared two entries we will do a smart char to char comparison, with up to two passes.

In the first pass, you compared only standard characters (letters, numbers and possibly underscore, space and dash since they are commonly used to separate words) and you skip over any other special char. For example:

aab   < aac  : trivial
aa~b  < aac  : the ~ is ignored and 'b' will be compared against 'c'
a#b~c < ab^c : 'b' compares with 'b' and 'c' with 'c'

You can start from the beginning of the name and move one char at a time, but skip over special chars. As soon as you detect a string is < or > of the other, comparison stops, no need to continue till the end of the string. Also there is no need to execute the second pass in this case.

If the two strings are equal at the end of the first pass (same number of normal chars and same chars), then you execute the second pass, where you compare only the special chars of the two strings. The comparison process is the same as the first pass, but looking at the special chars instead. For example:

a@b < a~b : trivial
a@ab < aa~b : first pass will give identical strings, second pass '@' compares against '~'

This way we can compare strings in "dictionary order" without having to do any writing at all.
If you have any question or if the explanation is not clear, let me know and I will be happy to provide more info.

I am sorry for delay in responding but I had to take a "Covid" break over the holidays :(

First to respond to SlavekB

Thank you. From the name Dictionary I first didn't understand what it means. If we manage to find the name that will be more obvious, it would be good. If not, here it is good that it has the same name as in other tools for sorting.

I have to agree that the term "Dictionary Order" may not be very meaninful to most people nowadays. Maybe the proposed sort option would be better named "Alphanumeric sort" or "Ignore special characters".

Now to respond to solution sketched by MicheleC

I have been assuming that pre-constructing a sort key associated with a file name would be more efficient because it would only need to be done once (in KonqListViewItem::updateContents) and reused in the more frequently invoked comparison function (KonqBaseListViewItem::compare) which ends up calling the TQString::localeAwareCompare function on the generated keys to do the final comparison.

Your method suggests bypassing construction of a sort key and relying only on the comparison function to accomplish things. Based on the statements in https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison, it would appear that (contrary to my intuition) this general approach would be more efficient. Also implementation of "natural sort" as suggested by PR #177 might not be so amenable to a pre-generated sort key approach.

However: what type of comparison function would you suggest using for each pair of characters? If you use the comparison operators '==', '>' or '<', you would not being doing the locale-aware comparison that users might be expecting. Even if TQChar had an appropriate comparison function, I would be suspicious of it because of how surrogate character pairs would be handled.

Maybe the solution is to allocate an array of TQChar, build it, and convert it to a TQString before returning result. I will play around with that and get back to you.

I am sorry for delay in responding but I had to take a "Covid" break over the holidays :( First to respond to SlavekB >Thank you. From the name Dictionary I first didn't understand what it means. If we manage to find the name that will be more obvious, it would be good. If not, here it is good that it has the same name as in other tools for sorting. I have to agree that the term "Dictionary Order" may not be very meaninful to most people nowadays. Maybe the proposed sort option would be better named "Alphanumeric sort" or "Ignore special characters". Now to respond to solution sketched by MicheleC I have been assuming that pre-constructing a sort key associated with a file name would be more efficient because it would only need to be done once (in KonqListViewItem::updateContents) and reused in the more frequently invoked comparison function (KonqBaseListViewItem::compare) which ends up calling the TQString::localeAwareCompare function on the generated keys to do the final comparison. Your method suggests bypassing construction of a sort key and relying only on the comparison function to accomplish things. Based on the statements in https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison, it would appear that (contrary to my intuition) this general approach would be more efficient. Also implementation of "natural sort" as suggested by PR https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/issues/177 might not be so amenable to a pre-generated sort key approach. However: what type of comparison function would you suggest using for each pair of characters? If you use the comparison operators '==', '>' or '<', you would not being doing the locale-aware comparison that users might be expecting. Even if TQChar had an appropriate comparison function, I would be suspicious of it because of how surrogate character pairs would be handled. Maybe the solution is to allocate an array of TQChar, build it, and convert it to a TQString before returning result. I will play around with that and get back to you.

Hi @VinceR,

I have been assuming that pre-constructing a sort key associated with a file name would be more efficient because it would only need to be done once

with or without a sort key, you need to compare strings. The steps to do the comparison are the same, the only difference would be what you would be comparing (string or key). In case there are no special characters, key and sort key would be the same string. Building a sort key takes more time and resources, so it is not as efficient in this context. And an additional problem would be to keep the sort keys in sync with the strings to sort.

What type of comparison function would you suggest using for each pair of characters?

You can use operator[] to get unicode TQChar from the strings, then use operator< for comparison. I also recommend you look at a stable sort algorithm, so if two strings match, their relative order remains unchanged - see here for a bried explanation of the differences between the two methods.
TQString supports local enconding through TQTextCodec (see TQTextCodec class), so there shouldn't be any problem with locale awareness. It would be no difference from what is already implemented in Konqueror at the moment.

Maybe the solution is to allocate an array of TQChar, build it, and convert it to a TQString before returning result. I will play around with that and get back to you.

This once again would not be efficient. Keep in mind we may have to sort folders with thousand on files inside, so the more efficiente the solution, the better.

Hi MicheleC,

At the risk of sounding stubborn and/or dense :) … for the sake of argument, let's ignore the current task at hand (alphanumeric-centric comparison) and the pros and cons of pre-computing & storing a custom comparison key versus deferring everything to a sort-called comparison function.

I am concerned that your suggested method of comparing character-by-character would effectively force all filename sorting to LC_COLLATE="C" or to use older terminology, ASCII sort order. To quote from TQT docs:

int operator> ( TQChar c1, TQChar c2 )
Returns TRUE if the numeric Unicode value of c1 is greater than that of c2; otherwise returns FALSE.

I don't see how TQTextCodec would help. How would a character-by-character comparison method guarantee:

"graße" sorts before "grate" when LC_COLLATE="de_DE.utf8"
"graße" sorts after "grate" when LC_COLLATE="C"
"jet" sorts before "yet" when LC_COLLATE="en_US.utf8"
"jet" sorts after "yet" when LC_COLLATE="lt_LT.utf8"

Currently TDE Konqueror tries to honor locale-specific sorting, albeit with what I consider to be a bug (soon to be reported). It does so by using TQString::localeAwareCompare() on whole strings. There is no comparable function for TQChar.

What am I overlooking here?

Hi Vince,

at the risk of sounding stubborn and/or dense :)

no worries, discussing things together is always a good thing :-)

Basically there are two different things. One is what to sort and the other is how to sort.

what to sort: whether we sort strings or we create sort keys (which are strings too) as in your original code, in the end we have to compare them to order the items. Creating sort keys involve reading the original strings, do some processing and store the sort keys. This requires extra processing time and extra resources (memory) to hold the sort keys. And if the sort keys were built concatenating one char at a time, it would be even more inefficient. Reading the original strings is the solution with less processing/resource requirements.
how to sort: this is where locale awareness comes in. I am not an expert, but the key point is for the sort function to be aware of the locate and order chars (so strings too) based on that. I agree with you that TQChar::operator< only looks at the uncode values, so I guess we need to use a different function that sort characters based on the locale selected. Maybe we need to add a locale aware comparison function for TQChar too ;-)

How would a character-by-character comparison method guarantee:
"graße" sorts before "grate" when LC_COLLATE="de_DE.utf8"
"graße" sorts after "grate" when LC_COLLATE="C"

As said above, if we have a locale aware comparison function, in one case 'ß' will be ordered before 't' and in the other after.

Doing char-by-char comparison is not the problem, it is how we order them. Does this make sense? :-)

If locale aware comparison requires ordering group of adjacent characters, then we may have to go for a solution that use a sort string, but in that case we need to avoid continuouos concatenation and simply copy the required characters at most once.

Is the current sorting in Konqueror/TDE, already locale aware?

Note: I haven't yet looked at your proposed solution on TDE/tdebase#253, I need to find some time for it. But it would help to have your feedback on the above point even before I look at that.

If locale aware comparison requires ordering group of adjacent characters, then we may have to go for a solution that use a sort string, but in that case we need to avoid continuouos concatenation and simply copy the required characters at most once.

Is the current sorting in Konqueror/TDE, already locale aware?

Hey MicheleC,

I think I've been living in Konqueror listview code too long:) Anyway, here is what I discovered during my explorations the past few years:

Konqueror sorts listviews upon directory change, sort column change, and sort option changes. These calls are in konqueror/konqlistviewitems.cpp and take the form of "mpListView->sort()". I was never able to discover where that sort() method was implemented, so I don't know what sort algorithm is being used. But I did discover indirectly that it uses a callback function to conduct the actual comparison of 2 sort column values.

That callback function is KonqBaseListViewItem::compare and is implemented in konqueror/konqlistviewitems.cpp. You can see from its code that it implements course grouping (e.g. directories first), numeric value based comparison (for sort columns Size, Modified, Created, Accessed), and at the bottom, text value based comparison for every other type of sort column.

The text value comparisons are all made between strings, not individual characters. Specifically, Konqueror is calling TQString::localeAwareCompare either directly or indirectly, falling back on TQString::compare when necessary:

konqueror/konq_listviewitems.cpp ~ KonqBaseListViewItem::compare

 if ( m_pListViewWidget->caseInsensitiveSort() )
   return text( col ).lower().localeAwareCompare( k->text( col ).lower() );
 else {
   return m_pListViewWidget->m_pSettings->caseSensitiveCompare( text( col ), k->text( col ) );

libkonq/konq_settings.cpp ~ KonqFMSettings::caseSensitiveCompare

if ( d->localeAwareCompareIsCaseSensitive ) {
    return a.localeAwareCompare( b );
}
else // can't use localeAwareCompare, have to fallback to normal TQString compare
    return a.compare( b );

TQString::localeAwareCompare (which on Linux systems I believe is simply calling strcoll) is already doing all of the hard work of managing the complexity of language-specific collations and string comparisons. I also believe that a great deal of effort has gone into making it (and strcoll) as efficient as possible. I would be reluctant to avoid using it or to reinvent portions of it. But that means we would have to utilize strings or substrings to pass to it.

A final observation:

The standard KonqBaseListViewItem::compare function is called many times during a sort. In one case where a directory had 13 items and was being sorted by file name, that function was called 63 times! It is important to keep that routine as slim (compute vs. memory) as possible when adding new functionality to Konqueror. How to do that for the new sort option is the question.

> If locale aware comparison requires ordering group of adjacent characters, then we may have to go for a solution that use a sort string, but in that case we need to avoid continuouos concatenation and simply copy the required characters at most once. > > Is the current sorting in Konqueror/TDE, already locale aware? Hey MicheleC, I think I've been living in Konqueror listview code too long:) Anyway, here is what I discovered during my explorations the past few years: Konqueror sorts listviews upon directory change, sort column change, and sort option changes. These calls are in *konqueror/konqlistviewitems.cpp* and take the form of "*mpListView->sort()*". I was never able to discover where that sort() method was implemented, so I don't know what sort algorithm is being used. But I did discover indirectly that it uses a callback function to conduct the actual comparison of 2 sort column values. That callback function is *KonqBaseListViewItem::compare* and is implemented in *konqueror/konqlistviewitems.cpp*. You can see from its code that it implements course grouping (e.g. directories first), numeric value based comparison (for sort columns Size, Modified, Created, Accessed), and at the bottom, text value based comparison for every other type of sort column. The text value comparisons are all made between strings, not individual characters. Specifically, Konqueror is calling **TQString::localeAwareCompare** either directly or indirectly, falling back on **TQString::compare** when necessary: **<p align="center">konqueror/konq_listviewitems.cpp ~ KonqBaseListViewItem::compare</p>** if ( m_pListViewWidget->caseInsensitiveSort() ) return text( col ).lower().localeAwareCompare( k->text( col ).lower() ); else { return m_pListViewWidget->m_pSettings->caseSensitiveCompare( text( col ), k->text( col ) ); **<p align="center">libkonq/konq_settings.cpp ~ KonqFMSettings::caseSensitiveCompare</p>** if ( d->localeAwareCompareIsCaseSensitive ) { return a.localeAwareCompare( b ); } else // can't use localeAwareCompare, have to fallback to normal TQString compare return a.compare( b ); **TQString::localeAwareCompare** (which on Linux systems I believe is simply calling *strcoll*) is already doing all of the hard work of managing the complexity of language-specific collations and string comparisons. I also believe that a great deal of effort has gone into making it (and *strcoll*) as efficient as possible. I would be reluctant to avoid using it or to reinvent portions of it. But that means we would have to utilize strings or substrings to pass to it. A final observation: The standard *KonqBaseListViewItem::compare* function is called many times during a sort. In one case where a directory had 13 items and was being sorted by file name, that function was called 63 times! It is important to keep that routine as slim (compute vs. memory) as possible when adding new functionality to Konqueror. How to do that for the new sort option is the question.

Note: I haven't yet looked at your proposed solution on TDE/tdebase#253, I need to find some time for it. But it would help to have your feedback on the above point even before I look at that.

The issue centers around the rationale and implementation of the special front-end comparison KonqFMSettings::caseSensitiveCompare (code quoted in previous message). Although I don't want to leave this current PR unworked for too long, I would like to know that you and SlavekB agree that the newly reported issue is indeed a bug and that maybe my proposed code changes help advance things to a resolution.

Although I don't want to leave this current PR unworked for too long

Hi @VinceR, same here, don't want this to wait for 6 months like before.
I have started too many things in recent weeks and I am now going through one at a time to complete them. This and #253 are on the list, so I will check and write back shortly, either this week or more likely next week.

Hi @VinceR
I think we first need to define exactly what we want to achieve with dictionary sorting. What is the objective?

sort ignoring only leading special (non alphanumerical) characters?

sort ignoring all special characters?

others? please specify

Adding Objective 0. sort ignoring first leading special character.

My first cut was to implement objective (0) - an admittedly limited scope motivated partially by the desire to be able to ignore the '.' in dotfiles when collating. I interpreted your concern with this limited scope as suggesting we implement objective (2), so that's what I tried to do. I was not considering objective (1).

Quoting from my comment in issue #252

I am interested in understanding why you were expecting sort order # 2 for case sensitive sorting.

This is the actual sort order provided with the en_US.utf8 locale. This is the order returned by TQString::localeAwareCompare and strcoll functions. This ordering is not arbitrary but standardized by ICU. Here is what I like about it:

It collates alphabetic characters together, including case variants and accented variants.

It has a consistant general ordering: special/punctuation characters sort before numerals which sort before alphabetic characters.

It has one more benefit that affects pull request #196 which may render it moot. I will describe that later in another message.

In working with this PR and issue #252 and its associated PR #253, I've gotten a lot of hands-on experience with what I have been calling locale-aware sorting and comparison.

I recently noticed that if you are working with a true language-specific locale (e.g. 'en_US@utf8' versus 'POSIX' or 'C') and files are sorted using locale-aware comparison, then objectives 0 and 1 described above seem to already be satisfied. I have not done any testing, but I suspect that objective 2 may also satisfied in a good-enough way. If that is the case, then this PR becomes unnecessary!.

So why didn't I see that when I first came up with the notion of dictionary-order sorting? Well many years ago my system administrator (me) set LC_COLLATE=POSIX systemwide for reasons that are no longer known. As a result, file name sorting was always bound to unicode codepoint order even when "Case Insensitive Sorting" was checked. Getting rid of that LC_COLLATE variable and letting things default to $LANG was a real eye-opener. Things like 'ls' displayed files in a much more reasonable order. And GTK file-open dialogs went from obnoxious to simply annoying.

Even we decide to close this PR, the discussions we've had will definitely be valuable when addressing issue #177 which cannot be so simply addressed.

> > Hi @VinceR > > I think we first need to define exactly what we want to achieve with dictionary sorting. What is the objective? > > 1. sort ignoring _only_ leading special (non alphanumerical) characters? > > 2. sort ignoring _all_ special characters? > > 3. others? please specify > > > > Adding Objective 0. sort ignoring *first* leading special character. > > My first cut was to implement objective (0) - an admittedly limited scope motivated partially by the desire to be able to ignore the '.' in dotfiles when collating. I interpreted your concern with this limited scope as suggesting we implement objective (2), so that's what I tried to do. I was not considering objective (1). > ----- Quoting from my comment in issue https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/issues/252#issuecomment-17677 > > I am interested in understanding why you were expecting sort order # 2 for case sensitive sorting. > > This is the actual sort order provided with the en_US.utf8 locale. This is the order returned by TQString::localeAwareCompare and strcoll functions. This ordering is not arbitrary but standardized by ICU. Here is what I like about it: > > 1. It collates alphabetic characters together, including case variants and accented variants. > 2. It has a consistant general ordering: special/punctuation characters sort before numerals which sort before alphabetic characters. > 3. It has one more benefit that affects pull request https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/pulls/196 which may render it moot. I will describe that later in another message. > ----- In working with this PR and issue https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/issues/252 and its associated PR https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/pulls/253, I've gotten a lot of hands-on experience with what I have been calling locale-aware sorting and comparison. I recently noticed that if you are working with a true language-specific locale (e.g. 'en_US@utf8' versus 'POSIX' or 'C') and files are sorted using locale-aware comparison, then objectives 0 and 1 described above seem to already be satisfied. I have not done any testing, but I suspect that objective 2 may also satisfied in a good-enough way. If that is the case, then this PR becomes unnecessary!. So why didn't I see that when I first came up with the notion of dictionary-order sorting? Well many years ago my system administrator (me) set LC_COLLATE=POSIX systemwide for reasons that are no longer known. As a result, file name sorting was always bound to unicode codepoint order even when "Case Insensitive Sorting" was checked. Getting rid of that LC_COLLATE variable and letting things default to $LANG was a real eye-opener. Things like 'ls' displayed files in a much more reasonable order. And GTK file-open dialogs went from obnoxious to simply annoying. Even we decide to close this PR, the discussions we've had will definitely be valuable when addressing issue #177 which cannot be so simply addressed.

If that is the case, then this PR becomes unnecessary!.

ah ah, makes sense. Let's leave this open for the time being, till we work out #252 / #253.

Hi @VinceR
I would like to do some housekeeping. Given we are continuing the discussion on PR #253, would it be ok to close this PR and remove the associated branch?

Hi @VinceR
I would like to do some housekeeping. Given we are continuing the discussion on PR #253, would it be ok to close this PR and remove the associated branch?

Yes, the "dictionary sorting" feature I had proposed is already available when LC_COLLATE is set to (MY_REAL_LOCALE) instead of 'C' or 'POSIX". So do close the PR & branch.

Even we decide to close this PR, the discussions we've had will definitely be valuable when addressing issue #177 which cannot be so simply addressed.

Just so.

Closed as agreed.