ksmserver: unreliable session restoration times, possible timeouts #255

Open
opened 2 years ago by jstolarek · 10 comments
Collaborator

This is a follow-up to 246#issuecomment-17484.

I have found myself regularly running into very long logging in times, with "Loading settings" step of the splash screen taking up to 20 seconds. After some debugging I discovered that this is related to restoring session. In particular, this behaviour seems to be affected by the contents of $HOME/.trinity/share/config/ksmserverrc file and the files in $HOME/.trinity/share/config/session/ directory. I managed to restore good copies of those files from a backup dated on 21 November 2021. However, logging in times still seem very inconsistent.

I've run experiments by switching to "Restore saved session" in session manager and then saving a session that contained: Konqueror, Akregator, Kmail, Yakuake, Firefox, Amarok and Konsole. In addition to those programs, which I run explicitly, session manager also saved twin (which I understand stores information about particular windows) and pulseaudio. I then repeatedly logged out and in and measured the time between me hitting enter in TDM and program windows appearing on the screen. In the best case scenario this takes 4 seconds. More frequently though there is a noticeable freeze during the "Loading settings" step, in which case logging in takes around 8-9 seconds.

This is a regression compared to late November/early December 2021. My good backup is from 21 November 2021 and I resolved a problem I was originally facing in #246 around 5th December 2021. At this point I was definitely not affected by this bug: as stated in this comment restoring session consistently took around 4 seconds.

Since this is non-deterministic my first guess at the source of the problem would be concurrency and possibly some race condition. However without being able to see any logs this is only a guess and I'm afraid I can't deduce anything more.

This is a follow-up to [246#issuecomment-17484](https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/issues/246#issuecomment-17484). I have found myself regularly running into very long logging in times, with "Loading settings" step of the splash screen taking up to 20 seconds. After some debugging I discovered that this is related to restoring session. In particular, this behaviour seems to be affected by the contents of `$HOME/.trinity/share/config/ksmserverrc` file and the files in `$HOME/.trinity/share/config/session/` directory. I managed to restore good copies of those files from a backup dated on 21 November 2021. However, logging in times still seem very inconsistent. I've run experiments by switching to "Restore saved session" in session manager and then saving a session that contained: Konqueror, Akregator, Kmail, Yakuake, Firefox, Amarok and Konsole. In addition to those programs, which I run explicitly, session manager also saved twin (which I understand stores information about particular windows) and pulseaudio. I then repeatedly logged out and in and measured the time between me hitting enter in TDM and program windows appearing on the screen. In the best case scenario this takes 4 seconds. More frequently though there is a noticeable freeze during the "Loading settings" step, in which case logging in takes around 8-9 seconds. This is a regression compared to late November/early December 2021. My good backup is from 21 November 2021 and I resolved a problem I was originally facing in #246 around 5th December 2021. At this point I was definitely not affected by this bug: as stated in [this](https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/issues/246#issuecomment-16691) comment restoring session consistently took around 4 seconds. Since this is non-deterministic my first guess at the source of the problem would be concurrency and possibly some race condition. However without being able to see any logs this is only a guess and I'm afraid I can't deduce anything more.
jstolarek added the SL/regression label 2 years ago
Owner

Hi Janek,
there hasn't been any significant change since December 2021 that would justify such difference in behavior, so I would hardly call this a regression. It is more likely something is getting saved/restored that wasn't happening back then.
As a reference, it takes 3-4 seconds on my VM to login and being usabl. I have about 20 konsole tabs (in two windows), firefox, konqueror, 20 firefox tabs in two windows, klipper, kompose, korganizer, knotes, kmix, tdehwdevicetray, tderandr, kgpg in my working session.

Hi Janek, there hasn't been any significant change since December 2021 that would justify such difference in behavior, so I would hardly call this a regression. It is more likely something is getting saved/restored that wasn't happening back then. As a reference, it takes 3-4 seconds on my VM to login and being usabl. I have about 20 konsole tabs (in two windows), firefox, konqueror, 20 firefox tabs in two windows, klipper, kompose, korganizer, knotes, kmix, tdehwdevicetray, tderandr, kgpg in my working session.
Owner

Again, comparison with a new user and with a sesson without tdedocker or without saving/restoring the last session will help.

Again, comparison with a new user and with a sesson without tdedocker or without saving/restoring the last session will help.
Poster
Collaborator

As a reference, it takes 3-4 seconds on my VM to login and being usabl.

Yes, this is how long it takes if I'm lucky. Now, to answer your questions:

  1. The problem doesn't happen on if I disable session restoring.
  2. I created a new user account with empty home directory, launched TDE, and skipped the wizard. I then launched the programs I am normally using and measured login times. It seems that under new user account everything works fine, i.e. I didn't experience any freezes.

I then disabled session restoring on my normal account and instead created a script that launches all the programs I am normally using:

#!/bin/bash

pulseaudio &
kfmclient openProfile filemanagement /dane &
konsole &
kmail &
akregator &
amarokapp &
firefox &
keepassx &
yakuake &

I placed the script in ~/.trinity/Autostart and made it executable. With the script as above it always takes ~4 seconds to log into TDE and launch all the programs. However, what seems crucial about this scirpt is pulseaudio & line. If I remove that line it takes around 8,5s to restore the programs and there is a noticeable freeze during logging in when nothing happens (at this point desktop icons are not yet being displayed so it's easy to recognize the freeze). I have no idea why pulseaudio is important but I know I am running pulseaudio in no-daemon mode on my system, i.e. each user has their own pulseaudio instance rather than there being one system-wide pulseaudio daemon for all users.

> As a reference, it takes 3-4 seconds on my VM to login and being usabl. Yes, this is how long it takes if I'm lucky. Now, to answer your questions: 1. The problem doesn't happen on if I disable session restoring. 2. I created a new user account with empty home directory, launched TDE, and skipped the wizard. I then launched the programs I am normally using and measured login times. It seems that under new user account everything works fine, i.e. I didn't experience any freezes. I then disabled session restoring on my normal account and instead created a script that launches all the programs I am normally using: ```bash #!/bin/bash pulseaudio & kfmclient openProfile filemanagement /dane & konsole & kmail & akregator & amarokapp & firefox & keepassx & yakuake & ``` I placed the script in `~/.trinity/Autostart` and made it executable. With the script as above it always takes ~4 seconds to log into TDE and launch all the programs. However, what seems crucial about this scirpt is `pulseaudio &` line. If I remove that line it takes around 8,5s to restore the programs and there is a noticeable freeze during logging in when nothing happens (at this point desktop icons are not yet being displayed so it's easy to recognize the freeze). I have no idea why `pulseaudio` is important but I know I am running pulseaudio in no-daemon mode on my system, i.e. each user has their own pulseaudio instance rather than there being one system-wide pulseaudio daemon for all users.
Poster
Collaborator

The problem doesn't happen on if I disable session restoring.

I have to take that back. Before I wrote this I spent nearly an hour testing this and I could never trigger the problem with session restoring disabled. I now restarted my PC and even when I start TDE with empty session and disable my session restoring script I still run into freezes that are over 10 seconds long. Looking at /var/log/messages I see following messages appearing during TDE login:

Feb  4 23:40:47 skynet kernel: [  677.591616] Lockdown: tde_dbus_hardwa: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:47 skynet kernel: [  677.592054] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:51 skynet kernel: [  681.519606] Lockdown: tdm_greet: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:51 skynet kernel: [  681.519751] Lockdown: tdm_greet: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:58 skynet kernel: [  688.547036] Lockdown: ksmserver: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:58 skynet kernel: [  688.547186] Lockdown: ksmserver: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:59 skynet kernel: [  689.481242] Lockdown: kded: hibernation is restricted; see man kernel_lockdown.7
Feb  4 23:40:59 skynet kernel: [  689.481387] Lockdown: kded: hibernation is restricted; see man kernel_lockdown.7 

Quick googling shows that Lockdown is a feature that prevents hibernation when Secure Boot is enabled, which is the case on my machine. This is a new feature that was added in kernel 5.4, which means it's new in Debian Bullseye (Buster was 4.19, Bullseye is 5.10).

I'll see whether disabling Secure Boot has any effect. Any idea what actions from tdm_greet, ksmserver and kded could trigger these warnings?

EDIT: here is the kernel_lockdown.7 man page.

> The problem doesn't happen on if I disable session restoring. I have to take that back. Before I wrote this I spent nearly an hour testing this and I could never trigger the problem with session restoring disabled. I now restarted my PC and even when I start TDE with empty session and disable my session restoring script I still run into freezes that are over 10 seconds long. Looking at `/var/log/messages` I see following messages appearing during TDE login: ``` Feb 4 23:40:47 skynet kernel: [ 677.591616] Lockdown: tde_dbus_hardwa: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:47 skynet kernel: [ 677.592054] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:51 skynet kernel: [ 681.519606] Lockdown: tdm_greet: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:51 skynet kernel: [ 681.519751] Lockdown: tdm_greet: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:58 skynet kernel: [ 688.547036] Lockdown: ksmserver: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:58 skynet kernel: [ 688.547186] Lockdown: ksmserver: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:59 skynet kernel: [ 689.481242] Lockdown: kded: hibernation is restricted; see man kernel_lockdown.7 Feb 4 23:40:59 skynet kernel: [ 689.481387] Lockdown: kded: hibernation is restricted; see man kernel_lockdown.7 ``` Quick googling shows that Lockdown is a feature that prevents hibernation when Secure Boot is enabled, which is the case on my machine. This is a new feature that was added in kernel 5.4, which means it's new in Debian Bullseye (Buster was 4.19, Bullseye is 5.10). I'll see whether disabling Secure Boot has any effect. Any idea what actions from tdm_greet, ksmserver and kded could trigger these warnings? EDIT: [here](https://www.mankier.com/7/kernel_lockdown) is the kernel_lockdown.7 man page.
Poster
Collaborator

Okay, disabling Secure Boot makes the messages go away but beyond that it doesn't change anything.

Now, if I log into text terminal, launch htop, and switch to that terminal quickly when I press enter on TDM screen I can observe process activity during TDE startup. What I'm seeing during those 10-20 second freezes are two kdesktop processes both running at slightly over 100% CPU time (which means each of the processes most likely has some multithreading and uses more than one CPU core). When TDE finally loads only one of those two processess remains. For example ps aux shows this after logging in:

killy       7586 44.2  0.2 148836 38376 ?        Sl   00:18   0:12 /opt/trinity/bin/kdesktop
killy       7588  0.1  0.1  59956 27856 ?        S    00:18   0:00 /opt/trinity/bin/kdesktop_lock --internal 7586

whereas during the login process there would be a second /opt/trinity/bin/kdesktop process with PID 7587 and process 7588 wouldn't yet exist. Once one of those two processes disappears I can observe applications that are restored by session manager taking up CPU time.

Is it possible that this second kdesktop process shouldn't be launched at all and it's somehow deadlocking with the first one? Is it why the logs I pasted in the previous comment have all messages duplicated?

Okay, disabling Secure Boot makes the messages go away but beyond that it doesn't change anything. Now, if I log into text terminal, launch `htop`, and switch to that terminal quickly when I press enter on TDM screen I can observe process activity during TDE startup. What I'm seeing during those 10-20 second freezes are two `kdesktop` processes both running at slightly over 100% CPU time (which means each of the processes most likely has some multithreading and uses more than one CPU core). When TDE finally loads only one of those two processess remains. For example `ps aux` shows this after logging in: ``` killy 7586 44.2 0.2 148836 38376 ? Sl 00:18 0:12 /opt/trinity/bin/kdesktop killy 7588 0.1 0.1 59956 27856 ? S 00:18 0:00 /opt/trinity/bin/kdesktop_lock --internal 7586 ``` whereas during the login process there would be a second `/opt/trinity/bin/kdesktop` process with PID 7587 and process 7588 wouldn't yet exist. Once one of those two processes disappears I can observe applications that are restored by session manager taking up CPU time. Is it possible that this second `kdesktop` process shouldn't be launched at all and it's somehow deadlocking with the first one? Is it why the logs I pasted in the previous comment have all messages duplicated?
Owner

Hi Janek,
I don't have pulseaudio running automatically on login, perhaps that could influence the login time??

Regarding kdesktop and kdesktop_lock, it is correct to have both of the processes. kdesktop manages your desktop, while kdesktop_lock is a subprocess that handles locking the screen and unlocking. The latter is spawn from the first and the two processes talk to each other.

But there shouldn't be a second kdesktop process as far as I can say, I have never observed that here (although I have never looked for it). Are you using two monitors by any chance? If so, it would be worst trying using only one monitor and see if you spot any difference in behavior.

Hi Janek, I don't have pulseaudio running automatically on login, perhaps that could influence the login time?? Regarding kdesktop and kdesktop_lock, it is correct to have both of the processes. kdesktop manages your desktop, while kdesktop_lock is a subprocess that handles locking the screen and unlocking. The latter is spawn from the first and the two processes talk to each other. But there shouldn't be a second kdesktop process as far as I can say, I have never observed that here (although I have never looked for it). Are you using two monitors by any chance? If so, it would be worst trying using only one monitor and see if you spot any difference in behavior.
Poster
Collaborator

I have a single monitor setup so no, I don't have two monitors.

But there shouldn't be a second kdesktop process as far as I can say, I have never observed that here (although I have never looked for it).

I think I can also reproduce the problem on a new user account - see attached photo. At the moment the longest freeze I got for a new account was about 2 seconds (long enough to switch to tty0 and grab a photo) but I already learned results of these tests are heavily dependent on luck some unknown factors that I can't control.

I have a single monitor setup so no, I don't have two monitors. > But there shouldn't be a second kdesktop process as far as I can say, I have never observed that here (although I have never looked for it). I think I can also reproduce the problem on a new user account - see attached photo. At the moment the longest freeze I got for a new account was about 2 seconds (long enough to switch to tty0 and grab a photo) but I already learned results of these tests are heavily dependent on ~~luck~~ some unknown factors that I can't control.
Owner

Hi Janek,
I decided to try this out myself. If I switch back to CLI and monitor using htop, I also see two kdesktop processes, both at 100% cpu usage. I have to say that if I do this, the graphical session actually freezes completely and I need to restart my VM.
Quick search through kdesktop code shows there are calls to fork(), so it may be a genuine thing that there are two processes. I have not investigated the code further, so I can't confirm 100%.
So I guess that problem you are seeing may have another cause and we need to keep looking for it.

Hi Janek, I decided to try this out myself. If I switch back to CLI and monitor using htop, I also see two kdesktop processes, both at 100% cpu usage. I have to say that if I do this, the graphical session actually freezes completely and I need to restart my VM. Quick search through kdesktop code shows there are calls to fork(), so it may be a genuine thing that there are two processes. I have not investigated the code further, so I can't confirm 100%. So I guess that problem you are seeing may have another cause and we need to keep looking for it.
Poster
Collaborator

Quick search through kdesktop code shows there are calls to fork(), so it may be a genuine thing that there are two processes. I have not investigated the code further, so I can't confirm 100%.

Can you point me at the code responsible for starting TDE session? Maybe looking at it will give me a hint what can be the cause.

> Quick search through kdesktop code shows there are calls to fork(), so it may be a genuine thing that there are two processes. I have not investigated the code further, so I can't confirm 100%. Can you point me at the code responsible for starting TDE session? Maybe looking at it will give me a hint what can be the cause.
Owner

Looks at starttde file. If does lots of init and towards the end it will run tdeinit_phase1 to pass control to the actual session manager.
The code for the session manager is here - make sure you read the readme file, it has lot of info.

Looks at [starttde](https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/src/branch/master/starttde) file. If does lots of init and towards the end it will run tdeinit_phase1 to pass control to the actual session manager. The code for the session manager is [here](https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/src/branch/master/ksmserver) - make sure you read the readme file, it has lot of info.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date

No due date set.

Dependencies

No dependencies set.

Reference: TDE/tdebase#255
Loading…
There is no content yet.