After upgrading to Kolibri v0.11.1, server keeps disconnecting and reconnecting clients

Good day,

I have two centres that have been using the servers since upgrading to Kolibri v0.11.1. This is a message from one of our centres: she says that content is loading slowly, and as soon as multiple users (10+) start logging in, content takes a long time to load and clients get a “Disconnected from server” message before being reconnected again. Below is from one client:

“As soon as every child logs in, signing in takes longer than usual to get to the lessons. If a child goes to a lesson, they only work for 10 min; after that the screen shows refresh, and we do press refresh, but then it goes on and on until it goes back to the menu. Some continue to restart over and over again, and in the meantime at the bottom it shows disconnected from server 00:00, and after 5 sec it comes back and says reconnected to server 00:00. But the children cannot do work properly at all; it's a nightmare.”

From another client:
“the headaches continue though. everything is very slow to load and the tablets keep losing connection to the server” … “been loading for ages…” … “disconnected to server” … etc. He sent me some screenshots.


Please advise if there are any specific commands I can run to diagnose this issue and get more information about it. Please note that these servers had been working fine for the past few months since they started using them.

Looking forward to your reply.

Thank you kindly.

Kind regards,
Bryan

You may want to try a “Reload this page” (Chrome) or “Reload current page” (Firefox) to get past the error you are seeing. In both Firefox and Chrome, the shortcut Ctrl + R reloads the page.

We have seen the error and the Kolibri logo you show in your photos at times when traffic on the server is heavy. If we do not reload the page, the error remains and the Kolibri logo seems to fly in place endlessly. But with a quick Ctrl + R, most of the time the page reloads normally and progress continues.

I am not suggesting this is a fix, but for us this gives the students a quick tool to continue the learning experience.

Hi @mrdavidhaag,

Thanks so much for your reply, much appreciated.

This issue is now critical, as it affects 15+ centres that are currently operating with our servers. All of them have the same problem.

Our issue is slightly different, as I have monitored it with the client in a live environment: our users access content via a locked tablet, and even when content is viewed in a browser, the refresh button does not help because the Kolibri application outputs an error.

These are the findings:
When users start logging into Kolibri, it takes a bit longer.
After they log in and access content, some content takes longer to load but eventually plays.
When 10+ users are logged in, things start to go wrong…
The server uses 6% CPU on average.
When users start to disconnect, I checked the status of Kolibri using the ‘kolibri status’ command, and this error comes up:
Kolibri Server configuration error (9)

This is when you get the error ‘Disconnected from server…’: the status of Kolibri is “Kolibri Server configuration error (9)”.
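To line up these status changes with the times the tablets show “Disconnected from server”, one option is to sample kolibri status on a timer and keep a timestamped log. The sketch below is a hypothetical helper (watch_kolibri_status is not part of Kolibri). It records both the printed message and the exit code, because it is only an assumption here that the number in parentheses is also the command's exit code.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Kolibri: sample "kolibri status"
# every INTERVAL seconds, COUNT times (COUNT=0 means until Ctrl+C),
# printing one timestamped line per sample.
watch_kolibri_status() {
  local interval="${1:-10}" count="${2:-0}" i=0 out rc
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    # Capture stdout and stderr, then the exit code of the command.
    out="$(kolibri status 2>&1)"
    rc=$?
    # Join multi-line output into one line so each sample is one log line.
    printf '%s exit=%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" \
      "$(printf '%s' "$out" | tr '\n' ' ')"
    i=$((i + 1))
    # Skip the final sleep when a fixed number of samples was requested.
    if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, as the kolibri user: watch_kolibri_status 10 0 | tee -a ~/status-watch.log while students log in (the log file name is only an example), then compare its timestamps with the times the tablets disconnect.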

Below are more errors from the ‘kolibri status’ command:

The server has to be rebooted in order to restart Kolibri, and then the same thing happens again.

Here are the server and kolibri.log files, in case you find them useful.

Server/Machine info:
Ubuntu 16.04.5 LTS
RAM: 4GB
CPU: Intel® Celeron® CPU J1800 @ 2.41GHz, 2580 MHz
Kolibri v: 0.11.1

Please advise if you need more info.

Thank you in advance for your assistance; I am hopeful we will resolve this issue very soon.

Kind regards,
Bryan

Hi @bryan_fisher, this is Jonathan from the Learning Equality team. I’ve been looking at the logs you shared and am not familiar with the recent errors. I’ll share them with the team to see if they have any ideas.

Could you share how you set up the Kolibri servers at your centers? Were the different servers set up at the same time using the same method, or were they done differently? What version of Kolibri did you have before the upgrade?

Could you tell us if anything happened recently, before you started having these problems? I notice that the logs go back several months; were things working okay then? Did the problems occur immediately after upgrading to 0.11?

Hi @jonathan,

Thanks so much for your reply.

Re: your reply:
Could you share how you set up the Kolibri servers at your centers?
We run Kolibri on Ubuntu 16.04 LTS, installed via the PPA repository. After installing Kolibri, I add the kolibri user to the sudoers file and give the kolibri user ownership of the .kolibri folder. Thereafter, I import the content, etc.
This is done once; the rest of the servers are cloned from it.

Were the different servers set up at the same time using the same method, or were they done differently?
There’s no difference, all cloned from one image.

What version of Kolibri did you have before the upgrade?
These servers were on v0.10.2.
Some were on v0.9.2.

Could you tell us if anything happened recently before you started having these problems?
Nothing critical that made it as unresponsive as this; mostly, some content was taking a long time to load and some logins were slow.

I notice that the logs go back several months; were things working okay then?
Yes, we have received several monthly CSV reports from our centres. We even have a centre from Cricket SA, which runs a centre where young players study after school 🙂

Did the problems occur immediately after upgrading to 0.11?
Yes. The first centre that started using it called the same day they switched it on. That was the centre I spent an hour on the phone with this morning, testing different scenarios, such as testing the network to see whether the disconnections might be caused by the access point. The connection remained stable with 0% packet loss, and there were no errors in the router logs.

Today we asked the users to log in 5 at a time, so that the application would not be overloaded by simultaneous login requests, while we monitored the server's performance. We wanted to make sure the server was not running out of memory while users logged into Kolibri and accessed content; the server was basically underused.
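A low average CPU figure does not rule out a disk bottleneck: on Linux, the load average also counts processes blocked waiting on disk I/O. A minimal sketch for logging available memory, load average and the number of kolibri processes during such a login test might look like this. sample_resources is a hypothetical name, and the sketch assumes a Linux /proc filesystem (MemAvailable needs kernel 3.14 or later) and pgrep from procps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print COUNT samples, INTERVAL seconds apart, of
# available memory, load average and the number of kolibri processes.
sample_resources() {
  local interval="${1:-5}" count="${2:-12}" i mem load procs
  for ((i = 0; i < count; i++)); do
    # MemAvailable (kB): kernel estimate of memory usable without swapping.
    mem="$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)"
    # First three fields: 1, 5 and 15 minute load averages.
    load="$(cut -d ' ' -f 1-3 /proc/loadavg)"
    # Number of processes whose full command line contains "kolibri".
    procs="$(pgrep -c -f kolibri)"
    printf '%s mem_avail_kb=%s load=%s kolibri_procs=%s\n' \
      "$(date '+%H:%M:%S')" "$mem" "$load" "$procs"
    if (( i + 1 < count )); then sleep "$interval"; fi
  done
}
```

For example, sample_resources 5 120 | tee resources.log covers ten minutes of logins (the file name is only an example). A load average that climbs well above the number of CPU cores while CPU use stays low would point towards processes waiting on the disk.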

Thank you kindly for your reply. Truly appreciate it.

Warm regards,
Bryan

Thanks for the information, Bryan.

I missed in your original post that the problems appear to happen as more people log in to Kolibri. Has the strategy of letting users log in 5 at a time helped?

In the centers, do students start working on lessons immediately after logging in? If so, do they mainly watch videos or do exercises?

Also, when some students become disconnected, is the Kolibri server still running and working for others, or has Kolibri completely stopped for everybody?

Hi @jonathan,

Has the strategy of letting users log in 5 at a time helped?
No, not at all.

In the centers, do students start working on lessons immediately after logging in? If so, do they mainly watch videos or do exercises?
Yes, yes.

When some students become disconnected, is the Kolibri server still running and working for others, or has Kolibri completely stopped for everybody?
Kolibri stops, and running the command “kolibri status” displays the error message “Kolibri server configuration error (9)”.
It stops for everybody.

Thanks so much for your speedy reply!

Good day @jonathan ,

Do you have any update on this issue?

How do I downgrade the current version of Kolibri? Please advise ASAP.

More centres are reporting videos loading for more than 10 minutes, disconnections from the server, etc.

Hope to hear from you soon.

Thanks so much!

Kind regards,
Bryan

Hi Bryan,

I’m not familiar with installation on Linux, but do you have access to the PPA for the previous version of Kolibri?

Hi @jonathan,

Yes, I do; that’s how I update Kolibri to the latest version on all our servers, except the ones that had some OS instability, which I re-imaged.

Hi @Bryan_Fisher

I’m also from the Learning Equality team, and @jonathan told me about this issue. It’s a bit complex, so I hope you are okay with answering some more questions. I’ll certainly be here at the other end to try and get this solved with you.

I’m noticing in the logs that the very first time this incident occurs (Disk I/O error), there are 4 kolibri daemons launched at intervals of roughly a minute, all writing their output to the log.


WARNING 2019-01-25 10:51:09,724 ping Ping failed (could not connect). Trying again in 15.0 minutes.
INFO 2019-01-25 11:06:09,825 ping Attempting a ping.
WARNING 2019-01-25 11:06:30,103 ping Ping failed (could not connect). Trying again in 15.0 minutes.
INFO 2019-01-25 11:21:30,195 ping Attempting a ping.
WARNING 2019-01-25 11:21:50,445 ping Ping failed (could not connect). Trying again in 15.0 minutes.
INFO 2019-01-25 11:36:50,547 ping Attempting a ping.
WARNING 2019-01-25 11:37:10,818 ping Ping failed (could not connect). Trying again in 15.0 minutes.
INFO 2019-01-25 11:52:10,899 ping Attempting a ping.
WARNING 2019-01-25 11:52:31,152 ping Ping failed (could not connect). Trying again in 15.0 minutes.
INFO 2019-01-25 11:53:55,917 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
INFO 2019-01-25 11:54:33,392 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
INFO 2019-01-25 11:55:34,158 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
INFO 2019-01-25 11:56:15,736 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
ERROR 2019-01-25 11:56:37,861 sanity_checks Port 8080 is occupied.
Please check that you do not have other processes running on this port and try again.

INFO 2019-01-25 11:58:05,147 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
ERROR 2019-01-25 11:58:11,202 exception Internal Server Error: /api/content/contentnode/bd17296937234e3a8ce0c3351d00b3f8/
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kolibri/dist/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python3/dist-packages/kolibri/dist/django/db/backends/sqlite3/base.py", line 328, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: disk I/O error

A disk I/O error unfortunately usually means that something isn’t read/written correctly in the file system.
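Repeated start-ups like the ones in the excerpt above can be spotted by pulling the start messages and ERROR lines out of the log with their timestamps. This is a sketch only: kolibri_log_events is a hypothetical helper, and it assumes every log entry begins with the level, date and time, as in the excerpt; traceback lines without a timestamp are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Kolibri start-ups and ERROR entries from a
# kolibri.log, one line each, with the log's own date and time.
# Assumes each entry starts with "LEVEL YYYY-MM-DD HH:MM:SS,mmm".
kolibri_log_events() {
  local log="$1"
  awk '
    /Running Kolibri with the following settings/ { print $2, $3, "start" }
    $1 == "ERROR" {
      msg = $0
      sub(/^ERROR [^ ]+ [^ ]+ /, "", msg)   # drop level, date and time
      print $2, $3, "error:", msg
    }
  ' "$log"
}
```

Point it at the kolibri.log file (its exact location depends on the installation). Several "start" lines a minute or so apart with no clean shutdown in between suggest more than one daemon being launched.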

Some questions:

  1. Do you have enough free hard drive space on all partitions? Try running df -h.
  2. Are there many instances of kolibri running at the same time? Try running ps ax | grep kolibri
  3. Do you have any other bottlenecks on the server when Kolibri starts crashing?
  4. Finally, a question to clarify if none of the above gives you any answers: Is Kolibri working at all right now or completely inaccessible?
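For question 1, note that df -h reports space per partition, while inodes (shown by df -i) can run out separately even when plenty of space is free. A sketch that flags any partition at or above a usage threshold, for space or inodes, assuming GNU df; check_disk_usage is a hypothetical name and 90% is an arbitrary default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every mounted filesystem whose space or
# inode usage is at or above THRESHOLD percent (default 90).
check_disk_usage() {
  local threshold="${1:-90}" opt
  for opt in -P -Pi; do
    # With -P, df prints one line per filesystem; field 5 is the use
    # percentage and field 6 the mount point (mount points containing
    # spaces are not handled by this simple parsing). Filesystems
    # without inode counts show "-" and are skipped.
    df "$opt" 2>/dev/null | awk -v t="$threshold" -v kind="$opt" '
      NR > 1 {
        u = $5; sub(/%/, "", u)
        if (u != "-" && u + 0 >= t) print $6 ": " u "% used (df " kind ")"
      }'
  done
}
```

Empty output from check_disk_usage means no partition is at or above the threshold for either space or inodes.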

I would like to help you restore the database if it is indeed broken, but the Disk I/O error hints that something else is wrong, so I’d like to investigate that first…

Hi @benjamin,

Thank you for your response.
Q1.
Yes, all machines have 1 TB drives, with 900+ GB free on every unit. They only have the Ubuntu 16.04 OS and Kolibri installed.

Q2.
No. The error “…sanity_checks Port 8080 is occupied.” occurs after Kolibri is upgraded, because the default owner becomes root. I then log in as the root user, run the command to stop Kolibri, and give the kolibri user full permissions on the .kolibri folder. After a reboot, Kolibri runs as the kolibri user just fine, without the error “…sanity_checks Port 8080 is occupied.”

When this happened (“sanity_checks Port 8080 is occupied.” when I ran the command kolibri status), I immediately checked the permissions on the .kolibri folder. I will run the command ps ax | grep kolibri to double-check for you.
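Since the Port 8080 error here seems tied to ownership changing to root after an upgrade, a quick way to confirm the permissions fix took effect is to list anything under the Kolibri home directory that is not owned by the kolibri user. A minimal sketch; find_foreign_owned is a hypothetical name, and the defaults (~/.kolibri of the user running it, and a user called kolibri) are assumptions based on the setup described in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list paths under DIR (default ~/.kolibri) that are
# not owned by USER (default kolibri). Empty output means ownership is
# consistent.
find_foreign_owned() {
  local dir="${1:-$HOME/.kolibri}" user="${2:-kolibri}"
  find "$dir" ! -user "$user" -print
}
```

If paths are listed, the same remedy described above applies, for example sudo chown -R kolibri:kolibri /home/kolibri/.kolibri (assuming a kolibri group and that home directory).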

Q3.
No, there are none. However, I have noticed this warning: “WARNING 2019-01-25 11:21:50,445 ping Ping failed (could not connect). Trying again in 15.0 minutes.” It only appears when there is no internet access (we are testing offline).

Q4.
Kolibri is working. When users start logging in and accessing content, within 10-15 minutes the Kolibri bird loading image is displayed, users start seeing “Disconnected from server…”, etc., and then it becomes unstable.

Yes, I would like you to help me try to restore the DB, if it is broken.
Also, I have seen Disk I/O errors on all the previous versions while installing Kolibri; ref: “Manually verify import content on database file”.

Looking forward to your reply.

Thanks so much!

Hi @benjamin,

Please see the output of the ps ax | grep kolibri command on the server:

I have done a test (offline) with 14 tablets.
While logging on and playing videos, it took 4 minutes for Kolibri to stop responding. As soon as we got the error “…Disconnected from server…”, I checked kolibri status; please see the results:


I also checked which user each service is running as:


Then I checked the status again:

@benjamin,

It looks like a Kolibri instance is running as root as well? When the error happens, I check kolibri status on the server and it gives me “Not responding (5)”, but when I log in as the root user and run kolibri status, it shows me “Stopped (1)”, as if it were running.
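To check whether Kolibri processes are running under two different users at the same time (which would fit both the “Port 8080 is occupied” message and the conflicting status results), list the owner of each kolibri process. A sketch; kolibri_processes is a hypothetical name, and it simply filters ps output by command line, so any unrelated process that mentions kolibri will also appear.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the user, PID, start time and command line
# of every process whose command line mentions kolibri.
kolibri_processes() {
  # The [k] bracket keeps grep from matching its own command line.
  ps -eo user=,pid=,lstart=,args= | grep '[k]olibri'
}
```

If kolibri_processes shows Kolibri server processes owned by both kolibri and root, stopping the root one first (as root, kolibri stop, as described earlier in this thread) before starting Kolibri as the kolibri user would be the first thing to try.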

Please see below:

Hi Bryan,

Thanks for the inputs. Let’s keep in mind that it’s a “Disk I/O error” and assume that the error means what it says. You say that it has happened since the upgrade.

Please try these steps:

  1. Log in as the kolibri user
  2. Stop kolibri with kolibri stop
  3. Ensure that there are no more running instances:
    killall -9 kolibri
    ps aux | grep kolibri # check that it's really not running anymore
    
  4. Copy the current database somewhere (doesn’t matter, but let’s keep a backup):
    cp ~/.kolibri/db.sqlite3 ~/old_kolibri.sqlite3
    
  5. Now try opening the current database and verifying the integrity:
    sqlite3 ~/.kolibri/db.sqlite3
    SQLite version 3.22.0 2018-01-22 18:45:57
    Enter ".help" for usage hints.
    sqlite> PRAGMA integrity_check;
    ok
    
  6. Maybe that didn’t work; either way, try re-creating the database this way (from the shell command line, NOT the sqlite3 prompt):
    sqlite3 ~/.kolibri/db.sqlite3 ".dump" | sqlite3 ~/fixed.db
    mv ~/fixed.db ~/.kolibri/db.sqlite3
    
  7. Now run kolibri start and try it out: Does it work now?
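Steps 4 to 6 can be sketched as one function that refuses to replace the database unless the rebuilt copy passes its own integrity check. This is an illustrative variant, not a Learning Equality tool: rebuild_kolibri_db is a hypothetical name, it assumes Kolibri has already been stopped (steps 1 to 3) and that the sqlite3 command-line tool is installed. The extra check for a dump ending in “ROLLBACK; -- due to errors” guards against loading a dump whose transaction would be rolled back, leaving the rebuilt database without its data. Run it as the kolibri user so the new file keeps the right owner.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kolibri): back up, check and rebuild
# a Kolibri SQLite database. Run only after Kolibri has been stopped.
rebuild_kolibri_db() {
  local db="${1:-$HOME/.kolibri/db.sqlite3}"
  local stamp backup dump fixed
  stamp="$(date +%Y%m%d-%H%M%S)"
  backup="${db}.backup-${stamp}"
  dump="${db}.dump-${stamp}.sql"
  fixed="${db}.fixed-${stamp}"

  # A leftover -wal or -journal file usually means the database is still open.
  if [ -e "${db}-wal" ] || [ -e "${db}-journal" ]; then
    echo "Found ${db}-wal or ${db}-journal; is Kolibri still running?" >&2
    return 1
  fi

  # Step 4: keep a backup copy, preserving mode and timestamps.
  cp -p "$db" "$backup" || return 1

  # Step 5: report the integrity check of the current file.
  echo "integrity_check (current): $(sqlite3 "$db" 'PRAGMA integrity_check;')"

  # Step 6, first half: dump the schema and data as SQL text.
  sqlite3 "$db" .dump > "$dump" || return 1
  # If .dump hit errors, it ends with this line instead of COMMIT.
  if grep -q 'ROLLBACK; -- due to errors' "$dump"; then
    echo "Dump ended with errors; keeping $db unchanged." >&2
    return 1
  fi

  # Step 6, second half: load into a new file, stopping at the first error.
  sqlite3 -bail "$fixed" < "$dump" || return 1
  if [ "$(sqlite3 "$fixed" 'PRAGMA integrity_check;')" != "ok" ]; then
    echo "Rebuilt copy failed integrity_check; keeping $db unchanged." >&2
    return 1
  fi

  mv "$fixed" "$db"
  echo "Rebuilt $db (backup: $backup, SQL dump: $dump)"
}
```

For example, after steps 1 to 3: rebuild_kolibri_db ~/.kolibri/db.sqlite3, then continue with step 7. The backup and the SQL dump are left next to the database for inspection.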

If you encounter errors, please do keep posting the error messages.

On this subject, what did df -h tell you? I can see that you have abundant storage space, but could it be partitioned in a problematic way?

Hi @benjamin,

Thanks so much for your reply! Very much appreciated.

I will only be able to give you feedback tomorrow morning, when I’m back at the office.

Thanks again for your speedy response.

Hi @benjamin,
Re: “…opening the current database and verifying the integrity”:
The output was: ok

That worked. We have also upgraded to SQLite version 3.22.0.
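One detail worth checking here: Kolibri runs under Python 3 (the traceback earlier in the thread shows /usr/lib/python3/dist-packages/kolibri), so the SQLite library that matters is the one Python's sqlite3 module is linked against, which is not necessarily the version the sqlite3 command-line tool reports. A small sketch to print both; sqlite_versions is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the SQLite version of the command-line
# tool with the version of the library Python 3's sqlite3 module uses.
sqlite_versions() {
  # "sqlite3 --version" prints the version first, then date and source id.
  echo "sqlite3 CLI: $(sqlite3 --version 2>/dev/null | cut -d ' ' -f 1)"
  # sqlite3.sqlite_version is the version of the linked SQLite library.
  echo "python3 sqlite3 module: $(python3 -c 'import sqlite3; print(sqlite3.sqlite_version)' 2>/dev/null)"
}
```

If the two lines of sqlite_versions differ, upgrading the command-line tool alone has not changed the SQLite library that Kolibri uses.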

This is the last part of the kolibri.log.
Actually, I will attach links to the kolibri.log and server.log files…

Hi @benjamin,

Please see kolibri.log and server.log:

Look for the entries with today’s date.