After upgrading to Kolibri v0.11.1 - Server continues to disconnect and reconnect clients


On this subject: Please see below…

Note: I have also come across these:

This does “sound” like the same issue we are having.

Hi @Bryan_Fisher - I am aware that the UNIQUE constraint failure is present in your log, but it occurred earlier and was fixed in Kolibri 0.11, which is why it stopped appearing in your logs after ERROR 2019-01-29 10:12:38.

Great, thanks, but could you continue with the other steps? This small integrity check isn’t as thorough as dumping the full database and rewriting it.

Hi @benjamin,

Okay, thank you kindly.

@benjamin,

You mean step 6? Yes we have done so. Issue still remains.

Alright, copy that @Bryan_Fisher - let me just confer with some others about the Disk I/O issue.

Could you give me the output of du -sh ~/.kolibri/* – curious about the file size of the database.

I’ll respond to my own question :slight_smile: with the info that you already provided:


It seems that the database is ~60 MB in size which isn’t considerable at all.

@Bryan_Fisher considering the privacy of your database, would you be able to send it directly to me by direct message in this forum? I would like to test and see if I can re-produce the Disk I/O error.

@Bryan_Fisher - can I convince you to repeat Step 6 and ensure that you stop kolibri before and start kolibri again after?

It seems from your latest logs that the Disk I/O errors are gone and it’s now the usual django.db.utils.DatabaseError: database disk image is malformed error which is known to be fixed with the instructions in Step 6.

Hi @benjamin, you’ve been a busy man :smile:

I will follow through on your replies with much excitement :slight_smile:

If I run out of time today again, can we perhaps arrange a Team Viewer session? That will be later in the evening for me and early morning (I think) for you. Please let me know.

Thanks so much!

Yes I would.

Hi @benjamin,

Here it is:

@Bryan_Fisher I can see from your screenshot that fixed.db hasn’t overwritten the database that’s in use (the two files have different sizes).

Could you do the steps 1-6 again? It’s important that kolibri is stopped while you overwrite the database.
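One way to confirm that the overwrite really took place is to compare the two files byte for byte while Kolibri is still stopped. This is only a minimal sketch and not part of the original steps: the helper name `db_was_overwritten` is made up for illustration, and `~/.kolibri/db.sqlite3` is assumed to be the database file in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Kolibri command): succeeds only if the repaired
# file and the live database are byte-for-byte identical.
db_was_overwritten() {
    local fixed="$1" live="$2"
    # cmp -s prints nothing and returns 0 only when both files are identical
    if cmp -s "$fixed" "$live"; then
        echo "identical: the live database matches $fixed"
    else
        echo "different: $live was not overwritten by $fixed"
        return 1
    fi
}
```

It could be called as `db_was_overwritten fixed.db ~/.kolibri/db.sqlite3` right after copying the file and before starting Kolibri again, since a running server will modify the live database and make the files differ.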

Afterwards, if it still doesn’t work, I would like to see your log file again, alternatively, the output of running kolibri start --foreground.

Thanks so much,
Ben

Hi @benjamin,

thanks for joining me!

Yes, it has been overwritten, because it has started a new instance of Kolibri. I am tracking changes in a Word spreadsheet; can I share the doc with you?

Hi @Bryan_Fisher

Firstly just a caution: In the “After Step 6 Output” (in the Google Doc that you shared privately), I can see that you are running with an empty database. You need to locate the original backup and ensure that you keep it safe for now.

Having now tried Kolibri with a blank DB confirms that the I/O error isn’t due to anything particular with your original DB. I observe that Kolibri installs correctly and starts, and that you get these I/O errors afterwards when using more than one tablet.

An additional test would be to try to load Kolibri with a completely fresh data directory:

# Stop kolibri
kolibri stop

# (ensure that you don't have any wild processes left)
ps aux | grep kolibri

# Move current ~/.kolibri to a safe location
mv ~/.kolibri ~/.kolibri_bak

# Start kolibri, notice that this is now a blank installation
kolibri start --foreground

# (Now test if the I/O error occurs)

# To restore:

# Stop kolibri
kolibri stop

# Remove the test home dir
rm -rf ~/.kolibri

# Move the backup back in place
mv ~/.kolibri_bak ~/.kolibri

If these problems persist, turn your attention to the I/O error again. Is there anything unusual about your file system? Is it slow? Broken? Do you see other I/O errors, for instance in your syslog?

Here are some basic tests:

# Look at the sys log to see if you have other I/O errors. Press Q to quit.
less /var/log/syslog

# Prints information about your partitions
sudo lsblk -f

# Benchmarks write speed
dd if=/dev/zero of=/tmp/output conv=fdatasync bs=384k count=1k; rm -f /tmp/output

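# Checks the partition /dev/sda2 for errors (use the device name that lsblk reports for your root partition).
# -n only reports problems and changes nothing; repairs should be done from a live USB with the partition unmounted.
sudo fsck -n /dev/sda2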
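Rather than paging through the whole syslog, the search for other I/O errors can be narrowed with grep. The following is a rough sketch under assumptions: the helper name `count_io_errors` is hypothetical, and the pattern simply matches any line containing "I/O error" in any letter case, which may also pick up unrelated messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that mention an I/O error.
count_io_errors() {
    local logfile="${1:-/var/log/syslog}"
    # -c counts matching lines, -i ignores case;
    # "|| true" keeps a count of 0 from being reported as a failure
    grep -ci 'i/o error' "$logfile" || true
}
```

Running `count_io_errors` with no argument reads /var/log/syslog, which may require sudo or membership of the adm group. To see the matching lines themselves, `grep -i 'i/o error' /var/log/syslog` prints them.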

Hi @benjamin,

I will follow through on your suggestion, thanks so much!

In the meantime I have accessed the content with 10+ tablets with internet access. Yes, it’s kind of slow; I only got the disconnect error once while trying to log in on one tablet, but it automatically came back after a few seconds and logged in…

here are the log files:

Some notes during this session:

I will follow through on your above suggestion later and will get back to you promptly.

Thanks again!

Hi @benjamin,

I stopped at “# (Now test if the I/O error occurs)”, which is where I need to start Kolibri again. However, I started a completely new image from scratch, installed the Ubuntu updates, installed Kolibri from the PPA repo, and imported some content from a local HDD. The same error occurred: “Disconnected from server”. My manager also did a fresh install and installed Kolibri. I imported some content (a public channel) and tested: it worked fine for 9 tablets, but as soon as I added more users, up to 14, it took a very long time to get to the login screen. With about 3-4 tablets trying to load the login screen, the error occurred.

It looks like this behaviour only happens when several devices increase the traffic, i.e. some devices are playing videos while other users try to log in. Two fresh installs, same outcome. It seems that as more users try to access Kolibri, it just stops, as if it is unable to handle all the requests at once. Meanwhile, the server only shows 8% CPU utilisation.
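A pattern like this could be approximated from a single machine by firing a batch of simultaneous requests at the server. The sketch below is an illustration only: `simulate_clients` is a hypothetical helper, it fetches just one URL per client, and it does not reproduce video playback or real logins, so it may not trigger the same error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send N requests at the same time and print one
# "<http status> <seconds>" line per request.
simulate_clients() {
    local url="$1" clients="${2:-14}" i
    for i in $(seq 1 "$clients"); do
        # -s silent, -o /dev/null discards the page, -w prints status and total time,
        # --max-time gives up after 30 seconds so a hung server does not block the run
        curl -s -o /dev/null --max-time 30 -w '%{http_code} %{time_total}\n' "$url" &
    done
    wait  # block until every background request has finished
}
```

For example, `simulate_clients http://192.168.1.10:8080/ 14` would mimic 14 tablets loading the start page at once; the address is made up, and port 8080 is assumed to be the port Kolibri is listening on. A status of 000 means curl could not get a response at all.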

I shall continue with your suggestion tomorrow morning when i’m back in the office.

Kind regards,
Bryan

Hi @Bryan_Fisher

Thanks for the thorough testing: I’m sorry that we couldn’t immediately identify a solution. But given that it doesn’t seem to be disk errors on your side and that it’s reproducible on a blank installation, I have to escalate it for further stress testing. It’s not an error that should occur, so it’s definitely to be regarded as something we want to look into and fix.

Thanks for your patience and detailed reports so far – I will get back to you, perhaps with an updated release to install and test. Our first step will be to try to reproduce similar I/O errors.

I assume that these are the kinds of hardware and OS specs we should try out firstly…

Server/Machine info:
Ubuntu 16.04.5 LTS,
RAM: 4GB
CPU: Intel® Celeron® CPU J1800 @ 2.41GHz, 2580 MHz
Kolibri v: 0.11.1

Best,
Ben

Hi @benjamin,

Thanks so much for your assistance, much appreciated!

"I assume that these are the kinds of hardware and OS specs we should try out firstly…

Server/Machine info:
Ubuntu 16.04.5 LTS,
RAM: 4GB
CPU: Intel® Celeron® CPU J1800 @ 2.41GHz, 2580 MHz
Kolibri v: 0.11.1"

Yes please.

FYI, I am going to revert to v0.9.2 in the meantime; that was our most stable server.

Hi @benjamin,

I have a clone image with Kolibri 0.9.2, but I don’t have the installer .deb file. Could you please send me a link to the .deb for Kolibri v0.9.2?

It is running with no issues with 20+ tablets connected (just FYI).

Kind regards,
Bryan