Implications of large db.sqlite3 file for Kolibri performance

If you have a specific technical inquiry, construct your post with the following:

Summary

The hardware on which Kolibri is currently running is a Celeron Dual core + 4 GB RAM.

db.sqlite3 has grown to 1.7 GB and will surely grow beyond 2.2 GB in the next couple of days as we sync in user data from two more devices.

When Kolibri starts, it takes around 9 minutes now and the kolibri.txt log reveals that it seems to be busy with the vacuuming process.

  • What can be done to speed up the startup process?
  • Are there any other speed implications I must be watching out for?

Technical details

  • Kolibri version - 0.12.5, PEX installer (have not switched to DEB as yet as did not feel the need yet)
  • Operating system - Ubuntu 18.04
  • Hardware - Celeron Dual Core + 4 GB RAM

Hello @shrenik
reading files of 2.2 Gb is a performance problem for any OS, with a SSD disk things get better, but it’s bad anyway. Being a database the problem is worse because the system needs to do read and write operations in different parts of the file. Theorically sqlite can handle even terabytes of data, but things will go bad in a concurrent system as a web server. If the OS could keep all the database in memory things would go better, but that’s not your case.
There is one good solution from a sysops point of view: Kolibri can be used with Postgresql. Being a client-server database does not have this file problems.

However, before switching to a database that requires some sysadmin skills, I would like you to do a test: sqlite databases tend to be fragmented (yes, as the disks partitions in windows OS) due to multiple read/write/delete operations. Kolibri includes the vacuum procedure to defrag it when starting up, but it looks like it’s not working correctly in your environment. I think it’s timing out in your system because some other process is accessing to the database file at the same time.

Could you (all these operations have to be done inside the kolibri home folder):
a) stop kolibri
b) ensure no process has the database open with lsof db.sqlite3. The output should not have anything but a warning.
c) check the kolibri database filesize with ls -lh db.sqlite3
d) to ensure you have sqlite3 installed: sudo apt install sqlite3
e) execute :
sqlite3 db.sqlite3
vacuum;
.exit
f) check again the kolibri database filesize with ls -lh db.sqlite3

The output between c) and f) should be quite different if the vacuum process has defragmented the database.
If you still have a file with 1.7 gb there’s no option: either you live with the sloweness of it or you switch to use Postgresql.
If the filesize is much smaller, you are done. You can repeat this process from time to time to keep your database size limited. (or try to use kolibri using the deb installer to check if the vacuum problem is due to the pex envirnoment)

I know these steps could be complex, depending on your Linux knowledge, don’t hesitate to ask for more help if needed.

José

Thanks @jredrejo for the guidance.

I am equipped technically and should be able to manage with broad guidance as you have given above. And thankfully you are there. :slight_smile:

AFAIR I remember I did see a vacuuming successful message without any errors in kolibri.txt log. Hence migrating to postgres may be the next logical step.

I am away from the system hence can try only tomorrow.

Yet, I shall try the manual vacuuming steps to be double sure.

If there are no issues with vacuum then I don’t see a reason for trying the switch to deb.

For sqlite3 to postgres migration should I follow this - https://pgloader.readthedocs.io/en/latest/ref/sqlite.html?

Also spotted this related discussion - https://github.com/learningequality/kolibri/issues/5721, where there is a confirmation that existing data’s migration does not work yet with a kolibri manage command.

hello again @shrenik, I am answering you inline

AFAIR I remember I did see a vacuuming successful message without any errors in kolibri.txt log. Hence migrating to postgres may be the next logical step.

sqlite returns the same code if the vacuum finishes correctly or if it times out. Kolibri has no way to distinct it. Only when there are vacuum errors you would see a different message.

If there are no issues with vacuum then I don’t see a reason for trying the switch to deb.

yes, if you don’t see changes in the filesize, there are no reasons. If you see changes, you should switch. There might be some problem inside your pex environment avoiding the vacuum to happen.

For sqlite3 to postgres migration should I follow this - SQLite to Postgres — pgloader 3.6.9 documentation?

I don’t think there’s any documentation (yet) focused in Kolibri, so that seems the best approach. All the postgresql installations I am aware have started from a clean database, not trying to import the data from sqlite.

Also spotted this related discussion - kolibri cannot start after switching from sqlite to postgresql database · Issue #5721 · learningequality/kolibri · GitHub, where there is a confirmation that existing data’s migration does not work yet with a kolibri manage command.

This was a very special case: the user had done the migration on a sqlite database, and then switched the configuration to use postgresql with old tables. The system thought that the migration was already done and didn’t trigger it. In normal circumstances this should not happen and migrations work normally when using postgresql.

José

Hey @jredrejo. Thanks once again.

Tried with manual vacuuming -

So no need to shift to DEB as of now. Will seriously consider switching to postgres.

Please guide me with the exact sequence of steps to shift to postgres. I will try the pgloader approach referenced above to create a postgres db. What needs to be done before and after at Kolibri’s end to ensure Kolibri uses it properly?

@jredrejo In my offline discussions with @richard, he mentioned using django-admin would be better than pgloader due to possible schema mismatch issues that may arise if done through pgloader. He also pointed me to the django-admin documentation but am a complete newbie to django. Could you please guide me with some instructions so as to remain on course?

@jredrejo, @richard Is any documented step-wise guidance now available to migrate from sqlite to postgres db? I am a complete novice at django.