Optimisation of mp4 videos

This question really relates to content, but this looked like the best place for it.

I have the latest Kolibri (0.15.1) installed on an RPi 4B using the RPi image provided, and I have loaded various channels from the online Studio resources, including Khan Academy (US).

The KA channel in particular contains some fairly large videos where the files are 100 - 200MB in size.
Good examples are in the section:
Khan Academy (English - US curriculum) / College, careers, and more / Entrepreneurship /
Interviews with entrepreneurs / Sal chats with entrepreneurs

I noticed that these videos took a long time to load into the browser - sometimes minutes.

The video files are .mp4 type. This type of file needs to be optimised for Web usage so that the browser can start playing the file without waiting for the whole file to be downloaded.
The key optimisation involves moving the metadata block (known as the “moov” block, or atom) to the beginning of the file rather than the end. Once the browser has received the “moov” block it can commence playing the video. An explanation is provided here:
3 Solutions to MP4 Fast Start - How to Optimize Video for Web Fast Streaming?
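For anyone wanting a quick check of whether a given file already has the “moov” block at the front, here is a minimal sketch. It is only a heuristic: it assumes GNU grep and simply compares the byte offsets of the first "moov" and "mdat" markers rather than parsing the MP4 structure properly. The function names are my own, not part of any tool.

```shell
#!/bin/bash
# Heuristic only: compares the byte offsets of the first "moov" and "mdat"
# markers. It does not parse the MP4 box structure, so a stray "moov" or
# "mdat" byte sequence inside the video data could mislead it.

# Print the byte offset of the first occurrence of string $1 in file $2.
# Assumes GNU grep (-a: treat binary as text, -b: byte offset, -o: match only).
first_offset() {
    LC_ALL=C grep -abo -- "$1" "$2" | head -n 1 | cut -d: -f1
}

# Succeed (exit status 0) if "moov" appears before "mdat" in file $1.
moov_before_mdat() {
    local moov mdat
    moov=$(first_offset moov "$1")
    mdat=$(first_offset mdat "$1")
    [ -n "$moov" ] && [ -n "$mdat" ] && [ "$moov" -lt "$mdat" ]
}
```

Usage would be something like: moov_before_mdat video.mp4 && echo "looks optimised" (where video.mp4 is whichever file you want to check).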

I tested all of the large “entrepreneurs” video files from the KA channel and found that if I optimised them the slow loading issue was resolved.

So I suspect that at least some of the video files in the online channel repositories are not optimised for Web usage by default.

Can anyone confirm whether video files in the online Studio channel repositories are routinely optimised or not, please? (Or did I perhaps just find some bad examples?)

Apart from the delay for users in starting the video playing, loading non-optimised files creates a major workload for the Kolibri server and the network. Transferring a 100MB video file will fully consume a 100Mbps Ethernet network for at least 8 seconds (800 megabits at line rate, and longer once protocol overhead is counted), so if even a few users try to launch videos the overall effect can be a very significant loss of performance, and the appearance that the system has “hung”.
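The arithmetic behind this kind of estimate can be sketched as below. It is a line-rate lower bound only (1 byte = 8 bits, decimal megabytes, no protocol overhead), so real transfers will take somewhat longer. The function name is just for illustration.

```shell
# Ideal transfer time in whole seconds for a file of $1 megabytes over a
# link of $2 megabits per second (1 byte = 8 bits; ignores protocol overhead).
transfer_seconds() {
    local size_mb=$1 rate_mbps=$2
    echo $(( size_mb * 8 / rate_mbps ))
}
```

For example, transfer_seconds 100 100 prints 8 for a 100MB file on 100Mbps Ethernet, and transfer_seconds 120 40 prints 24 for a 120MB file over a link that manages about 40Mbps.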

When the video files are optimised, only a small chunk of the video is sent to the browser to get started. As the video plays, further small chunks are sent as required. This allows many users to be active on the system without loss of performance. You can see this “chunking” effect happening in the progress bar at the bottom of the screen which will advance in steps as a large video plays.
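Browsers normally fetch these chunks using HTTP Range requests. As a rough illustration of what a chunk boundary looks like, the sketch below computes the byte range for a given chunk. The chunk size is purely an assumption for the example; the browser chooses its own sizes in practice, and the function name is hypothetical.

```shell
# Print the inclusive byte range "start-end" of chunk number $1 (0-based)
# for chunks of $2 bytes in a file of $3 bytes. The last chunk is clipped.
chunk_range() {
    local index=$1 chunk=$2 total=$3
    local start=$(( index * chunk ))
    local end=$(( start + chunk - 1 ))
    if [ "$start" -ge "$total" ]; then
        return 1                      # chunk lies beyond the end of the file
    fi
    if [ "$end" -ge "$total" ]; then
        end=$(( total - 1 ))
    fi
    echo "$start-$end"
}
```

The result can be passed to curl to request just that part of a file, for example: curl -s -o /dev/null -r "$(chunk_range 0 1048576 120000000)" "$VIDEO_URL", where VIDEO_URL is a placeholder for the address of a video on your own server.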

Any feedback on this would be appreciated


For anyone interested in trying out the optimisation, I used ffmpeg on the Ubuntu command line to optimise individual video files as follows:
$ ffmpeg -i <input.mp4> -movflags faststart -c copy <output.mp4>

I then optimised all the .mp4 content on my Kolibri system by running a simple script (below) over the content on the system uSD card, which is stored at: /KOLIBRI_DATA/content/storage

The script looks recursively through sub-directories for any file whose name includes “.mp4” and applies the optimisation to each file in turn. The (temporary) output file has “.fix” appended to the original filename.
When each file optimisation is complete, the original file is replaced by the “.fix” file.
The script ran for a couple of hours to process the 80GB of content I had downloaded onto the Kolibri system.

---------------------------------
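#!/bin/bash

# Find the pathname of every .mp4 file in all subdirectories (one per line,
# so names containing spaces survive), then optimise each one in the loop
mapfile -t vidfiles < <(tree -if --noreport . | grep '\.mp4$')
for vidfile in "${vidfiles[@]}"; do

   # Apply Faststart optimisation (-f mp4 is needed because ffmpeg cannot
   # infer the output format from the ".fix" extension),
   # then replace the original file only if ffmpeg succeeded
   ffmpeg -i "$vidfile" -movflags faststart -c copy -f mp4 "$vidfile.fix" &&
      mv "$vidfile.fix" "$vidfile"

done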
--------------------------

@tgillett Thank you very much for sharing these comprehensive observations and recommendations about improving videos for streaming! :star_struck:

We do perform optimization steps with videos prior to publishing them as a Kolibri channel, but we’ll certainly include some steps that you mention to further improve. We’ll additionally include these recommendations into the user documentation for Kolibri, Studio and the Ricecooker, so the whole community can benefit from them. Thank you again for sharing! :clap:t4:

@RadinaMatic

We have done some further testing to see how many connections to the Kolibri server we can achieve, in particular for playing videos, as that is obviously the most critical type of content.
We have attempted to simulate a classroom environment as best we can, as described below.

For these tests, Kolibri is running on a recycled laptop (HP 4230s i3, 8GB RAM) which is running Desktop Ubuntu 20.04. We expect that the level of performance from this test laptop running full Desktop Ubuntu is roughly comparable to running the Kolibri RPi image on a RPi 4B+ 4GB RAM device.

The Kolibri server laptop is connected by Ethernet cable to a GLiNet MT300N-V2 pocket WiFi router acting as a WiFi Access Point. This device is running a customised build of OpenWrt 19.04.

The main client devices are 8 recycled laptops (HP 4340s i5 8GB RAM) running Ubuntu 22.04 with Firefox browser set up with disc cache disabled. Each instance of Firefox has 6 open tabs and a different video playing in each tab. Each laptop is connected by WiFi to the Access Point.
Each laptop is rebooted prior to the test to ensure there is no content cached locally.

We used the videos from the Khan Academy (US) channel, Entrepreneurship folder as these video files are all over 100MB in size and play for around 30 minutes.
The video files have been optimised for Faststart as described in the previous post.
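As a back-of-envelope check on what this test asks of the network, the sketch below works out the average bit rate needed when many videos play at once. The figures in the example (120MB per file, 30 minutes playing time, 48 videos) are taken from this thread as round numbers; the calculation ignores initial buffering and protocol overhead, and the function name is just for illustration.

```shell
# Average bit rate in Mbps needed to stream $3 videos at once, each of
# $1 megabytes and $2 seconds long (decimal megabytes, 8 bits per byte).
aggregate_mbps() {
    awk -v size_mb="$1" -v secs="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", size_mb * 8 / secs * n }'
}
```

For example, aggregate_mbps 120 1800 48 prints 25.6, so on average 48 such videos need roughly 26Mbps in total, which is well within what a single Access Point link delivered in our tests.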

An additional 8 client devices are also connected to the Access Point by WiFi, and these are used to manually run ad hoc video and exercise pages interactively to subjectively judge the performance of the server.

Following are some screenshots showing results of our testing.

  1. Kolibri Network - Non-Optimised Video

This screenshot shows the Performance app screen for the Kolibri server and what happens on the network between the Kolibri server and the Access Point when one user requests one of the videos which is about 120MB in size, and the video file is not optimised for FastStart.

You can see that the network runs at around 40Mbps for 30 seconds to move the whole file to the client. The rate is actually limited by how fast the Access Point can move the data to the Client device, which is around 40Mbps.
Obviously the WiFi network is fairly well saturated during this period and other clients do not get much response from the server.

  1. Kolibri Network - Delivering 48 concurrent videos (optimised)

This screenshot shows the Performance app screen for the Kolibri server, as well as output of the ‘top’ command in a terminal window.

You can see that the network is delivering a series of ‘chunks’ of the 48 video files, plus traffic from the other client devices, and is obviously not saturating for any period of time. The ‘top’ command window shows that the Kolibri server is running with a load average of around 0.76 and so is not overloaded.

Using the secondary clients for manual ad hoc access to content during this time gives no impression that the response from the server has degraded.
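When reading Load Average figures like these, it helps to relate them to the number of CPU cores, since a load of 1.0 per core is roughly the point where work starts to queue. A minimal sketch for Linux follows; the function name and the optional arguments are my own additions for illustration.

```shell
# Print the 1-minute load average divided by the number of CPU cores.
# $1: path to a loadavg file (default /proc/loadavg, Linux only)
# $2: number of cores (default: output of nproc)
load_per_core() {
    local file=${1:-/proc/loadavg} cores=${2:-$(nproc)}
    awk -v cores="$cores" '{ printf "%.2f\n", $1 / cores }' "$file"
}
```

Running load_per_core with no arguments on the server reports the current figure. On a four-core machine a load average of 0.76 works out to 0.19 per core.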

  1. Access Point performance

This screenshot shows the ‘top’ command running on the OpenWrt Access Point as well as the Network Status page at the same time as the screenshot above from the Kolibri device.

On the Network Status page you can see there are 18 connections to the Main AP.
The ‘top’ command shows that the device is running with a load average of 0.38, and so is not at all overloaded.

  1. In Summary


These results are encouraging and indicate that it should be possible to run larger numbers of concurrent clients from a single Kolibri server, and hopefully also using an RPi as the server.

A key point is that the video files must be optimised for FastStart to get these results.

Our idea is to attach multiple Access Point devices to the Kolibri server using Ethernet cables and a network switch, and to place one Access Point in each classroom so that the student’s client devices will get good quality WiFi connections. We have successfully used these AP devices previously in classrooms with up to 35 connected student devices.

We plan to run the same tests using a RPi Kolibri server over the next couple of weeks to see if the results can be replicated.

  1. Feedback

We would greatly appreciate any feedback from others who may have been down this path, successfully or otherwise.

Thanks in advance.


PostScript
In my previous post I had a typo in the optimisation script around the extension of the optimised (temporary) file.
The temporary file name needs to end in .mp4 so that ffmpeg can infer the output format.
Apologies for any inconvenience this may have caused.
The corrected script is as follows:

---------------------------------
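#!/bin/bash

# Find the pathname of every .mp4 file in all subdirectories (one per line,
# so names containing spaces survive), then optimise each one in the loop
mapfile -t vidfiles < <(tree -if --noreport . | grep '\.mp4$')
for vidfile in "${vidfiles[@]}"; do

   # Apply Faststart optimisation, writing to a temporary ".mp4" file,
   # then replace the original file only if ffmpeg succeeded
   ffmpeg -i "$vidfile" -movflags faststart -c copy "$vidfile.mp4" &&
      mv "$vidfile.mp4" "$vidfile"

done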
--------------------------

Hi @tgillett,

Thanks for this in-depth analysis - we’ve implemented your suggestions for any new videos coming into the KA-EN (US) channel in the next iteration of the channel, and will also look at doing targeted recoding of specific video files that are particularly large based on your recommendations here.

Note that out of an abundance of caution we aren’t yet going to recode all the video files, as we don’t yet do binary diffing of video files, so any updates require a full redownload.

If there is an exhaustive list of specific files, or some sort of size cut off for files that would most benefit from recoding, I’d be interested to hear!

Kind Regards,
Richard

Hi @richard

I understand your reservations about recoding the videos in bulk. An abundance of caution is barely enough!
I expect that there will be some testing undertaken to validate our results before making any changes to the content repositories.

For testing, I have recoded the video files on a specific Kolibri device (eg Ubuntu laptop) by installing the optimisation script in the /KOLIBRI_DATA/content/storage directory and running it from there. You need to install the ffmpeg and tree packages. I expect it should run ok on RPi.
The alternative is to remove the SD card and plug it into a laptop and run the optimisation there.

To answer your question about what size files to target for optimisation, I think that the answer eventually will be that every video file needs to be optimised.

While the effect of serving a few 100MB+ video files is pretty dramatic in consuming the network bandwidth for many seconds, the cumulative effect of sending many smaller video files, which are mostly in the 2 to 10MB range, can be equally detrimental to the system performance.

But as a starting point, I suggest that any video file over 10MB should be optimised.
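To pick out the files above a size cut-off like this, something along the following lines should work. It is a sketch assuming GNU find; the function name and the 10 MiB default are just illustrations of the suggestion above.

```shell
# List .mp4 files under directory $1 that are larger than $2 MiB (default 10).
# find's -size +NM test rounds sizes up to whole MiB, so a file must exceed
# N MiB to match.
large_mp4_files() {
    local dir=$1 min_mib=${2:-10}
    find "$dir" -type f -name '*.mp4' -size "+${min_mib}M"
}
```

For example, large_mp4_files /KOLIBRI_DATA/content/storage lists the candidates on a Kolibri RPi image, and a second argument changes the cut-off.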

Our goal is to be able to serve 100 typical classroom users concurrently from a Kolibri RPi server, using good Access Points in each classroom. This is just a simple matter of economics.

I understand from Radina’s comment in an earlier post, that the currently accepted level of performance is that an RPi server will support ~25 concurrent users in a typical classroom scenario, and that the delivery of video content is a major determinant of this figure.

Our recent tests have given us some confidence that we should be able to make some improvements to the numbers, but it is early days yet. Trying to simulate large numbers of users for testing is not simple, and we have to make some brave approximations. I am sure you have been down this path before.

It would be interesting to see how some of your tests that led to the 25 concurrent user figure are affected by optimising the videos, particularly the smaller ones that are used in bulk.

Regards
Terry

Hi @richard
We have done some follow up tests using a Kolibri RPi (4B+ 4GB RAM) server.
The results were something of a surprise.

It quickly became obvious that we would not be able to get the same level of performance as in the previous tests where we used an i3 laptop as the Kolibri server, so we started with a lower level of load to see what was happening. The following screenshots show the results.

The test set up is essentially the same as previously, with the Kolibri RPi server connected by Ethernet cable to an Access Point to which client laptops connect via WiFi.
The server was running the Kolibri RPi image.
The video files used have been optimised.

  1. Test Stage 1
    Three client laptops connected, each with two browser tabs open and running one of the Entrepreneur videos from Khan Academy (US) channel. Below is a screenshot of the ‘top’ command running on the Kolibri RPi server.

You can see that the Load Average has risen to 4.19 from a value of 0.58 before the test was started.
At this point the Kolibri system is still reasonably responsive to user requests.
The Load Average value slowly reduces over time.

  1. Test Stage 2.
    The load is increased to four browser tabs each running one of the videos.
    Below is the ‘top’ command output.

You can see that the Load Average value has gone up to 5.25.
At this point the system is becoming fairly unresponsive to user requests and it is effectively not possible to add any further load with extra clients.

  1. Access Point Performance
    The following screenshot shows the output of the ‘top’ command for the OpenWrt Access Point (GLiNet MT300N-V2) for the Stage 2 Test.

You can see that the OpenWrt system is not experiencing any significant load, with a Load Average of 0.04.

  1. In Summary

    We were surprised with the results. We had expected that the RPi device would have performed similarly to the i3 laptop running Desktop Ubuntu. But in fact the level of performance is much lower, and indicates that the RPi would not service a full classroom of students (which Radina pointed out earlier).

The overall improvement we expected to get from optimising the videos has not resulted in better overall performance (although large non-optimised videos will bring the system to its knees if even one user makes the request).

We don’t understand what is generating the workload on the Kolibri server.
The ‘top’ command shows that the main load is from the kolibri, nginx and uwsgi processes.
None of these seem to be consuming huge amounts of system resources.
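One way to get a clearer picture than eyeballing ‘top’ is to total the CPU figures per process name, since uwsgi and nginx each run several processes. A minimal sketch follows; the function name is my own. Note that ps reports CPU averaged over each process's lifetime, so the numbers will not match the instantaneous figures in ‘top’.

```shell
# Sum %CPU per command name from "command %cpu" lines on standard input,
# and print the totals sorted from highest to lowest.
sum_cpu_by_command() {
    awk '{ cpu[$1] += $NF } END { for (c in cpu) printf "%s %.1f\n", c, cpu[c] }' \
        | sort -k2,2 -nr
}
```

It can be fed from ps like this: ps -e -o comm= -o pcpu= | sum_cpu_by_command | head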

We know that just serving the optimised videos doesn’t itself create a great workload as we can serve them directly from a web server (lighttpd) running on the Access Point, and deliver 30 videos concurrently just from the tiny OpenWrt system running on the AP.

So what other workload is consuming the resources on the RPi???

  1. More investigation required


Regards
Terry

Hi @richard

We decided to run some more tests on the RPi as a Kolibri server with different system configurations to see if we could get some understanding of the performance issue.

We made a fresh installation of RPiOS and then installed Kolibri from the .deb file.
We did the same thing with Ubuntu and installed Kolibri from the PPA repository.

The following test results were taken from the RPiOS version, but the results for the Ubuntu version were essentially the same.

After installing Kolibri we installed the Khan Academy Entrepreneurship videos, then ran the ‘optimise.sh’ script over the ~/.kolibri/content/storage directory to optimise all the video files for Faststart.

As in previous tests, the RPi was networked to an Access Point by Ethernet cable, and the client laptops connect to the AP via WiFi.

For an initial quick test we set up two client laptops and opened 6 videos running in browser tabs on each. We then opened another 6 videos on each client laptop, so that there were a total of 24 videos running.

  1. RPi loading
    The screenshot below shows the output of the ‘top’ command running on the RPi.

You can see that the Load Average is 0.34, so the RPi is not experiencing any serious loading.

  1. Network traffic
    The screenshot below shows the WiFi network traffic on one of the client laptops.

You can see that there are bursts of data traffic occurring for a second or so about every 10 seconds on average.
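The same bursts can be observed without a graphical monitor by sampling the interface byte counter that Linux exposes under /sys/class/net. The sketch below converts two readings into an average rate; the function name is hypothetical and the interface name in the usage note is an assumption that will differ between machines.

```shell
# Print the average receive rate in Mbps between two readings of an
# interface byte counter: $1 = earlier bytes, $2 = later bytes, $3 = seconds.
rx_mbps() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b - a) * 8 / s / 1000000 }'
}
```

For example, read /sys/class/net/wlan0/statistics/rx_bytes into a variable, sleep for one second, read it again, and pass both values with 1 as the interval. A jump of 5000000 bytes in one second gives 40.0.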

  1. In Summary


This is an entirely different test result from what we obtained in previous testing of the Kolibri RPi image installation, and more like what we expected in comparison to the i3 laptop server test.

The Load Average figure for the RPi in these system configurations indicates that it does not experience any great load while delivering 24 videos to the client laptops, and the user experience is that the Kolibri server remains quite responsive at this level of load. I expect that we will be able to increase the load to 48 videos as in the i3 laptop tests.

I cannot offer any explanation for the difference in test results, but I would appreciate any thoughts as to what might be going on.

One notable difference in the ‘top’ command output with these latest tests is that the ‘uwsgi’ process does not feature.

Meanwhile, I think we now at least have a workable Kolibri RPi server configuration that we can use for field testing across multiple classrooms.

Regards
Terry

Hi @tgillett,

To be sure I understand: the principal difference between these two test runs is that in the first instance you were using the RPi image, and in the second you just installed the deb package?

Kind Regards,
Richard

Hi @richard

That is essentially correct.

For the RPiOS test we installed the .deb package and for the Ubuntu test we installed from the PPA.

Regards
Terry

Hi @richard
I went back to review the various test results to make sure we had not made a mistake anywhere in the setup. Everything looked ok, but when I checked the first set of tests I noticed that the image used was version 0.15.1 which I had downloaded a few weeks ago when we started to prepare our field trials. When I checked the current download page I saw that the version available is now 0.15.5.

So I downloaded the new version and set up a fresh SD card, loaded the KA Entrepreneurship videos, optimised them and ran the test sequence again.

The results were much better than those from the 0.15.1 version - we were able to launch around 24 videos concurrently without much difficulty. We did notice that the Load Average figure climbed to around 3.0 as the videos were sequentially launched, and it eased back to around 1.0 after some time as the videos played.

By comparison, the Load Average figure for the test using RPiOS and the Kolibri .deb file did not climb much above 1.0 and settled back to less than 0.5.

To sanity check, I went back and re-ran the tests using the 0.15.1 image and got the same sort of results as previously, with only about 12 videos being able to get started and the Load Average hitting as high as 8.0 and not settling back much below 1.5 even with a limited number of videos running.

So it seems that image 0.15.5 is considerably better than 0.15.1, and is likely good enough to use across at least a couple of classrooms. (At this stage we can’t really equate our multiple video launching test to a real world classroom load - that is the next step for us.)

However it seems that the Kolibri RPi 0.15.5 image creates a load on the RPi which is still noticeably higher than that created by the RPiOS/.deb combination.

I understand from the documentation that the RPi image includes some additional features to utilise the multiple cores in the RPi4 and some form of caching.

Is there some documentation that describes the differences in more detail please?
Or, if not, perhaps you could expand the explanation a bit.

I guess the question I am coming to is what do we miss out on if we build RPi servers using RPiOS and .deb file compared to using the Kolibri RPi image?
(Obviously the image is less effort to set up as it is already configured with hostapd, dnsmasq etc)

Really the only reason we would consider using an alternative to the Kolibri RPi image would be to be able to service a significantly larger number of students from the one RPi server device. Our initial test results indicate that this may be the case, but we need to do additional testing to get a more quantitative comparison, and get more real world numbers.

Thanks in advance for any feedback.

Hello @tgillett
I’d like to know if the difference in the CPU load you’re noticing is due to the setup the RPi image uses for kolibri or due to the other services it uses (mainly hostapd and dnsmasq).
On the kolibri side the main difference is the use of the kolibri-server package, which sets up redis caching plus nginx and uwsgi to make use of the multiple cores the RPi provides. In the GitHub repository learningequality/kolibri-server (“A performance-boosting access layer for Kolibri with multi-core support and improved caching”) you can see how these services are configured.
If you can test it, it would be great to know your results when installing kolibri-server instead of kolibri in your RPiOS/.deb installation. The package is available in the same repositories as kolibri, or you can download it directly from https://launchpad.net/~learningequality/+archive/ubuntu/kolibri/+files/kolibri-server_0.4.0-0ubuntu2_all.deb . You don’t need to uninstall kolibri; just install kolibri-server and it will be used.

If you get the same results as when using the RPi image, then this package is the one to investigate; if not, it would be the dnsmasq/hostapd configuration.
It may be that the caching service kolibri-server uses via redis is causing this CPU load, but that is a win later if you continue using the application. It could also be that some of the proxying capabilities nginx is using produce the difference. If the gap is not too large I think it’s worth keeping, because it should improve browser navigation when using files that are not videos and are easily cacheable.

Thanks
José

Hi @jredrejo

Thanks for your reply and the explanation of the differences between kolibri and kolibri-server.

We don’t plan to use the RPi WiFi in our setup as we use dedicated AP devices, so the hostapd and dnsmasq processes will be disabled by default. But I will check to make sure this is also the case in our tests.

One noticeable difference in our tests was the presence of uwsgi in the ‘top’ command output where it shows up often towards the top of the cpu usage list when running the Kolibri RPi image (ie the kolibri-server version). So I suspect some of the resource usage difference may be associated with this.

I will run the test to compare kolibri and kolibri-server as you suggest, although it will be a couple of days before we can fit it in to our schedule.

As an aside, we also have caching capabilities in our classroom APs, but we don’t want to look at introducing that until we have the basic system operating in a stable manner.

Thanks again for your help.

Terry

Perfect, thank you. Looking forward to knowing your results using kolibri-server, it’d be great if we can optimize its setup.

Also, it’s normal for uwsgi to be near the top of the list: when using kolibri-server, it runs the Python code, replacing the kolibri command. There are also several instances of it, depending on the number of cores in the server’s processor. That produces more initial load on the machine (it has an autoscaling setup that reduces or increases the number of threads depending on the traffic), but it also provides more performance when Python operations are done on the server.
When using kolibri alone all the calculations have to be done one by one, while with uwsgi they can be done in parallel. I think it’s logical to suppose this will produce more load in scenarios where just static files are served, but better performance when calculations are executed, as kolibri is then not restricted to running on only one core of the machine. The most voted response in this thread explains it quite well.

@jredrejo
Thanks for the explanation.

I am a little wary that performance comparisons based on our simple video launching test may not be valid in real world usage due to the difference between static content and content that requires calculations, as you mention.
It would be good to devise a test that better represents real world usage!

By way of background, we have two key requirements :

  • Firstly we need to be able to support as large a number of students as possible from each server to make the system economics work in our target schools which are very remote and low budget.

  • Secondly we need the system to be really stable and robust. Instability causing distraction in the classroom is fatal to the acceptance of the system. Any system failure is also fatal as there are simply no technical resources available locally to fix it.
    I am sure many implementation teams are dealing with these same requirements.

So I guess there is a balance to be struck between adding complexity to improve performance against potentially decreasing the robustness and stability of the system.

Interesting stuff!

Hi @jredrejo

A couple of questions please:

  1. Can you nominate some specific Channel content that utilises the redis/uwsgi multi thread capabilities in kolibri-server please?
    I would like to see if we can incorporate this type of content in our testing if possible.

  2. As an alternative to substituting kolibri-server for kolibri, would it be possible to use kolibri-server (ie install the Kolibri RPi image) and then stop/disable the redis/uwsgi services as required for A/B comparison testing?
    This would make setting up for testing much simpler.

Regards
Terry

hello @tgillett

Can you nominate some specific Channel content that utilises the redis/uwsgi multi thread capabilities in kolibri-server please?

That will not be determined by a Channel, but by the type of interactions: a typical example would be several learners in different browsers interacting with PDFs, exercises, etc. while a coach watches those interactions in real time in their browser, as explained in “Coach your learners in Kolibri” in the Kolibri User Guide. Most of the content is static and cached by kolibri, so the CPU work there is very low. The only exception could be HTML content that is zipped and has to be extracted while being served; examples can be found in the PhET Interactive Simulations channels.

As an alternative to substituting kolibri-server for kolibri, would it be possible to use kolibri-server (ie install the Kolibri RPi image) and then stop/disable the redis/uwsgi services as required for A/B comparison testing?

Sure, you can do:

sudo systemctl stop kolibri-server
sudo systemctl stop nginx

to stop kolibri-server, and then
kolibri start
to switch to the standard kolibri package.
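For repeated A/B runs, the commands above could be wrapped in a pair of helper functions along these lines. This is only a sketch: the function names and the DRY_RUN switch are my own, and the reverse sequence in use_kolibri_server is an assumption that simply undoes the steps above (it is not given in this thread), so check it on your own system first.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Switch to the standalone kolibri process (commands from the reply above).
use_kolibri() {
    run sudo systemctl stop kolibri-server
    run sudo systemctl stop nginx
    run kolibri start
}

# Switch back to kolibri-server. ASSUMPTION: this reverse sequence is not
# given in the thread; it simply undoes the steps above.
use_kolibri_server() {
    run kolibri stop
    run sudo systemctl start nginx
    run sudo systemctl start kolibri-server
}
```

Setting DRY_RUN=1 before calling either function prints the commands instead of running them, which is a safe way to check the sequence.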

Regards
José

Hi @jredrejo

Thanks, that is very helpful.

  1. So if we can incorporate launching some PHET pages in our testing that would generate some workload that should be better handled by kolibri-server.
    Is that a correct understanding?

  2. That is a nice way to switch the kolibri package. Thanks.

  1. So if we can incorporate launching some PHET pages in our testing that would generate some workload that should be better handled by kolibri-server.
    Is that a correct understanding?

Yes, but the best way to load the server is to do things like the coach real-time reports. With the PHET pages, the content will be unzipped and then statically served, so depending on the size of the initial file, it will produce CPU spikes but not a permanent load.

Thanks.

Automating coaching might be an interesting project :wink:

We will take a look.

If you want to get really into the weeds here - one thing we have looked into but not tried yet internally is the Chrome Recorder tool (“Record, replay, and measure user flows”, Chrome Developers). This can produce Puppeteer scripts that can then be programmatically replayed.

This could then be combined with something like puppeteer-loadtest (on npm) to launch multiple headless browsers in parallel to test the server.