Automating and Updating Large Channels in Kolibri Studio

Hello,

I’m working for an organization and need to upload content from our website to Kolibri Studio.

I’m making a channel called PSL (Pakistan Sign Language) Dictionary. I have around 80 categories and an average of 100 videos per category.

What would be the best (most efficient) way to automate this process, keeping in mind that adding and removing videos (and their metadata), as well as adding and removing categories (and their metadata), will be inevitable in the future?

From my experience, there are two potential options (both involve ricecooker):

  1. Writing a sushi chef script that reads in data (CSV) and creates the channel and its contents. For updates, we would add control columns to the CSV and matching logic to the script, and apply the changes by re-running the script. For example, to remove a video, we would add a “delete” column to the CSV, set it to 1, and handle it accordingly in the script on the next run. This re-running mechanism seems inefficient, since ricecooker downloads and uploads files even when they are already stored in the cache. This matters especially for a huge channel: if I simply want to edit a video’s name (which I can’t do manually, since API-generated channels are not editable), my script would re-process thousands of videos all over again.
  2. Using the CSV Workflow, which automates the whole upload process as long as the channel and its contents follow a specific CSV format. This has a major limitation when it comes to updating content.
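Option 1 could look roughly like this: a minimal, standard-library-only sketch of reading such a CSV and dropping rows flagged for deletion before the tree is built. The column names (`category`, `video_id`, `title`, `path`, `delete`) and the `load_tree` helper are hypothetical. In a real chef, each resulting entry would still have to be turned into ricecooker nodes (for example a `TopicNode` per category holding `VideoNode`s); that wiring is omitted here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout: one row per video. Column names are illustrative only.
SAMPLE_CSV = """category,video_id,title,path,delete
Adjectives,adj-001,Big,videos/adj-001.mp4,0
Adjectives,adj-002,Small,videos/adj-002.mp4,1
Animals,ani-001,Cat,videos/ani-001.mp4,0
"""


def load_tree(csv_text):
    """Group non-deleted rows by category, preserving CSV order."""
    tree = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows flagged delete=1 are skipped, so the video is simply
        # absent from the tree built on the next run.
        if row["delete"].strip() == "1":
            continue
        tree[row["category"]].append(
            {"source_id": row["video_id"], "title": row["title"], "path": row["path"]}
        )
    return dict(tree)


if __name__ == "__main__":
    for category, videos in load_tree(SAMPLE_CSV).items():
        print(category, [v["title"] for v in videos])
```

Because a chef rebuilds the full tree on every run, “deleting” a video amounts to leaving it out of the tree, and a category whose rows are all flagged simply disappears. Keeping `video_id` stable across runs is what should let a renamed video be recognized as the same item, assuming ricecooker derives node identity from the source ID rather than the title.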

In conclusion, the main issue seems to be updating content once it’s uploaded to Studio, given that the corrections workflow and the StudioApi don’t work anymore, as mentioned in:

We are left with a big gap when it comes to updating content after it has been uploaded. Please let me know your thoughts, and above all: what is the best approach to tackle this, given ricecooker’s limitations? Thank you.

Regards,
Essa, FESF

Hello @essa

First, if you’re using ricecooker, you can update the channel in the future by using the same tool. There’s no need to use the deprecated corrections workflow. Simply use ricecooker again with the same channel ID and your API token to make updates.
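Re-running works because the channel’s identity comes from the chef’s channel info rather than from the upload itself. The sketch below illustrates the idea under an assumption: that IDs are derived deterministically with UUIDv5 from a source domain and a source ID, which is roughly how ricecooker does it as far as we know (the exact derivation may differ). `stable_id` and the example values are hypothetical. As long as those inputs stay the same between runs, the same ID comes out, so the run targets the existing channel instead of creating a new one.

```python
import uuid


def stable_id(source_domain: str, source_id: str) -> str:
    """Deterministic ID: UUIDv5 of source_id inside a namespace derived from the domain.

    Assumed scheme for illustration; not guaranteed to match ricecooker exactly.
    """
    domain_ns = uuid.uuid5(uuid.NAMESPACE_DNS, source_domain)
    return uuid.uuid5(domain_ns, source_id).hex


if __name__ == "__main__":
    # Hypothetical values: a channel-level source ID and a per-video source ID.
    print(stable_id("example.org", "psl-dictionary"))
    # Renaming a video changes its title, not its source ID, so this ID is unchanged.
    print(stable_id("example.org", "adj-001"))
```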

Regarding which option to use for uploading the channel, it’s a tradeoff:

  • CSV Workflow: Requires less effort compared to writing a chef script but offers less flexibility in how the channel is structured.
  • Chef Script: Provides more flexibility but will require additional work to set up and maintain.

Ultimately, the best approach depends on your specific needs and available resources. Keep in mind, since you’re using ricecooker, you can switch between the two methods for different uploads if necessary.

Best regards,
José

Hello again @essa
I forgot to comment on this from your first message:

Re-downloading and re-uploading files that are already in the cache should not be happening unless you’re deleting the ricecooker cache on your local disk. If you do that, the videos will be re-downloaded and processed again. If this is happening to you without deleting the cache, it’s a bug we aren’t aware of, and it should be fixed.
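Judging by the storage paths ricecooker logs (for example `storage/f/4/f41a1f90ef808dd8211c8d14a281f68f.mp4`), the local cache appears to be content-addressed: files are named by a 32-hex-digit digest (MD5-sized) and sharded into subdirectories by its first two characters. The sketch below assumes that layout and MD5 as the hash; `cache_path` and `ensure_cached` are hypothetical helpers, not ricecooker functions. It shows why an intact cache avoids re-processing while a deleted one does not.

```python
import hashlib
import os
import tempfile


def cache_path(storage_dir: str, data: bytes, ext: str) -> str:
    """Content-addressed path: <storage>/<h[0]>/<h[1]>/<h>.<ext>, with h the MD5 hex digest."""
    digest = hashlib.md5(data).hexdigest()
    return os.path.join(storage_dir, digest[0], digest[1], f"{digest}.{ext}")


def ensure_cached(storage_dir: str, data: bytes, ext: str) -> tuple[str, bool]:
    """Write data into the cache unless an identical file is already there.

    Returns (path, was_cached). Deleting storage_dir turns every call into a
    miss, which is the situation in which files get processed again.
    """
    path = cache_path(storage_dir, data, ext)
    if os.path.exists(path):
        return path, True
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage:
        video = b"fake video bytes"
        print(ensure_cached(storage, video, "mp4")[1])  # False: first run, cache miss
        print(ensure_cached(storage, video, "mp4")[1])  # True: re-run, cache hit
```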

Regards
José


Hello José,

I tried the CSV Workflow approach and verified your point that cached files are not downloaded and uploaded again, by watching the send/receive speed under the Performance tab in Task Manager.

I think my confusion came from the fact that, with either method (sushi chef or CSV Workflow), when the files are already in the ricecooker cache, the console still prints a few statements that make it seem as if ricecooker is re-downloading and re-uploading everything.

For example, the following log entries say that the video was found in ricecooker’s local storage, but then suggest that ricecooker uploaded the file to Kolibri Studio again (?):

INFO     2024-09-17 16:23:02 VideoResource - Video preset from /home/essa_fesf/ricecooker-psl-content/ricecooker-script/storage/f/4/f41a1f90ef808dd8211c8d14a281f68f.mp4 = high resolution
INFO     2024-09-17 16:23:03 root - 	Uploaded cfd34a1b335e458e451d256ddd7eed5c.mp4 (162/1183)

Another point of confusion is that ricecooker seems to create the tree all over again whenever you re-run a script, as indicated by the log messages below:

INFO     2024-09-17 16:28:58 root - Creating channel...
INFO     2024-09-17 16:28:58 root - Creating tree on Kolibri Studio...
INFO     2024-09-17 16:28:58 root -    Creating channel Adjectives
INFO     2024-09-17 16:28:59 root - 	Preparing fields...
INFO     2024-09-17 16:28:59 root - (0 of 593 uploaded)    Processing Adjectives (ChannelNode)

Please help me clarify this. Thank you!

Regards,
Essa, FESF

Hello @essa

About your questions:

No, it does not upload files whose hashes are unchanged. If you haven’t removed your local cache or modified a file, it won’t be uploaded again. The logs you see are reporting that the video was already uploaded, which is why only about one second passes between the video being found and the “Uploaded” message.
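The check described above boils down to comparing content hashes. Here is a minimal sketch, assuming the client can learn which checksums the server already stores (how ricecooker and Studio actually exchange that information is not shown). `files_to_upload` is a hypothetical helper, and MD5 is assumed as the hash.

```python
import hashlib


def files_to_upload(local_files: dict[str, bytes], remote_checksums: set[str]) -> dict[str, str]:
    """Return {filename: checksum} for local files whose checksum the server lacks."""
    pending = {}
    for name, data in local_files.items():
        checksum = hashlib.md5(data).hexdigest()
        # Identical bytes give an identical checksum, so an unchanged file is skipped.
        if checksum not in remote_checksums:
            pending[name] = checksum
    return pending


if __name__ == "__main__":
    local = {"adj-001.mp4": b"big", "adj-002.mp4": b"small (re-recorded)"}
    # Pretend the server already holds the first file unchanged.
    remote = {hashlib.md5(b"big").hexdigest()}
    print(sorted(files_to_upload(local, remote)))  # ['adj-002.mp4']
```

Note that editing only metadata, such as a video’s title, leaves the file bytes and therefore the checksum untouched, so under this scheme no re-upload is needed for it.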

I don’t quite understand what you mean: where’s the issue? ricecooker logs the tree and will modify it in Studio if needed. Seeing the tree in the logs does not mean it has been changed; it means ricecooker needs to detect any differences between your new local tree and the one on the Studio server to decide whether changes are needed.
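Finding differences requires visiting every node, even when the result turns out to be “nothing changed”, which is why the whole tree shows up in the logs. The sketch below is a conceptual illustration only, not ricecooker’s or Studio’s actual algorithm. It assumes each tree has been flattened into a map from a stable node ID to that node’s metadata, and `diff_trees` is a hypothetical helper.

```python
def diff_trees(local: dict[str, dict], remote: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two {node_id: metadata} maps and list added, removed, and changed IDs."""
    added = sorted(local.keys() - remote.keys())
    removed = sorted(remote.keys() - local.keys())
    # Nodes present on both sides count as changed only if their metadata differs.
    changed = sorted(k for k in local.keys() & remote.keys() if local[k] != remote[k])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    remote = {"adj-001": {"title": "Big"}, "adj-002": {"title": "Small"}}
    local = {"adj-001": {"title": "Large"}, "ani-001": {"title": "Cat"}}
    print(diff_trees(local, remote))
    # {'added': ['ani-001'], 'removed': ['adj-002'], 'changed': ['adj-001']}
```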

Regards
José
