The code behind my “Stay at Home DJ/Visuals set”

Shane Mannion
Jan 31, 2021

Here’s an audio-visual application I wrote. It was both an artistic and computing endeavour, and in trying to describe it in this article I delve into some detail in both realms. If you are a non-programmer reading this and find me drifting too far into the technical, skip that paragraph — and vice versa for programmers.

I posted a Vimeo recently, which was a big milestone for this project.

It all stemmed from a pretty simple idea: I wanted to run a slideshow of photographs where the image changed in time with the music. I’ve been DJing since the mid-90s. I also enjoy photography, so I liked the idea of showing photographs that reflected the mood of the music. This was intended to add atmosphere and create a more immersive experience. Looking at available software, I found lots of options for working with video alongside music but couldn’t really find anything for photos specifically. What I needed was something where I could configure how often the image changed and synch a start point against the start of a drum pattern.

As a software developer and engineer, I figured that if the software didn’t exist then I should build it myself. I live in Toronto, and driving around the city listening to house music I noticed that the countdown timers at crosswalks would change on the kick drum of the track. Most of the music I was listening to was around 120 beats per minute (bpm) ±3%, or roughly two beats per second, so the once-a-second countdown changed in step with the kick drum.

This was one of those things that, once you notice it, you see it everywhere. It became a constant reminder of this project, in a sense goading me during my daily commute to make progress.

Slideshow Bob

And so Slideshow Bob was born, or Slideshow for short. It is an application written in Java that reads collections of images from folders on my laptop and displays them on a screen. With the laptop connected to a second monitor, the application presents the photos in full-screen mode, which works well for projectors.

The application is multi-threaded, with the following execution flows (sketched in code after the list):

  1. display image
  2. get next image
  3. do background processing, which includes getting input from the user on which images to load into memory and managing this loading
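To make the threading model concrete, here is a minimal sketch of those three flows. The real application is Java/Swing; this Python version is purely conceptual, and all of the names, timings and bpm values in it are placeholders rather than anything taken from Slideshow Bob itself.

```python
# Conceptual sketch only -- the real app is Java/Swing. Names, timings and
# the 124 bpm value are illustrative, not taken from Slideshow Bob.
import threading
import time
from queue import Queue

image_queue = Queue(maxsize=8)   # flow 2 ("get next image") feeds this
beat_interval = 60.0 / 124       # seconds per beat at a hypothetical 124 bpm

def render(image):
    print("showing", image)      # stand-in for pushing the image to the screen

def load_next_image():
    return "image.jpg"           # stand-in for reading the next file from disk

def handle_user_input():
    pass                         # stand-in for polling the MIDI controller

def display_loop():
    """Flow 1: show the current image, advancing in time with the beat."""
    while True:
        render(image_queue.get())
        time.sleep(beat_interval)

def loader_loop():
    """Flow 2: keep the queue topped up with the next images to show."""
    while True:
        image_queue.put(load_next_image())

def background_loop():
    """Flow 3: process user input and manage which folders sit in memory."""
    while True:
        handle_user_input()
        time.sleep(0.01)         # the real app cycles every 10 ms

for target in (display_loop, loader_loop, background_loop):
    threading.Thread(target=target, daemon=True).start()

time.sleep(2)                    # let the demo run briefly before exiting
```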

MVP#1 of Slideshow used a text file for input, where I specified the bpm of the music, which images to display, the change frequency (once per beat, twice per beat, every two beats, etc.) and how many times to loop through a folder before moving to the next entry in the text file. Although it allowed me to meet the original objective of running a slideshow of images x seconds apart, updating a text file with a computer keyboard just didn’t jibe with the flow and energy of DJing.
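The original file isn’t reproduced in this article, but conceptually each entry carried something like the following. The field names and layout here are entirely invented for illustration:

```
bpm: 124
folder: /photos/traffic    changes_per_beat: 1    loops: 4
folder: /photos/crowd      changes_per_beat: 2    loops: 2
```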

MVP#2 evolved to using a MIDI controller for input: the KORG nanoKONTROL2. This was a huge improvement. Often used to control DJ software, this controller was a much more intuitive way to sequence and synch images with beats.

KORG nanoKONTROL2 midi controller

Each of the channels (or control groups) on the controller can be assigned to a folder on my laptop. This allows a maximum of 8 sets of images (potentially hundreds of photographs) to be loaded in memory at once, from which a live video stream can be composed in response to the music, the dance floor dynamics or to light up the room.

Channel or Control Group

The controller assignments are as follows:
1. Knob with 1..128 values linked to individual folders containing images
2. Slider with 1..10 values, representing 0.032, 0.062, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16 image transitions per beat
3. R button used to load the change folder/parameters dialog
4. M button used to open/mute a channel

Transport Controls

The transport controls were also assigned as follows (a short sketch of the MIDI handling follows the list):
1. Fast Forward used to move to the next open channel
2. Rewind used to return to the first image of the current folder, resetting the number of loops to zero
3. Cycle used to invoke the BPM dialog; Track Last used to tap in the beat in this mode
4. Track Next used on the change folder/parameters dialog to increase the number of loops for the new folder (default is 4)
5. Track Last used on the change folder/parameters dialog to decrease the number of loops for the new folder
6. Marker Next used to select the next folder on the change folder/parameters dialog
7. Marker Last used to select the previous folder on the change folder/parameters dialog
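To give a flavour of what receiving these messages looks like in code, here is a small sketch using Python and the mido library. Slideshow Bob itself handles MIDI in Java, and the CC numbers below are placeholders: the real values depend on the nanoKONTROL2 scene configuration, not on anything stated in this article.

```python
# Illustrative only: Slideshow Bob is Java, and these CC numbers are
# placeholders that depend on the controller's scene configuration.
import mido

KNOB_CC = {16 + ch: ch for ch in range(8)}    # placeholder: one knob per channel
SLIDER_CC = {0 + ch: ch for ch in range(8)}   # placeholder: one slider per channel
MUTE_CC = {48 + ch: ch for ch in range(8)}    # placeholder: M button per channel

# Transition rates per beat, as listed above.
RATES = [0.032, 0.062, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16]

def handle(msg):
    if msg.type != "control_change":
        return
    if msg.control in KNOB_CC:
        channel = KNOB_CC[msg.control]
        print(f"channel {channel}: select folder #{msg.value}")   # value is 0..127
    elif msg.control in SLIDER_CC:
        channel = SLIDER_CC[msg.control]
        rate = RATES[min(msg.value * len(RATES) // 128, len(RATES) - 1)]
        print(f"channel {channel}: {rate} image transitions per beat")
    elif msg.control in MUTE_CC and msg.value > 0:
        channel = MUTE_CC[msg.control]
        print(f"channel {channel}: toggle open/mute")

with mido.open_input() as port:   # opens the default MIDI input
    for msg in port:
        handle(msg)
```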

The intention in explaining the controls in this detail is to help the reader conceptualize how this works. The controller maps to the main Slideshow Bob GUI, shown below with labels added to indicate the purpose of each element. This was created using Swing, which is an old Java toolkit, but one I’ve worked with before. Admittedly the GUI looks like it was created in the last century, but as it is never seen by the audience, I went with it rather than invest time in learning a new Java UI framework.

Slideshow GUI maps to KORG nanoKONTROL2 midi controller

My laptop has seen better days, like 2010. It’s a 2008 MBP, so it is 91 in dog years. I suspect Slideshow Bob will run better on a newer machine, which is planned for this year. The performance impact is that video rendering is a little glitchy sometimes. In this video I think that happened just once, where it became stuck on one image for 2 seconds and then rendered a whole bunch together to catch up.

The application is 3,000 lines of code, including unit tests. It cycles every 10 ms, processing input from the MIDI controller, rendering images and reading files into memory. With so many moving parts, I used TDD so that I could code and test small parts in isolation and then integrate them into the application. I am a big fan of unit testing and TDD as part of development: it takes a little more effort up front, but it improves the design dramatically and has a lasting effect on quality. When it came to refactoring the user input from text file to MIDI controller, the existing tests were invaluable. After a period of organic growth where I experimented with different features, I embarked on a major refactor to optimise the design, which again was more manageable thanks to the existing unit tests.

I don’t have a deployment pipeline, as Slideshow runs on my laptop straight from the target folder where the jar is created by a Maven build. The build also runs the unit tests.

A lot of code has been removed through several refactoring efforts, and there is still lots to do: perhaps a bigger controller, and the improvements that should come from running on a faster machine. But at MVP#2 I am pretty happy with how it now stands.

The Photos

When I started out with this project, the images I wanted to show were quite bright.

As electronic music is often played in dimly lit rooms, I found that these brighter images projected in large format were not suitable. It was like bringing all the lights up, which should only happen at the end of the night. Slideshow was detracting from the atmosphere rather than adding to it. Reconsidering the images, I started to experiment with different image-processing techniques and new subjects.

A good friend suggested inverting the images to reduce the light, and I experimented with this approach, first converting colour photos to black and white. I needed to process large numbers of images, which required automation. Python seemed the obvious choice for this type of task, as I’ve used the language for years and find it really useful, and there are lots of open-source libraries available for image processing and other purposes. I used the following approach on a number of sets of images to create high-contrast, low-light images.

The comments in the code cover most of the detail. It cycles through each image file in a folder and applies the same processing to each: open the image, convert to grey scale, invert it. In a black-and-white digital image with a resolution of 1280 × 853 there are more than a million individual pixels. Grey scale has a range of 0 to 255, from black to white. The value max in this code is the whitest pixel in the inverted image. Any values below floor are mapped to black. Values above the prescribed floor are shifted to an evenly distributed map, created using the linspace function from numpy.
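A sketch of that processing, reconstructed from the description above rather than copied from the original script, is shown below. It uses Pillow and numpy; the folder names are placeholders, and the floor of 180 matches the look-up table chart that follows.

```python
# Sketch reconstructed from the description in the text, not the original
# script. Folder names are placeholders; the floor of 180 matches the chart.
from pathlib import Path

import numpy as np
from PIL import Image, ImageOps

SRC = Path("originals")    # hypothetical input folder
DST = Path("processed")    # hypothetical output folder
FLOOR = 180                # grey-scale values below this are mapped to black

DST.mkdir(exist_ok=True)

for path in sorted(SRC.glob("*.jpg")):
    img = Image.open(path).convert("L")    # open and change to grey scale
    img = ImageOps.invert(img)             # invert the image

    max_value = int(np.array(img).max())   # whitest pixel in the inverted image

    # Build a 256-entry look-up table: everything below FLOOR goes to black,
    # and FLOOR..max_value is spread evenly across the full 0..255 range.
    spread = [int(v) for v in np.linspace(0, 255, max_value - FLOOR + 1)]
    lut = [0] * FLOOR + spread + [255] * (255 - max_value)

    img.point(lut).save(DST / path.name)
```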

Map illustrating how grey scale values are mapped based on a look up table with a floor of 180 and max of 254.

The following images show each phase in this transformation.

Original image
Change to grey scale
Invert the image
Apply floored look up table

This approach reduces the light emitted and creates an image that aligns with the machine-driven aesthetic of certain electronic music. Here’s another example of before and after processing.

Before
After

I have used other techniques in this recording too, but this is the one I have used most. Originally I started out with collections of unrelated photos that felt like they worked together. Over the course of the project I became more interested in sequences that captured the motion of one or more subjects, usually taken with a DSLR on a tripod, with a high-speed card that could capture multiple images in a short time frame. The movement in the images reflected the movement in the music.

Electronic music usually has a 4/4 time signature, and changes in a track often coincide with the 8th, 16th, 32nd or 64th beat. Creating groups of photos with either 8 images or a multiple of 8 created more opportunities for changes in the music to coincide with changes in the images. Similarly, looping these groups for 1, 2 or 4 iterations helped create changes that synched in both music and visuals.

The Video

I used a Zoom H4n recorder for the audio, a Canon DSLR for the decks and mixer, and a screen capture of my second monitor to record the visuals output. Weaving these three strands together was very challenging. I didn’t have a good video editor on my laptop, and given its age and spec I didn’t think it worthwhile to pursue that approach. Instead I looked to Python and the available video-editing libraries to make these strands work together. MoviePy is an open-source library with a lot of neat features.

To me this was a compute task where it made sense to do the work in the cloud, rather than struggle with the versions available for the now-unsupported OS on my laptop. As the process of recording this mix (including visuals and video) ran over several months, while I tried different tracks and different images, I created an AWS AMI to configure a Python virtual environment and complete the pip installs for MoviePy and all its dependencies. I don’t recall the exact issue, but I had a problem using MoviePy for something, possibly the positioning of video within video. The solution was to use version 1.0.0 of the library rather than the latest, so I baked this version into my AMI. I used an AWS EC2 instance with a few cores and enough RAM.

I wrote scripts to do picture-in-picture with the video files, switching the focus from the Canon feed as primary to the visuals feed as primary with pieces from the Canon inset. DSLRs have a maximum recording length of 30 minutes, which meant restarting the recording twice for this set and synching the video to the audio and visuals again. Each of the layouts was intended to highlight the most relevant video from the options available, based on what was happening at that point in time.

With a Python script to cater for different combinations of the video feeds and different inset options, I used a csv file to specify which parts of each video file were to be combined. Each snippet of combined video was saved as a separate scene file, with the final version comprising 36 different scenes, which were subsequently concatenated into one video file.
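The scripts themselves aren’t shown here, but the core of a picture-in-picture scene in MoviePy looks roughly like the sketch below, written against the 1.0.x API mentioned above. The file names, timestamps and inset sizing are placeholders, not values from the project.

```python
# Rough shape of one picture-in-picture scene in MoviePy 1.0.x. File names,
# timestamps and inset sizing are placeholders, not values from the project.
from moviepy.editor import (CompositeVideoClip, VideoFileClip,
                            concatenate_videoclips)

visuals = VideoFileClip("visuals_capture.mp4").subclip(120, 180)
decks = VideoFileClip("canon_decks.mp4").subclip(118, 178)  # offset to re-synch

# Visuals feed as the primary layer, Canon feed inset in a corner.
inset = decks.resize(width=480).set_position(("right", "bottom"))
scene = CompositeVideoClip([visuals, inset])
scene.write_videofile("scene_01.mp4", fps=30)

# Once all the scene files exist, they can be stitched together:
scenes = [VideoFileClip(f"scene_{n:02d}.mp4") for n in range(1, 4)]
concatenate_videoclips(scenes).write_videofile("full_set.mp4", fps=30)
```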

Creating each of these scene files and combining them into one final file took around 8 hours on the EC2 instance. Video processing is resource intensive.

At this point there were 27 scene files. All files created here were part of one execution of the scripts. This shows the time taken to build individual scenes and compile into one file.

I tried many times to add the Zoom audio recording to the combined video file but just could not get it working with MoviePy. I looked for other options and found ffmpeg, a free and active open-source project that has been running for 20 years. FFmpeg is a powerful command-line tool for processing video and audio. Had I been able to use MoviePy for this purpose, the script would have taken 2–3 hours to run; ffmpeg could complete the task in a few minutes. Installing ffmpeg is not trivial, but I found great guides for AWS EC2s and the Raspberry Pi, both by the same author. Adding the audio to my finished movie came down to a single command.
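The exact command isn’t reproduced here, but muxing an audio track into an existing video is typically an invocation along the following lines. The file names and flags are my assumptions, and it is wrapped in Python’s subprocess module only to keep it in the same language as the rest of the scripts.

```python
# A typical ffmpeg invocation for adding an audio track to a finished video.
# The specific flags and file names are assumptions, not the command used
# in the project.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "full_set.mp4",         # video: the concatenated scenes
        "-i", "zoom_recording.wav",   # audio: the Zoom H4n recording
        "-map", "0:v:0",              # take the video stream from the first input
        "-map", "1:a:0",              # take the audio stream from the second input
        "-c:v", "copy",               # don't re-encode the video (hence the speed)
        "-c:a", "aac",                # encode the audio to AAC for the mp4 container
        "-shortest",                  # stop at the shorter of the two inputs
        "full_set_with_audio.mp4",
    ],
    check=True,
)
```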

Along with giving an insight into how this project came together, I hope this article might be useful to someone else who is trying something similar or using some of the tools I’ve used. I’ve had some good feedback on the output, which has been much appreciated; it’s great to hear that other people see and hear what I see and hear in this idea. The next step will be to start on another set, spend some time working on the code again and get going on some new photos.

More audio here.
