How not to do cluster monitoring.

The world should have blinkenlights on its computer systems. That’s a given.

I wrote a couple of things. One was a Python program that pinged the four machines forming the cluster, and displayed a red or green light on a Unicorn HD HAT to show their status. It worked very nicely. Then I wrote Python code to form part of any program that would be run in parallel on the cluster, which would send a signal saying whether each core was busy or not. That worked nicely too, and I now had a row of 16 LEDs, in red or green, so I could see what was going on. It was very pretty.

Unfortunately, as it worked by sending a file by FTP every time a processor core changed between running and idle, it created a very effective Denial of Service attack on our network. Oops.

Now that I have thought about it more carefully, I shall be constructing a much better monitoring system, which will be based on sockets. I’ve been avoiding learning how to use them for far too long, anyway…

Later:

I tried umpteen example programs using sockets, but the connections were all rejected, and I couldn’t work out how to fix that. Suggestions, anyone?

Using a Python program to query the cluster computers took nearly six seconds to look at the 16 cores, hardly blinkenlights… A quick hack of a bash script, astonishingly, took almost as long. Back to trying to get sockets to work, then…

Working sockets tutorial!

At last, I found a socket programming example that worked, here!

I wanted to give Zan a tiny donation, but sadly his GoFundMe page seems defunct, and possibly the message I tried to send him also failed…

Sadly, I was then unable to work out how to accept multiple connections from the cluster computers.

Threading sockets programs!

There’s another set of client-server demos on GitHub, here, that I tested with Marvin and two of the Oysters, to confirm that it can do what I want. I can hoik code from those while retaining the program logic, and maybe get all four Oysters to send their status to Marvin, for him to display. I am not at all bothered that I am writing control system code for the cluster, instead of getting round to some fun applications of parallelism.
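As a rough illustration of the "several clients, one display machine" idea, here is a minimal sketch using Python's standard `socketserver` module, which starts a new thread for each connection. It is not the code from the GitHub demos mentioned above. The port number (5005), the one-line message format `host core busy|idle`, and all of the function and variable names are assumptions made up for this example. One thing worth checking when connections are refused: a server bound to `127.0.0.1` only accepts connections from its own machine, so the sketch binds to `0.0.0.0` by default.

```python
import socket
import socketserver
import threading

# Latest reported state per (host, core). The names and the message
# format are illustrative assumptions, not taken from any existing code.
core_state = {}
state_lock = threading.Lock()


class StatusHandler(socketserver.StreamRequestHandler):
    """Handle one client connection; each line is 'host core busy|idle'."""

    def handle(self):
        for raw in self.rfile:
            parts = raw.decode("ascii", errors="replace").split()
            if len(parts) != 3:
                continue  # ignore malformed lines
            host, core, state = parts
            if not core.isdigit() or state not in ("busy", "idle"):
                continue
            with state_lock:
                core_state[(host, int(core))] = state


class StatusServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # allow quick restarts on the same port
    daemon_threads = True       # handler threads exit with the program


def start_server(address=("0.0.0.0", 5005)):
    """Start the server in a background thread and return it."""
    server = StatusServer(address, StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def report(server_address, host, core, busy):
    """Client side: send one short status line, then disconnect."""
    line = f"{host} {core} {'busy' if busy else 'idle'}\n"
    with socket.create_connection(server_address, timeout=2) as conn:
        conn.sendall(line.encode("ascii"))
```

In this sketch the display machine would call `start_server()` once and read `core_state` whenever it redraws the LEDs, while each cluster program would call something like `report(("marvin", 5005), "oyster1", 2, True)` when a core changes state. A few bytes over a socket per change is far lighter than an FTP transfer per change.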

Fear and Loathing with mpirun

My Raspberry Pi cluster, named “Oyster” because of something to do with the Walrus and the Carpenter, had been out of commission for months, so I eventually got to work and tried to set it up from scratch, as an alternative to checking everything over and over again, and failing to find anything wrong. Oyster used to look like this, but the Pi 3s in Lego compatible cases ran too hot.

At first, I attempted to use my only Pi 4, “Marvin”, as the control machine for the four Pi 3s in Oyster, but I got something wrong in the setup, and it didn’t work. I had been thinking 20 cores would obviously be better than 16, but suspected the two different user names in use might be causing the problem. It probably wasn’t, as I now think the ssh communication for the cluster is done anonymously. Possibly. Anyway, I changed my mind about 20 cores, when I thought about the other tasks Marvin runs, and how I didn’t want them slowed down. Sixteen will do. Unless I get some more Pi 3s and add them to the cluster…

Anyway, I went through all the setup described in Ashwin Pajankar’s e-book* about Raspberry Pi “supercomputers”, again. Twice. I found some online guides, and checked the setup with those, too. I could run programs on the four cores of oyster0, but not on the other three Pis. Eventually, I spotted that I had forgotten to create the file “known_hosts” on the controlling Pi. The message passing software, python3-mpi4py, would probably have told me this, if I had run it in verbose mode, but I didn’t. Still, it now works, and I have a sixteen-core Pi cluster running.

I have added a program, running on Marvin, that lights up LEDs to show the state of the Pis in the cluster, and intend to add further blinkenlights to show the activity of each of the sixteen cores.

# Program to monitor status of Oyster cluster, displaying up/down indication.
import time
import subprocess as sp

# Each machine gets its own column on row 12 of the Unicorn HD display.
columns = {"oyster0": 0, "oyster1": 4, "oyster2": 8, "oyster3": 12}
row = 12

while True:
    for machine, column in columns.items():
        # ping exits with status 0 if the machine replied; hide its output.
        state = sp.call(['ping', '-c', '1', machine], stdout=sp.DEVNULL)

        if state == 0:
            colour = "0 200 0"  # Green means UP
        else:
            colour = "200 0 0"  # Red means DOWN

        message = f"set_pixel {row} {column} {colour}"

        # One file per machine, read by the Unicorn HD server program.
        with open("/home/chris/ftp/files/blinkenlights" + machine, 'w') as fp:
            fp.write(message)

        time.sleep(1)

This code sends a text file to my Unicorn HD server program, which enables more than one program to write to its 256 LEDs, without messing up each other’s displays.
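The server program itself isn't shown here, so the following is only a sketch of how the receiving end might check one of these messages before touching the LEDs. The function name `parse_message` is hypothetical, and the 16 by 16 limit is an assumption based on the display having 256 LEDs. The message shape, `set_pixel` followed by row, column, and three colour values, is the one built by the monitoring program above.

```python
def parse_message(text):
    """Parse 'set_pixel <row> <column> <r> <g> <b>' into a tuple of ints.

    Returns None if the text does not have that shape or a value is
    out of range (assumed: 16 x 16 pixels, colours from 0 to 255).
    """
    parts = text.split()
    if len(parts) != 6 or parts[0] != "set_pixel":
        return None
    try:
        row, column, red, green, blue = (int(p) for p in parts[1:])
    except ValueError:
        return None
    if not (0 <= row < 16 and 0 <= column < 16):
        return None
    if not all(0 <= c <= 255 for c in (red, green, blue)):
        return None
    return row, column, red, green, blue
```

Rejecting anything malformed means a half-written file, or a stray one, can't scribble on another program's part of the display.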

My plans for further development include improving my set of framework code for running parallel programs on the cluster and, more (or possibly less) importantly, sending messages to Marvin for the blinkenlights.

[* I haven’t included a link to AP’s e-book, because it’s easy to find online, and he is charging far too much for it, while others are giving us the same information free.]

Raspberry Pi 7 Segment display

Introduction

One of my #RaspberryPi Zeros is called PiClock, and has an 8-digit seven-segment LED display. The program it runs displays the time, and sends it to two other Pis, which display it on Unicorn HD HATs. Between midnight and 8 am, it flashes the message “SLEEP” every five minutes, as well. The software library that it uses can display numbers, and most upper and lower case letters, but not all of them. I rather liked the idea of animating sequences of single segments on it, because, well you know, blinkenlights. I had a look at the software library, “7seg.py”, to see if I could get it to do that.

It turns out that the library uses a Python dictionary to look up the byte to send to the display for each of the characters it can display. Decoding the hexadecimal bytes took a few minutes, working from the code for the digits from 1 to 5.

The first bit is always a 0. The remaining seven are the seven segments, in the order abcdefg, which are laid out like this…

So, the codes for illuminating single segments are as follows…
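The single-segment codes follow directly from the bit layout described above. This short sketch works them out, assuming the leading 0 is the most significant bit (bit 7) and segments a to g occupy bits 6 down to 0 in that order. The function name `segment_code` is made up for the example.

```python
SEGMENTS = "abcdefg"


def segment_code(segment):
    """Return the byte that lights one segment, assuming bit 7 is the
    unused leading 0 and bits 6..0 map to segments a..g in that order."""
    return 1 << (6 - SEGMENTS.index(segment))


for name in SEGMENTS:
    print(f"{name}: 0x{segment_code(name):02x}")
# Prints a: 0x40, b: 0x20, c: 0x10, d: 0x08, e: 0x04, f: 0x02, g: 0x01
```

Codes for whole characters are then just these values OR-ed together. The digit 1, for instance, lights segments b and c, which under this assumption gives 0x20 | 0x10 = 0x30.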

Now to amend the library! I needed some typeable characters to put in the dictionary, ready to be used in strings in the python code. For no obvious reason, I chose a selection of brackets and the tilde character, and amended the library file. The selection of brackets didn’t work!

After trying characters until they did work, I ended up with #][£<$~ as the symbols for the segments abcdefg.

I’m only showing the amended part of the file, where the pattern to send to the display is looked up. The arrangement of the brackets and tilde for the segments is as follows…
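For anyone wanting to try the same thing, here is a sketch of what the extra dictionary entries and a simple animation could look like. It is not a copy of the amended "7seg.py". It assumes the bit order described earlier (a leading 0, then segments a to g from bit 6 down to bit 0), and the names `SYMBOL_CODES` and `chase_frames` are invented for the example.

```python
SYMBOLS = "#][£<$~"  # symbols for segments a, b, c, d, e, f, g

# Map each symbol to the byte that lights just its segment
# (assumed layout: bit 6 = a, bit 5 = b, ... bit 0 = g).
SYMBOL_CODES = {symbol: 1 << (6 - i) for i, symbol in enumerate(SYMBOLS)}


def chase_frames(width=8, loop="#][£<$"):
    """Yield strings of `width` characters that step one lit segment
    around the outer edge of every digit (a, b, c, d, e, f)."""
    for symbol in loop:
        yield symbol * width
```

Each string from `chase_frames()` could then be handed to whatever text-display call the library provides, with a short pause between frames, to make a segment run round all eight digits at once.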

Now I’m ready to program PiClock to do silly animations, which will be fun, and a lot easier than using the WordPress editor. Note to self: See if you can find a WYSIWYG editor for WordPress.