How not to do cluster monitoring.

The world should have blinkenlights on its computer systems. That’s a given.

I wrote a couple of things. One was a Python program that pinged the four machines forming the cluster, and displayed a red or green light on a UnicornHD HAT to show their status. It worked very nicely. Then I wrote Python code, to be included in any program run in parallel on the cluster, which sent a signal saying whether each core was busy or idle. That worked nicely too, and I now had a row of 16 LEDs, in red or green, so I could see what was going on. It was very pretty.

Unfortunately, as it worked by sending a file by FTP every time a processor core changed between running and idle, it created a very effective Denial of Service attack on our network. Oops.

Now that I have thought about it more carefully, I shall be constructing a much better monitoring system, which will be based on sockets. I’ve been avoiding learning how to use them for far too long, anyway…

Later:

I tried at least umpteen example programs using sockets, and the connections were all rejected, and I couldn’t work out how to fix that. Suggestions, anyone?

Using a Python program to query the cluster computers took nearly six seconds to look at the 16 cores, hardly blinkenlights… A quick hack of a bash script, astonishingly, took almost as long; presumably the overhead of setting up a fresh connection for every query dominates either way. Back to trying to get sockets to work, then…

Working sockets tutorial!

At last, I found a socket programming example that worked, here!
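Boiled down, the working pattern is something like this (a minimal sketch, not Zan’s code; the port number is arbitrary). With hindsight, my earlier refused connections were most likely because nothing was listening yet on the address and port the client was aiming at, or because the server was bound to 127.0.0.1 while the client used the LAN address.

# server.py -- run this first, on the machine that will listen
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 5005))   # 0.0.0.0 means listen on every interface
server.listen()
conn, addr = server.accept()     # blocks until a client connects
print("Connection from", addr, "said:", conn.recv(1024).decode())
conn.close()

# client.py -- run on another machine, pointing at the server
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("marvin", 5005))   # the server's hostname, and the same port
client.sendall(b"oyster1: core 2 busy")
client.close()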

I wanted to give Zan a tiny donation, but sadly his GoFundMe page seems defunct, and possibly the message I tried to send him also failed…

Sadly, I was then unable to work out how to accept multiple connections from the cluster computers.

Threading sockets programs!

There’s another set of client-server demos on GitHub, here, that I tested with Marvin and two of the Oysters, to confirm that it can do what I want. I can hoik code from those while retaining my program logic, and maybe get all four Oysters to send their status to Marvin, for him to display. I am not at all bothered that I am writing control-system code for the cluster, instead of getting round to some fun applications of parallelism.
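The essential trick seems to be one thread per connection. Here is a minimal sketch of that shape (my outline of the idea, not the GitHub code):

# Threaded status server: one thread per cluster node, so all four
# Oysters can stay connected at once. The port number is arbitrary.
import socket
import threading

def handle(conn, addr):
    # read status messages from one client until it disconnects
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            print(addr[0], "reports:", data.decode().strip())

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 5005))
server.listen()
while True:
    conn, addr = server.accept()
    threading.Thread(target=handle, args=(conn, addr), daemon=True).start()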

Success!

I now have a GUI program that runs on Marvin, which takes a program developed on Marvin, deploys it on the four Pis that make up the Oyster cluster, then uses mpirun to run it in parallel on 16 cores, with the results appearing back on Marvin.
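Stripped of the GUI, the heart of it is just a couple of subprocess calls. This is only a sketch of the shape of the thing – the user name and paths are illustrative, not necessarily what I actually use:

# Deploy a program to every node, then run it across the cluster.
import subprocess

NODES = ["oyster0", "oyster1", "oyster2", "oyster3"]

def deploy(program):
    # copy the program to the same place on every node
    for node in NODES:
        subprocess.run(["scp", program, "pi@" + node + ":Programs/"], check=True)

def run(program, cores=16, args=""):
    # log in to the head node and launch the job from there
    command = ("mpirun -hostfile myhostfile -np " + str(cores) +
               " python3 Programs/" + program + " " + args)
    result = subprocess.run(["ssh", "pi@oyster0", command],
                            capture_output=True, text=True)
    return result.stdout

deploy("prime.py")
print(run("prime.py", args="10000"))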

It’s clearly time to knock off and celebrate…

Raspberry Pi Cluster test

I’m just testing my Raspberry Pi cluster, to see if I have sorted out the setup properly this time. Finding the primes up to 10,000 with one core and then with sixteen, followed by using all sixteen cores to find the primes up to 100,000, gave these results…

pi@oyster0:~ $ mpirun -hostfile myhostfile -np 1 python3 Programs/prime.py 10000
Find all primes up to: 10000
Nodes: 1
Time elapsed: 4.34 seconds
pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 10000
Find all primes up to: 10000
Nodes: 16
Time elapsed: 0.34 seconds
pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 100000
Find all primes up to: 100000
Nodes: 16
Time elapsed: 23.78 seconds
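I haven’t reproduced prime.py here, but a program of that shape, using mpi4py, looks roughly like this (a sketch, not the actual code) – each rank tests an interleaved slice of the range, and rank 0 gathers the results and reports:

# Sketch of an mpi4py prime finder. Run with, e.g.:
#   mpirun -hostfile myhostfile -np 16 python3 prime.py 10000
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

limit = int(sys.argv[1])
start = MPI.Wtime()

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

# rank r tests r, r+size, r+2*size, ... so the work is spread evenly
primes = [n for n in range(rank, limit, size) if is_prime(n)]

all_primes = comm.gather(primes, root=0)   # rank 0 receives every slice
if rank == 0:
    print("Find all primes up to:", limit)
    print("Nodes:", size)
    print("Time elapsed: %.2f seconds" % (MPI.Wtime() - start))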

So, it is all working as it should now. The next step is to add blinkenlights on the supervising machine, Marvin, which has a UnicornHD HAT. After that, I want to get my GUI-based supervisor working.

This leaves far too little time to flog stuff on eBay! I shall have to write a program to do that…

Update:

I ran it for the primes under a million, and it was disturbingly slow. I’d hope for something less than ten times as long as for a hundred thousand, but no!

pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 1000000
Find all primes up to: 1000000
Nodes: 16
Time elapsed: 2279.37 seconds

Almost forty minutes. Some slowdown beyond ten times is to be expected if prime.py tests divisors by trial division – the work per number grows with its square root – but this is far beyond even that, so I’m assuming things ended up swapping memory in and out. It’s not a problem, but it is one of the reasons I want blinkenlights…

At last, an app with a GUI!

For a while, I have been turning the camera on the Pi in the greenhouse on and off manually. By that I mean…

  • Connecting to the Pi using VNC
  • Opening the /var/tmp directory in the file manager
  • Creating a file called blind to switch the camera off, or
  • Deleting/renaming the blind file to switch the camera on
  • Disconnecting from the Pi

That’s clearly a huge faff, so I decided I’d finally make a GUI-based program to do the job, using guizero. The other GUI libraries I looked at were either immensely complex or mysteriously impossible to install. I looked at a couple of example programs, and cobbled this together…

import paramiko
from guizero import App, Text, PushButton

def switch_camera_on():
    # removing the blind file tells the camera program to start capturing again
    ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("sudo rm /var/tmp/blind")

def switch_camera_off():
    # creating the blind file tells the camera program to stop capturing
    ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("touch /var/tmp/blind")

# one SSH connection to the greenhouse Pi, shared by both buttons
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("server-name-here", username="user-name-here", password="password-here")

app = App(title="Greenhouse Camera Control", bg="lightblue", width=400, height=200)
welcome_message = Text(app, text="Click button to switch camera state.", size=15)
onswitch  = PushButton(app, command=switch_camera_on,  text="Camera on.")
offswitch = PushButton(app, command=switch_camera_off, text="Camera off.")

app.display()

It does need you to have used ssh-keygen on your systems, so they can communicate. I can’t believe how easy simple stuff like this is to hack out!

Fear and Loathing with mpirun

My Raspberry Pi cluster, named “Oyster” because of something to do with the Walrus and the Carpenter, had been out of commission for months, so I eventually got to work and tried to set it up again from scratch, as an alternative to checking everything over and over and failing to find anything wrong. Oyster used to look like this, but the Pi 3s in Lego-compatible cases ran too hot.

At first, I attempted to use my only Pi 4, “Marvin”, as the control machine for the four Pi 3s in Oyster, but I got something wrong in the setup, and it didn’t work. I had been thinking 20 cores would obviously be better than 16, but suspected the two different user names in use might be causing the problem. They probably weren’t, as I now think the ssh communication for the cluster is done anonymously. Possibly. Anyway, I changed my mind about 20 cores when I thought about the other tasks Marvin runs, and how I didn’t want them slowed down. Sixteen will do. Unless I get some more Pi 3s and add them to the cluster…

Anyway, I went through all the setup described in Ashwin Pajankar’s e-book* about Raspberry Pi “supercomputers”, again. Twice. I found some online guides, and checked the setup against those, too. I could run programs on the four cores of oyster0, but not on the other three Pis. Eventually, I spotted that I had forgotten to create the “known_hosts” file on the controlling Pi. The message-passing software, python3-mpi4py, would probably have told me this if I had run it in verbose mode, but I didn’t. Still, it now works, and I have a sixteen-core Pi cluster running.

I have added a program, running on Marvin, that lights up LEDs to show the state of the Pis in the cluster, and intend to add further blinkenlights to show the activity of each of the sixteen cores.

# Program to monitor status of Oyster cluster, displaying up/down indication.
import time
import subprocess as sp

machines = ["oyster0", "oyster1", "oyster2", "oyster3"]

while True:
    for i, machine in enumerate(machines):
        # one ping, output discarded; call() returns 0 if the host replied
        state = sp.call(['ping', '-c', '1', machine], stdout=sp.DEVNULL)

        if state == 0:
            colour = "0 200 0" # Green means UP
        else:
            colour = "200 0 0" # Red means DOWN

        row = "12 "
        column = str(4 * i) + " "  # oyster0 at column 0, oyster1 at 4, ...

        message = "set_pixel " + row + column + colour

        # drop the message where the Unicorn HD server program will pick it up
        with open("/home/chris/ftp/files/blinkenlights" + machine, 'w') as fp:
            fp.write(message)

        time.sleep(1)

This code sends a text file to my Unicorn HD server program, which enables more than one program to write to its 256 LEDs, without messing up each other’s displays.
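I haven’t shown the server itself, but a minimal sketch of the idea might look like this – it watches the drop directory for message files and applies each one to the HAT. Note the assumptions: unicornhathd is Pimoroni’s library, the directory and message layout follow the monitor above, and which of row and column maps to x is a detail of my display layout.

# Minimal sketch of a file-watching Unicorn HD server.
import glob
import os
import time
import unicornhathd

WATCH_DIR = "/home/chris/ftp/files"

while True:
    for path in glob.glob(os.path.join(WATCH_DIR, "blinkenlights*")):
        with open(path) as fp:
            parts = fp.read().split()
        if len(parts) == 6 and parts[0] == "set_pixel":
            row, col, r, g, b = (int(n) for n in parts[1:])
            unicornhathd.set_pixel(col, row, r, g, b)
        os.remove(path)   # each message file is consumed once
    unicornhathd.show()   # push the updated buffer to the LEDs
    time.sleep(0.1)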

My plans for further development include improving my set of framework code for running parallel programs on the cluster and, more (or possibly less) importantly, having each core send status messages to Marvin for the blinkenlights.

[* I haven’t included a link to AP’s e-book, because it’s easy to find online, and he is charging far too much for it, while others are giving us the same information free.]

Fun with rpi-connect

To begin at the beginning…

It all started when I read a news item saying RealVNC was going to change its terms and conditions unilaterally, so that home users would be limited to connecting to three computers. I was using it to view and control anything up to fifteen computers.

Panic set in. I began to research other free VNC implementations. Then it was pointed out to me that only remote access over the internet would be affected, and I hadn’t been doing that at all. It seemed all the Pis on my network would be unaffected. To make sure things did not get changed under me, I set RealVNC NOT to update automatically on any of them.

During my researches, I heard that a Pi running Wayland instead of X could be remotely accessed, much as with VNC, using rpi-connect, and decided that would be interesting to do anyway. So I needed to update a machine to the Bookworm version of Raspberry Pi OS, which does use Wayland. I have only one Pi 4 at present, and did the thing they tell you never to do – I attempted an in-place upgrade. It ever so nearly worked. But it wouldn’t run rpi-connect. So, I pulled the Pi to bits, took its SSD off the Waveshare adapter it uses, and burned a fresh copy of Bookworm. Once it was all re-assembled, the Pi booted up normally, and rpi-connect was usable. Well, there was the small matter of going online and associating the Pi, whose name is Marvin, with my Raspberry Pi ID. Which I have.

Here is a screenshot of my Linux Mint box, with VNC Connect at the top left, a VNC session into the street camera at the bottom left, another remote session on a Pi, at the middle left, and a remote session on Marvin…

Ideas for the next step…

Firstly, it seemed like a good idea to verify that I did indeed have remote access. I turned off wi-fi on my phone, and used Chrome to access the Raspberry Pi sign in page, which was fine. And on connecting to Marvin, I got this…

So, remote access definitely works. I don’t think I will be using my phone for the job, but one of my tablets, or even the Chromebook should work just fine.

The next thing was to write something to enable Marvin to monitor WeatherPi, which has an occasional problem with its greenhouse temperature sensor that crashes the weather monitoring and upload program. I used the Python Paramiko library, and made a program (cobbled together from earlier versions) that checks the weather station every five minutes. If the machine itself is down, it sends a message to my phone, using Pushover. If the machine is up, it checks whether the weather program is running, and attempts to restart it if not. I added this program to the others Marvin runs at startup, which is done by this bash script –

#!/bin/bash
# Start Marvin's regular programs, each in its own titled terminal window.
lxterminal --title "UnicornHD" -e 'python3 Programs/unicorn_server.py && read x' &
sleep 2
lxterminal --title "Fan HAT" -e 'python3 Programs/fan65.py && read x' &
sleep 2
lxterminal --title "Weather Station" -e 'python3 Programs/WeatherMonitor.py && read x' &
sleep 10
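I haven’t posted WeatherMonitor.py itself, but its logic is roughly this – a sketch with hypothetical host and program names, not the real code. The Pushover call uses their documented message API:

# Sketch of a WeatherMonitor-style watchdog.
import subprocess
import time
import paramiko
import requests

HOST = "weatherpi"        # hypothetical hostname
PROGRAM = "weather.py"    # hypothetical program name

def notify(message):
    # Pushover's documented message endpoint
    requests.post("https://api.pushover.net/1/messages.json",
                  data={"token": "app-token-here",
                        "user": "user-key-here",
                        "message": message})

while True:
    if subprocess.call(["ping", "-c", "1", HOST],
                       stdout=subprocess.DEVNULL) != 0:
        notify(HOST + " is not responding!")
    else:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(HOST, username="pi")   # assumes key-based login
        # the [w] regex trick stops pgrep matching its own command line
        _, out, _ = ssh.exec_command("pgrep -f '[w]eather.py'")
        if not out.read():
            ssh.exec_command("nohup python3 " + PROGRAM + " >/dev/null 2>&1 &")
        ssh.close()
    time.sleep(300)   # five minutes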

That was unnecessary!

Somewhere online, while I was trying to get the remote monitoring to work, I found a way to make a Python program keep going. Using a bash shell script to run the program like this means that if the program does stop, it will be run again a few seconds later, until the terminal it was run in is closed.

while : ; do
    now=$(date)
    echo "$now" >> restart.log    # note when each (re)start happened
    python3 /home/pi/program-to-run.py
    sleep 10
done

This does not rescue programs using Pi cameras when the camera itself crashes, and rebooting is not possible inside this script! I wonder if there is a camera-resetting utility available – perhaps libcamera can do a reset? Must have a look…

No compatible streams are available

That title is an error message from Emby, the “music, movies, and more” server that I use on my systems, so all our music is available everywhere. It’s really very, very good. But sometimes it stops working. And you get that error message.

All over the internet, you can find people searching for that message, asking what it means, and how to fix it. What you won’t find is anyone giving a simple, direct answer, not even the people who wrote it.

I don’t claim to know the answer in every case. But I can tell you what caused it on my system, and how I fixed it. It is worth seeing if you have the same problem…

When I set up Linux Mint on my big PC, the hard disc the music is on appeared to be in the /mnt directory. I didn’t decide that, it’s just how it set itself up.

When the error appeared, I had a look at the directory structure, and found that the hard disc now appeared in the /media directory instead. I hadn’t moved it.

So, I told Emby to forget its libraries, and then added them in again with the right path.

It worked.

This is virtually amazing!

I’ve been wanting to change my main PC from Windows to Linux for ages, but held back because there are a few applications on it that I found didn’t have a Linux equivalent, such as the old version of OneNote that I like, and the Canon photographic utilities for my DSLR camera. That OneNote version is the one that just works on the local machine, rather than the newer one that insists on putting things “in the cloud”. I want my data here, not somewhere I am unable to control, and might get disconnected from. I also have a strong dislike of dual-boot systems, from back when they used to be a real pig to work with, and kept going wrong…

A lot of people have told me, “Oh, you can run your Windows software under Linux, using Wine”. There’s something wrong with me, as I never could get any of those programs to work in Wine. I think I prefer wine to Wine…

What I needed, clearly, was to keep a virtual copy of the PC, and run it on the Linux machine. And it turns out you can…

A while back, I had moved as much data as I could from C: to my 4 Terabyte D: drive, planning to keep the programs that used the data on the boot SSD.

First, I used VMware’s useful converter program to make a virtual machine from the PC’s SSD. For some reason, VirtualBox isn’t yet able to do this. I’m more used to VirtualBox, which seems to have better support than VMware, so I fed VMware’s resulting virtual PC into a converter that outputs a VirtualBox machine.

Having saved that very carefully, in more than one place, I installed Linux Mint Cinnamon on a brand new SSD on the PC, and started the fun of getting used to it. It’s really good these days, and “just works”. I put VirtualBox and RealVNC on Mint, and found that my saved virtual Windows 10 computer ran just fine.

Well, actually, it didn’t run the first time. Reverting to Windows 10 was easy, because it was still on the old SSD, and I was able to fix the things I had forgotten to do. There is a snag, in that each time I return to Linux, the Data drive gets changed to a read-only file system. There’s a simple fix for that snafu, which I will mention when I remember what it is.

Now, normally, I use a Chromebook around the house, to access the PC and all the little Raspberry Pi computers I run. Mrs Walrus seems to prefer to have me where she can see me, or maybe she likes my company, and I do an awful lot on what used to be her Chromebook. I was already using the Chrome browser on the Chromebook to do remote access to the Windows PC. To my absolute delight, when I fired that up, it connected to the virtualised PC on the Linux machine. When software is good, it can be very good indeed!

Soon, I must get the Chromebook to remote directly into Linux Mint as well. Mint is just fine using RealVNC Connect to work the Pi computers, so now I have more than one way to access them.

End of Part 1…

Meta-Spuds

Forgive me, Readers (if any), for I have not blogged about food for over a fortnight!

I just received an email from the Guardian’s Rachel Roddy, about mashed potato. Now I know many people think mash comes in a packet, and you just add water, and some of us are old enough to remember the Cadbury’s Smash robots laughing at our primitive way of making them… “they cut them with their metal knives”.

Normally, I might just read it, and perhaps some of the things it links to, but I was rather impressed with the way Rachel Roddy referenced, in one paragraph, all of…

Rachel Roddy mentions an Italian trattoria that served…

the puree di patate con lardo. It turned out to be a small mound of buttery mashed potato topped with three slices of cured pork back fat that had once been white, but was now translucent as it melted into its mountain. It remains one of the most delicious things I have ever eaten and summed up the joy of mashed potato: ordinary and luxurious, silly and serious. Mash, wonderful mash.

Now, I have wanted to make lardo for ages, but hey, it’s the 21st century, and farmers in the UK are mostly producing damned skinny pigs, because everyone knows fat is dreadfully bad for us. You can’t get pork fat a couple of inches thick, even when you go to a real butcher. Part of the problem is that pigs are not kept long enough to get properly fat before they’re rushed off to whatever food-product factory has the contract for them. It’s the same problem when I make bacon: there’s only just enough fat on it to fry it properly. Once upon a time, you would put bacon in a pan, fry it, fry the eggs in the fat that was left behind, then soak up what remained with a Staffordshire oatcake, and eat the lot. I once made a batch of Staffordshire oatcakes, and they were wonderful. I must make more, as I no longer live in Staffordshire, and nobody sells ready-made ones here in Wales.

Staffordshire? You know…

Money-saving duck methods

Duck is delicious. Sadly, it’s not cheap. But there are ways to make it less expensive.

Buy a whole duck. A whole Gressingham duck currently sells for about £9.

Meanwhile, two duck breasts cost £8, and two duck legs cost £4.50 or thereabouts.

A whole Gressingham duck, removed from the packaging.

Here’s a whole duck. I’ve pulled the plastic bag of giblets out, and put them in the stock pot, along with the wing tips. There will be more in the pan soon…

Giblets and other bits, waiting for me to make duck stock.

Now, with a very sharp knife, and considerable caution, I have cut one breast off the duck. Not very tidy knife-work, but I’m out of practice…

The duck with one breast cut off.
The other breast’s gone, and so has this leg.
Here are two legs, and two breasts. £12.50 already, from a £9 duck.

Eventually, one ends up with two duck breasts and two duck legs, for the freezer. When I have collected four legs, I will make confit duck legs.

The duck breasts seem to be smaller than the ones they sell separately. My guess is that they use their biggest ducks for the portions, and sell the smaller ones whole.

Bits of duck, about to become stock.

The rest of the carcase just gets broken up, submerged in water, and boiled for a while, resulting in a delicious stock. What can I use that for, you ask?

Well, I used it for ramen. There wasn’t quite enough duck meat on the carcase for this, so I quickly cooked a couple of chicken thighs; you can see them at the top of the bowl. This was a lovely dish for a cold, wet evening…

A bowl of ramen.