How not to do cluster monitoring.

The world should have blinkenlights on its computer systems. That’s a given.

I wrote a couple of things. One was a Python program that pinged the four machines forming the cluster, and displayed a red or green light on a UnicornHD HAT to show their status. It worked very nicely. Then I wrote Python code to form part of any program that would be run in parallel on the cluster, which would send a signal saying whether each core was busy or not. It worked nicely, and I now had a row of 16 LEDs, in red or green, so I could see what was going on. It was very pretty.

Unfortunately, as it worked by sending a file by FTP every time a processor core changed between running and idle, it created a very effective Denial of Service attack on our network. Oops.

Now that I have thought about it more carefully, I shall be constructing a much better monitoring system, which will be based on sockets. I’ve been avoiding learning how to use them for far too long, anyway…

Later:

I tried at least umpteen example programs using sockets, and the connections were all rejected, and I couldn’t work out how to fix that. Suggestions, anyone?

Using a Python program to query the cluster computers took nearly six seconds to look at the 16 cores, hardly blinkenlights… A quick hack of a bash script, astonishingly, took almost as long. Back to trying to get sockets to work, then…

Working sockets tutorial!

At last, I found a socket programming example that worked, here!

I wanted to give Zan a tiny donation, but sadly his GoFundMe page seems defunct, and possibly the message I tried to send him also failed…

Sadly, I was then unable to work out how to accept multiple connections from the cluster computers.

Threading sockets programs!

There’s another set of client-server demos on GitHub, here, that I tested with Marvin and two of the Oysters, to confirm that it can do what I want. I can hoik code from those while retaining the program logic, and maybe get all four Oysters to send their status to Marvin, for him to display. I am not at all bothered that I am writing control system code for the cluster, instead of getting round to some fun applications of parallelism.
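For anyone stuck at the same point, here is a minimal sketch of a server that accepts several connections at once, using Python's standard socketserver module. It is not the code from the GitHub demos mentioned above. The message format ("machine core state"), the port number 5050 and all the function names are made up for illustration.

```python
import socket
import socketserver
import threading

# Latest reported state per (machine, core); shared between handler threads.
core_state = {}
state_lock = threading.Lock()


class StatusHandler(socketserver.StreamRequestHandler):
    """Handle one client connection; each connection gets its own thread."""

    def handle(self):
        # Each line is assumed to look like "oyster1 3 busy" (made-up format).
        for raw_line in self.rfile:
            parts = raw_line.decode("utf-8").split()
            if len(parts) != 3:
                continue  # Ignore anything malformed.
            machine, core, state = parts
            try:
                key = (machine, int(core))
            except ValueError:
                continue  # Core number was not an integer.
            with state_lock:
                core_state[key] = state


def start_server(host="0.0.0.0", port=5050):
    """Start the server in a background thread and return it."""
    server = socketserver.ThreadingTCPServer((host, port), StatusHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def report(host, port, machine, core, state):
    """Client side: send one status line to the server."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(f"{machine} {core} {state}\n".encode("utf-8"))
```

In this sketch the display machine would call start_server() once, and each cluster program would call something like report("marvin", 5050, "oyster1", 3, "busy"). Binding to 0.0.0.0 makes the server listen on every network interface. A server bound only to 127.0.0.1 refuses connections from other machines, which is one common cause of rejected connections, though not necessarily the one I ran into.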

Raspberry Pi Cluster test

I’m just testing my Raspberry Pi cluster, to see if I have sorted out the setup properly this time. Finding the primes up to 10,000 with one core, and then sixteen cores, followed by using 16 cores to find primes up to 100,000 gave these results…

pi@oyster0:~ $ mpirun -hostfile myhostfile -np 1 python3 Programs/prime.py 10000
Find all primes up to: 10000
Nodes: 1
Time elapsed: 4.34 seconds
pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 10000
Find all primes up to: 10000
Nodes: 16
Time elapsed: 0.34 seconds
pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 100000
Find all primes up to: 100000
Nodes: 16
Time elapsed: 23.78 seconds

So, it is all working as it should now. Next step is to add blinkenlights on the supervising machine, Marvin, which has a UnicornHD HAT. After that, I want to get my GUI based supervisor working.

This leaves far too little time to flog stuff on eBay! I shall have to write a program to do that…

Update:

I ran it for the primes under a million, and it was disturbingly slow. I’d hope for something less than ten times as long as for a hundred thousand, but no!

pi@oyster0:~ $ mpirun -hostfile myhostfile -np 16 python3 Programs/prime.py 1000000
Find all primes up to: 1000000
Nodes: 16
Time elapsed: 2279.37 seconds

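Almost thirty-eight minutes, which is nearly a hundred times as long as the run up to 100,000. I’m assuming things ended up swapping memory in and out, or Python doesn’t handle big integers very well, although a hundredfold slowdown for ten times the range is also what an algorithm with quadratic running time would give. It’s not a problem, but it is one of the reasons I want blinkenlights…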
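A quick check of the arithmetic, using only the two 16-process timings from the mpirun output above. It estimates how fast the run time grows with the size of the range, on the assumption that time is roughly proportional to the limit raised to some power k. It says nothing about what prime.py actually does inside.

```python
import math

# Timings copied from the mpirun output above (16 processes in both cases).
limit_small, seconds_small = 100_000, 23.78
limit_large, seconds_large = 1_000_000, 2279.37

time_ratio = seconds_large / seconds_small
size_ratio = limit_large / limit_small

# If time grows like limit**k, then k = log(time ratio) / log(size ratio).
exponent = math.log(time_ratio) / math.log(size_ratio)

print(f"Time ratio: {time_ratio:.1f}")
print(f"Implied exponent k: {exponent:.2f}")
print(f"Run time in minutes: {seconds_large / 60:.1f}")
```

An exponent near 2 would point at the algorithm rather than at swapping, but two data points are not much to go on.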

Fear and Loathing with mpirun

My Raspberry Pi cluster, named “Oyster” because of something to do with the Walrus and the Carpenter, had been out of commission for months, so I eventually got to work and tried to set it up from scratch, as an alternative to checking everything over and over again, and failing to find anything wrong. Oyster used to look like this, but the Pi 3s in Lego compatible cases ran too hot.

At first, I attempted to use my only Pi 4, “Marvin”, as the control machine for the four Pi 3s in Oyster, but I got something wrong in the setup, and it didn’t work. I had been thinking 20 cores would obviously be better than 16, but suspected the two different user names in use might be causing the problem. It probably wasn’t, as I now think the ssh communication for the cluster is done anonymously. Possibly. Anyway, I changed my mind about 20 cores, when I thought about the other tasks Marvin runs, and how I didn’t want them slowed down. Sixteen will do. Unless I get some more Pi 3s and add them to the cluster…

Anyway, I went through all the setup described in Ashwin Pajankar’s e-book* about Raspberry Pi “supercomputers”, again. Twice. I found some online guides, and checked the setup with those, too. I could run programs on the four cores of oyster0, but not on the other three Pis. Eventually, I spotted that I had forgotten to create the file “known_hosts” on the controlling Pi. The message passing software, python3-mpi4py, would probably have told me this, if I had run it in verbose mode, but I didn’t. Still, it now works, and I have a sixteen core Pi cluster running.

I have added a program, running on Marvin, that lights up LEDs to show the state of the Pis in the cluster, and intend to add further blinkenlights to show the activity of each of the sixteen cores.

# Program to monitor status of Oyster cluster, displaying up/down indication.
import time
import subprocess as sp

# Each machine gets its own column on row 12 of the Unicorn HD display.
columns = {"oyster0": "0 ", "oyster1": "4 ", "oyster2": "8 ", "oyster3": "12 "}
row = "12 "

while True:
    for machine, column in columns.items():
        # ping returns 0 if the machine answered.
        state = sp.call(['ping', '-c', '1', machine], stdout=sp.DEVNULL)

        if state == 0:
            colour = "0 200 0" # Green means UP
        else:
            colour = "200 0 0" # Red means DOWN

        message = "set_pixel " + row + column + colour

        # Write the command to this machine's file for the Unicorn HD server.
        with open("/home/chris/ftp/files/blinkenlights" + machine, 'w') as fp:
            fp.write(message)

        time.sleep(1)

This code sends a text file to my Unicorn HD server program, which enables more than one program to write to its 256 LEDs, without messing up each other’s displays.

My plans for further development include improving my set of framework code for running parallel programs on the cluster, and more (or possibly less) importantly, to send messages to Marvin for the blinkenlights.

[* I haven’t included a link to AP’s e-book, because it’s easy to find online, and he is charging far too much for it, while others are giving us the same information free.]

Fun with rpi-connect

To begin at the beginning…

It all started when I read a news item saying RealVNC was going to change its terms and conditions unilaterally, so that home users would be limited to connecting to three computers. I was using it to view and control anything up to fifteen computers.

Panic set in. I began to research other free VNC implementations. Then it was pointed out to me that it was only remote access over the internet that would be affected. I hadn’t been doing that at all. It seemed all the Pis on my network would be unaffected. To ensure things would not get changed, I set RealVNC to NOT update automatically on all of them.

During my researches, I heard that a Pi running Wayland instead of X11 could be remotely accessed just like VNC, using rpi-connect, and decided that would be interesting to do, anyway. So I needed to update a machine to run the Bookworm version of Raspberry Pi OS, which does use Wayland. I have only one Pi 4 at present, and did the thing they tell you never to do – I attempted an in-place upgrade. It ever so nearly worked. But it wouldn’t run rpi-connect. So, I pulled the Pi to bits, took its SSD off the Waveshare adapter it uses, and burned a fresh copy of Bookworm. Once it was all re-assembled, the Pi booted up normally, and rpi-connect was usable. Well, there was the small matter of going online and associating the Pi, whose name is Marvin, with my Raspberry Pi ID. Which I have.

Here is a screenshot of my Linux Mint box, with VNC Connect at the top left, a VNC session into the street camera at the bottom left, another remote session on a Pi, at the middle left, and a remote session on Marvin…

Ideas for the next step…

Firstly, it seemed like a good idea to verify that I did indeed have remote access. I turned off wi-fi on my phone, and used Chrome to access the Raspberry Pi sign in page, which was fine. And on connecting to Marvin, I got this…

So, remote access definitely works. I don’t think I will be using my phone for the job, but one of my tablets, or even the Chromebook should work just fine.

The next thing was to write something to enable Marvin to monitor WeatherPi, which has an occasional problem with its greenhouse temperature sensor, that results in the weather monitoring and upload program crashing. I used the Python Paramiko library, and made a program (cobbled together from earlier versions) that checks the weather station every five minutes. If the machine itself is down, it sends a message to my phone, using Pushover. If the machine is up, it checks whether the weather program is running, and attempts to restart it otherwise. I added starting this program to the other programs Marvin runs at startup, which is done by this bash script –

#!/bin/bash
lxterminal --title "UnicornHD" -e 'python3 Programs/unicorn_server.py && read x' &
sleep 2
lxterminal --title "Fan HAT" -e 'python3 Programs/fan65.py && read x' &
sleep 2
lxterminal --title "Weather Station" -e 'python3 Programs/WeatherMonitor.py && read x' &
sleep 10
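
The decision logic of the weather station monitor described above can be sketched as below. This is not the real WeatherMonitor.py. The four callables are hypothetical placeholders for the ping test, the Paramiko checks and the Pushover message, so that the logic can be tried without any of them.

```python
def check_weather_station(host_is_up, program_is_running, restart_program, notify):
    """Run one check and return a short description of what was done.

    All four arguments are callables supplied by the caller. They stand in
    for the real ping, SSH and notification code, which is not shown here.
    """
    if not host_is_up():
        # The machine itself is down, so all we can do is tell someone.
        notify("WeatherPi is not responding")
        return "notified"
    if program_is_running():
        return "ok"
    # The machine is up but the weather program has stopped.
    restart_program()
    return "restarted"
```

A monitor built this way would call check_weather_station() in a loop, sleeping 300 seconds between calls to get the five minute interval.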

That was unnecessary!

Somewhere online, while I was trying to get the remote monitoring to work, I found a way to make a Python program keep going. Using a bash shell script to run the program like this means that if the program does stop, it will be run again after a ten-second pause, until the terminal it was run in is closed.

while : ; do
    now=$(date)
    echo "$now" >> restart.log
    python3 /home/pi/program-to-run.py
    sleep 10
done

This does not work with programs using Pi cameras, when the camera crashes. I wonder if there is a camera resetting utility available, as rebooting is not possible inside this script! I wonder if libcamera can do a reset of the camera? Must have a look…

Raspberry Pi 7 Segment display

Introduction

One of my #RaspberryPi Zeros is called PiClock, and has an 8 digit seven segment LED display. The program it runs displays the time, and sends it to two other Pis, that display it on Unicorn HD HATs. Between midnight and 8 am, it flashes the message “SLEEP” every five minutes, as well. The software library that it uses can display numbers, and most upper and lower case letters, but not all of them. I rather liked the idea of animating sequences of single segments on it, because, well you know, blinkenlights. I had a look at the software library, “7seg.py”, to see if I could get it to do that.

It turns out that the library uses a Python dictionary to look up the byte to send to the display for each of the characters it can display. Decoding the hexadecimal bytes took a few minutes, working from the code for the digits from 1 to 5.

The first bit is always a 0. The remaining seven are the seven segments, in the order abcdefg, which are laid out like this…

So, the codes for illuminating single segments are as follows…

Now to amend the library! I needed some typeable characters to put in the dictionary, ready to be used in strings in the Python code. For no obvious reason, I chose a selection of brackets and the tilde character, and amended the library file. The selection of brackets didn’t work!

After trying characters until they did work, I ended up with #][£<$~ as the symbols for the segments abcdefg.
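
As a sketch of how the single-segment codes fall out of the bit layout described above: this assumes the top bit is unused and the remaining seven bits are a to g from left to right, and that the symbols are listed in the same order as the segments. The names below are made up for illustration and are not taken from 7seg.py.

```python
# Assumption: bit 7 is always 0 and bits 6..0 drive segments a..g in order.
SEGMENTS = "abcdefg"
SYMBOLS = "#][£<$~"  # The typeable characters chosen for segments a to g.


def segment_code(segment):
    """Return the byte that lights only the given segment."""
    position = SEGMENTS.index(segment)
    return 0x40 >> position


# Dictionary entries of the kind that could be added to the lookup table.
single_segment_codes = {
    symbol: segment_code(segment) for symbol, segment in zip(SYMBOLS, SEGMENTS)
}

for symbol, code in single_segment_codes.items():
    print(f"{symbol!r}: 0x{code:02x}")
```

As a sanity check on the layout, the digit 1 lights segments b and c, and OR-ing those two codes together gives 0x30.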

I’m only showing the amended part of the file, where the pattern to send to the display is looked up. The arrangement of the brackets and tilde for the segments is as follows…

Now I’m ready to program PiClock to do silly animations, which will be fun, and a lot easier than using the WordPress editor. Note to self: See if you can find a WYSIWYG editor for WordPress.

Python and SQL with matplotlib.

# Quick hack to graph last 500 greenhouse temperatures from weather database.
import mariadb
import matplotlib.pyplot as plt
conn = mariadb.connect(user="pi",password="password",host="localhost",database="weather")
cur = conn.cursor()
tempIN     = []
tempOUT    = []
timestamps = []
# Get the most recent 500 records.
cur.execute("SELECT greenhouse_temperature, ambient_temperature, created "
            "FROM WEATHER_MEASUREMENT ORDER BY created DESC LIMIT 500")
for i in cur:
    tempIN.append(i[0])
    tempOUT.append(i[1])
    timestamps.append(i[2])  
conn.close()
plt.figure(figsize=(14, 6))
plt.title(label="Greenhouse and outside temperature up to "+str(timestamps[0]))
plt.xlabel("Date and time")
plt.ylabel("Temperature in Celsius")
plt.plot(timestamps, tempIN, label='Greenhouse temperature')
plt.plot(timestamps, tempOUT, label='Outside temperature')
plt.axhline(y=5.0, color='r', linestyle='-.')
plt.legend()
plt.savefig("/var/www/html/GHtemp.jpg")
plt.show()

Python on Raspberry Pi, a note about structure, or something.

I’ve been struggling with a problem with a Pi camera for a couple of days. Instead of being able to start up the camera, I just had error messages about MMAL running out of resources.

Now, I knew I’d seen it before, and sure enough, Stack Overflow had quite a lot of questions about it. But I’d seen them before. And then I remembered that I never found out why the problem went away before.

As an experiment, I tried something that I thought couldn’t possibly work, and suddenly everything worked. All it took was moving the camera instantiation from the top of the program to just below all the function declarations.

At a guess, the camera startup can’t get the resources it needs, because the Python interpreter is chewing its way through all the function declarations, and using up something the camera software wanted.

It’s an age or so, since I wrote a language interpreter, and it was for a simple language, Pilot, but I know interpreters have reasons for liking programs in a particular order, so that’s my guess…

#MMALresources

A Python time-lapse program.

A free program…

This is the Python code I cobbled together to make a time-lapse movie of my rather exciting flowering cactus. I’m sure this has been done better by lots of people. It runs on a Raspberry Pi Zero, with not much memory, and no online storage, so it sends the pictures to another Pi Zero, called PiBigStore, which happens to have a 2 Terabyte USB drive. Help yourself to a copy, if you like. Change the server name, and password, obviously. If you know ways this can be improved, feel free to comment.

# Time lapse pictures
import os
import time
import ftplib
from picamera import PiCamera
import schedule

def send_to_PiBigStore():
    # Skip the night hours.
    hour = int(time.strftime("%H"))
    if hour < 7 or hour > 21:
        time.sleep(250)
        return

    file_name = "cactus" + time.strftime("%Y%m%d-%H%M%S") + ".jpg"
    local_path = "/var/tmp/" + file_name
    camera.capture(local_path)

    ftp = ftplib.FTP()
    try:
        ftp.connect("PiBigStore")
        ftp.login("pi", "password goes here")
        ftp.cwd("/media/pidrive/data/cactus/")
        with open(local_path, "rb") as picture:
            ftp.storbinary("STOR " + file_name, picture)
        print("Sent to PiBigStore", file_name)
    except ftplib.all_errors as error:
        print("Couldn't send to PiBigStore:", error)
    finally:
        # close() is safe even if the connection was never made.
        ftp.close()
        os.remove(local_path)

# Main loop
schedule.every(5).minutes.do(send_to_PiBigStore)
camera = PiCamera()
camera.rotation = 90

while True:
    schedule.run_pending()
    time.sleep(10)

A foot-tall cactus on a windowsill, with a Raspberry Pi Zero with camera, mounted on a Lego tower.

Curse you, munmap_chunk()!

I still haven’t spotted a working solution to the problem where weather station programs in Python on Raspberry Pi fail, with no traceback details, after a couple of days.

I think it must be some resource in either the operating system, or the Python interpreter, running out, with very poor error reporting. I will leave it to people more familiar with the OS and interpreter to find out what it is, and fix it, in the fairly certain knowledge that everyone who could fix it has better things to do.

I found out that a Python program can actually restart itself, and changed mine to restart once a day. If that doesn’t fix it, I’ll let you know…
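
For anyone wondering how a program restarts itself, one common way is os.execv, which replaces the running process with a fresh copy of the same script. The sketch below is an assumption about how it could be done, not a copy of my weather program. The function names and the 60 second pause are made up.

```python
import os
import sys
import time

RESTART_AFTER_SECONDS = 24 * 60 * 60  # Once a day.


def restart_due(started_at, now, max_age=RESTART_AFTER_SECONDS):
    """Return True once the program has been running for max_age seconds."""
    return now - started_at >= max_age


def restart_program():
    """Replace the running process with a fresh copy of the same script."""
    sys.stdout.flush()
    os.execv(sys.executable, [sys.executable] + sys.argv)


def main_loop(do_work, pause=60):
    """Call do_work() repeatedly, restarting the whole program when due."""
    started_at = time.monotonic()
    while True:
        do_work()
        if restart_due(started_at, time.monotonic()):
            restart_program()
        time.sleep(pause)
```

Because os.execv never returns, anything that needs saving or closing has to be dealt with before restart_program() is called.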

#RaspberryPi #Python 

My Stack Overflow comment on this.

Greenhouse computer improvement

New sensor!

I was using an MHT22, hanging on wires outside the case, for temperature readings on my greenhouse computer. I wasn’t happy with it, as it isn’t really compatible with the connections on the Raspberry Pi, and it has a habit of giving occasional absurd readings for no obvious reason.

So, I got myself a Microdia TEMPer-2, from PiHut, which plugs into a USB port. It has a fancy button on it, which activates the sending of text messages or emails, which I shall never be using. It also has an external plug-in sensor, which is waterproof, a handy thing in a greenhouse!

It comes with a software mini-disc, which may possibly be useful if you’re using it on a PC, whatever they are. (Kidding. I’m writing this on my PC.) There are several web sites that tell you how to program Python to read from it, and it didn’t take me long to install the appropriate library on the greenhouse computer, and run the test command, sudo temper-poll. That worked, but then I ran into one of those programming blockages that can send you crazy. None of the various pieces of example code would work, mostly due to my inability to get the necessary permissions set correctly. It didn’t matter, I realised, after a lot of head scratching. Instead, I just used Python’s subprocess library to run the command that worked…

import subprocess

rv = str(subprocess.check_output("sudo temper-poll", shell=True))
# Split the string, keep the block at index 4, chop last five characters, make float.
temperature = float(rv.split()[4][:-5])
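
Counting blocks and characters breaks if the output changes shape, so here is a hedged alternative that looks for the Celsius reading with a regular expression. The sample text is an assumption about what temper-poll prints, so check it against the real output before trusting it. The function name is made up.

```python
import re


def parse_temper_output(text):
    """Return the first Celsius reading found in temper-poll style output."""
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*°C", text)
    if match is None:
        raise ValueError("No Celsius reading found in: " + repr(text))
    return float(match.group(1))


# Assumed example of the command's output, for illustration only.
sample = "Found 1 devices\nDevice #0: 22.6°C 72.7°F\n"
print(parse_temper_output(sample))
```

To feed it the real thing, the text could come from subprocess.check_output(["sudo", "temper-poll"], text=True), which returns a string rather than bytes.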

I’m hoping I won’t need to write any more software for the greenhouse for a while. The Raspberry Pi now monitors the temperature, switching the fan heater on if the temperature is below 5°C, uses its fish-eye camera to take pictures at set times for a time-lapse series, and takes a picture if it spots movement. Eventual improvements under consideration are a soil moisture detection sensor, automated watering… Nothing’s ever really finished, is it?