{"id":738,"date":"2024-08-03T09:55:47","date_gmt":"2024-08-03T09:55:47","guid":{"rendered":"https:\/\/doctor-dark.co.uk\/blog\/?p=738"},"modified":"2024-08-29T12:11:45","modified_gmt":"2024-08-29T12:11:45","slug":"how-not-to-do-cluster-monitoring","status":"publish","type":"post","link":"https:\/\/doctor-dark.co.uk\/blog\/how-not-to-do-cluster-monitoring\/","title":{"rendered":"How not to do cluster monitoring."},"content":{"rendered":"\n<p>The world should have blinkenlights on its computer systems. That&#8217;s a given.<\/p>\n\n\n\n<p>I wrote a couple of things. One was a Python program that pinged the four machines forming the cluster, and displayed a red or green light on a UnicornHD HAT to show their status. It worked very nicely. Then I wrote Python code to form part of any program that would be run in parallel on the cluster, which would send a signal saying whether each core was busy or not. It worked nicely, and I now had a row of 16 LEDs, in red or green, so I could see what was going on. It was very pretty.<\/p>\n\n\n\n<p>Unfortunately, as it worked by sending a file by FTP every time a processor core changed between running and idle, it created a very effective Denial of Service attack on our network. Oops.<\/p>\n\n\n\n<p>Now that I have thought about it more carefully, I shall be constructing a much better monitoring system, which will be based on sockets. I&#8217;ve been avoiding learning how to use them for far too long, anyway&#8230;<\/p>\n\n\n\n<p>Later:<\/p>\n\n\n\n<p>I tried at least umpteen example programs using sockets, and the connections were all rejected, and I couldn&#8217;t work out how to fix that. Suggestions, anyone?<\/p>\n\n\n\n<p>Using a Python program to query the cluster computers took nearly six seconds to look at the 16 cores, hardly blinkenlights&#8230; A quick hack of a bash script, astonishingly, took almost as long. 
Back to trying to get sockets to work, then&#8230;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Working sockets tutorial!<\/h2>\n\n\n\n<p>At last, I found a socket programming example that worked, <a href=\"https:\/\/thezanshow.com\/electronics-tutorials\/raspberry-pi\/tutorial-27-28-29\">here<\/a>!<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"1112\" src=\"https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53.png\" alt=\"\" class=\"wp-image-749\" srcset=\"https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53.png 1920w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53-1000x579.png 1000w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53-768x445.png 768w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53-1536x890.png 1536w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-18-14-05-53-1200x695.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/figure>\n\n\n\n<p>I wanted to give Zan a tiny donation, but sadly his GoFundMe page seems defunct, and possibly the message I tried to send him also failed&#8230;<\/p>\n\n\n\n<p>Sadly, I was then unable to work out how to accept multiple connections from the cluster computers. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Threading sockets programs!<\/h2>\n\n\n\n<p>There&#8217;s another set of client-server demos on GitHub, <a href=\"https:\/\/github.com\/MattCrook\/python_sockets_multi_threading\/\">here<\/a>, that I tested with Marvin and two of the Oysters, to confirm that it can do what I want. 
I can hoik code from those while retaining the program logic, and maybe get all four Oysters to send their status to Marvin, for him to display. I am not at all bothered that I am writing control system code for the cluster, instead of getting round to some fun applications of parallelism.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2017\" height=\"1703\" src=\"https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02.png\" alt=\"\" class=\"wp-image-753\" srcset=\"https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02.png 2017w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02-1000x844.png 1000w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02-768x648.png 768w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02-1536x1297.png 1536w, https:\/\/doctor-dark.co.uk\/blog\/wp-content\/uploads\/2024\/08\/Screenshot-from-2024-08-29-13-10-02-1200x1013.png 1200w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The world should have blinkenlights on its computer systems. That&#8217;s a given. I wrote a couple of things. One was a Python program that pinged the four machines forming the cluster, and displayed a red or green light on a UnicornHD HAT to show their status. It worked very nicely. 
Then I wrote Python code &hellip; <a href=\"https:\/\/doctor-dark.co.uk\/blog\/how-not-to-do-cluster-monitoring\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How not to do cluster monitoring.&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,7],"tags":[15,13,30,31,32],"class_list":["post-738","post","type-post","status-publish","format-standard","hentry","category-computing","category-raspberry-pi","tag-led","tag-raspberrypi","tag-cluster","tag-dosattack","tag-python-2"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/posts\/738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=738"}],"version-history":[{"count":5,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/posts\/738\/revisions"}],"predecessor-version":[{"id":754,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/posts\/738\/revisions\/754"}],"wp:attachment":[{"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/doctor-dark.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}