sinfo - a monitoring tool for networked computers
by
Jürgen Rinas
 
Contents of this page
The Problem
The Solution - How it works
Some Screenshots
Documentation
Preconditions
ChangeLog
Download
Published on
 
The Problem

Does this problem sound familiar?

1.) There are many computers connected to your local network, and you have (scientific) programs that could use their computational power.

But the computers are too heterogeneous for dedicated clustering software. So you are on your own!

The first problem you have to solve is getting an overview of the available computers and their current load. sinfo is an approach to solving this problem.

2.) You are responsible for a laboratory of publicly accessible computers, and you need a tool that provides a quick overview of all nodes and the (insane) processes running on them. sinfo can be your choice.

sinfo has been running on our local network for about two years and has become a tool for daily use.

I'm running sinfo on the following systems:

  • Linux 2.6.25
  • Solaris 2.8, 2.9
    From version 0.0.15 on I can't test sinfo on Solaris, since we recently scrapped our last UltraSPARCs. I would be grateful if you could tell me whether sinfo still works on Solaris, or if you could provide me with an account on one of your Solaris machines.
  • FreeBSD 4.6.2 (and 5.0)
    We have moved our server to Linux, so I need your help with FreeBSD support. Does anybody have a QEMU image of the latest stable FreeBSD?
 
The Solution - How it works
Figure (idea.jpg): schematic information flow of sinfo

The sinfo system is split into two parts: a daemon and a user program.

1.) The daemon (sinfod) distributes system information using UDP broadcasts on the local network. Each daemon also receives the UDP broadcasts of all other daemons and maintains a list of the most recent information.

2.) The user program (sinfo) connects to the daemon via the local loopback interface and displays the up-to-date information using the ncurses library.
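The broadcast-and-collect idea can be illustrated with a minimal Python sketch. This is not sinfod's actual code or wire format: the JSON payload, the function names and the "host" key are assumptions made for illustration, and the 16 KiB receive limit is an arbitrary upper bound.

```python
import json
import socket


def make_receiver(port=0, bind_addr="127.0.0.1"):
    """Open a UDP socket that listens for status datagrams.

    port=0 lets the operating system pick a free port (hypothetical setup;
    a real daemon would use a fixed, well-known port).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    sock.settimeout(2.0)  # do not block forever if nothing arrives
    return sock


def send_status(status, dest_addr, port):
    """Send one status record as a single UDP datagram.

    SO_BROADCAST is enabled so that dest_addr may be a broadcast address
    of the local network; a unicast address works as well.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    try:
        sock.sendto(json.dumps(status).encode("utf-8"), (dest_addr, port))
    finally:
        sock.close()


def receive_into(table, sock):
    """Receive one datagram and keep only the newest record per host."""
    data, _sender = sock.recvfrom(16 * 1024)
    status = json.loads(data.decode("utf-8"))
    table[status["host"]] = status  # overwrite any older record
    return status["host"]
```

To try it on a single machine, create a receiver with make_receiver(), read the chosen port with getsockname()[1], call send_status() with the address 127.0.0.1 and that port, and then call receive_into() with an empty dictionary.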

This scheme has the advantage that it produces minimal network load. If each node broadcasts its information in a cooperative manner, every update round needs only one message per node, so the network load is O(N), where N is the number of nodes in your network.

Other systems for monitoring your cluster load (e.g. rup(1)) use a polling scheme in which every node has to ask every other node for its system information. In that case the network load is O(N**2).
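A small calculation makes the difference concrete. The sketch below only counts messages per update round under simplifying assumptions: one broadcast per node in the cooperative scheme, and one request from every node to every other node in the polling scheme (replies are not counted). The function names are made up for this example.

```python
def broadcast_messages(n):
    """Cooperative scheme: each of the n nodes sends one broadcast."""
    return n


def polling_messages(n):
    """Polling scheme: each of the n nodes asks the other n - 1 nodes."""
    return n * (n - 1)


for n in (10, 100):
    print(n, broadcast_messages(n), polling_messages(n))
```

For 100 nodes this gives 100 broadcasts against 9900 requests per round.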

The broadcast information includes:

  • The number of CPUs and their speed.
  • The network node hostname, the hardware type, the host processor type, the operating system name, the operating system release and the operating system version (everything uname provides).
  • The uptime of the system.
  • The load average.
  • The current load - split by user, nice, system and idle times.
  • The memory usage of the RAM and the swap space.
  • The network traffic sent and received by the network card.
  • Information about the top 5 processes.
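Part of this list can be gathered with the Python standard library alone, as the following sketch shows. It is only an illustration and not how sinfod collects its data: the function name and the dictionary keys are assumptions. Uptime, memory usage, network traffic and process information are left out because they need platform-specific sources.

```python
import os
import platform


def collect_status():
    """Collect the uname fields, the CPU count and the load average."""
    u = platform.uname()
    status = {
        "cpus": os.cpu_count(),    # number of CPUs (may be None)
        "node": u.node,            # network node hostname
        "machine": u.machine,      # hardware type
        "processor": u.processor,  # host processor type (may be empty)
        "system": u.system,        # operating system name
        "release": u.release,      # operating system release
        "version": u.version,      # operating system version
    }
    try:
        # 1, 5 and 15 minute load averages; only available on Unix-like systems
        status["loadavg"] = os.getloadavg()
    except (OSError, AttributeError):
        status["loadavg"] = None
    return status
```

Calling collect_status() returns a plain dictionary that could, for example, be serialized and sent in one datagram.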
 
Some Screenshots
 
Documentation

For documentation, see the man pages of sinfo and sinfod.

 
Preconditions

The following tools and libraries are necessary to compile and use sinfo. Most of them should already be available on any modern Linux system.

Necessary libraries:

  • ncurses - libraries for terminal handling (tested with version 5.6)
  • boost - peer-reviewed portable C++ source libraries; sinfo uses Boost.Bind and Boost.Signals (tested with version 1.35)
  • asio (>=1.1.0) - asio is a cross-platform C++ library for network programming (tested with version 1.1.1)
 
ChangeLog

sinfo 0.0.33 - Fri, 12 Dec 2008 06:52:59 +0100

  • test programs in subdirs
  • increased buffer-size to 16kByte in libmessageio

sinfo 0.0.32 - Sun, 26 Oct 2008 11:35:40 +0100

  • now sinfo can request information from sinfod using IPv6, e.g. sinfo --host ::1
  • sinfo now returns exit code 1 if a timeout occurs in non-ncurses modes, e.g. sinfo -L -H or -W
  • added --hostsandipmode as output option for sinfo
  • added -I/--ipmode as output option for sinfo
  • changed sshallsinfo and examples in addon directory to use --hostsandipmode or -I respectively. This should prevent problems with unreliable reverse name lookups.
  • encapsulate asio stuff to allow implementation of abstract polling protocol in sinfo's SinfoRequester
  • added options --udp(default) and --tcp to sinfo to select connection method sinfo<=>sinfod
  • added comparison for sorting of hostnames so that node9 is less than node10
  • now using pkglib i.e. dynamic libraries go into prefix/lib/sinfo

sinfo 0.0.31 - Tue, 02 Sep 2008 18:50:32 +0200

  • corrected reading the number of CPUs for Linux 2.6.26

sinfo 0.0.30 - Sat, 30 Aug 2008 19:18:19 +0200

  • restructured sinfod: sinfoserver => udpmessageserver
  • cleaned up Message class
  • binding UDP SINFO_REQUEST_PORT to IPv4 and IPv6
  • added a few desktop files to simplify invocation of sinfo in addon directory

sinfo 0.0.29 - Thu, 07 Aug 2008 12:31:19 +0200

  • reorganized sinfo (client) to follow the Model View Controller architecture using boost::signal as observer - example: http://www.boost.org/doc/libs/1_35_0/doc/html/signals/tutorial.html
 
Download

I've prepared a Debian package for this program. You can install it via
    apt-get update; apt-get install sinfo
if you configure your system to include my Debian repository.

 
Published on

Lately I've seen some sites out there linking to sinfo's homepage. If you want your site to be listed here, or know of a site not listed here, please drop me a line.
