Depalma is a package for distributing jobs across a cluster of workstations.

Files:

src/master.py: Defines the master_server, the main process that manages computation on the cluster.
src/remote.py: Defines the remote_server, the process that listens on each cluster machine and accepts and runs jobs from the master_server.
src/dispatcher.py: Defines the dispatcher class. Dispatchers manage the job queue and balance work across the cluster.
src/monitor.py: Defines the monitor class, which watches job processes in a thread on a cluster machine and alerts the master server when jobs complete.
bin/run.py: A standalone script that runs a remote server. The master server uses it to create the cluster.
bin/kill_cluster.py: A quick script to kill any stray processes on the cluster. It should not be needed in normal use; it is used during development when servers die horrible deaths.
example/example.conf: A sample configuration file.
example/example.py: A sample depalma application.

How it works:

The master server starts an XML-RPC server as a front end to the cluster. When init_server is called, the master server spawns a dispatch thread, which handles all job control in the cluster. It then starts the servers on the remote machines.

When a new code is added to the cluster, the master server calls an add_code() function on each server in the cluster, which adds the code with the specified parameters. When a remote server has successfully loaded the code, it makes a callback to the master server announcing that it has finished. When the master server has received a callback from every cluster machine on which it started the job, it returns.

This use of callbacks appears in several methods. It also happens when the servers are started: right before serve_forever(), the remote_server objects announce to the master_server that they are ready to receive commands. The callback mechanism needs to be expanded.
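
The callback handshake described above can be sketched roughly as follows. This is a minimal illustration and not depalma's actual code: CallbackBarrier, add_code_everywhere and the load_code parameter are hypothetical names, and plain threads stand in for the XML-RPC calls between the master and the remote servers.

```python
import threading


class CallbackBarrier:
    """Block until every expected host has called back (hypothetical helper)."""

    def __init__(self, hosts):
        self._pending = set(hosts)
        self._cond = threading.Condition()

    def callback(self, host):
        """Called by a remote server once it has finished its work."""
        with self._cond:
            self._pending.discard(host)
            if not self._pending:
                self._cond.notify_all()

    def wait(self, timeout=None):
        """Return True if all hosts called back, False if the wait timed out."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._pending, timeout)


def add_code_everywhere(hosts, load_code, timeout=5.0):
    """Ask every host to load a code, then wait for all of the callbacks.

    load_code(host, done) is an assumed stand-in for the remote add_code()
    call; it is expected to call done(host) when the code has been loaded.
    """
    barrier = CallbackBarrier(hosts)
    for host in hosts:
        worker = threading.Thread(
            target=load_code, args=(host, barrier.callback), daemon=True
        )
        worker.start()
    return barrier.wait(timeout)
```

The timeout is there so that a host which never calls back cannot block the master forever; what to do with such a host is left open here.
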
Ideally, each command that needs to be synced across the cluster would have a callback. These callbacks should also be tagged with an incrementing id, so that if you call add_code(cmd), the cluster syncs with id=0. The cluster would then be able to tell when a server has fallen behind in updates, and either kick it out or sync it up.

The master server communicates with the remote server objects. When a remote server is launched, it starts a monitor thread, which watches running processes on the cluster machine and alerts the master server when they complete. The remote server shouldn't be contacted by a user; its public XML-RPC methods are for use by the master server. It has job control functions similar to those of the master server.

Features:

depalma will try to handle any network issues that arise. If a cluster machine is taken off the network, it is dropped from the valid server list, and all of the work it was processing is reassigned to other hosts. Similarly, if a cluster machine cannot contact the master server, it will time out and shut itself down. This part of the code is still in development.

Future Plans:

In the future I would like to add support for different groups of machines, with jobs able to declare which machines they can be executed on. Different machines would have different priority levels, etc. I would also like to add support for job priorities, as well as a better dispatcher that does smarter load balancing. Methods for adding and removing hosts, or groups of machines, would be nice.

ps should return useful information, including ps info from the OS about the jobs. load should be brought back in; it would return system load, CPU usage, and memory usage.

I also plan on writing a web frontend for depalma.
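
The reassignment behavior described under Features (a host that leaves the network is dropped from the valid server list and its work goes to other hosts) could look something like the sketch below. ToyDispatcher and all of its methods are hypothetical and only illustrate the idea; the real dispatcher in src/dispatcher.py may work differently, and round-robin assignment is an assumption made here for simplicity.

```python
from collections import deque


class ToyDispatcher:
    """Toy job queue that remembers which host runs which job (hypothetical)."""

    def __init__(self, hosts):
        self.valid_hosts = list(hosts)
        self.queue = deque()
        self.running = {}  # job -> host currently running it

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        """Hand queued jobs to the valid hosts in round-robin order."""
        if not self.valid_hosts:
            return  # nothing to run on; jobs stay queued
        index = 0
        while self.queue:
            job = self.queue.popleft()
            self.running[job] = self.valid_hosts[index % len(self.valid_hosts)]
            index += 1

    def drop_host(self, host):
        """Remove an unreachable host and requeue the jobs it was running."""
        if host in self.valid_hosts:
            self.valid_hosts.remove(host)
        orphaned = [job for job, owner in self.running.items() if owner == host]
        for job in orphaned:
            del self.running[job]
            self.queue.append(job)
        self.dispatch()
        return orphaned
```

If the last valid host is dropped, the orphaned jobs simply stay in the queue until a host is available again.
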