Thursday, 8 March 2007
Old Logs Available
There is a new link "All Logs" which allows one to look at old log and output files. These can be filtered by institute. It's a bit flaky because the webserver isn't really up to it. Logs before February 8 have been lost.
Monday, 5 March 2007
Update RB config file
In my atlas.config file change
NSAddresses = "lcgrb02.gridpp.rl.ac.uk:7772";
LBAddresses = "lcgrb02.gridpp.rl.ac.uk:9000";
to
NSAddresses = {"lcgrb01.gridpp.rl.ac.uk:7772","lcgrb02.gridpp.rl.ac.uk:7772"};
LBAddresses = {{"lcgrb01.gridpp.rl.ac.uk:9000"},{"lcgrb02.gridpp.rl.ac.uk:9000"}};
to try and use load balancing.
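If more RBs get added later, the two lines can be generated rather than typed by hand. This is only a rough sketch: the function name `rb_addresses` is made up for illustration, and the ports are simply the ones in the config above.

```python
def rb_addresses(hosts, ns_port=7772, lb_port=9000):
    """Build the NSAddresses/LBAddresses lines for a list of RB hosts.

    Hypothetical helper: it only reproduces the layout shown in the
    post above (a flat list for NS, one inner list per RB for LB).
    """
    ns = ",".join(f'"{h}:{ns_port}"' for h in hosts)
    lb = ",".join(f'{{"{h}:{lb_port}"}}' for h in hosts)
    return [f"NSAddresses = {{{ns}}};", f"LBAddresses = {{{lb}}};"]


if __name__ == "__main__":
    for line in rb_addresses(["lcgrb01.gridpp.rl.ac.uk", "lcgrb02.gridpp.rl.ac.uk"]):
        print(line)
```

For the two RAL RBs this prints the same two lines as the new config above.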
Sunday, 4 March 2007
Working again
Commented out the LoggingDestination line in heppc009:/opt/edg/etc/edg_wl_ui_cmd_var.conf. Everything seems to be OK again now.
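For the record, the same edit could be scripted roughly as below. It is a sketch under assumptions: the helper name `comment_out` is made up, `#` is assumed to be an accepted comment prefix in that file, and the sample value is illustrative rather than copied from the real conf file.

```python
import re


def comment_out(text, key, prefix="#"):
    """Return text with every uncommented 'key = ...' line commented out.

    The comment prefix is an assumption; change it if the file uses
    a different comment syntax.
    """
    pattern = re.compile(r"^([ \t]*)(" + re.escape(key) + r"[ \t]*=.*)$", re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}{prefix} {m.group(2)}", text)


if __name__ == "__main__":
    # Illustrative content only, not the real file.
    sample = 'LoggingTimeout = 30;\nLoggingDestination = "lcgrb01.gridpp.rl.ac.uk:9002";\n'
    print(comment_out(sample, "LoggingDestination"))
```

Lines that are already commented are left alone, so running it twice does no harm.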
Friday, 2 March 2007
RAL RB Broken
All job submission fails:
Selected Virtual Organisation name (from --config-vo option): atlas
Connecting to host lcgrb02.gridpp.rl.ac.uk, port 7772
Logging to host lcgrb01.gridpp.rl.ac.uk, port 9002
**** Error: API_NATIVE_ERROR ****
Error while calling the "edg_wll_RegisterJobSync" native api
Unable to Register the Job:
https://lcgrb02.gridpp.rl.ac.uk:9000/rdV_Ep9fG_oqlBypG--UBQ
to the LB logger at: lcgrb01.gridpp.rl.ac.uk:9002
No route to host (edg_wll_ssl_connect())
Why does it try and use lcgrb01.gridpp.rl.ac.uk when I have this in my conf file:
[
VirtualOrganisation = "atlas";
NSAddresses = "lcgrb02.gridpp.rl.ac.uk:7772";
LBAddresses = "lcgrb02.gridpp.rl.ac.uk:9000";
Manchester ce01 Off
There are no suitable queues on Manchester ce01 and all my jobs fail so I've switched it off for the time being. ce02 is OK.
Thursday, 1 March 2007
Splitting Manchester
At Alessandra's request I am splitting Manchester into two: ce01 and ce02, reading from dcache01 and dcache02 respectively. At the moment it isn't quite working because of a BDII problem somewhere.
Minor problem
Attempts to make everything fully automatic by killing old processes and deleting their lock files failed because of a bug. Everything stopped overnight. Hopefully now OK.
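The general idea of the cleanup, as a minimal sketch and not the actual script: it assumes each lock file contains the PID of the process that wrote it and that the file's modification time marks when the lock was taken. The name `clear_stale_lock` and the age threshold are made up for illustration.

```python
import os
import signal
import time


def clear_stale_lock(lock_path, max_age_s, now=None):
    """Kill the process named in an old lock file and remove the lock.

    Returns True if no lock is left afterwards, False if a lock that is
    still younger than max_age_s was left in place.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(lock_path)
        with open(lock_path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return True  # nothing to clean up

    if age < max_age_s:
        return False  # lock is recent, assume its owner is still working

    try:
        pid = int(content)
    except ValueError:
        pid = None  # unreadable PID: just remove the lock

    if pid is not None:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # process already gone

    os.remove(lock_path)
    return True
```

One caveat with this approach: if the original process has already exited and the PID has been reused, the wrong process gets killed, so a real version would want to check what the PID actually belongs to first.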