I've added CMS-specific SAM tests to the suite. See http://pprc.qmul.ac.uk/~lloyd/gridpp/cms_samtest.html.
Tuesday, 16 September 2008
Friday, 1 August 2008
Network Tests Updated
I have revamped the network tests as agreed at a dteam meeting a couple of weeks ago. The tests now involve copying files from the Tier-1 to the local SE, copying the files from the local SE to the WN, and copying them back to all SEs. There is quite a high failure rate. Some of this comes from transfer, catalogue or info failures, and some from my attempts to get more reliable results by using different timings and asking for consistency between them. Sometimes the times vary wildly, with smaller files taking longer than larger ones, for example. I am sure there is more to be done.
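To illustrate what "asking for consistency" between timings could mean, here is a minimal sketch. It is not the code the tests actually use: the function name `consistent` and the `max_ratio` tolerance of 2.0 are assumptions made up for this example. It accepts a set of throughput estimates only if the fastest and slowest agree to within a chosen factor.

```python
def consistent(estimates, max_ratio=2.0):
    """Return True if all throughput estimates (MB/s) agree within max_ratio.

    Hypothetical rule for illustration: a result is rejected if there are
    fewer than two estimates, if any estimate is zero or negative, or if
    the largest estimate exceeds the smallest by more than max_ratio.
    """
    if len(estimates) < 2:
        return False
    if any(e <= 0 for e in estimates):
        return False
    return max(estimates) / min(estimates) <= max_ratio
```

A stricter tolerance rejects more measurements, so a rule like this would raise the apparent failure rate even when the transfers themselves succeed.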
Tuesday, 22 July 2008
LHCb Tests
I have added LHCb tests to the collection. These are actually SAM tests tailored to LHCb. They are kept separate from the normal (Ops Critical) SAM tests to avoid distorting the availability/reliability statistics. See http://pprc.qmul.ac.uk/~lloyd/gridpp/lhcb_samtest.html.
Monday, 21 July 2008
Local problems
The bad news: The RAID array I store my outputs and log files on developed an error over the weekend. It is being rebuilt now, but it may take some time before it comes back.
The good news: I've instigated a 'heartbeat' system, so that if all the tests stop running and the pages still look green, there is something to tell you that in fact they haven't been updated for many hours.
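For anyone curious how a heartbeat of this kind can work, here is a minimal sketch. It is an assumption about the general technique, not the actual implementation behind these pages: the file-based approach and the names `touch_heartbeat` and `is_stale` are invented for the example. The test runner records a timestamp each time it completes, and the page generator flags the results as stale if that timestamp is too old.

```python
import os
import time


def touch_heartbeat(path):
    """Record the current time (seconds since the epoch) in the heartbeat file."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat is missing, unreadable, or too old."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            last = float(f.read().strip())
    except (OSError, ValueError):
        # No heartbeat file, or garbage in it: treat as stale.
        return True
    return now - last > max_age_seconds
```

The test runner would call `touch_heartbeat` after each cycle, and the page generator would call something like `is_stale(path, 6 * 3600)` before trusting a page full of green.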
Wednesday, 2 July 2008
Network Tests
I have introduced some network tests. Once an hour, test jobs are sent to every UK site, and each job attempts to copy a number of differently sized test files (1MB, 5MB, 10MB, 50MB and 100MB) from the Storage Element at each of the sites. The transfer times are then used to calculate the average throughput in MB/s. The results are averaged over the last 24 hours to produce a matrix of site-to-site transfer throughputs. There is an average over the whole matrix on the main UK Grid page. See http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest.html.
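The calculation described above can be sketched roughly as follows. This is an illustration under assumptions, not the real code: the record layout, the function names, and the choice to average per-transfer throughputs (rather than dividing total megabytes by total seconds) are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def throughput_matrix(records, now, window=timedelta(hours=24)):
    """Average throughput in MB/s for each (source, destination) pair.

    records: iterable of (timestamp, source_site, dest_site, size_mb, seconds).
    Transfers older than `window`, or with a non-positive time, are ignored.
    """
    sums = defaultdict(lambda: [0.0, 0])  # pair -> [sum of MB/s, count]
    for timestamp, source, dest, size_mb, seconds in records:
        if now - timestamp > window or seconds <= 0:
            continue
        cell = sums[(source, dest)]
        cell[0] += size_mb / seconds
        cell[1] += 1
    return {pair: total / count for pair, (total, count) in sums.items()}


def overall_average(matrix):
    """Mean over all cells of the matrix, or None if the matrix is empty."""
    if not matrix:
        return None
    return sum(matrix.values()) / len(matrix)
```

Averaging per-transfer throughputs gives small files the same weight as large ones; dividing total megabytes by total seconds would instead weight the result towards the 50MB and 100MB transfers.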
Monday, 23 June 2008
New test
I have enabled a new test which sends an ATLAS analysis job to ".ac.uk" every 10 minutes; wherever it ends up, it reads data from the local SE and does some analysis. This should give a much better estimate of the success rate as seen by users, as jobs won't go to sites that are down. (There is still a problem, of course, for users whose data is on the site that is down.) See http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html for a summary.
Wednesday, 18 June 2008
New views
There is a new set of results pages showing all the results for a particular site or Tier-2 on one page. Go to the main page http://pprc.qmul.ac.uk/~lloyd/gridpp/ukgrid.html and click on one of the institute names down the left-hand side, or click on a Tier-2 just above the big table.