Steve Lloyd's ATLAS Grid Tests
Steve Lloyd (http://www.blogger.com/profile/14225166097536228752)
Last updated 2024-02-08 15:52 UTC

Upgrade to AtlasSetup
Posted 2010-10-29 15:06 UTC

I have changed the setup for my ATLAS jobs so it uses AtlasSetup (rather than AtlasLogin). The magic lines are:
<pre>
RELEASE=xx.xx.xx
source $VO_ATLAS_SW_DIR/software/$RELEASE/cmtsite/asetup.sh AtlasOffline $RELEASE
</pre>
VO_ATLAS_SW_DIR is set up automatically and you have to set RELEASE yourself. Since AtlasSetup is only available from Release 16 onwards, jobs going to sites without Release 16 will fail.

Steve's pages update
Posted 2010-07-23 09:01 UTC

I have done some much-needed maintenance and the gstat information is available again (from gstat2). There is also a new page giving the status history of the ATLAS Hammercloud tests: <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/hammercloud.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/hammercloud.html</a>.

ATLAS Tests now running natively on SL5
Posted 2010-01-26 10:09 UTC

My ATLAS tests (and 'UK Tests') are now running in native SL5 mode on all sites where ATLAS release 15.6.3 (or above) is installed.

UK Tests Working Again
Posted 2009-11-04 12:42 UTC

I've finally managed to get the UK tests working again.
There were several problems that I won't go into. I have tried to make the tests a bit more robust, and they should now start using the latest ATLAS release as soon as more than 50% of UK sites have it installed.

ATLAS Release 15
Posted 2009-07-08 18:51 UTC

Today I finally got round to switching my ATLAS tests to Release 15. I tried as soon as it came out, but the tests all failed, and they failed again today (as you might have seen) until I finally fixed them this evening. I don't understand why it wouldn't work; in the end I had to modify PYTHONPATH and LD_LIBRARY_PATH by hand.

SE SAM Tests removed
Posted 2009-02-24 15:44 UTC

I have removed the 'SE' SAM tests as they are no longer run and are superseded by the 'SRMv2' ones.

SAM Updates
Posted 2008-12-02 09:35 UTC

I have changed the main SAM tests from SRM to SRMv2. I have also added ATLAS-specific SAM tests (to join the CMS and LHCb ones). It is not clear if these are the right tests. Each SAM summary page now has a list of the tests being polled under the results table. If the experiments want different tests, please let me know.

Proxy problems
Posted 2008-10-08 11:19 UTC

Most tests have been red for the last 24 hrs or so due to failure to submit any jobs.
This was due to trying to use an ATLAS production role which seems to stop anything working. The tests should be OK again now I've stopped trying to do this.

New Server
Posted 2008-10-01 13:13 UTC

I have moved everything on to one dedicated server. There should be no noticeable change but if there is please let me know.

CMS SAM Tests added
Posted 2008-09-16 10:51 UTC

I've added CMS specific SAM tests to the suite. See <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/cms_samtest.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/cms_samtest.html</a>.

Network Tests Updated
Posted 2008-08-01 11:53 UTC

I have revamped the network tests as agreed at a dteam meeting a couple of weeks ago. The tests now involve copying files from the Tier-1 to the local SE, copying the files from the local SE to the WN and copying them back to all SEs. There is quite a high failure rate. Some of this is transfer/catalogue/info failures and some of it is attempts to get more reliable results by using different timings and asking for consistency between them. Sometimes the times vary wildly with smaller files taking longer etc. I am sure there is more to be done.

LHCb Tests
Posted 2008-07-22 14:02 UTC

I have added LHCb tests to the collection. These are actually SAM tests tailored to LHCb.
They are kept separate from the normal (Ops Critical) SAM tests to avoid distorting the availability/reliability statistics. See <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/lhcb_samtest.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/lhcb_samtest.html</a>.

Local problems
Posted 2008-07-21 14:59 UTC

The bad news: The RAID array I store my outputs and log files on developed an error over the weekend. It is being rebuilt now but it may take some time before it comes back.

The good news: I've instigated a 'heartbeat' system, so that if all the tests stop running while the pages still look green, there is something to tell you that the pages have in fact not been updated for many hours.

Network Tests
Posted 2008-07-02 09:41 UTC

I have introduced some network tests. Test jobs are sent to every UK site once an hour, and each job attempts to copy a number of differently sized test files (1MB, 5MB, 10MB, 50MB and 100MB) from the Storage Element at each of the sites. The transfer times are then used to calculate the average throughput in MB/s. The results are averaged over the last 24 hours to produce a matrix of site to site transfer throughputs. There is an average over the whole matrix on the main UK Grid page.
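The throughput calculation described above can be sketched roughly as follows. This is only an illustration, not the code the tests actually run: the function names, the tuple layout and the site names are hypothetical, and it assumes that each transfer's throughput is computed separately and then averaged per site pair (the real tests may instead divide total size by total time). The input is assumed to be already restricted to the last 24 hours.

```python
from collections import defaultdict
from statistics import mean


def throughput_mb_per_s(size_mb, seconds):
    """Throughput of a single transfer in MB/s."""
    if seconds <= 0:
        raise ValueError("transfer time must be positive")
    return size_mb / seconds


def throughput_matrix(transfers):
    """Average throughput for each (source, destination) site pair.

    `transfers` is an iterable of (source, destination, size_mb, seconds)
    tuples, assumed to cover only the averaging window (e.g. 24 hours).
    """
    samples = defaultdict(list)
    for source, destination, size_mb, seconds in transfers:
        samples[(source, destination)].append(
            throughput_mb_per_s(size_mb, seconds)
        )
    return {pair: mean(values) for pair, values in samples.items()}


def overall_average(matrix):
    """Single figure summarising the whole matrix."""
    return mean(matrix.values())


# Made-up timings purely for illustration.
example_transfers = [
    ("SiteA", "SiteB", 10, 2.0),
    ("SiteA", "SiteB", 100, 10.0),
    ("SiteB", "SiteA", 50, 10.0),
]
example_matrix = throughput_matrix(example_transfers)
example_overall = overall_average(example_matrix)
```

Averaging per-transfer throughputs gives small and large files equal weight, which makes the result sensitive to per-transfer overheads on the small files; weighting by file size would be an equally reasonable choice.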
See <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest.html</a>.

New test
Posted 2008-06-23 11:35 UTC

I have enabled a new test which sends an ATLAS analysis job to ".ac.uk" every 10 minutes; wherever the job ends up, it reads data from the local SE and does some analysis. This should give a much better estimate of the success rate as seen by users (as jobs won't go to sites that are down). There is still a problem, of course, for users whose data is on a site that is down. See <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html</a> for a summary.

New views
Posted 2008-06-18 15:34 UTC

There is a new set of results pages showing all the results for a particular site or Tier-2 on one page. Go to the main page <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/ukgrid.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/ukgrid.html</a> and click on one of the institute names down the left-hand side, or click on a Tier-2 just above the big table.

FCR History
Posted 2008-06-17 13:20 UTC

I have been collecting the FCR history since just before Christmas. I have now put up a new page that shows the percentage of time each site is blacklisted by each VO using the FCR.
This can be found at <a href="http://pprc.qmul.ac.uk/~lloyd/gridpp/fcrtest.html">http://pprc.qmul.ac.uk/~lloyd/gridpp/fcrtest.html</a> and is linked from the other test pages as "FCR".

Late Spring Clean
Posted 2008-06-13 10:52 UTC

I've done some tidying up so that the ATLAS tests are more resilient to changing versions of the ATLAS code and this seems to mean that sites with only Release 13 and not 14 are passing again. I have also tried to make all the times consistent - some were previously GMT and some BST. They are now all supposed to be BST in Summer and GMT in Winter.

Release 14 Complete
Posted 2008-06-06 14:13 UTC

All three ATLAS tests are now running on ATLAS Release 14. If you want to pass you need to install 14.1.0 or higher.

Release 14 User Analysis Success
Posted 2008-06-03 16:09 UTC

Release 14 User Analysis seems to work at some sites at least. The Z mass has gone up from 96.661 to 97.203!

Upgrading to ATLAS Release 14
Posted 2008-06-03 14:50 UTC

I am currently upgrading my tests to ATLAS Release 14 starting with User Analysis. All sites without Release 14, which requires SL4, will fail the test. All sites will probably fail while I get the bugs out on my side!
For the time being the other tests will use Release 13 (even if 14 is installed).

New SE Tests
Posted 2008-01-22 15:38 UTC

There are now some new tests of the UK SEs. These run every hour (for the moment) and attempt to copy a (2.8MB) file to an SE (lcg-cr), check it is there (lcg-lr), read it back (lcg-cp), delete it (lcg-del) and check it is no longer there (lcg-lr). The log files from these operations are available by clicking on the test result. You can see that the results are not 100% correlated with the SAM test results (which is why I did this). See: <a href="http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest.html">http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/setest.html</a>.

Auto RB
Posted 2008-01-14 12:40 UTC

On the basis of my <a href="http://hepwww.ph.qmul.ac.uk/~lloyd/gridpp/rbtest.html">RB tests</a> I now define an 'Auto RB', which is one of the 'Good' RBs or, failing that, one of the 'Fair' ones. This RB is then used for my ATLAS tests. This should avoid having to manually switch RBs every time the RAL ones go down.

Major Disruption
Posted 2007-12-19 15:44 UTC

There was major disruption over the last couple of days after a RAID array died and various things had to be recovered from backup. Everything should be running again now, but some history is missing for 13-19 Dec. This could be recovered but is probably not worth it. The system is 'at risk' from now till 3 Jan (no 24x7 here!).
Merry Christmas and a Happy New Year to all who use my test results.

ATLAS Tests now all use release 13.0.30
Posted 2007-12-06 11:43 UTC

I have finally managed to create some 13.0.30 AOD and upload it to (most) UK SEs. All the tests are now using 13.0.30. QMUL, UCL_CCC, Lancaster and Edinburgh are not getting the analysis job at the moment as I could not make the replica at these sites.