ATLAS News
LCG Meeting, 2 March 2009, Lyon
Eric Lançon
New LHC Schedule
- NO winter shutdown: 2010 resources have to be deployed early in 2010 (and before), not in April
- Running below 14 TeV: MC samples have to be produced
RAW Data on Tape at T1
- With optimistic assumptions on event size (RAW 1.6 MB, etc.)
- With a 200 Hz trigger rate, the assumed seconds/day of data taking, and 50% efficiency
- Tape access at T1: ~450 TB to tape at CC
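The tape estimate above is simple arithmetic. A minimal sketch using the slide's assumptions (1.6 MB/event, 200 Hz, 50% efficiency); the slide does not give the seconds/day figure, so a full 86400-second day scaled by the efficiency factor is used here purely for illustration:

```python
# Sketch of the tape-volume arithmetic behind the slide's estimate.
# Parameters are the slide's assumptions, except seconds_per_day,
# which the slide leaves unspecified; 86400 s (a full day) is an
# illustrative default only.

RAW_EVENT_SIZE_MB = 1.6   # RAW event size (slide assumption)
TRIGGER_RATE_HZ = 200     # trigger rate (slide assumption)
EFFICIENCY = 0.5          # data-taking efficiency (slide assumption)

def daily_raw_volume_tb(seconds_per_day: float = 86400.0) -> float:
    """RAW data written to tape per day, in TB (1 TB = 1e6 MB)."""
    return (RAW_EVENT_SIZE_MB * TRIGGER_RATE_HZ
            * seconds_per_day * EFFICIENCY) / 1e6

print(f"{daily_raw_volume_tb():.1f} TB/day")  # ~13.8 TB/day under these assumptions
```

Scaling such a daily figure by the number of running days and the CC's T1 share is what yields a yearly number of the ~450 TB order quoted on the slide.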
Data on Disk at T1
- 10% of the RAW data share, 20% of the total ESD, 100% of AODs, 100% of DPDs
- Pledges: 2008: 1133 TB (760 TB deployed at CC); 2009: 2244 TB; 2010: 4502 TB
- Size in TB: RAW 45, ESD 540, AOD 720, DPD 720
- Total data at CC, 2009 only: ~680 TB. Looks OK, but…
BUT…
- The 2010 pledges might not be there (at CC)…
- MC has not been accounted for: 370 TB already used at Lyon
- Nor have user and group analysis needs
- Access to ESD and RAW might be more important than anticipated in the model
- Space tokens complicate disk management
- Not all T1s will deploy what was pledged…
- Consequence: the ATLAS computing model might/will change!
- What about the T2s?
Consequences for Us
- We have enough disk for 2009; assuming we get what was asked for 2010, there is no problem storing data at Lyon
- Strong pressure to reduce the number of AOD/DPD copies
- One (too simple) solution: remove them from the T1s. This is not what we want: it would mean no analysis at Lyon
- A possible solution to keep in mind: transfer some of the T1 pledges to Lyon-T2
- This needs a separate SRM (one for the T1, one for the T2)
Cosmic Data Re-processing
- 10 March: CC shutdown
- Done at the T1; a test is foreseen on some T2s
Reprocessing Setup
- Same as for the Dec. 08 reprocessing; most data are on disk at CC
- Without ORACLE access: too many connections per job, and no ATLAS effort to use ORACLE
- With small (~GB) SQLite files for conditions data access ('hot' files)
- Tests with ORACLE will be performed at CC
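The 'hot file' idea can be illustrated with a toy example: each job opens a small local SQLite replica instead of holding a connection to the central ORACLE server. The schema, table, and values below are invented for the sketch; the real ATLAS conditions database layout is different.

```python
# Toy illustration of SQLite 'hot files' for conditions data access:
# local file reads replace per-job ORACLE connections.
# Schema and values are fabricated for this sketch.
import sqlite3

def make_hot_file(path: str) -> None:
    """Build a minimal local conditions file (done once, centrally)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE conditions (run INTEGER, tag TEXT, value REAL)")
    con.executemany("INSERT INTO conditions VALUES (?, ?, ?)",
                    [(90001, "pixel_hv", 150.0), (90002, "pixel_hv", 150.0)])
    con.commit()
    con.close()

def read_condition(path: str, run: int, tag: str) -> float:
    """Per-job lookup: purely local file access, no server connection."""
    con = sqlite3.connect(path)
    (value,) = con.execute(
        "SELECT value FROM conditions WHERE run = ? AND tag = ?",
        (run, tag)).fetchone()
    con.close()
    return value

make_hot_file("hot_conditions.db")
print(read_condition("hot_conditions.db", 90001, "pixel_hv"))  # 150.0
```

The point of the design is that thousands of concurrent jobs then put no load at all on the database server, at the cost of distributing the ~GB files to the sites beforehand.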
Re-reconstruction of MC
- Equivalent of data reprocessing: start from HITS (the output of the Geant simulation), reconstruct the datasets with the new software version, and merge the outputs to reduce the number of files
- Will run in parallel with the cosmic re-processing, but at lower priority
- Output volume: ~60 TB, on disk at CC
Since January
- Load has increased on all sites and will continue to do so unless old-data cleanup is performed
- Job efficiency has increased a lot for all clouds, partly because of ATLFast production (one input, one output only): low stress on storage
- Improvements in CE-BQS reactivity; variation of the ATLAS share in BQS
- FTS balancing at CC
- Xrootd is the default for analysis (through PanDA)
- dCache is much more stable
[Plot: load increase, with dips]
Number of ATLAS production jobs last month, FR-Cloud
- Always activated jobs for Lyon; the dips are not due to ATLAS
- Records! This will continue…
[Plot: not only CMS, ATLAS as well]
LHC & BQS
- Shouldn't CPU that is not used by CMS go back to LHC?
- BQS handles each group's targets individually; this notion of an 'LHC' entity was never required (it does not exist in BQS)
- What happens during re-processing periods?
Reactivity of CE-BQS
- Activated vs. running jobs
- Very good! Never observed before!
- What about the balance between the CEs?
FTS Improvements
- 10M-file T1-T1 transfer test; number of datasets in the queue for Lyon
- 3rd FTS machine (CC downtime)
- Improved monitoring of FTS is needed
Jobs per Cloud, 13/01-23/02
- 3.2 M successful jobs
- France: increasing (previous period: 12%)
Cloud Efficiencies, 13/01-23/02
- FR: 89% efficiency (cf. 80% for the 23/11-12/01 period)
FR Cloud: Jobs per Site, 13/01-23/02
FR Cloud: Efficiency per Site, 13/01-23/02
- Lyon: T1 90%, T2 82% (previous period: T1 74%, during reprocessing)
Problems
- Transfer of output datasets T2 -> T1: if too slow -> errors; SRM load at CC
- Tickets: the ticket system does not work very well; a ticket filed on the weekend gets no visible response
- Ticket on 8/02 for storage errors
- Tickets on 21/02 for SRM problems; no jobs and AFS problems that weekend
- What if problems occur during the reprocessing of real data???
Disk Space
- Pledges are centrally managed by ATLAS
- Token: USERDISK has to be replaced by SCRATCH; the name was misleading
- Disk crisis at the T1s: too few pledges, too many dataset copies (currently 30 worldwide), too many dataset versions
- Data distribution within clouds might change: 3 full copies in the FR-cloud…
Analysis Tests
- O(50) jobs/site weekly (Thursdays)
- Monitoring of performance; metrics: time for software setup, time to read data, time to store data, number of jobs vs. time, …
- Involvement of physicists is missing at some sites
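The per-site summary these tests aim at can be sketched as a simple aggregation of per-job timing records. Field names, site labels, and values below are fabricated for illustration; they are not those of the actual monitoring system.

```python
# Hypothetical aggregation of the per-job metrics the weekly analysis
# tests collect (setup / read / store times). All records here are
# fabricated sample data for the sketch.
from statistics import mean

jobs = [  # one record per test job (invented values)
    {"site": "Lyon", "setup_s": 40, "read_s": 300, "store_s": 60},
    {"site": "Lyon", "setup_s": 55, "read_s": 280, "store_s": 75},
    {"site": "LPC",  "setup_s": 90, "read_s": 500, "store_s": 80},
]

def site_means(jobs, site):
    """Mean of each timing metric for one site."""
    rows = [j for j in jobs if j["site"] == site]
    return {k: mean(r[k] for r in rows)
            for k in ("setup_s", "read_s", "store_s")}

print(site_means(jobs, "Lyon"))
```

Comparing such per-site means week to week is one way to spot a site whose storage or software-setup performance is degrading before real users hit it.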