CMS computing: CCRC08 review
C. Charlot / LLR
LCGFR, 3 March 2008
LCG-France meeting, 03/03/2008, C. Charlot
CCRC08: goals
- Readiness test of the computing infrastructure before data taking
- Combined exercise with the other experiments
- Phase I: February
  - Series of functional tests
  - Processing at the T0 and archiving, Cessy->CERN transfers, T0->T1->T2 transfers, T1 staging and processing, CAF tests
- Phase II: May 2008
  - Complete workflow, run simultaneously at all sites
  - Scale = 100%
  - 1 week of ramp-up followed by 4 weeks of testing
- This presentation
  - Transfer tests
  - Staging test at the T1
  - Simultaneous processing test with ATLAS at the T1 (PIC, CC-IN2P3)
CCRC08: T0->T1 transfers
Goals:
- Performance
  - T0 (staged data) -> T1 (disk buffer): minimum = 25% of the 2008 rate, target = 40%, optimal = 50%
  - T1 (disk) -> T1 (tape): 25% of the 2008 rate
  - The target must be sustained for 3 consecutive days
- Stability
  - T0 (staged) -> T1 (disk) -> T1 (tape)
  - Stable transfer, receiving a volume equivalent to 3 days at the above rate (10 TB for CC-IN2P3)
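As a rough illustration of the stability goal, the sketch below converts "10 TB received over 3 days" into an average sustained rate. It assumes decimal units (1 TB = 10^6 MB), which the slide does not specify, and the function name is purely illustrative.

```python
# Average rate implied by receiving a given volume over a given period.
# Assumption: decimal units (1 TB = 1e6 MB); the slide does not state the convention.
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Return the mean throughput in MB/s needed to move volume_tb within `days`."""
    return volume_tb * 1e6 / (days * SECONDS_PER_DAY)


# Stability target quoted for CC-IN2P3: 10 TB over 3 days.
ccin2p3_rate = average_rate_mb_per_s(volume_tb=10, days=3)
print(f"CC-IN2P3 stability target: {ccin2p3_rate:.1f} MB/s sustained")
```

Under these assumptions the volume corresponds to a little under 40 MB/s held continuously for the three days.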
CCRC08: T0->T1 transfers: T0-T1
CCRC08: T0->T1 transfers: T0-T1-CCIN2P3
- Problems with srmv2 config
- Problems with dCache
CCRC08: T0->T1 transfers: T0-T1s
CCRC08: T1->T1 transfers
Goals: performance
- Aggregate rate: export at 50% of the 2008 rate to at least 3 T1s
- Aggregate rate: import at 50% of the 2008 rate from at least 3 T1s
  - At least 1 T1 on another continent
CCRC08: T1->T1 transfers: summary of the 3 weeks
CCRC08: T1->T1 transfers: T1-CCIN2P3->other T1s
CCRC08: T1->T1 transfers: T1-T1 summary
CCRC08: T1->T2 transfers
CCRC08: T1-CCIN2P3->T2s
CCRC08: T1s->region T2s
CCRC08: reprocessing tests
Reprocessing tests for CCRC08 in February:
A) Migration from tape to buffer: pre-stage test.
B) Reprocessing exercise: use all available CMS slots at T1s. Not done, since this was already achieved at T1 CC-IN2P3, with ~1000 slots used for processing production data.
C) Reprocessing exercise: test ATLAS and CMS reprocessing jobs on the same WN.
CCRC08: pre-staging tests
Goal: measure latency, throughput and success rate for tape-to-buffer staging of files that are kept only on tape (not on disk).
Plan:
+ select one (or more) dataset(s) of 10 TB size existing at the T1.
+ remove all the files from disk (i.e. the T1 buffer).
+ trigger the staging from tape to disk of all files.
+ measure some variables (detailed in the twiki).
Schedule: to be done at sites (with help of site admins) during the 1st quarter of February. Done at all T1 sites.
CCRC08: pre-staging tests
Obtained results: staging time for 10 TB: ~24 h (except RAL, IN2P3 and CNAF)
CCIN2P3: pre-staging tests
- dCache HPSS interface: HPSS -> HPSS_Disk -> dCache_Disk (farm access).
- A 1 GB file needs ~140 s to complete the process HPSS_Tape -> HPSS_Disk -> dCache_Disk.
- The last step (HPSS_Disk -> dCache_Disk) is achieved in ~45 s (22 MB/s), while HPSS_Tape -> HPSS_Disk takes the majority of the time, as expected (mounts, tape seek...).
- 140 s for file staging -> 7.1 MB/s for file recovery, on average, per drive.
- The test launched 3 parallel staging processes -> at most 3 tapes were mounted at any time to recover files from the system.
- 7.1 MB/s per drive was achieved -> 23 MB/s, averaged.
- A last test, consisting of recalling 100 files from the same tape, was performed: HPSS_Tape -> dCache_Disk took 19 min -> 12 s/file -> 88 MB/s, about 10x better.
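The rates on this slide follow from the quoted file size and timings; the short sketch below reproduces the arithmetic. It assumes 1 GB = 1000 MB and 10 TB = 10^7 MB (the slide does not state the convention), and the final projection to a 10 TB dataset is an extrapolation from the quoted 23 MB/s average, not a measured value.

```python
# Reproduce the staging rates quoted for CC-IN2P3 from the file size and timings.
# Assumption: decimal units (1 GB = 1000 MB, 10 TB = 1e7 MB).
FILE_SIZE_MB = 1000.0


def rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Throughput in MB/s for moving size_mb in the given number of seconds."""
    return size_mb / seconds


# Full chain HPSS_Tape -> HPSS_Disk -> dCache_Disk: ~140 s per 1 GB file.
per_drive = rate_mb_per_s(FILE_SIZE_MB, 140)

# Last step only (HPSS_Disk -> dCache_Disk): ~45 s per file.
disk_copy = rate_mb_per_s(FILE_SIZE_MB, 45)

# Recall of 100 files from a single tape in 19 minutes.
seconds_per_file = 19 * 60 / 100
same_tape = rate_mb_per_s(FILE_SIZE_MB, seconds_per_file)

# Extrapolation (not measured): time to stage 10 TB at the 23 MB/s average.
days_for_10tb = 10e6 / 23 / 86_400

print(f"per drive:     {per_drive:.1f} MB/s")
print(f"disk copy:     {disk_copy:.1f} MB/s")
print(f"same tape:     {same_tape:.1f} MB/s ({seconds_per_file:.1f} s/file)")
print(f"10 TB at 23 MB/s: {days_for_10tb:.1f} days")
```

The same-tape figure comes out near 88 MB/s when the unrounded 11.4 s/file is used; with the rounded 12 s/file it would be slightly lower.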
ATLAS+CMS processing test
Goal: run ATLAS and CMS reprocessing jobs on the same WNs
- Investigate performance and memory issues
- Set up a new CE with updated middleware and dedicated queues
Results:
- ATLAS and CMS jobs were run on a dedicated CE + WNs
  - 10 8-core worker nodes
- It allowed the grid people to discover tricks in the LCG-CE glite-3.1
  - Discovered that at CC the jobs were submitted to all queues and that GlueCEStateStatus == "Production" was not taken into account.
- The maximum memory requirement was relaxed to allow a memory study, but this was not looked at
- Too few WNs to see any interference effects
Conclusions
- Good participation of the CC in the CCRC08 tests
  - Thanks to everyone for their efforts
- The reprocessing tests proved useful
  - Staging: throughput limited by the dCache->HPSS interface
  - Optimisation of request handling, or ordering of files by tape by the user, to be planned
- Difficulties with the transfers
  - srmv2 configuration problem, numerous dCache problems
  - CCRC08 (May) target of 100 MB/s from CERN
  - Stabilising dCache appears urgent