Semi-Static Data Replication and Request Scheduling
From algorithms to middleware and applications
Frédéric Desprez, LIP ENS Lyon / INRIA GRAAL Research Team
Joint work with E. Caron, G. Le Mahec and A. Vernois

Agenda
- Introduction
- One target application: BLAST using large databases over the grid
- One problem: joint scheduling and replication
- One middleware: DIET/DAGDA
- Conclusion and future work

Introduction
- Several applications are ready (and not only number-crunching ones!)
- Huge data sets are becoming available around the world
- Data management is one of the most important issues in today's applications
- Replication has to be used to improve platform throughput
- Services for resource management and data management/replication are available in most grid middleware... but they usually work separately
Our approach
- Put data management into the resource-management sphere
- Perform the request scheduling together with the data replication, based on the information provided by the application
- Apply this to a large-scale bioinformatics application

One target application: BLAST over the grid
- Basic Local Alignment Search Tool (BLAST): finds homologies between nucleotide or amino acid sequences.
- Objective: find clues about the function of a protein/gene by comparing a newly discovered sequence to well-known ones.
- Biological databases: BLAST takes as input biological databases containing large sets of annotated sequences.
- Usage: biologists generally perform BLAST searches on large sets of sequences.
Speaker notes: By comparing two sequences, one of which has a known role, we hope, if they are similar, to discover the role of the new sequence in a protein or a gene. The databases used are simply flat files containing sequences together with their descriptions.

One target application: BLAST over the grid
- Each sequence of the input set can be processed independently.
- A simple parallelization: perform the searches for each sequence on different nodes. Simple, but efficient: a sequence weighs at most a few kilobytes and a search on it takes at most a few minutes.
- Requirements to "gridify" BLAST applications:
  - A way to submit and distribute the requests
  - A way to replicate databases
  - Software to split the sequence sets before the computation and to merge the results afterwards (a minimal sketch of these two steps follows)
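As a rough illustration of the split and merge steps, here is a minimal Python sketch, assuming a plain multi-FASTA input; the helper names, file path, and 300-node grain are illustrative, not the project's actual tooling.

```python
# Minimal sketch of the split/merge steps around a gridified BLAST run.
# Helper names and file paths are illustrative, not the actual tooling.

def split_fasta(path, parts):
    """Split a multi-FASTA file into `parts` request chunks.
    parts == number of sequences -> small grain (one sequence per request);
    parts == number of available nodes -> coarse grain."""
    with open(path) as f:
        entries = [">" + e for e in f.read().split(">") if e.strip()]
    size = max(1, len(entries) // parts)
    return ["".join(entries[i:i + size]) for i in range(0, len(entries), size)]

def merge_results(partial_reports):
    """Concatenate the per-chunk BLAST reports into a single result."""
    return "\n".join(partial_reports)

# Coarse grain for, e.g., 300 available servers:
requests = split_fasta("query.fasta", parts=300)
```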

One target application: BLAST over the grid
Grid-BLAST execution:
1. The user submits a large set of sequences, plus a database or a database identifier.
2. The input set is divided; the database is replicated on the platform using a data manager.
3. The resource broker returns a set of computing resources.
4. Each sequence is processed on a different server.
5. The results are merged into a single result file.
Speaker notes: If the database is not already "installed" in DAGDA, the user passes it as a parameter; otherwise we use DAGDA's shared-identifier system (aliases on the data, which avoids handling a UUID that is impractical to share or exchange). Note: splitting the input file and merging the results have a negligible cost compared with the cost of running the BLAST jobs.

Parallelization and distribution of bioinformatics requests over the grid
Context:
- Large sets of requests are submitted by many users of the grid.
- Large databases are used as parameters of the requests.
- The computing nodes of the grid have different computing and storage capacities.
- Several different BLAST applications exist, depending on users' needs.
- The jobs have to be scheduled on-line (the scheduling is done job by job).
We want to optimize resource usage: no useless replication, and platform throughput optimization. A nice "little scheduling problem" (Yves Robert approved!): where to replicate the databases? How to distribute the requests?

Parallelization and distribution of bioinformatics requests on the grid
- Obtaining dynamic information about node status is a difficult task involving complex grid services, and the information obtained can be several minutes old. For jobs with long execution times this is not really a problem; for jobs that take only a few minutes, the information is outdated.
- In this context, without more information there is little to be done: the classical Minimum Completion Time (MCT) strategy is a good way to schedule the jobs, but with a little more information we can do better.
- The analysis of execution traces from bioinformatics clusters showed that the way biologists submit jobs is homogeneous if the observed time interval is long enough. We use this information to optimize grid resource usage.
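For reference, here is a minimal sketch of the MCT baseline under simplifying assumptions (a known per-server ready time, estimated transfer and run costs); the classes and numbers are illustrative, not the broker's actual code.

```python
# Minimal MCT (Minimum Completion Time) sketch: each job goes, at submission
# time, to the server with the earliest estimated completion time, counting a
# database transfer penalty when the server does not hold the database yet.

from dataclasses import dataclass, field

@dataclass
class Server:
    speed: float                       # relative CPU speed
    bandwidth: float                   # MB/s towards this server
    ready_time: float = 0.0            # when the server becomes free
    databases: set = field(default_factory=set)

@dataclass
class Job:
    db: str                            # database identifier
    db_size: float                     # database size in MB
    work: float                        # normalized compute cost

def mct_schedule(job, servers):
    def completion(s):
        transfer = 0.0 if job.db in s.databases else job.db_size / s.bandwidth
        return s.ready_time + transfer + job.work / s.speed
    best = min(servers, key=completion)
    best.ready_time = completion(best)
    best.databases.add(job.db)         # MCT replicates as a side effect
    return best
```

This is exactly the behavior the later speaker notes criticize: MCT copies databases aggressively for long jobs and piles short jobs onto the nodes that already hold the data.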

Parallelization and distribution of bioinformatics requests on the grid
Problem modeling: the Scheduling and Replication Algorithm (SRA)
- Relies on the integer approximation of a linear program
- Gives the data distribution and the job scheduling
- Takes into account the specificities of the biologists' submissions
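The slides do not spell the program out; the following LaTeX sketch only conveys the flavor of such a throughput-maximizing formulation, with assumed notation: $\delta_{i,j}\in\{0,1\}$ places database $j$ on server $i$, $n_{i,j,k}$ is the rate of requests of algorithm $k$ on database $j$ served by $i$, $f_{j,k}$ are the observed submission frequencies, $w_{j,k}$ the per-request cost, and $\rho$ the platform throughput.

```latex
\begin{align*}
\max \; & \rho \\
\text{s.t.}\;
  & \textstyle\sum_{j} \delta_{i,j}\,\mathrm{size}_j \le S_i
      && \text{storage capacity of server } i \\
  & \textstyle\sum_{j,k} n_{i,j,k}\,w_{j,k} \le W_i
      && \text{compute capacity of server } i \\
  & n_{i,j,k} \le M\,\delta_{i,j}
      && \text{run } (j,k) \text{ on } i \text{ only if it holds database } j \\
  & \textstyle\sum_{i} n_{i,j,k} \ge \rho\, f_{j,k}
      && \text{serve each class in proportion to its frequency} \\
  & \delta_{i,j} \in \{0,1\},\quad n_{i,j,k} \ge 0
\end{align*}
```

Relaxing the integer variables to rational values and rounding the solution corresponds to the "integer approximation" mentioned above.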

Scheduling and Replication Algorithm
- SRA gives good results when the request frequencies are constant.
- The data distribution and the scheduling are efficient.
- It does not need dynamic information about the grid.

Scheduling and Replication Algorithm
Simulation experiments (charts: using MCT vs. using SRA):
- Grid'5000 platform: 2540 heterogeneous CPUs on 9 sites
- 5 databases of different sizes, 5 algorithms of different complexities
- 1 request every 0.3 s; the data are randomly distributed before the start
- For small sets of requests, SRA does not have enough time to replicate the data.
- For large sets of requests, SRA's replications and job scheduling keep the computing nodes from saturating.
Speaker notes: 5 databases from 150 MB to 5 GB. 5 algorithms: blastn, blastp, blastx, tblastn, and tblastx (DNA vs DNA, protein vs protein, DNA vs protein, etc.); the longest is tblastx (about 20 times longer than blastn). The peak at the beginning of the SRA chart: the submitted job sets are small and the replications finish after the last job has been submitted, so the average job time is high since all jobs are scheduled on the same machines; the waiting time decreases as the replications complete, with the last submitted jobs benefiting fully from them. MCT sends each job to the node that will execute it fastest at submission time: it often copies the databases to the fastest site, even if that means deleting a very large database that will then be slow to retransmit, which it does anyway for the longest jobs (tblastx). For the fastest jobs (blastn), it favors the nodes that already hold the database, even if they are slow and even if the jobs are very numerous.

SRA with frequency variation detection
- On the grid, the number and variety of users can make the time needed to reach constant frequencies very long.
- Some "events" can temporarily modify the frequencies: data challenges, important conference submission deadlines, holidays and weekends, ...
- By detecting such a frequency variation, we can correct the data distribution and the job scheduling.

SRA with frequency variation detection
Algorithm principle (a small sketch follows):
1. SRA gives an initial data distribution.
2. The resource broker records the submissions and updates the frequencies.
3. If a frequency varies beyond a threshold, a new data distribution is computed using SRA, and the transfers start asynchronously if possible; otherwise the job scheduling itself will cause the data transfers.
4. Tasks are scheduled using the job distribution given by SRA.
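A small Python sketch of the detection step, under assumed bookkeeping (a running count per (database, algorithm) class and a fixed drift threshold); `solve_sra`, `current_frequencies`, and `start_transfers_async` are hypothetical placeholders.

```python
# Sketch of the frequency-variation trigger: the resource broker counts
# submissions per (database, algorithm) class and signals when an observed
# frequency drifts beyond a threshold from the one SRA last planned with.

from collections import Counter

class FrequencyMonitor:
    def __init__(self, planned_freqs, threshold=0.1):
        self.planned = dict(planned_freqs)  # frequencies used by the last SRA run
        self.counts = Counter()
        self.threshold = threshold

    def record(self, request_class):
        """Record one submission; return True if a redistribution is needed."""
        self.counts[request_class] += 1
        total = sum(self.counts.values())
        observed = self.counts[request_class] / total
        return abs(observed - self.planned.get(request_class, 0.0)) > self.threshold

# On each submission (hypothetical helpers):
#   if monitor.record((db, algo)):
#       plan = solve_sra(current_frequencies(monitor))  # new data distribution
#       start_transfers_async(plan)   # else the scheduling itself pulls the data
```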

SRA with frequency variation detection
- When the frequencies vary and we do not redistribute the data: the scheduling is no longer efficient for large sets of jobs, and the average execution time of the jobs increases with the number of jobs submitted.
- When the frequencies vary and we detect the variations and redistribute the data: the scheduling efficiency is preserved, and the average execution time of the jobs does not increase much.

Implementation within a middleware - DIET
- GridRPC API, Network Enabled Servers paradigm
- Some features: distributed (hierarchical) scheduling, plugin schedulers, data management, workflow management, sequential and parallel tasks (through batch schedulers), ...
[Architecture diagram: Client, Master Agent (MA), Local Agents (LA), server front ends, JXTA]
http://graal.ens-lyon.fr/DIET/

Data/replica management
Two needs:
- Keep the data in place to reduce the overhead of communications between clients and servers
- Replicate data whenever possible
Three approaches for DIET:
- DTM (LIFC, Besançon): hierarchy similar to DIET's, distributed data manager, redistribution between servers
- JuxMem (Paris, Rennes): P2P data cache
- DAGDA (IN2P3, Clermont-Ferrand): replication, joining task scheduling and data management
Work done within the GridRPC Working Group (OGF); relations with workflow management.

DAGDA: Data Arrangement for Grid and Distributed Applications
A new data manager for the DIET middleware providing:
- Explicit data replication, using the API
- Implicit data replication: data items are replicated on the selected servers
- Direct data get/put through the API
- Automatic data management, using a selected data replacement algorithm when necessary: LRU (the least recently used item is deleted), LFU (the least frequently used item is deleted), FIFO (the "oldest" item is deleted); a sketch of the three policies follows
- Transfer optimization, by selecting the best source using statistics on previous transfers
- Storage resource usage management: the space reserved for the data can be configured by the "user"
- Data status backup/restoration, allowing DIET to be stopped and restarted while saving the data status on each node
Speaker notes: Explicit: the user explicitly decides to replicate the data. Implicit: the service calls themselves trigger the data replications; unlike DTM, the data are replicated, not moved. Direct access to the stored data, plus direct insertion of a data item into DIET. Automatic data management: when a data item must be installed on a node that no longer has enough space, another item is deleted using an algorithm chosen in the node's configuration. Transfer optimization: the "best" source for a data item is chosen based on statistics gathered during previous transfers. Storage usage management: the amount of memory and disk space reserved for DAGDA-managed data can be chosen. Data backup/restoration: the current state of the data distribution can be saved and restored when DIET restarts (for example, a reservation ends and we want to continue an experiment later; on restart, the data are put back as they were before the shutdown).
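A compact Python sketch of the three replacement policies, assuming a per-node store with a configured capacity; it only mirrors the eviction logic described above, not DAGDA's implementation.

```python
# Eviction sketch for the three configurable policies (LRU, LFU, FIFO): the
# node evicts items until the incoming data fits within its storage limit.

import time

class DataStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}    # id -> [size, last_use, use_count, insert_time]

    def _victim(self, policy):
        rank = {"LRU":  lambda kv: kv[1][1],   # oldest last use
                "LFU":  lambda kv: kv[1][2],   # fewest uses
                "FIFO": lambda kv: kv[1][3]}   # earliest insertion
        return min(self.items.items(), key=rank[policy])[0]

    def put(self, data_id, size, policy="LRU"):
        while self.items and sum(v[0] for v in self.items.values()) + size > self.capacity:
            del self.items[self._victim(policy)]
        now = time.monotonic()
        self.items[data_id] = [size, now, 0, now]

    def get(self, data_id):
        self.items[data_id][1] = time.monotonic()   # refresh last use (LRU)
        self.items[data_id][2] += 1                 # bump use count (LFU)
        return data_id
```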

DAGDA transfer model
- Uses the pull model: the data are sent independently of the service call, and can be sent in several parts.
1. The client sends a request for a service.
2. DIET selects some SeDs according to a scheduling heuristic.
3. The client sends its request to the SeD.
4. The SeD downloads the data from the client and/or from other DIET servers.
5. The SeD performs the call.
6. The persistent data are updated.
Speaker notes: Unlike DTM, it is the SeD that downloads the data, not the client that pushes them unilaterally. Only the data descriptions (type, size, etc.) are sent with the requests. If a maximum message size has been configured for DAGDA, data that are too large are sent in several parts; this also limits the amount of memory needed for the transfers (DTM loads everything into memory before sending the data).
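A sketch of the chunked pull in Python, with stand-ins for source ranking and the CORBA fetch; the chunk size, `fetch_part` callable, and `observed_bandwidth` attribute are assumptions, not the actual interfaces.

```python
# Pull-model sketch: the SeD fetches the data from its best-ranked source in
# bounded-size parts, so requests carry only the data description and memory
# use stays limited. `fetch_part` stands in for the CORBA transfer machinery.

CHUNK = 8 * 1024 * 1024    # assumed maximum message size (8 MB)

def pull_data(desc, sources, fetch_part):
    """Download `desc.size` bytes of item `desc.id` to `desc.local_path`."""
    best = max(sources, key=lambda s: s.observed_bandwidth)  # past-transfer stats
    with open(desc.local_path, "wb") as out:
        for offset in range(0, desc.size, CHUNK):
            part = fetch_part(best, desc.id, offset, min(CHUNK, desc.size - offset))
            out.write(part)
```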

DAGDA architecture
- Each data item is associated with a unique identifier.
- DAGDA controls the disk and memory space limits; if necessary, it uses a data replacement algorithm.
- The CORBA interface is used to communicate between the DAGDA nodes.
- Users can access data and perform replications through the API (an illustrative stub follows).
Speaker notes: The DAGDA "core" handles data identification and lookup, as well as the choice of sources/destinations for the transfers. The DAGDA extended components handle the resource limits set by the users and the backup/restoration of the data. The API gives direct access to the data in the platform, allows adding data, and can launch replications.
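To make the API's role concrete, here is a hypothetical in-memory stub in Python mirroring the operations listed above (put/get by unique identifier, aliases instead of raw UUIDs, explicit replication); the real API is C/C++ over CORBA, and none of these names are the actual bindings.

```python
# Hypothetical stub mirroring the call shapes of the data API: direct put/get
# by unique identifier, aliases instead of raw UUIDs, explicit replication.
# Not the real bindings; the actual interface is C/C++ over CORBA.

import uuid

class DagdaStub:
    def __init__(self):
        self.store = {}

    def put(self, payload, alias=None):
        data_id = alias or str(uuid.uuid4())    # every item gets a unique ID
        self.store[data_id] = payload
        return data_id

    def get(self, data_id):
        return self.store[data_id]

    def replicate(self, data_id, nodes):
        # In the real system this would trigger CORBA transfers between nodes.
        return [(node, data_id) for node in nodes]

dagda = DagdaStub()
db_id = dagda.put(b"...", alias="swissprot")    # alias avoids sharing raw UUIDs
dagda.replicate(db_id, nodes=["SeD-lyon", "SeD-rennes"])
```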

Putting everything together
Speaker notes: A request is a set of sequences to BLAST against a given database. A sub-request is a subset of those sequences to BLAST against the same database.

Putting everything together
Experiments over Grid'5000 using DIET and DAGDA:
- 300 nodes (4 sites), 40000 sequences, 4 BLAST algorithms, 2 databases
- 2 different sequence splits: small grain (one sequence after another) and coarse grain (splitting as a function of the number of available servers)
- 4 scheduling algorithms: random, MCT, round-robin, dynamic SRA
- Use of plugin schedulers

Putting everything together
- Maximum request partitioning vs. partitioning into n sub-requests (n being the number of available nodes).
- For MCT, Random, and Round-Robin, the best request partitioning uses the number of available nodes (coarse grain). For Dynamic-SRA, one sequence per request (small grain) is best.
Speaker notes: Maximum partitioning: if the initial request file contains n sequences, we create n request files, each containing a single sequence. Partitioning into n sub-requests: with n available nodes, we create n sub-requests of identical size, so each node has only one request to process. With Random, MCT, and Round-Robin, multiplying the requests causes overhead that is not compensated by the scheduling; the best option remains splitting the requests into as many parts as available nodes. With SRA, the more requests there are, the more reliable the frequencies, and thus the more efficient the algorithm: the overhead is compensated by the scheduling. Overall, Dynamic-SRA is better, even when splitting the request into n parts, provided there are enough nodes (here 300 SeDs): with 300 files, the observed frequencies are roughly adequate. With fewer nodes, hence fewer requests, the frequencies become more and more approximate, and since we optimize platform throughput, dynamic SRA gets worse and worse.

Putting everything together
Comparison of MCT and dynamic SRA over 1000 nodes (8 sites):
- 4 databases between 175 MB and 6.5 GB
- 4 algorithms, 40000 sequences
- Frequency variation

Putting everything together
Request mix (charts: using Dynamic-SRA vs. using MCT):
- Requests 1 to 20000: BLASTP vs DB 1: 30%; BLASTN vs DB 2: 30%; BLASTX vs DB 3: 20%; TBLASTX vs DB 4: 20%.
- Requests 20001 to 40000: BLASTP vs DB 1: 10%; BLASTP vs DB 3: 30%; BLASTN vs DB 2: 10%; BLASTN vs DB 4: 10%; BLASTX vs DB 1: 10%; BLASTX vs DB 3: 10%; TBLASTX vs DB 2: 20%.
Speaker notes: The algorithms have different complexities. BLASTN is the fastest (DNA, a 4-letter alphabet, vs DNA). BLASTP: proteins (a 20-letter alphabet) vs proteins. BLASTX: DNA translated into protein vs proteins (DNA translation + BLASTP). TBLASTX: the longest, DNA translated into protein vs a DNA database translated into proteins (translation of all the sequences plus BLASTP). Overall, the frequency change does not have much influence on MCT.

Conclusion and future work
- Data management is one of the most important issues for large-scale applications over the grid, and it has to be linked with request scheduling to get the best performance.
- Static algorithms perform well even on a dynamic platform.
Future work:
- Provide a set of replication algorithms tuned for specific application classes within DAGDA
- Increase the exchanges between resource and data management systems
- Use other performance metrics (fairness)
- Manage replication for workflow scheduling

Questions?