Prediction models: the value of validation

M. Dramaix-Wilmet, Department of Biostatistics

Prediction model
An equation predicting a “dependent” variable from several others:
- linear regression models
- logistic regression models
- classification
- others...

Prediction models: the problem is “OPTIMISM”
The model “predicts” well for the sample from which it was derived. What about other samples?

Prediction models: Validation, an example
External validation is necessary in prediction research: A clinical example. S.E. Bleeker, H.A. Moll, E.W. Steyerberg, A.R.T. Donders, G. Derksen-Lubsen, D.E. Grobbee, K.G.M. Moons. Journal of Clinical Epidemiology 56 (2003) 826–832.

The derivation set was comprised of 376 children with fever without apparent source, and the validation set consisted of 179 children who had been referred for the same reason (three respectively zero patients were excluded because of isolation of Haemophilus influenzae). Except for the variable pale skin, no material differences were found in the distribution of the general characteristics and the predictors between the two sets. A serious bacterial infection was present in 20% of the children in the derivation set and in 25% of the validation set.

Strong predictors of serious bacterial infection included age above 1 year, duration of fever, changed crying pattern, nasal discharge or earache in history, ill clinical appearance, pale skin, chest-wall retractions, crepitations, and signs of pharyngitis or tonsillitis. The 95% CI for the ROC area of this model was 0.78–0.87, and the R² was 32.3% (95% CI: 15.1–49.4%). Subsequently, the model was applied to the validation set to test its predictive performance. The ROC area dropped to 0.57 (95% CI: 0.47–0.67) and the R² to 2.0%.

Prediction models: Validation criteria
Example: logistic regression
Performance
- Discrimination: ability to distinguish the “diseased” from the “non-diseased”; measured by the area under the ROC curve
- Calibration: agreement between predicted and observed probabilities, e.g. the Hosmer-Lemeshow test; bias: difference between the mean of the predicted probabilities and the mean of the observed probabilities
- Overall measures, e.g. Nagelkerke's R²
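As a rough illustration of two of these criteria, the sketch below computes the ROC area (discrimination) and the bias (calibration) from outcomes and predicted probabilities. The function names and the eight subjects are hypothetical and chosen only for the example; they do not come from the slides.

```python
def roc_area(y, p):
    """Area under the ROC curve: the share of (diseased, non-diseased) pairs
    in which the diseased subject has the higher predicted probability.
    Tied pairs count as one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bias(y, p):
    """Mean predicted probability minus observed proportion of events."""
    return sum(p) / len(p) - sum(y) / len(y)


# Hypothetical outcomes (1 = diseased) and predicted probabilities.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print(f"ROC area = {roc_area(y, p):.4f}")   # 15 of the 16 pairs are ordered correctly
print(f"bias     = {bias(y, p):+.4f}")
```

A ROC area of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination; a bias close to zero means the predicted risks are right on average, which says nothing yet about individual agreement.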

Prediction models: Validation methods
Internal validation
- Data splitting
- Cross-validation
- Bootstrap
External validation
- Another sample

Prediction models: Validation methods
“Shrinkage factor”
- Determined from the internal validation methods above
- Corrects the “optimism” of the model derived from the data
- E.g. it shrinks the regression coefficient, the OR, the ROC area...

Prediction models: Shrinkage factor

Data: breast cancer, n = 138
Event: recurrences (76)
Median follow-up: 83 months
Cox model with one predictor, S-phase fraction: HR = exp(b*X)
Question: what is the best cut-point?

“Ad hoc” approach: choose the cut-point with the minimum p-value
Without correction: cut-point = 10.7, pmin = 0.007, b = 0.863, HR = 2.37
With the p-value “corrected” to account for “optimism” (Pcor): shrinkage factor = 0.57, bcor = 0.57*0.863, HRcor = 1.64
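The arithmetic behind the corrected hazard ratio can be checked in a few lines. The coefficient and the shrinkage factor are the values quoted on the slide; the variable names are only illustrative.

```python
import math

b = 0.863          # uncorrected Cox coefficient at the chosen cut-point
shrinkage = 0.57   # shrinkage factor quoted on the slide

hr = math.exp(b)             # uncorrected hazard ratio
b_cor = shrinkage * b        # shrunken coefficient
hr_cor = math.exp(b_cor)     # corrected hazard ratio

print(f"HR = {hr:.2f}, b_cor = {b_cor:.3f}, HR_cor = {hr_cor:.2f}")
# HR = 2.37, b_cor = 0.492, HR_cor = 1.64
```

The shrinkage is applied on the coefficient (log hazard ratio) scale, so the corrected HR is exp(0.57 × 0.863), not 0.57 × 2.37.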

Prediction models: Validation – Data splitting
Simple data splitting
The sample is randomly divided into two parts: a “training” sample and a “validation” sample.

Repeated data splitting
- The sample is randomly divided into two parts, a “training sample” and a “validation sample”
- The procedure is repeated a large number of times
- The distribution of the statistics of interest is analysed
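A minimal sketch of repeated data splitting follows. To stay self-contained it uses a one-predictor least-squares line and the mean squared error as stand-ins for the model and the performance statistic; the simulated data, the 50/50 split and all function names are assumptions made for the example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def repeated_split(xs, ys, n_repeats=100, train_frac=0.5, seed=0):
    """Return one (training error, validation error) pair per random split."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(train_frac * len(idx))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in valid], [ys[i] for i in valid]
        model = fit_line(tx, ty)            # fitted on the training part only
        results.append((mse(model, tx, ty), mse(model, vx, vy)))
    return results


# Hypothetical data: y depends linearly on x, plus noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

results = repeated_split(xs, ys)
train_err = statistics.fmean(r[0] for r in results)
valid_err = statistics.fmean(r[1] for r in results)
print(f"mean training MSE {train_err:.3f}, mean validation MSE {valid_err:.3f}")
```

The list of pairs can then be summarised with tables or box-plots; a validation error that is typically larger than the training error is the “optimism” the slides refer to.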

R. Arboretti Giancristofaro, L. Salmaso. Model performance analysis and model validation in logistic regression. Statistica, anno LXIII, n. 2, 2003.

i) Data-splitting: the original sample is randomly split into the fitting and validation samples.
ii) Model fitting: the model is fitted on the fitting sample using the SAS logistic procedure.
iii) Event probability estimation: we use the fitted model to estimate the probability of a positive outcome for each of the subjects in both the fitting and the validation samples.
iv) Computation of performance measures on both samples: for both the fitting and the validation samples we compute the following statistics:
- C statistic (measure of discrimination);
- Hosmer and Lemeshow chi-squared test (measure of calibration);
- bias (measure of calibration).
v) Iterations and full model: in order for a model to be validated, the above described procedure is repeated 100 times. After that we also fit the model on the full original sample.
vi) Results: each time the procedure is repeated the sample is split into two random portions, the model is fitted on one of the two portions, and its performance is evaluated on both portions. Since each iteration is based on a different split of the original data, it results in different model coefficients, significance levels, and performance values.
vii) Presentation of the results: results are presented using both tables and box-plots.

viii) Interpretation of the results
- Variability of the estimates of the model's parameters over the 100 repetitions: if the variability is large, the model's coefficients depend heavily on the particular portion of the original data used to fit the model. This is a clear symptom of instability of the model and, what is worse, of overfitting (not enough data to compute a reliable estimate of the model's parameters).
- Quality of fit: we expect the distributions of the estimates to be centred on the same values as the estimates computed on the whole original sample. If this does not happen, the model cannot be validated because of its internal instability.
- Performance outside the fitting sample: assessed by comparing the fitting and validation distributions of the measures of discrimination and calibration. We expect the model to perform better on the fitting sample, so some reduction in the performance measures is to be expected when moving from the fitting to the validation distribution. If the drop in the measures is too large, the model does not validate outside the fitting sample.
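The stability check on the coefficients might be sketched as follows. The paper fits a logistic model in SAS; here a one-predictor least-squares slope on simulated data stands in for it, so the function name, the data and the 50/50 split are all assumptions of the example.

```python
import random
import statistics


def slope(xs, ys):
    """Least-squares slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Hypothetical data set of 60 subjects.
rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 0.8 * x + rng.gauss(0, 2) for x in xs]

full = slope(xs, ys)                 # estimate on the full original sample

idx = list(range(len(xs)))
estimates = []
for _ in range(100):                 # 100 random fitting samples
    rng.shuffle(idx)
    half = idx[: len(idx) // 2]
    estimates.append(slope([xs[i] for i in half], [ys[i] for i in half]))

print(f"full sample:     {full:.3f}")
print(f"over 100 splits: mean {statistics.fmean(estimates):.3f}, "
      f"sd {statistics.stdev(estimates):.3f}")
```

A mean far from the full-sample estimate, or a standard deviation that is large relative to the estimate itself, would point to the instability described above.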

Prediction models: Validation – Cross-validation
K-fold cross-validation
- The sample is divided into K mutually exclusive subsamples
- The model is fitted on K-1 subsamples taken together
- The model is applied to the omitted subsample
- The operation is repeated, omitting each of the K subsamples in turn
- The distribution of the statistics obtained in the validation subsamples is studied (mean), or a “shrinkage factor” is determined to estimate the “optimism” of the model in the original sample

Prediction models: Validation – Cross-validation
Special case: leave-one-out (= jackknife)
The subsamples consist of a single subject: each subject is removed in turn, the model is fitted on the other n-1 subjects and validated on the removed subject.
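K-fold cross-validation and its leave-one-out special case (K = n) can be sketched with the same function. The “model” here simply predicts the mean of the training outcomes, which is a deliberately trivial stand-in; the data and function names are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(ys, k, seed=0):
    """Mean squared error in each omitted fold of a model that predicts
    the mean of the outcomes in the other k-1 folds."""
    errors = []
    for fold in k_fold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = statistics.fmean(train)          # "fit" on k-1 folds
        errors.append(statistics.fmean((ys[i] - prediction) ** 2 for i in fold))
    return errors


# Hypothetical outcomes for 10 subjects.
ys = [3.1, 2.4, 5.0, 4.2, 3.8, 2.9, 4.6, 3.3, 4.0, 3.7]

five_fold = cross_validate(ys, k=5)
leave_one_out = cross_validate(ys, k=len(ys))   # one subject per fold

print(f"apparent MSE      {statistics.pvariance(ys):.3f}")
print(f"5-fold MSE        {statistics.fmean(five_fold):.3f}")
print(f"leave-one-out MSE {statistics.fmean(leave_one_out):.3f}")
```

The apparent error is computed on the same subjects used to estimate the mean, so it is smaller than the leave-one-out error; the gap between the two is one way to quantify the optimism.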

Prediction models: Cross-validation
Document title: Construction et validation d'un modèle de prédiction de la date de floraison du colza d'hiver = Modelisation of the winter rape flowering date
Author(s): HUSSON F.; LETERME P.
Abstract: The aim of this study is to present a model of the flowering date of winter oilseed rape. The model is then parameterised for several rapeseed varieties: Darmor, Bienvenu, Eurol, Bristol, Aligator, Goéland, Vivol and Symbol. The model is then validated using cross-validation techniques. This type of validation is necessary when the model has to be both parameterised and validated with few observations.
Journal: OCL. Oléagineux, corps gras, lipides
Source: 1997, vol. 4, no. 5 (23 ref.)

Prediction models: Validation – Bootstrap
- A random sample of size n, or smaller than n, is drawn with replacement
- The model is fitted in this “bootstrap” sample and its characteristics are recorded
- The operation is repeated a large number of times (1000 or 2000 times)
- The distribution of the characteristics obtained in the bootstrap samples is studied and/or a “shrinkage factor” is derived from it
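One common way to turn the bootstrap samples into a correction is the optimism-corrected bootstrap: fit the model in each bootstrap sample, compare its performance there with its performance in the original sample, and subtract the average difference from the apparent performance. The slide does not say which variant is meant, so this choice is an assumption, as are the least-squares model, the R² statistic, the simulated data and the function names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def r_squared(model, xs, ys):
    """Proportion of the variance of ys explained by the fitted line."""
    a, b = model
    my = statistics.fmean(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def bootstrap_optimism(xs, ys, n_boot=200, seed=0):
    """Average of (performance in bootstrap sample - performance in original)."""
    rng = random.Random(seed)
    n = len(xs)
    diffs = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]   # size n, with replacement
        bx, by = [xs[i] for i in draw], [ys[i] for i in draw]
        model = fit_line(bx, by)                      # refit in each sample
        diffs.append(r_squared(model, bx, by) - r_squared(model, xs, ys))
    return statistics.fmean(diffs)


# Hypothetical data: weak linear signal, 30 subjects.
rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [1.0 + 0.3 * x + rng.gauss(0, 2) for x in xs]

apparent = r_squared(fit_line(xs, ys), xs, ys)
optimism = bootstrap_optimism(xs, ys)
corrected = apparent - optimism
print(f"apparent R2 {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

If the modelling involves variable selection, the selection step belongs inside the loop as well, so that each bootstrap model is built the same way as the original one.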

A prediction rule for selective screening of Chlamydia trachomatis infection. H M Götz, J E A M van Bergen, I K Veldhuijzen, J Broer, C J P A Hoebe and J H Richardus. Sex. Transm. Inf. 2005;81:24-30.

Statistical analysis
Univariate logistic regression analyses were performed, with self reported characteristics as independent variables and diagnosis of C trachomatis as the dependent variable. For the odds ratios, 95% confidence intervals (CI) were calculated. Variables showing an association of p<0.2 were included in the multivariable analysis. Backward stepwise selection was performed with a p value for the likelihood ratio test as the criterion for elimination of variables from the model. Interactions between predictors and sex were assessed to study whether effects of predictors were different for men and women.

The goodness of fit (reliability) of the model was tested by the Hosmer-Lemeshow statistic. The model’s ability to discriminate between participants with or without a chlamydial infection was quantified by using the area under the receiver operating characteristic curve (AUC). AUC values 0.7–0.8 are considered acceptable, 0.8–0.9 excellent, and >0.9 outstanding [17]. Calibration was assessed graphically by plotting observed frequencies of chlamydial infection against predicted probabilities.
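For readers unfamiliar with the Hosmer-Lemeshow statistic, the sketch below shows how it is usually built: subjects are ordered by predicted risk, split into equal-sized groups (ten “deciles of risk” is the conventional choice), and observed and expected event counts are compared in each group. The function name, the toy data and the use of four groups are assumptions for the example; the p value would come from a chi-squared distribution with (groups - 2) degrees of freedom, which is not computed here.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-squared statistic, degrees of freedom).

    Assumes every group contains at least one subject and has a mean
    predicted risk strictly between 0 and 1.
    """
    pairs = sorted(zip(p, y))                  # order subjects by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed number of events
        expected = sum(pi for pi, _ in chunk)  # expected number of events
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, groups - 2


# Hypothetical outcomes and predicted probabilities for 8 subjects.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

stat, df = hosmer_lemeshow(y, p, groups=4)
print(f"Hosmer-Lemeshow chi-squared = {stat:.3f} on {df} degrees of freedom")
```

A large statistic (small p value) signals poor agreement between predicted and observed frequencies; a non-significant result is read as adequate calibration.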

The performance of screening criteria in a study population, from which the model is developed, is known often to be too optimistic. The internal validity of the regression model was therefore assessed to estimate the performance of the model in new participants, similar to the population used to develop the model. We used bootstrapping techniques: random samples, with replacement, were taken one hundred times from the study population. At each step predictive models were developed, including variable selection.

Bootstrapping may help to reduce the bias in the estimated regression coefficients, and give an impression of the discriminative ability in similar participants of screening. The outcome is a correction factor for the AUC, and a shrinkage factor to correct for statistical over-optimism in the regression coefficients and to improve calibration of the model in future participants.

Prediction models: External validation
External validity was assessed by leaving out the four MHS in the sample one by one, and fitting regression models, including variable selection, on the remaining data. The discriminative ability of this model was assessed externally on the MHS data not included in the fitting procedure. This procedure replicates the situation in which the prediction model is applied in another MHS region with a population that may to some extent be different.
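The leave-one-region-out scheme described here can be sketched as a loop over centres. The example replaces the paper's logistic regression with a tiny “risk table” model (observed event proportion per predictor level); the four centres, their data and all function names are hypothetical.

```python
from collections import defaultdict


def fit_risk_table(rows):
    """Observed event proportion per predictor level (a deliberately tiny model)."""
    events, totals = defaultdict(int), defaultdict(int)
    for level, outcome in rows:
        totals[level] += 1
        events[level] += outcome
    return {level: events[level] / totals[level] for level in totals}


def roc_area(y, p):
    """Area under the ROC curve; tied pairs count as one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def leave_one_centre_out(data):
    """data maps a centre name to a list of (predictor level, outcome) rows."""
    aucs = {}
    for held_out in data:
        train = [row for centre, rows in data.items()
                 if centre != held_out for row in rows]
        model = fit_risk_table(train)                  # fitted without the centre
        y = [outcome for _, outcome in data[held_out]]
        p = [model[level] for level, _ in data[held_out]]
        aucs[held_out] = roc_area(y, p)                # assessed on that centre
    return aucs


# Hypothetical data from four centres.
data = {
    "A": [("low", 0), ("low", 0), ("high", 1), ("high", 0), ("low", 1), ("high", 1)],
    "B": [("low", 0), ("high", 1), ("high", 1), ("low", 0), ("high", 0), ("low", 0)],
    "C": [("low", 0), ("low", 1), ("high", 1), ("high", 1), ("low", 0), ("high", 0)],
    "D": [("high", 1), ("low", 0), ("low", 0), ("high", 1), ("low", 1), ("high", 0)],
}

for centre, auc in leave_one_centre_out(data).items():
    print(f"centre {centre}: AUC = {auc:.3f}")
```

Reporting the range of the per-centre values gives an impression of how the model might behave in a region that did not contribute to its development.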

Performance of predictive model and development of prediction score Multivariable logistic regression analysis showed that chlamydial infection was associated with high urbanisation, young age, ethnicity (Surinamese/Antillian), low/intermediate education, multiple lifetime partners, a new contact in the previous two months, no condom use at last sexual contact, and complaints of (post)coital bleeding in women and frequent urination in men (table 1). The only statistically significant interaction term in the model was sex and the number of lifetime partners.

Prediction models: Validation – Bootstrap; external
The Hosmer-Lemeshow goodness of fit test had a p value of 0.12, indicating adequate goodness of fit. The model discriminated well between participants who were and were not infected by C trachomatis, with an AUC of 0.81 (95% CI 0.77 to 0.84). Internal validation showed optimism in the AUC of 0.03, resulting in a correction of the AUC from 0.81 to 0.78. In the external validation similar sets of predictors were selected. When tested in each separate MHS, the AUC varied from 0.74 to 0.80.

Prediction models: External validation, an example
Prognostic Indices for Mortality of Hospitalized Children in Central Africa. Michele Dramaix, Daniel Brasseur, Philippe Donnen, Paluku Bawhere, Denis Porignon, Rene Tonglet and Philippe Hennart. American Journal of Epidemiology 1996; 143 (12)

Prediction models: Validation – Conclusion
- Validation is essential before a new model is used
- The bootstrap is the method most often recommended for internal validation
- Use the model corrected for its “optimism” by a shrinkage factor
- External validation is the best!
