Algorithms for the Web: “A Unified Approach to Personalization Based on Probabilistic Latent Semantic Models of Web Usage and Content”
Hanieh Fakhfouri
Outline
Introduction
Probabilistic Latent Semantic Models of Web User Navigations
A Recommendation Framework Based on the Joint PLSA Model
Description of Data Sets
Conclusion
Introduction
What is web usage mining?
Different categories of user behavior.
Different kinds of data mining techniques: LSA (Latent Semantic Analysis), SVD (Singular Value Decomposition), PLSA (Probabilistic Latent Semantic Analysis).
Probabilistic Latent Semantic Models of Web User Navigations
Usage data preprocessing: the set of pageviews is P = {p1, p2, ..., pn} and the set of user sessions is U = {u1, u2, ..., um}. The Web session data form the m × n session-pageview matrix UP.
Content preprocessing: applying text mining and information retrieval techniques lets us represent each pageview as an attribute vector. This yields a set of attributes A = {a1, a2, ..., as} and the s × n content observation matrix AP.
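As a minimal sketch of the usage preprocessing step, the Python below turns a few sessions into the m × n matrix UP. The session data and page names are hypothetical, and so is the choice of weight (how many times a page was viewed in the session); the weighting used in the original work may differ, and binary or time-based weights are common alternatives.

```python
from typing import Dict, List, Tuple


def build_up_matrix(
    sessions: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build the session-pageview matrix UP (m sessions x n pageviews).

    UP[i][j] counts how often pageview pj occurs in session ui
    (an assumed weighting, chosen for illustration).
    """
    users = sorted(sessions)  # U = {u1, ..., um}
    pages = sorted({p for views in sessions.values() for p in views})  # P = {p1, ..., pn}
    col = {p: j for j, p in enumerate(pages)}
    up = [[0.0] * len(pages) for _ in users]
    for i, u in enumerate(users):
        for p in sessions[u]:
            up[i][col[p]] += 1.0
    return users, pages, up


# Hypothetical toy sessions (not taken from the CTI or Realty data sets).
sessions = {
    "u1": ["home", "courses", "courses", "faculty"],
    "u2": ["home", "admissions"],
    "u3": ["courses", "faculty"],
}
users, pages, UP = build_up_matrix(sessions)
print(pages)
for u, row in zip(users, UP):
    print(u, row)
```

Each row of UP is one session seen as a vector over the pageview space; sorting users and pages just makes the row and column order deterministic.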
Probabilistic Latent Semantic Models of Web User Navigations: Content Preprocessing Techniques
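The content side can be sketched the same way. Below, each pageview's text is split naively on whitespace and every attribute is weighted with TF-IDF; both the tokenization and the TF-IDF weighting are assumptions standing in for the unspecified text mining and information retrieval techniques, and the page texts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def build_ap_matrix(
    page_text: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build the attribute-pageview matrix AP (s attributes x n pageviews).

    Column j is the attribute vector of pageview pj, weighted by TF-IDF
    (an assumed scheme): tf(a, p) * log(n / df(a)).
    """
    pages = sorted(page_text)
    tokens = {p: page_text[p].lower().split() for p in pages}
    attrs = sorted({t for toks in tokens.values() for t in toks})  # A = {a1, ..., as}
    n = len(pages)
    # Document frequency: in how many pageviews each attribute appears.
    df = {a: sum(1 for p in pages if a in tokens[p]) for a in attrs}
    ap = [[0.0] * n for _ in attrs]
    for t, a in enumerate(attrs):
        idf = math.log(n / df[a])
        for j, p in enumerate(pages):
            ap[t][j] = tokens[p].count(a) * idf
    return attrs, pages, ap


# Hypothetical page texts.
page_text = {
    "courses": "course schedule course list",
    "faculty": "faculty list",
    "home": "department home",
}
attrs, pages, AP = build_ap_matrix(page_text)
for a, row in zip(attrs, AP):
    print(a, [round(x, 3) for x in row])
```

With this weighting, an attribute that appears on every pageview gets an IDF of zero and therefore carries no content signal.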
Probabilistic Latent Semantic Models of Web User Navigations
A hidden (latent) variable zk ∈ Z = {z1, z2, ..., zl} is associated with each usage observation (ui, pj) and with each content observation (at, pj).
Our goal: find the latent factors Z = {z1, z2, ..., zl}.
The probabilistic latent factor model can be described as a generative process:
1. select a user session ui from U with probability Pr(ui);
2. select a latent factor zk associated with ui with probability Pr(zk|ui);
3. given the factor zk, generate a pageview pj from P with probability Pr(pj|zk).
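To make the three steps concrete, here is a small Python sketch that samples observations from this generative process and computes the resulting joint probability Pr(ui, pj) = Pr(ui) Σk Pr(zk|ui) Pr(pj|zk), obtained by summing out the latent factor. All probability tables are made-up toy values with l = 2 factors.

```python
import random
from typing import Tuple

# Hypothetical toy parameters with l = 2 latent factors.
P_U = {"u1": 0.5, "u2": 0.5}                        # Pr(ui)
P_Z_GIVEN_U = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}  # Pr(zk | ui)
P_P_GIVEN_Z = [                                     # Pr(pj | zk)
    {"courses": 0.6, "faculty": 0.3, "home": 0.1},
    {"courses": 0.1, "faculty": 0.1, "home": 0.8},
]


def sample_observation(rng: random.Random) -> Tuple[str, int, str]:
    """Draw one (ui, zk, pj) triple by following the three generative steps."""
    users = list(P_U)
    u = rng.choices(users, weights=[P_U[x] for x in users])[0]            # step 1
    k = rng.choices(range(len(P_P_GIVEN_Z)), weights=P_Z_GIVEN_U[u])[0]   # step 2
    pages = list(P_P_GIVEN_Z[k])
    p = rng.choices(pages, weights=[P_P_GIVEN_Z[k][x] for x in pages])[0]  # step 3
    return u, k, p


def joint_prob(u: str, p: str) -> float:
    """Pr(ui, pj) = Pr(ui) * sum_k Pr(zk | ui) * Pr(pj | zk)."""
    return P_U[u] * sum(
        P_Z_GIVEN_U[u][k] * P_P_GIVEN_Z[k][p] for k in range(len(P_P_GIVEN_Z))
    )


rng = random.Random(0)
print([sample_observation(rng) for _ in range(3)])
print(round(joint_prob("u1", "courses"), 4))
```

Only the pair (ui, pj) is observed in the log; the factor zk drawn in step 2 is what the model has to infer.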
The Probabilistic Latent Factor Model: Likelihood
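The likelihood formula itself did not survive in the slide text. As a sketch consistent with the generative process and the standard PLSA formulation, the joint probabilities and the log-likelihood can be written as follows. Here w(ui, pj) and w(at, pj) denote the entries of UP and AP (our notation), and adding the usage and content terms with equal weight is an assumption. The second equality uses Bayes' rule, Pr(ui) Pr(zk|ui) = Pr(zk) Pr(ui|zk), which gives the parameterization whose estimates Pr(zk), Pr(ui|zk), Pr(at|zk), and Pr(pj|zk) the EM algorithm returns.

```latex
\begin{aligned}
\Pr(u_i, p_j) &= \Pr(u_i)\sum_{k=1}^{l} \Pr(z_k \mid u_i)\,\Pr(p_j \mid z_k)
               = \sum_{k=1}^{l} \Pr(z_k)\,\Pr(u_i \mid z_k)\,\Pr(p_j \mid z_k) \\
\Pr(a_t, p_j) &= \sum_{k=1}^{l} \Pr(z_k)\,\Pr(a_t \mid z_k)\,\Pr(p_j \mid z_k) \\
\mathcal{L} &= \sum_{i=1}^{m}\sum_{j=1}^{n} w(u_i, p_j)\,\log \Pr(u_i, p_j)
             + \sum_{t=1}^{s}\sum_{j=1}^{n} w(a_t, p_j)\,\log \Pr(a_t, p_j)
\end{aligned}
```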
Expectation-Maximization (EM) algorithm
Two steps, repeated until convergence: the Expectation (E) step and the Maximization (M) step.
Result: Pr(zk), Pr(ui|zk), Pr(at|zk), and Pr(pj|zk) for each zk ∈ Z, ui ∈ U, at ∈ A, and pj ∈ P.
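The E and M steps can be sketched in Python for the usage part of the model only (the matrix UP). The joint model would run the same kind of updates over AP as well, with Pr(zk) and Pr(pj|zk) shared between the two data sources; that part is left out here. The updates are the standard PLSA ones. The toy matrix, the random initialization, and the fixed number of iterations are assumptions.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def normalize(v: List[float]) -> List[float]:
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def plsa_em(up: Matrix, l: int, iters: int = 50,
            seed: int = 0) -> Tuple[List[float], Matrix, Matrix]:
    """Fit a usage-only PLSA model to UP with EM.

    Returns (pz, pu_z, pp_z): pz[k] = Pr(zk), pu_z[k][i] = Pr(ui|zk),
    pp_z[k][j] = Pr(pj|zk).
    """
    m, n = len(up), len(up[0])
    rng = random.Random(seed)
    # Random start; EM only reaches a local maximum, so the seed matters.
    pz = [1.0 / l] * l
    pu_z = [normalize([rng.random() for _ in range(m)]) for _ in range(l)]
    pp_z = [normalize([rng.random() for _ in range(n)]) for _ in range(l)]
    for _ in range(iters):
        nz = [0.0] * l                       # expected counts per factor
        nu = [[0.0] * m for _ in range(l)]   # per (factor, session)
        npg = [[0.0] * n for _ in range(l)]  # per (factor, page)
        for i in range(m):
            for j in range(n):
                w = up[i][j]
                if w == 0:
                    continue  # empty cells carry no weight in the likelihood
                # E-step: posterior Pr(zk | ui, pj) by Bayes' rule.
                post = normalize([pz[k] * pu_z[k][i] * pp_z[k][j] for k in range(l)])
                for k in range(l):
                    c = w * post[k]
                    nz[k] += c
                    nu[k][i] += c
                    npg[k][j] += c
        # M-step: each distribution is its normalized expected counts.
        pz = normalize(nz)
        pu_z = [normalize(row) for row in nu]
        pp_z = [normalize(row) for row in npg]
    return pz, pu_z, pp_z


def log_likelihood(up: Matrix, pz: List[float], pu_z: Matrix, pp_z: Matrix) -> float:
    """Weighted log-likelihood: sum over i, j of w(ui, pj) * log Pr(ui, pj)."""
    total = 0.0
    for i, row in enumerate(up):
        for j, w in enumerate(row):
            if w > 0:
                pr = sum(pz[k] * pu_z[k][i] * pp_z[k][j] for k in range(len(pz)))
                total += w * math.log(pr)
    return total


# Hypothetical toy UP: sessions 0-1 favor pages 0-1, sessions 2-3 favor pages 2-3.
UP = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 3.0],
    [0.0, 0.0, 3.0, 2.0],
]
pz, pu_z, pp_z = plsa_em(UP, l=2)
print("Pr(z):", [round(x, 3) for x in pz])
for k, row in enumerate(pp_z):
    print(f"Pr(p|z{k + 1}):", [round(x, 3) for x in row])
print("log-likelihood:", round(log_likelihood(UP, pz, pu_z, pp_z), 3))
```

Skipping zero cells matters in practice: session-pageview matrices are very sparse, so iterating only over non-zero entries keeps each EM pass cheap.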
A Recommendation Framework Based on the Joint PLSA Model
Characterizing Web user segments: what is a “user segment”? For each latent factor zk, the “prototypical” user sessions are those with the highest Pr(ui|zk).
Using the joint probability model for personalization.
Characterizing Web User Segments: Pr(ui|zk)
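The details of this step did not survive in the slide text. One plausible reading, sketched in Python: treat the sessions with high Pr(ui|zk) as the segment's prototypical sessions and aggregate their pageview weights into a weighted page profile. The threshold mu, the Pr(ui|zk)-weighted sum, and the scaling to a maximum weight of 1 are all assumptions, and the input values are made up.

```python
from typing import Dict, List


def segment_profile(pu_z_k: List[float], up: List[List[float]],
                    pages: List[str], mu: float) -> Dict[str, float]:
    """Build a weighted page profile for one latent factor zk.

    pu_z_k[i] is Pr(ui | zk). Sessions with Pr(ui | zk) >= mu (a hypothetical
    threshold) are treated as the factor's prototypical sessions. Each page gets
    the sum of Pr(ui | zk) * w(ui, pj) over those sessions, and the profile is
    scaled so that its top page has weight 1.
    """
    members = [i for i, pr in enumerate(pu_z_k) if pr >= mu]
    weights = [sum(pu_z_k[i] * up[i][j] for i in members) for j in range(len(pages))]
    top = max(weights, default=0.0)
    if top <= 0:
        return {}  # no session passes the threshold
    return {pages[j]: w / top for j, w in enumerate(weights) if w > 0}


# Hypothetical inputs: UP for three sessions and Pr(ui | z1) from a fitted model.
pages = ["admissions", "courses", "faculty", "home"]
UP = [
    [0.0, 2.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
pu_z1 = [0.5, 0.1, 0.4]
profile = segment_profile(pu_z1, UP, pages, mu=0.3)
for page, w in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {w:.2f}")
```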
Using the Joint Probability Model for Personalization
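The formulas for this step did not survive in the slide text either. The Python sketch below uses a profile-matching scheme: the active session is compared with each segment profile by cosine similarity, and unseen pages are scored by profile weight times that similarity. This scheme, the max-over-profiles aggregation, and the toy profiles are assumptions, not necessarily the method used in the paper.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(session: Set[str], profile: Dict[str, float]) -> float:
    """Cosine similarity between a binary session vector and a weighted profile."""
    dot = sum(profile.get(p, 0.0) for p in session)
    norm_s = math.sqrt(len(session))
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    return dot / (norm_s * norm_p) if norm_s and norm_p else 0.0


def recommend(session: Set[str], profiles: List[Dict[str, float]],
              top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank pages not yet visited by (profile weight x session-profile similarity)."""
    scores: Dict[str, float] = {}
    for prof in profiles:
        sim = cosine(session, prof)
        for page, w in prof.items():
            if page not in session:
                # Keep the best score a page gets from any matching profile.
                scores[page] = max(scores.get(page, 0.0), sim * w)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p, s) for p, s in ranked if s > 0][:top_n]


# Hypothetical segment profiles (page -> weight, top page scaled to 1).
profiles = [
    {"courses": 1.0, "faculty": 0.6, "syllabus": 0.4},
    {"home": 1.0, "admissions": 0.8, "tuition": 0.5},
]
print(recommend({"courses"}, profiles))
```

Pages the user has already visited are never recommended, and a profile with no overlap with the active session contributes a score of zero.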
Experiments: Description of the Data Sets
CTI data: based on the server log data of the host computer science department; 21,299 user sessions (U) and 692 Web pageviews (P), with 9.8 pageviews per session on average.
Realty data: based on the server logs of a local affiliate of a national real estate company; 24,000 user sessions from 3,800 unique users.
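A quick calculation on the CTI figures shows how sparse the session-pageview matrix is, which is why a practical implementation would iterate only over non-zero entries. The upper bound assumes that no pageview repeats within a session; repeated views would make UP sparser still.

```python
# Figures quoted on the slide for the CTI data set.
sessions = 21_299
pageviews = 692
avg_views_per_session = 9.8

# Upper bound on non-zero entries of UP, reached only if no pageview
# repeats within a session (an assumption).
max_nonzero = sessions * avg_views_per_session
density = avg_views_per_session / pageviews

print(f"UP has {sessions * pageviews:,} cells")
print(f"at most about {max_nonzero:,.0f} are non-zero ({density:.2%})")
```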
Experiments
The first example generates the latent (hidden) factors using the PLSA model.
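A common way to inspect the factors produced by PLSA is to list, for each zk, the pageviews with the highest Pr(pj|zk) and, in the joint model, the attributes with the highest Pr(at|zk). The Python sketch below does this with made-up probability tables; it illustrates the idea and does not reproduce the factors reported in the experiments.

```python
from typing import Dict, List, Tuple


def top_items(dist: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n items with the highest probability, ties broken by name."""
    return sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical estimates for two factors (not results from the paper).
p_page_given_z = [  # Pr(pj | zk)
    {"courses": 0.40, "syllabus": 0.30, "faculty": 0.20, "home": 0.10},
    {"admissions": 0.50, "tuition": 0.30, "home": 0.15, "courses": 0.05},
]
p_attr_given_z = [  # Pr(at | zk)
    {"course": 0.5, "lecture": 0.3, "apply": 0.2},
    {"apply": 0.6, "fee": 0.3, "course": 0.1},
]

for k, (pages, attrs) in enumerate(zip(p_page_given_z, p_attr_given_z), start=1):
    print(f"z{k}: pages {top_items(pages, 2)} | keywords {top_items(attrs, 2)}")
```

Reading the top pages and top keywords side by side is what lets a human attach a label (such as "course information" or "admissions") to an otherwise anonymous factor.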
Experiments
Use of WAVP (Weighted Average Visit Percentage).
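WAVP comes from earlier work by Mobasher and colleagues on evaluating usage profiles. The Python sketch below assumes that definition, restated in the docstring, and treats test sessions as binary page sets. It is illustrative only, not the evaluation code used in these experiments. A higher value means that test users who enter a segment tend to visit more of its heavily weighted pages.

```python
from typing import Dict, List, Set


def wavp(profile: Dict[str, float], test_sessions: List[Set[str]]) -> float:
    """Weighted Average Visit Percentage of one segment profile.

    Assumed definition: over the test sessions t that contain at least one page
    of the profile, average (t . pr) / |t|, then divide by the total profile
    weight. Sessions are treated as binary page sets.
    """
    relevant = [t for t in test_sessions if any(p in profile for p in t)]
    if not relevant:
        return 0.0
    avg = sum(
        sum(profile.get(p, 0.0) for p in t) / len(t) for t in relevant
    ) / len(relevant)
    return avg / sum(profile.values())


# Hypothetical profile and held-out test sessions.
profile = {"courses": 1.0, "faculty": 0.5}
test_sessions = [{"courses", "faculty"}, {"courses", "home"}, {"admissions"}]
print(round(wavp(profile, test_sessions), 4))
```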
Conclusion
The approach relies on fairly complex formulas.
The results are interesting, and the model is flexible.
The experimental results clearly show that the PLSA model yields a nearly accurate representation of user behavior.