Projects


Automatic Unbiased Learning to Rank from Unbiased Propensity Estimation

posted Apr 17, 2018, 2:10 PM by Qingyao Ai   [ updated Apr 26, 2018, 4:45 PM ]

Learning to rank with biased click data is a well-known challenge. A variety of methods have been explored to debias click data for learning to rank, such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting.
Despite their differences, most existing studies separate the estimation of click bias (namely the propensity model) from the learning of ranking algorithms. To estimate click propensities, they conduct either online result randomization, which can negatively affect the user experience, or offline parameter estimation, which has special requirements for click data and is optimized for objectives (e.g. click likelihood) that are not directly related to the ranking performance of the system. In this work, we propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker and an unbiased propensity model. DLA is an automatic unbiased learning-to-rank framework because it learns unbiased ranking models directly from biased click data without any preprocessing. It can adapt to changes in the bias distribution and is applicable to online learning. [SIGIR'18] [code]
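
To make the idea of joint learning with inverse weighting more concrete, here is a minimal sketch. It is not the released code: the function names, the softmax cross-entropy form of the loss, and the normalization of weights against the first position are all assumptions made for illustration. The point it shows is the symmetry: the ranker's loss on each click is weighted by the inverse of the propensity model's estimate, and the propensity model's loss is weighted by the inverse of the ranker's estimate.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_weighted_loss(learner_scores, other_scores, clicks):
    """Softmax cross-entropy on clicked results for one result list.

    Each click is weighted by the inverse of the *other* model's
    normalized estimate at that position (hypothetical formulation).
    """
    learner_probs = softmax(learner_scores)
    other_probs = softmax(other_scores)
    # Weights are relative to the first position, so position 1 has weight 1.
    weights = [other_probs[0] / p for p in other_probs]
    return -sum(
        w * math.log(q)
        for c, w, q in zip(clicks, weights, learner_probs)
        if c
    )


# Toy list of three results; all numbers are made up.
ranker_scores = [2.0, 1.0, 0.5]      # ranker output per result
propensity_scores = [1.5, 0.5, 0.0]  # propensity model output per position
clicks = [1, 0, 1]

# The two losses mirror each other; in joint training both models
# would be updated on every batch of click logs.
ranker_loss = inverse_weighted_loss(ranker_scores, propensity_scores, clicks)
propensity_loss = inverse_weighted_loss(propensity_scores, ranker_scores, clicks)
```

In this sketch a click on a low-propensity position counts for more in the ranker's loss, which is the usual intuition behind inverse propensity weighting.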

Learning to Rank with Deep Local Context Models

posted Apr 17, 2018, 2:02 PM by Qingyao Ai   [ updated Apr 26, 2018, 4:45 PM ]


Learning to rank has been intensively studied and widely applied in information retrieval. However, the majority of existing learning-to-rank algorithms model relativity only at the loss level, by constructing pairwise or listwise loss functions. Their scoring functions are confined to pointwise relevance, i.e., the relevance score of a document is computed from the document itself, regardless of the other documents in the list. In this project, we argue that the relative relevance of a document in a ranked list should depend on the other top-ranked documents, which we refer to as the local ranking context. Thus, we propose to use the inherent feature distributions of the top results to capture the listwise context for learning-to-rank systems. [SIGIR'18] [code]
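
The toy sketch below illustrates what "depending on the local ranking context" can mean in practice. It is deliberately not the deep model from the paper: as a simple stand-in, each feature is standardized against the mean and standard deviation of the same feature over the top results, and the standardized features are combined linearly. The function name, the weights, and the feature values are hypothetical.

```python
from statistics import mean, pstdev


def context_aware_scores(feature_lists, weights):
    """Score documents using features standardized against their own list.

    feature_lists: one feature vector per top-ranked document.
    weights: one linear weight per feature (hypothetical values).
    """
    columns = list(zip(*feature_lists))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature is constant across the list.
    stds = [pstdev(col) or 1.0 for col in columns]
    scores = []
    for features in feature_lists:
        z = [(x - m) / s for x, m, s in zip(features, means, stds)]
        scores.append(sum(w * v for w, v in zip(weights, z)))
    return scores


# The same document placed in two different result lists.
doc = [0.6, 0.3]
weak_list = [doc, [0.2, 0.1], [0.1, 0.2]]
strong_list = [doc, [0.9, 0.8], [0.8, 0.9]]
weights = [1.0, 1.0]

score_in_weak_list = context_aware_scores(weak_list, weights)[0]
score_in_strong_list = context_aware_scores(strong_list, weights)[0]
```

A pointwise scorer would give `doc` the same score in both lists; here its score is higher among weak competitors than among strong ones.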

Joint Representation Learning Model for Top-N Recommendation

posted Mar 4, 2018, 5:50 PM by Qingyao Ai

The Web has accumulated a rich source of information, such as text, images, and ratings, which represent different aspects of user preferences. In this work, we propose a Joint Representation Learning (JRL) framework in which each type of information source (review text, product image, numerical rating, etc.) is adopted to learn the corresponding user and item representations based on available (deep) representation learning architectures. [CIKM'17] [code]
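
As a rough sketch of how per-source representations could feed a top-N recommender, the example below joins one vector per information source into a single user or item vector and ranks items by dot product. Concatenation, dot-product scoring, the function names, and every number are assumptions for illustration, not a description of the released code.

```python
def joint_representation(views):
    """Concatenate per-source vectors (e.g. text, image, rating) into one."""
    joint = []
    for vector in views:
        joint.extend(vector)
    return joint


def top_n(user_views, item_views_by_id, n):
    """Return the ids of the n items with the highest dot-product score."""
    user = joint_representation(user_views)
    scored = []
    for item_id, views in item_views_by_id.items():
        item = joint_representation(views)
        score = sum(u * v for u, v in zip(user, item))
        scored.append((score, item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:n]]


# Hypothetical learned vectors: [text view, image view, rating view].
user_views = [[1.0, 0.0], [0.5], [0.2]]
item_views_by_id = {
    "a": [[0.9, 0.1], [0.4], [0.1]],
    "b": [[0.1, 0.9], [0.9], [0.5]],
    "c": [[0.8, 0.0], [0.1], [0.9]],
}

recommended = top_n(user_views, item_views_by_id, n=2)
```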

Hierarchical Embedding Model for Personalized Product Search

posted Mar 4, 2018, 5:43 PM by Qingyao Ai   [ updated Apr 17, 2018, 2:11 PM ]


The unique characteristics of product search make search personalization essential for both customers and e-shopping companies. In this work, we propose a hierarchical embedding model to learn semantic representations for entities (i.e. words, products, users and queries) from different levels with their associated language data. [SIGIR'17] [code] [data]

Characterizing Email Search using Large-scale Behavioral Logs and Surveys

posted Mar 4, 2018, 5:38 PM by Qingyao Ai   [ updated Mar 4, 2018, 5:39 PM ]

As the number of email users and messages continues to grow, search is becoming more important for finding information in personal archives. In spite of its importance, email search is much less studied than web search, particularly using large-scale behavioral log analysis. 
In this project, we conduct a large-scale log analysis of email search and complement it with a survey to better understand email search intent and success.  [CHIIR'17] [WWW'17]

*This was an internship project at Microsoft Research.

Enhanced Paragraph Vector Model for Ad-hoc Retrieval

posted Feb 2, 2018, 8:41 AM by Qingyao Ai   [ updated Mar 4, 2018, 2:28 PM ]

Incorporating topic level estimation into language models has been shown to be beneficial for information retrieval (IR) models such as cluster-based retrieval and LDA-based document representation. Neural embedding models, such as paragraph vector (PV) models, on the other hand have shown their effectiveness and efficiency in learning semantic representations of documents and words in multiple Natural Language Processing (NLP) tasks. In this work, we study how to effectively use the PV model to improve ad-hoc retrieval. [SIGIR'16] [ICTIR'16] [code]
