Lerot: an Online Learning to Rank Framework¶
This project is designed to run experiments on online learning to rank methods for information retrieval. Below is a short summary of its prerequisites, how to run an experiment, and possible extensions.
Prerequisites¶
- Python (2.7 or higher)
- PyYaml
- Numpy
- Scipy
- Celery
- Gurobi
(all prerequisites are included in the academic distribution of Enthought Python, e.g., version 7.1)
Installation¶
Install the prerequisites plus Lerot as follows:
$ pip install PyYAML numpy scipy celery
$ git clone https://bitbucket.org/ilps/lerot.git
$ cd lerot
$ python setup.py install
Running experiments¶
prepare data in svmlight format, e.g., download the MQ2007 (see next section on Data)
$ mkdir data $ wget http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2007.rar -O data/MQ2007.rar $ unrar x data/MQ2007.rar data/
prepare a configuration file in yml format, e.g., starting from the template below, store as
config/experiment.yml
(or simply useconfig/config.yml
instead )training_queries: data/MQ2007/Fold1/train.txt test_queries: data/MQ2007/Fold1/test.txt feature_count: 46 num_runs: 1 num_queries: 10 query_sampling_method: random output_dir: outdir output_prefix: Fold1 user_model: environment.CascadeUserModel user_model_args: --p_click 0:0.0,1:0.5,2:1.0 --p_stop 0:0.0,1:0.0,2:0.0 system: retrieval_system.ListwiseLearningSystem system_args: --init_weights random --sample_weights sample_unit_sphere --comparison comparison.ProbabilisticInterleave --delta 0.1 --alpha 0.01 --ranker ranker.ProbabilisticRankingFunction --ranker_arg 3 --ranker_tie random evaluation: - evaluation.NdcgEval
run the experiment using python:
$ python src/scripts/learning-experiment.py -f config/experiment.yml
summarize experiment outcomes:
$ python src/scripts/summarize-learning-experiment.py --fold_dirs outdir
Arbitrarily many folds can be listed per experiments. Results are aggregated over runs and folds. The output format is a simple text file that can be further processed using e.g., gnuplot. The columns are: mean_offline_perf stddev_offline_perf mean_online_perf stddev_online_perf
Data¶
You can download learning to rank data sets here:
- GOV: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR3.0/Gov.rar (you’ll need files in QueryLevelNorm)
- OHSUMED: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR3.0/OHSUMED.zip
- MQ2007: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2007.rar (files for supervised learning)
- MQ2008: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2008.rar (files for supervised learning)
- Yahoo!: http://webscope.sandbox.yahoo.com/catalog.php?datatype=c
- MSLR-WEB10K: http://research.microsoft.com/en-us/um/beijing/projects/mslr/data/MSLR-WEB10K.zip
- MSLR-WEB30K: http://research.microsoft.com/en-us/um/beijing/projects/mslr/data/MSLR-WEB30K.zip
Note that Lerot reads from both plain text and .gz files.
Extensions¶
The code is intended to be extended with new learning and/or feedback mechanisms for future experiments. The most obvious points for extension are:
- comparison - extend ComparisonMethod to add new interleaving or inference methods; existing methods include balanced interleave, team draft, and probabilistic interleave.
- retrieval_system - extend OnlineLearningSystem to add a new mechanism for learning from click feedback. New implementations need to be able to provide a ranked list for a given query, and ranking solutions should have the form of a vector.
License¶
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.