Lerot’s documentation

Packages

lerot.analysis

class lerot.analysis.HeatmapAnalysis(*parms)[source]

Bases: lerot.analysis.AbstractAnalysis.AbstractAnalysis

finish()[source]
class lerot.analysis.SummarizeAnalysis(*parms)[source]

Bases: lerot.analysis.AbstractAnalysis.AbstractAnalysis

finish()[source]

lerot.comparison

class lerot.comparison.BalancedInterleave(arg_str=None)[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

Interleave and compare rankers using the original balanced interleave method.

infer_outcome(l, a, c, query)[source]
interleave(r1, r2, query, length)[source]
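
The comparison classes in this package share one usage pattern: interleave the lists of two rankers for a query, show the combined list, collect clicks, and infer the winner. A minimal sketch, assuming interleave returns the interleaved list together with an assignment object and that the sign of the outcome indicates which ranker won (the exact return values and sign convention are assumptions):

    def compare_once(method, r1, r2, query, length, simulate_clicks):
        # Build an interleaved result list; 'a' carries the assignment
        # bookkeeping that infer_outcome needs later.
        l, a = method.interleave(r1, r2, query, length)
        # Show the list to a (simulated) user and record clicks, e.g. via
        # one of the user models in lerot.environment.
        c = simulate_clicks(l)
        # Infer the comparison outcome from the observed clicks.
        return method.infer_outcome(l, a, c, query)
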
class lerot.comparison.DocumentConstraints(arg_str='random')[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

Interleave using balanced interleave, compare using document constraints.

check_constraints(l, a, click_ids)[source]
infer_outcome(l, a, c, query)[source]
interleave(r1, r2, query, length)[source]
class lerot.comparison.HistBalancedInterleave(arg_str=None)[source]

Bases: lerot.comparison.AbstractHistInterleavedComparison.AbstractHistInterleavedComparison

Balanced interleave method, applied to historical data.

infer_outcome(l, a, c, target_r1, target_r2, query)[source]

count clicks within the top-k interleaved list

class lerot.comparison.HistDocumentConstraints(arg_str=None)[source]

Bases: lerot.comparison.AbstractHistInterleavedComparison.AbstractHistInterleavedComparison

Document constraints method, applied to historical data.

infer_outcome(l, a, c, target_r1, target_r2, query)[source]

count clicks within the top-k interleaved list

class lerot.comparison.HistProbabilisticInterleave(arg_str=None)[source]

Bases: lerot.comparison.AbstractHistInterleavedComparison.AbstractHistInterleavedComparison

Probabilistic interleaving using historical data

infer_outcome(l, source_context, c, target_r1, target_r2, query)[source]
class lerot.comparison.HistTeamDraft(arg_str=None)[source]

Bases: lerot.comparison.AbstractHistInterleavedComparison.AbstractHistInterleavedComparison

Team draft method, applied to historical data.

infer_outcome(l, a, c, target_r1, target_r2, query)[source]

assign clicks for contributed documents

class lerot.comparison.OptimizedInterleave(arg_str='')[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

An implementation of Optimized Interleave as described in:

@see: Radlinski, F., & Craswell, N. (2013, February). Optimized interleaving for online retrieval evaluation. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 245-254).

@author: Anne Schuth
@contact: anne.schuth@uva.nl
@since: February 2013
@requires: Gurobi from http://www.gurobi.com/

binary_credit(li, rankA, rankB)[source]
f(i)[source]
infer_outcome(l, credit, clicks, query)[source]
interleave(r1, r2, query, length, bias=0)[source]
interleave_n(r1, r2, query, length, num_repeat, bias=0)[source]
inverse_credit(li, rankA, rankB)[source]
linear_credit(li, rankA, rankB)[source]
perm_given_index(alist, apermindex)[source]

See http://stackoverflow.com/questions/5602488/random-picks-from-permutation-generator

precompute_rank(R)[source]
prefix_constraint(rankings, length)[source]
prefix_constraint_bound(rankings, length, prefix_bound)[source]
rank(li, R)[source]
reject(l, rankings)[source]
sample(docs, length)[source]
sample_prefix_constraint(rankings, length)[source]
sample_prefix_constraint_constructive(rankings, length)[source]
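
The binary_credit, inverse_credit, and linear_credit methods above correspond to rank-based credit functions in the spirit of Radlinski & Craswell (2013). A sketch of plausible definitions, where rank_a and rank_b are a document's 1-based ranks in the two input rankings; the sign conventions and the handling of documents missing from a ranking are assumptions:

    def linear_credit(rank_a, rank_b):
        # Credit grows linearly with how much higher the document ranks in A than in B.
        return rank_b - rank_a

    def inverse_credit(rank_a, rank_b):
        # Rank differences near the top of the list count more than differences far down.
        return 1.0 / rank_a - 1.0 / rank_b

    def binary_credit(rank_a, rank_b):
        # Only the direction of the preference matters, not its magnitude.
        if rank_a < rank_b:
            return 1
        if rank_a > rank_b:
            return -1
        return 0
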
class lerot.comparison.OptimizedInterleaveVa(arg_str=None)[source]

Bases: lerot.comparison.OptimizedInterleave.OptimizedInterleave

precompute_rank_va(R)[source]
prefix_constraint_va(rankings, length)[source]
class lerot.comparison.ProbabilisticInterleave(arg_str=None)[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

Probabilistic interleaving, marginalizes over assignments

get_probability_of_list(result_list, context, query)[source]
infer_outcome(l, a, c, query)[source]
interleave(r1, r2, query, length)[source]
class lerot.comparison.ProbabilisticInterleaveWithHistory(arg_str)[source]

Bases: lerot.comparison.ProbabilisticInterleave.ProbabilisticInterleave

Probabilistic interleaving that reuses historic data (with importance sampling).

infer_outcome(l, context, c, query)[source]
class lerot.comparison.StochasticBalancedInterleave(arg_str)[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

Interleave and compare rankers using the stochastic interleave method introduced in Hofmann et al. ECIR '11.

infer_outcome(l, a, c, query)[source]
interleave(r1, r2, query, length)[source]
class lerot.comparison.TeamDraft(arg_str=None)[source]

Bases: lerot.comparison.AbstractInterleavedComparison.AbstractInterleavedComparison

Baseline team draft method.

infer_outcome(l, a, c, query)[source]

assign clicks for contributed documents

interleave(r1, r2, query, length1=None)[source]

updated to match the original method
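
Team draft interleaves like picking sports teams: the ranker that has contributed fewer documents (ties broken by a coin flip) adds its highest-ranked document not yet in the combined list, and clicks are credited to the team that contributed the clicked document. A minimal sketch, assuming r1 and r2 are plain ranked lists of document ids (the class above operates on ranker objects and queries instead), with an assumed sign convention for the outcome:

    import random

    def team_draft_interleave(r1, r2, length):
        interleaved, teams = [], []      # teams[i]: 0 = contributed by r1, 1 = by r2
        picked = set()
        counts = [0, 0]
        while len(interleaved) < length:
            # The ranker with fewer contributions picks next; coin flip on ties.
            if counts[0] < counts[1] or (counts[0] == counts[1] and random.random() < 0.5):
                team, ranking = 0, r1
            else:
                team, ranking = 1, r2
            # Highest-ranked document of that ranker not yet in the list.
            doc = next((d for d in ranking if d not in picked), None)
            if doc is None:              # simplified: stop if the ranking is exhausted
                break
            picked.add(doc)
            interleaved.append(doc)
            teams.append(team)
            counts[team] += 1
        return interleaved, teams

    def infer_outcome(teams, clicks):
        # Count clicks per team; positive favours r2, negative favours r1.
        wins = [0, 0]
        for rank, clicked in enumerate(clicks):
            if clicked:
                wins[teams[rank]] += 1
        return wins[1] - wins[0]

For example, team_draft_interleave([1, 2, 3, 4], [2, 1, 5, 6], 4) yields a four-document list with exactly two contributions from each ranker.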

class lerot.comparison.VaTdi(arg_str=None)[source]

Bases: lerot.comparison.TeamDraft.TeamDraft

Algorithm described in https://bitbucket.org/varepsilon/tois2013-interleaving

interleave(r1, r2, query, length=None)[source]
static sampleSmoothly(a, b, maxVal)[source]

lerot.document

class lerot.document.Document(docid, doctype='Web')[source]

Bases: object

get_id()[source]
get_type()[source]
set_type(doctype)[source]

lerot.evaluation

class lerot.evaluation.AsRbpEval(alpha=10, beta=0.8)[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Compute AS_RBP metric as described in [1].

[1] Zhou, K. et al. 2012. Evaluating aggregated search pages. SIGIR. (2012).

get_value(ranking, labels, orientations, cutoff=-1)[source]
class lerot.evaluation.DcgEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Compute DCG (with gain = 2**rel-1 and log2 discount).

evaluate_ranking(ranking, query, cutoff=-1)[source]

Compute DCG for the provided ranking. The ranking is expected to contain document ids in rank order.

get_dcg(ranked_labels, cutoff=-1)[source]

Get the DCG value of a ranked list of labels. Does not check whether the number of ranked labels is smaller than the cutoff.

get_value(ranking, labels, orientations, cutoff=-1)[source]

Compute the value of the metric.

  • ranking contains the list of documents to evaluate.
  • labels are the relevance labels for all documents, even those not in the ranking; labels[doc.get_id()] is the relevance of doc.
  • orientations contains orientation values for the verticals; orientations[doc.get_type()] is the orientation value for the doc (from 0 to 1).
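
As a worked example of the formula above (gain = 2**rel - 1, log2 discount), the sketch below computes DCG for a list of relevance labels given in rank order; NdcgEval (next) normalises this by the DCG of an ideal, label-sorted ranking. Note that the toolkit may compute the ideal over all labels for the query rather than only the ranked ones; this sketch uses the ranked labels only.

    import math

    def dcg(ranked_labels, cutoff=-1):
        if cutoff > -1:
            ranked_labels = ranked_labels[:cutoff]
        # rank is 0-based here, so the log2 discount is log2(rank + 2).
        return sum((2 ** rel - 1) / math.log(rank + 2, 2)
                   for rank, rel in enumerate(ranked_labels))

    def ndcg(ranked_labels, cutoff=-1):
        ideal = dcg(sorted(ranked_labels, reverse=True), cutoff)
        return dcg(ranked_labels, cutoff) / ideal if ideal > 0 else 0.0

    # Example: labels [2, 0, 1] at ranks 1..3 give
    # DCG = (2**2 - 1)/log2(2) + 0 + (2**1 - 1)/log2(4) = 3.0 + 0.5 = 3.5
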
class lerot.evaluation.NdcgEval[source]

Bases: lerot.evaluation.DcgEval.DcgEval

Compute NDCG (with gain = 2**rel-1 and log2 discount).

evaluate_ranking(ranking, query, cutoff=-1)[source]

Compute NDCG for the provided ranking. The ranking is expected to contain document ids in rank order.

get_value(ranking, labels, orientations, cutoff=-1)[source]
class lerot.evaluation.LetorNdcgEval[source]

Bases: lerot.evaluation.NdcgEval.NdcgEval

Compute NDCG as implemented in the Letor toolkit.

get_dcg(labels, cutoff=-1)[source]
class lerot.evaluation.VSEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Simple vertical selection (VS) metric, a.k.a. prec_v.

get_value(ranking, labels, orientations, cutoff=-1)[source]
class lerot.evaluation.VDEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Simple vertical selection (VD) metric, a.k.a. rec_v.

get_value(ranking, labels, orientations, cutoff=-1)[source]
class lerot.evaluation.ISEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Simple vertical selection (IS) metric, a.k.a. mean-prec.

get_value(ranking, labels, orientations, cutoff=-1)[source]
class lerot.evaluation.RPEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Simple vertical selection (RP) metric, a.k.a. corr.

get_value(ranking, labels, orientations, cutoff=-1, ideal_ranking=None)[source]
class lerot.evaluation.LivingLabsEval[source]
get_performance()[source]
get_win()[source]
update_score(wins)[source]
class lerot.evaluation.PAKEval[source]

Bases: lerot.evaluation.AbstractEval.AbstractEval

Precision-at-k evaluation: the fraction of relevant documents in the ranking up to rank k.

evaluate_ranking(ranking, query, cutoff=-1)[source]

lerot.environment

class lerot.environment.CascadeUserModel(arg_str)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

Defines a cascade user model, simulating a user that inspects results starting from the top of a result list.

get_clicks(result_list, labels, **kwargs)[source]

simulate clicks on list l
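
A minimal sketch of the cascade behaviour described above: the simulated user works down the list, clicks with a relevance-dependent probability, and after a click stops with a relevance-dependent probability. The concrete probabilities are illustrative only; the actual class is configured through its arg_str.

    import random

    P_CLICK = {0: 0.05, 1: 0.5, 2: 0.95}   # click probability per relevance label
    P_STOP = {0: 0.0, 1: 0.5, 2: 0.9}      # stop-after-click probability per label

    def cascade_clicks(result_list, labels):
        # labels: dict mapping docid -> graded relevance label (0, 1, 2)
        clicks = [0] * len(result_list)
        for rank, doc in enumerate(result_list):
            rel = labels.get(doc, 0)
            if random.random() < P_CLICK[rel]:
                clicks[rank] = 1
                if random.random() < P_STOP[rel]:
                    break
        return clicks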

class lerot.environment.FederatedClickModel(arg_str)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

b(i, vert)[source]
static getParamRescaled(rank, serp_len, param_vector)[source]
static getVertClass(vert_type)[source]
get_clicks(result_list, labels, **kwargs)[source]

Simulate clicks on the result_list; labels contains the relevance labels indexed by docid.

get_examination_prob(result_list, **kwargs)[source]
h(i, serp_len, vert)[source]
p(i, serp_len)[source]
class lerot.environment.PositionBasedUserModel(p)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

Defines a position-based user model.

get_clicks(result_list, labels, **kwargs)[source]

simulate clicks on list l

get_examination_prob(result_list, **kwargs)[source]
p(i)[source]
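
In a position-based model the probability of a click at rank i is the examination probability p(i) times an attractiveness term derived from the relevance label. A sketch assuming a geometric examination decay p(i) = p**i and a simple label-to-attractiveness mapping (both assumptions, not necessarily the class's definitions):

    import random

    def position_based_clicks(result_list, labels, p=0.7, max_label=2):
        # labels: dict mapping docid -> graded relevance label
        clicks = []
        for rank, doc in enumerate(result_list):
            examine = p ** rank                              # assumed examination decay
            attract = labels.get(doc, 0) / float(max_label)  # label mapped to [0, 1]
            clicks.append(1 if random.random() < examine * attract else 0)
        return clicks
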
class lerot.environment.RandomClickModel(p=0.5)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

Defines a random click model.

get_clicks(result_list, labels, **kwargs)[source]

simulate clicks on list l

class lerot.environment.LivingLabsRealUser(key, doc_ids)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

KEY = ''
get_clicks(result_list, labels, **kwargs)[source]
get_win(query, feedback_list, lerot_ranked_list)[source]

Used for the Seznam site, which interleaves the ranked list with its own list. Returns the ‘ranked list winner’ with the number of clicks for each ranker, e.g. [0 2], where the format is [lerot_list_score seznam_list_score].

runs = {}
upload_run(query, upload_list, runid)[source]

Uploads a run to living-labs api.

class lerot.environment.RelevantUserModel(arg_str)[source]

Bases: lerot.environment.AbstractUserModel.AbstractUserModel

Defines a user model that clicks on all relevant documents in a list, with an optional limit.

get_clicks(result_list, labels, **kwargs)[source]

lerot.experiment

class lerot.experiment.GenericExperiment(args_str=None)[source]
run()[source]
run_experiment(aux_log_fh)[source]
class lerot.experiment.LearningExperiment(training_queries, test_queries, feature_count, log_fh, args)[source]

Bases: lerot.experiment.AbstractLearningExperiment.AbstractLearningExperiment

Represents an experiment in which a retrieval system learns from implicit user feedback. The experiment is initialized as specified by the provided arguments or config file.

run()[source]

A single run of the experiment.

class lerot.experiment.MetaExperiment[source]
apply(conf)[source]
finish_analytics()[source]
run_celery()[source]
run_conf()[source]
run_local()[source]
store(conf, r)[source]
update_analytics()[source]
update_analytics_file(log_file)[source]
class lerot.experiment.PrudentLearningExperiment(training_queries, test_queries, feature_count, log_fh, args)[source]

Bases: lerot.experiment.AbstractLearningExperiment.AbstractLearningExperiment

Represents an experiment in which a retrieval system learns from implicit user feedback. The experiment is initialized as specified by the provided arguments or config file.

run()[source]

Run the experiment num_runs times.

class lerot.experiment.HistoricalComparisonExperiment(queries, feature_count, log_fh, args)[source]

Represents an experiment in which rankers are compared using interleaved comparisons with live and historic click data.

run()[source]

Run the experiment for num_queries queries.

class lerot.experiment.SingleQueryComparisonExperiment(query_dir, feature_count, log_fh, args)[source]

Represents an experiment in which rankers are compared using interleaved comparisons on a single query.

run()[source]

Run the experiment for num_queries queries.

class lerot.experiment.SyntheticComparisonExperiment(log_fh, args)[source]

Represents an experiment in which synthetic rankers are compared to investigate theoretical properties / guarantees.

run()[source]

Run the experiment for num_queries queries.

class lerot.experiment.VASyntheticComparisonExperiment(log_fh, args)[source]

Represents an experiment in which synthetic rankers are compared to investigate theoretical properties / guarantees.

static block_counts(l)[source]
static block_position1(l, result_length)[source]
static block_sizes(l)[source]
static generate_ranking_pair(result_length, num_relevant, pos_method='beyondten', vert_rel='non-relevant', block_size=3, verticals=None, fixed=False, dominates=<function <lambda>>)[source]

Generate a pair of synthetic rankings. See Appendix A, https://bitbucket.org/varepsilon/tois2013-interleaving

static get_online_metrics(clicks, ranking)[source]
init_rankers(query)[source]

Init rankers for a query

Since the ranker may be stateful, we need to init it every time we access its documents.

run()[source]

lerot.ranker

class lerot.ranker.DeterministicRankingFunction(ranker_arg_str, ties, feature_count, init='random', sample='sample_unit_sphere')[source]

Bases: lerot.ranker.AbstractRankingFunction.AbstractRankingFunction

document_count()[source]
getDocs(numdocs=None)[source]

Copied from StatelessRankingFunction.

get_document_probability(docid)[source]

get probability of producing doc as the next document drawn

init_ranking(query)[source]
next()[source]

produce the next document

next_det()[source]
next_random()[source]

produce a random next document

rm_document(docid)[source]

remove doc from list of available docs and adjust probabilities

class lerot.ranker.ModelRankingFunction[source]

Bases: lerot.ranker.StatelessRankingFunction.StatelessRankingFunction

add_doc_for_query(query, doc)[source]
init_ranking(query)[source]
update_weights(new_weights)[source]
class lerot.ranker.ProbabilisticRankingFunction(ranker_arg_str, ties, feature_count, init='random', sample='sample_unit_sphere')[source]

Bases: lerot.ranker.AbstractRankingFunction.AbstractRankingFunction

document_count()[source]
getDocs(numdocs=None)[source]

Copied from StatelessRankingFunction.

get_document_probability(docid)[source]

get probability of producing doc as the next document drawn

get_ranking()[source]
init_ranking(query)[source]
next()[source]

produce the next document by random sampling, or deterministically

next_det()[source]
next_random()[source]

produce a random next document

rm_document(docid)[source]

remove doc from list of available docs and adjust probabilities
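
A probabilistic ranker of this kind can build a result list by repeatedly scoring the remaining documents with its weight vector, sampling the next document from a distribution over those scores, and removing it from the pool (mirroring next and rm_document above). The softmax form and temperature below are assumptions, not necessarily the class's exact scheme:

    import numpy as np

    def sample_ranking(weights, feature_vectors, temperature=1.0):
        # feature_vectors: dict mapping docid -> feature array; weights: 1-D array.
        remaining = dict(feature_vectors)
        ranking = []
        while remaining:
            docids = list(remaining)
            scores = np.array([np.dot(weights, remaining[d]) for d in docids])
            # Numerically stable softmax over the remaining documents.
            scaled = scores / temperature
            probs = np.exp(scaled - np.max(scaled))
            probs /= probs.sum()
            chosen = docids[np.random.choice(len(docids), p=probs)]
            ranking.append(chosen)
            del remaining[chosen]
        return ranking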

class lerot.ranker.StatelessRankingFunction(ranker_arg_str, ties, feature_count, init='random', sample='sample_unit_sphere')[source]

Bases: lerot.ranker.AbstractRankingFunction.AbstractRankingFunction

document_count()[source]
getDocs(numdocs=None)[source]

More efficient and less error-prone version of getDocs.

init_ranking(query)[source]

Initialize ranking for particular query.

Since AbstractRankingFunction has a next() function that changes state, we need to support that here: set self.docs, and keep self.doc_idx as the only stateful object.

next()[source]
next_det()[source]
next_random()[source]
rm_document(doc)[source]
verticals(length=None)[source]
class lerot.ranker.SyntheticDeterministicRankingFunction(synthetic_docs)[source]

Bases: lerot.ranker.StatelessRankingFunction.StatelessRankingFunction

Synthetic deterministic ranker.

get_document_probability(doc)[source]

Get probability of producing doc as the next document drawn.

init_ranking(query)[source]
update_weights(new_weights)[source]
class lerot.ranker.SyntheticProbabilisticRankingFunction(ranker_arg_str, ties='random')[source]

Bases: lerot.ranker.ProbabilisticRankingFunction.ProbabilisticRankingFunction

Synthetic ranker for use in this experiment only

get_document_probability(docid)[source]

get probability of producing doc as the next document drawn

init_ranking(synthetic_docids)[source]
rm_document(docid)[source]

remove doc from list of available docs, adjust probabilities

update_weights(new_weights)[source]

not required for synthetic data

lerot.retrieval_system

class lerot.retrieval_system.ListwiseLearningSystem(feature_count, arg_str)[source]

Bases: lerot.retrieval_system.AbstractLearningSystem.AbstractLearningSystem

A retrieval system that learns online from listwise comparisons. The system keeps track of all necessary state variables (current query, weights, etc.) so that comparison and learning classes can be stateless (implement only static / class methods).

get_ranked_list(query, getNewCandidate=True)[source]
get_solution()[source]
update_solution(clicks)[source]
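
The listwise loop this system implements is dueling-bandit style: perturb the current weights to obtain a candidate ranker, interleave the two rankers' lists for the incoming query, and move towards the candidate only if the observed clicks favour it. A sketch with illustrative step sizes and an assumed interleave_and_compare helper (positive outcome meaning the candidate won):

    import numpy as np

    def sample_unit_sphere(n):
        v = np.random.randn(n)
        return v / np.linalg.norm(v)

    def listwise_step(w, delta, alpha, query, interleave_and_compare):
        u = sample_unit_sphere(len(w))
        candidate = w + delta * u            # exploratory ranker
        outcome = interleave_and_compare(w, candidate, query)
        if outcome > 0:                      # clicks favoured the candidate
            w = w + alpha * u                # take a step in its direction
        return w
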
class lerot.retrieval_system.PrudentListwiseLearningSystem(feature_count, arg_str)[source]

Bases: lerot.retrieval_system.AbstractLearningSystem.AbstractLearningSystem

A retrieval system that learns online from listwise comparisons. The system keeps track of all necessary state variables (current query, weights, etc.) so that comparison and learning classes can be stateless (implement only static / class methods).

get_outcome(clicks)[source]
get_ranked_list(query, getNewCandidate=True)[source]
get_solution()[source]
update_solution()[source]
class lerot.retrieval_system.ListwiseLearningSystemWithCandidateSelection(feature_count, arg_str)[source]

Bases: lerot.retrieval_system.ListwiseLearningSystem.ListwiseLearningSystem

A retrieval system that learns online from listwise comparisons, and pre-selects exploratory rankers using historic data.

select_candidate_beat_the_mean(candidate_us)[source]
select_candidate_random(candidates)[source]
select_candidate_repeated(candidates)[source]

Selects a ranker in randomized matches. Ranker pairs are sampled uniformly and compared over a number of historical samples. The outcomes observed over these samples are averaged (with / without importance sampling). The worse-performing ranker is removed from the pool. If no preference is found, the ranker to be removed is selected randomly. The final ranker in the pool is returned. This selection method assumes transitivity.

select_candidate_simple(candidates)[source]

Selects a ranker in randomized matches. For each historic data point two rankers are randomly selected from the pool and compared. If a ranker loses the comparison, it is removed from the pool. If there is more than one ranker left when the history is exhausted, a ranker is randomly selected from the remaining pool. This selection method assumes transitivity (a ranker that loses against one ranker is assumed to not be the best ranker).
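
The elimination procedure described for select_candidate_simple can be sketched as follows, with compare_on(sample, r1, r2) an assumed helper that replays one historical interaction and returns a positive value if r2 wins, a negative value if r1 wins, and 0 on a tie:

    import random

    def select_candidate_simple(candidates, history, compare_on):
        pool = list(candidates)
        for sample in history:
            if len(pool) == 1:
                break
            r1, r2 = random.sample(pool, 2)
            outcome = compare_on(sample, r1, r2)
            if outcome > 0:
                pool.remove(r1)              # r1 lost, drop it from the pool
            elif outcome < 0:
                pool.remove(r2)              # r2 lost, drop it from the pool
        # If the history runs out before one ranker remains, pick randomly.
        return random.choice(pool)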

class lerot.retrieval_system.PairwiseLearningSystem(feature_count, arg_str)[source]

Bases: lerot.retrieval_system.AbstractLearningSystem.AbstractLearningSystem

A retrieval system that learns online from pairwise comparisons. The system keeps track of all necessary state variables (current query, weights, etc.).

get_ranked_list(query)[source]
get_solution()[source]
initialize_weights(method, feature_count)[source]
sample_fixed(n)[source]
sample_unit_sphere(n)[source]

See http://mathoverflow.net/questions/24688/efficiently-sampling-points-uniformly-from-the-surface-of-an-n-sphere

update_solution(clicks)[source]

Ranker weights are updated after each observed document pair. This means that a pair may have been misranked when the result list was generated, but is correctly labeled after an earlier update based on a higher-ranked pair from the same list.
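
A sketch of such a pairwise update: derive preference pairs from the clicks (here, a clicked document is preferred over non-clicked documents ranked above it, which is an assumed heuristic), then apply a perceptron-style update for each pair that the current weights still misrank. The learning rate and pair-extraction rule are illustrative:

    import numpy as np

    def pairwise_update(w, result_list, clicks, features, eta=0.01):
        # features: dict mapping docid -> feature array; clicks: 0/1 list aligned
        # with result_list.
        for i, doc in enumerate(result_list):
            if not clicks[i]:
                continue
            for j in range(i):               # documents ranked above the clicked one
                if clicks[j]:
                    continue
                preferred, other = features[doc], features[result_list[j]]
                # Update only if the pair is still misranked by the current weights.
                if np.dot(w, preferred) <= np.dot(w, other):
                    w = w + eta * (preferred - other)
        return w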

class lerot.retrieval_system.SamplerSystem(feature_count, arg_str, run_count='')[source]

Bases: lerot.retrieval_system.AbstractLearningSystem.AbstractLearningSystem

get_ranked_list(query)[source]
get_solution()[source]
update_solution(clicks)[source]
class lerot.retrieval_system.PerturbationLearningSystem(feature_count, arg_str)[source]

Bases: lerot.retrieval_system.AbstractLearningSystem.AbstractLearningSystem

A retrieval system that learns online from pairwise comparisons. The system keeps track of all necessary state variables (current query, weights, etc.) so that comparison and learning classes can be stateless (implement only static / class methods).

get_ranked_list(query)[source]
get_solution()[source]
update_solution(clicks)[source]

Update the ranker weights, taking into account that documents with a relevance label > 1 are clicked more than once.

update_solution_once(clicks)[source]

Update the ranker weights without regard to multiple clicks on a single link

lerot.query

Interface to query data with functionality for reading queries from svmlight format, both sequentially and in batch mode.

class lerot.query.Query(qid, feature_vectors, labels=None, comments=None)[source]
get_comment(docid)[source]
get_comments()[source]
get_docids()[source]
get_document_count()[source]
get_feature_vector(docid)[source]
get_feature_vectors()[source]
get_ideal()[source]
get_label(docid)[source]
get_labels()[source]
get_prediction(docid)[source]
get_predictions()[source]
get_qid()[source]
has_ideal()[source]
set_feature_vector(docid, feature_vector)[source]
set_ideal(ideal)[source]
set_label(docid, label)[source]
set_labels(labels)[source]
set_predictions(predictions)[source]
write_to(fh, sparse=False)[source]
class lerot.query.Queries(fh, num_features, preserve_comments=False)[source]

a list of queries with some convenience functions

get_feature_vectors()[source]
get_labels()[source]
get_predictions()[source]
get_qids()[source]
get_query(index)[source]
get_size()[source]
keys()[source]
set_predictions()[source]
values()[source]
class lerot.query.QueryStream(fh, num_features, preserve_comments=False)[source]

iterate over a stream of queries, only keeping one query at a time

next()[source]
read_all()[source]
lerot.query.load_queries(filename, features, preserve_comments=False)[source]

Utility method for loading queries from a file.

lerot.query.write_queries(filename, queries)[source]

Utility method for writing queries to a file. Returns the number of queries written
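
A short usage sketch for these utilities; the file path and feature count are placeholders, and values() is assumed to yield Query objects:

    from lerot.query import load_queries

    queries = load_queries("data/Fold1/train.txt", 64)
    print("loaded %d queries" % queries.get_size())

    for query in queries.values():
        docids = query.get_docids()      # documents available for this query
        labels = query.get_labels()      # relevance labels for those documents
        # e.g. feed the query to a ranker or an evaluation metric here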
