A MoreLikeThis (MLT) query is designed to generate information about "similar" documents using the MoreLikeThis functionality provided by Lucene. It supports faceting, paging, and filtering using CommonQueryParameters.
This query uses the Solr MoreLikeThis handler, which is dedicated to returning MLT results. Alternatively, you can use the MLT component of the select query.
Building a MLT query
See the example code below.
Available options:
Name | Type | Default value | Description |
---|---|---|---|
handler | string | mlt | Name of the Solr request handler to use, without leading or trailing slashes |
resultclass | string | Solarium_Result_Select | Classname for result. If you set a custom classname make sure the class is readily available (or through autoloading) |
documentclass | string | Solarium_Document_ReadWrite | Classname for documents in the resultset. If you set a custom classname make sure the class is readily available (or through autoloading) |
query | string | *:* | Query to execute |
start | int | 0 | Start position (offset) in the complete Solr query resultset, to paginate big resultsets. |
rows | int | 10 | Number of rows to fetch, starting from the 'start' (offset) position. This is an upper limit; fewer rows may be returned. |
fields | string | *,score | Comma separated list of fields to fetch from Solr. There are two special values: '*' meaning 'all fields' and 'score' to also fetch the Solr document score value. |
sort | array | empty array | Array with sort field as key and sort order as values. Multiple entries possible, they are used in the order of the array. Example: array('price' => 'asc') |
stream | boolean | false | Set to true to post query content instead of using the URL param |
matchinclude | boolean | false | Specifies whether or not the response should include the matched document. If set to false, the response will look like a normal select response. |
matchoffset | int | 0 | Specifies an offset into the main query search results to locate the document on which the MoreLikeThis query should operate. |
interestingTerms | string | none | Controls how the handler presents the "interesting" terms. Must be one of: none, list, details. |
mltfields | string | null | The fields to use for similarity. NOTE: if possible, these should have a stored TermVector. Separate multiple fields with commas. |
minimumtermfrequency | int | null | Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. |
minimumdocumentfrequency | int | null | Minimum Document Frequency - words that occur in fewer than this many docs will be ignored. |
maximumdocumentfrequency | int | null | Maximum Document Frequency - words that occur in more than this many docs will be ignored. |
maximumdocumentfrequencypercentage | int | null | Maximum Document Frequency Percentage - words that occur in more than this percentage of the docs in the index will be ignored. |
minimumwordlength | int | null | Minimum word length below which words will be ignored. |
maximumwordlength | int | null | Maximum word length above which words will be ignored. |
maximumqueryterms | int | null | Maximum number of query terms that will be included in any generated query. |
maximumnumberoftokens | int | null | Maximum number of tokens to parse in each example doc field that is not stored with TermVector support. |
boost | boolean | null | If true the query will be boosted by the interesting term relevance. |
queryfields | string | null | Query fields and their boosts using the same format as that used in DisMaxQParserPlugin. These fields must also be specified in mltfields. Separate multiple fields with commas. |
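
When comparing a request against the Solr reference guide, it can help to know which Solr request parameter each MLT-specific option corresponds to. The sketch below records that correspondence as a plain PHP array and applies it to an options array. The constant `MLT_PARAM_MAP` and the function `mltOptionsToSolrParams()` are illustrative names and not part of Solarium; the library's own request builder may differ in detail, for example in how it serialises values.

```php
<?php
// Illustrative only: Solarium option name => Solr request parameter.
// This constant is an assumption made for this sketch, not a Solarium API.
const MLT_PARAM_MAP = [
    'mltfields' => 'mlt.fl',
    'minimumtermfrequency' => 'mlt.mintf',
    'minimumdocumentfrequency' => 'mlt.mindf',
    'maximumdocumentfrequency' => 'mlt.maxdf',
    'maximumdocumentfrequencypercentage' => 'mlt.maxdfpct',
    'minimumwordlength' => 'mlt.minwl',
    'maximumwordlength' => 'mlt.maxwl',
    'maximumqueryterms' => 'mlt.maxqt',
    'maximumnumberoftokens' => 'mlt.maxntp',
    'boost' => 'mlt.boost',
    'queryfields' => 'mlt.qf',
    'matchinclude' => 'mlt.match.include',
    'matchoffset' => 'mlt.match.offset',
    'interestingTerms' => 'mlt.interestingTerms',
];

/**
 * Hypothetical helper: translate MLT options into Solr parameters.
 * Options left at null are skipped, mirroring the "null" defaults in the table.
 */
function mltOptionsToSolrParams(array $options): array
{
    $params = [];
    foreach ($options as $name => $value) {
        // skip unset options and names this sketch does not know about
        if ($value === null || !array_key_exists($name, MLT_PARAM_MAP)) {
            continue;
        }
        // Solr expects the literal strings "true" / "false" for booleans
        if (is_bool($value)) {
            $value = $value ? 'true' : 'false';
        }
        $params[MLT_PARAM_MAP[$name]] = $value;
    }

    return $params;
}

// usage: the option values here are example input only
print_r(mltOptionsToSolrParams([
    'mltfields' => 'manu,cat',
    'minimumtermfrequency' => 1,
    'boost' => true,
    'maximumwordlength' => null,
]));
```

A lookup like this is mainly useful for debugging, for instance when reading the parameters in a Solr request log and tracing them back to the option that produced them.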
Executing a MLT query
Use the moreLikeThis method of the client to execute the query object. See the example code below.
Result of a MLT query
The result of a MLT query shares the features of the select query result. On top of that the following is added:
Interestingterms
This will show what "interesting" terms (the top TF/IDF terms) are used for the MoreLikeThis query.
The format of the interesting terms depends on the value set for interestingTerms in the query.
list: The terms are returned as an array of strings.
details: Each term is an array key associated with the boost value used by Solr. Unless you set boost to true, this will be 1.0 for every term.
none: The terms aren't available with the resultset and an exception is thrown if you try and fetch them anyway.
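Because the shape of the returned array differs between the list and details formats, calling code may want to bring both into a single form before using them. The following is a minimal sketch of one way to do that in plain PHP. The function `normalizeInterestingTerms()` is a hypothetical helper, not part of Solarium, and treating every term from the list format as having a boost of 1.0 is an assumption made for this sketch. The sample terms are invented.

```php
<?php
/**
 * Hypothetical helper: convert interesting terms to term => boost pairs.
 *
 * @param array  $terms the array obtained from the resultset
 * @param string $mode  the interestingTerms value used in the query
 */
function normalizeInterestingTerms(array $terms, string $mode): array
{
    switch ($mode) {
        case 'list':
            // plain array of strings: no boost information is present,
            // so this sketch assigns 1.0 to every term
            return array_fill_keys($terms, 1.0);
        case 'details':
            // already term => boost; only cast the boost values to float
            return array_map('floatval', $terms);
        default:
            // with 'none' there are no terms to work with
            throw new InvalidArgumentException(
                "No interesting terms available for mode '$mode'"
            );
    }
}

// usage with invented sample data
$fromList = normalizeInterestingTerms(['samsung', 'electronics'], 'list');
$fromDetails = normalizeInterestingTerms(['samsung' => '2.5', 'electronics' => 1], 'details');

foreach ($fromDetails as $term => $boost) {
    echo $term.' (boost='.$boost.')'.PHP_EOL;
}
```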
Match
The document used for matching MLT results. Only available if matchinclude was set to true in the query.
Setting up the MLT handler
The examples below assume an MLT handler is set up at /mlt. Solr's example configsets don't include one by default.
In solrconfig.xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
Through the Config API
<?php
require_once(__DIR__.'/init.php');
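// create a client instance
$client = new Solarium\Client($adapter, $eventDispatcher, $config);
// build a Config API request that adds a MoreLikeThisHandler at /mlt for techproducts
$query = $client->createApi([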
'version' => Solarium\Core\Client\Request::API_V1,
'handler' => 'techproducts/config',
'method' => Solarium\Core\Client\Request::METHOD_POST,
'rawdata' => json_encode([
'add-requesthandler' => [
'name' => '/mlt',
'class' => 'solr.MoreLikeThisHandler',
],
]),
]);
$client->execute($query);
Example
<?php
require_once(__DIR__.'/init.php');
htmlHeader();
// create a client instance
$client = new Solarium\Client($adapter, $eventDispatcher, $config);
// get a morelikethis query instance
$query = $client->createMoreLikeThis();
// query a document you want similar documents for
$query->setQuery('id:SP2514N')
->setMltFields('manu,cat')
->setMinimumDocumentFrequency(1)
->setMinimumTermFrequency(1)
->setInterestingTerms('details')
->setBoost(true)
->setMatchInclude(true)
->createFilterQuery('stock')->setQuery('inStock:true');
// this executes the query and returns the result
$resultset = $client->moreLikeThis($query);
echo 'Document used for matching:<br/><table>';
foreach ($resultset->getMatch() as $field => $value) {
// this converts multivalue fields to a comma-separated string
if (is_array($value)) {
$value = implode(', ', $value);
}
echo '<tr><th>' . $field . '</th><td>' . $value . '</td></tr>';
}
echo '</table><hr/>';
// display the total number of MLT documents found by Solr
echo 'Number of MLT matches found: '.$resultset->getNumFound().'<br/><br/>';
// display the "interesting" terms for the query
echo 'Interesting terms with the boost value used:';
echo '<ul>';
foreach ($resultset->getInterestingTerms() as $term => $boost) {
echo '<li>'.$term.' (boost='.$boost.')</li>';
}
echo '</ul>';
echo '<b>Listing of matched docs:</b>';
// show MLT documents using the resultset iterator
foreach ($resultset as $document) {
echo '<hr/><table>';
// the documents are also iterable, to get all fields
foreach ($document as $field => $value) {
// this converts multivalue fields to a comma-separated string
if (is_array($value)) {
$value = implode(', ', $value);
}
echo '<tr><th>' . $field . '</th><td>' . $value . '</td></tr>';
}
echo '</table>';
}
htmlFooter();
Matching against supplied text
Instead of querying the index for a document to match against, you can also find similar documents based on supplied text.
In this case, there is no document to include when matchinclude is set to true.
Example
This example assumes the /mlt handler is already set up (see above).
<?php
require_once(__DIR__.'/init.php');
htmlHeader();
// create a client instance
$client = new Solarium\Client($adapter, $eventDispatcher, $config);
// get a morelikethis query instance
$query = $client->createMoreLikeThis();
// supply text you want similar documents for
$text = <<<EOT
Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133
7200RPM, 8MB cache, IDE Ultra ATA-133, NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor
EOT;
$query->setQuery($text);
$query->setQueryStream(true);
$query->setMltFields('name,features');
$query->setMinimumDocumentFrequency(1);
$query->setMinimumTermFrequency(1);
$query->createFilterQuery('stock')->setQuery('inStock:true');
$query->setInterestingTerms('details');
$query->setBoost(true);
// this executes the query and returns the result
$resultset = $client->moreLikeThis($query);
// display the total number of MLT documents found by Solr
echo 'Number of MLT matches found: '.$resultset->getNumFound().'<br/><br/>';
// display the "interesting" terms for the query
echo 'Interesting terms with the boost value used:';
echo '<ul>';
foreach ($resultset->getInterestingTerms() as $term => $boost) {
echo '<li>'.$term.' (boost='.$boost.')</li>';
}
echo '</ul>';
echo '<b>Listing of matched docs:</b>';
// show MLT documents using the resultset iterator
foreach ($resultset as $document) {
echo '<hr/><table>';
// the documents are also iterable, to get all fields
foreach ($document as $field => $value) {
// this converts multivalue fields to a comma-separated string
if (is_array($value)) {
$value = implode(', ', $value);
}
echo '<tr><th>' . $field . '</th><td>' . $value . '</td></tr>';
}
echo '</table>';
}
htmlFooter();