The most powerful aspect of bookworm is its ability to filter down to a custom set of fields. This is done by setting a number of constraints on search_limits or compare_limits fields.
As described in 4.2, search_limits and compare_limits both take the same syntax: for this section, I'll just use search_limits but understand that they mean both.
Every bookworm may have different metadata fields: these examples are for a fictitious database of books.
A few keys should be supported in all bookworms, though:
-
Word searches
- The one term that is pre-defined in (almost) all bookworms is the term
"word"(which matches, despite its name, any phrase up to the number of supported grams). unigramorbigramcan be used as a synonym for word. This is particularly useful in searching for groups.haswordcan limit the search to only books that containing the specified word. (Not currently supported, but part of the specification)-
In progress: It's possible this should be retitled
$haswordfromhaswordto better match the syntax below.
-
- The one term that is pre-defined in (almost) all bookworms is the term
In progress: There should also be a syntax built in to support proximity searches: these are not possible under MySQL, but would be under Solr. I'd suggest something along the lines of
{"near":["foo",5]}where foo is the word and five is the proximity range.
To limit a query categorically, pass an array consisting of the allowed keys. For example, to limit to women and books published in the United States or Germany in 1890, pass
"search_limits":{
"author_gender":["Female"],
"country":["United States","Germany","East Germany","West Germany"],
"publish_year":[1890]
}
Queries bounded in an array are by default treated as an "or" construction within the array: documents matching any limit will be returned.
Each of the elements, on the other hand, is treated as an AND construction: returned documents must match at least one item for EACH of the keys entered.
If a string or numeric value is passed rather than an array, the API will automatically convert that to a single-element array.
An alternate way to express categorical limits is to use a dictionary rather than array, and pass the special "$eq" key as a value pointing to the limitations. The above query could also be written as:
"search_limits":{
"author_gender":{"$eq":"Female"},
"country":{
"$eq":["United States","Germany","East Germany","West Germany"]
},
"publish_year":{"$eq":1890}
}
This long syntax is borrowed from MongoDB. It is verbose and not useful for most queries: but is provided for completeness because more complicated API calls require other keys (such as "$ne") which take the same form.
To define a search by negative matches, you can pass a hash rather than an array and use the special key "$ne". So to search for books published outside the United States or Canada, you would pass:
{"country":{"$ne":["United States","Canada"]}}
An "$ne" limitation will reject any books published in either of those places.
Numeric values can also be searched with range queries. The syntax for this is to pass a hash and use special keys from the following list:
$gteGreater than or equal to$gtGreater than$lteLess than or equal to$gtLess than
So to limit to books published between 1900 and 1950, either of the following two queries would work:
{"publish_year":{"$lte":1950,"$gte":1900}
{"publish_year":{"$lt":1951,"$gt":1899}
More complicated boolean statements are possible by using the special $or construction. (This syntax is also borrowed from MongoDB). $or points to an array, any element of which might be true. To search for books that are either published in the US or by US-born authors, for example, you might limit as follows:
{"$or":[
{"country":["USA"]},
{"author_birth_country":["USA"]}
]}
$and is the opposite of $or: so the following query:
"search_limits":{"$and":[{"country":["USA"],"author_birth_country":["USA"]}]}
would return all books published both in the United States and by American-born authors. This and-construction will never be necessary in a base-level search, because all grouped limitations are and-queries by default: the above is exactly equivalent to
"search_limits"{"country":["USA"],"author_birth_country":["USA"]}
Like "$eq" (above), "$and" is included for completeness and so that it can specified in deeply nested queries.
These constructions can be nested arbitrarily to create complicated combinations of "and" and "or" queries. To construct a query, for example, that encompasses authors whose political party is the same as the sitting US president's in the last few decades, you might type the following monstrosity:
"search_limits":{
"$or":[
{
"year":{
"$or":[
{"$gte":1980,"$lte":1992},
{"$gte":2001,"$lte":2008}
]
},
"author_party":"Republican"
},
{
"year":{
"$or":[
{"$gte":1993,"$lte":2000},
{"$gte":2009}
]
},
"author_party":"Democrat"
}
]
}
You can also limit by regular expressions rather than complete matches by using the "$grep" key.
MySQL-supported regular expressions can be used. To match any of several spellings of the publisher "Little, Brown" you might search for:
"search_limits":{"publisher":{"$grep":"Little,? Brown ((and|&) ?[Cc]o\.?)?"}
Yuck, huh?
Question: Should this be retitled
$re?
Should it be possible to limit not just by results, but by counttypes? For instance, to only return results where the specified groups have a wordcount over 10, you could search:
"search_limits":{"WordCount":{"$gte":10}}