The Daisy Query Language can be used to search for documents (more precisely, document variants). In the Daisy Wiki, queries can be used in various places:
The implementation of various Daisy Wiki features is also based on queries, such as the recent changes page or the referrers page. And of course it is possible to execute queries from your own applications, using the HTTP interface or Java API.
The query language is a somewhat SQL-like language that allows to search on various document properties (including the fields), fulltext on the part content, or a combination of those. The sort order of the results can also be defined. The resulting document list is filtered to only include documents to which the user has at least read-live access.
An example query, searching all documents in a collection call "mycollection":
select id, name where InCollection('mycollection') order by name
Internally, non-fulltext queries are translated to SQL and executed on the relational database while fulltext queries are executed by Jakarta Lucene.
Although the query language is somewhat SQL-like, it hides the complexity of the actual SQL-queries that are performed by the repository server on the relational database, which can quickly grow quite complex.
Note: every time in this document when we talk about "searching documents", this is equivalent to "searching document variants". The result of query is a set of document variants, i.e. each member of the result set is identified by a triple (document ID, branch, language).
select ... where ... order by ... limit x option ...
The select and where parts are required, the rest is optional. Whitespace is of no importance.
The select part should list one or more value expressions, separated by commas. A value expression can be an identifier, a literal or a function call. This is described in more detail further on.
The where part should contain a predicate expression, thus an expression which tests the value of value expressions using operators, or uses some built-in conditions.
Besides the operators listed in the table below, the operations AND and OR are supported, and parentheses can be used for grouping.
|
string |
long |
double |
decimal |
date |
datetime |
boolean |
|
|---|---|---|---|---|---|---|---|
|
= |
X |
X |
X |
X |
X |
X |
X |
|
!= |
X |
X |
X |
X |
X |
X |
X |
|
< |
X |
X |
X |
X |
X |
X |
|
|
> |
X |
X |
X |
X |
X |
X |
|
|
<= |
X |
X |
X |
X |
X |
X |
|
|
>= |
X |
X |
X |
X |
X |
X |
|
|
[NOT] LIKE |
X |
||||||
|
[NOT] BETWEEN |
X |
X |
X |
X |
X |
X |
|
|
[NOT] IN |
X |
X |
X |
X |
X |
X |
|
|
IS [NOT] NULL |
X |
X |
X |
X |
X |
X |
X |
Wildcards for LIKE are _ and %, escape using \_ and \%.
All keywords such as AND, LIKE, BETWEEN, ... can be written in either uppercase or lowercase (but not mixed case).
If these operators are used on multi-value fields, they return true if at least one of the values of the multi-value field satisfies. See further on for a set of conditions specifically for multi-value fields.
A value expression is:
A function call usually has the following form:
functionName(arg1, arg2, ...)
However, for the basic mathematical functions (addition, subtraction, multiplication and division) "infix" notation is used instead, using the symbols +, -, * and /. Parentheses can be used to influence the order of the operations.
The table below lists the available identifiers.
Some notes:
|
name |
searchable |
datatype |
version dependent |
remarks |
|---|---|---|---|---|
|
id |
yes |
string |
no |
|
|
namespace |
yes |
string |
no |
The namespace part of the document ID |
|
name |
yes |
string |
yes |
|
|
branch |
yes |
symbolic |
no |
|
|
branchId |
yes |
long |
no |
|
|
language |
yes |
symbolic |
no |
|
|
languageId |
yes |
long |
no |
|
|
link |
yes |
link |
no |
The current document variant as a link. This is useful for comparison with link type fields. For example $someLinkField = link to find documents which link to themselves in a certain field, or $someLinkField = ContextDoc(link) to find documents which link to the context document. |
|
documentType |
yes |
symbolic |
no |
|
|
versionId |
yes |
long |
yes |
ID of the live version, or if the query option search_last_version is specified, of the last version |
|
creationTime |
yes |
datetime |
no |
|
|
ownerId |
yes |
long |
no |
|
|
ownerLogin |
yes |
symbolic |
no |
|
|
ownerName |
no |
string |
no |
|
|
summary |
no |
string |
no |
always of last published version |
|
retired |
yes |
boolean |
no |
|
|
private |
yes |
boolean |
no |
|
|
lastModified |
yes |
datetime |
no |
|
|
lastModifierId |
yes |
long |
no |
|
|
lastModifierLogin |
yes |
symbolic |
no |
|
|
lastModifierName |
no |
string |
no |
|
|
variantLastModified |
yes |
datetime |
no |
|
|
variantLastModifierId |
yes |
long |
no |
|
|
variantLastModifierLogin |
yes |
symbolic |
no |
|
|
variantLastModifierName |
yes |
string |
no |
|
|
%partTypeName.mimeType |
yes |
string |
yes |
|
|
%partTypeName.size |
yes |
long |
yes |
|
|
%partTypeName.content |
no |
xml |
yes |
only works for part types for which the flag 'daisy html' is set to true, and additionally the actual part must have the mime type 'text/xml' |
|
versionCreationTime |
yes |
datetime |
yes |
|
|
versionCreatorId |
yes |
long |
yes |
|
|
versionCreatorLogin |
yes |
symbolic |
yes |
|
|
versionCreatorName |
yes |
string |
yes |
|
|
versionState |
yes |
symbolic |
yes |
'draft' or 'publish' |
|
totalSizeOfParts |
yes |
long |
yes |
sum of the size of all parts in document |
|
versionStateLastModified |
yes |
datetime |
yes |
|
|
lockType |
yes |
symbolic |
no |
'pessimistic' or 'warn' |
|
lockTimeAcquired |
yes |
datetime |
no |
|
|
lockDuration |
yes |
long |
no |
(in milliseconds) |
|
lockOwnerId |
yes |
long |
no |
|
|
lockOwnerLogin |
yes |
symbolic |
no |
|
|
lockOwnerName |
no |
string |
no |
|
|
collections |
yes |
symbolic |
no |
The collections (the names of the collections) the document belongs too. Behaves the same as a multi-value field with respect to applicable search conditions. |
|
collections.valueCount |
yes |
symbolic |
no |
The number of collections a document belongs too. |
|
$fieldTypeName |
yes |
yes |
datatype depends on field type |
|
|
$fieldTypeName.valueCount |
yes |
long |
yes |
Useful for multi-value fields. Searching for a value count of 0 does not work, use the "is null" condition instead. |
|
$fieldTypeName.documentId |
yes |
string |
yes |
These special field sub-identifiers are only supported on fields of the type "link". For link field types, the $fieldTypeName identifier checks on the document ID, while these identifiers can be used to check on the branch and language. |
|
$fieldTypeName.branch |
yes |
symbolic |
yes |
|
|
$fieldTypeName.branchId |
yes |
long |
yes |
|
|
$fieldTypeName.language |
yes |
symbolic |
yes |
|
|
$fieldTypeName.languageId |
yes |
long |
yes |
|
| $fieldTypeName
.namespace |
yes |
string |
yes |
|
|
#customFieldName |
yes |
string |
no |
|
|
score |
no |
double |
no |
The score of a document after doing a full text search. This score ranges from 0-1. When this identifier is used without the FullText() function it will just return 0. |
Strings (text) should be put between single quotes, the single quote is escaped by doubling it, for example:
'''t is mooi weer vandaag'
These consists of digits (0-9), the decimal separator is a dot (.).
Numeric literals can be put between single quotes like strings, but it is not required to do so.
Date format: 'YYYY-MM-DD'
Datetime format: 'YYYY-MM-DD HH:MM:SS'
When searching on fields of type "link", the link should be specified as:
'daisy:docid' (assumes branch main and language default) 'daisy:docid@branch' (assumes language default) 'daisy:docid@branch:lang' (branch can be left blank which defaults to main branch)
Branch and language can be specified either by name or ID.
So a search condition could be for example:
$someLinkField = 'daisy:35'
$fieldName has all (value1, value2, value3, ...)
Tests that the multi-value field has all the specified values (and possibly more).
$fieldName has exactly (value1, value2, value3, ...)
Tests that the multi-value field has all the specified values, and none more. The order is not important.
$fieldName has some (value1, value2, value3, ...) or $fieldName has any (value1, value2, value3, ...)
has some and has any are synonyms. They test that the multi-value field has at least one of the specified values.
$fieldName has none (value1, value2, value3, ...)
Tests that the multi-value field has none of the specified values.
In addition to these conditions, you can use is null and is not null to check if a document has a certain (multi-value) field. The special sub-identifier $fieldName.valueCount can be used to check the number of values a multi-value field has.
For searching on hierarchical fields, a special matchesPath condition is available. It takes as argument an expression in which the elements of the hierarchical path are separated by a slash. For example, a basic usage is:
$fieldName matchesPath('/A/B/C')
$fieldName matchesPath('A/B/C') -> the initial slash is optional
This would return all documents for which the hierarchical field has as value the path A/B/C.
The values should be entered using the correct literal syntax corresponding to the type of the field. For example, for link type fields, you would use:
matchesPath('daisy:10-FOO/daisy:11-FOO')
It is possible to use wildcards (placeholders) in the expression, namely * and **. One stars (*) matches one path part. Two stars (**) matches multiple path parts. Two stars can only be used at the very start or at the very end of the expression (not at both ends at the same time). Some examples to give an idea of what's possible:
$fieldName matchesPath('/A/*')
$fieldName matchesPath('/A/*/*')
$fieldName matchesPath('/A/*/B')
$fieldName matchesPath('/*/*/*') -> finds all hierarchical paths of length 3
$fieldName matchesPath('/*/*/**') -> finds all hierarchical paths of at least length 3
$fieldName matchesPath('/A/**') -> finds all paths of any lenght starting on A
thus e.g. A/B, A/B/C or A/B/C/D.
$fieldName matchesPath('**/A') -> finds all paths ending on A
The matchesPath condition can also be used to search on multi-value hierarchical fields, in which case it will evaluate to true if at least one of the values of the multi-value field matches the path expression.
The special multi-value conditions such as 'has all', 'has some', etc. can also be used. There is no special syntax to specify hierarchical path literals in the query language, but they can be entered by using the Path function. For example:
$fieldName has all ( Path('/A/B/C'), Path('/X/Y/Z') )
The hierarchical paths specified using the Path function do not support wildcards.
When using the equals operator (=) or other binary operators with hierarchical fields, it will evaluate to true as long as there is one element in the hierarchy path which has the given value. For example, $MyField = 'b' will match a field whose value is "/a/b/c". This is similar to the behaviour of this operator for multivalue fields.
When an expression returns a link as value (most often this is in the form a link field identifier, e.g. $SomeLinkField), then it is possible to 'walk through' this link to access properties of the linked-to document. This is known as link dereferencing.
The link dereferencing operator is written as "=>". Notations for dereferencing in other languages are sometimes dot (.) or "->", however since dash is a valid character in identifiers in Daisy, and dot is already used to access 'sub-field identifiers' (like #SomePart.mimeType), these could not be used.
[link expression]=>[identifier]
A practical example:
select name, $SomeLinkField=>name where $SomeLinkField=>name like 'A%' order by $SomeLinkField=>name
As shown in this example, the link dereferencing operator works in the select, where and order by parts of the query.
Link dereferencing can work multiple levels deep, e.g.
$SomeLink=>$SomeOtherLink=>name
If documents are linked together with the same type of field, this could of course be something like:
$SomeLink=>$SomeLink=>$SomeLink=>name
When dereferencing a link in the where-clause of the query, but one does not have access to the dereferenced document, then the evaluation of the where clause will be considered as 'false', e.g. the row will be excluded from the result set, since without access to the document it is not possible to know if it would evaluate to 'true'. Accessing non-accessible values in the select or order-by clauses will return a 'null' value.
InCollection('collectionname' [, collectionname, collectionname])
Searches documents contained in at least one of the specified collections. To search documents that occur in multiple collections (thus in the intersection of those collections), use the function InCollection multiple times with AND in between: InCollection('collection1') and InCollection('collection2'). This also works for OR but in that case it is more efficient to give the collections as arguments to one InCollection call.
Instead of the InCollection condition, you can use the collections identifier in combination with the multi-value field search conditions such as has some, has all or has none for more powerful search possibilities. The InCollection condition predates the existence of multi-value fields, but remains supported.
LinksTo(documentId, inLastVersion, inLiveVersion [, linktypes]) LinksFrom(documentId, inLastVersion, inLiveVersion [, linktypes]) LinksToVariant(documentId, branch, language, inLastVersion, inLiveVersion [, linktypes]) LinksFromVariant(documentId, branch, language, inLastVersion, inLiveVersion [, linktypes])
Searches documents which link to or from the specified document (or document variant). The other two parameters, inLastVersion and inLiveVersion, are interpreted as booleans: 0 is false, any other (numeric) value is true.
If inLastVersion is true, only documents whose last version link to the specified document are included.
If inLiveVersion is true, only documents whose live version link to the specified document are included.
If both parameters are true or both are false, all documents are returned for which either the last or live version link to the specified document.
The optional parameter linktypes is a string containing a comma or whitespace separated list of the types of links to include, which is one or more of: inline, out_of_line, image, include or other.
IsLinked() IsNotLinked()
IsLinked() evaluates to true for any document which is linked by other documents, IsNotLinked() evaluates to true for any document that is not linked from any other document (thus not reachable by following links in documents, the navigation tree, or linked by the content of other parts on which link extraction is performed).
HasPart('partTypeName')
Searches documents which have a part of the specified part type. This search is version-dependent.
HasPartWithMimeType('some mimetype')
Searches documents having a part with the given mime type. This search is version-dependent. This uses a 'like' condition, thus the % wildcard can be used in the parameter. For example, to search all images: HasPartWithMimeType('image/%')
DoestNotHaveVariant(branch, language)
Searches documents that do not have the specified variant. See also the page on variants for more information.
The following functions can be used in value expressions.
Syntax:
Concat(string1, string2, ...) : string
Concatenates multiple strings.
Syntax:
Length(string) : long
Returns the length of its string argument.
Syntax:
Left(string, length) : string
Returns 'length' leftmost characters from the string. If 'length' is larger than the string, the whole string is returned. If length is 0, an empty string is returned.
Syntax:
Right(string, length) : string
Returns 'length' rightmost characters from the string. If 'length' is larger than the string, the whole string is returned. If length is 0, an empty string is returned.
Syntax:
Substring(string, position, length) : string
Returns a string formed by taking 'length' characters from the string at the specified position. The 'length' argument is optional, if not specified, it will go till the end of the input string. The 'position' argument starts at 1 for the first character.
Syntax:
UpperCase(string) : string
Syntax:
LowerCase(string) : string
Syntax:
CurrentDate(spec?) : date
Returns the current date.
The optional spec argument allows to specify an offset to the current date. It is a string with the following syntax:
+/- <num> (days|weeks|months|years)
For example:
CurrentDate('- 7 days')
Syntax:
CurrentDateTime(spec?) : date
Returns the current datetime.
The optional spec argument allows to specify an offset to the current datetime. It is a string with the following syntax:
+/- <num> (seconds|minutes|hours|days|weeks|months|years)
For example:
CurrentDateTime('- 3 hours')
These functions all take a date or datetime as argument, and return a long value.
DayOfWeek returns a value in the range 1-7, where 1 is Sunday.
For the Week function, the first week of the year is the first week containing a Sunday.
These functions take one string argument consisting of 3 words, each one taken from the following groups:
start this week
end last month
next year
So for example:
RelativeDate('start this month')
returns a date set to the first day of the current month.
The basic mathematical operations.
Returns a pseudo-random double value greater than or equal to 0 and less than or equal to 1.
Syntax:
Mod(number1, number2)
These functions all take one number as argument.
Syntax:
Round(number, scale)
Rounds the given number to have at most scale digits to the right of the decimal point.
Syntax:
ContextDoc(expression [, position])
In some cases a context document is available when performing a query. For example, when a query is embedded inside a document, that document serves as the context document. It is possible to evaluate expressions on this context document by use of this ContextDoc function. The optional position argument allows to climb up in the stack of context documents (which is available in publisher requests).
Examples:
ContextDoc(id) -- the id of the context document ContextDoc($someField) -- the value of a field of the context document ContextDoc(Concat(name, ' ', $someField))
Returns the ID of the current user (= the user executing the query).
UserId() -> function takes no arguments
Converts its argument to a hierarchical path literal. This function is useful because there is no special query language syntax for entering hierarchical path literals. The argument should be a slash-separated hierarchical path, e.g.:
Path('/A/B/C')
Path('A/B/C') -> the initial slash is optional
For full text queries, the where part takes a special form. There are two possibilities: either only a full text search is performed, or the fulltext query is further restricted using 'normal' conditions. The two possible forms are:
... where FullText('word')
or
... where FullText('word') AND <other conditions>
for example:
... where FullText('word') AND $myfield = 'abc' AND InCollection('mycollection')
Note that the combining operator between the FullText condition and other conditions is always AND, thus the result of the full text query is further refined. The further conditions can of course be of any complexity, and can thus again contain OR.
The FullText clause needs to be the first after the word "where", it cannot appear at arbitrary positions in the where-clause.
If no order by clause is included when doing a full text query, the results are ordered according to the score assigned by the fulltext search engine.
The parameter of the FullText(...) function is a query which is passed on to the full text engine, in our case Lucene. See here.
The FullText() function can have 3 additional parameters which indicate if the search should be performed on the document name, document content or field content. By default, all three are searched. These parameters should be numeric: 0 indicates false, and any other value true.
For example:
FullText('word', 1, 0, 0)
Searches for 'word', but only in the document name.
Additionally, you can specify a branch and language as parameters to the FullText function, to specify that only documents of that branch/language should be searched. Thus the full syntax of the FullText function is:
FullText(lucene query, searchInName, searchInContent, searchInFields, branch, language)
Specifying the branch and language as part of the FullText function is more more efficient then using:
FullText(lucene query) and branch = 'my_branch' and language = 'my_language'
If you wish to have contextualized text fragments of the sought after terms. This function should be used in the select part of the query. By default this function will only return the first text fragment found. The fragments are returned as xml which has the following structure :
<html>
<body>
<div class="fulltext-fragment">
... full text fragment ... <span class="fulltext-hit">the term</span> ... more text ...
</div>
...
</body>
</html>
Usage of the function when you only wish to receive one fragment (default) :
select FullTextFragment() where FullText('word')
If you wish to have more text fragments you can specify the amount of fragments as a function parameter.
select FullTextFragment(5) where FullText('word')
This function will only return fragments from the content of the document. This means that context from document name or fields will not appear in the result.
The order by part is optional.
The order by part contains a comma separated listing of value expressions, each of these optionally followed by ASC or DESC to indicate ascending (the default) or descending order. The expressions listed here have no connection with those in the select-part, i.e. it does not have to be subset of those.
"null" values are put at the end (when using ASC order).
This can be used to limit the number of results returned from a query. This part is optional.
The option part allows to specify options that influence the execution of the query. The options are defined as:
option_name = 'option_value' (, option_name = 'option_value')*
Supported options:
|
name |
value |
default |
|---|---|---|
|
include_retired |
true/false |
false |
|
search_last_version |
true/false |
false |
|
style_hint |
(anything) |
(empty) |
|
annotate_link_fields |
true/false |
true |
|
chunk_offset |
an integer (start-index is 1) |
N/A |
|
chunk_length |
an integer |
N/A |
include_retired is used to indicate that retired documents should be included in the result (by default they are not).
search_last_version is used to indicate that the last version of metadata should be searched and retrieved, instead of the live version. When using this, documents that do not have a live version will also be included in the query result (otherwise they are not included). Full text searches are always performed on the live data, regardless of whether this option is specified.
style_hint is used to supply a hint to the publishing layer for how the result of the query should be styled. The repository server does not do anything more then add the value of this option as an attribute on the generated XML query results (<searchResult styleHint="my hint" ...). It is then up to the publishing layer to pick this up and do something useful with it. For how this is handled in the DaisyWiki, see the page on Query Styling.
annotate_link_fields indicates whether selected fields of type "link" should be annotated with the document name of the document pointed to by the link. If you don't need this, you can disable this to gain some performance.
chunk_offset and chunk_length allow to retrieve a subset of the query results. This is useful for paged display of the query results.
select id, name where true
select id, name where name like 'p%' order by creationTime desc limit 10
select id, name, totalSizeOfParts where true order by totalSizeOfParts desc limit 10
select id, name, versionState, versionCreationTime where versionState = 'draft' option search_last_version = 'true'
select id, name, lockType, lockOwnerName, lockTimeAcquired, lockDuration where lockType is not null
select id, name where HasPartWithMimeType('image/%')
select name where true order by Random()
select name, Length(name) where true order by Length(name) DESC