2.1. Overview
2.1. Overview
A JSONPath expression is a string that, when applied to a JSON value (the query argument), selects zero or more nodes of the argument and outputs these nodes as a nodelist.
A query MUST be encoded using UTF-8. The grammar for queries given in this document assumes that its UTF-8 form is first decoded into Unicode scalar values as described in [RFC3629]; implementation approaches that lead to an equivalent result are possible.
A string to be used as a JSONPath query needs to be well-formed and valid. A string is a well-formed JSONPath query if it conforms to the ABNF syntax in this document. A well-formed JSONPath query is valid if it also fulfills both semantic requirements posed by this document, which are as follows:
-
Integer numbers in the JSONPath query that are relevant to the JSONPath processing (e.g., index values and steps) MUST be within the range of exact integer values defined in Internet JSON (I-JSON) (see Section 2.2 of [RFC7493]), namely within the interval [-(2^53)+1, (2^53)-1].
-
Uses of function extensions MUST be well-typed, as described in Section 2.4.3.
A JSONPath implementation MUST raise an error for any query that is not well-formed and valid. The well-formedness and the validity of JSONPath queries are independent of the JSON value the query is applied to. No further errors relating to the well-formedness and the validity of a JSONPath query can be raised during application of the query to a value. This clearly separates well-formedness/ validity errors in the query from mismatches that may actually stem from flaws in the data.
Mismatches between the structure expected by a valid query and the structure found in the data can lead to empty query results, which may be unexpected and indicate bugs in either. JSONPath implementations might therefore want to provide diagnostics to the application developer that aid in finding the cause of empty results.
Obviously, an implementation can still fail when executing a JSONPath query, e.g., because of resource depletion, but this is not modeled in this document. However, the implementation MUST NOT silently malfunction. Specifically, if a valid JSONPath query is evaluated against a structured value whose size is too large to process the query correctly (for instance, requiring the processing of numbers that fall outside the range of exact values), the implementation MUST provide an indication of overflow.
(Readers familiar with the HTTP error model may be reminded of 400 type errors when pondering well-formedness and validity, and they may recognize resource depletion and related errors as comparable to 500 type errors.)
2.1.1. Syntax
Syntactically, a JSONPath query consists of a root identifier ($), which stands for a nodelist that contains the root node of the query argument, followed by a possibly empty sequence of segments.
jsonpath-query = root-identifier segments
segments = *(S segment)
B = %x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ; Carriage return
S = *B ; optional blank space
The syntax and semantics of segments are defined in Section 2.5.
2.1.2. Semantics
In this document, the semantics of a JSONPath query define the required results and do not prescribe the internal workings of an implementation. This document may describe semantics in a procedural step-by-step fashion; however, such descriptions are normative only in the sense that any implementation MUST produce an identical result but not in the sense that implementers are required to use the same algorithms.
The semantics are that a valid query is executed against a value (the query argument) and produces a nodelist (i.e., a list of zero or more nodes of the value).
The query is a root identifier followed by a sequence of zero or more segments, each of which is applied to the result of the previous root identifier or segment and provides input to the next segment. These results and inputs take the form of nodelists.
The nodelist resulting from the root identifier contains a single node (the query argument). The nodelist resulting from the last segment is presented as the result of the query. Depending on the specific API, it might be presented as an array of the JSON values at the nodes, an array of Normalized Paths referencing the nodes, or both -- or some other representation as desired by the implementation. Note: An empty nodelist is a valid query result.
A segment operates on each of the nodes in its input nodelist in turn, and the resultant nodelists are concatenated in the order of the input nodelist they were derived from to produce the result of the segment. A node may be selected more than once and appears that number of times in the nodelist. Duplicate nodes are not removed.
A syntactically valid segment MUST NOT produce errors when executing the query. This means that some operations that might be considered erroneous, such as using an index lying outside the range of an array, simply result in fewer nodes being selected. (Additional discussion of this property can be found in the introduction of Section 2.1.)
As a consequence of this approach, if any of the segments produces an empty nodelist, then the whole query produces an empty nodelist.
If the semantics of a query give an implementation a choice of producing multiple possible orderings, a particular implementation may produce distinct orderings in successive runs of the query.
2.1.3. Example
Consider this example. With the query argument {"a":[{"b":0},{"b":1},{"c":2}]}, the query $.a[*].b selects the following list of nodes (denoted here by their values): 0, 1.
The query consists of $ followed by three segments: .a, [*], and .b.
First, $ produces a nodelist consisting of just the query argument.
Next, .a selects from any object input node and selects the node of any member value of the input node corresponding to the member name "a". The result is again a list containing a single node: [{"b":0},{"b":1},{"c":2}].
Next, [*] selects all the elements from the input array node. The result is a list of three nodes: {"b":0}, {"b":1}, and {"c":2}.
Finally, .b selects from any object input node with a member name b and selects the node of the member value of the input node corresponding to that name. The result is a list containing 0, 1. This is the concatenation of three lists: two of length one containing 0, 1, respectively, and one of length zero.