CEP 29 - The MatchSpec query language
| Title | The MatchSpec query language |
| Status | Accepted |
| Author(s) | Jaime Rodríguez-Guerra <jaime.rogue@gmail.com>, Cheng H. Lee <clee@anaconda.com>, Bas Zalmstra <bas@prefix.dev> |
| Created | June 4, 2024 |
| Updated | Mar 4, 2026 |
| Discussion | https://github.com/conda/ceps/pull/82 |
| Implementation | https://github.com/conda/conda/blob/4.3.34/conda/resolve.py#L33, https://github.com/conda/conda/blob/25.7.0/conda/models/match_spec.py#L85, https://docs.rs/rattler_conda_types/latest/rattler_conda_types/struct.MatchSpec.html, https://github.com/mamba-org/mamba/blob/2.3.2/libmamba/src/specs/match_spec.cpp, https://github.com/openSUSE/libsolv/blob/0.7.35/src/conda.c#L567 |
| Requires | CEP 33, CEP 34, CEP 36 |
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.
Abstract
This CEP standardizes the syntax for the MatchSpec query language.
Motivation
This CEP is merely informative: it documents the details of an existing query language.
Nomenclature
The MatchSpec query syntax is a mini-language designed to select individual entries in a collection of package records. It is sometimes referred to simply as spec or conda spec.
Specification
MatchSpec strings provide a compact method to query collections of conda artifacts (e.g. in a conda channel, or in an installed environment) by matching str and int fields on package records (see CEP 34: ./info/index.json and CEP 36: Package Record Metadata). Note that fields using other types, like list[str] (depends, constrains, etc.), cannot be matched by this syntax.
Syntax
The MatchSpec syntax can be thought of as a structured collection of matching expressions, each targeting a package record field. A matching expression is defined as a string that MUST follow these rules:
- For expressions targeting the version field, version specifier rules MUST be applied.
- For expressions targeting the channel field, channel specifier rules MUST be applied.
- For expressions targeting any other str field, string matching conventions MUST be used.
- For expressions targeting int fields, the target value MUST be converted to str and handled as such.
The full MatchSpec syntax takes this approximate form, with parentheses denoting optional fields:
(channel(/subdir):(namespace):)name(version(build))([key1='value 1'(, )key2=value2])
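The following is a minimal Python sketch of how a MatchSpec string could be split into the regions of the approximate form above: the optional channel(/subdir):(namespace): prefix, the positional (name, version, build) text, and the bracketed keyword text. It makes three assumptions:

- The keyword block, if present, starts at the first [ and ends the string.
- Only the last two colons delimit the prefix, which keeps URL channels such as https://... intact.
- The last path component of the channel is a subdir only if it is a known identifier.

The function name split_regions and the KNOWN_SUBDIRS set are hypothetical, and the set is deliberately incomplete.

```python
from __future__ import annotations

# Hypothetical, deliberately incomplete set of known subdir identifiers.
KNOWN_SUBDIRS = {"noarch", "linux-64", "linux-aarch64", "linux-ppc64le",
                 "osx-64", "osx-arm64", "win-64"}


def split_regions(spec: str) -> dict[str, str | None]:
    """Split a MatchSpec string into channel, subdir, namespace, positional and keyword text."""
    spec = spec.strip()
    keywords = None
    start = spec.find("[")
    if start != -1:
        if not spec.endswith("]"):
            raise ValueError("keyword expressions must be closed with ']'")
        spec, keywords = spec[:start], spec[start + 1 : -1]

    channel = subdir = namespace = None
    # Names, versions and build strings never contain ':', so the last colon ends the prefix.
    prefix, colon, positional = spec.rpartition(":")
    if colon:
        # prefix now looks like "channel(/subdir):(namespace)".
        channel_part, colon2, namespace = prefix.rpartition(":")
        if not colon2:
            raise ValueError("expected 'channel:namespace:name', e.g. 'conda-forge::foo'")
        base, slash, last = channel_part.rpartition("/")
        if slash and last in KNOWN_SUBDIRS:
            channel, subdir = base, last
        else:
            # Unknown last component: treat it as part of the channel URL.
            channel = channel_part or None
        namespace = namespace or None  # parsed, but MUST be ignored by consumers
    return {"channel": channel, "subdir": subdir, "namespace": namespace,
            "positional": positional.strip(), "keywords": keywords}
```

For example, split_regions("conda-forge/linux-64::foo>=1.0") returns the following:

- channel "conda-forge"
- subdir "linux-64"
- positional text "foo>=1.0"

The positional and keyword text still have to be parsed further according to the rules below.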
More precisely, the following rules MUST apply:
- A MatchSpec string MAY exhibit two forms of expressions: positional and keyword-based.
- Six positional expressions are recognized. From left to right, they can be arranged in two groups: (channel, subdir, namespace) and (name, version, build).
  - The first group is optional. If present, it MUST be separated from the second group by a single colon character, :. Within this group, there are four items:
    - channel: str. Optional.
    - subdir: str. Optional. It requires channel to be defined. It MUST be separated from channel by a single forward slash, /. It MUST use a known subdir identifier; otherwise it could be interpreted as the last component of a channel URL.
    - A colon separator, :, required if channel or namespace is defined.
    - namespace: str. Optional. This expression field MUST be parsed and ignored.
  - The second group contains three expressions. They MUST be separated by either spaces or a single = character. Separator types MUST NOT be mixed. See the version expression parsing notes for additional details on the interaction between the = symbol as a separator and as an operator. Leading and trailing spaces MUST be ignored.
    - name: str. Required. Empty names MUST be represented as *.
    - version: str | VersionSpec. Optional.
    - build: str. Optional. It requires version to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
  - Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target record field (key) and the expression string (value) with a single = character.
  - The value MUST be quoted with single ' or double " quotes if it contains spaces, commas, equal signs, or square brackets. Quoting rules follow Python's string literals.
  - Keyword expression pairs MUST be separated by a single comma character, ,. Historically, spaces have also been allowed as separators, but they SHOULD NOT be used.
  - Spaces around comma separators MAY be allowed and MUST be ignored.
- When both positional and keyword expressions are used, the keyword expressions override the positional values, except for name: its keyword expression MUST be ignored.
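The keyword rules above can be sketched in Python as shown below. parse_keywords and merge_fields are hypothetical helpers, not part of any conda API.

- Quoted values are decoded with ast.literal_eval, following the rule that quoting follows Python's string literals.
- The quote scanner assumes that values contain no escaped quote characters.
- The legacy space separator between pairs is not supported.

```python
from __future__ import annotations

import ast


def parse_keywords(content: str) -> dict[str, str]:
    """Parse the text between '[' and ']' into a {key: value} mapping."""
    pairs, buf, quote = [], [], None
    for ch in content:
        if quote:                  # inside a quoted value: copy verbatim
            buf.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":          # opening quote
            quote = ch
            buf.append(ch)
        elif ch == ",":            # an unquoted comma ends a key-value pair
            pairs.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if quote:
        raise ValueError("unterminated quote")
    pairs.append("".join(buf))

    result: dict[str, str] = {}
    for pair in pairs:
        pair = pair.strip()        # spaces around commas are ignored
        if not pair:
            continue
        key, sep, value = pair.partition("=")  # keys never contain '='
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        key, value = key.strip(), value.strip()
        if value[:1] in ("'", '"'):
            value = ast.literal_eval(value)    # decode a Python string literal
        result[key] = value
    return result


def merge_fields(positional: dict[str, str], keyword: dict[str, str]) -> dict[str, str]:
    """Keyword values override positional ones, except for 'name'."""
    merged = dict(positional)
    merged.update({k: v for k, v in keyword.items() if k != "name"})
    return merged
```

For example, parse_keywords("key1='value 1', key2=value2") yields {"key1": "value 1", "key2": "value2"}.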
Matching conventions
String matching
Matching expressions that target string fields MUST be interpreted using these case-insensitive rules:
- If the expression begins with ^ and ends with $, it MUST be interpreted as a regular expression (regex). The expression matches if the regex search returns a hit; e.g. with Python: re.search(expression, field, re.IGNORECASE) is not None. Advanced expressions like lookaround and backreferences SHOULD NOT be allowed.
- If the expression contains one or more asterisks (*), it is considered a glob expression and MUST be converted into a regular expression and interpreted as such. To convert a glob expression into a regex string:
  - Escape all characters that are special in regular expressions (e.g. using Python's re.escape).
  - Replace escaped asterisks (\*) with .*.
  - Wrap the resulting string with ^ and $.
- Otherwise, matches MUST be tested with exact, case-insensitive string equality.
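Here is a short Python sketch of these string matching rules, applied in the order listed: regex, then glob, then equality. The helper names are hypothetical. Case-insensitivity is applied with re.IGNORECASE and str.casefold. The sketch does not try to reject lookarounds or backreferences.

```python
import re


def glob_to_regex(expression: str) -> str:
    """Convert a glob (only '*' is special) into an anchored regex string."""
    escaped = re.escape(expression)                   # escapes regex metacharacters, including '*'
    return "^" + escaped.replace(r"\*", ".*") + "$"   # turn each escaped '*' into '.*'


def match_string(expression: str, field: str) -> bool:
    """Case-insensitive MatchSpec string matching."""
    if expression.startswith("^") and expression.endswith("$"):
        # Regex expression: lookarounds/backreferences are NOT rejected in this sketch.
        return re.search(expression, field, re.IGNORECASE) is not None
    if "*" in expression:
        return re.search(glob_to_regex(expression), field, re.IGNORECASE) is not None
    return expression.casefold() == field.casefold()
```

For example, match_string("py*", "python") is true, but match_string("py*", "cpython") is false because the converted glob is anchored with ^ and $.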
Channel matching
Channel fields MUST be matched with the same rules as strings.
The value of a channel expression MUST allow both names and full URLs. When a name is used (as per CEP 26), it MUST be promoted to its corresponding fully qualified URL before comparison.
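The following is a minimal Python sketch of promoting channel names before comparison. It rests on these assumptions:

- https://conda.anaconda.org is used as the channel alias for expanding bare names. Real clients make this configurable, and any special-cased names from CEP 26 are not modeled.
- It performs only exact, case-insensitive comparison, not the full string matching rules.

The function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: the alias used to expand bare channel names into URLs.
DEFAULT_CHANNEL_ALIAS = "https://conda.anaconda.org"


def promote_channel(value: str, channel_alias: str = DEFAULT_CHANNEL_ALIAS) -> str:
    """Return a fully qualified channel URL for a channel name or URL."""
    if urlparse(value).scheme:
        # Already a URL (https://, file://, ...): only normalize the trailing slash.
        return value.rstrip("/")
    return f"{channel_alias.rstrip('/')}/{value.strip('/')}"


def channel_matches(expression: str, record_channel: str,
                    channel_alias: str = DEFAULT_CHANNEL_ALIAS) -> bool:
    """Exact, case-insensitive comparison after promotion (globs and regexes omitted)."""
    return (promote_channel(expression, channel_alias).lower()
            == promote_channel(record_channel, channel_alias).lower())
```

With these assumptions, the expression conda-forge matches a record whose channel is https://conda.anaconda.org/conda-forge.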
Version matching
Expressions targeting the version field MUST be handled with additional rules. These expressions are referred to as version specifiers.
A version specifier MUST consist of one or more version clauses, separated by logical operators that MUST follow these rules:
- | denotes the logical OR.
- , denotes the logical AND.
- , (AND) has higher precedence than | (OR).
- Parentheses () MAY be used to modify precedence.
A version clause consists of one of the following:
- A single version literal (as defined in CEP 33).
- An operator plus a single version literal.
- A single version literal containing one or more globs (*).
- A single glob (*).
For example, given the string python>=3,<4, the version specifier is the full expression >=3,<4. It consists of two clauses (>=3 and <4) separated by , (AND). Each clause contains a version literal (3 and 4, respectively).
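The precedence rules can be illustrated with a small recursive-descent parser in Python. This is a sketch, not a reference implementation:

- It treats every run of characters between (, ), | and , as a clause, without validating it.
- It drops all whitespace.
- It returns nested ("or", [...]) and ("and", [...]) tuples whose leaves are clause strings.

The function names are hypothetical.

```python
from __future__ import annotations

import re


def tokenize_version_spec(spec: str) -> list[str]:
    """Split a version specifier into clauses and the symbols ( ) | ,"""
    spec = "".join(spec.split())  # spaces carry no meaning and are dropped
    return [tok for tok in re.split(r"([(),|])", spec) if tok]


def parse_version_spec(spec: str):
    tokens = tokenize_version_spec(spec)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        items = [parse_and()]
        while peek() == "|":      # '|' has the lowest precedence
            advance()
            items.append(parse_and())
        return items[0] if len(items) == 1 else ("or", items)

    def parse_and():
        items = [parse_atom()]
        while peek() == ",":      # ',' binds tighter than '|'
            advance()
            items.append(parse_atom())
        return items[0] if len(items) == 1 else ("and", items)

    def parse_atom():
        tok = peek()
        if tok == "(":            # parentheses override precedence
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if tok is None or tok in {")", "|", ","}:
            raise ValueError(f"expected a version clause, got {tok!r}")
        advance()
        return tok

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {peek()!r}")
    return tree
```

For example, 1.0|>=2,<3 parses as ("or", ["1.0", ("and", [">=2", "<3"])]). Adding parentheses, as in (1.0|>=2),<3, moves the OR below the AND.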
Each version clause MUST be described by one of these types:
- String matching rules apply when:
  - The value is a regex (surrounded by ^ and $).
  - The value contains a non-trailing glob (*).
- Exact equality, expressed as a version literal prefixed by the double-equals string ==, MUST be interpreted as normalized version literal equality.
- Fuzzy equality, expressed as either a version literal prefixed by one = symbol, or a version literal trailed by .* or *. After removing the leading = character and appending a .* suffix, the comparison is only true when all the version segments before the glob are equal.
- Exclusion, expressed as a version literal or a version literal augmented with globs, prefixed by the string !=, MUST be interpreted as a negated fuzzy equality.
- Ordered comparison, with the implied ordering described in CEP 33:
  - Exclusive ordered comparison, expressed as a version literal prefixed by < or >, MUST be interpreted as "smaller than" and "greater than", respectively, as per their position in the version ordering scheme.
  - Inclusive ordered comparison, expressed as a version literal prefixed by <= or >=, MUST be interpreted as "smaller than or equal to" and "greater than or equal to", respectively; that is, they also match on normalized version literal equality.
- Semver-like comparisons, expressed as a version literal prefixed by the ~= string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal without its last segment (e.g. ~=0.5.3 expands to >=0.5.3,0.5.*). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
Version expressions SHOULD NOT contain spaces between operators; if present, such spaces MUST be removed and ignored.
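The following Python sketch shows how a single clause could be evaluated against a version string. It assumes a hypothetical simplified version model: dot-separated non-negative integers, with trailing zeros ignored for equality. The full CEP 33 ordering is not implemented, including epochs, letters and local versions. Exclusion (!=) is implemented as negated fuzzy equality, as stated in the rules above. The function names are hypothetical.

```python
from __future__ import annotations

import operator
import re

# Two-character operators must be tried before their one-character prefixes.
OPERATORS = ("==", "!=", "<=", ">=", "~=", "<", ">", "=")
ORDERED = {"": operator.eq, "==": operator.eq, "<": operator.lt,
           ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def _parts(literal: str) -> list[int]:
    # Hypothetical simplified version model: dot-separated integers only.
    return [int(p) for p in literal.split(".")]


def _key(literal: str) -> tuple[int, ...]:
    parts = _parts(literal)
    while len(parts) > 1 and parts[-1] == 0:  # normalize: 1 == 1.0 == 1.0.0
        parts.pop()
    return tuple(parts)


def _startswith(version: str, prefix: str) -> bool:
    v, p = _parts(version), _parts(prefix)
    v += [0] * (len(p) - len(v))  # pad so that "1" matches "1.0.*"
    return v[: len(p)] == p


def match_clause(clause: str, version: str) -> bool:
    clause = "".join(clause.split())  # spaces around operators are ignored
    if clause == "*":
        return True
    if clause.startswith("^") and clause.endswith("$"):  # regex: string matching
        return re.search(clause, version, re.IGNORECASE) is not None
    op = next((o for o in OPERATORS if clause.startswith(o)), "")
    literal = clause[len(op):]
    stem = literal.rstrip("*").rstrip(".") if literal.endswith("*") else literal
    if "*" in stem:  # non-trailing glob: string matching, optionally negated
        if op not in ("", "!="):
            raise ValueError("globs cannot be mixed with other operators")
        regex = "^" + re.escape(literal).replace(r"\*", ".*") + "$"
        hit = re.search(regex, version, re.IGNORECASE) is not None
        return not hit if op == "!=" else hit
    if literal.endswith("*") or op in ("=", "!="):  # fuzzy equality and exclusion
        if op not in ("", "=", "==", "!="):
            raise ValueError("globs cannot be mixed with ordering operators")
        hit = _startswith(version, stem)
        return not hit if op == "!=" else hit
    v, lit = _key(version), _key(literal)
    if op == "~=":  # ~=0.5.3 behaves like >=0.5.3,0.5.*
        return v >= lit and _startswith(version, literal.rsplit(".", 1)[0])
    return ORDERED[op](v, lit)  # exact equality or ordered comparison
```

Fuzzy matching compares whole segments rather than characters. For example, 1.8.* matches 1.8 and 1.8.2 but not 1.80.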
Version expression parsing
For backwards compatibility, the (name, version, build) group in the MatchSpec syntax allows two types of separators: spaces and a single = character. This affects how certain version expressions are parsed. Given a version literal denoted as version-literal (i.e. no operators or asterisks), the following rules MUST apply:
- If the string only contains two fields, which MUST be name and version:
  - {name}={version-literal} and {name} ={version-literal} (note the space) both denote fuzzy equality. They are equivalent to {name}[version={version-literal}.*] and {name} {version-literal}.*.
  - {name} {version-literal} denotes exact equality. It is equivalent to {name}[version={version-literal}] and {name}=={version-literal}.
- If the string contains three fields, name, version and build:
  - {name} {version-literal} {build}, {name} =={version-literal} {build}, {name}={version-literal}={build} and {name}=={version-literal}={build} all denote exact equality. They are equivalent to {name}[version={version-literal},build={build}].
  - {name} ={version-literal} {build} denotes fuzzy equality.
Some examples for name=pkg and version-literal=1.8, grouped so that all version specifiers within a group are equivalent:
Fuzzy equality, equivalent to pkg[version=1.8.*]:
pkg=1.8
pkg =1.8
pkg 1.8.*
pkg 1.8.* *
pkg=1.8.*
pkg=1.8.*=*
pkg =1.8.* *
pkg ==1.8.* *
pkg[version=1.8.*]
pkg[version="1.8.*"]
Exact equality, equivalent to pkg[version=1.8]:
pkg 1.8
pkg 1.8 *
pkg==1.8
pkg=1.8=*
pkg==1.8=*
pkg ==1.8 *
pkg[version=1.8]
pkg[version="1.8"]
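The separator rules can be sketched in Python so that they reproduce the equivalences listed above. The function split_positional is hypothetical.

- It expects only the (name, version, build) text, without a channel prefix or brackets.
- It returns the version in its keyword form: a trailing .* for fuzzy equality, or the bare literal for exact equality.
- Version specifiers that themselves contain spaces are not supported.

```python
from __future__ import annotations

import re

_NAME = re.compile(r"[^\s=<>!~]+")                  # the name ends at a space or an operator
_OP = re.compile(r"(==|!=|~=|<=|>=|<|>|=)?(.*)")     # optional leading operator, then the rest
_BUILD_SEP = re.compile(r"(?<![<>!~=])=(?!=)")       # an '=' that is not part of an operator


def split_positional(text: str) -> tuple[str, str | None, str | None]:
    """Split the (name, version, build) group and normalize the version."""
    text = text.strip()
    m = _NAME.match(text)
    if m is None:
        raise ValueError(f"missing package name in {text!r}")
    name, rest = m.group(), text[m.end():]
    if not rest:
        return name, None, None
    if any(c.isspace() for c in rest):
        # Space-separated form: "pkg 1.8", "pkg =1.8 *", "pkg ==1.8 *"
        fields = rest.split()
        if len(fields) > 2:
            raise ValueError(f"too many fields in {text!r}")
        op, literal = _OP.fullmatch(fields[0]).groups()
        build = fields[1] if len(fields) == 2 else None
        fuzzy = op == "="  # a single '=' always means fuzzy equality here
    else:
        # '='-separated form: "pkg=1.8", "pkg==1.8", "pkg=1.8=*", "pkg>=1.0"
        op, body = _OP.fullmatch(rest).groups()
        literal, *tail = _BUILD_SEP.split(body, maxsplit=1)
        build = tail[0] if tail else None
        fuzzy = op == "=" and build is None  # "pkg=1.8=*" is exact equality
    if fuzzy and not literal.endswith("*"):
        version = literal + ".*"
    elif op in (None, "=", "=="):
        version = literal
    else:
        version = op + literal  # other operators are kept as-is, e.g. ">=1.0"
    return name, version, build
```

The sketch tolerates mixed separators such as pkg=1.8 *, which the syntax rules forbid. A strict parser would reject them instead.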
Examples
>>> from conda.models.match_spec import MatchSpec
>>> str(MatchSpec('foo 1.0 py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('foo=1.0=py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('conda-forge::foo[version=1.0.*]'))
'conda-forge::foo=1.0'
>>> str(MatchSpec('conda-forge/linux-64::foo>=1.0'))
"conda-forge/linux-64::foo[version='>=1.0']"
>>> str(MatchSpec('*/linux-64::foo>=1.0'))
"foo[subdir=linux-64,version='>=1.0']"
Rationale
The initial MatchSpec form was a simpler name [version [build]] syntax (still in use in build recipes), with two optional keyword arguments (optional, target) between parentheses. The CLI also had its own string specification, which only supported name and version separated by = symbols (see conda 4.3.x's spec_from_line()). conda search allowed queries based on regexes only.
With conda 4.4.0, a new syntax was introduced to unify and consolidate all these different variations (see release notes for 4.4.0, conda/conda#4158, and conda/conda#5517). It also brought channel and subdir matching (fields before ::) and arbitrary record field matching inside square brackets.
The new syntax had to maintain backwards compatibility with the space- and =-separated forms too. This is the reason behind some surprising behaviors discussed in the specification above.
Advanced expressions like lookaround and backreferences are discouraged because they can cause performance issues, leading to denial-of-service (DoS) and other security problems.
Mixing * with other version-specific operators is disallowed as per the recommendations discussed in https://github.com/conda/ceps/pull/60.
Some legacy syntax that is still recognized by conda was intentionally left out of this CEP due to lack of usage in practice. Examples include:
- [optional]: bare keyword (no value) that is used internally by the classic solver to track droppable requirements.
- (optional=True): same as above, but with different syntax. conda allows parenthesized blocks after square brackets, with arbitrary contents.
- @feature: a way to require the now-deprecated features (e.g. @mkl).
- channel[subdir]::name: an unambiguous way to add subdir information to the positional channel field (instead of slash separation). The keyword argument is preferred for disambiguation. By dropping this syntax, we only assign one meaning to square brackets: key-value pairs.
Future work may introduce a stricter syntax subset that further reduces the ambiguity in the specification (e.g. disallowing space-separated name-version-build triplets).
Appendices
Appendix A: Canonical representation
The canonical string representation of a MatchSpec expression proposed by conda follows these rules:
- name is required and MUST be written as a positional expression. Empty names MUST be written as *.
- If version describes an exact equality expression, it MUST be written as a positional expression, prepended by ==. If version denotes fuzzy equality (e.g. 1.11.*), it MUST be written as a positional expression with the .* suffix left off and prepended by =. Otherwise, version MUST be included inside the key-value brackets.
- If version is an exact equality expression and build does not contain asterisks, build MUST be written as a positional expression, prepended by =. Otherwise, build MUST go inside the key-value brackets.
- If channel is defined and does not contain asterisks, a :: separator is used between channel and name. channel MAY be represented by its name or its full, subdir-less URL.
- If neither channel nor subdir contains asterisks, subdir is appended to channel with a / separator. Otherwise, subdir is included in the key-value brackets.
- Key-value pairs MUST be separated by commas, with no spaces between delimiters. Values that contain special characters (such as spaces, commas, equal signs, or version operators) MUST be quoted with single quotes.
- The namespace field MUST NOT be represented.
- Case-insensitive string fields MUST be lowercased.
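Here is a Python sketch of these rules, assuming that the fields are already parsed into plain strings. Several choices go beyond the rules above; they are assumptions and are labeled in the comments:

- A bare * channel is omitted.
- Only name is lowercased.
- Extra keys are emitted in sorted order.
- Bracket values are quoted only when they contain special characters. This reproduces the outputs in the Examples section.

The function names are hypothetical.

```python
from __future__ import annotations

_SPECIAL = set(" ,=<>|!~^$()*")  # characters that disqualify a plain version literal
_QUOTE_IF = set(" ,=<>|!~^$")    # assumption: characters that force single quotes


def _is_literal(text: str) -> bool:
    return bool(text) and not (set(text) & _SPECIAL)


def _kv(key: str, value: str) -> str:
    return f"{key}='{value}'" if set(value) & _QUOTE_IF else f"{key}={value}"


def canonical_str(name: str = "*", version: str | None = None, build: str | None = None,
                  channel: str | None = None, subdir: str | None = None, **extra: str) -> str:
    builder: list[str] = []
    brackets: list[str] = []
    channel_positional = bool(channel) and "*" not in channel
    if channel_positional:
        builder.append(channel)
    elif channel and channel != "*":     # assumption: a bare '*' channel is dropped
        brackets.append(_kv("channel", channel))
    if subdir:
        if channel_positional and "*" not in subdir:
            builder.append("/" + subdir)
        else:
            brackets.append(_kv("subdir", subdir))
    # Assumption: only the name is lowercased here.
    builder.append(("::" if builder else "") + (name.lower() or "*"))
    exact = False
    if version:
        bare = version[2:] if version.startswith("==") else version
        if _is_literal(bare):                                  # exact equality
            builder.append("==" + bare)
            exact = True
        elif bare.endswith(".*") and _is_literal(bare[:-2]):   # fuzzy equality
            builder.append("=" + bare[:-2])
        else:
            brackets.append(_kv("version", version))
    if build:
        if exact and "*" not in build:
            builder.append("=" + build)
        else:
            brackets.append(_kv("build", build))
    for key in sorted(extra):            # assumption: extra keys in sorted order
        brackets.append(_kv(key, extra[key]))
    if brackets:
        builder.append("[" + ",".join(brackets) + "]")
    return "".join(builder)
```

For example, canonical_str("foo", version=">=1.0", channel="*", subdir="linux-64") returns "foo[subdir=linux-64,version='>=1.0']".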
Appendix B: Search vs solver MatchSpec
MatchSpec strings can be used under two different contexts:
- Search queries: To obtain all the artifacts matching the query against a collection of packages. Results may include more than one entry per package name.
- Solver requests: To obtain the subset of packages in an index that satisfy the request and their dependency metadata. Results must only include one entry per package name.
In contrast with search queries, only some MatchSpec fields make sense for solver requests. The most common ones are name, version, build, and channel.
Appendix C: Fully specified expressions
To uniquely identify a single package record, a MatchSpec expression can be constructed in two ways:
- By passing exact values to the fields channel (preferably by URL), subdir, name, version, and build.
- By matching its checksum directly: *[md5=12345678901234567890123456789012] or *[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e].
Note that an artifact URL may be parsed into a fully specified MatchSpec. For example, the following URL:
https://conda.anaconda.org/conda-forge/linux-64/python-3.11.10-h123456_0.conda
[----------channel--------------------|-subdir-|-name-|version|-build---]
becomes conda-forge/linux-64::python==3.11.10[build=h123456_0].
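Here is a short Python sketch of this URL decomposition. It relies on the conda filename convention {name}-{version}-{build}, followed by a .conda or .tar.bz2 extension. Version and build strings cannot contain dashes, so the last two dashes delimit them. The helpers are hypothetical. They keep the full channel URL, as preferred above, rather than shortening it to a channel name.

```python
from __future__ import annotations


def fields_from_url(url: str) -> dict[str, str]:
    """Split an artifact URL into channel, subdir, name, version and build."""
    channel, subdir, filename = url.rsplit("/", 2)
    for ext in (".conda", ".tar.bz2"):
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"not a conda artifact filename: {filename!r}")
    # Names may contain '-', but versions and builds may not: split from the right.
    name, version, build = stem.rsplit("-", 2)
    return {"channel": channel, "subdir": subdir, "name": name,
            "version": version, "build": build}


def url_to_matchspec(url: str) -> str:
    f = fields_from_url(url)
    return f"{f['channel']}/{f['subdir']}::{f['name']}=={f['version']}[build={f['build']}]"
```

Splitting from the right also handles dashed package names. For example, python-dateutil-2.8.2-pyhd8ed1ab_0.tar.bz2 yields the name python-dateutil.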
References
- conda.models.match_spec.MatchSpec
- rattler_conda_types::match_spec
- Package match specifications at conda-build docs
- Comparison of MatchSpec implementation in conda vs a LARK grammar
- Comparison of MatchSpec implementations in conda, rattler and mamba
Copyright
All CEPs are explicitly CC0 1.0 Universal.