CEP 29 - The MatchSpec query language

Title: The MatchSpec query language
Status: Accepted
Author(s): Jaime Rodríguez-Guerra <jaime.rogue@gmail.com>, Cheng H. Lee <clee@anaconda.com>, Bas Zalmstra <bas@prefix.dev>
Created: June 4, 2024
Updated: Mar 4, 2026
Discussion: https://github.com/conda/ceps/pull/82
Implementation: https://github.com/conda/conda/blob/4.3.34/conda/resolve.py#L33, https://github.com/conda/conda/blob/25.7.0/conda/models/match_spec.py#L85, https://docs.rs/rattler_conda_types/latest/rattler_conda_types/struct.MatchSpec.html, https://github.com/mamba-org/mamba/blob/2.3.2/libmamba/src/specs/match_spec.cpp, https://github.com/openSUSE/libsolv/blob/0.7.35/src/conda.c#L567
Requires: CEP 33, CEP 34, CEP 36

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.

Abstract

This CEP standardizes the syntax for the MatchSpec query language.

Motivation

The motivation of this CEP is merely informative. It describes the details of an existing query language.

Nomenclature

The MatchSpec query syntax is a mini-language designed to select individual entries in a collection of package records. It is sometimes referred to as simply spec or conda spec.

Specification

MatchSpec strings provide a compact method to query collections of conda artifacts (e.g. in a conda channel, or in an installed environment) by matching str and int fields on package records (see CEP 34: ./info/index.json and CEP 36: Package Record Metadata). Note that fields using other types, like list[str] (depends, constrains, etc.), cannot be matched by this syntax.

Syntax

The MatchSpec syntax can be thought of as a structured collection of matching expressions, each targeting a package record field. A matching expression is a string that MUST follow the rules given under Matching conventions.

The full MatchSpec syntax takes this approximate form, with parentheses denoting optional fields:

(channel(/subdir):(namespace):)name(version(build))([key1='value 1'(, )key2=value2])

More precisely, the following rules MUST apply:

  • A MatchSpec string MAY exhibit two forms of expressions: positional and keyword based.
  • Six positional expressions are recognized. From left to right, they can be arranged in two groups: (channel, subdir, namespace) and (name, version, build).
    • The first group is optional. If present, it MUST be separated from the second group by a single colon character :. Within this group, there are four items:
      • channel: str. Optional.
      • subdir: str. Optional. It requires channel to be defined. MUST be separated from channel by a single forward slash, /. It MUST use a known subdir identifier; otherwise it could be interpreted as the last component of a channel URL.
      • A colon : separator, required if channel or namespace is defined.
      • namespace: str. Optional. This expression field MUST be parsed and ignored.
    • The second group contains three expressions. They MUST be separated by either spaces or a single = character. Separator types MUST NOT be mixed. See the version expression parsing notes for additional details on the interaction between the = symbol as a separator and as an operator. Leading and trailing spaces MUST be ignored.
      • name: str. Required. Empty names MUST be represented as *.
      • version: str | VersionSpec. Optional.
      • build: str. Optional. It requires version to be present.
  • All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
    • Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target record field (key) and the expression string (value) with a single = character.
    • The value MUST be quoted with single ' or double " quotes if it contains spaces, commas, equal signs, or square brackets. Quoting rules follow Python's string literals.
    • Keyword expression pairs MUST be separated by a single comma character ,. Historically, spaces have also been allowed as separators but SHOULD NOT be used.
    • Spaces between comma separators MAY be allowed and MUST be ignored.
  • When both positional and keyword expressions are used, the keyword expressions override the positional values, except for name: its keyword expression MUST be ignored.
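
The keyword rules above can be sketched in Python as follows. This is a minimal illustration, not conda's parser: the helper names split_keywords and merge_fields are hypothetical, escape sequences inside quoted values are not processed, and the legacy space-separated pairs are not accepted.

```python
import re

# One key=value pair: the value is single-quoted, double-quoted, or a bare
# token without spaces, commas, equal signs, quotes or square brackets.
_PAIR = re.compile(
    r"\s*(?P<key>\w+)\s*=\s*"
    r"(?:'(?P<sq>[^']*)'|\"(?P<dq>[^\"]*)\"|(?P<bare>[^\s,\[\]='\"]+))"
    r"\s*(?:,|$)"
)


def split_keywords(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'positional[key=value,...]' into its positional part and a dict."""
    spec = spec.strip()
    if not spec.endswith("]"):
        return spec, {}
    open_idx = spec.find("[")
    if open_idx == -1:
        raise ValueError(f"unbalanced square brackets in {spec!r}")
    positional = spec[:open_idx].strip()
    body = spec[open_idx + 1 : -1].strip()
    pairs: dict[str, str] = {}
    pos = 0
    while pos < len(body):
        match = _PAIR.match(body, pos)
        if match is None:
            raise ValueError(f"cannot parse keyword section at {body[pos:]!r}")
        # Exactly one of the three value groups is set for a successful match.
        value = next(v for v in (match["sq"], match["dq"], match["bare"]) if v is not None)
        pairs[match["key"]] = value
        pos = match.end()
    return positional, pairs


def merge_fields(positional: dict[str, str], keywords: dict[str, str]) -> dict[str, str]:
    """Keyword values override positional ones, except for name."""
    merged = {**positional, **keywords}
    if "name" in positional:
        merged["name"] = positional["name"]
    return merged
```

With these helpers, split_keywords("conda-forge::numpy[version='>=1.8, <2', build=py3*]") yields the positional part conda-forge::numpy plus the pairs version and build, and the quoted value keeps its inner comma and space.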

Matching conventions

String matching

Matching expressions that target string fields MUST be interpreted using these case-insensitive rules:

  • If the expression begins with ^ and ends with $, it MUST be interpreted as a regular expression (regex). The expression matches if the regex search returns a hit; e.g. with Python: re.search(expression, field) is not None. Advanced expressions like lookaround and backreferences SHOULD NOT be allowed.
  • If the expression contains one or more asterisks (*), it is considered a glob expression and MUST be converted into a regular expression and interpreted as such. To convert a glob expression into a regex string:
    1. Escape characters considered special in regex expressions adequately (e.g. using Python's re.escape).
    2. Replace escaped asterisks (\*) by .*.
    3. Wrap the resulting string with ^ and $.
  • Otherwise, matches MUST be tested with exact, case-insensitive string equality.
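
The three string matching rules can be sketched with Python's re module, which the rules already use as their reference. The function names glob_to_regex and string_matches are hypothetical, and the sketch does not reject lookaround or backreferences.

```python
import re


def glob_to_regex(expression: str) -> str:
    """Convert a glob expression into an anchored regex string."""
    escaped = re.escape(expression)          # step 1: escape regex specials
    converted = escaped.replace(r"\*", ".*")  # step 2: escaped asterisks become .*
    return f"^{converted}$"                   # step 3: anchor both ends


def string_matches(expression: str, field: str) -> bool:
    """Case-insensitive match of a string field against an expression."""
    if expression.startswith("^") and expression.endswith("$"):
        return re.search(expression, field, re.IGNORECASE) is not None
    if "*" in expression:
        return re.search(glob_to_regex(expression), field, re.IGNORECASE) is not None
    return expression.lower() == field.lower()
```

For instance, glob_to_regex("py3*_0") produces ^py3.*_0$, so the build string py311_0 matches while py27_0 does not.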

Channel matching

Channel fields MUST be matched with the same rules as strings.

The value of a channel expression MUST allow both names and full URLs. When a name is used (as per CEP 26), it MUST be promoted to its corresponding fully qualified URL before comparison.

Version matching

Expressions targeting the version field MUST be handled with additional rules. These expressions are referred to as version specifiers.

A version specifier MUST consist of one or more version clauses, separated by logical operators that MUST follow these rules:

  • | denotes the logical OR.
  • , denotes the logical AND.
  • , (AND) has higher precedence than | (OR).
  • Parentheses () MAY be used to modify precedence.

A version clause consists of either:

  • A single version literal (as defined in CEP 33).
  • An operator plus a single version literal.
  • A single version literal containing one or more globs (*).
  • A single glob (*).

For example, given a string python>=3,<4, the version specifier is the full expression >=3,<4, which consists of two clauses (>=3, <4) separated by , (AND). Each clause contains a version literal (3 and 4, respectively).
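
The precedence rules can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not a reference implementation: parse_version_spec and evaluate are hypothetical names, clauses are kept as opaque strings, and regex clauses that themselves contain |, commas or parentheses are not handled.

```python
def parse_version_spec(text: str):
    """Parse a version specifier into nested ("or"|"and", [items]) tuples."""
    text = text.replace(" ", "")
    pos = 0

    def parse_or():
        nonlocal pos
        items = [parse_and()]
        while pos < len(text) and text[pos] == "|":
            pos += 1
            items.append(parse_and())
        return items[0] if len(items) == 1 else ("or", items)

    def parse_and():
        # AND binds tighter than OR, so it sits one level below parse_or.
        nonlocal pos
        items = [parse_atom()]
        while pos < len(text) and text[pos] == ",":
            pos += 1
            items.append(parse_atom())
        return items[0] if len(items) == 1 else ("and", items)

    def parse_atom():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            node = parse_or()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return node
        start = pos
        while pos < len(text) and text[pos] not in ",|()":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a version clause at offset {pos}")
        return text[start:pos]

    tree = parse_or()
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return tree


def evaluate(tree, clause_matches) -> bool:
    """Evaluate a parsed tree given a predicate for single clauses."""
    if isinstance(tree, str):
        return clause_matches(tree)
    operator, items = tree
    results = [evaluate(item, clause_matches) for item in items]
    return any(results) if operator == "or" else all(results)
```

Parsing 1.0|>=2,<3 groups >=2 and <3 under one AND node below the OR, while (1.0|>=2),<3 puts the OR node inside the AND.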

Each version clause MUST be described by one of these types:

  • String matching rules apply when:
    • The value is a regex (surrounded by ^ and $).
    • The value contains a non-trailing glob (*).
  • Exact equality, expressed as a version literal prefixed by the double-equals string ==, MUST be interpreted as normalized version literal equality.
  • Fuzzy equality is expressed as either a version literal prefixed by one = symbol, or a version literal trailed by .* or *. After removing any leading = character and appending a .* suffix if it is not already present, the comparison is only truthy when all the version segments before the glob are equal.
  • Exclusion, expressed as a version literal or a version literal augmented with globs, prefixed by the string !=, MUST be interpreted as a negated fuzzy equality.
  • Ordered comparison, with the implied ordering described in CEP 33:
    • Exclusive ordered comparison, expressed as a version literal prefixed by < or >, MUST be interpreted as "smaller than" and "greater than", respectively, as per their position in the version ordering scheme.
    • Inclusive ordered comparison, expressed as a version literal prefixed by one of these strings: <=, >=, MUST be interpreted as "smaller than or equal to" and "greater than or equal to", respectively, where equality means normalized version literal equality.
  • Semver-like comparisons, expressed as a version literal prefixed by the ~= string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. ~=0.5.3 expands to >=0.5.3,0.5.*). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
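
As an illustration of how a clause is taken apart, the sketch below separates the operator prefix from the rest of the clause and performs the ~= expansion described above. The names split_operator and expand_compatible_release are hypothetical, and splitting the literal on dots is a simplification: the authoritative definition of version segments is in CEP 33.

```python
# Longer operators come first so that "==" is not read as "=" followed by "=".
_OPERATORS = ("==", "!=", "<=", ">=", "~=", "=", "<", ">")


def split_operator(clause: str) -> tuple[str, str]:
    """Return (operator, remainder); the operator is "" when there is none."""
    for operator in _OPERATORS:
        if clause.startswith(operator):
            return operator, clause[len(operator):]
    return "", clause


def expand_compatible_release(clause: str) -> str:
    """Rewrite a ~= clause as its expanded '>=X,prefix.*' form."""
    operator, literal = split_operator(clause)
    if operator != "~=":
        raise ValueError(f"not a ~= clause: {clause!r}")
    segments = literal.split(".")  # simplification of CEP 33 segments
    if len(segments) < 2:
        raise ValueError("~= needs a literal with at least two segments")
    prefix = ".".join(segments[:-1])
    return f">={literal},{prefix}.*"
```

Here expand_compatible_release("~=0.5.3") gives >=0.5.3,0.5.*, which is the expansion quoted in the rule.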

Version expressions SHOULD NOT contain spaces between operators. If such spaces are present, they MUST be removed and ignored.

Version expression parsing

In the name of backwards compatibility, the (name, version, build) group in the MatchSpec syntax allows two types of separators: spaces and a single = character. This conditions how certain version expressions are parsed. Given a version literal denoted as version-literal (i.e. no operators or asterisks), the following rules MUST apply:

  • If the string only contains two fields, which MUST be name and version:
    • {name}={version-literal} and {name} ={version-literal} (note the space) both denote fuzzy equality. They are equivalent to {name}[version={version-literal}.*] and {name} {version-literal}.*
    • {name} {version-literal} denotes exact equality. It is equivalent to {name}[version={version-literal}] and {name}=={version-literal}.
  • If the string contains three fields, name, version and build:
    • {name} {version-literal} {build}, {name} =={version-literal} {build}, {name}={version-literal}={build} and {name}=={version-literal}={build} all denote exact equality. They are equivalent to {name}[version={version-literal},build={build}].
    • {name} ={version-literal} {build} denotes fuzzy equality.

Some examples for name=pkg and version-literal=1.8, with equivalent version specifiers in the same block:

pkg=1.8
pkg =1.8
pkg 1.8.*
pkg 1.8.* *
pkg=1.8.*
pkg=1.8.*=*
pkg =1.8.* *
pkg ==1.8.* *
pkg[version=1.8.*]
pkg[version="1.8.*"]

pkg 1.8
pkg 1.8 *
pkg==1.8
pkg=1.8=*
pkg==1.8=*
pkg ==1.8 *
pkg[version=1.8]
pkg[version="1.8"]
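
The separator rules can be sketched as a function that maps the positional (name, version, build) part to the equivalent [version=...] value. This is a deliberately narrow sketch: interpret_positional is a hypothetical name, it only covers a single version with no operator, = or ==, and it assumes the keyword brackets have already been removed.

```python
import re

# The name ends at the first space or operator character.
_NAME = re.compile(r"^(?P<name>[^\s=<>!~]+)\s*(?P<rest>.*)$")


def interpret_positional(spec: str):
    """Return (name, version, build) with version in its [version=...] form."""
    match = _NAME.match(spec.strip())
    if match is None:
        raise ValueError(f"cannot find a package name in {spec!r}")
    name, rest = match["name"], match["rest"]
    if not rest:
        return name, None, None
    if rest[0] in "<>!~":
        raise ValueError("this sketch only covers bare, = and == forms")
    operator = "==" if rest.startswith("==") else "=" if rest.startswith("=") else ""
    body = rest[len(operator):]
    if " " in body:                      # space-separated build
        literal, build = body.split(None, 1)
        build_separator = " "
    elif "=" in body:                    # =-separated build
        literal, build = body.split("=", 1)
        build_separator = "="
    else:
        literal, build, build_separator = body, None, None
    if "*" in literal:
        version = literal                # globs are kept as written
    elif operator == "=" and build_separator != "=":
        version = literal + ".*"         # fuzzy equality
    else:
        version = literal                # exact equality
    return name, version, build
```

Applied to the examples above, every positional spelling in the first block yields the version 1.8.* and every one in the second block yields 1.8.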

Examples

>>> str(MatchSpec('foo 1.0 py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('foo=1.0=py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('conda-forge::foo[version=1.0.*]'))
'conda-forge::foo=1.0'
>>> str(MatchSpec('conda-forge/linux-64::foo>=1.0'))
"conda-forge/linux-64::foo[version='>=1.0']"
>>> str(MatchSpec('*/linux-64::foo>=1.0'))
"foo[subdir=linux-64,version='>=1.0']"

Rationale

The initial MatchSpec form was a simpler name [version [build]] syntax (still in use in build recipes), with two optional keyword arguments (optional, target) between parentheses. The CLI also had its own string specification, which only supported name and version separated by = symbols (see conda 4.3.x's spec_from_line()). conda search allowed queries based on regexes only.

With conda 4.4.0, a new syntax was introduced to unify and consolidate all these different variations (see release notes for 4.4.0, conda/conda#4158, and conda/conda#5517). It also brought channel and subdir matching (fields before ::) and arbitrary record field matching in between square brackets.

The new syntax had to maintain backwards compatibility with the space- and =-separated forms too. This is the reason behind some surprising behaviors discussed in the specification above.

Advanced expressions like lookaround and backreferences are discouraged because they can cause performance problems that lead to denial of service (DoS) and other security issues.

Mixing * with other version-specific operators is disallowed as per the recommendations discussed in https://github.com/conda/ceps/pull/60.

Some legacy syntax that is still recognized by conda was intentionally left out of this CEP due to lack of usage in practice. Examples include:

  • [optional]: bare keyword (no value) that is used internally by the classic solver to track droppable requirements
  • (optional=True): same as above, but with different syntax. conda allows parenthesized blocks after square brackets, with arbitrary contents.
  • @feature: a way to require the now-deprecated features (e.g. @mkl)
  • channel[subdir]::name: an unambiguous way to add subdir information to the positional channel field (instead of slash separation). The keyword argument is preferred for disambiguation. By dropping this syntax, we only assign one meaning to square brackets: key-value pairs.

Future work may introduce a stricter syntax subset that further reduces the ambiguity in the specification (e.g. disallowing space-separated name-version-build triplets).

Appendices

Appendix A: Canonical representation

The canonical string representation of a MatchSpec expression proposed by conda follows these rules:

  1. name is required and MUST be written as a positional expression. Empty names MUST be written as *.
  2. If version describes an exact equality expression, it MUST be written as a positional expression, prepended by ==. If version denotes fuzzy equality (e.g. 1.11.*), it MUST be written as a positional expression with the .* suffix left off and prepended by =. Otherwise version MUST be included inside the key-value brackets.
  3. If version is an exact equality expression, and build does not contain asterisks, build MUST be written as a positional expression, prepended by =. Otherwise, build MUST go inside the key-value brackets.
  4. If channel is defined and does not contain asterisks, a :: separator is used between channel and name. channel MAY be represented by its name or full, subdir-less URL.
  5. If both channel and subdir do not contain asterisks, subdir is appended to channel with a / separator. Otherwise, subdir is included in the key-value brackets.
  6. Key-value pairs MUST be separated by commas, with no spaces between delimiters. Values MUST be quoted with single quotes.
  7. The namespace field MUST NOT be represented.
  8. Case-insensitive string fields MUST be lowercased.
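
Rules 1 to 6 can be sketched as a formatter over already-parsed fields. This is a hypothetical helper, not conda's str(MatchSpec): canonical_spec and _is_literal are made-up names, a version literal is approximated as any text free of operator and glob characters, rule 8 is left out, and values are always single-quoted as rule 6 states, so the quoting may differ from what conda prints.

```python
_NON_LITERAL = set("*<>=!~,|()^$ ")


def _is_literal(text: str) -> bool:
    """Rough test for a plain version literal (assumption, see CEP 33)."""
    return bool(text) and not (_NON_LITERAL & set(text))


def canonical_spec(name=None, version=None, build=None, channel=None, subdir=None, **other):
    positional = name or "*"                                  # rule 1
    brackets = {}
    exact = False
    if version:                                               # rule 2
        if version.startswith("==") and _is_literal(version[2:]):
            exact, positional = True, positional + version
        elif _is_literal(version):
            exact, positional = True, positional + "==" + version
        elif version.endswith(".*") and _is_literal(version[:-2]):
            positional += "=" + version[:-2]
        elif version.startswith("=") and _is_literal(version[1:]):
            positional += version
        else:
            brackets["version"] = version
    if build:                                                 # rule 3
        if exact and "*" not in build:
            positional += "=" + build
        else:
            brackets["build"] = build
    prefix = ""
    if channel and "*" not in channel:                        # rule 4
        prefix = channel
        if subdir and "*" not in subdir:                      # rule 5
            prefix += "/" + subdir
            subdir = None
        prefix += "::"
    if subdir:
        brackets = {"subdir": subdir, **brackets}
    brackets.update(other)
    if not brackets:
        return prefix + positional
    pairs = ",".join(f"{key}='{value}'" for key, value in brackets.items())  # rule 6
    return f"{prefix}{positional}[{pairs}]"
```

Called with name foo, version 1.0 and build py27_0, the sketch returns foo==1.0=py27_0. With channel conda-forge and version 1.0.* it returns conda-forge::foo=1.0.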

Appendix B: Search vs solver MatchSpec

MatchSpec strings can be used under two different contexts:

  • Search queries: To obtain all the artifacts matching the query against a collection of packages. Results may include more than one entry per package name.
  • Solver requests: To obtain the subset of packages in an index that satisfy the request and their dependency metadata. Results must only include one entry per package name.

In contrast with search queries, only some MatchSpec fields make sense for solver requests. The most common ones are name, version, build, and channel.

Appendix C: Fully specified expressions

To uniquely identify a single package record, a MatchSpec expression can be constructed in two ways:

  • By passing exact values to the fields channel (preferably by URL), subdir, name, version, build.
  • By matching its checksum directly: *[md5=12345678901234567890123456789012] or *[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e].

Note that an artifact URL may be parsed into a fully specified MatchSpec. For example, the following URL is annotated with the fields it provides:

https://conda.anaconda.org/conda-forge/linux-64/python-3.11.10-h123456_0.conda
[----------channel--------------------|-subdir-|-name-|version|-build---]

It becomes conda-forge/linux-64::python==3.11.10[build=h123456_0].
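
The URL decomposition can be sketched as follows. The function name spec_from_url is hypothetical, and the sketch makes two assumptions that are not stated in this document: the channel name is whatever remains of the URL path before the subdir, and the filename has the form name-version-build with a .conda or .tar.bz2 extension. Mapping arbitrary channel URLs to names depends on channel configuration and is out of scope here.

```python
from urllib.parse import urlsplit


def spec_from_url(url: str) -> str:
    """Build a fully specified MatchSpec string from an artifact URL."""
    path = urlsplit(url).path.strip("/")
    *channel_parts, subdir, filename = path.split("/")
    for extension in (".conda", ".tar.bz2"):
        if filename.endswith(extension):
            stem = filename[: -len(extension)]
            break
    else:
        raise ValueError(f"unrecognized artifact extension in {filename!r}")
    # Version and build never contain hyphens in this layout (assumption),
    # so splitting from the right keeps hyphenated names intact.
    name, version, build = stem.rsplit("-", 2)
    channel = "/".join(channel_parts)
    return f"{channel}/{subdir}::{name}=={version}[build={build}]"
```

For the URL shown above, this returns conda-forge/linux-64::python==3.11.10[build=h123456_0].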

References

All CEPs are explicitly CC0 1.0 Universal.