Photo by Issac Smith on Unsplash
Anaconda surveyed the conda community in late 2022. This post reviews what we learned from that survey and how it is impacting the future directions of conda.
Around the same time, the Python Software Foundation published the results of their (much, much bigger) Python Packaging Survey. The two surveys asked some similar questions and some distinct ones. We include insights from the PSF survey when they are particularly relevant to the conda community.
While the survey itself had only 72 responses and is in no way representative of the entire conda community of users, the summary statistics listed here could point to particular problem areas that we may want to follow up on with further research or user interviews. The PSF survey received 8,774 responses.
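To give a rough sense of what the difference in sample size means, here is a minimal sketch of the textbook worst-case margin of error for a reported percentage. It assumes simple random sampling, which neither survey actually used, so treat the numbers as a lower bound on the uncertainty rather than a real confidence interval. The function name `worst_case_margin` is just an illustrative helper.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, taken at p = 0.5.

    Assumption: respondents are a simple random sample. That is NOT true
    of a self-selected community survey, so this understates the real
    uncertainty.
    """
    return z * math.sqrt(0.25 / n)


conda_moe = worst_case_margin(72)    # conda survey responses
psf_moe = worst_case_margin(8774)    # PSF survey responses

print(f"conda survey (n=72):   +/- {conda_moe:.1%}")
print(f"PSF survey   (n=8774): +/- {psf_moe:.1%}")
```

Under those assumptions, a percentage from the conda survey could easily be off by ten points or more, while the PSF figures are far tighter.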
The summary has several sections:
- How happy are people with conda?
- What needs improvement?
- When is conda used?
- What is conda used with?
- Who uses conda?
How happy are people with conda?
The conda survey asked 4 questions that we'll throw into the "happiness" bucket:
The news here is mostly good, but the results also highlight that things still need to be improved. For example, in "Conda is my preferred package manager" 61% had "happy" responses, but 25% were neutral, and almost 14% had "unhappy" responses. That's 39% who can't say they are in the "happy" bucket. For the other three questions the numbers outside the "happy" categories are 28%, 24%, and 32%. Those are better, but still not where we want them. See the specific feedback in the next section for some reasons behind these numbers.
What needs improvement?
The survey asked respondents what changes should be prioritized going forward.
The need for speed was the most prominent theme:
- 71% prioritized speedups.
- 60% use mamba, which is known for being very fast compared to the standard conda solver.
Some individual responses:
- "just make libmamba the default solver already!!!!"
- "Nearly 100% use Mamba nowadays due to speed, only fallback to Conda when something is going wrong."
- "Dear Goddess make the solver faster out of the box."
This survey was taken in November and December 2022, which we feel was an inflection point in conda's performance profile. The 22.11.0 conda release, which came out towards the end of the survey window, implemented parallel package download and extraction, and dropped the experimental tag from the conda-libmamba-solver (the solver was added in the March 2022 release). At the time of the survey, only 19% of respondents were using conda-libmamba-solver.
These changes are representative of a multi-year and continuing effort to improve the speed of conda. Our next survey will tell us if conda is succeeding at speeding things along...
Better error messages
Close behind speed were more helpful error messages, which 65% of respondents prioritized.
Some individual feedback:
- "when it works, it's fantastic. when it doesn't, it's really painful to diagnose dependency resolution issues and discover what's happening, resulting in ad hoc tools like parsing json into networkx or something for exploration."
- "Explorability of dependency resolution conflicts."
The conda-libmamba-solver error messages will improve shortly as the latest version of the solver gets integrated into conda.
Improved interoperability with other package managers was prioritized by 62% of respondents. This message is reinforced by how frequently people use more than one package manager.
79% of conda survey respondents use other package managers in addition to conda. 75% of all respondents also use pip, and 55% use mamba.
The PSF survey asked "What Packaging tools do you use?" They blended together package and environment management (as does conda itself):
(Note: PSF survey results are shown with a blue outline.)
In this survey conda is used less frequently than 6 "standard" (and highly overlapping) entries (pip, venv/virtualenv, PyPI, setuptools, and wheel), and less frequently than poetry. pip has almost twice as many users as venv, the second most popular response. conda is followed closely by pipenv and twine. conda-build is listed in the bottom half with 4-5% of respondents using it.
(This is only one snapshot in time, so we can't say if the conda ecosystem is gaining or losing mind-share in the Python ecosystem.)
The PSF survey asked several questions specifically related to interoperability.
When PSF asked their respondents to rank these 4 priorities the highest ranked (by far) is "Support more interoperability between Python packaging tools." At the very bottom was "Support interoperability between Python packaging and packaging tools for other languages."
Several other questions revealed a strong preference for just one tool/workflow for package management.
The first question is "I prefer to have a clearly defined "official" workflow" which 75% agreed with. A second similar (but inversely worded) question asks if "The existence of multiple tools is beneficial for the Python packaging ecosystem." Here 46% disagreed with the statement, and 23% were neutral.
What do we do with all this?
Let's start by noting that the Python community strongly prioritizes interoperability between Python packaging tools, but that support for multiple languages is at the bottom. This is good news and bad news for conda. Conda already works well with pip. The downside is that most PSF respondents don't prioritize support for multiple languages, which is one of conda's superpowers.
PSF respondents would also clearly prefer to have a single packaging toolkit. This is not necessarily bad news for conda, as the conda ecosystem addresses concerns that pip does not. The PSF survey does not ask "What toolkit would you prefer that to be?" However, given the Python community's current usage of pip/venv, our guess is that they would say "pip/venv", and that is a challenge we need to address. We hope the current much wider use of pip/venv is merely a byproduct of the wide range of Python packages available via pip and of this being a Python community survey, rather than a widespread preference for the capabilities of pip/venv over conda. But from these surveys we can't tell.
When is conda used?
Industries and Ambits
Just over 40% of respondents are connected to academia. Since the student plus professor/instructor pool adds up to less than 25%, a significant slice of these academics come from other job functions. Specific domains make up the long tail of responses.
When asked in what settings participants use conda, 81% said at work, while only 18% said school-related. 72% use conda on open-source projects. 53% use it for "personal" projects.
The survey used the categories from the packages directory on the Anaconda Nucleus site to ask in what application areas conda is used.
Data Usage & ETL (Extract, Transform and Load; 67%) and Machine Learning & AI (63%) were the most common applications. Visualization (54%), Packaging (54%) and Scientific (53%) make up the next largest groups.
What is conda used with?
The conda survey asked people to list their top 5 packages.
This question had a few clear leaders with Numpy and Pandas used by over half of respondents. Matplotlib, Scipy, and Scikit-learn were used by 1/4 to 1/5 of respondents.
Linux was the dominant operating system with almost 70% use, but macOS and Windows weren't that far behind, both in the neighborhood of 50%. VS Code was the clearly preferred IDE with 70% use. The Jupyter platforms came in next, with both in the mid 40 percents. PyCharm was used by 1/3 of participants.
Python was the most widely used programming language by far, with almost 100% use. Shell, C/C++, R and SQL had 1/3 to 1/4 as many users as Python.
There is also data about framework tools, CI tools, machine learning frameworks and ML operations.
Respondents used open and proprietary data equally, with both garnering close to an 80% response rate.
Over 33 data formats were listed as being used. CSV and JSON were the most popular with 6 out of every 7 responders using CSV.
Respondents listed 13 database management systems they use. SQLite (36%) and PostgreSQL (32%) led this group, followed by MySQL (19%).
Microsoft Excel (35%) and R (31%) were the two most popular tools in the statistical analysis tools category.
The survey asked specific questions about ETL and data visualization tool use. 20 ETL tools had users, but only Pandas (54%), Apache Airflow (10%) and Apache Spark (10%) had more than one or two. Jupyter (67%) was the clear leader in data visualization responses, with Plotly (35%) following. 23 other visualization tools were listed, most with only one or two users.
Who uses conda?
71 of 72 responders write some code as part of their work.
Both the conda and Python groups had similar levels of experience. 68% of conda respondents had 4+ years of experience with conda, while 66% of Python respondents had 4+ years of experience with Python. The conda survey asked participants about their job functions, industries and project types.
The most common role was Developer, followed closely by Research Scientist. The top 12 roles can be roughly grouped:
- Developer (29.2%)
- Research Scientist (27.8%) and Applied Scientist (15.3%)
- Data Scientist (22.2%), Data Engineer (15.3%), and ML Scientist (12.5%)
- Professor/Instructor (12.5%) and Student (11.1%)
- ML Ops (6.9%), DevOps (5.6%), and System Administrator (4.2%)
- Project Manager (4.2%)
People could select more than one job function.
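Because respondents could pick several roles, the percentages above add up to well over 100%. The sketch below converts the published percentages back into approximate respondent counts out of 72 and totals them. The counts are inferred by rounding and are not taken from the raw survey data.

```python
TOTAL_RESPONDENTS = 72  # number of conda survey responses

# Percentages as published in the role list above.
role_percentages = {
    "Developer": 29.2,
    "Research Scientist": 27.8,
    "Applied Scientist": 15.3,
    "Data Scientist": 22.2,
    "Data Engineer": 15.3,
    "ML Scientist": 12.5,
    "Professor/Instructor": 12.5,
    "Student": 11.1,
    "ML Ops": 6.9,
    "DevOps": 5.6,
    "System Administrator": 4.2,
    "Project Manager": 4.2,
}

# Inferred (not reported) head counts: percentage of 72, rounded.
estimated_counts = {
    role: round(pct / 100 * TOTAL_RESPONDENTS)
    for role, pct in role_percentages.items()
}

total_selections = sum(estimated_counts.values())
roles_per_respondent = total_selections / TOTAL_RESPONDENTS

for role, count in estimated_counts.items():
    print(f"{role:<22} ~{count} respondents")
print(f"Total role selections: {total_selections}")
print(f"Average roles per respondent: {roles_per_respondent:.2f}")
```

By this estimate the top 12 roles account for about 120 selections, or a little under two roles per respondent on average.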
Many thanks to all of you who took the survey, and to Luc (Boaz) Douyon of Anaconda for the conda survey, and Shamika Mohanan of the PSF for the Python packaging survey summary. We will continue to run surveys of the conda community in future years. When we do, you will see the report here.
Dave Clements, on behalf of the Conda Communications Team