An SBIR Phase II contract was awarded to Quansight in September 2023 for $1,250,000 USD from the U.S. Department of Defense and DARPA.
We propose to integrate the leading sparse tensor algebra compiler, TACO, as the back-end for PyData/Sparse, the default sparse computing package in the Python ecosystem. The TACO compiler was developed by PI Amarasinghe's group, which pioneered the field of sparse computing compilation. The PyData/Sparse library is developed and maintained by Quansight LLC, a leader in the high-performance scientific Python ecosystem that employs many of the creators and current lead developers of the key projects in this space. PI Reines, lead author of the Python Array API Standard and project lead of the popular stdlib.js numerical computing library, will lead the project. Currently, PyData/Sparse provides a comprehensive sparse array API to leading Python packages such as NumPy, SciPy, and scikit-learn. However, PyData/Sparse can be orders of magnitude slower than what is achievable. With the TACO compiler, one can take any complex tensor algebra expression over sparse tensors and generate high-performance CPU and GPU code that matches or exceeds the performance of state-of-the-art hand-optimized libraries. By integrating TACO into PyData/Sparse, we will generate code for CPUs, GPUs, and the Onyx sparse accelerator co-developed by Prof. Joel Emer, a leading expert in microprocessor design. We therefore believe that making TACO the back-end of PyData/Sparse offers the fastest, most comprehensive, and least risky path toward making sparse computation high-performance and universally available to the entire Python ecosystem. The vision of this proposal is a common infrastructure that can keep pace with performance demands while offering a sparse array language on par with NumPy's dense array language.
TACO is currently the only universal high-performance framework that can support any sparse (and dense) tensor algebra expression in all the essential formats and generate code that matches or outperforms the few available state-of-the-art hand-optimized implementations. Building on TACO therefore gives us a unique window of opportunity to make a significant impact on the Python ecosystem. We believe that a TACO-based system can meet the needs of all stakeholders and provide a unified sparsity framework for the entire Python ecosystem.