Neural Magic

Founders: Alex Matveev & Nir Shavit
Founding: 2018
Mission: Unlock the full potential of your ML environment. Accommodate the continuous growth of neural networks without added complexity or cost.
Employees: 53 & ~50% Local
Workplace: Hybrid
Stage & Capital Raised: Series A & $50M raised
Investors: Andreessen Horowitz, Amdocs, Comcast Ventures, NEA, Pillar VC, Ridgeline Ventures, Verizon Ventures, VMware
Key Partners and Customers: Striveworks, Intel, AMD, DigitalOcean, Google Cloud, AWS
Glassdoor Rating: N/A
Valuation (estimated): $100M – $300M (assuming they sold ~20% of the company in the $30M Q4 ‘21 Series A fundraise)
^ this is a useless number. There is no tangible valuation until the business is sold or goes public. Don’t forget it!

Neural Magic helps organizations deliver AI through software rather than expensive, inefficient hardware like GPUs. MIT Professor Nir Shavit and his PhD student Alex Matveev were doing deep learning research in an MIT lab in 2016, but they didn’t have access to the same high-powered hardware their corporate counterparts used for neural network research.

First, let’s set the stage. A GPU (graphics processing unit) is a specialized chip built to execute large neural networks, the models that predict desired outcomes. Like a race car engine, more or less. The NVIDIA A100, the kind of industrial-grade chip used to train an LLM (large language model) like OpenAI’s ChatGPT, retails for $10-$15k. For one chip! Some analysts project OpenAI would have needed a cluster of 10,000 GPUs to train the initial ChatGPT model. I’ll let you do the math.
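(Or, fine, I’ll do it: 10,000 chips × $10-$15k each comes out to roughly $100M-$150M, for the GPUs alone.)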

Quibble with the specifics all you wish, but the point is that it’s unreasonable to think infrastructure costs can keep scaling linearly (let alone exponentially). For any company, even one with the vast resources of OpenAI & Microsoft, rising infrastructure costs are a problem.

Without the budget for GPUs, Nir & Alex built software that optimizes deep learning models to run on CPUs (commodity-level hardware). Scarcity breeds creativity, after all. Spinning out of the university in 2018, back when AI was still called machine learning, the duo founded Neural Magic to help AI teams run better, faster deep learning models on a budget.

There are other secular trends acting as tailwinds for Neural Magic, too. Moore’s Law is coming to an end (we think). For 50+ years, advances in chipmaking have made software development easier on the back of increasingly powerful hardware. But it’s getting more expensive and more difficult to make ever-smaller transistors. Our power grid & infrastructure are being pushed to their limits, too. Technologists need novel ways to get more efficient. We need better tools to optimize AI models and software.

Neural Magic is building one of those solutions. Nir, Alex & team are building a bridge between expensive hardware and the commodity servers you can rent from cloud providers like Azure, AWS, or Google Cloud. They help organizations deploy their AI use cases at very fast speeds by leveraging a well-researched ML optimization technique known as sparsity. Sparsity lets developers reduce the computational requirements of ML models. Magic? No, computer science. Once a model’s computational needs are reduced, Neural Magic’s inference runtime, DeepSparse, executes it on CPUs alone, using sparsity-aware techniques, at GPU speeds.
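To make sparsity concrete, here’s a minimal sketch in plain NumPy (a toy illustration of magnitude pruning, not Neural Magic’s actual implementation) showing how zeroing out the weights that contribute least shrinks the work a runtime has to do:

```python
import numpy as np

# A toy dense weight matrix, like one layer of a neural network.
rng = np.random.default_rng(0)
weights = rng.normal(size=(512, 512))

# Magnitude pruning: zero out the 90% of weights closest to zero.
# The intuition is that near-zero weights contribute little to the output.
sparsity = 0.9
threshold = np.quantile(np.abs(weights), sparsity)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

print(f"Nonzero weights before: {np.count_nonzero(weights)}")
print(f"Nonzero weights after:  {np.count_nonzero(pruned)}")

# A sparsity-aware runtime can skip the multiply-adds for every zero,
# which is where the compute savings come from.
```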

Brian Stevens, the former CTO of Red Hat and Google Cloud, was brought in as Neural Magic’s CEO in 2021 to lead its commercialization efforts, with an emphasis on an open-source-first model and building out the open source community. Today, any developer can access SparseZoo, an open source repository of models already optimized with Neural Magic’s toolkit. Next, the SparseML tool lets developers optimize custom models, tweaking the machine learning for their individual use cases with both private and public data.
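For a feel of the workflow, here’s a rough sketch of SparseML’s recipe-driven pruning flow for PyTorch, based on its documented API; the toy model, data, and recipe.yaml path are placeholders of mine, not a drop-in script:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Toy model and data, just to make the flow concrete.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
data = TensorDataset(torch.randn(256, 64), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# A recipe (YAML) declares what to prune, how sparse to go, and on what
# schedule; "recipe.yaml" is a placeholder path.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(loader))

# Train as usual; the manager applies pruning on schedule as you go.
for epoch in range(manager.max_epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

manager.finalize(model)  # strip pruning hooks, keeping the sparse weights
```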

Last, developers can run their models in DeepSparse, Neural Magic’s inference runtime. DeepSparse achieves its performance with algorithms that reduce the computation needed to execute a neural network and accelerate the resulting memory-bound workload, cutting up to 90% of a model’s compute requirements. The result is more flexibility, faster: optimize once, then deploy at scale.
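Serving a model looks roughly like this, per DeepSparse’s documented Pipeline API. Note the SparseZoo model stub below is an illustrative placeholder, not a verified path; browse SparseZoo for real ones:

```python
from deepsparse import Pipeline

# Pull a sparsified model from SparseZoo and serve it on plain CPUs.
# The "zoo:..." stub is an example of the format, not a verified model path.
pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

print(pipeline("Neural Magic runs this on commodity CPUs."))
```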

In 2023, the team is further building out its revenue engine and growing an open source community of 1,400+ Slack members. Year to date, they’ve seen over 1M SparseZoo model downloads, more than 2x their 2022 total.

Operators to Know (Locally):

My investigative powers continue to need work, so apologies to the Neural Magic team if I missed any up-and-coming operators internally.

Key Roles To Be Hired:

If I were interviewing here are some questions I’d ask:

  • What are the key defensible technical advantages that Neural Magic is building?
  • Who are the key competitors Neural Magic faces in the years ahead?
  • What are the key milestones for 2023 and what is the long term vision for the company?
  • What are the biggest priorities as you scale the team? What are the most important roles you’ll be looking to add in 2023?

We’re optimizing for readability here, so to learn more about Neural Magic you’ll have to D.Y.O.R. I’m excited to watch this team help more AI companies get more efficient with their infrastructure costs. Machine learning enthusiasts everywhere, from consumer laptops to high-powered clusters, applaud your efforts. See you around town!