Run AI and ML workloads on serverless GPUs with a Python SDK, sub-second cold starts, and pay-per-second billing.
Modal is a serverless cloud platform built for AI, ML, and data-intensive workloads. You define compute functions in Python using simple decorators, and Modal handles the rest: container images, GPU provisioning, autoscaling, and teardown. Containers spin up in under a second thanks to a custom Rust-based runtime, and resources scale to zero when idle. Workloads can scale across thousands of GPUs spanning multiple clouds, with no reservations or quota requests required. Teams use it for model inference, fine-tuning, batch processing, and sandboxed code execution.
Modal is a serverless cloud platform that lets you run compute-intensive Python code, especially AI and ML workloads, without managing any infrastructure. You write Python functions, add a decorator, and Modal handles provisioning GPUs, building containers, scaling, and billing.
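To make that concrete, here is a minimal sketch of what a Modal function looks like with the Python SDK (the app and function names are placeholders):

```python
import modal

app = modal.App("example-app")

# The decorator tells Modal to run this function in the cloud,
# in a container it builds, scales, and tears down for you.
@app.function()
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # .remote() executes the function on Modal's infrastructure
    print(square.remote(42))
```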
Modal uses pay-per-second billing based on actual CPU, GPU, and memory usage. The Starter plan is free with $30/month in compute credits. The Team plan costs $250/month with $100 in credits, unlimited seats, and higher concurrency limits. Enterprise pricing is custom. GPU costs vary by type; for example, an NVIDIA A10G runs at roughly $1.10/hour and a B200 at roughly $6.25/hour.
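As a rough illustration of what per-second billing means in practice, using the A10G rate quoted above (actual rates may change, and CPU and memory are billed separately):

```python
# Back-of-the-envelope cost for one short GPU job at the quoted A10G rate.
A10G_PER_HOUR = 1.10        # USD/hour, rate listed above (assumed; check current pricing)
job_seconds = 90            # e.g. a single 90-second inference call

cost = A10G_PER_HOUR / 3600 * job_seconds
print(f"~${cost:.4f}")      # roughly $0.0275 of GPU time
```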
No. Modal abstracts away containers, orchestration, and infrastructure entirely. You define everything in Python, including container images and hardware requirements. There are no YAML files, Dockerfiles, or kubectl commands involved.
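For example, a container image and its dependencies can be declared directly in Python; a minimal sketch, with illustrative package names and versions:

```python
import modal

# The image is described in Python and built by Modal -- no Dockerfile needed.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "transformers")
)

app = modal.App("inference-app", image=image)

@app.function(gpu="A10G")  # hardware requirements are also declared in Python
def generate(prompt: str) -> str:
    # model code would go here
    return prompt.upper()
```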
Python is the primary and fully supported language. Modal has released alpha SDKs for JavaScript/TypeScript and Go that allow calling Modal functions and managing resources, but building Modal applications is currently Python-only.
Modal provides access to a wide range of NVIDIA GPUs including T4, A10G, L4, A100, H100, and B200. Each container can use up to 8 NVIDIA H100 GPUs, 64 CPUs, and 336 GB of memory. GPU availability spans multiple cloud providers.
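Requesting specific hardware is a matter of decorator arguments. A sketch of asking for the upper limits mentioned above, assuming recent SDK parameter names (memory is given in MiB):

```python
import modal

app = modal.App("big-job")

# Request 8 H100 GPUs, 64 CPUs, and 336 GB of memory for a single container.
@app.function(gpu="H100:8", cpu=64, memory=336 * 1024)
def train_step():
    # training or fine-tuning code would run here
    ...
```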
Modal's custom Rust-based container runtime achieves sub-second cold starts, even for GPU-enabled containers. The platform claims to be up to 100x faster than standard Docker-based systems.
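Independent of the runtime itself, the SDK also lets you keep expensive setup such as model loading out of the request path, which affects perceived cold starts. A sketch using Modal's class-based containers (the class and attribute names are placeholders):

```python
import modal

app = modal.App("warm-model")

@app.cls(gpu="A10G")
class Model:
    @modal.enter()              # runs once per container start, not per request
    def load(self):
        self.weights = "loaded"  # placeholder for loading real model weights

    @modal.method()
    def predict(self, prompt: str) -> str:
        return f"{self.weights}: {prompt}"
```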
Yes. Many companies run production inference, batch processing, and ML pipelines on Modal. The platform supports auto-scaling, web endpoints, scheduled jobs, and integrated monitoring. However, it's not designed for full multi-service application architectures.
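A sketch of the two deployment patterns mentioned above, a scheduled job and a web endpoint (the cron expression and handler are placeholders; `modal.web_endpoint` is the decorator name in recent SDK versions):

```python
import modal

app = modal.App("prod-pipeline")

# Runs every day at 06:00 UTC, with no always-on worker to maintain.
@app.function(schedule=modal.Cron("0 6 * * *"))
def nightly_batch():
    ...

# Exposes an HTTPS endpoint that scales with traffic and to zero when idle.
@app.function()
@modal.web_endpoint(method="POST")
def predict(item: dict):
    return {"echo": item}
```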
Yes. The Starter plan costs $0/month and includes $30 in monthly compute credits, 3 workspace seats, up to 10 GPU concurrency, and 100 containers. This is enough for experimentation, prototyping, and small-scale workloads.