WhatsApp-first • Text • Voice • Images

WhatsApp-first AI that understands text, voice notes, and images.

WaNova routes your messages to the right capability—chat, speech-to-text, image understanding (and optional image generation)—then sends the result back in the same WhatsApp conversation.


No app install required for users.

WhatsApp Cloud webhook integration

Multi-modal support: text, audio, images

Runs on your own infrastructure

WaNova Agent

Explain quantum computing in simple words.

Quantum computing uses quantum bits that can represent many possibilities at once, so some problems can be solved much faster than with normal computers.

0:12

Transcribing audio...

Transcript

Explain machine learning in simple terms.

Machine learning means teaching computers patterns from examples so they can make predictions, like spam filtering or product recommendations.

Audio reply available (TTS)

What recipe can I make with this?

Analyzing image...

One-pot biryani (quick): fry onions + whole spices, mix yogurt + biryani masala with chicken/veg, fold in parboiled rice, then steam 20–25 minutes. Rest 10 minutes and serve.


Why another app is the wrong starting point

In India, many people are still reluctant to install “yet another app” just to ask a question. WaNova avoids that friction: you already have WhatsApp open, so you just message and get answers back.

No new app install for end users

Fast, familiar interaction inside an existing chat

Consistent experience across text, voice notes, and images

Everything you expect from an AI assistant—right in WhatsApp

Under the hood, a router decides whether to run chat, audio, or image workflows based on your request.

Text Q&A

Chat with WaNova in WhatsApp; replies are text by default.

Voice notes (STT)

Send voice notes; WaNova transcribes them with Whisper and responds.

Voice replies (optional TTS)

Request voice output; WaNova can send audio via TTS when enabled.

Image understanding

Send images; WaNova analyzes them using vision and replies with text (and optionally other modes).

Generated images

Request image generation (e.g., “Generate an image of ...”). Explicit request required.

Memory

Keep context using short-term state plus long-term memory via Qdrant.
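
As a rough illustration of the two tiers, the sketch below keeps a rolling window of recent turns and a separate store that can be searched later. It is a toy under stated assumptions: the `Memory` class is hypothetical, the long-term store is a plain list standing in for a Qdrant collection, and word overlap stands in for vector similarity search.

```python
from collections import deque


class Memory:
    """Toy two-tier memory: a rolling window plus a searchable long-term store."""

    def __init__(self, window: int = 4) -> None:
        # Short-term state: only the most recent `window` messages are kept.
        self.short_term: deque[str] = deque(maxlen=window)
        # Long-term memory: a plain list standing in for a Qdrant collection.
        self.long_term: list[str] = []

    def add(self, message: str) -> None:
        self.short_term.append(message)
        self.long_term.append(message)

    def recall(self, query: str, limit: int = 2) -> list[str]:
        """Rank stored messages by shared words (a crude proxy for vector search)."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(m.lower().split())), m) for m in self.long_term
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [m for score, m in ranked if score > 0][:limit]
```

The point of the split is that older messages drop out of the short-term window but can still be retrieved from the long-term store when a later question relates to them.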

Prompting tips that improve reliability

Because WaNova routes requests to different workflows, being explicit about what you want improves results.

Tip: include “voice note”, “image”, or “generate image” for clearer intent.
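
To make the tip concrete, here is a minimal sketch of how explicit phrases could map to the three workflows. It is illustrative only: the `route` function and the phrase lists are hypothetical, and WaNova's actual router may rely on an LLM rather than phrase matching.

```python
from typing import Literal

Workflow = Literal["conversation", "audio", "image"]

# Hypothetical trigger phrases, chosen for illustration only.
IMAGE_PHRASES = ("generate an image", "generate image", "create an image")
AUDIO_PHRASES = ("voice note", "voice reply", "audio reply")


def route(text: str) -> Workflow:
    """Pick a workflow from the message text using simple phrase matching."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in IMAGE_PHRASES):
        return "image"
    if any(phrase in lowered for phrase in AUDIO_PHRASES):
        return "audio"
    # Anything without an explicit cue falls through to plain chat.
    return "conversation"
```

With matching like this, "Generate an image of a sunset" goes to the image workflow, while a plain question falls through to conversation. That is why an explicit cue in the message tends to help.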

How WaNova processes your message

1. Webhook

Meta’s WhatsApp Cloud API calls the webhook at /whatsapp_response.

2. Media processing

WaNova downloads any media you send (audio or image).

3. Routing

A router selects the workflow: conversation, image, or audio.

4. Execution

Capabilities run: chat (LLM), speech-to-text, and vision analysis (plus optional image generation and optional TTS).

5. Response

WaNova returns the response to the same WhatsApp user.

Supported incoming payload types: text, audio, image. Any other type receives a friendly fallback asking for text, audio, or an image.
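
The type check and fallback could look roughly like the sketch below. The nesting (entry, changes, value, messages) and the per-type fields follow Meta's documented webhook payload for the WhatsApp Cloud API; the function names and the fallback wording are hypothetical and not taken from the WaNova code.

```python
from typing import Any, Optional

# Hypothetical wording; the real fallback text may differ.
FALLBACK_REPLY = "Sorry, I can only handle text, audio, or image messages."


def first_message(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first message in a Cloud API webhook payload, if any."""
    try:
        return payload["entry"][0]["changes"][0]["value"]["messages"][0]
    except (KeyError, IndexError, TypeError):
        return None  # e.g. a delivery-status event with no "messages" list


def classify(payload: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (kind, detail) for an incoming webhook payload."""
    message = first_message(payload)
    if message is None:
        return "ignored", None
    kind = message.get("type")
    if kind == "text":
        return "text", message["text"]["body"]
    if kind in ("audio", "image"):
        # The media ID is what a later step would use to download the file.
        return kind, message[kind]["id"]
    return "unsupported", FALLBACK_REPLY
```

For a text message, `classify` returns the message body; for audio or images it returns the media ID; for anything else, such as a sticker or location, it returns the fallback reply.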

Watch WaNova in action

See WaNova respond to text Q&A, voice notes (transcribe + answer), and image requests (analysis + reply).


Deploy WaNova in minutes

Configure environment variables, run with Docker Compose, and start receiving WhatsApp webhook events. This repo includes a local quick start and production-ready wiring for the WhatsApp webhook + LangGraph agent.

Copy .env.example to .env

Add API keys and WhatsApp webhook credentials

Start services with Docker Compose

Verify endpoints (Chainlit UI and webhook route)


docker-compose.yml

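services:
  wanova:
    build: .                 # build the agent image from the repo's Dockerfile
    ports:
      - "8000:8000"          # host:container port for the app
    env_file:
      - .env                 # API keys and WhatsApp webhook credentials
    depends_on:
      - qdrant               # start the memory store before the agent
  qdrant:
    image: qdrant/qdrant     # vector database used for long-term memory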
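
Once the services are running, Meta verifies the webhook route before delivering events: it sends a GET request carrying hub.mode, hub.verify_token, and hub.challenge, and expects the challenge echoed back. The sketch below shows that handshake in isolation. `verify_subscription` is a hypothetical helper, and how WaNova wires this into /whatsapp_response is an assumption.

```python
import hmac


def verify_subscription(params: dict[str, str], expected_token: str) -> tuple[int, str]:
    """Answer Meta's GET verification request: echo the challenge or reject."""
    mode = params.get("hub.mode")
    token = params.get("hub.verify_token", "")
    challenge = params.get("hub.challenge", "")
    # compare_digest avoids leaking token contents through timing differences.
    token_ok = hmac.compare_digest(token.encode(), expected_token.encode())
    if mode == "subscribe" and token_ok:
        return 200, challenge
    return 403, "Forbidden"
```

The function returns a status code and body rather than a framework response, so it can be called from whichever web framework serves the route. The expected token would come from the webhook credentials in .env.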

Built with proven components

Groq: Llama models + Whisper + vision

Qdrant: long-term memory

GCP: container deployment

LangGraph: workflow routing

ElevenLabs: voice responses (TTS)

Together AI: image generation

How much does it cost?

You can run WaNova on your own computer for free. The free tiers from Groq, ElevenLabs, Qdrant Cloud, and Together AI are typically enough to get started.

If you deploy to Google Cloud Run, you can try it with a free account and starter credits. Even after the free credits run out, Cloud Run is typically inexpensive for experiments.

FAQ

Which WhatsApp message types are supported?

WaNova supports text, audio (voice notes), and images. Any other incoming message type receives a friendly fallback response asking you to send text, audio, or an image.

How does WaNova choose what to do?

A router decides the workflow: conversation, image, or audio. Being explicit about “voice note”, “image”, or “generate image” improves reliability.

Does WaNova keep context?

Yes. It uses short-term state plus long-term memory stored in Qdrant.

Can I run it locally?

Yes. The README includes a local quick start using Docker Compose.

Is voice output optional?

Yes. Voice replies (TTS) are optional and depend on configuration.

Can it generate images?

Yes, when you explicitly request image generation.

What providers does it use?

Groq (chat/vision/Whisper), Qdrant (memory), ElevenLabs (TTS), Together AI (image generation), and LangGraph (routing).

Privacy policy

WaNova processes the messages and media you send so the agent can generate a response. Data flow depends on the providers configured in your deployment.

You control deployment and configuration. If you self-host, you can define your own retention, logging, and access controls for operational data.

This section is a placeholder policy for the demo site. Replace with your project’s formal legal text before production use.

Terms of use

WaNova is provided as-is for experimentation and implementation reference. You are responsible for configuration, monitoring, and compliance in your deployment environment.

Provider services used by WaNova may have separate terms, quotas, and pricing. Review those terms before production rollout.

This section is a placeholder terms block for the single-page demo. Replace with your organization’s approved terms for public release.