# Kaanta Tax Assistant – Usage Guide
This guide explains how to set up and operate the Kaanta Tax Assistant service, which blends a Retrieval-Augmented Generation (RAG) helper with a deterministic Nigerian tax rules engine. You can use it as a CLI tool, run it as a FastAPI microservice, or deploy it to Hugging Face Spaces via the provided Docker image.
## 1. Prerequisites

- Python 3.11 (recommended) for local execution.
- A Groq API key with access to `llama-3.1-8b-instant` (or another model you configure).
- PDF source documents placed under `data/` (or a custom directory) for RAG indexing.
- Basic build chain (`build-essential`, `git`) when building Docker images.
Environment variables (configure them locally in `.env` or as deployment secrets):

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | — | Required for RAG responses (Groq LLM). |
| `EMBED_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Hugging Face embedding model used for FAISS. |
| `GROQ_MODEL` | `llama-3.1-8b-instant` | Groq chat model used by LangChain. |
| `PERSIST_DIR` | `vector_store` | Directory for the cached FAISS index. |

Set variables by editing `.env` or exporting them in your shell before running the service.
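A minimal sketch of how these settings resolve at startup (assuming `python-dotenv` is installed; the variable names and defaults come from the table above, everything else is illustrative):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is available

load_dotenv()  # pull variables from a local .env file, if present

# Fall back to the documented defaults when a variable is unset.
GROQ_API_KEY = os.getenv("GROQ_API_KEY")  # no default: required for RAG
EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
GROQ_MODEL = os.getenv("GROQ_MODEL", "llama-3.1-8b-instant")
PERSIST_DIR = os.getenv("PERSIST_DIR", "vector_store")

if not GROQ_API_KEY:
    raise RuntimeError("GROQ_API_KEY is not set; RAG responses will fail.")
```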
## 2. Install Dependencies

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```
The requirements file installs FastAPI, LangChain, FAISS CPU bindings, Groq client, Hugging Face tooling, and supporting scientific libraries.
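To sanity-check the installation, the key imports should all resolve inside the activated environment (import names correspond to the packages listed above):

```python
# Smoke test: each import should succeed after `pip install -r requirements.txt`.
import faiss      # provided by the faiss-cpu package
import fastapi
import groq
import langchain

print("fastapi", fastapi.__version__, "| langchain", langchain.__version__)
```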
## 3. Preparing Data for RAG

- Place your PDF references beneath `data/`. Nested folders are supported.
- The first run will build or refresh the FAISS store under `vector_store/`. The hashing routine skips rebuilding unless the PDFs change (sketched below).
- If you already have a prepared FAISS index, drop it into `vector_store/` and set `PERSIST_DIR` accordingly.

Tip: If you deploy to Hugging Face Spaces, consider committing the populated `vector_store/` to avoid long cold starts.
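The change-detection idea is simple: fingerprint the PDF corpus and rebuild only when the fingerprint moves. A minimal sketch of that logic (this is not the repository's actual implementation; the marker file name is hypothetical):

```python
import hashlib
from pathlib import Path

def pdf_fingerprint(data_dir: str = "data") -> str:
    """Hash the bytes of every PDF under data_dir, nested folders included."""
    digest = hashlib.sha256()
    for pdf in sorted(Path(data_dir).rglob("*.pdf")):
        digest.update(pdf.read_bytes())
    return digest.hexdigest()

def index_is_stale(persist_dir: str = "vector_store") -> bool:
    """Rebuild only when the stored fingerprint differs from the current one."""
    marker = Path(persist_dir) / "pdf_hash.txt"  # hypothetical marker file
    return not marker.exists() or marker.read_text() != pdf_fingerprint()
```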
## 4. Running the FastAPI Service Locally

```bash
uvicorn orchestrator:app --host 0.0.0.0 --port 8000
```

Endpoints:

- `GET /` – service metadata and readiness flags.
- `GET /health` – lightweight health probe.
- `POST /v1/query` – main orchestration endpoint.
Example request:
```bash
curl -X POST http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Compute PAYE for gross income 1,500,000",
    "inputs": {"gross_income": 1500000}
  }'
```
Illustrative response (`rag_only` shape omitted):

```json
{
  "mode": "calculate",
  "as_of": "2025-10-15",
  "tax_type": "PIT",
  "summary": {"tax_due": 12345.0},
  "lines": [
    {
      "rule_id": "pit_band_1",
      "title": "First band",
      "amount": 5000.0,
      "output": "tax_due",
      "details": {"base": 300000.0, "rate": 0.07},
      "authority": [{"doc": "PITA", "section": "S.3"}],
      "quote": "Optional short excerpt pulled via RAG."
    }
  ]
}
```
Swagger UI and ReDoc are automatically exposed at `/docs` and `/redoc`.
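Before issuing queries, you can poll the probes listed above. A small readiness check with `requests` (the base URL and retry timing are illustrative):

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Wait up to ~30 s for the service to come up, then inspect its metadata.
for _ in range(30):
    try:
        if requests.get(f"{BASE_URL}/health", timeout=2).ok:
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)
else:
    raise SystemExit("Service did not become healthy in time.")

print(requests.get(f"{BASE_URL}/", timeout=5).json())  # readiness flags
```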
## 5. Using the CLI Router (Orchestrator)
Although the FastAPI service is now the main entry point, you can still invoke the orchestrator CLI:
```bash
python orchestrator.py \
  --question "How much VAT should I pay on 2,000,000 turnover?" \
  --tax-type VAT \
  --jurisdiction federal \
  --inputs-json fixtures/vat_example.json
```
This will print the same JSON payload returned by the HTTP API.
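To consume that payload from a script rather than the terminal, one option is to shell out and parse stdout (a sketch; it assumes the CLI writes only the JSON document to stdout):

```python
import json
import subprocess

result = subprocess.run(
    [
        "python", "orchestrator.py",
        "--question", "How much VAT should I pay on 2,000,000 turnover?",
        "--tax-type", "VAT",
        "--jurisdiction", "federal",
        "--inputs-json", "fixtures/vat_example.json",
    ],
    capture_output=True,
    text=True,
    check=True,  # raise if the CLI exits non-zero
)
payload = json.loads(result.stdout)
print(payload["mode"])
```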
## 6. Docker Workflow

Build the container:

```bash
docker build -t kaanta-tax-api .
```

Run locally:

```bash
docker run --rm -p 7860:7860 \
  -e GROQ_API_KEY=your_key_here \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/vector_store:/app/vector_store" \
  kaanta-tax-api
```
The container starts Uvicorn on port 7860 (the port Hugging Face Spaces expects). Mounting `data/` and `vector_store/` lets you reuse local assets.
## 7. Deploying to Hugging Face Spaces

- Create a Space and select the Docker runtime.
- Add a Space secret `GROQ_API_KEY`.
- Push the repository contents (including the `Dockerfile`, PDFs, and the optional FAISS cache).
- Spaces builds automatically from the `Dockerfile`.

The deployed API will be reachable at `https://<space-name>.hf.space/v1/query`.
## 8. Integrating as an HTTP Microservice
Example Python client:
```python
import requests

BASE_URL = "https://<space-name>.hf.space"

payload = {
    "question": "What is the PAYE liability for 1.5M NGN salary?",
    "inputs": {"gross_income": 1_500_000},
}

resp = requests.post(f"{BASE_URL}/v1/query", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```
Prefer a ready-made CLI? Run `python client_demo.py --question "..." --input gross_income=1500000` to hit a live instance (defaults to `https://eniiyanu-kaanta.hf.space`; override with `--base-url`). Pass `--hf-token <hf_xxx>` if your Space is private.
Handle both `rag_only` and `calculate` response shapes in your downstream services.
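A minimal dispatch on the `mode` field (the `calculate` fields follow the sample response in section 4; the `rag_only` shape is not reproduced in this guide, so it is passed through untouched):

```python
def handle_response(payload: dict) -> None:
    """Route on the response mode; field names follow the documented sample."""
    if payload.get("mode") == "calculate":
        print(f"Tax due: {payload['summary']['tax_due']:.2f}")
        for line in payload.get("lines", []):
            print(f"  {line['rule_id']}: {line['title']} -> {line['amount']}")
    elif payload.get("mode") == "rag_only":
        # Shape omitted in this guide; forward the raw payload downstream.
        print(payload)
    else:
        raise ValueError(f"Unexpected response mode: {payload.get('mode')}")

handle_response(resp.json())  # resp from the client example above
```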
## 9. Troubleshooting

- **RAG not initialized:** Ensure PDFs exist in `data/`, `GROQ_API_KEY` is valid, and the Groq service is reachable.
- **FAISS build errors:** Delete `vector_store/` and rerun; check that `faiss-cpu` installed correctly.
- **Model timeouts:** Set `with_rag_quotes_on_calc` to `false` for calculator-only paths, or experiment with smaller `top_k` values in `rag_pipeline.py`.
- **Docker build failures on arm64:** Switch to a base image that supports FAISS on your architecture, or prebuild the FAISS index elsewhere.
With this workflow, you can run Kaanta locally, ship it via Docker to Hugging Face, and consume it as a microservice or CLI tool depending on your needs.