
Kaanta Tax Assistant – Usage Guide

This guide explains how to set up and operate the Kaanta Tax Assistant service, which blends a Retrieval-Augmented Generation (RAG) helper with a deterministic Nigerian tax rules engine. You can use it as a CLI tool, run it as a FastAPI microservice, or deploy it to Hugging Face Spaces via the provided Docker image.


1. Prerequisites

  • Python 3.11 (recommended) for local execution.
  • A Groq API key with access to llama-3.1-8b-instant (or another model you configure).
  • PDF source documents placed under data/ (or a custom directory) for RAG indexing.
  • Basic build chain (build-essential, git) when building Docker images.

Environment variables (configure locally in .env or as deployment secrets):

| Variable | Default | Description |
| --- | --- | --- |
| GROQ_API_KEY | (none; required) | Required for RAG responses (Groq LLM). |
| EMBED_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Hugging Face embedding model for FAISS. |
| GROQ_MODEL | llama-3.1-8b-instant | Groq chat model used by LangChain. |
| PERSIST_DIR | vector_store | Directory for the cached FAISS index. |

Set variables by editing .env or exporting them in your shell before running the service.
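To illustrate how these variables and their defaults fit together, here is a hypothetical startup helper (load_config is not part of the codebase; the names and defaults come from the table above):

```python
import os

def load_config(env=os.environ) -> dict:
    """Resolve Kaanta settings from environment variables,
    falling back to the documented defaults."""
    return {
        # Required; there is no default, so this may be None if unset.
        "groq_api_key": env.get("GROQ_API_KEY"),
        "embed_model": env.get(
            "EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        "groq_model": env.get("GROQ_MODEL", "llama-3.1-8b-instant"),
        "persist_dir": env.get("PERSIST_DIR", "vector_store"),
    }
```

Passing the environment mapping as a parameter also makes the resolution easy to test in isolation.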


2. Install Dependencies

python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

The requirements file installs FastAPI, LangChain, FAISS CPU bindings, Groq client, Hugging Face tooling, and supporting scientific libraries.


3. Preparing Data for RAG

  1. Place your PDF references beneath data/. Nested folders are supported.
  2. The first run will build or refresh the FAISS store under vector_store/. The hashing routine skips rebuilding unless the PDFs change.
  3. If you already have a prepared FAISS index, drop it into vector_store/ and set PERSIST_DIR accordingly.

Tip: If you deploy to Hugging Face Spaces, consider committing the populated vector_store/ to avoid long cold-starts.


4. Running the FastAPI Service Locally

uvicorn orchestrator:app --host 0.0.0.0 --port 8000

Endpoints:

  • GET / – service metadata and readiness flags.
  • GET /health – lightweight health probe.
  • POST /v1/query – main orchestration endpoint.

Example request:

curl -X POST http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d '{
        "question": "Compute PAYE for gross income 1,500,000",
        "inputs": {"gross_income": 1500000}
      }'

Illustrative response (rag_only shape omitted):

{
  "mode": "calculate",
  "as_of": "2025-10-15",
  "tax_type": "PIT",
  "summary": {"tax_due": 12345.0},
  "lines": [
    {
      "rule_id": "pit_band_1",
      "title": "First band",
      "amount": 5000.0,
      "output": "tax_due",
      "details": {"base": 300000.0, "rate": 0.07},
      "authority": [{"doc": "PITA", "section": "S.3"}],
      "quote": "Optional short excerpt pulled via RAG."
    }
  ]
}

Swagger UI and ReDoc are automatically exposed at /docs and /redoc.
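The GET /health probe is handy for gating downstream work until the service is up. A small stdlib-only polling sketch (the localhost URL is an assumption; point it at your instance):

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def is_healthy(base_url: str = BASE_URL) -> bool:
    """Single probe of the GET /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(base_url: str = BASE_URL, timeout_s: float = 30.0) -> bool:
    """Poll /health once per second until it responds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_healthy(base_url):
            return True
        time.sleep(1.0)
    return False
```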


5. Using the CLI Router (Orchestrator)

Although the FastAPI service is now the main entry point, you can still invoke the orchestrator CLI:

python orchestrator.py \
  --question "How much VAT should I pay on 2,000,000 turnover?" \
  --tax-type VAT \
  --jurisdiction federal \
  --inputs-json fixtures/vat_example.json

This will print the same JSON payload returned by the HTTP API.


6. Docker Workflow

Build the container:

docker build -t kaanta-tax-api .

Run locally:

docker run --rm -p 7860:7860 \
  -e GROQ_API_KEY=your_key_here \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/vector_store:/app/vector_store" \
  kaanta-tax-api

The container starts Uvicorn on port 7860 (the port Hugging Face Spaces expects). Mounting data/ and vector_store/ lets you reuse local assets.


7. Deploying to Hugging Face Spaces

  1. Create a Space, select Docker runtime.
  2. Add a Space secret GROQ_API_KEY.
  3. Push repository contents (including Dockerfile, PDFs, optional FAISS cache).
  4. Spaces builds automatically from the Dockerfile.

The deployed API will be reachable at https://<space-name>.hf.space/v1/query.


8. Integrating as an HTTP Microservice

Example Python client:

import requests

BASE_URL = "https://<space-name>.hf.space"

payload = {
    "question": "What is the PAYE liability for 1.5M NGN salary?",
    "inputs": {"gross_income": 1_500_000}
}

resp = requests.post(f"{BASE_URL}/v1/query", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())

Prefer a ready-made CLI? Run python client_demo.py --question "..." --input gross_income=1500000 to hit a live instance. The target defaults to https://eniiyanu-kaanta.hf.space; override it with --base-url, and pass --hf-token <hf_xxx> if your Space is private.

Handle both rag_only and calculate response shapes in your downstream services.
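A hedged sketch of that dispatch, using the field names from the illustrative calculate response above (the rag_only payload is assumed here to carry an "answer" text field, which may differ in practice):

```python
def handle_response(body: dict) -> str:
    """Branch on the two documented response shapes."""
    mode = body.get("mode")
    if mode == "calculate":
        # Deterministic rules-engine result: surface the summary figure.
        tax_due = body["summary"]["tax_due"]
        return f"Tax due: {tax_due:,.2f}"
    if mode == "rag_only":
        # Assumed field name for the free-text RAG answer.
        return body.get("answer", "")
    raise ValueError(f"Unexpected response mode: {mode!r}")
```

Raising on an unknown mode makes future API changes fail loudly rather than being silently dropped.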


9. Troubleshooting

  • RAG not initialized: Ensure PDFs exist in data/, GROQ_API_KEY is valid, and the Groq service is reachable.
  • FAISS build errors: Delete vector_store/ and rerun; check that faiss-cpu installed correctly.
  • Model timeouts: Set with_rag_quotes_on_calc to false for calculator-only paths, or experiment with smaller top_k values in rag_pipeline.py.
  • Docker build failures on arm64: Switch to a base image that supports FAISS for your architecture or prebuild the FAISS index elsewhere.

With this workflow, you can run Kaanta locally, ship it via Docker to Hugging Face, and consume it as a microservice or CLI tool depending on your needs.