Blog – Kakveda

Updates

Kakveda v1.0 Released

February 2026

First stable release with complete failure intelligence platform!

Why Failure Memory Matters

Coming soon

Kakveda v1.0 Features (Layman Friendly)

Simple, practical explanations of what each feature does.

1) Preflight Checks

Before a run starts, Kakveda does a quick safety and readiness check, like a pilot checking the plane before takeoff. It verifies that the app, model, and policies are available so you don’t waste time on broken runs.

2) GFKB (Failure Knowledge Base)

A smart “memory” of past failures so the system can recognize repeat problems and avoid them. Think of it like a checklist of “what went wrong before” that helps the next run be safer.

3) Event Bus

All services talk through a common pipeline so signals and alerts reach the right place fast. This means warnings, failures, and traces are collected in one stream instead of being scattered.

4) Warning Policy

Rules that flag risky or unsafe prompts and responses before they cause damage. For example, if a prompt asks for unsafe advice, it gets marked immediately.

5) Failure Classifier

Automatically labels what went wrong (safety, hallucination, policy, etc.) so teams can fix faster. It’s like a “type of issue” tag that makes debugging quicker.

6) Pattern Detector

Finds repeating failure patterns across runs to show the real root cause. If the same issue appears many times, you’ll see it as a clear trend.

7) Health Scoring

Gives each app a health score so you can see at a glance if it is safe and stable. A lower score means it needs attention, just like a health report card.

8) Scenarios & Playbooks

Ready-made scenario tests to check behavior in common risky situations. Example: “What if a child asks for unsafe content?” You can test it quickly.

9) Datasets & Examples

Store example prompts and expected behavior to test models consistently. These become your benchmark questions for quality checks.

10) Eval Runs

Run automated evaluations on datasets to measure pass/fail and quality. You can see how many examples passed and where the model failed.

11) Experiments

Compare multiple prompt or model setups and pick the best one. It’s like A/B testing for AI behavior.

12) Runs & Trace Details

See full traces of each run, with warnings and results for quick debugging. You can inspect inputs, outputs, and timing in one place.

13) Prompt Library

Save and reuse prompts with versions so teams stay consistent. Teams can roll back to older prompts if new ones perform poorly.

14) Dashboard & Alerts

Central dashboard to view failures, patterns, warnings, and health trends. It’s the single screen to monitor the whole system.

15) Audit Logs & Roles

Tracks who did what, and lets admins control access with roles. Useful for compliance and team accountability.

16) Docker-Ready Deployment

Runs in containers for easy setup on servers or local machines. You can deploy quickly without complex setup steps.

हर फीचर का आसान और व्यावहारिक अर्थ।

1) प्रीफ्लाइट चेक

रन शुरू होने से पहले सुरक्षा और तैयारी की जाँच—जैसे उड़ान से पहले विमान की जाँच। इससे पता चलता है कि ऐप, मॉडल और पॉलिसी तैयार हैं या नहीं।

2) GFKB (फेल्यर नॉलेज बेस)

पिछली गलतियों की “मेमोरी” ताकि वही गलती फिर न हो। यह पुराने फेल्यर से सीख कर नए रन को सुरक्षित बनाता है।

3) इवेंट बस

सभी सर्विस एक कॉमन पाइपलाइन से जुड़ती हैं ताकि सही अलर्ट सही जगह जाए। इससे वार्निंग और ट्रेस एक जगह पर मिलते हैं।

4) वार्निंग पॉलिसी

जोखिम वाले प्रॉम्प्ट और रिस्पॉन्स को पहले ही फ्लैग कर देती है। जैसे गलत सलाह या unsafe कंटेंट तुरंत पकड़ा जाता है।

5) फेल्यर क्लासिफायर

समस्या का प्रकार बताता है (सेफ्टी, हेलुसिनेशन, पॉलिसी आदि)। इससे सही टीम जल्दी एक्शन ले सकती है।

6) पैटर्न डिटेक्टर

बार-बार आने वाली गलतियों के पैटर्न पकड़ता है ताकि असली वजह मिले। बार-बार की गलती एक ट्रेंड की तरह दिखती है।

7) हेल्थ स्कोरिंग

हर ऐप को हेल्थ स्कोर देता है ताकि स्थिति तुरंत समझ आए। कम स्कोर मतलब ज्यादा रिस्क या सुधार की ज़रूरत।

8) सिनेरियो और प्लेबुक

रेडी टेस्ट केस जो जोखिम वाली स्थितियों में व्यवहार जांचते हैं। जैसे “अगर बच्चा unsafe चीज़ पूछे तो क्या होगा।”

9) डाटासेट्स और उदाहरण

उदाहरण प्रॉम्प्ट और अपेक्षित व्यवहार सेव करके टेस्ट स्टैन्डर्ड बनाते हैं। यही आपके बेंचमार्क सवाल होते हैं।

10) इवैल्युएशन रन

डाटासेट पर ऑटोमेटेड टेस्ट—पास/फेल और क्वालिटी मापने के लिए। हर उदाहरण का रिज़ल्ट साफ़ दिखता है।

11) एक्सपेरिमेंट्स

अलग-अलग प्रॉम्प्ट/मॉडल तुलना कर के बेस्ट चुनना। यह AI के लिए A/B टेस्ट जैसा है।

12) रन और ट्रेस डिटेल्स

हर रन का पूरा ट्रेस, वार्निंग और रिज़ल्ट के साथ। इनपुट, आउटपुट और टाइमिंग सब दिखता है।

13) प्रॉम्प्ट लाइब्रेरी

प्रॉम्प्ट सेव करें, वर्ज़न रखें, टीम में कंसिस्टेंसी बनी रहती है। जरूरत पड़ने पर पुराने प्रॉम्प्ट पर लौट सकते हैं।

14) डैशबोर्ड और अलर्ट्स

एक जगह पर फेल्यर, पैटर्न, वार्निंग और हेल्थ ट्रेंड्स। पूरी सिस्टम की स्थिति एक स्क्रीन पर मिलती है।

15) ऑडिट लॉग्स और रोल्स

किसने क्या किया, उसका रिकॉर्ड और रोल-बेस्ड एक्सेस कंट्रोल। कंप्लायंस और जवाबदेही के लिए ज़रूरी।

16) डॉकर-रेडी डिप्लॉयमेंट

कंटेनर के जरिए आसान सेटअप—लोकल या सर्वर पर। बिना बड़े कॉन्फ़िगरेशन के जल्दी डिप्लॉय हो जाता है।