Changelog
Engineering updates, model releases, and platform improvements for Achiral AI. All changes follow Semantic Versioning.
v1.0.0
Engineering · 2025-12-05
Chiro Infrastructure Launch
Unified Multi-Tenant AI Infrastructure
Migrated from a per-tenant Kubernetes pod architecture to shared inference with dedicated Weaviate vector stores. Each organization now gets logical isolation via native Weaviate multi-tenancy, with an automatic upgrade to pod isolation for enterprise workloads.
Technical Stack
- Model: Chiro (state-of-the-art language model)
- Inference: shared inference service (OpenAI-compatible API)
- Vector Store: Weaviate with multi-tenant collections
- Fine-tuning: LoRA adapters with hot-loading
- Development: local development inference with automatic API detection
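Because the inference service is OpenAI-compatible, clients only need to resolve the right base URL. A minimal sketch of the endpoint-detection logic (the environment variable names and URLs here are hypothetical, not part of the release):

```python
import os

def resolve_inference_url(default="https://api.achiral.ai/v1"):
    """Pick an inference endpoint: prefer an explicitly configured URL,
    fall back to a local development server when dev mode is flagged,
    otherwise use the shared production endpoint.

    ACHIRAL_API_BASE and ACHIRAL_LOCAL_DEV are illustrative names only."""
    explicit = os.environ.get("ACHIRAL_API_BASE")
    if explicit:
        return explicit
    if os.environ.get("ACHIRAL_LOCAL_DEV") == "1":
        return "http://localhost:8000/v1"
    return default
```

Any OpenAI-compatible client can then be pointed at the resolved URL without code changes between local and production environments.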
Tier-Based Quotas
- Spark: 2048 tokens/req, 10k tokens/hr, 5 concurrent requests
- Seed: 4096 tokens/req, 100k tokens/hr, 20 concurrent requests
- Growth: 8192 tokens/req, unlimited tokens/hr, 100 concurrent requests
- Dedicated: 16384 tokens/req, unlimited tokens/hr, unlimited concurrency (automatic pod isolation at $10k+/month)
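The tier table above maps directly onto an admission check. A minimal sketch of how a gateway might enforce these quotas (the function and its metric arguments are illustrative, not the shipped implementation):

```python
# Quota table from the changelog: (max tokens/request, tokens/hour, concurrency).
# None means unlimited.
TIER_QUOTAS = {
    "spark":     (2048,  10_000,  5),
    "seed":      (4096,  100_000, 20),
    "growth":    (8192,  None,    100),
    "dedicated": (16384, None,    None),
}

def admit(tier, req_tokens, hourly_used, in_flight):
    """Return True if a request fits within the tier's quotas.

    `hourly_used` and `in_flight` are hypothetical counters a gateway
    would track per tenant."""
    per_req, per_hour, concurrent = TIER_QUOTAS[tier]
    if req_tokens > per_req:
        return False
    if per_hour is not None and hourly_used + req_tokens > per_hour:
        return False
    if concurrent is not None and in_flight >= concurrent:
        return False
    return True
```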
RAG Pipeline
512-token document chunks with 50-token overlap, hybrid BM25 + vector semantic search, and automatic context injection before inference.
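The chunking step can be sketched as a sliding window over the token sequence, mirroring the 512/50 scheme above (a real pipeline would operate on the model tokenizer's output rather than raw lists):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token sequence into `size`-token chunks where each chunk
    repeats the last `overlap` tokens of the previous one, so no span
    of context is lost at a chunk boundary."""
    step = size - overlap          # window advances by 462 tokens by default
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break                  # final (possibly short) chunk emitted
    return chunks
```

Each chunk is then embedded into the tenant's Weaviate collection, where it is retrievable by both BM25 and vector search.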
Pod Isolation Triggers
Dedicated tier ($10k+/month), quota utilization above 80% sustained for 1 hour, compliance requirements (HIPAA/SOC 2), or P99 latency above 500 ms for 30 minutes.
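These triggers reduce to a simple predicate over per-tenant metrics. A minimal sketch, assuming a hypothetical metrics dict (the key names are illustrative; the thresholds mirror the changelog):

```python
def should_isolate(tenant):
    """Return True if any pod-isolation trigger fires for a tenant.

    `tenant` is a hypothetical dict of current metrics, e.g. produced
    by a monitoring poller."""
    return (
        tenant.get("monthly_spend_usd", 0) >= 10_000                 # Dedicated-tier spend
        or (tenant.get("quota_utilization", 0.0) >= 0.80             # sustained quota pressure
            and tenant.get("quota_high_minutes", 0) >= 60)
        or tenant.get("compliance") in {"HIPAA", "SOC2"}             # compliance requirement
        or (tenant.get("p99_latency_ms", 0) > 500                    # sustained latency breach
            and tenant.get("latency_breach_minutes", 0) >= 30)
    )
```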
Upcoming Features
Multi-Model Support
Specialized model routing for healthcare, software development, and efficiency-focused tiers
Model Registry
VRAM capacity management for 128GB GX10 GPU with intelligent model loading
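VRAM-budgeted loading could be sketched as an LRU cache keyed by model size (model names and sizes below are hypothetical; this is a speculative illustration of an upcoming feature, not its implementation):

```python
from collections import OrderedDict

class ModelRegistry:
    """Evict least-recently-used models when a new load would exceed
    the GPU's VRAM budget (128 GB on the GX10 mentioned above)."""

    def __init__(self, capacity_gb=128):
        self.capacity_gb = capacity_gb
        self.loaded = OrderedDict()  # name -> size_gb, oldest-used first

    def load(self, name, size_gb):
        """Ensure `name` is resident; return the list of evicted models."""
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} ({size_gb} GB) exceeds VRAM capacity")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as recently used
            return []
        evicted = []
        while sum(self.loaded.values()) + size_gb > self.capacity_gb:
            old, _ = self.loaded.popitem(last=False)  # drop LRU model
            evicted.append(old)
        self.loaded[name] = size_gb
        return evicted
```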
Observability Stack
Prometheus metrics and Grafana dashboards for real-time tenant monitoring
Training Worker Pool
Distributed LoRA fine-tuning on llm-wrk-3/4 nodes for production-scale training