Storage & Compute Practice Manager / AI Pod Architect at Business Technology Architects (BTA), Alpharetta, GA
Presales Engineer building AI demos and proof-of-concept systems at BTA. Design, build, and document production-grade AI inference platforms from bare metal to deployment. Focus on reproducible infrastructure, comprehensive training development, and enterprise integration. Lead architecture and deployment of AI infrastructure for Cisco Partners, centered on Cisco UCS X-Series, NVIDIA GPU clusters, and Pure Storage platforms.
Enterprise AI Inference Platform
Mission-Critical Training System: Designed and built a complete AI inference platform enabling Cisco Partners worldwide to deploy private, air-gapped LLM systems for enterprise customers in regulated industries (healthcare, finance, government, defense). The Git repository is the product: Partners clone it, run bootstrap.sh, and deploy working AI infrastructure in under 1 hour with no tribal knowledge required.
AI Technical Skills
LLM Deployment & Inference
- vLLM production deployment with quantization, tensor parallelism, and memory optimization (see the sketch after this list)
- Multi-model platform with hot-swap model switching (Llama, Mistral, DeepSeek, Qwen families)
- OpenAI-compatible APIs for LangChain, LlamaIndex, and SDK integration
- RAG architecture: embeddings, vector databases, chunking strategies
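A minimal sketch of the kind of vLLM launch behind the first bullet above: an offline engine with tensor parallelism and quantization enabled. The model name, GPU count, and memory fraction are illustrative placeholders, not the production configuration.

```python
# Illustrative vLLM engine launch: tensor parallelism + AWQ quantization.
# Model id, GPU count, and memory fraction are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/llama-3.1-8b-instruct-awq",  # hypothetical AWQ-quantized checkpoint
    tensor_parallel_size=2,                      # shard weights across 2 GPUs
    gpu_memory_utilization=0.90,                 # cap weights + KV cache at 90% of VRAM
    quantization="awq",                          # must match the checkpoint's quantization format
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize the platform's deployment workflow."], params)
print(outputs[0].outputs[0].text)
```

In production the same engine is normally exposed as an OpenAI-compatible HTTP service (vllm serve / the api_server entrypoint) rather than called in-process; the in-process form above keeps the sketch self-contained.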
AI Platform Development
- ChatGPT-like interfaces with streaming responses and conversation management
- Real-time dashboards: GPU metrics, inference stats, cost tracking
- Flask APIs with OpenAPI documentation and audit logging
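The streaming interface and Flask API bullets above combine in a pattern like the following sketch: a Flask endpoint that relays tokens from the local OpenAI-compatible inference server as Server-Sent Events. The endpoint URL and model name are assumptions for illustration.

```python
# Sketch: Flask chat endpoint streaming tokens from a local OpenAI-compatible
# vLLM server as Server-Sent Events. URL and model name are illustrative.
from flask import Flask, Response, request, stream_with_context
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local inference server

@app.post("/chat")
def chat():
    messages = request.get_json()["messages"]

    def generate():
        stream = client.chat.completions.create(
            model="local-model",              # hypothetical served model name
            messages=messages,
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield f"data: {delta}\n\n"    # one SSE frame per token chunk
        yield "data: [DONE]\n\n"

    return Response(stream_with_context(generate()), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=5000)
```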
Infrastructure & Automation
- One-command deployment: Docker → Kubernetes → GPU Operator → vLLM
- Kubernetes for AI: GPU scheduling, resource quotas, Helm charts
- Observability: Prometheus, Grafana, GPU telemetry
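As a concrete example of the observability item, GPU telemetry can be exported to Prometheus with a small sidecar like the sketch below (NVML readings exposed as gauges, scraped by Prometheus and graphed in Grafana). The port and metric names are illustrative choices, not the platform's actual ones.

```python
# Sketch: GPU telemetry exporter. Reads utilization and memory via NVML and
# exposes them as Prometheus gauges on :9200/metrics. Names and port are illustrative.
import time

import pynvml
from prometheus_client import Gauge, start_http_server

GPU_UTIL = Gauge("gpu_utilization_percent", "GPU utilization", ["gpu"])
GPU_MEM = Gauge("gpu_memory_used_bytes", "GPU memory in use", ["gpu"])

def collect_forever(interval_s: float = 5.0) -> None:
    pynvml.nvmlInit()
    count = pynvml.nvmlDeviceGetCount()
    while True:
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            GPU_UTIL.labels(gpu=str(i)).set(util.gpu)
            GPU_MEM.labels(gpu=str(i)).set(mem.used)
        time.sleep(interval_s)

if __name__ == "__main__":
    start_http_server(9200)   # Prometheus scrape target
    collect_forever()
```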
Security & Compliance
- Air-gapped deployment for regulated industries (HIPAA, SOC 2, FedRAMP)
- Multi-tenant architecture with namespace isolation and RBAC (see the sketch after this list)
- Data sovereignty: zero external transmission, local-only storage
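The multi-tenant bullet above typically reduces to per-tenant namespaces plus ResourceQuotas and RoleBindings. A minimal sketch with the Kubernetes Python client follows; the tenant name and quota values are assumptions, not customer configuration.

```python
# Sketch: per-tenant isolation via a dedicated namespace and a ResourceQuota
# capping GPU/CPU/memory. Tenant name and limits are illustrative.
from kubernetes import client, config

config.load_kube_config()            # or config.load_incluster_config() in-cluster
core = client.CoreV1Api()

tenant = "partner-acme"              # hypothetical tenant name

core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=tenant))
)

core.create_namespaced_resource_quota(
    namespace=tenant,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name=f"{tenant}-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={
                "requests.nvidia.com/gpu": "2",   # cap the tenant at 2 GPUs
                "requests.cpu": "16",
                "requests.memory": "64Gi",
            }
        ),
    ),
)
```

RBAC follows the same pattern: namespace-scoped Roles and RoleBindings created per tenant.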
Key Outcomes & Impact
- One-Command AI Deployment: Partners deploy complete AI infrastructure from git clone in under 1 hour with zero tribal knowledge required. 100% success rate across all deployments.
- Multi-Model Production Platform: Enterprise-ready system with dynamic model switching for diverse use cases.
- ChatGPT-Like User Experience: Professional web interface with streaming responses, conversation management, live GPU metrics, and real-time cost tracking.
- OpenAI-Compatible API: Drop-in replacement for hosted OpenAI/ChatGPT integrations; works with LangChain, LlamaIndex, and any OpenAI SDK, enabling immediate adoption without code changes (see the sketch after this list).
- Team Enablement: Mentored engineers on AI infrastructure; created hands-on training programs. All processes documented - zero dependency on tribal knowledge.
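To illustrate the drop-in claim from the OpenAI-Compatible API item above, existing client code only needs its base URL repointed at the local platform; for example, with LangChain (the endpoint and model name are hypothetical):

```python
# Sketch: LangChain pointed at the local OpenAI-compatible endpoint instead of
# api.openai.com. Endpoint URL and model name are illustrative placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://ai-platform.internal:8000/v1",  # hypothetical internal endpoint
    api_key="not-needed",                            # no external key; data stays on-prem
    model="llama-3.1-8b-instruct",                   # hypothetical served model name
)

print(llm.invoke("Summarize this quarter's change requests in two sentences.").content)
```

The plain OpenAI SDK, LlamaIndex, and other OpenAI-compatible clients integrate the same way: swap the base URL and keep the rest of the code.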