Job Title: AI QA ENGINEER (AGENTIC & GENERATIVE)
Job Location: Dallas TX
Job Type: Contract
Job Description:
- Quality Strategy & Leadership
- Agentic & Multi-Agent Testing
- Reliability, Resiliency, and Latency
- Accuracy & Macro-Level Validations
- Scale & Orchestration
- Dev/Prod Readiness
- Define and own the QA strategy for agentic/multi-agent AI systems across dev, staging, and prod.
- Mentor a team of QA engineers; establish testing standards, coding guidelines for test harnesses, and review practices.
- Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA in the SDLC and incident response.
- Design tests for agent orchestration, tool calling, planner-executor loops, and inter-agent coordination (e.g., task decomposition, handoff integrity, and convergence to goals).
- Validate state management, context windows, memory/knowledge stores, and prompt/graph correctness under varying conditions.
- Implement scenario fuzzing (e.g., adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).
- Create resilience testing suites: chaos experiments, failover, retries/backoff, circuit breaking, and degraded-mode behavior.
- Establish latency SLOs and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues).
- Ensure reliability through soak tests, canary verifications, and automated rollbacks.
- Define ground-truth and reference pipelines for task accuracy (exact match, semantic similarity, factuality checks); an illustrative sketch follows this list.
- Build macro-validation frameworks that verify task outcomes across multi-step agent workflows (e.g., complex data pipelines, content generation, verification agent loops).
- Instrument guardrail validations (toxicity, PII, hallucination, policy compliance).
- Design load/stress tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure).
- Validate orchestrator correctness (DAG execution, retries, branching, timeouts, compensation paths).
- Engineer reusable test artifacts (scenario configs, synthetic datasets, prompt libraries, agent-graph fixtures, simulators).
- Integrate tests into CI/CD (pre-merge gates, nightly runs, canaries) and production monitoring with alerting tied to KPIs.
- Define release criteria and run operational readiness reviews (performance, security, compliance, cost/latency budgets).
- Build post-deployment validation playbooks and incident triage runbooks.
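For illustration only, a minimal sketch of the kind of task-accuracy check referenced above: it pairs exact match with embedding-based semantic similarity against a ground-truth reference. It assumes the open-source sentence-transformers library; the model name, threshold, and function names are placeholder choices, not part of this posting.

    # Illustrative only: exact match plus embedding-based semantic similarity
    # against a ground-truth reference. Model, threshold, and names are
    # placeholder assumptions.
    from sentence_transformers import SentenceTransformer, util

    _model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    def semantic_match(candidate: str, reference: str, threshold: float = 0.85) -> bool:
        """Return True if the agent output matches the reference exactly or semantically."""
        if candidate.strip() == reference.strip():        # exact-match short circuit
            return True
        emb = _model.encode([candidate, reference], convert_to_tensor=True)
        score = util.cos_sim(emb[0], emb[1]).item()       # cosine similarity
        return score >= threshold

    # Example assertion as it might appear in a pre-merge test suite:
    assert semantic_match("Paris is the capital of France.",
                          "The capital of France is Paris.")

A check like this could run as a pre-merge gate or nightly job, with the threshold tuned per task.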
Required Qualifications
- 7 years in Software QA/Testing, with 2 years in AI/ML or LLM-based systems; hands-on experience testing agentic/multi-agent architectures.
- Strong programming skills in Python or TypeScript/JavaScript; experience building test harnesses, simulators, and fixtures.
- Experience with LLM evaluation (exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings), guardrails, and prompt testing.
- Expertise in distributed-systems testing, latency profiling, resiliency patterns (circuit breakers, retries), chaos engineering, and message queues; an illustrative sketch follows this list.
- Familiarity with orchestration frameworks (LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar).
- Proficiency with CI/CD (GitHub Actions, Azure DevOps), observability (OpenTelemetry, Prometheus/Grafana, Datadog), and feature flags/canaries.
- Solid understanding of privacy/security/compliance in AI systems (PII handling, content policies, model safety).
- Excellent communication and leadership skills; proven ability to work cross-functionally with Ops, Data, and Engineering.
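As a further illustration of the resiliency patterns and latency SLOs mentioned above, a stdlib-only Python sketch that injects deterministic tool failures, exercises retry-with-backoff, and asserts a hypothetical end-to-end latency budget; all names and numbers are placeholders, not values from this posting.

    # Illustrative only: deterministic fault injection, retry-with-backoff,
    # and a hypothetical end-to-end latency SLO check.
    import time

    class FlakyTool:
        """Simulated downstream tool: fails the first `failures` calls, then succeeds."""
        def __init__(self, failures: int = 2, latency_s: float = 0.05) -> None:
            self.failures = failures
            self.latency_s = latency_s
            self.calls = 0

        def __call__(self) -> str:
            self.calls += 1
            time.sleep(self.latency_s)            # injected tool latency
            if self.calls <= self.failures:
                raise TimeoutError("simulated tool failure")
            return "ok"

    def call_with_retries(tool, max_attempts: int = 3, base_backoff_s: float = 0.1) -> str:
        """Retry with exponential backoff -- the resilience pattern under test."""
        for attempt in range(max_attempts):
            try:
                return tool()
            except TimeoutError:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_backoff_s * (2 ** attempt))
        raise RuntimeError("unreachable")

    def test_latency_slo(slo_s: float = 1.0) -> None:
        """Recovers from injected failures within the latency budget."""
        start = time.monotonic()
        assert call_with_retries(FlakyTool()) == "ok"
        assert time.monotonic() - start <= slo_s, "end-to-end latency exceeded SLO"

    if __name__ == "__main__":
        test_latency_slo()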
Preferred Qualifications
- Experience with multi-agent simulators, agent-graph testing, and tool-latency emulation.
- Knowledge of MLOps (model versioning, datasets, evaluation pipelines) and A/B experimentation for LLMs.
- Background in cloud (AWS), serverless, containerization, and event-driven architectures.
- Prior ownership of cost/latency/SLAs for AI workloads in production.