Job Title: AI QA ENGINEER (AGENTIC & GENERATIVE)
Job Location: Dallas TX
Job Type: Contract
Job Description:
- Quality Strategy & Leadership
- Agentic & Multi-Agent Testing
- Reliability, Resiliency, and Latency
- Accuracy & Macro-Level Validations
- Scale & Orchestration
- Dev/Prod Readiness
- Define and own the QA strategy for agentic/multi-agent AI systems across dev, staging, and prod.
- Mentor a team of QA engineers; establish testing standards, coding guidelines for test harnesses, and review practices.
- Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA in the SDLC and incident response.
- Design tests for agent orchestration, tool calling, planner-executor loops, and inter-agent coordination (e.g., task decomposition, handoff integrity, and convergence to goals).
- Validate state management, context windows, memory/knowledge stores, and prompt/graph correctness under varying conditions.
- Implement scenario fuzzing (e.g., adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).
- Create resilience testing suites: chaos experiments, failover, retries/backoff, circuit breaking, and degraded-mode behavior.
- Establish latency SLOs and measure end-to-end response times across orchestration layers (LLM calls, tool invocations, queues).
- Ensure reliability through soak tests, canary verifications, and automated rollbacks.
- Define ground-truth and reference pipelines for task accuracy (exact match, semantic similarity, factuality checks); an illustrative sketch follows this list.
- Build macro-validation frameworks that verify task outcomes across multi-step agent workflows (e.g., complex data pipelines, content generation, verification agent loops).
- Instrument guardrail validations (toxicity, PII, hallucination, policy compliance).
- Design load/stress tests for multi-agent graphs under scale (concurrency, throughput, queue depth, backpressure).
- Validate orchestrator correctness (DAG execution, retries, branching, timeouts, compensation paths).
- Engineer reusable test artifacts (scenario configs, synthetic datasets, prompt libraries, agent-graph fixtures, simulators).
- Integrate tests into CI/CD (pre-merge gates, nightly runs, canaries) and production monitoring with alerting tied to KPIs.
- Define release criteria and run operational readiness reviews (performance, security, compliance, cost/latency budgets).
- Build post-deployment validation playbooks and incident triage runbooks.
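For illustration only, a minimal sketch of the kind of task-accuracy check referenced above: it pairs exact match with embedding-based semantic similarity against a ground-truth reference. It assumes the open-source sentence-transformers library; the model name, threshold, and function names are placeholder choices, not part of this posting.

    # Illustrative only: exact match plus embedding-based semantic similarity
    # against a ground-truth reference. Model, threshold, and names are
    # placeholder assumptions.
    from sentence_transformers import SentenceTransformer, util

    _model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    def semantic_match(candidate: str, reference: str, threshold: float = 0.85) -> bool:
        """Return True if the agent output matches the reference exactly or semantically."""
        if candidate.strip() == reference.strip():        # exact-match short circuit
            return True
        emb = _model.encode([candidate, reference], convert_to_tensor=True)
        score = util.cos_sim(emb[0], emb[1]).item()       # cosine similarity
        return score >= threshold

    # Example assertion as it might appear in a pre-merge test suite:
    assert semantic_match("Paris is the capital of France.",
                          "The capital of France is Paris.")

A check like this could run as a pre-merge gate or nightly job, with the threshold tuned per task.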
Required Qualifications
- 7 years in Software QA/Testing, with 2 years in AI/ML or LLM-based systems; hands-on experience testing agentic/multi-agent architectures.
- Strong programming skills in Python or TypeScript/JavaScript; experience building test harnesses, simulators, and fixtures.
- Experience with LLM evaluation (exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings), guardrails, and prompt testing.
- Expertise in distributed-systems testing, latency profiling, resiliency patterns (circuit breakers, retries), chaos engineering, and message queues; an illustrative sketch follows this list.
- Familiarity with orchestration frameworks (LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar).
- Proficiency with CI/CD (GitHub Actions, Azure DevOps), observability (OpenTelemetry, Prometheus/Grafana, Datadog), and feature flags/canaries.
- Solid understanding of privacy/security/compliance in AI systems (PII handling, content policies, model safety).
- Excellent communication and leadership skills; proven ability to work cross-functionally with Ops, Data, and Engineering.
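As a further illustration of the resiliency patterns and latency SLOs mentioned above, a stdlib-only Python sketch that injects deterministic tool failures, exercises retry-with-backoff, and asserts a hypothetical end-to-end latency budget; all names and numbers are placeholders, not values from this posting.

    # Illustrative only: deterministic fault injection, retry-with-backoff,
    # and a hypothetical end-to-end latency SLO check.
    import time

    class FlakyTool:
        """Simulated downstream tool: fails the first `failures` calls, then succeeds."""
        def __init__(self, failures: int = 2, latency_s: float = 0.05) -> None:
            self.failures = failures
            self.latency_s = latency_s
            self.calls = 0

        def __call__(self) -> str:
            self.calls += 1
            time.sleep(self.latency_s)            # injected tool latency
            if self.calls <= self.failures:
                raise TimeoutError("simulated tool failure")
            return "ok"

    def call_with_retries(tool, max_attempts: int = 3, base_backoff_s: float = 0.1) -> str:
        """Retry with exponential backoff -- the resilience pattern under test."""
        for attempt in range(max_attempts):
            try:
                return tool()
            except TimeoutError:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_backoff_s * (2 ** attempt))
        raise RuntimeError("unreachable")

    def test_latency_slo(slo_s: float = 1.0) -> None:
        """Recovers from injected failures within the latency budget."""
        start = time.monotonic()
        assert call_with_retries(FlakyTool()) == "ok"
        assert time.monotonic() - start <= slo_s, "end-to-end latency exceeded SLO"

    if __name__ == "__main__":
        test_latency_slo()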
Preferred Qualifications
- Experience with multi-agent simulators, agent-graph testing, and tool-latency emulation.
- Knowledge of MLOps (model versioning, datasets, evaluation pipelines) and A/B experimentation for LLMs.
- Background in cloud (AWS), serverless, containerization, and event-driven architectures.
- Prior ownership of cost/latency/SLAs for AI workloads in production.