
AWS Bedrock Error Troubleshooting

Resolve AWS Bedrock errors that affect the AI workers and correction jobs.

Quick Diagnosis

Run these commands to quickly identify the issue:

# Check worker logs for Bedrock errors (2>&1 also captures stderr, where Python logging writes by default)
docker logs newhires-workers --tail=50 2>&1 | grep -i "bedrock\|aws\|error"

# Verify AWS credentials are set
docker exec newhires-workers env | grep AWS

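# Confirm boto3 can build a Bedrock client. This makes no API call, so it
# does not validate credentials or model access (see "Test Bedrock Directly").
docker exec newhires-workers python3 -c "
import boto3
client = boto3.client('bedrock-runtime', region_name='us-east-1')
print('Bedrock client created (credentials not verified yet)')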
"

Common Bedrock Errors

"AccessDeniedException" or "UnrecognizedClientException"

Symptom: Worker logs show:

botocore.exceptions.ClientError: An error occurred (AccessDeniedException)
or
botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException)

Cause: Invalid or missing AWS credentials

Solutions:

  1. Verify credentials in .env:

    grep AWS_ .env
    # Should show:
    # AWS_ACCESS_KEY_ID=AKIA...
    # AWS_SECRET_ACCESS_KEY=...
    # AWS_REGION=us-east-1
    

  2. Check the credentials format:
      - The Access Key ID of a long-term IAM user key starts with AKIA (keys starting with ASIA are temporary and also require AWS_SESSION_TOKEN)
      - The Secret Access Key is 40 characters long
      - Values in .env have no quotes around them

  3. Verify the credentials in the AWS Console:
      - Go to AWS Console → IAM → Users → Your User → Security credentials
      - Check that the access key is Active
      - If it is inactive or deleted, create a new one

  4. Test the credentials:

    export AWS_ACCESS_KEY_ID="your_key"
    export AWS_SECRET_ACCESS_KEY="your_secret"
    aws sts get-caller-identity
    # Should return your AWS account info


  5. Recreate the workers so they load the corrected credentials (restart alone keeps the container's old environment):

    docker-compose -f docker-compose.prod.yml up -d workers
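The format checks in step 2 can be scripted. The following Python sketch is an illustration, not part of the project. Its rules (AKIA/ASIA prefix, 40-character secret, no quotes) are the heuristics listed above, and the file name check_env.py used below is hypothetical. Passing it only means the values look well-formed. aws sts get-caller-identity is still the real test.

```python
# Sketch: lint the AWS credential lines of a .env file using the heuristics
# from the checklist above. It checks format only; it cannot tell whether
# AWS will accept the keys (use aws sts get-caller-identity for that).
from __future__ import annotations

from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = raw.split("=", 1)
        env[key.strip()] = value  # value kept verbatim so stray quotes/spaces show up
    return env


def check_aws_credentials(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the format looks OK."""
    problems: list[str] = []
    for key in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"):
        value = env.get(key, "")
        if not value.strip():
            problems.append(f"{key} is missing or empty")
        elif value[0] in "'\"" or value[-1] in "'\"":
            problems.append(f"{key} is wrapped in quotes")
        elif value != value.strip():
            problems.append(f"{key} has leading or trailing whitespace")

    key_id = env.get("AWS_ACCESS_KEY_ID", "").strip().strip("'\"")
    if key_id.startswith("ASIA") and not env.get("AWS_SESSION_TOKEN", "").strip():
        problems.append("ASIA keys are temporary and need AWS_SESSION_TOKEN")
    elif key_id and not key_id.startswith("AKIA") and not key_id.startswith("ASIA"):
        problems.append("AWS_ACCESS_KEY_ID does not start with AKIA or ASIA")

    secret = env.get("AWS_SECRET_ACCESS_KEY", "").strip().strip("'\"")
    if secret and len(secret) != 40:
        problems.append(f"AWS_SECRET_ACCESS_KEY is {len(secret)} characters, expected 40")
    return problems


if __name__ == "__main__" and Path(".env").exists():
    for problem in check_aws_credentials(parse_env(Path(".env").read_text())):
        print("WARN:", problem)
```

Run it from the directory that contains .env, for example with python3 check_env.py. Each WARN line points to the checklist item to fix.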
    


"AccessDeniedException: User is not authorized"

Symptom: Worker logs show:

AccessDeniedException: User: arn:aws:iam::XXXX:user/newhires is not authorized
to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:...

Cause: IAM user lacks bedrock:InvokeModel permission

Solutions:

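  1. Attach an IAM policy that allows bedrock:InvokeModel to your user:

Go to AWS Console → IAM → Users → Your User → Add permissions → Attach policies

The us.-prefixed model IDs used by the workers are cross-region inference profiles. The policy must therefore allow both the inference profile and the underlying foundation model in every region the profile can route to. Create or attach a policy with this JSON, replacing YOUR_ACCOUNT_ID with your 12-digit AWS account ID:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:YOUR_ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:us-east-1:YOUR_ACCOUNT_ID:inference-profile/us.meta.llama4-scout-17b-instruct-v1:0",
        "arn:aws:bedrock:*::foundation-model/meta.llama4-scout-17b-instruct-v1:0"
      ]
    }
  ]
}

  2. Wait 1-2 minutes for IAM changes to propagate

  3. Restart workers:

    docker-compose -f docker-compose.prod.yml restart workers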
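Cross-region inference profile IDs carry a us. prefix. For these, IAM generally has to allow both the inference profile ARN and the base foundation model ARN in the regions the profile can route to. The sketch below builds such a policy from a list of model IDs. It assumes IDs of the form <geo>.<base model id>. The prefix list and the account ID 123456789012 are placeholders, so review the output against the AWS Bedrock Setup Guide before attaching it.

```python
# Sketch: generate an InvokeModel policy for Bedrock model IDs. Assumes
# cross-region inference profile IDs look like "<geo>.<base model id>";
# the prefix list and account ID are placeholders to adapt.
from __future__ import annotations

import json

GEO_PREFIXES = ("us.", "eu.", "apac.")


def base_model_id(model_id: str) -> str:
    """Strip a cross-region prefix such as "us." to get the foundation model ID."""
    for prefix in GEO_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def invoke_policy(account_id: str, region: str, model_ids: list[str]) -> dict:
    resources: list[str] = []
    for model_id in model_ids:
        base = base_model_id(model_id)
        if base != model_id:
            # The inference profile lives in your account and source region...
            resources.append(
                f"arn:aws:bedrock:{region}:{account_id}:inference-profile/{model_id}"
            )
            # ...and can route the request to the base model in other regions.
            resources.append(f"arn:aws:bedrock:*::foundation-model/{base}")
        else:
            # Plain foundation model ID: invoked directly in the given region.
            resources.append(f"arn:aws:bedrock:{region}::foundation-model/{base}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["bedrock:InvokeModel"], "Resource": resources}
        ],
    }


if __name__ == "__main__":
    ids = [
        "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "us.meta.llama4-scout-17b-instruct-v1:0",
    ]
    # 123456789012 is a placeholder account ID
    print(json.dumps(invoke_policy("123456789012", "us-east-1", ids), indent=2))
```

Restricting Resource to specific ARNs rather than "*" keeps the user from invoking other, possibly more expensive, models.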
    

See Also: AWS Bedrock Setup Guide for complete IAM setup


"ResourceNotFoundException: Could not resolve foundation model"

Symptom: Worker logs show:

ResourceNotFoundException: Could not resolve the foundation model from the model identifier

Cause: Model access not enabled in AWS Bedrock console

Solutions:

  1. Enable model access in the AWS Console:
      - Go to: https://console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess
      - Click "Manage model access"
      - Check "Claude Sonnet 4.5" (and optionally "Llama 4 Scout")
      - Click "Request model access"
      - Wait for the status to change from "Pending" to "Access granted"

  2. Verify the model ID in .env is correct:

    grep BEDROCK_MODEL_ID .env
    # Should be empty (uses default) or:
    # BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0


  3. Check that the model is available in us-east-1. This command lists base model IDs such as anthropic.claude-sonnet-4-5-20250929-v1:0, without the us. prefix:

    aws bedrock list-foundation-models --region us-east-1 --query 'modelSummaries[?contains(modelId, `claude`)].modelId'


  4. Recreate the workers so any .env change is picked up:

    docker-compose -f docker-compose.prod.yml up -d workers
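The list-foundation-models check above can be automated against a saved copy of its JSON output. The sketch below removes the cross-region prefix before comparing, because the listing shows base IDs. The prefix list is an assumption. A match only proves the model exists in the region; it does not prove that access has been granted.

```python
# Sketch: check whether the configured model ID shows up in the JSON saved from
#   aws bedrock list-foundation-models --region us-east-1 > models.json
# That command lists base IDs without the "us." prefix, so the prefix is removed
# first. A match only proves the model exists in the region, not that access is granted.
from __future__ import annotations

import json


def model_is_listed(list_output: str, model_id: str) -> bool:
    base = model_id
    for prefix in ("us.", "eu.", "apac."):  # assumed cross-region prefixes
        if base.startswith(prefix):
            base = base[len(prefix):]
            break
    summaries = json.loads(list_output).get("modelSummaries", [])
    return any(summary.get("modelId") == base for summary in summaries)
```

For example, model_is_listed(open("models.json").read(), "us.anthropic.claude-sonnet-4-5-20250929-v1:0") should return True when the region offers Claude Sonnet 4.5.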
    


"ThrottlingException: Rate exceeded"

Symptom: Worker logs show:

ThrottlingException: Rate exceeded for operation InvokeModel

Cause: Too many Bedrock API calls too quickly (hitting rate limits)

Solutions:

  1. Reduce concurrent calls in .env:

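    # Edit .env: reduce from 2 to 1 (keep comments on their own line;
    # some .env parsers treat a trailing "# ..." as part of the value)
    MAX_CONCURRENT_BEDROCK_CALLS=1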
    

  2. Increase poll interval to reduce pressure:

    # Edit .env: increase from 5 to 10 seconds
    POLL_INTERVAL=10
    

  3. Scale down workers if running multiple:

    docker-compose -f docker-compose.prod.yml up -d --scale workers=1
    

  4. Wait and retry:
      - Throttling is temporary
      - Workers automatically retry with exponential backoff
      - Check the logs: docker logs newhires-workers --tail=50

  5. Request higher limits (if throttling persists):
      - Go to the AWS Service Quotas console
      - Request a quota increase for Bedrock InvokeModel

Note: The default Claude Sonnet 4.5 quotas (requests per minute and tokens per minute) vary by account. Check the current values in the Service Quotas console.
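The workers' retry code is not documented here, but exponential backoff generally follows the pattern in this Python sketch. ThrottledError is a hypothetical stand-in for a botocore ClientError whose error code is ThrottlingException. The defaults (5 attempts, 1 s base, 30 s cap) are illustrative, not the workers' settings. Each wait is a random fraction of an exponentially growing cap ("full jitter"), which keeps several workers from retrying in lockstep.

```python
# Sketch of retry with exponential backoff and full jitter. NOT the workers'
# actual code: ThrottledError stands in for a throttled Bedrock call and the
# limits are illustrative.
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a ThrottlingException from Bedrock."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # Cap grows 1s, 2s, 4s, ... up to max_delay; sleep a random fraction of it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be >= 1")
```

If the calls go through boto3, botocore's built-in retry modes are an alternative to a hand-written loop. They are configured with botocore.config.Config(retries={"mode": "adaptive", "max_attempts": 10}).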


"Could not connect to the endpoint URL"

Symptom: Worker logs show:

Could not connect to the endpoint URL: "https://bedrock-runtime.us-east-1.amazonaws.com/"

Cause: Network connectivity or wrong AWS region

Solutions:

  1. Verify AWS region in .env:

    grep AWS_REGION .env
    # Should be: AWS_REGION=us-east-1
    

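  2. Test network connectivity. Any HTTP status, even 403 or 404, means the endpoint is reachable; a timeout or DNS error means it is not:

    # From host
    curl -I https://bedrock-runtime.us-east-1.amazonaws.com
    
    # From worker container (requires curl in the image)
    docker exec newhires-workers curl -I https://bedrock-runtime.us-east-1.amazonaws.com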
    

  3. Check firewall/proxy settings:
      - Ensure outbound HTTPS (port 443) is allowed
      - Check whether a corporate firewall blocks AWS services
      - Configure a proxy if needed (boto3 honors the standard HTTPS_PROXY and NO_PROXY environment variables)

  4. Recreate the workers so any .env change is picked up:

    docker-compose -f docker-compose.prod.yml up -d workers
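If curl is not installed in the worker image, the same reachability test can be done with Python's standard library. This sketch only proves that a TCP connection to port 443 opens. It does not check TLS, authentication or proxies, and a host that must go through a proxy will report False even if boto3 works via HTTPS_PROXY. The function names are illustrative.

```python
# Sketch: TCP reachability check for the regional Bedrock runtime endpoint.
# It only proves a TCP connection can be opened; it does not test TLS,
# proxies or credentials.
from __future__ import annotations

import socket


def runtime_endpoint(region: str) -> str:
    """Hostname of the Bedrock runtime endpoint for a region."""
    return f"bedrock-runtime.{region}.amazonaws.com"


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection or timeout
        return False
```

Inside the worker container, can_connect(runtime_endpoint("us-east-1")) returning False points to DNS, firewall or proxy problems rather than credentials.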
    


"ValidationException: The provided model identifier is invalid"

Symptom: Worker logs show:

ValidationException: The provided model identifier is invalid

Cause: Incorrect BEDROCK_MODEL_ID in .env

Solutions:

  1. Use a valid model ID:

Valid options:
  - us.anthropic.claude-sonnet-4-5-20250929-v1:0 (Claude Sonnet 4.5)
  - us.meta.llama4-scout-17b-instruct-v1:0 (Llama 4 Scout)

  2. Fix the .env file:

    # Edit .env and set:
    BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
    
    # Or remove the line to use the default (Claude Sonnet 4.5)


  3. List available models (base IDs are shown without the us. prefix):

    aws bedrock list-foundation-models --region us-east-1 \
      --query 'modelSummaries[*].[modelId,modelName]' --output table


  4. Recreate the workers so the new model ID is loaded:

    docker-compose -f docker-compose.prod.yml up -d workers
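The rule described in this section is simple: an empty value means the default, and anything else must be one of the two listed IDs. It can be checked before the workers start. This is a hypothetical helper, not the workers' own code. DEFAULT_MODEL_ID and SUPPORTED_MODEL_IDS only restate the options listed above.

```python
# Sketch: validate BEDROCK_MODEL_ID as this guide describes it. Unset or empty
# means the default (Claude Sonnet 4.5); otherwise it must be a listed ID.
# How the workers really resolve the default is an assumption.
from __future__ import annotations

DEFAULT_MODEL_ID = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
SUPPORTED_MODEL_IDS = {
    DEFAULT_MODEL_ID,
    "us.meta.llama4-scout-17b-instruct-v1:0",
}


def resolve_model_id(env: dict[str, str]) -> str:
    value = env.get("BEDROCK_MODEL_ID", "").strip()
    if not value:
        return DEFAULT_MODEL_ID
    if value not in SUPPORTED_MODEL_IDS:
        raise ValueError(
            f"Unsupported BEDROCK_MODEL_ID {value!r}; "
            f"expected one of {sorted(SUPPORTED_MODEL_IDS)}"
        )
    return value
```

Calling resolve_model_id(dict(os.environ)) at startup raises ValueError with the valid options. Bedrock would otherwise reject the first call later with a ValidationException.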
    


Workers Not Processing Jobs

Symptom

Jobs stuck in pending status, workers not picking them up.

Diagnosis

# Check worker is running
docker ps | grep workers

# Check worker logs
docker logs newhires-workers --tail=50

# Check job queue
docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT id, status, created_at FROM job_queue WHERE status='pending' LIMIT 10;"

Solutions

  1. Verify worker is running:

    docker-compose -f docker-compose.prod.yml ps workers
    # Should show "Up"
    

  2. Check for AWS errors in worker logs:

    docker logs newhires-workers 2>&1 | grep -i "error\|exception"
    

  3. Verify database connection:

    docker exec newhires-workers env | grep DATABASE_URL
    

  4. Restart workers:

    docker-compose -f docker-compose.prod.yml restart workers
    

  5. Scale up workers if needed (this fails if the service sets a fixed container_name, because each replica needs a unique name):

    docker-compose -f docker-compose.prod.yml up -d --scale workers=2
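The DATABASE_URL check above prints the password to the terminal. The sketch below is a safer variant. It summarizes the URL without the password and flags a frequent Docker mistake: a host of localhost, which inside the worker container refers to the container itself rather than the database. The function name and warning rules are assumptions for illustration.

```python
# Sketch: summarize DATABASE_URL without exposing the password and flag
# settings that commonly break container-to-database connections.
from __future__ import annotations

from urllib.parse import urlsplit


def describe_database_url(url: str) -> tuple[str, list[str]]:
    """Return a password-free summary of the URL plus any warnings."""
    parts = urlsplit(url)
    warnings: list[str] = []
    if not parts.scheme.startswith("postgres"):
        warnings.append(f"unexpected scheme {parts.scheme!r}")
    if parts.hostname in ("localhost", "127.0.0.1"):
        warnings.append("host is localhost: inside a container that is the container itself")
    database = parts.path.lstrip("/")
    if not database:
        warnings.append("no database name in the URL")
    summary = (
        f"{parts.scheme}://{parts.username or '?'}:***@"
        f"{parts.hostname}:{parts.port or 5432}/{database}"
    )
    return summary, warnings
```

Inside the worker container, pass it os.environ["DATABASE_URL"]. The host should be the database's service or container name from docker-compose.prod.yml.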
    


High AWS Costs

Symptom

Unexpectedly high AWS Bedrock charges.

Diagnosis

# Check token usage in worker logs
docker logs newhires-workers 2>&1 | grep "tokens used"

# Count jobs processed
docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"

Solutions

  1. Review cost settings in .env:

    grep -E "MAX_CONCURRENT|MAX_AI_ATTEMPTS|BEDROCK_MODEL" .env
    

  2. Reduce concurrent calls:

    # Edit .env: reduce from 2
    MAX_CONCURRENT_BEDROCK_CALLS=1
    

  3. Reduce retry attempts:

    # Edit .env: reduce from 5
    MAX_AI_ATTEMPTS=3
    

  4. Switch to cheaper model:

    # Edit .env: Llama 4 Scout is ~70% cheaper per job than Claude Sonnet 4.5
    BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0
    

  5. Set up billing alerts in the AWS Console:
      - Go to AWS Billing → Budgets
      - Create a budget alert for Bedrock usage

  6. Recreate the workers so the .env changes take effect (restart alone keeps the old environment):

    docker-compose -f docker-compose.prod.yml up -d workers
    

Cost comparison per job:
  - Claude Sonnet 4.5: ~$0.045/job
  - Llama 4 Scout: ~$0.015/job
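To see where the tokens go, you can total and price the "tokens used" lines from the worker log. This sketch assumes the log format shown under Debugging Workflow (Bedrock tokens used: input=XXX, output=YYY). Prices are parameters in USD per million tokens, so take them from the current AWS Bedrock pricing page rather than from this example.

```python
# Sketch: total the token counts logged by the workers and estimate spend.
# The log format is taken from this guide; prices must be supplied by you.
from __future__ import annotations

import re

TOKENS_RE = re.compile(r"tokens used: input=(\d+), output=(\d+)")


def total_tokens(log_text: str) -> tuple[int, int, int]:
    """Return (input_tokens, output_tokens, number_of_matching_lines)."""
    total_in = total_out = calls = 0
    for match in TOKENS_RE.finditer(log_text):
        total_in += int(match.group(1))
        total_out += int(match.group(2))
        calls += 1
    return total_in, total_out, calls


def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_million_in: float, usd_per_million_out: float) -> float:
    """Estimated USD cost for the given token totals and per-million prices."""
    return (input_tokens * usd_per_million_in
            + output_tokens * usd_per_million_out) / 1_000_000
```

Save the logs with docker logs newhires-workers > worker_logs.txt 2>&1 and pass the file contents to total_tokens(). Dividing the estimated cost by the number of finished jobs from the correction_jobs query gives a figure comparable to the per-job numbers above.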


Debugging Workflow

Step 1: Check Worker Logs

docker logs newhires-workers --tail=100 -f

Look for:
  - INFO: Worker polling for jobs (worker is running)
  - INFO: Processing correction job (job picked up)
  - INFO: Calling AWS Bedrock (API call started)
  - INFO: Bedrock tokens used: input=XXX, output=YYY (success)
  - ERROR: (problems)
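When the logs are long, counting which of the errors covered in this guide appear, and how often, points straight at the right section. The mapping in this sketch follows this guide's sections and is not an official AWS list. Matching is a plain substring search.

```python
# Sketch: count the Bedrock error markers covered in this guide within a log dump.
from __future__ import annotations

from collections import Counter

# Log marker -> likely cause, following the sections of this guide.
SECTIONS = {
    "UnrecognizedClientException": "invalid or missing AWS credentials",
    "AccessDeniedException": "credentials or missing IAM permission",
    "ResourceNotFoundException": "model access not enabled",
    "ThrottlingException": "rate limits",
    "ValidationException": "wrong BEDROCK_MODEL_ID",
    "Could not connect to the endpoint URL": "network or AWS region",
}


def classify(log_text: str) -> Counter:
    """Count log lines containing each known marker."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for marker in SECTIONS:
            if marker in line:
                counts[marker] += 1
    return counts
```

For example: for marker, count in classify(text).most_common(): print(count, marker, "->", SECTIONS[marker]).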

Step 2: Verify AWS Credentials

# Check env vars
docker exec newhires-workers env | grep AWS

# Test AWS access
docker exec newhires-workers python3 -c "
import boto3
print(boto3.client('sts', region_name='us-east-1').get_caller_identity())
"

Step 3: Test Bedrock Directly

# Test from worker container: send a one-line prompt and print the reply
docker exec newhires-workers python3 -c "
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

body = json.dumps({
    'anthropic_version': 'bedrock-2023-05-31',
    'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Hello'}]}],
    'max_tokens': 100,
    'temperature': 0.1
})

response = client.invoke_model(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    body=body
)

# The response body is a stream; read it and decode the Messages API JSON
payload = json.loads(response['body'].read())
print('Success!', payload['content'][0]['text'])
print('Tokens:', payload['usage'])
"

Step 4: Check Database Connection

# Verify workers can reach database
docker exec newhires-workers python3 -c "
import psycopg2
import os

conn = psycopg2.connect(os.environ['DATABASE_URL'])
print('Database connection successful')
conn.close()
"

Getting Help

If issues persist after trying these solutions:

  1. Collect diagnostic info:

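    # Save logs (2>&1 includes stderr)
    docker logs newhires-workers > worker_logs.txt 2>&1
    
    # Check env (review worker_env.txt and remove any remaining sensitive values before sharing)
    docker exec newhires-workers env | grep -v -E 'SECRET|PASSWORD|TOKEN|DATABASE_URL' > worker_env.txt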
    
    # Check job status
    docker exec newhires-db psql -U newhires -d newhires -c \
      "SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;" > job_status.txt
    

  2. Review the documentation:
      - AWS Bedrock Setup
      - Environment Variables
      - Common Issues

  3. Check AWS Service Health:
      - https://health.aws.amazon.com/health/status
      - Filter by Bedrock and us-east-1

  4. Contact Support with the collected logs and diagnostic info
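A plain grep filter on the env dump can still leak DATABASE_URL passwords or session tokens. This sketch masks the value of any variable whose name looks sensitive and keeps its length, which still helps spot a truncated 40-character secret. The SENSITIVE pattern is a judgment call; extend it for your own variable names.

```python
# Sketch: mask secrets in `env` output before attaching it to a support request.
from __future__ import annotations

import re

# Variable names whose values should never be shared (extend as needed).
SENSITIVE = re.compile(r"SECRET|PASSWORD|TOKEN|KEY|DATABASE_URL", re.IGNORECASE)


def redact_env(text: str) -> str:
    """Replace values of sensitive NAME=value lines with a length marker."""
    out: list[str] = []
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and SENSITIVE.search(name):
            out.append(f"{name}=<redacted, {len(value)} chars>")
        else:
            out.append(line)
    return "\n".join(out)
```

Feed it the output of docker exec newhires-workers env and save the result as worker_env.txt.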


Prevention

Best Practices

  1. Monitor worker logs regularly:

    docker logs newhires-workers --tail=50
    

  2. Set up CloudWatch for Bedrock API monitoring (optional)

  3. Use staging environment to test changes before production

  4. Keep credentials secure:
      - Rotate AWS access keys every 90 days
      - Never commit .env to git
      - Use IAM roles instead of access keys when possible (EC2/ECS)

  5. Monitor costs:
      - Set up AWS billing alerts
      - Review Bedrock usage monthly
      - Adjust MAX_CONCURRENT_BEDROCK_CALLS based on budget

  6. Test Bedrock access after any AWS changes. Listing models requires bedrock:ListFoundationModels and does not exercise InvokeModel; use Step 3 of the Debugging Workflow for that:

    aws bedrock list-foundation-models --region us-east-1
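The 90-day rotation target is easy to check from the AWS CLI. The sketch below reads the JSON printed by aws iam list-access-keys --user-name <your user>, which contains AccessKeyMetadata entries with AccessKeyId, Status and CreateDate. It returns the active keys older than the limit. It is a local helper, not an AWS feature.

```python
# Sketch: flag active access keys older than the rotation target, using the
# JSON output of `aws iam list-access-keys`.
from __future__ import annotations

import json
from datetime import datetime, timezone


def stale_keys(list_keys_json: str, max_age_days: int = 90,
               now: datetime | None = None) -> list[str]:
    """Return IDs of Active keys older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    stale: list[str] = []
    for key in json.loads(list_keys_json).get("AccessKeyMetadata", []):
        created = datetime.fromisoformat(key["CreateDate"].replace("Z", "+00:00"))
        if created.tzinfo is None:  # treat naive timestamps as UTC
            created = created.replace(tzinfo=timezone.utc)
        if key.get("Status") == "Active" and (now - created).days > max_age_days:
            stale.append(key["AccessKeyId"])
    return stale
```

Inactive keys are skipped because they cannot be used. Delete them rather than rotate them.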
    


Quick Reference

Environment Variables

| Variable | Purpose | Impact on Bedrock |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS authentication | Required |
| AWS_SECRET_ACCESS_KEY | AWS authentication | Required |
| AWS_REGION | AWS region | Required (us-east-1) |
| BEDROCK_MODEL_ID | Model selection | Optional (default: Claude Sonnet 4.5) |
| MAX_CONCURRENT_BEDROCK_CALLS | API concurrency | Higher = more cost per hour, more throttling |
| MAX_AI_ATTEMPTS | Retry limit | Higher = more cost |
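MAX_CONCURRENT_BEDROCK_CALLS caps how many Bedrock calls run at once. This guide does not document how the workers enforce it. The sketch below uses asyncio.Semaphore as a stand-in technique to show the effect: with a limit of N, at most N calls are in flight and the rest wait. Raising N increases throughput and the rate of spending. It does not change the cost of each call, but it makes throttling more likely.

```python
# Sketch of a concurrency cap like MAX_CONCURRENT_BEDROCK_CALLS, using
# asyncio.Semaphore for illustration (not necessarily the workers' mechanism).
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def run_limited(
    jobs: Iterable[Any],
    limit: int,
    call: Callable[[Any], Awaitable[Any]],
) -> list[Any]:
    """Run call(job) for every job with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Any) -> Any:
        async with semaphore:  # waits here while `limit` calls are already running
            return await call(job)

    # gather keeps results in the same order as `jobs`
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```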

Common Commands

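# Restart workers
docker-compose -f docker-compose.prod.yml restart workers

# Apply .env changes (recreates the worker container)
docker-compose -f docker-compose.prod.yml up -d workers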

# View worker logs
docker logs newhires-workers -f

# Check AWS credentials
docker exec newhires-workers env | grep AWS

# List Bedrock models (checks API access; does not test InvokeModel)
aws bedrock list-foundation-models --region us-east-1

# Check job queue
docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"