AWS Bedrock Error Troubleshooting¶
Resolve AWS Bedrock AI-related issues with workers and correction jobs.
Quick Diagnosis¶
Run these commands to quickly identify the issue:
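# Check worker logs for Bedrock errors (2>&1 includes the container's stderr stream)
docker logs newhires-workers --tail=50 2>&1 | grep -i "bedrock\|aws\|error"
# Verify AWS credentials are set (the output contains secrets; do not share it)
docker exec newhires-workers env | grep AWS
# Confirm boto3 can build a Bedrock client inside the worker
# (no API call is made, so this does not validate credentials or permissions)
docker exec newhires-workers python3 -c "
import boto3
client = boto3.client('bedrock-runtime', region_name='us-east-1')
print('Bedrock client created successfully')
"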
Common Bedrock Errors¶
"AccessDeniedException" or "UnrecognizedClientException"¶
Symptom: Worker logs show either of the two exception names in the heading above.
Cause: Invalid or missing AWS credentials
Solutions:
- Verify that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in .env.
- Check credentials format:
  - Access Key ID for an IAM user should start with AKIA
  - Secret Access Key is 40 characters
  - No quotes around values in .env
- Verify credentials in AWS Console:
  - Go to AWS Console → IAM → Users → Your User → Security credentials
  - Check if access key is Active
  - If inactive or deleted, create a new one
- Test credentials with the STS identity check in Debugging Workflow, Step 2.
- Restart workers with the correct credentials (restart command under Common Commands).
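The format checks listed above can be approximated with a short script. This is a minimal sketch rather than part of the project: the function name check_credential_format is hypothetical, it assumes .env uses plain KEY=value lines, and the AKIA prefix applies only to long-term IAM user keys (temporary STS credentials start with ASIA instead).

```python
def check_credential_format(env_text):
    """Return a list of format problems found in the AWS credential lines."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()

    problems = []
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        value = values.get(name)
        if not value:
            problems.append(f"{name} is missing or empty")
        elif value[0] in "\"'" or value[-1] in "\"'":
            problems.append(f"{name} is wrapped in quotes")

    # Check prefix and length on the unquoted values.
    key_id = values.get("AWS_ACCESS_KEY_ID", "").strip("\"'")
    if key_id and not key_id.startswith("AKIA"):
        problems.append("AWS_ACCESS_KEY_ID does not start with AKIA")
    secret = values.get("AWS_SECRET_ACCESS_KEY", "").strip("\"'")
    if secret and len(secret) != 40:
        problems.append("AWS_SECRET_ACCESS_KEY is not 40 characters")
    return problems
```

To run it against a real file, pass the file contents, for example check_credential_format(open(".env").read()). The function reports variable names only, never the values themselves, so its output is safe to paste into a support request.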
"AccessDeniedException: User is not authorized"¶
Symptom: Worker logs show:
AccessDeniedException: User: arn:aws:iam::XXXX:user/newhires is not authorized
to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:...
Cause: IAM user lacks bedrock:InvokeModel permission
Solutions:
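- Attach correct IAM policy to your user:
  Go to AWS Console → IAM → Users → Your User → Add permissions → Attach policies
  Create/attach a policy with the JSON below. Model IDs that start with us. are cross-region inference profiles, so the policy needs the inference profile ARN (which includes your account ID; replace <ACCOUNT_ID>) as well as the underlying foundation model in every region the profile can route to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.meta.llama4-scout-17b-instruct-v1:0",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
        "arn:aws:bedrock:*::foundation-model/meta.llama4-scout-17b-instruct-v1:0"
      ]
    }
  ]
}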
- Wait 1-2 minutes for IAM changes to propagate
- Restart workers (restart command under Common Commands).
See Also: AWS Bedrock Setup Guide for complete IAM setup
"ResourceNotFoundException: Could not resolve foundation model"¶
Symptom: Worker logs show the ResourceNotFoundException message from the heading above.
Cause: Model access not enabled in AWS Bedrock console
Solutions:
- Enable model access in AWS Console:
  - Go to: https://console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess
  - Click "Manage model access"
  - Check "Claude Sonnet 4.5" (and optionally "Llama 4 Scout")
  - Click "Request model access"
  - Wait for status to change from "Pending" to "Access granted"
- Verify that BEDROCK_MODEL_ID in .env is correct (valid IDs are listed in the ValidationException section).
- Check that the model is available in us-east-1 (aws bedrock list-foundation-models --region us-east-1).
- Restart workers (restart command under Common Commands).
"ThrottlingException: Rate exceeded"¶
Symptom: Worker logs show the ThrottlingException message from the heading above.
Cause: Too many Bedrock API calls in a short period (hitting rate limits)
Solutions:
- Reduce concurrent calls by lowering MAX_CONCURRENT_BEDROCK_CALLS in .env.
- Increase the poll interval to reduce pressure.
- Scale down workers if running multiple.
- Wait and retry:
  - Throttling is temporary
  - Workers will automatically retry with exponential backoff
  - Check logs: docker logs newhires-workers --tail=50
- Request higher limits (if persistent):
  - Go to AWS Service Quotas console
  - Request quota increase for Bedrock InvokeModel
Note: Default requests-per-minute and tokens-per-minute limits for Claude Sonnet 4.5 vary by account.
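The text above says workers retry with exponential backoff, but the worker's own implementation is not shown here. The sketch below illustrates the general technique (exponential backoff with full jitter). Every name and default value in it is an illustrative assumption, not the project's code.

```python
import random
import time


def backoff_delays(max_attempts, base=1.0, cap=30.0):
    """Yield the upper bound (seconds) of the wait before each retry."""
    for attempt in range(max_attempts - 1):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(call, is_throttled, max_attempts=5,
                      base=1.0, cap=30.0, sleep=time.sleep):
    """Run call(); on a throttling error, wait a jittered delay and retry."""
    delays = list(backoff_delays(max_attempts, base, cap))
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on any non-throttling error.
            if attempt == max_attempts - 1 or not is_throttled(exc):
                raise
            # Full jitter: sleep a random time between 0 and the bound.
            sleep(random.uniform(0, delays[attempt]))
```

With boto3, is_throttled could inspect exc.response["Error"]["Code"] on a botocore.exceptions.ClientError and compare it with "ThrottlingException". The jitter spreads retries from several workers over time so they do not all hit the API at the same moment.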
"Could not connect to the endpoint URL"¶
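Symptom: Worker logs show the "Could not connect to the endpoint URL" message.
Cause: Network connectivity or wrong AWS region
Solutions:
- Verify that AWS_REGION in .env is us-east-1.
- Test network connectivity from the worker container to the Bedrock endpoint over HTTPS.
- Check firewall/proxy settings:
  - Ensure outbound HTTPS (port 443) is allowed
  - Check if corporate firewall blocks AWS services
  - Configure proxy if needed
- Restart workers (restart command under Common Commands).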
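The connectivity test can be sketched in Python so that it runs inside the worker container without extra tools. This is a minimal sketch under two assumptions: the standard regional endpoint name bedrock-runtime.<region>.amazonaws.com is in use (no custom endpoint), and the worker connects directly rather than through a proxy. Both function names are hypothetical.

```python
import socket


def bedrock_runtime_host(region):
    """Build the standard regional Bedrock runtime hostname."""
    return f"bedrock-runtime.{region}.amazonaws.com"


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_connect(bedrock_runtime_host("us-east-1")) returning False points at DNS, firewall, or region problems. If the environment requires an HTTPS proxy, this direct TCP check can fail even when boto3 itself would succeed through the proxy.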
"ValidationException: The provided model identifier is invalid"¶
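Symptom: Worker logs show the ValidationException message from the heading above.
Cause: Incorrect BEDROCK_MODEL_ID in .env
Solutions:
- Use a correct model ID. Valid options:
  - us.anthropic.claude-sonnet-4-5-20250929-v1:0 (Claude Sonnet 4.5)
  - us.meta.llama4-scout-17b-instruct-v1:0 (Llama 4 Scout)
- Fix BEDROCK_MODEL_ID in the .env file.
- List available models: aws bedrock list-foundation-models --region us-east-1
- Restart workers (restart command under Common Commands).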
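The error-to-cause mapping described in the sections above can be condensed into a small lookup for triaging a log line. This is only a sketch: the causes are copied from this page, while the names ERROR_HINTS and likely_cause and the substring matching are illustrative assumptions.

```python
# (substring to look for, likely cause) pairs taken from the sections above.
# Order matters: the more specific "is not authorized" text is checked
# before the generic AccessDeniedException.
ERROR_HINTS = [
    ("is not authorized", "IAM user lacks bedrock:InvokeModel permission"),
    ("UnrecognizedClientException", "Invalid or missing AWS credentials"),
    ("AccessDeniedException", "Invalid or missing AWS credentials"),
    ("ResourceNotFoundException", "Model access not enabled in AWS Bedrock console"),
    ("ThrottlingException", "Too many Bedrock API calls too quickly"),
    ("Could not connect to the endpoint URL", "Network connectivity or wrong AWS region"),
    ("ValidationException", "Incorrect BEDROCK_MODEL_ID in .env"),
]


def likely_cause(log_line):
    """Return the first matching cause for a log line, or None."""
    for needle, cause in ERROR_HINTS:
        if needle in log_line:
            return cause
    return None
```

A line that matches nothing returns None, which is a signal to read the surrounding log context rather than rely on the table.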
Workers Not Processing Jobs¶
Symptom¶
Jobs stuck in pending status, workers not picking them up.
Diagnosis¶
# Check worker is running
docker ps | grep workers
# Check worker logs
docker logs newhires-workers --tail=50
# Check job queue
docker exec newhires-db psql -U newhires -d newhires -c \
"SELECT id, status, created_at FROM job_queue WHERE status='pending' LIMIT 10;"
Solutions¶
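- Verify the worker is running (docker ps | grep workers).
- Check for AWS errors in worker logs (see the grep command under Quick Diagnosis).
- Verify the database connection (see Debugging Workflow, Step 4).
- Restart workers (restart command under Common Commands).
- Scale up workers if needed.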
High AWS Costs¶
Symptom¶
Unexpectedly high AWS Bedrock charges.
Diagnosis¶
# Check token usage in worker logs (2>&1 includes the container's stderr stream)
docker logs newhires-workers 2>&1 | grep "tokens used"
# Count jobs processed
docker exec newhires-db psql -U newhires -d newhires -c \
"SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"
Solutions¶
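- Review cost settings in .env (MAX_CONCURRENT_BEDROCK_CALLS, MAX_AI_ATTEMPTS, BEDROCK_MODEL_ID).
- Reduce concurrent calls by lowering MAX_CONCURRENT_BEDROCK_CALLS.
- Reduce retry attempts by lowering MAX_AI_ATTEMPTS.
- Switch to a cheaper model by setting BEDROCK_MODEL_ID to us.meta.llama4-scout-17b-instruct-v1:0.
- Set up billing alerts in AWS Console:
  - Go to AWS Billing → Budgets
  - Create budget alert for Bedrock usage
- Restart workers after changes (restart command under Common Commands).
Cost comparison per job:
- Claude Sonnet 4.5: ~$0.045/job
- Llama 4 Scout: ~$0.015/job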
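The per-job figures above make a rough budget estimate straightforward. The sketch below simply multiplies job counts by those approximate costs. The dictionary keys and function names are hypothetical, and actual charges depend on token usage and retries, so treat the result as an order-of-magnitude check.

```python
# Approximate per-job costs in USD, copied from the comparison above.
COST_PER_JOB_USD = {
    "claude-sonnet-4.5": 0.045,
    "llama-4-scout": 0.015,
}


def estimate_cost(job_count, model):
    """Estimate the total cost in USD for job_count jobs on one model."""
    return round(job_count * COST_PER_JOB_USD[model], 2)


def savings_from_switch(job_count, from_model, to_model):
    """Estimate the USD saved by moving job_count jobs to another model."""
    return round(estimate_cost(job_count, from_model)
                 - estimate_cost(job_count, to_model), 2)
```

For 1,000 jobs this gives about $45 on Claude Sonnet 4.5 against about $15 on Llama 4 Scout, a difference of roughly $30.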
Debugging Workflow¶
Step 1: Check Worker Logs¶
Look for:
- INFO: Worker polling for jobs (worker is running)
- INFO: Processing correction job (job picked up)
- INFO: Calling AWS Bedrock (API call started)
- INFO: Bedrock tokens used: input=XXX, output=YYY (success)
- ERROR: followed by any message (problems)
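Scanning a long log by eye for these markers is error-prone, so a short script can count them instead. The sketch below assumes the log lines contain exactly the message texts listed above. The function name and the summary keys are hypothetical.

```python
import re

# Message text from the list above, mapped to a counter name.
STAGE_MARKERS = {
    "Worker polling for jobs": "polling",
    "Processing correction job": "picked_up",
    "Calling AWS Bedrock": "bedrock_calls",
    "Bedrock tokens used": "successes",
}
TOKENS_RE = re.compile(r"input=(\d+), output=(\d+)")


def summarize_worker_log(log_text):
    """Count stage markers, errors, and token totals in worker log text."""
    summary = {name: 0 for name in STAGE_MARKERS.values()}
    summary.update(errors=0, input_tokens=0, output_tokens=0)
    for line in log_text.splitlines():
        if "ERROR" in line:
            summary["errors"] += 1
            continue
        for marker, name in STAGE_MARKERS.items():
            if marker in line:
                summary[name] += 1
                break
        match = TOKENS_RE.search(line)
        if match:
            summary["input_tokens"] += int(match.group(1))
            summary["output_tokens"] += int(match.group(2))
    return summary
```

A bedrock_calls count higher than successes suggests API calls that started but never completed, which is a cue to look at the ERROR lines in between.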
Step 2: Verify AWS Credentials¶
# Check env vars
docker exec newhires-workers env | grep AWS
# Test AWS access
docker exec newhires-workers python3 -c "
import boto3
print(boto3.client('sts').get_caller_identity())
"
Step 3: Test Bedrock Directly¶
# Test from worker container
docker exec newhires-workers python3 -c "
import boto3
import json
client = boto3.client('bedrock-runtime', region_name='us-east-1')
body = json.dumps({
    'anthropic_version': 'bedrock-2023-05-31',
    'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Hello'}]}],
    'max_tokens': 100,
    'temperature': 0.1
})
response = client.invoke_model(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    body=body
)
print('Success!')
"
Step 4: Check Database Connection¶
# Verify workers can reach database
docker exec newhires-workers python3 -c "
import psycopg2
import os
conn = psycopg2.connect(os.environ['DATABASE_URL'])
print('Database connection successful')
conn.close()
"
Getting Help¶
If issues persist after trying these solutions:
- Collect diagnostic info:
# Save logs (2>&1 captures both output streams)
docker logs newhires-workers > worker_logs.txt 2>&1
# Check env (grep -v SECRET only drops lines containing SECRET;
# remove other sensitive values such as DATABASE_URL before sharing)
docker exec newhires-workers env | grep -v SECRET > worker_env.txt
# Check job status
docker exec newhires-db psql -U newhires -d newhires -c \
"SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;" > job_status.txt
- Review documentation:
  - AWS Bedrock Setup
  - Environment Variables
- Check AWS Service Health:
  - https://health.aws.amazon.com/health/status
  - Filter by Bedrock and us-east-1
- Contact Support with collected logs and diagnostic info
Prevention¶
Best Practices¶
- Monitor worker logs regularly (docker logs newhires-workers -f).
- Set up CloudWatch for Bedrock API monitoring (optional)
- Use a staging environment to test changes before production
- Keep credentials secure:
  - Rotate AWS access keys every 90 days
  - Never commit .env to git
  - Use IAM roles when possible (EC2/ECS)
- Monitor costs:
  - Set up AWS billing alerts
  - Review Bedrock usage monthly
  - Adjust MAX_CONCURRENT_BEDROCK_CALLS based on budget
- Test Bedrock access after any AWS changes (see Debugging Workflow, Step 3).
Quick Reference¶
Environment Variables¶
| Variable | Purpose | Impact on Bedrock |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS authentication | Required |
| AWS_SECRET_ACCESS_KEY | AWS authentication | Required |
| AWS_REGION | AWS region | Required (us-east-1) |
| BEDROCK_MODEL_ID | Model selection | Optional (default: Claude) |
| MAX_CONCURRENT_BEDROCK_CALLS | API concurrency | Higher = more cost |
| MAX_AI_ATTEMPTS | Retry limit | Higher = more cost |
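The table can double as a startup check. The sketch below reports which of these variables are unset without printing any values. The function name is hypothetical, and treating the two tuning variables as optional is an assumption, since the table marks only the first three as required.

```python
import os

# Names taken from the table above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")
# Assumption: the remaining variables have defaults and may be left unset.
OPTIONAL_VARS = ("BEDROCK_MODEL_ID", "MAX_CONCURRENT_BEDROCK_CALLS",
                 "MAX_AI_ATTEMPTS")


def check_environment(env=None):
    """Return (missing_required, unset_optional) lists of variable names."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    unset_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing, unset_optional
```

Run inside the worker container, a non-empty first list means the worker cannot authenticate to AWS at all. The second list is informational only.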
Common Commands¶
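# Restart workers (restart alone does not reload changed .env values)
docker-compose -f docker-compose.prod.yml restart workers
# Recreate workers after editing .env so the new values are loaded
docker-compose -f docker-compose.prod.yml up -d --force-recreate workers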
# View worker logs
docker logs newhires-workers -f
# Check AWS credentials
docker exec newhires-workers env | grep AWS
# List Bedrock models available in the region (requires the AWS CLI on the host)
aws bedrock list-foundation-models --region us-east-1
# Check job queue
docker exec newhires-db psql -U newhires -d newhires -c \
"SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"