Analyze AWS Application Load Balancer logs using Amazon Athena

1. Introduction

Application Load Balancer (ALB) access logs record details about each request sent to your load balancer, such as the client address, the requested URL, and the response status. These logs help with troubleshooting and with understanding how your system is performing.

I recently wanted to analyze ALB access logs to identify suspicious activity. However, the logs are stored as gzip-compressed files in an S3 bucket, and with hundreds of thousands of files, manual analysis is impractical. Amazon Athena solves this by letting you query data stored in Amazon S3 using standard SQL.

2. What is Amazon Athena

Amazon Athena is an interactive query service provided by AWS that allows you to analyze and query data stored in Amazon S3 using standard SQL. It enables you to run ad-hoc queries on data in S3 without the need for complex ETL processes or data movement.

Since Athena is serverless, you don't need to manage any infrastructure, and you are charged only for the queries you run, based on the amount of data each query scans.

3. Analyzing ALB Access Logs with Athena

3.1. Enable access logs for your Application Load Balancer, as described in the AWS documentation, so that the logs are saved to your Amazon S3 bucket.

Enable ALB access logs

3.2. Open the Athena console and click Launch Query Editor.

Athena Console

3.3. Create an Athena database and table for the Application Load Balancer logs. To create the database, run the following command in the Query Editor. It's recommended to create the database in the same AWS Region as the Amazon S3 bucket.

CREATE DATABASE <DATABASE_NAME>

Create an Athena Database

3.4. Then select the database from the dropdown and create an alb_logs table for the ALB logs. Replace <YOUR-ALB-LOGS-DIRECTORY>, <ACCOUNT-ID>, and <REGION> with the values for your environment.

CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
LOCATION 's3://<YOUR-ALB-LOGS-DIRECTORY>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/';

Create Table
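To see how the 'input.regex' in the table definition maps a log entry onto the columns, the pattern can be tried locally. The following is a minimal Python sketch, not part of the Athena setup: it uses the same regex with the SQL string escaping removed, and the sample log line, the COLUMNS list, and the parse_alb_line helper are illustrative assumptions modeled on the ALB access log format.

```python
import re

# Column names in the same order as the alb_logs table definition.
COLUMNS = [
    "type", "time", "elb", "client_ip", "client_port", "target_ip",
    "target_port", "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request_verb", "request_url",
    "request_proto", "user_agent", "ssl_cipher", "ssl_protocol",
    "target_group_arn", "trace_id", "domain_name", "chosen_cert_arn",
    "matched_rule_priority", "request_creation_time", "actions_executed",
    "redirect_url", "lambda_error_reason", "target_port_list",
    "target_status_code_list", "classification", "classification_reason",
]

# The table's 'input.regex', with the SQL escaping removed (\" becomes ").
ALB_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) '
    r'([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) '
    r'([-0-9]*) ([-0-9]*) "([^ ]*) (.*) (- |[^ ]*)" "([^"]*)" '
    r'([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^"]*)" '
    r'([-.0-9]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^ ]*)" '
    r'"([^\s]+?)" "([^\s]+)" "([^ ]*)" "([^ ]*)"'
)

# Illustrative sample entry (assumption), one capture group per column.
SAMPLE_LINE = (
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:'
    'targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" 0 '
    '2018-07-02T22:22:48.364000Z "forward" "-" "-" "10.0.0.1:80" "200" "-" "-"'
)


def parse_alb_line(line):
    """Return a column -> value dict, or None if the whole line does not match."""
    match = ALB_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    # Every value is still a string here; Athena converts to the column types.
    return dict(zip(COLUMNS, match.groups()))


row = parse_alb_line(SAMPLE_LINE)
print(row["client_ip"], row["request_verb"], row["request_url"])
```

The sketch requires the entire line to match, which is how I understand the RegexSerDe to behave: a log entry that does not fit the pattern typically shows up in Athena as a row of NULL values. If a query returns only NULLs, comparing a raw log line against the regex in this way is a quick first check.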

3.5. In the Query Editor settings, choose an S3 bucket to store the results of your Athena queries.

Athena Settings

3.6. Now you can use SQL syntax to query the access logs.

The following SQL query counts the requests in the alb_logs table whose URL contains 'hs'. It groups the results by request verb, client IP, and request URL, and returns at most 100 rows. Because the query has no ORDER BY clause, which 100 groups are returned is not guaranteed.

SELECT COUNT(request_verb) AS count,
request_verb,
client_ip,
request_url
FROM alb_logs
WHERE request_url LIKE '%hs%'
GROUP BY request_verb, client_ip, request_url
LIMIT 100;

Athena Query
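To get a feel for what this kind of grouped count returns before running it against real logs, the same query shape can be tried on a handful of rows. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Athena, the sample rows are made up, and the request_count alias and ORDER BY clause are additions that put the busiest groups first. SQL dialects differ (for example, SQLite's LIKE is case-insensitive for ASCII by default), so treat it as a way to check the logic rather than Athena's exact behavior.

```python
import sqlite3

# Hypothetical rows standing in for parsed ALB log entries:
# (request_verb, client_ip, request_url)
SAMPLE_ROWS = [
    ("GET", "203.0.113.10", "http://example.com:80/hs/login"),
    ("GET", "203.0.113.10", "http://example.com:80/hs/login"),
    ("POST", "203.0.113.11", "http://example.com:80/hs/login"),
    ("GET", "203.0.113.12", "http://example.com:80/index.html"),
]

# Same shape as the query above, plus an ORDER BY so the
# most frequent (verb, client, URL) combinations come first.
QUERY = """
SELECT COUNT(request_verb) AS request_count,
       request_verb,
       client_ip,
       request_url
FROM alb_logs
WHERE request_url LIKE '%hs%'
GROUP BY request_verb, client_ip, request_url
ORDER BY request_count DESC, client_ip
LIMIT 100
"""


def run_sample_query(rows):
    """Load rows into an in-memory table and return the grouped counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE alb_logs "
            "(request_verb TEXT, client_ip TEXT, request_url TEXT)"
        )
        conn.executemany("INSERT INTO alb_logs VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for result in run_sample_query(SAMPLE_ROWS):
    print(result)
```

Each output row is one distinct combination of verb, client IP, and URL together with the number of matching requests, so a client that repeatedly hits the same path stands out with a high count.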

4. References

4.1. Querying Application Load Balancer logs

4.2. Analyzing ALB Access Logs with Amazon Athena

4.3. How do I use Amazon Athena to analyze ALB access logs