Collect Amazon VPC flow logs with Elastic Agent
What is an Elastic integration?
This integration is powered by Elastic Agent. Elastic Agent is a single, unified way to add monitoring for logs, metrics, and other types of data to a host. It can also protect hosts from security threats, query data from operating systems, forward data from remote services or hardware, and more. Refer to our documentation for a detailed comparison between Beats and Elastic Agent.
Prefer to use Beats for this use case? See Filebeat modules for logs or Metricbeat modules for metrics.
See the integrations quick start guides to get started:
The AWS VPC Flow integration allows you to monitor Amazon Virtual Private Cloud (Amazon VPC) flow logs. Flow logs capture information about the IP traffic going to and from network interfaces in a VPC.
Use the AWS VPC Flow integration to collect logs related to your Amazon VPCs. Then visualize that data in Kibana, create alerts to notify you if something goes wrong, and reference logs when troubleshooting an issue.
For example, you could use this data to:
Then you can alert the relevant project manager about those events by email.
IMPORTANT: Extra AWS charges on AWS API requests will be generated by this integration. Please refer to the AWS integration for more details.
The AWS VPC Flow integration collects one type of data: logs.
Logs help you keep a record of events happening in your VPCs. Logs collected by the vpcflow integration include the packet-level (original) source and destination IP addresses for the traffic, accepted traffic, rejected traffic, and more. See more details in the Logs reference.
You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it. You can use our hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.
Before using any AWS integration you will need:
For more details about these requirements, see the AWS integration documentation.
Use this integration if you only need to collect IP traffic data for your VPCs.
If you want to collect data from two or more AWS services, consider using the AWS integration. When you configure the AWS integration, you can collect data from as many AWS services as you'd like.
For step-by-step instructions on how to set up an integration, see the Getting started guide.
For more information on implementation, see the Amazon documentation on:
This integration supports various plain text VPC flow log formats:
${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}
Note: The Parquet format is not supported.
Exported fields
Field | Description | Type |
---|---|---|
@timestamp | Event timestamp. | date |
aws.vpcflow.account_id | The AWS account ID for the flow log. | keyword |
aws.vpcflow.action | The action that is associated with the traffic, ACCEPT or REJECT. | keyword |
aws.vpcflow.instance_id | The ID of the instance that's associated with network interface for which the traffic is recorded, if the instance is owned by you. | keyword |
aws.vpcflow.interface_id | The ID of the network interface for which the traffic is recorded. | keyword |
aws.vpcflow.log_status | The logging status of the flow log, OK, NODATA or SKIPDATA. | keyword |
aws.vpcflow.pkt_dst_service | The name of the subset of IP address ranges for the pkt-dstaddr field, if the source IP address is for an AWS service. | keyword |
aws.vpcflow.pkt_dstaddr | The packet-level (original) destination IP address for the traffic. | ip |
aws.vpcflow.pkt_src_service | The name of the subset of IP address ranges for the pkt-srcaddr field, if the source IP address is for an AWS service. | keyword |
aws.vpcflow.pkt_srcaddr | The packet-level (original) source IP address of the traffic. | ip |
aws.vpcflow.sublocation.id | The ID of the sublocation that contains the network interface for which traffic is recorded. If the traffic is not from a sublocation, the field is removed. | keyword |
aws.vpcflow.sublocation.type | The type of sublocation that's returned in the sublocation-id field. The possible values are: wavelength | outpost |
aws.vpcflow.subnet_id | The ID of the subnet that contains the network interface for which the traffic is recorded. | keyword |
aws.vpcflow.tcp_flags | The bitmask value for the following TCP flags: 2=SYN,18=SYN-ACK,1=FIN,4=RST | keyword |
aws.vpcflow.tcp_flags_array | List of TCP flags: 'fin, syn, rst, psh, ack, urg' | keyword |
aws.vpcflow.traffic_path | The path that egress traffic takes to the destination. To determine whether the traffic is egress traffic, check the network.direction field. The possible values can be found here. If none of the values apply, the field is set to -. | keyword |
aws.vpcflow.type | The type of traffic: IPv4, IPv6, or EFA. | keyword |
aws.vpcflow.version | The VPC Flow Logs version. If you use the default format, the version is 2. If you specify a custom format, the version is 3. | keyword |
aws.vpcflow.vpc_id | The ID of the VPC that contains the network interface for which the traffic is recorded. | keyword |
cloud.account.id | The cloud account or organization id used to identify different entities in a multi-tenant environment. Examples: AWS account id, Google Cloud ORG Id, or other unique identifier. | keyword |
cloud.availability_zone | Availability zone in which this host, resource, or service is located. | keyword |
cloud.image.id | Image ID for the cloud instance. | keyword |
cloud.instance.id | Instance ID of the host machine. | keyword |
cloud.instance.name | Instance name of the host machine. | keyword |
cloud.machine.type | Machine type of the host machine. | keyword |
cloud.project.id | The cloud project identifier. Examples: Google Cloud Project id, Azure Project id. | keyword |
cloud.provider | Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean. | keyword |
cloud.region | Region in which this host, resource, or service is located. | keyword |
container.id | Unique container id. | keyword |
container.image.name | Name of the image the container was built on. | keyword |
container.labels | Image labels. | object |
container.name | Container name. | keyword |
data_stream.dataset | Data stream dataset. | constant_keyword |
data_stream.namespace | Data stream namespace. | constant_keyword |
data_stream.type | Data stream type. | constant_keyword |
destination.address | Some event destination addresses are defined ambiguously. The event will sometimes list an IP, a domain or a unix socket. You should always store the raw address in the .address field. Then it should be duplicated to .ip or .domain , depending on which one it is. | keyword |
destination.as.number | Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet. | long |
destination.as.organization.name | Organization name. | keyword |
destination.as.organization.name.text | Multi-field of destination.as.organization.name . | match_only_text |
destination.geo.city_name | City name. | keyword |
destination.geo.continent_name | Name of the continent. | keyword |
destination.geo.country_iso_code | Country ISO code. | keyword |
destination.geo.country_name | Country name. | keyword |
destination.geo.location | Longitude and latitude. | geo_point |
destination.geo.region_iso_code | Region ISO code. | keyword |
destination.geo.region_name | Region name. | keyword |
destination.ip | IP address of the destination (IPv4 or IPv6). | ip |
destination.port | Port of the destination. | long |
ecs.version | ECS version this event conforms to. ecs.version is a required field and must exist in all events. When querying across multiple indices -- which may conform to slightly different ECS versions -- this field lets integrations adjust to the schema version of the events. | keyword |
error.message | Error message. | match_only_text |
event.category | This is one of four ECS Categorization Fields, and indicates the second level in the ECS category hierarchy. event.category represents the "big buckets" of ECS categories. For example, filtering on event.category:process yields all events relating to process activity. This field is closely related to event.type , which is used as a subcategory. This field is an array. This will allow proper categorization of some events that fall in multiple categories. | keyword |
event.dataset | Event dataset | constant_keyword |
event.end | event.end contains the date when the event ended or when the activity was last observed. | date |
event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. event.kind gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention, different access control, it may also help understand whether the data coming in at a regular interval or not. | keyword |
event.module | Event module | constant_keyword |
event.original | Raw text message of entire event. Used to demonstrate log integrity or where the full log message (before splitting it up in multiple parts) may be required, e.g. for reindex. This field is not indexed and doc_values are disabled. It cannot be searched, but it can be retrieved from _source . If users wish to override this and index this field, please see Field data types in the Elasticsearch Reference . | keyword |
event.outcome | This is one of four ECS Categorization Fields, and indicates the lowest level in the ECS category hierarchy. event.outcome simply denotes whether the event represents a success or a failure from the perspective of the entity that produced the event. Note that when a single transaction is described in multiple events, each event may populate different values of event.outcome , according to their perspective. Also note that in the case of a compound event (a single event that contains multiple logical events), this field should be populated with the value that best captures the overall success or failure from the perspective of the event producer. Further note that not all events will have an associated outcome. For example, this field is generally not populated for metric events, events with event.type:info , or any events for which an outcome does not make logical sense. | keyword |
event.start | event.start contains the date when the event started or when the activity was first observed. | date |
event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. event.type represents a categorization "sub-bucket" that, when used along with the event.category field values, enables filtering events down to a level appropriate for single visualization. This field is an array. This will allow proper categorization of some events that fall in multiple event types. | keyword |
host.architecture | Operating system architecture. | keyword |
host.containerized | If the host is a container. | boolean |
host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword |
host.hostname | Hostname of the host. It normally contains what the hostname command returns on the host machine. | keyword |
host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of beat.name . | keyword |
host.ip | Host ip addresses. | ip |
host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |
host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use. | keyword |
host.os.build | OS build information. | keyword |
host.os.codename | OS codename, if any. | keyword |
host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |
host.os.kernel | Operating system kernel version as a raw string. | keyword |
host.os.name | Operating system name, without the version. | keyword |
host.os.name.text | Multi-field of host.os.name . | match_only_text |
host.os.platform | Operating system platform (such centos, ubuntu, windows). | keyword |
host.os.version | Operating system version as a raw string. | keyword |
host.type | Type of host. For Cloud providers this can be the machine type like t2.medium . If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |
network.bytes | Total bytes transferred in both directions. If source.bytes and destination.bytes are known, network.bytes is their sum. | long |
network.community_id | A hash of source and destination IPs and ports, as well as the protocol used in a communication. This is a tool-agnostic standard to identify flows. Learn more at https://github.com/corelight/community-id-spec. | keyword |
network.direction | Direction of the network traffic. Recommended values are: * ingress * egress * inbound * outbound * internal * external * unknown When mapping events from a host-based monitoring context, populate this field from the host's point of view, using the values "ingress" or "egress". When mapping events from a network or perimeter-based monitoring context, populate this field from the point of view of the network perimeter, using the values "inbound", "outbound", "internal" or "external". Note that "internal" is not crossing perimeter boundaries, and is meant to describe communication between two hosts within the perimeter. Note also that "external" is meant to describe traffic between two hosts that are external to the perimeter. This could for example be useful for ISPs or VPN service providers. | keyword |
network.iana_number | IANA Protocol Number (https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml). Standardized list of protocols. This aligns well with NetFlow and sFlow related logs which use the IANA Protocol Number. | keyword |
network.packets | Total packets transferred in both directions. If source.packets and destination.packets are known, network.packets is their sum. | long |
network.transport | Same as network.iana_number, but instead using the Keyword name of the transport layer (udp, tcp, ipv6-icmp, etc.) The field value must be normalized to lowercase for querying. | keyword |
network.type | In the OSI Model this would be the Network Layer. ipv4, ipv6, ipsec, pim, etc The field value must be normalized to lowercase for querying. | keyword |
related.ip | All of the IPs seen on your event. | ip |
source.address | Some event source addresses are defined ambiguously. The event will sometimes list an IP, a domain or a unix socket. You should always store the raw address in the .address field. Then it should be duplicated to .ip or .domain , depending on which one it is. | keyword |
source.as.number | Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet. | long |
source.as.organization.name | Organization name. | keyword |
source.as.organization.name.text | Multi-field of source.as.organization.name . | match_only_text |
source.bytes | Bytes sent from the source to the destination. | long |
source.geo.city_name | City name. | keyword |
source.geo.continent_name | Name of the continent. | keyword |
source.geo.country_iso_code | Country ISO code. | keyword |
source.geo.country_name | Country name. | keyword |
source.geo.location | Longitude and latitude. | geo_point |
source.geo.region_iso_code | Region ISO code. | keyword |
source.geo.region_name | Region name. | keyword |
source.ip | IP address of the source (IPv4 or IPv6). | ip |
source.packets | Packets sent from the source to the destination. | long |
source.port | Port of the source. | long |
tags | List of keywords used to tag each event. | keyword |
An example event for vpcflow
looks as following:
{
"data_stream": {
"namespace": "default",
"type": "logs",
"dataset": "aws.vpcflow"
},
"destination": {
"port": 22,
"address": "2001:db8:1234:a102:3304:8879:34cf:4071",
"ip": "2001:db8:1234:a102:3304:8879:34cf:4071"
},
"source": {
"address": "2001:db8:1234:a100:8d6e:3477:df66:f105",
"port": 34892,
"bytes": 8855,
"packets": 54,
"ip": "2001:db8:1234:a100:8d6e:3477:df66:f105"
},
"tags": [
"preserve_original_event"
],
"network": {
"community_id": "1:hXZclvxUJScaVf0xMIJR6yW6tBQ=",
"transport": "tcp",
"type": "ipv6",
"bytes": 8855,
"iana_number": "6",
"packets": 54
},
"cloud": {
"provider": "aws",
"account": {
"id": "123456789010"
}
},
"@timestamp": "2016-10-31T11:37:00.000Z",
"ecs": {
"version": "8.0.0"
},
"related": {
"ip": [
"2001:db8:1234:a100:8d6e:3477:df66:f105",
"2001:db8:1234:a102:3304:8879:34cf:4071"
]
},
"event": {
"ingested": "2021-09-28T19:10:43.075027100Z",
"original": "2 123456789010 eni-1235b8ca123456789 2001:db8:1234:a100:8d6e:3477:df66:f105 2001:db8:1234:a102:3304:8879:34cf:4071 34892 22 6 54 8855 1477913708 1477913820 ACCEPT OK",
"kind": "event",
"start": "2016-10-31T11:35:08.000Z",
"end": "2016-10-31T11:37:00.000Z",
"type": "connection",
"category": "network",
"outcome": "success"
},
"aws": {
"vpcflow": {
"action": "ACCEPT",
"account_id": "123456789010",
"log_status": "OK",
"interface_id": "eni-1235b8ca123456789",
"version": "2"
}
}
}