Understanding GreyNoise Datasets

GreyNoise produces two datasets of IP information that can be used for threat enrichment. This article gives a basic overview of each dataset and where it is best used.

Noise Dataset

What is it?

GreyNoise’s internet-wide sensor network passively collects packets from hundreds of thousands of IPs seen scanning the internet every day. Companies like Shodan and Censys, as well as researchers and universities, scan in good faith to help uncover vulnerabilities for network defense. Others scan with potentially malicious intent. GreyNoise analyzes and enriches this data to identify behavior, methods, and intent, giving analysts the context to take action.

When is it best to query it?

The Noise dataset is best used to enrich log events from your environment's perimeter and from public, internet-facing devices. It can help determine whether the activity in those events is happening across the internet or may be targeted specifically at your organization.
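
As a rough illustration of this kind of enrichment, the sketch below tags perimeter log events according to whether their source IP appears in a local copy of Noise data. Everything named here is an assumption made for the example: the `src_ip` event field, the `noise_records` dictionary and its `classification` key, and the documentation-range IP addresses. It does not call any GreyNoise API, and an IP missing from the lookup is only a hint that the activity may deserve a closer look, not proof of targeting.

```python
from typing import Dict, List, Optional


def enrich_events(events: List[dict], noise_records: Dict[str, dict]) -> List[dict]:
    """Attach Noise context (if any) to each perimeter log event.

    `noise_records` is a hypothetical local lookup keyed by IP address;
    it stands in for however Noise data is obtained in practice.
    """
    enriched = []
    for event in events:
        record: Optional[dict] = noise_records.get(event["src_ip"])
        enriched.append({
            **event,
            # True when the source IP was also seen scanning the wider internet.
            "seen_in_noise": record is not None,
            "noise_context": record,
        })
    return enriched


# Illustrative data only (documentation IP ranges, made-up field names).
noise_records = {"198.51.100.7": {"classification": "benign"}}
events = [
    {"src_ip": "198.51.100.7", "dst_port": 443},
    {"src_ip": "203.0.113.9", "dst_port": 22},
]

enriched_events = enrich_events(events, noise_records)

# Events whose source is absent from the lookup are candidates for review.
review_queue = [e for e in enriched_events if not e["seen_in_noise"]]
```

Here `review_queue` holds only the event from `203.0.113.9`, the one source that the sample lookup does not recognize as internet-wide scanning.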

RIOT Dataset

What is it?

RIOT provides context for communications between your users and common business applications (e.g., Microsoft O365, Google Workspace, and Slack) or services like CDNs and public DNS servers. These applications communicate through unpublished or dynamic IPs, making them difficult for security teams to track. Without context, this harmless behavior distracts security teams from investigating true threats.

When is it best to query it?

The RIOT dataset is best used to filter outbound traffic leaving your network. It can be applied to determine which traffic is going to known services so that you can focus on the connections going to unknown IPs.
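
A minimal sketch of that filtering step might look like the following. It assumes the RIOT data is already available as a plain set of IP strings (`riot_ips`) and that each outbound connection record carries a `dst_ip` field. Both names are hypothetical, and the addresses come from documentation ranges rather than real RIOT data.

```python
from typing import Iterable, List, Set, Tuple


def split_outbound(
    connections: Iterable[dict], riot_ips: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Split outbound connections into (known_service, unknown) lists.

    `riot_ips` is an assumed local set of IPs attributed to known services.
    """
    known: List[dict] = []
    unknown: List[dict] = []
    for conn in connections:
        if conn["dst_ip"] in riot_ips:
            known.append(conn)
        else:
            unknown.append(conn)
    return known, unknown


# Illustrative data only.
riot_ips = {"192.0.2.10", "192.0.2.53"}
connections = [
    {"user": "alice", "dst_ip": "192.0.2.10"},
    {"user": "bob", "dst_ip": "203.0.113.200"},
    {"user": "carol", "dst_ip": "192.0.2.53"},
]

known_service, unknown = split_outbound(connections, riot_ips)
```

Only the connections in `unknown` would then be passed on for investigation.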

RIOT can also be very helpful as a pre-filter on IPs submitted to blocklists to ensure that you do not accidentally block a critical business service for your organization.
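
The pre-filter idea can be sketched as below, under the assumption that RIOT data is held in a hypothetical `riot_services` dictionary mapping each IP to a service label. Candidates found in it are held back for manual review instead of being blocked. The service label and the addresses are invented for the example.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def prefilter_blocklist(
    candidates: Iterable[str], riot_services: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Return (to_block, held_back) for a list of candidate IPs.

    `riot_services` is an assumed local mapping of IP -> service name.
    """
    to_block: List[str] = []
    held_back: Dict[str, str] = {}
    for raw in candidates:
        # Normalize the text form; raises ValueError on a malformed address.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip in riot_services:
            held_back[ip] = riot_services[ip]
        else:
            to_block.append(ip)
    return to_block, held_back


# Illustrative data only.
riot_services = {"192.0.2.10": "example-collaboration-suite"}
candidates = ["203.0.113.50", " 192.0.2.10 "]

to_block, held_back = prefilter_blocklist(candidates, riot_services)
```

Keeping the held-back entries together with their service label lets an analyst see why an IP was withheld before deciding whether to override the filter.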