Using the GreyNoise Query Language (GNQL)

The GreyNoise Query Language (GNQL)

The GreyNoise Query Language (GNQL) is a domain-specific query language for searching the GreyNoise data set. It helps analysts, threat hunters, researchers, and others find emerging threats, compromised devices, and other interesting trends.

GNQL queries can be run against both the GreyNoise Visualizer and the GreyNoise REST API.
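When a GNQL query is sent over HTTP, it has to be percent-encoded because queries routinely contain spaces, quotes, and colons. The following is a minimal sketch of that encoding step only. The endpoint URL is a placeholder and the `query` parameter name is an assumption; take the real values from the GreyNoise API reference.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the GNQL endpoint from the GreyNoise API reference.
GNQL_ENDPOINT = "https://api.example.invalid/gnql"


def build_gnql_url(query: str, endpoint: str = GNQL_ENDPOINT) -> str:
    """Return a URL carrying the GNQL query as a percent-encoded parameter.

    The parameter name `query` is an assumption made for this illustration.
    """
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_gnql_url('tags:"RDP Bruteforce Attempt" last_seen:1d')
print(url)
```

Building the URL with `urlencode` rather than string concatenation keeps quoted tag names and wildcard characters intact on the wire.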

Searchable Fields

The following is a list of all the GreyNoise fields that can be searched with GNQL:

Path - Description

  • actor - The benign actor the device has been associated with (e.g. Shodan, Censys, GoogleBot)
  • callback_ips - IP addresses observed in HTTP payloads
  • classification - The classification of the IP: malicious, suspicious, unknown, or benign
  • cve - A list of CVEs the IP has been associated with over the past 90 days
  • first_seen - The date GreyNoise first observed the device
  • last_seen - The date GreyNoise most recently observed the device
  • last_seen_benign - The date GreyNoise last observed this IP exhibiting benign behavior
  • last_seen_malicious - The date GreyNoise last observed this IP exhibiting malicious behavior
  • last_seen_suspicious - The date GreyNoise last observed this IP exhibiting suspicious behavior
  • last_seen_timestamp - The timestamp GreyNoise most recently observed the device
  • metadata.asn - The autonomous system the IP address belongs to
  • metadata.carrier - The carrier associated with the IP
  • metadata.category - Whether the device belongs to a business, ISP, hosting, education, or mobile network
  • metadata.datacenter - The datacenter associated with the IP
  • metadata.destination_asn - List of ASNs of destination IPs
  • metadata.destination_city - List of destination cities the IP has been observed scanning toward
  • metadata.destination_country - List of full country names the IP has been observed scanning toward
  • metadata.destination_country_code - List of two-character country codes for scanning destinations
  • metadata.domain - The domain associated with the IP
  • metadata.latitude - The latitude of the device's geographic location
  • metadata.longitude - The longitude of the device's geographic location
  • metadata.mobile - Whether the IP belongs to a mobile network
  • metadata.organization - The organization that owns the network the IP address belongs to
  • metadata.os - The operating system of the device
  • metadata.rdns - The reverse DNS pointer of the IP
  • metadata.rdns_parent - The parent domain of the reverse DNS pointer
  • metadata.rdns_validated - Whether the reverse DNS record has been validated
  • metadata.region - The region the device is geographically located in
  • metadata.sensor_count - The number of GreyNoise sensors that observed the IP
  • metadata.sensor_hits - The number of times the IP was observed across GreyNoise sensors
  • metadata.single_destination - Whether the IP has only been observed scanning a single destination country
  • metadata.source_city - The city the device is geographically located in
  • metadata.source_country - The full name of the country the device is geographically located in
  • metadata.source_country_code - The two-character country code of the country the device is located in
  • raw_data.hassh - HASSH SSH fingerprint data
  • raw_data.http.cookie_keys - HTTP cookie keys observed in requests
  • raw_data.http.ja4h - JA4H HTTP fingerprint
  • raw_data.http.md5 - MD5 hash of observed HTTP content
  • raw_data.http.method - HTTP method observed (e.g. GET, POST)
  • raw_data.http.path - HTTP paths the device has been observed crawling
  • raw_data.http.request_authorization - Authorization header observed in HTTP requests
  • raw_data.http.request_cookies - Cookies observed in HTTP requests
  • raw_data.http.request_header - Headers observed in HTTP requests
  • raw_data.http.request_origin - Origin header observed in HTTP requests
  • raw_data.http.useragent - HTTP user-agents the device has been observed using
  • raw_data.ja3 - JA3 TLS/SSL fingerprint data
  • raw_data.scan - List of ports and protocols the device has been observed scanning
  • raw_data.scan.port - The port number in the first scan entry
  • raw_data.scan.protocol - The protocol in the first scan entry
  • raw_data.source.bytes - Number of bytes observed from the source IP
  • raw_data.ssh.ja4ssh - JA4SSH fingerprint
  • raw_data.ssh.key - SSH host key observed
  • raw_data.tcp.ja4l - JA4L TCP latency fingerprint
  • raw_data.tcp.ja4t - JA4T TCP fingerprint
  • raw_data.tcp.ja4t.0 - First entry in the JA4T fingerprint list
  • raw_data.tls.cipher - TLS cipher suite observed
  • raw_data.tls.ja4 - JA4 TLS fingerprint
  • spoofable - Whether the IP has failed to complete a full TCP connection; reported activity could be spoofed
  • tag - A list of tags the device has been assigned over the past 90 days
  • tor - Whether the device is a known Tor exit node
  • vpn - Whether the IP is associated with a VPN service
  • vpn_service - The VPN service the IP is associated with
  • workspace_label - Determines which dataset to pull results from. When not provided as an input, defaults to greynoise. Accepted values: greynoise, community, or personal

Note: The last_seen_<classification> fields (last_seen_benign, last_seen_malicious, last_seen_suspicious) differ from the top-level classification field. GreyNoise classifications follow a hierarchy with an age-out period. For example, an IP observed behaving maliciously retains a malicious classification for 30 days before aging off. The last_seen_<classification> fields let you query based on when GreyNoise last saw behavior matching a classification, regardless of the IP's current top-level classification.

Behavior

  • You can subtract facets by prefixing a query term with a minus character, for example -classification:benign
  • The data that this endpoint queries refreshes once per hour

Accessing Results from Different Datasets

When running a query, you can add the workspace_label parameter to return results from different datasets within the product.

Currently, the following datasets are available:

  • greynoise - this includes all data directly collected by the GreyNoise Global Observation Grid (GOG), the default and historical dataset that has always been offered
  • community - this includes aggregated data that is collected by ALL users who are running GreyNoise sensors within their own workspaces. These results are still collected by GreyNoise sensors and data pipeline, but are not managed by GreyNoise directly
  • personal - this includes data that is collected by sensors running in your assigned workspace only

Examples

To pull all IPs seen behaving maliciously in the last day from both the GreyNoise dataset and the community dataset:

last_seen_malicious:1d AND (workspace_label:greynoise OR workspace_label:community)

To pull all IPs seen in the last day by your personal sensors:

last_seen:1d AND workspace_label:personal

To pull all IPs from the community dataset observed with the Mirai tag:

tags:Mirai AND workspace_label:community
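The dataset scoping shown in these examples can be generated programmatically. The sketch below is an illustration only: `with_workspaces` is a hypothetical helper, not part of any GreyNoise client, and it relies solely on the three accepted `workspace_label` values listed above.

```python
from typing import Sequence

# Accepted values, taken from the dataset list above.
VALID_WORKSPACE_LABELS = ("greynoise", "community", "personal")


def with_workspaces(query: str, labels: Sequence[str]) -> str:
    """Append a workspace_label clause that restricts `query` to the given datasets."""
    unknown = [label for label in labels if label not in VALID_WORKSPACE_LABELS]
    if unknown:
        raise ValueError(f"unsupported workspace_label value(s): {unknown}")
    if not labels:
        # No clause added: per the field list, the default dataset is greynoise.
        return query
    clauses = [f"workspace_label:{label}" for label in labels]
    # Parenthesize multiple datasets so the OR binds before the AND.
    scope = clauses[0] if len(clauses) == 1 else "(" + " OR ".join(clauses) + ")"
    return f"{query} AND {scope}"


print(with_workspaces("last_seen_malicious:1d", ["greynoise", "community"]))
print(with_workspaces("tags:Mirai", ["community"]))
```

Validating the label before building the query surfaces typos locally instead of as an empty result set.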

Using Wildcards in GreyNoise Query Language (GNQL)

When searching fields using wildcards in GNQL, it's important to understand how each wildcard operates:

Wildcard Types

  • ? - Matches exactly one character.
  • * - Matches zero or more characters, including an empty string.
  • ?* - Matches one or more characters, excluding empty strings.

Examples:

  • metadata.rdns:"*example.com"

    • Returns all records with an rDNS entry ending in "example.com" (including "sub.example.com", "test.example.com", etc.).
  • metadata.rdns:?*

    • Returns records that have at least one character in the rDNS field, effectively filtering out entries where the field is empty.
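To make the three wildcard behaviors concrete, the following sketch emulates them locally by translating a pattern into a regular expression. It is only a model of the semantics described above, not how GreyNoise evaluates queries, and it assumes case-sensitive matching, which the text does not specify.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a GNQL-style wildcard pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


for pattern, value in [("*example.com", "sub.example.com"),
                       ("*example.com", "example.com"),
                       ("?*", ""),
                       ("?*", "host")]:
    print(pattern, repr(value), wildcard_match(pattern, value))
```

Note that `?*` becomes "one character, then zero or more", which is why it rejects the empty string while `*` alone accepts it.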

Shortcuts

  • You can find interesting hosts by using the GNQL query term interesting
  • You can use the keyword today in the first_seen and last_seen parameters: last_seen:today or first_seen:today

IP Geo Destination

GNQL supports IP source and destination queries, which help you understand how scanning behavior impacts different countries.

  • metadata.source_country OR metadata.country - The full name of the country the scanning device is geographically located in
  • metadata.source_country_code OR metadata.country_code - The two-character country code of the country the device is geographically located in
  • metadata.destination_country - The full name of the IP scanning destination country
  • metadata.destination_country_code - The two-character country code of the IP scanning destination country.
  • single_destination - A boolean parameter that filters source country IPs observed only in a single destination country. This has to be used in conjunction with metadata.destination_country or metadata.destination_country_code.

If a destination-country search returns no results, check that you entered a valid country name or code. The destination country may also not be part of the GreyNoise sensor network, in which case GreyNoise has no data for that country.

Here is a list of countries that are part of the GreyNoise sensor network:

Albania, Armenia, Australia
Austria, Azerbaijan, Bahrain
Belarus, Belgium, Bolivia
Brazil, Cambodia, Canada
Chile, Colombia, Cyprus
Czech Republic, Denmark, Ecuador
Estonia, Finland, France
Georgia, Germany, Ghana
Greece, Guam, Hong Kong
Hungary, Iceland, India
Indonesia, Iran, Iraq
Ireland, Israel, Italy
Japan, Kazakhstan, Kenya
Kuwait, Latvia, Lithuania
Luxembourg, Malaysia, Mexico
Moldova, Netherlands, New Zealand
Nigeria, Norway, Oman
Pakistan, Panama, Peru
Philippines, Poland, Portugal
Qatar, Romania, Russia
Saudi Arabia, Serbia, Singapore
Slovakia, Slovenia, South Africa
South Korea, Sweden, Switzerland
Taiwan, Thailand, Turkey
Ukraine, United Arab Emirates, United Kingdom
United States, Vietnam

Examples:

  • Search for all IPs in China that are ONLY scanning sensors located in Brazil:

source_country:"China" AND destination_country:"Brazil" AND single_destination:true AND spoofable:false

  • Search for all IPs scanning sensors located in Germany:

destination_country:"Germany"
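The rule that single_destination must accompany a destination filter is easy to forget when queries are assembled in code. The sketch below enforces it up front. `geo_query` is a hypothetical helper written for this illustration; it emits the full metadata.* field names from the list above and handles country names only, not country codes.

```python
from typing import Optional


def geo_query(source_country: Optional[str] = None,
              destination_country: Optional[str] = None,
              single_destination: bool = False) -> str:
    """Build a source/destination GNQL query from full country names."""
    if single_destination and destination_country is None:
        # Mirrors the documented constraint on single_destination.
        raise ValueError("single_destination requires a destination country")
    terms = []
    if source_country is not None:
        terms.append(f'metadata.source_country:"{source_country}"')
    if destination_country is not None:
        terms.append(f'metadata.destination_country:"{destination_country}"')
    if single_destination:
        terms.append("single_destination:true")
    if not terms:
        raise ValueError("at least one country filter is required")
    return " AND ".join(terms)


print(geo_query("China", "Brazil", single_destination=True))
print(geo_query(destination_country="Germany"))
```

Country names are always quoted here so that multi-word names such as "United States" remain a single term.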

Time-Based Query Options

GNQL supports time-based queries on the last_seen and first_seen dates. The following options are supported for both:

  • last_seen:1d - last seen in the previous day plus today (includes the previous full day and partial current day).
  • last_seen:1w - last seen in the last week.
  • last_seen:1m - last seen in the last month.
  • last_seen:1y - last seen in the last year.
  • last_seen:today - last seen on this date.
  • last_seen_malicious:1d - IPs that were seen doing something malicious in the last day.
  • last_seen_suspicious:7d - IPs that were seen doing something suspicious in the last 7 days.
  • last_seen_unknown:10d - IPs that were seen doing something unknown in the last 10 days.

Time-Based Queries are based on UTC Timestamps

When using time-based query options, please note that the time query is based on the current date and time in UTC.
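Because the window is anchored to the UTC date, a query run late in the evening in a western time zone may already be on the next UTC day. The sketch below works out the earliest UTC calendar date covered by `today` or an `Nd` value. It assumes that `Nd` generalizes the documented `1d` behavior (the previous N full days plus the partial current day). Week, month, and year values are left out because their exact boundaries are not stated here.

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional

_DAYS = re.compile(r"(\d+)d")


def earliest_utc_date(value: str, now: Optional[datetime] = None) -> date:
    """Return the earliest UTC date covered by `today` or an `Nd` window.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    today_utc = current.astimezone(timezone.utc).date()
    if value == "today":
        return today_utc
    match = _DAYS.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported time value for this sketch: {value!r}")
    # Assumption: Nd covers the N previous full days plus the partial current day.
    return today_utc - timedelta(days=int(match.group(1)))


# 20:00 on 1 March at UTC-8 is already 04:00 on 2 March in UTC.
local_evening = datetime(2024, 3, 1, 20, 0, tzinfo=timezone(timedelta(hours=-8)))
print(earliest_utc_date("today", local_evening))
print(earliest_utc_date("1d", local_evening))
```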

Examples

  • last_seen:today - Returns all IPs scanning/crawling the Internet today
  • tags:Mirai - Returns all devices with the "Mirai" tag
  • tags:"RDP Bruteforce Attempt" - Returns all devices with the "RDP Bruteforce Attempt" tag
  • classification:malicious metadata.country:Belgium - Returns all compromised devices located in Belgium
  • classification:malicious metadata.rdns:"*.gov*" - Returns all compromised devices that include .gov in their reverse DNS records
  • metadata.organization:Microsoft classification:malicious - Returns all compromised devices that belong to Microsoft
  • raw_data.scan.port:554 - Returns all devices scanning the Internet for port 554
  • -metadata.organization:Google raw_data.web.useragents:GoogleBot - Returns all devices crawling the Internet with "GoogleBot" in their useragent from a network that does NOT belong to Google
  • tags:"Siemens PLC Scanner" -classification:benign - Returns all devices scanning the Internet for SCADA devices that are NOT tagged by GreyNoise as "benign" (Shodan/Project Sonar/Censys/Google/Bing/etc.)
  • classification:benign - Returns all "good guys" scanning the Internet
  • raw_data.ja3.fingerprint:795bc7ce13f60d61e9ac03611dd36d90 - Returns all devices crawling the Internet with a matching client JA3 TLS/SSL fingerprint
  • raw_data.hassh.fingerprint:51cba57125523ce4b9db67714a90bf6e - Returns all devices crawling the Internet with a matching client HASSH fingerprint
  • raw_data.web.paths:"/HNAP1/" - Returns all devices crawling the Internet for the HTTP path "/HNAP1/"
  • 8.0.0.0/8 - Returns all devices scanning the Internet from the CIDR block 8.0.0.0/8

Use Quotes with Wildcards

When performing complex GNQL searches that include wildcards, be sure to use quotes around the appropriate term to ensure the most relevant results are returned:

Example: rdns:"*.ant.isi.edu"
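When queries are generated in code, the quoting advice can be applied automatically. The following sketch uses a hypothetical `term` helper that quotes any value containing a space or a wildcard character and optionally prefixes the minus sign used to subtract a facet. Quoting every wildcard value is a conservative choice made for this illustration, and values that themselves contain a double quote are rejected because no escaping rule is given here.

```python
def term(field: str, value: str, negate: bool = False) -> str:
    """Render one GNQL field:value term, quoting the value when needed."""
    if '"' in value:
        raise ValueError("values containing double quotes are not handled")
    # Quote when the value contains a space or a wildcard character.
    needs_quotes = any(ch in value for ch in " *?")
    rendered = f'"{value}"' if needs_quotes else value
    prefix = "-" if negate else ""
    return f"{prefix}{field}:{rendered}"


query = " ".join([
    term("metadata.rdns", "*.ant.isi.edu"),
    term("classification", "benign", negate=True),
])
print(query)
```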