Using the GreyNoise Query Language (GNQL)

The GreyNoise Query Language (GNQL) is a tool for searching the GreyNoise data set. It helps analysts, threat hunters, researchers, and others find emerging threats, compromised devices, and other interesting trends.

GNQL is a domain-specific query language that uses Lucene deep under the hood. GNQL is built with self-defeat and fully featured product lines in mind. If we do our job correctly, each individual GNQL query that brings our users and customers sufficient value will eventually be transitioned into its own individual offering.

GNQL queries can be run from both the GreyNoise Visualizer and the GreyNoise REST API.
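As a rough sketch of the REST API route, the snippet below builds (but does not send) a GNQL request using only the Python standard library. The endpoint path, the `key` header name, and the `query` parameter name are assumptions that this page does not confirm, so check them against the current GreyNoise API reference. `build_gnql_request` is a hypothetical helper, not part of any GreyNoise client.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; confirm against the GreyNoise API reference before use.
GNQL_ENDPOINT = "https://api.greynoise.io/v2/experimental/gnql"


def build_gnql_request(query: str, api_key: str) -> Request:
    """Return an unsent GET request carrying the URL-encoded GNQL query."""
    url = f"{GNQL_ENDPOINT}?{urlencode({'query': query})}"
    # The "key" header name is an assumption, not confirmed by this page.
    return Request(url, headers={"key": api_key, "Accept": "application/json"})


request = build_gnql_request('tags:"RDP Scanner" last_seen:today', "YOUR_API_KEY")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be run without credentials.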

Searchable Fields

The following is a list of all the GreyNoise fields that can be searched with a GNQL query:

  • ip - The IP address of the scanning device
  • classification - Whether the device has been categorized as unknown, benign, or malicious
  • first_seen - The date the device was first observed by GreyNoise
  • last_seen - The date the device was most recently observed by GreyNoise
  • actor - The benign actor the device has been associated with, such as Shodan, Censys, or GoogleBot
  • tags - A list of the tags the device has been assigned over the past 90 days
  • spoofable - This IP address has been opportunistically scanning the Internet but has failed to complete a full TCP connection. Any reported activity could be spoofed.
  • vpn - This IP is associated with a VPN service. Activity, malicious or otherwise, should not be attributed to the VPN service provider.
  • vpn_service - The VPN service the IP is associated with
  • bot - The IP has been associated with known bot activity
  • metadata.category - Whether the device belongs to a business, isp, hosting, education, or mobile network
  • metadata.country OR metadata.source_country - The full name of the country the device is geographically located in
  • metadata.country_code OR metadata.source_country_code - The two-character country code of the country the device is geographically located in
  • metadata.destination_country - The full name of the IP scanning destination country
  • metadata.destination_country_code - The two-character country code of the IP scanning destination country.
  • single_destination - A boolean parameter that filters source country IPs that have only been observed in a single destination country. This has to be used in conjunction with metadata.destination_country or metadata.destination_country_code.
  • metadata.city - The city the device is geographically located in
  • metadata.region - The region the device is geographically located in
  • metadata.organization - The organization that owns the network that the IP address belongs to
  • metadata.rdns - The reverse DNS pointer of the IP
  • metadata.asn - The autonomous system the IP address belongs to
  • metadata.tor - Whether or not the device is a known Tor exit node
  • raw_data.scan.port - The port number(s) the device has been observed scanning
  • raw_data.scan.protocol - The protocol of the port the device has been observed scanning
  • raw_data.web.paths - Any HTTP paths the device has been observed crawling the Internet for
  • raw_data.web.useragents - Any HTTP user-agents the device has been observed using while crawling the Internet
  • raw_data.ja3.fingerprint - The JA3 TLS/SSL fingerprint
  • raw_data.ja3.port - The corresponding TCP port for the given JA3 fingerprint
  • raw_data.hassh.fingerprint - The HASSH fingerprint
  • raw_data.hassh.port - The corresponding TCP port for the given HASSH fingerprint

Behavior

  • You can exclude a facet from the results by prefixing its term with a minus character (-)
  • The data that this endpoint queries refreshes once per hour
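A minimal sketch of the minus-prefix behavior, assuming terms are joined with spaces as in the examples on this page. `combine_terms` is a hypothetical helper that only assembles the query string. It does not validate field names or values.

```python
def combine_terms(include, exclude=()):
    """Join GNQL terms with spaces, prefixing each excluded term with '-'."""
    parts = list(include) + [f"-{term}" for term in exclude]
    return " ".join(parts)


query = combine_terms(
    include=['tags:"Siemens PLC Scanner"'],
    exclude=["classification:benign"],
)
print(query)  # tags:"Siemens PLC Scanner" -classification:benign
```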

Shortcuts

  • You can find interesting hosts by using the GNQL query term interesting
  • You can use the keyword today in the first_seen and last_seen parameters: last_seen:today or first_seen:today

IP Geo Destination

GNQL supports IP source and destination queries, which help you understand how scanning behavior impacts different countries.

  • metadata.source_country OR metadata.country - The full name of the country the scanning device is geographically located in
  • metadata.source_country_code OR metadata.country_code - The two-character country code of the country the device is geographically located in
  • metadata.destination_country - The full name of the IP scanning destination country
  • metadata.destination_country_code - The two-character country code of the IP scanning destination country.
  • single_destination - A boolean parameter that filters source country IPs observed only in a single destination country. This has to be used in conjunction with metadata.destination_country or metadata.destination_country_code.

If your search of the destination country doesn’t return any results, please ensure that you have entered a valid country name or code. It is possible that the destination country is not in the GreyNoise sensor network; therefore, we don’t have any data in that country.

Here is a list of countries that are part of the GreyNoise sensor network:

Australia       Bahrain                 Belgium
Brazil          Canada                  China
Finland         France                  Georgia
Germany         Hong Kong               India
Indonesia       Ireland                 Israel
Italy           Japan                   Kenya
Malaysia        Moldova                 Netherlands
Poland          Serbia                  Singapore
South Africa    South Korea             Spain
Switzerland     Taiwan                  Turkey
Ukraine         United Arab Emirates    United Kingdom
United States

Examples:

  • Search for all IPs in China that are ONLY scanning sensors located in Brazil:

source_country:"China" AND destination_country:"Brazil" AND single_destination:true AND spoofable:false

  • Search for all IPs scanning sensors located in Germany:

destination_country:"Germany"
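The destination examples can be sketched as a small query builder that rejects destination countries outside the sensor network before a query is sent. `destination_query` and `SENSOR_COUNTRIES` are hypothetical names. The country set is copied from the list on this page and may go out of date. The short `source_country` and `destination_country` field names follow the examples above.

```python
# Countries copied from the sensor network list on this page.
SENSOR_COUNTRIES = frozenset({
    "Australia", "Bahrain", "Belgium", "Brazil", "Canada", "China",
    "Finland", "France", "Georgia", "Germany", "Hong Kong", "India",
    "Indonesia", "Ireland", "Israel", "Italy", "Japan", "Kenya",
    "Malaysia", "Moldova", "Netherlands", "Poland", "Serbia", "Singapore",
    "South Africa", "South Korea", "Spain", "Switzerland", "Taiwan",
    "Turkey", "Ukraine", "United Arab Emirates", "United Kingdom",
    "United States",
})


def destination_query(destination, source=None, single_destination=False):
    """Build a destination-country query, optionally limited by source."""
    if destination not in SENSOR_COUNTRIES:
        raise ValueError(f"{destination!r} is not in the sensor network list")
    parts = []
    if source:
        parts.append(f'source_country:"{source}"')
    parts.append(f'destination_country:"{destination}"')
    if single_destination:
        parts.append("single_destination:true")
    return " AND ".join(parts)


print(destination_query("Germany"))
print(destination_query("Brazil", source="China", single_destination=True))
```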

Time-Based Query Options

GNQL allows time-based queries on the last_seen and first_seen dates. The following options are supported for both:

  • last_seen:1d - last seen in the previous day plus today (equal to last_seen: today OR yesterday)
  • last_seen:1w - last seen in the last week
  • last_seen:1m - last seen in the last month
  • last_seen:1y - last seen in the last year
  • last_seen:today - last seen on this date

📘

Time-Based Queries are based on UTC Timestamps

When using time-based query options, please note that the time query is based on the current date and time in UTC.
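To illustrate why the UTC basis matters, the sketch below converts a local timestamp to its UTC calendar date. The timestamp and the UTC-5 offset are made-up values for illustration, and `utc_date` is a hypothetical helper. How GreyNoise evaluates the window server-side is not shown here.

```python
from datetime import datetime, timedelta, timezone


def utc_date(moment: datetime):
    """Return the UTC calendar date for a timezone-aware datetime."""
    return moment.astimezone(timezone.utc).date()


# 8:30 PM on 1 March in a UTC-5 zone is already 2 March in UTC.
local_evening = datetime(2024, 3, 1, 20, 30, tzinfo=timezone(timedelta(hours=-5)))
print(local_evening.date())     # 2024-03-01
print(utc_date(local_evening))  # 2024-03-02
```

An analyst in that time zone running last_seen:today at that moment would therefore be asking about the UTC date, not the local one.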

Examples

  • last_seen:today - Returns all IPs scanning/crawling the Internet today
  • tags:Mirai - Returns all devices with the "Mirai" tag
  • tags:"RDP Scanner" - Returns all devices with the "RDP Scanner" tag
  • classification:malicious metadata.country:Belgium - Returns all compromised devices located in Belgium
  • classification:malicious metadata.rdns:"*.gov*" - Returns all compromised devices that include .gov in their reverse DNS records
  • metadata.organization:Microsoft classification:malicious - Returns all compromised devices that belong to Microsoft
  • (raw_data.scan.port:445 AND raw_data.scan.protocol:TCP) metadata.os:"Windows*" - Returns all devices running Windows operating systems that are scanning the Internet for port 445/TCP (Conficker/EternalBlue/WannaCry)
  • raw_data.scan.port:554 - Returns all devices scanning the Internet for port 554
  • -metadata.organization:Google raw_data.web.useragents:GoogleBot - Returns all devices crawling the Internet with "GoogleBot" in their useragent from a network that does NOT belong to Google
  • tags:"Siemens PLC Scanner" -classification:benign - Returns all devices scanning the Internet for SCADA devices that are NOT tagged by GreyNoise as "benign" (Shodan/Project Sonar/Censys/Google/Bing/etc.)
  • classification:benign - Returns all "good guys" scanning the Internet
  • raw_data.ja3.fingerprint:795bc7ce13f60d61e9ac03611dd36d90 - Returns all devices crawling the Internet with a matching client JA3 TLS/SSL fingerprint
  • raw_data.hassh.fingerprint:51cba57125523ce4b9db67714a90bf6e - Returns all devices crawling the Internet with a matching client HASSH fingerprint
  • raw_data.web.paths:"/HNAP1/" - Returns all devices crawling the Internet for the HTTP path "/HNAP1/"
  • 8.0.0.0/8 - Returns all devices scanning the Internet from the CIDR block 8.0.0.0/8
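For the CIDR form in the last example, a block can be checked locally before it is used as a query. This is a small sketch using Python's standard `ipaddress` module. `cidr_query` is a hypothetical helper, and whether GreyNoise itself rejects blocks with host bits set is not stated on this page.

```python
import ipaddress


def cidr_query(block: str) -> str:
    """Validate a CIDR block and return it in canonical string form."""
    # strict=True raises ValueError when host bits are set, e.g. "8.1.2.3/8".
    network = ipaddress.ip_network(block, strict=True)
    return str(network)


print(cidr_query("8.0.0.0/8"))  # 8.0.0.0/8
```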

📘

Use Quotes with Wildcards

When performing complex GNQL searches that include wildcards, be sure to use quotes around the appropriate term to ensure the most relevant results are returned:

Example: rdns:"*.ant.isi.edu"
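The quoting advice can be sketched as a formatter that wraps a value in double quotes whenever it contains a wildcard or a space. `gnql_term` is a hypothetical helper covering only those two cases. It makes no attempt to escape other special characters.

```python
def gnql_term(field: str, value: str) -> str:
    """Format field:value, quoting values that contain '*' or a space."""
    if '"' in value:
        raise ValueError("embedded double quotes are not handled in this sketch")
    if "*" in value or " " in value:
        return f'{field}:"{value}"'
    return f"{field}:{value}"


print(gnql_term("metadata.rdns", "*.gov*"))  # metadata.rdns:"*.gov*"
print(gnql_term("tags", "RDP Scanner"))      # tags:"RDP Scanner"
print(gnql_term("tags", "Mirai"))            # tags:Mirai
```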