Enabling deeper threat investigations with YARA rule hunting over Validin's host response data
Introduction
What if you could search the most complete and rigorous collection of virtual host responses on the internet using a paradigm you already know intimately: YARA?
At PIVOTcon in May, we spoke to leading threat hunters and analysts from around the world who described the challenges they face when hunting for different kinds of threats. Some of the tools they used had great search capabilities but didn’t have the data or the scale to feel complete. Others had completeness but lacked the ability to search in the ways that were most natural to them.
This month, we’ve made a significant improvement to how our users are able to query our data by allowing them to write custom YARA rules to retroactively scan our virtual host responses. This enables our customers to more accurately fingerprint, track, and discover novel threat indicators.
Today, we’re making Validin’s YARA retro hunting capability available to all of our enterprise customers. In this blog, we present a guide on how to compose and run a YARA rule in the Validin enterprise platform and showcase a use case we’ve come across during our testing.
Getting Started
Creating a rule
YARA rules are tied to a specific project so their definitions and matches can be easily shared and monitored. A summary view of all YARA rules can be found within the “YARA Rules” tab of a project’s page. We display the name of each rule, a summary of the rule’s latest run, and the actions you can take with the rule. If you need help writing your first rule, check out “Writing YARA rules”.

Figure 1. The YARA rule tab within a project
To create a new rule, click the “Add Rule” button to open our YARA rule editor, where you can draft, compile and test your YARA rule.

Figure 2. Validin’s YARA rule editor
Currently, any rule we run must meet the following conditions:
- Your rule is syntactically correct YARA (i.e., it must compile)
- Your rule does not contain private or global rules
- Your rule contains only a single rule definition
Note: We recommend you draft your YARA rules in a dedicated editor before pasting them into Validin.
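If you draft rules locally, a quick pre-check against the second and third conditions above can catch mistakes before you paste. The following is a minimal sketch in Python, not Validin’s validation logic: `precheck` and `RULE_DECL` are hypothetical names, and the regular expression is a heuristic that only looks for rule declarations at the start of a line. It does not parse or compile YARA, so a rule that passes here can still fail to compile.

```python
import re

# Heuristic only: finds "rule <name>" declarations at the start of a line,
# together with any "private"/"global" modifiers placed before them.
# Assumption: declarations are not hidden inside comments or multi-line strings.
RULE_DECL = re.compile(
    r"^[ \t]*((?:(?:private|global)[ \t]+)*)rule[ \t]+([A-Za-z_]\w*)",
    re.MULTILINE,
)


def precheck(rule_text: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    decls = RULE_DECL.findall(rule_text)
    if len(decls) != 1:
        problems.append(
            f"expected exactly one rule definition, found {len(decls)}"
        )
    for modifiers, name in decls:
        if modifiers.strip():
            problems.append(
                f"rule {name!r} uses modifier(s): {', '.join(modifiers.split())}"
            )
    return problems


if __name__ == "__main__":
    draft = 'rule demo {\n  strings:\n    $a = "example"\n  condition:\n    $a\n}\n'
    print(precheck(draft) or "no problems found")
```

Checking that the rule compiles still requires a real YARA compiler, such as the one in the rule editor.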
Running a rule
After a rule is saved, you can run it with a variety of configurations. By default, running a rule performs a retro hunt, similar to what is available in VirusTotal. The rule runs over Validin’s historical virtual host responses, which average 4.9 TB per day.
First, press the “Run” button by the rule you would like to run. Next, you will be presented with the following configuration options.

Figure 3. The YARA rule configuration step for a run
1. Lookback
The lookback window controls how far back we scan our data. For example, if you select “1 day”, we run your YARA rule over every virtual host response we collected in the past 24 hours.
Note: A 4-hour buffer is automatically applied to every lookback window, shifting the whole window four hours into the past. If you select a 1-hour lookback, the effective range is from (current time – 5h) to (current time – 4h).
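The arithmetic in the note above can be sketched in a few lines of Python. This is an illustration only: `effective_window` and `BUFFER` are hypothetical names, and the use of UTC is an assumption. The post does not say whether the platform aligns windows to hour boundaries, so no rounding is applied here.

```python
from datetime import datetime, timedelta, timezone

# The 4-hour buffer described in the note above.
BUFFER = timedelta(hours=4)


def effective_window(lookback: timedelta, now: datetime) -> tuple:
    """Return (start, end) of the scanned range for a given lookback."""
    end = now - BUFFER       # the most recent 4 hours are excluded
    start = end - lookback   # the lookback is measured from the buffered end
    return start, end


if __name__ == "__main__":
    # Fixed timestamp so the example is reproducible (UTC is an assumption).
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    start, end = effective_window(timedelta(hours=1), now)
    print(start.isoformat(), end.isoformat())
    # 2025-01-01T07:00:00+00:00 2025-01-01T08:00:00+00:00
```

With a 1-hour lookback at 12:00, the scanned range is 07:00 to 08:00, which matches the (current time – 5h) to (current time – 4h) range in the note.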
2. Source
Currently, we support scanning Validin’s collection of virtual host responses. We plan to aggressively expand the number of sources we allow you to run YARA rules over. Examples of sources we’re considering are favicons, full certificate artifacts, and JavaScript artifacts.
If you have suggestions or requests for additional sources that would be useful for hunting in your workflows, please reach out to support@validin.com or join our Slack. We’d love to hear them!
Viewing your matches
Matches for your YARA rule can be viewed by selecting the “View Run” button.

Figure 4. A YARA rule’s statistics and matches for a single run
From here, you can view summary statistics of your latest run as well as the matches.
If you’d like to view matches of previous runs, use the selector in the top right to switch between all of your rule’s runs.
For each match, you’ll see the body’s SHA1, the hour of data in which it was matched (this roughly equates to when we observed this virtual host response) and an option to view its full HTML.
Use case: Uncovering exposed LLM keys on the internet
In a recent LABScon talk and blog post, the SentinelOne team disclosed how they uncovered 6,000 unique OpenAI and Anthropic API keys via a VirusTotal retro hunt over a year of historical data. They wrote a YARA rule that simply searched samples for the substrings T3BlbkFJ and sk-ant-api03, which can be used to identify OpenAI and Anthropic API keys, respectively. We replicated this retro hunt on Validin with a similar YARA rule.
rule OpenAI_api_keys {
    meta:
        description = "This is a rule to find exposed OpenAI API Keys"
        reference = "https://www.sentinelone.com/labs/prompts-as-code-embedded-keys-the-hunt-for-llm-enabled-malware/"
    strings:
        $a = "T3BlbkFJ"
    condition:
        $a
}
We discovered over 5,000 matches for this YARA rule in just one week of data.
Looking at the HTML artifacts associated with these matches reveals some DevOps mishaps. Many of the pages have guiding comments left in the source code that look very similar to the type of comments left by AI when generating code snippets.

Figure 5. A screenshot of the exposed OpenAI key embedded within an HTML artifact found through a YARA rule match.
The comment above the embedded API key translates to “Your API key here (make sure to keep it secure and not expose it on the frontend).”
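Once you have saved the HTML of a match, the same substring check can be reproduced locally for triage. The sketch below is a rough Python equivalent of a plain, case-sensitive string match, not a replacement for the YARA rule: `find_markers` and `MARKERS` are hypothetical names, and the sample body is a made-up placeholder rather than real data. As background that goes beyond this post, the OpenAI marker is the Base64 encoding of the ASCII string “OpenAI”, which the sketch verifies.

```python
import base64

# The two substrings named in the use case above.
MARKERS = {
    "OpenAI": b"T3BlbkFJ",
    "Anthropic": b"sk-ant-api03",
}


def find_markers(body: bytes) -> dict:
    """Return {provider: [byte offsets]} for every marker found in body."""
    hits = {}
    for provider, marker in MARKERS.items():
        offsets = []
        pos = body.find(marker)
        while pos != -1:
            offsets.append(pos)
            pos = body.find(marker, pos + 1)
        if offsets:
            hits[provider] = offsets
    return hits


# The OpenAI marker is Base64 for the ASCII string "OpenAI".
assert base64.b64encode(b"OpenAI") == b"T3BlbkFJ"

if __name__ == "__main__":
    # Placeholder body for illustration; this is not a real key.
    sample = b'<script>const key = "sk-XXXXT3BlbkFJYYYY";</script>'
    print(find_markers(sample))
```

A hit only shows that the marker substring is present. Whether the surrounding text is a live key still has to be judged from the context of the page.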
Conclusion
Validin is committed solely to building the world’s most powerful threat hunting platform. Since YARA rules are a popular way to fingerprint malicious files, we’re excited to bring this capability to Validin’s huge set of virtual host responses. Enterprise Edition clients can get started right away on our platform. If you’re not yet an Enterprise Edition client, please reach out to explore your options for accessing our YARA capabilities.