Introducing YARA Rules: Search and Monitor the Internet’s Infrastructure with YARA

Enabling deeper threat investigations with YARA rule hunting over Validin's host response data

Introduction

What if you could search the most complete and rigorous collection of virtual host responses on the internet using a paradigm you already know intimately: YARA?

At PIVOTcon in May we spoke to leading threat hunters and analysts from around the world who described their challenges around hunting for different kinds of threats. Some of the tools they used had great search capabilities, but didn’t have the data or the scale to feel complete. Others had completeness, but lacked the ability to search in the ways that were most natural to them.

This month, we’ve made a significant improvement to how our users are able to query our data by allowing them to write custom YARA rules to retroactively scan our virtual host responses. This enables our customers to more accurately fingerprint, track, and discover novel threat indicators.

Today, we’re making Validin’s YARA retro hunting capability available to all of our enterprise customers. In this blog, we present a guide on how to compose and run a YARA rule in the Validin enterprise platform and showcase a use case we’ve come across during our testing.

Getting Started

Creating a rule

YARA rules are tied to a specific project so that their definitions and matches can be easily shared and monitored. A summary view of all YARA rules can be found within the “YARA Rules” tab of a project’s page. We display the name of each rule, a summary of its latest run, and the actions you can take on it. If you need help writing your first rule, check out “Writing YARA rules”.

Figure 1. The YARA rule tab within a project

To create a new rule, click the “Add Rule” button to open our YARA rule editor, where you can draft, compile and test your YARA rule.

Figure 2. Validin’s YARA rule editor

Currently, any rule we run must meet the following conditions:

  • Your rule is syntactically correct YARA (i.e. it must compile)
  • Your rule does not contain private or global rules
  • Your rule contains only a single rule definition

Note: We recommend you draft your YARA rules in a dedicated editor before pasting them into Validin.
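
As a minimal sketch of a rule that satisfies the three conditions above, consider the following. The rule name and the page title it searches for are hypothetical placeholders chosen for illustration; they do not come from Validin's documentation or data.

```yara
// Hypothetical example: one ordinary rule definition,
// declared without the "private" or "global" modifiers.
rule Example_Login_Portal_Title
{
    meta:
        description = "Illustrative only: matches a made-up page title"
    strings:
        // Placeholder string; replace with the content you want to hunt for
        $title = "<title>Example Login Portal</title>" nocase
    condition:
        $title
}
```

If you need to hunt for several unrelated patterns, one option under these constraints is to save each as its own rule rather than placing multiple definitions in a single rule body.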

Running a rule

After a rule is saved, you can run it with a variety of configurations. By default, running a rule performs a retro hunt, similar to what is available in VirusTotal: the rule is run over Validin’s historical virtual host responses, which average 4.9 TB per day.

First, press the “Run” button next to the rule you would like to run. You will then be presented with the following configuration options.

Figure 3. The YARA rule configuration step for a run

1. Lookback

The lookback window allows you to configure how far back you would like to scan our data. For example, by selecting the option of “1 day”, we would run your YARA rule over every virtual host response we’ve collected in the past 24 hours.

Note: A 4-hour buffer is automatically applied to every lookback window, so the window ends 4 hours before the current time rather than at the current time. For example, if you select a 1-hour lookback, the effective range will be from (current time – 5h) to (current time – 4h).

2. Source

Currently, we support scanning Validin’s collection of virtual host responses. We plan to aggressively expand the number of sources we allow you to run YARA rules over. Examples of sources we’re considering are favicons, full certificate artifacts, and JavaScript artifacts.

If you have suggestions or requests for additional sources that would be useful for hunting in your workflows, please reach out to support@validin.com or join our Slack. We’d love to hear them!

Viewing your matches

Matches to your YARA rule can be viewed by selecting the “View Run” button.

Figure 4. A YARA rule’s statistics and matches for a single run

From here, you can view summary statistics of your latest run as well as the matches.

If you’d like to view matches of previous runs, use the selector in the top right to switch between all of your rule’s runs.

For each match, you’ll see the body’s SHA1, the hour of data in which it was matched (this roughly equates to when we observed this virtual host response) and an option to view its full HTML.

Use case: Uncovering exposed LLM keys on the internet

In a recent LABScon talk and blog post, the SentinelOne team disclosed how they were able to uncover 6,000 unique OpenAI and Anthropic API keys via a VirusTotal retro hunt over a year of their historical data. They wrote a YARA rule that simply searched samples for the substrings T3BlbkFJ and sk-ant-api03, which can be used to identify OpenAI and Anthropic API keys, respectively. We replicated this retro hunt on Validin with a similar YARA rule.

Rule to hunt for exposed OpenAI API keys:
rule OpenAI_api_keys {
    meta:
      description = "This is a rule to find exposed OpenAI API Keys"
      reference = "https://www.sentinelone.com/labs/prompts-as-code-embedded-keys-the-hunt-for-llm-enabled-malware/"
    strings:
      $a = "T3BlbkFJ"
    condition:
      $a
}
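
Only the OpenAI rule is shown above. A companion rule for the Anthropic marker mentioned earlier (sk-ant-api03) might look like the following sketch. The rule name and description are illustrative and mirror the OpenAI rule; they are not taken from Validin or SentinelOne. It is written as a separate rule because each saved rule must contain a single definition.

```yara
// Sketch of a companion rule; name and description are assumptions
rule Anthropic_api_keys {
    meta:
      description = "Sketch of a rule to find exposed Anthropic API keys"
      reference = "https://www.sentinelone.com/labs/prompts-as-code-embedded-keys-the-hunt-for-llm-enabled-malware/"
    strings:
      // Prefix associated with Anthropic API keys, per the write-up referenced above
      $a = "sk-ant-api03"
    condition:
      $a
}
```

A plain substring match like this is deliberately broad. It will also match pages that merely mention the prefix, so matches are best treated as leads to review rather than confirmed key exposures.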

We discovered over 5,000 matches for this YARA rule in just one week of data.

Looking at the HTML artifacts associated with these matches reveals some DevOps mishaps. Many of the pages have guiding comments left in the source code that look very similar to the type of comments left by AI when generating code snippets.

Figure 5. A screenshot of the exposed OpenAI key embedded within an HTML artifact found through a YARA rule match.

The comment above the embedded API key translates to “Your API key here (make sure to keep it secure and not expose it on the frontend).”

Conclusion

Validin is committed to building the world’s most powerful threat hunting platform. Since YARA rules are a popular way to fingerprint malicious files, we’re excited to bring this capability to Validin’s huge set of virtual host responses. Enterprise Edition clients can get started right away on our platform. If you’re not yet an Enterprise Edition client, please reach out to explore your options for accessing our YARA capabilities.
