Democratizing Security Detection

Photo by Pawel Czerwinski / Unsplash

Introduction

Security detection programs face significant scaling challenges. As the size and scope of an organization grow, so too do the volume and diversity of security alerts; this influx can harm an organization’s security posture and may cause burnout among the security staff responsible for triaging alerts. Teams often try to address their alerting volume by adding more personnel, but simply increasing headcount is an unsustainable long-term solution.

At Palantir, we have over 3,000 employees and contractors working around the globe. We run a diverse enterprise ecosystem with hundreds of applications across tens of thousands of hosts. We also provide Palantir software-as-a-service across hyperscale cloud providers including Amazon Web Services, Microsoft Azure, and Google Compute.

Despite Palantir’s relatively complex technical infrastructure, all information security monitoring, alert triage, and incident response across this infrastructure is centralized and managed by a team of fewer than 10 people. The goal of this blog post is twofold:

  • First, to share our learnings on scaling our detection program through democratization of security alerts.
  • Second, to provide actionable detection strategies that should be considered in most environments.

Detection Architecture

Before we dive into democratizing security detection, let’s discuss our detection architecture. The details will vary considerably between organizations, but the base elements are similar.

The Palantir Computer Incident Response Team (CIRT) is responsible for all facets of security detection and response across the enterprise. This includes detection on hosts, networks, applications, and cloud infrastructure.

On the log source side, the CIRT defines all telemetry requirements and sources. The team has the final say in validating and approving telemetry sources, based upon the utility of their security-relevant logs and data.

Security-relevant logs and data are plumbed via hundreds of log pipelines into a centralized security information and event management (SIEM) platform. The CIRT then develops alerting and detection strategies (ADS) to surface unusual or malicious activity across these data sources.

Generated alerts then flow into our security orchestration, automation, and response (SOAR) platform. Depending on the alert, we apply code playbooks to perform specific actions based upon the source, context, and payload of the alert. One subset of these code playbooks is used for identifying and responding to security events through user security awareness and triage.

InfoSecBot

InfoSecBot is an application integration between our SOAR platform and our Slack tenant, and is our most successful implementation of surfacing unusual activity directly to users. We have been running InfoSecBot in production across our company since 2016 and the capabilities, features, and scope have only increased over time.

There are several key operations handled by InfoSecBot. These are implemented as modules in our SOAR platform, and the ADS author can select one or more modules as they are developing and implementing an alert. These modules include:

  • Individual Notify: Notifying a user via direct message (DM) with an arbitrary alert payload.
  • Individual Challenge: Challenging a user via direct message (DM) with an arbitrary alert payload. A challenge consists of a message and a confirm/deny button to collect their response.
  • Group Notify: Notifying a Slack channel via a message with an arbitrary alert payload.
  • Group Challenge: Challenging a Slack channel via a message with an arbitrary alert payload. A challenge consists of a message and a confirm/deny button to collect their response.
  • MFA Challenge: Issuing a Duo multi-factor authentication prompt to collect a response from a user.

These modules can be chained together to construct complex detection workflows. For example, an alert is presented to an individual user via direct message. They are given a challenge and have 15 minutes to confirm or deny via a Slack button. The user clicks confirm, but, due to the severity of the alert, a Duo multi-factor authentication prompt is issued to confirm their identity. Once the Duo prompt has been approved, the alert may be closed, de-escalated in priority, or surfaced to the CIRT for manual validation.

Had the user clicked the deny button, the timeout expired, or the Duo prompt failed, the alert would have immediately escalated and other automation would have kicked in.
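The chaining behavior described above can be sketched roughly as follows. This is a minimal illustration, not InfoSecBot's actual implementation: the `Module` signature, the `Outcome` names, and `run_chain` are all hypothetical, and each module is assumed to return `True` on confirmation, `False` on denial, and `None` on timeout.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    RESOLVE = "resolve"    # close, de-escalate, or hand to manual validation
    ESCALATE = "escalate"  # escalate immediately and trigger other automation


# Hypothetical module signature: takes the user and the alert payload and
# returns True (confirmed), False (denied), or None (timed out).
Module = Callable[[str, dict], Optional[bool]]


def run_chain(user: str, alert: dict, modules: list[Module]) -> Outcome:
    """Run modules in order; anything other than an explicit confirm escalates."""
    for module in modules:
        if module(user, alert) is not True:
            return Outcome.ESCALATE
    return Outcome.RESOLVE
```

In this sketch, a Slack challenge followed by a Duo prompt would simply be a two-element list of modules; a denial, a timeout, or a failed prompt at any step short-circuits to escalation.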

Detection Opportunities and Strategies

While there is a real cost of time and attention when surfacing these alerts to users, it provides some of the strongest signal possible, and dramatically reduces time-to-response for unusual or unauthorized activity. Additionally, depending on the efficacy and severity of an alert, additional automation can be incorporated in response to user signal, including password resets, account lockouts, token revocation, or device quarantining and isolation.

The following sections of this publication include a sampling of the alerting & detection strategies that shoulder a significant portion of our overall alert volume. By deconflicting high-volume alerts directly with the operating party, these strategies have allowed the CIRT to expand its attack surface coverage without proportionally adding to our headcount.

We encourage other organizations to develop their own unique ADS and find a mechanism to share them with the broader InfoSec community. While there are operational security considerations around publicly acknowledging and documenting internal alerts, we hope these examples spur greater sharing and collaboration, inspire detection enhancements for other defenders, and ultimately increase the operational cost for attackers.

Detecting Account Takeover

Account takeover can be quickly detected in response to unusual activity impacting user account primitives. Users have direct knowledge of major actions they’ve performed with their account, especially if you can reach them within a short window of time after the behavior occurs. The following detection strategies are designed to help surface attacks using account takeover techniques.

Detecting Account Takeover: Failed MFA — A User Rejects or Disavows an MFA Prompt

Attack Summary: An attacker ascertains the password of a user, and attempts to authenticate as that user. However, due to MFA, the attacker is unable to authenticate without the user’s second factor of authentication.

Detection Strategy: MFA is often touted as a control to harden one’s posture against account takeover. However, in addition to being a hardening control, we believe that MFA should also serve as a basis for alerting & detection. Any MFA prompt denied by a user ought to be treated as a case of potential account takeover, and should generate a corresponding alert.

Response: When a user (or an actor attempting to authenticate as a user) attempts to log in, the user receives an MFA notification, which they can either accept or reject. If a user denies an MFA prompt, an alert will propagate to the CIRT’s alerting & detection pipeline, and an analyst will review the activity in question.
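As a rough sketch of this strategy, the function below turns denied MFA events into alerts. The event field names (`result`, `user`, `ip`, `timestamp`) are assumptions for illustration; real MFA provider logs use their own schemas and would need to be mapped accordingly.

```python
from typing import Iterable, Iterator


def denied_mfa_alerts(events: Iterable[dict]) -> Iterator[dict]:
    """Yield one alert for every MFA prompt that the user rejected."""
    for event in events:
        # "result" and its "denied" value are hypothetical field names.
        if event.get("result") == "denied":
            yield {
                "title": "MFA prompt denied",
                "user": event.get("user"),
                "source_ip": event.get("ip"),
                "timestamp": event.get("timestamp"),
            }
```

Treating every denial as an alert, rather than applying a threshold, matches the position above that any denied prompt is a potential account takeover.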

Detecting Account Takeover: User Authenticates to a Sensitive Resource

Attack Summary: An attacker takes over a victim account for the purposes of using the victim’s access to a sensitive target resource.

Detection Strategy: Monitor security / audit logs for discrete authentication events against sensitive platforms, hosts, or resources.

Response: An InfoSecBot direct message notification is issued to the user whenever they authenticate to an in-scope resource; the message includes:

  • A timestamp of the activity.
  • The resource to which their account authenticated.
  • The IP address from which the authentication attempt was initiated.

The user in question must affirmatively respond to the InfoSecBot message prompt, and must additionally confirm an MFA prompt from InfoSecBot. If the user fails to complete the prompt / MFA, the alert will be escalated to the CIRT.

Detecting Credential Resets

Passwords are horrible things; they’re often weak or re-used, and it’s all too easy to hand a password over to an attacker. However, until passwordless authentication is ubiquitous, passwords are a necessary evil. The following detection strategies are designed to help surface attacks using credential reset techniques.

Detecting Credential Resets: User changes their password

Attack Summary: A compromised user may attempt to rotate their credentials to gain exclusive access to the account.

Detection Strategy: Monitor security logs (e.g. AzureAD, Okta, Windows Event Logs) for discrete user-initiated password change workflows.

Response: An InfoSecBot direct message notification is issued to the user whenever they perform a password change on their enterprise accounts. The alert payload presented includes:

  • A timestamp of the activity.
  • The origin of their password rotation, including device hostname and IP address.

In the event the activity is unexpected or unusual, the user is given instructions on how to quickly surface it to the CIRT.

Detecting Credential Resets: User has their password changed by an administrator

Attack Summary: A compromised user with elevated permissions may attempt to perform a credential reset on another account to move laterally or to escalate privileges.

Detection Strategy: Monitor security logs (e.g. AzureAD, Okta, Windows Event Logs) for discrete administrator-initiated password change workflows.

Response: An InfoSecBot direct message notification is issued to the user whenever a password is forcibly changed by an administrator for their enterprise accounts. The alert payload presented includes:

  • A timestamp of the activity.
  • The origin of their password rotation, including device hostname and IP address.
  • The username of the administrator.

In the event the activity is unexpected or unusual, the user is given instructions on how to quickly surface it to the CIRT.

The response strategy may be extended to include notifying the administrator that they performed this action. This provides additional detection properties, and acts as an oversight mechanism for users with privileged access.
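One way to picture the notification fan-out for an administrator-initiated reset is sketched below. The event keys and the `reset_notifications` helper are hypothetical; the payload fields mirror the list above.

```python
def reset_notifications(event: dict, notify_admin: bool = True) -> list[dict]:
    """Build one notification per recipient for an admin-initiated reset."""
    payload = {
        "timestamp": event["timestamp"],
        "origin_host": event["origin_host"],
        "origin_ip": event["origin_ip"],
        "administrator": event["administrator"],
    }
    # The affected user is always notified; the administrator is optional.
    recipients = [event["target_user"]]
    if notify_admin:
        recipients.append(event["administrator"])
    return [{"recipient": recipient, **payload} for recipient in recipients]
```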

Detecting Credential Resets: User has a passwordless bypass code created for them

Passwordless authentication ideally leverages hardware security tokens. However, users may lose or forget their security tokens, preventing access to information systems. Additionally, users need to initially enroll their security tokens before they can start using them. To enable these workflows, bypass codes or temporary access passes may need to be handed to users. These effectively function as both the first and second factor of authentication for users, and are typically time-bound or limited in usage.

Attack Summary: A compromised user with elevated permissions may attempt to create a bypass code or temporary access pass for a user.

Detection Strategy: Monitor security logs (e.g. AzureAD, Okta, Windows Event Logs) for discrete administrator-initiated bypass code creation or association workflows.

Response: An InfoSecBot direct message notification is issued to the user whenever an administrator creates a bypass code or temporary access pass for their enterprise accounts. The alert payload presented includes:

  • A timestamp of the activity.
  • Details regarding the bypass code or temporary access pass.
  • The username of the administrator.

In the event the activity is unexpected or unusual, the user is given instructions on how to quickly surface it to the CIRT.

The response strategy may be extended to include notifying the administrator that they performed this action. This provides additional detection properties, and acts as an oversight mechanism for users with privileged access.

Detecting Credential Resets: User attempts password recovery

For security reasons, we do not allow self-service password recovery. However, where it is enabled, self-service recovery is a very common technique abused by attackers to gain initial access to a legitimate account.

Attack Summary: An attacker abuses a self-service password recovery portal to gain initial access to an account.

Detection Strategy: Monitor security logs (e.g. AzureAD, Okta, Windows Event Logs) for discrete user-initiated password recovery attempts.

Response: An InfoSecBot direct message notification is issued to the user whenever a password reset request occurs. The alert payload presented includes:

  • A timestamp of the activity.
  • The origin of the password recovery process, including device hostname and IP address.
  • Whether the password recovery process was successful or not.

In the event the activity is unexpected or unusual, the user is given instructions on how to quickly surface it to the CIRT.

If self-service password recovery is enabled, multiple failed attempts can be a high-signal detection. However, an attacker may get lucky with their first attempt, so mandatory notification to impacted users is appropriate.
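The two behaviors described above (notify on every attempt, escalate on repeated failures) could be combined along these lines. This is a simplified sketch: the attempt fields and the threshold of three failures are assumptions, and a production version would also bound the count to a time window.

```python
from collections import Counter


def triage_recovery_attempts(attempts: list[dict], failure_threshold: int = 3):
    """Return (users to notify, users to escalate) for recovery attempts."""
    # Every impacted user is notified, even after a single attempt,
    # because an attacker may succeed on the first try.
    notify = sorted({a["user"] for a in attempts})
    failures = Counter(a["user"] for a in attempts if not a["success"])
    # Repeated failures are treated as a higher-signal detection.
    escalate = sorted(u for u, n in failures.items() if n >= failure_threshold)
    return notify, escalate
```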

Detecting Credential Phishing

In recognition of passwords (and the human behavior derivative of passwords) being arguably the weakest link in one’s identity security posture, Palantir’s InfoSec staff sought to build a dedicated tool to augment our approach to combatting credential phishing & re-use. The result was our browser extension “PhishCatch”, which has since been open sourced. Read more about PhishCatch’s functionality and its role at Palantir on GitHub or on our blog. Palantir’s PhishCatch alerting & detection works as follows:

Detecting Credential Phishing: PhishCatch Alerting

If / when a user inputs their password on a page other than Palantir’s dedicated IDP, PhishCatch will create an alert. The PhishCatch alert will then propagate to the CIRT’s SOAR, at which point a series of automations will run against the alert:

  • InfoSecBot will Slack the user with a request to change their password; this message will repeat every hour for 72 hours.
  • The same hourly message will also be sent to the user’s email for redundancy.
  • The user’s lead / supervisor will also receive the same hourly message.
  • If / when a user changes their password, they’ll stop receiving notifications.
  • If the user fails to change their password within 72 hours of the alert, their account will be locked.

Following the full 72-hour lifecycle of a PhishCatch alert, a given user will need to have their account manually re-activated by Palantir’s support staff.
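The reminder lifecycle above can be modeled as a small decision function that a scheduler calls once per hour. This is only a sketch of the described behavior; `next_action` and its return values are hypothetical names, not part of PhishCatch or InfoSecBot.

```python
from datetime import datetime, timedelta
from typing import Optional

REMINDER_INTERVAL = timedelta(hours=1)  # how often the scheduler should call next_action
LOCK_AFTER = timedelta(hours=72)        # deadline for changing the password


def next_action(alert_time: datetime, now: datetime,
                password_changed_at: Optional[datetime] = None) -> str:
    """Decide what to do for an open PhishCatch alert at time `now`."""
    # A password change made after the alert ends the notifications.
    if password_changed_at is not None and password_changed_at >= alert_time:
        return "stop"
    # No change within 72 hours: lock the account.
    if now - alert_time >= LOCK_AFTER:
        return "lock_account"
    # Otherwise keep reminding the user, their email, and their lead.
    return "remind"
```

A password change that predates the alert does not count here, since the credential was entered on the non-IDP page after that change.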

Detecting Unusual Behavior

Even if an alerting primitive doesn’t fall cleanly under a certain category of attack methods, we continue to author ad hoc alerting & detection strategies for any activity that could present risk to Palantir if performed by a party with malicious intent. To address such activity, the CIRT authors alerting & detection as follows:

Attack Summary: The CIRT’s alerting & detection strategies have identified an ad hoc action that isn’t necessarily categorized as an explicit attack attempt, but is a suspicious or sensitive enough action that the user should be consulted regarding the activity.

Detection Strategy: While the scope of the CIRT’s alerting & detection coverage is ever-growing, the following list represents a general picture of activity that we’d consider suspicious or sensitive enough to raise with a user:

  • User enrolls a new device via mobile device management.
  • User enrolls a new multi-factor authentication device.
  • User logs in to high-risk infrastructure (e.g. VDI).
  • User has account locked or marked as high-risk.

Response: An InfoSecBot direct message notification is issued to the user whenever in-scope activity is detected. The alert payload presented includes:

  • A summary of the activity itself.
  • A timestamp of the activity.
  • A brief description of why the activity could be concerning.

In the event that the activity is unexpected or unusual, the user is given instructions on how to quickly surface it to the CIRT.

Detecting Post-Exploitation

While InfoSec teams should endeavor to never have an attacker reach the stage of post-exploitation, it’d be negligent to discount it as a possibility. A healthy alerting & detection posture should cover the entirety of the attack lifecycle, and consulting users regarding on-system activity can be a valuable tool in reconciling late-stage lifecycle alerting with its respective root cause. The following alerting & detection strategies represent a sampling of Palantir’s user-centric alerting around post-exploitation activity.

Detecting Post-Exploitation: Data Movement or Staging To/From a Sensitive Destination

Attack Summary: An attacker, who is already operating within the environment using a compromised account, uses the account to stage (i.e., archive, compress, or encrypt) or move/transfer data.

Detection Strategy: In the event that an anomalous data movement, archiving, or encrypting operation occurs via a user account, endpoint logging telemetry will record the activity in question. The CIRT’s alerting logic will query the aforementioned logging telemetry to surface the activity in question.

Response: When a user account is found to be interacting with data in a manner in-scope for the alert, the user will receive an InfoSecBot prompt containing the following:

  • A timestamp of the activity.
  • The system(s) on which the activity took place.
  • A summary of the data interaction in question.

The user in question must affirmatively respond to the InfoSecBot message prompt, and must additionally confirm an MFA prompt from InfoSecBot. If the user fails to complete the prompt / MFA, the alert will be escalated to the CIRT.

Detecting Post-Exploitation: Host-Based Reconnaissance or Scanning

Attack Summary: An attacker operating on a given user’s endpoint, or operating on any endpoint using a given user’s account, may attempt to run discovery/reconnaissance commands in an effort to ascertain further details about our environment’s attack surface.

Detection Strategy: The CIRT queries EDR logging telemetry for activity or commands that could be construed as reconnaissance, or for any commands consistent with discovery or port scanning (e.g., nmap).

Response: When such activity is detected from a user’s account, or from a user’s workstation, that user will receive an InfoSecBot prompt detailing:

  • The specific commands run from their account or workstation.
  • A timestamp of the activity.
  • The system(s) on which the activity took place.

The user in question must affirmatively respond to the InfoSecBot message prompt, and must additionally confirm an MFA prompt from InfoSecBot. If the user fails to complete the prompt / MFA, the alert will be escalated to the CIRT.
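A very simple form of the command matching described in the detection strategy might look like the sketch below. The binary list is illustrative only (the source names just nmap), the function handles POSIX-style command lines, and real EDR queries would typically be written in the EDR or SIEM query language rather than Python.

```python
import shlex

# Illustrative list; only nmap is named in the text above.
RECON_BINARIES = {"nmap", "masscan", "arp-scan"}


def is_recon(command_line: str) -> bool:
    """Return True if the command's executable is a known scanning tool."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes: fall back to naive whitespace splitting.
        tokens = command_line.split()
    if not tokens:
        return False
    # Compare on the executable name, ignoring any leading directory path.
    binary = tokens[0].rsplit("/", 1)[-1].lower()
    return binary in RECON_BINARIES
```

Matching on binary names alone is easy to evade by renaming the tool, which is one reason the user confirmation and MFA steps remain useful as a second signal.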

Crowdsourcing Appropriate Usage

While the majority of InfoSecBot prompts are sent to a specific individual, there are some cases where it makes more sense to message a broader group. Consider the following factors:

  • If an attacker has taken over an account, they could possibly take action to mute, delete, or otherwise conceal Slack messages sent to the taken-over account, which could obstruct the legitimate account owner from seeing the message.
  • In an insider threat scenario, a given user would be well aware of their own actions; Slack messages to the user would provide no security benefit.
  • Teams of coworkers, including the leader(s) of said teams, often have a general awareness of actions taken by team members.

In recognition of the factors above, Palantir’s CIRT has found it prudent to target some InfoSecBot prompts towards Slack rooms, rather than individuals. Generally speaking, the target room is the dedicated / centralized channel for communications with a given team; consequently, all members of the team (including the leader(s)) are present in the channel. Additionally, channel membership is backed by group membership in Palantir’s IDP, and members will be automatically re-added to the channel in the event that they’re removed for any reason.

For a crowdsourced alert, Palantir’s approach generally observes the following practices:

  • The alert will generally reflect privileged activities against sensitive, collective resources (e.g., SSHing to a backend server using breakglass credentials).
  • The InfoSecBot notification will generally contain the full context of a user’s activity (e.g., who took the action, what action they took, and the timestamp).
  • The alert will message the channel containing the entirety of the actor’s team.
  • In the event that the activity is inconsistent with the actor’s work expectations, the actor’s team has instructions to escalate the matter to the CIRT.
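The practices above could be expressed as a small routing helper like the one below. The event fields, the `team_channels` mapping, and the fallback channel name are all hypothetical, and the fallback to a CIRT channel when no team channel is known is an assumption rather than something the text states.

```python
def build_group_alert(event: dict, team_channels: dict[str, str],
                      fallback_channel: str = "#cirt-escalations") -> dict:
    """Route a privileged-activity alert to the actor's team channel."""
    # Assumption: if the team has no known channel, send the alert to the CIRT.
    channel = team_channels.get(event["team"], fallback_channel)
    # Include the full context: who acted, what they did, where, and when.
    text = (
        f"{event['actor']} performed '{event['action']}' "
        f"on {event['resource']} at {event['timestamp']}. "
        "If this is inconsistent with their work, please escalate to the CIRT."
    )
    return {"channel": channel, "text": text}
```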

Other Miscellaneous Security Use Cases

While InfoSecBot provides immense value in the context of security alerting & detection, there are other ad hoc cases in which InfoSecBot has been a valuable communication tool. For example:

Software Compliance

Palantir maintains a stringent list of approved & unapproved software for endpoint workstations. In cases where a user installs a product inconsistent with our security standards, InfoSecBot will notify them of the infraction, and they’ll be allowed a brief grace period to remove the offending content before a member of the InfoSec team intervenes.

Palantir also maintains an inventory of software versions on user workstations. In the event that a Palantir user’s workstation contains a software version with a known vulnerability, InfoSecBot will prompt that user to update their software. The user will have a brief grace period to update their software before the matter is escalated to Palantir’s InfoSec team.
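As an illustration of the version check, the sketch below compares an installed-software inventory against minimum safe versions. It assumes purely numeric dotted versions (such as "3.0.8") and a hypothetical `minimum_safe` mapping; real inventories often need a proper version parser and vulnerability feed.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a numeric dotted version such as '3.0.8' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def outdated_software(installed: dict[str, str],
                      minimum_safe: dict[str, str]) -> list[str]:
    """Return names of installed packages older than their minimum safe version."""
    return sorted(
        name
        for name, version in installed.items()
        if name in minimum_safe
        and parse_version(version) < parse_version(minimum_safe[name])
    )
```

Comparing tuples of integers avoids the string-comparison pitfall where "3.10.0" would sort before "3.9.0".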

Emergency Patching

In cases of high or critical severity CVEs which necessitate an immediate software patch or operating system update, Palantir’s security engineers have utilized InfoSecBot to notify users that their system may experience imminent downtime for updates. Users receive notifications through other means as well, but InfoSecBot provides an additional layer of communication redundancy to ensure that users aren’t caught by surprise for emergency patching.

Further Reading and Acknowledgements