Big data security analytics is the collection and analysis of security data sets so large and complex that they become difficult to process with traditional database management tools or security data processing applications. It is marked by three basic characteristics:
- Scale: Ability to collect, process, and store terabytes to petabytes of data for security analytics activities.
- Analytical flexibility: Ability to interact with, query, and visualize this voluminous data in an assortment of ways.
- Performance: A compute architecture able to run data analytic algorithms and complex queries and deliver results in an acceptable time frame.
There are essentially two types of solutions:
- Real-time big data security analytics solutions: These essentially evolved from present-day SIEM and log management solutions and are built for modern scale and performance requirements. They are deployed on a distributed architecture, comprising components designed for local stream processing and collective parallel processing, and tend to collect and analyze old standby data such as logs, network flows, and IP packets across the enterprise. Many use proprietary data storage as well. Examples include Click Security, Lancope, and Solera Networks.
- Asymmetric big data security analytics: Designed for non-linear, incremental analysis, pivoting from query to query as individual security events and/or anomalous behavior across systems, networks, user activity, etc. are investigated. These can be built on proprietary data repositories, but all products will likely support big data technologies such as Cassandra, the Hadoop ecosystem (including Pig, Hive, Mahout, and RHadoop), and NoSQL over time. They typically employ machine learning algorithms, cluster analysis, and advanced visualization. Examples include LexisNexis, PacketLoop, and RedLambda.
Big data is still in the hype phase: everyone is talking about it, but few people have figured out how to harness it to really improve IT security. SIEMs are good at producing reports, but they have not helped in dealing with new kinds of threats or in telling us how to use our existing security controls more effectively to protect the business.
The first step is to look for indicators of compromise, the early actions attackers take that later develop into threats, and to feed them automatically into tools that can plow through large amounts of data and pull events from servers, firewalls, IPS, etc. very quickly. The second step is for the analysis to offer specific recommendations on what to do when we are vulnerable to an attack and when an attack has started, e.g. adding a specific rule on the IPS, turning on DoS prevention, closing ports on a firewall, or blocking access to a certain app. The final step is corrective action, using the knowledge gained to ask, “How did we get into this vulnerable state, and what is causing it?” The cause could be a piece of buggy software, or attacks coming from organizations or countries that we never do business with.
We have progressed from IDS (which detects specific defined signatures and suffered from false alarms), to first-generation SIEM (which did not scale well), and are now in second-generation SIEM, which uses big data technologies for scalable security analytics. The incorporation of unstructured data and multiple disparate datasets into a single analysis framework is big data’s most promising feature.
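
The first two steps described above (matching events against indicators of compromise, then attaching a specific recommended response) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `IOC_RESPONSES` feed, the `Event` fields, and the `recommend` function are hypothetical, and the IP addresses come from reserved documentation ranges. A real deployment would pull events from servers, firewalls, and IPS devices at far larger scale.

```python
from dataclasses import dataclass

# Hypothetical indicator-of-compromise feed: indicator -> suggested response.
IOC_RESPONSES = {
    "203.0.113.7": "close the targeted port on the firewall",
    "198.51.100.23": "add a specific rule on the IPS",
}

@dataclass
class Event:
    source: str    # device that reported the event, e.g. "firewall"
    src_ip: str    # address the traffic came from
    dst_port: int  # port the traffic was aimed at

def recommend(events, ioc_responses):
    """Return (event, recommendation) pairs for events matching a known indicator."""
    return [(e, ioc_responses[e.src_ip]) for e in events if e.src_ip in ioc_responses]

# Made-up sample events standing in for data pulled from several devices.
events = [
    Event("firewall", "203.0.113.7", 22),
    Event("ips", "192.0.2.10", 443),
    Event("server", "198.51.100.23", 3389),
]

findings = recommend(events, IOC_RESPONSES)
for event, action in findings:
    print(f"{event.source}: {event.src_ip} -> {action}")
```

The dictionary lookup keeps matching cheap per event, which is the property that matters once the event volume grows. The corrective-action step is deliberately left out, since it depends on human investigation of the findings.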
Big data tools are particularly useful for Advanced Persistent Threat (APT) detection and forensics. APTs operate in a low-and-slow mode (low profile and long-term execution) while the victim remains oblivious to the intrusion. To detect these attacks, we need to collect large quantities of diverse data over a long period and perform long-term historical correlation to incorporate a posteriori information about an attack. Big data security analytics makes this feasible.
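
To make the low-and-slow idea concrete, here is a small Python sketch of one possible long-term correlation: flag sources that show up on many distinct days while never generating much traffic on any single day. This is only an illustrative heuristic, not an established APT detection method. The function name, the thresholds `min_days` and `max_daily_events`, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

def low_and_slow_sources(records, min_days=30, max_daily_events=5):
    """Return sources seen on at least min_days distinct days
    that never exceed max_daily_events on any single day.

    records: iterable of (day, src_ip) tuples.
    """
    per_source = defaultdict(lambda: defaultdict(int))
    for day, src in records:
        per_source[src][day] += 1
    return sorted(
        src
        for src, days in per_source.items()
        if len(days) >= min_days and max(days.values()) <= max_daily_events
    )

# Synthetic data: one quiet source active once a day for 40 days,
# and one noisy source producing 100 events on a single day.
start = date(2024, 1, 1)
records = [(start + timedelta(days=i), "10.0.0.5") for i in range(40)]
records += [(start, "10.0.0.9")] * 100

suspects = low_and_slow_sources(records)
print(suspects)
```

A short-window, volume-based alert would notice only the noisy source. The quiet source becomes visible only because the full 40 days of history are retained and correlated, which is why long-term storage at scale matters here.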