The beneficial capabilities of Artificial Intelligence (AI) have never been more obvious. A big part of the reason is OpenAI’s launch of ChatGPT in November 2022. ChatGPT describes itself as follows:
“ChatGPT is a large language model developed by OpenAI based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to generate human-like responses to natural language inputs, making it capable of holding conversations with people on a wide range of topics. ChatGPT has been trained on vast amounts of data from the internet, books, and other sources, allowing it to understand and generate responses in many different languages and styles. It can be used for a variety of applications, including chatbots, language translation, and text completion.”
In short, users can type questions and receive answers based on ChatGPT’s enormous knowledge base. The most common use of this is to simplify a web search. Microsoft’s Bing search engine includes a ChatGPT-like feature as an alternative search option.
But it can also do more. Because its data set includes software development information, ChatGPT can both generate source code from a functional description of a task and debug existing code. While OpenAI has provided guardrails to prevent ChatGPT from creating malicious code, researchers have easily bypassed them to generate data exfiltration malware, phishing emails (including those with malicious payloads), and steganography malware.
Defensive Uses of AI
AI can also help defend networks and applications against adversaries. For example, solutions like ChatGPT provide a simple, natural language user interface to support development teams. Advanced users can extend GPT by training machine learning models with internal policies, proprietary research, cybersecurity standards and best practices, and other data to make it ‘aware’ of new threats or vulnerabilities and ways to mitigate those. This will be a great help to teams that lack the expertise of an experienced Application Security engineer.
AI is already in use in application security. Veracode recently announced they are using AI to generate remediated code for customers. Similarly, AI and machine learning can help accelerate and automate incident response. AI can consume threat intelligence much more efficiently than humans and identify patterns in the data. By training a system with indicators of attack and “run books,” AI systems can explain events in an easily understood manner and suggest actions. By observing selected responses over time, the systems can automate responses.
AI is also useful for discovering anomalies in email messages that could indicate malicious intent. Rather than relying on fixed signatures, AI allows solutions to analyze message content, domains, senders, and attachments to identify phishing emails and malicious content.
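To make the contrast with fixed signatures concrete, here is a minimal Python sketch of the feature side of that idea. It is an illustration only, not how any particular product works: the `Email` class, the term and extension lists, and the hand-picked `WEIGHTS` are all hypothetical. In a real system, signals like these would more likely feed a trained classifier than a fixed weighted sum.

```python
from dataclasses import dataclass, field
from email.utils import parseaddr

# Hypothetical, illustrative lists; a production system would learn these from data.
RISKY_EXTENSIONS = (".exe", ".js", ".scr", ".vbs")
URGENCY_TERMS = ("urgent", "verify your account", "password expires", "wire transfer")

# Hand-picked weights standing in for what a trained model would learn.
WEIGHTS = {
    "untrusted_sender": 1.0,
    "reply_to_mismatch": 2.0,
    "urgency_terms": 0.5,
    "risky_attachment": 3.0,
}


@dataclass
class Email:
    sender: str                      # raw From header, e.g. "IT <it@example.com>"
    reply_to: str                    # raw Reply-To header, may be empty
    subject: str
    body: str
    attachments: list[str] = field(default_factory=list)  # attachment file names


def domain_of(header: str) -> str:
    """Return the lowercased domain of an address header ('' if none)."""
    _, address = parseaddr(header)
    return address.rpartition("@")[2].lower()


def phishing_signals(msg: Email, trusted_domains: set[str]) -> dict[str, float]:
    """Extract content, domain, sender, and attachment signals from a message."""
    text = f"{msg.subject} {msg.body}".lower()
    sender_domain = domain_of(msg.sender)
    reply_domain = domain_of(msg.reply_to) if msg.reply_to else sender_domain
    return {
        "untrusted_sender": float(sender_domain not in trusted_domains),
        "reply_to_mismatch": float(reply_domain != sender_domain),
        "urgency_terms": float(sum(term in text for term in URGENCY_TERMS)),
        "risky_attachment": float(
            any(name.lower().endswith(RISKY_EXTENSIONS) for name in msg.attachments)
        ),
    }


def phishing_score(msg: Email, trusted_domains: set[str]) -> float:
    """Combine the signals into a single score; higher means more suspicious."""
    signals = phishing_signals(msg, trusted_domains)
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```

A message is scored against the set of domains the organization trusts, for example `phishing_score(msg, {"example.com"})`. No single signal decides the outcome, which is the point of moving away from fixed signatures.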
Limitations of AI in (today’s) Cybersecurity
While AI holds much promise, organizations need to address several challenges to ensure its success. These include:
- Training set integrity: The term “garbage in, garbage out” is especially true in AI. Any model is only as good as the data in its training set. AI models rely on accurate and representative training data to learn patterns and make predictions. If the training set is inaccurate, the model may learn incorrect or misleading patterns, leading to reduced accuracy in its predictions.
- Limited domain expertise: AI models benefit from large training sets. However, cybersecurity must constantly adapt to new and evolving threats. This means some threats may not have sufficient data for the AI model to learn from, resulting in reduced accuracy and performance. The model may struggle to generalize to new data or make incorrect predictions, leading to unreliable results. This is particularly relevant for services like ChatGPT, which (as of this writing) was only trained on data available through 2021, meaning it would not be aware of any new threats or vulnerabilities discovered in the past two years.
- Model “hallucinations”: AI models such as ChatGPT and Google Bard are “black box” models with inner workings that are not easily understandable. This can make it difficult to understand how they arrive at their decisions or to detect errors in their logic. Google CEO Sundar Pichai recently referred to this as the “hallucination problem,” where the model provides incorrect information that is not part of its training set. The lack of documentation, citations, or reference material makes these models unsuitable for situations that may be subject to audit or that require an audit trail.
- Models can be vulnerable to manipulation: The models have been designed to not perform some malicious tasks, such as developing ransomware attacks. Researchers have already identified methods to circumvent these controls. We should expect hackers to find methods for exploiting defensive AI as well.
- Bias: AI and ChatGPT can reflect the biases present in the data used to train them, leading to discriminatory or unfair outcomes. This is particularly concerning in cybersecurity, where biases can lead to incorrect identifications of threats or vulnerabilities.
- Privacy and protection of IP: In some open models, any data you provide may become part of the service’s training set and become available to other users. Engineers at Samsung recently exposed proprietary code and meeting notes to the model when they asked ChatGPT to debug source code from a semiconductor database. Similarly, Blockfence reported ChatGPT exposing unreported CVEs (essentially zero-day vulnerabilities) apparently researched by an unknown user.
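The training set integrity point above lends itself to a small illustration. The following Python sketch audits a labeled data set for two common "garbage in" problems: records with missing fields, and identical text carrying conflicting labels. The record layout (dictionaries with `text` and `label` keys) and the function name `audit_training_set` are assumptions made for this example, and real pipelines would check far more than this.

```python
from collections import defaultdict


def audit_training_set(samples: list[dict]) -> dict[str, list]:
    """Flag basic integrity problems in a labeled training set.

    Each sample is assumed (hypothetically) to be a dict with 'text' and
    'label' keys. Returns the indexes of samples with missing fields and
    the normalized texts that appear with more than one label.
    """
    missing_fields = []
    labels_by_text = defaultdict(set)

    for index, sample in enumerate(samples):
        text = (sample.get("text") or "").strip()
        label = sample.get("label")
        if not text or label is None:
            missing_fields.append(index)
            continue
        # Normalize case so trivially different copies are compared together.
        labels_by_text[text.lower()].add(label)

    conflicting_labels = sorted(
        text for text, labels in labels_by_text.items() if len(labels) > 1
    )
    return {
        "missing_fields": missing_fields,
        "conflicting_labels": conflicting_labels,
    }
```

Running a check like this before training is cheap, and any non-empty list in the result is a prompt for a human to review the data rather than a verdict on its own.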
The Future of AI in Cybersecurity
It is important to remember that ChatGPT has only been available publicly for a few months. Its adoption rate – 100 million users after just two months – makes it the “fastest growing consumer application in history.” Its reported value of $29 billion ensures that investments in generative AI will continue, leading to innovative uses of it over the coming years.
Here is what we think:
- Security will improve: There are already example applications available that can help developers and security teams.
- Commoditized AI services: AI offerings will be ubiquitous across cloud service providers like Azure, Amazon, and Google, driving down costs for basic services.
- Data sets will be key: Prepare for this now and remember the “garbage in – garbage out” paradigm. Understand the problems you are trying to solve, and the data required to train models. Organizations can train product support chatbots easily with internal documentation. Threat intelligence will require data sources with low false positives.
- Beware of snake oil: Artificial intelligence, natural language processing, and machine learning are the latest buzzwords. We expect to see everyone claim they are using AI. Make sure you understand “how” AI is used and how it is trained.
- We will still have jobs: You will always need humans in the loop to train and validate your model. Higher-order tasks will continue to require human oversight. The good news is that AI will make us all more efficient, yielding better results from data analysis at a fraction of the effort.
How to Get Started
The first step is to be sure you are clear on what you want to accomplish, then evaluate GPT models to see if they can help. Don’t buy a solution looking for a problem. That approach almost never results in success, and not everyone will have reason to incorporate GPT models in their business.
Remember, the key to success is starting with good data. We see that as a competitive advantage as we leverage AI. SD Elements provides an expansive content library of threats, countermeasures, regulatory requirements, and security and compliance best practices designed specifically to address the needs of developers. Since 2004, we have continuously improved and expanded this knowledge base.
There is more to come – stay tuned!