ciioprof0/stixd

STIX Descriptions CNL Repository

For my capstone project in the HLTMS program, I aim to develop a Controlled Natural Language (CNL), along with the necessary tools and databases, for Structured Threat Information eXpression (STIX) descriptions in the Cyber Threat Intelligence (CTI) domain. The CNL, named ‘STIX-D’, will be a subset of Attempto Controlled English (ACE).

Phase 1: Build a STIX-D Corpus

Phase 1 of the project started during the Summer 2024 academic session. It includes three components, each corresponding to a class project:

  • LING 508. An application to extract descriptions from STIX objects and parse the description texts into documents, sentences, and words, forming a corpus.
  • INFO 579. A relational database to hold the STIX-D Corpus.
  • INFO 523. Data mining in STIX-D Project.

LING 508: STIX-D Corpus Builder

Overview

This module is an application that extracts descriptions from STIX objects and parses the description texts into documents, sentences, and words to form a corpus.
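The document, sentence, and word levels of the corpus can be illustrated with a minimal sketch. The function names and the regular-expression rules below are assumptions made for illustration; they are not taken from the STIX-D code, which may use a different segmentation approach.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A word is a run of letters or digits, optionally joined by '-' or "'".
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def split_sentences(text: str) -> list[str]:
    """Split a description into sentences using a simple punctuation rule."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def split_words(sentence: str) -> list[str]:
    """Return the word tokens of one sentence, dropping punctuation."""
    return WORD.findall(sentence)


def build_corpus_entry(doc_id: str, text: str) -> dict:
    """Nest one document as a list of sentences, each with its word list."""
    return {
        "doc_id": doc_id,
        "sentences": [
            {"text": s, "words": split_words(s)} for s in split_sentences(text)
        ],
    }


entry = build_corpus_entry(
    "doc-1", "Adversaries may use spear-phishing. It targets specific users!"
)
```

A punctuation rule like this will mis-split abbreviations such as "e.g."; a production tokenizer would likely need a more robust method.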

Functional Components

Below are the functional components implemented so far for the STIX-D Corpus Builder:

Clex Importer

clex_importer.py handles importing the ACE Common Lexicon (Clex) into the STIX-D Corpus Database, parsing the content, and saving lexical entries into the database.
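As a rough illustration of the parsing step, the sketch below assumes that Clex entries are Prolog facts of the form `noun_sg(apple, apple, neutr).`, with the word form first and the base form second. The function name `parse_clex_line` and the returned tuple layout are hypothetical and do not reflect the actual clex_importer.py interface.

```python
import re

# Matches one Prolog fact such as: noun_sg(apple, apple, neutr).
FACT = re.compile(r"^(?P<functor>[a-z_0-9]+)\((?P<args>.*)\)\.\s*$")


def parse_clex_line(line: str):
    """Return (category, word_form, base_form) for a fact line, else None."""
    line = line.strip()
    if not line or line.startswith("%"):  # skip blanks and Prolog comments
        return None
    match = FACT.match(line)
    if match is None:
        return None
    # Simplification: a plain comma split breaks on quoted atoms that
    # themselves contain commas.
    args = [a.strip().strip("'") for a in match.group("args").split(",")]
    if len(args) < 2:
        return None
    return match.group("functor"), args[0], args[1]


lines = [
    "% ACE Common Lexicon excerpt (illustrative)",
    "noun_sg(apple, apple, neutr).",
    "adv(fast, fast).",
    "tv_finsg(abandons, abandon).",
]
entries = [e for e in map(parse_clex_line, lines) if e is not None]
```

Each resulting tuple could then be saved as one lexical entry in the database.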

Lexicon Manager

lexicon_manager.py manages the processing of lexical entries and creation of lexicon objects in the database.

Sentence Manager

sent_manager.py manages the processing of sentences and creation of sentence objects in the database.

Document Scraper

doc_scrapper.py manages fetching and processing HTML documents, converting them to markdown.
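The HTML-to-markdown step could look roughly like the following sketch, which uses only the standard library and handles headings, paragraphs, and list items. The class and function names are assumptions for illustration; the actual doc_scrapper.py may rely on a dedicated conversion library, and the fetching step is omitted here.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Convert headings, paragraphs, and list items to Markdown lines."""

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Markdown lines
        self._prefix = None  # prefix of the block currently open, if any
        self._buffer = []    # text fragments collected for that block

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIXES:
            self._prefix = self.PREFIXES[tag]
            self._buffer = []

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept; the tags are dropped.
        if self._prefix is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.PREFIXES and self._prefix is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html: str) -> str:
    """Return the supported blocks of an HTML string as Markdown text."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.lines)


markdown = html_to_markdown(
    "<h1>Title</h1><p>Some <b>bold</b> text.</p><ul><li>One</li><li>Two</li></ul>"
)
```

Links, tables, and nested lists are outside the scope of this sketch.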

Document Manager

doc_manager.py manages the processing of documents and creation of document objects in the database.

STIX Importer

stix_importer.py handles importing STIX objects into the STIX-D Corpus Database, parsing the content, and saving descriptions into the database.
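To make the extraction step concrete, here is a minimal sketch that reads a STIX bundle as JSON and collects the `description` property of each object that has one. The function name `extract_descriptions`, the record layout, and the placeholder identifiers are assumptions; the actual stix_importer.py may instead use a STIX library and a different schema.

```python
import json


def extract_descriptions(bundle_json: str) -> list[dict]:
    """Collect the id, type, and description of each object that has one."""
    bundle = json.loads(bundle_json)
    records = []
    for obj in bundle.get("objects", []):
        description = obj.get("description")
        if description:  # many STIX objects carry no description
            records.append(
                {
                    "stix_id": obj["id"],
                    "type": obj["type"],
                    "description": description,
                }
            )
    return records


# Illustrative bundle; the identifiers are placeholders, not real ATT&CK IDs.
sample = json.dumps(
    {
        "type": "bundle",
        "id": "bundle--11111111-1111-4111-8111-111111111111",
        "objects": [
            {
                "type": "attack-pattern",
                "id": "attack-pattern--22222222-2222-4222-8222-222222222222",
                "name": "Phishing",
                "description": "Adversaries may send phishing messages.",
            },
            {
                "type": "relationship",
                "id": "relationship--33333333-3333-4333-8333-333333333333",
            },
        ],
    }
)
records = extract_descriptions(sample)
```

Each record keeps the source object's identifier so that a description in the corpus can be traced back to its STIX object.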

MySQL Repository

mysql_repository.py provides the MySQL database implementation for Create, Read, Update, and Delete (CRUD) operations in the repository.

Repository Abstraction

repository.py defines the abstract repository interface that mysql_repository.py implements, so the other components depend only on the interface rather than on MySQL directly.
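The repository pattern described here can be sketched as an abstract base class with one concrete implementation. The method names and the dictionary-backed `InMemoryRepository` are assumptions used to keep the example self-contained; they stand in for the actual interface and its MySQL implementation.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Repository(ABC):
    """Abstract CRUD interface; the method names here are assumptions."""

    @abstractmethod
    def create(self, key: str, record: dict) -> None: ...

    @abstractmethod
    def read(self, key: str) -> Optional[dict]: ...

    @abstractmethod
    def update(self, key: str, changes: dict) -> bool: ...

    @abstractmethod
    def delete(self, key: str) -> bool: ...


class InMemoryRepository(Repository):
    """Dictionary-backed stand-in for a database-backed implementation."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def create(self, key, record):
        if key in self._rows:
            raise KeyError(f"duplicate key: {key}")
        self._rows[key] = dict(record)

    def read(self, key):
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def update(self, key, changes):
        if key not in self._rows:
            return False
        self._rows[key].update(changes)
        return True

    def delete(self, key):
        return self._rows.pop(key, None) is not None


def count_words(repo: Repository, key: str) -> int:
    """Example caller that depends only on the abstract interface."""
    row = repo.read(key)
    return len(row["text"].split()) if row else 0
```

Because callers such as `count_words` accept any `Repository`, tests can run against the in-memory version without a database.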

UUID Generation

gen_uuid.py and gen_clex_uuid.py handle generating UUIDs for STIX objects and Clex entries, respectively.
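A minimal sketch of both kinds of identifier follows. STIX identifiers take the form `type--UUID`, commonly with a random UUIDv4. For Clex entries, the sketch assumes a deterministic UUIDv5 so that re-importing the same entry yields the same identifier. The namespace URL and function names are hypothetical; the actual scripts may generate UUIDs differently.

```python
import uuid

# Hypothetical namespace for this sketch, derived from a placeholder URL.
CLEX_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/stixd/clex")


def new_stix_id(object_type: str) -> str:
    """Build a STIX-style identifier: '<type>--<random UUIDv4>'."""
    return f"{object_type}--{uuid.uuid4()}"


def clex_entry_uuid(category: str, word_form: str) -> uuid.UUID:
    """Derive a repeatable UUIDv5 so the same entry always gets the same id."""
    return uuid.uuid5(CLEX_NAMESPACE, f"{category}:{word_form}")


stix_id = new_stix_id("attack-pattern")
apple_id = clex_entry_uuid("noun_sg", "apple")
```

The random variant suits newly created objects, while the name-based variant makes repeated imports idempotent.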

Testing

Unit tests are located in the tests directory and use pytest to verify the functionality of the components.
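For readers unfamiliar with pytest, a test module in this style might look like the sketch below. The helper `normalize_whitespace` is a hypothetical function defined here only so the example is self-contained; it is not one of the project's components.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return " ".join(text.split())


# pytest collects functions whose names start with 'test_' and treats a
# failing bare assert as a test failure.
def test_collapses_internal_whitespace():
    assert normalize_whitespace("Adversaries   may\n phish") == "Adversaries may phish"


def test_empty_input_stays_empty():
    assert normalize_whitespace("   ") == ""
```

Saved as a file whose name starts with `test_` inside the tests directory, a module like this would be picked up when `pytest` is run from the repository root.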

Flask API

app.py contains a simple Flask API for interacting with the STIX-D Corpus Database.

Table 1. List of Related Third-Party Repositories

| No. | Repository | Description |
|-----|------------|-------------|
| 1 | APE | ACE Parser Engine (APE) |
| 2 | attack-stix-data | MITRE ATT&CK dataset represented in STIX 2.1 JSON collections. |
| 3 | Clex | ACE Common Lexicon |
| 4 | cti-pattern-validator | A software tool for checking the syntax of the Cyber Threat Intelligence (CTI) STIX Pattern expressions. |
| 5 | cti-python-stix2 | Python APIs for serializing and de-serializing STIX2 JSON content, along with higher-level APIs for common tasks, including data markings, versioning, and for resolving STIX IDs across multiple data sources. |
| 6 | cti-stix2-json-schemas | JSON schemas and examples for STIX 2 |
| 7 | cti-stix-validator | The STIX Validator checks that STIX JSON content conforms to the requirements specified in the STIX 2.1 specification. |