Home
This project was undertaken for the 2023 LightSys Code-a-thon by Kobe Couvion, Kai Delsing, Stephen Venable, and Michael White with the help of Alan Bunning at the CNTR.
Given a church father's writings, automate the identification of locations of possible New Testament scripture citations, allusions, or paraphrases.
Many writings of the early church fathers survive. Currently, however, the only way to find scriptural citations in these documents is for a Greek¹ expert to read through each document line by line. This is extremely time-consuming, which creates demand for an automated tool to perform the task.
In theory, a brute-force tool could search a given document for every verse in the New Testament and locate every exact quotation. However, because there were no standardized translations of the Bible at the time of the writings, and citations were often made from memory, many deviations were introduced. Examples include spelling errors, changes in word order, citations of verse fragments, allusions, and combinations thereof. An exact-match search misses all of these, so a more tolerant approach is needed.
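To make the limitation concrete, here is a minimal sketch in Python. It compares an exact substring search with a simple word-overlap score. The function names and the overlap measure are illustrative assumptions, not the project's actual method, and the Greek text is lowercased and unaccented for simplicity.

```python
def exact_match(verse: str, passage: str) -> bool:
    """Brute-force check: the verse must appear verbatim in the passage."""
    return verse in passage


def word_overlap(verse: str, passage: str) -> float:
    """Fraction of the verse's distinct words that also occur in the passage.

    Hypothetical scoring for illustration only; it ignores word order
    and does not handle spelling variants.
    """
    verse_words = set(verse.split())
    if not verse_words:
        return 0.0
    return len(verse_words & set(passage.split())) / len(verse_words)


verse = "εν αρχη ην ο λογος"       # opening words of John 1:1
reordered = "ο λογος ην εν αρχη"   # same words, quoted in a different order
fragment = "εν αρχη ην"            # only part of the verse

print(exact_match(verse, reordered))   # False: the word order differs
print(word_overlap(verse, reordered))  # 1.0: every word is present
print(word_overlap(verse, fragment))   # 0.6: three of five words are present
```

Even this crude score recovers the reordered quotation that the exact search misses, which is the motivation for a tolerance-based approach.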
The most obvious solution to this problem is machine learning, such as a neural network that detects matches based on both syntax and context. However, given the timeframe of the project (four work days) and the lack of easily accessible training data, a probabilistic analysis approach was taken instead, at Mr. Bunning's suggestion.
Component | Description |
---|---|
Preprocessor | Iterate through the Bible, and create gword objects for use in Probabilistic Data Synthesis |
Source Text Parser | Iterate through the source text, and create gclause objects for use in Probabilistic Data Synthesis |
Probabilistic Data Synthesis | Synthesize the data created in the Preprocessor and Source Text Parser, creating a data collection used as the input for Probabilistic Data Analysis |
Probabilistic Data Analysis | Use the synthesized data created by the rest of the program to analyze trends |
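
The four components above can be pictured as a small pipeline. The following Python sketch is a rough illustration only: the `GWord` and `GClause` fields, the fixed-size clause splitting, the verse-count scoring, and the threshold are all assumptions made for the example, and the project's real gword and gclause objects may be structured quite differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GWord:
    """Hypothetical gword: a Bible word plus the verses it occurs in."""
    text: str
    verses: set = field(default_factory=set)


@dataclass
class GClause:
    """Hypothetical gclause: a run of source-text words and its offset."""
    words: list
    position: int


def preprocess(bible):
    """Preprocessor: index every word of every verse as a GWord."""
    index = {}
    for ref, verse in bible.items():
        for word in verse.split():
            index.setdefault(word, GWord(word)).verses.add(ref)
    return index


def parse_source(text, size=4):
    """Source Text Parser: split the text into fixed-size clauses (assumed)."""
    words = text.split()
    return [GClause(words[i:i + size], i) for i in range(0, len(words), size)]


def synthesize(index, clauses):
    """Synthesis: count, per clause, how many of its words hit each verse."""
    rows = []
    for clause in clauses:
        counts = defaultdict(int)
        for word in clause.words:
            if word in index:
                for ref in index[word].verses:
                    counts[ref] += 1
        rows.append((clause, dict(counts)))
    return rows


def analyze(rows, threshold=0.6):
    """Analysis: keep (position, verse, score) where score meets the threshold."""
    hits = []
    for clause, counts in rows:
        for ref, count in counts.items():
            score = count / len(clause.words)
            if score >= threshold:
                hits.append((clause.position, ref, score))
    return hits


# Toy data (lowercased, unaccented) purely for demonstration.
bible = {"John 1:1": "εν αρχη ην ο λογος", "John 1:4": "εν αυτω ζωη ην"}
source = "καθως γεγραπται ο λογος ην εν αρχη και"

rows = synthesize(preprocess(bible), parse_source(source))
print(analyze(rows))  # [(4, 'John 1:1', 0.75)]
```

Building the word index once keeps the per-document work proportional to the length of the source text rather than to the size of the whole New Testament.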
1: This project was originally designed for Greek documents, but it should be language-agnostic for any Unicode documents if the preprocessor is run with an equivalent New Testament CSV in the correct format.