Kai edited this page Mar 10, 2023 · 20 revisions

Scripture-Quotation-Identification Wiki!

This project was undertaken for the 2023 LightSys Code-a-thon by Kobe Couvion, Kai Delsing, Stephen Venable, and Michael White with the help of Alan Bunning at the CNTR.


Project goal:

Given a church father's writings, automatically identify the locations of possible New Testament citations, allusions, or paraphrases.

Background:

Many writings of the early church fathers survive. Currently, however, the only way to find scriptural citations in these documents is for a Greek[1] expert to read through each document line by line. This is extremely time-consuming, which creates demand for an automated tool to perform the task.

In theory, a brute-force tool could search a given document for every verse in the New Testament and locate every exact quotation. However, because there were no standardized translations of the Bible at the time of the writings, and because citations were often made from memory, the quotations contain many deviations. Examples include spelling errors, changes in word order, citations of fragments of verses, allusions, and combinations thereof. Therefore, a more granular approach that tolerates such deviations must be taken.
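To make the limitation concrete, here is a minimal sketch (not code from this project) comparing an exact substring search with a simple order-insensitive word overlap. The function names `exact_match` and `word_overlap` are hypothetical, and the Greek strings are unaccented, lowercased fragments chosen only for illustration.

```python
def exact_match(verse: str, passage: str) -> bool:
    """Brute-force check: the verse must appear verbatim in the passage."""
    return verse in passage


def word_overlap(verse: str, passage: str) -> float:
    """Fraction of the verse's distinct words that also occur in the passage.

    Word order is ignored, so a reordered quotation still scores highly.
    """
    verse_words = set(verse.split())
    if not verse_words:
        return 0.0
    return len(verse_words & set(passage.split())) / len(verse_words)


verse = "εν αρχη ην ο λογος"       # reference wording
reordered = "ο λογος ην εν αρχη"   # same words, different order
misspelled = "εν αρχη ην ο λογως"  # one word spelled differently

found_reordered = exact_match(verse, reordered)
overlap_reordered = word_overlap(verse, reordered)
overlap_misspelled = word_overlap(verse, misspelled)
```

The exact search misses the reordered quotation entirely, while the overlap measure still treats it as a full match and the misspelled version as a partial one. A real tool would also need to handle fragments and allusions, which this sketch does not attempt.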

Angle of Attack:

The most obvious solution to this problem is some form of machine learning, such as a neural network, that detects matches both syntactically and from context. However, given the timeframe of the project (four working days) and the lack of easily accessible training data, a probabilistic analysis approach was taken instead, at Mr. Bunning's suggestion.
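This page does not describe the exact statistics the project uses, so the following is only an illustrative sketch of what a probabilistic approach could look like: words that are rare across the New Testament count for more than common ones when a passage is compared with a verse. The weighting formula and the names `rarity_weights` and `score` are assumptions, not the project's actual method.

```python
import math
from collections import Counter


def rarity_weights(verses: dict[str, str]) -> dict[str, float]:
    """Weight each word by how few verses contain it (assumed scheme)."""
    counts = Counter(w for text in verses.values() for w in set(text.split()))
    total = len(verses)
    return {w: math.log(total / n) + 1.0 for w, n in counts.items()}


def score(passage: str, verse: str, weights: dict[str, float]) -> float:
    """Weighted share of the verse's words that appear in the passage (0..1)."""
    verse_words = set(verse.split())
    shared = verse_words & set(passage.split())
    denom = sum(weights.get(w, 1.0) for w in verse_words)
    if denom == 0:
        return 0.0
    return sum(weights.get(w, 1.0) for w in shared) / denom


# Toy corpus: unaccented verse fragments, for illustration only.
verses = {
    "John 1:1": "εν αρχη ην ο λογος",
    "John 1:4": "εν αυτω ζωη ην",
}
weights = rarity_weights(verses)
passage = "ο λογος ην"
scores = {ref: score(passage, text, weights) for ref, text in verses.items()}
best = max(scores, key=scores.get)
```

In this toy corpus the passage shares a word with both verses, but the rarer shared words push the first verse's score well above the second's, which is the kind of ranking a probabilistic pass is meant to produce.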


Program Components

| Component | Description |
| --- | --- |
| Preprocessor | Iterate through the Bible and create gword objects for use in Probabilistic Data Synthesis |
| Source Text Parser | Iterate through the source text and create gclause objects for use in Probabilistic Data Synthesis |
| Probabilistic Data Synthesis | Synthesize the data created in the Preprocessor and Source Text Parser, creating a data collection used as the input for Probabilistic Data Analysis |
| Probabilistic Data Analysis | Use the synthesized data created by the rest of the program to analyze trends |
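As a rough sketch of how the first three components might fit together, the code below models gword and gclause as small data classes. The fields, the clause-splitting rule, and the function names `preprocess`, `parse_source`, and `synthesize` are all assumptions made for illustration; the project's real objects may look quite different.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GWord:
    """Hypothetical gword: a Bible word and the verses it occurs in."""
    text: str
    verses: set[str] = field(default_factory=set)


@dataclass
class GClause:
    """Hypothetical gclause: one clause of the source text."""
    position: int
    words: list[str]


def preprocess(bible: dict[str, str]) -> dict[str, GWord]:
    """Preprocessor: index every Bible word by the verses containing it."""
    index: dict[str, GWord] = {}
    for ref, text in bible.items():
        for word in text.split():
            index.setdefault(word, GWord(word)).verses.add(ref)
    return index


def parse_source(source: str) -> list[GClause]:
    """Source Text Parser: split on punctuation (assumed clause boundary)."""
    parts = [part.split() for part in re.split(r"[.,;·]", source)]
    return [GClause(i, words) for i, words in enumerate(p for p in parts if p)]


def synthesize(index: dict[str, GWord], clauses: list[GClause]) -> dict[int, dict[str, int]]:
    """Synthesis: per clause, count how many of its words occur in each verse."""
    hits: dict[int, dict[str, int]] = {}
    for clause in clauses:
        tally: dict[str, int] = defaultdict(int)
        for word in clause.words:
            gword = index.get(word)
            if gword is not None:
                for ref in gword.verses:
                    tally[ref] += 1
        hits[clause.position] = dict(tally)
    return hits


bible = {"John 1:1": "εν αρχη ην ο λογος", "John 1:4": "εν αυτω ζωη ην"}
clauses = parse_source("ο λογος ην εν αρχη. αλλο τι")
hits = synthesize(preprocess(bible), clauses)
```

The resulting `hits` mapping plays the role of the data collection handed to the analysis step, which would then rank or filter the candidate verses for each clause.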

Program Flow

[Image: Program Flow]


Stack Flow

[Image: Stack Flow]

1: This project was originally designed for Greek documents, but it should work with Unicode documents in any language, provided the preprocessor is run with an equivalent New Testament CSV in the correct format.