Making Parsed Source Code Data Available Externally #314

Open
5 tasks
daomcgill opened this issue Oct 10, 2024 · 2 comments

@daomcgill
Collaborator


Purpose

This issue extends issue #313. The goal is to create configurable /exec scripts that make the parsed data tables available externally. The new scripts will make the syntax extraction process easier to use by providing a direct way to annotate source code and query the resulting XML.

Process

  1. Create a script for annotating source code using srcML.
  2. Create a script for querying the annotated data. It will accept either a predefined query or a user-defined XPath query.
  3. Document both scripts.

New Scripts

/exec/syntaxextractor.R: Script for running the syntax extractor using existing functions in R/src.R. Its functionality is split into two parts:

  1. Annotation: Takes in a source code folder and uses srcML to generate an annotated XML file.
  2. Querying: Accepts predefined XPath queries that extract syntactic elements from the XML files, or a custom XPath query specified by the user. Outputs the query results.
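
To make the querying step concrete, here is a minimal sketch of looking up a predefined query by name and falling back to a user-supplied XPath. It is written in Python with the standard library only, so it is not the R implementation this issue proposes and does not use the functions in R/src.R. The query names, the `run_query` helper, and the sample XML are all hypothetical; the sample is hand-written in the style of srcML output rather than produced by the tool. Note that `xml.etree.ElementTree` supports only a subset of XPath, so a real script would likely need a fuller XPath engine.

```python
import xml.etree.ElementTree as ET

# srcML's namespace URI; the "src" prefix is only a local alias used in queries.
NS = {"src": "http://www.srcML.org/srcML/src"}

# Hypothetical predefined queries; the names and paths are illustrative only.
PREDEFINED_QUERIES = {
    "function_names": ".//src:function/src:name",
    "class_names": ".//src:class/src:name",
}

# Hand-written sample in the style of srcML output (not real tool output).
SAMPLE_XML = """<unit xmlns="http://www.srcML.org/srcML/src" language="Java" filename="A.java">
<class>class <name>A</name> <block>{
<function><type><name>int</name></type> <name>f</name><parameter_list>()</parameter_list> <block>{}</block></function>
<function><type><name>void</name></type> <name>g</name><parameter_list>()</parameter_list> <block>{}</block></function>
}</block></class>
</unit>"""


def run_query(xml_text, query):
    """Run a predefined query by name, or treat `query` as a custom XPath."""
    xpath = PREDEFINED_QUERIES.get(query, query)
    root = ET.fromstring(xml_text)
    # itertext() also handles names that contain nested elements.
    return ["".join(node.itertext()) for node in root.findall(xpath, NS)]


if __name__ == "__main__":
    print(run_query(SAMPLE_XML, "function_names"))         # predefined query
    print(run_query(SAMPLE_XML, ".//src:type/src:name"))   # custom XPath
```

Keeping the predefined queries in a plain name-to-XPath table means a custom query follows exactly the same code path as a built-in one.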

Task List

  • Prerequisite: completion of issue Expanding the Syntax Extractor #313
  • Create a new script in /exec
  • Implement functionality for annotation
  • Implement functionality for queries
  • Documentation: explain how to use exec scripts, configuration and parameters

@daomcgill
Collaborator Author

@carlosparadis part II

@carlosparadis
Member

@daomcgill For this one I would consider making two execs, one that annotates, and the other that can query the file. Annotating can take a long time depending on the size of the project, hence the split.
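
One way to picture the split is sketched below in Python (standard library only, not the proposed R scripts). The annotation step writes its XML to disk once, and any later query step only reads that file. The function names and the `force` parameter are hypothetical, and the `srcml <folder> -o <file>` call reflects my understanding of the srcML command line, so check it against the installed version.

```python
import shutil
import subprocess
from pathlib import Path


def build_annotate_command(source_dir, output_xml):
    """Build the srcML call that annotates a folder into one XML file."""
    return ["srcml", str(source_dir), "-o", str(output_xml)]


def annotate(source_dir, output_xml, force=False):
    """Run the slow annotation step only when its output is missing.

    Returns True if srcML was run, False if an existing file was reused.
    """
    output_xml = Path(output_xml)
    if output_xml.exists() and not force:
        return False  # reuse the earlier annotation; queries can start right away
    if shutil.which("srcml") is None:
        raise RuntimeError("srcml executable not found on PATH")
    subprocess.run(build_annotate_command(source_dir, output_xml), check=True)
    return True
```

With this shape, repeated queries against a large project pay the annotation cost once.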

Otherwise, I think this is good! We can take another pass once #313 is done.

Thanks!
