Commit

Merge branch 'master' into 285-bugzilla-downloader-refresh
beydlern committed Oct 10, 2024
2 parents 9ef70d4 + 7e7afba commit 192fad3
Showing 11 changed files with 51 additions and 33 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: macOS-13
strategy:
matrix:
- r-version: ['4.2']
+ r-version: ['4.4']

steps:
- uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/test-coverage.yml
@@ -13,7 +13,7 @@ jobs:
runs-on: macOS-13
strategy:
matrix:
- r-version: ['4.2']
+ r-version: ['4.4']
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
steps:
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -42,7 +42,7 @@ Imports:
httr (>= 1.4.1),
curl (>= 4.3),
gh (>= 1.2.0),
- XML (>= 3.99-0),
+ XML (>= 3.99-0.7),
RColorBrewer (>= 1.1-2),
cli (>= 2.0.2),
docopt (>= 0.7.1)
5 changes: 3 additions & 2 deletions NEWS.md
@@ -3,8 +3,8 @@ __kaiaulu 0.0.0.9700 (in development)__

### NEW FEATURES

- * `refresh_jira_issues()` had been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded.
- * `download_jira_issues()`, `download_jira_issues_by_issue_key()`, and `download_jira_issues_by_date()` has been added. This allows for downloading of Jira issues without the use of JirAgileR [#275](https://github.com/sailuh/kaiaulu/issues/275) and specification of issue Id and created ranges. It also interacts with `parse_jira_latest_date` to implement a refresh capability.
+ * `refresh_jira_issues()` had been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded. [#275](https://github.com/sailuh/kaiaulu/issues/275)
+ * `download_jira_issues()`, `download_jira_issues_by_issue_key()`, and `download_jira_issues_by_date()` has been added. This allows for downloading of Jira issues without the use of JirAgileR and specification of issue Id and created ranges. It also interacts with `parse_jira_latest_date()` to implement a refresh capability. [#275](https://github.com/sailuh/kaiaulu/issues/275)
* `make_jira_issue()` and `make_jira_issue_tracker()` no longer create fake issues following JirAgileR format, but instead the raw data obtained from JIRA API. This is compatible with the new parser function for JIRA. [#277](https://github.com/sailuh/kaiaulu/issues/277)
* `parse_jira()` now parses folders containing raw JIRA JSON files without depending on JirAgileR. [#276](https://github.com/sailuh/kaiaulu/issues/276)
* The `parse_jira_latest_date()` has been added. This function returns the file name of the downloaded JIRA JSON containing the latest date for use by `download_jira_issues()` to implement a refresh capability. [#276](https://github.com/sailuh/kaiaulu/issues/276)
@@ -28,6 +28,7 @@ __kaiaulu 0.0.0.9700 (in development)__

### MINOR IMPROVEMENTS

+ * Issue #275, when introducing the concept of refresh on JIRA, affected some notebooks that still relied on data in that format. This issue change either notebook or config file to conform to the new JIRA downloader [#312](https://github.com/sailuh/kaiaulu/issues/312)
* The line metrics notebook now provides further guidance on adjusting the snapshot and filtering.
* The R File and R Function parser can now properly parse R folders which contain folders within (not following R package structure). Both `.r` and `.R` files are also now captured (previously only one of the two were specified, but R accepts both). [#235](https://github.com/sailuh/kaiaulu/issues/235)
* Refactor GoF Notebook in Graph GoF and Text GoF Notebooks [#224](https://github.com/sailuh/kaiaulu/issues/224)
8 changes: 4 additions & 4 deletions _pkgdown.yml
@@ -115,10 +115,10 @@ reference:
- parse_jira_rss_xml
- make_jira_issue
- make_jira_issue_tracker
- - download_jira_issues_comments
- - download_jira_issues_comments_by_date
- - download_jira_issues_comments_by_issuekey
- - refresh_jira_issues_comments_by_issuekey
+ - download_jira_issues
+ - download_jira_issues_by_date
+ - download_jira_issues_by_issue_key
+ - refresh_jira_issues
- title: __GitHub__
desc: >
Functions to interact and download data from GitHub API.
12 changes: 6 additions & 6 deletions conf/camel.yml
@@ -42,12 +42,12 @@ version_control:
# List of branches used for analysis
branch:
- camel-1.6.0
- - camel-1.0.0
- camel-2.11.4
- camel-3.21.0
+ - camel-1.0.0

mailing_list:
- mod_mbox:
+ mod_mbox:
mail_key_1:
archive_url: http://mail-archives.apache.org/mod_mbox/camel-dev
mbox: ../../rawdata/camel/mod_mbox/camel-dev/
@@ -66,11 +66,11 @@ mailing_list:
issue_tracker:
jira:
# Obtained from the project's JIRA URL
- # domain: https://issues.apache.org/jira
+ domain: https://issues.apache.org/jira
project_key: CAMEL
# Download using `download_jira_data.Rmd`
- issues: ../../rawdata/camel/jira/issues/
- issue_comments: ../../rawdata/camel/jira/issue_comments/
+ issues: ../../rawdata/issue_tracker/camel/issues/
+ issue_comments: ../../rawdata/issue_tracker/camel/issue_comments/
# github:
# Obtained from the project's GitHub URL
# owner: apache
@@ -155,7 +155,7 @@ tool:
# The project folder path to store various intermediate
# files for DV8 Analysis
# The folder name will be used in the file names.
- folder_path: ../../analysis/dv8/camel_1_0_0
+ folder_path: ../../analysis/dv8/camel_1_6
# the architectural flaws thresholds that should be used
architectural_flaws:
cliqueDepends:
4 changes: 2 additions & 2 deletions conf/helix.yml
@@ -71,8 +71,8 @@ issue_tracker:
domain: https://issues.apache.org/jira
project_key: HELIX
# Download using `download_jira_data.Rmd`
- issues: ../../rawdata/helix/jira/issues/helix
- issue_comments: ../../rawdata/helix/jira/issue_comments/helix
+ issues: ../../rawdata/issue_tracker/helix/issues/
+ issue_comments: ../../rawdata/issue_tracker/helix/issue_comments/
github:
project_key_1:
# Obtained from the project's GitHub URL
2 changes: 1 addition & 1 deletion vignettes/causal_flaws.Rmd
@@ -53,7 +53,7 @@ scc_path <- tool[["scc"]]
# Gitlog parameters
git_repo_path <- conf[["version_control"]][["log"]]
- git_branch <- conf[["version_control"]][["branch"]][4] # camel 1.0.0
+ git_branch <- conf[["version_control"]][["branch"]][1] # camel 1.6.0
# Depends parameters
18 changes: 12 additions & 6 deletions vignettes/download_jira_issues.Rmd
@@ -101,6 +101,10 @@ Note in the subsequent code block we specified the fields from the issue we are

Beware that even if only 3 issues exist in a JIRA, a large time range will still request several API calls (in contrast to the issue endpoint below). Therefore, it is advisable to use the issue key query instead which is explained in the next sub-section.

+ ```{r}
+ file.exists(save_path_issue_tracker_issues)
+ ```

```{r eval = FALSE}
# e.g. date_lower_bound <- "1970/01/01".
date_lower_bound <- "2023/11/16 21:00"
@@ -162,11 +166,14 @@ In the subsequent codeblock, note we also include a new field, `comment`. This i

```{r eval = FALSE}
# eg issueKey_lower_bound <- "GERONIMO-740"
- #issue_key_lower_bound <- "GERONIMO-500"
- #issue_key_upper_bound <- "GERONIMO-560"
+ #issue_key_lower_bound <- "GERONIMO-5000"
+ #issue_key_upper_bound <- "GERONIMO-5010"
issue_key_lower_bound <- "SAILUH-1"
- issue_key_upper_bound <- "SAILUH-3"
+ issue_key_upper_bound <- "SAILUH-7"
+ #issue_key_lower_bound <- "CAMEL-1"
+ #issue_key_upper_bound <- "CAMEL-800"
all_issues <- download_jira_issues_by_issue_key(domain = issue_tracker_domain,
jql_query = paste0("project='",issue_tracker_project_key,"'"),
@@ -192,14 +199,13 @@ all_issues <- download_jira_issues_by_issue_key(domain = issue_tracker_domain,
username = username,
password = password,
save_folder_path = save_path_issue_tracker_issue_comments,
- max_results = 50,
- max_total_downloads = 60,
+ max_results = 500,
+ max_total_downloads = 500,
issue_key_lower_bound = issue_key_lower_bound,
issue_key_upper_bound = issue_key_upper_bound,
verbose = TRUE)
```


```{r}
names(parsed_jira_issues)
```
8 changes: 5 additions & 3 deletions vignettes/reply_communication_showcase.Rmd
@@ -30,7 +30,7 @@ Load config file.

```{r}
tool <- yaml::read_yaml("../tools.yml")
- conf <- yaml::read_yaml("../conf/geronimo.yml")
+ conf <- yaml::read_yaml("../conf/helix.yml")
perceval_path <- tool[["perceval"]]
mbox_path <- conf[["mailing_list"]][["mbox"]]
@@ -79,9 +79,11 @@ project_mbox <- project_mbox[!is.na(reply_datetimetz)]
project_jira <- project_jira[!is.na(reply_datetimetz)]
- project_mbox_slice <- project_mbox[reply_datetimetz >= as.POSIXct("2005-08-01", format = "%Y-%m-%d",tz = "UTC") & reply_datetimetz < as.POSIXct("2005-08-30", format = "%Y-%m-%d",tz = "UTC")]
- project_jira_slice <- project_jira[reply_datetimetz >= as.POSIXct("2005-08-01", format = "%Y-%m-%d",tz = "UTC") & reply_datetimetz < as.POSIXct("2005-08-30", format = "%Y-%m-%d",tz = "UTC")]
+ project_mbox_slice <- project_mbox[reply_datetimetz >= as.POSIXct("2018-04-29", format = "%Y-%m-%d",tz = "UTC") & reply_datetimetz < as.POSIXct("2019-02-26", format = "%Y-%m-%d",tz = "UTC")]
+ project_jira_slice <- project_jira[reply_datetimetz >= as.POSIXct("2018-04-29", format = "%Y-%m-%d",tz = "UTC") & reply_datetimetz < as.POSIXct("2019-02-26", format = "%Y-%m-%d",tz = "UTC")]
```

# Mailing List
21 changes: 15 additions & 6 deletions vignettes/social_smell_showcase.Rmd
@@ -96,7 +96,7 @@ As stated in the introduction, we need both git log and at least one communicati

To get started, we use the `parse_gitlog` function to extract a table from the git log. You can inspect the `project_git` variable to inspect what information is available from the git log.

- ```{r}
+ ```{r Parse Git Log}
git_checkout(git_branch,git_repo_path)
project_git <- parse_gitlog(perceval_path,git_repo_path)
project_git <- project_git %>%
@@ -110,7 +110,7 @@ Next, we parse the various communication channels the project use. Similarly to

We also have to parse and normalize the timezone across the different projects. Since one of the social metrics in the quality framework is the count of different timezones, we separate the timezone information before normalizing them.

- ```{r}
+ ```{r Convert Timestamps to POSIXct}
project_git$author_tz <- sapply(stringi::stri_split(project_git$author_datetimetz,
regex=" "),"[[",6)
project_git$author_datetimetz <- as.POSIXct(project_git$author_datetimetz,
@@ -242,15 +242,24 @@ A third choice we make here is whether the collaboration being analyzed is done

## Community Detection

- For some social smells, such as radio silence and primma donna, community detection is required to be applied to the constructed networks. Do consider the implications of the one chosen below in your results.
+ For some social smells, such as radio silence and primma donna, community detection is required to be applied to the constructed networks. Do consider the implications of the one chosen below in your results. We will use a sample of the data here for demonstration instead of the full dataset:

```{r}
# Define all timestamp in number of days since the very first commit of the repo
# Note here the start_date and end_date are in respect to the git log.
# Transform commit hashes into datetime so window_size can be used
- start_date <- get_date_from_commit_hash(project_git,start_commit)
- end_date <- get_date_from_commit_hash(project_git,end_commit)
+ #start_date <- get_date_from_commit_hash(project_git,start_commit)
+ #end_date <- get_date_from_commit_hash(project_git,end_commit)
+ start_date <- as.POSIXct("2012-10-17 18:19:46", tz = "UTC")
+ end_date <- as.POSIXct("2013-02-17 18:19:46", tz = "UTC")
```


```{r Compute Social Smells}
datetimes <- project_git$author_datetimetz
reply_datetimes <- project_reply$reply_datetimetz