Update doc
howff committed Jun 3, 2024
1 parent b0b7bf2 commit cbc72c6
Showing 1 changed file with 12 additions and 1 deletion.
13 changes: 12 additions & 1 deletion doc/anonymisation.md
@@ -151,6 +151,9 @@ For example
"PHI_rules": {
"clinic": [
{
+"comment": "A full description of this rule in plain English.",
+"test_true": [ "list of strings which the pattern must match", "more" ],
+"test_false": [ "list of strings which the pattern must not match", "more" ],
"pattern": "\\bplease\\s+contact(\\s+\\w+(\\s+\\w+){0,2})",
"flags": [ "ignorecase" ],
"data_labels": [ "name" ],
@@ -159,7 +162,9 @@ For example
```

The pattern is a python regex but note that as it's in JSON it needs a
-double backslash. Note that the regex will be searched in fragments of
+double backslash so things like `\b` for boundary should be written `\\b`.
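
As a minimal sketch of why the doubling matters (not part of this commit), the Python snippet below loads a rule from JSON text and compiles its pattern. Mapping the `ignorecase` flag to `re.IGNORECASE` and the sample sentence are assumptions made for illustration; how the anonymisation code itself applies flags is not shown in this diff.

```python
import json
import re

# JSON text as it would sit in the rules file: every regex backslash is doubled.
rule_text = r'''
{
    "pattern": "\\bplease\\s+contact(\\s+\\w+(\\s+\\w+){0,2})",
    "flags": [ "ignorecase" ]
}
'''
rule = json.loads(rule_text)

# JSON decoding turns each \\ into a single backslash, giving a normal regex.
decoded_pattern = rule["pattern"]
print(decoded_pattern)

# Assumption: the "ignorecase" flag corresponds to re.IGNORECASE.
regex = re.compile(decoded_pattern, re.IGNORECASE)
match = regex.search("If unsure, Please contact Dr Jane Smith at the clinic.")
captured = match.group(1)
print(repr(captured))

# With a single backslash, JSON reads \b as a backspace character (U+0008),
# so the word-boundary assertion is silently lost.
single = json.loads(r'"\bplease"')
print(repr(single))
```

A single-backslash `\s` or `\w` is not a valid JSON escape at all, so Python's `json` module rejects it with a `JSONDecodeError` rather than loading the rule.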

+Note that the regex will be searched in fragments of
the document, not the whole document and not necessarily sentences.
(In fact it may be whole sections defined by `working_fields`). This
has implications for anchors such as `^` and `$`, and `multiline`.
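
The effect on anchors can be illustrated with a short sketch. The fragment text below is hypothetical, and it is assumed that the rule's `multiline` flag corresponds to Python's `re.MULTILINE`; the diff does not show how fragments are actually built or how flags are applied.

```python
import re

# Hypothetical fragment: several lines passed to the rule as one string.
fragment = "Clinic letter\nPlease contact Dr Smith\nKind regards"

# Without MULTILINE, ^ matches only at the start of the fragment
# and $ only at its end.
start_default = re.search(r"^please", fragment, re.IGNORECASE)
end_default = re.search(r"smith$", fragment, re.IGNORECASE)

# With MULTILINE, ^ and $ also match at each line break inside the fragment.
start_multi = re.search(r"^please", fragment, re.IGNORECASE | re.MULTILINE)
end_multi = re.search(r"smith$", fragment, re.IGNORECASE | re.MULTILINE)

print(start_default, end_default)
print(start_multi.group(0), end_multi.group(0))
```

Because a fragment may be a whole section rather than a sentence, a pattern anchored with `^` or `$` may need `multiline` to behave as intended, or may be better written with `\\b` instead.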
@@ -178,6 +183,12 @@ The `data_type` is used to identify what type of information was extracted.
`disabled` is optional; when true, the rule is not used.

`comment` could also be used to give an explanation for the rule.
+The comment is optional but should be used to describe the rule in plain English.
+
+The tests are optional but should be used to allow automated testing of rules,
+using the `test_rules.py` script. All strings in the `test_true` list should
+contain something which matches the pattern and all strings in the `test_false` list
+should contain something that is not matched by the pattern.
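
The kind of check these lists enable might look like the sketch below. It is not the real `test_rules.py`, whose contents are not shown in this diff: the `check_rule` function, the `FLAG_NAMES` mapping, and the sample rule are all hypothetical. The `test_false` requirement is read here as "no string in the list may produce a match", in line with the wording in the JSON example.

```python
import re

# Hypothetical mapping from flag names in the JSON to Python re flags.
FLAG_NAMES = {"ignorecase": re.IGNORECASE, "multiline": re.MULTILINE}


def check_rule(rule):
    """Return a list of failure messages for one rule; empty means all tests pass."""
    flags = 0
    for name in rule.get("flags", []):
        flags |= FLAG_NAMES[name]
    regex = re.compile(rule["pattern"], flags)

    failures = []
    # Every test_true string must contain a match somewhere.
    for text in rule.get("test_true", []):
        if regex.search(text) is None:
            failures.append(f"test_true not matched: {text!r}")
    # No test_false string may contain a match anywhere.
    for text in rule.get("test_false", []):
        if regex.search(text) is not None:
            failures.append(f"test_false matched: {text!r}")
    return failures


# Sample rule written as a Python dict, so the regex uses single backslashes.
rule = {
    "comment": "Capture up to three words following 'please contact'.",
    "test_true": ["Please contact Dr Jane Smith"],
    "test_false": ["Please do not contact us", "pleased contact"],
    "pattern": r"\bplease\s+contact(\s+\w+(\s+\w+){0,2})",
    "flags": ["ignorecase"],
}
failures = check_rule(rule)
print(failures)
```

Both lists use `search` rather than a full-string match, mirroring the statement that each string should "contain something" the pattern does or does not match.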

### Document structure rules
