612 Introduced RECAP Search Alerts sweep index #4127

Merged
merged 44 commits into from
Oct 19, 2024
Commits
3e4f269
fix(elasticsearch): Test RECAP nested index reliability
albertisfu Jun 21, 2024
53b3b65
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jun 21, 2024
2955b0b
fix(alerts): Changed sweep index approach to parent-child documents
albertisfu Jun 22, 2024
9307b77
fix(alerts): Added cl_send_recap_alerts command
albertisfu Jun 25, 2024
9b4e1c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 25, 2024
8b537f0
fix(alerts): Implemented filtering of RECAP alerts hits for the sweep…
albertisfu Jun 27, 2024
c1232ec
fix(alerts): Updated ES alert email templates to support RECAP Alerts.
albertisfu Jun 28, 2024
3e96f61
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jun 28, 2024
51c7bb6
fix(alerts): Group alerts and case hits limit
albertisfu Jun 28, 2024
7fc3298
fix(alerts): Trigger RECAP search alerts webhooks
albertisfu Jun 29, 2024
b5016ba
fix(alerts): Schedule wly and mly RECAP Search Alerts
albertisfu Jun 29, 2024
4a128bf
fix(alerts): Copy documents from the main index to the sweep index us…
albertisfu Jul 2, 2024
3a4a456
fix(alerts): Fixed RECAPSweepDocument index mapping
albertisfu Jul 2, 2024
add980a
fix(alerts): Tweak RECAP Alert estimation query to consider both Dock…
albertisfu Jul 3, 2024
a20113f
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 3, 2024
ebf269d
fix(elasticsearch): Fixed build_daterange_query type hint
albertisfu Jul 3, 2024
bffee6d
fix(alerts): Fixed re_index task estimated remaining time compute
albertisfu Jul 3, 2024
847f0fd
fix(alerts): Handle creation and removal of the RECAP alerts sweep in…
albertisfu Jul 3, 2024
4b324c9
fix(elasticsearch): Fixed tests related to timestamp updates
albertisfu Jul 3, 2024
0d63080
fix(alerts): Fix should_docket_hit_be_included date comparison
albertisfu Jul 4, 2024
5b3d130
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 4, 2024
5077e01
fix(alerts): Changed approach to filter out cross-object hits by usin…
albertisfu Jul 10, 2024
9dffbfd
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 10, 2024
a4e4e62
fix(alerts): Added more tests related to filtering cross-object hits.
albertisfu Jul 10, 2024
49dd480
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 10, 2024
a468336
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 19, 2024
38d6884
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Jul 25, 2024
b56f235
fix(alerts): Restore send_es_search_alert_webhook to avoid conflicts …
albertisfu Jul 25, 2024
d102664
fix(alerts): Fixed MLY alerts test can't be sent after the 28th
albertisfu Jul 29, 2024
7977b80
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Sep 26, 2024
57b6df7
fix(alerts): Fixed merge conflicts and adjust test accordingly new RE…
albertisfu Sep 26, 2024
b35ef0a
fix(elasticsearch): Fixed failing test due to build_full_join_es_quer…
albertisfu Sep 27, 2024
8902aa0
fix(alerts): Removed recap_document_hl_matched as we no longer rely o…
albertisfu Sep 27, 2024
d0b1298
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Sep 27, 2024
d8c4db2
Merge branch 'main' into 612-introduced-recap-search-alerts
ERosendo Oct 18, 2024
4babf5d
feat(custom filter): Refactor alerts_supported method for better read…
ERosendo Oct 18, 2024
a0085cc
refactor(alerts): Cleaned up unused imports in utils.py
ERosendo Oct 18, 2024
5f12c30
refactor(search): Cleanup unused constants
ERosendo Oct 18, 2024
5fb177f
refactor(alerts): Replaces Type import with built-in alternative
ERosendo Oct 18, 2024
78955f1
refactor(search): Removes unused argument from index command
ERosendo Oct 18, 2024
da72292
feat(alert): Implements early returns in recap alert command
ERosendo Oct 18, 2024
0b62dca
feat(alerts): Adds TaskCompletionStatus dataclass for tracking task p…
ERosendo Oct 18, 2024
3b153d2
feat(lib): Introduces EsMainQueries Dataclass
ERosendo Oct 18, 2024
debd256
Merge branch 'main' into 612-introduced-recap-search-alerts
albertisfu Oct 18, 2024
286 changes: 272 additions & 14 deletions cl/lib/elasticsearch_utils.py
Expand Up @@ -4,6 +4,7 @@
import re
import time
import traceback
from collections import defaultdict
from copy import deepcopy
from dataclasses import fields
from functools import reduce, wraps
Expand Down Expand Up @@ -68,6 +69,7 @@
    SEARCH_RECAP_CHILD_HL_FIELDS,
    SEARCH_RECAP_CHILD_QUERY_FIELDS,
    SEARCH_RECAP_HL_FIELDS,
    SEARCH_RECAP_NESTED_CHILD_QUERY_FIELDS,
    SEARCH_RECAP_PARENT_QUERY_FIELDS,
    api_child_highlight_map,
)
Expand Down Expand Up @@ -1066,6 +1068,7 @@ def build_es_base_query(
    cd: CleanData,
    child_highlighting: bool = True,
    api_version: Literal["v3", "v4"] | None = None,
    nested_query: bool = False,
) -> tuple[Search, QueryString | None]:
    """Builds filters and fulltext_query based on the given cleaned
    data and returns an elasticsearch query.
Expand All @@ -1074,6 +1077,7 @@ def build_es_base_query(
    :param cd: The cleaned data object containing the query and filters.
    :param child_highlighting: Whether highlighting should be enabled in child docs.
    :param api_version: Optional, the request API version.
    :param nested_query: Whether to perform a nested query.
    :return: A two-tuple, the Elasticsearch search query object and an ES
    QueryString for child documents, or None if there is no need to query
    child documents.
Expand Down Expand Up @@ -1151,6 +1155,15 @@ def build_es_base_query(
                     ],
                 )
             )
+            nested_child_fields = SEARCH_RECAP_NESTED_CHILD_QUERY_FIELDS.copy()
+            nested_child_fields.extend(
+                add_fields_boosting(
+                    cd,
+                    [
+                        "description",
+                    ],
+                )
+            )
             child_query_fields = {"recap_document": child_fields}
             parent_query_fields = SEARCH_RECAP_PARENT_QUERY_FIELDS.copy()
             parent_query_fields.extend(
Expand All @@ -1162,13 +1175,22 @@ def build_es_base_query(
                     ],
                 )
             )
-            main_query, join_query = build_full_join_es_queries(
-                cd,
-                child_query_fields,
-                parent_query_fields,
-                child_highlighting=child_highlighting,
-                api_version=api_version,
-            )
+
+            if nested_query:
+                main_query, _ = build_full_nested_es_queries(
+                    cd,
+                    nested_child_fields,
+                    parent_query_fields,
+                )
+            else:
+                main_query, join_query = build_full_join_es_queries(
+                    cd,
+                    child_query_fields,
+                    parent_query_fields,
+                    child_highlighting=child_highlighting,
+                    api_version=api_version,
+                )
+
         case SEARCH_TYPES.OPINION:
             str_query = cd.get("q", "")
             related_match = RELATED_PATTERN.search(str_query)
Expand Down Expand Up @@ -1984,11 +2006,14 @@ def fetch_es_results(
return [], 0, error, None, None


-def build_has_child_filters(cd: CleanData) -> list[QueryString]:
+def build_has_child_filters(
+    cd: CleanData, nested_query=False
+) -> list[QueryString]:
     """Builds Elasticsearch 'has_child' filters based on the given child type
     and CleanData.

     :param cd: The user input CleanedData.
+    :param nested_query: Whether to perform a nested query.
     :return: A list of QueryString objects containing the 'has_child' filters.
     """

Expand Down Expand Up @@ -2022,22 +2047,36 @@ def build_has_child_filters(cd: CleanData) -> list[QueryString]:
         attachment_number = cd.get("attachment_number", "")

         if available_only:
+            field = (
+                "is_available"
+                if not nested_query
+                else "documents.is_available"
+            )
             queries_list.extend(
                 build_term_query(
-                    "is_available",
+                    field,
                     available_only,
                 )
             )
         if description:
-            queries_list.extend(build_text_filter("description", description))
+            field = (
+                "description" if not nested_query else "documents.description"
+            )
+            queries_list.extend(build_text_filter(field, description))
         if document_number:
-            queries_list.extend(
-                build_term_query("document_number", document_number)
+            field = (
+                "document_number"
+                if not nested_query
+                else "documents.document_number"
             )
+            queries_list.extend(build_term_query(field, document_number))
         if attachment_number:
-            queries_list.extend(
-                build_term_query("attachment_number", attachment_number)
+            field = (
+                "attachment_number"
+                if not nested_query
+                else "documents.attachment_number"
             )
+            queries_list.extend(build_term_query(field, attachment_number))

     return queries_list

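The change to `build_has_child_filters` repeats the same conditional for each filter: use the bare field name for the join (parent-child) index, or prefix it with `documents.` for the nested index. As a minimal sketch only, the idea can be isolated in a small helper. Plain dictionaries stand in for the project's `build_term_query`, and `child_field` and `build_child_term_filters` are hypothetical names that do not appear in this PR.

```python
def child_field(name: str, nested_query: bool = False) -> str:
    """Return the field name, prefixed with the nested path if requested.

    Assumption: child documents live under a nested path named "documents",
    as in the diff above.
    """
    return f"documents.{name}" if nested_query else name


def build_child_term_filters(cd: dict, nested_query: bool = False) -> list[dict]:
    """Build plain-dict term filters for the child fields present in cd."""
    filters = []
    for name in ("is_available", "document_number", "attachment_number"):
        value = cd.get(name, "")
        if value:  # Skip empty or missing filter values.
            filters.append({"term": {child_field(name, nested_query): value}})
    return filters


plain = build_child_term_filters({"document_number": 3})
nested = build_child_term_filters({"document_number": 3}, nested_query=True)
```

Here `plain` targets `document_number` while `nested` targets `documents.document_number`; the filter values themselves are unchanged.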
Expand Down Expand Up @@ -3014,3 +3053,222 @@ def do_es_alert_estimation_query(
    estimation_query, _ = build_es_base_query(search_query, cd)

    return estimation_query.count()


def build_nested_child_query(
    query: QueryString | str,
    child_type: str,
    child_hits_limit: int,
    highlighting_fields: dict[str, int] | None = None,
) -> QueryString:
    """Build a nested query.

    :param query: The Elasticsearch query string or QueryString object.
    :param child_type: The type of the child document.
    :param child_hits_limit: The maximum number of child hits to be returned.
    :param highlighting_fields: A dict of fields to highlight in child docs.
    :return: The 'nested' query.
    """

    highlight_options, fields_to_exclude = build_highlights_dict(
        highlighting_fields, SEARCH_HL_TAG
    )
    inner_hits = {
        "name": f"filter_query_inner_{child_type}",
        "size": child_hits_limit,
        "_source": {
            "excludes": fields_to_exclude,
        },
    }
    if highlight_options:
        inner_hits["highlight"] = highlight_options

    return Q(
        "nested",
        path="documents",
        score_mode="max",
        query=query,
        inner_hits=inner_hits,
    )


def build_full_nested_es_queries(
    cd: CleanData,
    child_query_fields: list[str],
    parent_query_fields: list[str],
) -> tuple[QueryString | list, QueryString | None]:
    """Build a complete Elasticsearch query with both parent and nested
    documents conditions.

    :param cd: The query CleanedData
    :param child_query_fields: A list of fields for the nested child documents.
    :param parent_query_fields: A list of fields for the parent document.
    :return: A two-tuple, the main Elasticsearch query (or an empty list if
    there is nothing to query) and the child query, or None if there is no
    child query.
    """

    q_should = []
    child_query = None
    if cd["type"] in [
        SEARCH_TYPES.RECAP,
        SEARCH_TYPES.DOCKETS,
        SEARCH_TYPES.RECAP_DOCUMENT,
        SEARCH_TYPES.OPINION,
        SEARCH_TYPES.PEOPLE,
    ]:
        # Build child filters.
        child_filters = build_has_child_filters(cd, nested_query=True)
        # Copy the original child_filters before appending parent fields.
        # For its use later in the parent filters.
        child_filters_original = deepcopy(child_filters)
        # Build child text query.
        child_fields = [f"documents.{field}" for field in child_query_fields]
        child_text_query = build_fulltext_query(
            child_fields, cd.get("q", ""), only_queries=True
        )

        # Build parent filters.
        parent_filters = build_join_es_filters(cd)

        # Build the child query based on child_filters and child_text_query
        match child_filters, child_text_query:
            case [], []:
                pass
            case [], _:
                child_query = Q(
                    "bool",
                    should=child_text_query,
                    minimum_should_match=1,
                )
            case _, []:
                child_query = Q(
                    "bool",
                    filter=child_filters,
                )
            case _, _:
                child_query = Q(
                    "bool",
                    filter=child_filters,
                    should=child_text_query,
                    minimum_should_match=1,
                )

        _, query_hits_limit = get_child_top_hits_limit(cd, cd["type"])
        has_child_query = None
        if child_text_query or child_filters:
            hl_fields = api_child_highlight_map.get((True, cd["type"]), {})
            has_child_query = build_nested_child_query(
                child_query,
                "recap_document",
                query_hits_limit,
                hl_fields,
            )

        if has_child_query:
            q_should.append(has_child_query)

        # Build the parent filter and text queries.
        string_query = build_fulltext_query(
            parent_query_fields, cd.get("q", ""), only_queries=True
        )

        # If child filters are set, add a nested query as a filter to the
        # parent query to exclude results without matching children.
        if child_filters_original:
            parent_filters.append(
                Q(
                    "nested",
                    path="documents",
                    score_mode="max",
                    query=Q("bool", filter=child_filters_original),
                )
            )
        parent_query = None
        match parent_filters, string_query:
            case [], []:
                pass
            case [], _:
                parent_query = Q(
                    "bool",
                    should=string_query,
                    minimum_should_match=1,
                )
            case _, []:
                parent_query = Q(
                    "bool",
                    filter=parent_filters,
                )
            case _, _:
                parent_query = Q(
                    "bool",
                    filter=parent_filters,
                    should=string_query,
                    minimum_should_match=1,
                )
        if parent_query:
            q_should.append(parent_query)

    if not q_should:
        return [], child_query

    final_query = Q(
        "bool",
        should=q_should,
    )
    return (
        final_query,
        child_query,
    )


def do_es_sweep_nested_query(
    search_query: Search,
    cd: CleanData,
) -> tuple[list[defaultdict] | None, int | None]:
    """Build and execute an ES nested query against the daily RECAP sweep
    index.

    :param search_query: Elasticsearch DSL Search object.
    :param cd: The query CleanedData
    :return: A two-tuple, the query results with their child documents
    converted to defaultdicts and the total number of hits, or (None, None)
    if the search form is invalid.
    """

    search_form = SearchForm(cd, is_es_form=True)
    if search_form.is_valid():
        cd = search_form.cleaned_data
    else:
        return None, None

    hits = None
    try:
        s, _ = build_es_base_query(
            search_query,
            cd,
            True,
            nested_query=True,
        )
    except (
        UnbalancedParenthesesQuery,
        UnbalancedQuotesQuery,
        BadProximityQuery,
    ) as e:
        raise ElasticBadRequestError(detail=e.message)
    main_query = add_es_highlighting(s, cd, highlighting=True)
    main_query = main_query.extra(from_=0, size=30)
    results = main_query.execute()
    if results:
        hits = results.hits.total.value

    limit_inner_hits({}, results, cd["type"])
    set_results_highlights(results, cd["type"])

    for result in results:
        child_result_objects = []
        if hasattr(result, "child_docs"):
            for child_doc in result.child_docs:
                child_result_objects.append(
                    defaultdict(lambda: None, child_doc["_source"].to_dict())
                )
            result["child_docs"] = child_result_objects

    return results, hits
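The loop at the end of `do_es_sweep_nested_query` wraps each child document's `_source` in `defaultdict(lambda: None, ...)`. A short standalone illustration of that standard-library behavior, using a made-up sample document, is:

```python
from collections import defaultdict

# Hypothetical child document source with only some fields present.
source = {"description": "Motion to dismiss", "document_number": 12}

# Missing keys resolve to None instead of raising KeyError.
child_doc = defaultdict(lambda: None, source)

present = child_doc["description"]
missing = child_doc["short_description"]
```

Presumably this lets downstream consumers such as serializers or templates read optional fields without guarding every lookup. One side effect worth knowing is that reading a missing key stores it in the mapping with the value `None`.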
15 changes: 15 additions & 0 deletions cl/search/api_serializers.py
Expand Up @@ -731,3 +731,18 @@ class Meta:
            "pacer_doc_id",
            "trustee_str",
        )


class RECAPNestedResultSerializer(
    RECAPMetaMixin, BaseDocketESResultSerializer
):
    """The serializer class for RECAP search type results."""

    recap_documents = BaseRECAPDocumentESResultSerializer(
        many=True, read_only=True, source="child_docs"
    )

    class Meta(BaseDocketESResultSerializer.Meta):
        exclude = BaseDocketESResultSerializer.Meta.exclude + (
            "docket_absolute_url",
        )
5 changes: 5 additions & 0 deletions cl/search/constants.py
Expand Up @@ -96,6 +96,11 @@
    "chapter",
    "trustee_str",
]
SEARCH_RECAP_NESTED_CHILD_QUERY_FIELDS = [
    "short_description",
    "plain_text",
    "document_type",
]
SEARCH_OPINION_QUERY_FIELDS = [
    "court",
    "court_id",