Update CL XML and head_matter fields with data from CAP #4614

Open
wants to merge 15 commits into
base: main
from

jtmst (Collaborator) commented Oct 24, 2024

Description

This PR introduces a new management command update_cap_cases along with corresponding unit tests. The command is designed to update CourtListener (CL) cases with the latest data from the Caselaw Access Project (CAP).

Key Changes

  • Added update_cap_cases.py management command
  • Implemented test_update_cap_cases.py for unit testing
  • The command processes crosswalk files, fetches CAP HTML and CL XML, and updates CL data accordingly

Testing

Unit tests have been added for the core functionality of the new command.

Note

Crosswalk files must first be generated with the generate_capcrosswalk.py command. This command will not work without them.
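Because the command depends on those files, it may help to fail fast when they are missing. The following is a minimal sketch that assumes the crosswalks are `*.json` files in a single directory (the review below refers to "the directory where the json files are"). `find_crosswalk_files` is a hypothetical helper and is not part of this PR:

```python
from pathlib import Path


def find_crosswalk_files(crosswalk_dir: str) -> list[Path]:
    """Return the crosswalk JSON files in crosswalk_dir, sorted by name.

    Hypothetical helper: assumes generate_capcrosswalk.py wrote one or more
    *.json files into crosswalk_dir. Raises FileNotFoundError early so the
    update command stops instead of silently processing nothing.
    """
    directory = Path(crosswalk_dir)
    if not directory.is_dir():
        raise FileNotFoundError(
            f"Crosswalk directory {directory} does not exist; "
            "run generate_capcrosswalk.py first."
        )
    # Only JSON files count as crosswalks; other files are ignored.
    files = sorted(directory.glob("*.json"))
    if not files:
        raise FileNotFoundError(
            f"No crosswalk JSON files found in {directory}; "
            "run generate_capcrosswalk.py first."
        )
    return files
```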

jtmst marked this pull request as ready for review October 28, 2024 14:06
jtmst requested a review from mlissner October 28, 2024 16:36
mlissner (Member)

@flooie, to you for triage, analysis, or both! :)

quevon24 (Member) left a comment

I found a problem. I was testing the command with this cluster: https://www.courtlistener.com/opinion/1539264/go/ (https://static.case.law/a2d/191/html/0138-01.html), and I saw that the resulting XML dropped the link from the first footnote. There is also a wrong link in the first footnote of the updated XML:

<footnote data-label="1" id="footnote_1_1">
<footnote citation-index="1" href="#fn2_ref" label="139">1</footnote>
<p data-blocks="[[&quot;BL_175.11&quot;,175,[157,2661,695,68]]]" id="b175-8">. Schibi v. Schibi, <citation data-cite="136 Conn. 190" data-index="0" href="/citations/?q=136%20Conn.%20190">136 Conn. 190</citation>, <citation data-cite="69 A.2d 831" data-index="1" href="/citations/?q=69%20A.2d%20831">69 A.2d 831</citation>, <citation data-cite="14 A.L.R. 2d 620" data-index="2" href="/citations/?q=14%20A.L.R.%202d%20620">14 A.L.R.2d 620</citation>.</p>
</footnote>

When we ran the harvard_merge command to update opinions and metadata, it also fixed the footnotes (regenerating the tags and links), and that may be what is breaking the update_cap_html_with_cl_xml function.

Here is how we fixed the footnotes to be linked correctly: https://github.com/freelawproject/courtlistener/blob/main/cl/corpus_importer/management/commands/harvard_merge.py#L518

This is the XML of the cluster I mentioned above:

<?xml version="1.0" encoding="utf-8"?><opinion type="majority"><author id="b174-23"> HOOD, Chief Judge. </author><p id="b174-24"> This appeal is by a husband from an order dismissing his complaint seeking a divorce on the ground of five years voluntary separation. </p><p id="b174-25"> The facts, as found by the trial court, are these. A child was born out of wedlock to the parties in April of 1955. In November of that year the parties were legally married, but separated eight days later and have not lived together since that time. Prior to and at the time of the marriage the parties agreed that the purpose of the marriage was to give the child a legal name and that if they were not satisfied with the marriage a divorce could be obtained. </p><p id="b174-26"> The trial court denied the divorce on the ground that the agreement of the parties prior to and at the time of marriage was collusive and contrary to law. </p><p id="b175-4"><span citation-index="1" class="star-pagination" label="139"> *139 </span> The parties, as the court found, were legally married. Although a marriage is entered into solely for the purpose of legitimizing a child born out of wedlock, such a marriage is a valid one. <a class="footnote" href="#fn1" id="fn1_ref"> 1 </a> The court also found that the parties had lived separate and apart for more than five years. The court did not expressly state that the separation was voluntary, but that is implicit in its finding, and there is no intimation in the record that the separation was other than voluntary. Under our law proof of a valid marriage and five years voluntary separation entitles either party to a divorce. <a class="footnote" href="#fn2" id="fn2_ref"> 2 </a> The sole question is whether the agreement of the parties at the time of marriage bars granting of the divorce. </p><p id="b175-5"> The agreement did not constitute collusion in a legal sense. 
In general it may be said that collusion, in the law of divorce, implies a corrupt agreement by which evidence is fabricated or suppressed in an attempt to deceive the court and obtain a divorce where legal grounds do not exist. Such was not the case here, but the trial court apparently was of the opinion that an agreement before marriage that if the marriage was unsatisfactory the parties could and would separate and thereafter obtain a divorce, was collusive in nature and contrary to law. </p><p id="b175-6"> When our divorce law was amended in 1935 to include five years voluntary separation as a ground for divorce, it made possible that parties to a marriage could put an end to the marriage by their own voluntary action and after the required period either party could have the marriage legally dissolved. In such a dissolution proceeding there is no question of the innocence or guilt of either party and the reason for the separation is not material. The only issue is the existence of the voluntary separation for the required time. </p><p id="b175-7"> The result is that an agreement by the parties prior to entering marriage that they may voluntarily separate, end the marriage and be divorced, is nothing more than a recognition of the rights given them by law. Such an agreement cannot be said to-be contrary to law. </p><p id="b175-11"> Reversed with instructions to award appellant a divorce. </p><div class="footnotes"><div class="footnote" id="fn1" label="1"><a class="footnote" href="#fn1_ref"> 1 </a><p id="b175-8"> . Schibi v. Schibi, 136 Conn. 190, 69 A.2d 831, 14 A.L.R.2d 620. </p></div><div class="footnote" id="fn2" label="2"><a class="footnote" href="#fn2_ref"> 2 </a><p id="b175-24"> . Code 1961, 16-403. </p></div></div></opinion>
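One way to catch this kind of regression in the unit tests is to check that every footnote's back-link and body reference point at each other. The sketch below assumes the footnote markup in the XML above: `<a class="footnote" href="#fnN" id="fnN_ref">` in the body, and `<div class="footnote" id="fnN">` holding `<a class="footnote" href="#fnN_ref">`. `find_broken_footnote_links` is a hypothetical name, and the check does not cover the `<footnote>` markup in the updated output:

```python
import xml.etree.ElementTree as ET


def find_broken_footnote_links(xml_text: str) -> list[str]:
    """Report footnotes whose back-link or body reference is missing or wrong.

    Hypothetical check that assumes the CL XML footnote markup: body refs are
    <a class="footnote" href="#fnN" id="fnN_ref">, and each
    <div class="footnote" id="fnN"> starts with <a class="footnote" href="#fnN_ref">.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Body references are footnote anchors that carry an id (back-links do not).
    body_refs = {
        a.get("id"): a.get("href")
        for a in root.iter("a")
        if a.get("class") == "footnote" and a.get("id")
    }
    problems = []
    for div in root.iter("div"):
        if div.get("class") != "footnote":  # skips the "footnotes" wrapper
            continue
        fn_id = div.get("id")
        expected_ref = f"{fn_id}_ref"
        back = div.find("a")
        found = None if back is None else back.get("href")
        if found != f"#{expected_ref}":
            problems.append(f"{fn_id}: back-link is {found!r}, expected '#{expected_ref}'")
        if body_refs.get(expected_ref) != f"#{fn_id}":
            problems.append(f"{fn_id}: no body reference with id {expected_ref!r} pointing here")
    return problems
```

Run against a copy of the XML above where footnote 1's back-link is changed to `#fn2_ref`, it should report a single problem for `fn1`.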

Exception: If an error occurs during the processing of a crosswalk file,
it is caught and logged, but not re-raised.
"""
crosswalk_dir = "cl/search/crosswalks"
Member

We should be able to pass the directory where the json files are as an argument (https://github.com/freelawproject/courtlistener/blob/main/cl/search/management/commands/import_harvard_pdfs.py#L48)

Collaborator Author

fixed this
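Below is a sketch of such a directory option. It assumes the option is registered in the command's `add_arguments(self, parser)`, where Django passes an argparse-based parser. The `--crosswalk-dir` name and the `existing_dir` validator are hypothetical. The default mirrors the previously hard-coded `cl/search/crosswalks`:

```python
import argparse
from pathlib import Path


def existing_dir(value: str) -> Path:
    """argparse type: accept only a path to an existing directory."""
    path = Path(value)
    if not path.is_dir():
        raise argparse.ArgumentTypeError(f"{value!r} is not a directory")
    return path


def add_crosswalk_dir_argument(parser: argparse.ArgumentParser) -> None:
    # Hypothetical option name; in a Django command this call would live in
    # add_arguments(self, parser).
    parser.add_argument(
        "--crosswalk-dir",
        type=existing_dir,
        default="cl/search/crosswalks",
        help="Directory containing the crosswalk JSON files.",
    )
```

argparse also runs `type` on string defaults, so the default path is only checked when the option is omitted, and a missing directory surfaces as an ordinary usage error.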

desc="Processing crosswalks",
) as pbar:
with ThreadPoolExecutor(
max_workers=multiprocessing.cpu_count() * 2
Member

I'm not sure we should set max_workers this way, because it carries some risk of overloading the machine. Maybe we should pass the maximum number of workers as an argument. What do you think, @flooie?

Collaborator Author

Updated this to take an argument instead, with a default of 4 and a maximum of 16, to avoid this risk.
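The following sketches the bounded worker count. It assumes a `--max-workers` option (hypothetical name) with a default of 4 and a maximum of 16. `process_all` is a hypothetical wrapper that runs a per-item function on a `ThreadPoolExecutor` and, like the docstring quoted earlier, logs failures instead of re-raising them:

```python
import argparse
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)

DEFAULT_WORKERS = 4
MAX_WORKERS_LIMIT = 16


def bounded_workers(value: str) -> int:
    """argparse type: an int between 1 and MAX_WORKERS_LIMIT."""
    workers = int(value)  # a non-integer raises ValueError, which argparse reports
    if not 1 <= workers <= MAX_WORKERS_LIMIT:
        raise argparse.ArgumentTypeError(
            f"--max-workers must be between 1 and {MAX_WORKERS_LIMIT}"
        )
    return workers


def add_max_workers_argument(parser: argparse.ArgumentParser) -> None:
    # Hypothetical option name; the int default is not passed through `type`.
    parser.add_argument("--max-workers", type=bounded_workers, default=DEFAULT_WORKERS)


def process_all(items, worker, max_workers: int = DEFAULT_WORKERS) -> list:
    """Run worker(item) on a bounded pool; collect results in completion order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, item) for item in items]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception:
                # Log the failure and keep going rather than re-raising.
                logger.exception("Failed to process an item")
    return results
```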

opinions = Opinion.objects.filter(cluster=cl_cluster)

xml_data = []
for opinion in opinions:
Member

You can achieve the same thing with this:

for opinion in cl_cluster.sub_opinions.all():

jtmst (Collaborator Author) commented Oct 30, 2024

Taking a look at the footnote issue reported above. We might be able to just plug in the fix_footnotes logic as a fix. @flooie, you mentioned making some site-wide modifications to footnotes sometime soon; does that come into play here at all?
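If the fix_footnotes logic is plugged in, the core step is re-pairing each body reference with its footnote. The sketch below is hypothetical and is not the harvard_merge implementation linked earlier. It assumes the `<div class="footnote" id="fnN">` markup and that body references appear in the same order as the footnotes. That will not hold if one footnote is referenced more than once, and `zip` silently ignores any unmatched extras:

```python
import xml.etree.ElementTree as ET


def relink_footnotes(xml_text: str) -> str:
    """Rewrite footnote anchors so body refs and back-links point at each other.

    Hypothetical sketch: pairs the i-th body <a class="footnote"> with the
    i-th <div class="footnote" id="fnN"> purely by document order.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    footnote_divs = [d for d in root.iter("div") if d.get("class") == "footnote"]
    # Anything inside a footnote div is not a body reference.
    inside = {id(el) for d in footnote_divs for el in d.iter()}
    body_refs = [
        a for a in root.iter("a")
        if a.get("class") == "footnote" and id(a) not in inside
    ]
    for ref, div in zip(body_refs, footnote_divs):
        fn_id = div.get("id")
        ref.set("href", f"#{fn_id}")
        ref.set("id", f"{fn_id}_ref")
        back = div.find("a")
        if back is not None:
            back.set("href", f"#{fn_id}_ref")
    # encoding="unicode" returns a str without the XML declaration.
    return ET.tostring(root, encoding="unicode")
```

Callers that need the `<?xml ...?>` declaration would have to add it back.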

Labels
None yet
Projects
Status: 🏗 In progress
Status: 👀 In review
4 participants