Update CL XML and head_matter fields with data from CAP #4614
@flooie, to you for triage, analysis, or both! :)
I found a problem. I was testing the command with this cluster: https://www.courtlistener.com/opinion/1539264/go/ (https://static.case.law/a2d/191/html/0138-01.html) and saw that the resulting XML removed the link from the first footnote. The first footnote in the updated XML also has a wrong link:
<footnote data-label="1" id="footnote_1_1">
<footnote citation-index="1" href="#fn2_ref" label="139">1</footnote>
<p data-blocks="[["BL_175.11",175,[157,2661,695,68]]]" id="b175-8">. Schibi v. Schibi, <citation data-cite="136 Conn. 190" data-index="0" href="/citations/?q=136%20Conn.%20190">136 Conn. 190</citation>, <citation data-cite="69 A.2d 831" data-index="1" href="/citations/?q=69%20A.2d%20831">69 A.2d 831</citation>, <citation data-cite="14 A.L.R. 2d 620" data-index="2" href="/citations/?q=14%20A.L.R.%202d%20620">14 A.L.R.2d 620</citation>.</p>
</footnote>
When we ran the Harvard merger command to update opinions and metadata, it also fixed the footnotes (regenerated the tag and link), and that may be breaking the update_cap_html_with_cl_xml function.
Here is how we fixed the footnotes to be linked correctly: https://github.com/freelawproject/courtlistener/blob/main/cl/corpus_importer/management/commands/harvard_merge.py#L518
This is the XML of the cluster I mentioned above:
<?xml version="1.0" encoding="utf-8"?><opinion type="majority"><author id="b174-23"> HOOD, Chief Judge. </author><p id="b174-24"> This appeal is by a husband from an order dismissing his complaint seeking a divorce on the ground of five years voluntary separation. </p><p id="b174-25"> The facts, as found by the trial court, are these. A child was born out of wedlock to the parties in April of 1955. In November of that year the parties were legally married, but separated eight days later and have not lived together since that time. Prior to and at the time of the marriage the parties agreed that the purpose of the marriage was to give the child a legal name and that if they were not satisfied with the marriage a divorce could be obtained. </p><p id="b174-26"> The trial court denied the divorce on the ground that the agreement of the parties prior to and at the time of marriage was collusive and contrary to law. </p><p id="b175-4"><span citation-index="1" class="star-pagination" label="139"> *139 </span> The parties, as the court found, were legally married. Although a marriage is entered into solely for the purpose of legitimizing a child born out of wedlock, such a marriage is a valid one. <a class="footnote" href="#fn1" id="fn1_ref"> 1 </a> The court also found that the parties had lived separate and apart for more than five years. The court did not expressly state that the separation was voluntary, but that is implicit in its finding, and there is no intimation in the record that the separation was other than voluntary. Under our law proof of a valid marriage and five years voluntary separation entitles either party to a divorce. <a class="footnote" href="#fn2" id="fn2_ref"> 2 </a> The sole question is whether the agreement of the parties at the time of marriage bars granting of the divorce. </p><p id="b175-5"> The agreement did not constitute collusion in a legal sense. 
In general it may be said that collusion, in the law of divorce, implies a corrupt agreement by which evidence is fabricated or suppressed in an attempt to deceive the court and obtain a divorce where legal grounds do not exist. Such was not the case here, but the trial court apparently was of the opinion that an agreement before marriage that if the marriage was unsatisfactory the parties could and would separate and thereafter obtain a divorce, was collusive in nature and contrary to law. </p><p id="b175-6"> When our divorce law was amended in 1935 to include five years voluntary separation as a ground for divorce, it made possible that parties to a marriage could put an end to the marriage by their own voluntary action and after the required period either party could have the marriage legally dissolved. In such a dissolution proceeding there is no question of the innocence or guilt of either party and the reason for the separation is not material. The only issue is the existence of the voluntary separation for the required time. </p><p id="b175-7"> The result is that an agreement by the parties prior to entering marriage that they may voluntarily separate, end the marriage and be divorced, is nothing more than a recognition of the rights given them by law. Such an agreement cannot be said to-be contrary to law. </p><p id="b175-11"> Reversed with instructions to award appellant a divorce. </p><div class="footnotes"><div class="footnote" id="fn1" label="1"><a class="footnote" href="#fn1_ref"> 1 </a><p id="b175-8"> . Schibi v. Schibi, 136 Conn. 190, 69 A.2d 831, 14 A.L.R.2d 620. </p></div><div class="footnote" id="fn2" label="2"><a class="footnote" href="#fn2_ref"> 2 </a><p id="b175-24"> . Code 1961, 16-403. </p></div></div></opinion>
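One way to catch this kind of regression in a unit test is to check that every footnote anchor points at an id that actually exists in the document. The following is only a minimal sketch under stated assumptions: `find_broken_footnote_links` is a hypothetical helper, not part of this PR or of harvard_merge.py, and the sample markup is a trimmed imitation of the structure shown above (`a.footnote` anchors whose `href` refers to `fnN` or `fnN_ref` ids).

```python
import xml.etree.ElementTree as ET


def find_broken_footnote_links(xml_text: str) -> list[str]:
    """Describe every footnote anchor whose href target id is missing.

    Hypothetical helper for illustration; it only inspects <a class="footnote">
    elements and does not modify the document.
    """
    root = ET.fromstring(xml_text)
    # Collect every id present anywhere in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    problems = []
    for anchor in root.iter("a"):
        if anchor.get("class") != "footnote":
            continue
        href = anchor.get("href", "")
        # A footnote link is valid only if it is "#<id>" and that id exists.
        if not href.startswith("#") or href[1:] not in ids:
            problems.append(f"dangling footnote link: {href!r}")
    return problems


# Trimmed sample in the same shape as the cluster XML above (assumed structure).
GOOD = (
    '<opinion type="majority">'
    '<p id="b1">Text <a class="footnote" href="#fn1" id="fn1_ref">1</a></p>'
    '<div class="footnotes"><div class="footnote" id="fn1" label="1">'
    '<a class="footnote" href="#fn1_ref">1</a><p id="b2">Note.</p>'
    "</div></div></opinion>"
)
# Same document, but footnote 1 points back at a reference that does not exist,
# similar to the wrong "#fn2_ref" link reported above.
BAD = GOOD.replace('href="#fn1_ref"', 'href="#fn2_ref"')
```

Running the helper on the command's output for a known cluster would flag a back-link such as `#fn2_ref` when no element carries that id. It does not detect a footnote whose link was removed entirely; that would need a separate count of anchors before and after the update.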
Exception: If an error occurs during the processing of a crosswalk file,
it is caught and logged, but not re-raised.
"""
crosswalk_dir = "cl/search/crosswalks"
We should be able to pass the directory containing the JSON files as an argument (see https://github.com/freelawproject/courtlistener/blob/main/cl/search/management/commands/import_harvard_pdfs.py#L48).
fixed this
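For readers following along, the suggestion amounts to something like the sketch below. It uses plain `argparse` (which Django management commands build on) so it runs on its own; the flag name `--crosswalk-dir` and both function names are assumptions for illustration, not necessarily what the PR ended up using. The default mirrors the hard-coded path from the diff.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a configurable crosswalk directory."""
    parser = argparse.ArgumentParser(description="Update CL cases from CAP")
    parser.add_argument(
        "--crosswalk-dir",  # hypothetical flag name
        type=Path,
        default=Path("cl/search/crosswalks"),  # previous hard-coded value
        help="Directory containing the generated crosswalk JSON files",
    )
    return parser


def list_crosswalk_files(directory: Path) -> list[Path]:
    """Return the JSON files in the directory, sorted for a stable order."""
    return sorted(directory.glob("*.json"))
```

In a real management command the same `add_argument` call would go inside `add_arguments(self, parser)`, and the value would be read from `options["crosswalk_dir"]`.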
desc="Processing crosswalks",
) as pbar:
with ThreadPoolExecutor(
max_workers=multiprocessing.cpu_count() * 2
I'm not sure we should set max_workers this way, because it carries some risk of overloading. Maybe we should pass the maximum number of workers as an argument. What do you think, @flooie?
Updated this to take an arg instead with a default of 4 and a max of 16 to avoid this risk
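A rough sketch of what such a bounded option could look like follows. The default of 4 and the maximum of 16 come from the comment above; everything else is assumed: the flag name `--max-workers`, the helper names, and the choice to reject out-of-range values rather than silently clamp them. The error handling follows the docstring in the diff, where failures are recorded and not re-raised.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed

DEFAULT_WORKERS = 4  # default mentioned in the review reply
MAX_WORKERS = 16  # upper bound mentioned in the review reply


def bounded_workers(value: str) -> int:
    """argparse type: accept an integer between 1 and MAX_WORKERS."""
    workers = int(value)
    if not 1 <= workers <= MAX_WORKERS:
        raise argparse.ArgumentTypeError(
            f"must be between 1 and {MAX_WORKERS}, got {workers}"
        )
    return workers


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--max-workers",  # hypothetical flag name
        type=bounded_workers,
        default=DEFAULT_WORKERS,
    )
    return parser


def run_all(items, func, max_workers):
    """Apply func to each item on a bounded thread pool.

    Failures are collected instead of re-raised, so one bad item
    does not stop the rest of the batch.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, item): item for item in items}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                errors.append((futures[future], exc))
    return results, errors
```

Compared with `multiprocessing.cpu_count() * 2`, a fixed default keeps the load predictable regardless of the host, while the option still lets an operator raise it on a machine that can take more.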
opinions = Opinion.objects.filter(cluster=cl_cluster)

xml_data = []
for opinion in opinions:
You can achieve the same thing with this:
for opinion in cl_cluster.sub_opinions.all():
Taking a look at this. Might be able to just plug in the fix_footnotes logic as a fix. @flooie, you had mentioned making some site-wide modifications to footnotes sometime soon. Does that come into play here at all?
Description
This PR introduces a new management command, update_cap_cases, along with corresponding unit tests. The command is designed to update CourtListener (CL) cases with the latest data from the Caselaw Access Project (CAP).

Key Changes
update_cap_cases.py: the management command
test_update_cap_cases.py: unit tests

Testing
Unit tests have been added for core functionality in the new command.

Note
Crosswalk files must be generated with the generate_capcrosswalk.py command before this script will work.