Some improvements for linking in the scala version (#52)
* Improve linking for the Between operator. Use "Unknown" as the default value for the Type property.

* Fix some tests.

* Read the Between indicators from a file.

* Call toMap in betweenIndicators.

* Handle numbers that are too long in Value.

* When converting to Int in AnaforaReader, catch NumberFormatException and throw AnaforaReader.Exception.
EgoLaparra authored Dec 13, 2019
1 parent e638b91 commit bcdbbc4
Showing 6 changed files with 163 additions and 78 deletions.
@@ -0,0 +1,4 @@
+ Start from
+ Start since
+ End to
+ End until
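A minimal sketch of how lines in this `key phrase` layout could be grouped into a map from `Start`/`End` to the phrases that signal each side of a Between operator. The object name `BetweenIndicatorsSketch` and the method `parse` are hypothetical and exist only for illustration; the loader in this commit reads the lines from a bundled resource instead of taking them as an argument.

```scala
// Hypothetical, self-contained sketch; not the project's actual loader.
object BetweenIndicatorsSketch {
  // Each line holds a key ("Start" or "End") and a single-token phrase, separated by one space.
  def parse(lines: Seq[String]): Map[String, IndexedSeq[String]] =
    lines
      .map(_.split(' '))
      // Assumption: malformed lines are skipped here rather than raising a MatchError.
      .collect { case Array(key, phrase) => (key, phrase) }
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).toIndexedSeq }

  def main(args: Array[String]): Unit = {
    val indicators = parse(Seq("Start from", "Start since", "End to", "End until"))
    println(indicators("Start"))
    println(indicators("End"))
  }
}
```

Using `collect` instead of `map` is a design choice of the sketch: a line with more or fewer than two tokens is dropped silently, whereas a plain pattern-matching `map` would fail on it.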
@@ -1,9 +1,8 @@
AMPM-Of-Day AM AM
AMPM-Of-Day AM am
AMPM-Of-Day AM a.m.
AMPM-Of-Day PM PM
AMPM-Of-Day PM pm
AMPM-Of-Day PM p.m.
Calendar-Interval Century century
Calendar-Interval Day Tomorrow
Calendar-Interval Day daily
Calendar-Interval Day day
Calendar-Interval Day days
@@ -19,67 +18,79 @@ Calendar-Interval Week week
Calendar-Interval Year annual
Calendar-Interval Year year
Calendar-Interval Year years
Day-Of-Week Friday Friday
Day-Of-Week Monday Monday
Day-Of-Week Saturday Saturday
Day-Of-Week Sunday Sunday
Day-Of-Week Thursday Thursday
Day-Of-Week Tuesday Tuesday
Day-Of-Week Wednesday Wednesday
Modifier Approx almost
Modifier Approx around
Modifier Approx nearly
Modifier End end
Modifier End late
Modifier Fiscal financial
Modifier Less-Than almost
Modifier Less-Than less_than
Modifier Less-Than no_more_than
Modifier Mid mid
Modifier Mid middle
Modifier More-Than at_least
Modifier More-Than more_than
Modifier More-Than over
Modifier Start early
Calendar-Interval Hour hour
Calendar-Interval Hour hours
Calendar-Interval Hour hrs
Calendar-Interval Minute minute
Calendar-Interval Minute minutes
Calendar-Interval Minute min
Calendar-Interval Second second
Calendar-Interval Second seconds
Calendar-Interval Second sec
Day-Of-Week Friday friday
Day-Of-Week Monday monday
Day-Of-Week Saturday saturday
Day-Of-Week Sunday sunday
Day-Of-Week Thursday thursday
Day-Of-Week Tuesday tuesday
Day-Of-Week Wednesday wednesday
Month-Of-Year April 04
Month-Of-Year April Apr.
Month-Of-Year April April
Month-Of-Year April 4
Month-Of-Year April apr.
Month-Of-Year April apr
Month-Of-Year April april
Month-Of-Year August 08
Month-Of-Year August Aug.
Month-Of-Year August August
Month-Of-Year August 8
Month-Of-Year August aug.
Month-Of-Year August aug
Month-Of-Year August august
Month-Of-Year December 12
Month-Of-Year December Dec.
Month-Of-Year December December
Month-Of-Year December dec.
Month-Of-Year December dec
Month-Of-Year December december
Month-Of-Year February 02
Month-Of-Year February Feb.
Month-Of-Year February February
Month-Of-Year February 2
Month-Of-Year February feb.
Month-Of-Year February feb
Month-Of-Year February february
Month-Of-Year January 01
Month-Of-Year January JANUARY
Month-Of-Year January Jan.
Month-Of-Year January January
Month-Of-Year January 1
Month-Of-Year January jan.
Month-Of-Year January jan
Month-Of-Year January january
Month-Of-Year July 07
Month-Of-Year July Jul.
Month-Of-Year July July
Month-Of-Year July 7
Month-Of-Year July jul.
Month-Of-Year July jul
Month-Of-Year July july
Month-Of-Year June 06
Month-Of-Year June Jun.
Month-Of-Year June June
Month-Of-Year June 6
Month-Of-Year June jun.
Month-Of-Year June jun
Month-Of-Year June june
Month-Of-Year March 03
Month-Of-Year March Mar.
Month-Of-Year March March
Month-Of-Year March 3
Month-Of-Year March mar.
Month-Of-Year March mar
Month-Of-Year March march
Month-Of-Year May 05
Month-Of-Year May May
Month-Of-Year May 5
Month-Of-Year May may
Month-Of-Year November 11
Month-Of-Year November Nov.
Month-Of-Year November November
Month-Of-Year November nov.
Month-Of-Year November nov
Month-Of-Year November november
Month-Of-Year October 10
Month-Of-Year October Oct.
Month-Of-Year October October
Month-Of-Year October oct.
Month-Of-Year October oct
Month-Of-Year October october
Month-Of-Year September 09
Month-Of-Year September Sept.
Month-Of-Year September September
Month-Of-Year September 9
Month-Of-Year September sep.
Month-Of-Year September sep
Month-Of-Year September september
Part-Of-Day Afternoon afternoon
Part-Of-Day Evening evening
Part-Of-Day Morning Morning
Part-Of-Day Morning morning
Part-Of-Day Night night
Part-Of-Day Night nights
@@ -93,21 +104,17 @@ Period Decades decade
Period Decades decades
Period Hours hour
Period Hours hours
Period Hours hrs
Period Minutes minute
Period Minutes minutes
Period Minutes min
Period Seconds second
Period Seconds seconds
Period Seconds sec
Period Months month
Period Months months
Period Unknown Not_that_long
Period Unknown fairly_lengthy_period
Period Unknown moment
Period Unknown periods
Period Unknown point
Period Unknown some_time
Period Unknown time
Period Weeks week
Period Weeks weeks
Period Months month
Period Months months
Period Years year
Period Years years
Season-Of-Year Fall fall
22 changes: 15 additions & 7 deletions src/main/scala/org/clulab/timenorm/scate/Readers.scala
@@ -46,6 +46,14 @@ class AnaforaReader(val DCT: Interval)(implicit data: Data) {
}
}

+ private def intValue(value: String): Int = {
+   try {
+     value.toInt
+   } catch {
+     case _: NumberFormatException => throw new AnaforaReader.Exception(s"expected numeric Value, found $value")
+   }
+ }

def number(entity: Entity)(implicit data: Data): Number = entity.properties("Value") match {
case "?" => VagueNumber(entity.text.getOrElse(""), Some(entity.fullSpan))
case "" => throw new AnaforaReader.Exception(s"""cannot parse number from "${entity.text} and ${entity.xml}""")
@@ -58,7 +66,7 @@ class AnaforaReader(val DCT: Interval)(implicit data: Data) {
val g = gcd(numerator, denominator)
FractionalNumber(number, numerator / g, denominator / g, Some(entity.fullSpan))
} else if (value.forall(_.isDigit)) {
- IntNumber(value.toInt, Some(entity.fullSpan))
+ IntNumber(intValue(value), Some(entity.fullSpan))
} else {
VagueNumber(value, Some(entity.fullSpan))
}
@@ -150,10 +158,10 @@ class AnaforaReader(val DCT: Interval)(implicit data: Data) {
val result = (entity.`type`, valueOption, periods, repeatingIntervals, numbers, semantics) match {
case ("Event", None, N, N, N, None) => Event(entity.text.getOrElse(""), charSpan)
case ("Year", Some(value), N, N, N, None) => value.partition(_ != '?') match {
- case (year, questionMarks) => Year(year.toInt, questionMarks.length, charSpan)
+ case (year, questionMarks) => Year(intValue(year), questionMarks.length, charSpan)
}
case ("Two-Digit-Year", Some(value), N, N, N, None) => value.partition(_ != '?') match {
- case (year, questionMarks) => YearSuffix(interval(properties), year.toInt, year.length, questionMarks.length, charSpan)
+ case (year, questionMarks) => YearSuffix(interval(properties), intValue(year), year.length, questionMarks.length, charSpan)
}
case ("Between", None, N, N, N, None) => Between(
interval(properties, "Start-"),
@@ -184,9 +192,9 @@ class AnaforaReader(val DCT: Interval)(implicit data: Data) {
case ("After", None, N, Seq(rInterval), N, Inc) => AfterRI(interval(properties), rInterval, from = Interval.Start, triggerCharSpan = charSpan)
case ("After", None, N, Seq(rInterval), Seq(number), Exc) => AfterRI(interval(properties), rInterval, number, triggerCharSpan = charSpan)
case ("After", None, N, Seq(rInterval), Seq(number), Inc) => AfterRI(interval(properties), rInterval, number, from = Interval.Start, triggerCharSpan = charSpan)
- case ("NthFromStart", Some(value), N, N, N, None) => NthP(interval(properties), value.toInt, UnknownPeriod(), triggerCharSpan = charSpan)
- case ("NthFromStart", Some(value), Seq(period), N, N, None) => NthP(interval(properties), value.toInt, period, triggerCharSpan = charSpan)
- case ("NthFromStart", Some(value), N, Seq(rInterval), N, None) => NthRI(interval(properties), value.toInt, rInterval, triggerCharSpan = charSpan)
+ case ("NthFromStart", Some(value), N, N, N, None) => NthP(interval(properties), intValue(value), UnknownPeriod(), triggerCharSpan = charSpan)
+ case ("NthFromStart", Some(value), Seq(period), N, N, None) => NthP(interval(properties), intValue(value), period, triggerCharSpan = charSpan)
+ case ("NthFromStart", Some(value), N, Seq(rInterval), N, None) => NthRI(interval(properties), intValue(value), rInterval, triggerCharSpan = charSpan)
case ("Intersection", None, N, N, N, None) => IntersectionI(entity.properties.getEntities("Intervals").map(interval), charSpan)
case _ => throw new AnaforaReader.Exception(
s"""cannot parse Interval from "${entity.text}" and ${entity.descendants.map(_.xml)}""")
@@ -217,7 +225,7 @@ class AnaforaReader(val DCT: Interval)(implicit data: Data) {
case ("Last", None, N, Seq(rInterval), Seq(number), N) => LastRIs(interval(entity.properties), rInterval, number, triggerCharSpan = charSpan)
case ("Next", None, N, Seq(rInterval), Seq(number), N) => NextRIs(interval(entity.properties), rInterval, number, triggerCharSpan = charSpan)
case ("NthFromStart", Some(value), N, Seq(rInterval), Seq(number), N) =>
- NthRIs(interval(entity.properties), value.toInt, rInterval, number, triggerCharSpan = charSpan)
+ NthRIs(interval(entity.properties), intValue(value), rInterval, number, triggerCharSpan = charSpan)
case _ => throw new AnaforaReader.Exception(
s"""cannot parse Intervals from "${entity.text}" and ${entity.descendants.map(_.xml)}""")
}
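The `intValue` helper introduced in this file matters because a string made only of digits can still overflow `Int`, so a check such as `value.forall(_.isDigit)` does not make `toInt` safe. Below is a minimal standalone sketch of the same wrapping pattern. `ReaderException` is a hypothetical stand-in for `AnaforaReader.Exception`, whose definition is not shown in this diff.

```scala
object IntValueSketch {
  // Hypothetical stand-in for AnaforaReader.Exception (its real definition is not part of this diff).
  final class ReaderException(message: String) extends Exception(message)

  // Convert to Int, translating the low-level NumberFormatException into a reader-level error.
  def intValue(value: String): Int =
    try value.toInt
    catch {
      case _: NumberFormatException =>
        throw new ReaderException(s"expected numeric Value, found $value")
    }

  def main(args: Array[String]): Unit = {
    println(intValue("2019"))
    // Only digits, but larger than Int.MaxValue, so the conversion fails.
    println(scala.util.Try(intValue("99999999999")).isFailure)
  }
}
```

Callers that already handle the reader's own exception type then need no separate handling for malformed or oversized numbers.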
@@ -13,7 +13,7 @@ import scala.collection.immutable.ListMap
import scala.collection.mutable
import scala.io.Source
import scala.language.postfixOps
- import scala.xml.{XML, Elem}
+ import scala.xml.{Elem, XML}


object TemporalNeuralParser {
@@ -113,6 +113,13 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
}.toIndexedSeq.groupBy(_._1).mapValues(_.map(_._2).toMap).toMap
}

+ // textual indicators that help to filter links for Between operators
+ lazy private val betweenIndicators: Map[String, IndexedSeq[String]] = {
+   resourceLines("/org/clulab/timenorm/linking_configure/between-indicators.txt").map(_.split(' ')).map{
+     case Array(key, string) => (key, string)
+   }.toIndexedSeq.groupBy(_._1).mapValues(_.map(_._2)).toMap
+ }

lazy private val operatorToPropertyToTypes: Map[String, Map[String, Set[String]]] = {
val path = "/org/clulab/timenorm/linking_configure/timenorm-schema.xml"
val xml = XML.load(this.getClass.getResourceAsStream(path))
@@ -156,12 +163,12 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
}

def parseBatchToXML(text: String, spans: Array[(Int, Int)]): Array[Elem] = {
- val antixmlCleanedText = """[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]+""".r.replaceAllIn(text, " ")
+ val antixmlCleanedText = """[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]""".r.replaceAllIn(text, " ")

val allTimeSpans = identifyBatch(text, spans)
val timeSpanToId = allTimeSpans.flatten.zipWithIndex.toMap.mapValues(i => s"$i@id")
for (timeSpans <- allTimeSpans) yield {
- val entityElems = for ((timeSpan, links) <- timeSpans zip inferLinks(timeSpans)) yield {
+ val entityElems = for ((timeSpan, links) <- timeSpans zip inferLinks(text, timeSpans)) yield {
val id = timeSpanToId(timeSpan)
val (start, end, timeType) = timeSpan
val timeText = antixmlCleanedText.substring(start, end)
@@ -246,7 +253,16 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
}
}

- def inferLinks(timeSpans: Array[(Int, Int, String)]): Array[Array[(String, Int)]] = {
+ def filterBetween(property: String, text: String, source: (Int, Int, String), target: (Int, Int, String)): Boolean = {
+   val source_text = text.slice(source._1, source._2)
+   property match {
+     case "End-Interval" if source._1 > target._1 || betweenIndicators.getOrElse("Start", IndexedSeq()).contains(source_text) => false
+     case "Start-Interval" if source._1 < target._1 && betweenIndicators.getOrElse("End", IndexedSeq()).contains(source_text) => false
+     case _ => true
+   }
+ }
+
+ def inferLinks(text: String, timeSpans: Array[(Int, Int, String)]): Array[Array[(String, Int)]] = {
val links = Array.fill(timeSpans.length)(mutable.ArrayBuffer.empty[(String, Int)])
val ancestors = Array.fill(timeSpans.length)(mutable.Set.empty[Int])
val descendants = Array.fill(timeSpans.length)(mutable.Set.empty[Int])
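The `filterBetween` rule in this hunk can be tried in isolation. The sketch below restates it with the indicator map passed in explicitly; the names `FilterBetweenSketch`, `Span` and `keepLink` are hypothetical, and the example sentence and its offsets are made up for illustration.

```scala
object FilterBetweenSketch {
  // (start offset, end offset, entity type), as in the timeSpans tuples of this diff.
  type Span = (Int, Int, String)

  // Returns false when the candidate link from source to target should be dropped.
  def keepLink(property: String, text: String, source: Span, target: Span,
               indicators: Map[String, IndexedSeq[String]]): Boolean = {
    val sourceText = text.slice(source._1, source._2)
    def signals(side: String): Boolean =
      indicators.getOrElse(side, IndexedSeq.empty).contains(sourceText)
    property match {
      // An End-Interval link is dropped if the operator follows the target or reads like a Start cue.
      case "End-Interval" if source._1 > target._1 || signals("Start") => false
      // A Start-Interval link is dropped if the operator precedes the target and reads like an End cue.
      case "Start-Interval" if source._1 < target._1 && signals("End") => false
      case _ => true
    }
  }

  def main(args: Array[String]): Unit = {
    val indicators = Map("Start" -> IndexedSeq("from", "since"), "End" -> IndexedSeq("to", "until"))
    val text = "from Monday until Friday"
    val from: Span = (0, 4, "Between")
    val monday: Span = (5, 11, "Day-Of-Week")
    println(keepLink("Start-Interval", text, from, monday, indicators))
    println(keepLink("End-Interval", text, from, monday, indicators))
  }
}
```

Note the asymmetry carried over from the hunk: the End-Interval guard combines its two conditions with `||`, while the Start-Interval guard uses `&&`.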
@@ -265,8 +281,8 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
(source, target) <- Seq((s, i), (i, s))
sourceType = timeSpans(source)._3
targetType = timeSpans(target)._3
- propertyAllowedValues <- this.operatorToPropertyToTypes.get(sourceType).toSeq
- (propertyName, allowedValues) <- propertyAllowedValues
+ propertyAllowedValues <- this.operatorToPropertyToTypes.get(sourceType)
+ (propertyName, allowedValues) <- propertyAllowedValues.filterKeys(filterBetween(_, text, timeSpans(source), timeSpans(target)))
// the slot should be valid according to the schema
if allowedValues contains targetType
// the slot should not already be full
@@ -292,10 +308,14 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
private def inferProperties(timeText: String, timeType: String, links: Array[(String, Int)]): Array[(String, String)] = {
val propertyOptions = for (propertyType <- this.operatorToPropertyToTypes(timeType).keys) yield propertyType match {
case "Type" =>
- val p = this.operatorToTextToType.get(timeType).flatMap(_.get(timeText)).getOrElse(timeText)
+ // The "Type" value for decades (70s, 80s, ...) is set as "Years".
+ // It is converted into "Year" for "Calendar-Interval"; "Period" keeps "Years". E.g. "He is in his 70s".
+ val timeTextNoDecades = """^[0-9]{2}s?$""".r.replaceAllIn(timeText, "Years")
+ val p = this.operatorToTextToType.get(timeType).flatMap(_.get(timeTextNoDecades.toLowerCase)).getOrElse("Unknown")
(timeType, p.last) match {
case ("Calendar-Interval", 's') => Some((propertyType, p.dropRight(1)))
case ("Period", l) if p != "Unknown" && l != 's' => Some((propertyType, p + "s"))
case ("Frequency", _) => Some((propertyType, "Other"))
case _ => Some((propertyType, p))
}
case "Value" =>
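The `Type` inference in this hunk can be sketched on its own as follows. The lexicon here is a hypothetical two-operator miniature of the text-to-type resource, and `TypePropertySketch` and `typeValue` are illustrative names. The decade pattern is written as `^[0-9]{2}s?$`, which also matches a bare two-digit string such as "70".

```scala
object TypePropertySketch {
  // Hypothetical miniature lexicon: operator type -> (lower-cased text -> Type value).
  private val lexicon: Map[String, Map[String, String]] = Map(
    "Calendar-Interval" -> Map("years" -> "Year", "week" -> "Week"),
    "Period" -> Map("years" -> "Years", "weeks" -> "Weeks")
  )

  private val decade = """^[0-9]{2}s?$""".r

  def typeValue(timeType: String, timeText: String): String = {
    // Decades such as "70s" are looked up as if the text were "years".
    val normalized = decade.replaceAllIn(timeText, "Years").toLowerCase
    // Text missing from the lexicon falls back to "Unknown" rather than to the raw text.
    val p = lexicon.get(timeType).flatMap(_.get(normalized)).getOrElse("Unknown")
    (timeType, p.last) match {
      case ("Calendar-Interval", 's') => p.dropRight(1)             // singular for calendar intervals
      case ("Period", last) if p != "Unknown" && last != 's' => p + "s" // plural for periods
      case ("Frequency", _) => "Other"
      case _ => p
    }
  }

  def main(args: Array[String]): Unit = {
    println(typeValue("Period", "70s"))
    println(typeValue("Calendar-Interval", "80s"))
    println(typeValue("Frequency", "often"))
  }
}
```

Falling back to "Unknown" keeps arbitrary surface text out of the `Type` property, at the cost of losing the original wording for unrecognised phrases.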
@@ -304,7 +324,10 @@ class TemporalNeuralParser(modelStream: Option[InputStream] = None) extends Auto
try {
Some(cleanedText.toLong)
} catch {
- case _: NumberFormatException => textToNumber(cleanedText.split("""[\s-]+"""))
+ case _: NumberFormatException => """^\d+$""".r.findFirstIn(cleanedText) match {
+   case Some(_) => None
+   case None => textToNumber(cleanedText.split("""[\s-]+"""))
+ }
}
Some((propertyType, valueOption.map(_.toString).getOrElse(timeText)))
case intervalType if intervalType contains "Interval-Type" =>
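The `Value` handling above distinguishes two reasons why `toLong` can fail: the text is a digit string too long for a `Long`, or the text is not numeric at all. A minimal sketch of that fallback chain follows. `wordsToNumber` is a hypothetical placeholder for the parser's `textToNumber`, which is not shown in this diff, and it only knows three words.

```scala
object ValueSketch {
  // Hypothetical placeholder for textToNumber; handles a single known number word only.
  private val words = Map("one" -> 1L, "two" -> 2L, "three" -> 3L)
  def wordsToNumber(tokens: Array[String]): Option[Long] =
    if (tokens.length == 1) words.get(tokens.head) else None

  // Returns the normalized Value string, or the original text when no number can be recovered.
  def value(text: String): String = {
    val parsed: Option[Long] =
      try Some(text.toLong)
      catch {
        case _: NumberFormatException =>
          // Digits only, yet toLong failed: the number is too long, so skip the word parser.
          if (text.nonEmpty && text.forall(c => c >= '0' && c <= '9')) None
          else wordsToNumber(text.split("""[\s-]+"""))
      }
    parsed.map(_.toString).getOrElse(text)
  }

  def main(args: Array[String]): Unit = {
    println(value("42"))
    println(value("two"))
    println(value("123456789012345678901234567890"))
  }
}
```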
10 changes: 8 additions & 2 deletions src/test/scala/org/clulab/timenorm/scate/EvaluateLinker.scala
@@ -1,10 +1,11 @@
package org.clulab.timenorm.scate

- import java.nio.file.{Path, Paths}
+ import java.nio.file.{Files, Path, Paths}

import scala.xml.Elem
import org.clulab.anafora.{Anafora, Data, Entity}


object EvaluateLinker {

private def linkedEntityInfo(inRoots: Array[Path], exclude: Set[String]): Array[(Entity, String, Entity, Int)] = {
@@ -36,6 +37,10 @@ object EvaluateLinker {
distances.toSeq.groupBy(_._1).mapValues(_.map(_._2).groupBy(identity).mapValues(_.length).toMap).toMap
}

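+ private def textContent(path: String): String = {
+   // Decode the bytes as text (UTF-8 is an assumed encoding); mkString on a byte array would join the numeric byte values instead.
+   new String(Files.readAllBytes(Paths.get(path)), java.nio.charset.StandardCharsets.UTF_8)
+ }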

def evaluateLinker(inRoots: Array[Path], verbose: Boolean = false): (Int, Int, Int) = {
val parser = new TemporalNeuralParser()
val results = for {
@@ -44,7 +49,8 @@ object EvaluateLinker {
data = Data.fromPaths(xmlPath)
entities = data.entities.sortBy(_.fullSpan._1)
timeSpans = entities.map(e => (e.fullSpan._1, e.fullSpan._2, e.`type`))
- (i, links) <- entities.indices zip parser.inferLinks(timeSpans.toArray)
+ text = textContent(xmlPath.toString.replace(".TimeNorm.gold.completed.xml", ""))
+ (i, links) <- entities.indices zip parser.inferLinks(text, timeSpans.toArray)
} yield {
val goldProperties = entities(i).properties.xml.child.collect{
case elem: Elem if elem.text.contains("@") => (elem.label, elem.text)