Release 7.020
cnuernber committed Oct 29, 2023
1 parent 80d8b09 commit 2baeeff
Showing 40 changed files with 88 additions and 81 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,11 @@
 # Changelog
+# 7.020
+* hamf perf upgrades.
+* big perf upgrade for parsing sequences of maps.
+
+# 7.019
+* hamf perf upgrades.
+
 # 7.018
 * hamf perf upgrades.

4 changes: 2 additions & 2 deletions deps.edn
@@ -1,5 +1,5 @@
 {:paths ["src" "resources" "target/classes"]
- :deps {cnuernber/dtype-next {:mvn/version "10.106"}
+ :deps {cnuernber/dtype-next {:mvn/version "10.107"}
         techascent/tech.io {:mvn/version "4.31"
                             :exclusions [org.apache.commons/commons-compress]}
         org.apache.datasketches/datasketches-java {:mvn/version "4.2.0"}}
@@ -12,7 +12,7 @@
            :exec-fn codox.main/-main
            :exec-args {:group-id "techascent"
                        :artifact-id "tech.ml.dataset"
-                       :version "7.019"
+                       :version "7.020"
                        :name "TMD"
                        :description "A Clojure high performance data processing system"
                        :metadata {:doc/format :markdown}
2 changes: 1 addition & 1 deletion docs/000-getting-started.html
2 changes: 1 addition & 1 deletion docs/100-walkthrough.html
2 changes: 1 addition & 1 deletion docs/200-quick-reference.html
2 changes: 1 addition & 1 deletion docs/columns-readers-and-datatypes.html
4 changes: 2 additions & 2 deletions docs/index.html
2 changes: 1 addition & 1 deletion docs/nippy-serialization-rocks.html
2 changes: 1 addition & 1 deletion docs/supported-datatypes.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.categorical.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.clipboard.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.column-filters.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.column.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.io.csv.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.io.datetime.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.io.string-row-parser.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.io.univocity.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.join.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.math.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.metamorph.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.modelling.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.neanderthal.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.print.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.reductions.apache-data-sketch.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.reductions.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.rolling.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.set.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.tensor.html
2 changes: 1 addition & 1 deletion docs/tech.v3.dataset.zip.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.arrow.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.fastexcel.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.guava.cache.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.parquet.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.poi.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.smile.data.html
2 changes: 1 addition & 1 deletion docs/tech.v3.libs.tribuo.html

Large diffs are not rendered by default for the docs/ files above.

10 changes: 5 additions & 5 deletions src/tech/v3/dataset/impl/column_data_process.clj
@@ -114,11 +114,11 @@
                #:tech.v3.dataset{:force-datatype? true})))
     (ds-proto/is-column? obj-data)
     (let [cm (meta obj-data)]
-      #:tech.v3.datatype{:name (:name cm)
-                         :missing (ds-proto/missing obj-data)
-                         :force-datatype? true
-                         :data (ds-proto/column-buffer obj-data)
-                         :metadata cm})
+      #:tech.v3.dataset{:name (:name cm)
+                        :missing (ds-proto/missing obj-data)
+                        :force-datatype? true
+                        :data (ds-proto/column-buffer obj-data)
+                        :metadata cm})
     :else
     (scan-data obj-data nil)))
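
The hunk above only changes the namespace prefix of a map literal, from `tech.v3.datatype` to `tech.v3.dataset`. As a minimal sketch of why that matters (the map names `col-map`, `same-map`, and `wrong-map` are hypothetical, and the code that consumes the map is not shown in this diff, so the assumption is that it reads `:tech.v3.dataset/...` keys like the context line above it):

```clojure
;; Namespace-map literal: the prefix qualifies every unqualified key.
(def col-map
  #:tech.v3.dataset{:name :a
                    :force-datatype? true})

;; Equivalent map with each key written out fully qualified.
(def same-map
  {:tech.v3.dataset/name :a
   :tech.v3.dataset/force-datatype? true})

(= col-map same-map) ;; => true

;; With a different prefix, a lookup on the expected key finds nothing.
(def wrong-map #:tech.v3.datatype{:name :a})

(:tech.v3.dataset/name wrong-map) ;; => nil
```

Under that assumption, a column passed through the old code path would have lost its name, missing set, and metadata, because none of the keys matched.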

74 changes: 37 additions & 37 deletions src/tech/v3/dataset/io/column_parsers.clj
@@ -478,6 +478,7 @@
     ^{:unsynchronized-mutable true} missing-value
     ^RoaringBitmap missing
     column-name
+    ^:unsynchronized-mutable ^long last-idx
     ^:unsynchronized-mutable ^long max-idx
     options]
   dtype-proto/PECount
@@ -492,44 +493,42 @@
       (.get container idx))))
   PParser
   (addValue [_p idx value]
-    (set! max-idx (max idx max-idx))
+    (set! max-idx idx)
     (when-not (missing-value? value)
-      (let [org-datatype (dtype/datatype value)
-            ;;Avoid the pack call if possible
-            packed-dtype (if (identical? container-dtype org-datatype)
-                           org-datatype
-                           (packing/pack-datatype org-datatype))
-            container-ecount (- (.size container) (.getCardinality missing))]
-        (if (or (== 0 container-ecount)
-                (identical? container-dtype packed-dtype))
-          (do
-            (when (== 0 container-ecount)
-              (set! container (column-base/make-container packed-dtype options))
-              (set! container-dtype packed-dtype)
-              (set! missing-value (column-base/datatype->missing-value packed-dtype)))
-            (when-not (== container-ecount idx)
-              (add-missing-values! container missing missing-value idx))
-            (.add container value))
-          ;;boolean present a problem here. We generally want to keep them as booleans
-          ;;and not promote them to full numbers.
-          (let [widest-datatype (if (identical? org-datatype :boolean)
-                                  (if (identical? container-dtype :boolean)
-                                    :boolean
-                                    :object)
-                                  (casting/widest-datatype
-                                   (packing/unpack-datatype container-dtype)
-                                   org-datatype))]
-            (when-not (= widest-datatype container-dtype)
-              (let [new-container (promote-container container
-                                                     missing widest-datatype
-                                                     options)]
-                (set! container new-container)
-                (set! container-dtype widest-datatype)
-                (set! missing-value (column-base/datatype->missing-value
-                                     widest-datatype))))
-            (when-not (== container-ecount idx)
-              (add-missing-values! container missing missing-value idx))
-            (.add container value))))))
+      (let [val-dtype (fast-dtype value)]
+        ;;setup container for new data
+        (when-not (identical? container-dtype val-dtype)
+          (let [;;Avoid the pack call if possible
+                packed-dtype (packing/pack-datatype val-dtype)
+                container-ecount (.size container)
+                logical-ecount (- container-ecount (.getCardinality missing))]
+            ;;Setup container
+            (if (== 0 logical-ecount)
+              (do
+                (set! container (column-base/make-container packed-dtype options))
+                (set! container-dtype val-dtype)
+                (set! missing-value (column-base/datatype->missing-value packed-dtype)))
+              ;;boolean present a problem here. We generally want to keep them as booleans
+              ;;and not promote them to full numbers.
+              (let [widest-datatype (if (identical? val-dtype :boolean)
+                                      (if (identical? container-dtype :boolean)
+                                        :boolean
+                                        :object)
+                                      (casting/widest-datatype
+                                       (packing/unpack-datatype container-dtype)
+                                       val-dtype))]
+                (when-not (= widest-datatype container-dtype)
+                  (let [new-container (promote-container container
+                                                         missing widest-datatype
+                                                         options)]
+                    (set! container new-container)
+                    (set! container-dtype widest-datatype)
+                    (set! missing-value (column-base/datatype->missing-value
+                                         widest-datatype))))))))
+        (when (> (- idx last-idx) 1)
+          (add-missing-values! container missing missing-value idx))
+        (set! last-idx idx)
+        (.add container value))))
   (finalize [_p rowcount]
     (finalize-parser-data! container missing nil nil
                            missing-value rowcount)))
@@ -543,4 +542,5 @@
     (bitmap/->bitmap)
     column-name
     -1
+    -1
     options))
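
The rewritten `addValue` detects skipped rows by comparing `idx` with a tracked `last-idx`, instead of recomputing the logical element count on every call. The gap-detection idea can be sketched in plain Clojure as below. This is an illustration only: `add-value` and `init` are hypothetical names, the state is an immutable map rather than the mutable parser fields, and the behaviour of `add-missing-values!` is assumed (it is not shown in this diff) to pad the container up to `idx` and record the padded rows as missing.

```clojure
(defn add-value
  "Hypothetical stand-in for the parser's addValue.
  state holds :data (vector), :missing (set of row indexes), :last-idx (long)."
  [{:keys [data missing last-idx] :as state} idx value missing-value]
  (let [;; rows strictly between the last written row and idx were skipped
        gap (range (inc last-idx) idx)]
    (assoc state
           ;; pad the skipped rows, then append the new value
           :data (conj (into data (repeat (count gap) missing-value)) value)
           ;; remember which rows were padded
           :missing (into missing gap)
           :last-idx idx)))

;; last-idx starts at -1, matching the constructor argument added in this commit
(def init {:data [] :missing #{} :last-idx -1})

(-> init
    (add-value 0 1 nil)
    (add-value 3 4 nil))
;; => {:data [1 nil nil 4], :missing #{1 2}, :last-idx 3}
```

Starting `last-idx` at -1 means the first row at index 0 produces a difference of exactly 1, so no padding happens for a dense sequence.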
2 changes: 1 addition & 1 deletion src/tech/v3/dataset/string_table.clj
@@ -62,7 +62,7 @@
   (add [this value]
     (errors/when-not-errorf
      (or (nil? value) (instance? String value))
-     "Value added to string table is not a string: %s" value)
+     "Value added to string table is not a string: %s" (type value))
     (let [value (or value "")
           item-idx (int (.computeIfAbsent
                          str->int
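
The change above reports the type of the offending value instead of the value itself. A minimal sketch of the same check, using only `clojure.core` (the helper name `check-string` is hypothetical, and `ex-info` stands in for `errors/when-not-errorf`, whose exact behaviour is not shown in this diff):

```clojure
(defn check-string
  "Returns value (or \"\" for nil) if it is acceptable for a string table,
  otherwise throws with the value's type in the message."
  [value]
  (when-not (or (nil? value) (instance? String value))
    (throw (ex-info (format "Value added to string table is not a string: %s"
                            (type value))
                    {:type (type value)})))
  (or value ""))

(check-string "abc") ;; => "abc"
(check-string nil)   ;; => ""
;; (check-string 42) throws; on the JVM the message ends with
;; "class java.lang.Long"
```

Reporting the type keeps the message short even when the rejected value is a large collection, and it names the class that needs converting.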