Compressed Big Decimal

Overview

Compressed Big Decimal is an extension that provides a mutable big decimal value, which can be used to accumulate values without losing precision or reallocating memory. The type supports exact-precision arithmetic on large numbers in applications that require a high level of accuracy, such as financial applications and currency-based transactions. This avoids rounding issues in which potentially large amounts of money can be lost.

Accumulation requires that the two numbers have the same scale, but does not require that they are of the same size. If the value being accumulated has a larger underlying array than this value (the result), the higher-order bits are dropped, similar to what happens when adding a long to an int and storing the result in an int. A compressed big decimal holds its data in an embedded array.
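The long-to-int analogy above can be sketched in plain Java. This is only an illustration of the narrowing behaviour the text compares against, not code from the extension; the class and method names (NarrowingDemo, accumulate) are hypothetical.

```java
public class NarrowingDemo {
    // Adds a long to an int accumulator and stores the result back in an int.
    // The cast keeps only the low-order 32 bits of the sum.
    static int accumulate(int accumulator, long value) {
        return (int) (accumulator + value);
    }

    public static void main(String[] args) {
        // The sum fits in 32 bits, so the result is exact.
        System.out.println(accumulate(1, 41L)); // 42
        // 2^32 + 5 needs more than 32 bits, so the high-order bits are dropped.
        System.out.println(accumulate(0, (1L << 32) + 5L)); // 5
    }
}
```

The second call shows why the target of an accumulation should be sized for the largest total it is expected to hold.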

Compressed big decimal is an absolute-number-based complex type built on Java's BigDecimal, and it supports all the functionality that Java's BigDecimal supports. Java's BigDecimal is immutable, so every arithmetic operation allocates a new object, which can cause significant garbage collection overhead during accumulation. Compressed big decimal is mutable so that the value held in the accumulator can be updated in place.
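As a minimal sketch of the immutability described above, the following plain Java uses only java.math.BigDecimal and is not the extension's own code. The class and method names (ImmutabilityDemo, accumulate) are hypothetical.

```java
import java.math.BigDecimal;

public class ImmutabilityDemo {
    // BigDecimal.add() never modifies the receiver; it returns a new object
    // that holds the sum, so each accumulation step allocates.
    static BigDecimal accumulate(BigDecimal total, String value) {
        return total.add(new BigDecimal(value));
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("10.000000000");
        BigDecimal b = accumulate(a, "5.000000005");
        System.out.println(a);      // 10.000000000 (unchanged)
        System.out.println(b);      // 15.000000005 (a separate object)
        System.out.println(a == b); // false
    }
}
```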

Main enhancements provided by this extension:

  1. Functionality: Mutating Big decimal type with greater precision
  2. Accuracy: Provides greater level of accuracy in decimal arithmetic

Operations

To use this extension, make sure to add compressed-big-decimal to the extensions load list in your config file.

Configuration

There are currently no configuration properties specific to Compressed Big Decimal.

Limitations

  • Compressed Big Decimal does not produce a correct result when the value being accumulated has a larger underlying array than this value (the result). In that case the higher-order bits are dropped, similar to what happens when adding a long to an int and storing the result in an int.

Ingestion Spec:

| property | description | required? |
| --- | --- | --- |
| metricsSpec | Metrics specification. When specifying metric details such as name and type in the metrics specification, the type should be specified as compressedBigDecimal. | yes |

Query spec:

  • Most properties in the query spec are derived from the groupBy and timeseries queries; see the documentation for these query types.

| property | description | required? |
| --- | --- | --- |
| queryType | This String should always be either "groupBy" or "timeseries"; this is the first thing Druid looks at to figure out how to interpret the query. | yes |
| dataSource | A String or Object defining the data source to query, very similar to a table in a relational database. See DataSource for more information. | yes |
| dimensions | A JSON list of DimensionSpec (note that this property is optional). | no |
| limitSpec | See LimitSpec. | no |
| having | See Having. | no |
| granularity | A period granularity; see Period Granularities. | yes |
| filter | See Filters. | no |
| aggregations | Aggregations form the input to Averagers; see Aggregations. For the compressedBigDecimal type, each aggregation must specify type, scale and size as follows: "aggregations": [{"type": "compressedBigDecimal","name": "..","fieldName": "..","scale": [Numeric],"size": [Numeric]}]. Refer to the query examples in the Examples section. | yes |
| postAggregations | Supports only aggregations as input; see Post Aggregations. | no |
| intervals | A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over. | yes |
| context | An additional JSON Object which can be used to specify certain flags. | no |

Examples

Consider the following data:

Date, Item, SaleAmount
20201208,ItemA,0.0
20201208,ItemB,10.000000000
20201208,ItemA,-1.000000000
20201208,ItemC,9999999999.000000000
20201208,ItemB,5000000000.000000005
20201208,ItemA,2.0
20201208,ItemD,0.0

IngestionSpec syntax:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "invoices",
      "timestampSpec": {
        "column": "timestamp",
        "format": "yyyyMMdd"
      },
      "dimensionsSpec": {
        "dimensions": [{
          "type": "string",
          "name": "itemName"
        }]
      },
      "metricsSpec": [{
        "name": "saleAmount",
        "type": "compressedBigDecimal",
        "fieldName": "saleAmount"
      }],
      "transformSpec": {
        "filter": null,
        "transforms": []
      },
      "granularitySpec": {
        "type": "uniform",
        "rollup": false,
        "segmentGranularity": "DAY",
        "queryGranularity": "none",
        "intervals": ["2020-12-08/2020-12-09"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/home/user/sales/data/staging/invoice-data",
        "filter": "invoice-001.20201208.txt"
      },
      "inputFormat": {
        "type": "tsv",
        "delimiter": ",",
        "skipHeaderRows": 0,
        "columns": [
          "timestamp",
          "itemName",
          "saleAmount"
        ]
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}

Group By Query example

Calculate total sales across all items with a groupBy query.

Query syntax:

{
  "queryType": "groupBy",
  "dataSource": "invoices",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [
    {
      "type": "compressedBigDecimal",
      "name": "revenue",
      "fieldName": "saleAmount",
      "scale": 9,
      "size": 3
    }
  ],
  "intervals": [
    "2020-12-08T00:00:00.000Z/P1D"
  ]
}

Result:

[ {
  "version" : "v1",
  "timestamp" : "2020-12-08T00:00:00.000Z",
  "event" : {
    "revenue" : 15000000010.000000005
  }
} ]

Had you used doubleSum instead of compressedBigDecimal, the result would be:

[ {
  "timestamp" : "2020-12-08T00:00:00.000Z",
  "result" : {
    "revenue" : 1.500000001E10
  }
} ]

As shown above, doubleSum loses the fractional part (0.000000005) of the total, and this kind of precision loss could lead to a loss of money.
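The same effect can be reproduced outside Druid. The sketch below sums the sample saleAmount values once as doubles and once with java.math.BigDecimal. It is an illustration in plain Java, not the extension's implementation, and the class and method names (PrecisionDemo, doubleSum, decimalSum) are hypothetical.

```java
import java.math.BigDecimal;

public class PrecisionDemo {
    // The saleAmount values from the sample data, kept as strings so that
    // no precision is lost before parsing.
    static final String[] SALE_AMOUNTS = {
        "0.0", "10.000000000", "-1.000000000", "9999999999.000000000",
        "5000000000.000000005", "2.0", "0.0"
    };

    // Sums the values as 64-bit floating point numbers.
    static double doubleSum(String[] values) {
        double total = 0.0;
        for (String v : values) {
            total += Double.parseDouble(v);
        }
        return total;
    }

    // Sums the values exactly with arbitrary-precision decimals.
    static BigDecimal decimalSum(String[] values) {
        BigDecimal total = BigDecimal.ZERO;
        for (String v : values) {
            total = total.add(new BigDecimal(v));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(doubleSum(SALE_AMOUNTS));                  // 1.500000001E10
        System.out.println(decimalSum(SALE_AMOUNTS).toPlainString()); // 15000000010.000000005
    }
}
```

A double carries roughly 15 to 17 significant decimal digits, so at a magnitude of about 1.5E10 it cannot represent a difference of 0.000000005.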

TimeSeries Query Example

Query syntax:

{
  "queryType": "timeseries",
  "dataSource": "invoices",
  "granularity": "ALL",
  "aggregations": [
    {
      "type": "compressedBigDecimal",
      "name": "revenue",
      "fieldName": "saleAmount",
      "scale": 9,
      "size": 3
    }
  ],
  "filter": {
    "type": "not",
    "field": {
      "type": "selector",
      "dimension": "itemName",
      "value": "ItemD"
    }
  },
  "intervals": [
    "2020-12-08T00:00:00.000Z/P1D"
  ]
}

Result:

[ {
  "timestamp" : "2020-12-08T00:00:00.000Z",
  "result" : {
    "revenue" : 15000000010.000000005
  }
} ]