---
Title: TimescaleDB vs. InfluxDB: Purpose-Built for Time-Series Data
Author: Timescale Blog
Tags: readwise, articles
date: 2024-01-30
---
# TimescaleDB vs. InfluxDB: Purpose-Built for Time-Series Data

![rw-book-cover](https://blog.timescale.com/content/images/2019/06/Screen-Shot-2019-06-19-at-3.02.10-PM.png)

URL:: https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/
Author:: Timescale Blog

## AI-Generated Summary
An in-depth look into how two leading time-series databases stack up in terms of data model, query language, reliability, performance, ecosystem, operational management, and company/community support.

## Highlights
> Typically database comparisons focus on performance benchmarks. Yet performance is just a part of the overall picture.
> It doesn't matter how well a database performs in benchmarks if it lacks the data model, query language, or reliability required for your production workloads. With that in mind, we begin by comparing TimescaleDB and InfluxDB across three qualitative dimensions, *data model, query language, and reliability,* before diving deeper with *performance benchmarks*. We then round out with a comparison with *database ecosystem, operational management, and company/community support*. ([View Highlight](https://read.readwise.io/read/01fdjwzfh87veaza1780jhdq1y))

> Yes, we are the developers of TimescaleDB, so you might quickly disregard our comparison as biased. But if you let the analysis speak for itself, you’ll find that we stay objective. ([View Highlight](https://read.readwise.io/read/01fdjwzkd5qeb59yx2nstzwavx))

> When it comes to data models, TimescaleDB and InfluxDB have two very different opinions: TimescaleDB is a relational database, while InfluxDB is more of a custom, NoSQL, non-relational database. What this means is that TimescaleDB relies on the *relational data model*, commonly found in PostgreSQL, MySQL, SQL Server, Oracle, etc. On the other hand, InfluxDB has developed its own custom data model, which, for the purpose of this comparison, we’ll call the *tagset data model*. ([View Highlight](https://read.readwise.io/read/01fdjx0437daqpj702115th5vg))

> The [*relational data model*](https://en.wikipedia.org/wiki/Relational_model) has been in use for several decades now.
> With the [relational model in TimescaleDB](https://docs.timescale.com/latest/introduction/data-model?utm_source=timescale-influx-benchmark&utm_medium=blog&utm_campaign=july-2020-advocacy&utm_content=docs-data-model), each time-series measurement is recorded in its own row, with a time field followed by any number of other fields, which can be floats, ints, strings, booleans, arrays, JSON blobs, [geospatial dimensions](http://postgis.net/), date/time/timestamps, currencies, binary data, or even [more complex data types](https://www.postgresql.org/docs/current/static/datatype.html). One can create indexes on any one field (standard indexes) or multiple fields (composite indexes), or on expressions like functions, or even limit an index to a subset of rows (partial index). Any of these fields can be used as a *foreign key* to secondary tables, which can then store additional metadata. ([View Highlight](https://read.readwise.io/read/01fdjx139vjxmvsdgd5vycm4np))

> The advantage of this approach is that it is quite flexible. One can choose to have:
> • A narrow or wide table, depending on how much data and metadata to record per reading
> • Many indexes to speed up queries or few indexes to reduce disk usage
> • Denormalized metadata within the measurement row, or normalized metadata that lives in a separate table, either of which can be updated at any time (although it is easier to update in the latter case)
> • A rigid schema that validates input types or a schemaless JSON blob to increase iteration speed
> • Check constraints that validate inputs, for example checking for uniqueness or non-null values ([View Highlight](https://read.readwise.io/read/01fdjx7jnndb0enbtve5n1bkfh))

> With the InfluxDB [*tagset data model*](https://docs.influxdata.com/influxdb/v1.8/concepts/schema_and_data_layout/), each measurement has a timestamp, and an associated set of tags (tagset) and set of fields (fieldset).
> The fieldset represents the actual measurement reading values, while the tagset represents the metadata to describe the measurements.
> Field data types are limited to floats, ints, strings, and booleans, and cannot be changed without rewriting the data. ([View Highlight](https://read.readwise.io/read/01fdjx91wqe0trhjjrkv06xc4z))

> The advantage of this approach is that if one’s data naturally fits the tagset model, then it is quite easy to get started, as one doesn’t have to worry about creating schemas or indexes.
> Conversely, the disadvantage of this model is that it is quite rigid and limited, with no ability to create additional indexes, indexes on continuous fields (e.g., numerics), update metadata after the fact, enforce data validation, etc. ([View Highlight](https://read.readwise.io/read/01fdjx9agph8a7y32qx3bn4vap))

> It’s also possible to create relational schema that are equivalent to the tagset model for specific use cases, such as [Prometheus](https://github.com/prometheus/prometheus) metrics. For more on this, see the [Timescale-Prometheus GitHub repository](https://github.com/timescale/timescale-prometheus). ([View Highlight](https://read.readwise.io/read/01fdjx9tvg6sy055wrgbqs922r))

> For most use cases, we believe that SQL is the right query language for a time-series database. SQL has a rich tradition and history, including familiarity among millions of developers and a vibrant ecosystem of tutorials, training, and community leaders. In short, choosing SQL means you’re never alone. ([View Highlight](https://read.readwise.io/read/01fdjxbcndsv4nf57279xq23pj))

> This is also why [SQL is making a comeback](https://blog.timescale.com/blog/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/?utm_source=timescale-influx-benchmark&utm_medium=blog&utm_campaign=july-2020-advocacy&utm_content=why-sql-v-nosql-blog) as the query language of choice for data infrastructure in general.
> Indeed, SQL is well-documented and is the [third-most commonly used programming language among developers](https://insights.stackoverflow.com/survey/2020#most-popular-technologies). ([View Highlight](https://read.readwise.io/read/01fdjxbnje6d9esdp0hyb1y9fa))

> At its start, InfluxDB sought to completely [write an entire database in Go](https://blog.gopheracademy.com/birthday-bash-2014/why-influxdb-uses-go/). ([View Highlight](https://read.readwise.io/read/01fdjxcqa2wbjrnd1vzbe4cn45))

> These design decisions have significant implications that affect reliability. First, InfluxDB has to implement the full suite of fault-tolerance mechanisms, including replication, high availability, and backup/restore. Second, InfluxDB is responsible for its on-disk reliability, e.g., to make sure all its data structures are both durable and resist data corruption across failures (and even failures during the recovery of failures). ([View Highlight](https://read.readwise.io/read/01fdjxd34dc5p872c3gwayxcsk))

> We made a dramatically different architectural decision when building TimescaleDB: build on PostgreSQL. ([View Highlight](https://read.readwise.io/read/01fdjxdarr0exvr6rak6sjv3nh))

> databases have been built with an array of mechanisms to further reduce such risk, including streaming replication to replicas, full-snapshot backup and recovery, streaming backups, robust data export tools, etc. ([View Highlight](https://read.readwise.io/read/01fdjxdw8zckymfkpyh2vsyqct))

> InfluxDB, on the other hand, has had to build all these tools from scratch. ([View Highlight](https://read.readwise.io/read/01fdjxeacy3ttf5sft8dvga6gj))

> InfluxDB had to design and implement all recovery, reliability and durability functionality from scratch. [This is a notoriously hard problem in databases](https://dl.acm.org/citation.cfm?id=128770) that typically takes many years or even decades to get correct.
> ([View Highlight](https://read.readwise.io/read/01fdjxf6nyj62hrcs9hftxax0q))

> **Insert performance summary**
> • For workloads with extremely low cardinality, the databases are comparable, with InfluxDB outperforming Timescale.
> • As cardinality increases, InfluxDB insert performance drops off dramatically faster than that with TimescaleDB.
> • For workloads with high cardinality, TimescaleDB has ~3.5x the insert performance as InfluxDB.
> • If your insert performance is far below these benchmarks (e.g., if it is 2,000 rows / second), then insert performance will not be your bottleneck. ([View Highlight](https://read.readwise.io/read/01fdjxgjs10kdxgtvye5hn6ykb))

> **Read latency performance summary**
> • For simple queries, TimescaleDB generally outperforms InfluxDB.
> • For aggregates and double roll ups, TimescaleDB also generally outperforms InfluxDB. However, when simply rolling up just a single metric, InfluxDB can sometimes outperform TimescaleDB.
> • When selecting rows based on a threshold, TimescaleDB outperforms InfluxDB by a significant margin, being up to 414% faster.
> • For complex queries, TimescaleDB vastly outperforms InfluxDB, and supports a broader range of query types; the difference here is often in the range of seconds to tens of seconds, with Timescale 344-7100% the performance improvement over InfluxDB. ([View Highlight](https://read.readwise.io/read/01fdjxh2j0e9pyq4x9jg5p3hat))

> Stability issues during benchmarking ([View Highlight](https://read.readwise.io/read/01fdjxhpjrzntrqg9xe5nsf0qa))

> While we were able to insert batches of 10K into InfluxDB at lower cardinalities, once we got to 100k devices we would experience timeouts and errors with batch sizes that large. The most common errors were write errors caused by exceeding the maximum cache memory size, timeouts and fatal out of memory errors, which all occurred during runtime.
> ([View Highlight](https://read.readwise.io/read/01fdjxhv380tqgn72gpfx1naw8))

> High-cardinality datasets are a significant weakness for InfluxDB. This is because of how the InfluxDB developers have architected their system, starting with their Time-series Index (TSI). ([View Highlight](https://read.readwise.io/read/01fdjxk7gtm1h3mjdnkg22q6tt))

> In contrast, TimescaleDB is a relational database that relies on a proven data structure for indexing data: the B-tree. This decision leads to its ability to scale to high cardinalities. ([View Highlight](https://read.readwise.io/read/01fdjxmcxczwswq6w4c3j5rffw))

> Having a broad ecosystem makes deployment easier. For example, if one is already using [Tableau](https://docs.timescale.com/latest/tutorials/visualizing-time-series-data-in-tableau?utm_source=timescale-influx-benchmark&utm_medium=blog&utm_campaign=july-2020-advocacy&utm_content=docs-tableau) to visualize data, or Apache Spark for data processing, TimescaleDB can plug right into the existing infrastructure due to its compatible connectors. ([View Highlight](https://read.readwise.io/read/01fdjxmxessth9jkjc2dc8ex4x))

> ![](https://blog.timescale.com/content/images/2020/08/database-ecosystem.jpg)
> Ecosystem Comparison: InfluxDB vs TimescaleDB ([View Highlight](https://read.readwise.io/read/01fdjxnh0x4fd097aqq1fc4f8f))

> While InfluxDB high availability is only offered by their paid [enterprise version](https://docs.influxdata.com/influxdb/v1.8/high_availability/#clustering-with-influxdb-enterprise-influxdb-v1-8-high-availability-clusters), TimescaleDB supports high availability for free in both its open source and Community editions, via PostgreSQL streaming replication (as explained in [this tutorial](https://blog.timescale.com/scalable-postgresql-high-availability-read-scalability-streaming-replication-fb95023e2af/?utm_source=timescale-influx-benchmark&utm_medium=blog&utm_campaign=july-2020-advocacy&utm_content=streaming-replication-blog)).
> This is yet another benefit that Timescale inherits as a result of the rock solid foundation of PostgreSQL. ([View Highlight](https://read.readwise.io/read/01fdjxnsxf3c9rx8e34c9zaq2h))

> Finally, when investing in an open source technology primarily developed by a company, you are implicitly also investing in that company’s ability to serve you, whether you’re a paying customer or not. With that in mind, let’s note the differences between Timescale and InfluxData, the companies behind TimescaleDB and InfluxDB. ([View Highlight](https://read.readwise.io/read/01fdjxpxzz3yye125j5y37n0wp))

> Both databases have cloud offerings. [Timescale Cloud](https://www.timescale.com/cloud?utm_source=timescale-influx-benchmark&utm_medium=blog&utm_campaign=july-2020-advocacy&utm_content=timescale-cloud-signup), Timescale’s hosted and managed service, is available on AWS, GCP and Azure, in over 75 regions and over 2000 different region/storage/compute configurations. By comparison, Influx Cloud, InfluxData’s hosted service, is available on all 3 major clouds, but only in 4 regions. ([View Highlight](https://read.readwise.io/read/01fdjxqd5e5vn02zsrsdqt8p0w))
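
The relational-model highlights mention composite indexes, partial indexes, foreign keys to metadata tables, and check constraints. A minimal sketch of those ideas follows. It uses Python's built-in `sqlite3` purely as a stand-in for PostgreSQL/TimescaleDB, since the highlights include no schema of their own. The table and column names (`devices`, `readings`, `cpu`, `location`) and the helper functions are hypothetical, and TimescaleDB-specific steps such as creating a hypertable are not shown.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL/TimescaleDB here.
SCHEMA = """
CREATE TABLE devices (
    device_id INTEGER PRIMARY KEY,
    location  TEXT NOT NULL            -- normalized metadata lives here
);
CREATE TABLE readings (
    time      TEXT    NOT NULL,        -- ISO-8601 timestamp
    device_id INTEGER NOT NULL REFERENCES devices(device_id),
    cpu       REAL    NOT NULL CHECK (cpu BETWEEN 0 AND 100)
);
-- Composite index on (device, time).
CREATE INDEX readings_device_time ON readings (device_id, time);
-- Partial index: only rows above a threshold are indexed.
CREATE INDEX readings_hot ON readings (time) WHERE cpu > 90;
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one device and two readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO devices VALUES (1, 'rack-a')")
    conn.executemany(
        "INSERT INTO readings (time, device_id, cpu) VALUES (?, ?, ?)",
        [("2020-08-01T00:00:00", 1, 42.0), ("2020-08-01T00:01:00", 1, 95.5)],
    )
    conn.commit()
    return conn


def hot_readings(conn: sqlite3.Connection) -> list:
    """Join readings to device metadata, selecting rows over a threshold."""
    return conn.execute(
        "SELECT r.time, d.location, r.cpu FROM readings r "
        "JOIN devices d USING (device_id) WHERE r.cpu > 90 ORDER BY r.time"
    ).fetchall()


def insert_rejected(conn: sqlite3.Connection, device_id: int, cpu: float) -> bool:
    """Return True if the check or foreign-key constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO readings (time, device_id, cpu) VALUES (?, ?, ?)",
            ("2020-08-01T00:02:00", device_id, cpu),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `hot_readings(build_demo())` returns the single reading above the threshold together with its device location, while `insert_rejected` reports `True` for an out-of-range `cpu` value or an unknown `device_id`. The tagset model described in the highlights has no equivalent of the partial index or the constraints in this sketch.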