Data has no format

Posted on May 11, 2016 · 1 min read · tagged with: #architecture #data #design

I need to be able to store 1GB of JSON

I’d like to push XML at 100 MB/s to this Azure blob

I need to log this data as CSV

Statements like these are sometimes true, but in the majority of cases the format is not a given. It is part of designing your architecture or application, or of redesigning it if needed. Selecting a proper format can lower the size of your data, which increases the throughput of your system when a medium like a disk or a network is saturated. That’s why systems like Apache Arrow or Google’s Dremel use their own formats. That’s why you may consider using protobuf-net serialization for EventStore, disabling its built-in V8 projections and lowering the size of events at the same time. For low-latency systems you can choose the new library, Simple Binary Encoding. That’s why sometimes storing data in another format is simply better. I’ve written a blog post, Do we really need all these data transformations, and it doesn’t contradict this one. It’s all about making a rational, proper choice of storage format, taking into consideration its different aspects and its influence on your system. With this one decision you might greatly improve your system’s performance.
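
To make the size argument a bit more tangible, here is a minimal sketch that encodes the same records twice: once as JSON and once as a fixed-width binary layout. The sample data (1000 made-up sensor readings), the field names and the record layout are all assumptions chosen for illustration. It is not how Arrow, Dremel, protobuf-net or Simple Binary Encoding lay out their bytes, only the general idea of dropping repeated field names and textual numbers.

```python
import json
import struct

# Hypothetical sample data: 1000 readings as (id, timestamp, value).
readings = [(i, 1462950000 + i, 20.0 + i * 0.25) for i in range(1000)]

# Option 1: JSON. Field names are repeated in every record and
# numbers are written out as text.
as_json = json.dumps(
    [{"id": i, "timestamp": ts, "value": v} for i, ts, v in readings]
).encode("utf-8")

# Option 2: an assumed fixed-width binary layout.
# "<" = little-endian without padding, I = uint32, Q = uint64, d = float64,
# which gives 4 + 8 + 8 = 20 bytes per record.
RECORD = struct.Struct("<IQd")
as_binary = b"".join(RECORD.pack(i, ts, v) for i, ts, v in readings)

# Decode the binary form again to show that no information was lost.
decoded = [
    RECORD.unpack_from(as_binary, offset)
    for offset in range(0, len(as_binary), RECORD.size)
]

print("JSON bytes:  ", len(as_json))
print("binary bytes:", len(as_binary))
print("round trip ok:", decoded == readings)
```

The binary variant is much smaller here, but the reader now has to know the layout up front, and you can no longer open the file in a text editor. That trade-off is exactly the kind of thing to weigh when the format is your decision to make.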


Comments

...so what weighs more: 1MB of JSON or 1MB of XML:)?

by pw at 2016-05-11 07:31:17 +0000