Thursday, February 9, 2017

Kryo serialization

Results are about 1/4 smaller and serialization is more than 3x faster than with standard Java serialization.
Widely used (Spark, ...).
Kryo instances are not thread safe: use one instance per thread or a pool.

Serialization

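import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;

// Note: Kryo 5+ requires class registration by default
// (kryo.register(...) or kryo.setRegistrationRequired(false)).
Kryo kryo = new Kryo();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] bytes;
// Closing the Output flushes its internal buffer into bos.
// output.toBytes() would only return what is still in that buffer,
// which is incomplete once the object outgrows the buffer.
try (Output output = new Output(bos)) {
  kryo.writeClassAndObject(output, obj);
}
bytes = bos.toByteArray();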

Deserialization

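import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;

Kryo kryo = new Kryo();
Object obj;
// Reads the class information and the object written by writeClassAndObject.
try (Input input = new Input(bytes)) {
  obj = kryo.readClassAndObject(input);
}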
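
For comparison, a minimal sketch of the "standard Java" baseline that Kryo is measured against, using only java.io. The Point class and the class name JavaSerializationBaseline are hypothetical, made up for this example; no timings or sizes are claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Round trip with standard Java serialization (java.io only). */
final class JavaSerializationBaseline {

  /** Hypothetical payload type; any Serializable class would do. */
  static final class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }
  }

  static byte[] serialize(Object obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // Closing the ObjectOutputStream flushes everything into bos.
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  static Object deserialize(byte[] bytes) {
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return ois.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = serialize(new Point(3, 4));
    Point copy = (Point) deserialize(bytes);
    System.out.println(bytes.length + " bytes, x=" + copy.x + ", y=" + copy.y);
  }
}
```

Serializing the same object with both approaches and comparing bytes.length is a quick way to check the size difference on your own data.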

Sunday, February 5, 2017

PostgreSQL tuning

About PostgreSQL

Periodic maintenance

Conventional (disk based)


max_connections
Maximum number of concurrent connections; set it to the number of connections you plan for.

effective_cache_size
Expected memory available in the OS and PostgreSQL buffer caches. This is an estimate, not an allocation! It is used only by the PostgreSQL query planner to judge whether the plans it is considering would be expected to fit in RAM.
Setting effective_cache_size to 1/2 of total memory is a normal conservative setting; 3/4 of memory is more aggressive but still reasonable.

shared_buffers
Page caching.
25% of system RAM and no more than 8GB.

Alternative: higher values may perform much better if the working set fits comfortably inside shared_buffers while still leaving a generous amount of memory for other purposes. See http://rhaas.blogspot.co.at/2012/03/tuning-sharedbuffers-and-walbuffers.html

work_mem
Intermediate results: sorts, hash joins, materialized views, etc. Allocated per sort or hash operation, so a single query can use it several times.
4 to 64MB.
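
Because work_mem applies per sort or hash operation and not per server, the total can add up. A rough worst-case sketch, assuming every connection runs a query with a fixed number of such operations at once (the class name, method name and the opsPerQuery parameter are hypothetical, and parallel workers are ignored):

```java
/** Rough worst-case estimate of memory consumed through work_mem. */
final class WorkMemEstimate {

  /**
   * Upper bound in MB if every connection runs opsPerQuery sort/hash
   * operations at the same time, each using a full work_mem.
   * opsPerQuery is an assumption about the workload, not a setting.
   */
  static long worstCaseMb(int maxConnections, int workMemMb, int opsPerQuery) {
    return (long) maxConnections * workMemMb * opsPerQuery;
  }

  public static void main(String[] args) {
    // Example: 100 connections, work_mem = 16MB, 2 operations per query.
    System.out.println(worstCaseMb(100, 16, 2) + " MB"); // prints "3200 MB"
  }
}
```

If the result is close to the RAM left after shared_buffers, lower work_mem or max_connections.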

maintenance_work_mem
Maintenance operations like vacuuming and index creation.
5% of system RAM and no more than 512MB. 
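
The memory rules of thumb in these notes (shared_buffers, effective_cache_size, maintenance_work_mem) can be sketched as a small calculator. This only encodes the percentages and caps listed here; the class and method names are hypothetical and the output is a starting point, not a tuned configuration.

```java
/** Derives starting values from total RAM using the rules of thumb above. */
final class PgMemorySketch {
  static final long MB = 1024L * 1024L;
  static final long GB = 1024L * MB;

  /** 25% of system RAM, and no more than 8GB. */
  static long sharedBuffers(long ramBytes) {
    return Math.min(ramBytes / 4, 8 * GB);
  }

  /** 1/2 of RAM (conservative) or 3/4 of RAM (aggressive). */
  static long effectiveCacheSize(long ramBytes, boolean aggressive) {
    return aggressive ? ramBytes / 4 * 3 : ramBytes / 2;
  }

  /** 5% of system RAM, and no more than 512MB. */
  static long maintenanceWorkMem(long ramBytes) {
    return Math.min(ramBytes / 20, 512 * MB);
  }

  /** Renders the values as postgresql.conf lines, rounded down to whole MB. */
  static String render(long ramBytes) {
    return "shared_buffers = " + sharedBuffers(ramBytes) / MB + "MB\n"
        + "effective_cache_size = " + effectiveCacheSize(ramBytes, false) / MB + "MB\n"
        + "maintenance_work_mem = " + maintenanceWorkMem(ramBytes) / MB + "MB\n";
  }

  public static void main(String[] args) {
    // Example: a machine with 16GB of RAM.
    System.out.print(render(16 * GB));
  }
}
```

For 16GB this yields shared_buffers = 4096MB, effective_cache_size = 8192MB and maintenance_work_mem = 512MB.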

wal_buffers
No more than 16MB. 

checkpoint_segments
Removed in 9.5; replaced by max_wal_size.
Higher values give better performance on bulk data loads.

checkpoint_completion_target
0.9 will decrease the performance impact of checkpointing on a busy system.

checkpoint_timeout
Increasing checkpoint_timeout from 5 minutes to a larger value, such as 15 minutes, can reduce the I/O load on your system, especially when using large values for shared_buffers.

random_page_cost and seq_page_cost
Planner parameters.
Estimates of the relative cost of a random page fetch (disk seek) versus a sequential one.
random_page_cost = 2 and seq_page_cost = 1 (0.1 for both values if the DB completely fits in memory).

huge_pages

In-memory