Tuesday, May 16, 2017

HP Vertica

  • HP columnar database. 
  • PostgreSQL heritage: the vsql client is derived from psql, but the columnar engine itself is not a PostgreSQL fork.
  • Updates and inserts (even batched) are slow; use bulk loading (COPY) instead.
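A minimal sketch of the bulk-loading route, assuming a hypothetical table staging.sales and a temporary file path (neither comes from the VM image). Rows are written to a delimited file and loaded with a single COPY statement instead of row-by-row inserts. The statement is only built as a string here; it would be run from vsql.

```python
import csv
import os
import tempfile

def write_rows(path, rows):
    """Write rows to a comma-delimited file suitable for COPY."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def copy_statement(table, path, delimiter=","):
    """Build a Vertica COPY statement that loads a client-side file."""
    return f"COPY {table} FROM LOCAL '{path}' DELIMITER '{delimiter}';"

# Hypothetical data and table name, for illustration only.
rows = [(1, "2017-05-16", 9.99), (2, "2017-05-16", 4.50)]
path = os.path.join(tempfile.gettempdir(), "sales.csv")
write_rows(path, rows)
statement = copy_statement("staging.sales", path)
print(statement)
```

FROM LOCAL reads the file from the client machine; dropping LOCAL makes the server read the path from its own filesystem.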
Tools
  • Binaries at /opt/vertica/bin
  • vsql
  • adminTools

VM credentials
dbadmin/password
root/password

Example Database
Scripts location: /opt/vertica/examples
Installation: ./install_example VMart 
Verification: launch vsql and run: select count(1) from store.store_sales_fact;

Peculiarities
Bulk loading
Tips
https://github.com/jackghm/Vertica/wiki/HP-Vertica-Tips,-Tricks,-and-Best-Practices

Monday, May 8, 2017

Notes on Apache Kafka


http://davewentzel.com/content/kafka-notes/

Clustering
Based on Zookeeper
https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper

Log compaction
A consumer receives at least the last message for each key.
http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
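The guarantee above can be illustrated with a toy, in-memory model. This is a sketch of the idea only, not Kafka's implementation: the (offset, key, value) tuples and the compact function are hypothetical. In a real broker the "at least" matters because the not-yet-compacted head of the log can still hold older records for a key.

```python
def compact(log):
    """Return the log with only the latest record kept for each key.

    `log` is a list of (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets and relative order.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    return sorted(latest.values())  # offsets are unique, so this sorts by offset

log = [
    (0, "user-1", "a@example.com"),
    (1, "user-2", "b@example.com"),
    (2, "user-1", "a2@example.com"),
    (3, "user-3", "c@example.com"),
    (4, "user-2", "b2@example.com"),
]
compacted = compact(log)
for record in compacted:
    print(record)
```

Offsets 0 and 1 disappear because newer records exist for user-1 and user-2; a consumer reading from the beginning still sees the final state of every key.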

JSON serialization
org.apache.kafka.connect.json.JsonConverter
Easier to set up
https://blog.knoldus.com/2017/01/30/kafka-sending-object-as-a-message/
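A minimal sketch of what JSON serialization of a message value amounts to, using only the Python standard library. JsonConverter itself is a Java class for Kafka Connect; the serialize and deserialize helpers and the sample event below are hypothetical stand-ins for the functions a client would plug into a producer and consumer.

```python
import json

def serialize(obj):
    """Encode a dict as compact UTF-8 JSON bytes (the message value)."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")

def deserialize(data):
    """Decode UTF-8 JSON bytes back into a dict."""
    return json.loads(data.decode("utf-8"))

event = {"user_id": 42, "action": "login"}  # hypothetical message
payload = serialize(event)
restored = deserialize(payload)
print(payload, restored)
```

No schema travels with the bytes, which is what makes this easy to set up and also why nothing stops a producer from changing the field layout.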

AVRO serialization
io.confluent.connect.avro.AvroConverter
Recommended: schema-based, fast, compact, supports schema versioning (evolution)
Requires a schema registry (Confluent distribution)
Producers register schemas automatically
http://cloudurable.com/blog/avro/index.html
http://cloudurable.com/blog/kafka-avro-schema-registry/index.html
https://www.slideshare.net/JeanPaulAzar1/kafka-and-avro-with-confluent-schema-registry
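A rough illustration of the versioning (evolution) point, assuming a hypothetical User record. The schemas follow Avro's JSON schema format, but read_with_schema is a toy stand-in written in plain Python, not the Avro library or the schema registry: it only mimics how a reader schema fills in defaults for fields the writer did not know about.

```python
# Hypothetical record schemas in Avro's JSON schema format.
SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}
# v2 adds an optional field with a default, so data written with v1 stays readable.
SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": SCHEMA_V1["fields"] + [
        {"name": "country", "type": ["null", "string"], "default": None},
    ],
}

def read_with_schema(record, reader_schema):
    """Toy schema resolution: keep the reader's fields, filling in declared
    defaults for fields that are missing from the written record."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result

old_record = {"id": 1, "email": "a@example.com"}  # written with v1
upgraded = read_with_schema(old_record, SCHEMA_V2)
print(upgraded)
```

Adding a field without a default would make old records unreadable under the new schema, which is the kind of change a compatibility check is meant to reject.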

Compression
GZIP or Snappy
Less bandwidth and disk space, at the cost of more CPU.
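The trade-off can be seen with a small experiment outside Kafka, using Python's standard gzip module on a hypothetical batch of similar JSON messages. In a Kafka producer the codec is selected with the compression.type setting; the numbers printed here depend on the payload and say nothing about broker-side behavior.

```python
import gzip
import json

# Hypothetical batch of similar JSON messages; repetitive payloads compress well.
messages = [{"user_id": i, "action": "login", "source": "web"} for i in range(1000)]
raw = "\n".join(json.dumps(m) for m in messages).encode("utf-8")

compressed = gzip.compress(raw)  # costs CPU time on the producer side
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")

# Consumers pay the CPU cost again to get the original bytes back.
restored = gzip.decompress(compressed)
```

Compression works on batches, so larger batches of similar messages generally compress better than single small ones.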

Topics and Partitions
Multiple topics per producer: https://stackoverflow.com/questions/21376715/how-many-producers-to-create-in-kafka

Distributions
Producer
http://cloudurable.com/blog/kafka-tutorial-kafka-producer-advanced-java-examples/index.html