My Quotes

When you were born, you cried and the world rejoiced.
Live your life in such a way that when you die, the world cries and you rejoice.

Tuesday, March 13, 2018

JPA Static Meta Model

  1. When you write a criteria query or create a dynamic entity graph, you need to reference the entity classes and their attributes.
  2. The quickest and easiest way is to provide the required names as Strings.
  3. But this has several drawbacks, e.g. you have to remember or look up all the names of the entity attributes when you write the query.
  4. It will also cause even greater issues in later phases of the project if you have to refactor your entities and change the names of some attributes.
  5. In that case you have to use the search function of your IDE and try to find all Strings that reference the changed attributes.
  6. This is a tedious and error-prone activity which can easily take up most of the refactoring time.
  7. Use the static metamodel to write criteria queries and dynamic entity graphs.
  8. This is a small feature defined by the JPA specification which provides a type-safe way to reference the entities and their properties.
  1. The Metamodel Generator also takes into consideration XML configuration specified in orm.xml or in mapping files specified in persistence.xml. However, if all configuration is in XML, you need to add the following persistence unit metadata to at least one of the mapping files:
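The metadata element itself is not shown in the post; a sketch of what it typically looks like, assuming the Hibernate Metamodel Generator's convention for XML-only configuration:

```xml
<!-- In one of the mapping files: tells the processor the XML mapping is complete -->
<persistence-unit-metadata>
  <xml-mapping-metadata-complete/>
</persistence-unit-metadata>
```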
  2. Maven dependency: The jar file for the annotation processor can be found as below.
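The dependency itself did not survive in the post; a minimal sketch assuming the Hibernate Metamodel Generator artifact (the version shown is an assumption, pick the one matching your Hibernate release):

```xml
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-jpamodelgen</artifactId>
    <version>5.2.12.Final</version>
    <scope>provided</scope>
</dependency>
```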
  3. Maven compiler plugin configuration - direct execution

  4. Maven compiler plugin configuration - indirect execution

  5. Configuration with maven-processor-plugin
  6. Javac Task configuration
    As mentioned before, the annotation processor will run automatically each time the Java compiler is called, provided the jar file is on the classpath.

  7. IDE Configuration
  1. A simple entity for this example.
    import java.io.Serializable;
    import javax.persistence.*;

    @Entity
    public class AlertEO implements Serializable {

     private static final long serialVersionUID = 1L;

     private Integer id;
     private String name;
     private String description;

     /**
      * Method to get the id
      * @return id
      */
     @Id
     @GeneratedValue(strategy = GenerationType.AUTO)
     public Integer getId() {
      return id;
     }

     public void setId(Integer id) {
      this.id = id;
     }

     /**
      * Method to get the name
      * @return name
      */
     @Column(name = "name")
     public String getName() {
      return name;
     }

     public void setName(String name) {
      this.name = name;
     }

     /**
      * Method to get the description
      * @return description
      */
     @Column(name = "description")
     public String getDescription() {
      return description;
     }

     public void setDescription(String description) {
      this.description = description;
     }

     /* (non-Javadoc)
      * @see java.lang.Object#toString()
      */
     @Override
     public String toString() {
      return "AlertEO [id=" + id + ", name=" + name + ", description=" + description + "]";
     }
    }

  2. The class of the static metamodel looks similar to the entity.
    Based on the JPA specification, there is a corresponding metamodel class for every managed class in the persistence unit.
    You can find it in the same package and it has the same name as the corresponding managed class, with an added '_' at the end.

    @Generated(value = "org.hibernate.jpamodelgen.JPAMetaModelEntityProcessor")
    @StaticMetamodel(AlertEO.class)
    public abstract class AlertEO_ {
     public static volatile SingularAttribute<AlertEO, Integer> id;
     public static volatile SingularAttribute<AlertEO, String> name;
     public static volatile SingularAttribute<AlertEO, String> description;
    }

  3. Using metamodel classes
  4. You can use the metamodel classes in the same way as you use the String reference to the entities and attributes.
  5. The APIs for criteria queries and dynamic entity graphs provide overloaded methods that accept Strings and implementations of the Attribute interface.

    CriteriaBuilder cb = this.em.getCriteriaBuilder();
    // create the query
    CriteriaQuery<AlertEO> q = cb.createQuery(AlertEO.class);
    // set the root class
    Root<AlertEO> a = q.from(AlertEO.class);
    // use the metamodel class to define the where clause
    q.where(cb.like(a.get(AlertEO_.name), "J%"));
    // perform the query
    List<AlertEO> alerts = this.em.createQuery(q).getResultList();

Wednesday, January 10, 2018

ELK - Elasticsearch, Logstash and Kibana - an alternative to Splunk

ELK - Architecture

For more information on Kibana here is a nice article

  1. Step 1- Install Elasticsearch
    1. Download elasticsearch zip file from
    2. Extract it to a directory (unzip it)
    3. Run it (bin/elasticsearch or bin/elasticsearch.bat on Windows)
    4. Check that it runs using curl -XGET http://localhost:9200
    5. Here's how to do it (steps are written for OS X but should be similar on other systems):
tar xvzf elasticsearch-1.7.1.tar.gz
cd elasticsearch-1.7.1
bin/elasticsearch
  1. Elasticsearch should be running now. You can verify it's running using curl. In a separate terminal window execute a GET request to Elasticsearch's status page:
curl -XGET http://localhost:9200
  1. If all is well, you should get the following result:
{
  "status" : 200,
  "name" : "Tartarus",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
  1. Step 2 - Install Kibana 4
  2. Download Kibana archive from
  3. Please note that you need to download appropriate distribution for your OS, URL given in examples below is for OS X
  4. Extract the archive
  5. Run it (bin/kibana)
  6. Check that it runs by pointing the browser to the Kibana's WebUI
tar xvzf kibana-4.1.1-darwin-x64.tar.gz
cd kibana-4.1.1-darwin-x64
  1. Point your browser to http://localhost:5601 (if Kibana page shows up, we're good - we'll configure it later)
  1. Step 3) Install Logstash
  2. Download Logstash zip from
  3. Extract it (unzip it)
  1. Step 4) Configure Spring Boot's Log File
  2. In order to have Logstash ship log files to Elasticsearch, we must first configure Spring Boot to store log entries into a file.
  3. We will establish the following pipeline: Spring Boot App --> Log File --> Logstash --> Elasticsearch.
  4. There are other ways of accomplishing the same thing, such as configuring logback to use TCP appender to send logs to a remote Logstash instance via TCP, and many other configurations.
  5. Anyhow, let's configure Spring Boot's log file.
  6. The simplest way to do this is to configure log file name in
  7. It's enough to add the following line:
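The property line itself did not survive in the post; a minimal sketch, assuming Spring Boot 1.x's application.properties and its logging.file property:

```properties
# write log output to a rotating file next to the app
logging.file=application.log
```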
Spring Boot will now log ERROR, WARN and INFO level messages to the application.log log file and will also rotate it when it reaches 10 MB.
  1. Step 5) Configure Logstash to Understand Spring Boot's Log File Format
  2. A typical Logstash config file consists of three main sections: input, filter and output.
  3. Each section contains plugins that do relevant part of the processing
  4. such as file input plugin that reads log events from a file or elasticsearch output plugin which sends log events to Elasticsearch.
  5. The input section defines where Logstash will read input data from
  6. in our case it will be a file, hence we will use the file plugin with the multiline codec, which basically means that our input file may have multiple lines per log entry.
input {
  file {
    type => "java"
    path => "/path/to/application.log"
    codec => multiline {
      pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
      negate => "true"
      what => "previous"
    }
  }
}
  1. Explanation
  2. We're using file plugin.
  3. type is set to java - it's just an additional piece of metadata in case you use multiple types of log files in the future.
  4. path is the absolute path to the log file. It must be absolute - Logstash is picky about this.
  5. We're using the multiline codec, which means that multiple lines may correspond to a single log event.
  6. In order to detect lines that should logically be grouped with a previous line we use a detection pattern:
  7. pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*" -> each new log event needs to start with a date.
  8. negate => "true" -> if it doesn't start with a date ...
  9. what => "previous" -> ... then it should be grouped with a previous line.
  10. The file input plugin, as configured, will tail the log file (i.e. only read new entries at the end of the file). Therefore, when testing, in order for Logstash to read something you will need to generate new log entries.
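The grouping rule above can be sketched in plain Java. This is an illustration only: the regex below is a simplified stand-in for the grok %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME} prefix, not the exact pattern Logstash compiles.

```java
import java.util.regex.Pattern;

public class MultilineCheck {
    // Simplified stand-in for the grok date prefix (assumption: Spring Boot's
    // default timestamp looks like 2018-03-13 10:15:30.123)
    static final Pattern LOG_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}.*");

    // Mirrors negate => "true" / what => "previous": a line that does NOT
    // start with a date belongs to the previous log event.
    static boolean startsNewEvent(String line) {
        return LOG_START.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsNewEvent(
                "2018-03-13 10:15:30.123  INFO 1234 --- [main] c.e.App : Started"));
        System.out.println(startsNewEvent(
                "\tat com.example.App.main(App.java:10)"));
    }
}
```

A line matching the date prefix opens a new event; anything else (e.g. a stacktrace frame) is appended to the previous one, which is what negate => "true" / what => "previous" configures.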
  1. Filter Section
  2. The filter section contains plugins that perform intermediary processing on a log event.
  3. In our case, event will either be a single log line or multiline log event grouped according to the rules described above.
  4. In the filter section we will do several things:
  5. Tag a log event if it contains a stacktrace. This will be useful when searching for exceptions later on.
  6. Parse out (or grok, in logstash terminology) timestamp, log level, pid, thread, class name (logger actually) and log message.
  7. Specified timestamp field and format - Kibana will use that later for time based searches.
  8. The filter section that does the aforementioned things for Spring Boot's log format looks like this:
filter {
  #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }

  #Grokking Spring Boot's default log format
  grok {
    match => [ "message",
               "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME})  %{LOGLEVEL:level} %{NUMBER:pid} --- \[(?<thread>[A-Za-z0-9-]+)\] [A-Za-z0-9.]*\.(?<class>[A-Za-z0-9#_]+)\s*:\s+(?<logmessage>.*)",
               "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME})  %{LOGLEVEL:level} %{NUMBER:pid} --- .+? :\s+(?<logmessage>.*)"
             ]
  }

  #Parsing out timestamps which are in timestamp field thanks to previous grok section
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}
  1. Explanation:
  2. if [message] =~ "\tat" -> if the message contains a tab character followed by at (this is Ruby syntax) then...
  3. use the grok plugin to tag stacktraces:
  4. match => ["message", "^(\tat)"] -> when the message matches a tab followed by at at the beginning of a line, then...
  5. add_tag => ["stacktrace"] -> ... tag the event with the stacktrace tag.
  6. Use the grok plugin for regular Spring Boot log message parsing:
  7. The first pattern extracts timestamp, level, pid, thread, class name (this is actually the logger name) and the log message.
  8. Unfortunately, some log messages don't have a logger name that resembles a class name (for example, Tomcat logs), hence the second pattern, which skips the logger/class field and parses out timestamp, level, pid, thread and the log message.
  9. Use the date plugin to parse and set the event date:
  10. match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ] -> the timestamp field (grokked earlier) contains the timestamp in the specified format.
  1. Output Section
  2. Output section contains output plugins that send event data to a particular destination.
  3. Outputs are the final stage in the event pipeline.
  4. We will be sending our log events to stdout (console output, for debugging) and to Elasticsearch.
  5. Compared to filter section, output section is rather straightforward:
output {
  # Print each event to stdout, useful for debugging. Should be commented out in production.
  # Enabling 'rubydebug' codec on the stdout output will make logstash
  # pretty-print the entire event as something similar to a JSON representation.
  stdout {
    codec => rubydebug
  }

  # Sending properly parsed log events to elasticsearch
  elasticsearch {
    hosts => [""]  # takes an array of hosts (e.g. an elasticsearch cluster) as value
  }
}
  1. Putting it all together
  2. Finally, the three parts - input, filter and output - need to be copy-pasted together and saved into a logstash.conf config file.
  3. Once the config file is in place and Elasticsearch is running, we can run Logstash:
  4. /path/to/logstash/bin/logstash -f logstash.conf
  5. If everything went well, Logstash is now shipping log events to Elasticsearch.
  1. Step 6) Configure Kibana
  2. Ok, now it's time to visit the Kibana web UI again.
  3. We have started it in step 2 and it should be running at http://localhost:5601.
  4. First, you need to point Kibana to the Elasticsearch index(es) of your choice.
  5. Logstash creates indices with the name pattern of logstash-YYYY.MM.DD.
  6. In Kibana Settings --> Indices configure the indices:
  7. Index contains time-based events (select this option)
  8. Use event times to create index names (select this option)
  9. Index pattern interval: Daily
  10. Index name or pattern: [logstash-]YYYY.MM.DD
  11. Click on "Create Index"
  12. Now click on "Discover" tab.
  13. This is the place for search: it allows you to perform new searches and also to save/manage them.
  14. Log events should be showing up now in the main window.
  15. If they're not, then double-check the time period filter in the top right corner of the screen.
  16. The table will have 2 columns by default: Time and _source.
  17. In order to make the listing more useful, we can configure the displayed columns.
  18. From the menu on the left select level, class and logmessage.
 Here is a sample output screenshot of the Kibana console

Tuesday, April 4, 2017

Fetch TWITTER data using R


  1. twitteR Package:
    1. One of the available packages in R for fetching Twitter data. The package can be obtained from CRAN.R.PROJECT
    2. This package allows us to make REST API calls to Twitter using the ConsumerKey & ConsumerSecret code. The code below illustrates
      how to extract the Twitter data.
    3. This package offers below functionality:
      1. Authenticate with Twitter API
      2. Fetch User timeline
      3. User Followers
      4. User Mentions
      5. Search twitter
      6. User Information
      7. User Trends
      8. Convert JSON object to dataframes
  2. REST API CALLS using R - twitteR package:
    1. Register your application with twitter.
    2. After registration, you will be getting ConsumerKey & ConsumerSecret code which needs to be used for calling twitter API.
    3. Load TwitteR library in R environment.
    4. Call twitter API using OAuthFactory$new() method with ConsumerKey & ConsumerSecret code as input params.
    5. The above step will return an authorization link, which needs to be copied & pasted in the internet browser.
    6. You will be redirected to the Twitter application authentication page where you need to authenticate yourself by providing your Twitter credentials.
    7. After authenticating, we will be provided with an authorization code, which needs to be pasted in the R console.
    8. Call registerTwitterOAuth().
    9. friends information
    10. Location based
  3. Source Code:
    requestURL <-  ""
    accessURL =    ""
    authURL =      ""
    consumerKey =   "XXXXXXXXXXXX"
    consumerSecret = "XXXXXXXXXXXXXXXX"
    twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                                 consumerSecret=consumerSecret,
                                 requestURL=requestURL,
                                 accessURL=accessURL,
                                 authURL=authURL)
    save(list="twitCred", file="twitteR_credentials")
    registerTwitterOAuth(twitCred)#Register your app with Twitter.
  4. StreamR Package:
    1. This package allows users to fetch twitter Data in real time by connecting to Twitter Stream API.
    2. We can obtain the package from STREAM.R.PROJECT
    3. A few important functions this package offers: it allows R users to access Twitter's search streams and user streams, and to parse the output into data frames.
    4. filterStream() - filterStream method opens a connection to Twitter’s Streaming API that will return public statuses that match one or more filter predicates like search keywords.
    5. Tweets can be filtered by keywords, users, language, and location.
    6. The output can be saved as an object in memory or written to a text file.
    7. parseTweets() - This function parses tweets downloaded using filterStream, sampleStream or userStream and returns a data frame.
  5. The code example below shows how to fetch data in real time using streamR:
    load("twitteR_credentials")  # load the credentials saved by the previous code.
    filterStream(file.name = "tweets.json", track = "#bigdata", timeout = 0, locations = c(-74,40,-73,41), oauth = twitCred)
    Executing the above will capture tweets on "#bigdata" from the New York area. When we set timeout = 0 we fetch continuously; to fetch records for a certain time, use e.g. timeout = 300 (fetches data for 300 seconds).
    To Parse the fetched tweets use the below code:
    tweets.df <- parseTweets("tweets.json")

Tuesday, March 21, 2017

Technology stack of BigData

Docker versus Kubernetes - comparison

  1. Introduction
    1. Containers have become popular thanks to their focus on consistency across platforms from development to production.
    2. The rise in interest in containers has in turn brought higher demands for their deployment and management.
    3. The need for better control attracted a number of software options as solutions for container orchestration, which allows for abstraction of individual containers to services with a number of instances or replicas.
    4. Two of the major players developing container orchestration are Docker and Kubernetes.
  2. Kubernetes
    1. Kubernetes is an open-source platform for container deployment automation, scaling, and operations across clusters of hosts. The production ready orchestrator draws on Google’s
      extensive experience of years of working with Linux containers.
    2. Kubernetes aims to provide the components and tools to relieve the burden of running applications in public and private clouds by grouping containers into logical units. Their strengths lie in flexible growth, environment agnostic portability, and easy scaling.
  3. Docker Swarm
    1. Swarm is the native clustering for Docker. Originally Docker Swarm did not provide much in the sense of container automation, but with the update to Docker Engine 1.12, container orchestration is now built into its core with first party support.
    2. Docker Swarm is designed around four core principles:
      1. Simple yet powerful with a “just works” user experience,
      2. Resilient zero single-point-of-failure architecture,
      3. Secure by default with automatically generated certificates, and
      4. Backwards compatibility with existing components.
    3. The promise of backwards compatibility is especially important to the existing users. Any tools or containers that work with Docker run equally well in Docker Swarm.
  4. Comparisons
    Although both orchestrators provide much of the same functionality, there are fundamental differences in how the two operate. Below are some of the most notable points on which these rivals diverge.

  5. Summary
    1. Throughout the comparison, it is possible to note how Kubernetes and Docker Swarm fundamentally differ.
    2. Swarm focuses on ease of use with integration with Docker core components while Kubernetes remains open and modular.
    3. The same difference can be noticed while installing and configuring each of the orchestrators.
    4. Docker provides a simple solution that is fast to get started with while Kubernetes aims to support higher demands with higher complexity.
    5. For much of the same reasons, Docker has been popular among developers who prefer simplicity and fast deployments.
    6. At the same time, Kubernetes is used in production environments by many high-profile internet companies running popular services.

Friday, March 3, 2017

Kafka Multiple Topic

Producer and multiple Topics

    Download a recent stable version of Apache Kafka
  1. Untar the package
  2. Enter into Kafka directory
  3. Start Zookeeper Server
  4. bin/zookeeper-server-start.sh config/zookeeper.properties
  5. In a different terminal start Kafka Server
  6. bin/kafka-server-start.sh config/server.properties
  7. Create topic example2 (if it does not exist)
  8. bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic example2
  9. Create topic example3 (if it does not exist)
  10. bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic example3
  11. Start a consumer on topic example2
  12. bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic example2 --from-beginning
  13. Start a consumer on topic example3
  14. bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic example3 --from-beginning
  15. Run mvn clean compile exec:exec
  16. You should see the message in the consumer terminal.
    Here are the project sources
  1. Project POM.xml

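The POM itself did not survive in the post; a minimal sketch of the dependency it needs, assuming the kafka-clients artifact (the version shown is an assumption, match it to your broker):

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.0</version>
</dependency>
```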
  2. Producer Java File
    package com.example.kafka;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    public class ProducerMultipleTopic {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            KafkaProducer<String, String> prod = new KafkaProducer<>(props);
            ProducerRecord<String, String> data1 = new ProducerRecord<>("example2", "example2");
            ProducerRecord<String, String> data2 = new ProducerRecord<>("example3", "example3");
            // send a record to each topic, then close the producer
            prod.send(data1);
            prod.send(data2);
            prod.close();
        }
    }
  3. Logger file
    # Root logger option
    log4j.rootLogger=INFO, stdout
    # Direct log messages to stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

  4. This is the PARENT POM

Thursday, December 1, 2016

Application Container versus System Container

When people talk about containers, they usually mean application containers. Docker is automatically associated with application containers and is widely used to package applications and services. But there is another type of container: system containers. Let us look at the differences between application containers vs. system containers and see how each type of container is used:

  1. Application Containers 
    • Application/service centric
    • Growing tool ecosystem
    • Security concerns
    • Networking challenges
    • Hampered by base OS limitations 
  1. System Containers

    • Machine-centric
    • Limited tool ecosystem
    • Datacenter-centric
    • Isolated & secure
    • Optimized networking

  • Application Containers

Application containers are used to package applications without launching a virtual machine for each app or each service within an app. They are especially beneficial when making the move to a microservices architecture, as they allow you to create a separate container for each application component and provide greater control, security and process restriction. Ultimately, what you get from application containers is easier distribution. The risks of inconsistency, unreliability and compatibility issues are reduced significantly if an application is placed and shipped inside a container.
Docker is currently the most widely adopted container service provider with a focus on application containers. However, there are other container technologies like CoreOS’s Rocket. Rocket promises better security, portability and flexibility of image sharing. Docker already enjoys the advantage of mass adoption, and Rocket might just be too late to the container party. Even with its differences, Docker is still the unofficial standard for application containers today.

Docker Datacenter enables the deployment of containerized apps across multiple environments, from on-premises to virtual private cloud infrastructure.

With Docker Datacenter you can provide a Containers as a Service (CaaS) environment for your teams.

Deploying Docker Datacenter provides options for container deployment:

  • On-premises. Docker can be deployed to on-premises datacenters.
  • Virtual Private Cloud. Docker can be deployed to virtual private cloud environments including Microsoft Azure and Amazon Web Services.
  • Portability. With Docker, you retain control of where you deploy your app.

As the use of containers increases and organizations deploy them more widely, the need for tools to manage containers across the infrastructure also increases. Orchestrating a cluster of containers is a competitive and rapidly evolving area, and many tools exist offering various feature sets.

Container orchestration tools can be broadly defined as providing an enterprise-level framework for integrating and managing containers at scale. Such tools aim to simplify container management and provide a framework not only for defining initial container deployment but also for managing multiple containers as one entity -- for purposes of availability, scaling, and networking.

Some container orchestration tools to know about include:

  • Amazon ECS -- The Amazon EC2 Container Service (ECS) supports Docker containers and lets you run applications on a managed cluster of Amazon EC2 instances.
  • Azure Container Service (ACS) -- ACS lets you create a cluster of virtual machines that act as container hosts along with master machines that are used to manage your application containers.
  • Cloud Foundry’s Diego -- Diego is a container management system that combines a scheduler, runner, and health manager. It is a rewrite of the Cloud Foundry runtime.
  • CoreOS Fleet -- Fleet is a container management tool that lets you deploy Docker containers on hosts in a cluster as well as distribute services across a cluster.
  • Docker Swarm -- Docker Swarm provides native clustering functionality for Docker containers, which lets you turn a group of Docker engines into a single, virtual Docker engine.
  • Docker Shipyard is a handy tool for people who love Docker Swarm, but wish it did even more. While Swarm focuses on container orchestration through the CLI, Docker Shipyard takes things further by letting you manage app images and container registries in addition to containers themselves. Plus, Shipyard offers a Web-based graphical front-end and a rich API in addition to a CLI.
  • Google Container Engine -- Google Container Engine, which is built on Kubernetes, lets you run Docker containers on the Google Cloud platform. It schedules containers into the cluster and manages them based on user-defined requirements.
  • Kubernetes -- Kubernetes is an orchestration system for Docker containers. It handles scheduling and manages workloads based on user-defined parameters.
  • Mesosphere Marathon -- Marathon is a container orchestration framework for Apache Mesos that is designed to launch long-running applications. It offers key features for running applications in a clustered environment.

Additionally, the Cloud Native Computing Foundation (CNCF) is focused on integrating the orchestration layer of the container ecosystem. The CNCF’s stated goal is to create and drive adoption of a new set of common container technologies, and it recently selected Google’s Kubernetes container orchestration tool as its first containerization technology.

  • System Containers: How They’re Used

System containers play a similar role to virtual machines, as they share the kernel of the host operating system and provide user space isolation. However, system containers do not use hypervisors. (Any container that runs an OS is a system container.) They also allow you to install different libraries, languages, and databases. Services running in each container use resources that are assigned to just that container.

System containers let you run multiple processes at the same time, all under the same OS and not a separate guest OS. This lowers the performance impact, and provides the benefits of VMs, like running multiple processes, along with the new benefits of containers like better portability and quick startup times.

  • Useful System Container Tools
    • Joyent’s Triton is a Container as a Service that implements its proprietary OS called SmartOS. It not only focuses on packing apps into containers but also provides the benefits of added security, networking and storage, while keeping things lightweight, with very little performance impact. The key differentiator is that Triton delivers bare-metal performance. With Samsung’s recent acquisition of Joyent, it remains to be seen how Triton progresses.
    • Giant Swarm is a hosted cloud platform that offers a Docker-based virtualization system that is configured for microservices. It helps businesses manage their development stack, spend less time on operations setup, and more time on active development.
    • LXD is a fairly new OS container that was released in 2016 by Canonical, the creators of Ubuntu. It combines the speed and efficiency of containers with the famed security of virtual machines. Since Docker and LXD share the same kernels, it is easy to run Docker containers inside LXD containers.