Hortonworks and Ubuntu deployment guide

I recently deployed Hortonworks in a small lab environment: four virtual machines on an Intel NUC barebone with 16 GB of memory. I saved the command-line history and took several screenshots. If you want to deploy Hortonworks yourself, this might be useful.

Because the VMs run on VMware ESXi, I first had to install open-vm-tools:

sudo apt-get install open-vm-tools

The installation requires the root account or an account with sufficient privileges. I used the root account. On Ubuntu you first need to allow root logins over SSH by setting the following option in sshd_config:

sudo nano /etc/ssh/sshd_config
PermitRootLogin yes
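
If you’d rather not edit the file by hand, the change can be scripted. Here’s a minimal sketch, assuming GNU sed (as shipped with Ubuntu). The function name set_permit_root_login is my own invention, and the demo runs against a scratch file instead of the real /etc/ssh/sshd_config:

```shell
#!/bin/sh
# Set "PermitRootLogin yes" in an sshd_config-style file.
# set_permit_root_login is a hypothetical helper name, not part of OpenSSH.
set_permit_root_login() {
    file=$1
    if grep -qE '^[#[:space:]]*PermitRootLogin' "$file"; then
        # Replace an existing (possibly commented-out) directive in place
        sed -i -E 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' "$file"
    else
        # No directive present yet: append one
        echo 'PermitRootLogin yes' >> "$file"
    fi
}

# Demo on a scratch file so nothing on the system is touched
demo=$(mktemp)
printf '%s\n' 'Port 22' '#PermitRootLogin prohibit-password' > "$demo"
set_permit_root_login "$demo"
cat "$demo"
```

To apply it for real you would call the function as root with /etc/ssh/sshd_config as the argument.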

After changing sshd_config, restart SSH and set a password for root:

sudo service ssh restart
sudo passwd root

Next, I added all the servers to the hosts file on the first machine, which I used to run the installation. You can use either DNS or the hosts file for name resolution; in my small lab environment I went for the hosts file.

nano /etc/hosts

Add one line per node, mapping its IP address to its hostname: hadoop01, hadoop02, hadoop03 and hadoop04.
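
To show what such entries look like, here’s a small sketch that prints them. The 192.168.1.x addresses are hypothetical placeholders (use the addresses of your own VMs), and hosts_entries is just a helper name I made up:

```shell
#!/bin/sh
# Print /etc/hosts lines for nodes hadoop01..hadoopNN with consecutive addresses.
# The 192.168.1.x addresses used below are hypothetical placeholders.
hosts_entries() {
    prefix=$1   # first three octets, e.g. 192.168.1
    first=$2    # last octet of the first node
    count=$3    # number of nodes
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%d\thadoop%02d\n' "$prefix" "$((first + i))" "$((i + 1))"
        i=$((i + 1))
    done
}

hosts_entries 192.168.1 101 4
```

Once the output looks right, it can be appended to /etc/hosts as root.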

Next, the first node needs to be able to access all machines. Create an SSH key for this first:

ssh-keygen -t rsa

Now append the public key to authorized_keys on every node:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa.pub | ssh root@hadoop02 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh root@hadoop03 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh root@hadoop04 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
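
With more nodes, the repeated lines above get tedious. As an alternative sketch, the ssh-copy-id tool from the OpenSSH client package does the same append. The snippet below only prints the commands, one per node; copy_key_cmd is a hypothetical helper name:

```shell
#!/bin/sh
# Print one ssh-copy-id command per remote node (nothing is executed here).
# copy_key_cmd is a hypothetical helper name.
copy_key_cmd() {
    printf 'ssh-copy-id -i %s root@%s\n' "$1" "$2"
}

for host in hadoop02 hadoop03 hadoop04; do
    copy_key_cmd "$HOME/.ssh/id_rsa.pub" "$host"
done
```

Once the printed commands look right, run them by hand or pipe the output to sh.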

Hortonworks requires transparent huge pages (THP) to be disabled on each server. Disable the setting as root with the following command (it does not persist across reboots):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
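
To verify the setting, you can read the same file back: the kernel marks the active value with square brackets, for example "always madvise [never]". A minimal sketch that extracts the active value (thp_active is a helper name I made up):

```shell
#!/bin/sh
# Extract the active value from a THP status line such as "always madvise [never]".
# thp_active is a hypothetical helper name.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

f=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$f" ]; then
    printf '%s: %s\n' "$f" "$(thp_active < "$f")"
fi
```

Run it on every node; each one should report never.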

We also need NTP on every machine. Install NTP by:

sudo apt-get install ntp

Now we’re ready to start the installation. Add the Hortonworks Ambari repository:

wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/ -O /etc/apt/sources.list.d/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD

Update the package index and install the Ambari server:

apt-get update
apt-get install ambari-server

Now you can run the initial setup. Run the following command; my answers to its questions are listed underneath:

ambari-server setup

Customize user account for ambari-server daemon [y/n] -> n
Checking JDK… -> [1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
Do you accept the Oracle Binary Code License Agreement [y/n] -> y
Enter advanced database configuration [y/n] (n)? -> n

Accept the Oracle JDK license when prompted. You must accept this license to download the necessary JDK from Oracle. The JDK is installed during the deploy phase.

Select n at Enter advanced database configuration to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari, and the default user name and password are ambari/bigdata. To use an existing PostgreSQL, MySQL or Oracle database with Ambari instead, select y.

Now you are ready to start the server. Use the following command:

ambari-server start

Navigate to port 8080 on the Ambari server in your browser.

Log in with the default admin/admin credentials:


Next, launch the install wizard.


Give the cluster a name. In my case I used the name “hadoop”.


Select the distribution version. I used the latest, HDP 2.4.


Select the nodes you want to install on. I used all four nodes. For the communication between Ambari and the nodes I had to paste the SSH private key into this screen. Print it with the command below and copy the entire key into the field:

cat ~/.ssh/id_rsa


Confirm and start the installation:


When ready, select the services you want to use.



Next, assign the master components. I used the first node for the NameNode, ZooKeeper, Atlas and Grafana, and the next node for the SNameNode, History Server, App Timeline Server, etc. You can put them all on one server, but make sure it has enough memory.


Next, assign the slaves and clients. I made every host a DataNode and NodeManager. You might want to make an exception for the first node.


Next, Hive needs a new MySQL database. This is where all of Hive’s management information (its metadata) will be stored.


I had to enter a password for Grafana in order to complete my installation:


Review and finish the installation:


All the packages will be deployed:



After the installation the admin user was not able to use HDFS, because it had no home directory there yet. To fix this, switch to the hdfs system account and create one:

su - hdfs
hadoop fs -mkdir /user/admin

Set the ownership on the newly created directory:

hadoop fs -chown admin:hadoop /user/admin
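
If more users need an HDFS home directory, the same two steps can be generated per user. A minimal sketch that only prints the commands; hdfs_home_cmds is a hypothetical helper, and alice is a made-up example user:

```shell
#!/bin/sh
# Print the HDFS commands that give a user a home directory.
# hdfs_home_cmds is a hypothetical helper; the user names below are examples.
hdfs_home_cmds() {
    user=$1
    group=$2
    printf 'hadoop fs -mkdir -p /user/%s\n' "$user"
    printf 'hadoop fs -chown %s:%s /user/%s\n' "$user" "$group" "$user"
}

for u in admin alice; do
    hdfs_home_cmds "$u" hadoop
done
```

Review the output, then run it as the hdfs user, for example by piping it to sh.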


Synology data scrubbing speedup

My NAS sometimes needs to perform a parity consistency check. To speed up this process, use the following command:

echo 190000 > /proc/sys/dev/raid/speed_limit_min

The process should now be about 10 times quicker!
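
For context: as far as I know the value is a minimum rebuild speed in kilobytes per second per device, so 190000 asks for a floor of roughly 185 MB/s, and the real speed is still capped by the disks and by speed_limit_max. The setting is also lost on reboot. A small sketch that does the conversion and shows the current limits (kb_to_mb is a helper name I made up):

```shell
#!/bin/sh
# Convert an md speed limit (kilobytes per second) to whole megabytes per second.
# kb_to_mb is a hypothetical helper name.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

kb_to_mb 190000

# Show the current limits, if this system exposes them
for f in /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max; do
    if [ -r "$f" ]; then
        printf '%s = %s\n' "$f" "$(cat "$f")"
    fi
done
```

Noting the old value before changing it makes it easy to put things back once the check has finished.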


[WordPress] filter out unneeded menu classes

Here’s a code snippet for WordPress to filter out unneeded menu classes. Put the following lines in your functions.php:

// Reduce nav classes, leaving only 'current-menu-item'
function nav_class_filter($var) {
    // Keep only the whitelisted classes; non-array values (such as the item ID) become empty
    return is_array($var) ? array_intersect($var, array(
        'current-menu-item'
    )) : '';
}
add_filter('nav_menu_css_class', 'nav_class_filter', 100, 1);
add_filter('nav_menu_item_id', 'nav_class_filter', 100, 1);

[WordPress] set cookie

Here’s a code snippet for WordPress. Since WordPress doesn’t use PHP sessions, a cookie might be useful. Put the following in your functions.php:

//Set cookie
function set_newuser_cookie() {
	// Only set the cookie if the visitor does not have it yet
	if (!isset($_COOKIE['sitename_newvisitor'])) {
		// Expires after 1209600 seconds (14 days)
		setcookie('sitename_newvisitor', 1, time()+1209600, COOKIEPATH, COOKIE_DOMAIN, false);
	}
}
add_action( 'init', 'set_newuser_cookie');

[WordPress & Genesis] Add parent and child classes to menu

Here’s another code snippet for WordPress Genesis Framework.

If you would like to add menu classes to parent and child menus, use the code below.
Put this in the functions.php file:

// Function to add parent and child classes to menu
class Arrow_Walker_Nav_Menu extends Walker_Nav_Menu {
    function display_element($element, &$children_elements, $max_depth, $depth = 0, $args, &$output) {
        $id_field = $this->db_fields['id'];
        if (0 == $depth) {
            $element->classes[] = 'menu-top'; //top main menu
            if (empty($children_elements[$element->$id_field])) {
                $element->classes[] = 'menu-noparent'; //no children
            }
        }
        if (!empty($children_elements[$element->$id_field])) {
            $element->classes[] = 'menu-parent'; //has children
        }
        // Let the default walker render the element with the extra classes
        parent::display_element($element, $children_elements, $max_depth, $depth, $args, $output);
    }
}

[WordPress & Genesis] Add menu classes to first and last menu items

Here’s a code snippet for WordPress Genesis Framework.

If you would like to add menu classes to the first and last menu items, use the code below. Put this in the functions.php file:

// Function to add menu classes to first and last menu items
function add_first_and_last($items) {
    // The menu items array is indexed from 1, not 0
    $items[1]->classes[]             = 'menu-first-item';
    $items[count($items)]->classes[] = 'menu-last-item';
    return $items;
}
add_filter('wp_nav_menu_objects', 'add_first_and_last');

[WordPress & Genesis] custom viewport

Here’s a code snippet for WordPress Genesis Framework.

If you would like to use a custom viewport, for example for mobile devices, use the code below:

/** Add Viewport meta tag for mobile browsers */
add_action('genesis_meta', 'add_viewport_meta_tag');
function add_viewport_meta_tag() {
    echo '<meta name="viewport" content="width=1020">';
}

[WordPress & Genesis] custom footer or custom header

Here’s a code snippet for WordPress Genesis. If you would like to use a custom header or custom footer, don’t use a header.php or footer.php. Use the code snippet below:

// Include Header, separate from functions
include 'custom-header.php';

// Include Footer, separate from functions
include 'custom-footer.php';

[Synology] How to secure photostation with htaccess

Here are short instructions on how to protect your Synology Photo Station using htaccess:

Create the following file: /volume1/@appstore/PhotoStation/photo/.htaccess

AuthName "Restricted Area"
AuthType Basic
AuthUserFile /volume1/@appstore/PhotoStation/photo/.htpasswd
AuthGroupFile /dev/null
require valid-user

and the following file (the one AuthUserFile points to): /volume1/@appstore/PhotoStation/photo/.htpasswd


Use an online htpasswd generator to create the entry for your own user name and password.
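
If you prefer not to type your password into a website, the entry can also be generated locally. A minimal sketch, assuming openssl is available on the machine where you run it; htpasswd_line is a hypothetical helper name, and the user name and password are examples:

```shell
#!/bin/sh
# Print an htpasswd-style "user:hash" line using OpenSSL's apr1 (Apache MD5) scheme.
# htpasswd_line is a hypothetical helper; the user name and password are examples.
htpasswd_line() {
    printf '%s:%s\n' "$1" "$(openssl passwd -apr1 "$2")"
}

if command -v openssl >/dev/null 2>&1; then
    htpasswd_line admin 'change-me'
fi
```

The printed line is what goes into the password file. The hash is salted, so it differs on every run.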


Synology: Monitoring Apache with mod_status

Here’s another quick manual. If you would like to monitor your Apache web server, you can do that with mod_status. Open the httpd.conf-user file:

pico /usr/syno/apache/conf/httpd.conf-user

Copy and paste the following content:

<Location /server-status>
   SetHandler server-status
   Order Deny,Allow
   Deny from all
   Allow from all
</Location>

Save the httpd.conf-user file and restart apache with:

/usr/syno/etc/rc.d/S97apache-user.sh restart

You can now obtain the Apache server status by requesting the /server-status URL on your NAS (the path configured above).


This might be useful for people working with Cacti.
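
For monitoring tools, mod_status also offers a machine-readable variant: append ?auto to the URL and you get plain "Key: value" lines. A minimal sketch for pulling out one field; status_field is a helper name I made up, and the sample data is invented for illustration:

```shell
#!/bin/sh
# Pull one "Key: value" field out of mod_status ?auto output read from stdin.
# status_field is a hypothetical helper name.
status_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Made-up sample in the ?auto format, for illustration only
sample='BusyWorkers: 3
IdleWorkers: 7'

printf '%s\n' "$sample" | status_field BusyWorkers
```

Against a live server the input would come from something like curl -s "http://<your-nas>/server-status?auto" instead of the sample.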


Synology: Run sabnzbd behind apache

Here are quick instructions for those who would like to run sabnzbd behind Apache on a Synology NAS. Open an SSH connection and create the following file:

nano /usr/syno/etc/sites-enabled-user/sabnzbd.conf

Copy and paste the contents below into this file. Please note that my sabnzbd port is 9090; if yours is different, change it in the config below.

# Put this after the other LoadModule directives
LoadModule proxy_module /usr/syno/apache/modules/mod_proxy.so
LoadModule proxy_http_module /usr/syno/apache/modules/mod_proxy_http.so

<Location /sabnzbd>
order deny,allow
deny from all
allow from all
ProxyPass http://localhost:9090/sabnzbd
ProxyPassReverse http://localhost:9090/sabnzbd
</Location>

Save the file and exit with Ctrl-X. Restart Apache with the following command:

/usr/syno/etc/rc.d/S97apache-user.sh restart

Restore / Extract Plesk 9.5.4 backup

If you want to do a manual restore of your Plesk 9.5.4 backup, use the following command:

cat plesk-backup_1205270308.tar* | tar xvf -

The domains, vhosts and databases are found in the clients, domains and resellers folders.
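
The cat works because the backup is one tar archive cut into pieces: joining the pieces in order gives back the original stream. Here’s a small self-contained sketch of that idea using a throwaway archive in a temporary directory (all file names are made up for the demo):

```shell
#!/bin/sh
# Demonstrate that a tar archive split into parts can be restored with cat | tar.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
echo "hello" > "$work/src/file.txt"

# Create an archive and cut it into 4 KB parts (backup.tar.aa, backup.tar.ab, ...)
tar cf "$work/backup.tar" -C "$work/src" file.txt
split -b 4096 "$work/backup.tar" "$work/backup.tar."

# Join the parts in order and extract from the combined stream
cat "$work"/backup.tar.* | tar xf - -C "$work/out"

restored=$(cat "$work/out/file.txt")
echo "$restored"
rm -rf "$work"
```

Swapping x for t in the tar options lists the contents without extracting, which is a handy check before a real restore.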


updatedb on Synology

If you would like to index your Synology file system, install the mlocate package from Optware with the following command:

ipkg install mlocate

To update the file index, use:

updatedb

To search the index, use the locate command.


Environment variable HADOOP_CMD must be set before loading package rhdfs

If you unexpectedly get the following message when using R on Ubuntu:

Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs

Then try adding the following line to the /etc/environment file:

HADOOP_CMD=<path to your hadoop executable>

This may solve the problem above!
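
When scripting this, it is worth making sure the line is not added twice. A minimal sketch; ensure_env_line is a hypothetical helper name, /usr/bin/hadoop is only a placeholder path (which hadoop shows the real one), and the demo writes to a scratch file instead of /etc/environment:

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# ensure_env_line is a hypothetical helper name.
ensure_env_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demo on a scratch file; /usr/bin/hadoop is a placeholder path
env_file=$(mktemp)
ensure_env_line "$env_file" 'HADOOP_CMD=/usr/bin/hadoop'
ensure_env_line "$env_file" 'HADOOP_CMD=/usr/bin/hadoop'
cat "$env_file"
```

Calling it twice still leaves a single line in the file. Changes to /etc/environment only take effect after logging in again.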


Tuning Mapreduce Jobs

If you want to tune MapReduce jobs, it helps to keep an eye on a number of parameters. The following parameters can be important:

mapred.tasktracker.map.tasks.maximum = The maximum number of map tasks that will be run simultaneously by a task tracker.
mapred.tasktracker.reduce.tasks.maximum = The maximum number of reduce tasks that will be run simultaneously by a task tracker.
mapred.reduce.tasks = The default number of reduce tasks per job.
mapred.map.tasks = The default number of map tasks per job. Ignored when mapred.job.tracker is “local”.
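
To get a feel for how these settings interact: the number of map tasks that can run at once in the whole cluster is the number of task trackers times mapred.tasktracker.map.tasks.maximum, and a job with more tasks than that runs in several waves. A rough sketch of the arithmetic; the cluster size and slot count are hypothetical example values, and the helper names are my own:

```shell
#!/bin/sh
# Total map slots in the cluster: task trackers times slots per tracker.
cluster_slots() {
    echo $(( $1 * $2 ))
}

# Number of waves needed to run all tasks, rounded up.
task_waves() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical cluster: 4 task trackers with 2 map slots each, job with 10 map tasks
slots=$(cluster_slots 4 2)
echo "map slots: $slots"
echo "waves for 10 map tasks: $(task_waves 10 "$slots")"
```

The same calculation applies to reduce tasks with mapred.tasktracker.reduce.tasks.maximum.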

These can all be configured in the mapred-site.xml file. Here is an example:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- In: conf/mapred-site.xml -->

For a quick test you can, for example, have Hadoop calculate pi:

hadoop jar /usr/local/hadoop/hadoop-examples-1.0.3.jar pi 10 10
hadoop dfs -rmr /user/hduser/PiEstimator_TMP_3_141592654

More information about tuning can be found here: Pro Hadoop Ch. 6