Hyperion System Monitoring
In my KSCOPE12 presentation, Planning for and Managing Hyperion Infrastructure, I gave a 100,000-foot view of monitoring.
In this article I bring monitoring down to a 20,000-foot view.
There are many products on the market to choose from for monitoring your Hyperion system(s).
In a nutshell, you need a solution with the following capabilities and characteristics:
- Few or no false alarms
- Ability to monitor: Database, Disk, Processor, Memory, and Network
- Ability to extend base product capabilities with custom monitoring
- Ability to test key parts of the system to prove it really is working
I narrowed the field to five products that I believe have one or more of the following qualities: easy to implement, inexpensive, or purpose-built for Hyperion. You may note that I have excluded solutions from BMC, IBM, HP, and CA, whose products do not exhibit these qualities.
When implementing any new software you should go through an evaluation cycle to understand how one tool compares to another in terms of capabilities and price.
The Problem
You understand Perl scripting and want a quick monitoring solution that is free, has no real infrastructure needs, and can be deployed in less than an hour.
The Solution
The Perl code below was a weekend project several months back. It has one feature (testing of Smart View logins) that you may wish to integrate with a more formal monitoring solution.
It is easy to implement and can run from your existing Hyperion deployment with no other software. It relies on Perl and works with the Perl versions available on the Hyperion 11.1.2 line.
Modify the configuration file to suit your environment. This means changing the server names to match your own, adding or removing services, adding or removing logs and error keywords, and setting your system outage times. Finally, schedule the script to run on an interval (usually every 5 minutes).
NOTE: The mail server configuration is hard-coded in the script so be sure to update that.
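Outage windows (the down_time entries in the configuration) are written as DAY_HHMM_HHMM, where DAY is a three-letter day name or ALL. As a rough sketch of how such a window can be evaluated, the snippet below uses a hypothetical helper, in_window, which is not part of the script. It takes the day of week and the time as arguments so it can be tried without waiting for the clock.

```perl
use strict;
use warnings;

# Map three-letter day names to localtime's weekday numbers (Sunday = 0).
my %day_number = (SUN => 0, MON => 1, TUE => 2, WED => 3, THU => 4, FRI => 5, SAT => 6);

# in_window is a hypothetical helper for illustration only.
# $spec    : window such as "SAT_2100_2330" or "ALL_0300_0430"
# $wday    : day of week, 0..6, as returned by localtime
# $hourmin : time of day as hour*100 + minute, e.g. 2145
sub in_window {
    my ($spec, $wday, $hourmin) = @_;
    my ($dow, $start, $end) = split /_/, $spec;
    return 0 unless defined $end;    # malformed window
    $dow = uc $dow;
    return 0 unless $dow eq 'ALL'
        || (exists $day_number{$dow} && $day_number{$dow} == $wday);
    return ($hourmin >= $start && $hourmin <= $end) ? 1 : 0;
}

# Evaluate a window against the current local time.
my @now     = localtime(time);
my $hourmin = $now[2] * 100 + $now[1];
print "In Saturday outage window\n" if in_window('SAT_2100_2330', $now[6], $hourmin);
```

Because the start and end are compared as plain numbers within a single day, a window such as 2300 to 0100 never matches, which is why down time cannot cross days.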
Sample config file:
#Current code has some windows specifics e.g. the \\ notation.
#UPDATE BELOW SAMPLE and change JAVA_APP_01 to your main java application server
#UPDATE BELOW SAMPLE and change ESS_APP_01 to be your Essbase
#TYPE,ID,TYPE_ATTRIB
#Down time cannot cross days.
down_time,ALL,SAT_2100_2330
down_time,ESB,ALL_0300_0430
credentials,user,password
email,OPS,notify1@myco.com
email,OPS,notify2@myco.com
#We can clean this up as we know which java.lang we really care about.
#You will probably ignore all java exceptions as there are a lot of "normal" ones
error_action,java.lang,notify
#ORA-01033: ORACLE initialization or shutdown in progress
#error_action,ORA-01033,notify
log,APSWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9aps-sysout.log
log,CALCWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9CALC-sysout.log
log,EASWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9eas-sysout.log
log,SSWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9FoundationServices-sysout.log
log,FRWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9FRReports-sysout.log
log,PLNWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9Planning-sysout.log
log,RAFWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9RaFramework-sysout.log
log,RAFAGENT,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9RaFrameworkAgentOut.log
log,WAWEB,\\JAVA_APP_01\c$\Oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\services\HyS9WebAnalysis-sysout.log
service,APSWEB,HyS9aps
service,CALCWEB,HyS9CALC
service,EASWEB,HyS9eas
service,EPMADS,HyS9EPMADataSynchronizer
service,EPMAWEB,HyS9EPMAWebTier
service,SSWEB,HyS9FoundationServices
service,ESB,opmn_EPM_epmsystem1
service,FRWEB,HyS9FRReports
service,PLNWEB,HyS9Planning
service,RAFWEB,HyS9RaFramework
service,RAFAGENT,HyS9RaFrameworkAgent
service,WAWEB,HyS9WebAnalysis
service,RMIREG,Hyperion RMI Registry
machine,ESB,ESS_APP_01
machine,RAFAGENT,JAVA_APP_01
machine,FRWEB,JAVA_APP_01
machine,PLNWEB,JAVA_APP_01
machine,EASWEB,JAVA_APP_01
machine,RMIREG,JAVA_APP_01
machine,APSWEB,JAVA_APP_01
machine,WAWEB,JAVA_APP_01
machine,SSWEB,JAVA_APP_01
machine,RAFWEB,JAVA_APP_01
port,ESB,1423
port,EPMADIM,5251
port,EPMAJNI,5255
port,RAFAGENT,6860
port,FRWEB,8200
port,PLNWEB,8300
port,CALCWEB,8500
port,EASWEB,10080
port,RMIREG,11333
port,APSWEB,13080
port,WAWEB,16000
port,EPMADS,19101
port,EPMAWEB,19091
port,SSWEB,28080
port,RAFWEB,45000
appcheck,RAFAGENT,port
appcheck,ESB,port
appcheck,RAFWEB,loginwks
appcheck,FRWEB,port
appcheck,PLNWEB,port
appcheck,CALCWEB,port
appcheck,EASWEB,port
appcheck,RMIREG,port
appcheck,APSWEB,port
appcheck,WAWEB,port
appcheck,EPMADS,port
appcheck,EPMAWEB,port
appcheck,SSWEB,port
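Every non-comment line of the configuration is a comma-separated triple: TYPE, ID, TYPE_ATTRIB. The script reads these with the Switch module, which newer Perl distributions may not bundle. As a minimal sketch of the same parsing using only core Perl, the snippet below uses a hypothetical helper, parse_config_lines, and a nested-hash layout of my own choosing; neither is part of the script.

```perl
use strict;
use warnings;

# parse_config_lines is a hypothetical helper for illustration only.
# It returns a reference to a hash keyed by TYPE and then ID. The "machine"
# and "email" types may repeat an ID, so they are kept as lists of
# [ID, TYPE_ATTRIB] pairs instead.
sub parse_config_lines {
    my @lines  = @_;
    my %config = (machine => [], email => []);
    foreach my $line (@lines) {
        $line =~ s/\s+$//;                       # drop newline and trailing blanks
        next if $line eq '' || $line =~ /^#/;    # skip blank lines and comments
        my ($type, $id, $attrib) = split /,/, $line, 3;
        next unless defined $attrib;             # ignore malformed lines
        if ($type eq 'machine' || $type eq 'email') {
            push @{ $config{$type} }, [ $id, $attrib ];
        }
        else {
            $config{$type}{$id} = $attrib;
        }
    }
    return \%config;
}

my $config = parse_config_lines(
    "#TYPE,ID,TYPE_ATTRIB\n",
    "down_time,ALL,SAT_2100_2330\n",
    "port,PLNWEB,8300\n",
    "machine,PLNWEB,JAVA_APP_01\n",
    "service,RMIREG,Hyperion RMI Registry\n",
);
print "PLNWEB listens on port $config->{port}{PLNWEB}\n";
```

Limiting split to three fields keeps any further commas inside TYPE_ATTRIB, and skipping blank lines avoids the "unknown type" message a stray empty line would otherwise produce.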
Perl script:
use strict;
use Switch;
use File::stat;
use IO::Socket;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTTP::Cookies;
use URI::Escape;
use Net::SMTP;
my $user_id;
my $password;
my @Machines=();
my @OPSEmailNotify=();
my %LoggedErrors;
my %ApplicationCheck;
my %Downtime;
my %ErrorAction;
my %Logs;
my %Ports;
my %Service;
#DB Outage window Saturdays from 9:00 PM - 11:30
my $service_message="";
my $service_timestamp="";
my %day_hash = ('SUN',0,'MON',1,'TUE',2,'WED',3,'THU',4,'FRI',5,'SAT',6);
my($filepath,$junk) = split(/\./,$0);
#Name the configuration file with same name as script. e.g. system_mon.pl and system_mon.cfg or mon.pl and mon.cfg
my $config_file=$filepath.".cfg";
print "Reading $config_file\n";
open (CONFIG_FILE,$config_file) || die "cannot open file $config_file\n";
my @lines = <CONFIG_FILE>;
close(CONFIG_FILE);
my $index;
my $type;
my $id;
my $type_attrib;
my $count = @lines;
print "Found $count lines in ".$config_file."\n";
for ($index = 0;$index< $count;$index++) {
if (!($lines[$index] =~m/^\#/)) {
($type,$id,$type_attrib)=split(/,/,$lines[$index]);
chomp $type_attrib;
switch ($type) {
case "error_action" {$ErrorAction{ $id } = $type_attrib;}
case "down_time" {$Downtime{ $id } = $type_attrib;}
case "log" {$Logs{ $id } = $type_attrib;}
case "service" {$Service{ $id } = $type_attrib;}
case "port" {$Ports{ $id } = $type_attrib;}
case "appcheck" {$ApplicationCheck{ $id } = $type_attrib;}
case "machine" {push(@Machines,"$id,$type_attrib");}
case "email" {if ($id eq "OPS") {push(@OPSEmailNotify,"$type_attrib");}}
case "credentials" { $user_id = $id; $password = $type_attrib; }
else { print "unknown type $type, ignoring\n"};
}
}
}
my $ERROR_FH;
my $error_file=$filepath.".err";
if (-f $error_file) {
open ($ERROR_FH,"<", $error_file) || die "cannot open file $error_file\n";
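while (<$ERROR_FH>) {
chomp $_;
#Each line holds three "_"-separated fields; the first may itself contain "_" (e.g. JAVA_APP_01), so match it greedily
my($ts,$app,$log)= m/^(.+)_([^_]+)_([^_]*)$/;
if (defined($app)) { $LoggedErrors{ $app.$ts } = $log; }
}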
close $ERROR_FH;
}
open ($ERROR_FH,">", $error_file) || die "cannot open file $error_file\n";
#Loop through down-time and perform checks if we are not in down_time window
if (downtime_check("ALL")) { print("Down time window, no system checks shall be ran.\n"); }
else {
application_check($ERROR_FH);
monitor_logs($ERROR_FH);
}
close $ERROR_FH;
sub application_check {
my $FH = shift @_;
my $error_message="";
my $ts=localtime;
foreach (@Machines) {
my($service_id,$machine_name)=split(/,/);
if (downtime_check($service_id)==0) {
switch ($ApplicationCheck{$service_id}) {
case "port" { if (port_check($machine_name,$service_id)) {printf $FH "%s_%s_%s\n",$machine_name,$Ports{$service_id},$ts;
if (!exists($LoggedErrors{$Ports{$service_id}.$machine_name})) { $error_message=$error_message."Machine Name: $machine_name Service:$Service{$service_id} Port:$Ports{$service_id} not available\n"}} }
case "loginwks" {
switch (logon_raf_smartview($machine_name,"IIS",$user_id,$password)) {
case "1" { printf $FH "%s_%s_%s\n",$machine_name,"loginwks",$ts; if (!exists($LoggedErrors{"loginwks".$machine_name})) {$error_message=$error_message."logon_raf_smartview: $machine_name Invalid User or Password passed to logon_raf_smartview!\n"; }}
case "2" { printf $FH "%s_%s_%s\n",$machine_name,"loginwks",$ts; if (!exists($LoggedErrors{"loginwks".$machine_name})) {$error_message=$error_message."logon_raf_smartview: $machine_name Service:".$Service{'SSWEB'}." FAILED\n";}}
case "3" { printf $FH "%s_%s_%s\n",$machine_name,"loginwks",$ts; if (!exists($LoggedErrors{"loginwks".$machine_name})) {$error_message=$error_message."logon_raf_smartview: $machine_name Service:".$Service{'RAFWEB'}." FAILED\n";}}
case "4" { printf $FH "%s_%s_%s\n",$machine_name,"loginwks",$ts; if (!exists($LoggedErrors{"loginwks".$machine_name})) {$error_message=$error_message."logon_raf_smartview: $machine_name Service:".$Service{'RAFAGENT'}." FAILED\n";}}
else {}
}
}
else { print "No application check found for type $service_id\n"};
}
}
}
if (length($error_message)>0) { email_alert("Hyperion Log Alert",$error_message,\@OPSEmailNotify);}
}
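sub restart_service {
my $computer_name=shift @_;
my $service_name=shift @_;
#Target the named machine and quote the service name, since some names contain spaces (e.g. Hyperion RMI Registry)
system("sc \\\\$computer_name stop \"$service_name\"");
sleep(30);
system("sc \\\\$computer_name start \"$service_name\"");
}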
sub port_check {
my $machine_name=shift @_;
my $service_id=shift @_;
my $sock = new IO::Socket::INET (
PeerAddr => $machine_name,
PeerPort => $Ports{$service_id},
Proto => 'tcp',
Timeout => '2',
);
if ($sock) { close($sock); return 0; }
else { return 1;}
}
sub monitor_logs {
my $FH = shift @_;
my $ts;
my $error_message="";
print "Monitor Logs\n";
while ( my ($app_id, $log_path) = each(%Logs) ) {
if (-f $log_path) {
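open (FILE, "<", $log_path) or die("Cannot open input file $log_path\n");
while (<FILE>) {
#Lines that start with "<" carry a timestamp in the first <...> field; remember the most recent one
if (substr($_, 0, 1) eq "<") { $service_message=$_; my $junk; ($ts,$junk)=split(/\>/);
($junk,$ts)=split(/\</,$ts); }
#Assumed logic: flag any line containing a keyword whose error_action is "notify"
foreach my $keyword (keys %ErrorAction) {
if ($ErrorAction{$keyword} eq "notify" && index($_,$keyword)>=0) {
printf $FH "%s_%s_%s\n",$ts,$app_id,$keyword;
if (!exists($LoggedErrors{$app_id.$ts})) { $error_message=$error_message."$app_id $ts $_"; }
}
}
}
close(FILE);
}
}
if (length($error_message)>0) { email_alert("Hyperion Log Alert",$error_message,\@OPSEmailNotify);}
}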
sub logon_raf_smartview {
my $SERVER=shift;
my $WEBSERVER=shift;
my $USER=shift;
my $PASSWORD=shift;
my $CLIENT_VERSION="4.2.0.0.0";
my $SERVER_PORT;
if ($WEBSERVER !~ m/IIS/) { $SERVER_PORT=$SERVER.":19000"; }
else {$SERVER_PORT=$SERVER;}
my $userAgent = LWP::UserAgent->new(agent => 'HttpApp/1.0');
# Store Cookies
$userAgent->cookie_jar(
HTTP::Cookies->new(
file => 'mycookies.txt',
autosave => 1
)
);
my $message = "<req_ConnectToProvider>".$CLIENT_VERSION."en_US";
my $response = $userAgent->request(POST 'http://'.$SERVER_PORT.'/workspace/SmartViewProviders',
Content_Type => 'text/xml',
Content => $message);
if (!$response->is_success || $response->as_string !~ m/Oracle Enterprise Performance Management System Workspace/) {
print("login_raf_smartview: Failed to receive workspace response from $SERVER_PORT, check Hyperion Foundation Services - Managed Server\n");
return 2;
}
my $message = "<req_GetProvisionedDataSources>";
my $response = $userAgent->request(POST 'http://'.$SERVER_PORT.'/workspace/SmartViewProviders',
Content_Type => 'text/xml',
Content => $message);
if (!$response->is_success || $response->as_string !~ m/User authentication needed/) {
print("login_raf_smartview: Failed to receive workspace authentication challenge from $SERVER_PORT, check Hyperion Foundation Services - Managed Server\n");
return 2;
}
my $message = "<req_GetProvisionedDataSources>".$USER."".$PASSWORD."";
my $response = $userAgent->request(POST 'http://'.$SERVER_PORT.'/workspace/SmartViewProviders',
Content_Type => 'text/xml',
Content => $message);
if ($response->is_success && $response->as_string =~ m/Invalid login/) {
print("login_raf_smartview: Invalid username or password passed to login_raf_smartview function in monitoring script\n");
return 1;
}
if (!$response->is_success || $response->as_string !~ m/\<sso\>/) {
print("login_raf_smartview: Failed to receive sso token from $SERVER_PORT, check Hyperion Foundation Services - Managed Server\n");
return 2;
}
my $sso_token = substr($response->as_string,index($response->as_string,"<sso>")+5,index($response->as_string,"</sso>")-index($response->as_string,"<sso>")-5);
$message="<req_GetProvisionedDataSources>".$sso_token."";
my $response = $userAgent->request(POST 'http://'.$SERVER_PORT.'/workspace/SmartViewProviders',
Content_Type => 'text/xml',
Content => $message);
if (!$response->is_success || $response->as_string !~ m/res_GetProvisionedDataSources/) {
print("login_raf_smartview: Failed to receive response to GetProvisionedDataSources request from $SERVER_PORT, check Hyperion Foundation Services - Managed Server\n");
return 2;
}
my $message = "rcp_version=1.4&sso_token=".uri_escape($sso_token)."&applicationtype=officeAddin&applicationversion=1.0.0&format=excel.2003&hycmnaddin18467=41&action=server";
my $response = $userAgent->request(POST 'http://'.$SERVER_PORT.'/raframework/browse/listXML',
Content_Type => 'application/x-www-form-urlencoded;charset=UTF-8',
Content => $message);
if (!$response->is_success && $response->as_string =~ m/Service Unavailable/) {
print("login_raf_smartview: Failed to connect to $SERVER_PORT. Check Hyperion Reporting and Analysis Framework Web Application\n");
return 3;
}
if ($response->is_success && $response->as_string =~ m/port 6800/) {
print("login_raf_smartview: Failed to connect. Server cannot connect to port 6800, check Hyperion Reporting and Analysis Framework\n");
return 4;
}
print("login_raf_smartview: passed for $SERVER_PORT\n");
return 0;
}
sub downtime_check {
my $service_id=shift @_;
my $system_downtime=0;
my ($seconds,$minute,$hour,$day,$month,$year,$wday,$yday,$isdst)=localtime(time);
my $hourmin=$hour*100 + $minute;
if (exists($Downtime{$service_id})) {
my($dow,$start_time,$end_time)=split(/_/,$Downtime{$service_id});
if ( (uc($dow) eq "ALL" || $day_hash{uc($dow)} == $wday)&& $hourmin>=$start_time && $hourmin<=$end_time) { return 1;} }
return 0; }
sub email_alert {
my $SUBJ=shift;
my $MESSAGE=shift;
my $NOTIFY_USER_ARRAY=shift;
my $MAIL_SERVER='smtpl.myco.com';
my $BATCH_USER='Hyperion_Mon@myco.com';
my $mailto;
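my $smtp = Net::SMTP->new($MAIL_SERVER);
#Net::SMTP->new returns undef when the server cannot be reached; skip the alert rather than crash
if (!$smtp) { print "email_alert: cannot connect to mail server $MAIL_SERVER\n"; return; }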
print $smtp->banner();
$smtp->mail($BATCH_USER);
print $smtp->code();
print $smtp->message();
$smtp->recipient(@$NOTIFY_USER_ARRAY);
print $smtp->code();
print $smtp->message();
$smtp->data();
foreach $mailto (@$NOTIFY_USER_ARRAY) {
print "Notifying $mailto \n";
$smtp->datasend("To: $mailto\n");
}
$smtp->datasend("Subject: $ENV{COMPUTERNAME} - $SUBJ\n\n");
$smtp->datasend("\n");
$smtp->datasend("$MESSAGE\n");
$smtp->dataend();
print $smtp->code();
print $smtp->message();
$smtp->quit;
}
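The port check in the script above is the simplest of its tests: a service counts as available if a TCP connection to its port succeeds within a short timeout. The sketch below isolates that idea so it can be tried on its own. The helper name port_is_open is hypothetical, and unlike port_check in the script it returns 1 when the port is reachable and 0 when it is not. To avoid needing a Hyperion server, it probes a throwaway listener opened on the local machine.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# port_is_open is a hypothetical helper for illustration only.
# Returns 1 when a TCP connection to $host:$port succeeds within
# $timeout seconds, otherwise 0.
sub port_is_open {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;
    close($sock);
    return 1;
}

# Create a local listener to probe. LocalPort 0 asks the operating
# system to pick any free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "cannot create test listener: $!\n";
my $port = $listener->sockport;

print "port $port open while listening: ", port_is_open('127.0.0.1', $port, 2), "\n";
close($listener);
print "port $port open after close: ", port_is_open('127.0.0.1', $port, 2), "\n";
```

A successful connection only shows that something is listening on the port. It does not prove the application behind it is healthy, which is why the script also offers the loginwks check for the Smart View login path.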



