Verifying the behavior of num_rules in jubatus anomaly (a tool for feeding data into jubaanomaly)

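Description

A tool that has jubaanomaly learn data and compute outlier scores.

The following options can be specified, in addition to the required positional argument name (the data name).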

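Option	Meaning
-t	Specifies the num_rules type (num, str, or log). Required, except with -d
-c	Clears existing data before learning. Optional; disabled by default
-s	Saves the learned data to a file. Optional; disabled by default
-d	Prints the list of IDs of the learned data. Optional; disabled by default
-o	Computes outlier scores only, without learning the data. Optional; disabled by default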

-d is mutually exclusive with -c, -s, -o, and -t
-s is mutually exclusive with -o
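
The exclusivity rules above can be sketched as a small standalone check. This is only an illustration of the rules, not the tool's own code path: the function name `check_option_conflicts` and its parameters are hypothetical, and it returns a message instead of exiting so that it is easy to try out.

```python
def check_option_conflicts(display=False, clear=False, save=False,
                           only_calc=False, rule_type=None):
    """Return an error message for an invalid option combination, else None.

    Hypothetical helper that mirrors the rules listed above.
    """
    valid_types = ('num', 'str', 'log')
    if display:
        # -d cannot be combined with -c, -s, -o, or -t.
        if clear or save or only_calc or rule_type is not None:
            return 'cannot use -d with -c, -s, -o, -t'
        return None
    if save and only_calc:
        # -s and -o are mutually exclusive.
        return 'cannot use -s with -o'
    if rule_type not in valid_types:
        # Without -d, -t is required and must be one of the known types.
        return 'invalid type, should be one of: ' + ', '.join(valid_types)
    return None


print(check_option_conflicts(display=True))
print(check_option_conflicts(display=True, clear=True))
print(check_option_conflicts(save=True, only_calc=True, rule_type='num'))
```

Returning the message rather than calling `exit` keeps the check separate from the command-line handling.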

anom_test.py

# -*- coding: utf-8 -*-

import signal
import argparse
import matplotlib.pyplot as plt

from jubatus.anomaly import client
from jubatus.common import Datum

parser = argparse.ArgumentParser(description='This is jubaanomaly test tool')
parser.add_argument('name',                                help='data name')
parser.add_argument('-t', '--type',      dest='type',      help='num_rules type (num, str, or log)')
parser.add_argument('-c', '--clear',     dest='clear',     help='clear existing data before learning', action="store_true", default=False)
parser.add_argument('-s', '--save',      dest='save',      help='save the learned data to a file',     action="store_true", default=False)
parser.add_argument('-d', '--display',   dest='display',   help='display the IDs of learned data',     action="store_true", default=False)
parser.add_argument('-o', '--only_calc', dest='only_calc', help='only calculate scores, do not learn', action="store_true", default=False)

args = parser.parse_args()


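def display_result(rdata, idata):
    """Print each input value with its score and plot the finite scores."""
    dpoint = 5  # number of decimal places kept for plotting
    result = dict()
    inf = float("inf")
    for value, score in zip(idata, rdata):
        print(value, score)
        # Skip infinite scores so that they do not end up in the plot.
        if score != inf:
            result[value] = round(score, dpoint)

    plt.plot(list(result.keys()), list(result.values()), 'o')
    plt.title(u'Outliers')
    plt.show()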


def connect_srv():
    """Create a client for the local jubaanomaly server; clear its data if -c is given."""
    try:
        anom = client.Anomaly("127.0.0.1", 9199, args.name)
        if args.clear:
            anom.clear()
    except Exception:
        err_fin('failed to connect to server')
    return anom


def study_data(anom):
    """Read one number per line from stdin and either learn it or score it."""
    idata = list()
    rdata = list()
    while True:
        try:
            line = float(input().strip())
            idata.append(line)
        except EOFError:
            break

        # Choose the datum key that corresponds to the requested num_rules type.
        datum = Datum()
        if args.type == 'num':
            d_type = 'd1'
        elif args.type == 'str':
            d_type = 'd2'
        elif args.type == 'log':
            d_type = 'd3'
        datum.add_number(d_type, line)
        if args.only_calc:
            # -o: compute the score without learning the datum.
            ret = anom.calc_score(datum)
        else:
            # Learn the datum and print the returned ID and score.
            ret = anom.add(datum)
            print(ret)
        rdata.append(ret)

    if args.save:
        anom.save(args.name)
    return rdata, idata


def do_exit(sig, stack):
    print('You pressed Ctrl+C.')
    print('Stop running the job.')
    exit(0)


def display_data(anom):
    """Print the IDs of all learned rows in numeric order."""
    try:
        data = anom.get_all_rows()
    except Exception:
        err_fin(args.name + ' failed to get data')
    if not data:
        print(args.name + ' has no data')
    else:
        for i in sorted(data, key=float):
            print(i)


def err_fin(mes):
    print('Error: ' + mes)
    exit(1)


def chk_args():
    # -d is mutually exclusive with -c, -s, -t, and -o
    type_rules = ['num', 'str', 'log']
    if args.display and (args.clear or args.save or args.type or args.only_calc):
        err_fin('cannot use -d with -c,-s,-o,-t')
    if args.save and args.only_calc:
        err_fin('cannot use -s with -o')
    if not args.display and args.type not in type_rules:
        err_fin('invalid rule_type. ' + 'should be ' + ', '.join(type_rules))


if __name__ == '__main__':
    signal.signal(signal.SIGINT, do_exit)
    chk_args()
    anom = connect_srv()
    if args.display:
        display_data(anom)
        exit()
    else:
        rdata, idata = study_data(anom)
        if args.only_calc:
            display_result(rdata, idata)
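
study_data sends each value under the key d1, d2, or d3 depending on -t, so the server configuration presumably assigns a different num_rules type to each of those keys. The configuration used for these tests is not shown here, so the following is only a sketch. The key-to-type mapping is inferred from the script, the rest of the jubaanomaly configuration is omitted, and the weights in the hypothetical `emulate_num_rule` reflect my understanding of the num, log, and str types in the Jubatus data conversion documentation and should be checked against it.

```python
import json
import math

# Assumed mapping, inferred from study_data: -t num -> d1, -t str -> d2, -t log -> d3.
KEY_FOR_TYPE = {'num': 'd1', 'str': 'd2', 'log': 'd3'}


def build_num_rules(key_for_type):
    """Build a num_rules list that assigns one type to each key, ordered by key."""
    pairs = sorted(key_for_type.items(), key=lambda item: item[1])
    return [{'key': key, 'type': rule_type} for rule_type, key in pairs]


def emulate_num_rule(rule_type, value):
    """Approximate the weight each type gives a value (assumed semantics)."""
    if rule_type == 'num':
        return value                      # the value itself is the weight
    if rule_type == 'log':
        return math.log(max(1.0, value))  # natural log; values of 1 or less give 0
    if rule_type == 'str':
        return 1.0                        # value treated as a label with weight 1
    raise ValueError('unknown num_rules type: ' + rule_type)


# Converter fragment only; the method and its parameters are not shown.
print(json.dumps({'num_rules': build_num_rules(KEY_FOR_TYPE)}, indent=2))
for rule_type in ('num', 'str', 'log'):
    print(rule_type, [round(emulate_num_rule(rule_type, v), 3) for v in (1.0, 5.0, 10.0)])
```

Under these assumptions, the three types turn the same input sequence into quite different feature weights, which is what the tool is meant to make observable.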

Usage examples

Clear the existing data, then learn (num_rules type is str; the learned data is not saved to a file)

Learn 10 data points

$ seq 1 10 | python ./anom_test.py -c -t str test
id_with_score{id: 220, score: inf}
id_with_score{id: 221, score: 1.0}
id_with_score{id: 222, score: 0.9926380515098572}
id_with_score{id: 223, score: 1.0}
id_with_score{id: 224, score: 1.0065279006958008}
id_with_score{id: 225, score: 1.010113000869751}
id_with_score{id: 226, score: 0.9977192282676697}
id_with_score{id: 227, score: 0.993184506893158}
id_with_score{id: 228, score: 0.9930071830749512}
id_with_score{id: 229, score: 0.9959399700164795}
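
Because each learned datum is printed in the form `id_with_score{id: ..., score: ...}`, the output can be post-processed with a short script. The sketch below parses such lines and drops non-finite scores, in the same spirit as display_result skipping inf. The function name `parse_add_output` is hypothetical, and the regular expression assumes exactly the format shown above.

```python
import math
import re

# Matches lines such as: id_with_score{id: 221, score: 1.0}
LINE_RE = re.compile(r'id_with_score\{id: ([^,]+), score: ([^}]+)\}')


def parse_add_output(lines):
    """Return (id, score) pairs for every line that has a finite score."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore lines that are not in the expected format
        score = float(match.group(2))  # float() also accepts 'inf'
        if math.isfinite(score):
            results.append((match.group(1), score))
    return results


sample = [
    'id_with_score{id: 220, score: inf}',
    'id_with_score{id: 221, score: 1.0}',
    'id_with_score{id: 222, score: 0.9926380515098572}',
]
for row_id, score in parse_add_output(sample):
    print(row_id, score)
```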

Check the learned data

$ python ./anom_test.py -d test
220
221
222
223
224
225
226
227
228
229
$

Compute outlier scores and plot the distribution

$ seq 1 10 | python ./anom_test.py -o -t str test
1.0 0.9999997615814209
2.0 0.9999999403953552
3.0 1.0
4.0 0.9999997615814209
5.0 0.9999998211860657
6.0 1.0
7.0 1.0
8.0 0.9999999403953552
9.0 1.0
10.0 1.0
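
With -o, each output line holds the input value followed by its score. All ten scores above are within about 3e-7 of 1.0. A small sketch for reading such output and listing the values whose score deviates from 1.0 by more than a chosen tolerance follows. The function name `find_deviating` and the tolerance values are hypothetical, and treating 1.0 as the reference score assumes an LOF-style method on the server, whose configuration is not shown here.

```python
def find_deviating(lines, tolerance=0.01):
    """Return (value, score) pairs whose score differs from 1.0 by more than tolerance."""
    flagged = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not a "value score" pair
        value, score = float(parts[0]), float(parts[1])
        if abs(score - 1.0) > tolerance:
            flagged.append((value, score))
    return flagged


# First three lines of the output above.
sample = [
    '1.0 0.9999997615814209',
    '2.0 0.9999999403953552',
    '3.0 1.0',
]
print(find_deviating(sample))                  # default tolerance
print(find_deviating(sample, tolerance=1e-7))  # much stricter tolerance
```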