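Thursday, July 5, 2012

Monitoring Virtual Machines in OpenStack

Reposted from http://livemoon.dyndns.org/life/2011/12/openstack-1.html

All the code in this post can be downloaded from my GitHub, where it is kept up to date: https://github.com/livemoon/openstack/tree/master/libvirt_code
Virtualization uses KVM, with libvirt as the C API.
Basic idea: each host runs the program and collects data; a separate machine acts as the server that gathers the data from every host and analyzes it.
Program walkthrough:
First we need to open a connection to the hypervisor, which requires a virConnectPtr pointer.
virConnectOpenReadOnly(const char *) returns exactly such a pointer. An initialization routine looks like this: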
void conn_init(char *ip, virConnectPtr *conn) {
    char p[64]; /* holds the connection URI */

    *conn = NULL; /* the hypervisor connection */

    /* Build the URI, e.g. "qemu+ssh://10.0.0.1/system" */
    snprintf(p, sizeof(p), "qemu+ssh://%s/system", ip);

    *conn = virConnectOpenReadOnly(p);
    if (*conn == NULL) {
        fprintf(stderr, "Failed to connect to hypervisor\n");
    }
}
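The second parameter is a pointer to a virConnectPtr variable. Here p holds a string such as "qemu+ssh://10.0.0.1/system", where 10.0.0.1 is the IP of your host.
The function that closes the connection: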
void conn_close(virConnectPtr *conn) {
    if (*conn != NULL)
        virConnectClose(*conn);
}
Now that we have a connection to the host's hypervisor, we can use it to query the virtual machines running on that host.
void list_id_domain(virConnectPtr conn) {

    int ids[10];
    int maxids=10;
    int num, i;

    num = virConnectListDomains(conn, ids, maxids);

    for(i = 0;i < num;i++) {
        printf("%d\n",ids[i]);
    }
}
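This function takes the conn pointer obtained above and lists the IDs of the instances running on the host (up to maxids, 10 here).
With an ID we can fetch the details of each instance. Suppose I have a virtual machine instance whose ID is 7: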
virDomainPtr dom = NULL;
dom = virDomainLookupByID(conn, 7);
dom is the variable we will keep using from here on. To release it:
virDomainFree(dom);
CPU monitoring:
void list_info_domain(virDomainPtr domain) {

    virDomainInfo info;
    int interval = 2;
    struct timeval startTime;
    struct timeval endTime;
    int realTime;
    int cpuTime;
    double cpuUsage;

    virDomainGetInfo(domain, &info);

    unsigned long long startCpuTime = info.cpuTime;
    if (gettimeofday(&startTime, NULL) == -1) {
        printf("Failed to get start time\n");
    }
    sleep(interval);
    virDomainGetInfo(domain, &info);
    unsigned long long endCpuTime = info.cpuTime;
    if (gettimeofday(&endTime, NULL) == -1) {
        printf("Failed to get end time\n");
    }

    cpuTime = (endCpuTime - startCpuTime)/1000;
    realTime = 1000000 * (endTime.tv_sec - startTime.tv_sec) + (endTime.tv_usec - startTime.tv_usec);
    cpuUsage = cpuTime / (double)(realTime);

    printf("\t\tstate is %d\n", info.state);
    printf("\t\tvCPU is %d\n", info.nrVirtCpu);
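    printf("\t\tMAXmemory is %lu\n", info.maxMem/1024);   /* maxMem is unsigned long */
    printf("\t\tmemory is %lu\n", info.memory/1024);      /* memory is unsigned long */
    printf("\t\tcpuUsage is %.2f%%\n", cpuUsage*100);     /* %% prints a literal percent sign */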

}
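A few words on the program. virDomainGetInfo takes the domain we obtained earlier plus a virDomainInfo struct to fill in, which holds the number of vCPUs, the CPU time consumed, and the memory allocation. We sample info twice, 2 seconds apart, and take the difference of info.cpuTime (the CPU time the domain has used, in nanoseconds). Dividing that difference by the wall-clock time that actually elapsed, measured with gettimeofday, gives an approximate percentage of how much CPU this instance used during the interval. The ratio is relative to a single host core; to express it as a share of the whole host's CPU, also divide by the number of host cores, as in the virt-top FAQ quoted later in this post.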
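The calculation described above can be pulled out into a small helper that does not depend on libvirt, which makes the unit conversions easier to check. This is only a sketch: the function names elapsed_usec and cpu_usage_percent are hypothetical and do not come from the original code, and the result is relative to one host core, as in the listing above.

```c
#include <sys/time.h>

/* Wall-clock time between two gettimeofday() samples, in microseconds. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return 1000000LL * (end->tv_sec - start->tv_sec)
           + (end->tv_usec - start->tv_usec);
}

/* Percentage of one host core used by a domain between two samples.
 * cpu_start_ns and cpu_end_ns are two readings of virDomainInfo.cpuTime
 * (nanoseconds). Returns -1.0 if no time elapsed or the counter went
 * backwards (hypothetical error convention for this sketch). */
double cpu_usage_percent(unsigned long long cpu_start_ns,
                         unsigned long long cpu_end_ns,
                         const struct timeval *start,
                         const struct timeval *end)
{
    long long real_usec = elapsed_usec(start, end);
    double cpu_usec;

    if (real_usec <= 0 || cpu_end_ns < cpu_start_ns)
        return -1.0;

    /* nanoseconds -> microseconds, so both sides use the same unit */
    cpu_usec = (double)(cpu_end_ns - cpu_start_ns) / 1000.0;
    return 100.0 * cpu_usec / (double)real_usec;
}
```

Keeping the CPU time in a double avoids the narrowing to int that the original listing performs, which may matter for long intervals or domains with many vCPUs.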
Disk monitoring:
void list_disk_domain(virDomainPtr domain) {

    virDomainBlockStatsStruct stats;

    size_t size;
    const char *disk = "vda";
    size = sizeof(stats);
    int interval = 2;

    virDomainBlockStats(domain, disk, &stats, size);
    long long start_rd_bytes = stats.rd_bytes;
    long long start_wr_bytes = stats.wr_bytes;
    sleep(interval);
    virDomainBlockStats(domain, disk, &stats, size);
    long long end_rd_bytes = stats.rd_bytes;
    long long end_wr_bytes = stats.wr_bytes;
    
    long rd_bytes = end_rd_bytes - start_rd_bytes;
    long wr_bytes = end_wr_bytes - start_wr_bytes;
    int rd_usage = rd_bytes/interval;
    int wr_usage= wr_bytes/interval;
// printf("%s:\n", virDomainGetName(domain));
    printf("\t\tread: %dbytes/s\n", rd_usage);
    printf("\t\twrite: %dbytes/s\n", wr_usage);
    printf("\t\trd_req: %lld\n", stats.rd_req);
    printf("\t\trd_bytes: %lld\n", stats.rd_bytes);
    printf("\t\twr_req: %lld\n", stats.wr_req);
    printf("\t\twr_bytes: %lld\n", stats.wr_bytes);
}
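Disk usage is measured the same way as CPU. The function used here is virDomainBlockStats(domain, disk, &stats, size). The disk string is "vda" in this example; the actual device name depends on what your domain XML says.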
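The "sample twice, divide by the interval" step is shared by the disk and network code, so it can be written once. The helper below is a minimal sketch with a hypothetical name (bytes_per_second); it keeps the result in a long long rather than an int and treats a counter that went backwards (for example after the domain was restarted) as an error.

```c
/* Average rate in bytes per second between two readings of a byte counter
 * that normally only increases (e.g. rd_bytes, wr_bytes, rx_bytes).
 * Returns -1 if the interval is not positive or the counter decreased
 * (hypothetical error convention for this sketch). */
long long bytes_per_second(long long start_bytes, long long end_bytes,
                           double interval_sec)
{
    if (interval_sec <= 0.0 || end_bytes < start_bytes)
        return -1;
    return (long long)((double)(end_bytes - start_bytes) / interval_sec);
}
```

With the variables from the listing above, the read rate would be bytes_per_second(start_rd_bytes, end_rd_bytes, interval), printed with %lld.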
For the network part we use libvirt's Network Filters. An OpenStack instance's libvirt.xml is stored in the instance directory defined in nova.conf, and it contains:
...
[XML snippet not preserved; it contained the interface's filterref element]
...
The filterref pulls in other filters, which by default live under /etc/libvirt/nwfilter.
You can also inspect a specific filter with the virsh management tool:
# virsh nwfilter-dumpxml nova-instance-instance-00000007-02163e23f37d
Filtering chains
Filtering chains are the many filter files you see in that directory, for example arp, dhcp, and mac.
Using libvirt in the program:
int list_network_domain(virDomainPtr domain) {

    const char *path;
    virDomainInterfaceStatsStruct stats;
    size_t size;

    size = sizeof(stats);
    path = "vnet1";
    int interval = 2;

    if( virDomainInterfaceStats(domain, path, &stats, size) )
        return FALSE;
    long long start_rx_bytes = stats.rx_bytes;
    long long start_tx_bytes = stats.tx_bytes;

    sleep(interval);

    if( virDomainInterfaceStats(domain, path, &stats, size) )
        return FALSE;
    long long end_rx_bytes = stats.rx_bytes;
    long long end_tx_bytes = stats.tx_bytes;

    int rx_usage = (end_rx_bytes - start_rx_bytes)/interval;
    int tx_usage = (end_tx_bytes - start_tx_bytes)/interval;

    printf("\t\trx usage: %d bytes/s", rx_usage);
    printf("\trx bytes: %lld bytes", stats.rx_bytes);
    printf("\t\trx packets: %lld", stats.rx_packets);
    printf("\trx errs: %lld\n", stats.rx_errs);
    printf("\t\ttx usage: %d bytes/s", tx_usage);
    printf("\ttx bytes: %lld bytes", stats.tx_bytes);
    printf("\t\ttx packets: %lld", stats.tx_packets);
    printf("\ttx errs: %lld\n", stats.tx_errs);
    return TRUE; /* success; TRUE/FALSE are assumed to be defined elsewhere in the program */
}

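This function is important: the struct that stats points to is filled with the statistics of one of the domain's network interfaces. The catch is the value of path. It is the name of the domain's NIC as seen from the host, not eth0 or em0; you have to read it from the interface section of the domain XML, where it appears as a value such as vnet0 (in libvirt's domain XML this is the dev attribute of the target element). Likewise, the "vda" used for the disk earlier comes from the same XML.
So you need to run the following code to dump that XML: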
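    char *xmldesc;
    FILE *fp;

    xmldesc = virDomainGetXMLDesc(dom, 0);
    if (xmldesc == NULL) {
        printf("Failed to get the XML description\n");
    } else if ((fp = fopen(virDomainGetName(dom), "w")) == NULL) {
        /* The output file is named after the domain */
        printf("Cannot open file %s\n", virDomainGetName(dom));
    } else {
        fputs(xmldesc, fp); /* fputs, so the XML is never treated as a format string */
        fclose(fp);
    }
    free(xmldesc);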
The call returns a char pointer to the XML text; remember to free that pointer once you are done with it.
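Instead of reading the dumped file by hand, the device name can be pulled out of the XML string directly. The sketch below uses plain string search and assumes the XML looks the way libvirt normally prints it, with single-quoted attributes and dev as the first attribute of target (`<target dev='vnet0'/>`); the function name first_target_dev is hypothetical. A real XML parser such as libxml2 would be more robust.

```c
#include <stdio.h>
#include <string.h>

/* Find the first <section ...> element in xml (e.g. "interface" or "disk")
 * and copy the value of its <target dev='...'> attribute into out.
 * Returns 0 on success, -1 if nothing was found or out is too small. */
int first_target_dev(const char *xml, const char *section,
                     char *out, size_t out_len)
{
    static const char target_prefix[] = "<target dev='";
    char open_tag[64], close_tag[64];
    const char *sec, *sec_end, *target, *value, *end;
    size_t len;

    snprintf(open_tag, sizeof(open_tag), "<%s", section);
    snprintf(close_tag, sizeof(close_tag), "</%s>", section);

    sec = strstr(xml, open_tag);
    if (sec == NULL)
        return -1;

    /* Only accept a target that lies inside this element. */
    sec_end = strstr(sec, close_tag);
    target = strstr(sec, target_prefix);
    if (target == NULL || (sec_end != NULL && target > sec_end))
        return -1;

    value = target + strlen(target_prefix);
    end = strchr(value, '\'');
    if (end == NULL)
        return -1;

    len = (size_t)(end - value);
    if (len == 0 || len >= out_len)
        return -1;

    memcpy(out, value, len);
    out[len] = '\0';
    return 0;
}
```

Called as first_target_dev(xmldesc, "interface", name, sizeof(name)), this would give the string to pass as path to virDomainInterfaceStats; with "disk" it would give the name to pass to virDomainBlockStats.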

How do I calculate %CPU in my own libvirt programs?

Virt-top FAQ ( http://people.redhat.com/~rjones/virt-top/faq.html#calccpu )

Simple %CPU usage for a domain is calculated by sampling virDomainGetInfo periodically and looking at the virDomainInfo cpuTime field. This 64 bit field counts nanoseconds of CPU time used by the domain since the domain booted.
Let t be the number of seconds between samples. (Make sure that t is measured as accurately as possible, using something like gettimeofday(2) to measure the real sampling interval).
Let cpu_time_diff be the change in cpuTime over this time, which is the number of nanoseconds of CPU time used by the domain, ie:
cpu_time_diff = cpuTime(now) - cpuTime(t seconds ago)
Let nr_cores be the number of processors (cores) on the system. Use virNodeGetInfo to get this.
Then, %CPU used by the domain is:
%CPU = 100 × cpu_time_diff / (t × nr_cores × 10^9)
Because sampling doesn't happen instantaneously, this can be greater than 100%. This is particularly a problem where you have many domains and you have to make a virDomainGetInfo call for each one.
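The FAQ formula translates almost directly into C. The function below is a sketch with a hypothetical name (percent_cpu); its parameters mirror the FAQ's symbols, with cpu_time_diff in nanoseconds, t in seconds, and nr_cores taken from the node information.

```c
/* %CPU = 100 * cpu_time_diff / (t * nr_cores * 10^9)
 * cpu_time_diff_ns: change in virDomainInfo.cpuTime between two samples
 * t_seconds:        measured wall-clock time between the samples
 * nr_cores:         number of host cores
 * Returns -1.0 for a non-positive interval or zero cores
 * (hypothetical error convention for this sketch). */
double percent_cpu(unsigned long long cpu_time_diff_ns,
                   double t_seconds, unsigned int nr_cores)
{
    if (t_seconds <= 0.0 || nr_cores == 0)
        return -1.0;
    return 100.0 * (double)cpu_time_diff_ns
           / (t_seconds * (double)nr_cores * 1e9);
}
```

For example, a domain that consumed 2 seconds of CPU time over a 2 second interval on a 4-core host comes out at 25%.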

php + sudo + iptables

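Up through CentOS 5.x it was easy to control iptables from PHP, but after upgrading the system to 6.2 the iptables command often ends up as a zombie process. After a lot of Googling I found an article that may be the solution ( http://stackoverflow.com/questions/8387077/starting-a-daemon-from-php ).
The fix is as follows: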

Try appending > /dev/null 2>&1 & to the command.
For example:
exec("sudo /etc/init.d/daemonToStart > /dev/null 2>&1 &");