Jul 9, 2020

[unix][programming] EINTR and What It Is Good For (http://250bpm.com/blog:12)

Reference:
EINTR and What It Is Good For, by Martin Sústrik (zeromq)


Before diving in: this topic is covered in Richard Stevens's UNIX Network Programming, Ch. 20.5, so Martin Sústrik's blog post can be read as a recap of the EINTR error.

Rule of thumb:

When handling an EINTR error, first check any conditions that may have been altered by signal handlers, then restart the blocking function.

Additionally, if you are implementing a blocking function yourself, take care to return EINTR when a signal arrives.

Beware of the two POSIX functions that don't honor EINTR.


Consider this code:
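#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>

volatile sig_atomic_t stop = 0;  // only type guaranteed safe to write from a handler

void handler (int signo)
{
    stop = 1;
}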

void event_loop (int sock)
{
    signal (SIGINT, handler);

    while (1) {
        if (stop) {  // never hit if recv is blocked
            printf ("do cleanup\n");
            return;
        }
        char buf [1];
        recv (sock, buf, 1, 0);  // blocking call
        printf ("perform an action\n");
    }
}

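The situation above is the reason POSIX has the EINTR error: a signal that arrives while recv is blocked must be able to interrupt the call so the flag can be checked.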

Modify the code as shown below.
Note that on some operating systems, making blocking functions like recv return EINTR requires using sigaction() with SA_RESTART left unset instead of signal().
#include <errno.h>

volatile sig_atomic_t stop = 0;

void handler (int signo)
{
    stop = 1;
}

void event_loop (int sock)
{
    signal (SIGINT, handler);

    while (1) {
        if (stop) {
            printf ("do cleanup\n");
            return;
        }
        char buf [1];
        int rc = recv (sock, buf, 1, 0);
        if (rc == -1 && errno == EINTR)  // if interrupted by signal, continue while loop
            continue;
        printf ("perform an action\n");
    }
}
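
The note about sigaction() can be made concrete with a minimal sketch. The helper name install_sigint_handler is made up for illustration; the point is only that SA_RESTART is left out of sa_flags, so an interrupted recv fails with EINTR instead of being restarted by the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t stop = 0;

void handler(int signo)
{
    (void)signo;
    stop = 1;
}

/* Hypothetical helper: install `handler` for SIGINT with SA_RESTART
   cleared. Returns 0 on success, -1 on error (errno set by sigaction). */
int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* SA_RESTART deliberately not set */
    return sigaction(SIGINT, &sa, NULL);
}
```

Calling install_sigint_handler() in place of signal (SIGINT, handler) in event_loop should give the EINTR behaviour regardless of how the platform's signal() is implemented.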


When you press Ctrl+C, the program now performs the clean-up and exits.
This is still not a graceful shutdown, though: incoming messages should be drained before exiting.

The moral of this story is that the common advice to just restart the blocking function when EINTR is returned doesn't quite work:
volatile sig_atomic_t stop = 0;

void handler (int signo)
{
    stop = 1;
}

void event_loop (int sock)
{
    signal (SIGINT, handler);

    while (1) {
        if (stop) {
            printf ("do cleanup\n");
            return;
        }
        char buf [1];
        while (1) {
            // even if signaled (stop == 1), with no more incoming data we are stuck here
            int rc = recv (sock, buf, 1, 0);
            // if interrupted by a signal, retry recv; otherwise break out of the inner loop
            if (rc == -1 && errno == EINTR) 
                continue;
            break;
        }
        printf ("perform an action\n");
    }
}
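
One way to repair the retry loop is to follow the rule of thumb: on EINTR, re-check the flag before restarting. The sketch below assumes the same stop flag as above; recv_or_stop is a hypothetical helper name, not something from the original post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

volatile sig_atomic_t stop = 0;  /* set to 1 by the signal handler */

/* Hypothetical helper: like recv(), but on EINTR it re-checks `stop`
   before restarting. Returns recv()'s result, or -1 with errno == EINTR
   if the call was interrupted and the handler asked us to stop. */
ssize_t recv_or_stop(int sock, void *buf, size_t len)
{
    for (;;) {
        ssize_t rc = recv(sock, buf, len, 0);
        if (rc == -1 && errno == EINTR) {
            if (stop)
                return -1;   /* errno is still EINTR */
            continue;        /* unrelated signal: restart recv */
        }
        return rc;           /* data, end of stream, or a real error */
    }
}
```

The caller treats a -1/EINTR result as "stop requested" and falls through to the clean-up branch. This still leaves open the window between checking stop and entering recv.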


Even EINTR is not completely watertight. Consider this code:
volatile sig_atomic_t stop = 0;

void handler (int signo)
{
    stop = 1;
}

void event_loop (int sock)
{
    signal (SIGINT, handler);

    while (1) {
        if (stop) {
            printf ("do cleanup\n");
            return;
        }

        /*  What if the signal handler is executed at this point? */
        /*  Pressing Ctrl+C a second time sorts the problem out. */

        char buf [1];
        // even with stop == 1, if no more data arrives we are stuck here
        int rc = recv (sock, buf, 1, 0);
        if (rc == -1 && errno == EINTR)
            continue;
        printf ("perform an action\n");
    }
}



Ultimate solution

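Use pselect: block the signal before checking the flag, and pass pselect a signal mask that unblocks it only for the duration of the call. The mask is swapped atomically, so the signal can be delivered only while pselect is waiting, in which case pselect returns with EINTR.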
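
Applied to the event loop from the earlier examples, the idea might look like the sketch below. It is an illustration under assumptions, not the code from the original post: the loop keeps SIGINT blocked everywhere except inside pselect, and it returns 0 after a clean stop or -1 on error, which the original void function did not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

volatile sig_atomic_t stop = 0;

void handler(int signo)
{
    (void)signo;
    stop = 1;
}

int event_loop(int sock)
{
    struct sigaction sa;
    sigset_t block, orig, waitmask;
    int rc = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;

    /* Block SIGINT so it can only be delivered inside pselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;
    waitmask = orig;
    sigdelset(&waitmask, SIGINT);  /* mask in effect while waiting */

    for (;;) {
        fd_set rset;
        char buf[1];

        if (stop) {
            printf("do cleanup\n");
            rc = 0;
            break;
        }
        /* A SIGINT raised here stays pending until pselect() unblocks it. */
        FD_ZERO(&rset);
        FD_SET(sock, &rset);
        if (pselect(sock + 1, &rset, NULL, NULL, NULL, &waitmask) == -1) {
            if (errno == EINTR)
                continue;          /* go back and re-check stop */
            break;
        }
        if (recv(sock, buf, 1, 0) <= 0)
            break;                 /* peer closed or error */
        printf("perform an action\n");
    }
    sigprocmask(SIG_SETMASK, &orig, NULL);  /* restore the caller's mask */
    return rc;
}
```

Because the signal is blocked between the stop check and the wait, a Ctrl+C arriving in that window is no longer lost: it is delivered as soon as pselect installs waitmask.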

select
int select(int nfds,
                    fd_set *readfds,
                    fd_set *writefds,
                    fd_set *exceptfds, 
                    struct timeval *timeout);
                    

nfds should be the highest-numbered descriptor plus one (an exclusive upper bound); this lets the kernel limit its linear scan of the descriptor sets.

Be sure to check the conditions under which a descriptor counts as ready, especially for network descriptors.

Notice that when an error occurs on a socket, select marks it as both readable and writable.

Although the timeval structure lets us specify a resolution in microseconds, the actual resolution supported by the kernel is often more coarse.
Many Unix kernels round the timeout value up to a multiple of 10ms. There is also a scheduling latency involved, meaning it takes some time after the timer expires before the kernel schedules this process to run.


void FD_ZERO(fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
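
As a small illustration of how select and the descriptor-set macros fit together, here is a sketch of a helper that waits for a single descriptor to become readable. The name wait_readable and the millisecond timeout parameter are assumptions made for this example, and fd is assumed to be below FD_SETSIZE.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error (errno set,
   possibly to EINTR). */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int n;

    FD_ZERO(&rset);       /* start from an empty set */
    FD_SET(fd, &rset);    /* watch fd for readability */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rset, NULL, NULL, &tv);  /* nfds = highest fd + 1 */
    if (n <= 0)
        return n;         /* 0 = timeout, -1 = error */
    return FD_ISSET(fd, &rset) ? 1 : 0;
}
```

Since select modifies the sets it is given, the set is rebuilt on every call rather than reused.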




pselect
 int pselect(
            int nfds,
            fd_set *restrict readfds,
            fd_set *restrict writefds,
            fd_set *restrict errorfds,
            const struct timespec *restrict timeout,
            const sigset_t *restrict sigmask);

example:
// https://github.com/k84d/unpv13e/blob/master/bcast/dgclibcast4.c
#include "unp.h"

static void recvfrom_alarm(int);

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
 int    n;
 const int  on = 1;
 char   sendline[MAXLINE], recvline[MAXLINE + 1];
 fd_set   rset;
 sigset_t  sigset_alrm, sigset_empty;
 socklen_t  len;
 struct sockaddr *preply_addr;
 
 preply_addr = Malloc(servlen);

 Setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

 FD_ZERO(&rset);

 Sigemptyset(&sigset_empty);
 Sigemptyset(&sigset_alrm);
 Sigaddset(&sigset_alrm, SIGALRM);

 Signal(SIGALRM, recvfrom_alarm);

 while (Fgets(sendline, MAXLINE, fp) != NULL) {
  Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

  Sigprocmask(SIG_BLOCK, &sigset_alrm, NULL);
  alarm(5);
  for ( ; ; ) {
   FD_SET(sockfd, &rset);
   n = pselect(sockfd+1, &rset, NULL, NULL, NULL, &sigset_empty);
   if (n < 0) {
    if (errno == EINTR)
     break;
    else
     err_sys("pselect error");
   } else if (n != 1)
    err_sys("pselect error: returned %d", n);

   len = servlen;
   n = Recvfrom(sockfd, recvline, MAXLINE, 0, preply_addr, &len);
   recvline[n] = 0; /* null terminate */
   printf("from %s: %s",
     Sock_ntop_host(preply_addr, len), recvline);
  }
 }
 free(preply_addr);
}

static void
recvfrom_alarm(int signo)
{
 return;  /* just interrupt the recvfrom() */
}


ppoll
int ppoll(
        struct pollfd *fds,
        nfds_t nfds,
        const struct timespec *tmo_p,
        const sigset_t *sigmask);

