Diffstat (limited to 'sys-utils/ipc.texi')
-rw-r--r-- | sys-utils/ipc.texi | 1310 |
1 files changed, 1310 insertions, 0 deletions
diff --git a/sys-utils/ipc.texi b/sys-utils/ipc.texi new file mode 100644 index 000000000..cd9127167 --- /dev/null +++ b/sys-utils/ipc.texi @@ -0,0 +1,1310 @@ +\input texinfo @c -*-texinfo-*- +@comment %**start of header (This is for running Texinfo on a region.) +@setfilename ipc.info +@settitle Inter Process Communication. +@setchapternewpage odd +@comment %**end of header (This is for running Texinfo on a region.) + +@ifinfo +This file documents the System V style inter process communication +primitives available under linux. + +Copyright @copyright{} 1992 krishna balasubramanian + +Permission is granted to use this material and the accompanying +programs within the terms of the GNU GPL. +@end ifinfo + +@titlepage +@sp 10 +@center @titlefont{System V Inter Process Communication} +@sp 2 +@center krishna balasubramanian, + +@comment The following two commands start the copyright page. +@page +@vskip 0pt plus 1filll +Copyright @copyright{} 1992 krishna balasubramanian + +Permission is granted to use this material and the accompanying +programs within the terms of the GNU GPL. +@end titlepage + + + +@node top, Overview, Notes, (dir) +@chapter System V IPC. + +These facilities are provided to maintain compatibility with +programs developed on system V unix systems and others +that rely on these system V mechanisms to accomplish inter +process communication (IPC).@refill + +The specifics described here are applicable to the Linux implementation. +Other implementations may do things slightly differently. + +@menu +* Overview:: What is system V ipc? Overall mechanisms. +* Messages:: System calls for message passing. +* Semaphores:: System calls for semaphores. +* Shared Memory:: System calls for shared memory access. +* Notes:: Miscellaneous notes. +@end menu + +@node Overview, example, top, top +@section Overview + +@noindent System V IPC consists of three mechanisms: + +@itemize @bullet +@item +Messages : exchange messages with any process or server. 
+@item +Semaphores : allow unrelated processes to synchronize execution. +@item +Shared memory : allow unrelated processes to share memory. +@end itemize + +@menu +* example:: Using shared memory. +* perms:: Description of access permissions. +* syscalls:: Overview of ipc system calls. +@end menu + +Access to all resources is permitted on the basis of permissions +set up when the resource was created.@refill + +A resource here consists of message queue, a semaphore set (array) +or a shared memory segment.@refill + +A resource must first be allocated by a creator before it is used. +The creator can assign a different owner. After use the resource +must be explicitly destroyed by the creator or owner.@refill + +A resource is identified by a numeric @var{id}. Typically a creator +defines a @var{key} that may be used to access the resource. The user +process may then use this @var{key} in the @dfn{get} system call to obtain +the @var{id} for the corresponding resource. This @var{id} is then used for +all further access. A library call @dfn{ftok} is provided to translate +pathnames or strings to numeric keys.@refill + +There are system and implementation defined limits on the number and +sizes of resources of any given type. Some of these are imposed by the +implementation and others by the system administrator +when configuring the kernel (@xref{msglimits}, @xref{semlimits}, +@xref{shmlimits}).@refill + +There is an @code{msqid_ds}, @code{semid_ds} or @code{shmid_ds} struct +associated with each message queue, semaphore array or shared segment. +Each ipc resource has an associated @code{ipc_perm} struct which defines +the creator, owner, access perms ..etc.., for the resource. +These structures are detailed in the following sections.@refill + + + +@node example, perms, Overview, Overview +@section example + +Here is a code fragment with pointers on how to use shared memory. 
The +same methods are applicable to other resources.@refill + +In a typical access sequence the creator allocates a new instance +of the resource with the @code{get} system call using the IPC_CREAT +flag.@refill + +@noindent creator process:@* + +@example +#include <sys/shm.h> +int id; +key_t key; +char proc_id = 'C'; +int size = 0x5000; /* 20 K */ +int flags = 0664 | IPC_CREAT; /* read-only for others */ + +key = ftok ("~creator/ipckey", proc_id); +id = shmget (key, size, flags); +exit (0); /* quit leaving resource allocated */ +@end example + +@noindent +Users then gain access to the resource using the same key.@* +@noindent +Client process: +@example +#include <sys/shm.h> +char *shmaddr; +int id; +key_t key; +char proc_id = 'C'; + +key = ftok ("~creator/ipckey", proc_id); + +id = shmget (key, 0, 004); /* default size */ +if (id == -1) + perror ("shmget ..."); + +shmaddr = shmat (id, 0, SHM_RDONLY); /* attach segment for reading */ +if (shmaddr == (char *) -1) + perror ("shmat ..."); + +local_var = *(shmaddr + 3); /* read segment etc. */ + +shmdt (shmaddr); /* detach segment */ +@end example + +@noindent +When the resource is no longer needed the creator should remove it.@* +@noindent +Creator/owner process 2: +@example +key = ftok ("~creator/ipckey", proc_id); +id = shmget (key, 0, 0); +shmctl (id, IPC_RMID, NULL); +@end example + + +@node perms, syscalls, example, Overview +@section Permissions + +Each resource has an associated @code{ipc_perm} struct which defines the +creator, owner and access perms for the resource.@refill + +@example +struct ipc_perm + key_t key; /* set by creator */ + ushort uid; /* owner euid and egid */ + ushort gid; + ushort cuid; /* creator euid and egid */ + ushort cgid; + ushort mode; /* access modes in lower 9 bits */ + ushort seq; /* sequence number */ +@end example + +The creating process is the default owner. The owner can be reassigned +by the creator and has creator perms. 
Only the owner, creator or super-user +can delete the resource.@refill + +The lowest nine bits of the flags parameter supplied by the user to the +system call are compared with the values stored in @code{ipc_perm.mode} +to determine if the requested access is allowed. In the case +that the system call creates the resource, these bits are initialized +from the user supplied value.@refill + +As for files, access permissions are specified as read, write and exec +for user, group or other (though the exec perms are unused). For example +0624 grants read-write to owner, write-only to group and read-only +access to others.@refill + +For shared memory, note that read-write access for segments is determined +by a separate flag which is not stored in the @code{mode} field. +Shared memory segments attached with write access can be read.@refill + +The @code{cuid}, @code{cgid}, @code{key} and @code{seq} fields +cannot be changed by the user.@refill + + + +@node syscalls, Messages, perms, Overview +@section IPC system calls + +This section provides an overview of the IPC system calls. See the +specific sections on each type of resource for details.@refill + +Each type of mechanism provides a @dfn{get}, @dfn{ctl} and one or more +@dfn{op} system calls that allow the user to create or procure the +resource (get), define its behaviour or destroy it (ctl) and manipulate +the resources (op).@refill + + + +@subsection The @dfn{get} system calls + +The @code{get} call typically takes a @var{key} and returns a numeric +@var{id} that is used for further access. +The @var{id} is an index into the resource table. A sequence +number is maintained and incremented when a resource is +destroyed so that accesses using an obsolete @var{id} are likely to fail.@refill + +The user also specifies the permissions and other behaviour +characteristics for the current access. 
The flags are or-ed with the +permissions when invoking system calls as in:@refill +@example +msgflg = IPC_CREAT | IPC_EXCL | 0666; +id = msgget (key, msgflg); +@end example +@itemize @bullet +@item +@code{key} : IPC_PRIVATE => new instance of resource is initialized. +@item +@code{flags} : +@itemize @asis +@item +IPC_CREAT : resource created for @var{key} if it does not exist. +@item +IPC_CREAT | IPC_EXCL : fail if resource exists for @var{key}. +@end itemize +@item +returns : an identifier used for all further access to the resource. +@end itemize + +Note that IPC_PRIVATE is not a flag but a special @code{key} +that ensures (when the call is successful) that a new resource is +created.@refill + +Use of IPC_PRIVATE does not make the resource inaccessible to other +users. For this you must set the access permissions appropriately.@refill + +There is currently no way for a process to ensure exclusive access to a +resource. IPC_CREAT | IPC_EXCL only ensures (on success) that a new +resource was initialized. It does not imply exclusive access.@refill + +@noindent +See Also : @xref{msgget}, @xref{semget}, @xref{shmget}.@refill + + + +@subsection The @dfn{ctl} system calls + +Provides or alters the information stored in the structure that describes +the resource indexed by @var{id}.@refill + +@example +#include <sys/msg.h> +struct msqid_ds buf; +err = msgctl (id, IPC_STAT, &buf); +if (err) + perror ("msgctl"); +else + printf ("creator uid = %d\n", buf.msg_perm.cuid); + .... +@end example + +@noindent +Commands supported by all @code{ctl} calls:@* +@itemize @bullet +@item +IPC_STAT : read info on resource specified by id into user allocated +buffer. The user must have read access to the resource.@refill +@item +IPC_SET : write info from buffer into resource data structure. The +user must be owner, creator or super-user.@refill +@item +IPC_RMID : remove resource. 
The user must be the owner, creator or +super-user.@refill +@end itemize + +The IPC_RMID command results in immediate removal of a message +queue or semaphore array. Shared memory segments however, are +only destroyed upon the last detach after IPC_RMID is executed.@refill + +The @code{semctl} call provides a number of command options that allow +the user to determine or set the values of the semaphores in an array.@refill + +@noindent +See Also: @xref{msgctl}, @xref{semctl}, @xref{shmctl}.@refill + + +@subsection The @dfn{op} system calls + +Used to send or receive messages, read or alter semaphore values, +attach or detach shared memory segments. +The IPC_NOWAIT flag will cause the operation to fail with error EAGAIN +if the process has to wait on the call.@refill + +@noindent +@code{flags} : IPC_NOWAIT => return with error if a wait is required. + +@noindent +See Also: @xref{msgsnd},@xref{msgrcv},@xref{semop},@xref{shmat}, +@xref{shmdt}.@refill + + + +@node Messages, msgget, syscalls, top +@section Messages + +A message resource is described by a struct @code{msqid_ds} which is +allocated and initialized when the resource is created. Some fields +in @code{msqid_ds} can then be altered (if desired) by invoking @code{msgctl}. 
+The memory used by the resource is released when it is destroyed by +a @code{msgctl} call.@refill + +@example +struct msqid_ds + struct ipc_perm msg_perm; + struct msg *msg_first; /* first message on queue (internal) */ + struct msg *msg_last; /* last message in queue (internal) */ + time_t msg_stime; /* last msgsnd time */ + time_t msg_rtime; /* last msgrcv time */ + time_t msg_ctime; /* last change time */ + struct wait_queue *wwait; /* writers waiting (internal) */ + struct wait_queue *rwait; /* readers waiting (internal) */ + ushort msg_cbytes; /* number of bytes used on queue */ + ushort msg_qnum; /* number of messages in queue */ + ushort msg_qbytes; /* max number of bytes on queue */ + ushort msg_lspid; /* pid of last msgsnd */ + ushort msg_lrpid; /* pid of last msgrcv */ +@end example + +To send or receive a message the user allocates a structure that looks +like a @code{msgbuf} but with an array @code{mtext} of the required size. +Messages have a type (positive integer) associated with them so that +(for example) a listener can choose to receive only messages of a +given type.@refill + +@example +struct msgbuf + long mtype; type of message (@xref{msgrcv}). + char mtext[1]; message text .. why is this not a ptr? +@end example + +The user must have write permissions to send and read permissions +to receive messages on a queue.@refill + +When @code{msgsnd} is invoked, the user's message is copied into +an internal struct @code{msg} and added to the queue. A @code{msgrcv} +will then read this message and free the associated struct @code{msg}.@refill + + +@menu +* msgget:: +* msgsnd:: +* msgrcv:: +* msgctl:: +* msglimits:: Implementation defined limits. 
+@end menu + + +@node msgget, msgsnd, Messages, Messages +@subsection msgget + +@noindent +A message queue is allocated by a msgget system call : + +@example +msqid = msgget (key_t key, int msgflg); +@end example + +@itemize @bullet +@item +@code{key}: an integer usually got from @code{ftok()} or IPC_PRIVATE.@refill +@item +@code{msgflg}: +@itemize @asis +@item +IPC_CREAT : used to create a new resource if it does not already exist. +@item +IPC_EXCL | IPC_CREAT : used to ensure failure of the call if the +resource already exists.@refill +@item +rwxrwxrwx : access permissions. +@end itemize +@item +returns: msqid (an integer used for all further access) on success. +-1 on failure.@refill +@end itemize + +A message queue is allocated if there is no resource corresponding +to the given key. The access permissions specified are then copied +into the @code{msg_perm} struct and the fields in @code{msqid_ds} +initialized. The user must use the IPC_CREAT flag or key = IPC_PRIVATE, +if a new instance is to be allocated. If a resource corresponding to +@var{key} already exists, the access permissions are verified.@refill + +@noindent +Errors:@* +@noindent +EACCES : (procure) Do not have permission for requested access.@* +@noindent +EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@* +@noindent +EIDRM : (procure) The resource was removed.@* +@noindent +ENOSPC : All id's are taken (max of MSGMNI id's system-wide).@* +@noindent +ENOENT : Resource does not exist and IPC_CREAT not specified.@* +@noindent +ENOMEM : A new @code{msqid_ds} was to be created but ... nomem. + + + + +@node msgsnd, msgrcv, msgget, Messages +@subsection msgsnd + +@example +int msgsnd (int msqid, struct msgbuf *msgp, int msgsz, int msgflg); +@end example + +@itemize @bullet +@item +@code{msqid} : id obtained by a call to msgget. +@item +@code{msgsz} : size of msg text (@code{mtext}) in bytes. +@item +@code{msgp} : message to be sent. (msgp->mtype must be positive). 
+@item +@code{msgflg} : IPC_NOWAIT. +@item +returns : msgsz on success. -1 on error. +@end itemize + +The message text and type are stored in the internal @code{msg} +structure. @code{msg_cbytes}, @code{msg_qnum}, @code{msg_lspid}, +and @code{msg_stime} fields are updated. Readers waiting on the +queue are awakened.@refill + +@noindent +Errors:@* +@noindent +EACCES : Do not have write permission on queue.@* +@noindent +EAGAIN : IPC_NOWAIT specified and queue is full.@* +@noindent +EFAULT : msgp not accessible.@* +@noindent +EIDRM : The message queue was removed.@* +@noindent +EINTR : Full queue ... would have slept but ... was interrupted.@* +@noindent +EINVAL : mtype < 1, msgsz > MSGMAX, msgsz < 0, msqid < 0 or unused.@* +@noindent +ENOMEM : Could not allocate space for header and text.@* + + + +@node msgrcv, msgctl, msgsnd, Messages +@subsection msgrcv + +@example +int msgrcv (int msqid, struct msgbuf *msgp, int msgsz, long msgtyp, + int msgflg); +@end example + +@itemize @bullet +@item +msqid : id obtained by a call to msgget. +@item +msgsz : maximum size of message to receive. +@item +msgp : allocated by user to store the message in. +@item +msgtyp : +@itemize @asis +@item +0 => get first message on queue. +@item +> 0 => get first message of matching type. +@item +< 0 => get message with least type which is <= abs(msgtyp). +@end itemize +@item +msgflg : +@itemize @asis +@item +IPC_NOWAIT : Return immediately if message not found. +@item +MSG_NOERROR : The message is truncated if it is larger than msgsz. +@item +MSG_EXCEPT : Used with msgtyp > 0 to receive any msg except of specified +type.@refill +@end itemize +@item +returns : size of message if found. -1 on error. +@end itemize + +The first message that meets the @code{msgtyp} specification is +identified. 
For msgtyp < 0, the entire queue is searched for the +message with the smallest type.@refill + +If its length is smaller than msgsz or if the user specified the +MSG_NOERROR flag, its text and type are copied to msgp->mtext and +msgp->mtype, and it is taken off the queue.@refill + +The @code{msg_cbytes}, @code{msg_qnum}, @code{msg_lrpid}, +and @code{msg_rtime} fields are updated. Writers waiting on the +queue are awakened.@refill + +@noindent +Errors:@* +@noindent +E2BIG : msg bigger than msgsz and MSG_NOERROR not specified.@* +@noindent +EACCES : Do not have permission for reading the queue.@* +@noindent +EFAULT : msgp not accessible.@* +@noindent +EIDRM : msg queue was removed.@* +@noindent +EINTR : msg not found ... would have slept but ... was interrupted.@* +@noindent +EINVAL : msgsz > msgmax or msgsz < 0, msqid < 0 or unused.@* +@noindent +ENOMSG : msg of requested type not found and IPC_NOWAIT specified. + + + +@node msgctl, msglimits, msgrcv, Messages +@subsection msgctl + +@example +int msgctl (int msqid, int cmd, struct msqid_ds *buf); +@end example + +@itemize @bullet +@item +msqid : id obtained by a call to msgget. +@item +buf : allocated by user for reading/writing info. +@item +cmd : IPC_STAT, IPC_SET, IPC_RMID (@xref{syscalls}). +@end itemize + +IPC_STAT results in the copy of the queue data structure +into the user supplied buffer.@refill + +In the case of IPC_SET, the queue size (@code{msg_qbytes}) +and the @code{uid}, @code{gid}, @code{mode} (low 9 bits) fields +of the @code{msg_perm} struct are set from the user supplied values. +@code{msg_ctime} is updated.@refill + +Note that only the super user may increase the limit on the size of a +message queue beyond MSGMNB.@refill + +When the queue is destroyed (IPC_RMID), the sequence number is +incremented and all waiting readers and writers are awakened. 
+These processes will then return with @code{errno} set to EIDRM.@refill + +@noindent +Errors: +@noindent +EPERM : Insufficient privilege to increase the size of the queue (IPC_SET) +or remove it (IPC_RMID).@* +@noindent +EACCES : Do not have permission for reading the queue (IPC_STAT).@* +@noindent +EFAULT : buf not accessible (IPC_STAT, IPC_SET).@* +@noindent +EIDRM : msg queue was removed.@* +@noindent +EINVAL : invalid cmd, msqid < 0 or unused. + + +@node msglimits, Semaphores, msgctl, Messages +@subsection Limits on Message Resources + +@noindent +Sizeof various structures: +@itemize @asis +@item +msqid_ds 52 /* 1 per message queue .. dynamic */ +@item +msg 16 /* 1 for each message in system .. dynamic */ +@item +msgbuf 8 /* allocated by user */ +@end itemize + +@noindent +Limits +@itemize @bullet +@item +MSGMNI : number of message queue identifiers ... policy. +@item +MSGMAX : max size of message. +Header and message space allocated on one page. +MSGMAX = (PAGE_SIZE - sizeof(struct msg)). +Implementation maximum MSGMAX = 4080.@refill +@item +MSGMNB : default max size of a message queue ... policy. +The super-user can increase the size of a +queue beyond MSGMNB by a @code{msgctl} call.@refill +@end itemize + +@noindent +Unused or unimplemented:@* +MSGTQL max number of message headers system-wide.@* +MSGPOOL total size in bytes of msg pool. + + + +@node Semaphores, semget, msglimits, top +@section Semaphores + +Each semaphore has a value >= 0. An id provides access to an array +of @code{nsems} semaphores. Operations such as read, increment or decrement +semaphores in a set are performed by the @code{semop} call which processes +@code{nsops} operations at a time. Each operation is specified in a struct +@code{sembuf} described below. 
The operations are applied only if all of +them succeed.@refill + +If you do not have a need for such arrays, you are probably better off using +the @code{test_bit}, @code{set_bit} and @code{clear_bit} bit-operations +defined in <asm/bitops.h>.@refill + +Semaphore operations may also be qualified by a SEM_UNDO flag which +results in the operation being undone when the process exits.@refill + +If a decrement cannot go through, a process will be put to sleep +on a queue waiting for the @code{semval} to increase unless it specifies +IPC_NOWAIT. A read operation can similarly result in a sleep on a +queue waiting for @code{semval} to become 0. (Actually there are +two queues per semaphore array).@refill + +@noindent +A semaphore array is described by: +@example +struct semid_ds + struct ipc_perm sem_perm; + time_t sem_otime; /* last semop time */ + time_t sem_ctime; /* last change time */ + struct wait_queue *eventn; /* wait for a semval to increase */ + struct wait_queue *eventz; /* wait for a semval to become 0 */ + struct sem_undo *undo; /* undo entries */ + ushort sem_nsems; /* no. of semaphores in array */ +@end example + +@noindent +Each semaphore is described internally by : +@example +struct sem + short sempid; /* pid of last semop() */ + ushort semval; /* current value */ + ushort semncnt; /* num procs awaiting increase in semval */ + ushort semzcnt; /* num procs awaiting semval = 0 */ +@end example + +@menu +* semget:: +* semop:: +* semctl:: +* semlimits:: Limits imposed by this implementation. 
+@end menu + +@node semget, semop, Semaphores, Semaphores +@subsection semget + +@noindent +A semaphore array is allocated by a semget system call: + +@example +semid = semget (key_t key, int nsems, int semflg); +@end example + +@itemize @bullet +@item +@code{key} : an integer usually got from @code{ftok} or IPC_PRIVATE +@item +@code{nsems} : +@itemize @asis +@item +# of semaphores in array (0 <= nsems <= SEMMSL <= SEMMNS) +@item +0 => dont care can be used when not creating the resource. +If successful you always get access to the entire array anyway.@refill +@end itemize +@item +semflg : +@itemize @asis +@item +IPC_CREAT used to create a new resource +@item +IPC_EXCL used with IPC_CREAT to ensure failure if the resource exists. +@item +rwxrwxrwx access permissions. +@end itemize +@item +returns : semid on success. -1 on failure. +@end itemize + +An array of nsems semaphores is allocated if there is no resource +corresponding to the given key. The access permissions specified are +then copied into the @code{sem_perm} struct for the array along with the +user-id etc. The user must use the IPC_CREAT flag or key = IPC_PRIVATE +if a new resource is to be created.@refill + +@noindent +Errors:@* +@noindent +EINVAL : nsems not in above range (allocate).@* + nsems greater than number in array (procure).@* +@noindent +EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@* +@noindent +EIDRM : (procure) The resource was removed.@* +@noindent +ENOMEM : could not allocate space for semaphore array.@* +@noindent +ENOSPC : No arrays available (SEMMNI), too few semaphores available (SEMMNS).@* +@noindent +ENOENT : Resource does not exist and IPC_CREAT not specified.@* +@noindent +EACCES : (procure) do not have permission for specified access. 
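The allocation described in this subsection can be sketched as below. This is only an illustration, not part of the original patch: the helper name `private_sem_count` and the union name `semun_arg` are hypothetical, and the sketch assumes a current Linux libc, where the caller has to declare the `semctl` argument union itself. It allocates a private array with `semget`, asks the kernel how many semaphores it holds via `IPC_STAT`, and removes the array again.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Argument union for semctl; hypothetical name, declared by the caller
   (assumption: the libc headers do not provide one). */
union semun_arg
{
  int val;                  /* value for SETVAL */
  struct semid_ds *buf;     /* buffer for IPC_STAT and IPC_SET */
  unsigned short *array;    /* array for GETALL and SETALL */
};

/* Hypothetical helper: allocate a private array of nsems semaphores and
   return the number of semaphores the kernel reports for it, or -1 on
   failure.  The array is removed before returning. */
int
private_sem_count (int nsems)
{
  struct semid_ds ds;
  union semun_arg arg;
  int semid, count;

  /* IPC_PRIVATE always yields a new array; 0600 = read/alter for owner. */
  semid = semget (IPC_PRIVATE, nsems, IPC_CREAT | 0600);
  if (semid == -1)
    return -1;

  arg.buf = &ds;
  if (semctl (semid, 0, IPC_STAT, arg) == -1)
    count = -1;
  else
    count = (int) ds.sem_nsems;

  semctl (semid, 0, IPC_RMID);  /* release the array */
  return count;
}
```

A caller would typically keep the `semid` instead of removing the array at once; the immediate `IPC_RMID` here only keeps the sketch from leaking a resource.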
+ + +@node semop, semctl, semget, Semaphores +@subsection semop + +@noindent +Operations on semaphore arrays are performed by calling semop : + +@example +int semop (int semid, struct sembuf *sops, unsigned nsops); +@end example +@itemize @bullet +@item +semid : id obtained by a call to semget. +@item +sops : array of semaphore operations. +@item +nsops : number of operations in array (0 < nsops <= SEMOPM). +@item +returns : semval for last operation. -1 on failure. +@end itemize + +@noindent +Operations are described by a structure sembuf: +@example +struct sembuf + ushort sem_num; /* semaphore index in array */ + short sem_op; /* semaphore operation */ + short sem_flg; /* operation flags */ +@end example + +The value @code{sem_op} is to be added (signed) to the current value semval +of the semaphore with index sem_num (0 .. nsems -1) in the set. +Flags recognized in sem_flg are IPC_NOWAIT and SEM_UNDO.@refill + +@noindent +Two kinds of operations can result in wait: +@enumerate +@item +If sem_op is 0 (read operation) and semval is non-zero, the process +sleeps on a queue waiting for semval to become zero or returns with +error EAGAIN if (sem_flg & IPC_NOWAIT) is true.@refill +@item +If (sem_op < 0) and (semval + sem_op < 0), the process either sleeps +on a queue waiting for semval to increase or returns with error EAGAIN if +(sem_flg & IPC_NOWAIT) is true.@refill +@end enumerate + +The array sops is first read in and preliminary checks performed on +the arguments. The operations are parsed to determine if any of +them needs write permissions or requests an undo operation.@refill + +The operations are then tried and the process sleeps if any operation +that does not specify IPC_NOWAIT cannot go through. If a process sleeps +it repeats these checks on waking up. 
If any operation that requests +IPC_NOWAIT, cannot go through at any stage, the call returns with errno +set to EAGAIN.@refill + +Finally, operations are committed when all go through without an intervening +sleep. Processes waiting on the zero_queue or increment_queue are awakened +if any of the semval's becomes zero or is incremented respectively.@refill + +@noindent +Errors:@* +@noindent +E2BIG : nsops > SEMOPM.@* +@noindent +EACCES : Do not have permission for requested (read/alter) access.@* +@noindent +EAGAIN : An operation with IPC_NOWAIT specified could not go through.@* +@noindent +EFAULT : The array sops is not accessible.@* +@noindent +EFBIG : An operation had semnum >= nsems.@* +@noindent +EIDRM : The resource was removed.@* +@noindent +EINTR : The process was interrupted on its way to a wait queue.@* +@noindent +EINVAL : nsops is 0, semid < 0 or unused.@* +@noindent +ENOMEM : SEM_UNDO requested. Could not allocate space for undo structure.@* +@noindent +ERANGE : sem_op + semval > SEMVMX for some operation. + + +@node semctl, semlimits, semop, Semaphores +@subsection semctl + +@example +int semctl (int semid, int semnum, int cmd, union semun arg); +@end example + +@itemize @bullet +@item +semid : id obtained by a call to semget. +@item +cmd : +@itemize @asis +@item +GETPID return pid for the process that executed the last semop. +@item +GETVAL return semval of semaphore with index semnum. +@item +GETNCNT return number of processes waiting for semval to increase. +@item +GETZCNT return number of processes waiting for semval to become 0 +@item +SETVAL set semval = arg.val. +@item +GETALL read all semval's into arg.array. +@item +SETALL set all semval's with values given in arg.array. +@end itemize +@item +returns : 0 on success or as given above. -1 on failure. +@end itemize + +The first 4 operate on the semaphore with index semnum in the set. 
+The last two operate on all semaphores in the set.@refill + +@code{arg} is a union : +@example +union semun + int val; value for SETVAL. + struct semid_ds *buf; buffer for IPC_STAT and IPC_SET. + ushort *array; array for GETALL and SETALL +@end example + +@itemize @bullet +@item +IPC_SET, SETVAL, SETALL : sem_ctime is updated. +@item +SETVAL, SETALL : Undo entries are cleared for altered semaphores in +all processes. Processes sleeping on the wait queues are +awakened if a semval becomes 0 or increases.@refill +@item +IPC_SET : sem_perm.uid, sem_perm.gid, sem_perm.mode are updated from +user supplied values.@refill +@end itemize + +@noindent +Errors: +@noindent +EACCES : do not have permission for specified access.@* +@noindent +EFAULT : arg is not accessible.@* +@noindent +EIDRM : The resource was removed.@* +@noindent +EINVAL : semid < 0 or semnum < 0 or semnum >= nsems.@* +@noindent +EPERM : IPC_RMID, IPC_SET ... not creator, owner or super-user.@* +@noindent +ERANGE : arg.array[i].semval > SEMVMX or < 0 for some i. + + + + +@node semlimits, Shared Memory, semctl, Semaphores +@subsection Limits on Semaphore Resources + +@noindent +Sizeof various structures: +@example +semid_ds 44 /* 1 per semaphore array .. dynamic */ +sem 8 /* 1 for each semaphore in system .. dynamic */ +sembuf 6 /* allocated by user */ +sem_undo 20 /* 1 for each undo request .. dynamic */ +@end example + +@noindent +Limits :@* +@itemize @bullet +@item +SEMVMX 32767 semaphore maximum value (short). +@item +SEMMNI number of semaphore identifiers (or arrays) system wide...policy. +@item +SEMMSL maximum number of semaphores per id. +1 semid_ds per array, 1 struct sem per semaphore +=> SEMMSL = (PAGE_SIZE - sizeof(semid_ds)) / sizeof(sem). +Implementation maximum SEMMSL = 500.@refill +@item +SEMMNS maximum number of semaphores system wide ... policy. +Setting SEMMNS >= SEMMSL*SEMMNI makes it irrelevant.@refill +@item +SEMOPM Maximum number of operations in one semop call...policy. 
+@end itemize + +@noindent +Unused or unimplemented:@* +@noindent +SEMAEM adjust on exit max value.@* +@noindent +SEMMNU number of undo structures system-wide.@* +@noindent +SEMUME maximum number of undo entries per process. + + + +@node Shared Memory, shmget, semlimits, top +@section Shared Memory + +Shared memory is distinct from the sharing of read-only code pages or +the sharing of unaltered data pages that is available due to the +copy-on-write mechanism. The essential difference is that the +shared pages are dirty (in the case of Shared memory) and can be +made to appear at a convenient location in the process' address space.@refill + +@noindent +A shared segment is described by : +@example +struct shmid_ds + struct ipc_perm shm_perm; + int shm_segsz; /* size of segment (bytes) */ + time_t shm_atime; /* last attach time */ + time_t shm_dtime; /* last detach time */ + time_t shm_ctime; /* last change time */ + ulong *shm_pages; /* internal page table */ + ushort shm_cpid; /* pid, creator */ + ushort shm_lpid; /* pid, last operation */ + short shm_nattch; /* no. of current attaches */ +@end example + +A shmget allocates a shmid_ds and an internal page table. A shmat +maps the segment into the process' address space with pointers +into the internal page table and the actual pages are faulted in +as needed. The memory associated with the segment must be explicitly +destroyed by calling shmctl with IPC_RMID.@refill + +@menu +* shmget:: +* shmat:: +* shmdt:: +* shmctl:: +* shmlimits:: Limits imposed by this implementation. +@end menu + + +@node shmget, shmat, Shared Memory, Shared Memory +@subsection shmget + +@noindent +A shared memory segment is allocated by a shmget system call: + +@example +int shmget(key_t key, int size, int shmflg); +@end example + +@itemize @bullet +@item +key : an integer usually got from @code{ftok} or IPC_PRIVATE +@item +size : size of the segment in bytes (SHMMIN <= size <= SHMMAX). 
+@item +shmflg : +@itemize @asis +@item +IPC_CREAT used to create a new resource +@item +IPC_EXCL used with IPC_CREAT to ensure failure if the resource exists. +@item +rwxrwxrwx access permissions. +@end itemize +@item +returns : shmid on success. -1 on failure. +@end itemize + +A descriptor for a shared memory segment is allocated if there isn't one +corresponding to the given key. The access permissions specified are +then copied into the @code{shm_perm} struct for the segment along with the +user-id etc. The user must use the IPC_CREAT flag or key = IPC_PRIVATE +to allocate a new segment.@refill + +If the segment already exists, the access permissions are verified, +and a check is made to see that it is not marked for destruction.@refill + +@code{size} is effectively rounded up to a multiple of PAGE_SIZE as shared +memory is allocated in pages.@refill + +@noindent +Errors:@* +@noindent +EINVAL : (allocate) Size not in range specified above.@* + (procure) Size greater than size of segment.@* +@noindent +EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@* +@noindent +EIDRM : (procure) The resource is marked destroyed or was removed.@* +@noindent +ENOSPC : (allocate) All id's are taken (max of SHMMNI id's system-wide). +Allocating a segment of the requested size would exceed the +system wide limit on total shared memory (SHMALL).@refill +@* +@noindent +ENOENT : (procure) Resource does not exist and IPC_CREAT not specified.@* +@noindent +EACCES : (procure) Do not have permission for specified access.@* +@noindent +ENOMEM : (allocate) Could not allocate memory for shmid_ds or pg_table. + + + +@node shmat, shmdt, shmget, Shared Memory +@subsection shmat + +@noindent +Maps a shared segment into the process' address space. + +@example +char *virt_addr; +virt_addr = shmat (int shmid, char *shmaddr, int shmflg); +@end example + +@itemize @bullet +@item +shmid : id got from call to shmget. 
+@item +shmaddr : requested attach address.@* + If shmaddr is 0 the system finds an unmapped region.@* + If a non-zero value is indicated the value must be page + aligned or the user must specify the SHM_RND flag.@refill +@item +shmflg :@* + SHM_RDONLY : request read-only attach.@* + SHM_RND : attach address is rounded DOWN to a multiple of SHMLBA. +@item +returns: virtual address of attached segment. -1 on failure. +@end itemize + +When shmaddr is 0, the attach address is determined by finding an +unmapped region in the address range 1G to 1.5G, starting at 1.5G +and coming down from there. The algorithm is very simple so you +are encouraged to avoid non-specific attaches. + +@noindent +Algorithm: +@display +Determine attach address as described above. +Check region (shmaddr, shmaddr + size) is not mapped and allocate + page tables (undocumented SHM_REMAP flag!). +Map the region by setting up pointers into the internal page table. +Add a descriptor for the attach to the task struct for the process. +@code{shm_nattch}, @code{shm_lpid}, @code{shm_atime} are updated. +@end display + +@noindent +Notes:@* +The @code{brk} value is not altered. +The segment is automatically detached when the process exits. +The same segment may be attached as read-only or read-write and + more than once in the process' address space. +A shmat can succeed on a segment marked for destruction. +The request for a particular type of attach is made using the SHM_RDONLY flag. +There is no notion of a write-only attach. The requested attach + permissions must fall within those allowed by @code{shm_perm.mode}. + +@noindent +Errors:@* +@noindent +EACCES : Do not have permission for requested access.@* +@noindent +EINVAL : shmid < 0 or unused, shmaddr not aligned, attach at brk failed.@* +@noindent +EIDRM : resource was removed.@* +@noindent +ENOMEM : Could not allocate memory for descriptor or page tables. 
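The shmget and shmat calls described above can be exercised together. The following is a minimal sketch, not part of the original implementation notes: it assumes a current Linux C library, whose prototypes use @code{size_t} and @code{void *} where this document shows @code{int} and @code{char *}, and the helper name @code{shm_roundtrip} is hypothetical. It creates a private segment, attaches it once read-write and once read-only (the same segment may be attached more than once), and checks that data written through one mapping is visible through the other.

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: write msg through a read-write attach and read it
 * back through a second, read-only attach of the same segment.
 * Returns 0 if the data matches, -1 on any failure. */
int shm_roundtrip(const char *msg, size_t size)
{
    int ok = -1;

    /* IPC_PRIVATE always allocates a new segment; 0600 = owner read/write. */
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return -1;
    }

    /* Address 0 (NULL) lets the system pick an unmapped region. */
    char *rw = shmat(shmid, NULL, 0);
    char *ro = shmat(shmid, NULL, SHM_RDONLY);

    if (rw != (void *) -1 && ro != (void *) -1) {
        /* A new segment is zero-filled, so the last byte stays NUL. */
        strncpy(rw, msg, size - 1);
        if (strncmp(ro, msg, size - 1) == 0)
            ok = 0;
    } else {
        perror("shmat");
    }

    /* Detach whatever was attached, then mark the segment for destruction
     * so that its pages do not outlive the process. */
    if (rw != (void *) -1)
        shmdt(rw);
    if (ro != (void *) -1)
        shmdt(ro);
    shmctl(shmid, IPC_RMID, NULL);
    return ok;
}
```

A caller might use @code{shm_roundtrip("hello", 4096)}; a size of 0 is below SHMMIN, so the shmget call fails with EINVAL and the helper returns -1.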
+ + +@node shmdt, shmctl, shmat, Shared Memory +@subsection shmdt + +@example +int shmdt (char *shmaddr); +@end example + +@itemize @bullet +@item +shmaddr : attach address of segment (returned by shmat). +@item +returns : 0 on success. -1 on failure. +@end itemize + +An attached segment is detached and @code{shm_nattch} decremented. The +occupied region in user space is unmapped. The segment is destroyed +if it is marked for destruction and @code{shm_nattch} is 0. +@code{shm_lpid} and @code{shm_dtime} are updated.@refill + +@noindent +Errors:@* +@noindent +EINVAL : No shared memory segment attached at shmaddr. + + +@node shmctl, shmlimits, shmdt, Shared Memory +@subsection shmctl + +@noindent +Destroys allocated segments. Reads/Writes the control structures. + +@example +int shmctl (int shmid, int cmd, struct shmid_ds *buf); +@end example + +@itemize @bullet +@item +shmid : id obtained from a call to shmget. +@item +cmd : IPC_STAT, IPC_SET, IPC_RMID (@pxref{syscalls}). +@itemize @asis +@item +IPC_SET : Used to set the owner uid, gid, and shm_perm.mode field. +@item +IPC_RMID : The segment is marked destroyed. It is only destroyed +on the last detach.@refill +@item +IPC_STAT : The shmid_ds structure is copied into the user allocated buffer. +@end itemize +@item +buf : used to read (IPC_STAT) or write (IPC_SET) information. +@item +returns : 0 on success, -1 on failure. +@end itemize + +The user must execute an IPC_RMID shmctl call to free the memory +allocated by the shared segment. Otherwise all the pages faulted in +will continue to live in memory or swap.@refill + +@noindent +Errors:@* +@noindent +EACCES : Do not have permission for requested access.@* +@noindent +EFAULT : buf is not accessible.@* +@noindent +EINVAL : shmid < 0 or unused.@* +@noindent +EIDRM : identifier destroyed.@* +@noindent +EPERM : not creator, owner or super-user (IPC_SET, IPC_RMID). 
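The interplay of shmat, shmdt and shmctl described above can be traced with IPC_STAT. This is a rough sketch under stated assumptions rather than a description of the kernel code: the function names @code{attach_count} and @code{shm_lifecycle} are hypothetical, the prototypes are those of a current Linux C library, and the final check assumes that an id is no longer valid once a segment marked with IPC_RMID has seen its last detach.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: return shm_nattch as reported by IPC_STAT,
 * or -1 if the shmctl call fails (for example, the id is gone). */
long attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (long) ds.shm_nattch;
}

/* Hypothetical walk through the life of one segment.
 * Returns 0 if every step behaved as the text describes, -1 otherwise. */
int shm_lifecycle(void)
{
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    long before = attach_count(shmid);        /* no attaches yet: 0 */

    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1) {
        shmctl(shmid, IPC_RMID, NULL);        /* do not leak the segment */
        return -1;
    }
    long during = attach_count(shmid);        /* one attach: 1 */

    /* Mark the segment destroyed; it survives while still attached. */
    int rm = shmctl(shmid, IPC_RMID, NULL);

    /* The last detach actually destroys it. */
    int dt = shmdt(addr);

    /* Assumption: the id is invalid now, so IPC_STAT fails. */
    long after = attach_count(shmid);

    return (before == 0 && during == 1 && rm == 0 && dt == 0 && after == -1)
           ? 0 : -1;
}
```

Issuing IPC_RMID immediately after the attach, as done here, is a common way to make sure the segment is released even if the process later exits abnormally, since exit detaches all segments.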
+ + +@node shmlimits, Notes, shmctl, Shared Memory +@subsection Limits on Shared Memory Resources + +@noindent +Limits: +@itemize @bullet +@item +SHMMNI max num of shared segments system wide ... 4096. +@item +SHMMAX max shared memory segment size (bytes) ... 4M +@item +SHMMIN min shared memory segment size (bytes). +1 byte (though PAGE_SIZE is the effective minimum size).@refill +@item +SHMALL max shared mem system wide (in pages) ... policy. +@item +SHMLBA segment low boundary address multiple. +Must be page aligned. SHMLBA = PAGE_SIZE.@refill +@end itemize +@noindent +Unused or unimplemented:@* +SHMSEG : maximum number of shared segments per process. + + + +@node Notes, top, shmlimits, top +@section Miscellaneous Notes + +The system calls are mapped into one -- @code{sys_ipc}. This should be +transparent to the user.@refill + +@subsection Semaphore @code{undo} requests + +There is one sem_undo structure associated with a process for +each semaphore which was altered (with an undo request) by the process. +@code{sem_undo} structures are freed only when the process exits. + +One major cause for unhappiness with the undo mechanism is that +it does not fit in with the notion of having an atomic set of +operations on an array. The undo requests for an array and each +semaphore therein may have been accumulated over many @code{semop} +calls. Thus use the undo mechanism with private semaphores only.@refill + +Should the process sleep in @code{exit} or should all undo +operations be applied with the IPC_NOWAIT flag in effect? +Currently those undo operations which go through immediately are +applied and those that require a wait are ignored silently.@refill + +@subsection Shared memory, @code{malloc} and the @code{brk}. +Note that since this section was written the implementation was +changed so that non-specific attaches are done in the region +1G - 1.5G. However much of the following is still worth thinking +about so I left it in. 
+ +On many systems, the shared memory is allocated in a special region +of the address space ... way up somewhere. As mentioned earlier, +this implementation attaches shared segments at the lowest possible +address. Thus if you plan to use @code{malloc}, it is wise to malloc a +large space and then proceed to attach the shared segments. This way +malloc sets the brk sufficiently above the region it will use.@refill + +Alternatively you can use @code{sbrk} to adjust the @code{brk} value +as you make shared memory attaches. The implementation is not very +smart about selecting attach addresses. Using the system default +addresses will result in fragmentation if detaches do not occur +in the reverse order of the attaches.@refill + +Taking control of the matter is probably best. The rule applied +is that attaches are allowed in unmapped regions other than +in the text space (see <a.out.h>). Also remember that attach addresses +and segment sizes are multiples of PAGE_SIZE.@refill + +One more trap (I quote Bruno on this). If you use malloc() to get space +for your shared memory (i.e. to fix the @code{brk}), you must ensure you +get an unmapped address range. This means you must allocate more memory +than you had ever allocated before. Memory returned by malloc(), used, +then freed by free() and then again returned by malloc() is no good. +Neither is memory obtained from calloc().@refill + +Note that a shared memory region remains a shared memory region until +you unmap it. Attaching a segment at the @code{brk} and calling malloc +after that will result in an overlap of what malloc thinks is its +space with what is really a shared memory region. For example in the case +of a read-only attach, you will not be able to write to the overlapped +portion.@refill + + +@subsection Fork, exec and exit + +On a fork, the child inherits attached shared memory segments but +not the semaphore undo information.@refill + +In the case of an exec, the attached shared segments are detached. 
+The semaphore undo information however remains intact.@refill + +Upon exit, all attached shared memory segments are detached. +The adjust values in the undo structures are added to the relevant semvals +if the operations are permitted. Disallowed operations are ignored.@refill + + +@subsection Other Features + +These features of the current implementation are +likely to be modified in the future. + +The SHM_LOCK and SHM_UNLOCK flags are available (super-user) for use with the +@code{shmctl} call to prevent swapping of a shared segment. The user +must fault in any pages that are required to be present after locking +is enabled. + +The IPC_INFO, MSG_STAT, MSG_INFO, SHM_STAT, SHM_INFO, SEM_STAT, SEM_INFO +@code{ctl} calls are used by the @code{ipcs} program to provide information +on allocated resources. These can be modified as needed or moved to a proc +file system interface. + + +@sp 3 +Thanks to Ove Ewerlid, Bruno Haible, Ulrich Pegelow and Linus Torvalds +for ideas, tutorials, bug reports and fixes, and merriment. And more +thanks to Bruno. + + +@contents +@bye + |