Diffstat (limited to 'sys-utils/ipc.texi')
-rw-r--r-- sys-utils/ipc.texi 1310
1 files changed, 1310 insertions, 0 deletions
diff --git a/sys-utils/ipc.texi b/sys-utils/ipc.texi
new file mode 100644
index 000000000..cd9127167
--- /dev/null
+++ b/sys-utils/ipc.texi
@@ -0,0 +1,1310 @@
+\input texinfo @c -*-texinfo-*-
+@comment %**start of header (This is for running Texinfo on a region.)
+@setfilename ipc.info
+@settitle Inter Process Communication.
+@setchapternewpage odd
+@comment %**end of header (This is for running Texinfo on a region.)
+
+@ifinfo
+This file documents the System V style inter process communication
+primitives available under Linux.
+
+Copyright @copyright{} 1992 krishna balasubramanian
+
+Permission is granted to use this material and the accompanying
+programs within the terms of the GNU GPL.
+@end ifinfo
+
+@titlepage
+@sp 10
+@center @titlefont{System V Inter Process Communication}
+@sp 2
+@center krishna balasubramanian,
+
+@comment The following two commands start the copyright page.
+@page
+@vskip 0pt plus 1filll
+Copyright @copyright{} 1992 krishna balasubramanian
+
+Permission is granted to use this material and the accompanying
+programs within the terms of the GNU GPL.
+@end titlepage
+
+
+
+@node top, Overview, Notes, (dir)
+@chapter System V IPC.
+
+These facilities are provided to maintain compatibility with
+programs developed on System V Unix systems, and with others
+that rely on these System V mechanisms to accomplish inter
+process communication (IPC).@refill
+
+The specifics described here are applicable to the Linux implementation.
+Other implementations may do things slightly differently.
+
+@menu
+* Overview:: What is system V ipc? Overall mechanisms.
+* Messages:: System calls for message passing.
+* Semaphores:: System calls for semaphores.
+* Shared Memory:: System calls for shared memory access.
+* Notes:: Miscellaneous notes.
+@end menu
+
+@node Overview, example, top, top
+@section Overview
+
+@noindent System V IPC consists of three mechanisms:
+
+@itemize @bullet
+@item
+Messages : exchange messages with any process or server.
+@item
+Semaphores : allow unrelated processes to synchronize execution.
+@item
+Shared memory : allow unrelated processes to share memory.
+@end itemize
+
+@menu
+* example:: Using shared memory.
+* perms:: Description of access permissions.
+* syscalls:: Overview of ipc system calls.
+@end menu
+
+Access to all resources is permitted on the basis of permissions
+set up when the resource was created.@refill
+
+A resource here is a message queue, a semaphore set (array)
+or a shared memory segment.@refill
+
+A resource must first be allocated by a creator before it is used.
+The creator can assign a different owner. After use the resource
+must be explicitly destroyed by the creator or owner.@refill
+
+A resource is identified by a numeric @var{id}. Typically a creator
+defines a @var{key} that may be used to access the resource. The user
+process may then use this @var{key} in the @dfn{get} system call to obtain
+the @var{id} for the corresponding resource. This @var{id} is then used for
+all further access. A library call @dfn{ftok} is provided to translate
+the pathname of an existing file, together with a one character project
+identifier, to a numeric key.@refill
+
+There are system and implementation defined limits on the number and
+sizes of resources of any given type. Some of these are imposed by the
+implementation and others by the system administrator
+when configuring the kernel (@xref{msglimits}, @xref{semlimits},
+@xref{shmlimits}).@refill
+
+There is an @code{msqid_ds}, @code{semid_ds} or @code{shmid_ds} struct
+associated with each message queue, semaphore array or shared segment.
+Each ipc resource has an associated @code{ipc_perm} struct which defines
+the creator, owner, access permissions, etc., for the resource.
+These structures are detailed in the following sections.@refill
+
+
+
+@node example, perms, Overview, Overview
+@section example
+
+Here is a code fragment with pointers on how to use shared memory. The
+same methods are applicable to other resources.@refill
+
+In a typical access sequence the creator allocates a new instance
+of the resource with the @code{get} system call using the IPC_CREAT
+flag.@refill
+
+@noindent creator process:@*
+
+@example
+#include <sys/shm.h>
+int id;
+key_t key;
+char proc_id = 'C';
+int size = 0x5000; /* 20 K */
+int flags = 0664 | IPC_CREAT; /* read-only for others */
+
+key = ftok ("~creator/ipckey", proc_id);
+id = shmget (key, size, flags);
+exit (0); /* quit leaving resource allocated */
+@end example
+
+@noindent
+Users then gain access to the resource using the same key.@*
+@noindent
+Client process:
+@example
+#include <sys/shm.h>
+char *shmaddr;
+int id;
+key_t key;
+char proc_id = 'C';
+
+key = ftok ("~creator/ipckey", proc_id);
+
+id = shmget (key, 0, 004); /* default size */
+if (id == -1)
+ perror ("shmget ...");
+
+shmaddr = shmat (id, 0, SHM_RDONLY); /* attach segment for reading */
+if (shmaddr == (char *) -1)
+ perror ("shmat ...");
+
+local_var = *(shmaddr + 3); /* read segment etc. */
+
+shmdt (shmaddr); /* detach segment */
+@end example
+
+@noindent
+When the resource is no longer needed the creator should remove it.@*
+@noindent
+Creator/owner process 2:
+@example
+key = ftok ("~creator/ipckey", proc_id);
+id = shmget (key, 0, 0);
+shmctl (id, IPC_RMID, NULL);
+@end example
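+
+@noindent
+The three fragments above can be combined into one self-contained sketch.
+It is not part of the original programs: the function name
+@code{shm_roundtrip} is hypothetical, and IPC_PRIVATE is used in place of
+an @code{ftok} key so that no key file is needed.@refill

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private segment, write a string into it, read it back,
   then detach and remove the segment.
   Returns 0 on success, -1 on any failure. */
int shm_roundtrip(void)
{
    const char msg[] = "hello";
    char *addr;
    int ok;

    /* IPC_PRIVATE always creates a new segment; 0600 = owner read-write. */
    int id = shmget(IPC_PRIVATE, 0x5000, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    addr = shmat(id, NULL, 0);      /* let the system pick the address */
    if (addr == (char *) -1) {
        shmctl(id, IPC_RMID, NULL); /* do not leave the segment behind */
        return -1;
    }

    memcpy(addr, msg, sizeof msg);  /* write, including the trailing NUL */
    ok = (strcmp(addr, "hello") == 0);

    shmdt(addr);                    /* detach segment */
    shmctl(id, IPC_RMID, NULL);     /* explicitly destroy the resource */
    return ok ? 0 : -1;
}
```

+@noindent
+Removing the segment in the same function avoids leaving the resource
+allocated if the caller forgets the IPC_RMID step.@refill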
+
+
+@node perms, syscalls, example, Overview
+@section Permissions
+
+Each resource has an associated @code{ipc_perm} struct which defines the
+creator, owner and access perms for the resource.@refill
+
+@example
+struct ipc_perm
+ key_t key; /* set by creator */
+ ushort uid; /* owner euid and egid */
+ ushort gid;
+ ushort cuid; /* creator euid and egid */
+ ushort cgid;
+ ushort mode; /* access modes in lower 9 bits */
+ ushort seq; /* sequence number */
+@end example
+
+The creating process is the default owner. The owner can be reassigned
+by the creator and has creator perms. Only the owner, creator or super-user
+can delete the resource.@refill
+
+The lowest nine bits of the flags parameter supplied by the user to the
+system call are compared with the values stored in @code{ipc_perm.mode}
+to determine if the requested access is allowed. In the case
+that the system call creates the resource, these bits are initialized
+from the user supplied value.@refill
+
+As for files, access permissions are specified as read, write and exec
+for user, group or other (though the exec perms are unused). For example
+0624 grants read-write to owner, write-only to group and read-only
+access to others.@refill
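+
+As a rough illustration of how the nine mode bits are interpreted, the
+sketch below checks a requested access against a mode such as 0624. The
+names @code{ipc_mode_allows}, @code{ipc_class}, WANT_READ and WANT_WRITE
+are hypothetical, and the real kernel check may differ in detail.@refill

```c
/* Access classes: which three bits of the nine-bit mode apply. */
enum ipc_class { CLASS_OWNER, CLASS_GROUP, CLASS_OTHER };

#define WANT_READ  04
#define WANT_WRITE 02

/* Return 1 if every bit in want (WANT_READ and/or WANT_WRITE) is
   granted to the given class by mode, else 0. */
int ipc_mode_allows(unsigned mode, enum ipc_class who, unsigned want)
{
    unsigned granted = mode & 0777;     /* only the low nine bits count */

    if (who == CLASS_OWNER)
        granted >>= 6;                  /* owner bits */
    else if (who == CLASS_GROUP)
        granted >>= 3;                  /* group bits */
    granted &= 07;                      /* others use the lowest three */

    return (want & ~granted) == 0;
}
```

+With mode 0624 the owner may read and write, the group may only write
+and others may only read.@refill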
+
+For shared memory, note that read-write access for segments is determined
+by a separate flag which is not stored in the @code{mode} field.
+Shared memory segments attached with write access can be read.@refill
+
+The @code{cuid}, @code{cgid}, @code{key} and @code{seq} fields
+cannot be changed by the user.@refill
+
+
+
+@node syscalls, Messages, perms, Overview
+@section IPC system calls
+
+This section provides an overview of the IPC system calls. See the
+specific sections on each type of resource for details.@refill
+
+Each type of mechanism provides a @dfn{get}, @dfn{ctl} and one or more
+@dfn{op} system calls that allow the user to create or procure the
+resource (get), define its behaviour or destroy it (ctl) and manipulate
+the resources (op).@refill
+
+
+
+@subsection The @dfn{get} system calls
+
+The @code{get} call typically takes a @var{key} and returns a numeric
+@var{id} that is used for further access.
+The @var{id} is an index into the resource table. A sequence
+number is maintained and incremented when a resource is
+destroyed so that accesses using an obsolete @var{id} are likely to fail.@refill
+
+The user also specifies the permissions and other behaviour
+characteristics for the current access. The flags are or-ed with the
+permissions when invoking system calls as in:@refill
+@example
+msgflg = IPC_CREAT | IPC_EXCL | 0666;
+id = msgget (key, msgflg);
+@end example
+@itemize @bullet
+@item
+@code{key} : IPC_PRIVATE => new instance of resource is initialized.
+@item
+@code{flags} :
+@itemize @asis
+@item
+IPC_CREAT : resource created for @var{key} if it does not exist.
+@item
+IPC_CREAT | IPC_EXCL : fail if resource exists for @var{key}.
+@end itemize
+@item
+returns : an identifier used for all further access to the resource.
+@end itemize
+
+Note that IPC_PRIVATE is not a flag but a special @code{key}
+that ensures (when the call is successful) that a new resource is
+created.@refill
+
+Use of IPC_PRIVATE does not make the resource inaccessible to other
+users. For this you must set the access permissions appropriately.@refill
+
+There is currently no way for a process to ensure exclusive access to a
+resource. IPC_CREAT | IPC_EXCL only ensures (on success) that a new
+resource was initialized. It does not imply exclusive access.@refill
+
+@noindent
+See Also : @xref{msgget}, @xref{semget}, @xref{shmget}.@refill
+
+
+
+@subsection The @dfn{ctl} system calls
+
+Provides or alters the information stored in the structure that describes
+the resource indexed by @var{id}.@refill
+
+@example
+#include <sys/msg.h>
+struct msqid_ds buf;
+err = msgctl (id, IPC_STAT, &buf);
+if (err)
+ perror ("msgctl");
+else
+ printf ("creator uid = %d\n", buf.msg_perm.cuid);
+ ....
+@end example
+
+@noindent
+Commands supported by all @code{ctl} calls:@*
+@itemize @bullet
+@item
+IPC_STAT : read info on resource specified by id into user allocated
+buffer. The user must have read access to the resource.@refill
+@item
+IPC_SET : write info from buffer into resource data structure. The
+user must be owner, creator or super-user.@refill
+@item
+IPC_RMID : remove resource. The user must be the owner, creator or
+super-user.@refill
+@end itemize
+
+The IPC_RMID command results in immediate removal of a message
+queue or semaphore array. Shared memory segments however, are
+only destroyed upon the last detach after IPC_RMID is executed.@refill
+
+The @code{semctl} call provides a number of command options that allow
+the user to determine or set the values of the semaphores in an array.@refill
+
+@noindent
+See Also: @xref{msgctl}, @xref{semctl}, @xref{shmctl}.@refill
+
+
+@subsection The @dfn{op} system calls
+
+Used to send or receive messages, read or alter semaphore values,
+attach or detach shared memory segments.
+The IPC_NOWAIT flag will cause the operation to fail with error EAGAIN
+if the process has to wait on the call.@refill
+
+@noindent
+@code{flags} : IPC_NOWAIT => return with error if a wait is required.
+
+@noindent
+See Also: @xref{msgsnd},@xref{msgrcv},@xref{semop},@xref{shmat},
+@xref{shmdt}.@refill
+
+
+
+@node Messages, msgget, syscalls, top
+@section Messages
+
+A message resource is described by a struct @code{msqid_ds} which is
+allocated and initialized when the resource is created. Some fields
+in @code{msqid_ds} can then be altered (if desired) by invoking @code{msgctl}.
+The memory used by the resource is released when it is destroyed by
+a @code{msgctl} call.@refill
+
+@example
+struct msqid_ds
+ struct ipc_perm msg_perm;
+ struct msg *msg_first; /* first message on queue (internal) */
+ struct msg *msg_last; /* last message in queue (internal) */
+ time_t msg_stime; /* last msgsnd time */
+ time_t msg_rtime; /* last msgrcv time */
+ time_t msg_ctime; /* last change time */
+ struct wait_queue *wwait; /* writers waiting (internal) */
+ struct wait_queue *rwait; /* readers waiting (internal) */
+ ushort msg_cbytes; /* number of bytes used on queue */
+ ushort msg_qnum; /* number of messages in queue */
+ ushort msg_qbytes; /* max number of bytes on queue */
+ ushort msg_lspid; /* pid of last msgsnd */
+ ushort msg_lrpid; /* pid of last msgrcv */
+@end example
+
+To send or receive a message the user allocates a structure that looks
+like a @code{msgbuf} but with an array @code{mtext} of the required size.
+Messages have a type (positive integer) associated with them so that
+(for example) a listener can choose to receive only messages of a
+given type.@refill
+
+@example
+struct msgbuf
+ long mtype; /* type of message (@xref{msgrcv}) */
+ char mtext[1]; /* message text .. why is this not a ptr? */
+@end example
+
+The user must have write permissions to send and read permissions
+to receive messages on a queue.@refill
+
+When @code{msgsnd} is invoked, the user's message is copied into
+an internal struct @code{msg} and added to the queue. A @code{msgrcv}
+will then read this message and free the associated struct @code{msg}.@refill
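+
+A minimal sketch of sending and receiving one message follows. The
+struct @code{my_msg} and the function @code{msg_roundtrip} are
+hypothetical names; the struct copies the @code{msgbuf} layout with a
+larger @code{mtext}, and IPC_PRIVATE is used so that no key is needed.@refill

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Same layout as msgbuf, but with room for the text we need. */
struct my_msg {
    long mtype;
    char mtext[32];
};

/* Create a private queue, send one message of type 1, receive it
   again and remove the queue. Returns 0 on success, -1 on failure. */
int msg_roundtrip(void)
{
    struct my_msg out, in;
    int ok = 0;
    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (id == -1)
        return -1;

    memset(&out, 0, sizeof out);
    memset(&in, 0, sizeof in);
    out.mtype = 1;                      /* type must be positive */
    strcpy(out.mtext, "ping");

    /* The size argument counts only mtext, not the mtype field. */
    if (msgsnd(id, &out, sizeof out.mtext, IPC_NOWAIT) != -1 &&
        msgrcv(id, &in, sizeof in.mtext, 1, IPC_NOWAIT) != -1)
        ok = (in.mtype == 1 && strcmp(in.mtext, "ping") == 0);

    msgctl(id, IPC_RMID, NULL);         /* destroy the queue */
    return ok ? 0 : -1;
}
```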
+
+
+@menu
+* msgget::
+* msgsnd::
+* msgrcv::
+* msgctl::
+* msglimits:: Implementation defined limits.
+@end menu
+
+
+@node msgget, msgsnd, Messages, Messages
+@subsection msgget
+
+@noindent
+A message queue is allocated by a msgget system call :
+
+@example
+msqid = msgget (key_t key, int msgflg);
+@end example
+
+@itemize @bullet
+@item
+@code{key}: an integer usually got from @code{ftok()} or IPC_PRIVATE.@refill
+@item
+@code{msgflg}:
+@itemize @asis
+@item
+IPC_CREAT : used to create a new resource if it does not already exist.
+@item
+IPC_EXCL | IPC_CREAT : used to ensure failure of the call if the
+resource already exists.@refill
+@item
+rwxrwxrwx : access permissions.
+@end itemize
+@item
+returns: msqid (an integer used for all further access) on success.
+-1 on failure.@refill
+@end itemize
+
+A message queue is allocated if there is no resource corresponding
+to the given key. The access permissions specified are then copied
+into the @code{msg_perm} struct and the fields in @code{msqid_ds}
+initialized. The user must use the IPC_CREAT flag or key = IPC_PRIVATE,
+if a new instance is to be allocated. If a resource corresponding to
+@var{key} already exists, the access permissions are verified.@refill
+
+@noindent
+Errors:@*
+@noindent
+EACCES : (procure) Do not have permission for requested access.@*
+@noindent
+EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@*
+@noindent
+EIDRM : (procure) The resource was removed.@*
+@noindent
+ENOSPC : All id's are taken (max of MSGMNI id's system-wide).@*
+@noindent
+ENOENT : Resource does not exist and IPC_CREAT not specified.@*
+@noindent
+ENOMEM : A new @code{msqid_ds} was to be created but no memory was available.
+
+
+
+
+@node msgsnd, msgrcv, msgget, Messages
+@subsection msgsnd
+
+@example
+int msgsnd (int msqid, struct msgbuf *msgp, int msgsz, int msgflg);
+@end example
+
+@itemize @bullet
+@item
+@code{msqid} : id obtained by a call to msgget.
+@item
+@code{msgsz} : size of msg text (@code{mtext}) in bytes.
+@item
+@code{msgp} : message to be sent. (msgp->mtype must be positive).
+@item
+@code{msgflg} : IPC_NOWAIT.
+@item
+returns : msgsz on success. -1 on error.
+@end itemize
+
+The message text and type are stored in the internal @code{msg}
+structure. @code{msg_cbytes}, @code{msg_qnum}, @code{msg_lspid},
+and @code{msg_stime} fields are updated. Readers waiting on the
+queue are awakened.@refill
+
+@noindent
+Errors:@*
+@noindent
+EACCES : Do not have write permission on queue.@*
+@noindent
+EAGAIN : IPC_NOWAIT specified and queue is full.@*
+@noindent
+EFAULT : msgp not accessible.@*
+@noindent
+EIDRM : The message queue was removed.@*
+@noindent
+EINTR : Full queue ... would have slept but ... was interrupted.@*
+@noindent
+EINVAL : mtype < 1, msgsz > MSGMAX, msgsz < 0, msqid < 0 or unused.@*
+@noindent
+ENOMEM : Could not allocate space for header and text.@*
+
+
+
+@node msgrcv, msgctl, msgsnd, Messages
+@subsection msgrcv
+
+@example
+int msgrcv (int msqid, struct msgbuf *msgp, int msgsz, long msgtyp,
+ int msgflg);
+@end example
+
+@itemize @bullet
+@item
+msqid : id obtained by a call to msgget.
+@item
+msgsz : maximum size of message to receive.
+@item
+msgp : allocated by user to store the message in.
+@item
+msgtyp :
+@itemize @asis
+@item
+0 => get first message on queue.
+@item
+> 0 => get first message of matching type.
+@item
+< 0 => get message with least type which is <= abs(msgtyp).
+@end itemize
+@item
+msgflg :
+@itemize @asis
+@item
+IPC_NOWAIT : Return immediately if message not found.
+@item
+MSG_NOERROR : The message is truncated if it is larger than msgsz.
+@item
+MSG_EXCEPT : Used with msgtyp > 0 to receive any msg except of specified
+type.@refill
+@end itemize
+@item
+returns : size of message if found. -1 on error.
+@end itemize
+
+The first message that meets the @code{msgtyp} specification is
+identified. For msgtyp < 0, the entire queue is searched for the
+message with the smallest type.@refill
+
+If its length is no larger than msgsz or if the user specified the
+MSG_NOERROR flag, its text and type are copied to msgp->mtext and
+msgp->mtype, and it is taken off the queue.@refill
+
+The @code{msg_cbytes}, @code{msg_qnum}, @code{msg_lrpid},
+and @code{msg_rtime} fields are updated. Writers waiting on the
+queue are awakened.@refill
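+
+The @code{msgtyp} matching rule can be modelled outside the kernel. The
+sketch below is an illustration only: @code{select_message} is a
+hypothetical helper that works on a plain array of message types in
+queue order, and it ignores MSG_EXCEPT.@refill

```c
#include <stddef.h>

/* Return the index of the message that would be picked from
   types[0..n-1] (queue order, oldest first), or -1 if none qualifies. */
int select_message(const long *types, size_t n, long msgtyp)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (msgtyp == 0)
            return (int) i;             /* first message on queue */
        if (msgtyp > 0) {
            if (types[i] == msgtyp)
                return (int) i;         /* first message of matching type */
        } else if (types[i] <= -msgtyp &&
                   (best == -1 || types[i] < types[best])) {
            best = (int) i;             /* least type seen so far */
        }
    }
    return best;                        /* -1 unless msgtyp < 0 matched */
}
```

+For negative @code{msgtyp} the whole array is scanned, which mirrors the
+search of the entire queue described above.@refill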
+
+@noindent
+Errors:@*
+@noindent
+E2BIG : msg bigger than msgsz and MSG_NOERROR not specified.@*
+@noindent
+EACCES : Do not have permission for reading the queue.@*
+@noindent
+EFAULT : msgp not accessible.@*
+@noindent
+EIDRM : msg queue was removed.@*
+@noindent
+EINTR : msg not found ... would have slept but ... was interrupted.@*
+@noindent
+EINVAL : msgsz > MSGMAX or msgsz < 0, msqid < 0 or unused.@*
+@noindent
+ENOMSG : msg of requested type not found and IPC_NOWAIT specified.
+
+
+
+@node msgctl, msglimits, msgrcv, Messages
+@subsection msgctl
+
+@example
+int msgctl (int msqid, int cmd, struct msqid_ds *buf);
+@end example
+
+@itemize @bullet
+@item
+msqid : id obtained by a call to msgget.
+@item
+buf : allocated by user for reading/writing info.
+@item
+cmd : IPC_STAT, IPC_SET, IPC_RMID (@xref{syscalls}).
+@end itemize
+
+IPC_STAT results in the copy of the queue data structure
+into the user supplied buffer.@refill
+
+In the case of IPC_SET, the queue size (@code{msg_qbytes})
+and the @code{uid}, @code{gid}, @code{mode} (low 9 bits) fields
+of the @code{msg_perm} struct are set from the user supplied values.
+@code{msg_ctime} is updated.@refill
+
+Note that only the super user may increase the limit on the size of a
+message queue beyond MSGMNB.@refill
+
+When the queue is destroyed (IPC_RMID), the sequence number is
+incremented and all waiting readers and writers are awakened.
+These processes will then return with @code{errno} set to EIDRM.@refill
+
+@noindent
+Errors:
+@noindent
+EPERM : Insufficient privilege to increase the size of the queue (IPC_SET)
+or remove it (IPC_RMID).@*
+@noindent
+EACCES : Do not have permission for reading the queue (IPC_STAT).@*
+@noindent
+EFAULT : buf not accessible (IPC_STAT, IPC_SET).@*
+@noindent
+EIDRM : msg queue was removed.@*
+@noindent
+EINVAL : invalid cmd, msqid < 0 or unused.
+
+
+@node msglimits, Semaphores, msgctl, Messages
+@subsection Limits on Message Resources
+
+@noindent
+Sizeof various structures:
+@itemize @asis
+@item
+msqid_ds 52 /* 1 per message queue .. dynamic */
+@item
+msg 16 /* 1 for each message in system .. dynamic */
+@item
+msgbuf 8 /* allocated by user */
+@end itemize
+
+@noindent
+Limits
+@itemize @bullet
+@item
+MSGMNI : number of message queue identifiers ... policy.
+@item
+MSGMAX : max size of message.
+Header and message space allocated on one page.
+MSGMAX = (PAGE_SIZE - sizeof(struct msg)).
+Implementation maximum MSGMAX = 4080.@refill
+@item
+MSGMNB : default max size of a message queue ... policy.
+The super-user can increase the size of a
+queue beyond MSGMNB by a @code{msgctl} call.@refill
+@end itemize
+
+@noindent
+Unused or unimplemented:@*
+MSGTQL max number of message headers system-wide.@*
+MSGPOOL total size in bytes of msg pool.
+
+
+
+@node Semaphores, semget, msglimits, top
+@section Semaphores
+
+Each semaphore has a value >= 0. An id provides access to an array
+of @code{nsems} semaphores. Operations such as read, increment or decrement
+semaphores in a set are performed by the @code{semop} call which processes
+@code{nsops} operations at a time. Each operation is specified in a struct
+@code{sembuf} described below. The operations are applied only if all of
+them succeed.@refill
+
+If you do not have a need for such arrays, you are probably better off using
+the @code{test_bit}, @code{set_bit} and @code{clear_bit} bit-operations
+defined in <asm/bitops.h>.@refill
+
+Semaphore operations may also be qualified by a SEM_UNDO flag which
+results in the operation being undone when the process exits.@refill
+
+If a decrement cannot go through, a process will be put to sleep
+on a queue waiting for the @code{semval} to increase unless it specifies
+IPC_NOWAIT. A read operation can similarly result in a sleep on a
+queue waiting for @code{semval} to become 0. (Actually there are
+two queues per semaphore array).@refill
+
+@noindent
+A semaphore array is described by:
+@example
+struct semid_ds
+ struct ipc_perm sem_perm;
+ time_t sem_otime; /* last semop time */
+ time_t sem_ctime; /* last change time */
+ struct wait_queue *eventn; /* wait for a semval to increase */
+ struct wait_queue *eventz; /* wait for a semval to become 0 */
+ struct sem_undo *undo; /* undo entries */
+ ushort sem_nsems; /* no. of semaphores in array */
+@end example
+
+@noindent
+Each semaphore is described internally by :
+@example
+struct sem
+ short sempid; /* pid of last semop() */
+ ushort semval; /* current value */
+ ushort semncnt; /* num procs awaiting increase in semval */
+ ushort semzcnt; /* num procs awaiting semval = 0 */
+@end example
+
+@menu
+* semget::
+* semop::
+* semctl::
+* semlimits:: Limits imposed by this implementation.
+@end menu
+
+@node semget, semop, Semaphores, Semaphores
+@subsection semget
+
+@noindent
+A semaphore array is allocated by a semget system call:
+
+@example
+semid = semget (key_t key, int nsems, int semflg);
+@end example
+
+@itemize @bullet
+@item
+@code{key} : an integer usually got from @code{ftok} or IPC_PRIVATE
+@item
+@code{nsems} :
+@itemize @asis
+@item
+# of semaphores in array (0 <= nsems <= SEMMSL <= SEMMNS)
+@item
+0 => don't care; can be used when not creating the resource.
+If successful you always get access to the entire array anyway.@refill
+@end itemize
+@item
+semflg :
+@itemize @asis
+@item
+IPC_CREAT used to create a new resource
+@item
+IPC_EXCL used with IPC_CREAT to ensure failure if the resource exists.
+@item
+rwxrwxrwx access permissions.
+@end itemize
+@item
+returns : semid on success. -1 on failure.
+@end itemize
+
+An array of nsems semaphores is allocated if there is no resource
+corresponding to the given key. The access permissions specified are
+then copied into the @code{sem_perm} struct for the array along with the
+user-id etc. The user must use the IPC_CREAT flag or key = IPC_PRIVATE
+if a new resource is to be created.@refill
+
+@noindent
+Errors:@*
+@noindent
+EINVAL : nsems not in above range (allocate).@*
+ nsems greater than number in array (procure).@*
+@noindent
+EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@*
+@noindent
+EIDRM : (procure) The resource was removed.@*
+@noindent
+ENOMEM : could not allocate space for semaphore array.@*
+@noindent
+ENOSPC : No arrays available (SEMMNI), too few semaphores available (SEMMNS).@*
+@noindent
+ENOENT : Resource does not exist and IPC_CREAT not specified.@*
+@noindent
+EACCES : (procure) do not have permission for specified access.
+
+
+@node semop, semctl, semget, Semaphores
+@subsection semop
+
+@noindent
+Operations on semaphore arrays are performed by calling semop :
+
+@example
+int semop (int semid, struct sembuf *sops, unsigned nsops);
+@end example
+@itemize @bullet
+@item
+semid : id obtained by a call to semget.
+@item
+sops : array of semaphore operations.
+@item
+nsops : number of operations in array (0 < nsops <= SEMOPM).
+@item
+returns : semval for last operation. -1 on failure.
+@end itemize
+
+@noindent
+Operations are described by a structure sembuf:
+@example
+struct sembuf
+ ushort sem_num; /* semaphore index in array */
+ short sem_op; /* semaphore operation */
+ short sem_flg; /* operation flags */
+@end example
+
+The value @code{sem_op} is to be added (signed) to the current value semval
+of the semaphore with index sem_num (0 .. nsems -1) in the set.
+Flags recognized in sem_flg are IPC_NOWAIT and SEM_UNDO.@refill
+
+@noindent
+Two kinds of operations can result in wait:
+@enumerate
+@item
+If sem_op is 0 (read operation) and semval is non-zero, the process
+sleeps on a queue waiting for semval to become zero or returns with
+error EAGAIN if (sem_flg & IPC_NOWAIT) is true.@refill
+@item
+If (sem_op < 0) and (semval + sem_op < 0), the process either sleeps
+on a queue waiting for semval to increase or returns with error EAGAIN if
+(sem_flg & IPC_NOWAIT) is true.@refill
+@end enumerate
+
+The array sops is first read in and preliminary checks performed on
+the arguments. The operations are parsed to determine if any of
+them needs write permissions or requests an undo operation.@refill
+
+The operations are then tried and the process sleeps if any operation
+that does not specify IPC_NOWAIT cannot go through. If a process sleeps
+it repeats these checks on waking up. If any operation that requests
+IPC_NOWAIT, cannot go through at any stage, the call returns with errno
+set to EAGAIN.@refill
+
+Finally, operations are committed when all go through without an intervening
+sleep. Processes waiting on the zero_queue or increment_queue are awakened
+if any of the semval's becomes zero or is incremented respectively.@refill
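+
+The all or nothing behaviour can be illustrated with a small user space
+model. This is a sketch and not the kernel code: @code{struct sim_op},
+@code{try_semops} and SIM_SEMVMX are hypothetical names, flags are
+omitted, and a blocked operation is reported instead of sleeping.@refill

```c
#define SIM_SEMVMX 32767        /* semaphore maximum value (assumed) */

/* Hypothetical stand-in for struct sembuf, without the flags field. */
struct sim_op {
    unsigned short num;         /* semaphore index in array */
    short op;                   /* value to add; 0 means wait for zero */
};

/* Try all operations on semval[0..nsems-1]. Returns 0 and keeps the
   new values if every operation can go through; otherwise returns -1
   and restores semval[] to its previous contents. */
int try_semops(unsigned short *semval, int nsems,
               const struct sim_op *ops, int nops)
{
    int i;

    for (i = 0; i < nops; i++) {
        int cur, v;

        if (ops[i].num >= nsems)
            break;                              /* bad semaphore index */
        cur = semval[ops[i].num];
        v = cur + ops[i].op;
        if ((ops[i].op == 0 && cur != 0) || v < 0 || v > SIM_SEMVMX)
            break;                              /* cannot go through */
        semval[ops[i].num] = (unsigned short) v;
    }
    if (i == nops)
        return 0;                               /* all succeeded: commit */

    while (--i >= 0)                            /* undo the applied ones */
        semval[ops[i].num] =
            (unsigned short) (semval[ops[i].num] - ops[i].op);
    return -1;
}
```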
+
+@noindent
+Errors:@*
+@noindent
+E2BIG : nsops > SEMOPM.@*
+@noindent
+EACCES : Do not have permission for requested (read/alter) access.@*
+@noindent
+EAGAIN : An operation with IPC_NOWAIT specified could not go through.@*
+@noindent
+EFAULT : The array sops is not accessible.@*
+@noindent
+EFBIG : An operation had semnum >= nsems.@*
+@noindent
+EIDRM : The resource was removed.@*
+@noindent
+EINTR : The process was interrupted on its way to a wait queue.@*
+@noindent
+EINVAL : nsops is 0, semid < 0 or unused.@*
+@noindent
+ENOMEM : SEM_UNDO requested. Could not allocate space for undo structure.@*
+@noindent
+ERANGE : sem_op + semval > SEMVMX for some operation.
+
+
+@node semctl, semlimits, semop, Semaphores
+@subsection semctl
+
+@example
+int semctl (int semid, int semnum, int cmd, union semun arg);
+@end example
+
+@itemize @bullet
+@item
+semid : id obtained by a call to semget.
+@item
+cmd :
+@itemize @asis
+@item
+GETPID return pid for the process that executed the last semop.
+@item
+GETVAL return semval of semaphore with index semnum.
+@item
+GETNCNT return number of processes waiting for semval to increase.
+@item
+GETZCNT return number of processes waiting for semval to become 0
+@item
+SETVAL set semval = arg.val.
+@item
+GETALL read all semval's into arg.array.
+@item
+SETALL set all semval's with values given in arg.array.
+@end itemize
+@item
+returns : 0 on success or as given above. -1 on failure.
+@end itemize
+
+The first five operate on the semaphore with index semnum in the set.
+The last two operate on all semaphores in the set.@refill
+
+@code{arg} is a union :
+@example
+union semun
+ int val; /* value for SETVAL */
+ struct semid_ds *buf; /* buffer for IPC_STAT and IPC_SET */
+ ushort *array; /* array for GETALL and SETALL */
+@end example
+
+@itemize @bullet
+@item
+IPC_SET, SETVAL, SETALL : sem_ctime is updated.
+@item
+SETVAL, SETALL : Undo entries are cleared for altered semaphores in
+all processes. Processes sleeping on the wait queues are
+awakened if a semval becomes 0 or increases.@refill
+@item
+IPC_SET : sem_perm.uid, sem_perm.gid, sem_perm.mode are updated from
+user supplied values.@refill
+@end itemize
+
+@noindent
+Errors:
+@noindent
+EACCES : do not have permission for specified access.@*
+@noindent
+EFAULT : arg is not accessible.@*
+@noindent
+EIDRM : The resource was removed.@*
+@noindent
+EINVAL : semid < 0 or semnum < 0 or semnum >= nsems.@*
+@noindent
+EPERM : IPC_RMID, IPC_SET ... not creator, owner or super-user.@*
+@noindent
+ERANGE : arg.val (SETVAL) or arg.array[i] (SETALL) > SEMVMX, or arg.val < 0.
+
+
+
+
+@node semlimits, Shared Memory, semctl, Semaphores
+@subsection Limits on Semaphore Resources
+
+@noindent
+Sizeof various structures:
+@example
+semid_ds 44 /* 1 per semaphore array .. dynamic */
+sem 8 /* 1 for each semaphore in system .. dynamic */
+sembuf 6 /* allocated by user */
+sem_undo 20 /* 1 for each undo request .. dynamic */
+@end example
+
+@noindent
+Limits :@*
+@itemize @bullet
+@item
+SEMVMX 32767 semaphore maximum value (short).
+@item
+SEMMNI number of semaphore identifiers (or arrays) system wide...policy.
+@item
+SEMMSL maximum number of semaphores per id.
+1 semid_ds per array, 1 struct sem per semaphore
+=> SEMMSL = (PAGE_SIZE - sizeof(semid_ds)) / sizeof(sem).
+Implementation maximum SEMMSL = 500.@refill
+@item
+SEMMNS maximum number of semaphores system wide ... policy.
+Setting SEMMNS >= SEMMSL*SEMMNI makes it irrelevant.@refill
+@item
+SEMOPM Maximum number of operations in one semop call...policy.
+@end itemize
+
+@noindent
+Unused or unimplemented:@*
+@noindent
+SEMAEM adjust on exit max value.@*
+@noindent
+SEMMNU number of undo structures system-wide.@*
+@noindent
+SEMUME maximum number of undo entries per process.
+
+
+
+@node Shared Memory, shmget, semlimits, top
+@section Shared Memory
+
+Shared memory is distinct from the sharing of read-only code pages or
+the sharing of unaltered data pages that is available due to the
+copy-on-write mechanism. The essential difference is that the
+shared pages are dirty (in the case of Shared memory) and can be
+made to appear at a convenient location in the process' address space.@refill
+
+@noindent
+A shared segment is described by :
+@example
+struct shmid_ds
+ struct ipc_perm shm_perm;
+ int shm_segsz; /* size of segment (bytes) */
+ time_t shm_atime; /* last attach time */
+ time_t shm_dtime; /* last detach time */
+ time_t shm_ctime; /* last change time */
+ ulong *shm_pages; /* internal page table */
+ ushort shm_cpid; /* pid, creator */
+ ushort shm_lpid; /* pid, last operation */
+ short shm_nattch; /* no. of current attaches */
+@end example
+
+A shmget allocates a shmid_ds and an internal page table. A shmat
+maps the segment into the process' address space with pointers
+into the internal page table and the actual pages are faulted in
+as needed. The memory associated with the segment must be explicitly
+destroyed by calling shmctl with IPC_RMID.@refill
+
+@menu
+* shmget::
+* shmat::
+* shmdt::
+* shmctl::
+* shmlimits:: Limits imposed by this implementation.
+@end menu
+
+
+@node shmget, shmat, Shared Memory, Shared Memory
+@subsection shmget
+
+@noindent
+A shared memory segment is allocated by a shmget system call:
+
+@example
+int shmget(key_t key, int size, int shmflg);
+@end example
+
+@itemize @bullet
+@item
+key : an integer usually got from @code{ftok} or IPC_PRIVATE
+@item
+size : size of the segment in bytes (SHMMIN <= size <= SHMMAX).
+@item
+shmflg :
+@itemize @asis
+@item
+IPC_CREAT used to create a new resource
+@item
+IPC_EXCL used with IPC_CREAT to ensure failure if the resource exists.
+@item
+rwxrwxrwx access permissions.
+@end itemize
+@item
+returns : shmid on success. -1 on failure.
+@end itemize
+
+A descriptor for a shared memory segment is allocated if there isn't one
+corresponding to the given key. The access permissions specified are
+then copied into the @code{shm_perm} struct for the segment along with the
+user-id etc. The user must use the IPC_CREAT flag or key = IPC_PRIVATE
+to allocate a new segment.@refill
+
+If the segment already exists, the access permissions are verified,
+and a check is made to see that it is not marked for destruction.@refill
+
+@code{size} is effectively rounded up to a multiple of PAGE_SIZE as shared
+memory is allocated in pages.@refill
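+
+The rounding can be sketched as follows. The page size of 4096 bytes is
+an assumption made for the example, and @code{shm_round_size} and
+SKETCH_PAGE_SIZE are hypothetical names.@refill

```c
#define SKETCH_PAGE_SIZE 4096UL     /* assumed page size, a power of two */

/* Round a requested segment size up to a whole number of pages. */
unsigned long shm_round_size(unsigned long size)
{
    return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

+A request for 1 byte therefore occupies a full page, while a request
+for 0x5000 bytes is already page aligned and is left unchanged.@refill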
+
+@noindent
+Errors:@*
+@noindent
+EINVAL : (allocate) Size not in range specified above.@*
+ (procure) Size greater than size of segment.@*
+@noindent
+EEXIST : (allocate) IPC_CREAT | IPC_EXCL specified and resource exists.@*
+@noindent
+EIDRM : (procure) The resource is marked destroyed or was removed.@*
+@noindent
+ENOSPC : (allocate) All ids are taken (max of SHMMNI ids system-wide),
+or allocating a segment of the requested size would exceed the
+system-wide limit on total shared memory (SHMALL).@refill
+@*
+@noindent
+ENOENT : (procure) Resource does not exist and IPC_CREAT not specified.@*
+@noindent
+EACCES : (procure) Do not have permission for specified access.@*
+@noindent
+ENOMEM : (allocate) Could not allocate memory for shmid_ds or pg_table.
+
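+@noindent
+A minimal sketch of the EEXIST case listed above, assuming a Linux system
+with the usual C library headers. The function name
+@code{shm_excl_create_errno} is hypothetical, and the caller supplies a
+key that is not already in use.@refill
+

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a segment for key with IPC_CREAT | IPC_EXCL, then repeat the
 * call. Returns the errno of the second call (EEXIST is expected),
 * 0 if the second call unexpectedly succeeded, or -1 if the first
 * call failed (for instance because the key is already in use). */
int shm_excl_create_errno(key_t key, size_t size)
{
    int first, second, err = 0;

    first = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (first < 0)
        return -1;

    second = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (second < 0)
        err = errno;

    /* Mark the segment for destruction; nothing is attached, so it is
     * freed at once. */
    shmctl(first, IPC_RMID, NULL);
    return err;
}
```

+@noindent
+A key for this call could come from @code{ftok}, for example
+@code{ftok(".", 'S')}.@refill
+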
+
+
+@node shmat, shmdt, shmget, Shared Memory
+@subsection shmat
+
+@noindent
+Maps a shared segment into the process' address space.
+
+@example
+char *shmat (int shmid, char *shmaddr, int shmflg);
+
+char *virt_addr = shmat (shmid, shmaddr, shmflg);
+@end example
+
+@itemize @bullet
+@item
+shmid : id obtained from a call to shmget.
+@item
+shmaddr : requested attach address.@*
+ If shmaddr is 0 the system finds an unmapped region.@*
+ If a non-zero value is indicated the value must be page
+ aligned or the user must specify the SHM_RND flag.@refill
+@item
+shmflg :@*
+ SHM_RDONLY : request read-only attach.@*
+ SHM_RND : attach address is rounded DOWN to a multiple of SHMLBA.
+@item
+returns: virtual address of attached segment. -1 on failure.
+@end itemize
+
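+@noindent
+The effect of SHM_RND on a requested address can be sketched as plain
+arithmetic. The helper name @code{shm_rnd_address} is hypothetical, and the
+boundary is passed in explicitly rather than read from SHMLBA.@refill
+

```c
#include <stdint.h>

/* Hypothetical helper: the address an SHM_RND attach would use,
 * that is, addr rounded down to a multiple of shmlba. */
uintptr_t shm_rnd_address(uintptr_t addr, uintptr_t shmlba)
{
    return addr - (addr % shmlba);
}
```

+@noindent
+Assuming a boundary of 0x1000, a request for 0x40001234 attaches at
+0x40001000, and an address that is already aligned is left unchanged.@refill
+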
+When shmaddr is 0, the attach address is determined by finding an
+unmapped region in the address range 1G to 1.5G, starting at 1.5G
+and coming down from there. The algorithm is very simple so you
+are encouraged to avoid non-specific attaches.
+
+@noindent
+Algorithm:
+@display
+Determine attach address as described above.
+Check region (shmaddr, shmaddr + size) is not mapped and allocate
+ page tables (undocumented SHM_REMAP flag!).
+Map the region by setting up pointers into the internal page table.
+Add a descriptor for the attach to the task struct for the process.
+@code{shm_nattch}, @code{shm_lpid}, @code{shm_atime} are updated.
+@end display
+
+@noindent
+Notes:
+@itemize @bullet
+@item
+The @code{brk} value is not altered.
+@item
+The segment is automatically detached when the process exits.
+@item
+The same segment may be attached as read-only or read-write, and
+more than once in the process' address space.
+@item
+A shmat can succeed on a segment marked for destruction.
+@item
+The type of attach is requested using the SHM_RDONLY flag.
+There is no notion of a write-only attach. The requested attach
+permissions must fall within those allowed by @code{shm_perm.mode}.
+@end itemize
+
+@noindent
+Errors:@*
+@noindent
+EACCES : Do not have permission for requested access.@*
+@noindent
+EINVAL : shmid < 0 or unused, shmaddr not aligned, attach at brk failed.@*
+@noindent
+EIDRM : resource was removed.@*
+@noindent
+ENOMEM : Could not allocate memory for descriptor or page tables.
+
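+@noindent
+The following sketch attaches one segment twice in the same process, once
+read-write and once read-only, as the notes above allow. It assumes a Linux
+system; the function name @code{shm_attach_twice} is hypothetical, and a
+null @code{shmaddr} is used so that the system picks both addresses.@refill
+

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach one private segment twice, read-write and read-only, and
 * check that data written through the first mapping is visible
 * through the second. Returns 0 on success, -1 on any failure. */
int shm_attach_twice(void)
{
    int result = -1;
    char *rw, *ro;
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (shmid < 0)
        return -1;

    rw = shmat(shmid, NULL, 0);            /* system picks the address */
    ro = shmat(shmid, NULL, SHM_RDONLY);   /* second, read-only attach */

    if (rw != (char *) -1 && ro != (char *) -1) {
        strcpy(rw, "hello");
        if (strcmp(ro, "hello") == 0)
            result = 0;
    }

    if (ro != (char *) -1)
        shmdt(ro);
    if (rw != (char *) -1)
        shmdt(rw);
    shmctl(shmid, IPC_RMID, NULL);         /* free the segment */
    return result;
}
```

+@noindent
+Writing through the read-only mapping would fault, so the sketch only
+reads from it.@refill
+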
+
+@node shmdt, shmctl, shmat, Shared Memory
+@subsection shmdt
+
+@example
+int shmdt (char *shmaddr);
+@end example
+
+@itemize @bullet
+@item
+shmaddr : attach address of segment (returned by shmat).
+@item
+returns : 0 on success. -1 on failure.
+@end itemize
+
+An attached segment is detached and @code{shm_nattch} decremented. The
+occupied region in user space is unmapped. The segment is destroyed
+if it is marked for destruction and @code{shm_nattch} is 0.
+@code{shm_lpid} and @code{shm_dtime} are updated.@refill
+
+@noindent
+Errors:@*
+@noindent
+EINVAL : No shared memory segment attached at shmaddr.
+
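+@noindent
+A small sketch of the EINVAL case, assuming a Linux system. The function
+name @code{shmdt_unattached_errno} is hypothetical.@refill
+

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/shm.h>

/* Call shmdt on an address where no segment is attached and return
 * the resulting errno (0 if the call unexpectedly succeeds). */
int shmdt_unattached_errno(void)
{
    static char not_shared[16];   /* ordinary memory, never attached */

    if (shmdt(not_shared) == 0)
        return 0;
    return errno;
}
```
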
+
+@node shmctl, shmlimits, shmdt, Shared Memory
+@subsection shmctl
+
+@noindent
+Destroys allocated segments. Reads/Writes the control structures.
+
+@example
+int shmctl (int shmid, int cmd, struct shmid_ds *buf);
+@end example
+
+@itemize @bullet
+@item
+shmid : id obtained from a call to shmget.
+@item
+cmd : IPC_STAT, IPC_SET, IPC_RMID (@pxref{syscalls}).
+@itemize @asis
+@item
+IPC_SET : Used to set the owner uid, gid, and @code{shm_perm.mode} field.
+@item
+IPC_RMID : The segment is marked for destruction. It is actually
+destroyed only on the last detach.@refill
+@item
+IPC_STAT : The shmid_ds structure is copied into the user allocated buffer.
+@end itemize
+@item
+buf : used to read (IPC_STAT) or write (IPC_SET) information.
+@item
+returns : 0 on success, -1 on failure.
+@end itemize
+
+The user must execute an IPC_RMID shmctl call to free the memory
+allocated by the shared segment. Otherwise all the pages faulted in
+will continue to live in memory or swap.@refill
+
+@noindent
+Errors:@*
+@noindent
+EACCES : Do not have permission for requested access.@*
+@noindent
+EFAULT : buf is not accessible.@*
+@noindent
+EINVAL : shmid < 0 or unused.@*
+@noindent
+EIDRM : identifier destroyed.@*
+@noindent
+EPERM : not creator, owner or super-user (IPC_SET, IPC_RMID).
+
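+@noindent
+The interplay of IPC_RMID, IPC_STAT and the last detach can be sketched as
+follows, assuming a Linux system. The function name
+@code{shm_deferred_destroy} is hypothetical; after the final detach the
+sketch accepts either EINVAL or EIDRM for the stale id.@refill
+

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns 0 if a segment removed with IPC_RMID stays usable while
 * attached and disappears after the last detach, -1 otherwise. */
int shm_deferred_destroy(void)
{
    struct shmid_ds ds;
    char *p;
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (shmid < 0)
        return -1;
    p = shmat(shmid, NULL, 0);
    if (p == (char *) -1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    /* Mark for destruction while one attach is outstanding. */
    if (shmctl(shmid, IPC_RMID, NULL) != 0) {
        shmdt(p);
        return -1;
    }

    /* The segment still exists: IPC_STAT reports one attach and
     * the mapping is still writable. */
    if (shmctl(shmid, IPC_STAT, &ds) != 0 || ds.shm_nattch != 1) {
        shmdt(p);
        return -1;
    }
    p[0] = 'x';

    /* The last detach destroys it; the id is no longer valid. */
    if (shmdt(p) != 0)
        return -1;
    if (shmctl(shmid, IPC_STAT, &ds) == 0)
        return -1;
    return (errno == EINVAL || errno == EIDRM) ? 0 : -1;
}
```
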
+
+@node shmlimits, Notes, shmctl, Shared Memory
+@subsection Limits on Shared Memory Resources
+
+@noindent
+Limits:
+@itemize @bullet
+@item
+SHMMNI max num of shared segments system wide ... 4096.
+@item
+SHMMAX max shared memory segment size (bytes) ... 4M
+@item
+SHMMIN min shared memory segment size (bytes).
+1 byte (though PAGE_SIZE is the effective minimum size).@refill
+@item
+SHMALL max shared mem system wide (in pages) ... policy.
+@item
+SHMLBA segment low boundary address multiple.
+Must be page aligned. SHMLBA = PAGE_SIZE.@refill
+@end itemize
+@noindent
+Unused or unimplemented:@*
+SHMSEG : maximum number of shared segments per process.
+
+
+
+@node Notes, top, shmlimits, top
+@section Miscellaneous Notes
+
+The system calls are mapped into one -- @code{sys_ipc}. This should be
+transparent to the user.@refill
+
+@subsection Semaphore @code{undo} requests
+
+There is one sem_undo structure associated with a process for
+each semaphore which was altered (with an undo request) by the process.
+@code{sem_undo} structures are freed only when the process exits.
+
+One major cause for unhappiness with the undo mechanism is that
+it does not fit in with the notion of having an atomic set of
+operations on an array. The undo requests for an array and each
+semaphore therein may have been accumulated over many @code{semop}
+calls. Thus use the undo mechanism with private semaphores only.@refill
+
+Should the process sleep in @code{exit} or should all undo
+operations be applied with the IPC_NOWAIT flag in effect?
+Currently those undo operations which go through immediately are
+applied and those that require a wait are ignored silently.@refill
+
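+@noindent
+A minimal sketch of an undo request being applied at exit, assuming a Linux
+system where the caller defines @code{union semun}. The function name
+@code{sem_undo_on_exit} is hypothetical. The undo here adds to the
+semaphore, so it goes through immediately.@refill
+

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller must define union semun. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* A child decrements a semaphore with SEM_UNDO and exits without
 * restoring it. Returns the semaphore value seen by the parent
 * afterwards (1 if the undo was applied), or -1 on failure. */
int sem_undo_on_exit(void)
{
    union semun arg;
    struct sembuf op;
    int value, status;
    pid_t pid;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) != 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;

    pid = fork();
    if (pid == 0) {
        /* Child: take the semaphore (1 -> 0), then exit. */
        _exit(semop(semid, &op, 1) == 0 ? 0 : 1);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    value = semctl(semid, 0, GETVAL);   /* undo added 1 back at exit */
    semctl(semid, 0, IPC_RMID);
    return value;
}
```
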
+@subsection Shared memory, @code{malloc} and the @code{brk}
+Note that since this section was written, the implementation was
+changed so that non-specific attaches are done in the region
+1G - 1.5G. However, much of the following is still worth thinking
+about, so I left it in.
+
+On many systems, the shared memory is allocated in a special region
+of the address space ... way up somewhere. As mentioned earlier,
+this implementation attaches shared segments at the lowest possible
+address. Thus if you plan to use @code{malloc}, it is wise to malloc a
+large space and then proceed to attach the shared segments. This way
+malloc sets the brk sufficiently above the region it will use.@refill
+
+Alternatively you can use @code{sbrk} to adjust the @code{brk} value
+as you make shared memory attaches. The implementation is not very
+smart about selecting attach addresses. Using the system default
+addresses will result in fragmentation if detaches do not occur
+in the reverse order of the attaches.@refill
+
+Taking control of the matter is probably best. The rule applied
+is that attaches are allowed in unmapped regions other than
+in the text space (see <a.out.h>). Also remember that attach addresses
+and segment sizes are multiples of PAGE_SIZE.@refill
+
+One more trap (I quote Bruno on this). If you use malloc() to get space
+for your shared memory (i.e., to fix the @code{brk}), you must ensure you
+get an unmapped address range. This means you must malloc more memory
+than you had ever allocated before. Memory returned by malloc(), used,
+then freed by free() and then again returned by malloc() is no good.
+Neither is memory obtained from calloc().@refill
+
+Note that a shared memory region remains a shared memory region until
+you unmap it. Attaching a segment at the @code{brk} and calling malloc
+after that will result in an overlap of what malloc thinks is its
+space with what is really a shared memory region. For example in the case
+of a read-only attach, you will not be able to write to the overlapped
+portion.@refill
+
+
+@subsection Fork, exec and exit
+
+On a fork, the child inherits attached shared memory segments but
+not the semaphore undo information.@refill
+
+In the case of an exec, the attached shared segments are detached.
+The sem undo information however remains intact.@refill
+
+Upon exit, all attached shared memory segments are detached.
+The adjustment values in the undo structures are added to the relevant semvals
+if the operations are permitted. Disallowed operations are ignored.@refill
+
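+@noindent
+The inheritance of attached segments across a fork can be sketched as
+follows, assuming a Linux system. The function name
+@code{shm_inherited_by_child} is hypothetical.@refill
+

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach a private segment, fork, let the child write through the
 * inherited attach, and return the value the parent reads back
 * (42 is expected), or -1 on failure. */
int shm_inherited_by_child(void)
{
    int status, value = -1;
    int *shared;
    pid_t pid;
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);

    if (shmid < 0)
        return -1;
    shared = shmat(shmid, NULL, 0);
    /* Mark for destruction now; the segment lives until the last detach. */
    shmctl(shmid, IPC_RMID, NULL);
    if (shared == (int *) -1)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid == 0) {
        *shared = 42;     /* child uses the attach it inherited */
        _exit(0);         /* exit detaches the child's copy */
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid)
        value = *shared;

    shmdt(shared);        /* last detach destroys the segment */
    return value;
}
```

+@noindent
+Calling IPC_RMID right after the attach means the segment is cleaned up
+even if the process dies before it reaches its own detach.@refill
+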
+
+@subsection Other Features
+
+These features of the current implementation are
+likely to be modified in the future.
+
+The SHM_LOCK and SHM_UNLOCK flags are available (super-user) for use with the
+@code{shmctl} call to prevent swapping of a shared segment. The user
+must fault in any pages that are required to be present after locking
+is enabled.
+
+The IPC_INFO, MSG_STAT, MSG_INFO, SHM_STAT, SHM_INFO, SEM_STAT, SEM_INFO
+@code{ctl} calls are used by the @code{ipcs} program to provide information
+on allocated resources. These can be modified as needed or moved to a proc
+file system interface.
+
+
+@sp 3
+Thanks to Ove Ewerlid, Bruno Haible, Ulrich Pegelow and Linus Torvalds
+for ideas, tutorials, bug reports and fixes, and merriment. And more
+thanks to Bruno.
+
+
+@contents
+@bye
+