summaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] page flags: add commentry regarding field reservationAndy Whitcroft2006-04-111-2/+14
| | | | | | | | | | | Add some documentation regarding the utilisation of the flags field in struct page. This field is overloaded for per page bits and to hold node, zone and SPARSEMEM information. Make it clear which areas are used for what and how many bits are in each area. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] overcommit: add calculate_totalreserve_pages()Hideo AOKI2006-04-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These patches are an enhancement of OVERCOMMIT_GUESS algorithm in __vm_enough_memory(). - why the kernel needed patching When the kernel can't allocate anonymous pages in practice, currnet OVERCOMMIT_GUESS could return success. This implementation might be the cause of oom kill in memory pressure situation. If the Linux runs with page reservation features like /proc/sys/vm/lowmem_reserve_ratio and without swap region, I think the oom kill occurs easily. - the overall design approach in the patch When the OVERCOMMET_GUESS algorithm calculates number of free pages, the reserved free pages are regarded as non-free pages. This change helps to avoid the pitfall that the number of free pages become less than the number which the kernel tries to keep free. - testing results I tested the patches using my test kernel module. If the patches aren't applied to the kernel, __vm_enough_memory() returns success in the situation but autual page allocation is failed. On the other hand, if the patches are applied to the kernel, memory allocation failure is avoided since __vm_enough_memory() returns failure in the situation. I checked that on i386 SMP 16GB memory machine. I haven't tested on nommu environment currently. This patch adds totalreserve_pages for __vm_enough_memory(). Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and pages_high in each zone. Finally, the function stores the sum of each zone to totalreserve_pages. The totalreserve_pages is calculated when the VM is initilized. And the variable is updated when /proc/sys/vm/lowmem_reserve_raito or /proc/sys/vm/min_free_kbytes are changed. Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] md: make sure 64bit fields in version-1 metadata are 64-bit alignedNeilBrown2006-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | reshape_position is a 64bit field that was not 64bit aligned. So swap with new_level. NOTE: this is a user-visible change. However: - The bad code has not appeared in a released kernel - This code is still marked 'experimental' - This only affects version-1 superblock, which are not in wide use - These field are only used (rather than simply reported) by user-space tools in extemely rare circumstances : after a reshape crashes in the first second of the reshape process. So I believe that, at this stage, the change is safe. Especially if people heed the 'help' message on use mdadm-2.4.1. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] de_thread: Don't confuse users do_each_thread.Eric W. Biederman2006-04-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleg Nesterov spotted two interesting bugs with the current de_thread code. The simplest is a long standing double decrement of __get_cpu_var(process_counts) in __unhash_process. Caused by two processes exiting when only one was created. The other is that since we no longer detach from the thread_group list it is possible for do_each_thread when run under the tasklist_lock to see the same task_struct twice. Once on the task list as a thread_group_leader, and once on the thread list of another thread. The double appearance in do_each_thread can cause a double increment of mm_core_waiters in zap_threads resulting in problems later on in coredump_wait. To remedy those two problems this patch takes the simple approach of changing the old thread group leader into a child thread. The only routine in release_task that cares is __unhash_process, and it can be trivially seen that we handle cleaning up a thread group leader properly. Since de_thread doesn't change the pid of the exiting leader process and instead shares it with the new leader process. I change thread_group_leader to recognize group leadership based on the group_leader field and not based on pids. This should also be slightly cheaper then the existing thread_group_leader macro. I performed a quick audit and I couldn't see any user of thread_group_leader that cared about the difference. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] move ->eh_strategy_handler to the transport classChristoph Hellwig2006-04-101-1/+0Star
| | | | | | | | | | | | | Overriding the whole EH code is a per-transport, not per-host thing. Move ->eh_strategy_handler to the transport class, same as ->eh_timed_out. Downside is that scsi_host_alloc can't check for the total lack of EH anymore, but the transition period from old EH where we needed it is long gone already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* [PATCH] Fix buddy list race that could lead to page lru list corruptionsNick Piggin2006-04-102-4/+9
| | | | | | | | | | | | | | | | | | | Rohit found an obscure bug causing buddy list corruption. page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0) to determine whether or not a free page's buddy is itself free and in the buddy lists. Each of the conjuncts may be true at different times due to unrelated conditions, so the non-atomic page_is_buddy test may find each conjunct to be true even if they were not both true at the same time (ie. the page was not on the buddy lists). Signed-off-by: Martin Bligh <mbligh@google.com> Signed-off-by: Rohit Seth <rohitseth@google.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NETFILTER]: Add address family specific checksum helpersPatrick McHardy2006-04-103-0/+22
| | | | | | | | | | Add checksum operation which takes care of verifying the checksum and dealing with HW checksum errors and avoids multiple checksum operations by setting ip_summed to CHECKSUM_UNNECESSARY after successful verification. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Introduce infrastructure for address family specific operationsPatrick McHardy2006-04-101-7/+16
| | | | | | | | | Change the queue rerouter intrastructure to a generic usable infrastructure for address family specific operations as a base for some cleanups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: H.323 helper: move some function prototypes to ip_conntrack_h323.hJing Min Zhao2006-04-103-0/+1088
| | | | | | | | | | Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the use of typedefs as arguments, some header files need to be moved as well. Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Add helper functions for mass hook registration/unregistrationPatrick McHardy2006-04-101-0/+2
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] x86_64: Fix drift with HPET timer enabledJordan Hargrave2006-04-091-0/+6
| | | | | | | | | | | | | | | | | | | | If the HPET timer is enabled, the clock can drift by ~3 seconds a day. This is due to the HPET timer not being initialized with the correct setting (still using PIT count). If HZ changes, this drift can become even more pronounced. HPET patch initializes tick_nsec with correct tick_nsec settings for HPET timer. Vojtech comments: "It's not entirely correct (it assumes the HPET ticks totally exactly), but it's significantly better than assuming the PIT error there." Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Handle empty PXMs that only contain hotplug memoryAndi Kleen2006-04-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | The node setup code would try to allocate the node metadata in the node itself, but that fails if there is no memory in there. This can happen with memory hotplug when the hotplug area defines an so far empty node. Now use bootmem to try to allocate the mem_map in other nodes. And if it fails don't panic, but just ignore the node. To make this work I added a new __alloc_bootmem_nopanic function that does what its name implies. TBD should try to use nearby nodes here. Currently we just use any. It's hard to do it better because bootmem doesn't have proper fallback lists yet. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Support memory hotadd without sparsememAndi Kleen2006-04-092-8/+9
| | | | | | | | | | | Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating mem_maps. This only needs some untangling of ifdefs to enable the necessary code even without SPARSEMEM. Originally from Keith Mannthey, hacked by AK. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds2006-04-021-1/+5
|\ | | | | | | | | | | | | | | | | | | | | * 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block: [PATCH] splice: fix page stealing LRU handling. [PATCH] splice: page stealing needs to wait_on_page_writeback() [PATCH] splice: export generic_splice_sendpage [PATCH] splice: add a SPLICE_F_MORE flag [PATCH] splice: add comments documenting more of the code [PATCH] splice: improve writeback and clean up page stealing [PATCH] splice: fix shadow[] filling logic
| * [PATCH] splice: fix page stealing LRU handling.Jens Axboe2006-04-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Originally from Nick Piggin, just adapted to the newer branch. You can't check PageLRU without holding zone->lru_lock. The page release code can get away with it only because the page refcount is 0 at that point. Also, you can't reliably remove pages from the LRU unless the refcount is 0. Ever. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Jens Axboe <axboe@suse.de>
| * [PATCH] splice: add a SPLICE_F_MORE flagJens Axboe2006-04-021-0/+1
| | | | | | | | | | | | | | This lets userspace indicate whether more data will be coming in a subsequent splice call. Signed-off-by: Jens Axboe <axboe@suse.de>
| * [PATCH] splice: improve writeback and clean up page stealingJens Axboe2006-04-021-1/+0Star
| | | | | | | | | | | | | | | | | | | | By cleaning up the writeback logic (killing write_one_page() and the manual set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and just keep it local in pipe_to_file(). This also adds dirty page balancing logic and O_SYNC handling. Signed-off-by: Jens Axboe <axboe@suse.de>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2006-04-022-2/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits) Documentation: fix minor kernel-doc warnings BUG_ON() Conversion in drivers/net/ BUG_ON() Conversion in drivers/s390/net/lcs.c BUG_ON() Conversion in mm/slab.c BUG_ON() Conversion in mm/highmem.c BUG_ON() Conversion in kernel/signal.c BUG_ON() Conversion in kernel/signal.c BUG_ON() Conversion in kernel/ptrace.c BUG_ON() Conversion in ipc/shm.c BUG_ON() Conversion in fs/freevxfs/ BUG_ON() Conversion in fs/udf/ BUG_ON() Conversion in fs/sysv/ BUG_ON() Conversion in fs/inode.c BUG_ON() Conversion in fs/fcntl.c BUG_ON() Conversion in fs/dquot.c BUG_ON() Conversion in md/raid10.c BUG_ON() Conversion in md/raid6main.c BUG_ON() Conversion in md/raid5.c Fix minor documentation typo BFP->BPF in Documentation/networking/tuntap.txt ...
| * Documentation: fix minor kernel-doc warningsMartin Waitz2006-04-021-1/+1
| | | | | | | | | | | | | | This patch updates the comments to match the actual code. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
| * Fix comments: s/granuality/granularity/Kalin KOZHUHAROV2006-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was grepping through the code and some `grep ganularity -R .` didn't catch what I thought. Then looking closer I saw the term "granuality" used in only four places (in comments) and granularity in many more places describing the same idea. Some other facts: dictionary.com does not know such a word define:granuality on google is not found (and pages for granuality are mostly related to patches to the kernel) it has not been discussed as a term on LKML, AFAICS (=Can Search) To be consistent, I think granularity should be used everywhere. Signed-off-by: Kalin KOZHUHAROV <kalin@thinrope.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvbLinus Torvalds2006-04-021-57/+8Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (49 commits) V4L/DVB (3667b): cpia2: fix function prototype V4L/DVB (3702): Make msp3400 routing defines more consistent V4L/DVB (3700): Remove obsolete commands from tvp5150.c V4L/DVB (3697): More msp3400 and bttv fixes V4L/DVB (3696): Previous change for cx2341X boards broke the remote support V4L/DVB (3693): Fix msp3400c and bttv stereo/mono/bilingual detection/handling V4L/DVB (3692): Keep experimental SLICED_VBI defines under an #if 0 V4L/DVB (3689): Kconfig: fix VP-3054 Secondary I2C Bus build configuration menu dependencies V4L/DVB (3673): Fix budget-av CAM reset V4L/DVB (3672): Fix memory leak in dvr open V4L/DVB (3671): New module parameter 'tv_standard' (dvb-ttpci driver) V4L/DVB (3670): Fix typo in comment V4L/DVB (3669): Configurable dma buffer size for saa7146-based budget dvb cards V4L/DVB (3653h): Move usb v4l docs into Documentation/video4linux V4L/DVB (3667a): Fix SAP + stereo mode at msp3400 V4L/DVB (3666): Remove trailing newlines V4L/DVB (3665): Add new NEC uPD64031A and uPD64083 i2c drivers V4L/DVB (3663): Fix msp3400c wait time and better audio mode fallbacks V4L/DVB (3662): Don't set msp3400c-non-existent register V4L/DVB (3661): Add wm8739 stereo audio ADC i2c driver ...
| * | V4L/DVB (3692): Keep experimental SLICED_VBI defines under an #if 0Hans Verkuil2006-04-021-57/+8Star
| |/ | | | | | | | | | | | | | | The sliced VBI defines added in videodev2.h are removed since requires more discussion. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds2006-04-027-17/+43
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/dtor/input: (26 commits) Input: add support for Braille devices Input: synaptics - limit rate to 40pps on Toshiba Protege M300 Input: gamecon - add SNES mouse support Input: make modalias code respect allowed buffer size Input: convert /proc handling to seq_file Input: limit attributes' output to PAGE_SIZE Input: gameport - fix memory leak Input: serio - fix memory leak Input: zaurus keyboard driver updates Input: i8042 - fix logic around pnp_register_driver() Input: ns558 - fix logic around pnp_register_driver() Input: pcspkr - separate device and driver registration Input: atkbd - allow disabling on X86_PC (if EMBEDDED) Input: atkbd - disable softrepeat for dumb keyboards Input: atkbd - fix complaints about 'releasing unknown key 0x7f' Input: HID - fix duplicate key mapping for Logitech UltraX remote Input: use kzalloc() throughout the code Input: fix input_free_device() implementation Input: initialize serio and gameport at subsystem level Input: uinput - semaphore to mutex conversion ...
| * | Input: add support for Braille devicesSamuel Thibault2006-04-023-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add KEY_BRL_* input keys and K_BRL_* keycodes; - Add emulation of how braille keyboards usually combine braille dots to the console keyboard driver; - Add handling of unicode U+28xy diacritics. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Manual merge with Linus.Dmitry Torokhov2006-04-02223-1963/+3508
| |\| | | | | | | | | | | | | | | | | | | Conflicts: arch/powerpc/kernel/setup-common.c drivers/input/keyboard/hil_kbd.c drivers/input/mouse/hil_ptr.c
| * | Input: fix input_free_device() implementationDmitry Torokhov2006-03-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | input_free_device can't just call kfree because if input_register_device fails after successfully registering corresponding class device there is a chance that someone could get a reference to it. We need to use input_put_device() to make sure that we don't delete input device until last reference to it was dropped. Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6Dmitry Torokhov2006-03-1431-40/+103
| |\ \
| * | | Input: uinput - semaphore to mutex conversionDmitry Torokhov2006-02-191-2/+2
| | | | | | | | | | | | | | | | Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | Input: gameport - semaphore to mutex conversionArjan van de Ven2006-02-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | Input: serio - semaphore to mutex conversionArjan van de Ven2006-02-192-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | Input: input core - semaphore to mutex conversionJes Sorensen2006-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* | | | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2006-04-028-57/+154
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Fully fix the memory leaks in sys_accept(). [NETFILTER]: iptables 32bit compat layer [NETFILTER]: {ip,nf}_conntrack_netlink: fix expectation notifier unregistration [NETFILTER]: fix ifdef for connmark support in nf_conntrack_netlink [NETFILTER]: x_tables: unify IPv4/IPv6 multiport match [NETFILTER]: x_tables: unify IPv4/IPv6 esp match [NET]: Fix dentry leak in sys_accept(). [IPSEC]: Kill unused decap state structure [IPSEC]: Kill unused decap state argument [NET]: com90xx kmalloc fix [TG3]: Update driver version and reldate. [TG3]: Revert "Speed up SRAM access"
| * | | | [NETFILTER]: iptables 32bit compat layerDmitry Mishin2006-04-012-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends current iptables compatibility layer in order to get 32bit iptables to work on 64bit kernel. Current layer is insufficient due to alignment checks both in kernel and user space tools. Patch is for current net-2.6.17 with addition of move of ipt_entry_{match| target} definitions to xt_entry_{match|target}. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | [NETFILTER]: x_tables: unify IPv4/IPv6 multiport matchYasuyuki Kozakai2006-04-013-39/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This unifies ipt_multiport and ip6t_multiport to xt_multiport. As a result, this addes support for inversion and port range match to IPv6 packets. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | [NETFILTER]: x_tables: unify IPv4/IPv6 esp matchYasuyuki Kozakai2006-04-013-18/+22
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now a user program needs to specify IPPROTO_ESP as protocol to use esp match with IPv6. This means that ip6tables requires '-p esp' like iptables. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* / | | splice: add SPLICE_F_NONBLOCK flagLinus Torvalds2006-04-021-0/+3
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | It doesn't make the splice itself necessarily nonblocking (because the actual file descriptors that are spliced from/to may block unless they have the O_NONBLOCK flag set), but it makes the splice pipe operations nonblocking. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2006-03-312-54/+30Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Allow skb headroom to be overridden [TCP]: Kill unused extern decl for tcp_v4_hash_connecting() [NET]: add SO_RCVBUF comment [NET]: Deinline some larger functions from netdevice.h [DCCP]: Use NULL for pointers, comfort sparse. [DECNET]: Fix refcount
| * | | [NET]: Allow skb headroom to be overriddenAnton Blanchard2006-03-311-4/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we added NET_IP_ALIGN so an architecture can override the padding done to align headers. The next step is to allow the skb headroom to be overridden. We currently always reserve 16 bytes to grow into, meaning all DMAs start 16 bytes into a cacheline. On ppc64 we really want DMA writes to start on a cacheline boundary, so we increase that headroom to one cacheline. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | [NET]: Deinline some larger functions from netdevice.hDenis Vlasenko2006-03-301-50/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a allyesconfig'ured kernel: Size Uses Wasted Name and definition ===== ==== ====== ================================================ 95 162 12075 netif_wake_queue include/linux/netdevice.h 129 86 9265 dev_kfree_skb_any include/linux/netdevice.h 127 56 5885 netif_device_attach include/linux/netdevice.h 73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h 46 60 1534 netif_device_detach include/linux/netdevice.h 119 16 1485 __netif_rx_schedule include/linux/netdevice.h 143 5 492 netif_rx_schedule include/linux/netdevice.h 81 7 366 netif_schedule include/linux/netdevice.h netif_wake_queue is big because __netif_schedule is a big inline: static inline void __netif_schedule(struct net_device *dev) { if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) { unsigned long flags; struct softnet_data *sd; local_irq_save(flags); sd = &__get_cpu_var(softnet_data); dev->next_sched = sd->output_queue; sd->output_queue = dev; raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_restore(flags); } } static inline void netif_wake_queue(struct net_device *dev) { #ifdef CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) __netif_schedule(dev); } By de-inlining __netif_schedule we are saving a lot of text at each callsite of netif_wake_queue and netif_schedule. __netif_rx_schedule is also big, and it makes more sense to keep both of them out of line. Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq instead... oh well. netif_device_attach/detach are not hot paths, we can deinline them too. Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | [PATCH] fs/namei.c: make lookup_hash() staticAdrian Bunk2006-03-311-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As announced, lookup_hash() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] fbcon: Fix big-endian bogosity in slow_imageblit()Antonino A. Daplas2006-03-311-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The monochrome->color expansion routine that handles bitmaps which have (widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian. This is caused by a bogus bit test in slow_imageblit(). Fix. This patch may deserve to go to the stable tree. The code has already been well tested in little-endian machines. It's only in big-endian where there is uncertainty and Herbert confirmed that this is the correct way to go. It should not introduce regressions. Signed-off-by: Antonino Daplas <adaplas@pol.net> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] backlight: Backlight Class ImprovementsRichard Purdie2006-03-311-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backlight class attributes are currently easy to implement incorrectly. Moving certain handling into the backlight core prevents this whilst at the same time makes the drivers simpler and consistent. The following changes are included: The brightness attribute only sets and reads the brightness variable in the backlight_properties structure. The power attribute only sets and reads the power variable in the backlight_properties structure. Any framebuffer blanking events change a variable fb_blank in the backlight_properties structure. The backlight driver has only two functions to implement. One function is called when any of the above properties change (to update the backlight brightness), the second is called to return the current backlight brightness value. A new attribute "actual_brightness" is added to return this brightness as determined by the driver having combined all the above factors (and any driver/device specific factors). Additionally, the backlight core takes care of checking the maximum brightness is not exceeded and of turning off the backlight before device removal. The corgi backlight driver is updated to reflect these changes. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] dcache: Add helper d_hash_and_lookupEric W. Biederman2006-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is very common to hash a dentry and then to call lookup. If we take fs specific hash functions into account the full hash logic can get ugly. Further full_name_hash as an inline function is almost 100 bytes on x86 so having a non-inline choice in some cases can measurably decrease code size. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] pidhash: Refactor the pid hash tableEric W. Biederman2006-03-312-17/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplifies the code, reduces the need for 4 pid hash tables, and makes the code more capable. In the discussions I had with Oleg it was felt that to a large extent the cleanup itself justified the work. With struct pid being dynamically allocated meant we could create the hash table entry when the pid was allocated and free the hash table entry when the pid was freed. Instead of playing with the hash lists when ever a process would attach or detach to a process. For myself the fact that it gave what my previous task_ref patch gave for free with simpler code was a big win. The problem is that if you hold a reference to struct task_struct you lock in 10K of low memory. If you do that in a user controllable way like /proc does, with an unprivileged but hostile user space application with typical resource limits of 1000 fds and 100 processes I can trigger the OOM killer by consuming all of low memory with task structs, on a machine wight 1GB of low memory. If I instead hold a reference to struct pid which holds a pointer to my task_struct, I don't suffer from that problem because struct pid is 2 orders of magnitude smaller. In fact struct pid is small enough that most other kernel data structures dwarf it, so simply limiting the number of referring data structures is enough to prevent exhaustion of low memory. This splits the current struct pid into two structures, struct pid and struct pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one. struct pid_link is the per process linkage into the hash tables and lives in struct task_struct. struct pid is given an indepedent lifetime, and holds pointers to each of the pid types. The independent life of struct pid simplifies attach_pid, and detach_pid, because we are always manipulating the list of pids and not the hash table. In addition in giving struct pid an indpendent life it makes the concept much more powerful. Kernel data structures can now embed a struct pid * instead of a pid_t and not suffer from pid wrap around problems or from keeping unnecessarily large amounts of memory allocated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] task: RCU protect task->usageEric W. Biederman2006-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A big problem with rcu protected data structures that are also reference counted is that you must jump through several hoops to increase the reference count. I think someone finally implemented atomic_inc_not_zero(&count) to automate the common case. Unfortunately this means you must special case the rcu access case. When data structures are only visible via rcu in a manner that is not determined by the reference count on the object (i.e. tasks are visible until their zombies are reaped) there is a much simpler technique we can employ. Simply delaying the decrement of the reference count until the rcu interval is over. What that means is that the proc code that looks up a task and later wants to sleep can now do: rcu_read_lock(); task = find_task_by_pid(some_pid); if (task) { get_task_struct(task); } rcu_read_unlock(); The effect on the rest of the kernel is that put_task_struct becomes cheaper and immediate, and in the case where the task has been reaped it frees the task immediate instead of unnecessarily waiting an until the rcu interval is over. Cleanup of task_struct does not happen when its reference count drops to zero, instead cleanup happens when release_task is called. Tasks can only be looked up via rcu before release_task is called. All rcu protected members of task_struct are freed by release_task. Therefore we can move call_rcu from put_task_struct into release_task. And we can modify release_task to not immediately release the reference count but instead have it call put_task_struct from the function it gives to call_rcu. The end result: - get_task_struct is safe in an rcu context where we have just looked up the task. - put_task_struct() simplifies into its old pre rcu self. This reorganization also makes put_task_struct uncallable from modules as it is not exported but it does not appear to be called from any modules so this should not be an issue, and is trivially fixed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] resurrect __put_task_structAndrew Morton2006-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This just got nuked in mainline. Bring it back because Eric's patches use it. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] sched: activate SCHED BATCH expiredCon Kolivas2006-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To increase the strength of SCHED_BATCH as a scheduling hint we can activate batch tasks on the expired array since by definition they are latency insensitive tasks. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] sched: cleanup task_activated()Con Kolivas2006-03-311-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The activated flag in task_struct is used to track different sleep types and its usage is somewhat obfuscated. Convert the variable to an enum with more descriptive names without altering the function. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] sched: reduce overhead of calc_loadJack Steiner2006-03-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, count_active_tasks() calls both nr_running() & nr_interruptible(). Each of these functions does a "for_each_cpu" & reads values from the runqueue of each cpu. Although this is not a lot of instructions, each runqueue may be located on different node. Depending on the architecture, a unique TLB entry may be required to access each runqueue. Since there may be more runqueues than cpu TLB entries, a scan of all runqueues can trash the TLB. Each memory reference incurs a TLB miss & refill. In addition, the runqueue cacheline that contains nr_running & nr_uninterruptible may be evicted from the cache between the two passes. This causes unnecessary cache misses. Combining nr_running() & nr_interruptible() into a single function substantially reduces the TLB & cache misses on large systems. This should have no measureable effect on smaller systems. On a 128p IA64 system running a memory stress workload, the new function reduced the overhead of calc_load() from 605 usec/call to 324 usec/call. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] hrtimer: create generic sleeperThomas Gleixner2006-03-311-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The removal of the data field in the hrtimer structure enforces the embedding of the timer into another data structure. nanosleep now uses a private implementation of the most common used timer callback function (simple task wakeup). In order to avoid the reimplentation of such functionality all over the place a generic hrtimer_sleeper functionality is created. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>