openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message (Collapse)	Author	Age	Files	Lines
*	i2c-designware: Divide i2c_dw_xfer_msg into two functions	Shinya Kuribayashi	2009-12-09	1	-20/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have some steps at the top of i2c_dw_xfer_msg() to set up a slave address and enable DW I2C core. And it's executed only when we don't have STATUS_WRITE_IN_PROGRESS. But we need to make sure that STATUS_WRITE_IN_PROGRESS only indicates that we have a pending i2c_msg to process. In other words, even if STATUS_WRITE_IN_PROGRESS is not set, that doesn't mean we're at initial state in the I2C transaction. Since i2c_dw_xfer_msg() will be invoked again and again during a transaction, those init steps have a possibility to be re-processed needlessly. For example, this issue easily takes place when processing a combined transaction with a certain condition (the number of tx bytes in the first i2c_msg, equals to the Tx FIFO depth). Consequently we should not use STATUS_WRITE_IN_PROGRESS to determine where we're at in an I2C transaction. It would be better to separate those initialization steps from i2c_dw_xfer_msg(). Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Enable RX_FULL interrupt	Shinya Kuribayashi	2009-12-09	1	-3/+4
\| \| \| \| \| \| \| \| \| \|	Enable RX_FULL interrupt mask by default, and hook it in the interrupt handler. If requested amount of rx data (defined by IC_RX_TL) is not available, we don't have to process i2c_dw_read(). Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Set Tx/Rx FIFO threshold levels	Shinya Kuribayashi	2009-12-09	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a hardware feature, DW I2C core generates a STOP condition whenever the Tx FIFO becomes empty (strictly speaking, whenever the last byte in the Tx FIFO is sent out), even if we have more bytes to be written. In other words, we must never make "Tx FIFO underrun" happen during a transaction, except for the last byte. For the safety's sake, we'd make TX_EMPTY interrupt get triggered every time one byte is processed. The Rx FIFO threshold needs to be set as well. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Process i2c_msg messages in the interrupt handler	Shinya Kuribayashi	2009-12-09	1	-25/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Symptom: -------- When we're going to send/receive the longer size of data than the Tx FIFO length, the I2C transaction will be divided into several separated transactions, limited by the Tx FIFO length. Details: -------- As a hardware feature, DW I2C core generates a STOP condition whenever the Tx FIFO becomes empty (strictly speaking, whenever the last byte in the Tx FIFO is sent out), even if we have more bytes to be written. Then, once a new transmit data is written to the Tx FIFO, DW I2C core will initiate a new transaction, which leads to another START condition. This explains how the transaction in question goes, and implies that current tasklet-based dw_i2c_pump_msg() strategy couldn't meet the timing constraint required for avoiding Tx FIFO underrun. To avoid this scenario, we must keep providing new transmit data within a given time period. In case of Fast-mode + 32-byte Tx FIFO, for instance, it takes about 22.5[us] to process single byte, and 720[us] in total. This patch removes the existing tasklet-based "pump" system, and move its jobs into the interrupt handler. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: i2c_dw_xfer_msg: Fix i2c_msg search bug	Shinya Kuribayashi	2009-12-09	1	-8/+12
\| \| \| \| \| \| \| \| \| \|	In case a work-in-progress i2c_msg has more bytes to be written, we need to set STATUS_WRITE_IN_PROGRESS and exit from the msg_write_idx- searching loop. Otherwise, we will overtake the current msg_write_idx without waiting for its transmission to be processed. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Improved _HCNT/_LCNT calculation	Shinya Kuribayashi	2009-12-09	1	-9/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Calculate with accurate conditional expressions from DW manuals. * Round ic_clk by adding 0.5 as it's important at high ic_clk rate. * Take into account "tHD;STA" issue for _HCNT calculation. * Take into account "tf" for _LCNT calculation. * Add "cond" and "offset" fot further correction requirements. For _HCNT calculation, there's one issue needs to be carefully considered; DesignWare I2C core doesn't seem to have solid strategy to meet the tHD;STA timing spec. If you configure _HCNT based on the tHIGH timing spec, it easily results in violation of the tHD;STA spec. After many trials, we came to the conclusion that the tHD;STA period is proportional to (_HCNT + 3). For the safety's sake, this should be selected by default. As for _LCNT calculation, DW I2C core has one characteristic behavior; he starts counting the SCL CNTs for the LOW period of the SCL clock (tLOW) as soon as it pulls the SCL line. At that time, he doesn't take into account the fall time of SCL signal (tf), IOW, he starts counting CNTs without confirming the SCL input voltage has dropped to below VIL. This characteristics becomes a problem on some platforms where tf is considerably long, and results in violation of the tLOW timing spec. To make the driver configurable as much as possible for various cases, we'd have separated arguments "tf" and "offset", and for safety default values should be 0.3 us and 0, respectively. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Remove an useless local variable "num"	Shinya Kuribayashi	2009-12-09	1	-4/+2
\| \| \| \| \| \| \| \| \|	We couldn't know the original intent for this variable, but at this point it's useless. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: i2c_dw_xfer_msg: Use "struct dw_i2c_dev" pointer	Shinya Kuribayashi	2009-12-09	1	-4/+3
\| \| \| \| \| \| \| \| \|	We don't have to use "struct i2c_adapter" pointer here. Let's use a local "struct dw_i2c_dev" pointer, instead. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: i2c_dw_read: Use "struct dw_i2c_dev" pointer	Shinya Kuribayashi	2009-12-09	1	-4/+3
\| \| \| \| \| \| \| \| \|	We don't have to use "struct i2c_adapter" pointer here. Let's use a local "struct dw_i2c_dev" pointer, instead. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Use platform_get_irq helper	Shinya Kuribayashi	2009-12-09	1	-6/+6
\| \| \| \| \| \|	Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Don't use the IC_CLR_INTR register to clear interrupts	Shinya Kuribayashi	2009-12-09	1	-5/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're strongly discouraged from using the IC_CLR_INTR register because it clears all software-clearable interrupts asserted at the moment. stat = readl(IC_INTR_STAT); : : <=== Interrupts asserted during this period will be lost : readl(IC_CLR_INTR); Instead, use the separately-prepared IC_CLR_* registers. At the same time, this patch adds all remaining interrupt definitions available in the DesignWare I2C hardware. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	i2c-designware: Consolidate to use 32-bit word accesses	Shinya Kuribayashi	2009-12-09	1	-38/+38
\| \| \| \| \| \| \| \| \| \| \| \|	This driver looks originally meant for armel machines where readw()/ writew() works perfectly fine with this hardware. But that doens't work for big-endian systems. This patch converts all 8/16-bit-aware usages to 32-bit variants. Signed-off-by: Shinya Kuribayashi <shinya.kuribayashi@necel.com> Acked-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
*	Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2009-12-08	107	-2132/+24813
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits) cfq-iosched: Do not access cfqq after freeing it block: include linux/err.h to use ERR_PTR cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit blkio: Allow CFQ group IO scheduling even when CFQ is a module blkio: Implement dynamic io controlling policy registration blkio: Export some symbols from blkio as its user CFQ can be a module block: Fix io_context leak after failure of clone with CLONE_IO block: Fix io_context leak after clone with CLONE_IO cfq-iosched: make nonrot check logic consistent io controller: quick fix for blk-cgroup and modular CFQ cfq-iosched: move IO controller declerations to a header file cfq-iosched: fix compile problem with !CONFIG_CGROUP blkio: Documentation blkio: Wait on sync-noidle queue even if rq_noidle = 1 blkio: Implement group_isolation tunable blkio: Determine async workload length based on total number of queues blkio: Wait for cfq queue to get backlogged if group is empty blkio: Propagate cgroup weight updation to cfq groups blkio: Drop the reference to queue once the task changes cgroup blkio: Provide some isolation between groups ...
\| *	cfq-iosched: Do not access cfqq after freeing it	Vivek Goyal	2009-12-07	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a crash during boot reported by Jeff Moyer. Fix the issue of accessing cfqq after freeing it. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@carl.(none)>
\| *	block: include linux/err.h to use ERR_PTR	Stephen Rothwell	2009-12-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit	Jens Axboe	2009-12-06	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After the merge of the IO controller patches, booting on my megaraid box ran much slower. Vivek Goyal traced it down to megaraid discovery creating tons of devices, each suffering a grace period when they later kill that queue (if no device is found). So lets use call_rcu() to batch these deferred frees, instead of taking the grace period hit for each one. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Allow CFQ group IO scheduling even when CFQ is a module	Vivek Goyal	2009-12-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Now issues of blkio controller and CFQ in module mode should be fixed. Enable the cfq group scheduling support in module mode. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Implement dynamic io controlling policy registration	Vivek Goyal	2009-12-04	4	-12/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o One of the goals of block IO controller is that it should be able to support mulitple io control policies, some of which be operational at higher level in storage hierarchy. o To begin with, we had one io controlling policy implemented by CFQ, and I hard coded the CFQ functions called by blkio. This created issues when CFQ is compiled as module. o This patch implements a basic dynamic io controlling policy registration functionality in blkio. This is similar to elevator functionality where ioschedulers register the functions dynamically. o Now in future, when more IO controlling policies are implemented, these can dynakically register with block IO controller. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Export some symbols from blkio as its user CFQ can be a module	Vivek Goyal	2009-12-04	3	-2/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o blkio controller is inside the kernel and cfq makes use of interfaces exported by blkio. CFQ can be a module too, hence export symbols used by CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	block: Fix io_context leak after failure of clone with CLONE_IO	Louis Rilling	2009-12-04	4	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With CLONE_IO, parent's io_context->nr_tasks is incremented, but never decremented whenever copy_process() fails afterwards, which prevents exit_io_context() from calling IO schedulers exit functions. Give a task_struct to exit_io_context(), and call exit_io_context() instead of put_io_context() in copy_process() cleanup path. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	block: Fix io_context leak after clone with CLONE_IO	Louis Rilling	2009-12-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks. However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks reaches 0. Always call put_io_context() in exit_io_context(). Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	cfq-iosched: make nonrot check logic consistent	Shaohua Li	2009-12-04	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cfq_arm_slice_timer() has logic to disable idle window for SSD device. The same thing should be done at cfq_select_queue() too, otherwise we will still see idle window. This makes the nonrot check logic consistent in cfq. Tests in a intel SSD with low_latency knob close, below patch can triple disk thoughput for muti-thread sequential read. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	io controller: quick fix for blk-cgroup and modular CFQ	Jens Axboe	2009-12-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	It's currently not an allowed configuration, so express that in Kconfig. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	cfq-iosched: move IO controller declerations to a header file	Jens Axboe	2009-12-04	3	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	They should not be declared inside some other file that's not related to CFQ. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	cfq-iosched: fix compile problem with !CONFIG_CGROUP	Jens Axboe	2009-12-03	1	-0/+10
\| \| \| \| \| \| \| \|	Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Documentation	Vivek Goyal	2009-12-03	1	-0/+135
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Wait on sync-noidle queue even if rq_noidle = 1	Vivek Goyal	2009-12-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o rq_noidle() is supposed to tell cfq that do not expect a request after this one, hence don't idle. But this does not seem to work very well. For example for direct random readers, rq_noidle = 1 but there is next request coming after this. Not idling, leads to a group not getting its share even if group_isolation=1. o The right solution for this issue is to scan the higher layers and set right flag (WRITE_SYNC or WRITE_ODIRECT). For the time being, this single line fix helps. This should not have any significant impact when we are not using cgroups. I will later figure out IO paths in higher layer and fix it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Implement group_isolation tunable	Vivek Goyal	2009-12-03	1	-1/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o If a group is running only a random reader, then it will not have enough traffic to keep disk busy and we will reduce overall throughput. This should result in better latencies for random reader though. If we don't idle on random reader service tree, then this random reader will experience large latencies if there are other groups present in system with sequential readers running in these. o One solution suggested by corrado is that by default keep the random readers or sync-noidle workload in root group so that during one dispatch round we idle only once on sync-noidle tree. This means that all the sync-idle workload queues will be in their respective group and we will see service differentiation in those but not on sync-noidle workload. o Provide a tunable group_isolation. If set, this will make sure that even sync-noidle queues go in their respective group and we wait on these. This provides stronger isolation between groups but at the expense of throughput if group does not have enough traffic to keep the disk busy. o By default group_isolation = 0 Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Determine async workload length based on total number of queues	Vivek Goyal	2009-12-03	1	-5/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Async queues are not per group. Instead these are system wide and maintained in root group. Hence their workload slice length should be calculated based on total number of queues in the system and not just queues in the root group. o As root group's default weight is 1000, make sure to charge async queue more in terms of vtime so that it does not get more time on disk because root group has higher weight. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Wait for cfq queue to get backlogged if group is empty	Vivek Goyal	2009-12-03	1	-5/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o If a queue consumes its slice and then gets deleted from service tree, its associated group will also get deleted from service tree if this was the only queue in the group. That will make group loose its share. o For the queues on which we have idling on and if these have used their slice, wait a bit for these queues to get backlogged again and then expire these queues so that group does not loose its share. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Propagate cgroup weight updation to cfq groups	Vivek Goyal	2009-12-03	2	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Propagate blkio cgroup weight updation to associated cfq groups. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Drop the reference to queue once the task changes cgroup	Vivek Goyal	2009-12-03	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o If a task changes cgroup, drop reference to the cfqq associated with io context and set cfqq pointer stored in ioc to NULL so that upon next request arrival we will allocate a new queue in new group. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Provide some isolation between groups	Vivek Goyal	2009-12-03	1	-10/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Do not allow following three operations across groups for isolation. - selection of co-operating queues - preemtpions across groups - request merging across groups. o Async queues are currently global and not per group. Allow preemption of an async queue if a sync queue in other group gets backlogged. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Export disk time and sectors used by a group to user space	Vivek Goyal	2009-12-03	3	-6/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Export disk time and sector used by a group to user space through cgroup interface. o Also export a "dequeue" interface to cgroup which keeps track of how many a times a group was deleted from service tree. Helps in debugging. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Some debugging aids for CFQ	Vivek Goyal	2009-12-03	5	-1/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Some debugging aids for CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Take care of cgroup deletion and cfq group reference counting	Vivek Goyal	2009-12-03	3	-2/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o One can choose to change elevator or delete a cgroup. Implement group reference counting so that both elevator exit and cgroup deletion can take place gracefully. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Nauman Rafique <nauman@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Dynamic cfq group creation based on cgroup tasks belongs to	Vivek Goyal	2009-12-03	1	-11/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Determine the cgroup IO submitting task belongs to and create the cfq group if it does not exist already. o Also link cfqq and associated cfq group. o Currently all async IO is mapped to root group. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Group time used accounting and workload context save restore	Vivek Goyal	2009-12-03	1	-0/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o This patch introduces the functionality to do the accounting of group time when a queue expires. This time used decides which is the group to go next. o Also introduce the functionlity to save and restore the workload type context with-in group. It might happen that once we expire the cfq queue and group, a different group will schedule in and we will lose the context of the workload type. Hence save and restore it upon queue expiry. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Implement per cfq group latency target and busy queue avg	Vivek Goyal	2009-12-03	1	-20/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o So far we had 300ms soft target latency system wide. Now with the introduction of cfq groups, divide that latency by number of groups so that one can come up with group target latency which will be helpful in determining the workload slice with-in group and also the dynamic slice length of the cfq queue. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Introduce per cfq group weights and vdisktime calculations	Vivek Goyal	2009-12-03	2	-2/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Bring in the per cfq group weight and how vdisktime is calculated for the group. Also bring in the functionality of updating the min_vdisktime of the group service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Introduce blkio controller cgroup interface	Vivek Goyal	2009-12-03	7	-0/+260
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o This is basic implementation of blkio controller cgroup interface. This is the common interface visible to user space and should be used by different IO control policies as we implement those. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Introduce the root service tree for cfq groups	Vivek Goyal	2009-12-03	1	-3/+134
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o So far we just had one cfq_group in cfq_data. To create space for more than one cfq_group, we need to have a service tree of groups where all the groups can be queued if they have active cfq queues backlogged in these. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Keep queue on service tree until we expire it	Vivek Goyal	2009-12-03	1	-21/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Currently cfqq deletes a queue from service tree if it is empty (even if we might idle on the queue). This patch keeps the queue on service tree hence associated group remains on the service tree until we decide that we are not going to idle on the queue and expire it. o This just helps in time accounting for queue/group and in implementation of rest of the patches. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Implement macro to traverse each service tree in group	Vivek Goyal	2009-12-03	1	-16/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Implement a macro to traverse each service tree in the group. This avoids usage of double for loop and special condition for idle tree 4 times. o Macro is little twisted because of special handling of idle class service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Introduce the notion of cfq groups	Vivek Goyal	2009-12-03	1	-33/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o This patch introduce the notion of cfq groups. Soon we will can have multiple groups of different weights in the system. o Various service trees (prioclass and workload type trees), will become per cfq group. So hierarchy looks as follows. cfq_groups \| workload type \| cfq queue o When an scheduling decision has to be taken, first we select the cfq group then workload with-in the group and then cfq queue with-in the workload type. o This patch just makes various workload service tree per cfq group and introduce the function to be able to choose a group for scheduling. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	blkio: Set must_dispatch only if we decided to not dispatch the request	Vivek Goyal	2009-12-03	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o must_dispatch flag should be set only if we decided not to run the queue and dispatch the request. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	drbd_req.c: use part_[inc\|dec]_in_flight()	Philipp Reisner	2009-12-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
\| *	writeback: remove unused nonblocking and congestion checks	Wu Fengguang	2009-12-03	4	-39/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- no one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more - lumpy pageout will want to do nonblocking writeback without the congestion wait So remove the congestion checks as suggested by Chris. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Alex Elder <aelder@sgi.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	writeback: introduce wbc.for_background	Wu Fengguang	2009-12-03	3	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It will lower the flush priority for NFS, and maybe more in future. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| *	writeback: remove the always false bdi_cap_writeback_dirty() test	Wu Fengguang	2009-12-03	1	-18/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is dead code because no bdi flush thread will be started for !bdi_cap_writeback_dirty bdi. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>