<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bwlp/qemu.git/target/ppc/translate, branch master</title>
<subtitle>Experimental fork of QEMU with video encoding patches</subtitle>
<id>https://git.openslx.org/bwlp/qemu.git/atom/target/ppc/translate?h=master</id>
<link rel='self' href='https://git.openslx.org/bwlp/qemu.git/atom/target/ppc/translate?h=master'/>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/'/>
<updated>2022-10-28T16:15:22+00:00</updated>
<entry>
<title>target/ppc: Use gvec to decode XVTSTDC[DS]P</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=bbd8dd5e45b831ef3fda585cf80d08f45cdaba95'/>
<id>urn:sha1:bbd8dd5e45b831ef3fda585cf80d08f45cdaba95</id>
<content type='text'>
Used gvec to translate XVTSTDCSP and XVTSTDCDP.

xvtstdcsp:
rept    loop    imm     master version  prev version        current version
25      4000    0       0,206200        0,040730 (-80.2%)    0,040740 (-80.2%)
25      4000    1       0,205120        0,053650 (-73.8%)    0,053510 (-73.9%)
25      4000    3       0,206160        0,058630 (-71.6%)    0,058570 (-71.6%)
25      4000    51      0,217110        0,191490 (-11.8%)    0,192320 (-11.4%)
25      4000    127     0,206160        0,191490 (-7.1%)     0,192640 (-6.6%)
8000    12      0       1,234719        0,418833 (-66.1%)    0,386365 (-68.7%)
8000    12      1       1,232417        1,435979 (+16.5%)    1,462792 (+18.7%)
8000    12      3       1,232760        1,766073 (+43.3%)    1,743990 (+41.5%)
8000    12      51      1,239281        1,319562 (+6.5%)     1,423479 (+14.9%)
8000    12      127     1,231708        1,315760 (+6.8%)     1,426667 (+15.8%)

xvtstdcdp:
rept    loop    imm     master version  prev version    current version
25      4000    0       0,159930        0,040830 (-74.5%)    0,040610 (-74.6%)
25      4000    1       0,160640        0,053670 (-66.6%)    0,053480 (-66.7%)
25      4000    3       0,160020        0,063030 (-60.6%)    0,062960 (-60.7%)
25      4000    51      0,160410        0,128620 (-19.8%)    0,127470 (-20.5%)
25      4000    127     0,160330        0,127670 (-20.4%)    0,128690 (-19.7%)
8000    12      0       1,190365        0,422146 (-64.5%)    0,388417 (-67.4%)
8000    12      1       1,191292        1,445312 (+21.3%)    1,428698 (+19.9%)
8000    12      3       1,188687        1,980656 (+66.6%)    1,975354 (+66.2%)
8000    12      51      1,191250        1,264500 (+6.1%)     1,355083 (+13.8%)
8000    12      127     1,197313        1,266729 (+5.8%)     1,349156 (+12.7%)

Overall, these instructions are the hardest ones to measure performance
as the gvec implementation is affected by the immediate. Above there are
5 different scenarios when it comes to immediate and 2 when it comes to
rept/loop combination. The immediates scenarios are: all bits are 0
therefore the target register should just be changed to 0, with 1 bit
set, with 2 bits set in a combination the new implementation can deal
with using gvec, 4 bits set and the new implementation can't deal with
it using gvec and all bits set. The rept/loop scenarios are high loop
and low rept (so it should spend more time executing it than translating
it) and high rept low loop (so it should spend more time translating it
than executing this code).
These comparisons are between the upstream version, a previous similar
implementation and a one with a cleaner code(this one).
For a comparison with o previous different implementation:
&lt;20221010191356.83659-13-lucas.araujo@eldorado.org.br&gt;

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-13-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Moved XSTSTDC[QDS]P to decodetree</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=da3c53bac3d8f3040b35ef3d98a64b592bb28a66'/>
<id>urn:sha1:da3c53bac3d8f3040b35ef3d98a64b592bb28a66</id>
<content type='text'>
Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of
its decoding away from the helper as previously the DCMX, XB and BF were
calculated in the helper with the help of cpu_env, now that part was
moved to the decodetree with the rest.

xvtstdcsp:
rept    loop    master             patch
8       12500   1,85393600         1,94683600 (+5.0%)
25      4000    1,78779800         1,92479000 (+7.7%)
100     1000    2,12775000         2,28895500 (+7.6%)
500     200     2,99655300         3,23102900 (+7.8%)
2500    40      6,89082200         7,44827500 (+8.1%)
8000    12     17,50585500        18,95152100 (+8.3%)

xvtstdcdp:
rept    loop    master             patch
8       12500   1,39043100         1,33539800 (-4.0%)
25      4000    1,35731800         1,37347800 (+1.2%)
100     1000    1,51514800         1,56053000 (+3.0%)
500     200     2,21014400         2,47906000 (+12.2%)
2500    40      5,39488200         6,68766700 (+24.0%)
8000    12     13,98623900        18,17661900 (+30.0%)

xvtstdcdp:
rept    loop    master             patch
8       12500   1,35123800         1,34455800 (-0.5%)
25      4000    1,36441200         1,36759600 (+0.2%)
100     1000    1,49763500         1,54138400 (+2.9%)
500     200     2,19020200         2,46196400 (+12.4%)
2500    40      5,39265700         6,68147900 (+23.9%)
8000    12     14,04163600        18,19669600 (+29.6%)

As some values are now decoded outside the helper and passed to it as an
argument the number of arguments of the helper increased, the number
of TCGop needed to load the arguments increased. I suspect that's why
the slow-down in the tests with a high REPT but low LOOP.

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-12-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Moved XVTSTDC[DS]P to decodetree</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=a70a5247104d3d7a8cf584f20d73f41bef643a19'/>
<id>urn:sha1:a70a5247104d3d7a8cf584f20d73f41bef643a19</id>
<content type='text'>
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper
to be simpler and do all decoding in the decodetree (so XB, XT and DCMX
are all calculated outside the helper).

Obs: The tests in this one are slightly different, these are the sum of
these instructions with all possible immediate and those instructions
are repeated 10 times.

xvtstdcsp:
rept    loop    master             patch
8       12500   2,76402100         2,70699100 (-2.1%)
25      4000    2,64867100         2,67884100 (+1.1%)
100     1000    2,73806300         2,78701000 (+1.8%)
500     200     3,44666500         3,61027600 (+4.7%)
2500    40      5,85790200         6,47475500 (+10.5%)
8000    12     15,22102100        17,46062900 (+14.7%)

xvtstdcdp:
rept    loop    master             patch
8       12500   2,11818000         1,61065300 (-24.0%)
25      4000    2,04573400         1,60132200 (-21.7%)
100     1000    2,13834100         1,69988100 (-20.5%)
500     200     2,73977000         2,48631700 (-9.3%)
2500    40      5,05067000         5,25914100 (+4.1%)
8000    12     14,60507800        15,93704900 (+9.1%)

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-11-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Use gvec to decode XVCPSGN[SD]P</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=95a89d3118590aaed3a56b52e08246cdb51a108e'/>
<id>urn:sha1:95a89d3118590aaed3a56b52e08246cdb51a108e</id>
<content type='text'>
Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate
them.

xvcpsgnsp:
rept    loop    master             patch
8       12500   0,00561400         0,00537900 (-4.2%)
25      4000    0,00562100         0,00400000 (-28.8%)
100     1000    0,00696900         0,00416300 (-40.3%)
500     200     0,02211900         0,00840700 (-62.0%)
2500    40      0,09328600         0,02728300 (-70.8%)
8000    12      0,27295300         0,06867800 (-74.8%)

xvcpsgndp:
rept    loop    master             patch
8       12500   0,00556300         0,00584200 (+5.0%)
25      4000    0,00482700         0,00431700 (-10.6%)
100     1000    0,00585800         0,00464400 (-20.7%)
500     200     0,01565300         0,00839700 (-46.4%)
2500    40      0,05766500         0,02430600 (-57.8%)
8000    12      0,19875300         0,07947100 (-60.0%)

Like the previous instructions there seemed to be a improvement on
translation time.

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-10-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]P</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=a5b36805192ac12f114c45e9c8ccbe4ce9ba1cf9'/>
<id>urn:sha1:a5b36805192ac12f114c45e9c8ccbe4ce9ba1cf9</id>
<content type='text'>
Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to
decodetree and used gvec to translate them.

xvabssp:
rept    loop    master             patch
8       12500   0,00477900         0,00476000 (-0.4%)
25      4000    0,00442800         0,00353300 (-20.2%)
100     1000    0,00478700         0,00366100 (-23.5%)
500     200     0,00973200         0,00649400 (-33.3%)
2500    40      0,03165200         0,02226700 (-29.7%)
8000    12      0,09315900         0,06674900 (-28.3%)

xvabsdp:
rept    loop    master             patch
8       12500   0,00475000         0,00474400 (-0.1%)
25      4000    0,00355600         0,00367500 (+3.3%)
100     1000    0,00444200         0,00366000 (-17.6%)
500     200     0,00942700         0,00732400 (-22.3%)
2500    40      0,02990000         0,02308500 (-22.8%)
8000    12      0,08770300         0,06683800 (-23.8%)

xvnabssp:
rept    loop    master             patch
8       12500   0,00494500         0,00492900 (-0.3%)
25      4000    0,00397700         0,00338600 (-14.9%)
100     1000    0,00421400         0,00353500 (-16.1%)
500     200     0,01048000         0,00707100 (-32.5%)
2500    40      0,03251500         0,02238300 (-31.2%)
8000    12      0,08889100         0,06469800 (-27.2%)

xvnabsdp:
rept    loop    master             patch
8       12500   0,00511000         0,00492700 (-3.6%)
25      4000    0,00398800         0,00381500 (-4.3%)
100     1000    0,00390500         0,00365900 (-6.3%)
500     200     0,00924800         0,00784600 (-15.2%)
2500    40      0,03138900         0,02391600 (-23.8%)
8000    12      0,09654200         0,05684600 (-41.1%)

xvnegsp:
rept    loop    master             patch
8       12500   0,00493900         0,00452800 (-8.3%)
25      4000    0,00369100         0,00366800 (-0.6%)
100     1000    0,00371100         0,00380000 (+2.4%)
500     200     0,00991100         0,00652300 (-34.2%)
2500    40      0,03025800         0,02422300 (-19.9%)
8000    12      0,09251100         0,06457600 (-30.2%)

xvnegdp:
rept    loop    master             patch
8       12500   0,00474900         0,00454400 (-4.3%)
25      4000    0,00353100         0,00325600 (-7.8%)
100     1000    0,00398600         0,00366800 (-8.0%)
500     200     0,01032300         0,00702400 (-32.0%)
2500    40      0,03125000         0,02422400 (-22.5%)
8000    12      0,09475100         0,06173000 (-34.9%)

This one to me seemed the opposite of the previous instructions, as it
looks like there was an improvement in the translation time (itself not
a surprise as operations were done twice before so there was the need to
translate twice as many TCGop)

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-9-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Move VABSDU[BHW] to decodetree and use gvec</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=26c964f85159c78f6ecf132de8f2f0093d3f2e89'/>
<id>urn:sha1:26c964f85159c78f6ecf132de8f2f0093d3f2e89</id>
<content type='text'>
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to
translate them.

vabsdub:
rept    loop    master             patch
8       12500   0,03601600         0,00688500 (-80.9%)
25      4000    0,03651000         0,00532100 (-85.4%)
100     1000    0,03666900         0,00595300 (-83.8%)
500     200     0,04305800         0,01244600 (-71.1%)
2500    40      0,06893300         0,04273700 (-38.0%)
8000    12      0,14633200         0,12660300 (-13.5%)

vabsduh:
rept    loop    master             patch
8       12500   0,02172400         0,00687500 (-68.4%)
25      4000    0,02154100         0,00531500 (-75.3%)
100     1000    0,02235400         0,00596300 (-73.3%)
500     200     0,02827500         0,01245100 (-56.0%)
2500    40      0,05638400         0,04285500 (-24.0%)
8000    12      0,13166000         0,12641400 (-4.0%)

vabsduw:
rept    loop    master             patch
8       12500   0,01646400         0,00688300 (-58.2%)
25      4000    0,01454500         0,00475500 (-67.3%)
100     1000    0,01545800         0,00511800 (-66.9%)
500     200     0,02168200         0,01114300 (-48.6%)
2500    40      0,04571300         0,04138800 (-9.5%)
8000    12      0,12209500         0,12178500 (-0.3%)

Same as VADDCUW and VSUBCUW, overall performance gain but it uses more
TCGop (4 before the patch, 6 after).

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-8-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=c85929b2ddf6bbad737635c9b85213007ec043af'/>
<id>urn:sha1:c85929b2ddf6bbad737635c9b85213007ec043af</id>
<content type='text'>
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW,
to decodetree and use gvec with them. For these one the right shift
had to be made before the sum as to avoid an overflow, so add 1 at the
end if any of the entries had 1 in its LSB as to replicate the "+ 1"
before the shift described by the ISA.

vavgub:
rept    loop    master             patch
8       12500   0,02616600         0,00754200 (-71.2%)
25      4000    0,02530000         0,00637700 (-74.8%)
100     1000    0,02604600         0,00790100 (-69.7%)
500     200     0,03189300         0,01838400 (-42.4%)
2500    40      0,06006900         0,06851000 (+14.1%)
8000    12      0,13941000         0,20548500 (+47.4%)

vavguh:
rept    loop    master             patch
8       12500   0,01818200         0,00780600 (-57.1%)
25      4000    0,01789300         0,00641600 (-64.1%)
100     1000    0,01899100         0,00787200 (-58.5%)
500     200     0,02527200         0,01828400 (-27.7%)
2500    40      0,05361800         0,06773000 (+26.3%)
8000    12      0,12886600         0,20291400 (+57.5%)

vavguw:
rept    loop    master             patch
8       12500   0,01423100         0,00776600 (-45.4%)
25      4000    0,01780800         0,00638600 (-64.1%)
100     1000    0,02085500         0,00787000 (-62.3%)
500     200     0,02737100         0,01828800 (-33.2%)
2500    40      0,05572600         0,06774200 (+21.6%)
8000    12      0,13101700         0,20311600 (+55.0%)

vavgsb:
rept    loop    master             patch
8       12500   0,03006000         0,00788600 (-73.8%)
25      4000    0,02882200         0,00637800 (-77.9%)
100     1000    0,02958000         0,00791400 (-73.2%)
500     200     0,03548800         0,01860400 (-47.6%)
2500    40      0,06360000         0,06850800 (+7.7%)
8000    12      0,13816500         0,20550300 (+48.7%)

vavgsh:
rept    loop    master             patch
8       12500   0,01965900         0,00776600 (-60.5%)
25      4000    0,01875400         0,00638700 (-65.9%)
100     1000    0,01952200         0,00786900 (-59.7%)
500     200     0,02562000         0,01760300 (-31.3%)
2500    40      0,05384300         0,06742800 (+25.2%)
8000    12      0,13240800         0,20330000 (+53.5%)

vavgsw:
rept    loop    master             patch
8       12500   0,01407700         0,00775600 (-44.9%)
25      4000    0,01762300         0,00640000 (-63.7%)
100     1000    0,02046500         0,00788500 (-61.5%)
500     200     0,02745600         0,01843000 (-32.9%)
2500    40      0,05375500         0,06820500 (+26.9%)
8000    12      0,13068300         0,20304900 (+55.4%)

These results to me seems to indicate that with gvec the results have a
slower translation but faster execution.

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-7-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=d57fbd8fd9ea4f559cf4ed6c8fb6f064b73d6882'/>
<id>urn:sha1:d57fbd8fd9ea4f559cf4ed6c8fb6f064b73d6882</id>
<content type='text'>
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to
decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8,
respectively.

vprtybw:
rept    loop    master             patch
8       12500   0,01198900         0,00703100 (-41.4%)
25      4000    0,01070100         0,00571400 (-46.6%)
100     1000    0,01123300         0,00678200 (-39.6%)
500     200     0,01601500         0,01535600 (-4.1%)
2500    40      0,03872900         0,05562100 (43.6%)
8000    12      0,10047000         0,16643000 (65.7%)

vprtybd:
rept    loop    master             patch
8       12500   0,00757700         0,00788100 (4.0%)
25      4000    0,00652500         0,00669600 (2.6%)
100     1000    0,00714400         0,00825400 (15.5%)
500     200     0,01211000         0,01903700 (57.2%)
2500    40      0,03483800         0,07021200 (101.5%)
8000    12      0,09591800         0,21036200 (119.3%)

vprtybq:
rept    loop    master             patch
8       12500   0,00675600         0,00667200 (-1.2%)
25      4000    0,00619400         0,00643200 (3.8%)
100     1000    0,00707100         0,00751100 (6.2%)
500     200     0,01199300         0,01342000 (11.9%)
2500    40      0,03490900         0,04092900 (17.2%)
8000    12      0,09588200         0,11465100 (19.6%)

I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ,
I'm not sure if it's worth to move those instructions. Comparing the
assembly of the helper with the TCGop they are pretty similar, so
I'm not sure why vprtybd took so much more time.

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-6-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Move VNEG[WD] to decodtree and use gvec</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=90b5aadb09b1063dfc34636efe212c6862cbe32f'/>
<id>urn:sha1:90b5aadb09b1063dfc34636efe212c6862cbe32f</id>
<content type='text'>
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to
decode it.

vnegw:
rept    loop    master             patch
8       12500   0,01053200         0,00548400 (-47.9%)
25      4000    0,01030500         0,00390000 (-62.2%)
100     1000    0,01096300         0,00395400 (-63.9%)
500     200     0,01472000         0,00712300 (-51.6%)
2500    40      0,03809000         0,02147700 (-43.6%)
8000    12      0,09957100         0,06202100 (-37.7%)

vnegd:
rept    loop    master             patch
8       12500   0,00594600         0,00543800 (-8.5%)
25      4000    0,00575200         0,00396400 (-31.1%)
100     1000    0,00676100         0,00394800 (-41.6%)
500     200     0,01149300         0,00709400 (-38.3%)
2500    40      0,03441500         0,02169600 (-37.0%)
8000    12      0,09516900         0,06337000 (-33.4%)

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-5-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
<entry>
<title>target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec</title>
<updated>2022-10-28T16:15:22+00:00</updated>
<author>
<name>Lucas Mateus Castro (alqotel)</name>
</author>
<published>2022-10-19T12:50:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=611bc69bf6bc6b23e91e4798c036b941c9958b40'/>
<id>urn:sha1:611bc69bf6bc6b23e91e4798c036b941c9958b40</id>
<content type='text'>
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an
implementation based on the helper, with the main difference being
changing the -1 (aka all bits set to 1) result returned by cmp when
true to +1. It also implemented a .fni4 version of those instructions
and dropped the helper.

vaddcuw:
rept    loop    master             patch
8       12500   0,01008200         0,00612400 (-39.3%)
25      4000    0,01091500         0,00471600 (-56.8%)
100     1000    0,01332500         0,00593700 (-55.4%)
500     200     0,01998500         0,01275700 (-36.2%)
2500    40      0,04704300         0,04364300 (-7.2%)
8000    12      0,10748200         0,11241000 (+4.6%)

vsubcuw:
rept    loop    master             patch
8       12500   0,01226200         0,00571600 (-53.4%)
25      4000    0,01493500         0,00462100 (-69.1%)
100     1000    0,01522700         0,00455100 (-70.1%)
500     200     0,02384600         0,01133500 (-52.5%)
2500    40      0,04935200         0,03178100 (-35.6%)
8000    12      0,09039900         0,09440600 (+4.4%)

Overall there was a gain in performance, but the TCGop code was still
slightly bigger in the new version (it went from 4 to 5).

Signed-off-by: Lucas Mateus Castro (alqotel) &lt;lucas.araujo@eldorado.org.br&gt;
Reviewed-by: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Message-Id: &lt;20221019125040.48028-4-lucas.araujo@eldorado.org.br&gt;
Signed-off-by: Daniel Henrique Barboza &lt;danielhb413@gmail.com&gt;
</content>
</entry>
</feed>
