Intel NetBurst is certainly not a successful architecture, even though Intel's goal when it began designing the Pentium 4 was a 10GHz core frequency. The facts proved that goal hard to reach: as engineers raised the frequency, they struggled to keep heat output within a reasonable range. Intel therefore announced that NetBurst products would not go above 4GHz. After all, nobody wants to sit next to a "stove", and nobody wants to listen to the constant hum of a cooling fan. For desktop processors, Intel has found a new direction: rather than raising the clock frequency, it integrates multiple computing cores. But the architecture's inherent flaws are hard to escape. Without changes, the Pentium 4 can hardly hold its own against the competition, and in current processor tests it falls behind AMD in most benchmarks.

The high heat output and high power consumption of the NetBurst architecture not only leave Intel helpless, but also come as a cold shower to consumers who were hoping to buy Intel products. Yet Intel still has another product in its lineup. Although the Pentium III has left the desktop market, it quickly found a new place in the mobile market. Intel's current mobile processors are all based on an architecture similar to the Pentium III's, combined with a more advanced manufacturing process and several other improvements that give them lower heat output and higher overall performance than the Pentium III. The Pentium III has been reborn in the mobile market under a new name: Pentium M.

Although its architecture is derived from the Pentium III, the Pentium M actually uses a quad-pumped front-side bus (QPB), the same bus as the Pentium 4. This is also what makes it possible to run a Pentium M on an ordinary Pentium 4 motherboard through an adapter card; ASUS engineers developed a dedicated adapter card for the Pentium M to do exactly that.

So how does the Pentium M's architecture differ from that of the other processors mentioned above? Intel has never said much about the architecture of this product. Its official documents describe performance and naming in only a few words, for example: an architecture designed for mobile computers, dedicated stack management, micro-operation fusion (micro-ops fusion), and Enhanced Intel SpeedStep Technology (EIST). These descriptions certainly do not explain the Pentium M's internal architecture. Intel seems unwilling to disclose too many details about the Pentium M, and there must be a reason. What is it?

The Pentium M is in fact essentially a lightly revised P6 architecture. Intel developed the P6 architecture long ago; it was first used in the Pentium Pro, and the Pentium II and Pentium III later used it as well. Is the Pentium M, then, just a fallback for the NetBurst architecture that was promoted so heavily? Certainly not; you cannot simply dismiss the Pentium M as an obsolete, outdated architecture. In actual benchmarks, the Pentium M outscores the Pentium 4 in many tests. The P6 is also the most outstanding and most successful architecture Intel has developed, as its long life on the market and the large number of products built on it show. That being so, why shouldn't the Pentium M, built on the same architecture, carry on its success? Let us now look at the concrete improvements the Pentium M makes compared with the Pentium III.

Pipeline and execution core: Like the Pentium III, the Pentium M is a processor with a RISC-style (reduced instruction set computer) internal architecture, but the execution cores of the two processors differ slightly. For example, although both have only 5 execution units, their execution pipelines have different lengths. The Pentium III integer pipeline has 10 stages, while the Pentium M pipeline is longer. It is still far shorter than the Pentium 4's, because the Pentium M must preserve execution efficiency, but Intel's engineers lengthened it to leave room for future frequency increases. Pipeline length determines how far the frequency can be pushed, but it also brings more power consumption and heat, so choosing the pipeline length is especially important for a mobile processor.
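The trade-off between pipeline depth and clock speed can be sketched with a common first-order model: the total logic delay of an instruction is split evenly across the stages, and every stage adds a fixed latch overhead. The 5 ns logic delay and 0.1 ns overhead below are invented round numbers, not Pentium M data, and the relative power figure assumes dynamic power proportional to frequency at fixed voltage and capacitance (P ≈ C·V²·f), ignoring the extra latches a deeper pipeline adds.

```python
def max_frequency_ghz(logic_delay_ns: float, stages: int,
                      latch_overhead_ns: float) -> float:
    """Highest clock rate under a first-order pipeline timing model.

    Cycle time = (total logic delay / number of stages) + per-stage latch overhead.
    """
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # 1 / ns = GHz


def relative_dynamic_power(freq_ghz: float, base_freq_ghz: float) -> float:
    """Dynamic power relative to a baseline, assuming P ~ C * V^2 * f with C, V fixed."""
    return freq_ghz / base_freq_ghz


if __name__ == "__main__":
    # Invented numbers: 5 ns of logic per instruction, 0.1 ns latch overhead per stage.
    base = max_frequency_ghz(5.0, 10, 0.1)
    for stages in (10, 14, 20, 30):
        f = max_frequency_ghz(5.0, stages, 0.1)
        print(f"{stages:2d} stages: {f:.2f} GHz, "
              f"{relative_dynamic_power(f, base):.2f}x dynamic power")
```

In this model, going from 10 to 14 stages buys about 31% more frequency, but each further stage gains less, while heat and power keep rising with the clock.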
Judging from empirical data, this processor's pipeline has roughly 12 to 14 stages, which is to say somewhat longer than the Pentium III's. Besides helping push up the clock frequency, the extra pipeline stages are also needed by the Pentium M's micro-operation fusion technology, which is discussed later in this article.

A longer pipeline also has considerable drawbacks: besides higher power consumption and more heat, it makes every branch misprediction more expensive. For today's superscalar processors with out-of-order execution in particular, the negative impact of branch mispredictions cannot be ignored and has become an important factor in processor performance. During development, the designers tried to minimize the impact of the added pipeline stages, so let us look at how the Pentium M improves its branch prediction unit.

Improved branch prediction and hardware data prefetch: Suppose the pipeline is running at full speed and it turns out the processor has been executing the wrong program branch. The processor must then fetch and execute the correct branch instead; meanwhile some execution units sit idle, execution latency grows, and overall performance suffers. The goal of the branch prediction logic is to make this situation as rare as possible.
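To see why a deeper pipeline makes each misprediction costlier, here is a back-of-the-envelope model. It assumes, as a simplification, that a misprediction costs about as many cycles as there are pipeline stages to refill; the workload figures (20% branches, 5% of them mispredicted, base CPI of 1.0) are made up for illustration, and only the 10-stage and roughly 14-stage depths come from the text.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once misprediction stalls are included.

    Simplifying assumption: every mispredicted branch stalls the pipeline for
    `penalty_cycles`, roughly the number of stages that must be refilled.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def cycles_lost(instructions: int, branch_fraction: float,
                mispredict_rate: float, penalty_cycles: int) -> float:
    """Total cycles spent refilling the pipeline over `instructions`."""
    return instructions * branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted, base CPI 1.0.
    for stages in (10, 14):
        cpi = effective_cpi(1.0, 0.20, 0.05, stages)
        lost = cycles_lost(1_000_000, 0.20, 0.05, stages)
        print(f"{stages}-stage pipeline: CPI {cpi:.2f}, "
              f"{lost:,.0f} cycles lost per million instructions")
```

With these made-up numbers the 14-stage pipeline loses 40% more cycles to mispredictions than the 10-stage one, which is why better prediction accuracy matters so much for a longer pipeline.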
In the Pentium M, the branch prediction logic is one of the main areas of improvement. In fact, the Pentium M's branch predictor looks very much like the Pentium 4's. More precisely, it resembles the branch prediction unit of the Prescott-core Pentium 4. It adds two components: a loop detector and an indirect branch predictor. Because of this, the Pentium M's branch prediction differs clearly from that of pre-Prescott Pentium 4 processors and is more advanced.
Of course, further improving traditional branch prediction, which relies on a branch history table, is extremely difficult. Still, by improving the branch prediction unit in the following areas, Intel's engineers raised the Pentium M's prediction accuracy by a full 20% compared with the Pentium III.

The first improvement is added loop detection logic. With traditional branch prediction, the branch that closes a loop is always mispredicted when the loop exits.
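A minimal sketch of that failure mode, and of the loop-count idea described next, assuming a loop-closing branch that is taken N-1 times and then falls through. `TwoBitCounter` is the textbook 2-bit saturating counter; `LoopPredictor` is a hypothetical, greatly simplified loop detector, since the Pentium M's actual design is not published in this detail.

```python
from typing import List, Optional


def loop_branch_outcomes(trip_count: int, runs: int) -> List[bool]:
    """Outcomes of a loop-closing branch over `runs` executions of a loop.

    True = taken (jump back to the loop top); False = loop exit.
    """
    one_run = [True] * (trip_count - 1) + [False]
    return one_run * runs


class TwoBitCounter:
    """Classic 2-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self) -> None:
        self.state = 2  # start in "weakly taken" (assumption)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class LoopPredictor:
    """Hypothetical loop detector: learns the trip count, then predicts the exit."""

    def __init__(self) -> None:
        self.learned_trip: Optional[int] = None  # iterations per loop execution
        self.count = 0  # iterations seen in the current execution

    def predict(self) -> bool:
        if self.learned_trip is None:
            return True  # untrained: behave like "always taken"
        return self.count + 1 < self.learned_trip

    def update(self, taken: bool) -> None:
        self.count += 1
        if not taken:  # loop exited: remember how many iterations it ran
            self.learned_trip = self.count
            self.count = 0


def mispredictions(predictor, outcomes: List[bool]) -> int:
    """Count wrong predictions while training the predictor on `outcomes`."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    outcomes = loop_branch_outcomes(trip_count=8, runs=10)
    print("2-bit counter misses:", mispredictions(TwoBitCounter(), outcomes))   # 10
    print("loop predictor misses:", mispredictions(LoopPredictor(), outcomes))  # 1
```

The counter misses the exit on every one of the 10 runs, while the loop predictor misses only the first exit, before it has learned the trip count.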
One could enlarge the buffer that stores branch information so that it holds more branch history, and then analyze that data to solve the problem, but such analysis would introduce very long delays. The Pentium M therefore takes a slightly different approach: dedicated loop detection logic records loop termination information separately. This greatly improves the accuracy of predicting the loop exit condition.

The second improvement is better indirect branch prediction. An indirect branch is a branch whose target address is not known when the program is compiled; it is determined at run time by the contents of the relevant register.
Traditional branch prediction uses two tables, a branch history table and a branch target address table. Without a separate table for indirect branch targets, prediction accuracy for indirect branches does not exceed 75%.
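The limitation can be illustrated with a toy trace in which one indirect branch (say, a call through a function pointer) alternates between two targets. `LastTargetBTB` models a conventional target table that remembers only the last target of each branch; `HistoryIndirectTable` is a hypothetical indirect-target table indexed by the branch address plus the previous indirect target. Both are simplifications for illustration, not Intel's design.

```python
from typing import Dict, List, Optional, Tuple

Trace = List[Tuple[int, str]]  # (branch address, actual target)


def alternating_trace(pc: int, targets: List[str], length: int) -> Trace:
    """One indirect branch at address `pc` cycling through `targets`."""
    return [(pc, targets[i % len(targets)]) for i in range(length)]


class LastTargetBTB:
    """Conventional target buffer: predicts the last target seen for a branch."""

    def __init__(self) -> None:
        self.table: Dict[int, str] = {}

    def predict(self, pc: int) -> Optional[str]:
        return self.table.get(pc)

    def update(self, pc: int, target: str) -> None:
        self.table[pc] = target


class HistoryIndirectTable:
    """Hypothetical indirect-target table indexed by (address, previous target)."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[int, Optional[str]], str] = {}
        self.history: Optional[str] = None  # most recent indirect target

    def predict(self, pc: int) -> Optional[str]:
        return self.table.get((pc, self.history))

    def update(self, pc: int, target: str) -> None:
        self.table[(pc, self.history)] = target
        self.history = target


def accuracy(predictor, trace: Trace) -> float:
    """Fraction of branches whose target was predicted correctly."""
    hits = 0
    for pc, target in trace:
        if predictor.predict(pc) == target:
            hits += 1
        predictor.update(pc, target)
    return hits / len(trace)


if __name__ == "__main__":
    trace = alternating_trace(0x40, ["A", "B"], 20)
    print(f"last-target BTB: {accuracy(LastTargetBTB(), trace):.2f}")               # 0.00
    print(f"history-indexed table: {accuracy(HistoryIndirectTable(), trace):.2f}")  # 0.85
```

The last-target table is always one step behind on the alternating pattern, while the history-indexed table needs only three warm-up misses before predicting every target correctly.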
The designers therefore added a new indirect branch table to the Pentium M, dedicated to storing the target addresses of this type of branch.

With these two improvements, prediction accuracy rises considerably, so the pipeline runs at full speed more often than before and the execution units spend less time idle. As a result, at the same frequency the Pentium M's overall performance is about 7% higher than the Pentium III's. Along with the branch prediction unit, the Pentium M also updated its hardware data prefetch logic, which fetches data from memory into the cache before it is needed. The Pentium M uses a hardware prefetch algorithm similar to that of the Prescott-core Pentium 4, which is more efficient than the Pentium III's algorithm.

Micro-operation fusion: Like the Pentium III and the Pentium 4, the Pentium M has a RISC-style core. This means its execution units process simplified internal instructions, which is far more efficient than processing complex x86 instructions directly. In other words, executing RISC-style instructions is usually much smoother than executing x86 instructions, which may involve three or even more operands. After decoding, an x86 instruction is therefore usually broken into two or even three micro-operations (micro-ops). For example, an instruction that stores data to memory, or one that operates on data in memory, is decoded into two micro-ops. In the first case, the two micro-ops compute the address and store the data; in the second, they read the data from memory and then operate on it. Because today's processors can all execute micro-ops out of order, once an x86 instruction has been broken into several micro-ops, they can be dispatched into the pipeline separately.

If these micro-ops are independent of one another, executing them separately naturally poses no problem. But if one instruction needs another's result, the pipeline stalls: it waits for the execution unit to finish and forward the result before it can continue. This kind of stall is not very noticeable in the NetBurst architecture, because it has many execution units, but for a processor like the Pentium M the performance impact is considerable. Moreover, a stalled processor keeps wasting energy, which is unacceptable for a mobile processor. That is why the Pentium M adds micro-ops fusion: it keeps the execution units from sitting idle as much as possible.

The idea is quite simple. The parts of an x86 instruction are divided according to their dependencies: all the micro-ops produced by decoding are collected, and based on the dependencies determined earlier, dependent micro-ops are grouped together as a subset of the x86 instruction and executed by the same execution unit, while micro-ops sent to different execution units are independent of one another. As a result, no unit has to wait for another unit's result. Although micro-ops fusion requires some extra work, it pays off in performance: in tests, it increased integer processing speed by 5% and floating-point processing speed by 9%.

Dedicated stack management: Another improvement in the Pentium M is stack management. Software uses the stack very frequently, especially when calling subroutines.
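As a rough illustration of the stack handling described in this section, the sketch below counts how many stack-pointer updates would reach an integer execution unit with and without front-end tracking of ESP. It assumes 32-bit code (PUSH and CALL subtract 4 from ESP, POP and RET add 4) and a hypothetical rule that the accumulated offset is synchronized with one extra micro-op only when an instruction reads ESP directly; the real Pentium M mechanism is not documented here in that detail.

```python
from typing import List

# ESP adjustment in bytes for each stack instruction (32-bit code assumed).
STACK_DELTA = {"push": -4, "pop": +4, "call": -4, "ret": +4}


def esp_uops_without_engine(program: List[str]) -> int:
    """Every stack instruction sends its own ESP add/sub to an integer unit."""
    return sum(1 for op in program if op in STACK_DELTA)


def esp_uops_with_engine(program: List[str]) -> int:
    """Hypothetical stack engine: ESP deltas are accumulated at decode time.

    Only when an instruction reads ESP directly ("use_esp") and an offset is
    pending does one sync micro-op go to an integer unit to apply it.
    """
    pending = 0
    sync_uops = 0
    for op in program:
        if op in STACK_DELTA:
            pending += STACK_DELTA[op]
        elif op == "use_esp" and pending != 0:
            sync_uops += 1
            pending = 0
    return sync_uops


if __name__ == "__main__":
    # Hypothetical instruction stream; "use_esp" stands for any instruction
    # that reads ESP directly.
    program = ["push", "push", "call", "ret", "pop", "pop", "use_esp",
               "push", "use_esp"]
    print("without stack engine:", esp_uops_without_engine(program))  # 7
    print("with stack engine:", esp_uops_with_engine(program))        # 1
```

Because matched PUSH/POP and CALL/RET pairs cancel out in the pending offset, most stack-pointer arithmetic never reaches an execution unit in this model.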
Having the execution units constantly process stack instructions such as PUSH, POP, CALL, and RET keeps their clocks running, which works against controlling heat and power consumption. In the Pentium M, therefore, dedicated stack management logic works together with the stack pointer register. It recognizes instructions such as PUSH, POP, CALL, and RET after they have been decoded and preprocesses them before they reach the execution units, reducing the load on those units. This improves performance while further controlling heat and power consumption. According to tests, dedicated stack management reduces the number of instructions executed by the integer execution units by 5%.

Processor bus: Although the Pentium M is based on the Pentium III architecture, it uses a completely different bus. The peak bandwidth of the P6 system bus is only about 1GB/s, which is too low by today's standards. Considering that the traditional bus might not suit current applications, Intel's engineers decided to give the Pentium M the Quad Pumped Bus, which is exactly the Pentium 4's bus standard.

In fact, the QPB bus is the only thing the Pentium M and the Pentium 4 have in common. On closer analysis, even the two bus implementations differ slightly, because the Pentium M's QPB lacks some features. The most notable difference is that the Pentium 4 system bus runs at 800MHz, whereas the Pentium M's runs at 533MHz. Next, the Pentium M system bus supports only 32-bit addressing, which means at most 4GB of memory space. Finally, the Pentium M bus does not support multiprocessor configurations. None of these differences matters much, however; what matters is that bus compatibility between the Pentium M and the Pentium 4 laid the foundation for using this mobile processor in desktop computers.

SSE2 instruction set: All Pentium M processors support the SSE and SSE2 instruction set extensions, which is an improvement over the Pentium III. The Pentium M does not support SSE3, however; that instruction set was first introduced with the Prescott core, which launched later than the Pentium M.

L2 cache power saving: The Pentium M has a very large L2 cache of up to 2MB. A large cache has many advantages; for example, it reduces the load on the system bus and the memory bus, which in turn reduces power consumption.
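The claim that a bigger cache means less bus traffic is easy to check with a small simulation. The sketch models an 8-way set-associative cache with LRU replacement and counts misses, each of which would become a bus transaction. The 64-byte line size, the 1MB comparison cache, and the 1.5MB working set swept four times are assumptions chosen for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache with LRU replacement that counts misses."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int = 64) -> None:
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # One OrderedDict per set; key order tracks recency (last = most recent).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)     # hit: mark most recently used
            return
        self.misses += 1                   # miss: line fetched over the bus
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used line
        cache_set[tag] = True


def misses_for_sweeps(cache: SetAssociativeCache, working_set_bytes: int,
                      passes: int) -> int:
    """Read a buffer sequentially `passes` times, one access per cache line."""
    for _ in range(passes):
        for addr in range(0, working_set_bytes, cache.line_bytes):
            cache.access(addr)
    return cache.misses


if __name__ == "__main__":
    working_set = 1536 * 1024  # 1.5MB buffer, swept 4 times (assumption)
    for size_mb in (1, 2):
        cache = SetAssociativeCache(size_mb * 1024 * 1024, ways=8)
        print(f"{size_mb}MB L2: {misses_for_sweeps(cache, working_set, 4)} misses")
```

Under these assumptions the 1MB cache misses on every access (98304 misses), because each set sees 12 lines cycling through 8 ways, while the 2MB cache misses only on the first sweep (24576 misses).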
What is special, though, is that Intel also gave the Pentium M its own power-saving method for this cache. Like other Intel processors, the Pentium M's cache is 8-way set associative, and the L2 cache is further subdivided into 4 parts, each of which can be accessed independently. In other words, when the processor reads from the cache, it does not need to power up the entire cache. This cuts L2 power consumption to roughly a quarter. The trade-off is that L2 latency increases by 1 cycle compared with the Pentium III. In addition, the Pentium M's L1 cache is 64KB, 32KB each for code and data, twice the L1 capacity of the Pentium III.

SpeedStep III power-saving technology: Because the Pentium M is a mobile processor, it naturally has dedicated power-saving technology, namely SpeedStep III. Experience shows that a processor's power consumption is closely tied to its frequency, its workload, and its core voltage. In other words, reducing power consumption has to start from these three factors. The designers therefore reasoned that when the processor's workload is light, lowering its operating frequency and voltage reduces its power consumption. For example, when running office software the processor is not at 100% load, yet this is the most common use of most notebook computers. The processor can therefore automatically lower its frequency and voltage, and the transition is so smooth that the user does not notice it at all. This is the primary task of SpeedStep technology.

The first generation of SpeedStep, in the Pentium III-M, offered only two modes: full-speed mode and power-saving mode. The processor could enter power-saving mode when the battery charge fell below a certain level or when it was idle. The Pentium 4-M uses second-generation SpeedStep, which can switch automatically among three modes. In this generation the performance gap between power-saving mode and full-speed mode is large, and which mode is used depends on the processor workload. Moreover, when a processor running in power-saving mode suddenly faces a heavier workload, or the user starts a large program, it has difficulty raising performance and switching states quickly, which hurts overall CPU performance. In the Pentium M, SpeedStep has reached its third generation and provides 7 different states. It automatically lowers frequency and voltage according to the processor workload, and it switches between states quickly enough that the user does not notice.

Pentium M processors currently on the market are all based on the Dothan core. The core is built on a 90-nanometer process with "strained silicon" technology, the same manufacturing process as the Prescott-core Pentium 4.
The die area is 83.6 square millimeters, and it contains 140 million transistors.

The following table gives a direct comparison of the Dothan-core Pentium M and the Pentium 4:

 |