TRANSFER OF AN INTEGER BINARY VALUE INTO DECIMAL ASCII STRING

Abstract

The objective of this article is to present the variety of methods available for transfer of an integer binary number into the decimal ASCII string. The common methods often are not the fastest ones and there are some almost forgotten and still reliable methods (by repetitive subtractions and additions) and some less known ones like division via multiplication or binary tree search for the decimal digits.

Although we write from left to right, the weight places in decimal numbers are organized in reverse order, from right to left, and this require minor modifications and additional code in some algorithms. It is much more convenient on paper or papyrus to write from left to right because the process of writing can be visually controlled and the angle of the pen is higher then 90º which facilitate pen’s gliding on paper, while writing on the clay has to be performed in reverse order because the angle of the pen, i.e. wedge shaped stick is less then 90º and therefore it is in much better position to scratch and punch the raw clay table. The reverse order of digits in decimal numbers is the legacy of that ancient era of cuneiform writing on clay tables which became quite obvious during the development of these routines for 2→10 radix conversion. This challenges historical theory that decimal numbers appears firstly in India because they wrote from left to right on fabrics and frescoes and therefore decimal numbers could be much older and with the different origin.

The problem itself is mathematically quite clear and essentially it is consisted of radix transition from radix 2 into the radix 10, but its practical solution is not that simple.

There are 6 basic ways to transfer register’s binary value into its decimal ASCII string:

1. Successive subtraction of decimal weight coefficients (1, 2, 4, 8, 16, 32, 64, 128, 256, etc.) and countering till the result is being negative: This method yields the decimal digits in direct order with leading zeroes that should be truncated, but the number of these leading zeroes can be vastly reduced by an estimation of the leading zero positions with the error of eventually a single position only. The set of weight coefficients could be stored in an array and then grabbed with LODSB instruction or on the stack and manipulated via POP one. The average number of iterations per decimal digit is 4.5 loops. The method is quite simple and yet it utilizes only fast instructions for integer addition and subtraction. This method was very popular in the era of 8‑bit computers when it was almost only available option due to lack of complex assembly instructions. The routines based on that method are EAX2SUB and ESI2ASC ones. EAX2SUB pulls the weight coefficients from an array while ESI2ASC uses the stack. The array based routine is faster than the stack one because it can finish the task earlier, which is not case with the stack based routine. The stack based ESI2ASC routine has to pull all the coefficients from the stack to be able to return back with RET instruction while the array based EAX2SUB does not have such limitation and it can cancel its main loop at any time. The array based routine is also suitable for one more optimization: the number of decimal digits can be estimated in regards to the most significant bit of the converted binary number which is determined by the BSR instruction. Thereby the NUMDIG array contains the number of certainly leading zeroes in respect to the most significant bit of the converted binary number and that estimation, as aforesaid, can miss for only one leading zero digit’s position which utterly eliminates usage of REP SCASB instruction’s pair for truncation of leading zero’s decimal digits, just as it is done in the ESI2ASC stack based routine. Disadvantage of the array based method is that it alters more registers then the stack’s based one and it also uses CMOVZ instruction to clear the undefined state of destination register of BSR instruction when the source register is zero, and this instruction is not available in most embedded systems because it firstly appeared within Pentium Pro processor.
3. The extraction via division by 10: This method is very convenient because DIV instruction simultaneously yields both the result and the remainder. Thereby the decimal digits are extracting in reverse order by repetitive division by 10 – the remainder is a single decimal digit and the result is something that is going to be used once again in the following iteration. The digits are extracting in reverse order and therefore they should be storing backward. The method is demonstrated by the EAX2DEC routine. The disadvantage of the DIV assembly instruction is that it alters Zero flag incorrectly and thus zero flag is not switched on whenever the result of the integer division is zero and therefore one additional TEST instruction must be used to check whether the result is zero because this zero result terminates the decimal digits’ extraction.
4. The extraction via division by 10 by multiplication: A strange property of the integer numbers is that division by the constant can be achieved via multiplication with another appropriate constant. For the case of the 32-bit variable the multiplication constant is 858993459 (x/10≈(858993459·(x+1))\233), for the case of 16-bit arithmetic the constant is 13107 (x/10≈13107·((x+1))\217) and for 64-bit arithmetic the corresponding formula is x/10≈(3689348814741910323·(x+1))\265. Disadvantage of this method is that there is no remainder denoting single decimal digits and therefore it has to be additionally calculated. The advantage is usage of MUL instruction which is much faster then DIV one and therefore the entire subroutine is much faster too. The routine that demonstrates this is EAX2ASC one. The improvement of the speed by the replacement of DIV instruction is huge although it may happen in the future that it will not be the case – the latency of the DIV instruction may reach the latency of MUL instruction and then this meandering approach will be obsolete making EAX2ASC routine even slower than EAX2DEC one. The extraction of the decimal digits is performing in reverse order and therefore the digits have to be storing reversely.
5. The decimal digits can be extracted from binary numbers by the Binary Tree Search method: The theory says that the number of comparisons in binary tree per digit should be equal to LOG2(10), and in our case the maximal length of a single branch is maximally 4 comparisons (LOG2(10)=3.32). This method can be implemented via double indirect addressing (routine EAX2BTR) or by utilization of pair of complementary CMOVxx instructions (EAXCMOV). The CMOVxx instructions on some processors have a serious flaw in which the source memory address will be read regardless the condition is fulfilled or not and then the contents will be transferred to the destination location in respect to the condition. In this particular case it means that Access Violation Fault will be launched because the first CMOVC instruction (from the pair of complementary ones) will affect the register holding the indirect memory address of the second CMOVNC instruction which is not expected to be performed at all according the condition but it will be and that launches this fault – this moment is clearly commented in the EAXCMOV routine. Furthermore, the overflow fault may be also triggered in the situation where the source address is calculated by doubling or quadrupling of the addressing register already altered with wrong content. So, the pair of complementary CMOVC/CMOVNC instructions must be kept isolated with preserved address and that requires usage of an extra register. Although the reason for such persistent ignoring of this error is apparently inscrutable, it might be deliberately done because there is a perceivable reason for this failure – on this way the queue of instructions is not flushed and thereby the execution is faster. This routine is consisted of two stages: on the first stage the number of decimal digits is determining and on the second stage the extraction of these digits is performing. As it is mentioned in the above text, the name of the routine which utilizes the double indirect addressing is EAX2BTR and of the routine which uses CMOVC/CMOVNC instructions is EAXCMOV. The LB array contains the guiding rules for the search in binary tree while D array contains the digits weight coefficients used for subtractions during binary tree search. The method of Binary Tree Search requires substantial effort and dedication to be involved in its design and implementation and therefore the proper debugger and patient are necessary and DOS Box environment may help a lot in such situations.
7. “Lazy” method: This is the method which is not the real one or at least less decent one because it utilizes existing Windows API function Wsprintf that is able to convert integer contents of a register into corresponding decimal ASCII value. The method is tremendously slow and it should be used only when the speed is less limiting factor than the time itself. The routine based on this API call is EAX2WIN one. The Wsprintf is full of various formatting and rich with options of all sorts which make this routine tremendously slow. This routine really raises the fear that various compilers may routinely use it tomorrow...

The demo program also contains the SQR routine which simultaneously yields both the result and the remainder. The explanation of the algorithm is beyond the scope of this article and therefore its explanation of operation will be omitted here, but let say that it is derived from the following formula (x + ∆x)2 = x2 + 2 · x · ∆x + (∆x)2 and basically this is a kind of exhaustion of the quantified interval with Binary Tree Search.

The Windows assembly programming is very meticulous job because Windows environment is much more sensitive than the DOS one and therefore the direction flag must be maintained clear all the time otherwise some Window API call may fail or console print may fail, or some other exotic error may be triggered and for this reason another development environment may help a lot and DOS in DOS Box is close enough.

The another disadvantage of the Windows environment in respect to DOS one is the strong isolation between the DATA and CODE sections which impedes coding of the self modifying routine and also disables the possibility of keeping the routines and their memory variables together in the listing. Also, in an OBJ file usually rather exists .text section than the true .CODE section although existence of the .CODE section should be expected according the common myth available in most of Assembler’s textbooks. The ability of keeping the routines gathered with their constants and variables is very important for the Cat & Paste rapid subroutines assembling. These irrational limitation can be surpassed by the modification of the Text section properties via altering of the .text section’s flags trough the section’s command of linker: Link.exe /section:”.text”,ERW, which also duly debunks the second myth frequently cited in Assembly textbooks that the self-modifying code is not possible within Windows and on newer processors.

Although this method alters section’s read/write property via appropriate linker commands and therefore it should be avoided because its implementation requires certain reader’s experience based on the particular assembler’s peculiarities (ML, jwasm, etc.) that may distract reader from the essence of the presented algorithms, this particular way is chosen because it offers creation of much more readable listing. All these routines were initially written in DOS and then transferred in Windows environment via WinASm IDE with MASM32 10.0 package and ML 6.14.8444 with Link 5.12.8078. The alternative assembler used here was JWASM v2.04c.

The routines in this text are already coded highly optimized and the numbers of instructions are minimized. Any assembly routine is much faster than any higher language corresponding one just because the contemporary compilers do not utilize all the instructions and do not spread all data trough the available registers. The MMX and XMM registers usually remain unused or just partially populated by the contemporary compilers and the code is exceedingly filled with MOV instructions, thereby carefully crafted assembly routines usually run much faster.

The listing of the aforesaid routines follows:

;Author of the program and of all algorithms is Andrija Radovic,

;This code should be assembled with:

;\masm32\bin\ML.EXE /c /coff /Cp /nologo /I\Masm32\Include ASCIIwin.asm

;and linked with the following line:

;\masm32\bin\LINK.EXE /SUBSYSTEM:CONSOLE /RELEASE /VERSION:4.0 /LIBPATH:"\Masm32\Lib" /section:".text",ERW ASCIIwin.obj /OUT:ASCIIwin.exe

.686

.MODEL          flat, stdcall

OPTION          casemap:none                                           ;case sensitive

INCLUDE         \masm32\include\windows.inc

INCLUDE         \masm32\include\kernel32.inc

INCLUDELIB      \masm32\lib\kernel32.lib

INCLUDE         \masm32\include\user32.inc

INCLUDELIB      \masm32\lib\user32.lib

INCLUDE         \masm32\include\masm32.inc

INCLUDELIB      \masm32\lib\masm32.lib

;---------------------------------------------------------------------------------------------------------

.DATA

TESTS           DB      "This is the test...", 13, 10, 0

INTRES          DB      0, 0, 0, 0, 0, 0, 0, 0, 0, 0

RESULT          DB      0

;---------------------------------------------------------------------------------------------------------

.STACK  20000

;---------------------------------------------------------------------------------------------------------

.CODE

ASCII:

CALL    TESTR

CALL    MAINN

INVOKE  ExitProcess, 0

;---------------------------------------------------------------------------------------------------------

TESTV   DD      0

TESTR   PROC

MOV     TESTV, 3000000000

MOV     EDX, TESTV

CALL    EDX2DEC

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2SUB

CALL    NEW_ROW

MOV     ESI, TESTV

CALL    ESI2ASC

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2ASC

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2AST

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2AFL

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2DEC

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2BTR

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAXCMOV

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2WIN

CALL    NEW_ROW

MOV     EAX, TESTV

CALL    EAX2BCD

CALL    NEW_ROW

RET

TESTR   ENDP

;---------------------------------------------------------------------------------------------------------

MAINN   PROC

MOV     TESTV, 1000000000

LL0:

;MOV    EDX, TESTV

;CALL   EDX2DEC

MOV     EAX, TESTV

;CALL   EAX2ASC

;CALL   EAX2DEC

;CALL   EAX2AST

;CALL   EAX2AFL

;CALL   EAX2BTR

;CALL   EAXCMOV

CALL    EAX2BCD

CALL    S_EQU

MOV     EAX, TESTV

CALL    EAX_SQR

PUSH    EDX

;CALL   EAX2DEC

;CALL   EAX2BTR

;CALL   EAXCMOV

CALL    EAX2ASC

;CALL   EAX2AST

;CALL   EAX2AFL

;MOV    ESI, EAX

;CALL   ESI2ASC

CALL    S_PLUS

;POP    EAX

;CALL   EAX2SUB

POP     EAX

;CALL   EAX2ASC

CALL    EAX2DEC

;CALL   EAX2ASF

;CALL   EAX2AST

;CALL   EAX2AFL

CALL    NEW_ROW

INC     TESTV

CMP     TESTV, 1000000100

JLE     LL0

RET

MAINN   ENDP

;---------------------------------------------------------------------------------------------------------

SPCL    DB      9, 0

TSPACE  PROC

RET

TSPACE  ENDP

;---------------------------------------------------------------------------------------------------------

NEWROWLDB      13, 10, 0

NEW_ROW PROC

RET

NEW_ROW ENDP

;---------------------------------------------------------------------------------------------------------

EPE     DB      253, " + ", 0

S_PLUS  PROC

RET

S_PLUS  ENDP

;---------------------------------------------------------------------------------------------------------

EQE     DB      " = ", 0

S_EQU   PROC

RET

S_EQU   ENDP

;---------------------------------------------------------------------------------------------------------

D00     DB      02, 1

D01     DB      02, 2

D02     DB      02, 4

D03     DB      03, 8, 0

D04     DB      03, 6, 1

D05     DB      03, 2, 3

D06     DB      04, 4, 6, 0

D07     DB      04, 8, 2, 1

D08     DB      04, 6, 5, 2

D09     DB      05, 2, 1, 5, 0

D10     DB      05, 4, 2, 0, 1

D11     DB      05, 8, 4, 0, 2

D12     DB      05, 6, 9, 0, 4

D13     DB      06, 2, 9, 1, 8, 0

D14     DB      06, 4, 8, 3, 6, 1

D15     DB      06, 8, 6, 7, 2, 3

D16     DB      07, 6, 3, 5, 5, 6, 0

D17     DB      07, 2, 7, 0, 1, 3, 1

D18     DB      07, 4, 4, 1, 2, 6, 2

D19     DB      08, 8, 8, 2, 4, 2, 5, 0

D20     DB      08, 6, 7, 5, 8, 4, 0, 1

D21     DB      08, 2, 5, 1, 7, 9, 0, 2

D22     DB      08, 4, 0, 3, 4, 9, 1, 4

D23     DB      09, 8, 0, 6, 8, 8, 3, 8, 0

D24     DB      09, 6, 1, 2, 7, 7, 7, 6, 1

D25     DB      09, 2, 3, 4, 4, 5, 5, 3, 3

D26     DB      10, 4, 6, 8, 8, 0, 1, 7, 6, 0

D27     DB      10, 8, 2, 7, 7, 1, 2, 4, 3, 1

D28     DB      10, 6, 5, 4, 5, 3, 4, 8, 6, 2

D29     DB      11, 2, 1, 9, 0, 7, 8, 6, 3, 5, 0

D30     DB      11, 4, 2, 8, 1, 4, 7, 3, 7, 0, 1

D31     DB      11, 8, 4, 6, 3, 8, 4, 7, 4, 1, 2

EDX2DEC PROC                                                            ;The routine prints ASCII

STD                                                             ;decimal content of EDX register

MOV     DWORD PTR INTRES, 0                                     ;via BCD arithmetic AAA

MOV     DWORD PTR INTRES + 4, 0                                 ;instruction.

LEA     EBX, D00

LEA     EDI, 9 + INTRES

PUT_I_DO:

MOVZX   ECX, BYTE PTR [EBX]

SHR     EDX, 1

JNC     PUT_I_END_IF

INC     EBX

DEC     ECX

XOR     AX, AX

LEA     EDI, 9 + INTRES

PUT_I_DO1:

MOVZX   AX, AH

AAA

STOSB

INC     EBX

LOOPD   PUT_I_DO1

PUT_I_END_IF:

TEST    EDX, EDX

JNZ     PUT_I_DO

INC     EDI

OR      DWORD PTR INTRES, "0000"

OR      DWORD PTR INTRES + 4, "0000"

OR      WORD PTR INTRES + 8, "00"

XOR     EDX, EDX

CMP     BYTE PTR [EDI], "0"

SETZ    DL

CMP     EDX, OFFSET RESULT

SETZ    CL

SUB     EDX, ECX

CLD

INVOKE  StdOut, EDX

RET

EDX2DEC ENDP

;---------------------------------------------------------------------------------------------------------

ADDIT   DW      "00", "01", "02", "03", "04", "05", "06", "07", "08"

DW      "09", "10", "11", "12", "13", "14", "15", "16", "17"

DW      "18", "19", "20", "21", "22", "23", "24", "25", "26"

DW      "27", "28", "29", "30"

WC      DB      02, "1"

DB      02, "2"

DB      02, "4"

DB      03, "8", "0"

DB      03, "6", "1"

DB      03, "2", "3"

DB      04, "4", "6", "0"

DB      04, "8", "2", "1"

DB      04, "6", "5", "2"

DB      05, "2", "1", "5", "0"

DB      05, "4", "2", "0", "1"

DB      05, "8", "4", "0", "2"

DB      05, "6", "9", "0", "4"

DB      06, "2", "9", "1", "8", "0"

DB      06, "4", "8", "3", "6", "1"

DB      06, "8", "6", "7", "2", "3"

DB      07, "6", "3", "5", "5", "6", "0"

DB      07, "2", "7", "0", "1", "3", "1"

DB      07, "4", "4", "1", "2", "6", "2"

DB      08, "8", "8", "2", "4", "2", "5", "0"

DB      08, "6", "7", "5", "8", "4", "0", "1"

DB      08, "2", "5", "1", "7", "9", "0", "2"

DB      08, "4", "0", "3", "4", "9", "1", "4"

DB      09, "8", "0", "6", "8", "8", "3", "8", "0"

DB      09, "6", "1", "2", "7", "7", "7", "6", "1"

DB      09, "2", "3", "4", "4", "5", "5", "3", "3"

DB      10, "4", "6", "8", "8", "0", "1", "7", "6", "0"

DB      10, "8", "2", "7", "7", "1", "2", "4", "3", "1"

DB      10, "6", "5", "4", "5", "3", "4", "8", "6", "2"

DB      11, "2", "1", "9", "0", "7", "8", "6", "3", "5", "0"

DB      11, "4", "2", "8", "1", "4", "7", "3", "7", "0", "1"

DB      11, "8", "4", "6", "3", "8", "4", "7", "4", "1", "2"

EAX2BCD PROC                                                            ;The routine prints ASCII

MOV     DWORD PTR INTRES, "0000"                                ;decimal content of EDX register

MOV     DWORD PTR INTRES + 4, "0000"                            ;via BCD arithmetics on the

MOV     WORD PTR INTRES + 8, "00"                               ;ADDIT array.

LEA     EBX, WC                                                 ;No specific instruction is used.

EAX2BCD_DO:

MOVZX   ECX, BYTE PTR [EBX]

SHR     EAX, 1

JNC     EAX2BCD_END_IF

INC     EBX

DEC     ECX

MOV     DH, "0"

LEA     EDI, 9 + INTRES

SUB     EDI, ECX

EAX2BCD_DO1:

MOVZX   EDX, DH

ADD     DL, BYTE PTR [EDI + ECX]

MOV     DX, WORD PTR [2 * EDX + OFFSET ADDIT - 6 * "0"]

MOV     BYTE PTR [EDI + ECX], DL

INC     EBX

LOOPD   EAX2BCD_DO1

EAX2BCD_END_IF:

TEST    EAX, EAX

JNZ     EAX2BCD_DO

INC     EDI

XOR     EDX, EDX

CMP     BYTE PTR [EDI], "0"

SETZ    DL

CMP     EDX, OFFSET RESULT

SETZ    CL

SUB     EDX, ECX

INVOKE  StdOut, EDX

RET

EAX2BCD ENDP

;---------------------------------------------------------------------------------------------------------

BASES   DD      1000000000, 100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1

NUMDIG  DB      9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2

DB      1, 1, 1, 0, 0, 0

EAX2SUB PROC                                                            ;The routine prints ASCII

MOV     DX, "00" - 101H                                         ;decimal content of EAX register

BSR     EDI, EAX                                                ;by repetitive subtractions

CMOVZ   EDI, EAX                                                ;with coefficients pulled from

MOVZX   EDI, BYTE PTR [EDI + OFFSET NUMDIG]                     ;array with reducted number

PUSH    EDI                                                     ;of iteration.

MOV     EBX, DWORD PTR [4 * EDI + OFFSET BASES]

EBX_DO2:

INC     DL

SUB     EAX, EBX

JNC     EBX_DO2

MOV     BYTE PTR [EDI + OFFSET INTRES], DL

INC     EDI

MOV     DL, DH

JNZ     EAX2SUB_DO1

POP     EDI

CMP     BYTE PTR [EDI + OFFSET INTRES], "0"

SETZ    AL

LEA     EDI, [EDI + EAX + OFFSET INTRES]

CMP     EDI, OFFSET INTRES + 10

SETZ    AL

SUB     EDI, EAX

INVOKE  StdOut, EDI

RET

EAX2SUB ENDP

;---------------------------------------------------------------------------------------------------------

ESI2ASC PROC                                                            ;The routine prints ASCII

LEA     EDI, INTRES                                             ;decimal content of ESI register

MOV     AX, "00" - 101H                                         ;by repetitive subtractions

PUSH    DWORD PTR 0                                             ;with coefficients pulled from

PUSH    DWORD PTR 1                                             ;stack without counter.

PUSH    DWORD PTR 100

PUSH    DWORD PTR 1000

PUSH    DWORD PTR 10000

PUSH    DWORD PTR 100000

PUSH    DWORD PTR 1000000

PUSH    DWORD PTR 10000000

PUSH    DWORD PTR 100000000

MOV     EBX, 1000000000

CLD

EAS_DO1:

EAS_DO2:

INC     AX

SUB     ESI, EBX

JNC     EAS_DO2

STOSB

MOV     AL, AH

POP     EBX

TEST    EBX, EBX

JNZ     EAS_DO1

MOV     AL, "0"

MOV     ECX, 10

LEA     EDI, INTRES

REPE    SCASB

DEC     EDI

INVOKE  StdOut, EDI

RET

ESI2ASC ENDP

;---------------------------------------------------------------------------------------------------------

ASCII_TABLE     DW      "00", "10", "20", "30", "40", "50", "60", "70", "80", "90", "00", "00"

DW      "00", "00", "00", "00", "01", "11", "21", "31", "41", "51", "61", "71"

DW      "81", "91", "00", "00", "00", "00", "00", "00", "02", "12", "22", "32"

DW      "42", "52", "62", "72", "82", "92", "00", "00", "00", "00", "00", "00"

DW      "03", "13", "23", "33", "43", "53", "63", "73", "83", "93", "00", "00"

DW      "00", "00", "00", "00", "04", "14", "24", "34", "44", "54", "64", "74"

DW      "84", "94", "00", "00", "00", "00", "00", "00", "05", "15", "25", "35"

DW      "45", "55", "65", "75", "85", "95", "00", "00", "00", "00", "00", "00"

DW      "06", "16", "26", "36", "46", "56", "66", "76", "86", "96", "00", "00"

DW      "00", "00", "00", "00", "07", "17", "27", "37", "47", "57", "67", "77"

DW      "87", "97", "00", "00", "00", "00", "00", "00", "08", "18", "28", "38"

DW      "48", "58", "68", "78", "88", "98", "00", "00", "00", "00", "00", "00"

DW      "09", "19", "29", "39", "49", "59", "69", "79", "89", "99", "00", "00"

BCRESUL         DT      0

EAX2AST PROC                                                            ;The routine prints ASCII

MOV     DWORD PTR BCRESUL, EAX                                  ;decimal content of EAX register

MOV     DWORD PTR BCRESUL + 4, 0                                ;by usage of coprocessor FBSTP

FILD    QWORD PTR BCRESUL                                       ;instruction which converts

FBSTP   BCRESUL                                                 ;number into the packed BCD.

LEA     EDI, INTRES                                             ;Unpacking of BCD is done via

MOV     ECX, 5                                                  ;the table of unpacked values.

EAX2AST_DO:

MOVZX   EAX, BYTE PTR [ECX + OFFSET BCRESUL - 1]

MOV     AX, WORD PTR [2 * EAX + OFFSET ASCII_TABLE]

STOSW

LOOPD   EAX2AST_DO

MOV     AL, "0"

MOV     ECX, 10

LEA     EDI, INTRES

REPE    SCASB

DEC     EDI

INVOKE  StdOut, EDI

RET

EAX2AST ENDP

;---------------------------------------------------------------------------------------------------------

BCDEX   DT      0

EAX2AFL PROC                                                            ;The routine prints ASCII

STD                                                             ;decimal content of EAX register

MOV     DWORD PTR BCDEX, EAX                                    ;by usage of coprocessor FBSTP

MOV     DWORD PTR BCDEX + 4, 0                                  ;instruction which converts

FILD    QWORD PTR BCDEX                                         ;number into the packed BCD.

FBSTP   BCDEX                                                   ;Unpacking of BCD is done via

MOV     EDX, DWORD PTR BCDEX                                    ;SHR and SHRD instructions.

LEA     EDI, 9 + INTRES

EAX2AFL_DO:

MOV     EAX, EDX

AND     EAX, 15

OR      AL, "0"

STOSB

SHRD    EDX, EBX, 4

SHR     EBX, 4

MOV     EAX, EDX

OR      EAX, EBX

JNZ     EAX2AFL_DO

LEA     EDX, [EDI + 1]

CLD

INVOKE  StdOut, EDX

RET

EAX2AFL ENDP

;---------------------------------------------------------------------------------------------------------

EAX2DEC PROC                                                            ;This routine prints ASCII

LEA     ECX, INTRES + 9                                         ;decimal content of EAX register

MOV     EBX, 10                                                 ;by its repetitive dividing by

EAX2DEC_DO:                                                     ;10 using DIV instruction that

XOR     EDX, EDX                                        ;simultaneously yields result

DIV     EBX                                             ;and remainder which denotes

OR      EDX, "0"                                        ;current decimal digit.

TEST    EAX, EAX

LOOPNZD EAX2DEC_DO

LEA     EDX, [ECX + 1]

INVOKE  StdOut, EDX

RET

EAX2DEC ENDP

;---------------------------------------------------------------------------------------------------------

EAX2ASC PROC                                                            ;This routine prints ASCII

LEA     ECX, INTRES + 9                                         ;decimal content of EAX register

MOV     EDI, 858993459                                          ;by its repetitive dividing by

MOV     EBX, EAX                                                ;10 using MUL instruction to

AX2ASC_DO:                                                      ;divide by 10 via multiplication

LEA     EAX, [EBX + 1]                                  ;with the appropriate constant.

SHR     EDX, 1

LEA     EAX, [4 * EDX + EDX]

NEG     EAX

LEA     EAX, [EBX + 2 * EAX + "0"]

MOV     BYTE PTR [ECX], AL

MOV     EBX, EDX

LOOPNZD AX2ASC_DO

LEA     EDX, [ECX + 1]

INVOKE  StdOut, EDX

RET

EAX2ASC ENDP

;---------------------------------------------------------------------------------------------------------

LB      DB      0F0H,0F1H, 1,3, 0F2H,4, 0F3H,0F4H, 2,7, 0F5H,0F6H, 6,8, 0F7H,9, 0F8H,0F9H

D       DD      0,1,2,3,4,5,6,7,8,9

DD      00,10,20,30,40,50,60,70,80,90

DD      000,100,200,300,400,500,600,700,800,900

DD      0000,1000,2000,3000,4000,5000,6000,7000,8000,9000

DD      00000,10000,20000,30000,40000,50000,60000,70000,80000,90000

DD      000000,100000,200000,300000,400000,500000,600000,700000,800000,900000

DD      0000000,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000

DD      00000000,10000000,20000000,30000000,40000000,50000000,60000000,70000000,80000000,90000000

DD      000000000,100000000,200000000,300000000,400000000,500000000,600000000,700000000,800000000

DD      900000000

DD      0000000000,1000000000,2000000000,3000000000,4000000000,4294967295,4294967295,4294967295

DD      4294967295,4294967295

DD      1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000

EAX2BTR PROC                                                            ;This routine prints ASCII

CLD                                                             ;decimal content of EAX register

MOV     ESI, 400                                                ;by binary tree digits

MOV     EBX, EAX                                                ;extraction.

MOV     EAX, 5                                                  ;No specific instruction is used.

CMP     EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]

SBB     EDX, EDX

MOV     AL, BYTE PTR [2 * EAX + EDX + (OFFSET LB - 1)]

TEST    AL, AL

JNS     EAX2BTR_DO1

AND     AX, 15

LEA     EDI, 9 + INTRES

SUB     EDI, EAX

PUSH    EDI

LEA     ESI, [EAX + 4 * EAX]

MOV     EAX, 5

SHL     ESI, 3

EAX2BTR_DO3:

CMP     EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]

SBB     EDX, EDX

MOV     AL, BYTE PTR [2 * EAX + EDX + (OFFSET LB - 1)]

TEST    AL, AL

JNS     EAX2BTR_DO3

AND     EAX, 15

SUB     EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]

OR      AL, "0"

STOSB

MOV     EAX, 5

SUB     ESI, 40

JNC     EAX2BTR_DO3

POP     EDX

INVOKE  StdOut, EDX

RET

EAX2BTR ENDP

;---------------------------------------------------------------------------------------------------------

LC      DW      0FF00H,0FF01H, 1,3, 0FF02H,4, 0FF03H,0FF04H, 2,7, 0FF05H,0FF06H, 6,8, 0FF07H,9

DW      0FF08H,0FF09H

DC      DD      0,1,2,3,4,5,6,7,8,9

DD      00,10,20,30,40,50,60,70,80,90

DD      000,100,200,300,400,500,600,700,800,900

DD      0000,1000,2000,3000,4000,5000,6000,7000,8000,9000

DD      00000,10000,20000,30000,40000,50000,60000,70000,80000,90000

DD      000000,100000,200000,300000,400000,500000,600000,700000,800000,900000

DD      0000000,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000

DD      00000000,10000000,20000000,30000000,40000000,50000000,60000000,70000000,80000000,90000000

DD      000000000,100000000,200000000,300000000,400000000,500000000,600000000,700000000,800000000

DD      900000000

DD      0000000000,1000000000,2000000000,3000000000,4000000000,4294967295,4294967295,4294967295

DD      4294967295,4294967295

DD      1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000

EAXCMOV PROC                                                            ;This routine prints ASCII

CLD                                                             ;decimal content of EAX register

MOV     ESI, 400                                                ;by binary tree digits

MOV     EBX, EAX                                                ;extraction.

MOV     EAX, 5                                                  ;CMOV instruction is used.

CMP     EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]

;               CMOVC   AX, WORD PTR [4 * EAX + (OFFSET LC - 4)]        ;These two instructions

;               CMOVNC  AX, WORD PTR [4 * EAX + (OFFSET LC - 2)]        ;work on some processors...

CMOVC   DX, WORD PTR [4 * EAX + (OFFSET LC - 4)]        ;Malfunction of x86 CMOV

CMOVNC  DX, WORD PTR [4 * EAX + (OFFSET LC - 2)]        ;requires these ones instead

MOV     AX, DX                                          ;of above two instructions.

TEST    AX, AX

JNS     EAXCMOV_DO1

MOVZX   EAX, AL

LEA     EDI, 9 + INTRES

SUB     EDI, EAX

PUSH    EDI

LEA     ESI, [EAX + 4 * EAX]

MOV     EAX, 5

SHL     ESI, 3

EAXCMOV_DO3:

CMP     EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]

;               CMOVC   AX, WORD PTR [4 * EAX + (OFFSET LC - 4)]        ;These two instructions

;               CMOVNC  AX, WORD PTR [4 * EAX + (OFFSET LC - 2)]        ;work on some processors...

CMOVC   DX, WORD PTR [4 * EAX + (OFFSET LC - 4)]        ;Malfunction of x86 CMOV

CMOVNC  DX, WORD PTR [4 * EAX + (OFFSET LC - 2)]        ;requires these ones instead

MOV     AX, DX                                          ;of above two instructions.

TEST    AX, AX

JNS     EAXCMOV_DO3

MOVZX   EAX, AL

SUB     EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]

OR      AL, "0"

STOSB

MOV     EAX, 5

SUB     ESI, 40

JNC     EAXCMOV_DO3

POP     EDX

INVOKE  StdOut, EDX

RET

EAXCMOV ENDP

;---------------------------------------------------------------------------------------------------------

NumFormat       DB      "%u",0

BufferW         DB      32DUP(0)

EAX2WIN PROC                                                            ;This routine prints ASCII EAX.

INVOKE  wsprintf, ADDR BufferW, ADDR NumFormat, EAX             ;It utilizes native Windows NT

RET

EAX2WIN ENDP

;---------------------------------------------------------------------------------------------------------

EAX_SQR PROC                                                            ;EAX = SQR(EAX), square rooting

XOR     ESI, ESI                                                ;routine.

XOR     EDX, EDX                                                ;EDX is remainder.

SQRT_DO:

LEA     EDI, [EBX + ESI]

SHR     EDX, 1

CMP     EAX, EDI

JC      SQRT_END_IF

MOV     ESI, EDI

SQRT_END_IF:

SHR     EBX, 2

JNZ     SQRT_DO

SUB     EAX, ESI

XCHG    EAX, EDX

RET

EAX_SQR ENDP

;---------------------------------------------------------------------------------------------------------

END     ASCII

Above routines should be assembled rather from the command line than from some assembler’s IDE because most of IDEs usually do not offer full control of the parameters passed to assembler and linker, just as is required by the linking process with removed section’s read-only property.

Therefore they should be assembled manually from the command line:

C:\masm32\bin\ML.EXE /c /coff /Cp /nologo /I"\Masm32\Include" ASCIIwin.asm

And it should be linked with the following single line which is spread to two lines of text although this actually should be only one uninterrupted line in the console:

C:\masm32\bin\LINK.EXE /SUBSYSTEM:CONSOLE /RELEASE /LIBPATH:"\Masm32\Lib" /section:".text",ERW ASCIIwin.obj

Above two lines make EXE program with writable Code section. The benefit of such hand crafted linking is ability to write self modifying code. Although it is highly discouraged by the MASM manual, this is something that should not be instantly rejected! Actually, all the recommendations those allegedly improve prettiness of the code by forcing real limitations to programmers should be immediately rejected! The self modifying code is very useful and sometimes may shrink code a lot and rejection of such option just because somebody found that it is not fancy enough seems to be quite wrong way of action. The second ability to keep all the routines’ peaces tightly together is also very important for quick and efficient coding, which all requires section with the code to be writable.

CONCLUSION

All presented methods suffer from some sorts of disadvantages that elongate their executions. The misery of the task is consisted of the fact that set of 2 elements should be translated into the set of only 10 elements with up to 10 decimal positions and this exceptionally small number of decimal digits and decimal positions (32-bit integer has 10 decimal positions) really limits the efficiency of all of the applied optimization schemes, highly challenging coding skills because the number of instructions in optimization schemes reach the number of instructions performed in main loops.

Although the algorithm EDX2DEC based on the AAA instruction altogether with the EAX2BTR based on binary tree search are duly incomprehensible at the glance, this give it special charm making them quite fashionable. Despite the fact that EAX2BTR utilizes theoretically very promising Binary Tree Search, it is not much better than much simpler algorithms like EAX2SUB based on the repetitive subtractions. The one non-trivial algorithm with comparable performance with simplest ones is the EAX2ASC routine that utilizes division trough multiplication with only 8 instructions within main loop, which is going to be quite fast on the new processors with MUL instruction able to accomplish it in just of a couple of clock cycles. The algorithm EAX2DEC based on the DIV instruction is absolute champion of shortness – only five instructions within the main loop, but DIV instruction may be very slow depending on the particular processor. The situation is quite reversal when these algorithms should be transferred in VHDL hardware, then the binary tree search and BCD AAA instruction based algorithms are the most promising ones offering tremendous opportunities for hardware parallelization – they can reach even execution in a single clock cycle only.

P. S.

Intel explained in an E-mail that this operation of the CMOV instruction is not actually bug and that this is rather a feature because it works just as they figured out it should work. This leaves a bitter taste because there is a following question about the purpose of this abundance of instructions of all sorts with only purpose to facilitate optimization and then they cannot fulfill their unique mission just because they operates… let say on peculiar ways. It seems that some kind of “Application Notes” that is usual thing in electronic industry for years really should become common thing in the Microprocessor industry too and that Intel should issue one book with demonstrations how they figured out precisely how every single particular instruction should be used. Without that many of potentially mighty instructions will remain unnoticed which is instant loss for customers and developers. It is interesting that AMD processors also evince identical behavior indicating that their designers mutually exchange microcode. I had no chance to test other manufacturers processors but there is reasonable suspicion that they all operate on the quite same way.

Author:

Attention:

These routines are freeware only for personal use and therefore they can be freely distributed. The author will not be responsible for any kind of loss caused by the usage of any one those routines. These routines should not be distributed for commercial purpose. The name of the author must stay visible on program as contributor unless there is a different agreement. If you charge money for your application, or use the application in conjunction with a product you sell then you require granted permission for usage of unmodified or slightly modified any of those routines.

If you agree with above terms press the following key for the download of source and executable:

Press the following key for the download of PDF vesrion of the text: