TRANSFER OF AN INTEGER BINARY VALUE INTO DECIMAL ASCII STRING
Abstract
The objective of this article is to present the variety of methods available for converting a binary integer into a decimal ASCII string. The common methods are often not the fastest ones. There are some almost forgotten but still reliable methods (repetitive subtractions and additions) and some lesser-known ones, such as division via multiplication and binary tree search for the decimal digits.
Although we write from left to right, the place values in decimal numbers are organized in the reverse order, from right to left, and this requires minor modifications and additional code in some algorithms. It is much more convenient to write from left to right on paper or papyrus, because the process of writing can be visually controlled and the angle of the pen is greater than 90º, which helps the pen glide over the paper. Writing on clay has to be performed in the reverse order, because the angle of the pen, i.e. the wedge-shaped stick, is less than 90º and it is therefore in a much better position to scratch and punch the raw clay tablet. The reverse order of digits in decimal numbers is the legacy of that ancient era of cuneiform writing on clay tablets, which became quite obvious during the development of these routines for 2→10 radix conversion. This challenges the historical theory that decimal numbers first appeared in India, because there people wrote from left to right on fabrics and frescoes; decimal numbers could therefore be much older and of a different origin.
The problem itself is mathematically quite clear: essentially it consists of a transition from radix 2 to radix 10. Its practical solution, however, is not that simple.
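The radix transition and the right-to-left order of the digits can be illustrated with a minimal Python sketch of repeated division by 10, the same idea the EAX2DEC routine in the listing expresses with the DIV instruction. The function name and the 10-byte buffer are assumptions made for this illustration and are not part of the original program.

```python
def to_decimal_by_division(n: int) -> str:
    """Convert an unsigned 32-bit value to its decimal ASCII string."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    buffer = bytearray(10)        # a 32-bit value has at most 10 decimal digits
    pos = len(buffer)             # write position, moving from right to left
    while True:
        n, digit = divmod(n, 10)  # quotient and remainder in one step, like DIV
        pos -= 1
        buffer[pos] = ord("0") + digit
        if n == 0:                # the loop body runs at least once, so 0 gives "0"
            break
    return buffer[pos:].decode("ascii")
```

Each division yields the least significant remaining digit, so the buffer is filled from its last position towards the first, and the string to print starts wherever the loop stopped.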
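There are six basic ways to convert a register's binary value into its decimal ASCII string, and the listing below implements all of them: summation of the decimal weights of the set bits in BCD arithmetic (EDX2DEC, EAX2BCD); repetitive subtraction of the powers of ten (EAX2SUB, ESI2ASC); the coprocessor FBSTP instruction (EAX2AST, EAX2AFL); repetitive division by 10 with the DIV instruction (EAX2DEC); division by 10 via multiplication (EAX2ASC); and binary tree search for each decimal digit (EAX2BTR, EAXCMOV).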
The demo program also contains the SQR routine (EAX_SQR), which simultaneously yields both the square root and the remainder. An explanation of the algorithm is beyond the scope of this article and is omitted here. Let us just say that it is derived from the formula (x + ∆x)² = x² + 2 · x · ∆x + (∆x)² and that it is basically a kind of exhaustion of the quantified interval with binary tree search.
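As a rough illustration only, the following Python sketch mirrors the register flow of the EAX_SQR routine from the listing. The function and variable names are assumptions introduced here; the comments indicate which register each variable stands for.

```python
def isqrt_with_remainder(n: int) -> tuple[int, int]:
    """Return (root, remainder) with root * root + remainder == n."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    subtracted = 0   # ESI: the square accumulated so far
    root = 0         # EDX: the partial root, kept pre-scaled by the current bit
    bit = 1 << 30    # EBX: the highest power of four that fits in 32 bits
    while bit:
        # (x + dx)^2 = x^2 + 2*x*dx + dx^2: 'subtracted' holds x^2,
        # 'root' holds 2*x*dx and 'bit' holds dx^2 for the digit being tried.
        candidate = subtracted + root + bit
        root >>= 1
        if n >= candidate:
            subtracted = candidate
            root += bit
        bit >>= 2
    return root, n - subtracted
```

For example, `isqrt_with_remainder(10)` returns `(3, 1)`, since 3 · 3 + 1 = 10. Each pass decides one binary digit of the root, which is the binary tree search mentioned above.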
Assembly programming under Windows is a very meticulous job, because the Windows environment is much more sensitive than the DOS one. The direction flag must be kept clear all the time, otherwise some Windows API call or a console print may fail, or some other exotic error may be triggered; this is why the routines in the listing that execute STD restore the flag with CLD before calling StdOut. For this reason another development environment may help a lot, and DOS in DOSBox is close enough.
Another disadvantage of the Windows environment with respect to DOS is the strong isolation between the DATA and CODE sections, which impedes the coding of self-modifying routines and also makes it impossible to keep the routines and their memory variables together in the listing. Also, an OBJ file usually contains a .text section rather than a true .CODE section, although the existence of a .CODE section would be expected according to the common myth found in most assembler textbooks. The ability to keep routines together with their constants and variables is very important for rapid cut-and-paste assembling of subroutines. This irrational limitation can be overcome by altering the flags of the .text section through the linker's section option: Link.exe /section:".text",ERW. This also duly debunks the second myth frequently cited in assembly textbooks, namely that self-modifying code is not possible under Windows and on newer processors.
This method alters the section's read/write property via the appropriate linker options, and its implementation requires some experience with the peculiarities of the particular assembler (ML, jwasm, etc.), which may distract the reader from the essence of the presented algorithms; for that reason it would normally be avoided. It was chosen nevertheless, because it yields a much more readable listing. All these routines were initially written in DOS and then transferred to the Windows environment via the WinASm IDE with the MASM32 10.0 package, ML 6.14.8444 and Link 5.12.8078. The alternative assembler used here was JWASM v2.04c.
The routines in this text are already highly optimized and their instruction counts are minimized. A carefully crafted assembly routine usually runs much faster than the corresponding routine in a higher-level language, because contemporary compilers do not utilize all the instructions and do not spread all the data through the available registers. The MMX and XMM registers usually remain unused or only partially populated by compilers, and the generated code is excessively filled with MOV instructions.
The listing of the aforesaid routines follows:
;Author of the program and of all algorithms is Andrija Radovic,
;All Rights Reserved, ©2011.
;This code should be assembled with:
;\masm32\bin\ML.EXE /c /coff /Cp /nologo /I\Masm32\Include ASCIIwin.asm
;and linked with the following line:
;\masm32\bin\LINK.EXE /SUBSYSTEM:CONSOLE /RELEASE /VERSION:4.0 /LIBPATH:"\Masm32\Lib" /section:".text",ERW ASCIIwin.obj /OUT:ASCIIwin.exe
.686
.MODEL flat, stdcall
OPTION casemap:none ;case sensitive
INCLUDE \masm32\include\windows.inc
INCLUDE \masm32\include\kernel32.inc
INCLUDELIB \masm32\lib\kernel32.lib
INCLUDE \masm32\include\user32.inc
INCLUDELIB \masm32\lib\user32.lib
INCLUDE \masm32\include\masm32.inc
INCLUDELIB \masm32\lib\masm32.lib
;---------------------------------------------------------------------------------------------------------
.DATA
TESTS DB "This is the test...", 13, 10, 0
INTRES DB 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
RESULT DB 0
;---------------------------------------------------------------------------------------------------------
.STACK 20000
;---------------------------------------------------------------------------------------------------------
.CODE
ASCII:
INVOKE StdOut, ADDR TESTS
CALL TESTR
INVOKE StdOut, ADDR TESTS
CALL MAINN
INVOKE StdOut, ADDR TESTS
INVOKE ExitProcess, 0
;---------------------------------------------------------------------------------------------------------
TESTV DD 0
TESTR PROC
MOV TESTV, 3000000000
MOV EDX, TESTV
CALL EDX2DEC
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 1
CALL EAX2SUB
CALL NEW_ROW
MOV ESI, TESTV
ADD ESI, 2
CALL ESI2ASC
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 3
CALL EAX2ASC
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 4
CALL EAX2AST
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 5
CALL EAX2AFL
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 6
CALL EAX2DEC
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 7
CALL EAX2BTR
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 8
CALL EAXCMOV
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 9
CALL EAX2WIN
CALL NEW_ROW
MOV EAX, TESTV
ADD EAX, 10
CALL EAX2BCD
CALL NEW_ROW
RET
TESTR ENDP
;---------------------------------------------------------------------------------------------------------
MAINN PROC
MOV TESTV, 1000000000
LL0:
;MOV EDX, TESTV
;CALL EDX2DEC
MOV EAX, TESTV
;CALL EAX2ASC
;CALL EAX2DEC
;CALL EAX2AST
;CALL EAX2AFL
;CALL EAX2BTR
;CALL EAXCMOV
CALL EAX2BCD
CALL S_EQU
MOV EAX, TESTV
CALL EAX_SQR
PUSH EDX
;CALL EAX2DEC
;CALL EAX2BTR
;CALL EAXCMOV
CALL EAX2ASC
;CALL EAX2AST
;CALL EAX2AFL
;MOV ESI, EAX
;CALL ESI2ASC
CALL S_PLUS
;POP EAX
;CALL EAX2SUB
POP EAX
;CALL EAX2ASC
CALL EAX2DEC
;CALL EAX2ASF
;CALL EAX2AST
;CALL EAX2AFL
CALL NEW_ROW
INC TESTV
CMP TESTV, 1000000100
JLE LL0
RET
MAINN ENDP
;---------------------------------------------------------------------------------------------------------
SPCL DB 9, 0
TSPACE PROC
INVOKE StdOut, ADDR SPCL
RET
TSPACE ENDP
;---------------------------------------------------------------------------------------------------------
NEWROWL DB 13, 10, 0
NEW_ROW PROC
INVOKE StdOut, ADDR NEWROWL
RET
NEW_ROW ENDP
;---------------------------------------------------------------------------------------------------------
EPE DB 253, " + ", 0
S_PLUS PROC
INVOKE StdOut, ADDR EPE
RET
S_PLUS ENDP
;---------------------------------------------------------------------------------------------------------
EQE DB " = ", 0
S_EQU PROC
INVOKE StdOut, ADDR EQE
RET
S_EQU ENDP
;---------------------------------------------------------------------------------------------------------
D00 DB 02, 1
D01 DB 02, 2
D02 DB 02, 4
D03 DB 03, 8, 0
D04 DB 03, 6, 1
D05 DB 03, 2, 3
D06 DB 04, 4, 6, 0
D07 DB 04, 8, 2, 1
D08 DB 04, 6, 5, 2
D09 DB 05, 2, 1, 5, 0
D10 DB 05, 4, 2, 0, 1
D11 DB 05, 8, 4, 0, 2
D12 DB 05, 6, 9, 0, 4
D13 DB 06, 2, 9, 1, 8, 0
D14 DB 06, 4, 8, 3, 6, 1
D15 DB 06, 8, 6, 7, 2, 3
D16 DB 07, 6, 3, 5, 5, 6, 0
D17 DB 07, 2, 7, 0, 1, 3, 1
D18 DB 07, 4, 4, 1, 2, 6, 2
D19 DB 08, 8, 8, 2, 4, 2, 5, 0
D20 DB 08, 6, 7, 5, 8, 4, 0, 1
D21 DB 08, 2, 5, 1, 7, 9, 0, 2
D22 DB 08, 4, 0, 3, 4, 9, 1, 4
D23 DB 09, 8, 0, 6, 8, 8, 3, 8, 0
D24 DB 09, 6, 1, 2, 7, 7, 7, 6, 1
D25 DB 09, 2, 3, 4, 4, 5, 5, 3, 3
D26 DB 10, 4, 6, 8, 8, 0, 1, 7, 6, 0
D27 DB 10, 8, 2, 7, 7, 1, 2, 4, 3, 1
D28 DB 10, 6, 5, 4, 5, 3, 4, 8, 6, 2
D29 DB 11, 2, 1, 9, 0, 7, 8, 6, 3, 5, 0
D30 DB 11, 4, 2, 8, 1, 4, 7, 3, 7, 0, 1
D31 DB 11, 8, 4, 6, 3, 8, 4, 7, 4, 1, 2
EDX2DEC PROC ;The routine prints ASCII
STD ;decimal content of EDX register
MOV DWORD PTR INTRES, 0 ;via BCD arithmetic AAA
MOV DWORD PTR INTRES + 4, 0 ;instruction.
MOV WORD PTR INTRES + 8, 0 ;Author: Andrija Radovic, ©2011
LEA EBX, D00
LEA EDI, 9 + INTRES
PUT_I_DO:
MOVZX ECX, BYTE PTR [EBX]
SHR EDX, 1
JNC PUT_I_END_IF
INC EBX
DEC ECX
XOR AX, AX
LEA EDI, 9 + INTRES
PUT_I_DO1:
MOVZX AX, AH
ADD AL, BYTE PTR [EDI]
ADD AL, BYTE PTR [EBX]
AAA
STOSB
INC EBX
LOOPD PUT_I_DO1
PUT_I_END_IF:
ADD EBX, ECX
TEST EDX, EDX
JNZ PUT_I_DO
INC EDI
OR DWORD PTR INTRES, "0000"
OR DWORD PTR INTRES + 4, "0000"
OR WORD PTR INTRES + 8, "00"
XOR EDX, EDX
CMP BYTE PTR [EDI], "0"
SETZ DL
ADD EDX, EDI
CMP EDX, OFFSET RESULT
SETZ CL
SUB EDX, ECX
CLD
INVOKE StdOut, EDX
RET
EDX2DEC ENDP
;---------------------------------------------------------------------------------------------------------
ADDIT DW "00", "01", "02", "03", "04", "05", "06", "07", "08"
DW "09", "10", "11", "12", "13", "14", "15", "16", "17"
DW "18", "19", "20", "21", "22", "23", "24", "25", "26"
DW "27", "28", "29", "30"
WC DB 02, "1"
DB 02, "2"
DB 02, "4"
DB 03, "8", "0"
DB 03, "6", "1"
DB 03, "2", "3"
DB 04, "4", "6", "0"
DB 04, "8", "2", "1"
DB 04, "6", "5", "2"
DB 05, "2", "1", "5", "0"
DB 05, "4", "2", "0", "1"
DB 05, "8", "4", "0", "2"
DB 05, "6", "9", "0", "4"
DB 06, "2", "9", "1", "8", "0"
DB 06, "4", "8", "3", "6", "1"
DB 06, "8", "6", "7", "2", "3"
DB 07, "6", "3", "5", "5", "6", "0"
DB 07, "2", "7", "0", "1", "3", "1"
DB 07, "4", "4", "1", "2", "6", "2"
DB 08, "8", "8", "2", "4", "2", "5", "0"
DB 08, "6", "7", "5", "8", "4", "0", "1"
DB 08, "2", "5", "1", "7", "9", "0", "2"
DB 08, "4", "0", "3", "4", "9", "1", "4"
DB 09, "8", "0", "6", "8", "8", "3", "8", "0"
DB 09, "6", "1", "2", "7", "7", "7", "6", "1"
DB 09, "2", "3", "4", "4", "5", "5", "3", "3"
DB 10, "4", "6", "8", "8", "0", "1", "7", "6", "0"
DB 10, "8", "2", "7", "7", "1", "2", "4", "3", "1"
DB 10, "6", "5", "4", "5", "3", "4", "8", "6", "2"
DB 11, "2", "1", "9", "0", "7", "8", "6", "3", "5", "0"
DB 11, "4", "2", "8", "1", "4", "7", "3", "7", "0", "1"
DB 11, "8", "4", "6", "3", "8", "4", "7", "4", "1", "2"
EAX2BCD PROC ;The routine prints ASCII
MOV DWORD PTR INTRES, "0000" ;decimal content of EAX register
MOV DWORD PTR INTRES + 4, "0000" ;via BCD arithmetics on the
MOV WORD PTR INTRES + 8, "00" ;ADDIT array.
LEA EBX, WC ;No specific instruction is used.
LEA EDI, 9 + INTRES ;Author: Andrija Radovic, ©2011
EAX2BCD_DO:
MOVZX ECX, BYTE PTR [EBX]
SHR EAX, 1
JNC EAX2BCD_END_IF
INC EBX
DEC ECX
MOV DH, "0"
LEA EDI, 9 + INTRES
SUB EDI, ECX
EAX2BCD_DO1:
MOVZX EDX, DH
ADD DL, BYTE PTR [EDI + ECX]
ADD DL, BYTE PTR [EBX]
MOV DX, WORD PTR [2 * EDX + OFFSET ADDIT - 6 * "0"]
MOV BYTE PTR [EDI + ECX], DL
INC EBX
LOOPD EAX2BCD_DO1
EAX2BCD_END_IF:
ADD EBX, ECX
TEST EAX, EAX
JNZ EAX2BCD_DO
INC EDI
XOR EDX, EDX
CMP BYTE PTR [EDI], "0"
SETZ DL
ADD EDX, EDI
CMP EDX, OFFSET RESULT
SETZ CL
SUB EDX, ECX
INVOKE StdOut, EDX
RET
EAX2BCD ENDP
;---------------------------------------------------------------------------------------------------------
BASES DD 1000000000, 100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1
NUMDIG DB 9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2
DB 1, 1, 1, 0, 0, 0
EAX2SUB PROC ;The routine prints ASCII
MOV DX, "00" - 101H ;decimal content of EAX register
BSR EDI, EAX ;by repetitive subtractions
CMOVZ EDI, EAX ;with coefficients pulled from
MOVZX EDI, BYTE PTR [EDI + OFFSET NUMDIG] ;array with reduced number
PUSH EDI ;of iterations.
EAX2SUB_DO1: ;Author: Andrija Radovic, ©2011
MOV EBX, DWORD PTR [4 * EDI + OFFSET BASES]
EBX_DO2:
INC DL
SUB EAX, EBX
JNC EBX_DO2
MOV BYTE PTR [EDI + OFFSET INTRES], DL
INC EDI
MOV DL, DH
ADD EAX, EBX
CMP EDI, 10 ;Continue through the units digit so that trailing zeros are stored too.
JNZ EAX2SUB_DO1
POP EDI
CMP BYTE PTR [EDI + OFFSET INTRES], "0"
SETZ AL
LEA EDI, [EDI + EAX + OFFSET INTRES]
CMP EDI, OFFSET INTRES + 10
SETZ AL
SUB EDI, EAX
INVOKE StdOut, EDI
RET
EAX2SUB ENDP
;---------------------------------------------------------------------------------------------------------
ESI2ASC PROC ;The routine prints ASCII
LEA EDI, INTRES ;decimal content of ESI register
MOV AX, "00" - 101H ;by repetitive subtractions
PUSH DWORD PTR 0 ;with coefficients pulled from
PUSH DWORD PTR 1 ;stack without counter.
PUSH DWORD PTR 10 ;Author: Andrija Radovic, ©2011
PUSH DWORD PTR 100
PUSH DWORD PTR 1000
PUSH DWORD PTR 10000
PUSH DWORD PTR 100000
PUSH DWORD PTR 1000000
PUSH DWORD PTR 10000000
PUSH DWORD PTR 100000000
MOV EBX, 1000000000
CLD
EAS_DO1:
EAS_DO2:
INC AX
SUB ESI, EBX
JNC EAS_DO2
ADD ESI, EBX
STOSB
MOV AL, AH
POP EBX
TEST EBX, EBX
JNZ EAS_DO1
MOV AL, "0"
MOV ECX, 10
LEA EDI, INTRES
REPE SCASB
DEC EDI
INVOKE StdOut, EDI
RET
ESI2ASC ENDP
;---------------------------------------------------------------------------------------------------------
ASCII_TABLE DW "00", "10", "20", "30", "40", "50", "60", "70", "80", "90", "00", "00"
DW "00", "00", "00", "00", "01", "11", "21", "31", "41", "51", "61", "71"
DW "81", "91", "00", "00", "00", "00", "00", "00", "02", "12", "22", "32"
DW "42", "52", "62", "72", "82", "92", "00", "00", "00", "00", "00", "00"
DW "03", "13", "23", "33", "43", "53", "63", "73", "83", "93", "00", "00"
DW "00", "00", "00", "00", "04", "14", "24", "34", "44", "54", "64", "74"
DW "84", "94", "00", "00", "00", "00", "00", "00", "05", "15", "25", "35"
DW "45", "55", "65", "75", "85", "95", "00", "00", "00", "00", "00", "00"
DW "06", "16", "26", "36", "46", "56", "66", "76", "86", "96", "00", "00"
DW "00", "00", "00", "00", "07", "17", "27", "37", "47", "57", "67", "77"
DW "87", "97", "00", "00", "00", "00", "00", "00", "08", "18", "28", "38"
DW "48", "58", "68", "78", "88", "98", "00", "00", "00", "00", "00", "00"
DW "09", "19", "29", "39", "49", "59", "69", "79", "89", "99", "00", "00"
BCRESUL DT 0
EAX2AST PROC ;The routine prints ASCII
MOV DWORD PTR BCRESUL, EAX ;decimal content of EAX register
MOV DWORD PTR BCRESUL + 4, 0 ;by usage of coprocessor FBSTP
FILD QWORD PTR BCRESUL ;instruction which converts
FBSTP BCRESUL ;number into the packed BCD.
LEA EDI, INTRES ;Unpacking of BCD is done via
MOV ECX, 5 ;the table of unpacked values.
CLD ;Author: Andrija Radovic, ©2011
EAX2AST_DO:
MOVZX EAX, BYTE PTR [ECX + OFFSET BCRESUL - 1]
MOV AX, WORD PTR [2 * EAX + OFFSET ASCII_TABLE]
STOSW
LOOPD EAX2AST_DO
MOV AL, "0"
MOV ECX, 10
LEA EDI, INTRES
REPE SCASB
DEC EDI
INVOKE StdOut, EDI
RET
EAX2AST ENDP
;---------------------------------------------------------------------------------------------------------
BCDEX DT 0
EAX2AFL PROC ;The routine prints ASCII
STD ;decimal content of EAX register
MOV DWORD PTR BCDEX, EAX ;by usage of coprocessor FBSTP
MOV DWORD PTR BCDEX + 4, 0 ;instruction which converts
FILD QWORD PTR BCDEX ;number into the packed BCD.
FBSTP BCDEX ;Unpacking of BCD is done via
MOV EDX, DWORD PTR BCDEX ;SHR and SHRD instructions.
MOVZX EBX, WORD PTR BCDEX + 4 ;Author: Andrija Radovic, ©2011
LEA EDI, 9 + INTRES
EAX2AFL_DO:
MOV EAX, EDX
AND EAX, 15
OR AL, "0"
STOSB
SHRD EDX, EBX, 4
SHR EBX, 4
MOV EAX, EDX
OR EAX, EBX
JNZ EAX2AFL_DO
LEA EDX, [EDI + 1]
CLD
INVOKE StdOut, EDX
RET
EAX2AFL ENDP
;---------------------------------------------------------------------------------------------------------
EAX2DEC PROC ;This routine prints ASCII
LEA ECX, INTRES + 9 ;decimal content of EAX register
MOV EBX, 10 ;by its repetitive dividing by
EAX2DEC_DO: ;10 using DIV instruction that
XOR EDX, EDX ;simultaneously yields result
DIV EBX ;and remainder which denotes
OR EDX, "0" ;current decimal digit.
MOV BYTE PTR [ECX], DL ;Author: Andrija Radovic, ©2011
TEST EAX, EAX
LOOPNZD EAX2DEC_DO
LEA EDX, [ECX + 1]
INVOKE StdOut, EDX
RET
EAX2DEC ENDP
;---------------------------------------------------------------------------------------------------------
EAX2ASC PROC ;This routine prints ASCII
LEA ECX, INTRES + 9 ;decimal content of EAX register
MOV EDI, 858993459 ;by its repetitive dividing by
MOV EBX, EAX ;10 using MUL instruction to
AX2ASC_DO: ;divide by 10 via multiplication
LEA EAX, [EBX + 1] ;with the appropriate constant.
MUL EDI ;Author: Andrija Radovic, ©2011
SHR EDX, 1
LEA EAX, [4 * EDX + EDX]
NEG EAX
LEA EAX, [EBX + 2 * EAX + "0"]
MOV BYTE PTR [ECX], AL
MOV EBX, EDX
LOOPNZD AX2ASC_DO
LEA EDX, [ECX + 1]
INVOKE StdOut, EDX
RET
EAX2ASC ENDP
;---------------------------------------------------------------------------------------------------------
LB DB 0F0H,0F1H, 1,3, 0F2H,4, 0F3H,0F4H, 2,7, 0F5H,0F6H, 6,8, 0F7H,9, 0F8H,0F9H
D DD 0,1,2,3,4,5,6,7,8,9
DD 00,10,20,30,40,50,60,70,80,90
DD 000,100,200,300,400,500,600,700,800,900
DD 0000,1000,2000,3000,4000,5000,6000,7000,8000,9000
DD 00000,10000,20000,30000,40000,50000,60000,70000,80000,90000
DD 000000,100000,200000,300000,400000,500000,600000,700000,800000,900000
DD 0000000,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000
DD 00000000,10000000,20000000,30000000,40000000,50000000,60000000,70000000,80000000,90000000
DD 000000000,100000000,200000000,300000000,400000000,500000000,600000000,700000000,800000000
DD 900000000
DD 0000000000,1000000000,2000000000,3000000000,4000000000,4294967295,4294967295,4294967295
DD 4294967295,4294967295
DD 1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000
EAX2BTR PROC ;This routine prints ASCII
CLD ;decimal content of EAX register
MOV ESI, 400 ;by binary tree digits
MOV EBX, EAX ;extraction.
MOV EAX, 5 ;No specific instruction is used.
EAX2BTR_DO1: ;Author: Andrija Radovic, ©2011
CMP EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]
SBB EDX, EDX
MOV AL, BYTE PTR [2 * EAX + EDX + (OFFSET LB - 1)]
TEST AL, AL
JNS EAX2BTR_DO1
AND AX, 15
LEA EDI, 9 + INTRES
SUB EDI, EAX
PUSH EDI
LEA ESI, [EAX + 4 * EAX]
MOV EAX, 5
SHL ESI, 3
EAX2BTR_DO3:
CMP EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]
SBB EDX, EDX
MOV AL, BYTE PTR [2 * EAX + EDX + (OFFSET LB - 1)]
TEST AL, AL
JNS EAX2BTR_DO3
AND EAX, 15
SUB EBX, DWORD PTR [4 * EAX + ESI + OFFSET D]
OR AL, "0"
STOSB
MOV EAX, 5
SUB ESI, 40
JNC EAX2BTR_DO3
POP EDX
INVOKE StdOut, EDX
RET
EAX2BTR ENDP
;---------------------------------------------------------------------------------------------------------
LC DW 0FF00H,0FF01H, 1,3, 0FF02H,4, 0FF03H,0FF04H, 2,7, 0FF05H,0FF06H, 6,8, 0FF07H,9
DW 0FF08H,0FF09H
DC DD 0,1,2,3,4,5,6,7,8,9
DD 00,10,20,30,40,50,60,70,80,90
DD 000,100,200,300,400,500,600,700,800,900
DD 0000,1000,2000,3000,4000,5000,6000,7000,8000,9000
DD 00000,10000,20000,30000,40000,50000,60000,70000,80000,90000
DD 000000,100000,200000,300000,400000,500000,600000,700000,800000,900000
DD 0000000,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000
DD 00000000,10000000,20000000,30000000,40000000,50000000,60000000,70000000,80000000,90000000
DD 000000000,100000000,200000000,300000000,400000000,500000000,600000000,700000000,800000000
DD 900000000
DD 0000000000,1000000000,2000000000,3000000000,4000000000,4294967295,4294967295,4294967295
DD 4294967295,4294967295
DD 1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000
EAXCMOV PROC ;This routine prints ASCII
CLD ;decimal content of EAX register
MOV ESI, 400 ;by binary tree digits
MOV EBX, EAX ;extraction.
MOV EAX, 5 ;CMOV instruction is used.
EAXCMOV_DO1: ;Author: Andrija Radovic, ©2011
CMP EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]
; CMOVC AX, WORD PTR [4 * EAX + (OFFSET LC - 4)] ;These two instructions
; CMOVNC AX, WORD PTR [4 * EAX + (OFFSET LC - 2)] ;work on some processors...
CMOVC DX, WORD PTR [4 * EAX + (OFFSET LC - 4)] ;Malfunction of x86 CMOV
CMOVNC DX, WORD PTR [4 * EAX + (OFFSET LC - 2)] ;requires these ones instead
MOV AX, DX ;of above two instructions.
TEST AX, AX
JNS EAXCMOV_DO1
MOVZX EAX, AL
LEA EDI, 9 + INTRES
SUB EDI, EAX
PUSH EDI
LEA ESI, [EAX + 4 * EAX]
MOV EAX, 5
SHL ESI, 3
EAXCMOV_DO3:
CMP EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]
; CMOVC AX, WORD PTR [4 * EAX + (OFFSET LC - 4)] ;These two instructions
; CMOVNC AX, WORD PTR [4 * EAX + (OFFSET LC - 2)] ;work on some processors...
CMOVC DX, WORD PTR [4 * EAX + (OFFSET LC - 4)] ;Malfunction of x86 CMOV
CMOVNC DX, WORD PTR [4 * EAX + (OFFSET LC - 2)] ;requires these ones instead
MOV AX, DX ;of above two instructions.
TEST AX, AX
JNS EAXCMOV_DO3
MOVZX EAX, AL
SUB EBX, DWORD PTR [4 * EAX + ESI + OFFSET DC]
OR AL, "0"
STOSB
MOV EAX, 5
SUB ESI, 40
JNC EAXCMOV_DO3
POP EDX
INVOKE StdOut, EDX
RET
EAXCMOV ENDP
;---------------------------------------------------------------------------------------------------------
NumFormat DB "%u",0
BufferW DB 32 DUP(0)
EAX2WIN PROC ;This routine prints ASCII EAX.
INVOKE wsprintf, ADDR BufferW, ADDR NumFormat, EAX ;It utilizes native Windows NT
INVOKE StdOut, ADDR BufferW ;subroutine.
RET
EAX2WIN ENDP
;---------------------------------------------------------------------------------------------------------
EAX_SQR PROC ;EAX = SQR(EAX), square rooting
XOR ESI, ESI ;routine.
XOR EDX, EDX ;EDX is remainder.
MOV EBX, 1073741824 ;Author: Andrija Radovic, ©2011
SQRT_DO:
LEA EDI, [EBX + ESI]
ADD EDI, EDX
SHR EDX, 1
CMP EAX, EDI
JC SQRT_END_IF
MOV ESI, EDI
ADD EDX, EBX
SQRT_END_IF:
SHR EBX, 2
JNZ SQRT_DO
SUB EAX, ESI
XCHG EAX, EDX
RET
EAX_SQR ENDP
;---------------------------------------------------------------------------------------------------------
END ASCII
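The idea behind the EDX2DEC and EAX2BCD routines in the listing, adding the decimal digits of 2^k into an unpacked BCD accumulator for every set bit k, can be modelled with the short Python sketch below. It is an illustration under assumptions: the names are hypothetical, and the weight table is generated with Python's own string conversion instead of being typed in by hand like the D00 to D31 tables.

```python
# Decimal digits of 2**k, least significant digit first, padded to 10 digits.
# The listing stores the same information in the hand-written tables D00..D31.
WEIGHTS = [[int(ch) for ch in reversed(str(1 << k).zfill(10))] for k in range(32)]


def to_decimal_by_bcd_addition(n: int) -> str:
    """Sum the decimal weights of the set bits, digit by digit with carry."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    digits = [0] * 10                 # unpacked BCD accumulator, units first
    for k in range(32):
        if (n >> k) & 1:              # bit k is set: add the digits of 2**k
            carry = 0
            for i in range(10):
                total = digits[i] + WEIGHTS[k][i] + carry
                carry, digits[i] = divmod(total, 10)   # decimal adjust, like AAA
    text = "".join(str(d) for d in reversed(digits)).lstrip("0")
    return text or "0"                # an all-zero result still prints one digit
```

The assembly routines shorten the inner loop by storing only the significant digits of each weight, whereas this sketch always walks all ten positions for clarity.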
The above routines should be assembled from the command line rather than from an assembler IDE, because most IDEs do not offer full control of the parameters passed to the assembler and the linker, and such control is exactly what linking with the section's read-only property removed requires.
Therefore they should be assembled manually from the command line:
C:\masm32\bin\ML.EXE /c /coff /Cp /nologo /I"\Masm32\Include" ASCIIwin.asm
It should then be linked with the following single line. It may appear wrapped over two lines of text, but in the console it must be entered as one uninterrupted line:
C:\masm32\bin\LINK.EXE /SUBSYSTEM:CONSOLE /RELEASE /LIBPATH:"\Masm32\Lib" /section:".text",ERW ASCIIwin.obj
The above two lines produce an EXE program with a writable code section. The benefit of such hand-crafted linking is the ability to write self-modifying code. Although it is highly discouraged by the MASM manual, this is something that should not be instantly rejected! Actually, all recommendations that allegedly improve the prettiness of the code by imposing real limitations on programmers should be immediately rejected! Self-modifying code is very useful and may sometimes shrink the code a lot, and rejecting such an option just because somebody found it not fancy enough seems quite the wrong course of action. The second benefit, the ability to keep all the pieces of a routine tightly together, is also very important for quick and efficient coding. Both require the section with the code to be writable.
CONCLUSION
All the presented methods suffer from some sort of disadvantage that lengthens their execution. The misery of the task consists in the fact that a set of 2 elements has to be translated into a set of only 10 elements with up to 10 decimal positions (a 32-bit integer has 10 decimal positions). This exceptionally small number of decimal digits and decimal positions really limits the efficiency of all the applied optimization schemes and highly challenges coding skills, because the number of instructions in the optimization schemes reaches the number of instructions performed in the main loops.
Although the algorithm EDX2DEC, based on the AAA instruction, and EAX2BTR, based on binary tree search, are incomprehensible at first glance, this gives them a special charm and makes them quite fashionable. Despite the fact that EAX2BTR utilizes the theoretically very promising binary tree search, it is not much better than much simpler algorithms such as EAX2SUB, which is based on repetitive subtractions. The one non-trivial algorithm with performance comparable to the simplest ones is the EAX2ASC routine, which divides through multiplication with only 8 instructions within the main loop; it should be quite fast on new processors whose MUL instruction completes in just a couple of clock cycles. The algorithm EAX2DEC, based on the DIV instruction, is the absolute champion of shortness, with only five instructions within the main loop, but the DIV instruction may be very slow depending on the particular processor. The situation is quite the reverse when these algorithms are to be transferred to VHDL hardware: there the binary tree search and the BCD (AAA instruction) based algorithms are the most promising ones, offering tremendous opportunities for hardware parallelization, and they can even execute in a single clock cycle.
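The division through multiplication used by EAX2ASC can be checked with a small Python model. The constant 858993459 (0x33333333) is the one loaded into EDI in the listing, and the shift by 33 corresponds to taking the upper half of the 64-bit product (EDX) and shifting it right once more. The function names are assumptions made for this sketch.

```python
MAGIC = 858993459  # 0x33333333 == (2**32 - 1) // 5, as loaded into EDI in EAX2ASC


def divide_by_10(n: int) -> int:
    """Quotient n // 10 obtained with one multiplication and one shift."""
    return ((n + 1) * MAGIC) >> 33


def to_decimal_by_multiplication(n: int) -> str:
    """Convert an unsigned 32-bit value to decimal without a division."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    chars = []
    while True:
        quotient = divide_by_10(n)
        chars.append(chr(ord("0") + n - 10 * quotient))  # remainder is the digit
        n = quotient
        if n == 0:
            break
    return "".join(reversed(chars))  # digits were produced units first
```

Python integers do not wrap, so n + 1 is exact here. In a 32-bit register the increment wraps to zero for the single input 4294967295, so a register-level version would need to treat that one value separately.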
P. S.
Intel explained in an e-mail that the behavior of the CMOV instruction noted in the EAXCMOV routine is not actually a bug but rather a feature, because the instruction works just as they intended it to work. This leaves a bitter taste, because it raises the question of what purpose is served by this abundance of instructions of all sorts, whose only purpose is to facilitate optimization, if they cannot fulfill that mission just because they operate, let us say, in peculiar ways. It seems that some kind of "Application Notes", which have been a usual thing in the electronics industry for years, really should become a common thing in the microprocessor industry too, and that Intel should issue a book demonstrating precisely how every single instruction is meant to be used. Without that, many potentially mighty instructions will remain unnoticed, which is a direct loss for customers and developers. It is interesting that AMD processors exhibit identical behavior, which suggests that the designers mutually exchange microcode. I had no chance to test other manufacturers' processors, but there is a reasonable suspicion that they all operate in quite the same way.
Author:
Dipl.-Ing. Andrija Radović
All Rights Reserved, ©2011
Attention:
These routines are freeware for personal use only, and as such they can be freely distributed. The author will not be responsible for any kind of loss caused by the usage of any of these routines. These routines should not be distributed for commercial purposes. The name of the author must stay visible in the program as a contributor unless there is a different agreement. If you charge money for your application, or use the application in conjunction with a product you sell, then you require permission to use any of these routines, whether unmodified or slightly modified.