Adriana Vlad, Adrian Mitrea * Estimating Conditional Probabilities and Digram Statistical Structure in Printed Romanian
Digram (i,j) | Probability (j/i) | Signal-to-noise ratio | Relative error | Probability (i,j) | Cumulated relative error | *) Probability (i,j) | **) Probability (i,j) |
AL | 0.0959 | 24.01 | 0.0816 | 0.0079 | 0.1074 | 0.0070 | 0.0065 |
AR | 0.1606 | 32.23 | 0.0608 | 0.0132 | 0.0861 | 0.0121 | 0.0121 |
AT | 0.1105 | 25.97 | 0.0755 | 0.0091 | 0.1011 | 0.0089 | 0.0094 |
A- | 0.2435 | 41.80 | 0.0469 | 0.0200 | 0.0718 | 0.0218 | 0.0218 |
CA | 0.1451 | 21.66 | 0.0905 | 0.0065 | 0.1263 | 0.0065 | 0.0063 |
CE | 0.1784 | 24.50 | 0.0800 | 0.0080 | 0.1154 | 0.0081 | 0.0080 |
CU | 0.1487 | 21.97 | 0.0892 | 0.0067 | 0.1249 | 0.0062 | 0.0064 |
DE | 0.4468 | 39.27 | 0.0499 | 0.0123 | 0.0943 | 0.0124 | 0.0124 |
DI | 0.1776 | 20.30 | 0.0965 | 0.0049 | 0.1429 | 0.0049 | 0.0046 |
EA | 0.0960 | 26.67 | 0.0735 | 0.0099 | 0.0961 | 0.0093 | 0.0095 |
EL | 0.0615 | 20.95 | 0.0935 | 0.0063 | 0.1166 | 0.0061 | 0.0062 |
EN | 0.0606 | 20.79 | 0.0943 | 0.0062 | 0.1174 | 0.0063 | 0.0060 |
ER | 0.0890 | 25.58 | 0.0766 | 0.0091 | 0.0993 | 0.0089 | 0.0091 |
ES | 0.0579 | 20.29 | 0.0966 | 0.0060 | 0.1197 | 0.0055 | 0.0057 |
E- | 0.3863 | 64.94 | 0.0302 | 0.0397 | 0.0519 | 0.0399 | 0.0403 |
IN | 0.1402 | 30.81 | 0.0636 | 0.0125 | 0.0879 | 0.0119 | 0.0118 |
I- | 0.2826 | 47.89 | 0.0409 | 0.0252 | 0.0647 | 0.0262 | 0.0255 |
LE | 0.2003 | 25.77 | 0.0761 | 0.0078 | 0.1143 | 0.0085 | 0.0077 |
L- | 0.2139 | 26.86 | 0.0730 | 0.0083 | 0.1111 | 0.0089 | 0.0081 |
NE | 0.1175 | 21.84 | 0.0897 | 0.0064 | 0.1221 | 0.0063 | 0.0063 |
NT | 0.1270 | 22.83 | 0.0859 | 0.0069 | 0.1181 | 0.0076 | 0.0075 |
N- | 0.1956 | 29.52 | 0.0664 | 0.0107 | 0.0981 | 0.0100 | 0.0100 |
OR | 0.2347 | 25.99 | 0.0754 | 0.0081 | 0.1158 | 0.0084 | 0.0083 |
PE | 0.2244 | 22.37 | 0.0876 | 0.0059 | 0.1347 | 0.0056 | 0.0053 |
PR | 0.2412 | 23.44 | 0.0836 | 0.0064 | 0.1305 | 0.0061 | 0.0061 |
RA | 0.1155 | 23.20 | 0.0845 | 0.0072 | 0.1144 | 0.0065 | 0.0056 |
RE | 0.2674 | 38.78 | 0.0505 | 0.0167 | 0.0795 | 0.0172 | 0.0172 |
RI | 0.1694 | 28.99 | 0.0676 | 0.0106 | 0.0971 | 0.0113 | 0.0114 |
R- | 0.0946 | 20.75 | 0.0944 | 0.0059 | 0.1246 | 0.0063 | 0.0069 |
SE | 0.1592 | 21.01 | 0.0933 | 0.0059 | 0.1330 | 0.0060 | 0.0062 |
ST | 0.2505 | 27.91 | 0.0702 | 0.0093 | 0.1091 | 0.0079 | 0.0083 |
TA | 0.1194 | 21.68 | 0.0904 | 0.0064 | 0.1232 | 0.0058 | 0.0057 |
TE | 0.2475 | 33.77 | 0.0580 | 0.0132 | 0.0899 | 0.0128 | 0.0126 |
TR | 0.1206 | 21.80 | 0.0899 | 0.0064 | 0.1227 | 0.0063 | 0.0059 |
T- | 0.1422 | 23.97 | 0.0818 | 0.0076 | 0.1143 | 0.0083 | 0.0088 |
UL | 0.1822 | 26.88 | 0.0729 | 0.0092 | 0.1061 | 0.0091 | 0.0097 |
UN | 0.1357 | 22.56 | 0.0869 | 0.0069 | 0.1205 | 0.0071 | 0.0071 |
U- | 0.1924 | 27.80 | 0.0705 | 0.0097 | 0.1036 | 0.0089 | 0.0088 |
Ă- | 0.6120 | 54.06 | 0.0363 | 0.0173 | 0.0796 | 0.0177 | 0.0179 |
ÂN | 0.6142 | 24.63 | 0.0796 | 0.0033 | 0.1848 | 0.0033 | 0.0029 |
ÎN | 0.9031 | 77.24 | 0.0254 | 0.0093 | 0.0968 | 0.0090 | 0.0091 |
ȘI | 0.6578 | 39.87 | 0.0492 | 0.0066 | 0.1231 | 0.0070 | 0.0065 |
ȚI | 0.6258 | 33.02 | 0.0594 | 0.0059 | 0.1370 | 0.0060 | 0.0061 |
-A | 0.1061 | 35.06 | 0.0559 | 0.0171 | 0.0731 | 0.0166 | 0.0166 |
-C | 0.1160 | 36.86 | 0.0532 | 0.0186 | 0.0703 | 0.0190 | 0.0194 |
-D | 0.1132 | 36.35 | 0.0539 | 0.0182 | 0.0711 | 0.0168 | 0.0168 |
-L | 0.0422 | 21.36 | 0.0918 | 0.0068 | 0.1096 | 0.0068 | 0.0067 |
-M | 0.0425 | 21.44 | 0.0914 | 0.0068 | 0.1092 | 0.0076 | 0.0080 |
-N | 0.0384 | 20.34 | 0.0963 | 0.0062 | 0.1142 | 0.0063 | 0.0065 |
-P | 0.0889 | 31.79 | 0.0616 | 0.0143 | 0.0790 | 0.0140 | 0.0132 |
-S | 0.0937 | 32.71 | 0.0599 | 0.0151 | 0.0772 | 0.0145 | 0.0139 |
-Î | 0.0612 | 25.99 | 0.0754 | 0.0098 | 0.0930 | 0.0100 | 0.0104 |
-Ș | 0.0480 | 22.85 | 0.0858 | 0.0077 | 0.1035 | 0.0060 | 0.0059 |
*) Calculated as the ratio between the number of occurrences of the digram and the total number of digrams in the whole X text.
**) Calculated on a periodic sample drawn from the whole X text with a step of 200 letters.
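The two estimators in the footnotes and the error columns can be illustrated with a short Python sketch. This is not the authors' code, and several points are assumptions. The periodic sample of footnote **) is read as keeping only the digram that starts at every 200th letter. The signal-to-noise ratio is taken as the estimate divided by its binomial standard deviation, and the relative error as 1.96/SNR (95% confidence). This matches the table, e.g. 1.96/24.01 ≈ 0.0816 for AL. The cumulated relative error is modeled as (1 + e_i)(1 + e_(j/i)) - 1 for P(i,j) = P(i)·P(j/i). That form appears consistent with the rows, but it relies on the relative errors e_i of P(i), which are not listed on this page. All function names are hypothetical.

```python
import math
from collections import Counter

Z_95 = 1.96  # assumption: two-sided 95% normal quantile


def digram_counts_whole(text):
    """Footnote *): count every overlapping digram (text[k], text[k+1])."""
    return Counter(zip(text, text[1:]))


def digram_counts_periodic(text, step=200):
    """Footnote **), as interpreted here (assumption): keep only the digrams
    starting at positions 0, step, 2*step, ..."""
    return Counter((text[k], text[k + 1]) for k in range(0, len(text) - 1, step))


def joint_probability(counts, i, j):
    """P(i,j): occurrences of digram ij divided by the total digram count."""
    return counts[(i, j)] / sum(counts.values())


def conditional_probability(counts, i, j):
    """Return P(j/i) and n_i, the number of digrams starting with letter i."""
    n_i = sum(c for (a, _), c in counts.items() if a == i)
    return counts[(i, j)] / n_i, n_i


def snr(p, n):
    """Estimate p divided by its binomial standard deviation sqrt(p(1-p)/n)."""
    return p / math.sqrt(p * (1 - p) / n)


def relative_error(snr_value, z=Z_95):
    """Half-width of the z-level confidence interval, relative to the estimate."""
    return z / snr_value


def cumulated_relative_error(e_i, e_j_given_i):
    """Worst-case relative error of the product P(i) * P(j/i) (assumed model)."""
    return (1 + e_i) * (1 + e_j_given_i) - 1


if __name__ == "__main__":
    counts = digram_counts_whole("ABAB")              # digrams: AB, BA, AB
    print(joint_probability(counts, "A", "B"))        # 2/3
    print(conditional_probability(counts, "A", "B"))  # (1.0, 2)
    print(round(relative_error(24.01), 4))            # 0.0816, as in row AL
```

On a real corpus, comparing `digram_counts_whole` with `digram_counts_periodic` mirrors the comparison between the *) and **) columns: the periodic sample uses far fewer, more widely spaced digrams.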