UTF-8 encoding
Unicode supports almost all existing character sets. In most cases the preferred form for representing the Unicode character set is the UTF-8 encoding. It offers ASCII compatibility, resistance to data corruption, efficiency, and ease of processing. But first things first.
Encoding forms
Computers handle numbers not only as abstract mathematical objects but also as combinations of fixed-size units for storing and processing data: bytes and 32-bit words. An encoding standard has to take this into account when it defines how character numbers are represented.
In computer systems, integers are stored in memory cells of 8 bits (1 byte), 16 bits, or 32 bits. Each Unicode encoding form defines which sequence of memory cells represents the integer that corresponds to a particular character. The standard provides three character encoding forms, built on 8-, 16-, and 32-bit units. They are accordingly known as UTF-8, UTF-16, and UTF-32. The name UTF stands for Unicode Transformation Format. Each of the three forms is an equally legitimate way of representing Unicode characters, and each has advantages in different applications.
All three encoding forms can represent every character of the Unicode standard. They are therefore fully interoperable, and different solutions may use different encoding forms for their own reasons. Text in any one of the encodings can be converted efficiently to either of the other two without loss of data.
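The lossless conversion between the three forms can be checked with a short Python sketch. The sample string and the variable names are illustrative assumptions, not part of the standard; the big-endian codec variants are used only so that no byte order mark is added.

```python
# Round-trip one string through all three Unicode encoding forms.
text = "Aé€😀"  # sample characters chosen for illustration

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # big-endian, no BOM
utf32 = text.encode("utf-32-be")  # big-endian, no BOM

# Convert UTF-8 -> UTF-16 -> UTF-32 -> UTF-8 and compare with the start.
via_utf16 = utf8.decode("utf-8").encode("utf-16-be")
via_utf32 = via_utf16.decode("utf-16-be").encode("utf-32-be")
back_to_utf8 = via_utf32.decode("utf-32-be").encode("utf-8")

print(back_to_utf8 == utf8)
```

If the final comparison holds, nothing was lost along the way, which is what the interoperability claim above amounts to.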
Non-overlap principle
Each of the Unicode encoding forms was designed with non-overlap in mind. Consider, by contrast, Windows-932, in which characters are sequences of one or two bytes. The length of a sequence depends on its first byte, so the values of lead bytes of two-byte sequences and the values of single bytes are disjoint. However, the values of single bytes and of trail bytes can overlap. This means, for example, that a search for the character D (code 44) can mistakenly find it as the second half of the two-byte sequence for the character "Д" (code 84 44). To work out which interpretation is correct, the program has to take the preceding bytes into account.
The situation is worse when lead-byte and trail-byte values coincide. To resolve the ambiguity, the program then has to scan backwards until it reaches the start of the text or an unambiguous code sequence. This is inefficient, and it is also not protected against errors, since a single bad byte can make a whole run of text unreadable.
The Unicode transformation formats avoid this problem because the values of lead units, trail units, and single code units never coincide. This guarantees that searching and comparing Unicode text never produces false results caused by matches against different parts of a character's code. Observing the non-overlap principle is what sets these encoding forms apart from other East Asian multi-byte encodings.
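The false match described above can be reproduced in a few lines of Python. This is a minimal sketch that assumes Python's built-in cp932 codec as a stand-in for Windows-932; the variable names are illustrative.

```python
needle = "D"
haystack = "Д"  # CYRILLIC CAPITAL LETTER DE, U+0414; contains no Latin D

# Windows-932: the trail byte of "Д" has the same value as the single byte "D".
cp932_hay = haystack.encode("cp932")
cp932_needle = needle.encode("cp932")
false_match = cp932_needle in cp932_hay

# UTF-8: lead, trail, and single-byte values are disjoint, so no false match.
utf8_hay = haystack.encode("utf-8")
utf8_needle = needle.encode("utf-8")
utf8_match = utf8_needle in utf8_hay

print(cp932_hay.hex(" "), false_match)
print(utf8_hay.hex(" "), utf8_match)
```

A plain byte search is therefore safe on UTF-8 data, while on Windows-932 data it needs extra logic to track where each character starts.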
Another aspect of non-overlap in the Unicode encodings is that the boundaries of every character are clearly defined. This removes the need to scan an indefinite number of preceding characters. This property is sometimes called self-synchronization. Corrupting one code unit corrupts only one character, and the surrounding characters stay intact. In the 8-bit transformation format, if a pointer lands on a byte of the form 10xxxxxx (in binary), finding the start of the character takes at most three steps backwards.
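The backward scan can be sketched as follows. The function name char_start is hypothetical, and the sketch assumes well-formed UTF-8 input; it does not validate the data.

```python
def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character covering data[index]."""
    steps = 0
    # Continuation bytes have the bit pattern 10xxxxxx; a well-formed
    # character has at most three of them, so three steps back suffice.
    while index > 0 and (data[index] & 0xC0) == 0x80 and steps < 3:
        index -= 1
        steps += 1
    return index


data = "a€😀".encode("utf-8")  # 1-byte, 3-byte, and 4-byte characters
print(char_start(data, 3))  # last byte of the euro sign
print(char_start(data, 7))  # last byte of the emoji
```

Because the loop is bounded by three steps, the cost of resynchronizing is constant no matter how long the text is.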
Conformance
The Unicode Consortium fully endorses all three encoding forms. It is important not to set UTF-8 against Unicode, because all of the transformation formats are equally legitimate embodiments of the character encoding defined by the Unicode standard.
Byte orientation
Representing a character in UTF-32 takes a single 32-bit code unit whose value equals the Unicode code point. UTF-16 takes one or two 16-bit units, and UTF-8 takes from one to four bytes.
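A small Python sketch makes the unit counts concrete. The helper name code_units and the sample characters are assumptions for illustration.

```python
def code_units(ch: str) -> tuple:
    """Return how many code units ch needs in (UTF-8, UTF-16, UTF-32)."""
    utf8_units = len(ch.encode("utf-8"))            # 8-bit units
    utf16_units = len(ch.encode("utf-16-be")) // 2  # 16-bit units
    utf32_units = len(ch.encode("utf-32-be")) // 4  # 32-bit units
    return utf8_units, utf16_units, utf32_units


for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
```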
UTF-8 was designed for compatibility with byte-oriented, ASCII-based systems. Much existing software and established information technology practice has long relied on representing characters as sequences of bytes. Many protocols depend on the invariance of ASCII values and either use or avoid particular control characters. A simple way to adapt such systems to Unicode is to use an 8-bit encoding that represents Unicode characters while leaving every ASCII character and control character unchanged. UTF-8 is such an encoding.
Variable length
UTF-8 is a variable-length encoding built from 8-bit code units whose high bits indicate which part of a sequence each byte belongs to. One range of values is reserved for the first element of a code sequence and another for the elements that follow. This is what gives the encoding its non-overlap property.
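The role of a byte can be read directly from its high bits, as in this minimal Python sketch. The function name byte_role is hypothetical, and the sketch only looks at bit patterns; it does not reject byte values that never occur in valid UTF-8.

```python
def byte_role(b: int) -> str:
    """Classify one byte of UTF-8 data by its high bits."""
    if (b & 0x80) == 0x00:
        return "single"        # 0xxxxxxx: a complete one-byte character
    if (b & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx: second or later byte of a sequence
    return "lead"              # 11xxxxxx: first byte of a multi-byte sequence


roles = [byte_role(b) for b in "aé".encode("utf-8")]
print(roles)
```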
ASCII
UTF-8 preserves the ASCII codes (0x00-0x7F) exactly. Unicode characters U+0000-U+007F are converted to the single bytes 0x00-0x7F in UTF-8 and are therefore indistinguishable from ASCII. Moreover, to avoid ambiguity, the values 0x00-0x7F are never used in any byte of the representation of any other Unicode character. Characters in the range U+0080-U+07FF, which covers most non-ideographic alphabets outside ASCII, are encoded as two-byte sequences. Characters in the range U+0800-U+FFFF are represented by three bytes, and supplementary code points above U+FFFF require four bytes.
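The ranges above translate into a compact encoder. The following Python sketch is an illustration only; the function name encode_code_point is hypothetical, and in practice the built-in str.encode("utf-8") does this job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8, following the ranges above."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:  # ASCII: one byte, value unchanged
        return bytes([cp])
    if cp <= 0x7FF:  # two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_code_point(0x20AC).hex(" "))  # the euro sign
```

Every byte after the first carries six payload bits behind the 10 prefix, which is why none of them can be mistaken for an ASCII value.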
Main applications
UTF-8 is usually the preferred encoding for HTML and similar protocols.
XML was the first standard with full support for UTF-8, and standards organizations recommend the encoding as well. The problem of supporting non-ASCII characters in URLs was solved when the W3C consortium and the IETF engineering group agreed that all URLs would be encoded exclusively in UTF-8.
ASCII compatibility makes the transition to new software easier. A great many text editors work with UTF-8, including JEdit, Emacs, BBEdit, Eclipse, and Notepad in the Windows operating system. No other Unicode encoding form can boast this level of tool support.
An advantage of the encoding is that it consists of sequences of bytes, so UTF-8 strings are easy to work with in C and other programming languages. It is the only encoding form that requires neither a byte order mark (BOM) nor an encoding declaration in XML.
Self-synchronization
In environments that process characters as 8-bit units, UTF-8 has the following advantages over other multi-byte character sets:
- The first byte of a code sequence carries information about the sequence length. This makes forward scanning more efficient.
- The start of a character is easy to find, because the first byte is restricted to a fixed range of values.
- Byte values do not overlap.
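The first advantage in the list can be sketched in Python as a forward scan that jumps from one lead byte to the next. The function names sequence_length and split_characters are hypothetical, and the sketch assumes well-formed input: it checks the lead byte only, not the continuation bytes.

```python
def sequence_length(lead: int) -> int:
    """Return the length of a UTF-8 sequence given its first byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


def split_characters(data: bytes) -> list:
    """Split UTF-8 data into per-character byte strings without decoding."""
    chunks = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n  # jump straight to the next lead byte
    return chunks


print(split_characters("aé€😀".encode("utf-8")))
```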
Comparative advantages
UTF-8 is reasonably compact. Its size disadvantage shows mainly with East Asian text (Chinese, Japanese, and Korean, which use Chinese characters), where three-byte sequences are needed. UTF-8 is also less efficient to process than the other encoding forms. On the other hand, a binary sort of UTF-8 strings gives the same order as a binary sort of Unicode code points.
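Both points can be checked with a short Python sketch. The sample strings and variable names are illustrative assumptions.

```python
words = ["z", "é", "€", "😀", "a", "\uffff"]

# Sort by Unicode code points, then by raw UTF-8 bytes.
by_code_point = sorted(words, key=lambda s: [ord(c) for c in s])
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))
print(by_code_point == by_utf8_bytes)

# Size of a short Chinese string in bytes: UTF-8 versus UTF-16.
han = "中文"
print(len(han.encode("utf-8")), len(han.encode("utf-16-be")))
```

The matching sort order means byte-oriented tools can order UTF-8 strings by code point without decoding them first.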
Character encoding scheme
A character encoding scheme consists of a character encoding form plus the way its code units are serialized into bytes. To identify the encoding scheme, the Unicode standard provides for the use of a byte order mark (BOM).
When a BOM is used in UTF-8, it serves only as a signature that identifies the encoding form. UTF-8 has no byte order (endianness) problem, because its code unit is a single byte. Using a BOM with this encoding form is neither required nor recommended. A BOM may appear in text that was converted from other encodings that use a byte order mark, or it may serve as a signature for UTF-8. It is the three-byte sequence EF BB BF (hexadecimal).
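Detecting and removing the signature is straightforward. The Python sketch below uses the standard codecs.BOM_UTF8 constant and the utf-8-sig codec; the function name strip_utf8_bom and the sample bytes are assumptions for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(strip_utf8_bom(with_bom))

# The "utf-8-sig" codec skips the signature while decoding.
print(with_bom.decode("utf-8-sig"))
```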
How to set the UTF-8 encoding
In HTML, the UTF-8 encoding is declared with the following code:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
In PHP, the UTF-8 encoding is set with the header() function at the start of the file, after the error reporting level has been set:
<?php
error_reporting(-1);
header('Content-Type: text/html; charset=UTF-8');
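For a connection to a MySQL database, the UTF-8 encoding is set as follows:
<?php
// Legacy mysql extension (removed in PHP 7.0); mysqli_set_charset() is the mysqli counterpart.
mysql_set_charset('utf8');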
In a CSS file, the UTF-8 encoding is declared as follows:
@charset "UTF-8";
When saving files of any type, choose UTF-8 without a BOM, otherwise the site may not work correctly. In PHP, for example, a BOM at the start of a file is sent as output before header() can be called. To do this in Dreamweaver, select the menu item "Modify - Page Properties - Title/Encoding" and change the encoding to UTF-8. Then reload the page, clear the "Include Unicode Signature (BOM)" checkbox, and apply the changes. If any text on the page or in the database was entered in a different encoding, it has to be re-entered or re-encoded. When working with regular expressions, be sure to use the u modifier.
You can also save a file as UTF-8 in Windows Notepad. Select the menu item "File - Save As...", choose the required encoding, and save the file as UTF-8.
In the Notepad++ text editor, if a file is in an encoding other than UTF-8, the menu item "Convert to UTF-8 without BOM" converts it to UTF-8 with no BOM.
No alternative
In a global environment where political and linguistic boundaries are fading, character sets with a limited repertoire are of little use. Unicode is the only character set that supports all locales. UTF-8 is an exemplary implementation of Unicode that:
- is supported by a wide range of tools and is compatible with ASCII;
- is resistant to data corruption;
- is simple and efficient to process;
- is platform-independent.
With the arrival of UTF-8, arguments about which encoding or code page is better have become pointless.