Biotechnology Frontier September 2013, Volume 2, Issue 3, PP.42-46
Plasmid as Vector of Communication for Chinese Written Language: Programming, Cloning and Deciphering
Qian Li 1, Mantian Li 2, Bo Zhao 2, Manji Sun 1#
1. Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing 100850, China.
2. Harbin Institute of Technology, Robotics Institute, Harbin 150080, China.
#Email: sunmj@nic.bmi.ac.cn
Abstract
Since the emergence of handwriting, people have looked for media in which to store written language. We show here that a plasmid can serve as a vector for storing Chinese written words, and we construct a "Gene / Character Transformation Panel" that makes it easy to retrieve the written words from the gene. We selected analects of Confucius from Lunyu, an oft-quoted and widely loved book in China. The gene encoding the selected analects was programmed in hexad codes, cloned into a plasmid, expressed and multiplied in engineered bacteria, and stored as a dot on small round pieces of paper (2 mm in diameter). To read the stored information, the stored gene is sequenced and the sequence is entered into the "Gene / Character Transformation Panel", which automatically translates the codes into Chinese words. The advantages of keeping written words in plasmids are long-term storage, portability and concealment of the information. The analysis in this paper may also be instructive for other languages of the world.
Keywords: Plasmids, Gene Dictionary, Information
1 INTRODUCTION
Since time immemorial, people have needed to communicate their ideas, feelings and desires through systems of sounds and symbols. With the emergence of speech came the question of how to set information down in writing and store it for later reference. Written language has become an indispensable, non-instinctive means of communication in daily life, regardless of nation or race, profession or class, in ancient and modern times alike. Many carriers (chips, tapes and discs) have been used to store language. Here we show that a new vector (the plasmid) can be used to store Chinese written words. In working out the programming, cloning and deciphering, we constructed a gene dictionary of the Chinese (Han) language (available in "Project 1" at genecodedictionary@yahoo.com, password 12345678). The recombinant gene was applied to a round piece of paper 2 mm in diameter for long-term storage. To read the stored information, one simply sequences the gene and retrieves the Chinese words using the "Gene / Character Transformation Panel" (available in "Project 1" at genecodedictionary@yahoo.com, password 12345678). To realize the project we had, on the one hand, to use the native triad codes (the only signals recognized by the genetic machinery of living beings) and, on the other, to increase the number of codes to match the thousands of Chinese (Han) characters when compiling the gene dictionary. In the biological world there are 64 (4^3) genetic codes formed from the four bases (A, T, C, G). Three of them are termination codes and cannot be used in gene programming, so only 61 triplet codes are available for encoding the gene in the dictionary.
To accommodate both the native triad codes and the huge number of Chinese words, the number of bases per code should be a multiple of three. A set of hexad codes was therefore built from permutations of the four bases to match the elements in the dictionary. One hexad code designates one definite Chinese character or symbol. Taking into account the size and capacity of the plasmids, compiling a library of 3721 (61^2) hexad codes is reasonable and feasible.
From 1986 to 1988, a large-scale philology project was carried out in China to assess how frequently Chinese characters are used. The National Language Working Committee entrusted the Department of Computer Science, Shanxi University, with the statistical sampling and the research, which was based on two million language materials. The results showed that, despite the huge number of Chinese (Han) characters, only several thousand were frequently used. A document, "The Chinese words in common use", was therefore promulgated in 1988 [1]; the 3500 promulgated words covered 99.48 % of the words commonly used in daily life [2]. For this reason, we compiled the gene dictionary from the 3500 words promulgated in the document, appended the Arabic numerals (10), capital and small English letters (26 × 2), punctuation marks (35) and rarely used words (3), and allotted 121 spare codes, for a total of 3721 symbols.
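The arithmetic behind the 3721-code library can be checked with a short sketch. It assumes (our reading, not stated explicitly in the text) that each hexad is an ordered pair of the 61 non-termination triplets and that the three termination codes are the standard TAA, TAG and TGA; all variable names are illustrative.

```python
from itertools import product

BASES = "ACGT"
# Standard termination triplets (assumed to be the three excluded codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 4^3 = 64 triplets, then the 61 that remain once the stops are removed.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
sense_triplets = [t for t in triplets if t not in STOP_CODONS]

# Assumption: a hexad is an ordered pair of sense triplets, giving 61^2 codes.
hexads = [a + b for a, b in product(sense_triplets, repeat=2)]

# Symbol budget of the gene dictionary as listed in the text.
budget = {
    "common Chinese words": 3500,
    "Arabic numerals": 10,
    "English letters (capital and small)": 26 * 2,
    "punctuation marks": 35,
    "rarely used words": 3,
    "spare codes": 121,
}

print(len(triplets), len(sense_triplets), len(hexads))  # 64 61 3721
print(sum(budget.values()) == len(hexads))              # True
```

Under these assumptions the symbol budget fills the hexad space exactly, with no code left over.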
2 MATERIALS AND METHODS
2.1 Materials
Plasmid pBV220 (Institute of Pharmacology, Beijing); E. coli DH5α (Shengong Co., Shanghai); endotoxin-free plasmid kit and DNA molecular weight marker (TianGen Co., Beijing); agarose (Serva). All other reagents were of analytical grade. The genes encoding the Chinese characters of the analects of Confucius were synthesized by Shengong Co., Shanghai.
2.2 Methods
2.2.1 Construction of Recombinant Plasmids
The gene was inserted into plasmid pBV220 between the EcoRI and BamHI sites to construct the recombinant plasmid pBV220-lunyu, which was then transformed into E. coli DH5α. The gene in pBV220-lunyu can be stored on round pieces of paper 2 mm in diameter, or simply kept in E. coli DH5α at 0-4 °C or -20 °C.
2.2.2 Compilation of the Gene Dictionary of the Chinese (Han) Language
The commonly used Chinese words [1] were arranged in the dictionary in order of the number of strokes of the characters, followed by the Arabic numerals, English letters, punctuation marks, a copy of the full stop, a copy of the comma, rarely used words, etc., for a total of 3721 symbols. They are encoded by a set of hexad codes that is ample enough to cover all of the symbols mentioned above.
2.2.3 Mutual Translation between Gene Codes and Chinese Characters
The "Gene / Character Transformation Panel" is the tool used to translate between gene codes and Chinese characters. Translation is a process of querying a database, built in Microsoft Access, that holds the matches between Chinese characters, gene codes and ID numbers. The Chinese text can be entered into the upper box of the left-hand column of the panel in three ways: (1) typing the Chinese characters directly into the box; (2) loading the text as a file; or (3) copying and pasting the text. After the "Go" button is clicked, the translation results are output in both the ID number dialog (middle box) and the gene code dialog (lower box). The "Clipboard" button copies the result to the clipboard. For deciphering, the genetic codes are entered into the upper box of the right-hand column of the panel and processed in the same way; the Chinese text then appears in the lower box.
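The lookup performed by the panel can be pictured with a minimal sketch. The real dictionary is a Microsoft Access database whose code assignments are not listed here, so the sketch invents its own: `build_dictionary`, `encode` and `decode` are hypothetical names, the symbol list is a tiny sample, and assigning hexads to symbols in lexicographic order is an assumption made purely for illustration.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(p) for p in product("ACGT", repeat=3)
         if "".join(p) not in STOP_CODONS]
# Assumption: hexads are ordered pairs of non-termination triplets.
HEXADS = [a + b for a, b in product(SENSE, repeat=2)]


def build_dictionary(symbols):
    """Map each symbol to (ID number, hexad); the ordering is hypothetical."""
    if len(set(symbols)) != len(symbols):
        raise ValueError("symbols must be unique")
    if len(symbols) > len(HEXADS):
        raise ValueError("more symbols than available hexads")
    return {s: (i + 1, HEXADS[i]) for i, s in enumerate(symbols)}


def encode(text, dictionary):
    """Return the ID numbers and the concatenated gene code for a text."""
    ids = [dictionary[ch][0] for ch in text]
    gene = "".join(dictionary[ch][1] for ch in text)
    return ids, gene


def decode(gene, dictionary):
    """Split a gene code into hexads and look each one up."""
    if len(gene) % 6:
        raise ValueError("gene length must be a multiple of six")
    reverse = {hexad: s for s, (_, hexad) in dictionary.items()}
    return "".join(reverse[gene[i:i + 6]] for i in range(0, len(gene), 6))


# Toy dictionary covering only the symbols of the first selected analect.
sample = build_dictionary(list("知之为不是也,。"))
text = "知之为知之,不知为不知,是知也。"
ids, gene = encode(text, sample)
print(ids)
print(gene)
print(decode(gene, sample) == text)  # True
```

The three outputs correspond loosely to the ID number box, the gene code box and the deciphered text of the panel; a database table with one row per symbol would play the role of the Python dict.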
3 RESULTS AND DISCUSSION
Confucius was a great philosopher and one of the thinkers most influential in shaping Chinese culture. We selected analects from Lunyu, the oft-quoted and widely loved book that collects the discourses and sayings of Confucius.
3.1 The Selected Analects in Chinese for Programming Are as Follows:
知之为知之,不知为不知,是知也。学而时习之,不亦说乎。有朋自远方来,不亦乐乎。人不知而不愠,不亦君子乎。温故而知新。工欲善其事,必先利其器。人无远虑,必有近忧。己所不欲,勿施于人。躬自厚而薄责于人。过而不改,是谓过矣。有教无类。性相近也,习相远也。君子不以言举人,不以人废言。见义不为,无勇也。弟子入则孝,出则弟,谨而信,泛爱众,而亲仁。君子坦荡荡,小人长戚戚。益有三乐,损有三乐。乐节礼乐,乐道人之善,乐多贤友,益矣。乐骄乐,乐佚游,乐宴乐,损矣。君子耻其言而过其行。学而不厌,诲人不倦。三思而后行。言必信,行必果。
3.2 The English Version [3] of the Analects Says:
To know what it is that you know, and to know what it is that you do not know, that is understanding. It is indeed a pleasure to acquire knowledge and, as you go on acquiring, to put into practice what you have acquired. A greater pleasure still it is when friends of congenial minds come from afar to seek you because of your attainments. But he is truly a wise and good man who feels no discomposure even when he is not noticed of men. To constantly go over what one has acquired, and keep continually adding to it new acquirements. A workman who wants to perfect his work first sharpens his tools. If a man takes no thought for the morrow, he will be sorry before today is out. Whatsoever things you do not wish that others should do unto you, do not unto them. A man who expects much from himself and demands little from others will never have any enemies. To be wrong and not to reform is indeed to be wrong. Among really educated men, there is no caste or race distinction. Men, in their nature, are alike, but by practice they become widely different. A wise man never upholds a man because of what he says, nor does he discard what a man says because of the speaker's character. To see what is right and to act against one's judgment shows a want of courage. A young man, when at home, should be a good son, when out in the world, a good citizen. He should be circumspect and truthful. He should be in sympathy with all men, but intimate with men of moral character. A wise and good man is composed and happy; a fool is always worried and full of distress. There are three kinds of pleasures which are beneficial and three kinds which are injurious. Pleasure derived from the study and criticism of the polite arts, pleasure in admiring and speaking of the excellent qualities of men, and pleasure in having many friends of virtue and talents: these pleasures are beneficial. Pleasure in dissipation, in extravagance, in mere conviviality: such pleasures are injurious. A wise man is ashamed to say much; he prefers to do more. Patiently to acquire knowledge, and to be indefatigable in teaching it to others. Always reflect thrice before you act. One who makes it a point to carry out what he says and to persist in what he undertakes.
3.3 The Gene Sequence of Analects
The gene encoding the Chinese characters of the analects, translated into hexad codes using the dictionary of the Chinese (Han) language, is given below. The restriction sites are EcoRI (GAATTC) at the start and BamHI (GGATCC) at the end of the sequence.
GAATTCT ATGCAT TCATTT GAATAT AATCAT TCATTT GAAGAG CATTCT CTGCAT TCATAT AATTCT CTGCAT TCAGAG TCCCAC ATCCAT TCATCT TCTGTG CTGCTC TCCTCC GTTTTG TGCTCT TCCTTT GAAGCG GTGTCT CTGTGC AATCCA ATATTC TACGAG CCTTCC ACGCAT GAGTAC
GCGTTA GATTAT ATTTTG TTTGAG TCTTCT CTGTGC AATTTC TGGTTC TACGAG TTTTTT TCATCT CTGCAT TCATCC GTTTCT CTGGCG GTAGTG CGGTCT CTGTGC AATTGG CAATTT GGGTTC TACGCG GGAACC TCACCC GCCTCC GTTCAT TCAAGC AGCGTG CTGTTT CTGATT GATATC GCGCTT ACACCT TACGTG CGGTTC ACGTAC AGTTTG AGACTT ACAAAA GCAGTG CTGTTT TCATCT CATTTA GATCGA ATAGAG CATTTC ACGTCC ACGTCG TATTGG TATGTG CTGTTT GCGCAT ACGTCT CTGATT GATGTG CGGTAT CACCTA AACTTT CTATTT TCAGTG CTGCTG AAATAC GCGCAC TGGTCC GTTAAA GTTCTT TTCTTT CTATTT TCAGAG TTTTCC ACCTCC GTTTCT CTGTGG AATGTG CGGCAC ATCGCC CCCTCC ACCGCG GCAGTG CTGTCC ACGCGG TTATCT CATCTA GTTGTG CTGCTC TCTCCC GCGTCG TATTCT TCTGTG CGGTCT TCCCCC GCGTTA GATTCT TCTGAG TTTTGG CAATTT GGGTCT CTGTAT GCTTCG ACTCCA CCCTTT TCAGAG CATTCT CTGTAT GCTTTT TCACGT CGATCG ACTGTG CTGTCT GTCTTT GCATCT CTGTAT AATGTG CGGTCT CATCCA GGGTCT TCTGAG CCTTCG GGCTTT GGGTTT TTGTAC CAGTCA TTGGTG CGGTTC GCCTAC CAGTCG GGCGTG CGGATA TTTTCC GTTCGC AAGGTG CGGTCG GTGCTG GCCTGC TTGGTG CGGTCC GTTCTA AGTTCT GTGGTG CTGTGG CAATTT GGGCTT CTTCCC GGTCCC GGTGTG CGGTTT AACTTT TCATCT GGACGG ACGCGG ACGGTG CTGCCG ATATCC ACGTTT CGCTTC TGGGTG CGGCAA AGATCC ACGTTT CGCTTC TGGGAG TTTTTC TGGTGT TCATTC AGATTC TGGGCG GTGTTC TGGACC TATTTT TCATTT GAAATC GCGGAG CATTTC TGGTGC CAACCT AGCTCT AATGTG CGGCCG ATAGCG GCAGAG TTTTTC TGGCAA TGCTTC TGGGAG TCTTTC TGGGCG GAAACC CCTGTG CGGTTC TGGCAG TCGTTC TGGGAG TCCCAA AGAGCG GCAGCG GGATGG CAATTT GGGCAA GACCTT ACATCG ACTTCC GTTTCC ACCCTT ACATGC TATGAG TTTCTC TCCTCC GTTTCT CTGTCC AGAGTG CGGGCT CTCTTT TCATCT CTGCTG AACGTG CTGTTT CGCCAC GACTCC GTTTGC TCTTGC TATGTG CTGTCG ACTTTC ACGCGC AAGGAG TCTTGC TATTTC ACGCCT AGAGTG CTGTAA TGGATCC The gene encoding the analects in hexad codes was inserted into pBV 220 at the multiple cloning sites between EcoRI and BamHI to construct the recombinant plasmid pBV 220-lunyu, which was then transformed into E. coli DH 5α for large-scale proliferation and reproduction of the plasmids. 
The plasmids are biologically stable and can be spotted onto a round piece of paper for long-term storage. They are convenient to carry and deliver, and are suitable for passing on discreetly over long distances.
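Recovering the hexads from a sequencing read can be sketched as follows. The reading frame used here is inferred from the printed sequence rather than described in the text: an ATG follows the EcoRI site, the hexads run from there as pairs of triplets, and a TAA precedes the BamHI site. The three character assignments in `INFERRED_CODES` were obtained by aligning repeated hexads with the repeated characters of the first analect; they are not taken from the authors' dictionary. The function name and the shortened test read (the opening of the printed sequence joined directly to its ending) are likewise illustrative.

```python
ECORI, BAMHI = "GAATTC", "GGATCC"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Inferred by aligning repeats in the printed sequence with the first
# analect; NOT the authors' dictionary.
INFERRED_CODES = {"CATTCA": "知", "TTTGAA": "之", "TATAAT": "为"}


def extract_hexads(read):
    """Return the hexads found between the EcoRI and BamHI sites of a read."""
    seq = "".join(read.split()).upper()
    start = seq.find(ECORI)
    end = seq.rfind(BAMHI)
    if start == -1 or end == -1 or end <= start:
        raise ValueError("restriction sites not found")
    insert = seq[start + len(ECORI):end]

    # Assumed frame: triplets are read from the first ATG up to the first stop.
    atg = insert.find("ATG")
    if atg == -1:
        raise ValueError("no ATG found after the EcoRI site")
    triplets = []
    for i in range(atg + 3, len(insert) - 2, 3):
        triplet = insert[i:i + 3]
        if triplet in STOP_CODONS:
            break
        triplets.append(triplet)
    if len(triplets) % 2:
        raise ValueError("odd number of triplets; hexads would be incomplete")
    return [triplets[i] + triplets[i + 1] for i in range(0, len(triplets), 2)]


# Opening of the printed sequence (five hexads) joined to its printed ending.
read = "GAATTCT ATGCAT TCATTT GAATAT AATCAT TCATTT GAA" + "TAA TGGATCC"
hexads = extract_hexads(read)
print(hexads)
print("".join(INFERRED_CODES.get(h, "?") for h in hexads))
```

Note that under this reading the blocks of six printed in the sequence are offset by three bases from the hexad boundaries, because the ATG occupies half of the first printed block.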
3.4 The Gene / Character Transformation Dialog
The Gene / Character Transformation Panel was constructed to translate Chinese characters into their corresponding gene codes and vice versa. Chinese characters are translated in the left column of the interface, and gene codes in the right column (for details see Materials and Methods; for the display see the Supplementary Information). The interface of the panel is shown in FIG. 1.
FIG. 1 The interface of the gene / character transformation panel
The written languages of the world fall into two categories: pictographic and alphabetic. A pictographic character (e.g. in Chinese) originates from the picture or figure of an object and is one independent element with one syllable. The Chinese (Han) language has special characters that grammatically represent the various tenses and voices, so a sentence is made simply by putting them together with the essential words to express the present, past or future tense, or the active or passive voice, without the changes of prefix or suffix found in alphabetic languages. Programming a gene dictionary for an alphabetic language such as English must take this greater complexity into account. Two principal protocols might be considered: encoding every word or, as an alternative, assigning a particular triad code to each of the twenty-six letters of the alphabet as well as to the punctuation marks, numerals, etc.
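The second protocol, one triad code per letter, can be sketched briefly. Everything below is an illustrative assumption rather than a scheme proposed by the authors: the symbol set, its ordering and the function names are hypothetical, and the termination triplets are taken to be the standard TAA, TAG and TGA. One constraint follows directly from the numbers: capital and small letters plus the ten numerals would need 62 codes, one more than the 61 available, so the sketch folds text to lower case.

```python
from itertools import product
import string

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(p) for p in product("ACGT", repeat=3)
         if "".join(p) not in STOP_CODONS]

# Hypothetical symbol set: 26 letters, 10 numerals, 12 punctuation marks.
SYMBOLS = string.ascii_lowercase + string.digits + " .,;:?!'\"-()"
if len(SYMBOLS) > len(SENSE):
    raise ValueError("more symbols than non-termination triplets")

# Hypothetical assignment: symbols paired with triplets in listed order.
TO_TRIAD = dict(zip(SYMBOLS, SENSE))
FROM_TRIAD = {triad: symbol for symbol, triad in TO_TRIAD.items()}


def encode_english(text):
    """Encode text with one triad per symbol (case is folded to lower)."""
    return "".join(TO_TRIAD[ch] for ch in text.lower())


def decode_english(gene):
    """Decode a triad-encoded gene back into lower-case text."""
    if len(gene) % 3:
        raise ValueError("gene length must be a multiple of three")
    return "".join(FROM_TRIAD[gene[i:i + 3]] for i in range(0, len(gene), 3))


message = "Men, in their nature, are alike."
gene = encode_english(message)
print(gene)
print(decode_english(gene))
```

Letter-level encoding costs three bases per character, whereas the hexad dictionary spends six bases on a whole Chinese character, which is one way to see the trade-off between the two protocols.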
4 CONCLUSIONS
The programming, cloning and deciphering of written words of the Chinese (Han) language by means of genetic engineering have been realized. The gene dictionary has made possible not only the genetic manipulation of the Chinese language but also the large-scale reproduction and storage of the gene information. The vector is cost-effective and easy to handle. The outstanding features of plasmids bearing written-word genes are long-term storage, portability and concealment. The outcome of the experiment may also be useful for other languages of the world.
REFERENCES
[1] The Chinese words in common use. (1988) China National Language Working Committee.
[2] Man Jun Sun et al. "Scale and statistics of Chinese characters proficiency testing." Journal of Language Application, 1 (2004): 63-70.
[3] Hong Ming Gu. The discourses and saying of Confucius (Lunyu). Hainan Press, China, 1996.
AUTHORS
1 Qian Li (Hunan province, 1970- ), female, PhD. of Science (08/1996 - 07/1999), Biochemical Pharmacology, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, China. Research interests: biochemical pharmacology and military medicine. She works at the Academy of Military Medical Sciences as an associate research fellow. Email: bjliqian@sina.com
2 Man Tian Li, male, PhD. of Science, Professor, Harbin Institute of Technology, Robotics Institute. Research interests: artificial intelligence. Email: lmt@hit.edu.cn
3 Bo Zhao, male, PhD. of Science, Harbin Institute of Technology, Robotics Institute. Research interests: artificial intelligence.
4 Man Ji Sun (1931- ), male, Academician of the Chinese Academy of Science, biochemical pharmacologist, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences. Email: sunmj@nic.bmi.ac.cn
- 46 http://www.ivypub.org/bf