随机文本生成器
【問題描述】
有一種基于馬爾可夫鏈(Markov Chain)算法的隨機文本生成方法,它利用任何一個現有的某種語言的文本(如一本英文小說),可以構造出由這個文本中的語言使用情況而形成的統計模型,并通過該模型生成的隨機文本將具有與原文本類似的統計性質(即具有類似寫作風格)。
該算法的基本原理是將輸入看成是由一些互相重疊的短語構成的序列,其將每個短語分割為兩個部分:一部分是由多個詞構成的前綴,另一部分是只包含一個詞的后綴。在生成文本時依據原文本的統計性質(即前綴確定的情況下,得到所有可能的后綴),隨機地選擇某前綴后面的特定后綴。在此,假設前綴長度為兩個單詞,則馬爾可夫鏈(Markov Chain)隨機文本生成算法如下:
設w1和w2為文本的前兩個詞
輸出w1和w2
循環:
隨機地選出w3,它是原文本中w1w2為前綴的后綴中的一個
輸出w3
w1 = w2
w2 = w3
重復循環
?
下面將通過一個例子來說明該算法原理,假設有一個原文如下:
Show your flowcharts and conceal your tables and I will be mystified. Show your tables and your flowcharts will be obvious.
下面是上述原文的一些前綴和其后綴(注意只是部分)的統計:
?
| 前綴 | 后綴 |
| Show your | flowcharts ?tables |
| your flowcharts | and ?will |
| flowcharts and | conceal |
| flowcharts willl | be |
| your tables | and ?and |
| will be | mystified.? obvious. |
| be mystified. | Show |
| be obvious. | (end) |
?
基于上述文本,按照馬爾可夫鏈(Markov Chain)算法隨機文本生成文本時,首先輸出的是Show your,然后隨機取出flowcharts或tables。如果為前者,則接下來的前綴就變成your flowcharts,而下一個后綴應該是and或will;如果為tables,則接下來的前綴就變成your tables,而下一個詞就應該是and。這樣繼續下去,直到產生出足夠多的輸出,或在查找后綴時遇到了結束標志。
?
編寫一個程序從文件中讀入一個英文文本,利用馬爾可夫鏈(Markov Chain)算法,基于文本中固定長度的短語的出現頻率,生成一個最大單詞數目不超過N的新文本到給定文件中。程序要求前綴詞的個數為2,最大單詞數目N由標準輸入獲得。
說明:
n = (int)(rrand() * N);
在此N為某一前綴的所有后綴的總數,n為所確定的后綴在該前綴的后綴序列中的序號(從0開始計數,即n為0時選取第一個后綴,為1時選取第二個后綴,以此類推)。在此,隨機數生成函數rrand()的定義如下:
double seed = 997;
double rrand()
{
?????? double lambda = 3125.0;
?????? double m = 34359738337.0;
?????? double r;
?????? seed = fmod(lambda*seed, m); //要包含頭文件#include <math.h>
?????? r = seed / m;
??? return r;
}
注意:為了保證運行結果的確定性,請務必使用本文提供的隨機數生成函數。
在下面條件滿足時文本生成結束:1)遇到后綴為文件結束標志;或2)生成文本的單詞數達到所設定的最大單詞數。在程序實現時,當讀到文件(結束)尾時,可將一個特殊標志賦給后綴串suffix變量。
【輸入形式】
開打當前目錄下英文文本文件“article.txt”進行統計分析,并從標準輸入中讀入一個正整數作為生成文本時的最大單詞數。
?
【輸出形式】
將生成文本輸出到當前目錄下文件“markov.txt”中。單詞間以一個空格分隔,最后一個單詞后空格可有可無。
【樣例輸入】
若當前目錄下文件article.txt中內容如下:
I will give you some advice about life.
Eat more roughage;
Do more than others expect you to do and do it pains;
Remember what life tells you;
do not take to heart every thing you hear.
do not spend all that you have.
do not sleep as long as you want;
Whenever you say "I love you", please say it honestly;
Whevever you say "I am sorry", please look into the other person's eyes;
Whenever you find your wrongdoing, be quick with reparation!
Whenever you make a phone call smil when you pick up the phone, because someone feel it!
Understand rules completely and change them reasonably;
Remember, the best love is to love others unconditionally rather than make demands on them;
Comment on the success you have attained by looking in the past at the target you wanted to achieve most;
In love and cooking, you must give 100% effort - but expect little appreciation.
從標準輸入中輸入的單詞個數為:
1000
【樣例輸出】
當前目錄下所生成的文件markov.txt中內容如下:
I will give you some advice about life. Eat more roughage; Do more than others expect you to do and do it pains; Remember what life tells you; do not take to heart every thing you hear. do not take to heart every thing you hear. do not spend all that you have. do not sleep as long as you want; Whenever you find your wrongdoing, be quick with reparation! Whenever you find your wrongdoing, be quick with reparation! Whenever you find your wrongdoing, be quick with reparation! Whenever you find your wrongdoing, be quick with reparation! Whenever you say "I am sorry", please look into the other person's eyes; Whenever you say "I am sorry", please look into the other person's eyes; Whenever you make a phone call smil when you pick up the phone, because someone feel it! Understand rules completely and change them reasonably; Remember, the best love is to love others unconditionally rather than make demands on them; Comment on the success you have attained by looking in the past at the target you wanted to achieve most; In love and cooking, you must give 100% effort - but expect little appreciation.
?
?
【題解】
1 #include<stdio.h> 2 #include<stdlib.h> 3 #include<string.h> 4 #include<math.h> 5 6 #define NHASH 1048576 7 8 typedef struct HashTable 9 { 10 char *prefix[2]; 11 char **suffix; 12 int num; 13 unsigned int conflict; 14 struct HashTable *next; 15 }HashTable; 16 17 double seed=997; 18 char *list[1200000]; 19 char *end="(end)"; 20 HashTable *Hash[NHASH]; 21 22 void InsertHash(char *pre1,char *pre2,char *suf); 23 HashTable *HashSearch(char *pre1,char *pre2); 24 void write(int num,int Nmax); 25 unsigned int HashFirst(char *pre1,char *pre2); 26 unsigned int HashConflict(char *pre); 27 double rrand(void); 28 29 int main() 30 { 31 FILE *in; 32 int len=0,num=0,Nmax,i,j; 33 char *book; 34 char buf[105]; 35 36 in=fopen("article.txt","r"); 37 fseek(in,0,SEEK_END); 38 len=ftell(in); 39 fseek(in,0,SEEK_SET); 40 41 book=(char *)malloc(sizeof(char)*len); 42 len=fread(book,sizeof(char),len,in); 43 44 for(i=0,j=0;i<len;i++) 45 { 46 if(book[i]>32 && book[i]<127) 47 { 48 j=0; 49 while(book[i]>32 && book[i]<127) 50 buf[j++]=book[i++]; 51 buf[j]='\0'; 52 list[num]=(char *)malloc(sizeof(buf)); 53 strcpy(list[num],buf); 54 num++; 55 } 56 } 57 list[num]=end; 58 num++; 59 fclose(in); 60 61 scanf("%d",&Nmax); 62 63 for(i=0;i<num-2;i++) 64 InsertHash(list[i],list[i+1],list[i+2]); 65 66 write(num,Nmax); 67 return 0; 68 } 69 void InsertHash(char *pre1,char *pre2,char *suf) 70 { 71 unsigned int addr=HashFirst(pre1,pre2); 72 unsigned int conflict=HashConflict(pre2); 73 74 if(!Hash[addr]) 75 { 76 Hash[addr]=(HashTable *)malloc(sizeof(HashTable)); 77 Hash[addr]->prefix[0]=pre1; 78 Hash[addr]->prefix[1]=pre2; 79 Hash[addr]->next=NULL; 80 Hash[addr]->num=1; 81 Hash[addr]->conflict=conflict; 82 Hash[addr]->suffix=(char **)malloc(sizeof(char *)); 83 Hash[addr]->suffix[0]=suf; 84 } 85 else 86 { 87 HashTable *p,*q; 88 p=q=Hash[addr]; 89 90 while(1) 91 { 92 if(!p) 93 { 94 HashTable *s=(HashTable *)malloc(sizeof(HashTable)); 95 s->prefix[0]=pre1; 96 s->prefix[1]=pre2; 97 s->next=NULL; 98 s->num=1; 99 s->conflict=conflict; 100 s->suffix=(char **)malloc(sizeof(char *)); 101 s->suffix[0]=suf; 102 q->next=s; 103 104 return; 105 } 106 else if(!strcmp(p->prefix[0],pre1) && !strcmp(p->prefix[1],pre2)) 107 { 108 p->num++; 109 p->suffix=(char **)realloc(p->suffix,p->num*sizeof(char *)); 110 p->suffix[p->num-1]=suf; 111 112 return; 113 } 114 q=p; 115 p=p->next; 116 } 117 } 118 } 119 HashTable *HashSearch(char *pre1,char *pre2) 120 { 121 unsigned int addr=HashFirst(pre1,pre2); 122 unsigned int conflict=HashConflict(pre2); 123 HashTable *p=Hash[addr]; 124 125 while(1) 126 { 127 if(conflict==p->conflict && !strcmp(p->prefix[0],pre1)) 128 return p; 129 p=p->next; 130 } 131 return 0; 132 } 133 void write(int num,int Nmax) 134 { 135 FILE *out; 136 char *pre1,*pre2; 137 int n; 138 139 out=fopen("markov.txt","w"); 140 141 pre1=list[0]; 142 pre2=list[1]; 143 fprintf(out,"%s %s ",pre1,pre2); 144 Nmax-=2; 145 146 while(Nmax--) 147 { 148 HashTable *p=HashSearch(pre1,pre2); 149 150 if(p->num==1) 151 n=0; 152 else 153 n=(int)(rrand()*p->num); 154 155 pre1=pre2; 156 pre2=p->suffix[n]; 157 158 if(pre2==end) 159 break; 160 161 fprintf(out,"%s ",pre2); 162 } 163 fclose(out); 164 return; 165 } 166 unsigned int HashFirst(char *pre1,char *pre2) 167 { 168 unsigned int hash=0; 169 170 while(*pre1) 171 hash=131*hash+*pre1++; 172 while(*pre2) 173 hash=131*hash+*pre2++; 174 175 return hash&(NHASH-1); 176 } 177 unsigned int HashConflict(char *pre) 178 { 179 unsigned int hash=5381; 180 181 while (*pre) 182 hash+=(hash<<5)+(*pre++); 183 184 return hash&(NHASH-1); 185 } 186 double rrand(void) 187 { 188 static double lambda=3125.0; 189 static double m=34359738337.0; 190 double r; 191 seed=fmod(lambda*seed,m); 192 r=seed/m; 193 194 return r; 195 }?
轉載于:https://www.cnblogs.com/tuoniao/p/10346398.html
總結
- 上一篇: Ros命令及功能
- 下一篇: 许愿墙|爱墙 js代码