MushroomObserver / mushroom-observer / 27722897639

17 Jun 2026 10:08PM UTC coverage: 97.821% (-0.001%) from 97.822%
27722897639

push

github

web-flow
Merge pull request #4557 from MushroomObserver/nimmo-phlex-inat-imports

Phlex hygiene: helper sweep + layouts/header + modal title + sorter/dropdown

775 of 796 new or added lines in 106 files covered. (97.36%)

29 existing lines in 2 files now uncovered.

49707 of 50814 relevant lines covered (97.82%)

673.55 hits per line

Source File
97.2
/app/extensions/string.rb
1
# frozen_string_literal: true
2

3
#
4
#  = Extensions to String
5
#  == Class Methods
6
#  random::             Generate a random string.
7
#
8
#  == Instance Methods
9
#
10
#  t::                  Textilize (no paragraphs or obj links).
11
#  tl::                 Textilize with obj links (no paragraphs).
12
#  tp::                 Textilize with paragraphs (no obj links).
13
#  tpl::                Textilize with paragraphs and obj links.
14
#  tp_nodiv::           Textilize with paragraphs (no obj links, without div).
15
#  tl_for_api::         Textilize with obj links and paragraphs when needed
16
#  ---
17
#  gsub!::              Global replace in place.
18
#  to_ascii::           Convert string from UTF-8 to plain ASCII.
19
#  iconv::              Convert string from UTF-8 to "charset".
20
#  strip_html::         Remove HTML tags (not entities) from string.
21
#  truncate_html::      Truncate an HTML string to N display characters.
22
#  html_to_ascii::      Convert HTML into plain text.
23
#  gsub_html_special_chars:: auxiliary to html_to_ascii
24
#  unescape_html::      Render special encoded characters as regular characters
25
#  as_displayed::       Render everything humanly legible, for integration tests
26
#  id_of_nested_field:: Rails generates `observation_notes` for the ID of a
27
#                       nested field like `observation[notes]`
28
#  ---
29
#  break_name::         Break a taxon name at the author
30
#  small_author::       Wrap the author in a <small> span
31
#  ---
32
#  nowrap::             Surround HTML string inside '<nowrap>' span.
33
#  strip_squeeze::      Strip and squeeze spaces.
34
#  rand_char::          Pick a single random character from the string.
35
#  dealphabetize::      Reverse Integer#alphabetize.
36
#  is_ascii_character?:: Does string start with ASCII character?
37
#  is_nonascii_character?:: Does string start with non-ASCII character?
38
#  percent_match::      Measure how closely this String matches another String.
39
#  unindent::           Remove indentation (e.g., from here docs).
40
#  ---
41
#  md5sum::             Calculate MD5 sum.
42
#  to_boolean::         Evaluates and returns a Boolean.
43
#
44
################################################################################
45

46
# MO extensions to Ruby String class
47
class String
1✔
48
  require("digest/md5")
1✔
49

50
  # :stopdoc:
51
  unless defined? UTF8_TO_ASCII
1✔
52
    # This should cover most everything we'll see, at least all the European
53
    # characters and accents -- it covers HTML codes &#1 to &#400.
54
    # Disable alignment cop to make code more readable
55
    # rubocop:disable Layout/HashAlignment
56
    # Disable CollectionLiteralLength because constant is most convenient method
57
    # rubocop:disable Metrics/CollectionLiteralLength
58
    UTF8_TO_ASCII = {
1✔
59
      "\x00"         => " ",
60
      "\x01"         => " ",
61
      "\x02"         => " ",
62
      "\x03"         => " ",
63
      "\x04"         => " ",
64
      "\x05"         => " ",
65
      "\x06"         => " ",
66
      "\x07"         => " ",
67
      "\x08"         => " ",
68
      "\x0B"         => " ",
69
      "\x0C"         => " ",
70
      "\x0E"         => " ",
71
      "\x0F"         => " ",
72
      "\x10"         => " ",
73
      "\x11"         => " ",
74
      "\x12"         => " ",
75
      "\x13"         => " ",
76
      "\x14"         => " ",
77
      "\x15"         => " ",
78
      "\x16"         => " ",
79
      "\x17"         => " ",
80
      "\x18"         => " ",
81
      "\x19"         => " ",
82
      "\x1A"         => " ",
83
      "\x1B"         => " ",
84
      "\x1C"         => " ",
85
      "\x1D"         => " ",
86
      "\x1E"         => " ",
87
      "\x1F"         => " ",
88
      "\xE2\x82\xAC" => "$",    # €
89
      "\xEF\xBF\xBD" => "?",    # �
90
      "\xE2\x80\x9A" => ",",    # ‚
91
      "\xC6\x92"     => "f",    # ƒ
92
      "\xE2\x80\x9E" => '"',    # „
93
      "\xE2\x80\xA6" => "...",  # …
94
      "\xE2\x80\xA0" => "+",    # †
95
      "\xE2\x80\xA1" => "++",   # ‡
96
      "\xCB\x86"     => "^",    # ˆ
97
      "\xE2\x80\xB0" => "%",    # ‰
98
      "\xE2\x80\xB9" => "<",    # ‹
99
      "\xE2\x80\x98" => "'",    # ‘
100
      "\xE2\x80\x99" => "'",    # ’
101
      "\xE2\x80\x9C" => '"',    # “
102
      "\xE2\x80\x9D" => '"',    # ”
103
      "\xE2\x80\xA2" => ".",    # •
104
      "\xE2\x80\x93" => "-",    # –
105
      "\xE2\x80\x94" => "-",    # —
106
      "\xCB\x9C"     => "~",    # ˜
107
      "\xE2\x84\xA2" => "(TM)", # ™
108
      "\xE2\x80\xBA" => ">",    # ›
109
      "\xC2\xA1"     => "!",    # ¡
110
      "\xC2\xA2"     => "$",    # ¢
111
      "\xC2\xA3"     => "$",    # £
112
      "\xC2\xA4"     => "$",    # ¤
113
      "\xC2\xA5"     => "$",    # ¥
114
      "\xC2\xA6"     => "|",    # ¦
115
      "\xC2\xA7"     => "?",    # §
116
      "\xC2\xA8"     => "?",    # ¨
117
      "\xC2\xA9"     => "(C)",  # ©
118
      "\xC2\xAA"     => "a",    # ª
119
      "\xC2\xAB"     => "<<",   # «
120
      "\xC2\xAC"     => "-",    # ¬
121
      "\xC2\xAD"     => "-",    # ­
122
      "\xC2\xAE"     => "(R)",  # ®
123
      "\xC2\xAF"     => "-",    # ¯
124
      "\xC2\xB0"     => "(o)",  # °
125
      "\xC2\xB1"     => "+/-",  # ±
126
      "\xC2\xB2"     => "(2)",  # ²
127
      "\xC2\xB3"     => "(3)",  # ³
128
      "\xC2\xB4"     => "'",    # ´
129
      "\xC2\xB5"     => "u",    # µ
130
      "\xC2\xB6"     => "?",    # ¶
131
      "\xC2\xB7"     => ".",    # ·
132
      "\xC2\xB8"     => ".",    # ¸
133
      "\xC2\xB9"     => "(1)",  # ¹
134
      "\xC2\xBA"     => "(0)",  # º
135
      "\xC2\xBB"     => ">>",   # »
136
      "\xC2\xBC"     => "1/4",  # ¼
137
      "\xC2\xBD"     => "1/2",  # ½
138
      "\xC2\xBE"     => "3/4",  # ¾
139
      "\xC2\xBF"     => "?",    # ¿
140
      "\xC3\x80"     => "A",    # À
141
      "\xC3\x81"     => "A",    # Á
142
      "\xC3\x82"     => "A",    # Â
143
      "\xC3\x83"     => "A",    # Ã
144
      "\xC3\x84"     => "A",    # Ä
145
      "\xC3\x85"     => "A",    # Å
146
      "\xC3\x86"     => "AE",   # Æ
147
      "\xC3\x87"     => "C",    # Ç
148
      "\xC3\x88"     => "E",    # È
149
      "\xC3\x89"     => "E",    # É
150
      "\xC3\x8A"     => "E",    # Ê
151
      "\xC3\x8B"     => "E",    # Ë
152
      "\xC3\x8C"     => "I",    # Ì
153
      "\xC3\x8D"     => "I",    # Í
154
      "\xC3\x8E"     => "I",    # Î
155
      "\xC3\x8F"     => "I",    # Ï
156
      "\xC3\x90"     => "D",    # Ð
157
      "\xC3\x91"     => "N",    # Ñ
158
      "\xC3\x92"     => "O",    # Ò
159
      "\xC3\x93"     => "O",    # Ó
160
      "\xC3\x94"     => "O",    # Ô
161
      "\xC3\x95"     => "O",    # Õ
162
      "\xC3\x96"     => "O",    # Ö
163
      "\xC3\x97"     => " x ",  # ×
164
      "\xC3\x98"     => "O",    # Ø
165
      "\xC3\x99"     => "U",    # Ù
166
      "\xC3\x9A"     => "U",    # Ú
167
      "\xC3\x9B"     => "U",    # Û
168
      "\xC3\x9C"     => "U",    # Ü
169
      "\xC3\x9D"     => "Y",    # Ý
170
      "\xC3\x9E"     => "P",    # Þ
171
      "\xC3\x9F"     => "ss",   # ß
172
      "\xC3\xA0"     => "a",    # à
173
      "\xC3\xA1"     => "a",    # á
174
      "\xC3\xA2"     => "a",    # â
175
      "\xC3\xA3"     => "a",    # ã
176
      "\xC3\xA4"     => "a",    # ä
177
      "\xC3\xA5"     => "a",    # å
178
      "\xC3\xA6"     => "ae",   # æ
179
      "\xC3\xA7"     => "c",    # ç
180
      "\xC3\xA8"     => "e",    # è
181
      "\xC3\xA9"     => "e",    # é
182
      "\xC3\xAA"     => "e",    # ê
183
      "\xC3\xAB"     => "e",    # ë
184
      "\xC3\xAC"     => "i",    # ì
185
      "\xC3\xAD"     => "i",    # í
186
      "\xC3\xAE"     => "i",    # î
187
      "\xC3\xAF"     => "i",    # ï
188
      "\xC3\xB0"     => "o",    # ð
189
      "\xC3\xB1"     => "n",    # ñ
190
      "\xC3\xB2"     => "o",    # ò
191
      "\xC3\xB3"     => "o",    # ó
192
      "\xC3\xB4"     => "o",    # ô
193
      "\xC3\xB5"     => "o",    # õ
194
      "\xC3\xB6"     => "o",    # ö
195
      "\xC3\xB7"     => "/",    # ÷
196
      "\xC3\xB8"     => "o",    # ø
197
      "\xC3\xB9"     => "u",    # ù
198
      "\xC3\xBA"     => "u",    # ú
199
      "\xC3\xBB"     => "u",    # û
200
      "\xC3\xBC"     => "u",    # ü
201
      "\xC3\xBD"     => "y",    # ý
202
      "\xC3\xBE"     => "p",    # þ
203
      "\xC3\xBF"     => "y",    # ÿ
204
      "\xC4\x3F"     => "c",    # č (where did this come from??)
205
      "\xC4\x80"     => "A",    # Ā
206
      "\xC4\x81"     => "a",    # ā
207
      "\xC4\x82"     => "A",    # Ă
208
      "\xC4\x83"     => "a",    # ă
209
      "\xC4\x84"     => "A",    # Ą
210
      "\xC4\x85"     => "a",    # ą
211
      "\xC4\x86"     => "C",    # Ć
212
      "\xC4\x87"     => "c",    # ć
213
      "\xC4\x88"     => "C",    # Ĉ
214
      "\xC4\x89"     => "c",    # ĉ
215
      "\xC4\x8A"     => "C",    # Ċ
216
      "\xC4\x8B"     => "c",    # ċ
217
      "\xC4\x8C"     => "C",    # Č
218
      "\xC4\x8D"     => "c",    # č
219
      "\xC4\x8E"     => "D",    # Ď
220
      "\xC4\x8F"     => "d",    # ď
221
      "\xC4\x90"     => "D",    # Đ
222
      "\xC4\x91"     => "d",    # đ
223
      "\xC4\x92"     => "E",    # Ē
224
      "\xC4\x93"     => "e",    # ē
225
      "\xC4\x94"     => "E",    # Ĕ
226
      "\xC4\x95"     => "e",    # ĕ
227
      "\xC4\x96"     => "E",    # Ė
228
      "\xC4\x97"     => "e",    # ė
229
      "\xC4\x98"     => "E",    # Ę
230
      "\xC4\x99"     => "e",    # ę
231
      "\xC4\x9A"     => "E",    # Ě
232
      "\xC4\x9B"     => "e",    # ě
233
      "\xC4\x9C"     => "G",    # Ĝ
234
      "\xC4\x9D"     => "g",    # ĝ
235
      "\xC4\x9E"     => "G",    # Ğ
236
      "\xC4\x9F"     => "g",    # ğ
237
      "\xC4\xA0"     => "G",    # Ġ
238
      "\xC4\xA1"     => "g",    # ġ
239
      "\xC4\xA2"     => "G",    # Ģ
240
      "\xC4\xA3"     => "g",    # ģ
241
      "\xC4\xA4"     => "H",    # Ĥ
242
      "\xC4\xA5"     => "h",    # ĥ
243
      "\xC4\xA6"     => "H",    # Ħ
244
      "\xC4\xA7"     => "h",    # ħ
245
      "\xC4\xA8"     => "I",    # Ĩ
246
      "\xC4\xA9"     => "i",    # ĩ
247
      "\xC4\xAA"     => "I",    # Ī
248
      "\xC4\xAB"     => "i",    # ī
249
      "\xC4\xAC"     => "I",    # Ĭ
250
      "\xC4\xAD"     => "i",    # ĭ
251
      "\xC4\xAE"     => "I",    # Į
252
      "\xC4\xAF"     => "i",    # į
253
      "\xC4\xB0"     => "I",    # İ
254
      "\xC4\xB1"     => "i",    # ı
255
      "\xC4\xB2"     => "IJ",   # Ĳ
255
      "\xC4\xB3"     => "ij",   # ĳ
257
      "\xC4\xB4"     => "J",    # Ĵ
258
      "\xC4\xB5"     => "j",    # ĵ
259
      "\xC4\xB6"     => "K",    # Ķ
260
      "\xC4\xB7"     => "k",    # ķ
261
      "\xC4\xB8"     => "k",    # ĸ
262
      "\xC4\xB9"     => "L",    # Ĺ
263
      "\xC4\xBA"     => "l",    # ĺ
264
      "\xC4\xBB"     => "L",    # Ļ
265
      "\xC4\xBC"     => "l",    # ļ
266
      "\xC4\xBD"     => "L",    # Ľ
267
      "\xC4\xBE"     => "l",    # ľ
268
      "\xC4\xBF"     => "L",    # Ŀ
269
      "\xC5\x80"     => "l",    # ŀ
270
      "\xC5\x81"     => "L",    # Ł
271
      "\xC5\x82"     => "l",    # ł
272
      "\xC5\x83"     => "N",    # Ń
273
      "\xC5\x84"     => "n",    # ń
274
      "\xC5\x85"     => "N",    # Ņ
275
      "\xC5\x86"     => "n",    # ņ
276
      "\xC5\x87"     => "N",    # Ň
277
      "\xC5\x88"     => "n",    # ň
278
      "\xC5\x89"     => "n",    # ŉ
279
      "\xC5\x8A"     => "N",    # Ŋ
280
      "\xC5\x8B"     => "n",    # ŋ
281
      "\xC5\x8C"     => "O",    # Ō
282
      "\xC5\x8D"     => "o",    # ō
283
      "\xC5\x8E"     => "O",    # Ŏ
284
      "\xC5\x8F"     => "o",    # ŏ
285
      "\xC5\x90"     => "O",    # Ő
286
      "\xC5\x91"     => "o",    # ő
287
      "\xC5\x92"     => "OE",   # Œ
288
      "\xC5\x93"     => "oe",   # œ
289
      "\xC5\x94"     => "R",    # Ŕ
290
      "\xC5\x95"     => "r",    # ŕ
291
      "\xC5\x96"     => "R",    # Ŗ
292
      "\xC5\x97"     => "r",    # ŗ
293
      "\xC5\x98"     => "R",    # Ř
294
      "\xC5\x99"     => "r",    # ř
295
      "\xC5\x9A"     => "S",    # Ś
296
      "\xC5\x9B"     => "s",    # ś
297
      "\xC5\x9C"     => "S",    # Ŝ
298
      "\xC5\x9D"     => "s",    # ŝ
299
      "\xC5\x9E"     => "S",    # Ş
300
      "\xC5\x9F"     => "s",    # ş
301
      "\xC5\xA0"     => "S",    # Š
302
      "\xC5\xA1"     => "s",    # š
303
      "\xC5\xA2"     => "T",    # Ţ
304
      "\xC5\xA3"     => "t",    # ţ
305
      "\xC5\xA4"     => "T",    # Ť
306
      "\xC5\xA5"     => "t",    # ť
307
      "\xC5\xA6"     => "T",    # Ŧ
308
      "\xC5\xA7"     => "t",    # ŧ
309
      "\xC5\xA8"     => "U",    # Ũ
310
      "\xC5\xA9"     => "u",    # ũ
311
      "\xC5\xAA"     => "U",    # Ū
312
      "\xC5\xAB"     => "u",    # ū
313
      "\xC5\xAC"     => "U",    # Ŭ
314
      "\xC5\xAD"     => "u",    # ŭ
315
      "\xC5\xAE"     => "U",    # Ů
316
      "\xC5\xAF"     => "u",    # ů
317
      "\xC5\xB0"     => "U",    # Ű
318
      "\xC5\xB1"     => "u",    # ű
319
      "\xC5\xB2"     => "U",    # Ų
320
      "\xC5\xB3"     => "u",    # ų
321
      "\xC5\xB4"     => "W",    # Ŵ
322
      "\xC5\xB5"     => "w",    # ŵ
323
      "\xC5\xB6"     => "Y",    # Ŷ
324
      "\xC5\xB7"     => "y",    # ŷ
325
      "\xC5\xB8"     => "Y",    # Ÿ
326
      "\xC5\xB9"     => "Z",    # Ź
327
      "\xC5\xBA"     => "z",    # ź
328
      "\xC5\xBB"     => "Z",    # Ż
329
      "\xC5\xBC"     => "z",    # ż
330
      "\xC5\xBD"     => "Z",    # Ž
331
      "\xC5\xBE"     => "z",    # ž
332
      "\xC5\xBF"     => "f",    # ſ
333
      "\xC6\x80"     => "b",    # ƀ
334
      "\xC6\x81"     => "B",    # Ɓ
335
      "\xC6\x82"     => "B",    # Ƃ
336
      "\xC6\x83"     => "b",    # ƃ
337
      "\xC6\x84"     => "b",    # Ƅ
338
      "\xC6\x85"     => "b",    # ƅ
339
      "\xC6\x86"     => "C",    # Ɔ
340
      "\xC6\x87"     => "C",    # Ƈ
341
      "\xC6\x88"     => "c",    # ƈ
342
      "\xC6\x89"     => "D",    # Ɖ
343
      "\xC6\x8A"     => "D",    # Ɗ
344
      "\xC6\x8B"     => "D",    # Ƌ
345
      "\xC6\x8C"     => "d",    # ƌ
346
      "\xC6\x8D"     => "g",    # ƍ
347
      "\xC6\x8E"     => "E",    # Ǝ
348
      "\xC6\x8F"     => "e",    # Ə
349
      "\xC6\x90"     => "E"     # Ɛ
350
    }.freeze
351
  end
352
  # rubocop:enable Metrics/CollectionLiteralLength
353

354
  # Plain-text alternatives to the HTML special characters RedCloth uses.
355
  unless defined? HTML_SPECIAL_CHAR_EQUIVALENTS
1✔
356
    HTML_SPECIAL_CHAR_EQUIVALENTS = {
1✔
357
      "#64"   => "@",
358
      "amp"   => "&",
359
      "#38"   => "&",
360
      "gt"    => ">",
361
      "#62"   => ">",
362
      "lt"    => "<",
363
      "#60"   => "<",
364
      "quot"  => '"',
365
      "#34"   => '"',
366
      "#39"   => "'",
367
      "#169"  => "(c)",
368
      "#174"  => "(r)",
369
      "#215"  => "x",
370
      "#8211" => "-",
371
      "#8212" => "--",
372
      "#8216" => "'",
373
      "#8217" => "'",
374
      "#8220" => '"',
375
      "#8221" => '"',
376
      "#8230" => "...",
377
      "#8242" => "'",
378
      "#8243" => '"',
379
      "#8482" => "(tm)",
380
      "#8594" => "->",
381
      "nbsp"  => " "
382
    }.freeze
383
  end
384
  # rubocop:enable Layout/HashAlignment
385
  # :startdoc:
386

387
  # This should safely match anything that could possibly be interpreted as
388
  # an HTML tag.
389
  HTML_TAG_PATTERN = %r{</*[A-Za-z][^>]*>}
1✔
390

391
  ### Textile-related methods ###
392

393
  def t(sanitize = true)
1✔
394
    Textile.textilize_without_paragraph_safe(self, do_object_links: false,
294,279✔
395
                                                   sanitize: sanitize)
396
  end
397

398
  def tl(sanitize = true)
1✔
399
    Textile.textilize_without_paragraph_safe(self, do_object_links: true,
2,894✔
400
                                                   sanitize: sanitize)
401
  end
402

403
  # Textilize string, wrapped in a <div>, making it all safe for output
404
  def tp(sanitize = true)
1✔
405
    Textile.textile_div_safe do
2,385✔
406
      Textile.textilize(self, do_object_links: false, sanitize: sanitize)
2,385✔
407
    end
408
  end
409

410
  # Textilize string (with links), wrapped in a <div>,
411
  # making it all safe for output
412
  def tpl(sanitize = true)
1✔
413
    Textile.textile_div_safe do
846✔
414
      Textile.textilize(self, do_object_links: true, sanitize: sanitize)
846✔
415
    end
416
  end
417

418
  def tp_nodiv(sanitize = true)
1✔
419
    Textile.textilize_safe(self, do_object_links: false, sanitize: sanitize)
1✔
420
  end
421

422
  def tl_for_api(sanitize = true)
1✔
423
    return tl(sanitize) unless include?("\n")
1,281✔
424

425
    Textile.textilize_safe(self, do_object_links: true, sanitize: sanitize)
26✔
426
  end
427

428
  ### String transformations ###
429
  #
430
  # Convert string (assumed to be in UTF-8) to plain ASCII.
431
  def to_ascii
1✔
432
    to_s.gsub(/[^\t\n\r\x20-\x7E]/) { |c| UTF8_TO_ASCII[c] || " " }
33✔
433
  end
434

435
  # Convert string (assumed to be in UTF-8) to any other charset.  All invalid
436
  # characters are degraded to their rough ASCII equivalent, then converted.
437
  def iconv(charset)
1✔
438
    encode(charset, fallback: ->(c) { UTF8_TO_ASCII[c] || "?" })
19✔
439
  end
440

441
  # This fixes a string which is supposed to be UTF-8 but which nevertheless
442
  # might have invalid byte sequences and there's nothing we can do to fix it
443
  # "correctly".  This just ignores the invalid sequences so we get at least
444
  # *something* out of the string, instead of just dying and doing nothing.
445
  #
446
  # Found this solution here:
447
  # https://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
448
  def fix_utf8
1✔
449
    str = force_encoding("UTF-8")
7✔
450
    return str if str.valid_encoding?
7✔
451

452
    str.encode("UTF-8", "binary",
1✔
453
               invalid: :replace, undef: :replace, replace: "")
454
  end
455

456
  # Escape a string to be safe to place in double-quotes inside javascript.
457
  # TODO: Use the rails method "j" for this
458
  def escape_js_string
1✔
459
    gsub(/(["\\])/, '\\\1').
×
460
      gsub("\n", '\\n')
461
  end
462

463
  # Remove HTML tags (not entities) from string.  Used to make sure title is
464
  # safe for HTML header field.
465
  def strip_html
1✔
466
    gsub(HTML_TAG_PATTERN, "")
10,963✔
467
  end
468

469
  # Remove hyperlinks from an HTML string.
470
  def strip_links
1✔
471
    gsub(%r{</?a.*?>}, "")
17✔
472
  end
473

474
  # Truncate an HTML string, being careful to close off any open formatting
475
  # tags.  If greater than +max+, truncates to <tt>max - 1</tt> and adds "..."
476
  # to the end (inside any formatting tags open at that point).  Assumes the
477
  # String is well-formatted HTML with properly-nested tags.
478
  def truncate_html(max)
1✔
479
    result = ""
761✔
480
    # make str mutable because it will be modified in place with sub!
481
    str = String.new(self)
761✔
482
    opens = []
761✔
483
    while str != ""
761✔
484
      # Self-closing tag.
485
      if str.sub!(%r{^<(\w+)[^<>]*/ *>}, "")
779✔
486
        result += Regexp.last_match(0)
1✔
487
      # Opening tag.
488
      elsif str.sub!(/^<(\w+)[^<>]*>/, "")
778✔
489
        result += Regexp.last_match(0)
7✔
490
        opens << Regexp.last_match(1)
7✔
491
      # Closing tag -- just assume tags are nested properly.
492
      elsif str.sub!(%r{^< */ *(\w+)[^<>]*>}, "")
771✔
493
        result += Regexp.last_match(0)
3✔
494
        opens.pop
3✔
495
      # Normal text.
496
      elsif str.sub!(/^[^<>]+/, "")
768✔
497
        part = Regexp.last_match(0)
767✔
498
        if part.length > max
767✔
499
          result += "#{part[0, max - 1]}..."
5✔
500
          break
5✔
501
        elsif part
762✔
502
          max -= part.length
762✔
503
          result += part
762✔
504
        end
505
      # All bets are off if not well-formatted HTML.
506
      else
507
        break
1✔
508
      end
509
    end
510
    result += opens.reverse.map { |x| "</#{x}>" }.join
765✔
511
    # Disable cop; we need `html_safe` to prevent Rails from adding escaping
512
    result.html_safe # rubocop:disable Rails/OutputSafety
761✔
513
  end
514

515
  # Attempt to turn HTML into plain text.  Remove all '<blah>' tags, and
516
  # convert '&blah;' codes into ASCII equivalents.  Line breaks may still be a
517
  # problem, but this seems to work pretty well on the output of RedCloth at
518
  # least.
519
  def html_to_ascii
1✔
520
    gsub(/\s*\n\s*/, " "). # remove all newlines first
34,707✔
521
      gsub(%r{</?div[^>]*>}, ""). # divs are messing things up, too
522
      gsub(%r{<br */> *}, "\n"). # put \n after every line break
523
      gsub(%r{</li> *}, "\n"). # put \n after every list item
524
      gsub(%r{</tr> *}, "\n"). # put \n after every table row
525
      gsub(%r{</(p|h\d)> *}, "\n\n"). # put two \n between paragraphs
526
      gsub(%r{</td> *}, "\t"). # put tabs between table columns
527
      gsub(/[ \t]+(\n|$)/, '\\1'). # remove superfluous trailing whitespace
528
      gsub(/\n+\Z/, ""). # remove superfluous newlines at end
529
      gsub(HTML_TAG_PATTERN, ""). # remove all <tags>
530
      gsub(/^ +|[ \t]+$/, ""). # remove leading/trailing sp on each line
531
      gsub_html_special_chars # convert &xxx; and &#nnn; to ascii
532
  end
533

534
  def gsub_html_special_chars
1✔
535
    gsub(
34,707✔
536
      /&(#\d+|[a-zA-Z]+);/
537
    ) { HTML_SPECIAL_CHAR_EQUIVALENTS[Regexp.last_match(1)].to_s }.
321✔
538
      # Disable cop; we need `html_safe` to prevent Rails from adding escaping
539
      html_safe # rubocop:disable Rails/OutputSafety
540
  end
541

542
  # Render special encoded characters as regular characters in HTML
543
  def unescape_html
1✔
544
    CGI.unescapeHTML(self)
2,053✔
545
  end
546

547
  # For integration test comparisons: no tags and no special char encodings
548
  # i.e., the whole string as a human would encounter it in the browser
549
  def as_displayed
1✔
550
    strip_html.unescape_html.strip_squeeze
34✔
551
  end
552

553
  # Rails generates an id for a nested field like "foo[bar]" that's snake_case
554
  # - no brackets. This gets you that string. (used in forms_helper)
555
  # `chomp("_")` is to remove trailing underscores
556
  def id_of_nested_field
1✔
UNCOV
557
    gsub(/[\[\]]+/, "_").chomp("_")
×
558
  end
559

560
  # Insert a line break between the scientific name and the author
561
  # (for styling taxonomic names legibly)
562
  def break_name
1✔
563
    possibles = ["</i></b>", "</i>"]
4,588✔
564
    tag = possibles.each do |x|
4,588✔
565
      break x if include?(x)
4,776✔
566
    end
567
    return self unless tag.is_a?(String)
4,588✔
568

569
    offset = tag.length + 1
4,523✔
570
    ind = rindex(tag)
4,523✔
571
    return self if !ind || !offset || (length <= (ind + offset))
4,523✔
572

573
    insert(ind + offset, "<br/>".html_safe)
3,129✔
574
  end
575

576
  # Wrap the author name in <small> HTML tag, with or without break
577
  # (for styling taxonomic names legibly)
578
  def small_author
1✔
579
    possibles = ["<br/>", "</i></b>", "</i>"]
5,460✔
580
    tag = possibles.each do |x|
5,460✔
581
      break x if include?(x)
7,938✔
582
    end
583
    return self unless tag.is_a?(String)
5,460✔
584

585
    offset = tag.length
5,392✔
586
    ind = rindex(tag)
5,392✔
587
    return self if !ind || !offset || (length <= (ind + offset))
5,392✔
588

589
    insert(length, "</small>".html_safe)
3,509✔
590
    insert(ind + offset, "<small>".html_safe)
3,509✔
591
  end
592

593
  # Strip leading and trailing spaces, and squeeze embedded spaces.
594
  # Differs from Rails "squish" which works on all whitespace
595
  #
596
  # The following two are equivalent:
597
  #
598
  #   string.strip_squeeze
599
  #   string.strip.squeeze(' ')
600
  #
601
  # Why?  Because it lets us do this:
602
  #
603
  #   names = text.split(/\n/).map(&:strip_squeeze)
604
  #
605
  # Example: string = "This  type of string. "
606
  #
607
  #   string.strip_squeeze == "This type of string."
608
  #
609
  def strip_squeeze
1✔
610
    strip.squeeze(" ")
15,583✔
611
  end
612

613
  # Sort of like strip_squeeze, but removes Textile's line breaks in the middle
614
  # of the string (not leading/trailing space). Raw Textile strings may contain
615
  # "\r\n"... Textile's .tpl turns "\r\n\r\n" into a closing/opening "</p><p>"
616
  # and a single "\r\n" into "<br />\n"
617
  # This gets rid of them both. Turns all newlines into single spaces.
618
  def wring_out_textile
1✔
619
    gsub("\r\n", " ").squeeze(" ")
281✔
620
  end
621

622
  # Generate a string of random characters of length +len+.  By default it
623
  # chooses from among the lowercase letters and digits, however you can give
624
  # it an arbitrary set of characters to choose from.  (And they don't have to
625
  # be unique, if you want to change the distribution a little bit.)
626
  #
627
  #   new_password = String.random(10)
628
  #
629
  def self.random(len, chars = "abcdefghijklmnopqrstuvwxyz0123456789")
1✔
630
    result = ""
1,860✔
631
    len.times { result += chars.to_s.rand_char }
32,530✔
632
    result
1,860✔
633
  end
634

635
  # Pick a random character from the String.  Result is a String of length 1.
636
  #
637
  #   char = "jabberwocky".rand_char
638
  #
639
  def rand_char
1✔
640
    self[Kernel.rand(length), 1]
30,671✔
641
  end
642

643
  # Reverse Integer#alphabetize.
644
  #
645
  #   string = integer.alphabetize
646
  #   integer = string.dealphabetize
647
  #   #   0         -> 0
648
  #   #   42        -> g
649
  #   #   123456789 -> 8M0kX
650
  #
651
  #   hex = decimal.alphabetize("0123456789ABCDEF")
652
  #   decimal = hex.dealphabetize("0123456789ABCDEF")
653
  #   #   0         -> 0
654
  #   #   42        -> 2A
655
  #   #   123456789 -> 75BCD15
656
  #
657
  def dealphabetize(
1✔
658
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
659
  )
660
    str      = to_s
22✔
661
    alphabet = alphabet.to_s
22✔
662
    len      = alphabet.length
22✔
663
    str.chars.inject(0) do |num, char|
22✔
664
      i = alphabet.index(char)
58✔
665
      raise("Character not in alphabet: '#{char}'") if i.nil?
58✔
666

667
      num * len + i
57✔
668
    end
669
  end
670

671
  # Find amount first line is indented and remove that from all lines.
672
  def unindent
1✔
673
    gsub(/^#{self[/\A\s*/]}/, "")
2✔
674
  end
675

676
  ### String Queries ###
677
  #
678
  # Does this string start with an ASCII character?
679
  def is_ascii_character?
1✔
680
    dup.force_encoding("binary")[0].ord < 128
15✔
681
  end
682

683
  # Clean a pattern for use in LIKE condition. Takes and returns a String.
684
  # This is a replacement for Query's method `clean_pattern`
685
  def clean_pattern
1✔
686
    gsub(/[%'"\\]/) { |x| "\\#{x}" }.tr("*", "%")
1,060✔
687
  end
688

689
  # Returns percentage match between +self+ and +other+, where 1.0 means the two
690
  # strings are equal, and 0.0 means every character is different.
691
  def percent_match(other)
1✔
692
    max = [length, other.length].max
89✔
693
    1.0 - levenshtein_distance_to(other).to_f / max
89✔
694
  end
695

696
  # Returns number of character edits required to transform +self+ into +other+.
697
  def levenshtein_distance_to(other)
1✔
698
    levenshtein_distance(self, other)
95✔
699
  end
700

701
  # This definition copied from Rails::Generators, which is based directly on
702
  # the Text gem implementation.
703
  def levenshtein_distance(str1, str2)
1✔
704
    s = str1
95✔
705
    t = str2
95✔
706
    n = s.length
95✔
707
    m = t.length
95✔
708

709
    return m if n.zero?
95✔
710
    return n if m.zero?
93✔
711

712
    d = (0..m).to_a
92✔
713
    x = nil
92✔
714

715
    str1.each_char.with_index do |char1, i|
92✔
716
      e = i + 1
2,158✔
717

718
      str2.each_char.with_index do |char2, j|
2,158✔
719
        cost = (char1 == char2 ? 0 : 1)
58,928✔
720
        x = [
721
          d[j + 1] + 1, # insertion
58,928✔
722
          e + 1,        # deletion
723
          d[j] + cost   # substitution
724
        ].min
725
        d[j] = e
58,928✔
726
        e = x
58,928✔
727
      end
728

729
      d[m] = x
2,158✔
730
    end
731

732
    x
92✔
733
  end
734

735
  # Returns the MD5 sum.
736
  def md5sum
1✔
737
    Digest::MD5.hexdigest(self)
×
738
  end
739

740
  ### Misc Utilities ###
741

742
  # Disable cop because it seems like we really want to print, not just log
743
  def print_thing(thing)
1✔
744
    print("#{self}: #{thing.class}: #{thing}\n") # rubocop:disable Rails/Output
×
745
  end
746

747
  def to_boolean
1✔
748
    ActiveRecord::Type::Boolean.new.cast(self)
662✔
749
  end
750
end
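
The `dealphabetize` comments above describe decoding a string as a base-N number whose digits are the characters of an alphabet. The following is a minimal standalone sketch of that idea, not the project's code: `decode_with_alphabet`, `encode_with_alphabet`, and `DEFAULT_ALPHABET` are hypothetical names introduced here, and the encoder only stands in for `Integer#alphabetize`, which is not shown in this file. The values checked are the ones listed in the `dealphabetize` comments.

```ruby
# Standalone sketch; it does not reopen String or depend on the app's
# extensions. All names below are illustrative assumptions.
DEFAULT_ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Decode +str+ as a base-N number whose digits are the characters of
# +alphabet+ (N = alphabet.length). Mirrors the inject loop in dealphabetize.
def decode_with_alphabet(str, alphabet = DEFAULT_ALPHABET)
  base = alphabet.length
  str.each_char.inject(0) do |num, char|
    digit = alphabet.index(char)
    raise(ArgumentError, "Character not in alphabet: '#{char}'") if digit.nil?

    (num * base) + digit
  end
end

# Hypothetical inverse, standing in for Integer#alphabetize (an assumption;
# the real method is defined elsewhere and may differ).
def encode_with_alphabet(int, alphabet = DEFAULT_ALPHABET)
  base = alphabet.length
  return alphabet[0] if int.zero?

  digits = []
  while int.positive?
    # Peel off the least significant digit and prepend its character.
    int, remainder = int.divmod(base)
    digits.unshift(alphabet[remainder])
  end
  digits.join
end

puts decode_with_alphabet("g")                            # 42
puts decode_with_alphabet("75BCD15", "0123456789ABCDEF")  # 123456789
puts encode_with_alphabet(123_456_789)                    # 8M0kX
```

Passing a shorter alphabet changes the base, so the same pair of functions covers both the 62-character default and the hexadecimal example from the comments.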