Build has been canceled!

PolyMathOrg / DataFrame / 5679451719 (pending completion)

push / github / web-flow
Merge pull request #276 from Joshua-Dias-Barreto/InspectorRowNames

Added a column for row names in the inspector table. Fixes #275

13 of 13 new or added lines in 1 file covered (100.0%).

13194 of 13483 relevant lines covered (97.86%).

2.94 hits per line

Source file: /src/DataFrame/DataFrame.class.st (97.26% covered)

"
I am a tabular data structure designed for data analysis.

I store data in a table and provide an API for querying and modifying that data. I know the row and column names associated with the table, which allows you to treat rows as observations and columns as features and to reference them by name. I also know the type of data stored in each column. In general, I am similar to spreadsheets such as Excel or to data frames in other languages, for example pandas (Python) or R.

The efficient data structure that I use to store the data is defined by DataFrameInternal. However, you can think of me as a collection of rows. Every row or column you get from me is an instance of the DataSeries class. I use DataTypeInductor to induce the types of my columns every time they are modified. DataPrettyPrinter prints me as a formatted string table, and DataFrameFTData defines a data source based on me that FastTable uses to display me in the inspector. I provide aggregation and grouping functionality, implemented with the helper class DataFrameGrouped.
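
The sketch below shows this row and column model in a few lines. It assumes a Pharo image with DataFrame loaded, and the names #a, #b, #x and #y are arbitrary. Each row is a DataSeries keyed by column names, and each column is a DataSeries keyed by row names:

```smalltalk
| df row column |
df := DataFrame
	withRows: #( #( 1 2 ) #( 3 4 ) )
	rowNames: #( #a #b )
	columnNames: #( #x #y ).

row := df row: #a.         ""a DataSeries keyed by the column names #x and #y""
column := df column: #y.   ""a DataSeries keyed by the row names #a and #b""

row at: #x.                ""1""
column at: #b.             ""4""
```

Because both directions answer a DataSeries, the same key-based access works for rows and for columns.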

Public API and Key Messages

	Creating an empty data frame (class side):
		- new (empty data frame)
		- new: point (empty data frame with the given dimensions, numberOfRows @ numberOfColumns)
		- withColumnNames: arrayOfColumnNames (empty data frame with column names)
		- withRowNames: arrayOfRowNames (empty data frame with row names)
		- withRowNames: arrayOfRowNames columnNames: arrayOfColumnNames (empty data frame with row and column names)

	Creating a data frame from an array of columns (class side):
		- withColumns: arrayOfArrays
		- withColumns: arrayOfArrays columnNames: arrayOfColumnNames
		- withColumns: arrayOfArrays rowNames: arrayOfRowNames
		- withColumns: arrayOfArrays rowNames: arrayOfRowNames columnNames: arrayOfColumnNames

	Creating a data frame from an array of rows (class side):
		- withRows: arrayOfArrays
		- withRows: arrayOfArrays columnNames: arrayOfColumnNames
		- withRows: arrayOfArrays rowNames: arrayOfRowNames
		- withRows: arrayOfArrays rowNames: arrayOfRowNames columnNames: arrayOfColumnNames

	Converting:
		- asArrayOfColumns
		- asArrayOfRows

	Dimensions:
		- numberOfColumns
		- numberOfRows
		- dimensions (a Point numberOfRows @ numberOfColumns)

	Column and row names:
		- columnNames
		- columnNames: arrayOfNewNames
		- rowNames
		- rowNames: arrayOfNewNames

	Column types:
		- columnTypes (classes of values stored in each column)

	Getting columns:
		- column: columnName
		- columnAt: index
		- columns: arrayOfColumnNames
		- columnsAt: arrayOfIndices
		- columnsFrom: firstIndex to: lastIndex

	Getting rows:
		- row: rowName
		- rowAt: index
		- rows: arrayOfRowNames
		- rowsAt: arrayOfIndices
		- rowsFrom: firstIndex to: lastIndex
		- at: index (same as rowAt:)

	Getting a cell value:
		- at: rowIndex at: columnIndex

	Setting columns:
		- column: columnName put: arrayOrDataSeries
		- columnAt: index put: arrayOrDataSeries

	Setting rows:
		- row: rowName put: arrayOrDataSeries
		- rowAt: index put: arrayOrDataSeries

	Setting a cell value:
		- at: rowIndex at: columnIndex put: value

	Head and tail:
		- head (first 5 rows)
		- head: numberOfRows
		- tail (last 5 rows)
		- tail: numberOfRows

	Adding columns:
		- addColumn: dataSeries
		- addColumn: dataSeries atPosition: index
		- addColumn: array named: columnName
		- addColumn: array named: columnName atPosition: index
		- addEmptyColumnNamed: columnName
		- addEmptyColumnNamed: columnName atPosition: index

	Adding rows:
		- addRow: dataSeries
		- addRow: dataSeries atPosition: index
		- addRow: array named: rowName
		- addRow: array named: rowName atPosition: index
		- addEmptyRowNamed: rowName
		- addEmptyRowNamed: rowName atPosition: index
		- add: dataSeries (same as addRow:)

	Removing columns:
		- removeColumn: columnName
		- removeColumnAt: index

	Removing rows:
		- removeRow: rowName
		- removeRowAt: index
		- removeFirstRow
		- removeLastRow

	Enumerating (over rows):
		- collect: block
		- do: block
		- select: block
		- withKeyDo: block

	Aggregating and grouping:
		- groupBy: columnName (returns an instance of DataFrameGrouped)
		- groupBy: columnName aggregate: selector (groups data and aggregates it with a given function)
		- group: columnNameOrArrayOfColumnNames by: columnName (groups part of data frame)

	Applying:
		- applyElementwise: block (to all columns)
		- toColumn: columnName applyElementwise: block
		- toColumnAt: index applyElementwise: block
		- toColumns: arrayOfColumnNames applyElementwise: block
		- toColumnsAt: arrayOfIndices applyElementwise: block

	Sorting:
		- sortBy: columnName
		- sortDescendingBy: columnName
		- sortBy: columnName using: block

	Statistical functions (applied to quantitative columns):
		- min
		- max
		- range (max minus min)
		- average
		- mean
		- mode
		- median (second quartile)
		- firstQuartile
		- thirdQuartile
		- interquartileRange (third quartile minus first quartile)
		- stdev (standard deviation)
		- variance
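
The workspace sketch below combines several of the messages above. It assumes a Pharo image with DataFrame loaded, and the data and the column names #a, #b and #c are made up for illustration. The statistical functions answer one value per column, keyed by column name:

```smalltalk
| df averages |
df := DataFrame
	withRows: #( #( 1 10 ) #( 2 20 ) #( 3 30 ) )
	columnNames: #( #a #b ).

df addColumn: #( 100 200 300 ) named: #c.   ""append a third column at the end""
df dimensions.                              ""3 @ 3, that is numberOfRows @ numberOfColumns""
df at: 2 at: 3.                             ""200""
(df column: #b) asArray.                    ""#(10 20 30)""

averages := df average.                     ""a DataSeries keyed by column name""
averages at: #a.                            ""2""
```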
 
Internal Representation and Key Implementation Points.

	DataFrameInternal defines how data is stored inside me.
"
Class {
	#name : #DataFrame,
	#superclass : #Collection,
	#instVars : [
		'contents',
		'rowNames',
		'columnNames',
		'dataTypes'
	],
	#category : #'DataFrame-Core'
}

{ #category : #'instance creation' }
DataFrame class >> new: aPoint [

	^ super new initialize: aPoint
]

{ #category : #'instance creation' }
DataFrame class >> withColumnNames: anArrayOfColumnNames [
	"Create an empty data frame with given column names"
	| numberOfColumns df |

	numberOfColumns := anArrayOfColumnNames size.
	df := self new: 0 @ numberOfColumns.

	df columnNames: anArrayOfColumnNames.
	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withColumnNames: anArrayOfColumnNames withRowNames: anArrayOfRowNames [
	"Create an empty data frame with given column and row names"

	| numberOfColumns numberOfRows df |

	numberOfColumns := anArrayOfColumnNames size.
	numberOfRows := anArrayOfRowNames size.
	df := self new: numberOfRows @ numberOfColumns.

	df columnNames: anArrayOfColumnNames.
	df rowNames: anArrayOfRowNames.
	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withColumns: anArrayOfArrays [

	^ self new initializeColumns: anArrayOfArrays
]

{ #category : #'instance creation' }
DataFrame class >> withColumns: anArrayOfArrays columnNames: anArrayOfColumnNames [

	| df |
	df := self withColumns: anArrayOfArrays.
	df columnNames: anArrayOfColumnNames.
	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withColumns: anArrayOfArrays rowNames: anArrayOfRowNames [
	^ anArrayOfArrays
		ifNotEmpty: [ (self withColumns: anArrayOfArrays)
				rowNames: anArrayOfRowNames;
				yourself ]
		ifEmpty: [ self withRowNames: anArrayOfRowNames ]
]

{ #category : #'instance creation' }
DataFrame class >> withColumns: anArrayOfArrays rowNames: anArrayOfRowNames columnNames: anArrayOfColumnNames [
	^ anArrayOfArrays
		ifNotEmpty: [ (self withColumns: anArrayOfArrays)
				rowNames: anArrayOfRowNames;
				columnNames: anArrayOfColumnNames;
				yourself ]
		ifEmpty: [ self withRowNames: anArrayOfRowNames ]
]

{ #category : #'instance creation' }
DataFrame class >> withDataFrameInternal: aDataFrameInternal rowNames: rows columnNames: columns [

	^ self new
		initializeContents: aDataFrameInternal
		rowNames: rows
		columnNames: columns
]

{ #category : #'instance creation' }
DataFrame class >> withRowNames: anArrayOfRowNames [
	"Create an empty data frame with given row names"
	| numberOfRows df |

	numberOfRows := anArrayOfRowNames size.
	df := self new: numberOfRows @ 0.

	df rowNames: anArrayOfRowNames.
	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withRowNames: anArrayOfRowNames columnNames: anArrayOfColumnNames [
	"Create an empty data frame with given row and column names"
	| numberOfRows numberOfColumns df |

	numberOfRows := anArrayOfRowNames size.
	numberOfColumns := anArrayOfColumnNames size.

	df := self new: numberOfRows @ numberOfColumns.

	df rowNames: anArrayOfRowNames.
	df columnNames: anArrayOfColumnNames.

	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withRows: anArrayOfArrays [

	^ self new initializeRows: anArrayOfArrays
]

{ #category : #'instance creation' }
DataFrame class >> withRows: anArrayOfArrays columnNames: anArrayOfColumnNames [
	^ anArrayOfArrays
		ifNotEmpty: [ (self withRows: anArrayOfArrays)
				columnNames: anArrayOfColumnNames;
				yourself ]
		ifEmpty: [ self withColumnNames: anArrayOfColumnNames ]
]

{ #category : #'instance creation' }
DataFrame class >> withRows: anArrayOfArrays rowNames: anArrayOfRowNames [

	| df |
	df := self withRows: anArrayOfArrays.
	df rowNames: anArrayOfRowNames.
	^ df
]

{ #category : #'instance creation' }
DataFrame class >> withRows: anArrayOfArrays rowNames: anArrayOfRowNames columnNames: anArrayOfColumnNames [
	^ anArrayOfArrays
		ifNotEmpty: [ (self withRows: anArrayOfArrays)
				rowNames: anArrayOfRowNames;
				columnNames: anArrayOfColumnNames;
				yourself ]
		ifEmpty: [ self withColumnNames: anArrayOfColumnNames ]
]

{ #category : #comparing }
DataFrame >> , aDataFrame [

	| dataFrame rows |
	self columnNames = aDataFrame columnNames ifFalse: [ self error: 'Not yet supported.' ].
	(self rowNames includesAny: aDataFrame rowNames) ifTrue: [ self error: 'Not yet supported.' ].

	dataFrame := self copy.
	rows := aDataFrame asArrayOfRows.
	aDataFrame rowNames doWithIndex: [ :name :index | dataFrame addRow: (rows at: index) named: name ].

	^ dataFrame
]

{ #category : #comparing }
DataFrame >> = aDataFrame [

	"Most objects will fail here"
	aDataFrame species = self species
		ifFalse: [ ^ false ].

	"This is the fastest way for two data frames with different dimensions"
	aDataFrame dimensions = self dimensions
		ifFalse: [ ^ false ].

	"If the names are different we don't need to iterate through values"
	(aDataFrame rowNames = self rowNames
		and: [ aDataFrame columnNames = self columnNames ])
		ifFalse: [ ^ false ].

	^ aDataFrame contents = self contents
]

{ #category : #adding }
DataFrame >> add: aDataSeries [

	"Add DataSeries as a new row at the end"

	self flag:
		'This method name is misleading. We should decide whether to delete it or keep it.'.
	self addRow: aDataSeries
]

{ #category : #adding }
DataFrame >> addColumn: aDataSeries [
	"Add DataSeries as a new column at the end"

	"(#(#(1 2) #(3 4)) asDataFrame addColumn: #(5 6) asDataSeries named: 3) >>> (#(#(1 2 5) #(3 4 6)) asDataFrame)"

	"(#(#(r1c1 r1c2)) asDataFrame addColumn: #(r1c3) asDataSeries named: 3) >>> (#(#(r1c1 r1c2 r1c3)) asDataFrame)"

	self addColumn: aDataSeries named: aDataSeries name.
	self dataTypes
		at: aDataSeries name
		put: aDataSeries calculateDataType
]

{ #category : #adding }
DataFrame >> addColumn: aDataSeries atPosition: aNumber [
	"Add DataSeries as a new column at the given position"

	"(#(#(1 2) #(3 4)) asDataFrame addColumn: #(5 6) asDataSeries named: 3 atPosition: 3) >>> (#(#(1 2 5) #(3 4 6)) asDataFrame)"

	"(#(#(r1c1 r1c2)) asDataFrame addColumn: #(r1c3) asDataSeries named: 3 atPosition: 3) >>> (#(#(r1c1 r1c2 r1c3)) asDataFrame)"

	self
		addColumn: aDataSeries asArray
		named: aDataSeries name
		atPosition: aNumber
]

{ #category : #adding }
DataFrame >> addColumn: anArray named: aString [
	"Add a new column at the end"
	self addColumn: anArray named: aString atPosition: self numberOfColumns + 1
]

{ #category : #adding }
DataFrame >> addColumn: anArray named: aString atPosition: aNumber [
	"Add a new column at the given position"
	(self columnNames includes: aString)
		ifTrue: [ Error signal: 'A column with that name already exists' ].

	contents addColumn: anArray asArray atPosition: aNumber.
	columnNames add: aString afterIndex: aNumber - 1.
	dataTypes at: aString put: (anArray asDataSeries calculateDataType)
]

{ #category : #adding }
DataFrame >> addEmptyColumnNamed: aString [
	"Add an empty column at the end"
	self addEmptyColumnNamed: aString atPosition: self numberOfColumns + 1
]

{ #category : #adding }
DataFrame >> addEmptyColumnNamed: aString atPosition: aNumber [
	"Add an empty column at the given position"
	self addColumn: (Array new: self numberOfRows) named: aString atPosition: aNumber
]

{ #category : #adding }
DataFrame >> addEmptyRowNamed: aString [
	"Add an empty row at the end"
	self addEmptyRowNamed: aString atPosition: self numberOfRows + 1
]

{ #category : #adding }
DataFrame >> addEmptyRowNamed: aString atPosition: aNumber [
	"Add an empty row at the given position"
	self addRow: (Array new: self numberOfColumns) named: aString atPosition: aNumber
]

{ #category : #adding }
DataFrame >> addRow: aDataSeries [
	"Add DataSeries as a new row at the end"

	"(#(#(1 2) #(3 4)) asDataFrame addRow: #(5 6) asDataSeries named: 3) >>> (#(#(1 2) #(3 4) #(5 6)) asDataFrame)"

	"(#(#(r1c1 r1c2)) asDataFrame addRow: #(r2c1 r2c2) asDataSeries named: 2) >>> (#(#(r1c1 r1c2 ) #(r2c1 r2c2)) asDataFrame)"

	self addRow: aDataSeries atPosition: self numberOfRows + 1
]

{ #category : #adding }
DataFrame >> addRow: aDataSeries atPosition: aNumber [
	"Add DataSeries as a new row at the given position"

	"(#(#(1 2) #(3 4)) asDataFrame addRow: #(5 6) asDataSeries named: 3 atPosition: 3) >>> (#(#(1 2) #(3 4) #(5 6)) asDataFrame)"

	"(#(#(r1c1 r1c2)) asDataFrame addRow: #(r2c1 r2c2) asDataSeries named: 2 atPosition: 2) >>> (#(#(r1c1 r1c2 ) #(r2c1 r2c2)) asDataFrame)"

	| row |
	row := Array new: self columnNames size.
	self columnNames withIndexDo: [ :columnName :index |
		| value |
		value := aDataSeries
			         at: columnName
			         ifAbsent: [ aDataSeries atIndex: index ].
		row at: index put: value ].
	self addRow: row named: aDataSeries name atPosition: aNumber
]

{ #category : #adding }
DataFrame >> addRow: anArray named: aString [
	"Add a new row at the end"
	self addRow: anArray named: aString atPosition: self numberOfRows + 1
]

{ #category : #adding }
DataFrame >> addRow: anArray named: aString atPosition: aNumber [
	"Add a new row at the given position"
	(self rowNames includes: aString)
		ifTrue: [ Error signal: 'A row with that name already exists' ].

	contents addRow: anArray atPosition: aNumber.
	rowNames add: aString afterIndex: aNumber - 1
]

458
{ #category : #applying }
459
DataFrame >> applyElementwise: aBlock [
3✔
460
        "Applies a given block to all columns of a data frame"
3✔
461

3✔
462
        "(#(#(1 2) #(3 4)) asDataFrame applyElementwise:[ :x | x - 1 ]) >>> (#(#(0 1) #(2 3)) asDataFrame)"
3✔
463

3✔
464
        self toColumns: self columnNames applyElementwise: aBlock
3✔
465
]
3✔
466

467
{ #category : #enumerating }
468
DataFrame >> applySize [
×
469
        "Answer a new instance of the receiver with the size of each element at each element position"
×
470

×
471
        ^ self collectWithIndex: [ :r :i |
×
472
                DataSeries
×
473
                        withValues: (r values collect: [ : e | e ifNil: [ 0 ] ifNotNil: [ e size ]])
×
474
                        name: i ]
×
475
]
×
476

477
{ #category : #private }
478
DataFrame >> applyToAllColumns: aSymbol [
3✔
479
"Sends the unary selector, aSymbol, to all columns of DataFrame and collects the result into a DataSeries object. Used by statistical functions of DataFrame"
3✔
480

3✔
481
        | series column |
3✔
482

3✔
483
        series := DataSeries withValues:
3✔
484
                (self columnNames collect: [ :colName |
3✔
485
                        column := self column: colName.
3✔
486
                        column perform: aSymbol ]).
3✔
487

3✔
488
        series name: aSymbol.
3✔
489
        series keys: self columnNames.
3✔
490

3✔
491
        ^ series
3✔
492
]
3✔
493

3✔
494
{ #category : #converting }
3✔
495
DataFrame >> asArray [
3✔
496
        "Converts DataFrame to the array of rows"
3✔
497

3✔
498
        "(#(#(1 2) #(3 4)) asDataFrame asArray) >>> (#(#(1 2) #(3 4)))"
3✔
499

3✔
500
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame asArray) >>> (#(#(r1c1 r1c2) #(r2c1 r2c2)))"
3✔
501

3✔
502
        ^ self asArrayOfRows
3✔
503
]
3✔
504

3✔
505
{ #category : #converting }
3✔
506
DataFrame >> asArrayOfColumns [
3✔
507
        "Converts DataFrame to the array of columns"
3✔
508

3✔
509
        "(#(#(1 2) #(3 4)) asDataFrame asArrayOfColumns) >>> (#(#(1 3) #(2 4)))"
3✔
510

3✔
511
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame asArrayOfColumns) >>> (#(#(r1c1 r2c1) #(r1c2 r2c2)))"
3✔
512

3✔
513
        ^ contents asArrayOfColumns
3✔
514
]
3✔
515

3✔
516
{ #category : #converting }
3✔
517
DataFrame >> asArrayOfRows [
3✔
518
        "Converts DataFrame to the array of rows"
3✔
519

3✔
520
        "(#(#(1 2) #(3 4)) asDataFrame asArrayOfRows) >>> (#(#(1 2) #(3 4)))"
3✔
521

3✔
522
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame asArrayOfRows) >>> (#(#(r1c1 r1c2) #(r2c1 r2c2)))"
3✔
523

3✔
524
        ^ contents asArrayOfRows
3✔
525
]
3✔
526

3✔
527
{ #category : #converting }
3✔
528
DataFrame >> asArrayOfRowsWithName [
3✔
529
        "Answer an OrderedCollection where each item is an Array with:
3✔
530
        - the name of that row, in first place,
3✔
531
        - the contents of that row.
3✔
532
        "
3✔
533

3✔
534
        ^ self rowNames withIndexCollect: [ :name :index |
3✔
535
                Array streamContents: [ :stream |
3✔
536
                        stream nextPut: name;
3✔
537
                                nextPutAll: (self at: index) ] ]
3✔
538
]
3✔
539

3✔
540
{ #category : #accessing }
3✔
541
DataFrame >> at: aNumber [
3✔
542
        "Returns the row of a DataFrame at row index aNumber"
3✔
543

3✔
544
        "(#(#(1 2) #(3 4)) asDataFrame at: 1) >>> (#(1 2) asDataSeries)"
3✔
545

3✔
546
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame at: 2) >>> (#(r2c1 r2c2) asDataSeries)"
3✔
547

3✔
548
        ^ self rowAt: aNumber
3✔
549
]
3✔
550

3✔
551
{ #category : #accessing }
3✔
552
DataFrame >> at: rowNumber at: columnNumber [
3✔
553
        "Returns the value whose row index is rowNumber and column index is columnNumber"
3✔
554

3✔
555
        "(#(#(1 2) #(3 4)) asDataFrame at: 1 at:1) >>> 1"
3✔
556

3✔
557
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame at: 2 at: 1) >>> #r2c1"
3✔
558

3✔
559
        ^ contents at: rowNumber at: columnNumber
3✔
560
]
3✔
561

3✔
562
{ #category : #accessing }
3✔
563
DataFrame >> at: rowNumber at: columnNumber put: value [
3✔
564
        "Replaces the original value of a DataFrame at row index rowNumber and column index columnNumber with a given value"
3✔
565

3✔
566
        "(#(#(1 2) #(3 4)) asDataFrame at: 1 at:1 put: 5) >>> (#(#(5 2) #(3 4)) asDataFrame)"
3✔
567

3✔
568
        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame at: 2 at: 1 put: #R2C1) >>> (#(#(r1c1 r1c2) #(R2C1 r2c2)) asDataFrame)"
3✔
569

3✔
570
        contents at: rowNumber at: columnNumber put: value
3✔
571
]
3✔
572

3✔
573
{ #category : #accessing }
3✔
574
DataFrame >> at: rowIndex at: columnIndex transform: aBlock [
3✔
575
        "Evaluate aBlock on the value at the intersection of rowIndex and columnIndex and replace that value with the result"
3✔
576

3✔
577
        "(#(#(1 2) #(3 4)) asDataFrame at: 1 at:1 transform: [:x| x - 1]) >>>(#(#(0 2) #(3 4)) asDataFrame)"
3✔
578

3✔
579
        | value |
3✔
580
        value := self at: rowIndex at: columnIndex.
3✔
581
        self at: rowIndex at: columnIndex put: (aBlock value: value)
3✔
582
]
3✔
583

3✔
584
{ #category : #accessing }
3✔
585
DataFrame >> at: aNumber transform: aBlock [
3✔
586
        "Evaluate aBlock on the row at aNumber and replace that row with the result"
3✔
587

3✔
588
        "(#(#(1 2) #(3 4)) asDataFrame at: 1 transform: [:x| x - 1]) >>>(#(#(0 1) #(3 4)) asDataFrame)"
3✔
589

3✔
590
        ^ self rowAt: aNumber transform: aBlock
3✔
591
]
3✔
592

{ #category : #accessing }
DataFrame >> atAll: indexes [
	"For polymorphism with other collections: answer the rows at the given indexes."

	"(#(#(1 2) #(3 4) #(5 6)) asDataFrame atAll: #(1 3)) >>> (#(#(1 2) #(5 6)) asDataFrame)"

	"(#(#(r1c1 r1c2) #(r2c1 r2c2) #(r3c1 r3c2)) asDataFrame atAll: #(1 3)) >>> (#(#(r1c1 r1c2) #(r3c1 r3c2)) asDataFrame)"

	^ self rowsAt: indexes
]

{ #category : #statistics }
DataFrame >> average [
	"Answer a DataSeries with the average of each column, that is, the sum of its values divided by the number of values"

	"(#(#(10 3) #(20 1) #(30 2)) asDataFrame average) >>> (Dictionary newFrom: {(1 -> 20).(2 -> 2)})"

	^ self applyToAllColumns: #average
]

{ #category : #'data-types' }
DataFrame >> calculateDataTypes [

	self asArrayOfColumns doWithIndex: [ :column :i |
		self dataTypes
			at: (self columnNames at: i)
			put: column calculateDataType ]
]

{ #category : #comparing }
DataFrame >> closeTo: aDataFrame [
	"(#(#(1 2) #(3 4)) asDataFrame closeTo: #(#(1.0001 1.9999) #(3 4.0001)) asDataFrame ) >>> true"

	"(#(#(1 2) #(3 4)) asDataFrame closeTo: #(#(1 1) #(3 4)) asDataFrame ) >>> false"

	aDataFrame species = self species ifFalse: [ ^ false ].

	aDataFrame dimensions = self dimensions ifFalse: [ ^ false ].

	(aDataFrame rowNames = self rowNames and: [
		 aDataFrame columnNames = self columnNames ]) ifFalse: [ ^ false ].

	1 to: self numberOfRows do: [ :i |
		1 to: self numberOfColumns do: [ :j |
			| value |
			value := self at: i at: j.
			(value isNumber
				 ifTrue: [ value closeTo: (aDataFrame at: i at: j) ]
				 ifFalse: [ value = (aDataFrame at: i at: j) ]) ifFalse: [
				^ false ] ] ].

	^ true
]

{ #category : #comparing }
DataFrame >> closeTo: aDataFrame precision: epsilon [

	"(#(#(1 2) #(3 4)) asDataFrame closeTo: #(#(1.2 2.19) #(3 4)) asDataFrame precision: 0.2 ) >>> true"

	"(#(#(1 2) #(3 4)) asDataFrame closeTo: #(#(1.21 2) #(3 4)) asDataFrame precision: 0.2 ) >>> false"

	aDataFrame species = self species ifFalse: [ ^ false ].

	aDataFrame dimensions = self dimensions ifFalse: [ ^ false ].

	(aDataFrame rowNames = self rowNames and: [ aDataFrame columnNames = self columnNames ]) ifFalse: [ ^ false ].

	1 to: self numberOfRows do: [ :i |
		1 to: self numberOfColumns do: [ :j |
			| value |
			value := self at: i at: j.
			(value isNumber
				 ifTrue: [ value closeTo: (aDataFrame at: i at: j) precision: epsilon ]
				 ifFalse: [ value = (aDataFrame at: i at: j) ]) ifFalse: [ ^ false ] ] ].

	^ true
]

{ #category : #enumerating }
DataFrame >> collect: aBlock [
	"Overrides Collection>>collect: to answer a new DataFrame whose columns are named after the keys of the result of evaluating aBlock on the first row"
	| firstRow newDataFrame |

	firstRow := aBlock value: (self rowAt: 1) copy.
	newDataFrame := self class new: 0@firstRow size.
	newDataFrame columnNames: firstRow keys.

	self do: [:each | newDataFrame add: (aBlock value: each copy)].
	^ newDataFrame
]

{ #category : #enumerating }
DataFrame >> collectWithIndex: aBlock [
	"Like collect:, but aBlock also receives the index of each row. Answers a new DataFrame whose columns are named after the keys of the result of evaluating aBlock on the first row"
	| firstRow newDataFrame |

	firstRow := aBlock value: (self rowAt: 1) copy value: 1.
	newDataFrame := self class new: 0@firstRow size.
	newDataFrame columnNames: firstRow keys.

	self doWithIndex: [ : each : index | newDataFrame add: (aBlock value: each copy value: index) ].
	^ newDataFrame
]

3✔
697
{ #category : #accessing }
3✔
698
DataFrame >> column: columnName [
3✔
699
        "Answer the column with columnName as a DataSeries or signal an exception if a column with that name was not found"
3✔
700
        | index |
3✔
701
        index := self indexOfColumnNamed: columnName.
3✔
702
        ^ self columnAt: index
3✔
703
]
3✔
704

3✔
705
{ #category : #accessing }
3✔
706
DataFrame >> column: columnName ifAbsent: exceptionBlock [
3✔
707
        "Answer the column with columnName as a DataSeries or evaluate exception block if a column with that name was not found"
3✔
708
        | index |
3✔
709
        index := self
3✔
710
                indexOfColumnNamed: columnName
3✔
711
                ifAbsent: [ ^ exceptionBlock value ].
3✔
712

3✔
713
        ^ self columnAt: index
3✔
714
]
3✔
715

3✔
716
{ #category : #accessing }
3✔
717
DataFrame >> column: columnName put: anArray [
3✔
718
        "Replace the current values of column with columnName with anArray or signal an exception if a column with that name was not found"
3✔
719
        | index |
        index := self indexOfColumnNamed: columnName.
        ^ self columnAt: index put: anArray
]

{ #category : #accessing }
DataFrame >> column: columnName put: anArray ifAbsent: exceptionBlock [
        "Replace the current values of the column named columnName with anArray or evaluate exceptionBlock if a column with that name was not found"
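
        "(#(#(1 2) #(3 4)) asDataFrame column: 3 put: #(5 6) ifAbsent: [ 'missing' ]) >>> 'missing'"
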
        | index |
        index := self
                indexOfColumnNamed: columnName
                ifAbsent: [ ^ exceptionBlock value ].

        ^ self columnAt: index put: anArray
]

{ #category : #accessing }
DataFrame >> column: columnName transform: aBlock [
        "Evaluate aBlock on the column named columnName and replace the column with the result. Signal an exception if columnName was not found"
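
        "(#(#(1 2) #(3 4)) asDataFrame column: 2 transform: [ :x | x / 2 ]) >>> (#(#(1 1) #(3 2)) asDataFrame)"
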
        | column |
        column := self column: columnName.
        self column: columnName put: (aBlock value: column) asArray
]

{ #category : #accessing }
DataFrame >> column: columnName transform: aBlock ifAbsent: exceptionBlock [
        "Evaluate aBlock on the column named columnName and replace the column with the result. Evaluate exceptionBlock if columnName was not found"
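
        "(#(#(1 2) #(3 4)) asDataFrame column: 3 transform: [ :x | x / 2 ] ifAbsent: [ 'missing' ]) >>> 'missing'"
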
        | column |
        column := self column: columnName ifAbsent: [ ^ exceptionBlock value ].
        self column: columnName put: (aBlock value: column) asArray
]

{ #category : #accessing }
DataFrame >> columnAt: aNumber [
        "Returns the column of a DataFrame at column index aNumber"

        "(#(#(1 2) #(5 6)) asDataFrame columnAt: 2) >>> (#(2 6) asDataSeries) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame columnAt: 2) >>> (#(r1c2 r2c2) asDataSeries) "

        ^ (DataSeries
                   withKeys: self rowNames
                   values: (contents columnAt: aNumber))
                  name: (self columnNames at: aNumber);
                  yourself
]

{ #category : #accessing }
DataFrame >> columnAt: aNumber put: anArray [
        "Replaces the column at column index aNumber with the contents of anArray"

        "(#(#(1 2) #(3 4)) asDataFrame columnAt: 2 put: #(5 6)) >>> (#(#(1 5) #(3 6)) asDataFrame) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame columnAt: 2 put: #(R1C2 R2C2)) >>> (#(#(r1c1 R1C2) #(r2c1 R2C2)) asDataFrame) "

        anArray size = self numberOfRows ifFalse: [ SizeMismatch signal ].

        contents columnAt: aNumber put: anArray
]

{ #category : #accessing }
DataFrame >> columnAt: aNumber transform: aBlock [
        "Evaluate aBlock on the column at aNumber and replace that column with the result"

        "(#(#(1 2) #(3 4)) asDataFrame columnAt: 2 transform: [ :x | x / 2 ]) >>> (#(#(1 1) #(3 2)) asDataFrame) "

        | column |
        column := self columnAt: aNumber.
        self columnAt: aNumber put: (aBlock value: column) asArray
]

{ #category : #accessing }
DataFrame >> columnNames [
        "Returns the column names of a DataFrame"

        ^ columnNames
]

{ #category : #accessing }
DataFrame >> columnNames: aCollection [
        "Sets the column names of a DataFrame to the contents of aCollection"
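
        "(#(#(1 2)) asDataFrame columnNames: #(a b); columnNames) >>> (#(a b) asOrderedCollection)"
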
        aCollection size = self numberOfColumns
                ifFalse: [ SizeMismatch signal: 'Wrong number of column names' ].

        aCollection asSet size = aCollection size
                ifFalse: [ Error signal: 'All column names must be distinct' ].

        "Read all data types before re-keying them, so that renamings which reuse existing names (for example, swapping two names) do not overwrite each other"
        self columnNames ifNotNil: [
                | types |
                types := self columnNames collect: [ :currentColumnName |
                        dataTypes at: currentColumnName ].
                self columnNames do: [ :currentColumnName |
                        dataTypes removeKey: currentColumnName ].
                aCollection withIndexDo: [ :newColumnName :i |
                        dataTypes at: newColumnName put: (types at: i) ] ].

        columnNames := aCollection asOrderedCollection
]

{ #category : #accessing }
DataFrame >> columns [
        "Returns a collection of all columns"

        "(#(#(1 2) #(3 4)) asDataFrame columns) >>> (#( #(1 3) #(2 4) ) collect: #asDataSeries) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame columns) >>> (#( #(r1c1 r2c1) #(r1c2 r2c2) ) collect: #asDataSeries) "

        ^ (1 to: self numberOfColumns) collect: [ :j | self columnAt: j ]
]

{ #category : #accessing }
DataFrame >> columns: anArrayOfNames [
        "Returns a new DataFrame containing the columns named in anArrayOfNames, in that order"

        | anArrayOfNumbers |

        anArrayOfNumbers := anArrayOfNames
                collect: [ :name |
                        self indexOfColumnNamed: name ].

        ^ self columnsAt: anArrayOfNumbers
]

{ #category : #accessing }
DataFrame >> columns: anArrayOfColumnNames put: anArrayOfArrays [
        "Replaces the columns whose column names are present in the array anArrayOfColumnNames with the contents of the array of arrays anArrayOfArrays"
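
        "(#(#(1 2 3) #(4 5 6)) asDataFrame columns: #(1 3) put: #(#(10 40) #(30 60))) >>> (#(#(10 2 30) #(40 5 60)) asDataFrame)"
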
        anArrayOfArrays size = anArrayOfColumnNames size
                ifFalse: [ SizeMismatch signal ].

        anArrayOfColumnNames with: anArrayOfArrays do: [ :name :array |
                self column: name put: array ]
]

{ #category : #accessing }
DataFrame >> columnsAt: anArrayOfNumbers [
        "Returns a new DataFrame containing the columns at the indices in anArrayOfNumbers, in that order"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame columnsAt: #(1 3)) >>> (#(#(1 3) #(4 6)) asDataFrame)"

        "(#(#(r1c1 r1c2 r1c3) #(r2c1 r2c2 r2c3)) asDataFrame columnsAt: #(1 3)) >>> (#(#(r1c1 r1c3) #(r2c1 r2c3)) asDataFrame)"

        | newColumnNames |
        newColumnNames := anArrayOfNumbers collect: [ :i |
                                  self columnNames at: i ].

        ^ DataFrame
                  withDataFrameInternal: (self contents columnsAt: anArrayOfNumbers)
                  rowNames: self rowNames
                  columnNames: newColumnNames
]

{ #category : #accessing }
DataFrame >> columnsAt: anArrayOfNumbers put: anArrayOfArrays [
        "Replaces the columns whose column indices are present in the array anArrayOfNumbers with the contents of the array of arrays anArrayOfArrays"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame columnsAt: #(1 3) put: #(#(10 40) #(30 60))) >>> (#(#(10 2 30) #(40 5 60)) asDataFrame)"

        anArrayOfArrays size = anArrayOfNumbers size ifFalse: [
                SizeMismatch signal ].

        anArrayOfNumbers
                with: anArrayOfArrays
                do: [ :index :array | self columnAt: index put: array ]
]

{ #category : #accessing }
DataFrame >> columnsFrom: begin to: end [
        "Returns a new DataFrame containing the columns with indices from begin to end (inclusive). If begin is greater than end, the columns are returned in reverse order"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame columnsFrom: 1 to: 2) >>> (#(#(1 2) #(4 5)) asDataFrame)"

        "(#(#(r1c1 r1c2 r1c3) #(r2c1 r2c2 r2c3)) asDataFrame columnsFrom: 1 to: 2) >>> (#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame)"

        | array |
        array := begin < end
                         ifTrue: [ (begin to: end) asArray ]
                         ifFalse: [ (end to: begin) asArray reverse ].

        ^ self columnsAt: array
]

{ #category : #accessing }
DataFrame >> columnsFrom: firstNumber to: secondNumber put: anArrayOfArrays [
        "Replaces the columns with indices from firstNumber to secondNumber (inclusive) with the arrays of anArrayOfArrays, in order"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame columnsFrom: 1 to: 2 put: #(#(7 8) #(9 10))) >>> (#(#(7 9 3) #(8 10 6)) asDataFrame)"

        | interval |
        anArrayOfArrays size = ((firstNumber - secondNumber) abs + 1)
                ifFalse: [ SizeMismatch signal ].

        interval := secondNumber >= firstNumber
                            ifTrue: [ firstNumber to: secondNumber ]
                            ifFalse: [ (secondNumber to: firstNumber) reversed ].

        interval withIndexDo: [ :columnIndex :i |
                self columnAt: columnIndex put: (anArrayOfArrays at: i) ]
]

{ #category : #accessing }
DataFrame >> contents [
        "Returns the DataFrameInternal that stores the values of the DataFrame"

        ^ contents
]

{ #category : #copying }
DataFrame >> copyReplace: missingValue in2DCollectionBy: arrayOfReplacementValues [
        "Answer a copy of me in which every occurrence of missingValue is replaced. The replacement for a cell is the element of arrayOfReplacementValues at the index of that cell's column.

        I am needed by the project pharo-ai/data-imputers. It can work without this method, but replacing the missing values would then take much longer"

        | copy |
        copy := self copy.
        1 to: self numberOfColumns do: [ :columnIndex |
                | replacementValue |
                replacementValue := arrayOfReplacementValues at: columnIndex.
                1 to: self numberOfRows do: [ :rowIndex |
                        (self at: rowIndex at: columnIndex) = missingValue ifTrue: [
                                copy at: rowIndex at: columnIndex put: replacementValue ] ] ].
        ^ copy
]

{ #category : #statistics }
DataFrame >> correlationMatrix [
        "Calculate a correlation matrix (correlation of every column with every column) using Pearson's correlation coefficient"
        ^ self correlationMatrixUsing: DataPearsonCorrelationMethod
]

{ #category : #statistics }
DataFrame >> correlationMatrixUsing: aCorrelationCoefficient [
        "Calculate a correlation matrix (correlation of every column with every column) using the given correlation coefficient"

        | numericalColumnNames correlationMatrix firstColumn secondColumn correlation |

        numericalColumnNames := self columnNames select: [ :columnName |
                (self column: columnName) isNumerical ].

        numericalColumnNames ifEmpty: [
                Error signal: 'This data frame does not have any numerical columns' ].

        correlationMatrix := self class
                withRowNames: numericalColumnNames
                columnNames: numericalColumnNames.

        1 to: numericalColumnNames size do: [ :i |
                1 to: i - 1 do: [ :j |
                        firstColumn := self column: (numericalColumnNames at: i).
                        secondColumn := self column: (numericalColumnNames at: j).
                        correlation := firstColumn correlationWith: secondColumn using: aCorrelationCoefficient.

                        correlationMatrix at: i at: j put: correlation.
                        correlationMatrix at: j at: i put: correlation ] ].

        1 to: numericalColumnNames size do: [ :i |
                correlationMatrix at: i at: i put: 1 ].

        ^ correlationMatrix
]

{ #category : #accessing }
DataFrame >> crossTabulate: colName1 with: colName2 [
        "Returns the cross tabulation of a column named colName1 with the column named colName2 of the DataFrame"

        | col1 col2 |

        col1 := self column: colName1.
        col2 := self column: colName2.

        ^ col1 crossTabulateWith: col2
]

{ #category : #copying }
DataFrame >> dataPreProcessingEncodeWith: anEncoder [
        "This method is here to speed up pharo-ai/data-preprocessing algorithms without coupling both projects. Answer a copy of me in which each value is replaced by its index in the categories that anEncoder holds for that value's column"

        | copy cache |
        copy := self copy.
        cache := IdentityDictionary new.
        self columns doWithIndex: [ :dataSeries :columnIndex |
                | category |
                category := cache at: columnIndex ifAbsentPut: [ ((anEncoder categories at: columnIndex) collectWithIndex: [ :elem :index | elem -> index ]) asDictionary ].
                dataSeries doWithIndex: [ :element :rowIndex |
                        copy at: rowIndex at: columnIndex put: (category at: element ifAbsent: [ AIMissingCategory signalFor: element ]) ] ].

        ^ copy
]

{ #category : #'data-types' }
DataFrame >> dataTypeOfColumn: aColumnName [
        "Returns the data type of the column named aColumnName"

        ^ dataTypes at: aColumnName
]

{ #category : #'data-types' }
DataFrame >> dataTypeOfColumn: aColumnName put: aDataType [
        "Replaces the data type of the column named aColumnName with aDataType"

        dataTypes at: aColumnName put: aDataType
]

{ #category : #'data-types' }
DataFrame >> dataTypeOfColumnAt: aNumber [
        "Returns the data type of the column at column index aNumber"

        ^ self dataTypeOfColumn: (columnNames at: aNumber)
]

{ #category : #'data-types' }
DataFrame >> dataTypeOfColumnAt: aNumber put: aDataType [
        "Replaces the data type of the column at column index aNumber with aDataType"

        ^ self dataTypeOfColumn: (columnNames at: aNumber) put: aDataType
]

{ #category : #accessing }
DataFrame >> dataTypes [
        "Returns a dictionary that maps each column name to the data type of that column"

        ^ dataTypes
]

{ #category : #accessing }
DataFrame >> dataTypes: anObject [
        "Sets the dictionary that maps each column name to the data type of that column"

        dataTypes := anObject
]

{ #category : #accessing }
DataFrame >> defaultHeadTailSize [
        "Returns the default number of rows answered by head"

        ^ 5
]

{ #category : #statistics }
DataFrame >> describe [
        "Answer another data frame with one row per numerical column, holding that column's non-nil count, mean, standard deviation, minimum, quartiles, maximum and data type"

        | content |
        content := self numericalColumns collect: [ :column |
                           {
                                   column countNonNils.
                                   column average.
                                   column stdev.
                                   column min.
                                   column firstQuartile.
                                   column secondQuartile.
                                   column thirdQuartile.
                                   column max.
                                   column calculateDataType } ].

        ^ self class
                  withRows: content
                  rowNames: self numericalColumnNames
                  columnNames: #( count mean std min '25%' '50%' '75%' max dtype )
]

{ #category : #accessing }
DataFrame >> dimensions [
        "Returns the number of rows and number of columns in a DataFrame"

        "(#(#(1 2) #(3 4)) asDataFrame dimensions) >>> (2@2)"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame dimensions) >>> (3@2)"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame dimensions) >>> (2@3)"

        ^ self numberOfRows @ self numberOfColumns
]

{ #category : #enumerating }
DataFrame >> do: aBlock [
        "Evaluate aBlock with each row of the data frame (as a DataSeries). Modifications made to the row inside aBlock are written back to the data frame"
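
        "(#(#(1 2) #(3 4)) asDataFrame do: [ :row | row at: 1 put: 0 ]) >>> (#(#(0 2) #(0 4)) asDataFrame)"
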
        | row |

        1 to: self numberOfRows do: [ :i |
                row := self rowAt: i.
                aBlock value: row.
                "A hack to allow modification of rows inside the do: block"
                self rowAt: i put: row asArray ]
]

{ #category : #'find-select' }
DataFrame >> findAll: anObject atColumn: columnName [
        "Returns the names of the rows whose value in the column named columnName equals anObject"
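
        "(#(#(1 a) #(2 b) #(3 a)) asDataFrame findAll: #a atColumn: 2) >>> (#(1 3) asOrderedCollection)"
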
        ^ self rowNames select: [ :row | ((self column: columnName) at: row) = anObject ]
]

{ #category : #'find-select' }
DataFrame >> findAllIndicesOf: anObject atColumn: columnName [
        "Returns the indices of the rows whose value in the column named columnName equals anObject"
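
        "(#(#(1 a) #(2 b) #(3 a)) asDataFrame findAllIndicesOf: #a atColumn: 2) >>> (#(1 3) asOrderedCollection)"
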
        | output |
        output := OrderedCollection new.
        self rowNames withIndexDo: [ :row :index | ((self column: columnName) at: row) = anObject ifTrue: [ output add: index ] ].
        ^ output
]

{ #category : #accessing }
DataFrame >> first [
        "Returns the first row of the DataFrame"

        "(#(#(1 2) #(3 4)) asDataFrame first) >>> (#(1 2) asDataSeries)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame first) >>> (#(r1c1 r1c2) asDataSeries)"

        ^ self at: 1
]

{ #category : #statistics }
DataFrame >> firstQuartile [
        "Returns the first quartile of each column. 25% of the values in a set are smaller than or equal to the first quartile of that set"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame firstQuartile) >>> (Dictionary newFrom: {(1 -> 10).(2 -> 1)})"

        ^ self applyToAllColumns: #firstQuartile
]

{ #category : #private }
DataFrame >> getJointColumnsWith: aDataFrame [
        "Returns the column names of a join of self with aDataFrame: first my column names, then those of aDataFrame. A name present in both data frames gets the suffix '_x' when it comes from me and '_y' when it comes from aDataFrame"

        | columnIntersection outputColumns |
        columnIntersection := (self columnNames intersection: (aDataFrame columnNames)) asSet.
        outputColumns := OrderedCollection new.
        self columnNames do: [ :column |
                (columnIntersection includes: column)
                        ifTrue: [ outputColumns add: column , '_x' ]
                        ifFalse: [ outputColumns add: column ] ].
        aDataFrame columnNames do: [ :column |
                (columnIntersection includes: column)
                        ifTrue: [ outputColumns add: column , '_y' ]
                        ifFalse: [ outputColumns add: column ] ].

        ^ outputColumns
]

{ #category : #grouping }
DataFrame >> group: anAggregateColumnName by: aGroupColumnName aggregateUsing: aBlock [
        "Group the values of the column named anAggregateColumnName by the unique values of the column named aGroupColumnName and aggregate each group using aBlock. The resulting series keeps the name anAggregateColumnName"

        ^ self group: anAggregateColumnName by: aGroupColumnName aggregateUsing: aBlock as: anAggregateColumnName
]

{ #category : #grouping }
DataFrame >> group: anAggregateColumnName by: aGroupColumnName aggregateUsing: aBlock as: aNewColumnName [
        "Group the values of the column named anAggregateColumnName by the unique values of the column named aGroupColumnName and aggregate each group using aBlock. The resulting series is named aNewColumnName"

        | groupColumn aggregateColumn |

        aGroupColumnName = anAggregateColumnName
                ifTrue: [ Error signal: 'Can not group a column by itself' ].

        groupColumn := self column: aGroupColumnName.
        aggregateColumn := self column: anAggregateColumnName.

        ^ aggregateColumn groupBy: groupColumn aggregateUsing: aBlock as: aNewColumnName
]

{ #category : #grouping }
DataFrame >> groupBy: columnName aggregate: anArrayOfUsingAsStatements [
        "Evaluate each block of anArrayOfUsingAsStatements with this data frame and columnName, and answer a new DataFrame whose columns are the resulting aggregated series. The row names are taken from the keys of the first aggregated series"

        | aggregatedColumns |

        aggregatedColumns := anArrayOfUsingAsStatements collect: [ :aBlock |
                aBlock value: self value: columnName ].

        ^ DataFrame
                withColumns: aggregatedColumns
                rowNames: aggregatedColumns first keys
                columnNames: (aggregatedColumns collect: #name)
]

{ #category : #replacing }
DataFrame >> hasNils [
        "Returns true if there is at least one nil value in the data frame and false otherwise"

        "(#(#(nil 2) #(nil 4)) asDataFrame hasNils) >>> true"

        "(#(#('nil' 'nil') #('nil' 'nil')) asDataFrame hasNils) >>> false"

        "(#(#(nil 'nil') #('nil' 'nil')) asDataFrame hasNils) >>> true"

        | arrayOfColumns |
        arrayOfColumns := self asArrayOfColumns.
        1 to: self numberOfColumns do: [ :column |
                1 to: self numberOfRows do: [ :row |
                        ((arrayOfColumns at: column) at: row) ifNil: [ ^ true ] ] ].
        ^ false
]

{ #category : #replacing }
DataFrame >> hasNilsByColumn [
        "Returns a dictionary that maps each column name to true if that column contains at least one nil value and to false otherwise"

        "(#(#(1 2) #(nil 4)) asDataFrame hasNilsByColumn) >>> (Dictionary newFrom: {(1 -> true).(2 -> false)})"

        "(#(#('nil' 'nil') #('nil' 'nil')) asDataFrame hasNilsByColumn) >>> (Dictionary newFrom: {(1 -> false).(2 -> false)})"

        "(#(#(nil 'nil') #('nil' 'nil')) asDataFrame hasNilsByColumn) >>> (Dictionary newFrom: {(1 -> true).(2 -> false)})"

        | dictionary |
        dictionary := Dictionary new.
        self columnNames do: [ :each |
                dictionary at: each put: (self column: each) hasNil ].
        ^ dictionary
]

{ #category : #accessing }
DataFrame >> head [
        "Returns the first 5 rows of the DataFrame (see defaultHeadTailSize), or all rows if it has fewer"
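
        "(#(#(1 2) #(3 4)) asDataFrame head) >>> (#(#(1 2) #(3 4)) asDataFrame)"
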
        ^ self head: self defaultHeadTailSize
]

{ #category : #accessing }
DataFrame >> head: aNumber [
        "Returns the first aNumber rows of a DataFrame, or all rows if it has fewer"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame head: 2) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame head: 1) >>> (#(#(r1c1 r1c2)) asDataFrame)"

        ^ self rowsAt: (1 to: (self numberOfRows min: aNumber))
]

{ #category : #accessing }
DataFrame >> indexOfColumnNamed: columnName [
        "Answer the index of a column with a given name or signal an exception if the column with that name was not found"
        ^ self
                indexOfColumnNamed: columnName
                ifAbsent: [ self error: ('Column ', columnName asString, ' was not found') ]
]

{ #category : #accessing }
DataFrame >> indexOfColumnNamed: columnName ifAbsent: exceptionBlock [
        "Answer the index of a column with a given name or evaluate the exceptionBlock if the column with that name was not found"
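
        "(#(#(1 2) #(3 4)) asDataFrame indexOfColumnNamed: 2 ifAbsent: [ 0 ]) >>> 2"

        "(#(#(1 2) #(3 4)) asDataFrame indexOfColumnNamed: 5 ifAbsent: [ 0 ]) >>> 0"
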
        ^ self columnNames
                indexOf: columnName
                ifAbsent: exceptionBlock
]

{ #category : #accessing }
DataFrame >> indexOfRowNamed: rowName [
        "Answer the index of a row with a given name or signal an exception if the row with that name was not found"
        ^ self
                indexOfRowNamed: rowName
                ifAbsent: [ self error: ('Row ', rowName asString, ' was not found') ]
]

{ #category : #accessing }
DataFrame >> indexOfRowNamed: rowName ifAbsent: exceptionBlock [
        "Answer the index of a row with a given name or evaluate the exceptionBlock if the row with that name was not found"
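
        "(#(#(1 2) #(3 4)) asDataFrame indexOfRowNamed: 3 ifAbsent: [ 0 ]) >>> 0"
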
        ^ self rowNames
                indexOf: rowName
                ifAbsent: exceptionBlock
]

{ #category : #printing }
DataFrame >> info [
        "Answer a string that describes the data frame: the number of entries (rows) and columns and, for each column, its index, name, number of non-nil values and data type"

        ^ String streamContents: [ :aStream |
                  aStream
                          nextPutAll: 'DataFrame: ';
                          print: self size;
                          nextPutAll: ' entries';
                          cr;
                          nextPutAll: 'Data columns (total ';
                          print: self columnNames size;
                          nextPutAll: ' columns):';
                          cr;
                          nextPutAll: ' # | Column | Non-nil count | Dtype';
                          cr;
                          nextPutAll: '---------------------------------------------------';
                          cr.
                  self columnNames doWithIndex: [ :col :index |
                          aStream
                                  print: index;
                                  nextPutAll: ' | '.
                          col isString
                                  ifTrue: [ aStream nextPutAll: col ]
                                  ifFalse: [ aStream print: col ].
                          aStream
                                  nextPutAll: ' | ';
                                  print: ((self columnAt: index) reject: #isNil) size;
                                  nextPutAll: ' non-nil | ';
                                  print: (self dataTypes at: col);
                                  cr ] ]
]

{ #category : #initialization }
DataFrame >> initialize [

        super initialize.

        dataTypes := Dictionary new.
        contents := DataFrameInternal new.
        self setDefaultRowColumnNames.
        self calculateDataTypes
]

{ #category : #initialization }
DataFrame >> initialize: aPoint [

        super initialize.

        contents := DataFrameInternal new: aPoint.
        self setDefaultRowColumnNames.
        self calculateDataTypes
]

{ #category : #initialization }
DataFrame >> initializeColumns: anArrayOfArrays [

        contents := DataFrameInternal withColumns: anArrayOfArrays.
        self setDefaultRowColumnNames.
        self calculateDataTypes
]

{ #category : #initialization }
DataFrame >> initializeContents: aDataFrameInternal rowNames: rows columnNames: columns [

        super initialize.

        contents := aDataFrameInternal.
        self rowNames: rows.
        self columnNames: columns.
        self calculateDataTypes
]

{ #category : #initialization }
DataFrame >> initializeRows: anArrayOfArrays [

        contents := DataFrameInternal withRows: anArrayOfArrays.
        self setDefaultRowColumnNames.
        self calculateDataTypes
]

{ #category : #enumerating }
DataFrame >> inject: thisValue into: binaryBlock [
        | series |
        series := super inject: thisValue into: binaryBlock.
        series name: series defaultName.
        ^ series
]

{ #category : #splitjoin }
DataFrame >> innerJoin: aDataFrame [
3✔
1369
        "Performs inner join on aDataFrame with rowNames as keys"
3✔
1370

3✔
1371
        | outputRows outputDf |
3✔
1372

3✔
1373
        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).
3✔
1374

3✔
1375
        "Using select instead of intersection to preserve order"
3✔
1376
        outputRows := self rowNames select: [ :row | aDataFrame rowNames includes: row ].
3✔
1377
        outputRows do: [ :rowName |
3✔
1378
                | rowToAdd |
3✔
1379
                rowToAdd := (self row: rowName) asArray, (aDataFrame row: rowName) asArray.
3✔
1380
                outputDf addRow: rowToAdd named: rowName.
3✔
1381
                ].
3✔
1382

3✔
1383
        ^ outputDf
3✔
1384
]
3✔
1385

1386
{ #category : #splitjoin }
1387
DataFrame >> innerJoin: aDataFrame on: aColumnName [
3✔
1388
        "Inner join of self with aDataFrame on a column that has a name aColumnName in both data frames"
3✔
1389
        ^ self innerJoin: aDataFrame onLeft: aColumnName onRight: aColumnName
3✔
1390
]
3✔
1391

1392
{ #category : #splitjoin }
1393
DataFrame >> innerJoin: aDataFrame onLeft: leftColumn onRight: rightColumn [
3✔
1394
        "Performs inner join on aDataFrame with rowNames as keys.
3✔
1395
         rowNames are not preserved.
3✔
1396
         Duplicate column names will be appended with '_x' and '_y'."

        | outputRows outputDf |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).

        "Skip the join if either data frame is empty"
        ((self size isZero) | (aDataFrame size isZero)) ifFalse: [
                "Using select instead of intersection to preserve order"
                outputRows := OrderedCollection new.
                (self column: leftColumn) withIndexDo: [ :ele :index |
                        ((aDataFrame column: rightColumn) includes: ele) ifTrue: [ outputRows add: index ] ].
                outputRows do: [ :rowIndex |
                        | rowsWithSameKey rowToAdd |
                        rowsWithSameKey := aDataFrame findAllIndicesOf: (self at: rowIndex at: (self indexOfColumnNamed: leftColumn)) atColumn: rightColumn.
                        rowsWithSameKey do: [ :rightRow |
                                rowToAdd := (self rowAt: rowIndex) asArray, (aDataFrame rowAt: rightRow) asArray.
                                outputDf addRow: rowToAdd named: (outputDf size + 1).
                                ].
                        ].
                ].

        "If both key columns have the same name, keep one copy: drop the '_y' column and rename the '_x' column back to the original name"
        (leftColumn = rightColumn) ifTrue: [
                outputDf removeColumn: (rightColumn, '_y').
                outputDf renameColumn: (leftColumn, '_x') to: leftColumn.
                ].

        ^ outputDf
]

{ #category : #newtools }
DataFrame >> inspectionItems: aBuilder [
        "Builds the table shown in the 'DataFrame' tab of the inspector. An extra column with the row names is added only when they differ from the default 1, 2, ..., n."

        <inspectorPresentationOrder: 0 title: 'DataFrame'>
        | table |
        table := aBuilder newTable.

        table addColumn: (SpIndexTableColumn new
                         title: '#';
                         sortFunction: #yourself ascending;
                         beNotExpandable;
                         yourself).
        self rowNames = (1 to: self numberOfRows) asOrderedCollection
                ifFalse: [
                        table addColumn: (SpStringTableColumn
                                         title: ''
                                         evaluated: [ :rowWithName | rowWithName at: 1 ]) ].

        self columnNames doWithIndex: [ :headerName :columnIndex |
                table addColumn: (SpStringTableColumn
                                 title: headerName
                                 evaluated: [ :rowWithName | rowWithName at: columnIndex + 1 ]) ].

        table items: self asArrayOfRowsWithName.

        ^ table
]

{ #category : #statistics }
DataFrame >> interquartileRange [
        "Returns the interquartile range of each column, i.e. the difference between its third quartile and its first quartile"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame interquartileRange) >>> (Dictionary newFrom: {(1 -> 20).(2 -> 2)})"

        ^ self applyToAllColumns: #interquartileRange
]

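{ #category : #splitjoin }
DataFrame >> leftJoin: aDataFrame [
        "Performs a left join with aDataFrame using rowNames as keys.
         Every row of the receiver is kept; rows whose name does not occur in aDataFrame get nil in the columns of aDataFrame."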
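        "Example (a sketch with illustrative data, assuming the class-side constructor withRows:rowNames:columnNames:):
                | left right |
                left := DataFrame withRows: #(#(1 2) #(3 4)) rowNames: #(#a #b) columnNames: #('x' 'y').
                right := DataFrame withRows: #(#(5) #(6)) rowNames: #(#b #c) columnNames: #('z').
                left leftJoin: right.
         Following the code below, the result should have rows a -> (1 2 nil) and b -> (3 4 5); row c of right is dropped."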

        | outputDf commonRows |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).
        commonRows := self rowNames intersection: aDataFrame rowNames.
        self rowNames do: [ :rowName |
                | rowToAdd |
                rowToAdd := (commonRows includes: rowName)
                        ifTrue: [ (self row: rowName) asArray , (aDataFrame row: rowName) asArray ]
                        ifFalse: [ (self row: rowName) asArray , (Array new: aDataFrame columnNames size) ].
                outputDf addRow: rowToAdd named: rowName ].

        ^ outputDf
]

{ #category : #splitjoin }
DataFrame >> leftJoin: aDataFrame on: aColumnName [
        "Left join of self with aDataFrame on the column named aColumnName, which must exist in both data frames"
        ^ self leftJoin: aDataFrame onLeft: aColumnName onRight: aColumnName
]

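{ #category : #splitjoin }
DataFrame >> leftJoin: aDataFrame onLeft: leftColumn onRight: rightColumn [
        "Performs a left join with aDataFrame, using the values of leftColumn in the receiver and of rightColumn in aDataFrame as keys.
         Every row of the receiver is kept; rows without a matching key get nil in the columns of aDataFrame.
         rowNames are not preserved; output rows are numbered 1, 2, 3, ...
         Duplicate column names will be appended with '_x' and '_y'."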
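        "Example (a sketch with illustrative data, assuming the class-side constructor withRows:columnNames:):
                | people scores |
                people := DataFrame withRows: #(#(1 'Ann') #(2 'Bob') #(3 'Cid')) columnNames: #('id' 'name').
                scores := DataFrame withRows: #(#(2 20) #(4 40)) columnNames: #('id' 'score').
                people leftJoin: scores onLeft: 'id' onRight: 'id'.
         Following the code below, this should give the rows (1 'Ann' nil), (2 'Bob' 20) and (3 'Cid' nil), with a single 'id' column because both key columns share that name."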

        | outputDf commonRows |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).

        commonRows := (self column: leftColumn) asArray intersection: (aDataFrame column: rightColumn) asArray.

        1 to: self size do: [ :rowIndex |
                | rowsWithSameKey rowToAdd |
                (commonRows includes: (self at: rowIndex at: (self indexOfColumnNamed: leftColumn)))
                ifTrue: [
                        "Key present in both data frames: append every matching row of aDataFrame"
                        rowsWithSameKey := aDataFrame findAllIndicesOf: (self at: rowIndex at: (self indexOfColumnNamed: leftColumn)) atColumn: rightColumn.
                        rowsWithSameKey do: [ :rightRow |
                                rowToAdd := (self rowAt: rowIndex) asArray, (aDataFrame rowAt: rightRow) asArray.
                                outputDf addRow: rowToAdd named: (outputDf size + 1).
                                ].
                        ]
                ifFalse: [
                        "Key present only in the receiver: pad the columns of aDataFrame with nils"
                        rowToAdd := (self rowAt: rowIndex) asArray, (Array new: aDataFrame columnNames size).
                        outputDf addRow: rowToAdd named: (outputDf size + 1)
                        ].
                ].

        "If both key columns have the same name, keep one copy: drop the '_y' column and rename the '_x' column back to the original name"
        (leftColumn = rightColumn) ifTrue: [
                outputDf removeColumn: (rightColumn, '_y').
                outputDf renameColumn: (leftColumn, '_x') to: leftColumn.
                ].

        ^ outputDf
]

{ #category : #statistics }
DataFrame >> max [
        "Max is the largest value present in a set of values"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame max) >>> (Dictionary newFrom: {(1 -> 30).(2 -> 3)})"

        ^ self applyToAllColumns: #max
]

{ #category : #statistics }
DataFrame >> median [
        "50% of data points have a value smaller than or equal to the median. The median of a set of values is the middle value of the set when the set is arranged in increasing order."

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame median) >>> (Dictionary newFrom: {(1 -> 20).(2 -> 2)})"

        ^ self applyToAllColumns: #median
]

{ #category : #statistics }
DataFrame >> min [
        "Min is the smallest value present in a set of values"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame min) >>> (Dictionary newFrom: {(1 -> 10).(2 -> 1)})"

        ^ self applyToAllColumns: #min
]

{ #category : #statistics }
DataFrame >> mode [
        "The mode of a set of values is the value that appears most often. "

        "(#(#(10 3) #(10 1) #(30 3)) asDataFrame mode) >>> (Dictionary newFrom: {(1 -> 10).(2 -> 3)})"

        ^ self applyToAllColumns: #mode
]

{ #category : #converting }
DataFrame >> normalized [
        "This method returns a new DataFrame with all the columns normalized, without altering the receiver."

        | normalizers normalizedColumns |
        self deprecated:
                'DataFrame will remove its dependency on normalization in the next version. You can use the pharo-ai/data-preprocessing project to normalize your DataFrame and more!'.
        normalizers := (1 to: self anyOne size) collect: [ :e | self class defaultNormalizerClass new ].

        normalizedColumns := self asArrayOfColumns with: normalizers collect: [ :col :normalizer | col normalizedUsing: normalizer ].

        ^ self class withColumns: normalizedColumns columnNames: self columnNames
]

{ #category : #accessing }
DataFrame >> numberOfColumns [
        "Returns the number of columns of a DataFrame"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame numberOfColumns) >>> 2 "

        "(#(#(1 2 3) #(4 5 6)) asDataFrame numberOfColumns) >>> 3 "

        ^ contents numberOfColumns
]

{ #category : #replacing }
DataFrame >> numberOfNils [
        "Returns a dictionary mapping each column name to the number of nil values in that column"

        "(#(#(nil 2) #(nil 4)) asDataFrame numberOfNils) >>> (Dictionary newFrom: {(1 -> 2).(2 -> 0)})"

        "(#(#('nil' 'nil') #('nil' 'nil')) asDataFrame numberOfNils) >>> (Dictionary newFrom: {(1 -> 0).(2 -> 0)})"

        "(#(#(nil 'nil') #('nil' 'nil')) asDataFrame numberOfNils) >>> (Dictionary newFrom: {(1 -> 1).(2 -> 0)})"

        | dictionary count |
        dictionary := Dictionary new.
        self columnNames do: [ :each |
                count := (self column: each) count: [ :each2 | each2 isNil ].
                dictionary at: each put: count ].
        ^ dictionary
]

{ #category : #accessing }
DataFrame >> numberOfRows [
        "Returns the number of rows of a DataFrame"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame numberOfRows) >>> 3 "

        "(#(#(1 2 3) #(4 5 6)) asDataFrame numberOfRows) >>> 2 "

        ^ contents numberOfRows
]

{ #category : #accessing }
DataFrame >> numericalColumnNames [
        "Returns the names of all numerical columns of the dataframe"
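        "Example (a sketch with illustrative data, assuming the class-side constructor withRows:columnNames:):
                (DataFrame withRows: #(#(1 'a') #(2 'b')) columnNames: #('n' 's')) numericalColumnNames.
         The induced type of column 'n' is a Number subclass, while that of 's' is a String class, so this should answer a collection containing only 'n'."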

        ^ self columnNames select: [ :columnName |
                  (self dataTypes at: columnName) includesBehavior: Number ]
]

{ #category : #accessing }
DataFrame >> numericalColumns [
        "Returns all numerical columns of the dataframe"

        ^ self columns select: [ :column |
                  (self dataTypes at: column name) includesBehavior: Number ]
]

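{ #category : #splitjoin }
DataFrame >> outerJoin: aDataFrame [
        "Performs a full outer join with aDataFrame using rowNames as keys.
         Rows of both data frames are kept; cells with no counterpart in the other data frame are filled with nil."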
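        "Example (a sketch with illustrative data, assuming the class-side constructor withRows:rowNames:columnNames:):
                | left right |
                left := DataFrame withRows: #(#(1 2) #(3 4)) rowNames: #(#a #b) columnNames: #('x' 'y').
                right := DataFrame withRows: #(#(5) #(6)) rowNames: #(#b #c) columnNames: #('z').
                left outerJoin: right.
         Following the code below, the result should have rows a -> (1 2 nil), b -> (3 4 5) and c -> (nil nil 6)."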

        | outputDf commonRows |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).
        commonRows := self rowNames intersection: aDataFrame rowNames.
        self rowNames do: [ :rowName |
                | rowToAdd |
                rowToAdd := (commonRows includes: rowName)
                        ifTrue: [ (self row: rowName) asArray , (aDataFrame row: rowName) asArray ]
                        ifFalse: [ (self row: rowName) asArray , (Array new: aDataFrame columnNames size) ].
                outputDf addRow: rowToAdd named: rowName ].

        aDataFrame rowNames do: [ :rowName |
                (commonRows includes: rowName)
                        ifFalse: [ outputDf
                                addRow: (Array new: self columnNames size) , (aDataFrame row: rowName) asArray
                                named: rowName ] ].

        ^ outputDf
]

{ #category : #splitjoin }
DataFrame >> outerJoin: aDataFrame on: aColumnName [
        "Outer join of self with aDataFrame on the column named aColumnName, which must exist in both data frames"
        ^ self outerJoin: aDataFrame onLeft: aColumnName onRight: aColumnName
]

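{ #category : #splitjoin }
DataFrame >> outerJoin: aDataFrame onLeft: leftColumn onRight: rightColumn [
        "Performs a full outer join with aDataFrame, using the values of leftColumn in the receiver and of rightColumn in aDataFrame as keys.
         Rows of both data frames are kept; cells with no counterpart in the other data frame are filled with nil.
         rowNames are not preserved; output rows are numbered 1, 2, 3, ...
         Duplicate column names will be appended with '_x' and '_y'."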
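        "Example (a sketch with illustrative data, assuming the class-side constructor withRows:columnNames:):
                | people scores |
                people := DataFrame withRows: #(#(1 'Ann') #(2 'Bob') #(3 'Cid')) columnNames: #('id' 'name').
                scores := DataFrame withRows: #(#(2 20) #(4 40)) columnNames: #('id' 'score').
                people outerJoin: scores onLeft: 'id' onRight: 'id'.
         Following the code below, this should give the rows (1 'Ann' nil), (2 'Bob' 20), (3 'Cid' nil) and (4 nil 40).
         For the right-only key 4, the key value is copied into the shared 'id' column rather than left as nil."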

        | outputDf commonRows leftNils |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).

        commonRows := (self column: leftColumn) asArray intersection: (aDataFrame column: rightColumn) asArray.

        1 to: self size do: [ :rowIndex |
                | rowsWithSameKey rowToAdd |
                (commonRows includes: (self at: rowIndex at: (self indexOfColumnNamed: leftColumn)))
                ifTrue: [
                        "Key present in both data frames: append every matching row of aDataFrame"
                        rowsWithSameKey := aDataFrame findAllIndicesOf: (self at: rowIndex at: (self indexOfColumnNamed: leftColumn)) atColumn: rightColumn.
                        rowsWithSameKey do: [ :rightRow |
                                rowToAdd := (self rowAt: rowIndex) asArray, (aDataFrame rowAt: rightRow) asArray.
                                outputDf addRow: rowToAdd named: (outputDf size + 1).
                                ].
                        ]
                ifFalse: [
                        "Key present only in the receiver: pad the columns of aDataFrame with nils"
                        rowToAdd := (self rowAt: rowIndex) asArray, (Array new: aDataFrame columnNames size).
                        outputDf addRow: rowToAdd named: (outputDf size + 1)
                        ].
                ].

        1 to: aDataFrame size do: [ :rowIndex |
                | rowToAdd |
                (commonRows includes: (aDataFrame at: rowIndex at: (aDataFrame indexOfColumnNamed: rightColumn)))
                ifFalse: [
                        "Key present only in aDataFrame: pad the receiver's columns with nils, copying the key into a receiver column named rightColumn if there is one"
                        leftNils := self columnNames collect: [ :col |
                                col = rightColumn
                                        ifTrue: [ (aDataFrame rowAt: rowIndex) at: rightColumn ]
                                        ifFalse: [ nil ] ].
                        rowToAdd := leftNils, (aDataFrame rowAt: rowIndex) asArray.
                        outputDf addRow: rowToAdd named: (outputDf size + 1).
                        ].
                ].

        "If both key columns have the same name, keep one copy: drop the '_y' column and rename the '_x' column back to the original name"
        (leftColumn = rightColumn) ifTrue: [
                outputDf removeColumn: (rightColumn, '_y').
                outputDf renameColumn: (leftColumn, '_x') to: leftColumn.
                ].

        ^ outputDf
]

{ #category : #copying }
DataFrame >> postCopy [

        super postCopy.
        contents := contents copy.
        rowNames := rowNames copy.
        columnNames := columnNames copy.
        dataTypes := dataTypes copy
]

{ #category : #printing }
DataFrame >> printOn: aStream [

        | title |
        title := self class name.
        aStream
                nextPutAll: (title first isVowel ifTrue: ['an '] ifFalse: ['a ']);
                nextPutAll: title;
                space;
                nextPutAll: self dimensions asString
]

{ #category : #private }
DataFrame >> privateRowNames: anArray [
        "I am a private method that skips the assertions, for use when my internal mechanisms know they can safely be skipped."

        rowNames := anArray asOrderedCollection
]

{ #category : #statistics }
DataFrame >> range [
        "Range is the difference between the highest value and the lowest value in a set"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame range) >>> (Dictionary newFrom: {(1 -> 20).(2 -> 2)})"

        ^ self applyToAllColumns: #range
]

{ #category : #removing }
DataFrame >> removeColumn: columnName [
        "Removes the column named columnName from a data frame"

        | index |
        index := self indexOfColumnNamed: columnName.
        self removeColumnAt: index
]

{ #category : #removing }
DataFrame >> removeColumnAt: columnNumber [
        "Removes the column at column index columnNumber from a data frame"

        "(#(#(1 2) #(3 4)) asDataFrame removeColumnAt: 2) >>> (#(#(1) #(3)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame removeColumnAt: 2) >>> (#(#(r1c1) #(r2c1)) asDataFrame)"

        (columnNumber < 1 or: [ columnNumber > self numberOfColumns ])
                ifTrue: [ SubscriptOutOfBounds signalFor: columnNumber ].

        self dataTypes removeKey: (self columnAt: columnNumber) name.

        contents removeColumnAt: columnNumber.
        columnNames := columnNames copyWithoutIndex: columnNumber
]

{ #category : #removing }
DataFrame >> removeColumns: aCollectionOfColumnNames [
        "Removes all columns from a data frame whose names are present in the collection aCollectionOfColumnNames"

        aCollectionOfColumnNames do: [ :each |
                self removeColumn: each.
                ]
]

{ #category : #removing }
DataFrame >> removeColumnsAt: aCollectionOfColumnIndices [
        "Removes all columns from a data frame whose column indices are present in the collection aCollectionOfColumnIndices"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame removeColumnsAt: #(2 3)) >>> (#(#(1) #(4)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame removeColumnsAt: #(1 2)) >>> (#(#() #()) asDataFrame)"

        | columnNamesToRemove |
        columnNamesToRemove := aCollectionOfColumnIndices collect: [ :i |
                                       columnNames at: i ].
        self removeColumns: columnNamesToRemove
]

{ #category : #removing }
DataFrame >> removeColumnsOfRowElementsSatisfing: aBlock onRowNamed: rowName [
        "Removes columns from a data frame whose row elements at the row named rowName satisfy a given block"

        | index |
        index := self indexOfRowNamed: rowName.
        self removeColumnsOfRowElementsSatisfying: aBlock onRow: index
]

{ #category : #removing }
DataFrame >> removeColumnsOfRowElementsSatisfying: aBlock onRow: rowNumber [
        "Removes columns from a data frame whose row elements at the row index rowNumber satisfy a given block"

        "(#(#(1 2 3) #(4 5 6)) asDataFrame removeColumnsOfRowElementsSatisfying: [ :x | x > 4 ] onRow: 2) >>> (#(#(1) #(4)) asDataFrame)"

        | columnNamesCopy |
        (rowNumber < 1 or: [ rowNumber > self numberOfRows ]) ifTrue: [
                SubscriptOutOfBounds signalFor: rowNumber ].

        columnNamesCopy := columnNames deepCopy.
        columnNames removeAll.
        columnNamesCopy withIndexDo: [ :columnName :j |
                (aBlock value: (contents at: rowNumber at: j)) ifFalse: [
                        columnNames add: columnName ] ].
        contents
                removeColumnsOfRowElementsSatisfying: aBlock
                onRow: rowNumber.

        self numberOfColumns = 0 ifTrue: [ rowNames removeAll ]
]

{ #category : #'handling nils' }
DataFrame >> removeColumnsWithNilsAtRow: rowNumber [
        "Removes all columns with nil values at row number rowNumber from the data frame"

        "(#(#(nil 2) #(3 nil)) asDataFrame removeColumnsWithNilsAtRow: 2) >>> (#(#(nil) #(3)) asDataFrame)"

        "(#(#(nil r1c2) #(r2c1 nil)) asDataFrame removeColumnsWithNilsAtRow: 2) >>> (#(#(nil) #(r2c1)) asDataFrame)"

        self
                removeColumnsOfRowElementsSatisfying: [ :ele | ele isNil ]
                onRow: rowNumber
]

{ #category : #'handling nils' }
DataFrame >> removeColumnsWithNilsAtRowNamed: rowName [
        "Removes all columns with nil values at a row named rowName from the data frame"

        self removeColumnsOfRowElementsSatisfing: [ :ele | ele isNil ] onRowNamed: rowName
]

{ #category : #removing }
DataFrame >> removeDuplicatedRows [
        "Removes duplicate rows from a data frame, keeping only the first occurrence of each row"

        "(#(#(1 2) #(3 4) #(1 2)) asDataFrame removeDuplicatedRows) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        "(#(#(r1c1) #(r2c1) #(r2c1) #(r2c1)) asDataFrame removeDuplicatedRows) >>> (#(#(r1c1) #(r2c1)) asDataFrame)"

        | numberOfRows nextRowIndex currentRow row aSet |
        aSet := Set new.
        numberOfRows := self numberOfRows.
        1 to: numberOfRows do: [ :currentRowIndex |
                currentRow := self rowAt: currentRowIndex.
                nextRowIndex := currentRowIndex + 1.
                nextRowIndex to: numberOfRows do: [ :index |
                        row := self rowAt: index.
                        row values = currentRow values ifTrue: [ aSet add: index ] ] ].
        ^ self removeRowsAt: aSet
]

{ #category : #removing }
DataFrame >> removeRow: rowName [
        "Removes the row named rowName from a data frame"

        | index |
        index := self indexOfRowNamed: rowName.
        self removeRowAt: index
]

{ #category : #removing }
DataFrame >> removeRowAt: rowNumber [
        "Removes the row at row index rowNumber from a data frame"

        "(#(#(1 2) #(3 4)) asDataFrame removeRowAt: 2) >>> (#(#(1 2)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame removeRowAt: 2) >>> (#(#(r1c1 r1c2)) asDataFrame)"

        (rowNumber < 1 or: [ rowNumber > self numberOfRows ]) ifTrue: [
                SubscriptOutOfBounds signalFor: rowNumber ].

        contents removeRowAt: rowNumber.
        rowNames := rowNames copyWithoutIndex: rowNumber
]

{ #category : #removing }
DataFrame >> removeRows: aCollectionOfRowNames [
        "Removes all rows from a data frame whose names are present in the collection aCollectionOfRowNames"

        aCollectionOfRowNames do: [ :each |
                self removeRow: each ]
]

{ #category : #removing }
DataFrame >> removeRowsAt: aCollectionOfRowIndices [
        "Removes all rows from a data frame whose row indices are present in the collection aCollectionOfRowIndices"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame removeRowsAt: #(2 3)) >>> (#(#(1 2)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame removeRowsAt: #(2)) >>> (#(#(r1c1 r1c2)) asDataFrame)"

        | rowNamesToRemove |
        rowNamesToRemove := aCollectionOfRowIndices collect: [ :i |
                                    rowNames at: i ].
        self removeRows: rowNamesToRemove
]

{ #category : #removing }
DataFrame >> removeRowsOfColumnElementsSatisfing: aBlock onColumnNamed: columnName [
        "Removes rows from a data frame whose column elements at the column named columnName satisfy a given block"

        | index |
        index := self indexOfColumnNamed: columnName.
        self removeRowsOfColumnElementsSatisfying: aBlock onColumn: index
]

{ #category : #removing }
DataFrame >> removeRowsOfColumnElementsSatisfying: aBlock onColumn: columnNumber [
        "Removes rows from a data frame whose column elements at the column index columnNumber satisfy a given block"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame removeRowsOfColumnElementsSatisfying: [ :x | x >= 4 ] onColumn: 2) >>> (#(#(1 2)) asDataFrame)"

        | rowNamesCopy |
        (columnNumber < 1 or: [ columnNumber > self numberOfColumns ])
                ifTrue: [ SubscriptOutOfBounds signalFor: columnNumber ].

        rowNamesCopy := rowNames deepCopy.
        rowNames removeAll.
        rowNamesCopy withIndexDo: [ :rowName :i |
                (aBlock value: (contents at: i at: columnNumber)) ifFalse: [
                        rowNames add: rowName ] ].
        contents
                removeRowsOfColumnElementsSatisfying: aBlock
                onColumn: columnNumber.

        self numberOfRows = 0 ifTrue: [ columnNames removeAll ]
]

{ #category : #removing }
DataFrame >> removeRowsWithNils [
        "Removes all rows from a data frame which have at least one nil value"

        "(#(#(1 2) #(nil 4) #(5 nil)) asDataFrame removeRowsWithNils) >>> (#(#(1 2)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(nil r2c2)) asDataFrame removeRowsWithNils) >>> (#(#(r1c1 r1c2)) asDataFrame)"

        1 to: self numberOfColumns do: [ :i |
                self
                        removeRowsOfColumnElementsSatisfying: [ :ele | ele isNil ]
                        onColumn: i ]
]

{ #category : #'handling nils' }
DataFrame >> removeRowsWithNilsAtColumn: columnNumber [
        "Removes all rows with nil values at column number columnNumber from the data frame"

        "(#(#(nil 2) #(3 nil)) asDataFrame removeRowsWithNilsAtColumn: 2) >>> (#(#(nil 2)) asDataFrame)"

        "(#(#(nil r1c2) #(r2c1 nil)) asDataFrame removeRowsWithNilsAtColumn: 2) >>> (#(#(nil r1c2)) asDataFrame)"

        self
                removeRowsOfColumnElementsSatisfying: [ :ele | ele isNil ]
                onColumn: columnNumber
]

{ #category : #'handling nils' }
DataFrame >> removeRowsWithNilsAtColumnNamed: columnName [
        "Removes all rows with nil values at a column named columnName from the data frame"

        self removeRowsOfColumnElementsSatisfing: [ :ele | ele isNil ] onColumnNamed: columnName
]

{ #category : #renaming }
DataFrame >> renameColumn: oldName to: newName [
        "Find a column with oldName and rename it to newName"
        | index |
        index := self indexOfColumnNamed: oldName.
        self columnNames at: index put: newName.

        self dataTypes at: newName put: (self dataTypes at: oldName).
        self dataTypes removeKey: oldName
]

{ #category : #renaming }
DataFrame >> renameRow: oldName to: newName [
        "Find a row with oldName and rename it to newName"
        | index |
        index := self indexOfRowNamed: oldName.
        self rowNames at: index put: newName
]

{ #category : #'handling nils' }
DataFrame >> replaceAllNilsWithZeros [

        self deprecated: 'Use #replaceNilsWithZero instead.' transformWith: '`@receiver replaceAllNilsWithZeros' -> '`@receiver replaceNilsWithZero'.

        self replaceNilsWithZero
]

{ #category : #replacing }
DataFrame >> replaceNilsWith: anObject [
        "Replaces all nil values of a data frame with the object anObject"

        "(#(#(nil 2) #(3 nil)) asDataFrame replaceNilsWith: 5) >>> (#(#(5 2) #(3 5)) asDataFrame)"

        "(#(#('nil' 'nil') #('nil' 'nil')) asDataFrame replaceNilsWith: 5) >>> (#(#('nil' 'nil') #('nil' 'nil')) asDataFrame)"

        "(#(#(nil 'nil') #('nil' 'nil')) asDataFrame replaceNilsWith: 5) >>> (#(#(5 'nil') #('nil' 'nil')) asDataFrame)"

        1 to: self numberOfColumns do: [ :columnIndex |
                1 to: self numberOfRows do: [ :rowIndex |
                        (self at: rowIndex at: columnIndex) ifNil: [
                                self at: rowIndex at: columnIndex put: anObject ] ] ]
]

{ #category : #replacing }
DataFrame >> replaceNilsWithAverage [
        "Replaces all nil values of a data frame with the average value of the column in which it is present"

        "(#(#(nil 2) #(3 nil) #(5 6)) asDataFrame replaceNilsWithAverage) >>> (#(#(4 2) #(3 4) #(5 6)) asDataFrame)"

        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithAverage) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        | averageOfColumn |
        1 to: self numberOfColumns do: [ :i |
                averageOfColumn := ((self columnAt: i) select: [ :ele |
                                            ele isNotNil ]) average.
                1 to: self numberOfRows do: [ :j |
                        (self at: j at: i) ifNil: [ self at: j at: i put: averageOfColumn ] ] ]
]

{ #category : #replacing }
DataFrame >> replaceNilsWithMedian [
        "Replaces all nil values of a data frame with the median of the column in which it is present"

        "(#(#(nil 2) #(3 nil) #(5 6) #(7 8)) asDataFrame replaceNilsWithMedian) >>> (#(#(5 2) #(3 6) #(5 6) #(7 8)) asDataFrame)"

        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithMedian) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        | medianOfColumn |
        1 to: self numberOfColumns do: [ :i |
                medianOfColumn := ((self columnAt: i) select: [ :ele | ele isNotNil ])
                                          median.
                1 to: self numberOfRows do: [ :j |
                        (self at: j at: i) ifNil: [ self at: j at: i put: medianOfColumn ] ] ]
]

{ #category : #replacing }
DataFrame >> replaceNilsWithMode [
        "Replaces all nil values of a data frame with the mode of the column in which it is present"

        "(#(#(nil 2) #(3 nil) #(3 2)) asDataFrame replaceNilsWithMode) >>> (#(#(3 2) #(3 2) #(3 2)) asDataFrame)"

        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithMode) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        1 to: self numberOfColumns do: [ :i |
                | modeOfColumn |
                1 to: self numberOfRows do: [ :j |
                        (self at: j at: i) ifNil: [
                                self at: j at: i put: (modeOfColumn ifNil: [
                                                 modeOfColumn := ((self columnAt: i) select: [ :ele |
                                                                          ele isNotNil ]) mode ]) ] ].
                modeOfColumn := nil ]
]

{ #category : #replacing }
2079
DataFrame >> replaceNilsWithNextRowValue [
3✔
2080
        "Replaces all nil values of a data frame with the next non-nil value of the column in which it is present. If there is no non-nil value after it, it is not replaced"
3✔
2081

3✔
2082
        "(#(#(nil 2) #(3 nil)) asDataFrame replaceNilsWithNextRowValue) >>> (#(#(3 2) #(3 nil)) asDataFrame)"
3✔
2083

3✔
2084
        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithNextRowValue) >>> (#(#(1 2) #(3 4)) asDataFrame)"
3✔
2085

3✔
2086
        | value numberOfRows |
3✔
2087
        numberOfRows := self numberOfRows.
3✔
2088
        1 to: self numberOfColumns do: [ :i |
3✔
2089
                self numberOfRows to: 1 by: -1 do: [ :j |
3✔
2090
                        j < numberOfRows ifTrue: [
3✔
2091
                                (self at: j at: i) ifNil: [ self at: j at: i put: value ] ].
3✔
2092
                        value := self at: j at: i ] ]
3✔
2093
]
3✔
2094

{ #category : #replacing }
DataFrame >> replaceNilsWithPreviousRowValue [
        "Replaces all nil values of a data frame with the previous non-nil value of the column in which it is present. If there is no non-nil value before it, it is not replaced"

        "(#(#(nil 2) #(3 nil)) asDataFrame replaceNilsWithPreviousRowValue) >>> (#(#(nil 2) #(3 2)) asDataFrame)"

        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithPreviousRowValue) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        | value |
        1 to: self numberOfColumns do: [ :i |
                1 to: self numberOfRows do: [ :j |
                        j > 1 ifTrue: [
                                (self at: j at: i) ifNil: [ self at: j at: i put: value ] ].
                        value := self at: j at: i ] ]
]

{ #category : #replacing }
DataFrame >> replaceNilsWithZero [
        "Replaces all nil values of a data frame with zero"

        "(#(#(nil 2) #(3 nil)) asDataFrame replaceNilsWithZero) >>> (#(#(0 2) #(3 0)) asDataFrame)"

        "(#(#(1 2) #(3 4)) asDataFrame replaceNilsWithZero) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        self replaceNilsWith: 0
]

{ #category : #splitjoin }
DataFrame >> rightJoin: aDataFrame [
        "Performs a right join of the receiver with aDataFrame, using row names as keys. Every row of aDataFrame is kept; for row names that appear only in aDataFrame, the receiver's columns are filled with nil"

        | outputDf commonRows |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).
        commonRows := self rowNames intersection: aDataFrame rowNames.

        aDataFrame rowNames do: [ :rowName |
                | rowToAdd |
                rowToAdd := (commonRows includes: rowName)
                        ifTrue: [ (self row: rowName) asArray , (aDataFrame row: rowName) asArray ]
                        ifFalse: [ (Array new: self columnNames size) , (aDataFrame row: rowName) asArray ].
                outputDf addRow: rowToAdd named: rowName ].

        ^ outputDf
]

{ #category : #splitjoin }
DataFrame >> rightJoin: aDataFrame on: aColumnName [
        "Performs a right join of the receiver with aDataFrame on the column named aColumnName, which must exist in both data frames"
        ^ self rightJoin: aDataFrame onLeft: aColumnName onRight: aColumnName
]

{ #category : #splitjoin }
DataFrame >> rightJoin: aDataFrame onLeft: leftColumn onRight: rightColumn [
        "Performs a right join of the receiver with aDataFrame, matching the values of leftColumn in the receiver against the values of rightColumn in aDataFrame.
         Row names are not preserved; the rows of the result are numbered from 1.
         Duplicate column names are suffixed with '_x' and '_y'."

        | outputDf commonRows leftNils |

        outputDf := self class withColumnNames: (self getJointColumnsWith: aDataFrame).

        commonRows := (self column: leftColumn) asArray intersection: (aDataFrame column: rightColumn) asArray.

        1 to: aDataFrame size do: [ :rowIndex |
                | rowToAdd rowsWithSameKey rightKey |
                rightKey := aDataFrame at: rowIndex at: (aDataFrame indexOfColumnNamed: rightColumn).
                (commonRows includes: rightKey)
                ifTrue: [
                        "Key present in both data frames: add one output row per matching row of the receiver"
                        rowsWithSameKey := self findAllIndicesOf: rightKey atColumn: leftColumn.
                        rowsWithSameKey do: [ :leftRow |
                                rowToAdd := (self rowAt: leftRow) asArray, (aDataFrame rowAt: rowIndex) asArray.
                                outputDf addRow: rowToAdd named: (outputDf size + 1).
                                ]
                        ]
                ifFalse: [
                        "Key present only in aDataFrame: pad the receiver's columns with nil"
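                        leftNils := self columnNames collect: [ :col |
                                "Copy the key only when both key columns share a name, because that column is kept in the output"
                                (leftColumn = rightColumn and: [ col = leftColumn ])
                                        ifTrue: [ (aDataFrame rowAt: rowIndex) at: rightColumn ]
                                        ifFalse: [ nil ] ].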
                        rowToAdd := leftNils, (aDataFrame rowAt: rowIndex) asArray.
                        outputDf addRow: rowToAdd named: (outputDf size + 1).
                        ].
                ].

        "If both key columns have the same name, keep a single key column: remove the '_y' copy and rename the '_x' copy back to the key name"
        (leftColumn = rightColumn) ifTrue: [
                outputDf removeColumn: (rightColumn, '_y').
                outputDf renameColumn: (leftColumn, '_x') to: leftColumn.
                ].

        ^ outputDf
]

{ #category : #accessing }
DataFrame >> row: rowName [
        "Answer the row with rowName as a DataSeries or signal an exception if a row with that name was not found"
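
        "Example, assuming asDataFrame assigns the default row names 1..n:"
        "(#(#(1 2) #(5 6)) asDataFrame row: 2) >>> (#(5 6) asDataSeries)"
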
        | index |
        index := self indexOfRowNamed: rowName.
        ^ self rowAt: index
]

{ #category : #accessing }
DataFrame >> row: rowName ifAbsent: exceptionBlock [
        "Answer the row with rowName as a DataSeries or evaluate exceptionBlock if a row with that name was not found"
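
        "Example: a missing row name evaluates exceptionBlock (default row names 1..n assumed):"
        "(#(#(1 2) #(5 6)) asDataFrame row: 3 ifAbsent: [ 'missing' ]) >>> 'missing'"
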
        | index |
        index := self
                indexOfRowNamed: rowName
                ifAbsent: [ ^ exceptionBlock value ].

        ^ self rowAt: index
]

{ #category : #accessing }
DataFrame >> row: rowName put: anArray [
        "Replace the values of the row named rowName with anArray or signal an exception if a row with that name was not found"
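
        "Example, assuming the default row names 1..n:"
        "(#(#(1 2) #(3 4)) asDataFrame row: 2 put: #(5 6)) >>> (#(#(1 2) #(5 6)) asDataFrame)"
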
        | index |
        index := self indexOfRowNamed: rowName.
        ^ self rowAt: index put: anArray
]

{ #category : #accessing }
DataFrame >> row: rowName put: anArray ifAbsent: exceptionBlock [
        "Replace the values of the row named rowName with anArray or evaluate exceptionBlock if a row with that name was not found"
        | index |
        index := self
                indexOfRowNamed: rowName
                ifAbsent: [ ^ exceptionBlock value ].

        ^ self rowAt: index put: anArray
]

{ #category : #accessing }
DataFrame >> row: rowName transform: aBlock [
        "Evaluate aBlock on the row named rowName and replace that row with the result. Signal an exception if rowName was not found"
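
        "Example, assuming the default row names 1..n; the block receives the row as a DataSeries:"
        "(#(#(1 2) #(3 4)) asDataFrame row: 2 transform: [ :row | row + 1 ]) >>> (#(#(1 2) #(4 5)) asDataFrame)"
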
        | row |
        row := self row: rowName.
        self row: rowName put: (aBlock value: row) asArray
]

{ #category : #accessing }
DataFrame >> row: rowName transform: aBlock ifAbsent: exceptionBlock [
        "Evaluate aBlock on the row named rowName and replace that row with the result. Evaluate exceptionBlock if rowName was not found"
        | row |
        row := self row: rowName ifAbsent: [ ^ exceptionBlock value ].
        self row: rowName put: (aBlock value: row) asArray
]

{ #category : #accessing }
DataFrame >> rowAt: aNumber [
        "Returns the row at row index aNumber as a DataSeries whose name is the row name and whose keys are the column names"

        "(#(#(1 2) #(5 6)) asDataFrame rowAt: 2) >>> (#(5 6) asDataSeries) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame rowAt: 2) >>> (#(r2c1 r2c2) asDataSeries) "

        | series |
        series := (contents rowAt: aNumber) asDataSeries.
        series name: (self rowNames at: aNumber).
        series keys: self columnNames.
        ^ series
]

{ #category : #accessing }
DataFrame >> rowAt: aNumber put: anArray [
        "Replaces the row at row index aNumber with the contents of anArray. Signals SizeMismatch if anArray does not have one element per column"

        "(#(#(1 2) #(3 4)) asDataFrame rowAt: 2 put: #(5 6)) >>> (#(#(1 2) #(5 6)) asDataFrame) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame rowAt: 2 put: #(R2C1 R2C2)) >>> (#(#(r1c1 r1c2) #(R2C1 R2C2)) asDataFrame) "

        anArray size = self numberOfColumns ifFalse: [ SizeMismatch signal ].

        contents rowAt: aNumber put: anArray
]

{ #category : #accessing }
DataFrame >> rowAt: aNumber transform: aBlock [
        "Evaluate aBlock on the row at aNumber and replace that row with the result"

        "(#(#(1 2) #(3 4)) asDataFrame rowAt: 2 transform: [ :x | x + 1 ]) >>> (#(#(1 2) #(4 5)) asDataFrame) "

        | row |
        row := self rowAt: aNumber.
        self rowAt: aNumber put: (aBlock value: row) asArray
]

{ #category : #accessing }
DataFrame >> rowNames [
        "Returns the row names of a DataFrame"
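
        "Example, assuming asDataFrame assigns the default row names 1..n (asArray hides the concrete collection class):"
        "(#(#(1 2) #(3 4)) asDataFrame rowNames asArray) >>> #(1 2)"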

        ^ rowNames
]

{ #category : #accessing }
DataFrame >> rowNames: anArray [
        "Sets the row names of a DataFrame to the contents of anArray. Signals SizeMismatch if anArray does not contain one name per row, and an Error if the names are not distinct"
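
        "Example (asArray hides the concrete collection class used to store the names):"
        "((#(#(1 2) #(3 4)) asDataFrame rowNames: #(#a #b)) rowNames asArray) >>> #(#a #b)"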

        anArray size = self numberOfRows ifFalse: [ SizeMismatch signal: 'Wrong number of row names' ].

        anArray asSet size = anArray size ifFalse: [ Error signal: 'All row names must be distinct' ].

        self privateRowNames: anArray
]

{ #category : #accessing }
DataFrame >> rows [
        "Returns an Array containing every row as a DataSeries"

        "(#(#(1 2) #(3 4)) asDataFrame rows) >>> (#( #(1 2) #(3 4) ) collect: #asDataSeries) "

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame rows) >>> (#( #(r1c1 r1c2) #(r2c1 r2c2) ) collect: #asDataSeries) "

        ^ (1 to: self numberOfRows) collect: [ :j | self rowAt: j ]
]

{ #category : #accessing }
DataFrame >> rows: anArrayOfNames [
        "Returns a new data frame containing the rows named in anArrayOfNames, in the given order"
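
        "Example, assuming the default row names 1..n:"
        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rows: #(1 2)) >>> (#(#(1 2) #(3 4)) asDataFrame)"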

        | anArrayOfNumbers |

        anArrayOfNumbers := anArrayOfNames
                collect: [ :name |
                        self indexOfRowNamed: name ].

        ^ self rowsAt: anArrayOfNumbers
]

{ #category : #accessing }
DataFrame >> rows: anArrayOfRowNames put: anArrayOfArrays [
        "Replaces the rows named in anArrayOfRowNames with the corresponding arrays in anArrayOfArrays. Signals SizeMismatch if the two arrays differ in size"
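
        "Example, assuming the default row names 1..n:"
        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rows: #(1 3) put: #(#(10 20) #(50 60))) >>> (#(#(10 20) #(3 4) #(50 60)) asDataFrame)"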

        anArrayOfArrays size = anArrayOfRowNames size
                ifFalse: [ SizeMismatch signal ].

        anArrayOfRowNames with: anArrayOfArrays do: [ :name :array |
                self row: name put: array ]
]

{ #category : #accessing }
DataFrame >> rowsAt: anArrayOfNumbers [
        "Returns a new data frame containing the rows at the indices in anArrayOfNumbers, in the given order. The selected rows keep their row names"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rowsAt: #(1 3)) >>> (#(#(1 2) #(5 6)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2) #(r3c1 r3c2)) asDataFrame rowsAt: #(1 3)) >>> (#(#(r1c1 r1c2) #(r3c1 r3c2)) asDataFrame)"

        | newRowNames |
        newRowNames := anArrayOfNumbers collect: [ :i | self rowNames at: i ].

        ^ DataFrame
                  withDataFrameInternal: (self contents rowsAt: anArrayOfNumbers)
                  rowNames: newRowNames
                  columnNames: self columnNames
]

{ #category : #accessing }
DataFrame >> rowsAt: anArrayOfNumbers put: anArrayOfArrays [
        "Replaces the rows at the indices in anArrayOfNumbers with the corresponding arrays in anArrayOfArrays. Signals SizeMismatch if the two arrays differ in size"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rowsAt: #(1 3) put: #((10 20)(50 60))) >>> (#(#(10 20) #(3 4) #(50 60)) asDataFrame)"

        anArrayOfArrays size = anArrayOfNumbers size ifFalse: [
                SizeMismatch signal ].

        anArrayOfNumbers
                with: anArrayOfArrays
                do: [ :index :array | self rowAt: index put: array ]
]

{ #category : #accessing }
DataFrame >> rowsFrom: begin to: end [
        "Returns a new data frame containing the rows from index begin to index end, inclusive"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rowsFrom: 1 to: 2) >>> (#(#(1 2) #(3 4)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2) #(r3c1 r3c2)) asDataFrame rowsFrom: 1 to: 2) >>> (#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame)"

        ^ self rowsAt: (begin to: end)
]

{ #category : #accessing }
DataFrame >> rowsFrom: firstNumber to: secondNumber put: anArrayOfArrays [
        "Replaces the rows from index firstNumber to index secondNumber, inclusive, with the arrays in anArrayOfArrays. If secondNumber is smaller than firstNumber, the rows are filled in reverse order"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rowsFrom: 1 to: 2 put: #(#(7 8) #(9 10))) >>> (#(#(7 8) #(9 10) #(5 6)) asDataFrame)"
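
        "A reversed range fills the rows in reverse order:"
        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame rowsFrom: 2 to: 1 put: #(#(7 8) #(9 10))) >>> (#(#(9 10) #(7 8) #(5 6)) asDataFrame)"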

        | interval |
        anArrayOfArrays size = ((firstNumber - secondNumber) abs + 1)
                ifFalse: [ SizeMismatch signal ].

        interval := secondNumber >= firstNumber
                            ifTrue: [ firstNumber to: secondNumber ]
                            ifFalse: [ (secondNumber to: firstNumber) reversed ].

        interval withIndexDo: [ :rowIndex :i |
                self rowAt: rowIndex put: (anArrayOfArrays at: i) ]
]

{ #category : #enumerating }
DataFrame >> select: aBlock [
        "Evaluate aBlock with each row of the receiver (a DataSeries) as the argument.
        Answer a new data frame containing only the rows for which aBlock
        evaluates to true."
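
        "Example: keep the rows whose value in column 1 is below 4 (default row and column names 1..n assumed):"
        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame select: [ :row | (row at: 1) < 4 ]) >>> (#(#(1 2) #(3 4)) asDataFrame)"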

        | selectedIndexes |

        selectedIndexes := (1 to: self numberOfRows) select: [ :index |
                aBlock value: (self at: index) ].

        ^ self rowsAt: selectedIndexes
]

{ #category : #private }
DataFrame >> setDefaultRowColumnNames [
        "Sets the row names to 1..numberOfRows and the column names to 1..numberOfColumns"

        self privateRowNames: (1 to: self numberOfRows).
        self columnNames: (1 to: self numberOfColumns)
]

{ #category : #accessing }
DataFrame >> size [
        "Returns the number of rows of a DataFrame"

        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame size) >>> 3 "

        "(#(#(1 2 3) #(4 5 6)) asDataFrame size) >>> 2 "

        "(#() asDataFrame size) >>> 0 "

        ^ self numberOfRows
]

{ #category : #sorting }
DataFrame >> sortBy: columnName [
        "Rearranges the rows of the data frame in ascending order of the values in the column named columnName"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortBy: 1) >>> (#(#(1 4) #(2 4) #(3 2)) asDataFrame)"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortBy: 2) >>> (#(#(3 2) #(1 4) #(2 4)) asDataFrame)"

        self sortBy: columnName using: [ :a :b | a <= b ]
]

{ #category : #sorting }
DataFrame >> sortBy: columnName using: aBlock [
        "Rearranges the rows of the data frame in place, ordering them by the values in the column named columnName with aBlock as the sort block. Each row keeps its row name"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortBy: 1 using: [ :a :b | a <= b ]) >>> (#(#(1 4) #(2 4) #(3 2)) asDataFrame)"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortBy: 2 using: [ :a :b | a <= b ]) >>> (#(#(3 2) #(1 4) #(2 4)) asDataFrame)"

        | column sortedKeys newContents |
        column := self column: columnName.
        column := column copy.
        column sort: aBlock.
        sortedKeys := column keys.

        newContents := DataFrameInternal new: self dimensions.

        sortedKeys withIndexDo: [ :key :i |
                newContents rowAt: i put: (self row: key) asArray ].

        contents := newContents.
        self rowNames: sortedKeys
]

{ #category : #sorting }
DataFrame >> sortByAll: arrayOfColumnNames [
        "Sorts the data frame in ascending order by several columns: rows are ordered by the first column in arrayOfColumnNames, rows with equal values there are ordered by the second column, and so on. Implemented as successive sorts from the last column to the first"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortByAll: #(1 2)) >>> (#(#(1 4) #(2 4) #(3 2)) asDataFrame)"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortByAll: #(2 1)) >>> (#(#(3 2) #(1 4) #(2 4)) asDataFrame)"

        arrayOfColumnNames reverseDo: [ :columnName |
                self sortBy: columnName using: [ :a :b | a <= b ] ].
        ^ self
]

{ #category : #sorting }
DataFrame >> sortByRowNames [
        "Sorts the rows of the data frame based on the row names in ascending order"
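
        "Example with symbolic row names:"
        "((#(#(1 2) #(3 4)) asDataFrame rowNames: #(#b #a)) sortByRowNames) >>> (#(#(3 4) #(1 2)) asDataFrame rowNames: #(#a #b))"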

        self sortByRowNamesUsing: [ :a :b | a <= b ]
]

{ #category : #sorting }
DataFrame >> sortByRowNamesUsing: aBlock [
        "Sorts the rows of the data frame based on the row names using the given comparison block"

        | sortedKeys newContents |
        sortedKeys := self rowNames sorted: aBlock.

        newContents := DataFrameInternal new: self dimensions.

        sortedKeys withIndexDo: [ :key :i |
                newContents rowAt: i put: (self row: key) asArray ].

        contents := newContents.
        self rowNames: sortedKeys
]

{ #category : #sorting }
DataFrame >> sortDescendingBy: columnName [
        "Rearranges the rows of the data frame in descending order of the values in the column named columnName"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortDescendingBy: 1) >>> (#(#(3 2) #(2 4) #(1 4)) asDataFrame)"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortDescendingBy: 2) >>> (#(#(1 4) #(2 4) #(3 2)) asDataFrame)"

        self sortBy: columnName using: [ :a :b | a >= b ]
]

{ #category : #sorting }
DataFrame >> sortDescendingByAll: arrayOfColumnNames [
        "Sorts the data frame in descending order by several columns: rows are ordered by the first column in arrayOfColumnNames, rows with equal values there are ordered by the second column, and so on. Implemented as successive sorts from the last column to the first"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortDescendingByAll: #(1 2)) >>> (#(#(3 2) #(2 4) #(1 4)) asDataFrame)"

        "(#(#(3 2) #(1 4) #(2 4)) asDataFrame sortDescendingByAll: #(2 1)) >>> (#(#(2 4) #(1 4) #(3 2)) asDataFrame)"

        arrayOfColumnNames reverseDo: [ :columnName |
                self sortBy: columnName using: [ :a :b | a >= b ] ].
        ^ self
]

{ #category : #sorting }
DataFrame >> sortDescendingByRowNames [
        "Sorts the rows of the data frame based on the row names in descending order"
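
        "Example: symbolic row names ordered from last to first:"
        "((#(#(1 2) #(3 4)) asDataFrame rowNames: #(#a #b)) sortDescendingByRowNames) >>> (#(#(3 4) #(1 2)) asDataFrame rowNames: #(#b #a))"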

        self sortByRowNamesUsing: [ :a :b | a >= b ]
]

{ #category : #statistics }
DataFrame >> stdev [
        "Returns the standard deviation of each column. Standard deviation measures how dispersed the values are around their mean"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame stdev) >>> (Dictionary newFrom: {(1 -> 10).(2 -> 1)})"

        ^ self applyToAllColumns: #stdev
]

{ #category : #accessing }
DataFrame >> tail [
        "Returns the last 5 rows of a DataFrame"

        ^ self tail: self defaultHeadTailSize
]

{ #category : #accessing }
DataFrame >> tail: aNumber [
        "Returns a new data frame containing the last aNumber rows, or all rows if there are fewer than aNumber"
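
        "Examples, assuming the default row names 1..n: the selected rows keep their names, and asking for more rows than exist answers all of them"
        "((#(#(1 2) #(3 4) #(5 6)) asDataFrame tail: 2) rowNames asArray) >>> #(2 3)"
        "(#(#(1 2) #(3 4) #(5 6)) asDataFrame tail: 5) >>> (#(#(1 2) #(3 4) #(5 6)) asDataFrame)"
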
        | rows |
        rows := self numberOfRows.

        ^ self rowsAt: (rows - (rows min: aNumber) + 1 to: rows)
]

{ #category : #statistics }
DataFrame >> thirdQuartile [
        "Returns the third quartile of each column: 75% of the values in a column are smaller than or equal to it"

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame thirdQuartile) >>> (Dictionary newFrom: {(1 -> 30).(2 -> 3)})"

        ^ self applyToAllColumns: #thirdQuartile
]

{ #category : #applying }
DataFrame >> toColumn: columnName applyElementwise: aBlock [
        "Replaces each value in the column named columnName with the result of evaluating aBlock on that value"
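
        "Example, assuming the default column names 1..n:"
        "(#(#(1 2) #(3 4)) asDataFrame toColumn: 2 applyElementwise: [ :x | x * 10 ]) >>> (#(#(1 20) #(3 40)) asDataFrame)"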

        | newValues |
        newValues := (self column: columnName) asArray collect: [ :each | aBlock value: each ].
        self column: columnName put: newValues
]

{ #category : #applying }
DataFrame >> toColumnAt: columnNumber applyElementwise: aBlock [
        "Replaces each value in the column at index columnNumber with the result of evaluating aBlock on that value"

        "(#(#(1 2) #(3 4)) asDataFrame toColumnAt: 1 applyElementwise: [ :x | x - 1 ]) >>> (#(#(0 2) #(2 4)) asDataFrame)"

        | columnName |
        columnName := self columnNames at: columnNumber.
        ^ self toColumn: columnName applyElementwise: aBlock
]

{ #category : #applying }
DataFrame >> toColumns: arrayOfColumnNames applyElementwise: aBlock [
        "Replaces each value in the columns named in arrayOfColumnNames with the result of evaluating aBlock on that value"
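
        "Example, assuming the default column names 1..n:"
        "(#(#(1 2) #(3 4)) asDataFrame toColumns: #(1 2) applyElementwise: [ :x | x * 10 ]) >>> (#(#(10 20) #(30 40)) asDataFrame)"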

        arrayOfColumnNames do: [ :each |
                self toColumn: each applyElementwise: aBlock ]
]

{ #category : #applying }
DataFrame >> toColumnsAt: arrayOfColumnNumbers applyElementwise: aBlock [
        "Replaces each value in the columns at the indices in arrayOfColumnNumbers with the result of evaluating aBlock on that value"

        "(#(#(1 2) #(3 4)) asDataFrame toColumnsAt: #(1 2) applyElementwise: [ :x | x - 1 ]) >>> (#(#(0 1) #(2 3)) asDataFrame)"

        arrayOfColumnNumbers do: [ :each |
                self toColumnAt: each applyElementwise: aBlock ]
]

{ #category : #converting }
DataFrame >> toHtml [
        "Answers a String containing the data frame as an HTML table. The first column holds the row names"

        | html columnWidths dataFrame |
        dataFrame := self copy.
        dataFrame addColumn: dataFrame rowNames named: '#' atPosition: 1.
        html := WriteStream on: String new.
        html
                nextPutAll: '<table border="1" class="dataframe">';
                cr;
                nextPutAll: '  <thead>';
                cr;
                nextPutAll: '    <tr style="text-align: left;">'.

        columnWidths := dataFrame columnNames collect: [ :columnName |
                                | maxWidth |
                                maxWidth := columnName asString size.
                                dataFrame rows do: [ :row |
                                        | value |
                                        value := row at: columnName.
                                        maxWidth := maxWidth max: value printString size ].
                                maxWidth ].

        dataFrame columnNames withIndexDo: [ :columnName :index |
                | paddedColumnName |
                paddedColumnName := columnName asString padRightTo: (columnWidths at: index).
                html
                        nextPutAll: '      <th>';
                        nextPutAll: paddedColumnName;
                        nextPutAll: '</th>';
                        cr ].

        html
                nextPutAll: '    </tr>';
                cr;
                nextPutAll: '  </thead>';
                cr;
                nextPutAll: '  <tbody>';
                cr.

        dataFrame asArrayOfRows do: [ :row |
                html nextPutAll: '    <tr>'.

                row withIndexDo: [ :value :index |
                        | paddedValue |
                        paddedValue := value printString padRightTo:
                                               (columnWidths at: index).
                        index = 1
                                ifFalse: [
                                        html
                                                nextPutAll: '      <td>';
                                                nextPutAll: paddedValue;
                                                nextPutAll: '</td>';
                                                cr ]
                                ifTrue: [
                                        html
                                                nextPutAll: '      <th>';
                                                nextPutAll: paddedValue;
                                                nextPutAll: '</th>';
                                                cr ] ].

                html
                        nextPutAll: '    </tr>';
                        cr ].

        html
                nextPutAll: '  </tbody>';
                cr;
                nextPutAll: '</table>'.

        ^ html contents
]

{ #category : #converting }
DataFrame >> toLatex [
        "Answers a String containing the data frame as a LaTeX tabular environment. The first column holds the row names"

        | latex columnWidths dataFrame |
        dataFrame := self copy.
        dataFrame addColumn: dataFrame rowNames named: '\#' atPosition: 1.
        latex := WriteStream on: String new.
        latex nextPutAll: '\begin{tabular}{|'.
        dataFrame numberOfColumns timesRepeat: [ latex nextPutAll: 'l|' ].
        latex nextPutAll: '}'.
        latex cr.
        latex nextPutAll: '\hline'.
        latex cr.

        columnWidths := dataFrame columnNames collect: [ :columnName |
                                | maxWidth |
                                maxWidth := columnName asString size.
                                dataFrame rows do: [ :row |
                                        | value |
                                        value := row at: columnName.
                                        maxWidth := maxWidth max: value printString size ].
                                maxWidth ].

        dataFrame columnNames withIndexDo: [ :columnName :index |
                | paddedColumnName |
                paddedColumnName := columnName asString padRightTo: (columnWidths at: index).
                index = dataFrame numberOfColumns
                        ifFalse: [ latex nextPutAll: paddedColumnName , ' & ' ]
                        ifTrue: [ latex nextPutAll: paddedColumnName ] ].
        latex nextPutAll: '\\'.
        latex cr.
        latex nextPutAll: '\hline'.
        latex cr.

        dataFrame asArrayOfRows do: [ :row |
                row withIndexDo: [ :value :index |
                        | paddedValue |
                        paddedValue := value printString padRightTo:
                                               (columnWidths at: index).
                        index = dataFrame numberOfColumns
                                ifFalse: [ latex nextPutAll: paddedValue , ' & ' ]
                                ifTrue: [ latex nextPutAll: paddedValue ] ].
                latex nextPutAll: '\\'.
                latex cr.
                latex nextPutAll: '\hline'.
                latex cr ].
        latex nextPutAll: '\end{tabular}'.
        ^ latex contents
]

{ #category : #converting }
DataFrame >> toMarkdown [
        "Answers a String containing the data frame as a Markdown table. The first column holds the row names"

        | markdown columnWidths dataFrame |
        dataFrame := self copy.
        dataFrame addColumn: dataFrame rowNames named: '#' atPosition: 1.
        markdown := WriteStream on: String new.
        markdown nextPutAll: '| '.

        columnWidths := dataFrame columnNames collect: [ :columnName |
                                | maxWidth |
                                maxWidth := columnName asString size.
                                dataFrame rows do: [ :row |
                                        | value |
                                        value := row at: columnName.
                                        maxWidth := maxWidth max: value printString size ].
                                maxWidth ].

        dataFrame columnNames withIndexDo: [ :columnName :index |
                | paddedColumnName |
                paddedColumnName := columnName asString padRightTo: (columnWidths at: index).
                markdown nextPutAll: paddedColumnName , ' | ' ].
        markdown cr.
        markdown nextPutAll: '| '.

        "The separator row has one dash per character of column width (at least one dash)"
        columnWidths do: [ :width |
                markdown nextPutAll: (String new: (width max: 1) withAll: $-) , ' | ' ].

        markdown cr.

        dataFrame asArrayOfRows do: [ :row |
                markdown nextPutAll: '| '.
                row withIndexDo: [ :value :index |
                        | paddedValue |
                        paddedValue := value printString padRightTo:
                                               (columnWidths at: index).
                        markdown nextPutAll: paddedValue , ' | ' ].
                markdown cr ].

        ^ markdown contents
]

2777
{ #category : #converting }
DataFrame >> toString [
        "Answer a String containing the receiver formatted as a plain-text table,
        with columns separated by two spaces. Row names are shown in an extra
        leading column named '#'."

        | stringTable columnWidths dataFrame |
        dataFrame := self copy.
        dataFrame addColumn: dataFrame rowNames named: '#' atPosition: 1.
        stringTable := WriteStream on: String new.

        "The width of each column is the length of its longest entry, header included"
        columnWidths := dataFrame columnNames collect: [ :columnName |
                | maxWidth |
                maxWidth := columnName asString size.
                dataFrame rows do: [ :row |
                        maxWidth := maxWidth max: (row at: columnName) printString size ].
                maxWidth ].

        "Header row"
        dataFrame columnNames withIndexDo: [ :columnName :index |
                stringTable nextPutAll: (columnName asString padRightTo: (columnWidths at: index)) , '  ' ].
        stringTable cr.

        "One line per row"
        dataFrame asArrayOfRows do: [ :row |
                row withIndexDo: [ :value :index |
                        stringTable nextPutAll: (value printString padRightTo: (columnWidths at: index)) , '  ' ].
                stringTable cr ].

        ^ stringTable contents
]

{ #category : #geometry }
DataFrame >> transposed [
        "Answer a transposed copy of the receiver: columns become rows and rows become columns.
        Row names and column names are swapped accordingly."

        "(#(#(1 2) #(3 4)) asDataFrame transposed) >>> (#(#(1 3) #(2 4)) asDataFrame)"

        "(#(#(1 2 3)) asDataFrame transposed) >>> (#(#(1) #(2) #(3)) asDataFrame)"

        "(#(#(r1c1 r1c2) #(r2c1 r2c2)) asDataFrame transposed) >>> (#(#(r1c1 r2c1) #(r1c2 r2c2)) asDataFrame)"

        | transposedDf |
        transposedDf := DataFrame withRows: self asArrayOfColumns.
        transposedDf rowNames: self columnNames.
        transposedDf columnNames: self rowNames.
        ^ transposedDf
]

{ #category : #statistics }
DataFrame >> variance [
        "Answer the variance of each column, keyed by column name. Variance measures how far
        the values of a column are spread out from their mean; it is the square of the standard
        deviation. As the example shows, this is the sample variance (the sum of squared
        deviations divided by n - 1)."

        "(#(#(10 3) #(20 1) #(30 2)) asDataFrame variance) >>> (Dictionary newFrom: {(1 -> 100).(2 -> 1)})"

        ^ self applyToAllColumns: #variance
]

{ #category : #enumerating }
DataFrame >> withIndexCollect: elementAndIndexBlock [
        "Overrides withIndexCollect: so that the result is a DataFrame. Its number of columns is
        the number of values elementAndIndexBlock answers for the first row, and it reuses the
        receiver's column names."

        | firstRow newDataFrame |
        firstRow := (self rowAt: 1) copy.
        newDataFrame := self class new: 0 @ (elementAndIndexBlock value: firstRow value: 1) size.
        newDataFrame columnNames: firstRow keys.

        self withIndexDo: [ :each :index |
                newDataFrame add: (elementAndIndexBlock value: each copy value: index) ].
        ^ newDataFrame
]

{ #category : #enumerating }
DataFrame >> withIndexDo: elementAndIndexBlock [
        "Evaluate elementAndIndexBlock with each row of the receiver and its index as the arguments."

        1 to: self size do: [ :i |
                | row |
                row := self rowAt: i.
                elementAndIndexBlock value: row value: i.

                "A hack: write the row back so that modifications made to it inside the block are kept in the receiver"
                self rowAt: i put: row asArray ]
]

{ #category : #enumerating }
DataFrame >> withIndexReject: elementAndIndexBlock [
        "Evaluate elementAndIndexBlock with each row of the receiver and its index as the arguments.
        Answer a new DataFrame containing only those rows for which elementAndIndexBlock
        evaluates to false."

        ^ self withIndexSelect: [ :row :index | (elementAndIndexBlock value: row value: index) not ]
]

{ #category : #enumerating }
DataFrame >> withIndexSelect: aBlock [
        "Evaluate aBlock with each row of the receiver and its index as the arguments.
        Answer a new DataFrame containing only those rows for which aBlock evaluates to true."

        | selectedIndexes |
        selectedIndexes := (1 to: self numberOfRows) select: [ :index |
                aBlock value: (self at: index) value: index ].

        ^ self rowsAt: selectedIndexes
]