PolyMathOrg / DataFrame / 13409391746

19 Feb 2025 09:30AM UTC coverage: 94.756%. Remained the same
13571 of 14322 relevant lines covered (94.76%), 4.74 hits per line

push · github · web-flow
Enable Pharo 12 and 13 for the CI

Source File: /src/DataFrame-IO/DataFrameTypeDetector.class.st (coverage: 96.04)
"
I am a smart type detector. I receive a column of string values such as #('5' '-1' '0.1') or #('1:10' '2:20' '3:30'), detect the type to which all values in that column can be converted, and convert all values to that type. For example, these two columns become #(5.0 -1.0 0.1) and #(1:10 am 2:20 am 3:30 am), respectively.

My typical application is to detect the data types of data frame columns after the data frame has been read from a CSV file.

I support the following types: Integer, Float, Boolean, Time, DateAndTime, String.

Instead of guessing the column types, I can also be given a mapping from column names to their types, in which case I skip detection for the mapped columns and just convert them. To set the types, use my `columnTypes:` message.

    detector := DataFrameTypeDetector new.
    detector columnTypes: { 'columnName1' -> String. 'columnName2' -> Boolean } asDictionary.

    detector detectTypesAndConvert: aDataFrame

The keys of this mapping must be column names, and each value can be either a block (to perform a custom type conversion) or one of the following classes (or nil), each of which selects one of my standard type conversions:

- String: this does not perform any conversion.
- Integer: convert to an Integer object.
- Float: convert to a Float object.
- Boolean: convert to a Boolean object.
- DateAndTime: convert to a DateAndTime object.
- Time: convert to a Time object.
- nil: attempt to guess the type of the column from the types listed above. This is the default for columns that have no entry in the mapping.

As well as the standard types, I can perform a custom type conversion if the value is a block.

    types := { 'columnName' -> [:series | series collect: [:each | each asCustomType]] } asDictionary.
    detector columnTypes: types.
    detector detectTypesAndConvert: aDataFrame

The block takes a single argument, the column, and must return the converted column. I also handle a mix of standard conversion types and custom converters in the same types dictionary:

    types := { 'columnName1' -> Integer. 'columnName2' -> [:series | series collect: [:each | each asCustomType]] } asDictionary.
    detector columnTypes: types.
    detector detectTypesAndConvert: aDataFrame
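
When a column has no entry in the mapping, I try the types in a fixed order: numbers first (Float if any value is a Float or nil, otherwise Integer), then Boolean, then DateAndTime, then Time. If nothing matches, I leave the column unchanged. As a minimal sketch of that fallback, `detectColumnTypeAndConvert:` can be sent directly to a plain array. Assuming the string parsing behaves as described above, the first array below should come back as Integers, the second as Floats, and the third as Booleans with the nil kept in place:

    detector := DataFrameTypeDetector new.
    detector detectColumnTypeAndConvert: #('1' '2' '3').
    detector detectColumnTypeAndConvert: #('5' '-1' '0.1').
    detector detectColumnTypeAndConvert: #('true' 'False' nil)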

"
Class {
        #name : #DataFrameTypeDetector,
        #superclass : #Object,
        #instVars : [
                'columnTypes',
                'typeMapping'
        ],
        #category : #'DataFrame-IO-Type'
}

{ #category : #testing }
DataFrameTypeDetector >> canAllBeBoolean: anArray [
        "Checks to see if all of the values in the column are strings of true or false (case insensitive) or nil"

        ^ anArray allSatisfy: [ :each | each isNil or: [ each isString and: [ (each sameAs: 'true') | (each sameAs: 'false') ] ] ]
]

{ #category : #testing }
DataFrameTypeDetector >> canAllBeDateAndTime: anArray [
        "Answer true if every non-nil value can be converted with asDateAndTime without signalling an Error"

        [ anArray do: [ :ele | ele ifNotNil: [ ele asDateAndTime ] ] ]
                on: Error
                do: [ ^ false ].
        ^ true
]

{ #category : #testing }
DataFrameTypeDetector >> canAllBeNumber: anArray [
        "Answer true if every value is nil, already a number, or a string that NumberParser accepts as a number"

        ^ anArray allSatisfy: [ :each | each isNil or: [ each isNumber or: [ NumberParser isNumber: each ] ] ]
]

{ #category : #testing }
DataFrameTypeDetector >> canAllBeTime: anArray [
        "Answer true if every non-nil value can be converted with asTime without signalling an Error"

        [ anArray do: [ :ele | ele ifNotNil: [ ele asTime ] ] ]
                on: Error
                do: [ ^ false ].
        ^ true
]

{ #category : #testing }
DataFrameTypeDetector >> canAnyBeFloat: anArray [
        "Answer true if at least one value is nil or converts to a Float. Intended to be used after canAllBeNumber: has answered true"

        ^ anArray anySatisfy: [ :each | each isNil or: [ each asNumber isFloat ] ]
]

{ #category : #accessing }
DataFrameTypeDetector >> columnTypes [

        ^ columnTypes
]

{ #category : #accessing }
DataFrameTypeDetector >> columnTypes: aCollection [

        columnTypes := aCollection
]

{ #category : #converting }
DataFrameTypeDetector >> convertToBoolean: anArray [

        ^ anArray collect: [ :each | each ifNotNil: [ each asLowercase = 'true' ] ]
]

{ #category : #converting }
DataFrameTypeDetector >> convertToDateAndTime: anArray [

        ^ anArray collect: [ :ele | ele ifNotNil: [ ele asDateAndTime ] ]
]

{ #category : #converting }
DataFrameTypeDetector >> convertToFloat: anArray [

        ^ anArray collect: [ :each | each ifNotNil: [ each asNumber asFloat ] ]
]

{ #category : #converting }
DataFrameTypeDetector >> convertToInteger: anArray [

        ^ anArray collect: [ :each | each ifNotNil: [ each asNumber asInteger ] ]
]

{ #category : #converting }
DataFrameTypeDetector >> convertToTime: anArray [

        ^ anArray collect: [ :ele | ele ifNotNil: [ ele asTime ] ]
]

{ #category : #'public API' }
DataFrameTypeDetector >> detectColumnTypeAndConvert: anArray [
        "Try the standard types in order (number, Boolean, DateAndTime, Time) and answer the converted column. Answer anArray unchanged if no type matches"

        (self canAllBeNumber: anArray) ifTrue: [
                ^ (self canAnyBeFloat: anArray)
                          ifTrue: [ self convertToFloat: anArray ]
                          ifFalse: [ self convertToInteger: anArray ] ].

        (self canAllBeBoolean: anArray) ifTrue: [ ^ self convertToBoolean: anArray ].

        (self canAllBeDateAndTime: anArray) ifTrue: [ ^ self convertToDateAndTime: anArray ].

        (self canAllBeTime: anArray) ifTrue: [ ^ self convertToTime: anArray ].

        ^ anArray
]

{ #category : #'public API' }
DataFrameTypeDetector >> detectTypesAndConvert: aDataFrame [

        aDataFrame asArrayOfColumns with: aDataFrame columnNames do: [ :column :columnName |
                | thisColumnType |
                "Get the user-given column type for this column name, and if it wasn't
                         given then use the default type detection"
                thisColumnType := columnTypes at: columnName ifAbsent: [ [ :array | self detectColumnTypeAndConvert: array ] ].
                "We allow users to submit either a class (or nil) that names one of the
                         standard types that we know how to convert, or a block for custom type
                         conversion. Test if it's a block here and if not assume that we can
                         look it up in the type mapping and assign one of the standard type
                         converting blocks."
                thisColumnType isBlock ifFalse: [ thisColumnType := typeMapping at: thisColumnType ].
                "Assign the column with the converted type by passing the original
                         column to the block for type conversion"
                aDataFrame column: columnName put: (thisColumnType value: column) ].
        aDataFrame rowNames: (self detectColumnTypeAndConvert: aDataFrame rowNames)
]

{ #category : #initialization }
DataFrameTypeDetector >> initialize [

        super initialize.
        columnTypes := Dictionary new.
        typeMapping := Dictionary newFrom: {
                                       (Boolean -> [ :array | self convertToBoolean: array ]).
                                       (Float -> [ :array | self convertToFloat: array ]).
                                       (Integer -> [ :array | self convertToInteger: array ]).
                                       (Time -> [ :array | self convertToTime: array ]).
                                       (DateAndTime -> [ :array | self convertToDateAndTime: array ]).
                                       (String -> [ :array | array ]).
                                       (nil -> [ :array | self detectColumnTypeAndConvert: array ]) }
]