Data Cleaning Using Shellscript
I want a combined query suitable for selecting rows where lang = en
i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.
json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv
30.json
{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":
Required output:
30.csv
shell-script csv json data
add a comment |
I want a combined query suitable for selecting rows where lang = en
i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.
json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv
30.json
{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":
Required output:
30.csv
shell-script csv json data
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
1
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
It's unclear how you would want to do the filtering. You say you want to filter onlang
and only use data wherelang
isen
. The only field that islang
is part of theuser
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.
– Kusalananda
Jan 3 at 11:38
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37
add a comment |
I want a combined query suitable for selecting rows where lang = en
i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.
json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv
30.json
{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":
Required output:
30.csv
shell-script csv json data
I want a combined query suitable for selecting rows where lang = en
i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.
json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv
30.json
{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":
Required output:
30.csv
shell-script csv json data
shell-script csv json data
edited Jan 3 at 15:37
L.F.C.
10919
10919
asked Jan 3 at 10:36
DaZzLeRDaZzLeR
85
85
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
1
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
It's unclear how you would want to do the filtering. You say you want to filter onlang
and only use data wherelang
isen
. The only field that islang
is part of theuser
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.
– Kusalananda
Jan 3 at 11:38
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37
add a comment |
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
1
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
It's unclear how you would want to do the filtering. You say you want to filter onlang
and only use data wherelang
isen
. The only field that islang
is part of theuser
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.
– Kusalananda
Jan 3 at 11:38
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
1
1
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
It's unclear how you would want to do the filtering. You say you want to filter on
lang
and only use data where lang
is en
. The only field that is lang
is part of the user
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.– Kusalananda
Jan 3 at 11:38
It's unclear how you would want to do the filtering. You say you want to filter on
lang
and only use data where lang
is en
. The only field that is lang
is part of the user
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.– Kusalananda
Jan 3 at 11:38
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37
add a comment |
1 Answer
1
active
oldest
votes
With jq
(https://stedolan.github.io/jq/), you could do this:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json
This should get you the required output:
980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"
To write the output to a file, simply redirect it:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv
Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
|
show 5 more comments
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492209%2fdata-cleaning-using-shellscript%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
With jq
(https://stedolan.github.io/jq/), you could do this:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json
This should get you the required output:
980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"
To write the output to a file, simply redirect it:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv
Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
|
show 5 more comments
With jq
(https://stedolan.github.io/jq/), you could do this:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json
This should get you the required output:
980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"
To write the output to a file, simply redirect it:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv
Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
|
show 5 more comments
With jq
(https://stedolan.github.io/jq/), you could do this:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json
This should get you the required output:
980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"
To write the output to a file, simply redirect it:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv
Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.
With jq
(https://stedolan.github.io/jq/), you could do this:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json
This should get you the required output:
980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"
To write the output to a file, simply redirect it:
jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv
Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.
edited Jan 4 at 6:46
answered Jan 3 at 12:14
HaxielHaxiel
1,562410
1,562410
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
|
show 5 more comments
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel
– DaZzLeR
Jan 4 at 6:16
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.
– Haxiel
Jan 4 at 6:48
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
Thank You @Haxiel It really means a lot.
– DaZzLeR
Jan 4 at 7:12
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.
– Haxiel
Jan 4 at 7:26
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?
– DaZzLeR
2 days ago
|
show 5 more comments
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492209%2fdata-cleaning-using-shellscript%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can you add a sample input (from 30.json) and required output?
– Haxiel
Jan 3 at 10:50
{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel
– DaZzLeR
Jan 3 at 10:59
1
Please edit the question and add it there instead.
– Haxiel
Jan 3 at 11:02
It's unclear how you would want to do the filtering. You say you want to filter on
lang
and only use data wherelang
isen
. The only field that islang
is part of theuser
object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.– Kusalananda
Jan 3 at 11:38
Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda
– DaZzLeR
Jan 4 at 9:37