Data Cleaning Using Shellscript












1















I want a combined query suitable for selecting rows where lang = en i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.



json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv

30.json

{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":


Required output:
30.csv



In output I need rows only with lang column value =en










share|improve this question

























  • Can you add a sample input (from 30.json) and required output?

    – Haxiel
    Jan 3 at 10:50











  • {"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

    – DaZzLeR
    Jan 3 at 10:59








  • 1





    Please edit the question and add it there instead.

    – Haxiel
    Jan 3 at 11:02











  • It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

    – Kusalananda
    Jan 3 at 11:38













  • Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

    – DaZzLeR
    Jan 4 at 9:37
















1















I want a combined query suitable for selecting rows where lang = en i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.



json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv

30.json

{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":


Required output:
30.csv



In output I need rows only with lang column value =en










share|improve this question

























  • Can you add a sample input (from 30.json) and required output?

    – Haxiel
    Jan 3 at 10:50











  • {"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

    – DaZzLeR
    Jan 3 at 10:59








  • 1





    Please edit the question and add it there instead.

    – Haxiel
    Jan 3 at 11:02











  • It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

    – Kusalananda
    Jan 3 at 11:38













  • Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

    – DaZzLeR
    Jan 4 at 9:37














1












1








1








I want a combined query suitable for selecting rows where lang = en i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.



json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv

30.json

{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":


Required output:
30.csv



In output I need rows only with lang column value =en










share|improve this question
















I want a combined query suitable for selecting rows where lang = en i.e. language is equal to English and removing the other rows while I convert this file from JSON to CSV.



json2csv -i Downloads/30.json -f id,text,lang -o Downloads/30.csv

30.json

{"created_at":"Sun Apr 01 09:00:00 +0000 2018","id":980369176291262464,"id_str":"980369176291262464","text":"RT @Q_cupid: u0e01u0e23u0e30u0e41u0e2au0e43u0e2au0e48u0e0au0e38u0e14u0e44u0e17u0e22u0e21u0e32u0e41u0e23u0e07u0e2du0e31u0e19u0e19u0e35u0e49u0e40u0e02u0e49u0e32u0e43u0e08 u0e41u0e15u0e48u0e17u0e35u0e48u0e44u0e21u0e48u0e40u0e02u0e49u0e32u0e43u0e08u0e04u0e37u0e2du0e04u0e19u0e17u0e35u0e48u0e01u0e25u0e49u0e32u0e44u0e1bu0e02u0e42u0e21u0e22u0e0au0e38u0e14u0e44u0e17u0e22u0e17u0e35u0e48u0e40u0e04u0e49u0e32u0e40u0e2du0e32u0e44u0e27u0e49u0e44u0e1bu0e41u0e01u0e49u0e1au0e19u0e15u0e32u0e21u0e28u0e32u0e25u0e19u0e35u0e48u0e41u0e2bu0e25u0e30 u0e2du0e35u0e40u0e27u0e07u0e07 u0e02u0e42u0e21u0e22u0e44u0e1bu0e40u0e01u0e37u0e2du0e1a100u2026","source":"u003ca href="http://twitter.com/download/android" rel="nofollow"u003eTwitter for Androidu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":782859283733983233,"id_str":"782859283733983233","name":"mystarud83cudf1fjaeminud83cudf1f","screen_name":"mljm1920","location":null,"url":null,"description":"u0e40u0e1bu0e47u0e19u0e21u0e35u0e4au0e17u0e35u0e48u0e23u0e31u0e01u0e19u0e49u0e2du0e07u0e41u0e08u0e21u0e21u0e32u0e01u0e17u0e35u0e48u0e2au0e38u0e14ud83dude18ud83dude18 #JAEMIN ud83cudf51 #EXO #NCTDREAM #NCT u0e0au0e34u0e1bu0e40u0e1bu0e2du0e23u0e4c #markmin u0e40u0e1bu0e47u0e19u0e2bu0e25u0e31u0e01 #HunHan #NoRen #ChanLe #SungLe","translator_type":"none","protected":false,"verified":false,"followers_count":205,"friends_count":2155,"listed_count":0,"favourites_count":17155,"statuses_count":9823,"created_at":"Mon Oct 03 08:26:12 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"th","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/980103235007492096/Xx-SpFsJ_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/782859283733983233/1521042399","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":


Required output:
30.csv



In output I need rows only with lang column value =en







shell-script csv json data






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 3 at 15:37









L.F.C.

10919




10919










asked Jan 3 at 10:36









DaZzLeRDaZzLeR

85




85













  • Can you add a sample input (from 30.json) and required output?

    – Haxiel
    Jan 3 at 10:50











  • {"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

    – DaZzLeR
    Jan 3 at 10:59








  • 1





    Please edit the question and add it there instead.

    – Haxiel
    Jan 3 at 11:02











  • It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

    – Kusalananda
    Jan 3 at 11:38













  • Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

    – DaZzLeR
    Jan 4 at 9:37



















  • Can you add a sample input (from 30.json) and required output?

    – Haxiel
    Jan 3 at 10:50











  • {"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

    – DaZzLeR
    Jan 3 at 10:59








  • 1





    Please edit the question and add it there instead.

    – Haxiel
    Jan 3 at 11:02











  • It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

    – Kusalananda
    Jan 3 at 11:38













  • Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

    – DaZzLeR
    Jan 4 at 9:37

















Can you add a sample input (from 30.json) and required output?

– Haxiel
Jan 3 at 10:50





Can you add a sample input (from 30.json) and required output?

– Haxiel
Jan 3 at 10:50













{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

– DaZzLeR
Jan 3 at 10:59







{"created_at":"Sun Apr 01 06:30:00 +0000 2018","id_str":"980331427534262272","text":"RT YahBoyCourage: we strictly listening to AM radio now","source":"u003ca href="http://twitter.com/download/iphone" rel="nofollow"u003eTwitter for iPhoneu003c/au003e","in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":64939967,"id_str":"64939967","name":"Cesar ud83cuddf2ud83cuddfdud83cuddf5ud83cuddea","screen_name":"CZR93","location":"Richmond, CA", Couldnt add more due to word limit @Haxiel

– DaZzLeR
Jan 3 at 10:59






1




1





Please edit the question and add it there instead.

– Haxiel
Jan 3 at 11:02





Please edit the question and add it there instead.

– Haxiel
Jan 3 at 11:02













It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

– Kusalananda
Jan 3 at 11:38







It's unclear how you would want to do the filtering. You say you want to filter on lang and only use data where lang is en. The only field that is lang is part of the user object in the incomplete JSON document fragment that you show. Does this mean that you want to filter out all non-English users? The data that you show also only shows a single message. Does the real data contain an array of messages? Your required output also seems to show non-English data.

– Kusalananda
Jan 3 at 11:38















Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

– DaZzLeR
Jan 4 at 9:37





Yes, I wanted to remove all non-English users. Yes, I have a huge amount of data. @Kusalananda

– DaZzLeR
Jan 4 at 9:37










1 Answer
1






active

oldest

votes


















2














With jq (https://stedolan.github.io/jq/), you could do this:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json


This should get you the required output:



980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"


To write the output to a file, simply redirect it:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv


Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.






share|improve this answer


























  • Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

    – DaZzLeR
    Jan 4 at 6:16













  • @DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

    – Haxiel
    Jan 4 at 6:48











  • Thank You @Haxiel It really means a lot.

    – DaZzLeR
    Jan 4 at 7:12











  • @DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

    – Haxiel
    Jan 4 at 7:26











  • How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

    – DaZzLeR
    2 days ago











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492209%2fdata-cleaning-using-shellscript%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














With jq (https://stedolan.github.io/jq/), you could do this:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json


This should get you the required output:



980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"


To write the output to a file, simply redirect it:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv


Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.






share|improve this answer


























  • Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

    – DaZzLeR
    Jan 4 at 6:16













  • @DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

    – Haxiel
    Jan 4 at 6:48











  • Thank You @Haxiel It really means a lot.

    – DaZzLeR
    Jan 4 at 7:12











  • @DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

    – Haxiel
    Jan 4 at 7:26











  • How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

    – DaZzLeR
    2 days ago
















2














With jq (https://stedolan.github.io/jq/), you could do this:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json


This should get you the required output:



980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"


To write the output to a file, simply redirect it:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv


Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.






share|improve this answer


























  • Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

    – DaZzLeR
    Jan 4 at 6:16













  • @DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

    – Haxiel
    Jan 4 at 6:48











  • Thank You @Haxiel It really means a lot.

    – DaZzLeR
    Jan 4 at 7:12











  • @DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

    – Haxiel
    Jan 4 at 7:26











  • How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

    – DaZzLeR
    2 days ago














2












2








2







With jq (https://stedolan.github.io/jq/), you could do this:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json


This should get you the required output:



980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"


To write the output to a file, simply redirect it:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv


Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.






share|improve this answer















With jq (https://stedolan.github.io/jq/), you could do this:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json


This should get you the required output:



980369176291262500,"RT @Q_cupid: กระแสใส่ชุดไทยมาแรงอันนี้เข้าใจ แต่ที่ไม่เข้าใจคือคนที่กล้าไปขโมยชุดไทยที่เค้าเอาไว้ไปแก้บนตามศาลนี่แหละ อีเวงง ขโมยไปเกือบ100…","en"


To write the output to a file, simply redirect it:



jq -r '{ id, text, lang: .user.lang } | select (.lang == "en") | [ .id, .text, .lang ] | @csv' 30.json > output.csv


Note that I have modified the example JSON from your question to be a valid one. You seem to be discarding the bulk of the data, so this should not cause any problems.







share|improve this answer














share|improve this answer



share|improve this answer








edited Jan 4 at 6:46

























answered Jan 3 at 12:14









HaxielHaxiel

1,562410




1,562410













  • Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

    – DaZzLeR
    Jan 4 at 6:16













  • @DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

    – Haxiel
    Jan 4 at 6:48











  • Thank You @Haxiel It really means a lot.

    – DaZzLeR
    Jan 4 at 7:12











  • @DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

    – Haxiel
    Jan 4 at 7:26











  • How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

    – DaZzLeR
    2 days ago



















  • Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

    – DaZzLeR
    Jan 4 at 6:16













  • @DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

    – Haxiel
    Jan 4 at 6:48











  • Thank You @Haxiel It really means a lot.

    – DaZzLeR
    Jan 4 at 7:12











  • @DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

    – Haxiel
    Jan 4 at 7:26











  • How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

    – DaZzLeR
    2 days ago

















Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

– DaZzLeR
Jan 4 at 6:16







Thank you for the code but it dumped all the data onto my screen while I want output as a CSV file. Where am I supposed to specify that I want rows with language English only? @Haxiel

– DaZzLeR
Jan 4 at 6:16















@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

– Haxiel
Jan 4 at 6:48





@DaZzLeR I missed out on the conditional selection requirement earlier. It's fixed now; see my edit.

– Haxiel
Jan 4 at 6:48













Thank You @Haxiel It really means a lot.

– DaZzLeR
Jan 4 at 7:12





Thank You @Haxiel It really means a lot.

– DaZzLeR
Jan 4 at 7:12













@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

– Haxiel
Jan 4 at 7:26





@DaZzLeR If my answer works for you, please do remember to accept it (by clicking the tick mark next to the answer). This action grants reputation to both of us.

– Haxiel
Jan 4 at 7:26













How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

– DaZzLeR
2 days ago





How can I combine these two queries? jq -r '{ text, lang: .user.lang } | select (.lang == "en") | [ .text, .lang ] | @csv' 57.json > output.csv The above query gives me the rows which are in English language and removes the rest of them. cut -d, -f2 --complement test.csv > output.csv and this query removes the language column after the need of language column is fulfilled. What should I do to do these 2 operations in a single query? Can that be done?

– DaZzLeR
2 days ago


















draft saved

draft discarded




















































Thanks for contributing an answer to Unix & Linux Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492209%2fdata-cleaning-using-shellscript%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Morgemoulin

Scott Moir

Souastre