How to combine strings from JSON values, keeping only part of the string?












I have this sample:

"name": "The title of website",
"sync_transaction_version": "1",
"type": "url",
"url": "https://url_of_website"


I want to get the following output:



"The title of website"    url_of_website


I need to remove the protocol prefix from the URL, so that only url_of_website is left (without the leading https://).
The problem is that I'm not familiar with getting sed to read multiple lines. Some research led me to https://unix.stackexchange.com/a/337399/256195, but I still can't produce the result.

A valid JSON object that I'm trying to parse is the Bookmarks file of Google Chrome. Sample:



{
   "checksum": "9e44bb7b76d8c39c45420dd2158a4521",
   "roots": {
      "bookmark_bar": {
         "children": [ {
            "children": [ {
               "date_added": "13161269379464568",
               "id": "2046",
               "name": "The title is here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://the_url_is_here"
            }, {
               "date_added": "13161324436994183",
               "id": "2047",
               "meta_info": {
                  "last_visited_desktop": "13176472235950821"
               },
               "name": "The title here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://url_here"
            } ]
         } ]
      }
   }
}









  • Can you post a valid JSON object? Also, jq or json is the proper tool for this, not sed.
    – Jesse_b
    2 days ago








  • You don't parse JSON with sed. JSON is a structured document format unsuitable for parsing by anything other than a JSON parser. Doing it with sed would require you to implement a JSON parser in sed that could handle the different entity encoding etc. that could be present in the data (especially in URLs).
    – Kusalananda
    2 days ago










  • @Jesse_b: Thanks, I've just added the JSON object. A solution using jq or json is also fine if it solves the issue.
    – Tuyen Pham
    2 days ago










  • @Kusalananda: Thanks, I'll edit the title and change the content to suit the context.
    – Tuyen Pham
    2 days ago
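To make the point about encodings concrete, here is a small illustration. The JSON string below is invented for the example: it escapes the slashes in the URL as \/ and the ampersand as \u0026, both of which are legal in JSON. A JSON parser such as jq decodes these, whereas a naive sed pattern that looks for a literal https:// finds nothing to match. This is only a sketch and assumes jq is installed; the variable names are arbitrary.

```shell
# Hypothetical one-line JSON document with escaped characters in its strings.
json='{"name":"Q \"and\" A","url":"https:\/\/example.com\/?a=1\u0026b=2"}'

# jq decodes the escapes before the filter ever sees the strings.
decoded=$(printf '%s' "$json" | jq -r '[.name, (.url | ltrimstr("https://"))] | @tsv')
printf '%s\n' "$decoded"

# A naive sed attempt looks for a literal "https://" in the raw text.
# The raw text contains "https:\/\/" instead, so the substitution does not
# match and sed prints the line unchanged.
naive=$(printf '%s\n' "$json" | sed 's/.*"url":"https:\/\/\([^"]*\)".*/\1/')
printf '%s\n' "$naive"
```

The first command prints the decoded name and the trimmed URL separated by a tab; the second prints the original JSON text untouched.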















text-processing sed json filter






edited 2 days ago by MatthewRock

asked 2 days ago by Tuyen Pham

1 Answer
accepted










This works on the JSON document given in the question:



$ jq -r '.roots.bookmark_bar.children[]|.children[]|["\"\(.name)\"",.url]|@tsv' file.json
"The title is here" https://the_url_is_here
"The title here" https://url_here


This accesses the .children array of each .roots.bookmark_bar.children array entry and creates a string that is formatted according to what you showed in the question (with a tab character in-between the two pieces of data).



If the double quotes are not necessary, you could change the cumbersome ["\"\(.name)\"",.url] to just [.name,.url].



To trim the https:// off from the URLs, use



.url|ltrimstr("https://")


instead of just .url.
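Putting the pieces of this answer together, a full command might look like the sketch below. It assumes jq 1.5 or later is installed; the temporary file and the variable names (bookmarks, result) are made up for the example, and the data is the sample document from the question.

```shell
# Write the sample document from the question to a temporary file.
bookmarks=$(mktemp)
cat > "$bookmarks" <<'EOF'
{
   "checksum": "9e44bb7b76d8c39c45420dd2158a4521",
   "roots": {
      "bookmark_bar": {
         "children": [ {
            "children": [ {
               "date_added": "13161269379464568",
               "id": "2046",
               "name": "The title is here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://the_url_is_here"
            }, {
               "date_added": "13161324436994183",
               "id": "2047",
               "meta_info": {
                  "last_visited_desktop": "13176472235950821"
               },
               "name": "The title here",
               "sync_transaction_version": "1",
               "type": "url",
               "url": "https://url_here"
            } ]
         } ]
      }
   }
}
EOF

# For every entry in each folder's .children array, build a two-element array:
# the name wrapped in literal double quotes, and the URL without "https://".
# @tsv joins the two elements with a tab; -r prints the raw string.
result=$(jq -r '
  .roots.bookmark_bar.children[]
  | .children[]
  | ["\"\(.name)\"", (.url | ltrimstr("https://"))]
  | @tsv
' "$bookmarks")

printf '%s\n' "$result"
rm -f "$bookmarks"
```

Note that ltrimstr only removes the exact prefix it is given, so a URL starting with http:// would pass through unchanged.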






  • Thanks. At the end of the file I get this error: jq: error (at Bookmarks:23397): Cannot iterate over null (null). Line 23397 is the last line of the file.
    – Tuyen Pham
    2 days ago










  • So I've just modified your command. The correct one should be: jq -r '.roots.bookmark_bar.children[]|.children[]?|["\"\(.name)\"",.url]|@tsv', which eliminates the above error. One more question: is that a space or a tab between the title and the URL? What if I need to insert a tab between them?
    – Tuyen Pham
    2 days ago






  • @TuyenPham, it's a tab. @tsv is a jq formatter for tab-separated values. You could also use @csv to get output like "The title here","https://url_here"
    – glenn jackman
    2 days ago












  • @TuyenPham I only had the partial document that you provided to look at, so no wonder there were errors. Good work sorting them out! The @tsv command formats the array that it gets as a tab-delimited string.
    – Kusalananda
    2 days ago












  • How to trim both http:// and https://?
    – Tuyen Pham
    2 days ago
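Regarding the last points raised in these comments, here is a rough sketch (not taken from the answer itself) that tolerates entries without a .children array and strips either protocol. It assumes jq 1.5 or later built with regular expression support, which the sub function needs. The sample data, the example.org URLs and the variable names are invented for illustration.

```shell
# Hypothetical bookmarks data: one folder with two bookmarks, plus one
# bookmark that sits directly in the bookmark bar and has no .children.
data=$(cat <<'EOF'
{
  "roots": {
    "bookmark_bar": {
      "children": [
        {
          "name": "A folder",
          "type": "folder",
          "children": [
            { "name": "Secure site", "type": "url", "url": "https://secure.example.org/page" },
            { "name": "Plain site", "type": "url", "url": "http://plain.example.org" }
          ]
        },
        { "name": "Top-level bookmark", "type": "url", "url": "https://top.example.org" }
      ]
    }
  }
}
EOF
)

# Variant 1: same path as in the thread. ".children[]?" silently skips
# entries whose .children is missing, and sub() removes either prefix.
nested=$(printf '%s' "$data" | jq -r '
  .roots.bookmark_bar.children[]
  | .children[]?
  | [.name, (.url | sub("^https?://"; ""))]
  | @tsv
')
printf '%s\n' "$nested"

# Variant 2: walk the whole tree and keep every object of type "url",
# so bookmarks at any depth are included.
all=$(printf '%s' "$data" | jq -r '
  .roots
  | ..
  | objects
  | select(.type == "url")
  | [.name, (.url | sub("^https?://"; ""))]
  | @tsv
')
printf '%s\n' "$all"
```

The first variant skips the top-level bookmark because it only looks inside folders; the recursive variant lists all three. If regex support is not available, chaining ltrimstr("https://") | ltrimstr("http://") should give the same result for ordinary URLs.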













edited 2 days ago

answered 2 days ago by Kusalananda