Simple flex-based lexer
I am trying to learn flex and have created this simple program. The rule for comments works correctly for single line comments such as:
// this is a comment
and this:
/* this is also a comment */
My code:
ID [A-Z][A-Za-z0-9]
KEYWORD if|else|then|for|fi|loop|pool|proc|func
OPERATOR "+"|"-"|"/"|"*"|"&"|"%"
PUNCTUATION ":"|","
%%
{ID} printf("An Id found:%s\n",yytext);
{KEYWORD} printf("An Keyword found:%s\n",yytext);
{OPERATOR} printf("An Operator found:%s\n",yytext);
{PUNCTUATION} printf("An Punctuation found:%s\n",yytext);
[/][/].*
"/*".*"*/"
%%
int yywrap(){
return 1;
}
main(){
yylex();
}
I'd be interested in comments on this, and particularly in ways to improve the code, such as how to detect multi-line comments.
compiler lexer
Hello @setu. We realize you are pretty new to this Stack. Please make sure that your code works before posting here, in order to avoid your question being closed/deleted. As it stands I think you could post to Stack Overflow instead to get help with the bugs.
– Phrancis
Dec 16 '14 at 16:54
My code worked. I just wanted to know why it worked. I have edited my post.
– Setu Basak
Dec 16 '14 at 16:54
I've edited to try to bring the question into line with site guidelines. Please make sure I haven't omitted too much for the question to still be useful.
– Edward
Dec 18 '14 at 17:56
edited Nov 16 at 15:03
MCCCS
asked Dec 16 '14 at 16:41
Setu Basak
1 Answer
The code looks OK for what it does so far, but there are some things you might want to do to improve it:
Always use {} for rule actions
It's not technically wrong to simply have printf(...) to the right of a pattern, but when your lexer gets more complex (and when you start also using a parser) you may find it easier to troubleshoot if you always use {} to enclose each rule's action -- even empty ones.
Think about explicitly handling whitespace
It's very common for a parser to need to ignore whitespace. If that's the case, it's usually good to do so explicitly with a rule just above the error-handling rule(s) I mention below.
[ \t\n]+ { /* ignore whitespace */ }
Consider a "catch-all" rule for illegal tokens
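Right now, pretty much any random character will be accepted: anything that matches none of your rules falls through to flex's default rule, which simply echoes it to the output. This might be fine, but especially while you're learning, you may find it useful to put a catch-all rule at the bottom of your list of rules: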
. { printf("Bad character: %s\n", yytext); }
Consider adding support for multiline comments
As your original (pre-edit) code had it, handling multiline comments is different but not too difficult. You can add this to your definitions (the first part of a flex file):
%x c_comment
Then add these rules to the rules section (second part of a flex file):
"/*" { BEGIN(c_comment); }
<c_comment>[^*]* { }
<c_comment>"*"+[^*/]* { }
<c_comment>"*"+"/" { printf("Ignored a multiline comment\n"); BEGIN(INITIAL); }
This defines an exclusive start condition called c_comment and switches into that condition when it finds the opening pair of characters for a comment. The next rule ignores everything that is not a * character. The rule after that ignores a run of * characters that is not followed by a /, together with the text after it up to the next * or /. The point of these two rules is to match as many characters as possible. For performance reasons, you would generally want to write your lexer so that each rule matches strings that are as long as possible, since there is some fixed overhead for every match.
Finally, the last rule finds the closing sequence (one or more * characters followed by a /, so that an ending such as **/ also closes the comment) and switches back into the initial context. You will also often see BEGIN(0) for that -- the statements are identical in function, but I prefer the more verbose BEGIN(INITIAL) form because I think it's easier to understand.
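If it helps to see the idea outside of flex, here is a rough hand-written sketch in plain C of what the two start conditions amount to. It is only an illustration, not how flex works internally (flex generates a table-driven scanner), and the names strip_c_comments, ST_INITIAL and ST_C_COMMENT are made up for this example. The two states play the role of INITIAL and c_comment, and the state changes correspond to the BEGIN(...) calls in the rules above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; they mirror the INITIAL and
   c_comment start conditions from the flex rules. */
enum scan_state { ST_INITIAL, ST_C_COMMENT };

/* Copies `in` to `out` while skipping C-style comments, which may span
   several lines. Returns the number of comments that were closed.
   `out` must have room for at least strlen(in) + 1 bytes. */
size_t strip_c_comments(const char *in, char *out)
{
    enum scan_state state = ST_INITIAL;
    size_t closed = 0;

    assert(in != NULL && out != NULL);

    while (*in != '\0') {
        if (state == ST_INITIAL) {
            if (in[0] == '/' && in[1] == '*') {
                state = ST_C_COMMENT;   /* like BEGIN(c_comment) */
                in += 2;
            } else {
                *out++ = *in++;         /* ordinary text is kept */
            }
        } else {
            if (in[0] == '*' && in[1] == '/') {
                state = ST_INITIAL;     /* like BEGIN(INITIAL) */
                closed++;
                in += 2;
            } else {
                in++;                   /* comment body, newlines included */
            }
        }
    }
    *out = '\0';
    return closed;
}
```

Calling strip_c_comments on a string that holds a comment spread over two lines copies only the text outside the comment into the output buffer. A comment that is never closed swallows the rest of the input, which is also what the flex rules would do unless you add an <<EOF>> rule for the c_comment condition.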
answered Dec 18 '14 at 18:38
Edward