Simple flex-based lexer

I am trying to learn flex and have created this simple program. The rule for comments works correctly for single line comments such as:

// this is a comment

and this:

/* this is also a comment */

My code:

ID [A-Z][A-Za-z0-9]
KEYWORD if|else|then|for|fi|loop|pool|proc|func
OPERATOR "+"|"-"|"/"|"*"|"&"|"%"
PUNCTUATION ":"|","
%%
{ID} printf("An Id found:%s\n",yytext);
{KEYWORD} printf("An Keyword found:%s\n",yytext);
{OPERATOR} printf("An Operator found:%s\n",yytext);
{PUNCTUATION} printf("An Punctuation found:%s\n",yytext);
[/][/].*
"/*".*"*/"
%%
int yywrap(){
  return 1;
}
main(){
  yylex();
}

I'd be interested in comments on this, particularly ways to improve the code, such as detecting multi-line comments.

edited Nov 16 at 15:03 by MCCCS
asked Dec 16 '14 at 16:41 by Setu Basak

Hello @setu. We realize you are pretty new to this Stack. Please make sure that your code works before posting here, in order to avoid your question being closed/deleted. As it stands I think you could post to Stack Overflow instead to get help with the bugs.
– Phrancis
Dec 16 '14 at 16:54

My code worked. I just wanted to understand why it worked. I have edited my post.
– Setu Basak
Dec 16 '14 at 16:54

I've edited to try to bring the question into line with site guidelines. Please make sure I haven't omitted too much for the question to still be useful.
– Edward
Dec 18 '14 at 17:56

compiler lexer

1 Answer

The code looks OK for what it does so far, but there are some things you might want to do to improve it:

Always use `{}` for rule actions

It's not technically wrong to simply have printf(...) to the right of a pattern, but when your lexer gets more complex (and when you start also using a parser) you may find it easier to troubleshoot if you always use {} to enclose each rule's action, even an empty one.

Think about explicitly handling whitespace

It's very common for a parser to need to ignore whitespace. If that's the case, it's usually good to do so explicitly with a rule just above the error-handling rule(s) I mention below.

[ \t\n]+   { /* ignore whitespace */ }

Consider a "catch-all" rule for illegal tokens

Right now, any character that no rule matches falls through to flex's default rule, which silently echoes it to the output. This might be fine, but especially while you're learning, you may find it useful to put a catch-all rule at the bottom of your list of rules:

.   { printf("Bad character: %s\n", yytext); }
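
The catch-all belongs at the bottom because flex chooses the longest match at the current position and, when two rules match the same length, the rule listed first. The following is only a rough C sketch of that selection policy, not code that flex generates; the function name pick_rule and the array of match lengths are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * match_len[i] holds how many characters rule i matched at the current
 * input position (0 means the rule did not match).
 * Returns the index of the winning rule, or -1 if no rule matched.
 */
int pick_rule(const size_t match_len[], int n_rules)
{
    int best = -1;
    size_t best_len = 0;

    assert(match_len != NULL || n_rules == 0);

    for (int i = 0; i < n_rules; i++) {
        /* Strictly greater: on a tie the earlier rule keeps the win. */
        if (match_len[i] > best_len) {
            best = i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

For an input character such as +, both {OPERATOR} and the catch-all . match one character, so the tie goes to whichever is listed first. With the catch-all last, it only fires when nothing else matched.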

Consider adding support for multiline comments

As your original (pre-edit) code had it, handling multiline comments is different but not too difficult. You can add this to your definitions (the first part of a flex file):

%x c_comment

Then add these rules to the rules section (second part of a flex file):

"/*"   { BEGIN(c_comment); }
<c_comment>[^*]*        { }
<c_comment>"*"+[^*/]*   { }
<c_comment>"*"+"/"      { printf("Ignored a multiline comment\n"); BEGIN(INITIAL); }

This defines an exclusive start condition called c_comment and switches into that condition when it finds the opening pair of characters for a comment. The second rule ignores everything that is not a * character, including newlines, which is what lets the comment span several lines. The third rule ignores runs of * characters that are not followed by a /. The point of these two rules is to match as many characters as possible at once: every match costs one trip through the scanner's action code, so consuming the comment body in long chunks rather than one character at a time helps the lexer go faster.

Finally, the last rule finds the closing delimiter and switches back into the initial condition. It is written as "*"+"/" rather than "*/" so that a comment ending in several asterisks, such as /* note **/, is still closed; with a plain "*/" the third rule would consume the ** first and the scanner would stay inside the comment. You will also often see BEGIN(0) used to return to the initial condition -- the statements are identical in function, but I prefer the more verbose BEGIN(INITIAL) form because I think it's easier to understand.
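
If it helps to see the idea without flex in the way, here is a hand-written C sketch of the same two-state scheme. It is an illustration under my own assumptions, not what flex generates: the function name strip_c_comments, the enum names, and the choice to return -1 for an unterminated comment are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Two scanner states, mirroring INITIAL and the exclusive c_comment condition. */
enum state { ST_INITIAL, ST_C_COMMENT };

/*
 * Copy src to dst, dropping every C-style comment.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns the number of comments removed, or -1 if a comment
 * is still open when the input ends.
 */
int strip_c_comments(const char *src, char *dst)
{
    enum state st = ST_INITIAL;
    int removed = 0;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        if (st == ST_INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                st = ST_C_COMMENT;    /* plays the role of BEGIN(c_comment) */
                i += 2;
            } else {
                dst[o++] = src[i++];  /* ordinary text is kept */
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                st = ST_INITIAL;      /* plays the role of BEGIN(INITIAL) */
                removed++;
                i += 2;
            } else {
                i++;                  /* comment body, newlines included, is skipped */
            }
        }
    }
    dst[o] = '\0';
    return st == ST_INITIAL ? removed : -1;
}
```

The -1 return corresponds to a case the flex rules above do not report: reaching end of input while still inside c_comment. In a flex file you could detect that with an <c_comment><<EOF>> rule.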

answered Dec 18 '14 at 18:38 by Edward
