Improve markdown syntax highlighting in kate #174
Merged: MicheleC merged 2 commits from feat/markdown_mods into master 2 years ago
I've made some changes to markdown.xml which work well for me.
Some, like the diff block, I think will be acceptable; others, like the colouring, are obviously personal preferences which might not be, so this is posted as a WIP.
I haven't tried to be too clever with the regexes - they work but could probably do with some improvement.
I think the original author probably used some StyleNum definitions to show the highlighting in their default colours [which seem to be set in katehighlight.cpp]. My preference is to use defStyleNum="dsNormal" and define the required colours with color="" and/or backgroundColor="". That makes it easier for users to choose their own colours and set their own values for boolean options.
The changes are:
Add an 'idlink' to allow differentiation between an inline link and a link to an 'id' reference within the same page
The \s? is necessary because a ruler can be char-space-char-space-char as well as three or more contiguous characters.
Show line only through the string and the two tildes at each end - any other characters attached won't be struck out.
Set up a prediff context to display diffs (see later)
For a diff, any line starting with '-', '+', '--- ', '+++ ', '@@' or 'diff ' will be displayed as set by
<itemData name="diff..
Separate diffheader1/2 to allow lines starting '--- ' to be set as bold
Don't display files with a .text extension as markdown
Bullet and numlists can be indented by tabs or multiples of 4 spaces - so they are prioritized over code to avoid code highlighting on indented items
Add bold+italics option to blockquote, bullet lists, and num lists
Add a context for a fenced code block enclosed between triple backticks - anything not a diff
Copied from
<context attribute="comment" ...
- not sure of the significance of all the values, but it works!
Add a context for a diff block
Expand on the above to display diffs with colours and bold as appropriate
Regex for the fenced code blocks pre and prediff - 'prediff' has precedence over 'pre' to enable diff highlighting
The space between ``` and diff can be null, space(s), or tabs - ditto, for example, ``` shell
Display code additionally as `this is code`, that is, as one line enclosed in single backticks, but not if the backtick has been escaped
For a link to an internal 'id' reference
Colour choices and some bold for diff - based on *.diff/*.patch highlighting
Ruler colour
Add a background colour to strikeout - shows better than just the line through
Make the double space line break more visible - remove underscore and add background colour
Colours for blockquote, bullet lists, and num lists text
Add light grey background to fenced code blocks as with some markdown viewers
Add light grey background and set color with color="" rather than through defStyleNum="dsBaseN"
Colour links blue, and show a link to an internal id in italics
The italic and bold values of false are defaults [as per language.dtd]
Colour image links blue, and tone down the default background colour, which was for defStyleNum="dsAlert"
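To make the diff-line rule above concrete, here is a rough Python sketch that classifies lines by the listed prefixes ('diff ', '--- ', '+++ ', '@@', '-', '+'). It only models the idea that rules are tried in order and the first match wins, as within a Kate context. The labels are made up for illustration and are not the itemData names used in markdown.xml.

```python
import re

# Hypothetical ordered rules modelled on the diff-line prefixes listed above.
# Order matters: '--- ' and '+++ ' must be tested before the bare '-' and '+',
# otherwise file headers would be coloured as removed/added lines.
DIFF_RULES = [
    ("diff header", re.compile(r"^diff ")),
    ("old-file header", re.compile(r"^--- ")),
    ("new-file header", re.compile(r"^\+\+\+ ")),
    ("hunk header", re.compile(r"^@@")),
    ("removed line", re.compile(r"^-")),
    ("added line", re.compile(r"^\+")),
]

def classify(line: str) -> str:
    """Return the label of the first rule that matches (first match wins)."""
    for label, pattern in DIFF_RULES:
        if pattern.match(line):
            return label
    return "context line"

if __name__ == "__main__":
    for line in ["diff --git a/x b/x", "--- a/x", "+++ b/x", "@@ -1 +1 @@", "-old", "+new", " same"]:
        print(f"{line!r:24} -> {classify(line)}")
```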
Title changed from "WIP: improve markdown syntax highlighting in kate" to "improve markdown syntax highlighting in kate" 2 years ago
Hi @Ray-V, thanks for this. I will take a look during the weekend.
<!ENTITY autolinkregex '<(https?|ftp):[^\">\s]+>'>
<!ENTITY mailtolinkregex '<(?:mailto:)?([-.\w]+\@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)>'>
<!ENTITY rulerregex '^\s*([\*\-_]){3,}\s*$'>
<!ENTITY rulerregex '^\s*([\*\-_]\s?){3,}\s*$'>
This may actually need to be
since multiple separators seem to be allowed between the *, - and _ characters
<!-- two spaces at end of line generates linebreak -->
<!ENTITY linebreakregex " $">
<!ENTITY strikeoutregex "[~]{2}[^~].*[^~][~]{2}"> <!-- pandoc style -->
<!ENTITY strikeoutregex "~~[^~].*~~">
<!ENTITY strikeoutregex "~~[^~].*~~">
IMO, this should be:
<!ENTITY strikeoutregex "~~[^~]+~~">
We don't want .* to eat tildes on a long tilde sequence. What do you think?
EDIT: this seems to work fine already with the current regex, although the new proposed one is shorter. Did you have any specific problem that caused you to modify the original regex?
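A small Python check of the two strikeout regexes on a line with two struck-out spans. This assumes greedy matching, which is Python's default and, as far as I know, also the default for Kate RegExpr rules unless minimal="true" is set. With a single span per line, both regexes behave the same, which may be why the original looked fine in testing.

```python
import re

ORIGINAL = re.compile(r"~~[^~].*~~")  # regex from the diff
PROPOSED = re.compile(r"~~[^~]+~~")   # regex suggested in the review

text = "~~old~~ and ~~older~~"

# Greedy .* runs on to the LAST "~~" on the line, so both spans become one match.
print(ORIGINAL.findall(text))  # ['~~old~~ and ~~older~~']
# [^~]+ cannot cross a tilde, so each span is matched on its own.
print(PROPOSED.findall(text))  # ['~~old~~', '~~older~~']
```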
<RegExpr attribute="emphasis" String="&emphasisregex;" />
<RegExpr attribute="ruler" String="&rulerregex;" />
<RegExpr context="bullet" String="^[,\t, {4}]*[\*\+\-]\s" />
<RegExpr context="numlist" String="^[,\t, {4}]*[\d]+\.\s" />
[,\t, {4}]*
This part of the regex does not seem right to me. It is a character class, so it would match any line starting with any mix of commas, tabs, spaces, open braces, the digit 4 and close braces, followed by *, + or -, a whitespace character and the rest of the list item.
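A quick Python check makes the problem visible. The bracket expression is read as a set of single characters, so every sample line below is accepted as a bullet item except the last one. As far as I know, Python's re and Kate's engine agree on basic character classes like this.

```python
import re

# [,\t, {4}] is ONE character class: it matches a single comma, tab, space,
# '{', '4' or '}'. It does not mean "a tab or 4 spaces".
BULLET = re.compile(r"^[,\t, {4}]*[\*\+\-]\s")

# line -> does the regex from the diff accept it as a bullet item?
expected = {
    "* item": True,
    "\t* item": True,
    "  * item": True,    # 2 spaces: indentation is not limited to multiples of 4
    ",,* item": True,    # commas are in the class
    "{4}* item": True,   # braces and the digit 4 are in the class too
    "x* item": False,
}

for line, ok in expected.items():
    assert bool(BULLET.match(line)) == ok, line
```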
These lines match:
These lines don't match:
Probably what we need here is:
</context>
<context attribute="pre" lineEndContext="#stay" name="pre" >
<RegExpr String="```$" attribute="pre" context="#pop" endRegion="pre"/>
We need String="```" without end-of-line $ to handle inline code sections too.
<RegExpr attribute="code" String="`[^`]+`" />
<RegExpr context="comment" String="<!--" beginRegion="comment" />
<RegExpr context="prediff" String="```\s{0,}diff" beginRegion="prediff" />
<RegExpr context="pre" String="```.*" beginRegion="pre" />
Just String="```" without .* will do.
EDIT: actually, we need to distinguish between the case where ``` is used for inline code and the case where it opens a fenced code block. This probably needs more changes around here. We also need to be able to handle fenced blocks surrounded by ~~~.
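Why 'prediff' has to sit before 'pre' can be shown with a small Python model of the two rules quoted above. It is a hypothetical sketch. It assumes only that Kate tries the rules of a context in order at the current position and that the first match wins.

```python
import re

# Hypothetical model of the two fence rules from the diff, in file order.
RULES = [
    ("prediff", re.compile(r"```\s{0,}diff")),
    ("pre", re.compile(r"```.*")),
]

def first_match(line, rules=RULES):
    """Return the context name of the first rule matching at the start of line."""
    for context, pattern in rules:
        if pattern.match(line):
            return context
    return None

if __name__ == "__main__":
    for line in ["```diff", "``` \tdiff", "```shell", "```"]:
        print(repr(line), "->", first_match(line))
    # With the order reversed, the catch-all "pre" rule swallows the diff fence.
    print(first_match("```diff", list(reversed(RULES))))
```

Note that ```\s{0,}diff also fires for any info string that merely starts with "diff", since nothing checks what follows it.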
@Ray-V I am going through your work and commenting as I go. Once I am done, I will probably push a commit on top of yours with some of the changes I suggested, so that you can test them as well. In the end when everything is ready, we will squash them into a single commit and push to master.
<RegExpr context="comment" String="<!--" beginRegion="comment" />
<RegExpr context="prediff" String="```\s{0,}diff" beginRegion="prediff" />
<RegExpr context="pre" String="```.*" beginRegion="pre" />
<RegExpr attribute="code" String="`[^`].*[^\\]`" />
As mentioned above, we also need a rule to handle inline code.
<itemData name="diffheader2" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="#800000" />
<itemData name="diffheaderdiff" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="#000000" bold="true" />
<itemData name="difflineremove" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="red" />
<itemData name="difflineadd" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="blue" />
Blue and red are not bad, but it may be worth considering red and green backgrounds, as in many online diff tools.
<itemData name="code" defStyleNum="dsNormal" color="darkcyan" backgroundColor="#eeeeee" />
<itemData name="reflink" defStyleNum="dsOthers" color="blue" />
<itemData name="idlink" defStyleNum="dsOthers" color="blue" italic="true" />
<itemData name="inlinelink" defStyleNum="dsOthers" color="blue" />
I think it is quite normal for links to be underlined.
Of course it is a very subjective and personal thing.
@SlavekB @blu.256: what do you think? Should we keep links underlined or not?
Discussed with Slavek. Since this is not really a clickable link in katepart, it makes more sense to leave the item not underlined.
NOTE: I will be pushing a commit later today. I am marking issues 'resolved' as I work through the changes locally. This helps me track the resolution of the various issues.
3861f22b30 to c5f768e605, 2 years ago
@Ray-V I have added a commit on top of yours (and rebased on top of current master). Could you please test if everything works fine for you and give feedback?
EDIT: I tested using the test file found here plus some extra changes I made myself.
@Ray-V
A reminder: could you have a look at the additional changes pushed on top of your commits? Thank you :-)
Gitea allows a zero to three space lead-in with multiple separators, so let's base the regex on that.
Bullet and num lists should be indented by tabs or by groups of spaces matching tab_width == 4 spaces.
Deviations are still rendered (for example, an indentation of 3 spaces would still show in the list), but it would be better if the Kate highlighting didn't apply to any list item that is out of line.
The original markdown spec states:
although Markdown.pl allows for one or more tabs.
Gitea only supports one tab or 1-4 spaces, so let's base the regex on that.
The code regex needs modifying; otherwise indentation by any other number of spaces shows as code.
The regex for the code between single backticks doesn't work where there is more than one code span on a line.
The whole string between the first and last backtick is highlighted where it should only be the individual code blocks that are highlighted.
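This can be reproduced with a quick Python check. It compares the code regex from the diff with the simpler `[^`]+` form that also appears in the diff, assuming greedy matching as in Kate's default.

```python
import re

GREEDY = re.compile(r"`[^`].*[^\\]`")  # code regex from the diff above
PER_SPAN = re.compile(r"`[^`]+`")      # stops at the first closing backtick

line = "Call `foo()` and then `bar()` here."

# The greedy .* runs on to the LAST backtick, swallowing the text in between.
print(GREEDY.findall(line))    # ['`foo()` and then `bar()`']
# [^`]+ cannot cross a backtick, so each span is found separately.
print(PER_SPAN.findall(line))  # ['`foo()`', '`bar()`']
```

The per-span form has its own gap: it cannot match a code span that contains a literal backtick.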
AND
Re: the addition of inlinecoderegex - this needs to handle any number of backticks, which Gitea supports.
Let's reserve the attribute 'inlinecode' for a single line code block, and 'code' for whatever is indented by tab[s] or 4x spaces.
The inlinecode string seems to cover these two scenarios, except it doesn't provide a check of whether the ending number of backticks equals the number at the start. I would think though that in 99.99% of cases a single backtick will be used.
This is syntax highlighting for kate and I think it would be better to colour the text, to be consistent with diff.xml.
The addition of the grey background highlights the block and the extent of the diff as there are other elements in a markdown document which aren't present in a diff/patch.
For the strikeout, any string containing a tilde, whether escaped or not, won't be struck out.
And there should be no tab or space after the starting tilde pair.
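Here is a sketch of a strikeout regex that follows these two rules: no tilde anywhere inside, and no space or tab right after the opening ~~. It is checked with Python's re. The pattern is hypothetical and is not taken from the PR.

```python
import re

# Hypothetical strikeout regex: "~~", a first character that is neither a tilde
# nor whitespace, any further non-tilde characters, then "~~".
STRIKE = re.compile(r"~~[^~\s][^~]*~~")

print(STRIKE.findall("~~done~~ and ~~more~~"))  # ['~~done~~', '~~more~~']
print(STRIKE.search("~~ spaced~~"))             # None: space after the opening pair
print(STRIKE.search(r"~~a\~b~~"))               # None: contains a tilde, escaped or not
```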
The 'tilde' regexes don't work for a
~~~diff ... ~~~
block - it just shows as normal text.
Add tildes to the pre* regexes - not strictly correct, because start and end must be the same character, but who is not going to use matching sets at each end?
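What "start and end must be the same character" means in practice can be sketched in Python. This is a simplified, hypothetical scanner that ignores fence length and indentation rules. It shows that a block opened with ``` runs past a ~~~ line, and the reverse.

```python
def fenced_blocks(lines):
    """Return (start, end) line indexes of fenced blocks.

    A block opened with ``` is only closed by a line of backticks, and a block
    opened with ~~~ only by a line of tildes. Unterminated blocks are ignored.
    """
    blocks = []
    fence_char = None  # '`' or '~' while inside a block, otherwise None
    start = 0
    for i, line in enumerate(lines):
        s = line.strip()
        if fence_char is None:
            if s.startswith("```") or s.startswith("~~~"):
                fence_char, start = s[0], i
        elif len(s) >= 3 and set(s) == {fence_char}:
            blocks.append((start, i))
            fence_char = None
    return blocks

doc = ["```diff", "-a", "~~~", "+b", "```", "text", "~~~python", "x = 1", "~~~"]
print(fenced_blocks(doc))  # [(0, 4), (6, 8)]: the ~~~ on line 2 does not close the ``` block
```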
I think there's more that can be added.
Gitea, for example, supports the CommonMark use of a backslash for a line break.
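For reference, here is a hypothetical pattern covering both hard-line-break forms: two or more trailing spaces, or a trailing backslash as in CommonMark. It is checked with Python's re and is not the regex used in markdown.xml.

```python
import re

# Hypothetical hard-line-break pattern: two or more trailing spaces,
# or a single trailing backslash.
LINEBREAK = re.compile(r"(?: {2,}|\\)$")

for line in ["first line  ", "first line\\", "first line ", "no break"]:
    print(repr(line), bool(LINEBREAK.search(line)))
```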
Is there anything holding this up? It would be good to finish it soon so it can be part of the upcoming R14.0.13.
Nothing holding this up other than me reviewing the latest comments from @Ray-V. I will have a look through the weekend or next week and feedback. Definitely something to include in R14.0.13.
Well, thank you.
Sorry, had a busy week and so I haven't looked at this yet. Will do this week though, so we can add this to R14.0.13.
Ok, finally got some time to dedicate to this.
Comments on the first two points, more to follow later. In the end I will upload a commit with the new proposed changes.
Thanks for the correction on the leading separators. I had not noticed that if there are more than 3 leading spaces, the line should not be considered a horizontal line at all. So we do need to limit the number of leading spaces to a max of 3.
I don't see the point of replacing \s with [ \t], though. Although technically \s matches characters that should not be considered (\n, \r, \f), \s is used extensively in most of the other regexes. [ \t] makes the regex harder to read without any real benefit.
The new proposed regex is:
<!ENTITY rulerregex '^ {,3}([\*\-_]\s*){3,}\s*$'>
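A quick way to compare the earlier ruler regex with this one is to run both through Python's re on a few sample lines. Python accepts the {,3} shorthand used above. Other details of Kate's engine may differ, so treat this as a sanity check rather than proof.

```python
import re

OLD = re.compile(r"^\s*([\*\-_]\s?){3,}\s*$")   # ruler regex from the first revision
NEW = re.compile(r"^ {,3}([\*\-_]\s*){3,}\s*$")  # ruler regex proposed above

# line -> (matched by OLD, matched by NEW)
CASES = {
    "---": (True, True),
    "* * *": (True, True),
    "*  *  *": (False, True),   # two spaces between markers
    "   ___": (True, True),     # three leading spaces
    "    ___": (True, False),   # four leading spaces: indented code, not a ruler
    "--": (False, False),       # fewer than three markers
}

for line, (old_ok, new_ok) in CASES.items():
    assert bool(OLD.match(line)) == old_ok, line
    assert bool(NEW.match(line)) == new_ok, line
```

Both versions also accept mixed markers such as -*_, which CommonMark does not treat as a thematic break because it requires three or more of the same character.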
Good point to bring up. According to the original markdown specs:
List markers typically start at the left margin, but may be indented by up to three spaces. List markers must be followed by one or more spaces or a tab.
Gitea/GitHub and some online websites follow that, so we should limit the leading spaces to a max of 3.
For the spaces following the marker, Gitea/GitHub and online websites seem to behave differently if there are 5 or more spaces between the marker and the rest of the line. I propose we follow the spec and don't put a limit on the number of spaces after the marker.
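As an illustration, here is what the quoted spec sentence translates to when taken literally: 0 to 3 leading spaces, a marker, then one or more spaces or a tab. These regexes are hypothetical and are not the ones proposed in this PR. They also do not model nested list items.

```python
import re

# Hypothetical regexes written directly from the quoted spec sentence.
BULLET = re.compile(r"^ {0,3}[\*\+\-][ \t]+")
NUMLIST = re.compile(r"^ {0,3}\d+\.[ \t]+")

assert BULLET.match("* item")
assert BULLET.match("   - item")        # up to three leading spaces
assert not BULLET.match("    - item")   # four spaces: not a top-level list item
assert not BULLET.match("*item")        # marker must be followed by a space or tab
assert NUMLIST.match("12. item")
assert NUMLIST.match("  1.\titem")
assert not NUMLIST.match("1.item")
```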
The new proposed regexes are:
With this, no change to the code regex is required, since there is no overlap between the rules for bullet, numlist and code.
Good catch, it was indeed wrong. But the proposed solution is also wrong, because it can't handle single backticks within code sections, most notably the example in the original Markdown syntax page:
``There is a literal backtick (`) here.``
I have tested this at length, and it seems the regex engine currently used in TDE supports neither non-greedy regexes nor lookahead with backreferences. Therefore I had to come up with a bit of a trick, but it seems to handle all the cases correctly.
Proposed regex for it is:
This handles the cases where code starts with 1 or 2 backticks. If we have 3 backticks we are handling them as a fenced code block already.
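To pin down the target behaviour, here is a Python version that states the rule directly. An opening run of 1 or 2 backticks (not escaped, not part of a longer run) must be closed by a run of exactly the same length. It relies on lookbehind, lookahead and a lazy quantifier. According to the comment above, TDE's engine lacks these, so this cannot be pasted into markdown.xml and is not the trick regex used in the PR.

```python
import re

# Opening run of 1-2 backticks (not preceded by a backtick or backslash, not
# followed by another backtick), shortest content, then a closing run of
# exactly the same length that is not part of a longer run.
CODE_SPAN = re.compile(r"(?<![`\\])(`{1,2})(?!`)(.+?)(?<!`)\1(?!`)")

def code_spans(line):
    """Return the contents of the inline code spans found on one line."""
    return [m.group(2) for m in CODE_SPAN.finditer(line)]

print(code_spans("``There is a literal backtick (`) here.``"))
print(code_spans("Call `foo()` and `bar()`."))
```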
I don't think we need to introduce an additional inlinecode itemData, since it would present the possible problem that code and inlinecode could be rendered differently.
The proposed solution seems to handle the following cases correctly:
Nevertheless, the name inlinecode that I had originally proposed was confusing (a better name would have been fencedcodeblock). I also noticed that such blocks are rendered differently from code blocks defined by 4 spaces or a tab. I will investigate further and write back when fixed.
I have fixed the problem with the visualization of pre and pretilde blocks: their content now shows in the same way as code.
Please note that we need to keep the pre and pretilde contexts separate, because 3 tildes will not terminate a fenced block started with 3 backticks, and vice versa. So the change in commit e6abb13e to unify the two regexes into one is not correct.
@Ray-V @SlavekB The question is: do we want code blocks (created by 4-space indentation) and fenced blocks (created with 3 backticks/tildes) to look different? We often use 3 backticks to show code, so do you think it is better to show code and fenced blocks with the same colors?
I take the point in having something consistent with what we do for diff.xml and in such case I am ok with it, as long as we do exactly the same (no gray background).
On the other hand, I also propose: why don't we update diff.xml to use green/red diffs in Gitea/GitHub style? @Ray-V @SlavekB what do you think?
Good point about not having a separator after ~~. And from testing, the same applies to the character before the ending ~~.
New proposed regex is:
For linebreak, good suggestion to add the backslash character. The regex can be simplified as follows, though:
Good catch. Fixed that.
@Ray-V @SlavekB currently the filenames in diffs are shown with the same color. It would be good to differentiate them in red/green or red/blue as gitea/github do. What do you think?
e6abb13e34 to 653f20b8c5, 2 years ago
I have rebased the branch on top of master and added a commit based on the explanation above. Please review and comment.
Once we are all happy with the final version, all those commits can be squashed into one before merging.
Yes, we commonly use three backticks in the same meaning as a fenced block to show code. So it seems like a good idea to be in the same colors.
Yes, green/red appears to be a frequently used combination, so yes, it would make sense for diff.xml to be united with others.
The names of the files can be considered as part of the diff control headers that are in one color. Therefore, it seems good to me that such headers do not use red/green and do not disturb real changes in content.
@SlavekB thanks for the feedback. I have now pushed a further commit to align the colors for diff and markdown and set on green/red scheme as default.
If users want a different color scheme, they can customize the colors within Kate to their liking.
5b7fdae779 to 4e910c2ad6, 2 years ago
4e910c2ad6 merged into master 2 years ago
Merged as agreed with @SlavekB in Jabber discussions.
@Ray-V thanks for the great work!
Title changed from "improve markdown syntax highlighting in kate" to "Improve markdown syntax highlighting in kate" 2 years ago