C-Gate 2.8 much slower to sync networks

Discussion in 'C-Bus Toolkit and C-Gate Software' started by more-solutions, Mar 9, 2010.

  1. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    I have the following environment:
    - 35 networks with CNIs
    - C-Lution communicating via C-Gate

    Previously, from starting C-Gate (whichever came with Toolkit 1.6), it took around 10-15 minutes to sync all networks so that C-Lution could take control of the lighting.

    Now, having upgraded to Toolkit 1.11, it is taking *much* longer. I'm not entirely sure how long, because I needed to gain control and so had to force syncs using NET SYNC commands, but after 20 minutes left to its own devices I had only 5 networks in sync. There were also devices showing error status that were not showing errors before, including some devices which Toolkit said it was unable to communicate with.

    How can I speed this up? Are the errors I'm getting an indication that there have always been errors that older versions were not detecting? (I'm referring to "state=error" in the output from a tree command. What does this actually mean?)

    I am on site for the next few hours and would really like to be able to tweak some settings before I leave.

    Note that there are the following differences in C-GateConfig.txt caused by the upgrade:
    command.show-responses = yes (was =no)
    sync-time=3600 (was 14400)
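
A quick way to spot this kind of change after an upgrade is to diff the old and new config files key by key. The following is only a minimal sketch: the function names are hypothetical, and it assumes plain `key = value` lines with `#` comments, which may not match every line in a real C-GateConfig.txt.

```python
def parse_config(text):
    """Parse simple key=value lines into a dict.

    Assumption: blank lines and lines starting with '#' can be ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_config(old_text, new_text):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# The two settings quoted in this post, before and after the upgrade.
old_cfg = "command.show-responses = no\nsync-time=14400\n"
new_cfg = "command.show-responses = yes\nsync-time=3600\n"

changes = diff_config(old_cfg, new_cfg)
for key, (before, after) in changes.items():
    print(f"{key}: {before} -> {after}")
```

A key that exists in only one file shows up with `None` on the other side.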
     
    more-solutions, Mar 9, 2010
    #1
  2. more-solutions

    ashleigh Moderator

    Joined:
    Aug 4, 2004
    Messages:
    2,393
    Likes Received:
    25
    Location:
    Adelaide, South Australia
    There are a couple of problems here that make it *really* hard to answer.

    The first is that you have upgraded from TK 1.6 which is very very very very old now. It's so old I can't remember when it came out. Most likely about 3-5 years ago, I'm guessing around 2005.

    Since then, TK and cgate have had hundreds of changes and modifications. (Roughly, very roughly... TK has had about 1500 modifications, updates or defects corrected, and cgate about 500 to 700.)

    If you were reporting this as a change in behaviour from TK 1.10 to TK 1.11, it would be easier to understand or dig into where a change in behaviour might have come from.

    So the first problem is - there's been a heck of a lot of water under that bridge.

    The second problem is that you have some expectation of sync speed, and it's not clear whether the behaviour you are now seeing is normal, whether cgate is more or less tolerant than before, or whether you actually have a problem.

    Cgate has been changed in the last 12-18 months, specifically as far as tolerance to errors goes, because it had previously been criticised for being TOO sensitive to faults in an installation (eg maintenance - a device gets removed to allow painting, etc). So it's UNLIKELY that cgate is now MORE sensitive to errors than it used to be.

    Any further, more detailed information will have to be answered by the chaps who know in excruciating detail.
     
    ashleigh, Mar 9, 2010
    #2
  3. more-solutions

    more-solutions

    Sounds about right, that'll be roughly when the site was commissioned.

    I'm a great believer in "if it ain't broke don't fix it", but at some point we're likely to be installing some newer dimmers which I'm assuming I'll need a recent TK to work with. Since I had some quiet time to do the upgrade I took a chance - ouch!

    I've now rolled back to the old version but would like to pursue this, so I'm happy to run some more tests on my next visit.

    Just so you know, all the faults have gone after rolling back. In TK1.11 I had one network I couldn't even scan (from TK, from command line, etc), but it's come straight in now I've rolled back.

    As a software developer myself I know where you're coming from. I'd like to give you some decent info to work from - where do I start?

    If it helps I can start incremental upgrades, it'll probably be one version per site visit (which are every couple of weeks). If so, if there are specific versions to target just let me know.

    Looking at the old C-gate I have running now, almost immediately after C-gate starts I see all networks starting to synchronise (looking at the output from TREE), with all units over a period of about a minute going into NEW status (across all networks), and then over the next 10 minutes or so going to OK status as the data is collected from them.

    When I did the same thing with the new C-Gate, most networks sat in new status with no units and nothing much happened until I did NET FLUSH/NET SYNC on the network manually. Even then, one network refused to do anything (sync error, no units found) whether done manually or from Toolkit.

    (One problem I have is that if I scan from toolkit, all units stay in "NEW" status according to TREE, and in that state they do not seem to play well with C-Lution; I have to wait for a full SYNC to complete so all units show OK not NEW.)

    As I've been investigating this I've been digging deep into the information that C-Gate can tell me. Is there any documentation about the results of the GET command? Eg for a unit which was stuck in ERROR status in the new C-gate but is now OK after rolling back, I see ErrorFlags=9 - what does that mean? (The ErrorFlags value has not changed after rolling back. Another unit on the same network which showed OK status in both C-Gate versions has ErrorFlags=1, so maybe bit 3 (value 8) is significant?)
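
For reference, the two ErrorFlags values quoted above differ in exactly one bit. The sketch below only does the arithmetic: it assumes the values are decimal integers used as a bit field, and it says nothing about what any individual bit means in C-Gate. The function names are hypothetical.

```python
def set_bits(value):
    """Return the zero-based positions of the bits that are set in value."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


def differing_bits(a, b):
    """Return the zero-based bit positions at which a and b differ."""
    return set_bits(a ^ b)


stuck_unit = 9    # unit that showed ERROR in the new C-Gate
healthy_unit = 1  # unit that showed OK in both versions

print(set_bits(stuck_unit))                      # [0, 3]
print(set_bits(healthy_unit))                    # [0]
print(differing_bits(stuck_unit, healthy_unit))  # [3]
```

So the only difference is the bit with value 8, which is bit 3 counting from zero, or the fourth bit counting from one.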
     
    more-solutions, Mar 9, 2010
    #3
  4. more-solutions

    more-solutions

    Also:
    net check_unravel
    .. didn't seem to work. It's not a command I've used before, but running it before the downgrade only ever gave me "200 OK" (with nothing before that). Now, after the downgrade, I see that net check_unravel gives quite a lot of output, and it might be a useful command to know about.

    I assume there's no good reason why this should have failed on my networks in the new version?
     
    more-solutions, Mar 9, 2010
    #4
  5. more-solutions

    ashleigh Moderator

    The best thing you can do is to gather level 9 logs from before AND after the upgrade, these will be big. Compress them (using 7-zip which is free), and send them in. Will PM details.
     
    ashleigh, Mar 9, 2010
    #5
  6. more-solutions

    daniel C-Busser Moderator

    Joined:
    Jul 26, 2004
    Messages:
    769
    Likes Received:
    21
    Location:
    Adelaide
    Information about capturing level 9 logs here: http://www.cbusforums.com/forums/showthread.php?t=5161

    The most important options are:
    command.show-responses=yes
    global-event-level=9
    use-event-file=yes

    If you can capture a level 9 log in TK 1.6 from the beginning throughout the entire process, and the same again in TK 1.11.x, this will be ideal for us in terms of looking at the differences. If you have the chance, a description and log captured in TK 1.10.9 would also be very useful.

    Other things you can tell us in relation to the logs:
    - What steps you are following in TK. Are you clicking "Open all networks" under the project and then leaving it to run? Or are you doing something else?
    - The time while using 1.11.x at which you diverged from the process normally used in 1.6 and started doing things manually, forcing syncs, etc.
    - if/when you used any other applications such as S+, Homegate, etc and what you are doing in those apps.

    Lastly, thank you for your time, I know it's limited but these logs will be very valuable.

    Cheers
    Daniel
     
    daniel, Mar 10, 2010
    #6
  7. more-solutions

    daniel C-Busser Moderator

    Just touching on a few things, before receiving any logs:

    Background syncs are different from demand syncs. They occur automatically in C-Gate once you open a network. The value of option 'sync-time' influences the duration between the first and last commands of a background sync. This has been fiddled with extensively over time but for a value of 3600 (1 hour) a background sync will take place over a very imprecise duration of approximately 20-40 minutes. Your original configuration was set to 14400 (4 hours). For some large installations in the past, increasing this value was seen as a panacea to help solve problems. Sometimes it helped reduce traffic congestion, sometimes it caused other issues because each network would now take much longer to sync. And sometimes C-Gate wasn't interpreting the value correctly! A lot of fixes over the years have removed the need to increase this value, so you don't necessarily have to leap to change it. But it's worth knowing the essence of this option.

    Connecting to a network in TK will disable background syncs because TK sets the network property autosync=no (it is the only client application to do this). The Scan button will then do a demand sync. This takes a minute or two and you'll see the progress in TK. So if you are only using TK, sync-time doesn't have any bearing on this case.

    I am not sure whether use of C-Lution sets autosync back to yes or not. A lot depends on whether it connects before or after TK. This is a fairly important point which the logs will reveal.

    The ErrorFlags thing surprises me as I don't recall any significant changes in this area. They seem to indicate that the unit has reset in some way, but again, the logs will tell more.

    The variation in "net check_unravel" output suggests to me different event levels when you issued the command. If the "global-event-level=9" option was set both times, you *should* see pretty much the same output.
     
    daniel, Mar 10, 2010
    #7
  8. more-solutions

    more-solutions

    I'll generate some level 9 logs next time I am on site. It'll be just 1.6 and 1.11.0 for now due to the time it takes to do - I get on site around 6am and have about an hour to "play" before the site becomes too active for the lighting server to be unavailable.

    Well normally TK isn't even open; the server boots, starts C-Gate automatically (with a couple of Java options to give it some extra memory) then it's left to perform the sync. About 15 mins later C-Lution starts automatically and the system should be ready to use.

    I have a basic web GUI I built which I can use to monitor the progress of the syncs (basically it runs TREE commands and interprets the output) which yesterday was how I discovered that all networks were open but showing as "new" but with no units, aside from maybe 3 or 4 networks (from memory) which were starting to sync. "Normally" (ie with the old C-Gate) all would start to sync immediately and would complete within about 12 minutes. But yesterday, after waiting much longer than that I started TK to force some manual scans in order to get to a point that I could start C-Lution and get the site running (it was gone 7am by this point and that meant several scheduled lighting changes were overdue). This, combined with lots of NET SYNC commands run manually, allowed me to get operational but with one network not responding and several units showing errors.
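
A monitor of this kind can be as simple as counting `state=` values in the text returned by TREE. The sketch below is not the poster's GUI: the function names are hypothetical, and the sample text is invented purely to exercise the parser. The only things taken from this thread are that TREE output contains `state=...` tokens and that the states seen include new, ok and error.

```python
import re
from collections import Counter

# Matches tokens such as "state=ok" or "state=error" anywhere in a line.
STATE_RE = re.compile(r"\bstate=(\w+)", re.IGNORECASE)


def count_states(tree_output):
    """Count how many times each state value appears in the given text."""
    return Counter(state.lower() for state in STATE_RE.findall(tree_output))


def all_ok(tree_output):
    """True only if at least one state was found and every state is 'ok'."""
    counts = count_states(tree_output)
    return bool(counts) and set(counts) == {"ok"}


# Hypothetical sample; real TREE output is laid out differently.
sample = """\
unit-1 state=ok
unit-2 state=new
unit-3 state=error
unit-4 state=ok
"""

summary = count_states(sample)
print(dict(summary))
print("ready for C-Lution:", all_ok(sample))
```

Treating an empty result as "not ready" is a deliberate choice here, since networks that show no units at all are exactly the failure described in this post.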

    Eventually, I had no choice but to revert back to the old version, which involved switching to a backup of the C-Gate directory and starting afresh - I was up and running within 15 mins with no unit errors.

    I'll run some clean tests next time I am on site; I'll start with level 9 logging with the new C-Gate, let it run for about 20 mins, manually sync a couple of networks after that, and shut down after about 30 mins. I'll then repeat with the old C-Gate as close as possible to the first test, but this time I'll be able to leave it running. I know that you'd probably like longer logs, but I have to get up before 4am to be there at 6am as it is! If I can get there a little earlier I will, though.

    Unless you need me to run one of those I won't (at least not this time).

    Not a problem, I'm very grateful that you're there to help me!
     
    more-solutions, Mar 10, 2010
    #8
  9. more-solutions

    more-solutions

    Thanks for this information.

    What is the difference between the resulting TK scan and the auto scan? I assume that having all units showing "new" rather than "ok" means it (initially anyway) runs NET SYNCNEW - is there anything I can do to make the states change to OK?

    Not sure why the C-Lution driver would care whether it's NEW or OK but it does seem to, although I haven't investigated this in much depth.

    Yes, this was changed a few years back on the back of advice.

    It would be good to change it back, I will do next time I am on site.

    I would be very surprised if it changed any settings but yes it would be good to know.

    I assume TK turns autosync back on - is that after the scan completes or on closing TK?

    Since the site should run for weeks without restarting, and since there are occasions where autosync will be required (eg after a faulty unit gets replaced), it obviously matters what impact running TK has (I'm the only person who runs it, which is limited to my site visits).

    That makes sense - the old config (which I reverted to) does seem to have had full logging enabled (for the past 3 years...). It's not caused a disk space issue because the log file had been set with NTFS compression, but I notice that the new version now rotates the logs which is sensible but breaks that compression (I assume I can work around this easily enough).

    Obviously NTFS doesn't do as well as 7-zip (which has long been installed on almost any PC I touch, a very good tool I agree) but it still makes a big difference for log files.
     
    more-solutions, Mar 10, 2010
    #9
  10. more-solutions

    daniel C-Busser Moderator

    The TK scan is accelerated, no pauses between commands. It also queries a little more information for Toolkit's needs.

    TK when closing will set autosync=yes on all networks it interacted with.

    We'll take the site-specific stuff further in emails.
     
    daniel, Mar 10, 2010
    #10
  11. more-solutions

    daniel C-Busser Moderator

    Toolkit 1.11.1 has been released. Announcement here.

    Changes relevant to this thread:

    1. Fixed C-Gate regression for sites using the project.start= config option. The networks will now open immediately instead of after 15 minutes.

    2. C-Gate will now sync up to 25 parallel networks at once (instead of 5). This should improve operations at CNI-centric installations.
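
As a rough illustration of what the higher limit means for a 35-network site, the sketch below counts how many batches are needed if networks sync in strict batches of the parallel limit. That batching model is an assumption made for the arithmetic only; C-Gate may well start the next network as soon as any one finishes. The function name is hypothetical.

```python
import math


def sync_batches(networks, parallel_limit):
    """Number of batches needed if at most parallel_limit networks sync at once."""
    return math.ceil(networks / parallel_limit)


print(sync_batches(35, 5))   # 7 batches with the old limit of 5
print(sync_batches(35, 25))  # 2 batches with the new limit of 25
```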
     
    daniel, Mar 31, 2010
    #11
  12. more-solutions

    more-solutions

    Great, thanks! I will try it when I'm next on site (~1 week).

    How does this compare with the older version? I have 35 networks, presumably the other 10 will start as the first ones finish?

    My objective, obviously, is to get all 35 networks in sync as quickly as possible. I appreciate that starting all 35 together is necessarily the best way to do that.
     
    more-solutions, Mar 31, 2010
    #12
  13. more-solutions

    Lucky555

    Joined:
    Aug 13, 2007
    Messages:
    229
    Likes Received:
    0
    Hi Daniel,

    The section above raises the question in my simple mind - what do Schedule Plus / Homegate do in relation to the autosync= setting?
    Thanks...
     
    Lucky555, Apr 4, 2010
    #13
  14. more-solutions

    Darren Senior Member

    Joined:
    Jul 29, 2004
    Messages:
    2,361
    Likes Received:
    0
    Location:
    Adelaide, South Australia
    They use autosync. The sync period is defined in the C-Gate configuration file.
     
    Darren, Apr 6, 2010
    #14
  15. more-solutions

    more-solutions

    Just realised I wrote that wrong. Obviously (I hope) what I meant was:

    I appreciate that starting all 35 together is not necessarily the best way to do that.
     
    more-solutions, Apr 6, 2010
    #15